DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 25 April 2022 has been entered.

Response to Amendment
Applicant’s response, filed 25 April 2022, to the last office action has been entered and made of record. 
The amendments to the claims are acknowledged. They are supported by the original disclosure, and no new matter is added.
Amendments to the independent claims 1, 11, 14, and 23 have necessitated a new ground of rejection over the applied prior art. Please see below for the updated interpretations and rejections.

Response to Arguments
Applicant’s arguments with respect to amended independent claims 1, 11, 14, and 23 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

CLAIM INTERPRETATION

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “identification module”, “comparison module”, and “labeling module” in claims 14-18, 20, and 22.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-6, 8, 10, 11, 13, 14-18, 20, 22-23, and 25-29 are rejected under 35 U.S.C. 103 as being unpatentable over Funayama et al. (US 2010/0239123), herein Funayama, in view of Tripathi et al. (“Detecting Temporally Consistent Objects in Videos through Object Class Label Propagation”), herein Tripathi, and Gaidon (US 2017/0286774).
Regarding claim 1, Funayama discloses a method for labeling unlabeled frames within a sequence of frames, the method comprising:
-performing:
-	receiving an unlabeled frame from said unlabeled frames and a labeled frame from said sequence, said labeled frame being temporally close to said unlabeled frame within said sequence (see Funayama [0059], where video data comprising a plurality of images are obtained; see Funayama [0074], where labels of at least one object in a first image are obtained and may be propagated to a subsequent image); 
-	identifying at least one labeled feature in said labeled frame (see Funayama [0075], where labels from the input map are associated with regions and keypoints in the first frame); 
-	identifying at least one potential feature referred as the first potential feature in said unlabeled frame (see Funayama [0062]-[0065], where histograms and keypoints for image regions are computed; see Funayama [0066], where keypoints may be associated with subsequent regions in subsequent images); 
-	comparing said first potential feature with said at least one labeled feature (see Funayama [0065]-[0072], where keypoints associated with labeled regions are compared and matched with keypoints of a subsequent region in a subsequent frame); and 
-	applying a label to said unlabeled frame when said first potential feature matches said at least one said labeled feature, to thereby produce a newly labeled frame (see Funayama [0066], [0074]-[0077], where labels of at least one object associated with image regions in a first image are obtained and may be propagated to subsequent regions in a subsequent image based on matching keypoints between the labeled region and subsequent regions);
-	receiving a second unlabeled frame from said unlabeled frames, said second unlabeled frame being temporally close to said newly labeled frame (see Funayama [0076], where for subsequent frames, the keypoint label association will then be provided by the previous frame, which suggests that subsequent unlabeled frames of the image sequence are obtained); and 
-	identifying at least one labeled feature in said newly labeled frame (see Funayama [0075], where labels from the input map are associated with regions and keypoints in the first frame; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous image; where the prior art teachings suggest that in propagating the labels to subsequent images, labels propagated to associated regions and keypoints of a previous image are identified to be propagated); 
-	identifying at least one potential feature referred as the second potential feature in said second unlabeled frame (see Funayama [0062]-[0065], where histograms and keypoints for image regions are computed; see Funayama [0066], where keypoints may be associated with subsequent regions in subsequent images; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous image; where the prior art teachings suggest that in propagating the labels to subsequent images, histograms and keypoints for image regions of the subsequent image are computed); 
-	comparing said second potential feature with said at least one labeled feature (see Funayama [0065]-[0072], where keypoints associated with labeled regions are compared and matched with keypoints of a subsequent region in a subsequent image; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous frame; where the prior art teachings suggest that in propagating the labels to subsequent images, keypoints associated with identified labeled regions of the previous image are compared and matched with computed keypoints of a subsequent region in a subsequent frame); and 
-	applying a label to said second unlabeled frame when said second potential feature matches said at least one said labeled feature, to thereby produce a second newly labeled frame (see Funayama [0066], [0074]-[0077], where labels of at least one object associated with image regions in a first image are obtained and may be propagated to subsequent regions in a subsequent image based on matching keypoints between the labeled region and subsequent regions; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous image; where the prior art teachings suggest that in propagating the labels to subsequent images, labels of the identified labeled regions of the previous image are propagated to subsequent regions in a subsequent image based on the matching keypoints between subsequent regions in a subsequent image and identified labeled regions of the previous image);
	-until said sequence has been processed (see Funayama [0060], which describes obtaining a set of keypoints for each of the plurality of images; and see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous frame; which suggests to one of ordinary skill in the art that label propagation is performed for all frames of the video image sequence when corresponding keypoints exist for each of the plurality of images).
Although Funayama describes various disclosed features in different embodiments, Funayama further discloses that combinations of features of different embodiments are within the scope of the disclosed invention as would be understood by one of ordinary skill in the art and any of the claimed embodiments can be used in any combination (see Funayama [0050]-[0051]). Thus, Funayama provides some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify or to combine prior art reference teachings to arrive at the claimed invention. 
Funayama does not explicitly disclose that identifying at least one labeled feature in said labeled frame is by one of a neural network and a convolutional neural network; and that identifying at least one potential feature referred as the first potential feature in said unlabeled frame is by one of a neural network and a convolutional neural network. 
Tripathi teaches, in a related and pertinent method and apparatus for extracting region proposals and identifying objects in a video sequence (see Tripathi Abstract), that a convolutional neural network based object detector model is used to identify object proposals in the images and propagate the detected object label, where feature vectors are extracted from corresponding region proposals (see Tripathi sect. 4. Learning Video Object Detector Model and see Tripathi sect. 5.3 Object Label Propagation). 
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Tripathi to the teachings of Funayama, such that object labels are identified in the image regions using a convolutional neural network based object detector model, allowing the identified corresponding labels of objects to be propagated through the image sequence. This modification is rationalized as an application of a known technique to a known device ready for improvement to yield predictable results. In this instance, Funayama discloses a base method for performing label propagation through a sequence of video images based on corresponding image regions between labeled image regions and subsequent image regions. Tripathi teaches a known technique of performing object identification upon video images using a convolutional neural network based object detector model to identify objects in video images and propagate the detected object label. One of ordinary skill in the art would have recognized that applying Tripathi’s technique would allow the method of Funayama to identify objects among the plurality of image regions, yielding an improved label propagation method where labels of identified objects of interest are propagated through the image sequence.
Although Funayama and Tripathi suggest the use of a convolutional neural network based object detector model to identify object labels in the image regions (see Tripathi sect. 4. Learning Video Object Detector Model and see Tripathi sect. 5.3 Object Label Propagation), Funayama and Tripathi do not explicitly disclose that the identifying at least one labeled feature in said labeled frame is by a portion of available layers from one of a neural network and a convolutional neural network; and that the identifying at least one potential feature referred as the first potential feature in said unlabeled frame is by the portion of available layers.
Gaidon teaches, in a related and pertinent system for online multi-class multi-object tracking based on applying video data to a neural network (see Gaidon Abstract), that the neural network includes an ordered sequence of supervised operations or layers that are trained on a set of labeled training objects, such as images and their true labels, and generates a prediction for a new unlabeled video as to whether detected objects in the frames match (see Gaidon [0022]-[0024]); that the neural network generates a probability / “association score” that a corresponding detected object in the current frame matches the target object in the previous frame, and then outputs a label regarding the prediction of whether or not the target and detected objects belong to a matching object (see Gaidon [0037]-[0038]); and that the neural network learns a stack of filters in each of the layers to learn features that characterize the motion boundaries in the training images, which are used to make match predictions given two bounding boxes extracted from pairwise images (see Gaidon [0041]-[0042]).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Gaidon to the teachings of Funayama and Tripathi, such that object labels are identified in the image regions using a convolutional neural network based object detector model in current and previous image frames, and such that the propagation of identified corresponding labels of objects through the image sequence is performed using the layers of the trained neural network. This modification is rationalized as an application of a known technique to a known device ready for improvement to yield predictable results. In this instance, Funayama and Tripathi disclose a base method for extracting features from corresponding region proposals and performing label propagation through a sequence of video images based on corresponding image regions between labeled image regions and subsequent image regions using a convolutional neural network based object detector. Gaidon teaches a known technique of performing online multi-class multi-object tracking based on a trained neural network, where the neural network learns a stack of filters in each of the layers to learn features that characterize the motion boundaries in the training images, which are used to make match predictions given two bounding boxes extracted from pairwise images. One of ordinary skill in the art would have recognized that applying Gaidon’s technique would allow the method of Funayama and Tripathi to identify object features in the image regions using layers of the trained convolutional neural network based object detector model in current and previous image frames, and to perform the propagation of identified corresponding labels of objects through the image sequence using the layers of the trained neural network, yielding an improved label propagation method where labels of identified objects of interest are propagated through the image sequence.
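For illustration only, the frame-to-frame label propagation recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Funayama, Tripathi, Gaidon, or the specification: the names `propagate_labels`, `extract`, and `matches` are hypothetical, and the feature extractor and match test are supplied by the caller rather than being any particular neural network.

```python
from typing import Any, Callable, List, Optional, Sequence


def propagate_labels(
    frames: Sequence[Any],
    labels: Sequence[Optional[str]],
    extract: Callable[[Any], Any],
    matches: Callable[[Any, Any], bool],
) -> List[Optional[str]]:
    """Carry a label from each labeled frame to the next unlabeled frame.

    `extract` and `matches` are hypothetical stand-ins for the feature
    identification and comparison steps; they are assumptions of this sketch.
    """
    out = list(labels)
    for i in range(1, len(frames)):
        # Skip frames that already carry a label or whose predecessor has none.
        if out[i] is not None or out[i - 1] is None:
            continue
        labeled_feature = extract(frames[i - 1])
        potential_feature = extract(frames[i])
        # A newly labeled frame becomes the labeled frame for the next step.
        if matches(labeled_feature, potential_feature):
            out[i] = out[i - 1]
    return out
```

In this sketch, propagation stops at the first frame whose feature does not match its predecessor, so later frames stay unlabeled unless another labeled frame precedes them.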


Regarding claim 2, please see the above rejection of claim 1. Funayama, Tripathi, and Gaidon disclose the method according to claim 1, wherein:
said at least one labeled feature has a specific location within said labeled frame or said newly labeled frame (see Funayama [0066], where keypoints may be associated with a labeled region); and 
said first potential feature or said second potential feature has a similar location within, respectively, said unlabeled frame or said second unlabeled frame (see Funayama [0066], where keypoints may be associated with subsequent regions in subsequent images, and keypoints associated with labeled regions may be matched with keypoints associated with a subsequent region), 
said similar location being similar to said specific location of said at least one labeled feature within said labeled frame or said newly labeled frame (see Funayama [0066], where labeled regions may be matched with corresponding subsequent regions; see also Funayama [0067]-[0072], where regions are matched based on similarity measures); and 
wherein identifying at least one potential labeled feature is performed based on said similar location (see Funayama [0066]-[0072], where the matched keypoints between two frames can be used to match their corresponding regions, and the similarity measures are based on the matching of corresponding keypoints).

Regarding claim 3, please see the above rejection of claim 1. Funayama, Tripathi, and Gaidon disclose the method according to claim 1, wherein identifying at least one labeled feature comprises passing said labeled frame through an identification module to thereby identify at least one feature parameter of said at least one labeled feature (see Funayama [0062]-[0065], where histograms and keypoints for image regions are computed; see Funayama [0066] where keypoints may be associated with labeled regions).

Regarding claim 4, please see the above rejection of claim 3. Funayama, Tripathi, and Gaidon disclose the method according to claim 3, wherein identifying at least one potential feature comprises passing said unlabeled frame or said newly unlabeled frame through said identification module, to thereby identify at least one potential-feature parameter of, respectively, said first potential feature or second potential feature (see Funayama [0062]-[0065], where histograms and keypoints for image regions are computed; see Funayama [0066], where keypoints may be associated with subsequent regions in subsequent images).

Regarding claim 5, please see the above rejection of claim 4. Funayama, Tripathi, and Gaidon disclose the method according to claim 4, wherein comparing said first potential feature or second potential feature comprises comparing said at least one potential-feature parameter to said at least one feature parameter (see Funayama [0065]-[0072], where keypoints associated with labeled regions are compared and matched with keypoints of a subsequent region in a subsequent frame).

Regarding claim 6, please see the above rejection of claim 5. Funayama, Tripathi, and Gaidon disclose the method according to claim 5, wherein said at least one potential-feature parameter matches said at least one feature parameter when at least one of the following occurs: 
- said potential-feature parameter is the same as said feature parameter; and 
- a difference between said potential-feature parameter and said feature parameter is within a margin of tolerance (see Funayama [0067]-[0072], where the matching is performed until the best adjacent group is found which satisfies a minimum inclusion similarity which is above a certain threshold and maximizes the overall appearance similarity between the regions).
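For illustration only, the two alternative match conditions of claim 6 can be sketched as a single test. This is a minimal sketch under stated assumptions: the function name `parameters_match`, the use of Euclidean distance as the "difference", and the representation of each parameter as a flat sequence of numbers are all hypothetical choices, not teachings of the cited references.

```python
import math
from typing import Sequence


def parameters_match(
    feature: Sequence[float],
    candidate: Sequence[float],
    tolerance: float = 0.0,
) -> bool:
    """Return True when the parameters are identical or differ within tolerance.

    Euclidean distance is an assumed measure of "difference" for this sketch.
    """
    if len(feature) != len(candidate):
        return False
    # Condition 1: the potential-feature parameter is the same as the feature parameter.
    if list(feature) == list(candidate):
        return True
    # Condition 2: the difference is within the margin of tolerance.
    return math.dist(feature, candidate) <= tolerance
```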

Regarding claim 8, please see the above rejection of claim 4. Funayama, Tripathi, and Gaidon disclose the method according to claim 4, wherein said at least one feature parameter and said at least one potential feature parameter are numeric tensors (see Tripathi sect. 4. Learning Video Object Detector Model, where a convolutional neural network based object detector model is used to identify object proposals in the images and extracts a 4096 dimensional feature vector for each region proposal, and see Tripathi sect. 5.3 Object Label Propagation, where the detected object label is propagated through the video frames; Examiner notes that one of ordinary skill in the art would understand that a tensor is a mathematical object that generalizes scalars, vectors, and matrices, please see below pertinent art section for corresponding evidence).
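For illustration only, the observation that a feature vector is a special case of a numeric tensor can be sketched by computing the shape of a nested numeric structure. This is a minimal sketch under stated assumptions: the helper `tensor_shape` is hypothetical, it assumes regularly nested lists or tuples, and it is not drawn from Tripathi or the pertinent art.

```python
from typing import Any, Tuple


def tensor_shape(value: Any) -> Tuple[int, ...]:
    """Return the shape of a regularly nested list/tuple of numbers.

    A scalar has shape (), a vector (n,), and a matrix (rows, cols);
    the number of entries in the shape is the tensor's order.
    """
    shape = []
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        if not value:
            break
        # Assumes every element at this level has the same structure.
        value = value[0]
    return tuple(shape)


# A 4096-dimensional feature vector is an order-1 tensor.
feature_vector = [0.0] * 4096
vector_shape = tensor_shape(feature_vector)
```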

Regarding claim 10, please see the above rejection of claim 1. Funayama, Tripathi, and Gaidon disclose the method according to claim 1 being performed until all frames within said sequence have at least one label (see Funayama [0060], which describes obtaining a set of keypoints for each of the plurality of images; and see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous frame; which suggests to one of ordinary skill in the art that label propagation is performed for all frames of the video image sequence when corresponding keypoints exist for each of the plurality of images). 

Regarding claim 11, Funayama discloses a method for labeling an unlabeled frame within a sequence of frames, the method comprising: 
-performing: 
-	receiving an unlabeled frame from said unlabeled frames and a labeled frame from said sequence, said labeled frame being temporally close to said unlabeled frame within said sequence (see Funayama [0059], where video data comprising a plurality of images are obtained; see Funayama [0074], where labels of at least one object in a first image are obtained and may be propagated to a subsequent image);
-	identifying at least one labeled feature in said labeled frame (see Funayama [0075], where labels from the input map are associated with regions and keypoints in the first frame); 
-	identifying a specific location of said at least one labeled feature within said labeled frame (see Funayama [0066], where keypoints may be associated with a labeled region); 
-	generating a specific signature of a specific region of said labeled frame, said specific region being based on said specific location (see Funayama [0062]-[0065], where histograms and keypoints for image regions are computed; see Funayama [0066] where keypoints may be associated with labeled regions); 
-	identifying a similar location within said unlabeled frame (see Funayama [0066], where keypoints may be associated with subsequent regions in subsequent images, and keypoints associated with labeled regions may be matched with keypoints associated with a subsequent region), said similar location being a location within said unlabeled frame that is similar to said specific location within said labeled frame (see Funayama [0066], where labeled regions may be matched with corresponding subsequent regions; see also Funayama [0067]-[0072], where regions are matched based on similarity measures); 
- iteratively repeating (see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent images, and that the keypoint label association will then be provided by the previous frame; which suggests to one of ordinary skill in the art that label propagation is performed for all frames of the video image sequence and thus iteratively repeats the label propagation process):
-	generating a random trial signature of a random trial region of said unlabeled frame, said random trial region being based on said similar location (see Funayama [0062]-[0065], where histograms and keypoints for image regions are computed; see Funayama [0066], where keypoints may be associated with subsequent regions in subsequent images; see Funayama [0066]-[0072], where the matched keypoints between two frames can be used to match their corresponding regions, and the similarity measures are based on the matching of corresponding keypoints); 
-	comparing said random trial signature to said specific signature (see Funayama [0065]-[0072], where keypoints associated with labeled regions are compared and matched with keypoints of a subsequent region in a subsequent frame); 
-	when said random trial signature matches said specific signature within a margin of tolerance, applying a label to said unlabeled frame, to thereby produce a newly labeled frame (see Funayama [0066], [0074]-[0077], where labels of at least one object associated with image regions in a first image are obtained and may be propagated to subsequent regions in a subsequent image based on matching keypoints between the labeled region and subsequent regions; see Funayama [0067]-[0072], where the matching is performed until the best adjacent group is found which satisfies a minimum inclusion similarity which is above a certain threshold and maximizes the overall appearance similarity between the regions)
- until an exit condition is met, wherein said exit condition is one of: 
- said random trial signature matches said specific signature within a margin of tolerance (see Funayama [0067]-[0072], where the matching is performed until the best adjacent group is found which satisfies a minimum inclusion similarity which is above a certain threshold and maximizes the overall appearance similarity between the regions); and 
- a predetermined number of iterations are performed.
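For illustration only, the iterative loop and the two exit conditions recited in claim 11 can be sketched as follows. This is a minimal sketch under stated assumptions: the names `search_random_trials` and `signature_of`, the scalar signature, the square sampling window of half-width `radius` around the similar location, and the absolute-difference comparison are all hypothetical and are not taken from the cited references.

```python
import random
from typing import Callable, Optional, Tuple

Location = Tuple[int, int]


def search_random_trials(
    target_signature: float,
    signature_of: Callable[[Location], float],
    center: Location,
    radius: int,
    tolerance: float,
    max_iterations: int,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[Location], int]:
    """Sample trial regions near `center` until one matches or the budget is spent.

    Returns (matching location or None, number of iterations performed).
    """
    rng = rng or random.Random()
    cx, cy = center
    for iteration in range(1, max_iterations + 1):
        # Random trial region based on the similar location.
        trial = (cx + rng.randint(-radius, radius), cy + rng.randint(-radius, radius))
        # Exit condition 1: trial signature matches within the margin of tolerance.
        if abs(signature_of(trial) - target_signature) <= tolerance:
            return trial, iteration
    # Exit condition 2: the predetermined number of iterations was performed.
    return None, max_iterations
```

A caller would apply the label to the unlabeled frame only when the returned location is not `None`.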
Although Funayama describes various disclosed features in different embodiments, Funayama further discloses that combinations of features of different embodiments are within the scope of the disclosed invention as would be understood by one of ordinary skill in the art and any of the claimed embodiments can be used in any combination (see Funayama [0050]-[0051]). Thus, Funayama provides some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify or to combine prior art reference teachings to arrive at the claimed invention. 
Funayama does not explicitly disclose that generating a specific signature of a specific region of said labeled frame is by one of a neural network and a convolutional neural network; and that generating a random trial signature of a random trial region of said unlabeled frame is by one of a neural network and a convolutional neural network. 
Tripathi teaches in a related and pertinent method and apparatus for extracting region proposals and identifying objects in a video sequence (see Tripathi Abstract), where a convolutional neural network based object detector model is used to identify objects proposals in the images and propagate the detected object label, where feature vectors are extracted from corresponding region proposals (see Tripathi sect. 4. Learning Video Object Detector Model and see Tripathi sect. 5.3 Object Label Propagation). 
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Tripathi to the teachings of Funayama, such that objects labels are identified in the image regions using a convolutional neural network based object detector model and allows for the propagation of identified corresponding labels of objects through the image sequence. This modification is rationalized as an application of a known technique to a known device ready for improvement to yield predictable results. In this instance, Funayama disclose a base method for performing label propagation through a sequence of video images based on corresponding image regions between labeled image regions and subsequent image regions. Tripathi teaches a known technique of performing object identification upon video images using a convolutional neural network based object detector model to identify objects in video images and propagate the detected object label. One of ordinary skill in the art would have recognized that by applying Tripathi’s technique would allow for the method of Funayama to identify objects among the plurality of image regions and allow for an improved label propagation method where labels of identified objects of interest are propagated through the image sequence.  
Although Funayama and Triparthi suggests the use of a convolutional neural network based object detector model to identify objects labels in the image regions (see Tripathi sect. 4. Learning Video Object Detector Model and see Tripathi sect. 5.3 Object Label Propagation); Funayama and Tripathi do not explicitly disclose that the generating a specific signature of a specific region of said labeled frame is by a portion of available layers from one of a neural network and a convolutional neural network; and that the generating a random trial signature of a random trial region of said unlabeled frame is by the portion of available layers. 
Gaidon teaches, in a related and pertinent system for online multi-class multi-object tracking based on applying video data to a neural network (see Gaidon Abstract), that the neural network includes an ordered sequence of supervised operations or layers that are trained on a set of labeled training objects, such as images and their true labels, and generates a prediction for a new unlabeled video as to whether detected objects in the frames match (see Gaidon [0022]-[0024]), where the neural network generates a probability / “association score” for a corresponding detected object in the current frame and matches the target object in the previous frame and then outputs a label regarding the prediction of whether or not the target and detected objects belong to a matching object (see Gaidon [0037]-[0038]), and that the neural network learns a stack of filters in each of the layers to learn features that characterize the motion boundaries in the training images and are used to make match predictions given two bounding boxes extracted from pairwise images (see Gaidon [0041]-[0042]).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Gaidon to the teachings of Funayama and Tripathi, such that object features are extracted from corresponding image regions using a convolutional neural network based object detector model in current and previous image frames, allowing for the detection of matching objects from the extracted features, and such that the propagation of identified corresponding labels of objects through the image sequence is performed using the layers of the trained neural network. This modification is rationalized as an application of a known technique to a known device ready for improvement to yield predictable results. In this instance, Funayama and Tripathi disclose a base method for extracting features from corresponding region proposals and performing label propagation through a sequence of video images based on corresponding image regions between labeled image regions and subsequent image regions using a convolutional neural network based object detector. Gaidon teaches a known technique of performing online multi-class multi-object tracking based on a trained neural network, where the neural network learns a stack of filters in each of the layers to learn features that characterize the motion boundaries in the training images and are used to make match predictions given two bounding boxes extracted from pairwise images.
One of ordinary skill in the art would have recognized that applying Gaidon’s technique would allow the method of Funayama and Tripathi to extract and identify object features in the image regions using layers of the trained convolutional neural network based object detector model in current and previous image frames, with the propagation of identified corresponding labels of objects through the image sequence performed using the layers of the trained neural network for determining matching object features between current and previous image frames, and would provide an improved label propagation method where labels of identified objects of interest are propagated through the image sequence.
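By way of illustration only, the limitation of generating a signature "by a portion of available layers" and comparing it against a random trial signature can be sketched as follows. This is a simplified sketch under stated assumptions and is not taken from Funayama, Tripathi, or Gaidon: the network is a toy fully connected stack with random (untrained) weights, regions are flattened lists of numbers, cosine similarity is an assumed comparison measure, and the names `make_layer`, `signature`, `cosine_similarity`, and `best_random_trial` are hypothetical.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Build one fully connected layer with random weights (stand-in for trained weights)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def signature(layers, region, num_layers):
    """Pass a flattened region through only the first num_layers layers."""
    activations = list(region)
    for weights, biases in layers[:num_layers]:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_random_trial(layers, num_layers, labeled_region, candidate_regions, trials, rng):
    """Sample random trial regions and keep the one closest to the labeled region."""
    specific = signature(layers, labeled_region, num_layers)
    best_index, best_score = None, -1.0
    for _ in range(trials):
        index = rng.randrange(len(candidate_regions))
        trial = signature(layers, candidate_regions[index], num_layers)
        score = cosine_similarity(specific, trial)
        if best_index is None or score > best_score:
            best_index, best_score = index, score
    return best_index, best_score
```

In this sketch, both signatures are produced by the same truncated stack (`layers[:num_layers]`), so the comparison is made in the feature space of an intermediate layer rather than at the network output.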

Regarding claim 13, see above rejection for claim 11. It is a method claim reciting similar subject matter as claim 8. Please see above claim 8 for detailed claim analysis as the limitations of claim 13 are similarly rejected.

Regarding claim 14, Funayama discloses a system for labeling unlabeled frames within a sequence of frames, the system comprising: 
- an identification module (see Funayama [0080], where a programmable processor coupled to memory is disclosed to implement the disclosed teachings) for: 
- receive an unlabeled frame from said unlabeled frames and a labeled frame from said sequence, said labeled frame being temporally close to said unlabeled frame within said sequence (see Funayama [0059], where video data comprising a plurality of images are obtained; see Funayama [0074], where labels of at least one object in a first image are obtained and may be propagated to a subsequent image); 
- identify at least one labeled feature in said labeled frame (see Funayama [0075], where labels from the input map are associated with regions and keypoints in the first frame); and 
- identify at least one potential feature in said unlabeled frame (see Funayama [0062]-[0065], where histograms and keypoints for image regions are computed; see Funayama [0066], where keypoints may be associated with subsequent regions in subsequent images); 
- receive a second unlabeled frame from said unlabeled frames (see Funayama [0076], where for subsequent frames, the keypoint label association will then be provided by the previous frame, which suggests that subsequent unlabeled frames of the image sequence are obtained); 
- identify said at least one labeled feature in said newly labeled frame from a labeling module (see Funayama [0075], where labels from the input map are associated with regions and keypoints in the first frame; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous image; where the prior art teachings suggest that in propagating the labels to subsequent images, labels propagated to associated regions and keypoints of a previous image are identified to be propagated); 
- identify at least one potential feature referred as the second potential feature in said second unlabeled frame (see Funayama [0062]-[0065], where histograms and keypoints for image regions are computed; see Funayama [0066], where keypoints may be associated with subsequent regions in subsequent images; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous image; where the prior art teachings suggest that in propagating the labels to subsequent images, histograms and keypoints for image regions of the subsequent image are computed); 
- a comparison module configured to: (see Funayama [0080], where a programmable processor coupled to memory is disclosed to implement the disclosed teachings) 
- compare said at least one potential feature to said at least one labeled feature (see Funayama [0065]-[0072], where keypoints associated with labeled regions are compared and matched with keypoints of a subsequent region in a subsequent frame); and 
- compare said second potential feature with said at least one labeled feature (see Funayama [0065]-[0072], where keypoints associated with labeled regions are compared and matched with keypoints of a subsequent region in a subsequent image; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous frame; where the prior art teachings suggest that in propagating the labels to subsequent images, keypoints associated with identified labeled regions of the previous image are compared and matched with computed keypoints of a subsequent region in a subsequent frame); and
- the labeling module configure to: (see Funayama [0080], where a programmable processor coupled to memory is disclosed to implement the disclosed teachings) 
- apply a label to said unlabeled frame when said comparison module determines a match between at least one potential feature and at least one labeled feature, said labeling module thereby produce the newly labeled frame (see Funayama [0066], [0074]-[0077], where labels of at least one object associated with image regions in a first image are obtained and may be propagated to subsequent regions in a subsequent image based on matching keypoints between the labeled region and subsequent regions); and 
- apply a label to said second unlabeled frame when said second potential feature matches said at least one said labeled feature, to thereby produce a second newly labeled frame (see Funayama [0066], [0074]-[0077], where labels of at least one object associated with image regions in a first image are obtained and may be propagated to subsequent regions in a subsequent image based on matching keypoints between the labeled region and subsequent regions; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous image; where the prior art teachings suggest that in propagating the labels to subsequent images, labels of the identified labeled regions of the previous image are propagated to subsequent regions in a subsequent image based on the matching keypoints between subsequent regions in a subsequent image and identified labeled regions of the previous image);
- wherein said second unlabeled frame is temporally close to the newly labeled frame and wherein the identification module, the comparing module and the labeling module being further configured to process the sequence until a last frame thereof (see Funayama [0060], which describes obtaining a set of keypoints for each of the plurality of images; and see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous image; which suggests to one of ordinary skill in the art that label propagation is performed for all frames of the video image sequence when corresponding keypoints exist for each of the plurality of images, and that the previous image and subsequent image would be temporally adjacent and thus temporally close to each other).
Funayama does not explicitly disclose that identifying at least one labeled feature in said labeled frame is by one of a neural network and a convolutional neural network; that identifying at least one potential feature referred as the first potential feature in said unlabeled frame is by one of a neural network and a convolutional neural network; that identifying said at least one labeled feature in a newly labeled frame from a labeling module is by one of a neural network and a convolutional neural network; and that identifying at least one potential feature referred as the second potential feature in said second unlabeled frame is by one of a neural network and a convolutional neural network. 
Tripathi teaches, in a related and pertinent method and apparatus for extracting region proposals and identifying objects in a video sequence (see Tripathi Abstract), that a convolutional neural network based object detector model is used to identify object proposals in the images and propagate the detected object label, where feature vectors are extracted from corresponding region proposals (see Tripathi sect. 4. Learning Video Object Detector Model and see Tripathi sect. 5.3 Object Label Propagation).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Tripathi to the teachings of Funayama, such that object labels are identified in the image regions using a convolutional neural network based object detector model, allowing for the propagation of identified corresponding labels of objects through the image sequence. This modification is rationalized as an application of a known technique to a known device ready for improvement to yield predictable results. In this instance, Funayama discloses a base method for performing label propagation through a sequence of video images based on corresponding image regions between labeled image regions and subsequent image regions. Tripathi teaches a known technique of performing object identification upon video images using a convolutional neural network based object detector model to identify objects in video images and propagate the detected object label. One of ordinary skill in the art would have recognized that applying Tripathi’s technique would allow the method of Funayama to identify objects among the plurality of image regions and would provide an improved label propagation method where labels of identified objects of interest are propagated through the image sequence.
Although Funayama and Tripathi suggest the use of a convolutional neural network based object detector model to identify object labels in the image regions (see Tripathi sect. 4. Learning Video Object Detector Model and see Tripathi sect. 5.3 Object Label Propagation), Funayama and Tripathi do not explicitly disclose that the identifying at least one labeled feature in said labeled frame is by a portion of available layers from one of a neural network and a convolutional neural network; that the identifying at least one potential feature in said unlabeled frame is by the portion of available layers; that identifying said at least one labeled feature in a newly labeled frame from a labeling module is by the portion of available layers; and that the identifying at least one potential feature referred as the second potential feature in said second unlabeled frame is by the portion of available layers.
Gaidon teaches, in a related and pertinent system for online multi-class multi-object tracking based on applying video data to a neural network (see Gaidon Abstract), that the neural network includes an ordered sequence of supervised operations or layers that are trained on a set of labeled training objects, such as images and their true labels, and generates a prediction for a new unlabeled video as to whether detected objects in the frames match (see Gaidon [0022]-[0024]), where the neural network generates a probability / “association score” for a corresponding detected object in the current frame and matches the target object in the previous frame and then outputs a label regarding the prediction of whether or not the target and detected objects belong to a matching object (see Gaidon [0037]-[0038]), and that the neural network learns a stack of filters in each of the layers to learn features that characterize the motion boundaries in the training images and are used to make match predictions given two bounding boxes extracted from pairwise images (see Gaidon [0041]-[0042]).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Gaidon to the teachings of Funayama and Tripathi, such that object labels are identified in the image regions using a convolutional neural network based object detector model in current and previous image frames, and such that the propagation of identified corresponding labels of objects through the image sequence, including subsequently label propagated frames, is performed using the layers of the trained neural network. This modification is rationalized as an application of a known technique to a known device ready for improvement to yield predictable results. In this instance, Funayama and Tripathi disclose a base method for extracting features from corresponding region proposals and performing label propagation through a sequence of video images based on corresponding image regions between labeled image regions and subsequent image regions using a convolutional neural network based object detector. Gaidon teaches a known technique of performing online multi-class multi-object tracking based on a trained neural network, where the neural network learns a stack of filters in each of the layers to learn features that characterize the motion boundaries in the training images and are used to make match predictions given two bounding boxes extracted from pairwise images.
One of ordinary skill in the art would have recognized that applying Gaidon’s technique would allow the method of Funayama and Tripathi to identify object features in the image regions using layers of the trained convolutional neural network based object detector model in current and previous image frames, with the propagation of identified corresponding labels of objects through the image sequence performed using the layers of the trained neural network, and would provide an improved label propagation method where labels of identified objects of interest are propagated through the image sequence.
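By way of illustration only, the frame-by-frame processing mapped above for claim 14 (identify features, compare against the previously labeled frame, apply a label on a match, and repeat until the last frame) can be sketched as follows. This is a simplified sketch under stated assumptions and is not the implementation of Funayama, Tripathi, or Gaidon: features are plain coordinate tuples rather than keypoints or network activations, a match is an assumed Euclidean distance threshold, and the names `distance`, `propagate_labels`, and `threshold` are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature tuples of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def propagate_labels(frames, first_labels, threshold=0.5):
    """Propagate labels from the first frame through the rest of the sequence.

    frames: list of frames, each a list of feature tuples.
    first_labels: one label per feature of frames[0], in the same order.
    Returns one list of (feature, label) pairs per frame; label is None
    when no labeled feature in the previous frame is within threshold.
    """
    labeled = [list(zip(frames[0], first_labels))]
    for frame in frames[1:]:
        previous = labeled[-1]  # the most recently (newly) labeled frame
        current = []
        for feature in frame:
            best_distance, best_label = None, None
            for prev_feature, prev_label in previous:
                if prev_label is None:
                    continue  # unlabeled features cannot pass on a label
                d = distance(feature, prev_feature)
                if d <= threshold and (best_distance is None or d < best_distance):
                    best_distance, best_label = d, prev_label
            current.append((feature, best_label))
        labeled.append(current)
    return labeled
```

Each newly labeled frame serves as the labeled frame for the next unlabeled frame, so the comparison is always made between temporally adjacent frames.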

Regarding claim 15, see above rejection for claim 14. It is a system claim reciting similar subject matter as claim 2. Please see above claim 2 for detailed claim analysis as the limitations of claim 15 are similarly rejected.

Regarding claim 16, see above rejection for claim 14. It is a system claim reciting similar subject matter as claim 4. Please see above claim 4 for detailed claim analysis as the limitations of claim 16 are similarly rejected.

Regarding claim 17, see above rejection for claim 14. It is a system claim reciting similar subject matter as claim 5. Please see above claim 5 for detailed claim analysis as the limitations of claim 17 are similarly rejected.

Regarding claim 18, see above rejection for claim 17. It is a system claim reciting similar subject matter as claim 6. Please see above claim 6 for detailed claim analysis as the limitations of claim 18 are similarly rejected.

Regarding claim 20, see above rejection for claim 14. It is a system claim reciting similar subject matter as claim 8. Please see above claim 8 for detailed claim analysis as the limitations of claim 20 are similarly rejected.

Regarding claim 22, see above rejection for claim 14. It is a system claim reciting similar subject matter as claim 10. Please see above claim 10 for detailed claim analysis as the limitations of claim 22 are similarly rejected.

Regarding claim 23, it recites a non-transitory computer-readable media for performing the method of claim 11. Funayama, Tripathi, and Gaidon teach a non-transitory computer-readable media performing the method of claim 11 (see Funayama [0080], where a programmable processor coupled to memory is disclosed to implement the disclosed teachings, where the memory may be RAM, ROM, etc.). Please see above for detailed claim analysis, with the exception of the following further limitations:
Please see the above rejection for claim 11, as the rationale to combine the teachings of Funayama’s different embodiments, Tripathi, and Gaidon is similar, mutatis mutandis.

Regarding claim 25, see above rejection for claim 23. It is a non-transitory computer-readable media claim reciting similar subject matter as claim 8. Please see above claim 8 for detailed claim analysis as the limitations of claim 25 are similarly rejected. 

Regarding claim 26, please see the above rejection of claim 1. Funayama, Tripathi, and Gaidon disclose the method of claim 1, wherein a plurality of labeled features comprises said at least one labeled feature (see Funayama [0075], where labels from the input map are associated with regions and keypoints in the first frame; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image; where the prior art teachings suggest that a plurality of keypoints are computed for a corresponding plurality of labels) and a plurality of potential features comprises said first potential feature and said second potential feature (see Funayama [0062]-[0065], where histograms and keypoints for image regions are computed; see Funayama [0066], where keypoints may be associated with subsequent regions in subsequent images; where the prior art teachings suggest that a plurality of histograms and keypoints for image regions of the image sequence are computed, comprising the first and subsequent images), performing until said sequence has been processed further comprising performing for each of said plurality labeled features and for each of said plurality of potential features (see Funayama [0060], which describes obtaining a set of keypoints for each of the plurality of images; and see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous frame; which suggests to one of ordinary skill in the art that label propagation is performed for all of the frames of the video image sequence when corresponding keypoints exist for each of the plurality of images).

Regarding claim 27, please see the above rejection of claim 11. Funayama, Tripathi, and Gaidon disclose the method of claim 11, wherein a plurality of labeled features comprises said at least one labeled feature (see Funayama [0075], where labels from the input map are associated with regions and keypoints in the first frame; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image; where the prior art teachings suggest that a plurality of keypoints are computed for a corresponding plurality of labels), performing until said sequence has been processed further comprising performing for each of said plurality labeled features (see Funayama [0060], which describes obtaining a set of keypoints for each of the plurality of images; and see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous frame; which suggests to one of ordinary skill in the art that label propagation is performed for all of the frames of the video image sequence when corresponding keypoints exist for each of the plurality of images).

Regarding claim 28, please see the above rejection of claim 14. Funayama, Tripathi, and Gaidon disclose the system of claim 14, wherein a plurality of labeled features comprises said at least one labeled feature (see Funayama [0075], where labels from the input map are associated with regions and keypoints in the first frame; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image; where the prior art teachings suggest that a plurality of keypoints are computed for a corresponding plurality of labels) and a plurality of potential features comprises said first potential feature and said second potential feature (see Funayama [0062]-[0065], where histograms and keypoints for image regions are computed; see Funayama [0066], where keypoints may be associated with subsequent regions in subsequent images; where the prior art teachings suggest that a plurality of histograms and keypoints for image regions of the image sequence are computed, comprising the first and subsequent images), performing for each of said plurality labeled features and for each of said plurality of potential features (see Funayama [0060], which describes obtaining a set of keypoints for each of the plurality of images; and see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous frame; which suggests to one of ordinary skill in the art that label propagation is performed for all of the frames of the video image sequence when corresponding keypoints exist for each of the plurality of images).

Regarding claim 29, please see the above rejection of claim 23. Funayama, Tripathi, and Gaidon disclose the non-transitory computer-readable media of claim 23, wherein a plurality of labeled features comprises said at least one labeled feature (see Funayama [0075], where labels from the input map are associated with regions and keypoints in the first frame; see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image; where the prior art teachings suggest that a plurality of keypoints are computed for a corresponding plurality of labels), performing until said sequence has been processed further comprising performing for each of said plurality labeled features (see Funayama [0060], which describes obtaining a set of keypoints for each of the plurality of images; and see Funayama [0074]-[0076], where labels of at least one object in a first image are propagated to objects in the subsequent image, and that the keypoint label association will then be provided by the previous frame; which suggests to one of ordinary skill in the art that label propagation is performed for all of the frames of the video image sequence when corresponding keypoints exist for each of the plurality of images).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIMOTHY WING HO CHOI whose telephone number is (571)270-3814. The examiner can normally be reached 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT RUDOLPH, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/TIMOTHY CHOI/Examiner, Art Unit 2661

/VINCENT RUDOLPH/Supervisory Patent Examiner, Art Unit 2661