DETAILED ACTION
This action is in response to the application filed January 23, 2019, which claims priority to Provisional Application No. 62/773,499, filed on November 30, 2018. Claims 1-20 are pending and have been considered.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/23/2019 and 05/19/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Objections
Claim 19 is objected to because of the following informalities: The letter "j" appears after the period of the last line.  Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.


Regarding claim 1,
Step 1 Analysis: Claim 1 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 1 recites, in part, identifying a first set of object classes, adapting for use with a second set of object classes, performing a knowledge distillation process, and detecting or classifying one or more objects. These limitations, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitation in the mind, which falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements – “model”, “first set of object classes”, “second set of object classes” and “an adapted model”. These elements that are recited are only generally linked to the judicial exception. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a model, a first set of object classes, a second set of object classes and an adapted model to perform the steps of the claimed process amount to no more than generally linking the elements to the judicial exception. The claim is not patent eligible. 

Regarding claim 2, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the model is a first model and adapting the first model for use with the second set of object classes different from the first set of object classes comprises: generating a second model to detect or classify the second set of object classes using a labeled set of data for the second set of object classes; and combining the first model and the second model using an unlabeled set of auxiliary data to generate the adapted model. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim recites the additional elements “first model”, “second model”, and “unlabeled set of auxiliary data”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 1 above. The claim is not patent eligible. 

Regarding claim 3, the rejection of claim 2 is further incorporated, and further, the claim recites: wherein combining the first model and the second model to generate the adapted model using the unlabeled set of auxiliary data comprises: performing object detection or classification on the unlabeled set of auxiliary data using the first model to generate a first set of model outputs; performing object detection or classification on the unlabeled set of auxiliary data using the second model to generate a second set of model outputs; and combining the first model and the second model based on a loss function using the first and second sets of model outputs. This claim recites additional mental and mathematical steps in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim recites the additional elements “first set of model outputs” and “second set of model outputs” however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 1 above. The claim is not patent eligible. 
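For illustration only, the limitation quoted from claim 3 (running both models on the same unlabeled auxiliary sample and combining them based on a loss function over the two sets of outputs) can be sketched as follows. The claim as quoted does not specify the form of the loss; the mean squared error used here, and every function name, are assumptions made for this sketch and are not taken from the application.

```python
from typing import List, Sequence


def concat_targets(first_out: Sequence[float], second_out: Sequence[float]) -> List[float]:
    """Join the first model's outputs (first set of classes) with the
    second model's outputs (second set of classes) into one target vector."""
    return list(first_out) + list(second_out)


def combination_loss(adapted_out: Sequence[float],
                     first_out: Sequence[float],
                     second_out: Sequence[float]) -> float:
    """Mean squared error between the adapted model's outputs and the
    concatenated outputs of the two source models (assumed loss form)."""
    target = concat_targets(first_out, second_out)
    if len(adapted_out) != len(target):
        raise ValueError("adapted model must output one score per class in both sets")
    return sum((a - t) ** 2 for a, t in zip(adapted_out, target)) / len(target)


def batch_loss(adapted_batch, first_batch, second_batch) -> float:
    """Average the per-sample loss over a batch of unlabeled auxiliary samples."""
    losses = [combination_loss(a, f, s)
              for a, f, s in zip(adapted_batch, first_batch, second_batch)]
    return sum(losses) / len(losses)
```

No labels appear anywhere in the sketch: the only supervision is the pair of model outputs, which is what makes unlabeled auxiliary data sufficient for this step.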

Regarding claim 4, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein retaining the detection or classification performance on the first set of object classes in the adapted model comprises: extracting a feature for each of a plurality of training samples for the first set of object classes in the model; generating, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; for each of the N clusters, selecting a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and retaining the detection or classification performance on the first set of object classes. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim recites the additional elements “a feature”, “set of training samples”, “N clusters”, and “cluster centroid”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 1 above. The claim is not patent eligible. 
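For illustration only, the steps quoted from claim 4 (extract a feature per training sample, form N clusters per class, and keep the sample nearest each cluster centroid) can be sketched as follows. The claim as quoted does not name a clustering algorithm; the plain k-means with a deterministic initialization used here, the fixed iteration count, and all function names are assumptions made for this sketch.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(features: List[Vector], n_clusters: int, iterations: int = 10) -> List[List[float]]:
    """Plain k-means; centroids start at the first n_clusters samples (assumption)."""
    centroids = [list(f) for f in features[:n_clusters]]
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(n_clusters)]
        for f in features:
            nearest = min(range(n_clusters), key=lambda k: sq_dist(f, centroids[k]))
            groups[nearest].append(f)
        # Keep the old centroid if a cluster received no samples.
        centroids = [mean(g) if g else centroids[k] for k, g in enumerate(groups)]
    return centroids


def select_exemplars(features: List[Vector], n_clusters: int) -> List[int]:
    """Return, for each cluster, the index of the sample nearest its centroid."""
    centroids = kmeans(features, n_clusters)
    return [min(range(len(features)), key=lambda i: sq_dist(features[i], c))
            for c in centroids]
```

The function would be called once per class in the first set, on the features of that class's training samples; the returned indices identify the samples retained for that class.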

Regarding claim 5, the rejection of claim 1 is further incorporated, and further, the claim recites: further comprising: in response to being unable to identify an object from the second set of object classes based on the model, receiving a label of the object, wherein adapting the model for use with the second set of object classes to generate the adapted model comprises adapting the model for use with the second set of object classes, the labeled object being one of the object classes in the second set. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim recites the additional elements “an object” and “labeled object” however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 1 above. The claim is not patent eligible. 

Regarding claim 6, the rejection of claim 5 is further incorporated, and further, the claim recites: further comprising: searching for additional instances of objects in the object class of the labeled object based on the label, wherein adapting the model for use with the second set of object classes further comprises training the model using the additional instances of the objects. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 7, the rejection of claim 1 is further incorporated, and further, the claim recites: further comprising using the adapted model to perform object classification. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 8, 
Step 1 Analysis: Claim 8 is directed to an electronic device, which is a machine and falls within one of the four statutory categories.
Step 2A Prong 1 Analysis: Claim 8 recites, in part, identifying a first set of object classes, adapting for use with a second set of object classes, performing a knowledge distillation process, and detecting or classifying one or more objects. These limitations, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitation in the mind, which falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements “electronic device”, “memory”, “processor”, “model”, “first set of object classes”, “second set of object classes” and “an adapted model”. These elements that are recited are only generally linked to the judicial exception. The “electronic device”, “memory”, and “processor” are recited at a high level of generality (i.e. as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a model, a first set of object classes, a second set of object classes and an adapted model to perform the steps of the claimed process amount to no more than generally linking the elements to the judicial exception. The electronic device, memory, and processor utilized to perform the claimed process amount to no more than mere instructions to apply an exception using a generic computer component. The claim is not patent eligible. 
Regarding claim 9, the rejection of claim 8 is further incorporated, and further, the claim recites: wherein the model is a first model and to adapt the first model for use with the second set of object classes different from the first set of object classes, generate a second model to detect or classify the second set of object classes using a labeled set of data for the second set of object classes; and combine the first model and the second model using an unlabeled set of auxiliary data to generate the adapted model. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 8 above.
The claim recites the additional elements “first model”, “second model”, “unlabeled set of auxiliary data”, and “processor”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 8 above. The claim is not patent eligible. 

Regarding claim 10, the rejection of claim 9 is further incorporated, and further, the claim recites: wherein to combine the first model and the second model to generate the adapted model using the unlabeled set of auxiliary data, perform object detection or classification on the unlabeled set of auxiliary data using the first model to generate a first set of model outputs; perform object detection on the unlabeled set of auxiliary data using the second model to generate a second set of model outputs; and combine the first model and the second model based on a loss function using the first and second sets of model outputs. This claim recites additional mental and mathematical steps in addition to the judicial exception identified in the rejection of claim 8, thus recites a judicial exception.
The claim recites the additional elements “first set of model outputs” and “second set of model outputs”, and “processor”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 8 above. The claim is not patent eligible. 

Regarding claim 11, the rejection of claim 8 is further incorporated, and further, the claim recites: wherein to retain the detection or classification performance on the first set of object classes in the adapted model, extract a feature for each of a plurality of training samples for the first set of object classes in the model; generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and retain the detection or classification performance on the first set of object classes. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 8, thus recites a judicial exception.
The claim recites the additional elements “a feature”, “set of training samples”, “N clusters”, and “cluster centroid”, and “processor”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 8 above. The claim is not patent eligible. 
Regarding claim 12, the rejection of claim 8 is further incorporated, and further, the claim recites: in response to being unable to identify an object from the second set of object classes based on the model, receive a label of the object, wherein to adapt the model for use with the second set of object classes to generate the adapted model, adapt the model for use with the second set of object classes, the labeled object being one of the object classes in the second set. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 8, thus recites a judicial exception.
The claim recites the additional elements “an object”, “labeled object”, and “processor”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 8 above. The claim is not patent eligible. 

Regarding claim 13, the rejection of claim 12 is further incorporated, and further, the claim recites: search for additional instances of objects in the object class of the labeled object based on the label, and to adapt the model for use with the second set of object classes, to train the model using the additional instances of the objects in the object class of the labeled object. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 8, thus recites a judicial exception.
As noted above, the claim does recite the additional element “processor”, however it does not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 8 above. The claim is not patent eligible.

Regarding claim 14, the rejection of claim 8 is further incorporated, and further, the claim recites: use the adapted model to perform object classification. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 8 above.
As noted above, the claim does recite the additional element “processor”, however it does not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 8 above. The claim is not patent eligible.

Regarding claim 15, 
Step 1 Analysis: Claim 15 is directed to a non-transitory computer-readable medium, which is a manufacture and falls within one of the four statutory categories.
Step 2A Prong 1 Analysis: Claim 15 recites, in part, identifying a first set of object classes, adapting for use with a second set of object classes, performing a knowledge distillation process, and detecting or classifying one or more objects. These limitations, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitation in the mind, which falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements “electronic device”, “non-transitory computer-readable medium”, “processor”, “model”, “first set of object classes”, “second set of object classes” and “an adapted model”. These elements that are recited are only generally linked to the judicial exception. The “electronic device”, “non-transitory computer-readable medium”, and “processor” are recited at a high level of generality (i.e. as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a model, a first set of object classes, a second set of object classes and an adapted model to perform the steps of the claimed process amount to no more than generally linking the elements to the judicial exception. The electronic device, non-transitory computer-readable medium, and processor utilized to perform the claimed process amount to no more than mere instructions to apply an exception using a generic computer component. The claim is not patent eligible. 
Regarding claim 16, the rejection of claim 15 is further incorporated, and further, the claim recites: wherein the model is a first model that, when executed, to adapt the first model for use with the second set of object classes different from the first set of object classes, generate a second model to detect or classify the second set of object classes using a labeled set of data for the second set of object classes, and combine the first model and the second model using an unlabeled set of auxiliary data to generate the adapted model. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 15 above.
The claim recites the additional elements “first model”, “second model”, “unlabeled set of auxiliary data”, “program code”, “electronic device”, and “processor”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 15 above. The claim is not patent eligible. 

Regarding claim 17, the rejection of claim 16 is further incorporated, and further, the claim recites: combine the first model and the second model to generate the adapted model using the unlabeled set of auxiliary data, perform object detection or classification on the unlabeled set of auxiliary data using the first model to generate a first set of model outputs; perform object detection or classification on the unlabeled set of auxiliary data using the second model to generate a second set of model outputs; and combine the first model and the second model based on a loss function using the first and second sets of model outputs. This claim recites additional mental and mathematical steps in addition to the judicial exception identified in the rejection of claim 15, thus recites a judicial exception.
The claim recites the additional elements “first set of model outputs” and “second set of model outputs”, “program code”, “electronic device”, and “processor”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 15 above. The claim is not patent eligible. 

Regarding claim 18, the rejection of claim 15 is further incorporated, and further, the claim recites: retain the detection or classification performance on the first set of object classes in the adapted model, extract a feature for each of a plurality of training samples for the first set of object classes in the model; generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and retain the detection or classification performance on the first set of object classes. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 15, thus recites a judicial exception.
The claim recites the additional elements “a feature”, “set of training samples”, “N clusters”, and “cluster centroid”, “program code”, “electronic device”, and “processor”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 15 above. The claim is not patent eligible. 
Regarding claim 19, the rejection of claim 15 is further incorporated, and further, the claim recites: in response to being unable to identify an object from the second set of object classes based on the model, receive a label of the object, adapt the model for use with the second set of object classes to generate the adapted model, adapt the model for use with the second set of object classes, the labeled object being one of the object classes in the second set. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 15, thus recites a judicial exception.
The claim recites the additional elements “an object”, “labeled object”, “program code”, “electronic device”, and “processor”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 15 above. The claim is not patent eligible. 

Regarding claim 20, the rejection of claim 19 is further incorporated, and further, the claim recites: search for additional instances of objects in the object class of the labeled object based on the label, adapt the model for use with the second set of object classes, to train the model using the additional instances of the objects in the object class of the labeled object. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 15, thus recites a judicial exception.
As noted above, the claim does recite the additional elements “program code”, “electronic device”, and “processor”, however they do not amount to an integration of the judicial exception into a practical application nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 15 above. The claim is not patent eligible.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 2, 5, 7-9, 12, 14-16, and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Shmelkov et al. ("Incremental Learning of Object Detectors without Catastrophic Forgetting", cited by Applicant in the IDS filed 01/23/2019, hereinafter "Shmelkov").

Regarding claim 1, Shmelkov discloses A method for incremental learning, the method comprising: 
identifying, via a model for object detection or classification, a first set of object classes the model is trained to detect or classify (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA).” [pg. 3403, § 3.2 Dual-network Learning, ¶1; original classes corresponds to a first set of object classes.]); 
adapting the model for use with a second set of object classes different from the first set of object classes to generate an adapted model (“The interplay between the two networks A(CA) and B(CB) provides the necessary supervision that prevents the catastrophic forgetting in the absence of original training data used by A(CA). After the training of B(CB) is completed, we can add more classes by freezing the newly trained network and using it for distillation. We can thus add new classes sequentially. Since B(CB) is structurally identical to A(CA ∪ CB), the extension can be repeated to add more classes.” [pg. 3403, § 3.2 Dual-network learning, ¶1; new classes corresponds to a second set of object classes. A(CA ∪ CB) would be equivalent to an adapted model.]); 
retaining detection or classification performance on the first set of object classes in the adapted model by performing a knowledge distillation process for the model (“Specifically, we evaluate each new training sample on the frozen copy (Network A) to choose a diverse set of proposals (distillation proposals in Figure 2), and record their responses. With these responses in hand, we compute a distillation loss which measures the discrepancy between the two networks for the distillation proposals. This loss is added to the crossentropy loss on the new classes to make up the loss function for training the adapted detection network. As we show in the experimental evaluation, the distillation loss as well as the strategy to select the distillation proposals are critical in preserving the performance on the old classes (see §4)” [pg. 3402, § 3. Incremental learning of new classes, ¶2; Shmelkov discloses performing knowledge distillation on Network A, and Network B is adapted to retain knowledge from Network A, as shown in Figure 2.]); and
using the adapted model to detect or classify one or more objects from the first set of object classes and one or more objects from the second set of object classes (“We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN. Our goal is to train B(CB) to recognize classes CA ∪ CB using only new data and annotations for CB.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Shmelkov discloses CA to be the original (i.e. old) classes and CB to be the set of new classes. “Trained to recognize” would correspond to using the network to classify or detect objects.]).
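For illustration only, the training objective described in the passage quoted from Shmelkov (a distillation loss measuring the discrepancy between the frozen and adapted networks, added to the cross-entropy loss on the new classes) can be sketched as follows. The quoted passage does not give the exact form of the distillation term; the squared-error discrepancy on the old-class logits, the unweighted sum, and all function names are assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a vector of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target_index: int) -> float:
    """Cross-entropy of the predicted distribution against one labeled class."""
    return -math.log(softmax(logits)[target_index])


def distillation_loss(frozen_old: Sequence[float], adapted_old: Sequence[float]) -> float:
    """Mean squared discrepancy between the frozen network's responses and the
    adapted network's responses on the old classes (assumed form)."""
    return sum((f - a) ** 2 for f, a in zip(frozen_old, adapted_old)) / len(frozen_old)


def total_loss(adapted_logits: Sequence[float],
               frozen_old_logits: Sequence[float],
               target_index: int) -> float:
    """Cross-entropy on the labeled class plus distillation on the old classes.
    The adapted logits are assumed to list old classes first, then new ones."""
    n_old = len(frozen_old_logits)
    return (cross_entropy(adapted_logits, target_index)
            + distillation_loss(frozen_old_logits, adapted_logits[:n_old]))
```

When the adapted network reproduces the frozen network's old-class responses exactly, the distillation term is zero and only the cross-entropy term on the new-class label remains.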

Regarding claim 2, Shmelkov discloses The method of claim 1, wherein the model is a first model and adapting the first model for use with the second set of object classes different from the first set of object classes (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA). The goal now is to add a new set of classes CB to this.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Network A corresponds to a first model.]) comprises:
 generating a second model to detect or classify the second set of object classes using a labeled set of data for the second set of object classes (“In the first experiment we take 19 classes in alphabetical order from the VOC dataset as CA, and the remaining one as the only new class CB. We then train the A(1-19) network on the VOC trainval subset containing any of the 19 classes, and the B(20) network is trained on the trainval subset containing the new class” [pg. 3404, § 4.3 Addition of one class, ¶3, See Table 1. Class 20 is labeled “tvmonitor” which corresponds to a labeled set of data in a second set of object classes.]); 
and combining the first model and the second model using an unlabeled set of auxiliary data to generate the adapted model (“We make two copies of A(CA): one that is frozen to recognize classes CA through distillation loss and the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images. (note: see pg. 3404, § 4.1 Datasets and evaluation; Shmelkov discloses using the PASCAL VOC 2007 and COCO challenge datasets, which correspond to auxiliary data) The extension is done only in the last fully connected layers, i.e., classification and bounding box regression. We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Examiner is interpreting concatenating the outputs of Network A (original class) and Network B (new class) to be equivalent to combining, which produces an extended network (A(CA ∪ CB)) that corresponds to an adapted model.])



Regarding claim 5, Shmelkov discloses The method of Claim 1, further comprising: 
in response to being unable to identify an object from the second set of object classes based on the model, receiving a label of the object (“The second CNN (bottom) is an incrementally trained version of the first one for the category horse. In other words, the original network is adapted with images from only this new class. This adapted network localizes the horse in the image, but fails to detect the rider, which it was capable of originally, and despite the fact that the person class was not updated. In this paper, we present a method to alleviate this issue” [pg. 3400, § 1 Introduction, ¶2; Shmelkov discloses a method to solve the problem of a second network being unable to detect the original class in § 3.2 Dual-network learning. It is implicit that a label would be received for an object to perform this training method.]), 
wherein adapting the model for use with the second set of object classes to generate the adapted model comprises adapting the model for use with the second set of object classes, the labeled object being one of the object classes in the second set (“Each experiment begins with choosing a subset of classes to form the set CA. Then, a network is learned only on the subset of the training set composed of all the images containing at least one object from CA. Annotations for other classes in these images are ignored. With the new classes chosen to form the set CB, we learn the extended network as described in Section 3.2 with the subset of the training set containing at least one object from CB. As in the previous case, annotations of all the other classes, including those of the original classes CA, are ignored.” [pg. 3404, § 4.2 Implementation details, ¶1; note: pg. 3407, Tables 6-9 disclose a labeled object in the second set.]).

Regarding claim 7, Shmelkov discloses The method of Claim 1, further comprising using the adapted model to perform object classification (“The extension is done only in the last fully connected layers, i.e., classification and bounding box regression.” [pg. 3403, § 3.2 Dual Network learning, ¶1]).

Regarding claim 8, Shmelkov discloses An electronic device for incremental learning, the electronic device comprising: 
a memory configured to store a model for object detection or classification; and 
a processor operably connected to the memory (“These paradigms, and in particular fine-tuning, a special case of transfer learning, are very popular in computer vision” [pg. 3401, § 2. Related Work, ¶2, Computer vision implies the use of processors and memory. See § Acknowledgments where Shmelkov discloses use of GPUs.]), the processor configured to:
identify, via the model for object detection or classification, a first set of object classes the model is trained to detect or classification (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA).” [pg. 3403, § 3.2 Dual-network Learning, ¶1; original classes corresponds to a first set of object classes.]); 
adapt the model for use with a second set of object classes different from the first set of object classes to generate an adapted model (“The interplay between the two networks A(CA) and B(CB) provides the necessary supervision that prevents the catastrophic forgetting in the absence of original training data used by A(CA). After the training of B(CB) is completed, we can add more classes by freezing the newly trained network and using it for distillation. We can thus add new classes sequentially. Since B(CB) is structurally identical to A(CA ∪ CB), the extension can be repeated to add more classes.” [pg. 3403, § 3.2 Dual-network learning, ¶1; new classes corresponds to a second set of object classes. A(CA ∪ CB) would be equivalent to an adapted model.]); 
retain detection or classification performance on the first set of object classes in the adapted model by performing a knowledge distillation process for the model (“Specifically, we evaluate each new training sample on the frozen copy (Network A) to choose a diverse set of proposals (distillation proposals in Figure 2), and record their responses. With these responses in hand, we compute a distillation loss which measures the discrepancy between the two networks for the distillation proposals. This loss is added to the crossentropy loss on the new classes to make up the loss function for training the adapted detection network. As we show in the experimental evaluation, the distillation loss as well as the strategy to select the distillation proposals are critical in preserving the performance on the old classes (see §4)” [pg. 3402, § 3. Incremental learning of new classes, ¶2; Shmelkov discloses performing knowledge distillation on Network A, and Network B is adapted to retain knowledge from Network A, as shown in Figure 2.]); and 
use the adapted model to detect or classify one or more objects from the first set of object classes and one or more objects from the second set of object classes (“We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN. Our goal is to train B(CB) to recognize classes CA ∪ CB using only new data and annotations for CB.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Shmelkov discloses CA to be the original (i.e. old) classes and CB to be the set of new classes. “Trained to recognize” would correspond to using the network to classify or detect objects.]).
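For illustration only, the loss structure described in the passage quoted from Shmelkov § 3 (a distillation loss on the old classes added to a cross-entropy loss on the new classes) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Shmelkov or of the applicant: the function names, the mean-squared form of the discrepancy, and the `weight` parameter are hypothetical, and bounding box regression terms are omitted.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of class scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    # Classification loss of the adapted network on a labeled sample.
    return -math.log(softmax(logits)[target_index])

def distillation_loss(frozen_old_logits, adapted_old_logits):
    # Assumed form: mean squared discrepancy between the responses of the
    # frozen network and the adapted network on the old classes.
    n = len(frozen_old_logits)
    return sum((a - b) ** 2
               for a, b in zip(frozen_old_logits, adapted_old_logits)) / n

def total_loss(adapted_logits, frozen_logits, target_index, num_old, weight=1.0):
    # adapted_logits covers old classes first, then new classes (assumption).
    ce = cross_entropy(adapted_logits, target_index)
    dist = distillation_loss(frozen_logits, adapted_logits[:num_old])
    return ce + weight * dist
```

In this sketch the distillation term is zero when the adapted network reproduces the frozen network's old-class responses, so the total reduces to the cross-entropy term alone.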

Regarding claim 9, Shmelkov discloses The electronic device of claim 8, wherein the model is a first model and to adapt the first model for use with the second set of object classes different from the first set of object classes (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA). The goal now is to add a new set of classes CB to this.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Network A corresponds to a first model.]), the processor is further configured to: 
generate a second model to detect or classify the second set of object classes using a labeled set of data for the second set of object classes (“In the first experiment we take 19 classes in alphabetical order from the VOC dataset as CA, and the remaining one as the only new class CB. We then train the A(1-19) network on the VOC trainval subset containing any of the 19 classes, and the B(20) network is trained on the trainval subset containing the new class” [pg. 3404, § 4.3 Addition of one class, ¶3, See Table 1. Class 20 is labeled “tvmonitor” which corresponds to a labeled set of data in a second set of object classes.]); and 
combine the first model and the second model using an unlabeled set of auxiliary data to generate the adapted model (“We make two copies of A(CA): one that is frozen to recognize classes CA through distillation loss and the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images. (note: see pg. 3404, § 4.1 Datasets and evaluation; Shmelkov discloses using PASCAL VOC 2007 and COCO challenge datasets which corresponds to auxiliary data) The extension is done only in the last fully connected layers, i.e., classification and bounding box regression. We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Examiner is interpreting concatenating the outputs of Network A (original class) and Network B (new class) to be equivalent to combining and thus would produce an extended network (A(CA ∪ CB)) which corresponds to an adapted model.]).

Regarding claim 12, Shmelkov discloses The electronic device of Claim 8, wherein the processor is further configured to: 
in response to being unable to identify an object from the second set of object classes based on the model, receiving a label of the object (“The second CNN (bottom) is an incrementally trained version of the first one for the category horse. In other words, the original network is adapted with images from only this new class. This adapted network localizes the horse in the image, but fails to detect the rider, which it was capable of originally, and despite the fact that the person class was not updated. In this paper, we present a method to alleviate this issue” [pg. 3400, § 1 Introduction, ¶2; Shmelkov discloses a method to solve the problem of a second network being unable to detect the original class in § 3.2 Dual-network learning. It is implicit that a label would be received for an object to perform this training method.]), 
wherein to adapt the model for use with the second set of object classes to generate the adapted model, the processor is further configured to adapt the model for use with the second set of object classes, the labeled object being one of the object classes in the second set (“Each experiment begins with choosing a subset of classes to form the set CA. Then, a network is learned only on the subset of the training set composed of all the images containing at least one object from CA. Annotations for other classes in these images are ignored. With the new classes chosen to form the set CB, we learn the extended network as described in Section 3.2 with the subset of the training set containing at least one object from CB. As in the previous case, annotations of all the other classes, including those of the original classes CA, are ignored.” [pg. 3404, § 4.2 Implementation details, ¶1; note: See pg. 3407, Tables 6-9 discloses a labeled object in the second set.]).


Regarding claim 14, Shmelkov discloses The electronic device of Claim 8, wherein the processor is further configured to use the adapted model to perform object classification (“The extension is done only in the last fully connected layers, i.e., classification and bounding box regression.” [pg. 3403, § 3.2 Dual Network learning, ¶1]).

Regarding claim 15, Shmelkov discloses A non-transitory, computer-readable medium comprising program code for incremental learning that, when executed by a processor of an electronic device, (“These paradigms, and in particular fine-tuning, a special case of transfer learning, are very popular in computer vision” [pg. 3401, § 2. Related Work, ¶2, Computer vision implies the use of processors and memory. See § Acknowledgments where Shmelkov discloses use of GPUs.]), causes the electronic device to: 
identify, via the model for object detection or classification, a first set of object classes the model is trained to detect or classification (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA).” [pg. 3403, § 3.2 Dual-network Learning, ¶1; original classes corresponds to a first set of object classes.]); 
adapt the model for use with a second set of object classes different from the first set of object classes to generate an adapted model (“The interplay between the two networks A(CA) and B(CB) provides the necessary supervision that prevents the catastrophic forgetting in the absence of original training data used by A(CA). After the training of B(CB) is completed, we can add more classes by freezing the newly trained network and using it for distillation. We can thus add new classes sequentially. Since B(CB) is structurally identical to A(CA ∪ CB), the extension can be repeated to add more classes.” [pg. 3403, § 3.2 Dual-network learning, ¶1; new classes corresponds to a second set of object classes. A(CA ∪ CB) would be equivalent to an adapted model.]); 
retain detection or classification performance on the first set of object classes in the adapted model by performing a knowledge distillation process for the model (“Specifically, we evaluate each new training sample on the frozen copy (Network A) to choose a diverse set of proposals (distillation proposals in Figure 2), and record their responses. With these responses in hand, we compute a distillation loss which measures the discrepancy between the two networks for the distillation proposals. This loss is added to the crossentropy loss on the new classes to make up the loss function for training the adapted detection network. As we show in the experimental evaluation, the distillation loss as well as the strategy to select the distillation proposals are critical in preserving the performance on the old classes (see §4)” [pg. 3402, § 3. Incremental learning of new classes, ¶2; Shmelkov discloses performing knowledge distillation on Network A, and Network B is adapted to retain knowledge from Network A, as shown in Figure 2.]); and 
use the adapted model to detect or classify one or more objects from the first set of object classes and one or more objects from the second set of object classes (“We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN. Our goal is to train B(CB) to recognize classes CA ∪ CB using only new data and annotations for CB.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Shmelkov discloses CA to be the original (i.e. old) classes and CB to be the set of new classes. “Trained to recognize” would correspond to using the network to classify or detect objects.]).

Regarding claim 16, Shmelkov discloses The non-transitory, computer-readable medium of claim 15, wherein the model is a first model and the program code that, when executed, causes the electronic device to adapt the first model for use with the second set of object classes different from the first set of object classes (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA). The goal now is to add a new set of classes CB to this.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Network A corresponds to a first model.]), comprises program code that, when executed by the processor, causes the electronic device to: 
generate a second model to detect or classify the second set of object classes using a labeled set of data for the second set of object classes (“In the first experiment we take 19 classes in alphabetical order from the VOC dataset as CA, and the remaining one as the only new class CB. We then train the A(1-19) network on the VOC trainval subset containing any of the 19 classes, and the B(20) network is trained on the trainval subset containing the new class” [pg. 3404, § 4.3 Addition of one class, ¶3, See Table 1. Class 20 is labeled “tvmonitor” which corresponds to a labeled set of data in a second set of object classes.]), and 
combine the first model and the second model using an unlabeled set of auxiliary data to generate the adapted model (“We make two copies of A(CA): one that is frozen to recognize classes CA through distillation loss and the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images. (note: see pg. 3404, § 4.1 Datasets and evaluation; Shmelkov discloses using PASCAL VOC 2007 and COCO challenge datasets which corresponds to auxiliary data) The extension is done only in the last fully connected layers, i.e., classification and bounding box regression. We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Examiner is interpreting concatenating the outputs of Network A (original class) and Network B (new class) to be equivalent to combining and thus would produce an extended network (A(CA ∪ CB)) which corresponds to an adapted model.]).

Regarding claim 19, Shmelkov discloses The non-transitory, computer-readable medium of Claim 15, further comprising program code that, when executed by the processor, causes the electronic device to: 
in response to being unable to identify an object from the second set of object classes based on the model, receive a label of the object (“The second CNN (bottom) is an incrementally trained version of the first one for the category horse. In other words, the original network is adapted with images from only this new class. This adapted network localizes the horse in the image, but fails to detect the rider, which it was capable of originally, and despite the fact that the person class was not updated. In this paper, we present a method to alleviate this issue” [pg. 3400, § 1 Introduction, ¶2; Shmelkov discloses a method to solve the problem of a second network being unable to detect the original class in § 3.2 Dual-network learning. It is implicit that a label would be received for an object to perform this training method.]), 
wherein the program code that, when executed, causes the electronic device to adapt the model for use with the second set of object classes to generate the adapted model, comprises program code that, when executed by the processor, causes the electronic device to adapt the model for use with the second set of object classes, the labeled object being one of the object classes in the second set (“Each experiment begins with choosing a subset of classes to form the set CA. Then, a network is learned only on the subset of the training set composed of all the images containing at least one object from CA. Annotations for other classes in these images are ignored. With the new classes chosen to form the set CB, we learn the extended network as described in Section 3.2 with the subset of the training set containing at least one object from CB. As in the previous case, annotations of all the other classes, including those of the original classes CA, are ignored.” [pg. 3404, § 4.2 Implementation details, ¶1; note: See pg. 3407, Tables 6-9 discloses a labeled object in the second set.]).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 3, 4, 10, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Shmelkov in view of Coates et al. ("Emergence of Object-Selective Features in Unsupervised Feature Learning", hereinafter "Coates").

Regarding claim 3, Shmelkov teaches The method of Claim 2, where Shmelkov further teaches wherein combining the first model and the second model to generate the adapted model using the unlabeled set of auxiliary data comprises: 
performing object detection or classification on the unlabeled set of auxiliary data using the second model to generate a second set of model outputs (“the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images.” [pg. 3403, § 3.2 Dual-network learning, ¶1]); and
combining the first model and the second model based on a loss function using the first and second sets of model outputs (“The core of our approach is a loss function balancing the interplay between predictions on the new classes, i.e., cross-entropy loss, and a new distillation loss which minimizes the discrepancy between responses for old classes from the original and the new networks. The overall approach is illustrated in Figure 2.” [pg. 3401, top left col, ¶1]).
However, Shmelkov fails to explicitly teach performing object detection or classification on the unlabeled set of auxiliary data using the first model to generate a first set of model outputs.
Coates teaches performing object detection or classification on the unlabeled set of auxiliary data using the first model to generate a first set of model outputs (“As described above, we ran our algorithm on patches harvested from YouTube thumbnails downloaded from the web. Specifically, we downloaded the thumbnails for over 1.4 million YouTube videos, some of which are shown in Figure 2b. These images were downsampled to 128-by-96 pixels and converted to grayscale. We cropped 57 million randomly selected 32-by-32 pixel patches from these images to form our unlabeled training set. No supervision was used—thus most patches contain partial views of objects or clutter at differing scales. We ran our algorithm on these images using a cluster of 30 machines over 3 days—virtually all of the time spent training the 150,000 second-layer features. We will now visualize these features and check whether any of them have learned to identify an object class.” [pg. 5, § 3 Experiments, ¶1]).
Shmelkov and Coates are both in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Coates discloses feature learning from unlabeled image data. Although Shmelkov discloses a first model that performs object detection, Shmelkov fails to teach using an unlabeled set of auxiliary data with that model. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the first model disclosed by Shmelkov in order to train the model using the unlabeled auxiliary data taught by Coates. One would have been motivated to use unlabeled auxiliary data in order to train the model to detect more complex patterns [Coates, pg. 1, § 1 Introduction].
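For illustration only, the claim 3 limitations (running the first and second models on unlabeled auxiliary data to obtain two sets of model outputs, then combining the models through a loss over those outputs) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the applicant, Shmelkov, or Coates: the toy models, the concatenation of outputs into a single target vector, and the squared-error loss are all hypothetical choices made for the example.

```python
def combined_targets(first_model, second_model, unlabeled_samples):
    # Run both models on each unlabeled sample and concatenate their outputs
    # (first-set class scores followed by second-set class scores).
    targets = []
    for sample in unlabeled_samples:
        first_out = first_model(sample)    # first set of model outputs
        second_out = second_model(sample)  # second set of model outputs
        targets.append(list(first_out) + list(second_out))
    return targets

def squared_error(adapted_out, target):
    # Assumed loss: mean squared difference between the adapted model's
    # output and the concatenated target for one sample.
    return sum((a - t) ** 2 for a, t in zip(adapted_out, target)) / len(target)

def toy_first_model(x):
    # Hypothetical stand-in for a model trained on two original classes.
    return [x, -x]

def toy_second_model(x):
    # Hypothetical stand-in for a model trained on one new class.
    return [2.0 * x]

targets = combined_targets(toy_first_model, toy_second_model, [1.0, 0.5])
```

No labels are consulted at any point; the only supervision for the adapted model in this sketch comes from the outputs of the two existing models.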

Regarding claim 4, Shmelkov teaches The method of Claim 1, where Shmelkov further teaches wherein retaining the detection or classification performance on the first set of object classes in the adapted model comprises: 
extracting a feature for each of a plurality of training samples for the first set of object classes in the model (“In our variant of Fast R-CNN, we replaced the VGG-16 trunk with a deeper ResNet-50 component, which is faster and more accurate than VGG-16. We follow the suggestions in to combine Fast R-CNN and ResNet architectures. The network processes the whole image through a sequence of residual blocks. Before the last strided convolution layer we insert a RoI pooling layer, which performs maxpooling over regions of varied sizes, i.e., proposals, into a 7 × 7 feature map. Then we add the remaining residual blocks, a layer for average pooling over spatial dimensions, and two fully connected layers: a softmax layer for classification (PASCAL or COCO classes, for example, along with the background class) and a regression layer for bounding box refinement, with independent corrections for each class.” [pg. 3402, § 3.1 Object detection network, ¶2; note: It is implicit that feature extraction would have to occur as RoI pooling layer in a Fast R-CNN extracts a feature vector from a feature map.]); 
retaining the detection or classification performance on the first set of object classes (“Using only the training samples for the new classes, we propose a method for not only adapting the old network to the new classes, but also ensuring performance on the old classes does not degrade.” [pg. 3401, top left col, ¶1; ensuring performance does not degrade would be equivalent to retaining.]).
However, Shmelkov fails to explicitly teach:
generating, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; 
for each of the N clusters, selecting a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and 
Coates teaches generating, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features (“To construct the groups G, we will use a version of single-link agglomerative clustering to combine sets of features that have low dissimilarity according to d(k, l). To construct a single group G0 we begin by choosing a random simple cell filter, say D(k) , as the first member. We then search for candidate cells to be added to the group by computing d(k, l) for each simple cell filter D(l) and add D(l) to the group if d(k, l) is less than some limit τ” [pg. 3, § 2.2 Learning Invariant Features, ¶4; groups G would be equivalent to N clusters. Low dissimilarity is equivalent to samples belonging to same class.]); 
for each of the N clusters, selecting a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid (“The algorithm then continues to expand G0 by adding any additional simple cells that are closer than τ to any one of the simple cells already in the group. This procedure continues until there are no more cells to be added, or until the diameter of the group (the dissimilarity between the two furthest cells in the group) reaches a limit ∆.” See further: “We use τ = 0.3 for the first layer of complex cells and τ = 1.0 for the second layer. These were chosen by examining the typical distance between a filter D(k) and its nearest neighbor. We use ∆ = 1.5 > √2 so that a complex cell group may include orthogonal filters but cannot grow without limit.” [pg. 3, § 2.2 Learning Invariant Features, ¶4]).
Shmelkov and Coates are both in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Coates discloses feature learning from unlabeled image data. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov to include generating clusters and selecting a sample from the clusters as taught by Coates. One would have been motivated to use a K-means algorithm to generate a cluster of training samples to find features that are sensitive to commonly occurring object classes [Coates, pg. 1, Abstract].
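For illustration only, the claim 4 limitations (generating N clusters from the extracted features of same-class training samples, then selecting from each cluster the training sample nearest the cluster centroid) can be sketched as follows. This is a minimal sketch under stated assumptions: it uses a plain K-means loop rather than the single-link agglomerative grouping quoted from Coates, it is not the applicant's implementation, and the function names, the fixed iteration count, and the seed are hypothetical.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # Component-wise mean of a non-empty list of feature vectors.
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def kmeans(features, n_clusters, iterations=20, seed=0):
    # Plain K-means: initialize centroids from random samples, then
    # alternate assignment and centroid update for a fixed number of passes.
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, n_clusters)]
    for _ in range(iterations):
        groups = [[] for _ in range(n_clusters)]
        for f in features:
            nearest = min(range(n_clusters),
                          key=lambda c: sq_dist(f, centroids[c]))
            groups[nearest].append(f)
        # Keep the previous centroid if a cluster received no samples.
        centroids = [mean(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids

def select_exemplars(features, n_clusters):
    # For each centroid, return the index of the sample closest to it.
    centroids = kmeans(features, n_clusters)
    return [min(range(len(features)),
                key=lambda i: sq_dist(features[i], c))
            for c in centroids]
```

The features passed in are assumed to belong to a single object class, so the selected indices identify N representative samples of that class.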

Regarding claim 10, Shmelkov teaches The electronic device of Claim 9, where Shmelkov further teaches wherein to combine the first model and the second model to generate the adapted model using the unlabeled set of auxiliary data, the processor is further configured to: 
perform object detection or classification on the unlabeled set of auxiliary data using the second model to generate a second set of model outputs (“the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images.” [pg. 3403, § 3.2 Dual-network learning, ¶1]); and
combine the first model and the second model based on a loss function using the first and second sets of model outputs (“The core of our approach is a loss function balancing the interplay between predictions on the new classes, i.e., cross-entropy loss, and a new distillation loss which minimizes the discrepancy between responses for old classes from the original and the new networks. The overall approach is illustrated in Figure 2.” [pg. 3401, top left col, ¶1]).
However, Shmelkov fails to explicitly teach perform object detection or classification on the unlabeled set of auxiliary data using the first model to generate a first set of model outputs.
Coates teaches perform object detection or classification on the unlabeled set of auxiliary data using the first model to generate a first set of model outputs (“As described above, we ran our algorithm on patches harvested from YouTube thumbnails downloaded from the web. Specifically, we downloaded the thumbnails for over 1.4 million YouTube videos, some of which are shown in Figure 2b. These images were downsampled to 128-by-96 pixels and converted to grayscale. We cropped 57 million randomly selected 32-by-32 pixel patches from these images to form our unlabeled training set. No supervision was used—thus most patches contain partial views of objects or clutter at differing scales. We ran our algorithm on these images using a cluster of 30 machines over 3 days—virtually all of the time spent training the 150,000 second-layer features. We will now visualize these features and check whether any of them have learned to identify an object class.” [pg. 5, § 3 Experiments, ¶1]).
Shmelkov and Coates are both in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Coates discloses feature learning from unlabeled image data. Although Shmelkov discloses a first model that performs object detection, Shmelkov fails to teach using an unlabeled set of auxiliary data with that model. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the first model disclosed by Shmelkov in order to train the model using the unlabeled auxiliary data taught by Coates. One would have been motivated to use unlabeled auxiliary data in order to train the model to detect more complex patterns [Coates, pg. 1, § 1 Introduction].

Regarding claim 11, Shmelkov teaches The electronic device of Claim 8, where Shmelkov further teaches wherein to retain the detection or classification performance on the first set of object classes in the adapted model, the processor is further configured to: 
extract a feature for each of a plurality of training samples for the first set of object classes in the model (“In our variant of Fast R-CNN, we replaced the VGG-16 trunk with a deeper ResNet-50 component, which is faster and more accurate than VGG-16. We follow the suggestions in to combine Fast R-CNN and ResNet architectures. The network processes the whole image through a sequence of residual blocks. Before the last strided convolution layer we insert a RoI pooling layer, which performs maxpooling over regions of varied sizes, i.e., proposals, into a 7 × 7 feature map. Then we add the remaining residual blocks, a layer for average pooling over spatial dimensions, and two fully connected layers: a softmax layer for classification (PASCAL or COCO classes, for example, along with the background class) and a regression layer for bounding box refinement, with independent corrections for each class.” [pg. 3402, § 3.1 Object detection network, ¶2; note: It is implicit that feature extraction would have to occur as RoI pooling layer in a Fast R-CNN extracts a feature vector from a feature map.]); 
retain the detection or classification performance on the first set of object classes (“Using only the training samples for the new classes, we propose a method for not only adapting the old network to the new classes, but also ensuring performance on the old classes does not degrade.” [pg. 3401, top left col, ¶1; ensuring performance does not degrade would be equivalent to retaining.]).
However, Shmelkov fails to explicitly teach:
generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and 
Coates teaches generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features (“To construct the groups G, we will use a version of single-link agglomerative clustering to combine sets of features that have low dissimilarity according to d(k, l). To construct a single group G0 we begin by choosing a random simple cell filter, say D(k) , as the first member. We then search for candidate cells to be added to the group by computing d(k, l) for each simple cell filter D(l) and add D(l) to the group if d(k, l) is less than some limit τ” [pg. 3, § 2.2 Learning Invariant Features, ¶4; groups G would be equivalent to N clusters. Low dissimilarity is equivalent to samples belonging to same class.]); 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid (“The algorithm then continues to expand G0 by adding any additional simple cells that are closer than τ to any one of the simple cells already in the group. This procedure continues until there are no more cells to be added, or until the diameter of the group (the dissimilarity between the two furthest cells in the group) reaches a limit ∆.” See further: “We use τ = 0.3 for the first layer of complex cells and τ = 1.0 for the second layer. These were chosen by examining the typical distance between a filter D(k) and its nearest neighbor. We use ∆ = 1.5 > √2 so that a complex cell group may include orthogonal filters but cannot grow without limit.” [pg. 3, § 2.2 Learning Invariant Features, ¶4]).
Shmelkov and Coates are both in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Coates discloses feature learning from unlabeled image data. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov to include generating clusters and selecting a sample from the clusters as taught by Coates. One would have been motivated to use a K-means algorithm to generate a cluster of training samples to find features that are sensitive to commonly occurring object classes. [pg. 1, §Abstract, Coates]
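For illustration only, the recited limitations (generating N clusters from the extracted features of one class, then selecting the training sample nearest each cluster centroid) can be sketched as follows. This is a minimal sketch and not the method of Shmelkov, Coates, or the application: the use of plain k-means, the Euclidean distance, the function names `kmeans` and `select_exemplars`, and the toy feature values are all assumptions made for the example.

```python
import math
import random


def kmeans(points, n_clusters, n_iters=20, seed=0):
    """Plain Lloyd's k-means over a list of equal-length feature vectors."""
    rng = random.Random(seed)
    # Start from n_clusters distinct samples chosen at random.
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(n_iters):
        # Assign each point to its closest centroid.
        groups = [[] for _ in range(n_clusters)]
        for p in points:
            idx = min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))
            groups[idx].append(p)
        # Move each centroid to the mean of its group (unchanged if the group is empty).
        for c, group in enumerate(groups):
            if group:
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def select_exemplars(features, n_clusters):
    """Return, for each cluster, the index of the sample nearest its centroid."""
    centroids = kmeans(features, n_clusters)
    return [
        min(range(len(features)), key=lambda i: math.dist(features[i], c))
        for c in centroids
    ]


# Hypothetical 2-D features extracted for training samples of one object class.
features = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
exemplars = select_exemplars(features, n_clusters=2)
```

Here `exemplars` holds one sample index per cluster, so that a small set of representative samples per old class could be kept for later training.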

Regarding claim 17, Shmelkov teaches the non-transitory, computer-readable medium of Claim 16, and further teaches wherein the program code that, when executed, causes the electronic device to combine the first model and the second model to generate the adapted model using the unlabeled set of auxiliary data, comprises program code that, when executed by the processor, causes the electronic device to:
 perform object detection or classification on the unlabeled set of auxiliary data using the second model to generate a second set of model outputs (“the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images.” [pg. 3403, § 3.2 Dual-network learning, ¶1]); and
combine the first model and the second model based on a loss function using the first and second sets of model outputs (“The core of our approach is a loss function balancing the interplay between predictions on the new classes, i.e., cross-entropy loss, and a new distillation loss which minimizes the discrepancy between responses for old classes from the original and the new networks. The overall approach is illustrated in Figure 2.” [pg. 3401, top left col, ¶1]).
However, Shmelkov fails to explicitly teach perform object detection or classification on the unlabeled set of auxiliary data using the first model to generate a first set of model outputs.
Coates teaches perform object detection or classification on the unlabeled set of auxiliary data using the first model to generate a first set of model outputs (“As described above, we ran our algorithm on patches harvested from YouTube thumbnails downloaded from the web. Specifically, we downloaded the thumbnails for over 1.4 million YouTube videos, some of which are shown in Figure 2b. These images were downsampled to 128-by-96 pixels and converted to grayscale. We cropped 57 million randomly selected 32-by-32 pixel patches from these images to form our unlabeled training set. No supervision was used—thus most patches contain partial views of objects or clutter at differing scales. We ran our algorithm on these images using a cluster of 30 machines over 3 days—virtually all of the time spent training the 150,000 second-layer features. We will now visualize these features and check whether any of them have learned to identify an object class.” [pg. 5, § 3 Experiments, ¶1]);
Shmelkov and Coates are both in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Coates discloses feature learning from unlabeled image data. Although Shmelkov discloses a first model that performs object detection, Shmelkov fails to teach using an unlabeled set of auxiliary data. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the first model disclosed by Shmelkov in order to train the model using the unlabeled auxiliary data taught by Coates. One would have been motivated to use unlabeled auxiliary data in order to train the model to detect more complex patterns. [pg. 1, §1 Introduction, Coates]
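For illustration only, a loss of the kind quoted from Shmelkov (a cross-entropy term for predictions together with a distillation term that penalizes discrepancy between the two networks' responses for the old classes) can be sketched as follows. This is a simplified sketch under stated assumptions and not Shmelkov's exact formulation: the mean squared difference of raw logits, the `weight` parameter, the function names, and the toy logit values are all hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample; target is the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def distillation(old_logits, new_logits_on_old):
    """Mean squared gap between the two models' responses for the old classes."""
    gaps = [(a - b) ** 2 for a, b in zip(old_logits, new_logits_on_old)]
    return sum(gaps) / len(old_logits)


def combined_loss(old_logits, new_logits, target, weight=1.0):
    """Cross-entropy on the adapted model plus weighted distillation on old classes."""
    n_old = len(old_logits)
    return cross_entropy(new_logits, target) + weight * distillation(
        old_logits, new_logits[:n_old]
    )


old_logits = [2.0, 0.5]       # first (frozen) model: two old classes
new_logits = [2.0, 0.5, 3.0]  # second (adapted) model: old classes plus one new class
loss = combined_loss(old_logits, new_logits, target=2)
```

In this toy case the adapted model reproduces the first model's old-class responses exactly, so the distillation term is zero and only the cross-entropy term contributes to `loss`.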

Regarding claim 18, Shmelkov teaches the non-transitory, computer-readable medium of Claim 15, and further teaches wherein the program code that, when executed, causes the electronic device to retain the detection or classification performance on the first set of object classes in the adapted model, comprises program code that, when executed by the processor, causes the electronic device to: 
extract a feature for each of a plurality of training samples for the first set of object classes in the model (“In our variant of Fast R-CNN, we replaced the VGG-16 trunk with a deeper ResNet-50 component, which is faster and more accurate than VGG-16. We follow the suggestions in to combine Fast R-CNN and ResNet architectures. The network processes the whole image through a sequence of residual blocks. Before the last strided convolution layer we insert a RoI pooling layer, which performs maxpooling over regions of varied sizes, i.e., proposals, into a 7 × 7 feature map. Then we add the remaining residual blocks, a layer for average pooling over spatial dimensions, and two fully connected layers: a softmax layer for classification (PASCAL or COCO classes, for example, along with the background class) and a regression layer for bounding box refinement, with independent corrections for each class.” [pg. 3402, § 3.1 Object detection network, ¶2; note: feature extraction is implicit because the RoI pooling layer in a Fast R-CNN extracts a feature vector from a feature map.]); 
retain the detection or classification performance on the first set of object classes (“Using only the training samples for the new classes, we propose a method for not only adapting the old network to the new classes, but also ensuring performance on the old classes does not degrade.” [pg. 3401, top left col, ¶1; ensuring performance does not degrade would be equivalent to retaining.]).
However, Shmelkov fails to explicitly teach:
generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; and
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid.
Coates teaches generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features (“To construct the groups G, we will use a version of single-link agglomerative clustering to combine sets of features that have low dissimilarity according to d(k, l). To construct a single group G0 we begin by choosing a random simple cell filter, say D(k) , as the first member. We then search for candidate cells to be added to the group by computing d(k, l) for each simple cell filter D(l) and add D(l) to the group if d(k, l) is less than some limit τ” [pg. 3, § 2.2 Learning Invariant Features, ¶4; groups G would be equivalent to N clusters. Low dissimilarity is equivalent to samples belonging to same class.]); 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid (“The algorithm then continues to expand G0 by adding any additional simple cells that are closer than τ to any one of the simple cells already in the group. This procedure continues until there are no more cells to be added, or until the diameter of the group (the dissimilarity between the two furthest cells in the group) reaches a limit ∆.” See further: “We use τ = 0.3 for the first layer of complex cells and τ = 1.0 for the second layer. These were chosen by examining the typical distance between a filter D(k) and its nearest neighbor. We use ∆ = 1.5 > √2 so that a complex cell group may include orthogonal filters but cannot grow without limit.” [pg. 3, § 2.2 Learning Invariant Features, ¶4]).
Shmelkov and Coates are both in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Coates discloses feature learning from unlabeled image data. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov to include generating clusters and selecting a sample from the clusters as taught by Coates. One would have been motivated to use a K-means algorithm to generate a cluster of training samples to find features that are sensitive to commonly occurring object classes. [pg. 1, §Abstract, Coates]


Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shmelkov in view of Lin et al. ("Microsoft COCO: Common Objects in Context", hereinafter "Lin").

Regarding claim 6, Shmelkov teaches the method of Claim 5, but fails to explicitly teach further comprising: searching for additional instances of objects in the object class of the labeled object based on the label, 
wherein adapting the model for use with the second set of object classes further comprises training the model using the additional instances of the objects.
Lin teaches searching for additional instances of objects in the object class of the labeled object based on the label ([Figure: media_image1.png] “In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist” [pg. 746, § Instance Spotting, ¶1; Red annotated arrow shows searching additional instances of the label “bottle”.]), 
wherein adapting the model for use with the second set of object classes further comprises training the model using the additional instances of the objects (“In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist. Therefore, for each image, a worker was asked to place a cross on top of each instance of a specific category found in the previous stage. To boost recall, the location of the instance found by a worker in the previous stage was shown to the current worker. Such priming helped workers quickly find an initial instance upon first seeing the image. The workers could also use a magnifying glass to find small instances. Each worker was asked to label at most 10 instances of a given category per image. Each image was labeled by 8 workers for a total of ∼10k worker hours.” [pg. 746, § Instance Spotting, ¶1]).
Shmelkov and Lin are both in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin discloses an image dataset that addresses common objects in context. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov by using the image dataset disclosed by Lin in order to train the models. One would have been motivated to search for multiple instances of an object to further train the model to have better accuracy in object detection. [pg. 743, § Object detection, ¶3, Lin]

Regarding claim 13, Shmelkov teaches the electronic device of Claim 12, but fails to explicitly teach wherein: the processor is further configured to search for additional instances of objects in the object class of the labeled object based on the label, and 
to adapt the model for use with the second set of object classes, the processor is further configured to train the model using the additional instances of the objects in the object class of the labeled object.
Lin teaches wherein: the processor is further configured to search for additional instances of objects in the object class of the labeled object based on the label ([Figure: media_image1.png] “In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist” [pg. 746, § Instance Spotting, ¶1; Red annotated arrow shows searching additional instances of the label “bottle”.]), and 
to adapt the model for use with the second set of object classes, the processor is further configured to train the model using the additional instances of the objects in the object class of the labeled object (“In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist. Therefore, for each image, a worker was asked to place a cross on top of each instance of a specific category found in the previous stage. To boost recall, the location of the instance found by a worker in the previous stage was shown to the current worker. Such priming helped workers quickly find an initial instance upon first seeing the image. The workers could also use a magnifying glass to find small instances. Each worker was asked to label at most 10 instances of a given category per image. Each image was labeled by 8 workers for a total of ∼10k worker hours.” [pg. 746, § Instance Spotting, ¶1]).
Shmelkov and Lin are both in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin discloses an image dataset that addresses common objects in context. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov by using the image dataset disclosed by Lin in order to train the models. One would have been motivated to search for multiple instances of an object to further train the model to have better accuracy in object detection. [pg. 743, § Object detection, ¶3, Lin]

Regarding claim 20, Shmelkov teaches the non-transitory, computer-readable medium of Claim 19, but fails to explicitly teach further comprising program code that, when executed by the processor, causes the electronic device to: search for additional instances of objects in the object class of the labeled object based on the label, wherein the program code that, when executed, causes the electronic device to adapt the model for use with the second set of object classes, comprises program code that, when executed by the processor, causes the electronic device to train the model using the additional instances of the objects in the object class of the labeled object.
Lin teaches further comprising program code that, when executed by the processor, causes the electronic device to: 
search for additional instances of objects in the object class of the labeled object based on the label ([Figure: media_image1.png] “In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist” [pg. 746, § Instance Spotting, ¶1; Red annotated arrow shows searching additional instances of the label “bottle”.]), 
wherein the program code that, when executed, causes the electronic device to adapt the model for use with the second set of object classes, comprises program code that, when executed by the processor, causes the electronic device to train the model using the additional instances of the objects in the object class of the labeled object (“In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist. Therefore, for each image, a worker was asked to place a cross on top of each instance of a specific category found in the previous stage. To boost recall, the location of the instance found by a worker in the previous stage was shown to the current worker. Such priming helped workers quickly find an initial instance upon first seeing the image. The workers could also use a magnifying glass to find small instances. Each worker was asked to label at most 10 instances of a given category per image. Each image was labeled by 8 workers for a total of ∼10k worker hours.” [pg. 746, § Instance Spotting, ¶1]).
Shmelkov and Lin are both in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin discloses an image dataset that addresses common objects in context. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov by using the image dataset disclosed by Lin in order to train the models. One would have been motivated to search for multiple instances of an object to further train the model to have better accuracy in object detection. [pg. 743, § Object detection, ¶3, Lin]

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Raina et al. ("Self-taught Learning: Transfer Learning from Unlabeled Data") discloses a transfer learning algorithm using unlabeled datasets for supervised classification tasks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491.  The examiner can normally be reached on Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.H.H./Examiner, Art Unit 2122                                                                                                                                                                                                        




/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122