DETAILED ACTION
This action is in response to the claims filed 11/29/2021 for application 16/255,737. Claims 1, 3-8, 10-15, and 17-20 have been amended, claims 2, 9, and 16 have been cancelled, and claims 21-23 have been added. Claims 1, 3-8, 10-15, and 17-23 are currently pending. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/29/2021 has been entered.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 5, 7, 8, 12, 14, 15, 19, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Shmelkov et al. ("Incremental Learning of Object Detectors without Catastrophic Forgetting", cited by Applicant in the IDS filed 01/23/2019, hereinafter "Shmelkov") in view of Lin et al. ("Feature Pyramid Networks for Object Detection", hereinafter "Lin1").

Regarding claim 1, Shmelkov teaches A method, the method comprising: 
identifying, via a first machine learning (ML) model for object detection or classification, a first set of object classes that the first ML model is trained to detect or classify (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA).” [pg. 3403, § 3.2 Dual-network Learning, ¶1; original classes corresponds to a first set of object classes.]); 
generating a second ML model to detect or classify the second set of object classes different from the first set of object classes using a labeled set of data for the second set of object classes (“In the first experiment we take 19 classes in alphabetical order from the VOC dataset as CA, and the remaining one as the only new class CB. We then train the A(1-19) network on the VOC trainval subset containing any of the 19 classes, and the B(20) network is trained on the trainval subset containing the new class” [pg. 3404, § 4.3 Addition of one class, ¶3, See Table 1. Class 20 is labeled “tvmonitor” which corresponds to a labeled set of data in a second set of object classes.]);
adapting the first ML model for use with the second set of object classes (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA). The goal now is to add a new set of classes CB to this.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Network A corresponds to a first model.])
by combining the first ML model and the second ML model using an unlabeled set of auxiliary data to generate the adapted ML model (“We make two copies of A(CA): one that is frozen to recognize classes CA through distillation loss and the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images. (note: see pg. 3404, § 4.1 Datasets and evaluation; Shmelkov discloses using the PASCAL VOC 2007 and COCO challenge datasets, which correspond to auxiliary data) The extension is done only in the last fully connected layers, i.e., classification and bounding box regression. We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Examiner is interpreting concatenating the outputs of Network A (original class) and Network B (new class) to be equivalent to combining and thus would produce an extended network (A(CA ∪ CB)) which corresponds to an adapted model.])
retaining detection or classification performance on the first set of object classes in the ML adapted model by performing a knowledge distillation process on the first ML model (“Specifically, we evaluate each new training sample on the frozen copy (Network A) to choose a diverse set of proposals (distillation proposals in Figure 2), and record their responses. With these responses in hand, we compute a distillation loss which measures the discrepancy between the two networks for the distillation proposals. This loss is added to the crossentropy loss on the new classes to make up the loss function for training the adapted detection network. As we show in the experimental evaluation, the distillation loss as well as the strategy to select the distillation proposals are critical in preserving the performance on the old classes (see §4)” [pg. 3402, § 3. Incremental learning of new classes, ¶2; Shmelkov discloses performing knowledge distillation on Network A and Network B is adapted to retain knowledge from Network A in Figure 2.]), wherein the knowledge distillation process comprises using a network to extract one or more features for each of a plurality of training samples for the first set of object classes in the first ML model (“In our variant of Fast R-CNN, we replaced the VGG-16 trunk with a deeper ResNet-50 component, which is faster and more accurate than VGG-16. We follow the suggestions in to combine Fast R-CNN and ResNet architectures. The network processes the whole image through a sequence of residual blocks. Before the last strided convolution layer we insert a RoI pooling layer, which performs maxpooling over regions of varied sizes, i.e., proposals, into a 7 × 7 feature map. 
Then we add the remaining residual blocks, a layer for average pooling over spatial dimensions, and two fully connected layers: a softmax layer for classification (PASCAL or COCO classes, for example, along with the background class) and a regression layer for bounding box refinement, with independent corrections for each class.” [pg. 3402, § 3.1 Object detection network, ¶2; note: It is implicit that feature extraction would have to occur, as the RoI pooling layer in a Fast R-CNN extracts a feature vector from a feature map.]); and
using the adapted ML model to detect or classify one or more objects from the first set of object classes and one or more objects from the second set of object classes (“We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN. Our goal is to train B(CB) to recognize classes CA ∪ CB using only new data and annotations for CB.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Shmelkov discloses CA to be the original (i.e. old) classes and CB to be the set of new classes. “Trained to recognize” would correspond to using the network to classify or detect objects.]).
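For illustration only, the mechanism quoted above (concatenating the new-class outputs with the original ones, and adding a distillation loss to the cross-entropy loss on the new classes) can be sketched as follows. This is a minimal sketch under stated assumptions, not Shmelkov's implementation: the mean squared difference used as the "discrepancy" measure, the `weight` parameter, and all function names are hypothetical and are not taken from the cited passages.

```python
# Illustrative sketch only. The squared-difference discrepancy, the weight
# parameter, and every name below are assumptions, not Shmelkov's code.

def concat_outputs(old_class_scores, new_class_scores):
    """Concatenate the original-class outputs with the sibling-layer
    outputs for the new classes, giving scores over CA followed by CB."""
    return list(old_class_scores) + list(new_class_scores)


def distillation_loss(frozen_outputs, adapted_outputs):
    """Mean squared discrepancy between the frozen copy (Network A) and the
    adapted network (Network B) on the old-class outputs.

    Each argument is a list with one entry per distillation proposal; each
    entry is a list of old-class scores for that proposal.
    """
    total = 0.0
    count = 0
    for a_scores, b_scores in zip(frozen_outputs, adapted_outputs):
        total += sum((a - b) ** 2 for a, b in zip(a_scores, b_scores))
        count += len(a_scores)
    return total / count


def total_loss(cross_entropy_new, distillation, weight=1.0):
    """Add the distillation term to the cross-entropy loss on new classes."""
    return cross_entropy_new + weight * distillation
```

When the adapted network reproduces the frozen network's old-class responses exactly, the distillation term is zero and only the new-class cross-entropy remains, which is the sense in which the quoted loss preserves performance on the old classes.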
However, Shmelkov fails to explicitly teach using a feature pyramid network.
Lin1 teaches using a feature pyramid network (“Fast R-CNN is a region-based object detector in which Region-of-Interest (RoI) pooling is used to extract features. Fast R-CNN is most commonly performed on a single-scale feature map. To use it with our FPN, we need to assign RoIs of different scales to the pyramid levels… We view our feature pyramid as if it were produced from an image pyramid. Thus we can adapt the assignment strategy of region-based detectors in the case when they are run on image pyramids” [pg. 4, § 4.2 Feature Pyramid Networks for Fast R-CNN]).
Shmelkov and Lin1 are both in the same field of endeavor of object detection. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the Fast R-CNN object detector disclosed by Shmelkov with the feature pyramid network as taught by Lin1. Implementing a feature pyramid network with a Fast R-CNN would be a feasible combination as disclosed by Lin1 (§ 4.2). Thus, one would have been motivated to make this modification in order to detect objects at different scales more efficiently and accurately (Lin1, pg. 2, § 1 Introduction, ¶7-8).
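For illustration only, the assignment of RoIs of different scales to pyramid levels referred to in Lin1 § 4.2 can be sketched as follows. The formula k = floor(k0 + log2(sqrt(w*h) / 224)) with k0 = 4 is the rule given in Lin1 itself but is not quoted in this action; the clamping range of levels 2 through 5 and the function name are assumptions made for this sketch.

```python
import math


def fpn_level(width, height, k0=4, canonical_size=224, k_min=2, k_max=5):
    """Pick the pyramid level for an RoI of the given width and height.

    An RoI of the canonical size maps to level k0; smaller RoIs map to
    finer (lower) levels and larger RoIs to coarser (higher) levels.
    The [k_min, k_max] clamp is an assumption of this sketch.
    """
    scale = math.sqrt(width * height) / canonical_size
    k = math.floor(k0 + math.log2(scale))
    return max(k_min, min(k_max, k))
```

Under these assumptions, a 224 × 224 RoI is pooled from level 4 and a 112 × 112 RoI from level 3, so each RoI is pooled from a feature map whose resolution suits its size; this is the multi-scale behavior relied on in the motivation to combine.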

Regarding claim 5, the combination of Shmelkov and Lin1 teaches The method of claim 1, where Shmelkov further teaches further comprising: 
in response to being unable to identify an object from the second set of object classes based on the first ML model, receiving a label of the object (“The second CNN (bottom) is an incrementally trained version of the first one for the category horse. In other words, the original network is adapted with images from only this new class. This adapted network localizes the horse in the image, but fails to detect the rider, which it was capable of originally, and despite the fact that the person class was not updated. In this paper, we present a method to alleviate this issue” [pg. 3400, § 1 Introduction, ¶2; Shmelkov discloses a method to solve the problem of a second network being unable to detect the original class in § 3.2 Dual-network learning, It is implicit that a label would be received for an object to perform this training method.]), 
wherein adapting the first ML model for use with the second set of object classes to generate the adapted ML model comprises adapting the first ML model for use with the second set of object classes, the labeled object being one of the object classes in the second set (“Each experiment begins with choosing a subset of classes to form the set CA. Then, a network is learned only on the subset of the training set composed of all the images containing at least one object from CA. Annotations for other classes in these images are ignored. With the new classes chosen to form the set CB, we learn the extended network as described in Section 3.2 with the subset of the training set containing at least one object from CB. As in the previous case, annotations of all the other classes, including those of the original classes CA, are ignored.” [pg. 3404, § 4.2 Implementation details, ¶1; note: See pg. 3407, Tables 6-9 discloses a labeled object in the second set.]).

Regarding claim 7, the combination of Shmelkov and Lin1 teaches The method of claim 1, where Shmelkov further teaches further comprising using the adapted ML model to perform object classification (“The extension is done only in the last fully connected layers, i.e., classification and bounding box regression.” [pg. 3403, § 3.2 Dual-network learning, ¶1]).


Regarding claim 8, Shmelkov teaches An electronic device comprising: 
a memory configured to store a first machine learning (ML) model for object detection or classification; and 
a processor operably connected to the memory (“These paradigms, and in particular fine-tuning, a special case of transfer learning, are very popular in computer vision” [pg. 3401, § 2. Related Work, ¶2, Computer vision implies the use of processors and memory. See § Acknowledgments where Shmelkov discloses use of GPUs.]), the processor configured to:
identify, via the first (ML) model, a first set of object classes that the first ML model is trained to detect or classify (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA).” [pg. 3403, § 3.2 Dual-network Learning, ¶1; original classes corresponds to a first set of object classes.]); 
generate a second ML model to detect or classify the second set of object classes different from the first set of object classes using a labeled set of data for the second set of object classes (“In the first experiment we take 19 classes in alphabetical order from the VOC dataset as CA, and the remaining one as the only new class CB. We then train the A(1-19) network on the VOC trainval subset containing any of the 19 classes, and the B(20) network is trained on the trainval subset containing the new class” [pg. 3404, § 4.3 Addition of one class, ¶3, See Table 1. Class 20 is labeled “tvmonitor” which corresponds to a labeled set of data in a second set of object classes.]);
adapt the first ML model for use with the second set of object classes (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA). The goal now is to add a new set of classes CB to this.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Network A corresponds to a first model.])
by combining the first ML model and the second ML model using an unlabeled set of auxiliary data to generate the adapted ML model (“We make two copies of A(CA): one that is frozen to recognize classes CA through distillation loss and the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images. (note: see pg. 3404, § 4.1 Datasets and evaluation; Shmelkov discloses using the PASCAL VOC 2007 and COCO challenge datasets, which correspond to auxiliary data) The extension is done only in the last fully connected layers, i.e., classification and bounding box regression. We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Examiner is interpreting concatenating the outputs of Network A (original class) and Network B (new class) to be equivalent to combining and thus would produce an extended network (A(CA ∪ CB)) which corresponds to an adapted model.])
retain detection or classification performance on the first set of object classes in the ML adapted model by performing a knowledge distillation process on the first ML model (“Specifically, we evaluate each new training sample on the frozen copy (Network A) to choose a diverse set of proposals (distillation proposals in Figure 2), and record their responses. With these responses in hand, we compute a distillation loss which measures the discrepancy between the two networks for the distillation proposals. This loss is added to the crossentropy loss on the new classes to make up the loss function for training the adapted detection network. As we show in the experimental evaluation, the distillation loss as well as the strategy to select the distillation proposals are critical in preserving the performance on the old classes (see §4)” [pg. 3402, § 3. Incremental learning of new classes, ¶2; Shmelkov discloses performing knowledge distillation on Network A and Network B is adapted to retain knowledge from Network A in Figure 2.]), wherein, to perform the knowledge distillation process, the processor is configured to use a network to extract one or more features for each of a plurality of training samples for the first set of object classes in the first ML model (“In our variant of Fast R-CNN, we replaced the VGG-16 trunk with a deeper ResNet-50 component, which is faster and more accurate than VGG-16. We follow the suggestions in to combine Fast R-CNN and ResNet architectures. The network processes the whole image through a sequence of residual blocks. Before the last strided convolution layer we insert a RoI pooling layer, which performs maxpooling over regions of varied sizes, i.e., proposals, into a 7 × 7 feature map. 
Then we add the remaining residual blocks, a layer for average pooling over spatial dimensions, and two fully connected layers: a softmax layer for classification (PASCAL or COCO classes, for example, along with the background class) and a regression layer for bounding box refinement, with independent corrections for each class.” [pg. 3402, § 3.1 Object detection network, ¶2; note: It is implicit that feature extraction would have to occur, as the RoI pooling layer in a Fast R-CNN extracts a feature vector from a feature map.]); and
use the adapted ML model to detect or classify one or more objects from the first set of object classes and one or more objects from the second set of object classes (“We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN. Our goal is to train B(CB) to recognize classes CA ∪ CB using only new data and annotations for CB.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Shmelkov discloses CA to be the original (i.e. old) classes and CB to be the set of new classes. “Trained to recognize” would correspond to using the network to classify or detect objects.]).
However, Shmelkov fails to explicitly teach using a feature pyramid network.
Lin1 teaches using a feature pyramid network (“Fast R-CNN is a region-based object detector in which Region-of-Interest (RoI) pooling is used to extract features. Fast R-CNN is most commonly performed on a single-scale feature map. To use it with our FPN, we need to assign RoIs of different scales to the pyramid levels… We view our feature pyramid as if it were produced from an image pyramid. Thus we can adapt the assignment strategy of region-based detectors in the case when they are run on image pyramids” [pg. 4, § 4.2 Feature Pyramid Networks for Fast R-CNN]).
Shmelkov and Lin1 are both in the same field of endeavor of object detection. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the Fast R-CNN object detector disclosed by Shmelkov with the feature pyramid network as taught by Lin1. Implementing a feature pyramid network with a Fast R-CNN would be a feasible combination as disclosed by Lin1 (§ 4.2). Thus, one would have been motivated to make this modification in order to detect objects at different scales more efficiently and accurately (Lin1, pg. 2, § 1 Introduction, ¶7-8).

Regarding claim 12, the combination of Shmelkov and Lin1 discloses The electronic device of claim 8, where Shmelkov further teaches wherein the processor is further configured to: 
in response to being unable to identify an object from the second set of object classes based on the first ML model, receiving a label of the object (“The second CNN (bottom) is an incrementally trained version of the first one for the category horse. In other words, the original network is adapted with images from only this new class. This adapted network localizes the horse in the image, but fails to detect the rider, which it was capable of originally, and despite the fact that the person class was not updated. In this paper, we present a method to alleviate this issue” [pg. 3400, § 1 Introduction, ¶2; Shmelkov discloses a method to solve the problem of a second network being unable to detect the original class in § 3.2 Dual-network learning, It is implicit that a label would be received for an object to perform this training method.]), 
wherein, to adapt the first ML model for use with the second set of object classes to generate the adapted ML model, the processor is further configured to adapt the first ML model for use with the second set of object classes, the labeled object being one of the object classes in the second set (“Each experiment begins with choosing a subset of classes to form the set CA. Then, a network is learned only on the subset of the training set composed of all the images containing at least one object from CA. Annotations for other classes in these images are ignored. With the new classes chosen to form the set CB, we learn the extended network as described in Section 3.2 with the subset of the training set containing at least one object from CB. As in the previous case, annotations of all the other classes, including those of the original classes CA, are ignored.” [pg. 3404, § 4.2 Implementation details, ¶1; note: See pg. 3407, Tables 6-9 discloses a labeled object in the second set.]).

Regarding claim 14, the combination of Shmelkov and Lin1 teaches The electronic device of claim 8, where Shmelkov further teaches wherein the processor is further configured to use the adapted ML model to perform object classification (“The extension is done only in the last fully connected layers, i.e., classification and bounding box regression.” [pg. 3403, § 3.2 Dual-network learning, ¶1]).

Regarding claim 15, Shmelkov teaches A non-transitory, computer-readable medium comprising program code that, when executed by a processor of an electronic device, (“These paradigms, and in particular fine-tuning, a special case of transfer learning, are very popular in computer vision” [pg. 3401, § 2. Related Work, ¶2, Computer vision implies the use of processors and memory. See § Acknowledgments where Shmelkov discloses use of GPUs.]), causes the electronic device to: 
identify, via the first machine learning (ML) model, a first set of object classes that the first ML model is trained to detect or classify (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA).” [pg. 3403, § 3.2 Dual-network Learning, ¶1; original classes corresponds to a first set of object classes.]); 
generate a second ML model to detect or classify the second set of object classes different from the first set of object classes using a labeled set of data for the second set of object classes (“In the first experiment we take 19 classes in alphabetical order from the VOC dataset as CA, and the remaining one as the only new class CB. We then train the A(1-19) network on the VOC trainval subset containing any of the 19 classes, and the B(20) network is trained on the trainval subset containing the new class” [pg. 3404, § 4.3 Addition of one class, ¶3, See Table 1. Class 20 is labeled “tvmonitor” which corresponds to a labeled set of data in a second set of object classes.]);
adapt the first ML model for use with the second set of object classes (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA). The goal now is to add a new set of classes CB to this.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Network A corresponds to a first model.])
by combining the first ML model and the second ML model using an unlabeled set of auxiliary data to generate the adapted ML model (“We make two copies of A(CA): one that is frozen to recognize classes CA through distillation loss and the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images. (note: see pg. 3404, § 4.1 Datasets and evaluation; Shmelkov discloses using the PASCAL VOC 2007 and COCO challenge datasets, which correspond to auxiliary data) The extension is done only in the last fully connected layers, i.e., classification and bounding box regression. We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Examiner is interpreting concatenating the outputs of Network A (original class) and Network B (new class) to be equivalent to combining and thus would produce an extended network (A(CA ∪ CB)) which corresponds to an adapted model.])
retain detection or classification performance on the first set of object classes in the ML adapted model by performing a knowledge distillation process on the first ML model (“Specifically, we evaluate each new training sample on the frozen copy (Network A) to choose a diverse set of proposals (distillation proposals in Figure 2), and record their responses. With these responses in hand, we compute a distillation loss which measures the discrepancy between the two networks for the distillation proposals. This loss is added to the crossentropy loss on the new classes to make up the loss function for training the adapted detection network. As we show in the experimental evaluation, the distillation loss as well as the strategy to select the distillation proposals are critical in preserving the performance on the old classes (see §4)” [pg. 3402, § 3. Incremental learning of new classes, ¶2; Shmelkov discloses performing knowledge distillation on Network A and Network B is adapted to retain knowledge from Network A in Figure 2.]), wherein, the program code that when executed causes the electronic device to perform the knowledge distillation process comprises program code that when executed causes the electronic device to use a network to extract one or more features for each of a plurality of training samples for the first set of object classes in the first ML model (“In our variant of Fast R-CNN, we replaced the VGG-16 trunk with a deeper ResNet-50 component, which is faster and more accurate than VGG-16. We follow the suggestions in to combine Fast R-CNN and ResNet architectures. The network processes the whole image through a sequence of residual blocks. Before the last strided convolution layer we insert a RoI pooling layer, which performs maxpooling over regions of varied sizes, i.e., proposals, into a 7 × 7 feature map. 
Then we add the remaining residual blocks, a layer for average pooling over spatial dimensions, and two fully connected layers: a softmax layer for classification (PASCAL or COCO classes, for example, along with the background class) and a regression layer for bounding box refinement, with independent corrections for each class.” [pg. 3402, § 3.1 Object detection network, ¶2; note: It is implicit that feature extraction would have to occur, as the RoI pooling layer in a Fast R-CNN extracts a feature vector from a feature map.]); and
use the adapted ML model to detect or classify one or more objects from the first set of object classes and one or more objects from the second set of object classes (“We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN. Our goal is to train B(CB) to recognize classes CA ∪ CB using only new data and annotations for CB.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Shmelkov discloses CA to be the original (i.e. old) classes and CB to be the set of new classes. “Trained to recognize” would correspond to using the network to classify or detect objects.]).
However, Shmelkov fails to explicitly teach using a feature pyramid network.
Lin1 teaches using a feature pyramid network (“Fast R-CNN is a region-based object detector in which Region-of-Interest (RoI) pooling is used to extract features. Fast R-CNN is most commonly performed on a single-scale feature map. To use it with our FPN, we need to assign RoIs of different scales to the pyramid levels… We view our feature pyramid as if it were produced from an image pyramid. Thus we can adapt the assignment strategy of region-based detectors in the case when they are run on image pyramids” [pg. 4, § 4.2 Feature Pyramid Networks for Fast R-CNN]).
Shmelkov and Lin1 are both in the same field of endeavor of object detection. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the Fast R-CNN object detector disclosed by Shmelkov with the feature pyramid network as taught by Lin1. Implementing a feature pyramid network with a Fast R-CNN would be a feasible combination as disclosed by Lin1 (§ 4.2). Thus, one would have been motivated to make this modification in order to detect objects at different scales more efficiently and accurately (Lin1, pg. 2, § 1 Introduction, ¶7-8).
Regarding claim 19, the combination of Shmelkov and Lin1 teaches The non-transitory, computer-readable medium of Claim 15, where Shmelkov further teaches further comprising program code that, when executed by the processor, causes the electronic device to: 
in response to being unable to identify an object from the second set of object classes based on the first ML model, receive a label of the object (“The second CNN (bottom) is an incrementally trained version of the first one for the category horse. In other words, the original network is adapted with images from only this new class. This adapted network localizes the horse in the image, but fails to detect the rider, which it was capable of originally, and despite the fact that the person class was not updated. In this paper, we present a method to alleviate this issue” [pg. 3400, § 1 Introduction, ¶2; Shmelkov discloses a method to solve the problem of a second network being unable to detect the original class in § 3.2 Dual-network learning, It is implicit that a label would be received for an object to perform this training method.]), 
wherein the program code that, when executed, causes the electronic device to adapt the first ML model for use with the second set of object classes to generate the adapted ML model comprises program code that, when executed by the processor, causes the electronic device to adapt the first ML model for use with the second set of object classes, the labeled object being one of the object classes in the second set (“Each experiment begins with choosing a subset of classes to form the set CA. Then, a network is learned only on the subset of the training set composed of all the images containing at least one object from CA. Annotations for other classes in these images are ignored. With the new classes chosen to form the set CB, we learn the extended network as described in Section 3.2 with the subset of the training set containing at least one object from CB. As in the previous case, annotations of all the other classes, including those of the original classes CA, are ignored.” [pg. 3404, § 4.2 Implementation details, ¶1; note: See pg. 3407, Tables 6-9 discloses a labeled object in the second set.]).

Regarding claim 21, the combination of Shmelkov and Lin1 teaches The method of Claim 1, where Shmelkov teaches wherein the first and second sets of object classes comprise image classes and the training samples comprise training images (“We evaluate our method on the PASCAL VOC 2007 detection benchmark and the Microsoft COCO challenge dataset. VOC 2007 consists of 5K images in the trainval split and 5K images in the test split for 20 object classes. COCO on the other hand has 80K images in the training set and 40K images in the validation set for 80 object classes (which includes all the classes from VOC).” [pg. 3404, § 4.1 Datasets and evaluation, ¶1]).

Regarding claim 22, the combination of Shmelkov and Lin1 teaches The electronic device of Claim 8, where Shmelkov teaches wherein the first and second sets of object classes comprise image classes and the training samples comprise training images (“We evaluate our method on the PASCAL VOC 2007 detection benchmark and the Microsoft COCO challenge dataset. VOC 2007 consists of 5K images in the trainval split and 5K images in the test split for 20 object classes. COCO on the other hand has 80K images in the training set and 40K images in the validation set for 80 object classes (which includes all the classes from VOC).” [pg. 5, 4.1. Datasets and Evaluation, ¶1]).

Regarding claim 23, the combination of Shmelkov and Lin1 teaches The non-transitory, computer-readable medium of Claim 15, where Shmelkov teaches wherein the first and second sets of object classes comprise image classes and the training samples comprise training images (“We evaluate our method on the PASCAL VOC 2007 detection benchmark and the Microsoft COCO challenge dataset. VOC 2007 consists of 5K images in the trainval split and 5K images in the test split for 20 object classes. COCO on the other hand has 80K images in the training set and 40K images in the validation set for 80 object classes (which includes all the classes from VOC).” [pg. 5, 4.1. Datasets and Evaluation, ¶1]).

Claims 3, 4, 10, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Shmelkov in view of Lin1 and further in view of Coates et al. ("Emergence of Object-Selective Features in Unsupervised Feature Learning", hereinafter "Coates").
Regarding claim 3, the combination of Shmelkov and Lin1 teaches The method of Claim 1, where Shmelkov further teaches wherein combining the first ML model and the second ML model using the unlabeled set of auxiliary data to generate the adapted model comprises: 
performing object detection or classification on the unlabeled set of auxiliary data using the second ML model to generate a second set of model outputs (“the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images.” [pg. 3403, § 3.2 Dual-network learning, ¶1]); and
combining the first ML model and the second ML model based on a loss function using the first and second sets of model outputs (“The core of our approach is a loss function balancing the interplay between predictions on the new classes, i.e., cross-entropy loss, and a new distillation loss which minimizes the discrepancy between responses for old classes from the original and the new networks. The overall approach is illustrated in Figure 2.” [pg. 3401, top left col, ¶1]).
However, the combination of Shmelkov and Lin1 fails to explicitly teach performing object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs.
Coates teaches performing object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs (“As described above, we ran our algorithm on patches harvested from YouTube thumbnails downloaded from the web. Specifically, we downloaded the thumbnails for over 1.4 million YouTube videos, some of which are shown in Figure 2b. These images were downsampled to 128-by-96 pixels and converted to grayscale. We cropped 57 million randomly selected 32-by-32 pixel patches from these images to form our unlabeled training set. No supervision was used—thus most patches contain partial views of objects or clutter at differing scales. We ran our algorithm on these images using a cluster of 30 machines over 3 days—virtually all of the time spent training the 150,000 second-layer features. We will now visualize these features and check whether any of them have learned to identify an object class.” [pg. 5, § 3 Experiments, ¶1]);
Shmelkov, Lin1, and Coates are all in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Coates discloses feature learning from unlabeled image data. Although Shmelkov discloses a first model that performs object detection, it fails to teach using an unlabeled set of auxiliary data. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the teachings of Shmelkov and Lin1, in particular the first model disclosed by Shmelkov, to train the model using the unlabeled auxiliary data taught by Coates. One would have been motivated to use unlabeled auxiliary data in order to train the model to detect more complex patterns. [pg. 1, § 1 Introduction, Coates]
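For illustration only, the loss described in the Shmelkov passage quoted above (a cross-entropy term on the new classes balanced against a distillation term on the old-class responses of the original and new networks) can be sketched as follows. This is a simplified sketch and not the formulation of the reference or of the claims: the function names, the mean-squared form of the distillation term, and the weight `lam` are assumptions made for the example.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    # Standard cross-entropy for a single labeled sample.
    return -math.log(softmax(logits)[target_index])

def distillation_loss(original_logits, adapted_logits_old_classes):
    # Assumed form: mean squared discrepancy between the old-class
    # responses of the original (frozen) and the adapted networks.
    n = len(original_logits)
    return sum((a - b) ** 2
               for a, b in zip(original_logits, adapted_logits_old_classes)) / n

def combined_loss(original_logits, adapted_logits, target_index,
                  num_old_classes, lam=1.0):
    # Cross-entropy drives learning of the new classes; the distillation
    # term penalizes drift on the old classes. `lam` is a hypothetical weight.
    ce = cross_entropy(adapted_logits, target_index)
    dist = distillation_loss(original_logits[:num_old_classes],
                             adapted_logits[:num_old_classes])
    return ce + lam * dist
```

In this sketch the distillation term needs only the two sets of model outputs on the same input, so it can be evaluated without annotations for the old classes; the cross-entropy term still requires a label for the new-class sample.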

Regarding claim 4, the combination of Shmelkov and Lin1 teaches The method of claim 1, where Shmelkov further teaches wherein retaining the detection or classification performance on the first set of object classes in the adapted ML model comprises: 
retaining the detection or classification performance on the first set of object classes (“Using only the training samples for the new classes, we propose a method for not only adapting the old network to the new classes, but also ensuring performance on the old classes does not degrade.” [pg. 3401, top left col, ¶1; ensuring performance does not degrade would be equivalent to retaining.]).
However, the combination of Shmelkov and Lin1 fails to explicitly teach:
generating, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; 
for each of the N clusters, selecting a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and 
Coates teaches generating, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features (“To construct the groups G, we will use a version of single-link agglomerative clustering to combine sets of features that have low dissimilarity according to d(k, l). To construct a single group G0 we begin by choosing a random simple cell filter, say D(k) , as the first member. We then search for candidate cells to be added to the group by computing d(k, l) for each simple cell filter D(l) and add D(l) to the group if d(k, l) is less than some limit τ” [pg. 3, § 2.2 Learning Invariant Features, ¶4; groups G would be equivalent to N clusters. Low dissimilarity is equivalent to samples belonging to same class.]); 
for each of the N clusters, selecting a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid (“The algorithm then continues to expand G0 by adding any additional simple cells that are closer than τ to any one of the simple cells already in the group. This procedure continues until there are no more cells to be added, or until the diameter of the group (the dissimilarity between the two furthest cells in the group) reaches a limit ∆.” See further: “We use τ = 0.3 for the first layer of complex cells and τ = 1.0 for the second layer. These were chosen by examining the typical distance between a filter D(k) and its nearest neighbor. We use ∆ = 1.5 > √2 so that a complex cell group may include orthogonal filters but cannot grow without limit.” [pg. 3, § 2.2 Learning Invariant Features, ¶4]).
Shmelkov, Lin1 and Coates are all in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection.  Coates discloses feature learning from unlabeled image data. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov and feature pyramid network disclosed by Lin1 to include generating clusters and selecting a sample from the clusters as taught by Coates. One would have been motivated to use a K-means algorithm to generate a cluster of training samples to find features that are sensitive to commonly occurring object classes. [pg. 1, §Abstract, Coates]
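For illustration only, the two limitations addressed above (generating N clusters from the extracted features of samples of one class, then selecting for each cluster the sample nearest the cluster centroid) can be sketched as follows. The claim language does not name a clustering algorithm; plain k-means with a deterministic initialization is assumed here, and all function names are hypothetical.

```python
def squared_distance(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # Component-wise mean of a non-empty list of feature vectors.
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def k_means(features, n_clusters, iterations=20):
    # Assumed initialization: the first n_clusters samples serve as centroids.
    centroids = [list(f) for f in features[:n_clusters]]
    for _ in range(iterations):
        groups = [[] for _ in range(n_clusters)]
        for f in features:
            nearest = min(range(n_clusters),
                          key=lambda c: squared_distance(f, centroids[c]))
            groups[nearest].append(f)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids

def select_exemplars(features, n_clusters):
    # For each centroid, return the index of its nearest-neighbor sample.
    centroids = k_means(features, n_clusters)
    return [min(range(len(features)),
                key=lambda i: squared_distance(features[i], c))
            for c in centroids]
```

Calling `select_exemplars(features, N)` on the feature vectors of a single class yields one sample index per cluster; those samples would be the ones kept as representatives of that class.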

Regarding claim 10, the combination of Shmelkov and Lin1 teaches The electronic device of Claim 8, where Shmelkov further teaches wherein to combine the first ML model and the second ML model using the unlabeled set of auxiliary data to generate the adapted model, the processor is further configured to: 
perform object detection or classification on the unlabeled set of auxiliary data using the second ML model to generate a second set of model outputs (“the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images.” [pg. 3403, § 3.2 Dual-network learning, ¶1]); and
combine the first ML model and the second ML model based on a loss function using the first and second sets of model outputs (“The core of our approach is a loss function balancing the interplay between predictions on the new classes, i.e., cross-entropy loss, and a new distillation loss which minimizes the discrepancy between responses for old classes from the original and the new networks. The overall approach is illustrated in Figure 2.” [pg. 3401, top left col, ¶1]).
However, the combination of Shmelkov and Lin1 fails to explicitly teach perform object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs.
Coates teaches perform object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs (“As described above, we ran our algorithm on patches harvested from YouTube thumbnails downloaded from the web. Specifically, we downloaded the thumbnails for over 1.4 million YouTube videos, some of which are shown in Figure 2b. These images were downsampled to 128-by-96 pixels and converted to grayscale. We cropped 57 million randomly selected 32-by-32 pixel patches from these images to form our unlabeled training set. No supervision was used—thus most patches contain partial views of objects or clutter at differing scales. We ran our algorithm on these images using a cluster of 30 machines over 3 days—virtually all of the time spent training the 150,000 second-layer features. We will now visualize these features and check whether any of them have learned to identify an object class.” [pg. 5, § 3 Experiments, ¶1]);
Shmelkov, Lin1, and Coates are all in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Coates discloses feature learning from unlabeled image data. Although Shmelkov discloses a first model that performs object detection, it fails to teach using an unlabeled set of auxiliary data. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the teachings of Shmelkov and Lin1, in particular the first model disclosed by Shmelkov, to train the model using the unlabeled auxiliary data taught by Coates. One would have been motivated to use unlabeled auxiliary data in order to train the model to detect more complex patterns. [pg. 1, § 1 Introduction, Coates]

Regarding claim 11, the combination of Shmelkov and Lin1 teaches The electronic device of claim 8, where Shmelkov further teaches wherein to retain the detection or classification performance on the first set of object classes in the adapted ML model, the processor is further configured to: 
retain the detection or classification performance on the first set of object classes (“Using only the training samples for the new classes, we propose a method for not only adapting the old network to the new classes, but also ensuring performance on the old classes does not degrade.” [pg. 3401, top left col, ¶1; ensuring performance does not degrade would be equivalent to retaining.]).
However, the combination of Shmelkov and Lin1 fails to explicitly teach:
generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and 
Coates teaches generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features (“To construct the groups G, we will use a version of single-link agglomerative clustering to combine sets of features that have low dissimilarity according to d(k, l). To construct a single group G0 we begin by choosing a random simple cell filter, say D(k) , as the first member. We then search for candidate cells to be added to the group by computing d(k, l) for each simple cell filter D(l) and add D(l) to the group if d(k, l) is less than some limit τ” [pg. 3, § 2.2 Learning Invariant Features, ¶4; groups G would be equivalent to N clusters. Low dissimilarity is equivalent to samples belonging to same class.]); 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid (“The algorithm then continues to expand G0 by adding any additional simple cells that are closer than τ to any one of the simple cells already in the group. This procedure continues until there are no more cells to be added, or until the diameter of the group (the dissimilarity between the two furthest cells in the group) reaches a limit ∆.” See further: “We use τ = 0.3 for the first layer of complex cells and τ = 1.0 for the second layer. These were chosen by examining the typical distance between a filter D(k) and its nearest neighbor. We use ∆ = 1.5 > √2 so that a complex cell group may include orthogonal filters but cannot grow without limit.” [pg. 3, § 2.2 Learning Invariant Features, ¶4]).
Shmelkov, Lin1 and Coates are all in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection.  Coates discloses feature learning from unlabeled image data. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov and feature pyramid network disclosed by Lin1 to include generating clusters and selecting a sample from the clusters as taught by Coates. One would have been motivated to use a K-means algorithm to generate a cluster of training samples to find features that are sensitive to commonly occurring object classes. [pg. 1, §Abstract, Coates]
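For illustration only, the group-growing procedure described in the Coates passages quoted above (start from one member, add any item closer than τ to some current member, and stop once the group diameter would reach ∆) can be sketched as follows. The dissimilarity measure is passed in as a parameter because its definition is not reproduced in this action; the seed is chosen by index rather than at random so the example is repeatable; and treating "reaches a limit" as "stop before the diameter would equal or exceed ∆" is an interpretation made for the example.

```python
def grow_group(items, seed_index, dissimilarity, tau, delta):
    # Indices of the items currently in the group, starting from the seed.
    group = [seed_index]
    changed = True
    while changed:
        changed = False
        for candidate in range(len(items)):
            if candidate in group:
                continue
            distances = [dissimilarity(items[candidate], items[m])
                         for m in group]
            # Single-link rule: close to at least one current member.
            if min(distances) >= tau:
                continue
            # Stop growing once the group diameter would reach delta.
            if max(distances) >= delta:
                return group
            group.append(candidate)
            changed = True
    return group
```

With one-dimensional items `[0.0, 0.2, 0.4, 0.6, 5.0]`, absolute difference as the dissimilarity, `tau=0.3`, and `delta=0.5`, the group seeded at index 0 grows through the chain of near neighbors and halts when adding the item at index 3 would stretch the diameter past the limit.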

Regarding claim 17, the combination of Shmelkov and Lin1 teaches The non-transitory, computer-readable medium of claim 15, where Shmelkov further teaches wherein the program code that, when executed, causes the electronic device to combine the first ML model and the second ML model using the unlabeled set of auxiliary data to generate the adapted model comprises program code that, when executed by the processor, causes the electronic device to:
 perform object detection or classification on the unlabeled set of auxiliary data using the second ML model to generate a second set of model outputs (“the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images.” [pg. 3403, § 3.2 Dual-network learning, ¶1]); and
combine the first ML model and the second ML model based on a loss function using the first and second sets of model outputs (“The core of our approach is a loss function balancing the interplay between predictions on the new classes, i.e., cross-entropy loss, and a new distillation loss which minimizes the discrepancy between responses for old classes from the original and the new networks. The overall approach is illustrated in Figure 2.” [pg. 3401, top left col, ¶1]).
However, the combination of Shmelkov and Lin1 fails to explicitly teach perform object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs.
Coates teaches perform object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs (“As described above, we ran our algorithm on patches harvested from YouTube thumbnails downloaded from the web. Specifically, we downloaded the thumbnails for over 1.4 million YouTube videos, some of which are shown in Figure 2b. These images were downsampled to 128-by-96 pixels and converted to grayscale. We cropped 57 million randomly selected 32-by-32 pixel patches from these images to form our unlabeled training set. No supervision was used—thus most patches contain partial views of objects or clutter at differing scales. We ran our algorithm on these images using a cluster of 30 machines over 3 days—virtually all of the time spent training the 150,000 second-layer features. We will now visualize these features and check whether any of them have learned to identify an object class.” [pg. 5, § 3 Experiments, ¶1]);
Shmelkov, Lin1, and Coates are all in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Coates discloses feature learning from unlabeled image data. Although Shmelkov discloses a first model that performs object detection, it fails to teach using an unlabeled set of auxiliary data. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the teachings of Shmelkov and Lin1, in particular the first model disclosed by Shmelkov, to train the model using the unlabeled auxiliary data taught by Coates. One would have been motivated to use unlabeled auxiliary data in order to train the model to detect more complex patterns. [pg. 1, § 1 Introduction, Coates]

Regarding claim 18, the combination of Shmelkov and Lin1 teaches The non-transitory, computer-readable medium of claim 15, where Shmelkov further teaches wherein the program code that, when executed, causes the electronic device to retain the detection or classification performance on the first set of object classes in the adapted ML model comprises program code that, when executed by the processor, causes the electronic device to: 
retain the detection or classification performance on the first set of object classes (“Using only the training samples for the new classes, we propose a method for not only adapting the old network to the new classes, but also ensuring performance on the old classes does not degrade.” [pg. 3401, top left col, ¶1; ensuring performance does not degrade would be equivalent to retaining.]).
However, the combination of Shmelkov and Lin1 fails to explicitly teach:
generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and 
Coates teaches generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features (“To construct the groups G, we will use a version of single-link agglomerative clustering to combine sets of features that have low dissimilarity according to d(k, l). To construct a single group G0 we begin by choosing a random simple cell filter, say D(k) , as the first member. We then search for candidate cells to be added to the group by computing d(k, l) for each simple cell filter D(l) and add D(l) to the group if d(k, l) is less than some limit τ” [pg. 3, § 2.2 Learning Invariant Features, ¶4; groups G would be equivalent to N clusters. Low dissimilarity is equivalent to samples belonging to same class.]); 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid (“The algorithm then continues to expand G0 by adding any additional simple cells that are closer than τ to any one of the simple cells already in the group. This procedure continues until there are no more cells to be added, or until the diameter of the group (the dissimilarity between the two furthest cells in the group) reaches a limit ∆.” See further: “We use τ = 0.3 for the first layer of complex cells and τ = 1.0 for the second layer. These were chosen by examining the typical distance between a filter D(k) and its nearest neighbor. We use ∆ = 1.5 > √2 so that a complex cell group may include orthogonal filters but cannot grow without limit.” [pg. 3, § 2.2 Learning Invariant Features, ¶4]).
Shmelkov, Lin1 and Coates are all in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection.  Coates discloses feature learning from unlabeled image data. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov and feature pyramid network disclosed by Lin1 to include generating clusters and selecting a sample from the clusters as taught by Coates. One would have been motivated to use a K-means algorithm to generate a cluster of training samples to find features that are sensitive to commonly occurring object classes. [pg. 1, §Abstract, Coates]

Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shmelkov in view of Lin1 and further in view of Lin et al. ("Microsoft COCO: Common Objects in Context", hereinafter "Lin2").

Regarding claim 6, the combination of Shmelkov and Lin1 teaches The method of claim 5, however fails to explicitly teach further comprising: searching for additional instances of objects in the object class of the labeled object based on the label, 
wherein adapting the first ML model for use with the second set of object classes further comprises training the first ML model using the additional instances of the objects.
Lin2 teaches searching for additional instances of objects in the object class of the labeled object based on the label ([Image: media_image1.png] “In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist” [pg. 746, § Instance Spotting, ¶1; Red annotated arrow shows searching additional instances of the label “bottle”.]), 
wherein adapting the first ML model for use with the second set of object classes further comprises training the first ML model using the additional instances of the objects (“In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist. Therefore, for each image, a worker was asked to place a cross on top of each instance of a specific category found in the previous stage. To boost recall, the location of the instance found by a worker in the previous stage was shown to the current worker. Such priming helped workers quickly find an initial instance upon first seeing the image. The workers could also use a magnifying glass to find small instances. Each worker was asked to label at most 10 instances of a given category per image. Each image was labeled by 8 workers for a total of ∼10k worker hours.” [pg. 746, § Instance Spotting, ¶1]).
Shmelkov, Lin1, and Lin2 are all in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Lin2 discloses an image dataset that addresses common objects in context. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov and feature pyramid network as taught by Lin1 by using the image dataset disclosed by Lin2 in order to train the models. One would have been motivated to search for multiple instances of an object to further train the model to have better accuracy in object detection. [pg. 743, § Object detection, ¶3, Lin2]

Regarding claim 13, the combination of Shmelkov and Lin1 teaches The electronic device of claim 12, however fails to explicitly teach wherein: the processor is further configured to search for additional instances of objects in the object class of the labeled object based on the label, and 
to adapt the first ML model for use with the second set of object classes, the processor is further configured to train the first ML model using the additional instances of the objects in the object class of the labeled object.
Lin2 teaches wherein: the processor is further configured to search for additional instances of objects in the object class of the labeled object based on the label ([Image: media_image1.png] “In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist” [pg. 746, § Instance Spotting, ¶1; Red annotated arrow shows searching additional instances of the label “bottle”.]), and 
to adapt the first ML model for use with the second set of object classes, the processor is further configured to train the first ML model using the additional instances of the objects in the object class of the labeled object (“In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist. Therefore, for each image, a worker was asked to place a cross on top of each instance of a specific category found in the previous stage. To boost recall, the location of the instance found by a worker in the previous stage was shown to the current worker. Such priming helped workers quickly find an initial instance upon first seeing the image. The workers could also use a magnifying glass to find small instances. Each worker was asked to label at most 10 instances of a given category per image. Each image was labeled by 8 workers for a total of ∼10k worker hours.” [pg. 746, § Instance Spotting, ¶1]).
Shmelkov, Lin1, and Lin2 are all in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Lin2 discloses an image dataset that addresses common objects in context. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov and feature pyramid network as taught by Lin1 by using the image dataset disclosed by Lin2 in order to train the models. One would have been motivated to search for multiple instances of an object to further train the model to have better accuracy in object detection. [pg. 743, § Object detection, ¶3, Lin2]
Regarding claim 20, the combination of Shmelkov and Lin1 teaches The non-transitory, computer-readable medium of Claim 19, however fails to explicitly teach further comprising program code that, when executed by the processor, causes the electronic device to: search for additional instances of objects in the object class of the labeled object based on the label, wherein the program code that, when executed, causes the electronic device to adapt the first ML model for use with the second set of object classes comprises program code that, when executed by the processor, causes the electronic device to train the first ML model using the additional instances of the objects in the object class of the labeled object.
Lin2 teaches further comprising program code that, when executed by the processor, causes the electronic device to: 
search for additional instances of objects in the object class of the labeled object based on the label ([Image: media_image1.png] “In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist” [pg. 746, § Instance Spotting, ¶1; Red annotated arrow shows searching additional instances of the label “bottle”.]), 
wherein the program code that, when executed, causes the electronic device to adapt the first ML model for use with the second set of object classes comprises program code that, when executed by the processor, causes the electronic device to train the first ML model using the additional instances of the objects in the object class of the labeled object (“In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist. Therefore, for each image, a worker was asked to place a cross on top of each instance of a specific category found in the previous stage. To boost recall, the location of the instance found by a worker in the previous stage was shown to the current worker. Such priming helped workers quickly find an initial instance upon first seeing the image. The workers could also use a magnifying glass to find small instances. Each worker was asked to label at most 10 instances of a given category per image. Each image was labeled by 8 workers for a total of ∼10k worker hours.” [pg. 746, § Instance Spotting, ¶1]).
Shmelkov, Lin1, and Lin2 are all in the same field of endeavor of machine learning using image data. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Lin2 discloses an image dataset that addresses common objects in context. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov and feature pyramid network as taught by Lin1 by using the image dataset disclosed by Lin2 in order to train the models. One would have been motivated to search for multiple instances of an object to further train the model to have better accuracy in object detection. [pg. 743, § Object detection, ¶3, Lin2]

Response to Arguments
Applicant's arguments filed 11/29/2021 have been fully considered but they are not persuasive. 

Regarding the 35 U.S.C. § 101 Rejection:
Applicant’s arguments regarding the 101 rejection have been considered and are persuasive. Therefore, the rejection has been withdrawn. 


Regarding the 35 U.S.C. § 103 Rejection:
In response to applicant's remarks regarding the cited art of Shmelkov being silent regarding different operations being performed with labeled or unlabeled data: As evidenced by Shmelkov, on pg. 3402, 3.1 Object detection network, para 3, "the input to the network is an image with 2000 pre-computed object proposals with bounding boxes." This would mean that the original class dataset used to train the first ML model and second ML model is labeled. [pg. 3403, 3.2 Dual-network learning and 3.3 Sampling strategy] 

Shmelkov teaches unlabeled auxiliary data; see pg. 4, 3.2 Dual-network learning, "detecting new classes Cb which were not present or not annotated", which would mean that the set of classes is unlabeled data. Note: annotations and labels are equivalent. Please see further [abstract]: "We present a method to address this issue, and learn object detectors incrementally, when neither the original training data nor annotations for the original classes in the new training set are available."

New claims 21-23 are taught by Shmelkov as cited above in the prior art rejection. Please see the updated 103 rejection. 

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/M.H.H./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122