DETAILED ACTION
This action is in response to the claims filed 08/24/2022 for application 16/255,737. Claims 1, 3, 5, 6, 8, 10, 12, 13, 15, 17, 19, and 20 have been amended. Claims 1, 3-8, 10-15, and 17-23 are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 5, 7, 8, 12, 14, 15, 19, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Shmelkov et al. ("Incremental Learning of Object Detectors without Catastrophic Forgetting" cited by Applicant in the IDS filed 01/23/2019, hereinafter "Shmelkov") in view of Lin et al. ("Feature Pyramid Networks for Object Detection", hereinafter "Lin1") further in view of Li et al. ("Learning without Forgetting" cited by Applicant in the IDS filed 01/23/2019, hereinafter "Li") and further in view of Lee et al. ("Overcoming Catastrophic Forgetting by Incremental Moment Matching", hereinafter "Lee").

Regarding claim 1, Shmelkov teaches A method, the method comprising: 
identifying, via a first machine learning (ML) model for object detection or classification, a first set of object classes that the first ML model is trained to detect or classify (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA).” [pg. 3403, § 3.2 Dual-network Learning, ¶1; the original set of classes CA corresponds to a first set of object classes.]);
generating a second ML model to detect or classify the second set of object classes using a labeled set of data for the second set of object classes (“In the first experiment we take 19 classes in alphabetical order from the VOC dataset as CA, and the remaining one as the only new class CB. We then train the A(1-19) network on the VOC trainval subset containing any of the 19 classes, and the B(20) network is trained on the trainval subset containing the new class” [pg. 3404, § 4.3 Addition of one class, ¶3; see Table 1. Class 20 is labeled “tvmonitor,” which corresponds to a labeled set of data for a second set of object classes.]);
retaining detection or classification performance on the first set of object classes in the ML adapted model by performing a knowledge distillation process on the first ML model (“Specifically, we evaluate each new training sample on the frozen copy (Network A) to choose a diverse set of proposals (distillation proposals in Figure 2), and record their responses. With these responses in hand, we compute a distillation loss which measures the discrepancy between the two networks for the distillation proposals. This loss is added to the crossentropy loss on the new classes to make up the loss function for training the adapted detection network. As we show in the experimental evaluation, the distillation loss as well as the strategy to select the distillation proposals are critical in preserving the performance on the old classes (see §4)” [pg. 3402, § 3. Incremental learning of new classes, ¶2; Shmelkov discloses performing knowledge distillation using the frozen Network A, and Figure 2 shows Network B being adapted to retain the knowledge of Network A.]), wherein the knowledge distillation process comprises using a network to extract one or more features for each of a plurality of training samples for the first set of object classes in the first ML model (“In our variant of Fast R-CNN, we replaced the VGG-16 trunk with a deeper ResNet-50 component, which is faster and more accurate than VGG-16. We follow the suggestions in to combine Fast R-CNN and ResNet architectures. The network processes the whole image through a sequence of residual blocks. Before the last strided convolution layer we insert a RoI pooling layer, which performs maxpooling over regions of varied sizes, i.e., proposals, into a 7 × 7 feature map.
Then we add the remaining residual blocks, a layer for average pooling over spatial dimensions, and two fully connected layers: a softmax layer for classification (PASCAL or COCO classes, for example, along with the background class) and a regression layer for bounding box refinement, with independent corrections for each class.” [pg. 3402, § 3.1 Object detection network, ¶2; note: feature extraction is implicit because the RoI pooling layer in a Fast R-CNN extracts a feature vector from a feature map.]); and
using the adapted ML model to detect or classify one or more objects from the first set of object classes and one or more objects from the second set of object classes (“We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN. Our goal is to train B(CB) to recognize classes CA ∪ CB using only new data and annotations for CB.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Shmelkov discloses CA to be the original (i.e., old) classes and CB to be the set of new classes. Training B(CB) to recognize CA ∪ CB corresponds to using the network to detect or classify objects from both sets.]).
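The distillation step quoted above can be illustrated with a short sketch. It assumes, as a reading of Shmelkov's loss, an L2 penalty between the mean-subtracted old-class logits and the box-regression outputs of the frozen network A and the adapted network B, averaged over the N distillation proposals and the |CA| old classes. The function and argument names are hypothetical, and the selection of distillation proposals is omitted.

```python
from typing import List, Sequence


def mean_subtract(logits: Sequence[float]) -> List[float]:
    """Subtract the mean over classes so that only relative logit values are compared."""
    mu = sum(logits) / len(logits)
    return [z - mu for z in logits]


def distillation_loss(
    logits_a: Sequence[Sequence[float]],  # frozen network A: one row per proposal, |CA| entries
    logits_b: Sequence[Sequence[float]],  # adapted network B, restricted to the same |CA| classes
    boxes_a: Sequence[Sequence[float]],   # box-regression outputs of A for the old classes
    boxes_b: Sequence[Sequence[float]],   # box-regression outputs of B for the old classes
) -> float:
    """Assumed L2 distillation loss over the distillation proposals (hypothetical helper)."""
    n_proposals = len(logits_a)
    n_old_classes = len(logits_a[0])
    total = 0.0
    for za, zb, ta, tb in zip(logits_a, logits_b, boxes_a, boxes_b):
        ya, yb = mean_subtract(za), mean_subtract(zb)
        total += sum((a - b) ** 2 for a, b in zip(ya, yb))  # classification term
        total += sum((a - b) ** 2 for a, b in zip(ta, tb))  # bounding-box term
    return total / (n_proposals * n_old_classes)


# Logits that differ only by a constant shift incur no classification penalty.
print(distillation_loss([[1.0, 2.0, 3.0]], [[2.0, 3.0, 4.0]], [[0.0] * 4], [[0.0] * 4]))  # prints 0.0
```

Mean subtraction is what makes the comparison insensitive to a uniform shift of network B's logits, so B is penalized only when the relative scores of the old classes drift away from those of A.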
However, Shmelkov fails to explicitly teach using a feature pyramid network.
Lin1 teaches using a feature pyramid network (“Fast R-CNN is a region-based object detector in which Region-of-Interest (RoI) pooling is used to extract features. Fast R-CNN is most commonly performed on a single-scale feature map. To use it with our FPN, we need to assign RoIs of different scales to the pyramid levels… We view our feature pyramid as if it were produced from an image pyramid. Thus we can adapt the assignment strategy of region-based detectors in the case when they are run on image pyramids” [pg. 4, § 4.2 Feature Pyramid Networks for Fast R-CNN])
Shmelkov and Lin1 are both in the same field of endeavor of object detection. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the Fast R-CNN object detector disclosed by Shmelkov with the feature pyramid network as taught by Lin1. Lin1 (§ 4.2) expressly describes how to combine a feature pyramid network with Fast R-CNN, so the combination would have been feasible. One would have been motivated to make this modification in order to detect objects at different scales more efficiently and accurately. [pg. 2, § 1 Introduction, ¶7-8, Lin1]
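Lin1's strategy for assigning RoIs of different scales to pyramid levels (its Eq. (1), k = ⌊k0 + log2(√(wh)/224)⌋ with k0 = 4) can be sketched as follows. Clamping k to levels P2 through P5 is an assumption for a four-level pyramid, and the function name is hypothetical.

```python
import math


def fpn_level(w: float, h: float, k0: int = 4, canonical: float = 224.0,
              k_min: int = 2, k_max: int = 5) -> int:
    """Map an RoI of width w and height h to pyramid level P_k (hypothetical helper)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    # Assumed clamp: keep k within the levels the pyramid actually provides.
    return max(k_min, min(k_max, k))


# A 224 x 224 RoI maps to P4; each halving of its side maps it one level finer.
for side in (56, 112, 224, 448):
    print(side, fpn_level(side, side))
```

Small RoIs are thus pooled from finer, higher-resolution levels and large RoIs from coarser levels, which is the mechanism behind the multi-scale motivation cited from Lin1.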
However, Shmelkov/Lin1 fails to explicitly teach wherein (i) the second set of object classes includes at least one object class not present in the first set of object classes and (ii) the first set of object classes includes at least one object class not present in the second set of object classes.
Li teaches wherein (i) the second set of object classes includes at least one object class not present in the first set of object classes and (ii) the first set of object classes includes at least one object class not present in the second set of object classes (“Given a CNN with shared parameters θs and task-specific parameters θo (Fig. 2(a)), our goal is to add task-specific parameters θn for a new task and to learn parameters that work well on old and new tasks, using images and labels from only the new task (i.e., without using data from existing tasks). Our algorithm is outlined in Fig. 3, and the network structure illustrated in Fig. 2(e).” [pg. 4, § 3 Learning without Forgetting, ¶1; using images/labels from only the new task, without using data from existing tasks, implies that these images/labels were not present in the first set of object classes.]).
Shmelkov, Lin1, and Li are all in the same field of endeavor of object detection. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Shmelkov’s/Lin1’s teachings by using a second set of object classes that does not include an object class from the first set of object classes as taught by Li. One would have been motivated to make this modification in order to allow the model to recognize new classes with high accuracy while preserving the ability to recognize the original classes. [pg. 1, § 1 Introduction, ¶5, Li]
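Limitations (i) and (ii) together require that each set of object classes contain at least one class that the other lacks. A minimal sketch of that condition, applied to the 19 + 1 PASCAL VOC split quoted from Shmelkov, follows; the helper name is hypothetical, and the class list is the standard alphabetical VOC ordering, in which class 20 is “tvmonitor.”

```python
from typing import Iterable

VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def each_set_has_unique_class(first: Iterable[str], second: Iterable[str]) -> bool:
    """True if (i) second has a class absent from first and (ii) first has one absent from second."""
    first_set, second_set = set(first), set(second)
    return bool(second_set - first_set) and bool(first_set - second_set)


classes_a = VOC_CLASSES[:19]  # CA: the first 19 classes in alphabetical order
classes_b = VOC_CLASSES[19:]  # CB: the single new class, "tvmonitor"
print(each_set_has_unique_class(classes_a, classes_b))    # True
print(each_set_has_unique_class(classes_a, VOC_CLASSES))  # False: every class of CA also appears in the second set
```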
However, Shmelkov/Lin1/Li fails to explicitly teach generating an adapted ML model for use with the second set of object classes by (i) instantiating the adapted ML model separate from the first ML model and the second ML model and (ii) training the adapted ML model using a combination of the first ML model, the second ML model, and an unlabeled set of auxiliary data.
Lee teaches generating an adapted ML model for use with the second set of object classes (“In naïve mode-IMM, the second last-layer of the second network is used for the second last-layer of the final IMM model.” [pg. 15, Figure 5 caption; the final model corresponds to an adapted ML model]) by (i) instantiating the adapted ML model separate from the first ML model and the second ML model (“The IMM procedure produces a neural network without a performance loss for kth task µk, which is better than the final solution µ1:k in terms of the performance of the kth task” [pg. 9, Balancing the Information of an Old and a New task, ¶1; IMM produces a neural network by combining multiple models trained on individual tasks.]) and (ii) training the adapted ML model using a combination of the first ML model, the second ML model, and an unlabeled set of auxiliary data (“The dimension of the random variable in the posterior distribution is the number of the parameters in the neural networks. IMM approximates the mixture of Gaussian posterior with each component representing parameters for a single task to one Gaussian distribution for a combined task. To merge the posteriors, we introduce two novel methods of moment matching. One is mean-IMM, which simply averages the parameters of two networks for old and new tasks as the minimization of the average of KL-divergence between one approximated posterior distribution for the combined task and each Gaussian posterior for the single task. The other is mode-IMM, which merges the parameters of two networks using a Laplacian approximation to approximate a mode of the mixture of two Gaussian posteriors, which represent the parameters of the two networks” [pg. 1-2, Introduction, ¶3; Lee’s method combines models that were each trained on an individual task so that the final model can carry out multiple tasks.]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Shmelkov’s/Lin1’s/Li’s teachings by generating a final model by combining networks trained on individual tasks as taught by Lee. One would have been motivated to make this modification in order to overcome catastrophic forgetting in neural networks by balancing the information between new and old tasks. [Abstract, Lee]
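The two merging rules quoted from Lee can be sketched on flat parameter vectors. mean-IMM takes a weighted average of the per-task parameters; mode-IMM weights each parameter by its precision, which this sketch assumes is approximated by a diagonal Fisher information estimate. The mixing weights alphas, the small eps term, and the function names are assumptions for illustration.

```python
from typing import List, Sequence


def mean_imm(params: Sequence[Sequence[float]], alphas: Sequence[float]) -> List[float]:
    """mean-IMM: alpha-weighted average of each task's parameters."""
    return [sum(a * p[i] for a, p in zip(alphas, params)) for i in range(len(params[0]))]


def mode_imm(params: Sequence[Sequence[float]], fishers: Sequence[Sequence[float]],
             alphas: Sequence[float], eps: float = 1e-8) -> List[float]:
    """mode-IMM: precision-weighted average, using diagonal Fisher values as precisions."""
    merged = []
    for i in range(len(params[0])):
        precision = sum(a * f[i] for a, f in zip(alphas, fishers))
        weighted = sum(a * f[i] * p[i] for a, f, p in zip(alphas, fishers, params))
        merged.append(weighted / (precision + eps))  # eps guards against zero precision
    return merged


old_task, new_task = [0.0, 2.0], [4.0, 2.0]
print(mean_imm([old_task, new_task], [0.5, 0.5]))
# The first parameter moves toward the task whose Fisher value (precision) is larger.
print(mode_imm([old_task, new_task], [[1.0, 1.0], [3.0, 1.0]], [0.5, 0.5]))
```

In both rules the merged parameters form a new network built from the two task-specific networks, while neither source network is modified.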

Regarding claim 5, Shmelkov/Lin1/Li/Lee teaches The method of claim 1, where Shmelkov further teaches further comprising: 
in response to being unable to identify an object from the second set of object classes based on the first ML model, receiving a label of the object (“The second CNN (bottom) is an incrementally trained version of the first one for the category horse. In other words, the original network is adapted with images from only this new class. This adapted network localizes the horse in the image, but fails to detect the rider, which it was capable of originally, and despite the fact that the person class was not updated. In this paper, we present a method to alleviate this issue” [pg. 3400, § 1 Introduction, ¶2; Shmelkov discloses, in § 3.2 Dual-network learning, a method to solve the problem of a second network being unable to detect the original class. It is implicit that a label would be received for an object in order to perform this training method.]), 
wherein generating the adapted ML model for use with the second set of object classes comprises generating the adapted ML model for use with the second set of object classes, the labeled object being one of the object classes in the second set (“Each experiment begins with choosing a subset of classes to form the set CA. Then, a network is learned only on the subset of the training set composed of all the images containing at least one object from CA. Annotations for other classes in these images are ignored. With the new classes chosen to form the set CB, we learn the extended network as described in Section 3.2 with the subset of the training set containing at least one object from CB. As in the previous case, annotations of all the other classes, including those of the original classes CA, are ignored.” [pg. 3404, § 4.2 Implementation details, ¶1; note: Tables 6-9 on pg. 3407 disclose a labeled object in the second set.]).

Regarding claim 7, Shmelkov/Lin1/Li/Lee teaches The method of claim 1, where Shmelkov further teaches further comprising using the adapted ML model to perform object classification (“The extension is done only in the last fully connected layers, i.e., classification and bounding box regression.” [pg. 3403, § 3.2 Dual-network learning, ¶1]).

Regarding claim 8, Shmelkov teaches An electronic device comprising: 
a memory configured to store a first machine learning (ML) model for object detection or classification; and 
a processor operably connected to the memory (“These paradigms, and in particular fine-tuning, a special case of transfer learning, are very popular in computer vision” [pg. 3401, § 2. Related Work, ¶2; computer vision implies the use of processors and memory. See § Acknowledgments, where Shmelkov discloses the use of GPUs.]), the processor configured to:
identify, via the first (ML) model, a first set of object classes that the first ML model is trained to detect or classify (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA).” [pg. 3403, § 3.2 Dual-network Learning, ¶1; the original set of classes CA corresponds to a first set of object classes.]); 
generate a second ML model to detect or classify the second set of object classes using a labeled set of data for the second set of object classes (“In the first experiment we take 19 classes in alphabetical order from the VOC dataset as CA, and the remaining one as the only new class CB. We then train the A(1-19) network on the VOC trainval subset containing any of the 19 classes, and the B(20) network is trained on the trainval subset containing the new class” [pg. 3404, § 4.3 Addition of one class, ¶3; see Table 1. Class 20 is labeled “tvmonitor,” which corresponds to a labeled set of data for a second set of object classes.]);
retain detection or classification performance on the first set of object classes in the ML adapted model by performing a knowledge distillation process on the first ML model (“Specifically, we evaluate each new training sample on the frozen copy (Network A) to choose a diverse set of proposals (distillation proposals in Figure 2), and record their responses. With these responses in hand, we compute a distillation loss which measures the discrepancy between the two networks for the distillation proposals. This loss is added to the crossentropy loss on the new classes to make up the loss function for training the adapted detection network. As we show in the experimental evaluation, the distillation loss as well as the strategy to select the distillation proposals are critical in preserving the performance on the old classes (see §4)” [pg. 3402, § 3. Incremental learning of new classes, ¶2; Shmelkov discloses performing knowledge distillation using the frozen Network A, and Figure 2 shows Network B being adapted to retain the knowledge of Network A.]), wherein the knowledge distillation process comprises using a network to extract one or more features for each of a plurality of training samples for the first set of object classes in the first ML model (“In our variant of Fast R-CNN, we replaced the VGG-16 trunk with a deeper ResNet-50 component, which is faster and more accurate than VGG-16. We follow the suggestions in to combine Fast R-CNN and ResNet architectures. The network processes the whole image through a sequence of residual blocks. Before the last strided convolution layer we insert a RoI pooling layer, which performs maxpooling over regions of varied sizes, i.e., proposals, into a 7 × 7 feature map.
Then we add the remaining residual blocks, a layer for average pooling over spatial dimensions, and two fully connected layers: a softmax layer for classification (PASCAL or COCO classes, for example, along with the background class) and a regression layer for bounding box refinement, with independent corrections for each class.” [pg. 3402, § 3.1 Object detection network, ¶2; note: feature extraction is implicit because the RoI pooling layer in a Fast R-CNN extracts a feature vector from a feature map.]); and
use the adapted ML model to detect or classify one or more objects from the first set of object classes and one or more objects from the second set of object classes (“We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN. Our goal is to train B(CB) to recognize classes CA ∪ CB using only new data and annotations for CB.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Shmelkov discloses CA to be the original (i.e., old) classes and CB to be the set of new classes. Training B(CB) to recognize CA ∪ CB corresponds to using the network to detect or classify objects from both sets.]).
However, Shmelkov fails to explicitly teach using a feature pyramid network.
Lin1 teaches using a feature pyramid network (“Fast R-CNN is a region-based object detector in which Region-of-Interest (RoI) pooling is used to extract features. Fast R-CNN is most commonly performed on a single-scale feature map. To use it with our FPN, we need to assign RoIs of different scales to the pyramid levels… We view our feature pyramid as if it were produced from an image pyramid. Thus we can adapt the assignment strategy of region-based detectors in the case when they are run on image pyramids” [pg. 4, § 4.2 Feature Pyramid Networks for Fast R-CNN])
Shmelkov and Lin1 are both in the same field of endeavor of object detection. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the Fast R-CNN object detector disclosed by Shmelkov with the feature pyramid network as taught by Lin1. Lin1 (§ 4.2) expressly describes how to combine a feature pyramid network with Fast R-CNN, so the combination would have been feasible. One would have been motivated to make this modification in order to detect objects at different scales more efficiently and accurately. [pg. 2, § 1 Introduction, ¶7-8, Lin1]
However, Shmelkov/Lin1 fails to explicitly teach wherein (i) the second set of object classes includes at least one object class not present in the first set of object classes and (ii) the first set of object classes includes at least one object class not present in the second set of object classes.
Li teaches wherein (i) the second set of object classes includes at least one object class not present in the first set of object classes and (ii) the first set of object classes includes at least one object class not present in the second set of object classes (“Given a CNN with shared parameters θs and task-specific parameters θo (Fig. 2(a)), our goal is to add task-specific parameters θn for a new task and to learn parameters that work well on old and new tasks, using images and labels from only the new task (i.e., without using data from existing tasks). Our algorithm is outlined in Fig. 3, and the network structure illustrated in Fig. 2(e).” [pg. 4, § 3 Learning without Forgetting, ¶1; using images/labels from only the new task, without using data from existing tasks, implies that these images/labels were not present in the first set of object classes.]).
Shmelkov, Lin1, and Li are all in the same field of endeavor of object detection. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Shmelkov’s/Lin1’s teachings by using a second set of object classes that does not include an object class from the first set of object classes as taught by Li. One would have been motivated to make this modification in order to allow the model to recognize new classes with high accuracy while preserving the ability to recognize the original classes. [pg. 1, § 1 Introduction, ¶5, Li]
However, Shmelkov/Lin1/Li fails to explicitly teach generate an adapted ML model for use with the second set of object classes by (i) instantiating the adapted ML model separate from the first ML model and the second ML model and (ii) training the adapted ML model using a combination of the first ML model, the second ML model, and an unlabeled set of auxiliary data.
Lee teaches generate an adapted ML model for use with the second set of object classes (“In naïve mode-IMM, the second last-layer of the second network is used for the second last-layer of the final IMM model.” [pg. 15, Figure 5 caption; the final model corresponds to an adapted ML model]) by (i) instantiating the adapted ML model separate from the first ML model and the second ML model (“The IMM procedure produces a neural network without a performance loss for kth task µk, which is better than the final solution µ1:k in terms of the performance of the kth task” [pg. 9, Balancing the Information of an Old and a New task, ¶1; IMM produces a neural network by combining multiple models trained on individual tasks.]) and (ii) training the adapted ML model using a combination of the first ML model, the second ML model, and an unlabeled set of auxiliary data (“The dimension of the random variable in the posterior distribution is the number of the parameters in the neural networks. IMM approximates the mixture of Gaussian posterior with each component representing parameters for a single task to one Gaussian distribution for a combined task. To merge the posteriors, we introduce two novel methods of moment matching. One is mean-IMM, which simply averages the parameters of two networks for old and new tasks as the minimization of the average of KL-divergence between one approximated posterior distribution for the combined task and each Gaussian posterior for the single task. The other is mode-IMM, which merges the parameters of two networks using a Laplacian approximation to approximate a mode of the mixture of two Gaussian posteriors, which represent the parameters of the two networks” [pg. 1-2, Introduction, ¶3; Lee’s method combines models that were each trained on an individual task so that the final model can carry out multiple tasks.]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Shmelkov’s/Lin1’s/Li’s teachings by generating a final model by combining networks trained on individual tasks as taught by Lee. One would have been motivated to make this modification in order to overcome catastrophic forgetting in neural networks by balancing the information between new and old tasks. [Abstract, Lee]

Regarding claim 12, Shmelkov/Lin1/Li/Lee discloses The electronic device of claim 8, where Shmelkov further teaches wherein the processor is further configured to: 
in response to being unable to identify an object from the second set of object classes based on the first ML model, receiving a label of the object (“The second CNN (bottom) is an incrementally trained version of the first one for the category horse. In other words, the original network is adapted with images from only this new class. This adapted network localizes the horse in the image, but fails to detect the rider, which it was capable of originally, and despite the fact that the person class was not updated. In this paper, we present a method to alleviate this issue” [pg. 3400, § 1 Introduction, ¶2; Shmelkov discloses, in § 3.2 Dual-network learning, a method to solve the problem of a second network being unable to detect the original class. It is implicit that a label would be received for an object in order to perform this training method.]), 
wherein, to generate the adapted ML model for use with the second set of object classes, the processor is further configured to generate the adapted ML model for use with the second set of object classes, the labeled object being one of the object classes in the second set (“Each experiment begins with choosing a subset of classes to form the set CA. Then, a network is learned only on the subset of the training set composed of all the images containing at least one object from CA. Annotations for other classes in these images are ignored. With the new classes chosen to form the set CB, we learn the extended network as described in Section 3.2 with the subset of the training set containing at least one object from CB. As in the previous case, annotations of all the other classes, including those of the original classes CA, are ignored.” [pg. 3404, § 4.2 Implementation details, ¶1; note: Tables 6-9 on pg. 3407 disclose a labeled object in the second set.]).

Regarding claim 14, Shmelkov/Lin1/Li/Lee teaches The electronic device of claim 8, where Shmelkov further teaches wherein the processor is further configured to use the adapted ML model to perform object classification (“The extension is done only in the last fully connected layers, i.e., classification and bounding box regression.” [pg. 3403, § 3.2 Dual-network learning, ¶1]).

Regarding claim 15, Shmelkov teaches A non-transitory, computer-readable medium comprising program code that, when executed by a processor of an electronic device, (“These paradigms, and in particular fine-tuning, a special case of transfer learning, are very popular in computer vision” [pg. 3401, § 2. Related Work, ¶2; computer vision implies the use of processors and memory. See § Acknowledgments, where Shmelkov discloses the use of GPUs.]), causes the electronic device to: 
identify, via the first machine learning (ML) model, a first set of object classes that the first ML model is trained to detect or classify (“First, we train a Fast R-CNN to detect the original set of classes CA. We refer to this network as A(CA).” [pg. 3403, § 3.2 Dual-network Learning, ¶1; the original set of classes CA corresponds to a first set of object classes.]); 
generate a second ML model to detect or classify the second set of object classes using a labeled set of data for the second set of object classes (“In the first experiment we take 19 classes in alphabetical order from the VOC dataset as CA, and the remaining one as the only new class CB. We then train the A(1-19) network on the VOC trainval subset containing any of the 19 classes, and the B(20) network is trained on the trainval subset containing the new class” [pg. 3404, § 4.3 Addition of one class, ¶3; see Table 1. Class 20 is labeled “tvmonitor,” which corresponds to a labeled set of data for a second set of object classes.]);
retain detection or classification performance on the first set of object classes in the ML adapted model by performing a knowledge distillation process on the first ML model (“Specifically, we evaluate each new training sample on the frozen copy (Network A) to choose a diverse set of proposals (distillation proposals in Figure 2), and record their responses. With these responses in hand, we compute a distillation loss which measures the discrepancy between the two networks for the distillation proposals. This loss is added to the crossentropy loss on the new classes to make up the loss function for training the adapted detection network. As we show in the experimental evaluation, the distillation loss as well as the strategy to select the distillation proposals are critical in preserving the performance on the old classes (see §4)” [pg. 3402, § 3. Incremental learning of new classes, ¶2; Shmelkov discloses performing knowledge distillation using the frozen Network A, and Figure 2 shows Network B being adapted to retain the knowledge of Network A.]), wherein the knowledge distillation process comprises using a network to extract one or more features for each of a plurality of training samples for the first set of object classes in the first ML model (“In our variant of Fast R-CNN, we replaced the VGG-16 trunk with a deeper ResNet-50 component, which is faster and more accurate than VGG-16. We follow the suggestions in to combine Fast R-CNN and ResNet architectures. The network processes the whole image through a sequence of residual blocks. Before the last strided convolution layer we insert a RoI pooling layer, which performs maxpooling over regions of varied sizes, i.e., proposals, into a 7 × 7 feature map.
Then we add the remaining residual blocks, a layer for average pooling over spatial dimensions, and two fully connected layers: a softmax layer for classification (PASCAL or COCO classes, for example, along with the background class) and a regression layer for bounding box refinement, with independent corrections for each class.” [pg. 3402, § 3.1 Object detection network, ¶2; note: feature extraction is implicit because the RoI pooling layer in a Fast R-CNN extracts a feature vector from a feature map.]); and
use the adapted ML model to detect or classify one or more objects from the first set of object classes and one or more objects from the second set of object classes (“We create sibling (i.e., fully-connected) layers for new classes only and concatenate their outputs with the original ones. The new layers are initialized randomly in the same way as the corresponding layers in Fast R-CNN. Our goal is to train B(CB) to recognize classes CA ∪ CB using only new data and annotations for CB.” [pg. 3403, § 3.2 Dual-network learning, ¶1; Shmelkov defines CA as the original (i.e., old) classes and CB as the set of new classes. Training B(CB) to recognize these classes corresponds to using the network to detect or classify objects.]).
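For illustration only, the distillation loss described in the Shmelkov passage quoted above can be sketched in Python. The sketch assumes the discrepancy between Network A and Network B is measured as a squared error between mean-subtracted classification scores over the old classes, plus a squared error between their box-regression outputs. The names `distillation_loss` and `total_loss` and the weight `lam` are hypothetical, and the exact formulation should be confirmed against § 3 of Shmelkov.

```python
from statistics import fmean


def _centered(row):
    # Subtract the per-proposal mean so only relative class scores are compared
    # (an assumption of this sketch).
    m = fmean(row)
    return [x - m for x in row]


def distillation_loss(scores_a, scores_b, boxes_a, boxes_b):
    """Discrepancy between frozen Network A and adapted Network B.

    scores_*: one list of old-class scores per distillation proposal
    boxes_*:  one list of box-regression outputs per distillation proposal
    Only outputs for the original classes C_A are compared.
    """
    cls_sq = [
        (x - y) ** 2
        for ra, rb in zip(scores_a, scores_b)
        for x, y in zip(_centered(ra), _centered(rb))
    ]
    box_sq = [
        (x - y) ** 2
        for ra, rb in zip(boxes_a, boxes_b)
        for x, y in zip(ra, rb)
    ]
    return fmean(cls_sq) + fmean(box_sq)


def total_loss(cross_entropy_new, distill, lam=1.0):
    # Cross-entropy on the new classes C_B plus the weighted distillation term.
    return cross_entropy_new + lam * distill
```

Because the scores are mean-subtracted, adding the same constant to every old-class score of Network B leaves the loss unchanged.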
However, Shmelkov fails to explicitly teach using a feature pyramid network.
Lin1 teaches using a feature pyramid network (“Fast R-CNN is a region-based object detector in which Region-of-Interest (RoI) pooling is used to extract features. Fast R-CNN is most commonly performed on a single-scale feature map. To use it with our FPN, we need to assign RoIs of different scales to the pyramid levels… We view our feature pyramid as if it were produced from an image pyramid. Thus we can adapt the assignment strategy of region-based detectors in the case when they are run on image pyramids” [pg. 4, § 4.2 Feature Pyramid Networks for Fast R-CNN]).
Shmelkov and Lin1 are both in the same field of endeavor of object detection. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the Fast R-CNN object detector disclosed by Shmelkov with the feature pyramid network taught by Lin1. Lin1 expressly discloses implementing a feature pyramid network with Fast R-CNN (§ 4.2), so the combination is feasible. Thus, one would have been motivated to make this modification in order to detect objects at different scales more efficiently and accurately. [pg. 2, § 1 Introduction, ¶7-8, Lin1]
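The RoI assignment strategy referred to in the Lin1 passage (§ 4.2) maps an RoI of width w and height h to pyramid level P_k with k = ⌊k0 + log2(√(wh)/224)⌋, where Lin1 sets k0 = 4. A minimal Python sketch follows; the function name `fpn_level` is hypothetical, and clamping k to levels 2 through 5 is an assumption of this sketch.

```python
import math


def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    # Larger RoIs go to coarser levels, smaller RoIs to finer levels.
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    # Clamp to the pyramid levels assumed to be available (P2..P5).
    return max(k_min, min(k_max, k))
```

For example, a 224 × 224 RoI is pooled from P4 and a 112 × 112 RoI from P3, which is how a single Fast R-CNN head can operate on RoIs of different scales.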
	However, Shmelkov/Lin1 fails to explicitly teach wherein (i) the second set of object classes includes at least one object class not present in the first set of object classes and (ii) the first set of object classes includes at least one object class not present in the second set of object classes.
	Li teaches wherein (i) the second set of object classes includes at least one object class not present in the first set of object classes and (ii) the first set of object classes includes at least one object class not present in the second set of object classes (“Given a CNN with shared parameters θs and task-specific parameters θo (Fig. 2(a)), our goal is to add task-specific parameters θn for a new task and to learn parameters that work well on old and new tasks, using images and labels from only the new task (i.e., without using data from existing tasks). Our algorithm is outlined in Fig. 3, and the network structure illustrated in Fig. 2(e).” [pg. 4, § 3. Learning without Forgetting, ¶1; using images and labels from only the new task, without using data from existing tasks, implies that these images and labels were not present in the first set of object classes.]).
Shmelkov, Lin1, and Li are all in the same field of endeavor of object detection. Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Shmelkov/Lin1 by using a second set of object classes that does not include an object class from the first set of object classes, as taught by Li. One would have been motivated to make this modification in order to allow the model to recognize new classes with high accuracy while preserving the ability to recognize the original classes. [pg. 1, § 1 Introduction, ¶5, Li]
However, Shmelkov/Lin1/Li fails to explicitly teach generating an adapted ML model for use with the second set of object classes by (i) instantiating the adapted ML model separate from the first ML model and the second ML model and (ii) training the adapted ML model using a combination of the first ML model, the second ML model, and an unlabeled set of auxiliary data.
Lee teaches generating an adapted ML model for use with the second set of object classes (“In naïve mode-IMM, the second last-layer of the second network is used for the second last-layer of the final IMM model.” [pg. 15, Figure 5 caption; the final IMM model corresponds to the adapted ML model]) by (i) instantiating the adapted ML model separate from the first ML model and the second ML model (“The IMM procedure produces a neural network without a performance loss for kth task µk, which is better than the final solution µ1:k in terms of the performance of the kth task” [pg. 9, Balancing the Information of an Old and a New task, ¶1; IMM produces a new neural network by combining multiple models, each trained on an individual task.]) and (ii) training the adapted ML model using a combination of the first ML model, the second ML model, and an unlabeled set of auxiliary data (“The dimension of the random variable in the posterior distribution is the number of the parameters in the neural networks. IMM approximates the mixture of Gaussian posterior with each component representing parameters for a single task to one Gaussian distribution for a combined task. To merge the posteriors, we introduce two novel methods of moment matching. One is mean-IMM, which simply averages the parameters of two networks for old and new tasks as the minimization of the average of KL-divergence between one approximated posterior distribution for the combined task and each Gaussian posterior for the single task. The other is mode-IMM, which merges the parameters of two networks using a Laplacian approximation to approximate a mode of the mixture of two Gaussian posteriors, which represent the parameters of the two networks” [pg. 1-2, Introduction, ¶3; Lee’s method combines models that were each trained on an individual task so that the final model can carry out multiple tasks.]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Shmelkov’s/Lin1’s/Li’s teachings by generating a final model by combining networks trained on individual tasks as taught by Lee. One would have been motivated to make this modification in order to overcome catastrophic forgetting in neural networks by balancing the information between new and old tasks. [Abstract, Lee]
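The two merging rules described in the Lee passage can be sketched as follows, treating each task network's parameters as a flat list. Mean-IMM is a weighted average of the parameters; for mode-IMM, this sketch assumes a diagonal approximation in which each parameter is weighted by a per-parameter Fisher information value. The mixing ratios `alphas`, the `eps` stabilizer, and the function names are assumptions for illustration, not Lee's code.

```python
def mean_imm(params, alphas):
    # params: one flat parameter list per task network (equal lengths)
    # alphas: mixing ratio for each task network, summing to 1
    n = len(params[0])
    return [sum(a * p[i] for a, p in zip(alphas, params)) for i in range(n)]


def mode_imm(params, fishers, alphas, eps=1e-8):
    # fishers: per-parameter Fisher information for each task network, used
    # here as a diagonal approximation of the inverse posterior covariance.
    merged = []
    for i in range(len(params[0])):
        num = sum(a * f[i] * p[i] for a, f, p in zip(alphas, fishers, params))
        den = sum(a * f[i] for a, f in zip(alphas, fishers))
        merged.append(num / (den + eps))  # eps guards against zero Fisher values
    return merged
```

With equal Fisher values, mode-IMM reduces (up to `eps`) to mean-IMM; a larger Fisher value pulls the merged parameter toward the network for which that parameter matters more.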

Regarding claim 19, Shmelkov/Lin1/Li/Lee teaches The non-transitory, computer-readable medium of Claim 15, where Shmelkov further teaches the medium further comprising program code that, when executed by the processor, causes the electronic device to: 
in response to being unable to identify an object from the second set of object classes based on the first ML model, receive a label of the object (“The second CNN (bottom) is an incrementally trained version of the first one for the category horse. In other words, the original network is adapted with images from only this new class. This adapted network localizes the horse in the image, but fails to detect the rider, which it was capable of originally, and despite the fact that the person class was not updated. In this paper, we present a method to alleviate this issue” [pg. 3400, § 1 Introduction, ¶2; in § 3.2 Dual-network learning, Shmelkov discloses a method that solves the problem of a second network being unable to detect an original class. It is implicit that a label would be received for an object in order to perform this training method.]), 
wherein the program code that, when executed, causes the electronic device to generate the adapted ML model for use with the second set of object classes comprises program code that, when executed by the processor, causes the electronic device to generate the adapted ML model for use with the second set of object classes, the labeled object being one of the object classes in the second set (“Each experiment begins with choosing a subset of classes to form the set CA. Then, a network is learned only on the subset of the training set composed of all the images containing at least one object from CA. Annotations for other classes in these images are ignored. With the new classes chosen to form the set CB, we learn the extended network as described in Section 3.2 with the subset of the training set containing at least one object from CB. As in the previous case, annotations of all the other classes, including those of the original classes CA, are ignored.” [pg. 3404, § 4.2 Implementation details, ¶1; note: Tables 6-9 on pg. 3407 disclose a labeled object in the second set.]).

Regarding claim 21, Shmelkov/Lin1/Li/Lee teaches The method of Claim 1, where Shmelkov teaches wherein the first and second sets of object classes comprise image classes and the training samples comprise training images (“We evaluate our method on the PASCAL VOC 2007 detection benchmark and the Microsoft COCO challenge dataset. VOC 2007 consists of 5K images in the trainval split and 5K images in the test split for 20 object classes. COCO on the other hand has 80K images in the training set and 40K images in the validation set for 80 object classes (which includes all the classes from VOC).” [pg. 5, § 4.1 Datasets and Evaluation, ¶1]).

Regarding claim 22, Shmelkov/Lin1/Li/Lee teaches The electronic device of Claim 8, where Shmelkov teaches wherein the first and second sets of object classes comprise image classes and the training samples comprise training images (“We evaluate our method on the PASCAL VOC 2007 detection benchmark and the Microsoft COCO challenge dataset. VOC 2007 consists of 5K images in the trainval split and 5K images in the test split for 20 object classes. COCO on the other hand has 80K images in the training set and 40K images in the validation set for 80 object classes (which includes all the classes from VOC).” [pg. 5, § 4.1 Datasets and Evaluation, ¶1]).

Regarding claim 23, Shmelkov/Lin1/Li/Lee teaches The non-transitory, computer-readable medium of Claim 15, where Shmelkov teaches wherein the first and second sets of object classes comprise image classes and the training samples comprise training images (“We evaluate our method on the PASCAL VOC 2007 detection benchmark and the Microsoft COCO challenge dataset. VOC 2007 consists of 5K images in the trainval split and 5K images in the test split for 20 object classes. COCO on the other hand has 80K images in the training set and 40K images in the validation set for 80 object classes (which includes all the classes from VOC).” [pg. 5, § 4.1 Datasets and Evaluation, ¶1]).

Claims 3, 4, 10, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Shmelkov in view of Lin1, Li, and Lee and further in view of Coates et al. ("Emergence of Object-Selective Features in Unsupervised Feature Learning", hereinafter "Coates").

Regarding claim 3, Shmelkov/Lin1/Li/Lee teaches The method of Claim 1, where Shmelkov further teaches wherein training the adapted ML model using the combination of the first ML model and the second ML model and the unlabeled set of auxiliary data comprises: 
performing object detection or classification on the unlabeled set of auxiliary data using the second ML model to generate a second set of model outputs (“the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images.” [pg. 3403, § 3.2 Dual-network learning, ¶1]); and
where Lee teaches combining the first ML model and the second ML model based on a loss function using the first and second sets of model outputs (“In general, it is too naïve to assume that the final posterior distribution for the whole task is Gaussian. To make our IMM work, the search space of the loss function between the posterior means needs to be smooth and convex-like. In other words, there should not be high cost barriers between the means of the two networks for an old and a new task.” [pg. 2, ¶2; See Figure 1]).
Shmelkov/Lin1/Li/Lee fails to explicitly teach performing object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs.
Coates teaches performing object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs (“As described above, we ran our algorithm on patches harvested from YouTube thumbnails downloaded from the web. Specifically, we downloaded the thumbnails for over 1.4 million YouTube videos, some of which are shown in Figure 2b. These images were downsampled to 128-by-96 pixels and converted to grayscale. We cropped 57 million randomly selected 32-by-32 pixel patches from these images to form our unlabeled training set. No supervision was used—thus most patches contain partial views of objects or clutter at differing scales. We ran our algorithm on these images using a cluster of 30 machines over 3 days—virtually all of the time spent training the 150,000 second-layer features. We will now visualize these features and check whether any of them have learned to identify an object class.” [pg. 5, § 3 Experiments, ¶1]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. Coates discloses feature learning from unlabeled image data. Although Shmelkov discloses a first model that performs object detection, Shmelkov does not teach using an unlabeled set of auxiliary data. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the teachings of Shmelkov/Lin1/Li/Lee, in particular the first model disclosed by Shmelkov, so that the model is trained using the unlabeled auxiliary data taught by Coates. One would have been motivated to use unlabeled auxiliary data in order to train the model to detect more complex patterns. [pg. 1, § 1 Introduction, Coates]
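To illustrate the claim 3 language only, and not the method of any cited reference, one hypothetical way to combine the first and second ML models through a loss on their outputs for unlabeled auxiliary data is to train the adapted model to match the first model's scores on the old classes and the second model's scores on the new classes. The squared-error form, the score layout, and the name `combined_output_loss` are assumptions of this sketch.

```python
def combined_output_loss(adapted_out, first_out, second_out, num_old):
    """Hypothetical loss over unlabeled auxiliary samples.

    adapted_out: per-sample scores of the adapted model, laid out as
                 [old-class scores..., new-class scores...]
    first_out:   per-sample old-class scores from the first ML model
    second_out:  per-sample new-class scores from the second ML model
    """
    total, count = 0.0, 0
    for s, f, g in zip(adapted_out, first_out, second_out):
        # Old classes are supervised by the first model's outputs.
        for x, y in zip(s[:num_old], f):
            total += (x - y) ** 2
            count += 1
        # New classes are supervised by the second model's outputs.
        for x, y in zip(s[num_old:], g):
            total += (x - y) ** 2
            count += 1
    return total / count
```

No ground-truth labels appear in this loss: both targets come from model outputs on the auxiliary data, which is what allows that data to remain unlabeled.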

Regarding claim 4, Shmelkov/Lin1/Li/Lee teaches The method of claim 1, where Shmelkov further teaches wherein retaining the detection or classification performance on the first set of object classes in the adapted ML model comprises: 
retaining the detection or classification performance on the first set of object classes (“Using only the training samples for the new classes, we propose a method for not only adapting the old network to the new classes, but also ensuring performance on the old classes does not degrade.” [pg. 3401, top left col, ¶1; ensuring performance does not degrade would be equivalent to retaining.]).
Shmelkov/Lin1/Li/Lee fails to explicitly teach:
generating, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; 
for each of the N clusters, selecting a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and 
Coates teaches generating, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features (“To construct the groups G, we will use a version of single-link agglomerative clustering to combine sets of features that have low dissimilarity according to d(k, l). To construct a single group G0 we begin by choosing a random simple cell filter, say D(k) , as the first member. We then search for candidate cells to be added to the group by computing d(k, l) for each simple cell filter D(l) and add D(l) to the group if d(k, l) is less than some limit τ” [pg. 3, § 2.2 Learning Invariant Features, ¶4; the groups G are equivalent to the N clusters, and low dissimilarity is equivalent to samples belonging to the same class.]); 
for each of the N clusters, selecting a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid (“The algorithm then continues to expand G0 by adding any additional simple cells that are closer than τ to any one of the simple cells already in the group. This procedure continues until there are no more cells to be added, or until the diameter of the group (the dissimilarity between the two furthest cells in the group) reaches a limit ∆.” See further: “We use τ = 0.3 for the first layer of complex cells and τ = 1.0 for the second layer. These were chosen by examining the typical distance between a filter D(k) and its nearest neighbor. We use ∆ = 1.5 > √2 so that a complex cell group may include orthogonal filters but cannot grow without limit.” [pg. 3, § 2.2 Learning Invariant Features, ¶4]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. Coates discloses feature learning from unlabeled image data. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov, the feature pyramid network disclosed by Lin1, and the learning-without-forgetting methods of Li and Lee to include generating clusters and selecting a sample from the clusters as taught by Coates. One would have been motivated to use a K-means algorithm to cluster training samples in order to find features that are sensitive to commonly occurring object classes. [Abstract, Coates]
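The clustering limitation of claim 4 (mirrored in claims 11 and 18) can be sketched as follows: the extracted features of one class are grouped into N clusters, and the sample nearest to each cluster centroid is selected. The sketch uses plain k-means (Lloyd's algorithm), the technique named in the motivation statement, rather than the single-link agglomerative grouping quoted from Coates. The seeded initialization, the iteration count, the handling of empty clusters, and the name `select_exemplars` are assumptions.

```python
import random


def _dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def select_exemplars(features, n_clusters, iters=20, seed=0):
    """Return, for each of N k-means clusters, the index of the nearest sample."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, n_clusters)]
    for _ in range(iters):
        groups = [[] for _ in range(n_clusters)]
        for f in features:
            j = min(range(n_clusters), key=lambda c: _dist2(f, centroids[c]))
            groups[j].append(f)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [_mean(g) if g else centroids[c] for c, g in enumerate(groups)]
    # Nearest-neighbor of each centroid among the original samples.
    return [
        min(range(len(features)), key=lambda i: _dist2(features[i], c))
        for c in centroids
    ]
```

Selecting an actual sample nearest each centroid, rather than the centroid itself, yields real training examples that can be reused to retain performance on the first set of object classes.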

Regarding claim 10, Shmelkov/Lin1/Li/Lee teaches The electronic device of Claim 9, where Shmelkov further teaches wherein to train the adapted ML model using the combination of the first ML model and the second ML model, and the unlabeled set of auxiliary data, the processor is further configured to: 
perform object detection or classification on the unlabeled set of auxiliary data using the second ML model to generate a second set of model outputs (“the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images.” [pg. 3403, § 3.2 Dual-network learning, ¶1]); and
where Lee teaches combining the first ML model and the second ML model based on a loss function using the first and second sets of model outputs (“In general, it is too naïve to assume that the final posterior distribution for the whole task is Gaussian. To make our IMM work, the search space of the loss function between the posterior means needs to be smooth and convex-like. In other words, there should not be high cost barriers between the means of the two networks for an old and a new task.” [pg. 2, ¶2; See Figure 1]).
Shmelkov/Lin1/Li/Lee fails to explicitly teach performing object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs.
Coates teaches performing object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs (“As described above, we ran our algorithm on patches harvested from YouTube thumbnails downloaded from the web. Specifically, we downloaded the thumbnails for over 1.4 million YouTube videos, some of which are shown in Figure 2b. These images were downsampled to 128-by-96 pixels and converted to grayscale. We cropped 57 million randomly selected 32-by-32 pixel patches from these images to form our unlabeled training set. No supervision was used—thus most patches contain partial views of objects or clutter at differing scales. We ran our algorithm on these images using a cluster of 30 machines over 3 days—virtually all of the time spent training the 150,000 second-layer features. We will now visualize these features and check whether any of them have learned to identify an object class.” [pg. 5, § 3 Experiments, ¶1]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. Coates discloses feature learning from unlabeled image data. Although Shmelkov discloses a first model that performs object detection, Shmelkov does not teach using an unlabeled set of auxiliary data. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the teachings of Shmelkov/Lin1/Li/Lee, in particular the first model disclosed by Shmelkov, so that the model is trained using the unlabeled auxiliary data taught by Coates. One would have been motivated to use unlabeled auxiliary data in order to train the model to detect more complex patterns. [pg. 1, § 1 Introduction, Coates]

Regarding claim 11, Shmelkov/Lin1/Li/Lee teaches The electronic device of claim 8, where Shmelkov further teaches wherein to retain the detection or classification performance on the first set of object classes in the adapted ML model, the processor is further configured to: 
retain the detection or classification performance on the first set of object classes (“Using only the training samples for the new classes, we propose a method for not only adapting the old network to the new classes, but also ensuring performance on the old classes does not degrade.” [pg. 3401, top left col, ¶1; ensuring performance does not degrade would be equivalent to retaining.]).
Shmelkov/Lin1/Li/Lee fails to explicitly teach:
generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and 
Coates teaches generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features (“To construct the groups G, we will use a version of single-link agglomerative clustering to combine sets of features that have low dissimilarity according to d(k, l). To construct a single group G0 we begin by choosing a random simple cell filter, say D(k) , as the first member. We then search for candidate cells to be added to the group by computing d(k, l) for each simple cell filter D(l) and add D(l) to the group if d(k, l) is less than some limit τ” [pg. 3, § 2.2 Learning Invariant Features, ¶4; the groups G are equivalent to the N clusters, and low dissimilarity is equivalent to samples belonging to the same class.]); 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid (“The algorithm then continues to expand G0 by adding any additional simple cells that are closer than τ to any one of the simple cells already in the group. This procedure continues until there are no more cells to be added, or until the diameter of the group (the dissimilarity between the two furthest cells in the group) reaches a limit ∆.” See further: “We use τ = 0.3 for the first layer of complex cells and τ = 1.0 for the second layer. These were chosen by examining the typical distance between a filter D(k) and its nearest neighbor. We use ∆ = 1.5 > √2 so that a complex cell group may include orthogonal filters but cannot grow without limit.” [pg. 3, § 2.2 Learning Invariant Features, ¶4]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. Coates discloses feature learning from unlabeled image data. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov, the feature pyramid network disclosed by Lin1, and the learning-without-forgetting methods of Li and Lee to include generating clusters and selecting a sample from the clusters as taught by Coates. One would have been motivated to use a K-means algorithm to cluster training samples in order to find features that are sensitive to commonly occurring object classes. [Abstract, Coates]

Regarding claim 17, Shmelkov/Lin1/Li/Lee teaches The non-transitory, computer-readable medium of claim 15, where Shmelkov further teaches wherein the program code that, when executed, causes the electronic device to train the adapted ML model using the combination of the first ML model, the second ML model and the unlabeled set of auxiliary data comprises program code that, when executed by the processor, causes the electronic device to:
perform object detection or classification on the unlabeled set of auxiliary data using the second ML model to generate a second set of model outputs (“the second B(CB) that is extended to detect the new classes CB, which were not present or at least not annotated in the source images.” [pg. 3403, § 3.2 Dual-network learning, ¶1]); and
where Lee teaches combining the first ML model and the second ML model based on a loss function using the first and second sets of model outputs (“In general, it is too naïve to assume that the final posterior distribution for the whole task is Gaussian. To make our IMM work, the search space of the loss function between the posterior means needs to be smooth and convex-like. In other words, there should not be high cost barriers between the means of the two networks for an old and a new task.” [pg. 2, ¶2; See Figure 1]).
Shmelkov/Lin1/Li/Lee fails to explicitly teach performing object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs.
Coates teaches performing object detection or classification on the unlabeled set of auxiliary data using the first ML model to generate a first set of model outputs (“As described above, we ran our algorithm on patches harvested from YouTube thumbnails downloaded from the web. Specifically, we downloaded the thumbnails for over 1.4 million YouTube videos, some of which are shown in Figure 2b. These images were downsampled to 128-by-96 pixels and converted to grayscale. We cropped 57 million randomly selected 32-by-32 pixel patches from these images to form our unlabeled training set. No supervision was used—thus most patches contain partial views of objects or clutter at differing scales. We ran our algorithm on these images using a cluster of 30 machines over 3 days—virtually all of the time spent training the 150,000 second-layer features. We will now visualize these features and check whether any of them have learned to identify an object class.” [pg. 5, § 3 Experiments, ¶1]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. Coates discloses feature learning from unlabeled image data. Although Shmelkov discloses a first model that performs object detection, Shmelkov does not teach using an unlabeled set of auxiliary data. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the teachings of Shmelkov/Lin1/Li/Lee, in particular the first model disclosed by Shmelkov, so that the model is trained using the unlabeled auxiliary data taught by Coates. One would have been motivated to use unlabeled auxiliary data in order to train the model to detect more complex patterns. [pg. 1, § 1 Introduction, Coates]

Regarding claim 18, Shmelkov/Lin1/Li/Lee teaches The non-transitory, computer-readable medium of claim 15, where Shmelkov further teaches wherein the program code that, when executed, causes the electronic device to retain the detection or classification performance on the first set of object classes in the adapted ML model comprises program code that, when executed by the processor, causes the electronic device to: 
retain the detection or classification performance on the first set of object classes (“Using only the training samples for the new classes, we propose a method for not only adapting the old network to the new classes, but also ensuring performance on the old classes does not degrade.” [pg. 3401, top left col, ¶1; ensuring performance does not degrade would be equivalent to retaining.]).
Shmelkov/Lin1/Li/Lee fails to explicitly teach:
generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features; 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid; and 
Coates teaches generate, for a set of the training samples belonging to a same class in the first set of object classes, N clusters based on the extracted features (“To construct the groups G, we will use a version of single-link agglomerative clustering to combine sets of features that have low dissimilarity according to d(k, l). To construct a single group G0 we begin by choosing a random simple cell filter, say D(k) , as the first member. We then search for candidate cells to be added to the group by computing d(k, l) for each simple cell filter D(l) and add D(l) to the group if d(k, l) is less than some limit τ” [pg. 3, § 2.2 Learning Invariant Features, ¶4; the groups G are equivalent to the N clusters, and low dissimilarity is equivalent to samples belonging to the same class.]); 
for each of the N clusters, select a training sample from the set of training samples that is a nearest-neighbor of a cluster centroid (“The algorithm then continues to expand G0 by adding any additional simple cells that are closer than τ to any one of the simple cells already in the group. This procedure continues until there are no more cells to be added, or until the diameter of the group (the dissimilarity between the two furthest cells in the group) reaches a limit ∆.” See further: “We use τ = 0.3 for the first layer of complex cells and τ = 1.0 for the second layer. These were chosen by examining the typical distance between a filter D(k) and its nearest neighbor. We use ∆ = 1.5 > √2 so that a complex cell group may include orthogonal filters but cannot grow without limit.” [pg. 3, § 2.2 Learning Invariant Features, ¶4]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. Coates discloses feature learning from unlabeled image data. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov, the feature pyramid network disclosed by Lin1, and the learning-without-forgetting methods of Li and Lee to include generating clusters and selecting a sample from the clusters as taught by Coates. One would have been motivated to use a K-means algorithm to cluster training samples in order to find features that are sensitive to commonly occurring object classes. [Abstract, Coates]

Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shmelkov in view of Lin1, Li, and Lee and further in view of Lin et al. ("Microsoft COCO: Common Objects in Context", hereinafter "Lin2").

Regarding claim 6, Shmelkov/Lin1/Li/Lee teaches the method of claim 5 but fails to explicitly teach further comprising: searching for additional instances of objects in the object class of the labeled object based on the label, 
wherein generating the adapted ML model for use with the second set of object classes further comprises training the adapted ML model using the additional instances of the objects.
Lin2 teaches searching for additional instances of objects in the object class of the labeled object based on the label ([Image: annotated excerpt of Lin2, Fig. 3(b)] “In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist” [pg. 746, § Instance Spotting, ¶1; the red annotated arrow shows the search for additional instances of the label “bottle”.]), 
wherein generating the adapted ML model for use with the second set of object classes further comprises training the adapted ML model using the additional instances of the objects (“In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist. Therefore, for each image, a worker was asked to place a cross on top of each instance of a specific category found in the previous stage. To boost recall, the location of the instance found by a worker in the previous stage was shown to the current worker. Such priming helped workers quickly find an initial instance upon first seeing the image. The workers could also use a magnifying glass to find small instances. Each worker was asked to label at most 10 instances of a given category per image. Each image was labeled by 8 workers for a total of ∼10k worker hours.” [pg. 746, § Instance Spotting, ¶1]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. Lin2 discloses an image dataset that addresses common objects in context. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov, the feature pyramid network taught by Lin1, and the learning without forgetting methods of Li and Lee by using the image dataset disclosed by Lin2 to train the models. One would have been motivated to search for multiple instances of an object to further train the model and thereby improve its object detection accuracy. [pg. 743, § Object detection, ¶3, Lin2]

Regarding claim 13, Shmelkov/Lin1/Li/Lee teaches the electronic device of claim 12 but fails to explicitly teach wherein: the processor is further configured to search for additional instances of objects in the object class of the labeled object based on the label, and 
to generate the adapted ML model for use with the second set of object classes, the processor is further configured to train the adapted ML model using the additional instances of the objects in the object class of the labeled object.
Lin2 teaches wherein: the processor is further configured to search for additional instances of objects in the object class of the labeled object based on the label ([Image: annotated excerpt of Lin2, Fig. 3(b)] “In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist” [pg. 746, § Instance Spotting, ¶1; the red annotated arrow shows the search for additional instances of the label “bottle”.]), and 
to generate the adapted ML model for use with the second set of object classes, the processor is further configured to train the adapted ML model using the additional instances of the objects in the object class of the labeled object (“In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist. Therefore, for each image, a worker was asked to place a cross on top of each instance of a specific category found in the previous stage. To boost recall, the location of the instance found by a worker in the previous stage was shown to the current worker. Such priming helped workers quickly find an initial instance upon first seeing the image. The workers could also use a magnifying glass to find small instances. Each worker was asked to label at most 10 instances of a given category per image. Each image was labeled by 8 workers for a total of ∼10k worker hours.” [pg. 746, § Instance Spotting, ¶1]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. Lin2 discloses an image dataset that addresses common objects in context. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov, the feature pyramid network taught by Lin1, and the learning without forgetting methods of Li and Lee by using the image dataset disclosed by Lin2 to train the models. One would have been motivated to search for multiple instances of an object to further train the model and thereby improve its object detection accuracy. [pg. 743, § Object detection, ¶3, Lin2]

Regarding claim 20, Shmelkov/Lin1/Li/Lee teaches the non-transitory, computer-readable medium of claim 19 but fails to explicitly teach further comprising program code that, when executed by the processor, causes the electronic device to: 
search for additional instances of objects in the object class of the labeled object based on the label, 
wherein the program code that, when executed, causes the electronic device to generate the adapted ML model for use with the second set of object classes comprises program code that, when executed by the processor, causes the electronic device to train the adapted ML model using the additional instances of the objects in the object class of the labeled object.
Lin2 teaches further comprising program code that, when executed by the processor, causes the electronic device to: 
search for additional instances of objects in the object class of the labeled object based on the label ([Image: annotated excerpt of Lin2, Fig. 3(b)] “In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist” [pg. 746, § Instance Spotting, ¶1; the red annotated arrow shows the search for additional instances of the label “bottle”.]), 
wherein the program code that, when executed, causes the electronic device to generate the adapted ML model for use with the second set of object classes comprises program code that, when executed by the processor, causes the electronic device to train the adapted ML model using the additional instances of the objects in the object class of the labeled object (“In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist. Therefore, for each image, a worker was asked to place a cross on top of each instance of a specific category found in the previous stage. To boost recall, the location of the instance found by a worker in the previous stage was shown to the current worker. Such priming helped workers quickly find an initial instance upon first seeing the image. The workers could also use a magnifying glass to find small instances. Each worker was asked to label at most 10 instances of a given category per image. Each image was labeled by 8 workers for a total of ∼10k worker hours.” [pg. 746, § Instance Spotting, ¶1]).
Shmelkov discloses an incremental learning method using dual networks. Lin1 discloses feature pyramid networks for performing object detection. Li discloses a method of learning new tasks without catastrophic forgetting. Lee discloses a method to overcome catastrophic forgetting by incremental moment matching. Lin2 discloses an image dataset that addresses common objects in context. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the incremental learning method disclosed by Shmelkov, the feature pyramid network taught by Lin1, and the learning without forgetting methods of Li and Lee by using the image dataset disclosed by Lin2 to train the models. One would have been motivated to search for multiple instances of an object to further train the model and thereby improve its object detection accuracy. [pg. 743, § Object detection, ¶3, Lin2]

Response to Arguments
Applicant's arguments filed 08/24/2022 have been fully considered but they are not persuasive. 

Regarding the 35 U.S.C. § 103 Rejection:
Applicant’s arguments on pgs. 13-14 regarding the prior art of Shmelkov failing to explicitly teach or disclose “wherein (i) the second set of object classes includes at least one object class not present in the first set of object classes and (ii) the first set of object classes includes at least one object class not present in the second set of object classes” have been considered but are moot because the amended limitation is now taught by the newly presented art of Li. Please see the updated 103 rejection above. 

Applicant’s arguments on pg. 14 regarding the prior art of Shmelkov failing to teach “the adapted ML model is generated by instantiating the adapted ML model separate from the first ML model and the second ML model” have been considered but are moot because the newly amended limitation is now taught by the newly presented art of Lee. Please see the updated 103 rejection above. 

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122