DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/05/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 14-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 14 recites the following limitation: “the rearranging further applying a second weight vector to the subset of channels according to a relevance criterion”. It is unclear whether there is a first weight vector or a plurality of vectors. For examination purposes, Examiner will interpret the “second weight vector” as a first weight vector. Claim 15 recites the limitation: “the rearranging further applying a third weight vector to the second subset of channels according to the relevance criterion”. This limitation is indefinite for the same reasons mentioned above. For examination purposes, Examiner will interpret the “third weight vector” as the “second weight vector”. Claims 16-20 are dependent claims and are rejected for the same reasons.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 2 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gautam et al. (US-10452959-B1).
Regarding Claim 1,
Gautam (US 10452959 B1) teaches a method comprising: 
selecting a pretrained model (Col. 17 lines 65-67; In these training examples, all object detector models were pre-trained on the COCO dataset.sup.11.) to operate in an augmented model configuration with a submodel (fig. 4; Col. 14, lines 9-13; In the example of FIG. 4, a multi-perspective object detector consists of a first single-stage object detection pipeline 404 and a second single-single stage object detection pipeline 414. First single-stage object detection pipeline 404 (i.e. pretrained model). Second single-stage object detection pipeline 414 (i.e. submodel). Col. 6 lines 60-64. submodel); 
training (FIG. 1 is a conceptual diagram illustrating an example computing device that may be configured to train and execute an object detector, such as a multi-perspective object detector configured in accordance with the techniques of this disclosure.), using a processor and a memory, the submodel using training data corresponding to a second domain (Fig. 4; input image 422 (i.e. second domain).), wherein the pretrained model is trained to operate on data of a first domain (Fig. 4; input image 402 (i.e. first domain)); and 
augmenting, to form the augmented model configuration, the pretrained model with the submodel, the augmenting comprising: 
combining, to form a combined feature map, a first feature map being output from a layer in the pretrained model with a second feature map being output from a layer in the submodel (Col. 6 lines 60-64; To begin the functions of combining feature map data, FL.sub.1 may transform feature map A, and FL.sub.2 may transform feature map B to a common basis. To transform feature maps A and B to a common basis, FL.sub.1 and FL.sub.2 may use a set of convolutional and residual layers F.); and 
inputting the combined feature map into a different layer in the submodel (Col. 6 lines 14-18; The fused feature map data formed by the sub-layers may then be used by other layers of an object detection pipeline, such as classifier and/or a region proposal network to localize and classify objects within a given input image.).
Regarding Claim 2,
Gautam teaches the method of claim 1, further comprising: concatenating, as a part of the combining, the first feature map and the second feature map (col. 6 lines 4-9; According to a more particular implementation, a fusion layer may combine one perspective's feature map with feature map data of a fusion layer map of another different perspective to form a “fused” feature map that effectively cross-references the two (or more) perspectives.).
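For illustration only, the concatenating recited in claim 2 can be pictured as stacking the channels of two feature maps that share the same spatial size. The following is a minimal Python sketch under the assumption that a feature map is a list of channels, each channel being a list of rows. The function name `concat_channels` and the sample values are hypothetical and are taken from neither the claims nor Gautam.

```python
def concat_channels(first_map, second_map):
    """Concatenate two feature maps along the channel axis.

    Each feature map is a list of channels; each channel is a list of rows.
    Both maps must share the same spatial height and width.
    """
    def spatial_shape(fmap):
        return (len(fmap[0]), len(fmap[0][0]))

    if spatial_shape(first_map) != spatial_shape(second_map):
        raise ValueError("feature maps must have matching height and width")
    # Channels of the first map come first, followed by those of the second.
    return [[row[:] for row in ch] for ch in first_map + second_map]


# Hypothetical 2-channel and 1-channel maps, each 2x2.
pretrained_out = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
submodel_out = [[[9, 10], [11, 12]]]
combined = concat_channels(pretrained_out, submodel_out)
```

In this sketch the combined map has three channels, and the spatial size of each channel is unchanged.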

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3-5 are rejected under 35 U.S.C. 103 as being unpatentable over Gautam et al., as applied above, and further in view of Li et al. (US-20200280717-A1).
Regarding Claim 3,
Gautam teaches the method of claim 1. 
	Gautam does not explicitly disclose
further comprising: adjusting a dimensionality of an original feature map, the original feature map being an original output from the layer in the pretrained model, the adjusting resulting in the first feature map used in the combining.
However, Li teaches
further comprising: adjusting a dimensionality of an original feature map, the original feature map being an original output from the layer in the pretrained model, the adjusting resulting in the first feature map used in the combining (para [0130] The classifier 920 receives the feature maps 919 and convolves each of the feature maps with 2×2 separable convolution filters to combine feature maps of the feature maps 919 into one, thereby resulting in feature maps 921. It is noted that the block 902 can be partitioned into 2×2 blocks, each of size 32×32. As such, the classifier 920 reduces, to the size of 2×2, the feature maps 919 (which are each of size 4×4) through a series of non-overlapping convolutions using 1×1 filters to gradually reduce the feature dimension size to 1, as described above with respect to the feature maps 919, thereby resulting in a feature map 927.).
Gautam and Li are analogous arts because both are directed to the same field of endeavor of convolutional neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the CNN of Gautam with the feature map reduction of Li.
Doing so would allow for improving accuracy while significantly reducing the computational complexity. This would result in less time for the model execution and more accurate predictions (para [0033]).
Regarding Claim 4,
Gautam and Li teach the method of claim 3. Li further teaches wherein the adjusting comprises reducing the dimensionality of the original feature map (para [0130] The classifier 920 receives the feature maps 919 and convolves each of the feature maps with 2×2 separable convolution filters to combine feature maps of the feature maps 919 into one, thereby resulting in feature maps 921. It is noted that the block 902 can be partitioned into 2×2 blocks, each of size 32×32. As such, the classifier 920 reduces, to the size of 2×2, the feature maps 919 (which are each of size 4×4) through a series of non-overlapping convolutions using 1×1 filters to gradually reduce the feature dimension size to 1, as described above with respect to the feature maps 919, thereby resulting in a feature map 927.).
Gautam and Li are analogous arts because both are directed to the same field of endeavor of convolutional neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the CNN of Gautam with the feature map reduction of Li.
Doing so would allow for improving accuracy while significantly reducing the computational complexity. This would result in less time for the model execution and more accurate predictions (para [0033]).
Regarding Claim 5,
Gautam and Li teach the method of claim 4. Li further teaches wherein the reducing comprises applying a 1-by-1 convolution to the original feature map (para [0130] As such, the classifier 920 reduces, to the size of 2×2, the feature maps 919 (which are each of size 4×4) through a series of non-overlapping convolutions using 1×1 filters to gradually reduce the feature dimension size to 1, as described above with respect to the feature maps 919, thereby resulting in a feature map 927.).
Gautam and Li are analogous arts because both are directed to the same field of endeavor of convolutional neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the CNN of Gautam with the feature map reduction of Li.
Doing so would allow for improving accuracy while significantly reducing the computational complexity. This would result in less time for the model execution and more accurate predictions (para [0033]).
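For illustration only, a 1-by-1 convolution of the kind recited in claim 5 is commonly understood as a per-pixel linear mix of input channels, so that choosing fewer output channels than input channels reduces the channel dimensionality. The following minimal Python sketch assumes that common understanding. The function name `conv1x1`, the kernel values, and the sample feature map are hypothetical, and the sketch does not reproduce the classifier 920 of Li.

```python
def conv1x1(feature_map, kernel):
    """Apply a 1x1 convolution.

    feature_map: list of C_in channels, each an H x W list of rows.
    kernel: list of C_out rows, each holding C_in weights.
    Returns C_out channels of the same H x W size.
    """
    c_in = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    output = []
    for weights in kernel:
        if len(weights) != c_in:
            raise ValueError("each kernel row needs one weight per input channel")
        # Every output pixel is a weighted sum over the input channels
        # at the same spatial position.
        channel = [
            [sum(weights[c] * feature_map[c][y][x] for c in range(c_in))
             for x in range(width)]
            for y in range(height)
        ]
        output.append(channel)
    return output


# Hypothetical original feature map with 3 channels, each 2x2.
original = [[[1, 2], [3, 4]], [[10, 20], [30, 40]], [[0, 1], [0, 1]]]
kernel = [[1, 0, 0], [0, 0.5, 1]]  # maps 3 input channels to 2 output channels
reduced = conv1x1(original, kernel)
```

Here the spatial size stays 2x2 while the channel count drops from three to two.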

Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Gautam et al., as applied above, and further in view of Baker (US-20200285939-A1).
Regarding Claim 6,
Gautam teaches the method of claim 1.
	Gautam does not explicitly disclose 
wherein the submodel is smaller than the pretrained model according to at least one factor selected from a set of factors comprising (i) a total number of nodes in the submodel and (ii) a total number of layers in the submodel.
However, Baker (US 20200285939 A1) teaches
wherein the submodel is smaller than the pretrained model according to at least one factor selected from a set of factors comprising (i) a total number of nodes in the submodel and (ii) a total number of layers in the submodel (para [0539] The embodiment illustrated in FIG. 27B is specific to neural networks and includes soft tying of nodes between the two networks. The networks as drawn show the example of a network with fewer layers transferring knowledge to a network with an expanded number of layers. Network with fewer layers (i.e. submodel). Network with more layers (i.e. pre-trained model).).
Gautam and Baker are analogous arts because both are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify transfer machine learning of Gautam with the smaller model of Baker.
Doing so would allow for transferring knowledge from any network to another network trying to learn the same classification task. This would save time, instead of training a machine learning model from scratch (para [0542]).
Regarding Claim 7,
Gautam teaches the method of claim 1. 
	Gautam does not explicitly disclose
wherein the submodel is smaller than the pretrained model according to a total number of model parameters.
However, Baker (US 20200285939 A1) teaches
wherein the submodel is smaller than the pretrained model according to a total number of model parameters (para [0623] Learning by imitation can transfer knowledge from a smaller network to a larger network, which facilitates growing a deeper neural network. It also can be used to transfer knowledge from an ensemble of shorter, wider networks to a single, deeper, thinner network with a smaller total number of parameters.).
Gautam and Baker are analogous arts because both are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify transfer machine learning of Gautam with the smaller model of Baker.
Doing so would allow for transferring knowledge from any network to another network trying to learn the same classification task. This would save time, instead of training a machine learning model from scratch (para [0542]).
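For illustration only, the size factors recited in claims 6 and 7 (number of layers and total number of model parameters) need not move together. The following minimal Python sketch counts the weights and biases of two fully connected networks. The function name `dense_param_count` and the layer widths are hypothetical and are not drawn from the claims or from Baker.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input layer first.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical networks: one shorter and wider, one deeper and thinner.
wide_shallow = [64, 512, 10]
thin_deep = [64, 32, 32, 32, 10]

wide_params = dense_param_count(wide_shallow)
thin_params = dense_param_count(thin_deep)
```

In this sketch the deeper network has more layers yet fewer total parameters than the wider one, so a model may be smaller under one factor and larger under another.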

Claims 8 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Gautam et al. (US-10452959-B1) in view of Molchanov et al. (US 20180114114 A1).
Regarding Claim 8,
Gautam teaches a method comprising: 
selecting a pretrained model (Col. 17 lines 65-67; In these training examples, all object detector models were pre-trained on the COCO dataset.sup.11.) to operate in an augmented model configuration with a submodel (fig. 4; Col. 14, lines 9-13; In the example of FIG. 4, a multi-perspective object detector consists of a first single-stage object detection pipeline 404 and a second single-single stage object detection pipeline 414. First single-stage object detection pipeline 404 (i.e. pretrained model). Second single-stage object detection pipeline 414 (i.e. submodel). Col. 6 lines 60-64. submodel); 
training (FIG. 1 is a conceptual diagram illustrating an example computing device that may be configured to train and execute an object detector, such as a multi-perspective object detector configured in accordance with the techniques of this disclosure.), using a processor and a memory, the submodel using training data corresponding to a second domain (Fig. 4; input image 422 (i.e. second domain).), wherein the pretrained model is trained to operate on data of a first domain (Fig. 4; input image 402 (i.e. first domain)); and 
augmenting, to form the augmented model configuration, the pretrained model with the submodel, the augmenting comprising: 
combining, to form a combined feature map, a first feature matrix of the channel in the first feature map with a second feature map being output from a layer in the submodel (Col. 6 lines 60-64; To begin the functions of combining feature map data, FL.sub.1 may transform feature map A, and FL.sub.2 may transform feature map B to a common basis. To transform feature maps A and B to a common basis, FL.sub.1 and FL.sub.2 may use a set of convolutional and residual layers F.); and 
inputting the combined feature map into a different layer in the submodel (Col. 6 lines 14-18; The fused feature map data formed by the sub-layers may then be used by other layers of an object detection pipeline, such as classifier and/or a region proposal network to localize and classify objects within a given input image.).
Gautam does not explicitly disclose
adjusting an attention value of a channel in a first feature map being output from a layer in the pretrained model, wherein the adjusting causes a first feature matrix of the channel in the first feature map to have a greater weight relative to a second feature matrix of a different channel in the first feature map; 
However, Molchanov (US 20180114114 A1) teaches
adjusting an attention value of a channel in a first feature map being output from a layer in the pretrained model, wherein the adjusting causes a first feature matrix of the channel in the first feature map to have a greater weight relative to a second feature matrix of a different channel in the first feature map (para [0034] Scaling a criterion across layers is very important for pruning. If the criterion is not properly scaled, then a hand-tuned multiplier would need to be selected for each layer. Without normalization, a conventional weight magnitude criterion tends to rank feature maps from the first layers more important than last layers; a conventional activation criterion ranks middle layers more important; Weight magnitude criterion (i.e. selection parameter).); 
Gautam and Molchanov are analogous arts because both are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the transfer learning model of Gautam with the data pruning of Molchanov.
Doing so would allow for the fine-tuning of existing deep neural networks to improve accuracy. Issues such as memory demand and power consumption may be addressed with the pruning technique (para [0003]).
Regarding Claim 10,
Gautam and Molchanov teach the method of claim 8. Molchanov further teaches further comprising: applying a scaling factor to a plurality of weighted feature matrices from at least one of the first feature map and the second feature map (para [0034] Scaling a criterion across layers is very important for pruning. If the criterion is not properly scaled, then a hand-tuned multiplier would need to be selected for each layer. Without normalization, a conventional weight magnitude criterion tends to rank feature maps from the first layers more important than last layers; a conventional activation criterion ranks middle layers more important; Weight magnitude criterion (i.e. selection parameter).).
Gautam and Molchanov are analogous arts because both are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the transfer learning model of Gautam with the data pruning of Molchanov.
Doing so would allow for the fine-tuning of existing deep neural networks to improve accuracy. Issues such as memory demand and power consumption may be addressed with the pruning technique (para [0003]).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gautam et al. and Molchanov et al., as applied above, and further in view of Panboonyuen et al. (“Semantic segmentation on remotely sensed images using an enhanced global convolutional network with channel attention and domain specific transfer learning”).
Regarding Claim 9,
Gautam and Molchanov teach the method of claim 8.
	Gautam and Molchanov do not explicitly disclose
further comprising: adjusting a second attention value of a second channel in a second feature map being output from a layer in the submodel, wherein the adjusting the second attention value causes a first feature matrix of the second channel in the second feature map to have a greater weight relative to a second feature matrix of a second different channel in the second feature map, and wherein the combining combines the first feature matrix of the second channel in the second feature map with the first feature matrix of the channel in the first feature map.
However, Panboonyuen teaches
further comprising: adjusting a second attention value of a second channel in a second feature map being output from a layer in the submodel, wherein the adjusting the second attention value causes a first feature matrix of the second channel in the second feature map to have a greater weight relative to a second feature matrix of a second different channel in the second feature map (pg. 6, section 3.3; To apply this attentional layer to our network, the channel attention block is shown in Block A in Figure 2 and its detailed architecture is shown in Figure 4. It is designed to change the weights of the remote sensing features on each stage (level), so that the weights are assigned more values on important features adaptively.), and wherein the combining combines the first feature matrix of the second channel in the second feature map with the first feature matrix of the channel in the first feature map (pg. 4, section 2.3; Meanwhile, it can combine the information across all channels. Then the following is a basic residual block [7], which can refine the feature map.).
Gautam, Molchanov, and Panboonyuen are analogous arts because all are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the transfer learning model of Gautam and Molchanov with the channel attention of Panboonyuen.
Doing so would allow for extracting multi-scale features from the network. Experiments have shown to outperform conventional methods in terms of f1 scores by seventeen percent (Abs.).
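For illustration only, adjusting an attention value of a channel so that one channel's feature matrix carries greater weight than another's can be pictured as rescaling each channel by a gate computed from that channel. The following minimal Python sketch uses a simplified gate (a sigmoid of the channel's global average, with no learned layers). That gate, the function name `channel_attention`, and the sample values are assumptions made for this sketch and do not reproduce the channel attention block of Panboonyuen.

```python
import math


def channel_attention(feature_map):
    """Rescale each channel by a sigmoid gate of its global average.

    feature_map: list of channels, each an H x W list of rows.
    Returns the per-channel gates and the reweighted feature map.
    """
    gates = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    reweighted = [
        [[gate * v for v in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return gates, reweighted


# Hypothetical 2-channel feature map, each channel 1x2.
sample_map = [[[2.0, 2.0]], [[-2.0, -2.0]]]
gates, reweighted = channel_attention(sample_map)
```

The channel with the larger average activation receives the larger gate, and therefore the greater weight, in this sketch.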

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gautam et al. and Molchanov et al., as applied above, and further in view of Zhu et al. (US-20220189170-A1).
Regarding Claim 11,
Gautam and Molchanov teach the method of claim 8. 
	Gautam and Molchanov do not explicitly disclose
further comprising: applying a channel-wise multiplexing to the combined feature map prior to inputting the combined feature map.
However, Zhu (US 20220189170 A1) teaches
further comprising: applying a channel-wise multiplexing to the combined feature map prior to inputting the combined feature map (para [0050] Finally, the slices can be concatenated channel-wise to obtain c.sub.t and h.sub.t. Note that .sup.jW.sup.k⋆X denotes a depthwise separable convolution with weights W, input X, j input channels, and k output channels, ϕ denotes the ReLU activation function, ∘ denotes the Hadamard product, σ denotes the sigmoid function, and [a, b] denotes channel-wise concatenation of a and b, as shown in FIG. 3. In some implementations, the number of groups G=4 with a 320-channel state.).
Gautam, Molchanov, and Zhu are analogous arts because all are directed to the same field of endeavor of convolutional neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the CNN of Gautam and Molchanov with the channel-wise multiplexing of Zhu.
Doing so would allow for improving the accuracy and speed of the CNN. This efficiency allows for the network to run on environments with extreme computation and energy restraints while maintaining high accuracy (para [0002]).

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gautam et al. and Molchanov et al., as applied above, and further in view of Baker (US-20200285939-A1).
Regarding Claim 12,
Gautam and Molchanov teach the method of claim 8. 
	Gautam and Molchanov do not explicitly disclose
wherein the submodel is smaller than the pretrained model according to at least one factor selected from a set of factors comprising (i) a total number of nodes in the submodel and (ii) a total number of layers in the submodel.
However, Baker (US 20200285939 A1) teaches
wherein the submodel is smaller than the pretrained model according to at least one factor selected from a set of factors comprising (i) a total number of nodes in the submodel and (ii) a total number of layers in the submodel (para [0539] The embodiment illustrated in FIG. 27B is specific to neural networks and includes soft tying of nodes between the two networks. The networks as drawn show the example of a network with fewer layers transferring knowledge to a network with an expanded number of layers. Network with fewer layers (i.e. submodel). Network with more layers (i.e. pre-trained model).).
Gautam, Molchanov, and Baker are analogous arts because all are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify transfer machine learning of Gautam with the smaller model of Baker.
Doing so would allow for transferring knowledge from any network to another network trying to learn the same classification task. This would save time, instead of training a machine learning model from scratch (para [0542]).
Regarding Claim 13,
Gautam and Molchanov teach the method of claim 8. 
	Gautam and Molchanov do not explicitly disclose
wherein the submodel is smaller than the pretrained model according to a total number of model parameters.
However, Baker (US 20200285939 A1) teaches
wherein the submodel is smaller than the pretrained model according to a total number of model parameters (para [0623] Learning by imitation can transfer knowledge from a smaller network to a larger network, which facilitates growing a deeper neural network. It also can be used to transfer knowledge from an ensemble of shorter, wider networks to a single, deeper, thinner network with a smaller total number of parameters.).
Gautam, Molchanov, and Baker are analogous arts because all are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify transfer machine learning of Gautam with the smaller model of Baker.
Doing so would allow for transferring knowledge from any network to another network trying to learn the same classification task. This would save time, instead of training a machine learning model from scratch (para [0542]).

Claims 14, 16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Gautam et al. (US-10452959-B1) in view of Molchanov et al. (US 20180114114 A1) and Nishiyuki (US-20220172046-A1).
Regarding Claim 14,
Gautam teaches a method comprising: 
selecting a pretrained model (Col. 17 lines 65-67; In these training examples, all object detector models were pre-trained on the COCO dataset.sup.11.) to operate in an augmented model configuration with a submodel (fig. 4; Col. 14, lines 9-13; In the example of FIG. 4, a multi-perspective object detector consists of a first single-stage object detection pipeline 404 and a second single-single stage object detection pipeline 414. First single-stage object detection pipeline 404 (i.e. pretrained model). Second single-stage object detection pipeline 414 (i.e. submodel). Col. 6 lines 60-64. Submodel.); 
training (FIG. 1 is a conceptual diagram illustrating an example computing device that may be configured to train and execute an object detector, such as a multi-perspective object detector configured in accordance with the techniques of this disclosure.), using a processor and a memory, the submodel using training data corresponding to a second domain (Fig. 4; input image 422 (i.e. second domain).), wherein the pretrained model is trained to operate on data of a first domain (Fig. 4; input image 402 (i.e. first domain)); and 
augmenting, to form the augmented model configuration, the pretrained model with the submodel, the augmenting comprising: 
combining, to form a combined feature map, a first feature matrix of the first channel in the first feature map with a second feature map being output from a layer in the submodel (Col. 6 lines 60-64; To begin the functions of combining feature map data, FL.sub.1 may transform feature map A, and FL.sub.2 may transform feature map B to a common basis. To transform feature maps A and B to a common basis, FL.sub.1 and FL.sub.2 may use a set of convolutional and residual layers F.); and 
inputting the combined feature map into a different layer in the submodel (Col. 6 lines 14-18; The fused feature map data formed by the sub-layers may then be used by other layers of an object detection pipeline, such as classifier and/or a region proposal network to localize and classify objects within a given input image.).
Gautam does not explicitly disclose
applying a channel selection parameter to a first channel in a first feature map being output from a layer in the pretrained model, wherein the applying causes a first feature matrix of the first channel in the first feature map to have a greater weight relative to a second feature matrix of a different channel in the first feature map; 
rearranging a subset of channels from the output of the layer in the pretrained model, the subset including those channels whose channel selection parameters cause those channels to have a greater than a threshold weight, the rearranging further applying a second weight vector to the subset of channels according to a relevance criterion, the subset including the first channel as a highest weighted channel; 
However, Molchanov (US 20180114114 A1) teaches
applying a channel selection parameter to a first channel in a first feature map being output from a layer in the pretrained model, wherein the applying causes a first feature matrix of the first channel in the first feature map to have a greater weight relative to a second feature matrix of a different channel in the first feature map (para [0034] Scaling a criterion across layers is very important for pruning. If the criterion is not properly scaled, then a hand-tuned multiplier would need to be selected for each layer. Without normalization, a conventional weight magnitude criterion tends to rank feature maps from the first layers more important than last layers; a conventional activation criterion ranks middle layers more important; Weight magnitude criterion (i.e. selection parameter).); 
rearranging a subset of channels from the output of the layer in the pretrained model,..., the rearranging further applying a second weight vector to the subset of channels according to a relevance criterion (para [0043] For example, a pruning gate g.sub.l.sup.(k)∈{0, 1}.sup.C.sub.l, may be a switch that determines if a particular feature map is included or pruned during feed-forward propagation, such that when g is vectorized: W′=gW. W’ (i.e. weight vector).), the subset including the first channel as a highest weighted channel (para [0034] Scaling a criterion across layers is very important for pruning. If the criterion is not properly scaled, then a hand-tuned multiplier would need to be selected for each layer. Without normalization, a conventional weight magnitude criterion tends to rank feature maps from the first layers more important than last layers; a conventional activation criterion ranks middle layers more important; Examiner interprets output of a layer as ‘Channels’. Output of first layer (i.e. first channel) is weighted higher.); 
Gautam and Molchanov are analogous arts because both are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the transfer learning model of Gautam with the data pruning of Molchanov.
Doing so would allow for the fine-tuning of existing deep neural networks to improve accuracy. Issues such as memory demand and power consumption may be addressed with the pruning technique (para [0003]).
Nishiyuki (US 20220172046 A1) teaches
the subset including those channels whose channel selection parameters cause those channels to have a greater than a threshold weight (para [0128] The layers (501 to 507, 511 to 517) in each neural network (50, 51) have computational parameters for computation. More specifically, the neurons in each layer are connected to the neurons in the neighboring layer as appropriate, with each connection having a preset weight (connection weight). Each neuron in each layer (501 to 507, 511 to 517) has a preset threshold. An output of each neuron is determined based on whether the sum of the product of each input and the corresponding weight exceeds the threshold.),
Gautam and Nishiyuki are analogous arts because both are directed to the same field of endeavor of convolutional neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the CNN of Gautam with the weight threshold of Nishiyuki.
Doing so would allow for improving the performance of the classifier with fewer samples. A higher accuracy may be achieved in less time, resulting in improved performance (para [0005]).
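For illustration only, the neuron behavior quoted from Nishiyuki para [0128] (an output determined by whether the sum of the products of each input and its weight exceeds a preset threshold) can be sketched as follows. The function name neuron_fires and the numeric values are assumptions made for the example and do not appear in the reference.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs exceeds the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one connection weight")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum > threshold


# Hypothetical values: the weighted sum is 1.0*0.4 + 0.5*0.8 = 0.8.
fires_low_threshold = neuron_fires([1.0, 0.5], [0.4, 0.8], threshold=0.7)
fires_high_threshold = neuron_fires([1.0, 0.5], [0.4, 0.8], threshold=0.9)
```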
Regarding Claim 16,
Gautam, Molchanov, and Nishiyuki teach the method of claim 14. Molchanov further teaches applying a batch normalization to a plurality of feature matrices in the combined feature map prior to the inputting (para [0034] Without normalization, a conventional weight magnitude criterion tends to rank feature maps from the first layers more important than last layers; a conventional activation criterion ranks middle layers more important; and the first criterion technique ranks first layers higher. After l_2 normalization, each layer has some feature maps that are highly important and others that are unimportant).
Gautam and Molchanov are analogous arts because both are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the transfer learning model of Gautam with the data pruning of Molchanov.
Doing so would allow for the fine-tuning of existing deep neural networks to improve accuracy. Issues such as memory demand and power consumption may be addressed with the pruning technique (para [0003]).
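For illustration only, the l_2 rescaling mentioned in the cited passage of Molchanov para [0034] can be sketched as dividing the per-feature-map scores of one layer by their l_2 norm. This sketch covers only that rescaling step; the function name l2_normalize and the sample scores are assumptions made for the example.

```python
import math


def l2_normalize(scores):
    """Scale one layer's per-feature-map scores so their l_2 norm equals 1."""
    norm = math.sqrt(sum(s * s for s in scores))
    if norm == 0.0:
        # An all-zero layer is returned unchanged to avoid division by zero.
        return list(scores)
    return [s / norm for s in scores]


# Hypothetical raw scores for two feature maps in one layer (norm is 5.0).
normalized = l2_normalize([3.0, 4.0])
```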
Regarding Claim 18,
Gautam, Molchanov, and Nishiyuki teach the method of claim 14. Molchanov further teaches applying a rectified linear unit computation to the combined feature map prior to inputting the combined feature map (para [0024] The nonlinear activation R is assumed to be the rectified linear unit. Although the techniques are described in the context of two-dimensional (2D) convolutions, the techniques may also be applied to three-dimensional (3D) convolutions.).
Gautam and Molchanov are analogous arts because both are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the transfer learning model of Gautam with the data pruning of Molchanov.
Doing so would allow for the fine-tuning of existing deep neural networks to improve accuracy. Issues such as memory demand and power consumption may be addressed with the pruning technique (para [0003]).
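For illustration only, a rectified linear unit computation applied element-wise to a feature map can be sketched as follows. The function name relu and the sample values are assumptions made for the example and are not drawn from the cited references.

```python
def relu(feature_map):
    """Apply max(0, v) to every element of a 2D feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 2x2 combined feature map; negative entries become 0.0.
combined = [[-1.5, 2.0], [0.0, -0.25]]
activated = relu(combined)
```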

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gautam, Molchanov, and Nishiyuki, as applied above, and further in view of Panboonyuen et al. (“Semantic segmentation on remotely sensed images using an enhanced global convolutional network with channel attention and domain specific transfer learning”).
Regarding Claim 15,
Gautam, Molchanov, and Nishiyuki teach the method of claim 14. Gautam further teaches
wherein the combining to form the combined feature map combines a first feature matrix of the first channel in the subset of channels with a second feature map from the second channel in the second subset (Col. 6 lines 60-64; To begin the functions of combining feature map data, FL.sub.1 may transform feature map A, and FL.sub.2 may transform feature map B to a common basis. To transform feature maps A and B to a common basis, FL.sub.1 and FL.sub.2 may use a set of convolutional and residual layers F.).
Molchanov further teaches
rearranging a second subset of channels from the output of the layer in the submodel, …, the rearranging further applying a third weight vector to the second subset of channels according to the relevance criterion (para [0043] For example, a pruning gate g_l^(k) ∈ {0, 1}^(C_l) may be a switch that determines if a particular feature map is included or pruned during feed-forward propagation, such that when g is vectorized: W′ = gW. Examiner interprets W′ as the weight vector.), the second subset including the second channel as a highest weighted channel (para [0034] Scaling a criterion across layers is very important for pruning. If the criterion is not properly scaled, then a hand-tuned multiplier would need to be selected for each layer. Without normalization, a conventional weight magnitude criterion tends to rank feature maps from the first layers more important than last layers; a conventional activation criterion ranks middle layers more important. Examiner interprets the output of a layer as “channels”; the output of the middle layer (i.e., the second channel) is weighted higher.);
Gautam and Molchanov are analogous arts because both are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the transfer learning model of Gautam with the data pruning of Molchanov.
Doing so would allow for the fine-tuning of existing deep neural networks to improve accuracy. Issues such as memory demand and power consumption may be addressed with the pruning technique (para [0003]).
Nishiyuki further teaches
the second subset including those channels whose channel selection parameters cause those channels to have a greater than the threshold weight (para [0128] The layers (501 to 507, 511 to 517) in each neural network (50, 51) have computational parameters for computation. More specifically, the neurons in each layer are connected to the neurons in the neighboring layer as appropriate, with each connection having a preset weight (connection weight). Each neuron in each layer (501 to 507, 511 to 517) has a preset threshold. An output of each neuron is determined based on whether the sum of the product of each input and the corresponding weight exceeds the threshold.)
Gautam and Nishiyuki are analogous arts because both are directed to the same field of endeavor of convolutional neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the CNN of Gautam with the weight threshold of Nishiyuki.
Doing so would allow for improving the performance of the classifier with fewer samples. A higher accuracy may be achieved in less time, resulting in improved performance (para [0005]).
Gautam, Molchanov, and Nishiyuki do not explicitly disclose
applying a second channel selection parameter to a second channel in the second feature map, wherein the applying the second channel selection parameter causes a second feature matrix of the second channel in the second feature map to have a greater weight relative to a third feature matrix of a different channel in the second feature map;
However, Panboonyuen teaches
applying a second channel selection parameter to a second channel in the second feature map, wherein the applying the second channel selection parameter causes a second feature matrix of the second channel in the second feature map to have a greater weight relative to a third feature matrix of a different channel in the second feature map (pg. 6, section 3.3; To apply this attentional layer to our network, the channel attention block is shown in Block A in Figure 2 and its detailed architecture is shown in Figure 4. It is designed to change the weights of the remote sensing features on each stage (level), so that the weights are assigned more values on important features adaptively.);
Gautam and Panboonyuen are analogous arts because both are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the transfer learning model of Gautam and Molchanov with the channel attention of Panboonyuen.
Doing so would allow for extracting multi-scale features from the network. Experiments have shown it to outperform conventional methods in terms of F1 score by seventeen percent (Abs.).
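For illustration only, the general idea of reweighting channels so that more important features receive larger weights can be sketched as follows. This is a simplified sketch and is not the architecture of Panboonyuen's Block A; the use of a global average followed by a sigmoid, the function name channel_attention, and the sample values are all assumptions made for the example.

```python
import math


def channel_attention(feature_map):
    """Scale each channel by a weight derived from its own global average.

    feature_map: list of channels, each a 2D list of floats (hypothetical layout).
    Returns (weights, scaled_feature_map).
    """
    weights = []
    scaled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        # Assumed scoring rule: sigmoid of the channel mean, giving a weight in (0, 1).
        weight = 1.0 / (1.0 + math.exp(-mean))
        weights.append(weight)
        scaled.append([[v * weight for v in row] for row in channel])
    return weights, scaled


# Two hypothetical 2x2 channels; the second has the larger mean activation.
feature_map = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
weights, scaled = channel_attention(feature_map)
```

Under these assumptions the second channel receives a greater weight than the first, which mirrors the relative weighting recited in the limitation.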

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gautam, Molchanov, and Nishiyuki, as applied above, and further in view of Zhu et al. (US-20220189170-A1).
Regarding Claim 17,
Gautam, Molchanov, and Nishiyuki teach the method of claim 14. 
Gautam, Molchanov, and Nishiyuki do not explicitly disclose
further comprising: applying a channel-wise multiplexing to the combined feature map prior to inputting the combined feature map.
However, Zhu (US 20220189170 A1) teaches
further comprising: applying a channel-wise multiplexing to the combined feature map prior to inputting the combined feature map (para [0050] Finally, the slices can be concatenated channel-wise to obtain c_t and h_t. Note that ^jW^k ⋆ X denotes a depthwise separable convolution with weights W, input X, j input channels, and k output channels, ϕ denotes the ReLU activation function, ∘ denotes the Hadamard product, σ denotes the sigmoid function, and [a, b] denotes channel-wise concatenation of a and b, as shown in FIG. 3. In some implementations, the number of groups G=4 with a 320-channel state.).
Gautam, Molchanov, Nishiyuki, and Zhu are analogous arts because all are directed to the same field of endeavor of convolutional neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the CNN of Gautam and Molchanov with the channel-wise multiplexing of Zhu.
Doing so would allow for improving the accuracy and speed of the CNN. This efficiency allows for the network to run on environments with extreme computation and energy restraints while maintaining high accuracy (para [0002]).
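For illustration only, the channel-wise concatenation [a, b] described in the quoted passage of Zhu para [0050] can be sketched as stacking the channels of two feature maps that share the same spatial size. The function name concat_channels, the list-based layout, and the sample values are assumptions made for the example.

```python
def concat_channels(a, b):
    """Concatenate two feature maps along the channel axis.

    a, b: lists of channels, each channel a 2D list with identical height and width.
    """
    def spatial_shape(channel):
        return (len(channel), len(channel[0]))

    shapes = {spatial_shape(ch) for ch in a + b}
    if len(shapes) > 1:
        raise ValueError("all channels must share the same spatial size")
    # The result has len(a) + len(b) channels, in order.
    return a + b


# Hypothetical inputs: one 1x2 channel in a, two 1x2 channels in b.
a = [[[1.0, 2.0]]]
b = [[[3.0, 4.0]], [[5.0, 6.0]]]
merged = concat_channels(a, b)
```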

Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gautam, Molchanov, and Nishiyuki, as applied above, and further in view of Baker (US-20200285939-A1).
Regarding Claim 19,
Gautam, Molchanov, and Nishiyuki teach the method of claim 14. 
Gautam, Molchanov, and Nishiyuki do not explicitly disclose
wherein the submodel is smaller than the pretrained model according to at least one factor selected from a set of factors comprising (i) a total number of nodes in the submodel and (ii) a total number of layers in the submodel.
However, Baker (US 20200285939 A1) teaches
wherein the submodel is smaller than the pretrained model according to at least one factor selected from a set of factors comprising (i) a total number of nodes in the submodel and (ii) a total number of layers in the submodel (para [0539] The embodiment illustrated in FIG. 27B is specific to neural networks and includes soft tying of nodes between the two networks. The networks as drawn show the example of a network with fewer layers transferring knowledge to a network with an expanded number of layers. Examiner interprets the network with fewer layers as the submodel and the network with more layers as the pretrained model.).
Gautam, Molchanov, Nishiyuki, and Baker are analogous arts because all are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the transfer machine learning of Gautam, Molchanov, and Nishiyuki with the smaller model of Baker.
Doing so would allow for transferring knowledge from any network to another network trying to learn the same classification task. This would save time, instead of training a machine learning model from scratch (para [0542]).
Regarding Claim 20,
Gautam, Molchanov, and Nishiyuki teach the method of claim 14. 
Gautam, Molchanov, and Nishiyuki do not explicitly disclose
wherein the submodel is smaller than the pretrained model according to a total number of model parameters.
However, Baker (US 20200285939 A1) teaches
wherein the submodel is smaller than the pretrained model according to a total number of model parameters (para [0623] Learning by imitation can transfer knowledge from a smaller network to a larger network, which facilitates growing a deeper neural network. It also can be used to transfer knowledge from an ensemble of shorter, wider networks to a single, deeper, thinner network with a smaller total number of parameters.).
Gautam, Molchanov, Nishiyuki, and Baker are analogous arts because all are directed to the field of endeavor of transfer learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the transfer machine learning of Gautam, Molchanov, and Nishiyuki with the smaller model of Baker.
Doing so would allow for transferring knowledge from any network to another network trying to learn the same classification task. This would save time, instead of training a machine learning model from scratch (para [0542]).
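For illustration only, a comparison of two models by total number of parameters can be sketched as follows for fully connected networks. The function name count_parameters and the layer widths are hypothetical and are not taken from Baker or from the application.

```python
def count_parameters(layer_widths):
    """Count weights and biases of a fully connected network.

    layer_widths: node counts per layer, input layer first.
    """
    total = 0
    for fan_in, fan_out in zip(layer_widths, layer_widths[1:]):
        # One weight per connection plus one bias per node in the receiving layer.
        total += fan_in * fan_out + fan_out
    return total


# Hypothetical architectures: a larger pretrained model and a smaller submodel.
pretrained_params = count_parameters([64, 128, 128, 10])
submodel_params = count_parameters([64, 32, 10])
submodel_is_smaller = submodel_params < pretrained_params
```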

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Abou Shousha et al. (US 10468142 B1) – discloses transfer learning with feature maps.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217. The examiner can normally be reached Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/H.N./Examiner, Art Unit 2121                                                                                                                                                                                                        
/DANIEL T PELLETT/Primary Examiner, Art Unit 2121