Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
1. This Office Action is in response to the application filed on 05/12/2020. Claims 1-25 are pending in this application. Claims 1, 8, 14 and 20 are independent claims. 


Claim Rejections - 35 USC § 103
2. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

3. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4. Claims 1 and 2 are rejected under 35 U.S.C. 103 as being unpatentable over Brothers (US PGPub 20160358068), and further in view of Yao (US PGPub 20210133518).

As per Claim 1, Brothers teaches a system comprising: at least one processor; and at least one memory containing instructions that, when executed by the at least one processor, cause the system to perform: generating a neural network output from a neural network input, generation of the neural network output comprising: generating at least two output feature maps using at least two input feature maps, generation of the at least two output feature maps comprising: (Par 18, In a number of neural networks, for a given convolution layer of a neural network, a number, e.g., “M,” of input feature maps are processed to generate a number, e.g., “N,” of output feature maps. For each output feature map generated by the convolution layer, each of the input feature maps may be processed by a different convolution kernel and then summed. The input feature maps determined to be processed by a group of similar convolution kernels may be scaled using the scaling factors. The scaled input feature maps may be summed to generate a composite input feature map. The convolution kernels of the group, i.e., those determined to be similar, may be replaced with the base convolution kernel. Accordingly, when the modified neural network is executed, rather than execute each of the convolution kernels of the group to process an input feature map, the neural network may apply the base convolution kernel to the composite input feature map to generate the output feature map, or a portion of an output feature map.)
convolving a first input feature map of the at least two input feature maps with at least one first kernel to generate a first intermediate feature map; convolving a second input feature map of the at least two input feature maps with at least one second kernel to generate a second intermediate feature map; (Par 4, Each layer may receive input data and generate output data by processing the input data to the layer. The output data may be a feature map of the input data that the neural network generates by convolving an input image or a feature map with convolution kernels. Par 23, neural network 110 includes a convolution layer in which input feature maps A, B, C, and D are processed by convolution kernels K1, K2, K3, and K4 respectively. Neural network analyzer 105 has determined that convolution kernels K1, K2, and K3 are similar and formed a group 120. The results of applying convolution kernels K1, K2, K3, and K4 are summed to generate an output feature map 125. In this example, output feature map 125 may be represented by the expression: A*K1+B*K2+C*K3+D*K4.)
Brothers does not specifically teach, but Yao teaches, generating, by up-sampling the first intermediate feature map, an up-sampled version of the first intermediate feature map; (Par 16, For example, using a size 200×200 of convolutional layer 108A as reference, the feature maps generated by the convolutional layers 108B, 108C, 108D, and 108E may be up-sampled by the feature extraction and combination module 104 to size 200×200.)
generating, by down-sampling the second intermediate feature map, a down-sampled version of the second intermediate feature map; (Par 18, for reference convolutional layer 108B, the feature extraction and combination module 104 can down-sample the feature map of convolutional layer 108A)
combining the first intermediate feature map with the down-sampled version of the second intermediate feature map to generate a first output feature map of the at least two output feature maps; and (Par 26, the multi-scale hard miner can select a reference layer in the convolutional neural network and up-sample or down-sample feature maps from other layers in the convolutional neural network to generate the concatenated feature maps. For example, each concatenated feature maps may include a feature map of the reference layer and the up-sampled or down-sampled feature maps of other layers in a convolutional neural network.)
combining the second intermediate feature map with the up-sampled version of the first intermediate feature map to generate a second output feature map of the at least two output feature maps. (Par 17, In some examples, the feature extraction and combination module 104 may then concatenate the up-sampled feature maps from convolutional layers 108B, 108C, 108D, and 108E with the native size feature map of convolutional layer 108A to generate concatenated feature maps 110A.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate generating, by up-sampling the first intermediate feature map, an up-sampled version of the first intermediate feature map; generating, by down-sampling the second intermediate feature map, a down-sampled version of the second intermediate feature map; combining the first intermediate feature map with the down-sampled version of the second intermediate feature map to generate a first output feature map of the at least two output feature maps; and combining the second intermediate feature map with the up-sampled version of the first intermediate feature map to generate a second output feature map of the at least two output feature maps, as conceptually taught by Yao, into the teaching of Brothers, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.
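As an illustration only, the cross-scale exchange recited in Claim 1 can be sketched in a few lines of NumPy. The sketch is hypothetical and is not drawn from the application, Brothers, or Yao: it assumes single-channel maps, a second branch at twice the resolution of the first, zero-padded odd-sized kernels, nearest-neighbor up-sampling, 2×2 average-pooling down-sampling, and element-wise addition as the "combining" step.

```python
import numpy as np


def conv2d_same(x, k):
    """2-D cross-correlation (the 'convolution' used in CNNs) with zero padding
    so the output keeps the input's height and width (odd-sized kernel k)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * xp[a:a + x.shape[0], b:b + x.shape[1]]
    return out


def upsample2x(x):
    """Nearest-neighbor up-sampling: repeat each value into a 2x2 block."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def downsample2x(x):
    """2x2 average pooling with stride 2 (height and width must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def exchange_block(x_low, x_high, k_low, k_high):
    """x_low is the first (lower-resolution) input feature map; x_high is the
    second input feature map at twice that resolution (assumed ratio)."""
    inter_low = conv2d_same(x_low, k_low)            # first intermediate map
    inter_high = conv2d_same(x_high, k_high)         # second intermediate map
    out_low = inter_low + downsample2x(inter_high)   # first output feature map
    out_high = inter_high + upsample2x(inter_low)    # second output feature map
    return out_low, out_high


rng = np.random.default_rng(0)
x_low, x_high = rng.standard_normal((4, 4)), rng.standard_normal((8, 8))
k = np.ones((3, 3)) / 9.0
y_low, y_high = exchange_block(x_low, x_high, k, k)
print(y_low.shape, y_high.shape)  # (4, 4) (8, 8)
```

Each output keeps its own branch's resolution while absorbing information from the other branch, which is the structural point of the claimed limitation; concatenation, as in Yao, would be an equally valid combining choice.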

As per Claim 2, Yao teaches the system of claim 1, wherein generation of the neural network output further comprises: obtaining the neural network input; generating, by down-sampling the neural network input, a down-sampled version of the neural network input; and applying the down-sampled version of the neural network input to one or more convolutional neural network layers to generate the first input feature map. (Par 18, for reference convolutional layer 108B, the feature extraction and combination module 104 can down-sample the feature map of convolutional layer 108A and Par 26, the multi-scale hard miner can select a reference layer in the convolutional neural network and up-sample or down-sample feature maps from other layers in the convolutional neural network to generate the concatenated feature maps. For example, each concatenated feature maps may include a feature map of the reference layer and the up-sampled or down-sampled feature maps of other layers in a convolutional neural network.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate obtaining the neural network input; generating, by down-sampling the neural network input, a down-sampled version of the neural network input; and applying the down-sampled version of the neural network input to one or more convolutional neural network layers to generate the first input feature map, as conceptually taught by Yao, into the teaching of Brothers, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.
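The Claim 2 pipeline (down-sample the network input, then pass it through one or more convolutional layers to obtain the first input feature map) can be sketched as follows. This is a hypothetical illustration: the 2×2 average pooling, the unpadded 3×3 convolution, the ReLU, and the 8×8 test image are assumptions, not details taken from the claim or from Yao.

```python
import numpy as np


def avg_pool2x(x):
    """2x2 average pooling, stride 2 (even height and width assumed)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def conv_relu_valid(x, k):
    """One hypothetical convolutional layer: unpadded cross-correlation + ReLU."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * x[a:a + oh, b:b + ow]
    return np.maximum(out, 0.0)


def first_input_feature_map(network_input, kernels):
    x = avg_pool2x(network_input)   # down-sampled version of the network input
    for k in kernels:               # one or more convolutional layers
        x = conv_relu_valid(x, k)
    return x


img = np.arange(64, dtype=float).reshape(8, 8)
fmap = first_input_feature_map(img, [np.ones((3, 3)) / 9.0])
print(fmap.shape)  # (2, 2)
```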

5. Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Brothers (US PGPub 20160358068), and further in view of Yao (US PGPub 20210133518), and further in view of Liu (US PGPub 20190220746).

As per Claim 3, neither Brothers nor Yao specifically teaches, but Liu teaches, the system of claim 1, wherein the down-sampling comprises at least one of convolution, sampling, max pooling, or averaging pooling. (Par 107, For example, the first standard down-sampling layer SD1 and the second standard down-sampling layer SD2 may adopt a maximum merging (max pooling) method, an average value merging (average pooling) method, a strided convolution method or other down-sampling methods. The first standard up-sampling layer SU1 and the second standard up-sampling layer SU2 may adopt a strided transposed convolution methods or other up-sampling methods.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have the down-sampling comprise at least one of convolution, sampling, max pooling, or averaging pooling, as conceptually taught by Liu, in the combined teaching of Brothers and Yao, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.
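The four down-sampling options recited in Claim 3 (and named in Liu Par 107) can be compared on a small 4×4 map. This is a hypothetical sketch assuming non-overlapping 2×2 windows with stride 2; "sampling" is modeled as keeping every second row and column, and the kernel values are arbitrary illustrative choices. It also shows that a strided convolution can reproduce plain sampling or average pooling, depending on its kernel.

```python
import numpy as np


def blocks2x2(x):
    """View an (H, W) map with even H and W as non-overlapping 2x2 blocks,
    indexed [block_row, row_in_block, block_col, col_in_block]."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2)


def max_pool(x):
    return blocks2x2(x).max(axis=(1, 3))


def avg_pool(x):
    return blocks2x2(x).mean(axis=(1, 3))


def sample(x):
    """Plain sampling (decimation): keep every second row and column."""
    return x[::2, ::2]


def strided_conv(x, k):
    """2x2 kernel k applied with stride 2 and no padding."""
    return (blocks2x2(x) * k[None, :, None, :]).sum(axis=(1, 3))


x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 10., 13., 14.],
              [11., 12., 15., 16.]])
print(max_pool(x).tolist())  # [[4.0, 8.0], [12.0, 16.0]]
print(avg_pool(x).tolist())  # [[2.5, 6.5], [10.5, 14.5]]
print(sample(x).tolist())    # [[1.0, 5.0], [9.0, 13.0]]
print(strided_conv(x, np.full((2, 2), 0.25)).tolist())           # [[2.5, 6.5], [10.5, 14.5]]
print(strided_conv(x, np.array([[1., 0.], [0., 0.]])).tolist())  # [[1.0, 5.0], [9.0, 13.0]]
```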

6. Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Brothers (US PGPub 20160358068), and further in view of Yao (US PGPub 20210133518), and further in view of Jang (US PGPub 20200175313).

As per Claim 4, neither Brothers nor Yao specifically teaches, but Jang teaches, the system of claim 1, wherein generation of the neural network output further comprises combining the at least two output feature maps or selecting one of the at least two output feature maps. (Par 7, neural network apparatus includes … generating a plurality of intermediate feature maps by performing a convolution operation between the plurality of sub-feature maps and the trained weights, and generate a dilated output feature map by merging the plurality of intermediate feature maps based on the dilation rate.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate combining the at least two output feature maps or selecting one of the at least two output feature maps, as conceptually taught by Jang, into the combined teaching of Brothers and Yao, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.
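The Jang passage relied on above describes obtaining a dilated output feature map by convolving sub-feature maps and merging the resulting intermediate feature maps according to the dilation rate. One common way to realize this, assumed here for illustration and not necessarily Jang's exact scheme, splits the input into its rate×rate polyphase components (every rate-th row and column at each offset), applies an ordinary convolution to each, and interleaves the results. The sketch checks that this split, convolve, and merge procedure matches a direct dilated convolution; single-channel maps and unpadded cross-correlation are assumed.

```python
import numpy as np


def conv2d_valid(x, k, rate=1):
    """Unpadded 2-D cross-correlation with dilation `rate` (rate=1 is ordinary)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - rate * (kh - 1), x.shape[1] - rate * (kw - 1)
    out = np.zeros((oh, ow))
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * x[rate * a:rate * a + oh, rate * b:rate * b + ow]
    return out


def dilated_by_split_merge(x, k, rate):
    """Split x into rate*rate sub-feature maps, convolve each without dilation,
    and merge the intermediate maps by interleaving them with stride `rate`."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - rate * (kh - 1), x.shape[1] - rate * (kw - 1)))
    for p in range(rate):
        for q in range(rate):
            sub = x[p::rate, q::rate]                     # sub-feature map
            out[p::rate, q::rate] = conv2d_valid(sub, k)  # intermediate feature map
    return out


rng = np.random.default_rng(0)
x, k = rng.standard_normal((12, 12)), rng.standard_normal((3, 3))
print(np.allclose(conv2d_valid(x, k, rate=2), dilated_by_split_merge(x, k, 2)))  # True
```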

As per Claim 5, neither Brothers nor Yao specifically teaches, but Jang teaches, the system of claim 1, wherein the at least two input feature maps each include channels having a predetermined size, the predetermined sizes differing between the at least two input feature maps. (Par 57, For example, when an image of a 24×24 pixel size is input to the neural network 1 of FIG. 1, the input image may be output as feature maps of 4 channels each having a 20×20 size through a convolution operation with weights. Also, some of the pixel values of the feature maps of 4 channels each having the 20×20 size may be subject to a sub-sampling operation to output feature maps of 4 channels each having a 10×10 size.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have the at least two input feature maps each include channels having a predetermined size, the predetermined sizes differing between the at least two input feature maps, as conceptually taught by Jang, in the combined teaching of Brothers and Yao, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.
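The sizes in the quoted Jang example follow from the usual output-size formula for a window of size k with stride s and padding p, namely floor((n + 2p - k) / s) + 1. The sketch assumes, because the quoted passage does not say, a 5×5 kernel without padding for the convolution and a 2×2 window with stride 2 for the sub-sampling; that is one choice that reproduces 24 → 20 → 10.

```python
def output_size(n, k, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling window."""
    return (n + 2 * padding - k) // stride + 1


conv_size = output_size(24, k=5)                     # assumed 5x5 kernel, stride 1, no padding
pooled_size = output_size(conv_size, k=2, stride=2)  # assumed 2x2 sub-sampling, stride 2
print(conv_size, pooled_size)  # 20 10
```

The resulting maps (20×20 and 10×10 per channel) are an instance of feature maps whose channels have differing predetermined sizes.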

7. Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Brothers (US PGPub 20160358068), in view of Yao (US PGPub 20210133518), in view of Jang (US PGPub 20200175313), and further in view of Du (US PGPub 20190087716).

As per Claim 6, none of Brothers, Yao, and Jang specifically teaches, but Du teaches, the system of claim 5, wherein the at least two input feature maps comprises 2, 4, 8, 16, or 32 input feature maps. (Par 46, In practical application, the input feature maps, the core processing modules and the output feature maps can be multiple. Taking two cores (#1, #2), four output feature maps (#1, #2, #3, #4), and four input feature maps (#1, #2, #3, #4) for example, the processing way of the multi-core processing module is explained below.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have the at least two input feature maps comprise 2, 4, 8, 16, or 32 input feature maps, as conceptually taught by Du, in the combined teaching of Brothers, Yao, and Jang, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.

8. Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Brothers (US PGPub 20160358068), in view of Yao (US PGPub 20210133518), in view of Jang (US PGPub 20200175313), and further in view of Lazarus (US PGPub 20200058106).

As per Claim 7, none of Brothers, Yao, and Jang specifically teaches, but Lazarus teaches, the system of claim 5, wherein the predetermined sizes differ by powers of four or more. (Par 71, At each down-sampling step the number of feature channels is doubled from 64 to 128 to 256. The number of feature channels is also doubled from 256 to 512 at the bottom layer.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have the predetermined sizes differ by powers of four or more, as conceptually taught by Lazarus, in the combined teaching of Brothers, Yao, and Jang, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.
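For reference, the arithmetic of the Lazarus architecture can be tabulated. Lazarus Par 71 (quoted in full in the rejection of Claim 10 below) describes a 2×2 average pooling operation with stride 2 at each down-sampling step together with a doubling of the feature-channel count. The simplified sketch below assumes a hypothetical 256×256 input and treats every level as one pooling step plus one channel doubling; under those assumptions each step halves height and width, so the number of positions per channel shrinks by a factor of four while the channel count grows by a factor of two.

```python
def encoder_shapes(height, width, channels, steps):
    """(height, width, channels) at each level when every step applies 2x2,
    stride-2 pooling and doubles the channel count (simplified model)."""
    shapes = [(height, width, channels)]
    for _ in range(steps):
        height, width, channels = height // 2, width // 2, channels * 2
        shapes.append((height, width, channels))
    return shapes


shapes = encoder_shapes(256, 256, 64, steps=3)  # 256x256 input is hypothetical
area_ratios = [(h0 * w0) // (h1 * w1)
               for (h0, w0, _), (h1, w1, _) in zip(shapes, shapes[1:])]
print(shapes)       # [(256, 256, 64), (128, 128, 128), (64, 64, 256), (32, 32, 512)]
print(area_ratios)  # [4, 4, 4]
```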

9. Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Brothers (US PGPub 20160358068), in view of Savvides (US PGPub 20180068198), and further in view of Ting (US PGPub 20220157048).

As per Claim 8, Brothers teaches a system comprising: at least one processor; and at least one memory containing instructions that, when executed by the at least one processor, cause the system to perform: generating a neural network output from a neural network input, generation of the neural network output comprising: generating at least two output feature maps of differing channel sizes using at least two input feature maps of the differing channel sizes, generation of the at least two output feature maps comprising: (Par 18, In a number of neural networks, for a given convolution layer of a neural network, a number, e.g., “M,” of input feature maps are processed to generate a number, e.g., “N,” of output feature maps. For each output feature map generated by the convolution layer, each of the input feature maps may be processed by a different convolution kernel and then summed. The input feature maps determined to be processed by a group of similar convolution kernels may be scaled using the scaling factors. The scaled input feature maps may be summed to generate a composite input feature map. The convolution kernels of the group, i.e., those determined to be similar, may be replaced with the base convolution kernel. Accordingly, when the modified neural network is executed, rather than execute each of the convolution kernels of the group to process an input feature map, the neural network may apply the base convolution kernel to the composite input feature map to generate the output feature map, or a portion of an output feature map.)
Brothers does not specifically teach, but Savvides teaches, generating a first intermediate map by providing a first input feature map of the at least two input feature maps to a first convolutional sub-layer, the first input feature map having a first channel size; generating a second intermediate map by providing a second input feature map of the at least two input feature maps to a second convolutional sub-layer, the second input feature map having a second channel size; generating, using the first intermediate map, a version of the first intermediate map having the second channel size; generating, using the second intermediate map, a version of the second intermediate map having the first channel size; (Par 5, create a corresponding series of feature maps of differing scales [different channel sizes]; … concatenating the series of normalized feature maps together with one another to create a concatenated feature map [output feature map]; Par 56, Then, since the channel size is different among layers, the normalized feature map from each layer needed to be re-weighted so that their values are at the same scale. After that, the feature maps are concatenated to one single feature map tensor. This modification helps to stabilize the system and increase the accuracy. The channel size of the concatenated feature map is then shrunk to fit right in the original architecture for the downstream fully-connected layers. Par 63, It is noted that while at least two convolutional layers are needed for steps of method 300 that follow, in practice, the more-robust object-detection system of the present invention will typically include more than two convolutional layers. In addition, each convolution layer may include multiple convolution sublayers. Par 76, Each bounding box, here just bounding box 440, suspected of containing an occurrence of the desired class, is then projected back to each of feature maps 416(1) to 416(3) on convolution layers 404(3) to 404(5) used to create highlighted feature map 436. Claim 1, … sequentially convolving the image in a series of at least two convolution layers to create a corresponding series of feature maps of differing scales)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate generating a first intermediate map by providing a first input feature map of the at least two input feature maps to a first convolutional sub-layer, the first input feature map having a first channel size; generating a second intermediate map by providing a second input feature map of the at least two input feature maps to a second convolutional sub-layer, the second input feature map having a second channel size; generating, using the first intermediate map, a version of the first intermediate map having the second channel size; and generating, using the second intermediate map, a version of the second intermediate map having the first channel size, as conceptually taught by Savvides, into the teaching of Brothers, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.
Neither Brothers nor Savvides specifically teaches, but Ting teaches, combining the first intermediate map and the version of the second intermediate map having the first channel size to generate a first output feature map of the at least two output feature maps; and combining the second intermediate map and the version of the first intermediate map having the second channel size to generate a second output feature map of the at least two output feature maps. (Par 24, (ii) applying multiple resizing operations to the primary feature maps to generate multiple resized feature maps, each resizing operation having a different size parameter; and (iii) applying a set of convolution filters to the multiple resized feature maps to generate secondary feature maps; (b) combine the secondary feature maps to generate a feature vector, respective elements of the feature vector corresponding to respective convolutional channels of the secondary feature maps; Par 24-25 and 37-38, (b) combine the secondary feature maps to generate a feature vector, respective elements of the feature vector corresponding to respective convolutional channels of the secondary feature maps… combine the secondary feature maps by: determining the top k values across the secondary feature maps for each convolutional channel.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate combining the first intermediate map and the version of the second intermediate map having the first channel size to generate a first output feature map of the at least two output feature maps; and combining the second intermediate map and the version of the first intermediate map having the second channel size to generate a second output feature map of the at least two output feature maps, as conceptually taught by Ting, into the combined teaching of Brothers and Savvides, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.

10. Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Brothers (US PGPub 20160358068), in view of Savvides (US PGPub 20180068198), in view of Ting (US PGPub 20220157048), and further in view of Kim (US PGPub 20200118249).

As per Claim 9, none of Brothers, Savvides, and Ting specifically teaches, but Kim teaches, the system of claim 8, wherein: generating the neural network output comprises repeatedly generating the neural network output; and the at least two input feature maps in a repeat comprise the at least two output feature maps generated in a prior repeat. (Par 107, After operations S720 and S730 are performed, the neural network device 13 may repeat operations S720 and S730 to generate subsequent output feature maps, such as a third output feature map and a fourth output feature map. That is, the neural network device 13 may repeatedly perform operations S720 and S730 until an N-th output feature map OFM_N is generated.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to repeatedly generate the neural network output, with the at least two input feature maps in a repeat comprising the at least two output feature maps generated in a prior repeat, as conceptually taught by Kim, in the combined teaching of Brothers, Savvides, and Ting, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.
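The repetition recited in Claim 9, in which the output feature maps of one repeat serve as the input feature maps of the next, amounts to stacking the same kind of block. In the hypothetical sketch below, each repeat multiplies each branch by a scalar weight (standing in for a convolutional sub-layer, an assumption made only to keep the sketch short), then applies nearest-neighbor up-sampling, 2×2 average-pooling down-sampling, and summation; none of these choices comes from Kim or the other references.

```python
import numpy as np


def up2(x):
    """Nearest-neighbor 2x up-sampling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def down2(x):
    """2x2 average pooling, stride 2."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def exchange(low, high, w_low, w_high):
    """One repeat: scalar weights stand in for convolutional sub-layers
    (hypothetical), followed by cross-scale resizing and summation."""
    i_low, i_high = w_low * low, w_high * high
    return i_low + down2(i_high), i_high + up2(i_low)


def run_repeats(low, high, weights):
    """The output feature maps of each repeat become the next repeat's inputs."""
    for w_low, w_high in weights:
        low, high = exchange(low, high, w_low, w_high)
    return low, high


low, high = run_repeats(np.ones((2, 2)), np.zeros((4, 4)), [(1.0, 1.0)] * 3)
print(low[0, 0], high[0, 0])  # 4.0 4.0
```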

11. Claims 10-13 are rejected under 35 U.S.C. 103 as being unpatentable over Brothers (US PGPub 20160358068), in view of Savvides (US PGPub 20180068198), in view of Ting (US PGPub 20220157048), and further in view of Lazarus (US PGPub 20200058106).

As per Claim 10, none of Brothers, Savvides, and Ting specifically teaches, but Lazarus teaches, the system of claim 8, wherein: the version of the first intermediate map having the second channel size is generated by up-sampling the first intermediate map; and the version of the second intermediate map having the first channel size is generated by down-sampling the second intermediate map. (Par 71, FIG. 1C illustrates a specific example of the architecture of an example convolutional neural network block shown in FIG. 1B, in accordance with some embodiments of the technology described herein. As shown in FIG. 1C, all of the convolutional layers apply a 3×3 kernel. In the down-sampling path, the input at each level is processed by repeated application of two (or three at the bottom level) convolutions with 3×3 kernels, each followed by an application of a non-linearity, an average 2×2 pooling operation with stride 2 for down-sampling. At each down-sampling step the number of feature channels is doubled from 64 to 128 to 256. The number of feature channels is also doubled from 256 to 512 at the bottom layer. In the up-sampling path, the data is processed by repeated up-sampling of the feature maps using an average unpooling step that halves the number of feature channels (e.g., from 256 to 128 to 64), concatenating with the corresponding feature map from the down-sampling path and one or more convolutional layers (using 3×3 kernels), each followed by application of a non-linearity. The last convolutional layer 140c reduces the number of feature maps to 2.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate the version of the first intermediate map having the second channel size by up-sampling the first intermediate map, and the version of the second intermediate map having the first channel size by down-sampling the second intermediate map, as conceptually taught by Lazarus, in the combined teaching of Brothers, Savvides, and Ting, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.


As per Claim 11, none of Brothers, Savvides, and Ting specifically teaches, but Lazarus teaches, the system of claim 10, wherein the up-sampling comprises at least one of deconvolution, unpooling, or interpolation. (Par 71, In the up-sampling path, the data is processed by repeated up-sampling of the feature maps using an average unpooling step that halves the number of feature channels (e.g., from 256 to 128 to 64), concatenating with the corresponding feature map from the down-sampling path and one or more convolutional layers (using 3×3 kernels), each followed by application of a non-linearity. The last convolutional layer 140c reduces the number of feature maps to 2.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have the up-sampling comprise at least one of deconvolution, unpooling, or interpolation, as conceptually taught by Lazarus, in the combined teaching of Brothers, Savvides, and Ting, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.
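The three up-sampling options recited in Claim 11 can be compared in a small sketch. It assumes a 2× scale factor and single-channel maps; "unpooling" is modeled as copying each value into a 2×2 block, "deconvolution" as a stride-2 transposed convolution with a 2×2 kernel, and "interpolation" as separable linear (bilinear) interpolation on a corner-aligned grid. These are illustrative choices, not the specific variants used by Lazarus.

```python
import numpy as np


def unpool2x(x):
    """Unpooling by copying each value into a 2x2 block."""
    return np.kron(x, np.ones((2, 2)))


def transposed_conv2x(x, k):
    """Stride-2 transposed convolution with a 2x2 kernel k: each input value,
    scaled by k, is written into its own 2x2 block of the output."""
    h, w = x.shape
    out = np.zeros((2 * h, 2 * w))
    for a in range(2):
        for b in range(2):
            out[a::2, b::2] += k[a, b] * x
    return out


def bilinear2x(x):
    """Separable linear interpolation onto a grid twice as fine (corners kept)."""
    h, w = x.shape
    ys, xs = np.linspace(0, h - 1, 2 * h), np.linspace(0, w - 1, 2 * w)
    rows = np.array([np.interp(xs, np.arange(w), r) for r in x])           # (h, 2w)
    return np.array([np.interp(ys, np.arange(h), c) for c in rows.T]).T  # (2h, 2w)


x = np.array([[0., 1.], [2., 3.]])
print(unpool2x(x).tolist())
# [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0], [2.0, 2.0, 3.0, 3.0], [2.0, 2.0, 3.0, 3.0]]
print(np.array_equal(transposed_conv2x(x, np.ones((2, 2))), unpool2x(x)))  # True
print(bilinear2x(x).shape)  # (4, 4)
```

With an all-ones kernel the transposed convolution reduces to block-copy unpooling; a learned kernel generalizes it.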
 
As per Claim 12, neither Brothers nor Ting specifically teaches, but Savvides teaches, the system of claim 8, wherein the differing channel sizes comprise 2, 4, 8, 16, or 32 differing channel sizes. (Par 5, create a corresponding series of feature maps of differing scales [different channel sizes]; … concatenating the series of normalized feature maps together with one another to create a concatenated feature map [output feature map]; Par 56, Then, since the channel size is different among layers, the normalized feature map from each layer needed to be re-weighted so that their values are at the same scale. After that, the feature maps are concatenated to one single feature map tensor. Because the concatenated feature maps come from at least two layers whose channel sizes differ, the number of differing channel sizes is at least two.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have the differing channel sizes comprise 2, 4, 8, 16, or 32 differing channel sizes, as conceptually taught by Savvides, in the combined teaching of Brothers and Ting, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.

As per Claim 13, none of Brothers, Savvides, and Ting specifically teaches, but Lazarus teaches, the system of claim 12, wherein the differing channel sizes differ by powers of four or more. (Par 71, At each down-sampling step the number of feature channels is doubled from 64 to 128 to 256. The number of feature channels is also doubled from 256 to 512 at the bottom layer.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have the differing channel sizes differ by powers of four or more, as conceptually taught by Lazarus, in the combined teaching of Brothers, Savvides, and Ting, because this modification can help enrich feature maps with multiple scales and thereby improve detection and prediction by the neural network.

12. Claims 14-15, 19, 20-21, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Savvides (US PGPub 20180068198), in view of Ting (US PGPub 20220157048).

As per Claim 14, Savvides teaches a non-transitory computer-readable medium storing a set of instructions that are executable by one or more processors of a system to cause the system to perform: obtaining at least two input feature maps of differing channel sizes; generating an output feature map for each one of the at least two input feature maps, (Par 5, create a corresponding series of feature maps of differing scales [channel sizes]; … concatenating the series of normalized feature maps together with one another to create a concatenated feature map [output feature map]; Par 56, Then, since the channel size is different among layers, the normalized feature map from each layer needed to be re-weighted so that their values are at the same scale. After that, the feature maps are concatenated to one single feature map tensor. This modification helps to stabilize the system and increase the accuracy. The channel size of the concatenated feature map is then shrunk to fit right in the original architecture for the downstream fully-connected layers.)
generation comprising: applying the one of the at least two input feature maps to a convolutional sub-layer to generate an intermediate feature map; (Par 63, It is noted that while at least two convolutional layers are needed for steps of method 300 that follow, in practice, the more-robust object-detection system of the present invention will typically include more than two convolutional layers. In addition, each convolution layer may include multiple convolution sublayers. Par 76, Each bounding box, here just bounding box 440, suspected of containing an occurrence of the desired class, is then projected back to each of feature maps 416(1) to 416(3) on convolution layers 404(3) to 404(5) used to create highlighted feature map 436. Claim 1, … sequentially convolving the image in a series of at least two convolution layers to create a corresponding series of feature maps of differing scales)
… to match the channel size of the each one of the at least two input feature maps (Par 56, Then, since the channel size is different among layers, the normalized feature map from each layer needed to be re-weighted so that their values are at the same scale [to match the channel size]. Par 44, In both MS-RPN and CMS-CNN, concatenation of feature maps is done using normalization functions, such as the L2 norm function, because the feature maps from different layers have generally different properties in terms of numbers of channels, scale of value, and norm of feature map pixels.)
Savvides does not specifically teach, but Ting teaches, resizing intermediate feature maps generated from the remaining input feature maps [to match the channel size of the each one of the at least two input feature maps]; and (Par 24, (ii) applying multiple resizing operations to the primary feature maps to generate multiple resized feature maps, each resizing operation having a different size parameter; and (iii) applying a set of convolution filters to the multiple resized feature maps to generate secondary feature maps; (b) combine the secondary feature maps to generate a feature vector, respective elements of the feature vector corresponding to respective convolutional channels of the secondary feature maps; and (c) generate the class probabilities for the plurality of 2D input images from the feature vector. Par 66, Each feature map may be a two-dimensional array of float numbers. The feature map may be grouped into a number of convolutional channels, and each channel of the feature map may index one or multiple feature maps. Par 68, Each set of affine transformations may have the same number of transformations as the number of the convolutional channels of the input feature map, and each affine transformation may correspond to a channel. A feature reweighting component may … transform each channel of the input feature maps with the corresponding parameters of the affine transformations, and then assemble all channels to get the reweighted feature maps.)
combining the intermediate feature map and the resized intermediate feature maps to generate the output feature map. (Par 24-25 and 37-38, (b) combine the secondary feature maps to generate a feature vector, respective elements of the feature vector corresponding to respective convolutional channels of the secondary feature maps… combine the secondary feature maps by: determining the top k values across the secondary feature maps for each convolutional channel,)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add resizing intermediate feature maps generated from the remaining input feature maps; and combining the intermediate feature map and the resized intermediate feature maps to generate the output feature map, as conceptually seen from the teaching of Ting, into that of Savvides because this modification can help enrich feature maps with multiple scales to improve the detection and prediction with the neural network.

As per Claim 15, Savvides further teaches of the computer-readable medium of claim 14, wherein the at least two input feature maps comprises between 2 and 32 input feature maps. (Par 5, create a corresponding series of feature maps of differing scales [channel sizes]; … concatenating the series of normalized feature maps together with one another to create a concatenated feature map [output feature map]; Par 56, Then, since the channel size is different among layers, the normalized feature map from each layer needed to be re-weighted so that their values are at the same scale. After that, the feature maps are concatenated to one single feature map tensor. This modification helps to stabilize the system and increase the accuracy. The channel size of the concatenated feature map is then shrunk to fit right in the original architecture for the downstream fully-connected layers.)

As per Claim 19, Savvides does not specifically teach; however, Ting teaches of the computer-readable medium of claim 14, wherein the performance further comprises: generating an output feature map by combining the output feature maps or selecting one of the output feature maps. (Par 24-25 and 37-38, (b) combine the secondary feature maps to generate a feature vector, respective elements of the feature vector corresponding to respective convolutional channels of the secondary feature maps… combine the secondary feature maps by: determining the top k values across the secondary feature maps for each convolutional channel)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add generating an output feature map by combining the output feature maps or selecting one of the output feature maps, as conceptually seen from the teaching of Ting, into that of Savvides because this modification can help enrich feature maps with multiple scales to improve the detection and prediction with the neural network.

Re Claim 20, it is a method claim having limitations similar to those of claim 14. Thus, claim 20 is also rejected under the same rationale as cited in the rejection of claim 14.

Re Claim 21, it is a method claim having limitations similar to those of claim 15. Thus, claim 21 is also rejected under the same rationale as cited in the rejection of claim 15.

Re Claim 25, it is a method claim having limitations similar to those of claim 19. Thus, claim 25 is also rejected under the same rationale as cited in the rejection of claim 19.


13. Claims 16 and 22 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Savvides (US PGPub 20180068198), in view of Ting (US PGPub 20220157048), and further in view of Lazarus (US PGPub 20200058106).

As per Claim 16, neither Savvides nor Ting specifically teaches; however, Lazarus teaches of the computer-readable medium of claim 14, wherein the differing channel sizes differ by powers of four or more. (Par 71, At each down-sampling step the number of feature channels is doubled from 64 to 128 to 256. The number of feature channels is also doubled from 256 to 512 at the bottom layer.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add the differing channel sizes differ by powers of four or more, as conceptually seen from the teaching of Lazarus, into that of Savvides and Ting because this modification can help enrich feature maps with multiple scales to improve the detection and prediction with the neural network.

Re Claim 22, it is a method claim having limitations similar to those of claim 16. Thus, claim 22 is also rejected under the same rationale as cited in the rejection of claim 16.

14. Claims 17 and 23 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Savvides (US PGPub 20180068198), in view of Ting (US PGPub 20220157048), and further in view of Kim (US PGPub 20190138892).

As per Claim 17, neither Savvides nor Ting specifically teaches; however, Kim teaches of the computer-readable medium of claim 14, wherein the performance further comprises: obtaining an initial feature map; and generating the at least two input feature maps using the initial feature map. (Par 194, The neural network device may split the input feature map 1810 into sub feature maps 1811 when the number of pixel data in the input feature map 1810 is greater than the number of input values in the core 1800, i.e. the number of columns M. In an example, the neural network device may split the input feature map 1810 into the sub feature maps 1811 based on size information of the core 1800.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add obtaining an initial feature map; and generating the at least two input feature maps using the initial feature map, as conceptually seen from the teaching of Kim, into that of Savvides and Ting because this modification can help enrich feature maps with multiple scales to improve the detection and prediction with the neural network.

Re Claim 23, it is a method claim having limitations similar to those of claim 17. Thus, claim 23 is also rejected under the same rationale as cited in the rejection of claim 17.

15. Claims 18 and 24 are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Savvides (US PGPub 20180068198), in view of Ting (US PGPub 20220157048), and further in view of Oh (US PGPub 20100239163).

As per Claim 18, neither Savvides nor Ting specifically teaches; however, Oh teaches of the computer-readable medium of claim 14, wherein the resizing comprises at least one of convolution, max pooling, averaging pooling, deconvolution, unpooling, or interpolation. (Par 24, Next, the prepared black-and-white image is resized to a preset size by bi-cubic interpolation. To make a resized color image, the input image is then divided into three channel (Red, Green, and Blue) images and the bi-cubic interpolation is applied to each of the channel images to resize them to the preset size. After that, the resized channel images are matched with each other again to make a resized color image.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add the resizing comprises at least one of convolution, max pooling, averaging pooling, deconvolution, unpooling, or interpolation, as conceptually seen from the teaching of Oh, into that of Savvides and Ting because this modification can help enrich feature maps with multiple scales to improve the detection and prediction with the neural network.

Re Claim 24, it is a method claim having limitations similar to those of claim 18. Thus, claim 24 is also rejected under the same rationale as cited in the rejection of claim 18.



Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAE UK JEON whose telephone number is (571)270-3649. The examiner can normally be reached 9am-6pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat Do, can be reached at 571-272-3721. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JAE U JEON/Primary Examiner, Art Unit 2193