DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed August 30th, 2022 has been entered. Claims 20-27 and 29-35 remain pending in the application. Applicant’s amendments to the claims have overcome each and every objection and 112(b) rejection previously set forth in the Non-Final Office Action mailed May 25th, 2022.

Response to Arguments
Applicant's arguments filed 08/30/2022 have been fully considered but they are not persuasive. 
On page 10 of the Amendment, Applicant contends that Kanazawa fails to disclose or suggest “resizing the convolution results to the size of the convolutional layer input” as recited in claim 20. In support of this assertion, Applicant argues that although Kanazawa explains the use of normalizing, Kanazawa normalizes so that the three scales are the same size but does not explain that the three scales are resized to the size of the input. The Examiner respectfully disagrees with this characterization of Kanazawa and submits that the reference does indeed disclose the limitation in question. Kanazawa, “Locally Scale-Invariant Convolutional Neural Networks”, teaches that the input layer is scaled into three different sizes, as shown in Fig. 1. Scaling means reducing or increasing in size according to a common scale. In Fig. 1, convolution is applied to the three scaled layers, and in step 3 the scaling is undone (3. Undo Scaling). To undo means to cancel or reverse the effects or results of an action; undoing the scaling therefore means returning the layer to its original size before the scaling was done, which means it is resized to the convolutional layer input size. Kanazawa describes this undoing as normalizing, as seen in Fig. 1. To normalize is defined as to bring or return to a normal or standard condition, and the normal size of the layer was the original size before it was scaled in step 1 of Fig. 1. For example, in Fig. 2, the input was scaled by multiplying by 2, and to invert or undo the scaling the layer was multiplied by ½. Thus, Kanazawa does indeed disclose the claimed “resizing the convolution results to the size of the convolutional layer input”.
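For illustration only, the scale-then-undo sequence discussed above with respect to Fig. 2 of Kanazawa may be sketched as follows. This is a minimal sketch and is not taken from Kanazawa: the helper `resize_nearest`, the 3x4 input, and the use of nearest-neighbour interpolation are assumptions made for the example, and the convolution performed between the two steps is omitted.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array to (out_h, out_w) (hypothetical helper)."""
    rows = np.arange(out_h) * x.shape[0] // out_h
    cols = np.arange(out_w) * x.shape[1] // out_w
    return x[rows[:, None], cols]

# Hypothetical 3x4 layer input, chosen only for the example.
layer_input = np.arange(12, dtype=float).reshape(3, 4)
h, w = layer_input.shape

# Step 1: scale the input by a factor of 2.
scaled = resize_nearest(layer_input, 2 * h, 2 * w)

# Step 3: undo the scaling by applying the inverse factor of 1/2.
restored = resize_nearest(scaled, scaled.shape[0] // 2, scaled.shape[1] // 2)

print(scaled.shape)    # (6, 8)
print(restored.shape)  # (3, 4)
```

Under these assumptions, the result of the undo step has the same size as the layer input.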
On page 10 of the Amendment, Applicant contends that there is a lack of motivation to combine Farabet and Kanazawa because Farabet resizes and concatenates while Kanazawa normalizes and performs max pooling to provide scale invariance. Applicant argues that a person of ordinary skill in the art would recognize that these two approaches cannot be combined as the Examiner suggests, as the combination would impermissibly change the principle of operation of each. The Examiner respectfully disagrees with Applicant’s argument regarding the lack of motivation to combine. Kanazawa’s teaching of normalizing is a form of resizing the layer. In Fig. 1 of Kanazawa, normalizing returns the layer to its original size before the scaling was done. Kanazawa’s max pooling would not change the principle of operation of Farabet because Farabet also performs max pooling, as explained in Section 1.2.3 of Farabet, as well as resizing the layer. Kanazawa teaches resizing the layer to the size of the convolutional layer input in Fig. 1 and Fig. 2. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Farabet to incorporate the teachings of Kanazawa of resizing the convolution results to the size of the layer input. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to “align the feature maps” and to summarize the responses in a concise way that maintains the same output size as a standard convolution layer (Kanazawa, Section 3.1, para. 1).
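For illustration only, the two ways of aggregating per-scale responses discussed above (concatenation, as in Farabet, and max pooling over scales, as in Kanazawa) may be sketched as follows once the responses share a common size. This is a minimal sketch under stated assumptions: the three random 4x4 responses and the variable names are hypothetical, each response is treated as a single-channel map so that stacking along a channel axis stands in for concatenation, and neither reference's actual implementation is reproduced.

```python
import numpy as np

# Three hypothetical per-scale responses, assumed already resized to a common 4x4 size.
rng = np.random.default_rng(0)
responses = [rng.standard_normal((4, 4)) for _ in range(3)]

# Concatenation-style aggregation: keep every scale as its own channel.
concatenated = np.stack(responses, axis=0)

# Max-pooling-over-scales aggregation: element-wise maximum across the scale axis.
max_over_scales = concatenated.max(axis=0)

print(concatenated.shape)     # (3, 4, 4)
print(max_over_scales.shape)  # (4, 4)
```

In this sketch both forms of aggregation operate on the same resized responses and differ only in how the scale axis is reduced.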


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 20-24, 27, 29-32, and 35 are rejected under 35 U.S.C. 103 as being unpatentable over Farabet in view of Kanazawa et al., "Locally Scale-Invariant Convolutional Neural Networks" (December 2014, previously cited by applicant in IDS), hereinafter referred to as Kanazawa.

Regarding claim 20, Farabet teaches a method (Abstract, “method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel”), comprising: 
- resizing a convolutional layer input (Fig. 1, input image) of a convolutional artificial neural network (Fig. 1) with at least two different scales to obtain multiple groups of intermediate features maps (Fig. 1, “The raw input image is transformed through a Laplacian pyramid. Each scale is fed to a three-stage ConvNet, which produces a set of feature maps.”; as seen in Fig. 1, the raw input image is resized to different scales and multiple groups of intermediate feature maps are output);
- convolving the intermediate feature maps with a filter (Section 3.1, para. 3, “The filters (convolution kernels) are subject to training. Each filter is applied to the input feature maps through a two-dimensional convolution operation which detects local features at all locations on the input”);
- resizing the convolution results (Fig. 1, “the coarser scale maps being upsampled to match the size of the finest scale map”, as seen in Fig. 1, the coarser feature maps which are the convolution results are upsampled to the size of the finest feature map, upsampling is a way of resizing an image); and
- concatenating the resized convolution results to form an output of the convolutional layer (Fig. 1, “The feature maps of all scales are concatenated, the coarser scale maps being upsampled to match the size of the finest scale map”, Section 3.1, para. 10, “the outputs of the N networks are upsampled and concatenated so as to produce F, a map of feature vectors of size N times the size of f1”).  

Farabet does not explicitly teach resizing the convolution results to the size of the convolutional layer input.
However, Kanazawa teaches resizing the convolution results to the size of the convolutional layer input (Kanazawa, Fig. 1, “the responses of the convolution in each scale are normalized”; the input layer is resized to different scales, which Farabet also teaches, and after the convolution there is an undo-scaling step which resizes the results to the size of the input image. To normalize is to bring or return to a normal or standard condition, and the normal size of the layer was the original size before it was scaled in step 1 of Fig. 1. For example, in Fig. 2, the input was scaled by multiplying by 2, and to invert or undo the scaling the layer was multiplied by ½. Therefore, the Undo Scaling step in Fig. 1 returns the layer to the size it had before it was scaled to a different size).
Farabet and Kanazawa are both considered to be analogous to the claimed invention because they are in the same field of image processing using convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Farabet to incorporate the teachings of Kanazawa of resizing the convolution results to the size of the convolutional layer input. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to “align the feature maps” and to summarize the responses in a concise way that maintains the same output size as a standard convolution layer (Kanazawa, Section 3.1, para. 1).
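For illustration only, the four steps addressed above with respect to claim 20 (resizing the layer input to several scales, convolving, resizing the results to the size of the layer input, and concatenating) may be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Farabet, Kanazawa, or the application: the helpers `resize_nearest`, `conv2d_same`, and `multiscale_layer`, the 3x3 averaging kernel shared across scales, the scale factors, and the nearest-neighbour interpolation are all hypothetical choices made for the example.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array to (out_h, out_w) (hypothetical helper)."""
    rows = np.arange(out_h) * x.shape[0] // out_h
    cols = np.arange(out_w) * x.shape[1] // out_w
    return x[rows[:, None], cols]

def conv2d_same(x, k):
    """Same-size 2-D cross-correlation with zero padding; assumes odd kernel sizes."""
    kh, kw = k.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out

def multiscale_layer(x, kernel, scales=(1.0, 0.5)):
    """Resize input, convolve at each scale, resize back to input size, concatenate."""
    h, w = x.shape
    outputs = []
    for s in scales:
        sh, sw = max(1, int(round(h * s))), max(1, int(round(w * s)))
        scaled = resize_nearest(x, sh, sw)               # resize the layer input
        response = conv2d_same(scaled, kernel)           # convolve with the filter
        outputs.append(resize_nearest(response, h, w))   # resize to the input size
    return np.stack(outputs, axis=0)                     # one channel per scale

x = np.arange(64, dtype=float).reshape(8, 8)   # hypothetical 8x8 layer input
kernel = np.ones((3, 3)) / 9.0                 # hypothetical shared 3x3 filter
out = multiscale_layer(x, kernel, scales=(1.0, 0.5, 2.0))
print(out.shape)  # (3, 8, 8)
```

In this sketch every per-scale result is returned to the 8x8 size of the layer input before concatenation, so the output has one 8x8 channel per scale.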

Regarding claim 21, the combination of Farabet in view of Kanazawa teaches the method of claim 20 (Farabet, Abstract, “method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel”), wherein the convolutional layer is a layer pyramid (Farabet, Fig.1, pyramid layers, “raw input image is transformed through a Laplacian pyramid. Each scale is fed to a three-stage ConvNet, which produces a set of feature maps”) comprising the multiple groups of intermediate feature maps of different scales (Farabet, Fig. 1, “Each scale is fed to a three-stage ConvNet, which produces a set of feature maps”), 
a series of layer pyramids is cascaded (Farabet, Section 3.1, para. 3, “Our feature extractor is a three-stage ConvNet”, Section 3.1, para. 2, “A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module”, Fig. 1 shows the cascaded layers), and 
a classification decision is prepared upon result of the last pyramid layer in the series (Farabet, Section 3.1, para. 3, “Our feature extractor is a three-stage ConvNet”, Section 3.1, para. 2, “A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module”).  

Regarding claim 22, the combination of Farabet in view of Kanazawa teaches the method of claim 20 (Farabet, Abstract, “method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel”), wherein the intermediate feature maps (Farabet, Fig. 1, “Each scale is fed to a three-stage ConvNet, which produces a set of feature maps”) are convolved with a single size filter (Farabet, Section 3, para. 1, “This multiscale model in which weights are shared across scales allows the model to capture long-range interactions without the penalty of extra parameters to train.”, Fig. 10, “The 16 filters obtained when sharing weights across all three scales. All the filters are 7x7”).

Regarding claim 23, the combination of Farabet in view of Kanazawa teaches the method of claim 20 (Farabet, Abstract, “method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel”), wherein the convolutional artificial neural network (Farabet, Fig. 1) is constructed and trained (Farabet, Section 4.3.3, Training Procedure) by:
- receiving a set of training data (Farabet, Section 4.3.3, “Let F be the set of all feature maps in the training set and T the set of all families of segmentations”), 
- selecting a number of multi-scale convolutional layers (Farabet, Fig. 1, Section 5.1, “, the pyramid consists of three rescaled versions of the input (N = 3), in octaves: 320 x 240, 160 x 120, 80 x 60”), 
- defining the different scales (Farabet, Section 5.1, “, the pyramid consists of three rescaled versions of the input (N = 3), in octaves: 320 x 240, 160 x 120, 80 x 60”), 
- forming each multi-scale convolutional layer by convolving the multiple groups of intermediate features maps at different scales (Farabet, Fig. 1, multi-scale convolutional layer, “The raw input image is transformed through a Laplacian pyramid. Each scale is fed to a three-stage ConvNet, which produces a set of feature maps.”) and concatenating the resized convolution results (Farabet, Fig. 1, “The feature maps of all scales are concatenated, the coarser scale maps being upsampled to match the size of the finest scale map”, Section 3.1, para. 10, “the outputs of the N networks are upsampled and concatenated so as to produce F, a map of feature vectors of size N times the size of f1”, Kanazawa teaches resizing the image to the size of the input layer), 
- constructing the convolutional artificial neural network (Farabet, Fig. 1) by cascading a series of the layers (Farabet, Section 3.1, para. 3, “Our feature extractor is a three-stage ConvNet”, Section 3.1, para. 2, “A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module”, Fig. 1 shows the cascaded layers), and 
- training the filters in each layer pyramid (Farabet, Section 3.1, para. 3, “The filters (convolution kernels) are subject to training”) by applying a backpropagation method (Kanazawa, Section 2.1, “The final layer is a classifier or regressor with a cost function such that the network can be trained in a supervised manner. The entire network is optimized jointly via stochastic gradient descent with gradients computed by backpropagation”, “The local feature detectors are trainable filters (kernels)”; part of the training is training the filters or kernels, which are optimized by backpropagation).
Farabet and Kanazawa are both considered to be analogous to the claimed invention because they are in the same field of image processing using convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Farabet to incorporate the teachings of Kanazawa of training the filters in each layer pyramid by applying a backpropagation method. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to optimize the entire network (Kanazawa, Section 2.1, para. 1).

Regarding claim 24, the combination of Farabet in view of Kanazawa teaches the method of claim 23 (Farabet, Abstract, “method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel”), wherein the constructed and trained neural network (Farabet, Section 4.3.3, Training Procedure, Fig. 1, convolutional neural network) is tested (Section 5, “We use the evaluation procedure introduced in [15], 5-fold cross validation: 572 images used for training, and 143 for testing”, 143 images are used to test the convolutional neural network) by computing (Section 5, “To evaluate the representation from our multiscale ConvNet, we report results from several experiments on the Stanford Background dataset, the multiscale ConvNet presented in Section 3.1, with raw pixelwise prediction”, the classification accuracy can be seen in table 3) the series of convolutional layers (Farabet, Section 3.1, para. 3, “Our feature extractor is a three-stage ConvNet”, Section 3.1, para. 2, “A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module”, Fig. 1 shows the cascaded layers) for a test image (Section 5, “We use the evaluation procedure introduced in [15], 5-fold cross validation: 572 images used for training, and 143 for testing”, 143 images are used to test the convolutional neural network) and making a classification decision upon result of the last convolutional layer in the series (Farabet, Section 3.1, para. 3, “Our feature extractor is a three-stage ConvNet”, Section 3.1, para. 2, “A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module”).  

Regarding claim 27, Farabet teaches an apparatus (Section 5, “Intel i7 laptop”) comprising: 
- memory (Section 5, “Intel i7 laptop”, the Intel i7 laptop has memory) configured to store data defining (Section 5, “computing convolutions in parallel allows us to parse an image of size 320 x 240 in less than 1 second on a 4-core Intel i7 laptop”, the convolution is done in the laptop), at least partly, a convolutional artificial neural network (Fig. 1, convolutional neural network), and 
- at least one processing core (Section 5, “GPUs or other types of dedicated hardware”);
- at least one memory comprising program code which when executed configures the apparatus to at least:
- resize a convolutional layer input (Fig. 1, input image) of a convolutional artificial neural network (Fig. 1) with at least two different scales to obtain multiple groups of intermediate features maps (Fig. 1, “The raw input image is transformed through a Laplacian pyramid. Each scale is fed to a three-stage ConvNet, which produces a set of feature maps.”; as seen in Fig. 1, the raw input image is resized to different scales and multiple groups of intermediate feature maps are output);
- convolve the intermediate feature maps with a filter (Section 3.1, para. 3, “The filters (convolution kernels) are subject to training. Each filter is applied to the input feature maps through a two-dimensional convolution operation which detects local features at all locations on the input”);
- resize the convolution results (Fig. 1, “the coarser scale maps being upsampled to match the size of the finest scale map”, as seen in Fig. 1, the coarser feature maps which are the convolution results are upsampled to the size of the finest feature map, upsampling is a way of resizing an image); and
- concatenate the resized convolution results to form an output of the convolutional layer (Fig. 1, “The feature maps of all scales are concatenated, the coarser scale maps being upsampled to match the size of the finest scale map”, Section 3.1, para. 10, “the outputs of the N networks are upsampled and concatenated so as to produce F, a map of feature vectors of size N times the size of f1”).  

Farabet does not explicitly teach resizing the convolution results to the size of the convolutional layer input.
However, Kanazawa teaches resizing the convolution results to the size of the convolutional layer input (Kanazawa, Fig. 1, “the responses of the convolution in each scale are normalized”; the input layer is resized to different scales, which Farabet also teaches, and after the convolution there is an undo-scaling step which resizes the results to the size of the input image. To normalize is to bring or return to a normal or standard condition, and the normal size of the layer was the original size before it was scaled in step 1 of Fig. 1. For example, in Fig. 2, the input was scaled by multiplying by 2, and to invert or undo the scaling the layer was multiplied by ½. Therefore, the Undo Scaling step in Fig. 1 returns the layer to the size it had before it was scaled to a different size).
Farabet and Kanazawa are both considered to be analogous to the claimed invention because they are in the same field of image processing using convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus as taught by Farabet to incorporate the teachings of Kanazawa of resizing the convolution results to the size of the convolutional layer input. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to “align the feature maps” and to summarize the responses in a concise way that maintains the same output size as a standard convolution layer (Kanazawa, Section 3.1, para. 1).

Regarding claim 29, the combination of Farabet in view of Kanazawa teaches the apparatus of claim 27 (Section 5, “Intel i7 laptop”), wherein the convolutional layer is a layer pyramid (Fig.1, pyramid layers, “raw input image is transformed through a Laplacian pyramid. Each scale is fed to a three-stage ConvNet, which produces a set of feature maps”) comprising the multiple groups of intermediate feature maps of different scales (Fig. 1, “Each scale is fed to a three-stage ConvNet, which produces a set of feature maps”), and the processing core (Section 5, “GPUs or other types of dedicated hardware”) is configured to cascade a series of layer pyramids (Section 3.1, para. 3, “Our feature extractor is a three-stage ConvNet”, Section 3.1, para. 2, “A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module”, Fig. 1 shows the cascaded layers) and prepare a classification decision upon result of the last pyramid layer in the series (Farabet, Section 3.1, para. 3, “Our feature extractor is a three-stage ConvNet”, Section 3.1, para. 2, “A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module”).  

Regarding claim 30, the combination of Farabet in view of Kanazawa teaches the apparatus of claim 27 (Section 5, “Intel i7 laptop”), wherein the filter is of a single size (Section 3, para. 1, “This multiscale model in which weights are shared across scales allows the model to capture long-range interactions without the penalty of extra parameters to train.”, Fig. 10, “The 16 filters obtained when sharing weights across all three scales. All the filters are 7x7”). 

Regarding claim 31, the combination of Farabet in view of Kanazawa teaches the apparatus of claim 27 (Section 5, “Intel i7 laptop”), wherein the at least one processing core (Section 5, “GPUs or other types of dedicated hardware”) is configured to construct and train (Section 4.3.3, Training Procedure) the convolutional artificial neural network (Fig. 1) by at least:
- receiving a set of training data (Farabet, Section 4.3.3, “Let F be the set of all feature maps in the training set and T the set of all families of segmentations”), 
- selecting a number of multi-scale convolutional layers (Farabet, Fig. 1, Section 5.1, “, the pyramid consists of three rescaled versions of the input (N = 3), in octaves: 320 x 240, 160 x 120, 80 x 60”), 
- defining the different scales (Farabet, Section 5.1, “, the pyramid consists of three rescaled versions of the input (N = 3), in octaves: 320 x 240, 160 x 120, 80 x 60”), 
- forming each multi-scale convolutional layer by convolving the multiple groups of intermediate features maps at different scales (Farabet, Fig. 1, multi-scale convolutional layer, “The raw input image is transformed through a Laplacian pyramid. Each scale is fed to a three-stage ConvNet, which produces a set of feature maps.”) and concatenating the resized convolution results (Farabet, Fig. 1, “The feature maps of all scales are concatenated, the coarser scale maps being upsampled to match the size of the finest scale map”, Section 3.1, para. 10, “the outputs of the N networks are upsampled and concatenated so as to produce F, a map of feature vectors of size N times the size of f1”), 
- constructing the convolutional artificial neural network (Farabet, Fig. 1) by cascading a series of the layers (Farabet, Section 3.1, para. 3, “Our feature extractor is a three-stage ConvNet”, Section 3.1, para. 2, “A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module”, Fig. 1 shows the cascaded layers).

Farabet does not explicitly teach training the filters in each layer pyramid by applying a backpropagation method.
However, Kanazawa teaches training the filters in each layer pyramid (Farabet, Section 3.1, para. 3, “The filters (convolution kernels) are subject to training”) by applying a backpropagation method (Kanazawa, Section 2.1, “The final layer is a classifier or regressor with a cost function such that the network can be trained in a supervised manner. The entire network is optimized jointly via stochastic gradient descent with gradients computed by backpropagation”, “The local feature detectors are trainable filters (kernels)”; part of the training is training the filters or kernels, which are optimized by backpropagation).
Farabet and Kanazawa are both considered to be analogous to the claimed invention because they are in the same field of image processing using convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus as taught by Farabet to incorporate the teachings of Kanazawa of training the filters in each layer pyramid by applying a backpropagation method. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to optimize the entire network (Kanazawa, Section 2.1, para. 1).

Regarding claim 32, the combination of Farabet in view of Kanazawa teaches the apparatus of claim 31 (Farabet, Section 5, “Intel i7 laptop”), wherein the at least one processing core (Farabet, Section 5, “GPUs or other types of dedicated hardware”) is configured to test (Farabet, Section 4.3.3, Training Procedure, Fig. 1, convolutional neural network)  the constructed and trained neural network (Fig. 1) by computing  (Section 5, “To evaluate the representation from our multiscale ConvNet, we report results from several experiments on the Stanford Background dataset, the multiscale ConvNet presented in Section 3.1, with raw pixelwise prediction”, the classification accuracy can be seen in table 3) the series of convolutional layers (Farabet, Section 3.1, para. 3, “Our feature extractor is a three-stage ConvNet”, Section 3.1, para. 2, “A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module”, Fig. 1 shows the cascaded layers) for a test image (Section 5, “We use the evaluation procedure introduced in [15], 5-fold cross validation: 572 images used for training, and 143 for testing”, 143 images are used to test the convolutional neural network) and making a classification decision upon result of the last convolutional layer in the series (Farabet, Section 3.1, para. 3, “Our feature extractor is a three-stage ConvNet”, Section 3.1, para. 2, “A typical ConvNet is composed of one, two, or three such three-layer stages, followed by a classification module”).  

Regarding claim 35, Farabet teaches a non-transitory computer readable medium (Section 5, “Intel i7 laptop”, the Intel i7 laptop has memory) having stored (Section 5, “computing convolutions in parallel allows us to parse an image of size 320 x 240 in less than 1 second on a 4-core Intel i7 laptop”, the convolution is done in the laptop so the neural network is stored in the laptop) thereon a set of computer readable instructions (Fig. 1, convolutional neural network) that, when executed by at least one processor (Section 5, “GPUs or other types of dedicated hardware”), cause an apparatus (Section 5, “Intel i7 laptop”) to at least:
- resize a convolutional layer input (Fig. 1, input image) of a convolutional artificial neural network (Fig. 1) with at least two different scales to obtain multiple groups of intermediate features maps (Fig. 1, “The raw input image is transformed through a Laplacian pyramid. Each scale is fed to a three-stage ConvNet, which produces a set of feature maps.”; as seen in Fig. 1, the raw input image is resized to different scales and multiple groups of intermediate feature maps are output),
- convolve the intermediate feature maps with a filter (Section 3.1, para. 3, “The filters (convolution kernels) are subject to training. Each filter is applied to the input feature maps through a two-dimensional convolution operation which detects local features at all locations on the input”), 
- resize the convolution results (Fig. 1, “the coarser scale maps being upsampled to match the size of the finest scale map”, as seen in Fig. 1, the coarser feature maps which are the convolution results are upsampled to the size of the finest feature map, upsampling is a way of resizing an image) and 
- concatenate the resized convolution results to form an output of the convolutional layer (Fig. 1, “The feature maps of all scales are concatenated, the coarser scale maps being upsampled to match the size of the finest scale map”, Section 3.1, para. 10, “the outputs of the N networks are upsampled and concatenated so as to produce F, a map of feature vectors of size N times the size of f1”).

Farabet does not explicitly teach resizing the convolution results to the size of the convolutional layer input.
However, Kanazawa teaches resizing the convolution results to the size of the convolutional layer input (Kanazawa, Fig. 1, “the responses of the convolution in each scale are normalized”; the input layer is resized to different scales, which Farabet also teaches, and after the convolution there is an undo-scaling step which resizes the results to the size of the input image. To normalize is to bring or return to a normal or standard condition, and the normal size of the layer was the original size before it was scaled in step 1 of Fig. 1. For example, in Fig. 2, the input was scaled by multiplying by 2, and to invert or undo the scaling the layer was multiplied by ½. Therefore, the Undo Scaling step in Fig. 1 returns the layer to the size it had before it was scaled to a different size).
Farabet and Kanazawa are both considered to be analogous to the claimed invention because they are in the same field of image processing using convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the non-transitory computer readable medium as taught by Farabet to incorporate the teachings of Kanazawa to resize the convolution results to the size of the convolutional layer input. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to “align the feature maps” and to summarize the responses in a concise way that maintains the same output size as a standard convolution layer (Kanazawa, Section 3.1, para. 1).

Claims 25-26 are rejected under 35 U.S.C. 103 as being unpatentable over Farabet in view of Kanazawa and in further view of Alvarez et al. "Semantic Road Segmentation via Multi-scale Ensembles of Learned Features" (2012), hereinafter referred to as Alvarez.

Regarding claim 25, the combination of Farabet in view of Kanazawa teaches the method of claim 20 (Farabet, Abstract, “method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel”).
The combination of Farabet in view of Kanazawa does not explicitly teach wherein different weights are applied for the convolution at different scales.
	However, Alvarez teaches wherein different weights are applied for the convolution at different scales (Alvarez, Fig. 2, Alvarez also teaches a multi-scale convolutional neural network, Section 3.1, “More precisely, we focus on class– dependent weighted linear combination were each feature (scale and resolution) for each class receives a different weight.”).
Alvarez is considered to be analogous to the claimed invention because it is in the same field of image processing using convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by the combination of Farabet in view of Kanazawa to incorporate the teachings of Alvarez wherein different weights are applied for the convolution at different scales. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to differentiate the importance of the features, as Alvarez discloses that “the larger the weight, the more important the feature” (Alvarez, Section 4.1, para. 3).

Regarding claim 26, the combination of Farabet in view of Kanazawa teaches the method of claim 20 (Farabet, Abstract, “method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel”).

The combination of Farabet in view of Kanazawa does not explicitly teach downsampling the output of the convolution layer to form a subsequent convolutional layer input, and constructing the subsequent convolutional layer starting from the downsampled output.  
	However, Alvarez teaches downsampling the output of the convolution layer (Alvarez, Fig. 3, which also shows a multi-scale neural network: after the first C-layer, which is a convolutional layer, there is a subsampling layer) to form a subsequent convolutional layer input (Fig. 2, the sub-sampling layer or S-layer is input to the next C-layer or convolutional layer), and constructing the subsequent convolutional layer starting from the downsampled output (Fig. 2, after the subsampling, there is a second C-layer or convolutional layer which starts from the subsampled output).
Alvarez is considered to be analogous to the claimed invention because it is in the same field of image processing using convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by the combination of Farabet in view of Kanazawa to incorporate the teachings of Alvarez by downsampling the output of the convolution layer to form a subsequent convolutional layer input, and constructing the subsequent convolutional layer starting from the downsampled output. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to reduce the computational training overhead (Alvarez, Fig. 3).

Claims 33-34 are rejected under 35 U.S.C. 103 as being unpatentable over Farabet in view of Alvarez et al. "Semantic Road Segmentation via Multi-scale Ensembles of Learned Features" (2012), hereinafter referred to as Alvarez.

Regarding claim 33, Farabet teaches the apparatus of claim 27 (Farabet, Section 5, “Intel i7 laptop”), wherein the at least one processing core (Farabet, Section 5, “GPUs or other types of dedicated hardware”).

Farabet does not explicitly teach to apply different weights for the convolution at different scales.
	However, Alvarez teaches to apply different weights for the convolution at different scales (Alvarez, Fig. 2, which also shows a multi-scale convolutional neural network; Section 3.1, “More precisely, we focus on class-dependent weighted linear combination were each feature (scale and resolution) for each class receives a different weight.”).
Alvarez is considered to be analogous to the claimed invention because it is in the same field of image processing using convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus as taught by Farabet to incorporate the teachings of Alvarez to apply different weights for the convolution at different scales. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to differentiate the importance of the features, as Alvarez discloses that “the larger the weight, the more important the feature” (Alvarez, Section 4.1, para. 3).

Regarding claim 34, Farabet teaches the apparatus of claim 27 (Farabet, Section 5, “Intel i7 laptop”), wherein the at least one processing core (Farabet, Section 5, “GPUs or other types of dedicated hardware”).

Farabet does not explicitly teach to downsample the output of the convolution layer to form a subsequent convolutional layer input, and construct the subsequent convolutional layer starting from the downsampled output.    
	However, Alvarez teaches to downsample the output of the convolution layer (Alvarez, Fig. 3, which also shows a multi-scale neural network: after the first C-layer, which is a convolutional layer, there is a subsampling layer) to form a subsequent convolutional layer input (Fig. 2, the sub-sampling layer or S-layer is input to the next C-layer or convolutional layer), and construct the subsequent convolutional layer starting from the downsampled output (Fig. 2, after the subsampling, there is a second C-layer or convolutional layer which starts from the subsampled output).
Alvarez is considered to be analogous to the claimed invention because it is in the same field of image processing using convolutional neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus as taught by Farabet to incorporate the teachings of Alvarez to downsample the output of the convolution layer to form a subsequent convolutional layer input, and construct the subsequent convolutional layer starting from the downsampled output. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to reduce the computational training overhead (Alvarez, Fig. 3).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENISE G ALFONSO whose telephone number is (571)272-1360. The examiner can normally be reached Monday - Friday 7:30 - 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at 571-270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DENISE G ALFONSO/Examiner, Art Unit 2663
/CLAIRE X WANG/Supervisory Patent Examiner, Art Unit 2663