DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed April 21st, 2022 has been entered. Claims 1-20 remain pending in the application. Applicant’s amendments to the claims have overcome each and every 35 U.S.C. 101 rejection previously set forth in the Non-Final Office Action mailed March 17th, 2022.

Response to Arguments
Applicant's arguments filed April 21st, 2022 have been fully considered. 
Applicant argues, regarding claims 1-20, that the claims are not directed to an abstract idea because, like the claims in Enfish, “they are directed to a specific improvement to the way computers operate...”, Id. at 1336. Applicant cited paragraph [0063] as an example, which can be seen in the screenshot below. Applicant’s argument has been found to be persuasive.

    [Screenshot: media_image1.png]

The concurrency of the processing of the layers of neural network in conjunction with the other recited additional elements reflects the improvement and therefore integrates the abstract idea into a practical application. Therefore, claims 1-20 are not directed to an abstract idea because they are directed to a specific improvement.
Applicant argues, regarding claims 1, 10, and 19, that Abdelouahab and Song do not appear to teach or suggest the recited features of amended claims 1, 10, and 19 (seen in the two screenshots below). Applicant’s argument has been found to be not persuasive.

    [Screenshot: media_image2.png]


    [Screenshot: media_image3.png]


Abdelouahab teaches the above limitations of the amended claims. Abdelouahab teaches a first filter and a second filter that are associated with the same layer of the neural network in Fig. 3.a and Section 3.1: “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch” (Abdelouahab, Section 3.1). As seen in Fig. 3.a, there are a first i-th filter and a second i-th filter that are multiplied with the input FMs. Also, as seen in Fig. 3.a, the stride of the combined filter is greater than that of the single first i-th filter and second i-th filter. Therefore, the teachings of Abdelouahab read on the above limitations of amended claims 1, 10, and 19.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Abdelouahab et al. "Accelerating CNN inference on FPGAs: A Survey" (May, 2018), hereinafter referred to as Abdelouahab (cited previously in IDS).

Regarding claim 1, Abdelouahab teaches a method of pipelining inference (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”) of a neural network (see Fig. 1, CNN) comprising a plurality of layers (see Section 2.1, “CNN structure consists of a pipeline of layers”) comprising an i-th layer (i being an integer greater than zero) (see Fig. 1, there is more than one layer in the CNN, conv layer) and an (i+1)-th layer (see Fig. 1, there is more than one layer in the CNN, act layer after the conv layer), the method comprising: 
processing, for a first input image (see Section 2.3, B input images, B stand for batch size or number of input frames, Fig. 1, input FMs or feature map), first i-th values (see, Fig. 1, input FMs corresponds to the i-th values) of the i-th layer to generate first (i+1)-th values for the (i+1)-th layer (see Fig. 1 and Fig. 3, the input FM or feature map is processed by the conv layer which is the i-th layer, then it outputs a FM to go to act layer which is the (i+1)-th layer); 
processing, for the first input image (see Section 2.3, B input images, Fig. 1, input FMs), by a controller (Fig. 4, controller) using a composite filter (Section 3.1, Fig. 3.a, multiple FC or Fully-Connected weights or filter) comprising a first i-th filter associated with the i-th layer and a second i-th filter associated with the i-th layer (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there are a first i-th filter and a second i-th filter that are multiplied with the input FMs), the first (i+1)-th values (see Fig. 1 and Fig. 3, the conv layer outputs an FM which corresponds to (i+1)-th values) of the (i+1)-th layer to generate output values (see Fig. 1 and Fig. 3, the output FM of conv layer becomes the input FM of act layer which then generates an output FM that is applied to the pool layer), wherein a stride of the composite filter is greater than or equal to a sum of a stride of the first i-th filter and a stride of the second i-th filter (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a, the stride of the combined filter is greater than that of the single first i-th filter and second i-th filter); and
concurrently (see Section 2.4.2, “the extensive concurrency exhibited by CNNs”, inter-layer parallelism) with processing (see Fig. 1 and Fig. 3, the output FM of conv layer becomes the input FM of act layer which then generates an output FM that is applied to the pool layer), for the first input image (see Section 2.3, B input images, Fig. 1, input FMs), the (i+1)-th values (see Fig. 1 and Fig. 3, the conv layer outputs an FM which corresponds to (i+1)-th values),
processing, for a second input image (see Section 2.4.2, “multiple frames grouped as a batch B” so B input images can have first and second input images, Section 2.4.2, inter-layer parallelism, These layers can be executed in a pipelined fashion by launching layer (l) before ending the execution of layer (l − 1), so multiple frames are being processed at the same time), second i-th values of the i-th layer to generate second (i+1)-th values (see Fig. 1 and Fig. 3, shows the CNN layers and process, FM corresponds to i-th values and conv layer corresponds to i-th layer).

Regarding claim 2, Abdelouahab teaches the method of claim 1 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”), wherein the processing, for the second input image, the second i-th values is performed concurrently with the processing, for the first input image, the first i-th values (see Section 2.4.2, batch parallelism, “CNN implementation can simultaneously classify multiple frames grouped as a batch B in order to reuses the filters in each layer and minimize the external memory access”, so the images in batch B which in this case are the first input image and second input image are processed at the same time by reusing the filters in each layer). 
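The batching passage quoted from Section 2.4.2 and Section 3.1 can be illustrated with a minimal sketch. It is not taken from Abdelouahab or from the application; the function names `fc_unbatched` and `fc_batched` and the `loads` counter are hypothetical, and the counter only models how often the weight matrix would be fetched.

```python
def fc_unbatched(weights, inputs):
    """Apply a fully-connected layer one input at a time.

    The (hypothetical) load counter is incremented once per input,
    modelling a weight fetch for every classified frame.
    """
    loads = 0
    outputs = []
    for x in inputs:
        loads += 1  # weights fetched again for this input
        outputs.append([sum(w * v for w, v in zip(row, x)) for row in weights])
    return outputs, loads


def fc_batched(weights, inputs):
    """Apply the same layer to a whole batch with a single weight fetch.

    Each weight row is reused for every input in the batch before the
    next row is touched.
    """
    loads = 1  # weights fetched once for the whole batch
    outputs = [[] for _ in inputs]
    for row in weights:
        for b, x in enumerate(inputs):
            outputs[b].append(sum(w * v for w, v in zip(row, x)))
    return outputs, loads


weights = [[1, 2], [3, 4]]      # 2 outputs, 2 inputs
batch = [[1, 0], [0, 1]]        # two frames in batch B
out_a, loads_a = fc_unbatched(weights, batch)
out_b, loads_b = fc_batched(weights, batch)
```

Both functions return the same outputs; only the modelled number of weight fetches differs (one per frame versus one per batch).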

Regarding claim 3, Abdelouahab teaches the method of claim 1 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”), wherein the first i-th values comprise pixel values of the first input image (see Fig. 1, input FM or feature map comes from the input image which consists of pixel value, input FM corresponds to first i-th values), and wherein the second i-th values comprise pixel values of the second image (see Section 2.4.2, “multiple frames grouped as a batch B”, so B input images can have first and second input images, Fig. 1, second input image has an input feature map which corresponds to second i-th values).

Regarding claim 4, Abdelouahab teaches the method of claim 1 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”), wherein the first i-th values comprise values of a first feature map generated by a previous layer of the neural network, the first feature map corresponding to the first input image (see Fig. 1, input FM, Fig. 3, the CNN layer has an input FM that comes from the input image which then generates an output FM or feature map), and 
wherein the second i-th values (see Section 2.4.2, “multiple frames grouped as a batch B”, so B input images can have first and second input images, Fig. 1, second input image has an input feature map which corresponds to second i-th values) comprise values of a second feature map generated by the previous layer of the neural network, the second feature map corresponding to the second input image (see Fig. 1, input FM, Fig. 3, the CNN layer has an input FM that comes from the input image which then generates an output FM or feature map, see Section 2.4.2, “multiple frames grouped as a batch B” so B input images can have first and second input images, Section 2.4.2, inter-layer parallelism, These layers can be executed in a pipelined fashion by launching layer (l) before ending the execution of layer (l − 1), so multiple frames are being processed at the same time).

Regarding claim 5, Abdelouahab teaches the method of claim 1 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”), wherein the processing, for the first input image (see Section 2.3, input images, Fig. 1, input FMs or feature map), the first i-th values (see, Fig. 1, input FMs corresponds to the i-th values) of the i-th layer (see Fig.1, different CNN layers) comprises:
applying the composite filter (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there is first i-th filter and second i-th filter that is being multiplied to the input FMs) to the first i-th values of the i-th layer to generate the (i+1)-th values for the (i+1)-th layer (see Fig. 3, where the filter is applied to the first i-th values of input FM to generate an output FM or (i+1)-th values, Section 3.1, “a transformation flattens all the filters of a given conv layer onto an N ×CKJ matrix Θ˜ and re-arranges input FMs onto a CKJ ×UV matrix X”).
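The transformation quoted from Section 3.1 (filters flattened onto an N × CKJ matrix, input FMs re-arranged onto a CKJ × UV matrix) can be sketched as follows. This is an illustrative sketch only, assuming no padding and a single stride value; the helper names `flatten_filters`, `rearrange_inputs`, and `matmul` are hypothetical and do not come from the cited reference.

```python
def flatten_filters(filters):
    """filters[N][C][K][J] -> matrix with N rows and C*K*J columns."""
    return [[w for ch in f for row in ch for w in row] for f in filters]


def rearrange_inputs(fm, K, J, stride=1):
    """fm[C][H][W] -> matrix with C*K*J rows and U*V columns.

    U and V are the output height and width (no padding assumed).
    Row order (channel, kernel row, kernel column) matches flatten_filters.
    """
    C, H, W = len(fm), len(fm[0]), len(fm[0][0])
    U = (H - K) // stride + 1
    V = (W - J) // stride + 1
    rows = []
    for c in range(C):
        for k in range(K):
            for j in range(J):
                rows.append([fm[c][u * stride + k][v * stride + j]
                             for u in range(U) for v in range(V)])
    return rows, U, V


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# One 3x3 input channel and one 2x2 filter (values chosen arbitrarily).
fm = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
filters = [[[[1, 0], [0, 1]]]]
X, U, V = rearrange_inputs(fm, K=2, J=2)
Y = matmul(flatten_filters(filters), X)   # N x UV output matrix
```

Each row of `Y` holds one output feature map flattened to length U·V, so the convolution is expressed as a single matrix multiplication.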

Regarding claim 6, Abdelouahab teaches the method of claim 5 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”), wherein the processing, for the second input image (see Section 2.4.2, “multiple frames grouped as a batch B” so B input images can have first and second input images, Section 2.4.2, inter-layer parallelism, These layers can be executed in a pipelined fashion by launching layer (l) before ending the execution of layer (l − 1), so multiple frames are being processed at the same time), the second i-th values of the i-th layer (see Fig. 1 and Fig. 3, shows the CNN layers and process, FM corresponds to i-th values and conv layer corresponds to i-th layer) comprises: 
applying the composite filter (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there is first i-th filter and second i-th filter that is being multiplied to the input FMs) to the second i-th values of the i-th layer to generate the second (i+1)-th values for the (i+1)-th layer (see Fig. 3, where the filter is applied to the first i-th values of input FM to generate an output FM or (i+1)-th values, Section 3.1, “a transformation flattens all the filters of a given conv layer onto an N ×CKJ matrix Θ˜ and re-arranges input FMs onto a CKJ ×UV matrix X”, in accordance to section 2.4.2, batch parallelism, the same process is being done to second input image).

Regarding claim 7, Abdelouahab teaches the method of claim 5 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”), wherein the first i-th filter (see Section 3.1, the filters of a given conv layer are flattened onto an N x CKJ matrix) is a sliding convolutional filter in a form of a p x q matrix (see Fig. 3, the filter is a sliding convolutional filter which is in the form of an N x C matrix, i.e., a p x q matrix), where p and q are integers greater than zero (see Fig. 3, the dimensions of the filters’ N x CKJ matrix are greater than zero).

Regarding claim 8, Abdelouahab teaches the method of claim 5 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”), wherein applying the composite filter (see Section 3.1, Fig. 3, “filters” , Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there is first i-th filter and second i-th filter that is being multiplied to the input FMs) comprises: 
performing a matrix multiplication operation between the composite filter (see Section 3.1, Fig. 3, “filters”, Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there is first i-th filter and second i-th filter that is being multiplied to the input FMs) and ones of the first i-th values overlapping the composite filter (see Section 3.1, Fig. 3, matrix multiplication operation between the weights or filter and the input FM or i-th values of the input image).

Regarding claim 9, Abdelouahab teaches the method of claim 1 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”), wherein the processing, for the second input image (see Section 2.4.2, “multiple frames grouped as a batch B” so B input images can have first and second input images, Section 2.4.2, inter-layer parallelism, These layers can be executed in a pipelined fashion by launching layer (l) before ending the execution of layer (l − 1), so multiple frames are being processed at the same time), second i-th values of the i-th layer (see Fig. 1 and Fig. 3, shows the CNN layers and process, FM corresponds to i-th values and conv layer corresponds to i-th layer) is initiated a time offset after initiation of the processing (see Section 2.4.2, inter-layer parallelism, “layers can be executed in a pipelined fashion by launching layer (l) before ending the execution of layer (l − 1)”, so there is a time offset between the processing of the first input image and the second input image because layer l which processes the second input image is launched before the ending of the execution of layer (l – 1) which processes the first input image), for the first input image (see Section 2.3, B input images, B stands for batch size or number of input frames, Fig. 1, input FMs or feature map), the first i-th values of the i-th layer (see Fig. 1, input FMs correspond to the i-th values), and 
wherein the time offset is greater than or equal to a number of clock cycles (see Fig. 1 and Fig. 3, conv weights corresponds to i-th filter, as explained in Section 2.4.2, inter-layer parallelism, the offset is between the processing of layer l and layer (l – 1) which should use at least one conv weight or filter, so the time offset is greater than or equal to a number of cycles corresponding to a single stride of the conv weight or filter) corresponding to a single stride of the composite filter (see Section 3.1, Fig. 3, “filters” , Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there is first i-th filter and second i-th filter that is being multiplied to the input FMs).
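The time-offset reasoning above can be pictured with a small scheduling sketch. It is a simplification under stated assumptions and does not reproduce any schedule from Abdelouahab or the application: `pipeline_schedule` is a hypothetical helper, each layer is given an assumed fixed cycle count, and the offset between successive images is required to be at least the longest layer latency so that no layer handles two images at once.

```python
def pipeline_schedule(num_images, layer_cycles, offset):
    """Return {(image, layer): (start_cycle, end_cycle)} for a layer pipeline.

    Image m enters the first layer at cycle m * offset and then moves
    through the layers back to back. The offset must be at least the
    longest layer latency so that a layer is never busy with two images.
    """
    if offset < max(layer_cycles):
        raise ValueError("offset must cover the slowest layer")
    schedule = {}
    for img in range(num_images):
        t = img * offset
        for layer, cycles in enumerate(layer_cycles):
            schedule[(img, layer)] = (t, t + cycles)
            t += cycles
    return schedule


# Two images, two layers of 4 cycles each (assumed values), offset of 4 cycles.
sched = pipeline_schedule(num_images=2, layer_cycles=[4, 4], offset=4)
```

In this example the second image occupies the first layer during the same cycles in which the first image occupies the second layer, which is the kind of overlap the inter-layer parallelism passage describes.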

Regarding claim 10, Abdelouahab teaches a system for pipelining inference of a neural network (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”, “CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs”) comprising a plurality of layers (see Section 2.1, “CNN structure consists of a pipeline of layers”) comprising an i-th layer (i being an integer greater than zero) (see Fig. 1, there is more than one layer in the CNN, conv layer), an (i+1)-th layer (see Fig. 1, there is more than one layer in the CNN, act layer after the conv layer), and an (i+2)-th layer (see Fig. 1, there is more than one layer in the CNN, pool layer after the act layer), the system comprising: 
a processor (see Section 3, “computational transforms are mainly deployed in CPUs and GPU”); and 
a processor memory (see Section 2.4.3, Memory accesses in CNNs, “DRAM”) local to the processor (see Section 3, “computational transforms are mainly deployed in CPUs and GPU”), wherein the processor memory has stored thereon instructions that (see Section 2.4.3, Memory accesses in CNNs, “DRAM”), when executed by the processor, cause the processor to perform: 
processing, for a first input image (see Section 2.3, B input images, B stands for batch size or number of input frames, Fig. 1, input FMs or feature map), by a controller (Fig. 4, controller) using a composite filter (Section 3.1, Fig. 3.a, multiple FC or Fully-Connected weights or filter) comprising a first i-th filter associated with the i-th layer and a second i-th filter associated with the i-th layer (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there are a first i-th filter and a second i-th filter that are multiplied with the input FMs), the first (i+1)-th values (see Fig. 1 and Fig. 3, the conv layer outputs an FM which corresponds to (i+1)-th values) of the (i+1)-th layer to generate output values (see Fig. 1 and Fig. 3, the output FM of conv layer becomes the input FM of act layer which then generates an output FM that is applied to the pool layer), wherein a stride of the composite filter is greater than or equal to a sum of a stride of the first i-th filter and a stride of the second i-th filter (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a, the stride of the combined filter is greater than that of the single first i-th filter and second i-th filter); 
processing, for the first input image (see Section 2.3, B input images, Fig. 1, input FMs), the first (i+1)-th values (see, Fig. 1 and Fig. 3, the conv layer outputs an FM which corresponds to (i+1)-th values) of the (i+1)-th layer to generate output values (see Fig. 1 and Fig. 3, the output FM of conv layer becomes the input FM of act layer which then generate an output FM that is applied to the pool layer); and
concurrently (see Section 2.4.2, “the extensive concurrency exhibited by CNNs”, inter-layer parallelism) with processing (see Fig. 1 and Fig. 3, the output FM of conv layer becomes the input FM of act layer which then generate an output FM that is applied to the pool layer), for the first input image (see Section 2.3, B input images, Fig. 1, input FMs), the (i+1)-th values (see, Fig. 1 and Fig. 3, the conv layer outputs an FM which corresponds to (i+1)-th values),
processing, for a second input image (see Section 2.4.2, “multiple frames grouped as a batch B” so B input images can have first and second input images, Section 2.4.2, inter-layer parallelism, These layers can be executed in a pipelined fashion by launching layer (l) before ending the execution of layer (l − 1), so multiple frames are being processed at the same time), second i-th values of the i-th layer to generate second (i+1)-th values (see Fig. 1 and Fig. 3, shows the CNN layers and process, FM corresponds to i-th values and conv layer corresponds to i-th layer).

Regarding claim 11, Abdelouahab teaches the system of claim 10 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”, “CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs”), wherein the processing, for the second input image, the second i-th values is performed concurrently with the processing, for the first input image, the first i-th values (see Section 2.4.2, batch parallelism, “CNN implementation can simultaneously classify multiple frames grouped as a batch B in order to reuses the filters in each layer and minimize the external memory access”, so the images in batch B which in this case are the first input image and second input image are processed at the same time by reusing the filters in each layer).

Regarding claim 12, Abdelouahab teaches the system of claim 10 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”, “CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs”), wherein the first i-th values comprise pixel values of the first input image (see Fig. 1, input FM or feature map comes from the input image which consists of pixel value, input FM corresponds to first i-th values), and wherein the second i-th values comprise pixel values of the second input image (see Section 2.4.2, “multiple frames grouped as a batch B”, so B input images can have first and second input images, Fig. 1, second input image has an input feature map which corresponds to second i-th values).

Regarding claim 13, Abdelouahab teaches the system of claim 10 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”, “CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs”), wherein the first i-th values comprise values of a first feature map generated by a previous layer of the neural network, the first feature map corresponding to the first input image (see Fig. 1, input FM, Fig. 3, the CNN layer has an input FM that comes from the input image which then generates an output FM or feature map), and 
wherein the second i-th values (see Section 2.4.2, “multiple frames grouped as a batch B”, so B input images can have first and second input images, Fig. 1, second input image has an input feature map which corresponds to second i-th values) comprise values of a second feature map generated by the previous layer of the neural network, the second feature map corresponding to the second input image (see Fig. 1, input FM, Fig. 3, the CNN layer has an input FM that comes from the input image which then generates an output FM or feature map, see Section 2.4.2, “multiple frames grouped as a batch B” so B input images can have first and second input images, Section 2.4.2, inter-layer parallelism, These layers can be executed in a pipelined fashion by launching layer (l) before ending the execution of layer (l − 1), so multiple frames are being processed at the same time).

Regarding claim 14, Abdelouahab teaches the system of claim 10 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”, “CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs”), wherein the processing, for the first input image (see Section 2.3, input images, Fig. 1, input FMs or feature map), the first i-th values (see, Fig. 1, input FMs corresponds to the i-th values) of the i-th layer (see Fig.1, different CNN layers) comprises:
applying the composite filter (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there is first i-th filter and second i-th filter that is being multiplied to the input FMs) associated with the i-th layer to the first i-th values of the i-th layer to generate the (i+1)-th values for the (i+1)-th layer (see Fig. 3, where the filter is applied to the first i-th values of input FM to generate an output FM or (i+1)-th values, Section 3.1, “a transformation flattens all the filters of a given conv layer onto an N ×CKJ matrix Θ˜ and re-arranges input FMs onto a CKJ ×UV matrix X”).

Regarding claim 15, Abdelouahab teaches the system of claim 14 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”, “CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs”), wherein the processing, for the second input image (see Section 2.4.2, “multiple frames grouped as a batch B” so B input images can have first and second input images, Section 2.4.2, inter-layer parallelism, These layers can be executed in a pipelined fashion by launching layer (l) before ending the execution of layer (l − 1), so multiple frames are being processed at the same time), the second i-th values of the i-th layer (see Fig. 1 and Fig. 3, shows the CNN layers and process, FM corresponds to i-th values and conv layer corresponds to i-th layer) comprises: 
applying the composite filter (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there is first i-th filter and second i-th filter that is being multiplied to the input FMs) associated with the i-th layer to the second i-th values of the i-th layer to generate the second (i+1)-th values for the (i+1)-th layer (see Fig. 3, where the filter is applied to the first i-th values of input FM to generate an output FM or (i+1)-th values, Section 3.1, “a transformation flattens all the filters of a given conv layer onto an N ×CKJ matrix Θ˜ and re-arranges input FMs onto a CKJ ×UV matrix X”, in accordance to section 2.4.2, batch parallelism, the same process is being done to second input image).

Regarding claim 16, Abdelouahab teaches the system of claim 14 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”, “CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs”), wherein the first i-th filter (see Section 3.1, the filters of a given conv layer are flattened onto an N x CKJ matrix) is a sliding convolutional filter in a form of a p x q matrix (see Fig. 3, the filter is a sliding convolutional filter which is in the form of an N x C matrix, i.e., a p x q matrix), where p and q are integers greater than zero (see Fig. 3, the dimensions of the filters’ N x CKJ matrix are greater than zero).

Regarding claim 17, Abdelouahab teaches the system of claim 14 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”, “CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs”), wherein applying the composite filter (see Section 3.1, Fig. 3, “filters”, Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there is first i-th filter and second i-th filter that is being multiplied to the input FMs) comprises: 
performing a matrix multiplication operation between the composite filter (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”, as seen in Fig. 3.a there is first i-th filter and second i-th filter that is being multiplied to the input FMs) and ones of the first i-th values overlapping the composite filter (see Section 3.1, Fig. 3, matrix multiplication operation between the weights or filter and the input FM or i-th values of the input image).

Regarding claim 18, Abdelouahab teaches the system of claim 10 (see Abstract, “the methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators”, “CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs”), wherein the processing, for the second input image (see Section 2.4.2, “multiple frames grouped as a batch B” so B input images can have first and second input images, Section 2.4.2, inter-layer parallelism, These layers can be executed in a pipelined fashion by launching layer (l) before ending the execution of layer (l − 1), so multiple frames are being processed at the same time), second i-th values of the i-th layer (see Fig. 1 and Fig. 3, shows the CNN layers and process, FM corresponds to i-th values and conv layer corresponds to i-th layer) is initiated a time offset after initiation of the processing (see Section 2.4.2, inter-layer parallelism, “layers can be executed in a pipelined fashion by launching layer (l) before ending the execution of layer (l − 1)”, so there is a time offset between the processing of the first input image and the second input image because layer l which processes the second input image is launched before the ending of the execution of layer (l – 1) which processes the first input image), for the first input image (see Section 2.3, B input images, B stands for batch size or number of input frames, Fig. 1, input FMs or feature map), the first i-th values of the i-th layer (see Fig. 1, input FMs correspond to the i-th values), and 
wherein the time offset is greater than or equal to a number of clock cycles (see Fig. 1 and Fig. 3, conv weights correspond to the i-th filter; as explained in Section 2.4.2, inter-layer parallelism, the offset is between the processing of layer l and layer (l – 1), which should use at least one conv weight or filter, so the time offset is greater than or equal to a number of cycles corresponding to a single stride of the conv weight or filter) corresponding to a single stride of the composite filter (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”; as seen in Fig. 3.a, a first i-th filter and a second i-th filter are multiplied with the input FMs).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Song et al. "PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning" (May, 2017), hereinafter referred to as Song (cited previously in IDS) in view of Abdelouahab. 

Regarding claim 19, Song teaches a configurable processing-in-memory (PIM) system (see Fig. 9, Abstract, “PipeLayer, a ReRAM-based PIM accelerator for CNNs that support both training and testing”) configured to implement a neural network (see Fig. 1), the system comprising: 
a first at least one PIM subarray (see Fig. 9, Section 3, “ReRAM-based main memory into two regions: morphable subarrays (Morp) and memory subarrays (Mem)”, “the morphable subarrays perform matrix-vector multiplications”) configured to perform a filtering operation (see Section 3, “the morphable subarrays perform matrix-vector multiplications”) of a filter (see Section 2.1 and Fig. 1, the filter in the CNN is the kernel or K, “K is the kernel composed of a set of weights. Kl is the kernel used in the computation to generate data in layer l”) of an i-th layer of the neural network (i being an integer greater than zero) (see Fig. 1, Section 2.1, “a convolution layer, a set of kernels are convoluted with data of channels from the previous layer (layer l) to generate data for channels of next layer (layer l+1)”; the set of kernels corresponds to the i-th filter and the convolutional layer corresponds to the i-th layer); 
a second at least one PIM subarray (see Fig. 9, Section 3, “the morphable subarrays perform matrix-vector multiplications”) configured to perform a filtering operation (see Section 3, “the morphable subarrays perform matrix-vector multiplications”) of a (i+1)-th filter of an (i+1)-th layer of the neural network (see Fig. 1, Section 2.1, “a pooling layer”; the set of kernels corresponds to the (i+1)-th filter and the pooling layer corresponds to the (i+1)-th layer); and 
a controller (see Fig. 9, controller) configured to control the first and second at least one PIM subarrays (see Fig. 9 and Section 4.1 and 4.2.2), the controller (see Fig. 9, controller) being configured to perform: 
supplying first i-th values (see Fig. 1, Section 2.1, the i-th values is the “data of channels from the previous layer (layer l) to generate data for channels of next layer (layer l+1)”, Fig. 3, input d0) of the i-th layer (see Fig.1 and Section 2.1, i-th layer is convolutional layer, Fig. 3, cycle T1) to the first at least one PIM subarray (see Fig. 9, Section 3, “the morphable subarrays perform matrix-vector multiplications”) to generate first (i+1)-th values for the (i+1)-th layer (see Fig. 1, Section 2.1, the i-th values is the “data of channels from the previous layer (layer l) to generate data for channels of next layer (layer l+1)”, the (i+1)-th values is generated by convolutional layer which is for the next layer which is the (i+1)-th layer or pool layer, Fig. 3, d1 is the results of cycle T1), the first i-th values corresponding to a first input image (see Fig. 1, Section 2.1, Fig. 3); 
supplying the first (i+1)-th values of the (i+1)-th layer (see Fig. 1, Section 2.1, the i-th values is the “data of channels from the previous layer (layer l) to generate data for channels of next layer (layer l+1)”, the (i+1)-th values is generated by convolutional layer which is for the next layer which is the (i+1)-th layer or pool layer, in Fig. 3, d1 is the (i+1)-th values and T2 is the (i+1)-th layer) to the second at least one PIM subarray (see Fig. 9, Section 3, “the morphable subarrays perform matrix-vector multiplications”) to generate output values associated with the first input image (see Fig. 1 and Section 2.1, the pool layer generates an output data of channels for the next layer); and 
concurrently (see Section 3.3, inter layer parallelism, Fig. 6, (i+1)-th values in this Figure is A2 and  the second i-th values in this Figure is A1, they are processed simultaneously in cycle T1 and T2) with supplying the (i+1)-th values corresponding to the first input image (see Fig. 6, (i+1)-th values in this Figure is A2), 
supplying second i-th values (see Section 3.3, Fig. 6, the second i-th values in this Figure is A1)  of the i-th layer (see Section 3.3, Fig. 6, the second i-th layer in this Figure is cycle T1) to the first at least one PIM subarray (see Fig. 9, Section 3, “the morphable subarrays perform matrix-vector multiplications”) to generate second (i+1)-th values (see Section 2.1 and Fig. 1, it shows how CNN generates an output data or values for the next layer of the CNN), the second i-th values corresponding to a second input image (see Section 2.1 and Fig. 1 and Fig. 6).

Song does not explicitly disclose a composite filter comprising a first i-th filter and a second i-th filter, wherein a stride of the composite filter is greater than or equal to a sum of a stride of the first i-th filter and a stride of the second i-th filter.
	However, Abdelouahab teaches a composite filter (Section 3.1, Fig. 3.a, multiple FC (Fully-Connected) weights or filters) comprising a first i-th filter and a second i-th filter (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”; as seen in Fig. 3.a, a first i-th filter and a second i-th filter are multiplied with the input FMs), wherein a stride of the composite filter is greater than or equal to a sum of a stride of the first i-th filter and a stride of the second i-th filter (Section 3.1, Fig. 3.a, “As mentioned in section 2.4.1, most of the weights of CNNs are employed in the FC parts. Instead of loading these weights multiple times to classify multiple inputs, feature maps of FC layers are batched in a way that FC weights are loaded only one time per batch”; as seen in Fig. 3.a, the stride of the combined filter is greater than that of the single first i-th filter and second i-th filter).
	Song and Abdelouahab are both considered to be analogous to the claimed invention because they are in the same field of pipelined convolutional neural network design. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the configurable processing-in-memory (PIM) system as taught by Song to incorporate the teachings of Abdelouahab, namely a composite filter comprising a first i-th filter and a second i-th filter, wherein a stride of the composite filter is greater than or equal to a sum of a stride of the first i-th filter and a stride of the second i-th filter. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been “to increase the computational throughput in FC (Fully-Connected) layers while maintaining a constant memory bandwidth utilization” (Abdelouahab, Section 3.1).

Regarding claim 20, the combination of Song in view of Abdelouahab teaches the system of claim 19 (Song, see Fig. 9, Abstract, “PipeLayer, a ReRAM-based PIM accelerator for CNNs that support both training and testing”), wherein a PIM subarray of the first and second at least one PIM subarrays (Song, see Fig. 9, Section 3, “the morphable subarrays perform matrix-vector multiplications”) comprises: 
a plurality of bitcells (Song, see Section 3, “PipeLayer architecture directly leverages ReRAM cells to perform computation without the need for extra processing units”), for storing a plurality of weights (Song, see Section 3.2, “weights of one kernel is mapped to cells of one bit line, for example, the blue cuboid is mapped to the blue bar in the array and it is the same case for the red, green and the rest cuboids, resulting in 256 bit lines and 1152 word lines for the ReRAM array. Therefore, the ReRAM array needs to have a size of 1152×256”) corresponding to a respective one of the first i-th or second i-th or (i+1)-th filters (Song, see Section 2.1 and Fig. 1, the filter in the CNN is the kernel or K, “K is the kernel composed of a set of weights. Kl is the kernel used in the computation to generate data in layer l”; therefore the weights correspond to the respective filters; Abdelouahab teaches the first and second i-th filters in one layer).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENISE G ALFONSO whose telephone number is (571)272-1360. The examiner can normally be reached Monday - Friday 7:30 - 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached on 571-270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DENISE G ALFONSO/Examiner, Art Unit 2663            

/CLAIRE X WANG/Supervisory Patent Examiner, Art Unit 2663