Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in reply to the amendments and remarks filed on 06/01/2022.
Claims 1-24 are pending.
Claims 1, 7-8, 16, and 19 have been amended.  

Response to Arguments
Applicant’s arguments with respect to the rejection(s) of claim(s) 1, 8, and 16 under 35 U.S.C. 103 have been considered but are not persuasive. Specifically, the applicant argues at a high level that no art of record teaches the amended limitations of claims 1, 8, and 16, which now recite “wherein data is shifted by one bit for continuous data and shifted by two bits at each of the discontinuities that occurs in the data to allow for unraveling of the data and eliminate a superfluous write of data in the at least one line buffer”. The examiner respectfully disagrees.
Given the breadth of the claim language, the combination has been found to teach all requirements of the amended claim limitations. Examiner note: Applicant’s spec, paragraph 0009, states “if stride has a value of 0 then the lines of a line buffer can have the same data, where if the stride has a value of 1, the data can be shifted by 1 to allow for more efficient read/write operation within the line buffer”. Alwani, sections 1, 2 intro, 3.3.2’s “Convolution Engine”, 3.4.2, 5.2, and Table 2.1 teach determining how far and in which direction to move the layer’s “pyramids” on the data read from memory based on a set “stride” to produce values, where the stride is set to 1 (data is shifted by one bit for continuous data), and utilizing operations for “nonlinearities” for “each output value”, including adjusting a layer to “stride 2” (and shifted by two bits at each of the discontinuities that occurs in the data to allow for unraveling of the data), for implementing “the fused-layer CNN accelerator” to consolidate buffer usage and “minimizing data movement” (and eliminate a superfluous write of data in the at least one line buffer).
See the 35 U.S.C. 103 section for the full mapping of claim limitations necessitated by applicant’s amendments.
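As a minimal sketch of the “stride” concept relied on above, the following shows how a stride of 1 versus a stride of 2 changes where a sliding window lands along one row of data. The function name `window_starts`, the row width, and the kernel size are hypothetical values chosen for illustration; this is not code from Alwani or from the applicant’s specification.

```python
def window_starts(length, kernel, stride):
    """Return the start index of every full window along one dimension.

    Illustrative only: `length`, `kernel`, and `stride` are assumed
    parameters, not values taken from any reference of record.
    """
    if kernel <= 0 or stride <= 0:
        raise ValueError("kernel and stride must be positive")
    # A window starting at s covers indices s .. s + kernel - 1,
    # so the last valid start is length - kernel.
    return list(range(0, length - kernel + 1, stride))


row_length = 8  # hypothetical feature-map row width
kernel = 3      # hypothetical window width

starts_stride_1 = window_starts(row_length, kernel, 1)  # window moves one element at a time
starts_stride_2 = window_starts(row_length, kernel, 2)  # window skips every other position
```

With these assumed values, a stride of 1 visits every position at which a full window fits, while a stride of 2 visits every second position.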

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-5, 7-10, 12-16, and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Alwani ("Fused Convolutional Neural Network Accelerators", 2015), hereinafter Alwani, in view of Haraden et al. (US Pub. 20170004092), hereinafter Haraden.
Regarding claim 1, Alwani teaches a system for enhanced data processing using one or more virtualized hardware iterators in a neural network environment, the system comprising: 
at least one processor (section 3.4.2 teaches using a “processor”); 
at least one line buffer operable to read and/or write data (section 3.3.1, page 19, and section 3.3.2, page 22 teach using “on-chip buffers…called block RAM (BRAM)” (at least one line buffer), where each “BRAM has only two read and write ports” (operable to read and/or write data)); and 
at least one memory in communication with the at least one processor, the at least one memory having computer-readable instructions stored thereupon that, when executed by the at least one processor (sections “Chapter 1”, 3.3.1-3.3.2 and 3.4.2 teach memories connected to a processor, well known to execute stored instructions for implementing the described “methodology”), cause the at least one processor to: 
receive one or more initialization parameters from a cooperating controller component of the neural network environment, the initialization parameters indicative of dimensions of data to be processed by the neural network environment and data representative of one or more discontinuities of one or more data elements between one or more rows of the data (Examiner note: Applicant’s specification, paragraphs 0008, 0025, and 0028 state “data to be processed by the NN/DN environment can be represented as a blob”, where the blobs can include defined “dimensions…number of channels, number of kernels, and other available dimensional units.” Paragraph 0034 states training DNN to get parameters, where “parameters…can be known as either weights or kernels.” Further, paragraph 0048 and Fig. 6 state the “unraveled logical mapping” has “continuous data…segments 630 and 635”, therefore it is interpreted the “logical mapping” of 605 depicts discontinuous data segments “610, 615, and 620”, or otherwise depicted as “height” vs “width”, rows by columns, etc.
Alwani, sections 4.1 and 4.3 teach each convolutional layer has different parameters (MNRCS) (initialization parameters), including “number of output/input feature map channels (M, N)” (indicative of the dimensions of data to be processed by the neural network environment) and where “R is number of rows for feature map and C is number of columns for feature map” (and data representative of one or more discontinuities of one or more data elements between one or more rows of the data). Sections 3.2, 4.1 and Table 2.1 further teach finding the “dimensions of the pyramid” from loaded parameters, and getting all weights and parameters “on chip” from external memory for pyramid operations (receive one or more initialization parameters from a cooperating controller component of the neural network environment).); 
load the data to be processed by the neural network environment from a cooperating memory component of the neural network environment (section 3.1, page 11, and section 4.3 teach “initial feature map is read from memory” for application to the CNN (load the data to be processed by the neural network environment from a cooperating memory of the neural network environment)); 
calculate shifting bits representative of a number of bits to shift the one or more data elements of the data according to the initialization parameters to enable a contiguous single write of the data (sections 2 intro, 3.2, 3.2.1, 3.2.2’s “Convolution Engine”, 3.3.1, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach finding the “dimensions of the pyramid” from loaded parameters such as “stride” (initialization parameters), and determining how much to move the “pyramids” on the data based on a set “stride” (calculate shifting bits representative of the number of bits to shift the one or more data elements of the data) in order to produce values; where intermediate values are stored in on-chip buffer for processing the next layer. It is taught that “on-chip…BRAM[s]” have two read/write ports to then send the values based on map dimensions/“parameters” for further processing in a single cycle (to enable a contiguous single write of the data)); 
receive one or more instructions from the cooperating controller component of the neural network environment to insert one or more of the shifting bits into the loaded data to generate directed line buffer data and to write the directed line buffer data in the at least one line buffer according to the one or more initialization parameters (sections 2 intro, 3.1, 3.2, 3.2.1, 3.3.2’s “Convolution Engine”, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach “[d]ata are first loaded from external memory and stored in on-chip buffers. These on-chip buffers are then used to feed each layer [of the CNN]”. Further it is taught determining how much/directions to move the “pyramids” on (insert one or more of the shifting bits into) the data read from memory (loaded data) based on a set “stride” to produce values, where intermediate values are stored in on-chip buffer for processing the next layer (generate directed line buffer data). It is taught that “on-chip buffers are called block RAM (BRAM) in FPGA terminology” and have two read/write ports to then send the values based on map dimensions/“parameters” for further processing (write the directed line buffer data in the at least one line buffer according to the one or more initialization parameters)), wherein data is shifted by one bit for continuous data and shifted by two bits at each of the discontinuities that occurs in the data to allow for unraveling of the data and eliminate a superfluous write of data in the at least one line buffer (Examiner note: Applicant’s spec, paragraph 0009 states “if stride has a value of 0 then the lines of a line buffer can have the same data, where if the stride has a value of 1, the data can be shifted by 1 to allow for more efficient read/write operation within the line buffer”.
Alwani, sections 1, 2 intro, 3.3.2’s “Convolution Engine”, 3.4.2, 5.2, Table 2.1 teach determining how much/directions to move the layer’s “pyramids” on the data read from memory based on a set “stride” to produce values; where the stride is set to 1 (data is shifted by one bit for continuous data) and utilizing operations for “nonlinearities” for “each output value” including adjusting a layer to “stride 2” (and shifted by two bits at each of the discontinuities that occurs in the data to allow for unraveling of the data) for implementing “the fused-layer CNN accelerator” to consolidate buffer usage and “minimizing data movement” (and eliminate a superfluous write of data in the at least one line buffer)); and 
communicate the written directed line buffer data in the at least one line buffer to the at least one processor of the neural network environment for processing, wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data (sections 3.1, 3.3.2, 3.4.2, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach that “on-chip buffers are called block RAM (BRAM) in FPGA terminology” and have two read/write ports to then send (communicate) the intermediate values (written directed line buffer data in the at least one line buffer) based on map dimensions/“parameters” for further processing by the “processor” for CNN computations (to the at least one processor of the neural network environment for processing). Further, sections 2 intro, 3.2, 3.2.1, 3.2.2’s “Convolution Engine”, 3.3.1, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach determining how much/directions to move the “pyramids” on the data for each layer (wherein the number of bits to be shifted are calculated) based on a set “stride” value of the data (at a periodic number of elements of the line buffer data) in order to produce values; where intermediate values are stored in on-chip buffer for processing the next layer. It is taught that “on-chip…BRAM[s]” have two read/write ports to then send the values based on map dimensions/“parameters” for further processing in a single cycle (that allow for the contiguous single write of the data)).
Alwani at least implies at least one memory in communication with the at least one processor, the at least one memory having computer-readable instructions stored thereupon that, when executed by the at least one processor (see mapping above), calculate shifting bits representative of the number of bits to shift the one or more data elements of the data according to the initialization parameters to enable a contiguous single write of the data (see mapping above); receive one or more instructions from the cooperating controller component of the neural network environment to insert one or more of the shifting bits into the loaded data to generate directed line buffer data and to write the directed line buffer data in the at least one line buffer according to the one or more initialization parameters (see mapping above), and wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data (see mapping above); however, Haraden teaches at least one memory in communication with the at least one processor, the at least one memory having computer-readable instructions stored thereupon that, when executed by the at least one processor (paragraphs 0023 and 0031 teach “[a]ny of the disclosed methods can be implemented as computer-executable instructions stored on one or more computer-readable media” and executed by “one or more processors…coupled to memory”),
calculate shifting bits representative of the number of bits to shift the one or more data elements of the data according to the initialization parameters to enable a contiguous single write of the data (paragraphs 0033, 0039-0042, 0045, 0051 teach “specifying…parameters for specifying a block memory move” used by a line buffer and shift register for processing the data “in a single cycle” (calculate shifting bits representative of the number of bits to shift the one or more data elements of the data according to the initialization parameter to enable a contiguous single write of the data)); 
receive one or more instructions from the cooperating controller component of the neural network environment to insert one or more of the shifting bits into the loaded data to generate directed line buffer data and to write the directed line buffer data in the at least one line buffer according to the one or more initialization parameters (paragraphs 0033, 0039-0042, 0045 teach “a DMA controller 150 receives instructions from one or more of the processors 110 (cooperating controller component) specifying memory operations to be performed by the DMA engine 140.  For example, a DMA instruction will specify a starting address, an ending address, a stride width, element size, number of elements, and/or other suitable parameters for specifying a block memory move” (according to the one or more initialization parameters). Further the DMA is taught to include a “line buffer” and “shift register” for reading, processing (insert one or more of the shifting bits into the loaded data to generate directed line buffer data), and writing data (write the directed line buffer data in the at least one line buffer) and used in “neural networks” operations (neural network environment).), and 
wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data (paragraphs 0033, 0039-0042, 0045, 0051 teach “specifying…parameters for specifying a block memory move” used by a line buffer and shift register for processing the data “in a single cycle” (wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Haraden’s teachings of utilizing a DMA including a line buffer, shift registers, and specified parameters for neural network operations into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency and “increasing data throughput” (Haraden, paragraphs 0033, 0039-0042, 0045, 0051, 0062).
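The block memory move parameters quoted from Haraden above (a starting address, a stride width, an element size, and a number of elements) can be sketched as follows. This is a software illustration under stated assumptions, not Haraden’s DMA engine: the function name `strided_block_move`, the byte-addressed source memory, and the sample values are all hypothetical.

```python
def strided_block_move(memory, start, stride, element_size, num_elements):
    """Gather strided elements from `memory` into one contiguous buffer.

    Illustrative only: models a block move described by a start offset,
    a stride between element starts, an element size in bytes, and an
    element count. All names here are assumptions for the sketch.
    """
    line_buffer = bytearray()
    for i in range(num_elements):
        offset = start + i * stride          # start of the i-th element
        end = offset + element_size
        if end > len(memory):
            raise IndexError("element runs past end of source memory")
        line_buffer += memory[offset:end]    # append so the result is contiguous
    return bytes(line_buffer)


source = bytes(range(16))  # hypothetical source memory: byte values 0..15
packed = strided_block_move(source, start=1, stride=4, element_size=2, num_elements=3)
```

Under these assumptions, elements that are spaced apart in the source are packed back to back, so the gathered result could be handed on as a single contiguous block.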

Regarding claim 2, the combination of Alwani and Haraden teaches all the claim limitations of claim 1 above; and further teaches wherein the insertion of the one or more shifting bits results in a single write of the directed line buffer data in the at least one line buffer (Alwani, sections 2 intro, 3.1, 3.2, 3.2.1, 3.2.2’s “Convolution Engine”, 3.3.1, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach determining how much/directions to move the “pyramids” on (insertion of the one or more shifting bits) the data read from memory based on a set “stride” to produce values, where intermediate values are stored in on-chip buffer for processing the next layer. It is taught that “on-chip…BRAM[s]” have two read/write ports to then send the values based on map dimensions/“parameters” for further processing in a single cycle (results in a single write of the directed line buffer data in the at least one line buffer)).

Regarding claim 3, the combination of Alwani and Haraden teaches all the claim limitations of claim 1 above; and further teaches wherein the computer-readable instructions further cause the at least one processor to communicate data that is traversed by a cooperating iterator to the line buffer (Alwani, sections “Chapter 1”, 3.3.1-3.3.2 and 3.4.2 teach memories connected to a processor, well known to execute stored instructions for implementing the described “methodology” of, as taught in sections 3.1, 3.3.2 and 4.3, a “module” used “iteratively” for completing “computations for the current layer” by the “processor” for CNN computations (traversed by a cooperating iterator), and that these computed intermediate values and/or output values are stored (at least one processor to communicate data) in on-chip buffer (to the line buffer) for processing the next layer).

Regarding claim 4, the combination of Alwani and Haraden teaches all the claim limitations of claim 3 above; and further teaches wherein the computer-readable instructions further cause the at least one processor to traverse the data utilizing one or more sliding windows, the windows operative to select one or more of the data elements (Alwani, sections “Chapter 1”, 3.3.1-3.3.2 and 3.4.2 teach memories connected to a processor (at least one processor to), well known to execute stored instructions for implementing the described “methodology” of, as taught in section 1, paragraph 7, and sections 3.1 page 11, and 3.2, using “a pyramid-shaped multi-layer sliding window” (utilizing one or more sliding windows) to bring “original input data…on chip” (at least one processor to traverse the data) as intermediate/output values (windows operative to select one or more of the data elements) for “effective on-chip caching during CNN evaluation”).
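To illustrate what it means for a sliding window to select data elements while traversing two-dimensional data, a minimal sketch follows. It assumes a small row-major feature map held as nested lists; the generator name `sliding_windows` and the 3x4 sample map are hypothetical and are not drawn from Alwani’s pyramid-shaped multi-layer window or from the applicant’s disclosure.

```python
def sliding_windows(feature_map, k, stride=1):
    """Yield every k-by-k window of a 2D feature map, in row-major order.

    Illustrative only: a single-layer window over nested lists, assumed
    here to show how a window selects a subset of data elements.
    """
    rows = len(feature_map)
    cols = len(feature_map[0]) if rows else 0
    for r in range(0, rows - k + 1, stride):
        for c in range(0, cols - k + 1, stride):
            # Select k consecutive elements from each of k consecutive rows.
            yield [row[c:c + k] for row in feature_map[r:r + k]]


# Hypothetical 3x4 feature map whose entries are their row-major indices.
fmap = [[r * 4 + c for c in range(4)] for r in range(3)]
windows = list(sliding_windows(fmap, k=2))
```

In this sketch each window spans two rows, so the elements it selects are not adjacent in a row-major layout of the map.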

Regarding claim 5, the combination of Alwani and Haraden teaches all the claim limitations of claim 4 above; and further teaches wherein the computer-readable instructions further cause the at least one processor to traverse the loaded data using one or more sliding windows that straddle a data dimensional boundary of the loaded data (Alwani, sections “Chapter 1”, 3.3.1-3.3.2 and 3.4.2 teach memories connected to a processor, well known to execute stored instructions for implementing the described “methodology” of, as taught in section 1, paragraph 7, and sections 3.1 page 11, and 3.2, using “a pyramid-shaped multi-layer sliding window[s]” with computed “dimensions” that overlap and move about the input data in order to bring “original input data…on chip” (traverse the loaded data using one or more sliding windows that straddle a data dimensional boundary of the loaded data) as intermediate/output values for “effective on-chip caching during CNN evaluation”).

Regarding claim 7, the combination of Alwani and Haraden teaches all the claim limitations of claim 1 above; and further teaches wherein the computer-readable instructions further cause a copy of the written directed line buffer data in the at least one line buffer to be created and accessed by the one or more processing components of the neural network environment (Haraden, paragraphs 0033, 0039-0043, 0045 teach “a DMA controller 150 receives instructions from one or more of the processors 110 (one or more processing components) specifying memory operations to be performed by the DMA engine 140.  For example, a DMA instruction will specify a starting address, an ending address, a stride width, element size, number of elements, and/or other suitable parameters for specifying a block memory move”. Further the DMA is taught to include a “line buffer” and “shift register” for reading, processing, and writing data by “copying values” (cause a copy of the written directed line buffer data in the at least one line buffer to be created) and used in “neural networks” operations by “one or more of the processors” (accessed by the one or more processing components of the neural network environment).).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Haraden’s teachings of utilizing a DMA including a line buffer, shift registers, copying data, and specified parameters for neural network operations into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency and “increasing data throughput” (Haraden, paragraphs 0033, 0039-0043, 0051, 0062).

Regarding claim 8, Alwani teaches a computer-implemented method, comprising: 
receiving one or more initialization parameters from a cooperating controller component of a neural network environment, the initialization parameters representative of dimensions of data to be processed by the neural network environment and one or more discontinuities of one or more data elements between one or more rows of the data (Examiner note: Applicant’s specification, paragraphs 0008, 0025, and 0028 state “data to be processed by the NN/DN environment can be represented as a blob”, where the blobs can include defined “dimensions…number of channels, number of kernels, and other available dimensional units.” Paragraph 0034 states training DNN to get parameters, where “parameters…can be known as either weights or kernels.” Further, paragraph 0048 and Fig. 6 state the “unraveled logical mapping” has “continuous data…segments 630 and 635”, therefore it is interpreted the “logical mapping” of 605 depicts discontinuous data segments “610, 615, and 620”, or otherwise depicted as “height” vs “width”, rows by columns, etc.
Alwani, sections 4.1 and 4.3 teach each convolutional layer has different parameters (MNRCS) (initialization parameters), including “number of output/input feature map channels (M, N)” (representative of the dimensions of data to be processed by the neural network environment) and where “R is number of rows for feature map and C is number of columns for feature map” (and one or more discontinuities of one or more data elements between one or more rows of the data). Sections 3.2, 4.1 and Table 2.1 further teach finding the “dimensions of the pyramid” from loaded parameters, and getting all weights and parameters “on chip” from external memory for pyramid operations (receiving one or more initialization parameters from a cooperating controller component of the neural network environment).); 
loading the data to be processed by the neural network environment from a cooperating memory component of the neural network environment (section 3.1, page 11, and section 4.3 teach “initial feature map is read from memory” for application to the CNN (loading the data to be processed by the neural network environment from a cooperating memory of the neural network environment)); 
iterating the loaded data according to a selected iteration operation by a cooperating iterator component of the neural network environment (sections 3.1, 3.3.2 and 4.3 teach “iteratively using a module” for completing “computations for the current layer” of “BRAM” data (iterating the loaded data according to a selected iteration operation by a cooperating iterator component) for neural network operations (neural network environment));
calculating shifting bits representative of a number of bits to shift the one or more data elements of the data according to the initialization parameters to enable a contiguous single write of the data (sections 2 intro, 3.2, 3.2.1, 3.2.2’s “Convolution Engine”, 3.3.1, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach finding the “dimensions of the pyramid” from loaded parameters such as “stride” (initialization parameters), and determining how much to move the “pyramids” on the data based on a set “stride” (calculate shifting bits representative of the number of bits to shift the one or more data elements of the data) in order to produce values; where intermediate values are stored in on-chip buffer for processing the next layer. It is taught that “on-chip…BRAM[s]” have two read/write ports to then send the values based on map dimensions/“parameters” for further processing in a single cycle (to enable a contiguous single write of the data)); 
receiving one or more instructions from the cooperating controller component of the neural network environment to insert one or more of the shifting bits into the loaded data to generate directed line buffer data and to write the directed line buffer data in the at least one line buffer according to the one or more initialization parameters (sections 2 intro, 3.1, 3.2, 3.2.1, 3.2.2’s “Convolution Engine”, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach “[d]ata are first loaded from external memory and stored in on-chip buffers. These on-chip buffers are then used to feed each layer [of the CNN]”. Further it is taught determining how much/directions to move the “pyramids” on (insert one or more of the shifting bits into) the data read from memory (loaded data) based on a set “stride” to produce values, where intermediate values are stored in on-chip buffer for processing the next layer (generate directed line buffer data). It is taught that “on-chip buffers are called block RAM (BRAM) in FPGA terminology” and have two read/write ports to then send the values based on map dimensions/“parameters” for further processing (write the directed line buffer data in the at least one line buffer according to the one or more initialization parameters)), wherein data is shifted by one bit for continuous data and shifted by two bits at each of the discontinuities that occurs in the data to allow for unraveling of the data and eliminate a superfluous write of data in the line buffer (Examiner note: Applicant’s spec, paragraph 0009 states “if stride has a value of 0 then the lines of a line buffer can have the same data, where if the stride has a value of 1, the data can be shifted by 1 to allow for more efficient read/write operation within the line buffer”.
Alwani, sections 1, 2 intro, 3.3.2’s “Convolution Engine”, 3.4.2, 5.2, Table 2.1 teach determining how much/directions to move the layer’s “pyramids” on the data read from memory based on a set “stride” to produce values; where the stride is set to 1 (data is shifted by one bit for continuous data) and utilizing operations for “nonlinearities” for “each output value” including adjusting a layer to “stride 2” (and shifted by two bits at each of the discontinuities that occurs in the data to allow for unraveling of the data) for implementing “the fused-layer CNN accelerator” to consolidate buffer usage and “minimizing data movement” (and eliminate a superfluous write of data in the at least one line buffer)); and 
communicate the written directed line buffer data in the at least one line buffer to the at least one processor of the neural network environment for processing, wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data (sections 3.1, 3.3.2, 3.4.2, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach that “on-chip buffers are called block RAM (BRAM) in FPGA terminology” and have two read/write ports to then send (communicate) the intermediate values (written directed line buffer data in the at least one line buffer) based on map dimensions/“parameters” for further processing by the “processor” for CNN computations (to the at least one processor of the neural network environment for processing). Further, sections 2 intro, 3.2, 3.2.1, 3.2.2’s “Convolution Engine”, 3.3.1, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach determining how much/directions to move the “pyramids” on the data for each layer (wherein the number of bits to be shifted are calculated) based on a set “stride” value of the data (at a periodic number of elements of the line buffer data) in order to produce values; where intermediate values are stored in on-chip buffer for processing the next layer. It is taught that “on-chip…BRAM[s]” have two read/write ports to then send the values based on map dimensions/“parameters” for further processing in a single cycle (that allow for the contiguous single write of the data)).
Alwani at least implies calculating shifting bits representative of the number of bits to shift the one or more data elements of the data according to the initialization parameters to enable a contiguous single write of the data (see mapping above); receiving one or more instructions from the cooperating controller component of the neural network environment to insert one or more of the shifting bits into the loaded data to generate directed line buffer data and to write the directed line buffer data in the at least one line buffer according to the one or more initialization parameters (see mapping above), and wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data (see mapping above); however, Haraden teaches calculate shifting bits representative of the number of bits to shift the one or more data elements of the data according to the initialization parameters to enable a contiguous single write of the data (paragraphs 0033, 0039-0042, 0045, 0051 teach “specifying…parameters for specifying a block memory move” used by a line buffer and shift register for processing the data “in a single cycle” (calculate shifting bits representative of the number of bits to shift the one or more data elements of the data according to the initialization parameter to enable a contiguous single write of the data)); 
receive one or more instructions from the cooperating controller component of the neural network environment to insert one or more of the shifting bits into the loaded data to generate directed line buffer data and to write the directed line buffer data in the at least one line buffer according to the one or more initialization parameters (paragraphs 0033, 0039-0042 teach “a DMA controller 150 receives instructions from one or more of the processors 110 (cooperating controller component) specifying memory operations to be performed by the DMA engine 140.  For example, a DMA instruction will specify a starting address, an ending address, a stride width, element size, number of elements, and/or other suitable parameters for specifying a block memory move” (according to the one or more initialization parameters). Further the DMA is taught to include a “line buffer” and “shift register” for reading, processing (insert one or more of the shifting bits into the loaded data to generate directed line buffer data), and writing data (write the directed line buffer data in the at least one line buffer) and used in “neural networks” operations (neural network environment).), and 
wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data (paragraphs 0033, 0039-0042, 0045, 0051 teach “specifying…parameters for specifying a block memory move” used by a line buffer and shift register for processing the data “in a single cycle” (wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate Haraden’s teachings of utilizing a DMA including a line buffer, shift registers, and specified parameters for neural network operations into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency while “increasing data throughput” (Haraden, paragraphs 0033, 0039-0042, 0051, 0062).
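For illustration only, the block memory move that Haraden describes as being specified by a starting address, a stride width, an element size, and a number of elements can be sketched as follows. The sketch is not taken from Haraden or from Applicant’s specification; the names `BlockMove` and `gather` and the byte-addressed memory model are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockMove:
    """Hypothetical descriptor for a strided block memory move."""
    start: int         # starting address (byte offset into the source)
    stride: int        # stride width, in bytes, between consecutive elements
    element_size: int  # size of each element in bytes
    count: int         # number of elements to move


def gather(memory: bytes, move: BlockMove) -> bytes:
    """Collect the strided elements into one contiguous run of bytes."""
    out = bytearray()
    for i in range(move.count):
        base = move.start + i * move.stride
        out += memory[base:base + move.element_size]
    return bytes(out)


memory = bytes(range(16))
move = BlockMove(start=1, stride=4, element_size=2, count=3)
packed = gather(memory, move)  # bytes 1-2, 5-6, and 9-10, packed contiguously
```

Because the strided elements are packed before they are stored, the destination could receive them in one contiguous write, which is the property the mapping above relies on.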

Regarding claim 9, the combination of Alwani and Haraden teaches all the claim limitations of claim 8 above, and further teaches wherein insertion of the one or more shifting bits results in a single write of the directed line buffer data in the line buffer (Alwani, sections 2 intro, 3.1, 3.2, 3.2.1, 3.2.2’s “Convolution Engine”, 3.3.1, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach determining how much and in which directions to move the “pyramids” on (insertion of the one or more shifting bits) the data read from memory based on a set “stride” to produce values, where intermediate values are stored in an on-chip buffer for processing the next layer. It is taught that “on-chip…BRAM[s]” have two read/write ports to then send the values based on map dimensions/“parameters” for further processing in a single cycle (results in a single write of the directed line buffer data in the line buffer)).

Regarding claim 10, the combination of Alwani and Haraden teaches all the claim limitations of claim 8 above, and further teaches traversing the data utilizing one or more sliding windows, wherein the sliding windows are operative to straddle a data dimensional boundary of the data (Alwani, section 1, paragraph 7, and sections 3.1 (page 11) and 3.2 teach using “a pyramid-shaped multi-layer sliding window[s]” with computed “dimensions” that overlap and move about the input data in order to bring “original input data…on chip” (traversing the data utilizing one or more sliding windows, wherein the sliding windows are operative to straddle a data dimensional boundary of the data) as intermediate/output values for “effective on-chip caching during CNN evaluation”).
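For illustration only, one way to picture a sliding window that straddles a data dimensional boundary is a fixed-width window moved over row-major data, where some window positions span the end of one row and the start of the next. The sketch below is an assumption-laden example and is not drawn from Alwani or from Applicant’s specification; the function name `windows` and the flat row-major layout are hypothetical.

```python
def windows(flat, width, row_len):
    """Slide a window of `width` elements over row-major data.

    Returns (window, straddles) pairs, where `straddles` is True when the
    window covers elements from two different rows.
    """
    result = []
    for start in range(len(flat) - width + 1):
        end = start + width - 1
        straddles = (start // row_len) != (end // row_len)
        result.append((flat[start:start + width], straddles))
    return result


flat = [0, 1, 2, 3, 4, 5]  # two rows of three elements each
swept = windows(flat, width=2, row_len=3)
flags = [straddles for _, straddles in swept]
# flags == [False, False, True, False, False]
```

Only the window covering elements 2 and 3 crosses the row boundary in this example.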

Regarding claim 12, the combination of Alwani and Haraden teach all the claim limitations of claim 8 above; and further teach generating a copy of written directed line buffer data (Haraden, paragraphs 0033, 0039-0043, 0045, 0065, and 0068 teach “a DMA controller 150 receives instructions from one or more of the processors 110 specifying memory operations to be performed by the DMA engine 140.  For example, a DMA instruction will specify a starting address, an ending address, a stride width, element size, number of elements, and/or other suitable parameters for specifying a block memory move”. Further the DMA is taught to include a “line buffer” and “shift register” for reading, processing, and writing data by “copying values” (generating a copy of written directed line buffer data) and used in “neural networks” operations by “one or more of the processors”.).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate Haraden’s teachings of utilizing a DMA including a line buffer, shift registers, copying data, and specified parameters for neural network operations into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency while “increasing data throughput” (Haraden, paragraphs 0033, 0039-0043, 0051, 0062, 0065, and 0068).

Regarding claim 13, the combination of Alwani and Haraden teach all the claim limitations of claim 12 above; and further teach processing the generated copy of the written directed line buffer data by one or more cooperating processing units of the neural network environment (Haraden, paragraphs 0033, 0039-0043, 0045 teach “a DMA controller 150 receives instructions from one or more of the processors 110 (one or more cooperating processing units) specifying memory operations to be performed by the DMA engine 140.  For example, a DMA instruction will specify a starting address, an ending address, a stride width, element size, number of elements, and/or other suitable parameters for specifying a block memory move”. Further the DMA is taught to include a “line buffer” and “shift register” for reading, processing, and writing data by “copying values” (generated copy of the written directed line buffer data) and used in “neural networks” operations by “one or more of the processors” (processing…by one or more cooperating processing units of the neural network environment).).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate Haraden’s teachings of utilizing a DMA including a line buffer, shift registers, copying data, and specified parameters for neural network operations into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency while “increasing data throughput” (Haraden, paragraphs 0033, 0039-0043, 0051, 0062).

Regarding claim 14, the combination of Alwani and Haraden teaches all the claim limitations of claim 8 above, and further teaches clearing the line buffer of the written directed line buffer data to receive additional buffer data for writing in the line buffer (Haraden, paragraphs 0033, 0039-0043, 0045 teach using a line buffer, where “one portion of the buffer to received filtered values (to receive additional buffer data for writing in the line buffer) while another portion of the buffer is being written” out (clearing the line buffer of the written directed line buffer data)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate Haraden’s teachings of utilizing a DMA including a line buffer receiving and writing data for neural network operations into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency while “increasing data throughput” (Haraden, paragraphs 0033, 0039-0043, 0051, 0062).
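For illustration only, the arrangement in which one portion of a buffer receives values while another portion is written out can be modeled as a two-half ("ping-pong") buffer. The sketch below is a sequential software model under stated assumptions; the class name `PingPongLineBuffer` and its methods are hypothetical and do not appear in Haraden. In hardware the receiving and draining would overlap in time, which this model does not attempt to show.

```python
class PingPongLineBuffer:
    """Hypothetical two-half line buffer: one half receives, the other drains."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.halves = [[None] * width, [None] * width]
        self.fill = 0  # index of the half currently receiving data

    def receive(self, values) -> None:
        """Store one full line in the half that is currently receiving."""
        values = list(values)
        if len(values) != self.width:
            raise ValueError("expected exactly one full line")
        self.halves[self.fill] = values

    def swap_and_drain(self):
        """Write out the filled half, clear it, and switch the receiving half."""
        drained = self.halves[self.fill]
        self.halves[self.fill] = [None] * self.width  # clear the drained half
        self.fill ^= 1  # the other half now receives
        return drained


buf = PingPongLineBuffer(3)
buf.receive([1, 2, 3])
first = buf.swap_and_drain()
buf.receive([4, 5, 6])
second = buf.swap_and_drain()
```

After each drain the written-out half is empty again, so it is available to receive additional data on a later pass.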

Regarding claim 15, the combination of Alwani and Haraden teach all the claim limitations of claim 8 above; and further teach writing the directed line buffer data in a selected number of lines in the line buffer wherein each line of the line buffer is associated with a cooperating processing unit of the neural network environment (Haraden, paragraphs 0033, 0039-0043, 0045, and Fig. 2 teach receiving “operations to be performed by the DMA engine 140.  For example, a DMA instruction will specify a starting address, an ending address, a stride width, element size, number of elements, and/or other suitable parameters for specifying a block memory move”. Further the DMA is taught to include a “line buffer” and “shift register” for reading, processing, and writing data according to the designated buffer rows (writing the directed line buffer data in a selected number of lines in the line buffer) for “neural networks” operations by “one or more of the processors” (wherein each line of the line buffer is associated with a cooperating processing unit of the neural network environment).).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate Haraden’s teachings of utilizing a DMA including designating line buffer rows for receiving and writing data for neural network operations into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency while “increasing data throughput” (Haraden, paragraphs 0033, 0039-0043, 0051, 0062).

Regarding claim 16, Alwani teaches a computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device (sections “Chapter 1”, 3.3.1-3.3.2 and 3.4.2 teach memories connected to a processor, well known to execute stored instructions for implementing the described “methodology”), cause the one or more processors of the computing device to: 
receiving one or more initialization parameters from a cooperating controller component of a neural network environment, the initialization parameters indicative of dimensions of data to be processed by the neural network environment and one or more discontinuities of one or more data elements between one or more rows of the data (Examiner note: Applicant’s specification, paragraphs 0008, 0025, and 0028 state “data to be processed by the NN/DN environment can be represented as a blob”, where the blobs can include defined “dimensions…number of channels, number of kernels, and other available dimensional units.” Paragraph 0034 states training DNN to get parameters, where “parameters…can be known as either weights or kernels.” Further, paragraph 0048 and Fig. 6 state the “unraveled logical mapping” has “continuous data…segments 630 and 635”, therefore it is interpreted the “logical mapping” of 605 depicts discontinuous data segments “610, 615, and 620”, or otherwise depicted as “height” vs “width”, rows by columns, etc.
Alwani, sections 4.1 and 4.3 teach that each convolutional layer has different parameters (MNRCS) (initialization parameters), including “number of output/input feature map channels (M, N)” (indicative of the dimensions of the data to be processed by the neural network environment) and where “R is number of rows for feature map and C is number of columns for feature map” (and one or more discontinuities of one or more data elements between one or more rows of the data). Sections 3.2, 4.1 and Table 2.1 further teach finding the “dimensions of the pyramid” from loaded parameters, and getting all weights and parameters “on chip” from external memory for pyramid operations (receiving one or more initialization parameters from a cooperating controller component of the neural network environment).); 
load the data to be processed by the neural network environment from a cooperating memory component of the neural network environment (section 3.1, page 11, and section 4.3 teach “initial feature map is read from memory” for application to the CNN (load the data to be processed by the neural network environment from a cooperating memory of the neural network environment)); 
iterating the loaded data according to a selected iteration operation by a cooperating iterator component of the neural network environment (sections 3.1, 3.3.2 and 4.3 teach “iteratively using a module” for completing “computations for the current layer” of “BRAM” data (iterating the loaded data according to a selected iteration operation by a cooperating iterator component) for neural network operations (neural network environment));
calculating shifting bits representative of a number of bits to shift the one or more data elements of the data according to the initialization parameters to enable a contiguous single write of the data (sections 2 intro, 3.2, 3.2.1, 3.2.2’s “Convolution Engine”, 3.3.1, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach finding the “dimensions of the pyramid” from loaded parameters such as “stride” (initialization parameters), and determining how much to move the “pyramids” on the data based on a set “stride” (calculate shifting bits representative of the number of bits to shift the one or more data elements of the data) in order to produce values; where intermediate values are stored in on-chip buffer for processing the next layer. It is taught that “on-chip…BRAM[s]” have two read/write ports to then send the values based on map dimensions/“parameters” for further processing in a single cycle (to enable a contiguous single write of the data)); 
receive one or more instructions from the cooperating controller component of the neural network environment to insert one or more of the shifting bits into the loaded data to generate directed line buffer data and to write the directed line buffer data in the at least one or more lines of a line buffer wherein the one or more lines of the line buffer are associated to one or more processing components of the neural network environment (sections 2 intro, 3.1, 3.2, 3.2.1, 3.2.2’s “Convolution Engine”, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach “[d]ata are first loaded from external memory and stored in on-chip buffers. These on-chip buffers are then used to feed each layer [of the CNN]”. Further it is taught determining how much/directions to move the “pyramids” on (insert one or more of the shifting bits into) the data read from memory (loaded data) based on a set “stride” to produce values, where intermediate values are stored in on-chip buffer for processing the next layer (generate directed line buffer data). It is taught that “on-chip buffers are called block RAM (BRAM) in FPGA terminology” and have two read/write ports to then send the values based on map dimensions/“parameters” for further processing (write the directed line buffer data in the at least one or more lines of a line buffer wherein the one or more lines of the line buffer are associated to one or more processing components of the neural network environment)), wherein data is shifted by one bit for continuous data and shifted by two bits at each of the discontinuities that occurs in the data to allow for unraveling of the data and eliminate a superfluous write of data in the line buffer (Examiner note: Applicant’s spec, paragraph 0009 states “if stride has a value of 0 then the lines of a line buffer can have the same data, where if the stride has a value of 1, the data can be shifted by 1 to allow for more efficient read/write operation within the line buffer”.
Alwani, sections 1, 2 intro, 3.3.2’s “Convolution Engine”, 3.4.2, 5.2, Table 2.1 teach determining how much/directions to move the layer’s “pyramids” on the data read from memory based on a set “stride” to produce values; where the stride is set to 1 (data is shifted by one bit for continuous data) and utilizing operations for “nonlinearities” for “each output value” including adjusting a layer to “stride 2” (and shifted by two bits at each of the discontinuities that occurs in the data to allow for unraveling of the data) for implementing “the fused-layer CNN accelerator” to consolidate buffer usage and “minimizing data movement” (and eliminate a superfluous write of data in the at least one line buffer)); and 
communicate the written directed line buffer data in the at least one line buffer to the at least one processor of the neural network environment for processing, wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data (sections 3.1, 3.3.2, 3.4.2, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach that “on-chip buffers are called block RAM (BRAM) in FPGA terminology” and have two read/write ports to then send (communicate) the intermediate values (written directed line buffer data in the at least one line buffer) based on map dimensions/“parameters” for further processing by the “processor” for CNN computations (to the at least one processor of the neural network environment for processing). Further, sections 2 intro, 3.2, 3.2.1, 3.2.2’s “Convolution Engine”, 3.3.1, 4.1, Table 2.1, and Figs. 3.2 and 3.4 teach determining how much/directions to move the “pyramids” on the data for each layer (wherein the number of bits to be shifted are calculated) based on a set “stride” value of the data (at a periodic number of elements of the line buffer data) in order to produce values; where intermediate values are stored in on-chip buffer for processing the next layer. It is taught that “on-chip…BRAM[s]” have two read/write ports to then send the values based on map dimensions/“parameters” for further processing in a single cycle (that allow for the contiguous single write of the data)).
Alwani at least implies a computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device (see mapping above), calculating shifting bits representative of the number of bits to shift the one or more data elements of the data according to the initialization parameters to enable a contiguous single write of the data (see mapping above); receiving one or more instructions from the cooperating controller component of the neural network environment to insert one or more of the shifting bits into the loaded data to generate directed line buffer data and to write the directed line buffer data in the at least one line buffer according to the one or more initialization parameters (see mapping above), and wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data (see mapping above), however Haraden teaches a computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device (paragraphs 0023 and 0031 teach “[a]ny of the disclosed methods can be implemented as computer-executable instructions stored on one or more computer-readable media” and executed by “one or more processors…coupled to memory”),
calculate shifting bits representative of the number of bits to shift the one or more data elements of the data according to the initialization parameters to enable a contiguous single write of the data (paragraphs 0033, 0039-0042, 0045, 0051 teach “specifying…parameters for specifying a block memory move” used by a line buffer and shift register for processing the data “in a single cycle” (calculate shifting bits representative of the number of bits to shift the one or more data elements of the data according to the initialization parameter to enable a contiguous single write of the data)); 
receive one or more instructions from the cooperating controller component of the neural network environment to insert one or more of the shifting bits into the loaded data to generate directed line buffer data and to write the directed line buffer data in the at least one line buffer according to the one or more initialization parameters (paragraphs 0033, 0039-0042 teach “a DMA controller 150 receives instructions from one or more of the processors 110 (cooperating controller component) specifying memory operations to be performed by the DMA engine 140.  For example, a DMA instruction will specify a starting address, an ending address, a stride width, element size, number of elements, and/or other suitable parameters for specifying a block memory move” (according to the one or more initialization parameters). Further the DMA is taught to include a “line buffer” and “shift register” for reading, processing (insert one or more of the shifting bits into the loaded data to generate directed line buffer data), and writing data (write the directed line buffer data in the at least one line buffer) and used in “neural networks” operations (neural network environment).), and 
wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data (paragraphs 0033, 0039-0042, 0045, 0051 teach “specifying…parameters for specifying a block memory move” used by a line buffer and shift register for processing the data “in a single cycle” (wherein the number of bits to be shifted are calculated at a periodic number of elements of the line buffer data that allow for the contiguous single write of the data)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate Haraden’s teachings of utilizing a DMA including a line buffer, shift registers, and specified parameters for neural network operations into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency while “increasing data throughput” (Haraden, paragraphs 0033, 0039-0042, 0051, 0062).
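For illustration only, one possible reading of the limitation “shifted by one bit for continuous data and shifted by two bits at each of the discontinuities” is a shift schedule that repeats at a fixed period equal to the row length. The sketch below reflects that reading alone; it is not taken from Alwani, Haraden, or Applicant’s specification, the function name `shift_steps` is hypothetical, and the shift amounts are treated as abstract position offsets rather than literal bits.

```python
from itertools import accumulate


def shift_steps(num_elements, row_len):
    """Shift applied between consecutive elements of row-major data.

    Assumption: a discontinuity occurs only between the last element of one
    row and the first element of the next row.
    """
    steps = []
    for i in range(num_elements - 1):
        at_row_end = (i + 1) % row_len == 0
        steps.append(2 if at_row_end else 1)
    return steps


steps = shift_steps(6, 3)                   # [1, 1, 2, 1, 1]
positions = list(accumulate([0] + steps))   # [0, 1, 2, 4, 5, 6]
```

Under this reading the larger shift recurs every `row_len` elements, which corresponds to shifts being calculated at a periodic number of elements.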

Regarding claim 19, the combination of Alwani and Haraden teaches all the claim limitations of claim 16 above, and further teaches process the written directed line buffer data according to a selected number of processing cycles wherein the selected number of processing cycles are based on a number of line buffer columns (Alwani, sections 3.3.2 and 4.3 teach all layers executing the data (process the written directed line buffer data) in “the same number of execution cycles” (according to a selected number of processing cycles) dependent on “map size” for the buffer including number of columns (wherein the selected number of processing cycles are based on a number of line buffer columns)).

Regarding claim 20, the combination of Alwani and Haraden teach all the claim limitations of claim 16 above; and further teach wherein the instructions further cause the at least one processor of the computing device to: traverse the loaded data utilizing a logical data mapping of the loaded data, the traversing of the loaded data comprising applying one or more sliding windows to the logical data mapping to associate a portion of the loaded data to one or more physical memory addresses (Alwani, sections “Chapter 1”, 3.3.1-3.3.2 and 3.4.2 teach memories connected to a processor, well known to execute stored instructions for implementing the described “methodology” of, as taught in section 1, paragraph 7, and sections 3.1 page 11, and 3.2, using “a pyramid-shaped multi-layer sliding window[s]” with computed “dimensions” that overlap and move about the input data in order to bring “original input data…on chip” to a buffer (traverse the loaded data utilizing a logical data mapping of the loaded data, the traversing of the loaded data comprising applying one or more sliding windows to the logical data mapping to associate a portion of the loaded data to one or more physical memory addresses) as intermediate/output values for “effective on-chip caching during CNN evaluation”).
Alwani at least implies traverse the loaded data utilizing a logical data mapping of the loaded data, the traversing of the loaded data comprising applying one or more sliding windows to the logical data mapping to associate a portion of the loaded data to one or more physical memory addresses (see mapping above), however Haraden teaches traverse the loaded data utilizing a logical data mapping of the loaded data, the traversing of the loaded data comprising applying one or more sliding windows to the logical data mapping to associate a portion of the loaded data to one or more physical memory addresses (paragraphs 0033, 0039-0042 teach “a DMA controller 150 receives instructions from one or more of the processors 110 specifying memory operations to be performed by the DMA engine 140.  For example, a DMA instruction will specify a starting address (associate a portion of the loaded data to one or more physical memory addresses), an ending address (associate a portion of the loaded data to one or more physical memory addresses), a stride width, element size, number of elements, and/or other suitable parameters for specifying a block memory move”. Further the DMA is taught to include a “window used to filter the line buffer data” (traverse the loaded data utilizing a logical data mapping of the loaded data, the traversing of the loaded data comprising applying one or more sliding windows to the logical data mapping) and “shift register” for reading, processing, and writing data and used in “neural networks” operations.).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate Haraden’s teachings of utilizing a DMA including a line buffer, shift registers, and specified parameters for neural network operations into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency while “increasing data throughput” (Haraden, paragraphs 0033, 0039-0042, 0051, 0062).

Regarding claim 21, the combination of Alwani and Haraden teach all the claim limitations of claim 16 above; and further teach wherein the memory component cooperates with a physical sensor capable of producing input data comprising audio data, video data, haptic sensory data, and other data for subsequent processing by the one or more cooperating processing units (Haraden, paragraphs 0033, 0039-0042, 0077, 0080, and 0081 teach a “memory” communicating with “input device(s)” (memory component cooperates with a physical sensor) for (capable of producing input data) “audio or video input” (audio data, video data), and/or “touch input” (haptic sensory data), used by a DMA for data processing in “neural networks” operations by “one or more of the processors” (for subsequent processing by the one or more cooperating processing units)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate Haraden’s teachings of utilizing a DMA for neural network operations on specific data into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency while “increasing data throughput” (Haraden, paragraphs 0033, 0039-0042, 0077, 0080, and 0081).

Regarding claim 22, the combination of Alwani and Haraden teach all the claim limitations of claim 21 above; and further teach wherein the processing components electronically cooperate with one or more output physical components operative to receive for human interaction processed input data comprising audio data, video data, haptic sensory data and other data (Haraden, paragraphs 0033, 0039-0042, 0077, 0080, and 0081 teach a DMA for data processing in “neural networks” operations by “one or more of the processors” (processing components) utilizing (cooperate with) “input device(s)” (one or more output physical components operative to receive for human interaction processed input data) for “audio or video input” (comprising audio data, video data), “other data in a modulated data signal” (and other data), and/or “touch input” (haptic sensory data)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate Haraden’s teachings of utilizing a DMA for neural network operations on specific data into Alwani’s teaching of neural network accelerators utilizing on-chip buffers and sliding windows in order to increase processing efficiency while “increasing data throughput” (Haraden, paragraphs 0033, 0039-0042, 0077, 0080, and 0081).


Claims 6, 11, 17-18, and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Alwani ("Fused Convolutional Neural Network Accelerators", 2015), in view of Haraden et al (US Pub 20170004092), hereinafter Haraden, and further in view of Boesch et al (US Pub 20180189642), hereinafter Boesch.
Regarding claim 6, the combination of Alwani and Haraden teaches all the claim limitations of claim 1 above, and further teaches wherein the computer-readable instructions further cause the at least one processor to insert one or more data paddings to the loaded data (Alwani, sections “Chapter 1”, 3.3.1-3.3.2 and 3.4.2 teach memories connected to a processor (at least one processor to), well known to execute stored instructions for implementing the described “methodology” of, as taught in section 5.1, using different types of CNN layers including “padding”, known for padding input values for network operations (insert one or more data paddings to the loaded data)).
Alwani at least implies wherein the computer-readable instructions further cause the at least one processor to insert one or more data paddings to the loaded data (see mapping above); however, Boesch teaches wherein the computer-readable instructions further cause the at least one processor to insert one or more data paddings to the loaded data (paragraphs 0066, 0244, 0249 teach a processor adding padding rows/columns to the input data for “neural network” processing (insert one or more data paddings to the loaded data)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify neural network accelerators utilizing on-chip buffers and sliding windows, as taught by Alwani as modified by utilizing a DMA including a line buffer, shift registers, and specified parameters for neural network operations as taught by Haraden, to include adding padding to the input data for neural network processing as taught by Boesch in order to improve computing efficiency and accuracy of pixel recognition (Boesch, paragraphs 0066, 0068, 0244, 0249-0250).
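For illustration only, adding padding rows and columns to input data, as Boesch is cited for above, can be sketched as surrounding a two-dimensional input with a border of fill values. The sketch is not taken from Boesch or Alwani; the function name `pad2d`, the zero fill value, and the assumption of a non-empty rectangular input are made for this example.

```python
def pad2d(rows, pad, fill=0):
    """Return a copy of `rows` with `pad` rows/columns of `fill` on every side.

    Assumes `rows` is a non-empty list of equal-length lists.
    """
    width = len(rows[0]) + 2 * pad
    padded = [[fill] * width for _ in range(pad)]          # top padding rows
    for row in rows:
        padded.append([fill] * pad + list(row) + [fill] * pad)
    padded.extend([fill] * width for _ in range(pad))      # bottom padding rows
    return padded


padded = pad2d([[1, 2], [3, 4]], pad=1)
# [[0, 0, 0, 0],
#  [0, 1, 2, 0],
#  [0, 3, 4, 0],
#  [0, 0, 0, 0]]
```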

Regarding claim 11, the combination of Alwani and Haraden teach all the claim limitations of claim 8 above; and further teach inserting a padding sub-volume into the loaded data that is defined by the received one or more instructions from the cooperating controller components and by the received one or more initialization parameters (Alwani, sections “Chapter 1”, 3.3.1-3.3.2 and 3.4.2 teach memories connected to a processor, well known to execute stored instructions for implementing the described “methodology” of, as taught in sections 3.2, 4.1, 4.3, 5.1, and Table 2.1, using different types of CNN layers including “padding”, known for padding input values for network operations (inserting a padding sub-volume into the loaded data) by calculating “dimensions of the pyramid” from loaded parameters and the dimensions of the input data (defined by the received one or more instructions from the cooperating controller components and by the received one or more initialization parameters)).
Alwani at least implies inserting a padding sub-volume into the loaded data that is defined by the received one or more instructions from the cooperating controller components and by the received one or more initialization parameters (see mapping above); however, Boesch teaches inserting a padding sub-volume into the loaded data that is defined by the received one or more instructions from the cooperating controller components and by the received one or more initialization parameters (paragraphs 0066, 0241, 0244, 0249, and 0261 teach a processor executing instructions, such as “configuration logic” defining “desired stride, padding, and the like” (defined by the received one or more instructions from the cooperating controller components and by the received one or more initialization parameters), for adding padding rows/columns to the input data for “neural network” processing (inserting a padding sub-volume into the loaded data)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify neural network accelerators utilizing on-chip buffers and sliding windows, as taught by Alwani as modified by utilizing a DMA including a line buffer, shift registers, and specified parameters for neural network operations as taught by Haraden, to include configuration logic for adding padding to the input data for neural network processing as taught by Boesch in order to improve computing efficiency and accuracy of pixel recognition (Boesch, paragraphs 0066, 0068, 0241, 0244, 0249-0250, and 0261).
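For illustration only, the interaction between a configured stride and padding can be seen in the standard relation for the output size of a convolution along one dimension, floor((input + 2 × padding − kernel) / stride) + 1. The sketch below applies that general relation; it is not taken from Boesch or Alwani, and the function name `conv_output_size` and its parameter names are hypothetical.

```python
def conv_output_size(input_size, kernel, stride=1, padding=0):
    """Output length along one dimension for a strided, padded convolution."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    span = input_size + 2 * padding - kernel
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1


same_size = conv_output_size(5, kernel=3, stride=1, padding=1)  # 5
strided = conv_output_size(7, kernel=3, stride=2, padding=0)    # 3
```

With a 3-wide kernel and stride 1, one element of padding on each side preserves the input size, which is a common reason for defining padding alongside stride in configuration parameters.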

Regarding claim 17, the combination of Alwani and Haraden teaches all the claim limitations of claim 16 above, and further teaches wherein the instructions further cause the one or more processors of the computing device to: insert an additional data volume to the loaded data (Alwani, sections “Chapter 1”, 3.3.1-3.3.2, and 3.4.2 teach memories connected to a processor, well known to execute stored instructions for implementing the described “methodology” of, as taught in section 5.1, using different types of CNN layers including “padding”, known for padding input values for network operations (insert an additional data volume to the loaded data)).
Alwani at least implies wherein the instructions further cause the one or more processors of the computing device to: insert an additional data volume to the loaded data (see mapping above); however, Boesch teaches wherein the instructions further cause the one or more processors of the computing device to: insert an additional data volume to the loaded data (Boesch, paragraphs 0066, 0241, 0244, 0249, and 0261 teach a processor executing instructions, such as “configuration logic” defining “desired stride, padding, and the like” (wherein the instructions further cause the one or more processors of the computing device), for adding padding rows/columns to the input data for “neural network” processing (insert an additional data volume to the loaded data)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify neural network accelerators utilizing on-chip buffers and sliding windows, as taught by Alwani as modified by utilizing a DMA including a line buffer, shift registers, and specified parameters for neural network operations as taught by Haraden, to include configuration logic for adding padding to the input data for neural network processing as taught by Boesch in order to improve computing efficiency and accuracy of pixel recognition (Boesch, paragraphs 0066, 0068, 0241, 0244, 0249-0250, and 0261).

Regarding claim 18, the combination of Alwani, Haraden, and Boesch teaches all the claim limitations of claim 17 above, and further teaches retrieve a stride value from the initialization parameters representative of the calculated shifting bits (Alwani, sections 2 intro, 3.2, 3.2.1, 4.1, Table 2.1, and Fig. 3.2 teach finding the “dimensions of the pyramid” from loaded parameters such as “stride” (initialization parameters), and determining how much to move the “sliding filters across the input feature map with a stride of S” (retrieve a stride value from the initialization parameters representative of the calculated shifting bits)).
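For illustration only, the effect of a stride value on how far a sliding filter moves across one dimension of an input feature map can be sketched as follows. This is a minimal sketch using the standard sliding-window relationship, not code from any reference of record; the function name `window_starts` and its parameter names are hypothetical.

```python
def window_starts(input_size, kernel_size, stride):
    """List the start offsets of a filter slid across one dimension.

    A filter of width `kernel_size` moves `stride` positions per step and
    must fit entirely inside an input of width `input_size`.
    """
    if stride < 1 or kernel_size < 1:
        raise ValueError("stride and kernel_size must be at least 1")
    if kernel_size > input_size:
        return []  # the filter never fits
    return list(range(0, input_size - kernel_size + 1, stride))


if __name__ == "__main__":
    print(window_starts(7, 3, 1))  # filter shifted by one position per step
    print(window_starts(7, 3, 2))  # filter shifted by two positions per step
```

The number of offsets returned equals `(input_size - kernel_size) // stride + 1` whenever the filter fits, which is the usual output-size formula for an unpadded convolution.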

Regarding claim 23, the combination of Alwani and Haraden teaches all the claim limitations of claim 16 above. However, while the combination does teach shifting data, Boesch better teaches calculating a first shifting bit value for the loaded data that is continuous and another shifting bit value for the loaded data that is discontinuous (Boesch, paragraphs 0020, 0066, 0241, 0244, 0249, and 0261 teach shifting the data by a determined value of one (calculating a first shifting bit value for the loaded data) in the row (that is continuous), and then at the end of the row (that is discontinuous) returning to the first column while moving down a row of the data, similar to a “carriage return” (another shifting bit value for the loaded data)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify neural network accelerators utilizing on-chip buffers and sliding windows, as taught by Alwani as modified by utilizing a DMA including a line buffer, shift registers, and specified parameters for neural network operations as taught by Haraden, to include configuration logic for shifting the input data for neural network processing as taught by Boesch in order to improve computing efficiency and accuracy of pixel recognition (Boesch, paragraphs 0020, 0066, 0068, 0241, 0244, 0249-0250, and 0261).

Regarding claim 24, the combination of Alwani, Haraden, and Boesch teaches all the claim limitations of claim 23 above, and further teaches first shifting the loaded data according to the calculated first shift bit value, and then shifting the loaded data according to another shifting bit value (Boesch, paragraphs 0020, 0066, 0241, 0244, 0249, and 0261 teach shifting the data by a determined value of one in the row (first shifting the loaded data according to the calculated first shift bit value), and then at the end of the row returning to the first column while moving down a row of the data, similar to a “carriage return” (shifting the loaded data according to another shifting bit value)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify neural network accelerators utilizing on-chip buffers and sliding windows, as taught by Alwani as modified by utilizing a DMA including a line buffer, shift registers, and specified parameters for neural network operations as taught by Haraden, to include configuration logic for shifting the input data for neural network processing as taught by Boesch in order to improve computing efficiency and accuracy of pixel recognition (Boesch, paragraphs 0020, 0066, 0068, 0241, 0244, 0249-0250, and 0261).
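For illustration only, the row-wise traversal described in the mappings for claims 23 and 24 (a shift of one within a row, followed at the end of the row by a return to the first column and a move down one row, similar to a “carriage return”) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Boesch's implementation; the function name `raster_positions` and its parameters are hypothetical.

```python
def raster_positions(height, width, kernel):
    """List the top-left (row, col) positions of a square window in raster order.

    Within a row the window shifts by one column (the continuous case).
    At the end of a row it returns to column 0 and moves down one row
    (the discontinuous, carriage-return-like case).
    """
    if kernel < 1:
        raise ValueError("kernel must be at least 1")
    if kernel > height or kernel > width:
        return []  # the window never fits
    positions = []
    row, col = 0, 0
    while row + kernel <= height:
        positions.append((row, col))
        if col + kernel < width:
            col += 1           # continuous data: shift by one
        else:
            col, row = 0, row + 1  # end of row: carriage return
    return positions


if __name__ == "__main__":
    print(raster_positions(3, 4, 2))
```

For a 3x4 input and a 2x2 window, the sketch visits three positions in the first row, then resets to the first column of the second row and visits three more.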

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.M./Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123