DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
The following claim limitations use the word “means” and are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
“a means to store pixels;” in claim 1 – interpreted broadly as any buffer or memory, based on the specification's supporting structure of a RAM and FIFO for storing pixel values on page 9 of the instant application.
“a means to store weights;” in claim 1 – interpreted broadly as any buffer, based on the buffer that stores weights described on page 8 of the instant application.
“a means to control retrieval of pixels and weights and their simultaneous transmission to at least one or a plurality of computational processing means;” in claim 1 – interpreted broadly as any controller used for loading units, as described on page 6 of the instant application.
“a means to conduct computational processing;” in claim 1 – interpreted broadly as any multiply and accumulator, such as those used for convolution operations as described on page 8 of the instant application.
“a means to buffer output pixels” in claim 1 – interpreted broadly as any buffer, based on the buffer that stores outputs described on page 11 of the instant application.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-6 and 8-10 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Vantrease et al., (US 2019/0236049 A1, hereinafter Vantrease).
Regarding claim 1:
Vantrease shows:
“A hardware accelerator for image processing comprising: a means to store pixels; a means to store weights;” (Paragraph [0032]: “The layer 302 may include PEs 302a, 302b, 302c, . . . , 302n. The layer 302 may process an input data set, e.g., pixel data representing different portions of an image.” In paragraph [0055]: “The memory 614 may include any suitable memory, e.g., dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate DRAM (DDR DRAM), storage class memory (SCM), flash memory, etc.” In paragraph [0056]: “The host interface 616 may be configured to enable communication between the host device and the neural network processor 602. For example, the host interface 616 may be configured to transmit memory descriptors including the memory addresses of the stored data (e.g., input data sets, weights, results of computations, etc.) between the host device and the neural network processor 602. The host interface 614 may include, e.g., a peripheral component interconnect express (PCIe) interface or any suitable interface for communicating with the host device. The host device may include a host processor and a host memory.” And in paragraph [0061]: “The state buffer 608 may be configured to provide caching of data used for computations at the computing engine 604. The data cached at the state buffer 608 may include, e.g., the input data sets and the weights acquired from the memory 614, as well as intermediate outputs of computations at the computing engine 604” – The buffer and memory for storing weights and pixel data of Vantrease is the means to store pixels; a means to store weights.)
“at least two loading units, one being a weight loading unit and another being a pixel loading unit;” (Paragraph [0056]: “The host interface 616 may be configured to enable communication between the host device and the neural network processor 602. For example, the host interface 616 may be configured to transmit memory descriptors including the memory addresses of the stored data (e.g., input data sets, weights, results of computations, etc.) between the host device and the neural network processor 602. The host interface 614 may include, e.g., a peripheral component interconnect express (PCIe) interface or any suitable interface for communicating with the host device. The host device may include a host processor and a host memory.” – The memory descriptors of weights and pixels of Vantrease, for use by the host interface, are the two loading units.)
“a means to control retrieval of pixels and weights and their simultaneous transmission to at least one or a plurality of computational processing means;” (Paragraph [0056]: “The host interface 616 may be configured to enable communication between the host device and the neural network processor 602. For example, the host interface 616 may be configured to transmit memory descriptors including the memory addresses of the stored data (e.g., input data sets, weights, results of computations, etc.) between the host device and the neural network processor 602. The host interface 614 may include, e.g., a peripheral component interconnect express (PCIe) interface or any suitable interface for communicating with the host device. The host device may include a host processor and a host memory.” – The host interface is the means for controlling the loading units which are the memory descriptors.)
“a means to conduct computational processing;” (Paragraph [0060]: “In some embodiments, the computation controller 606 may determine an operating mode of the computing engine 604 based on the data type and the size of the input data set. For example, if the input data set is much larger (e.g., 2000 data elements) than the size of the systolic array (e.g., 16×16), the computation controller 606 may switch the operating mode of the computing engine 604 to an optimization mode. The optimization mode may enable the computing engine 604 to perform multiple computations in parallel for each input data set. For example, each PE can perform four 4-bit computations in parallel for the 4-bit data type, or two 8-bit computations in parallel for the 8-bit data type. It will be understood that based on the size of the PE, the number of input data elements that can be processed concurrently by the PE may vary, without deviating from the scope of the disclosed technologies. For example, for a 32-bit PE, the optimization mode can enable the computing engine 604 to perform four 8-bit computations, eight 4-bit computations, two 16-bit computations, etc. In some other instances, if the input data set is smaller or comparable (e.g., 200 data elements) to the size of the systolic array (e.g., 16×16), switching the operating mode of the computing engine 604 to the optimization mode may not be very effective since loading of the weights into the systolic array may not be amortized with the smaller data set.” And in paragraph [0038]: “The PE 304b may generate a convolution output 410a based on a summation of multiplication results between each weight of the filter 402 and each corresponding pixel in the group 408a according to Equation 1” – The PE for generating computational output of Vantrease is the means for computational processing.)
“a means to buffer output pixels.” (Paragraph [0038]: “The group 408a of pixel values may be presented as a first input data set. The PE 304b may generate a convolution output 410a based on a summation of multiplication results between each weight of the filter 402 and each corresponding pixel in the group 408a according to Equation 1. For example, the PE 304b may generate a dot-product between a matrix represented by the filter 402 and a matrix represented by the group 408a.” In paragraph [0061]: “The state buffer 608 may be configured to provide caching of data used for computations at the computing engine 604. The data cached at the state buffer 608 may include, e.g., the input data sets and the weights acquired from the memory 614, as well as intermediate outputs of computations at the computing engine 604” In paragraph [0062]: “The output buffer 610 may include a set of registers to store the output data sets generated by the computing engine 604.” And in paragraph [0099]: “In step 1002, a processing element (PE) in a two-dimensional array of PEs may receive a first Xin element and a second Xin element concurrently. The PEs of the array may be arranged into rows and columns. Each row of the array may be mapped to a respective input data set and each column may be mapped to a respective output data set.” – The convolution output and the output data sets that are buffered based off of the computations on the input data sets of pixel data are the pixel output buffered data.)

Regarding claim 2:
Vantrease shows the hardware accelerator of claim 1 as claimed and specified above.
And Vantrease shows “wherein the means to conduct computational processing is a plurality of Multiply-Accumulate units.” (Paragraph [0060]: “In some embodiments, the computation controller 606 may determine an operating mode of the computing engine 604 based on the data type and the size of the input data set. For example, if the input data set is much larger (e.g., 2000 data elements) than the size of the systolic array (e.g., 16×16), the computation controller 606 may switch the operating mode of the computing engine 604 to an optimization mode. The optimization mode may enable the computing engine 604 to perform multiple computations in parallel for each input data set. For example, each PE can perform four 4-bit computations in parallel for the 4-bit data type, or two 8-bit computations in parallel for the 8-bit data type. It will be understood that based on the size of the PE, the number of input data elements that can be processed concurrently by the PE may vary, without deviating from the scope of the disclosed technologies. For example, for a 32-bit PE, the optimization mode can enable the computing engine 604 to perform four 8-bit computations, eight 4-bit computations, two 16-bit computations, etc. In some other instances, if the input data set is smaller or comparable (e.g., 200 data elements) to the size of the systolic array (e.g., 16×16), switching the operating mode of the computing engine 604 to the optimization mode may not be very effective since loading of the weights into the systolic array may not be amortized with the smaller data set.” And in paragraph [0038]: “The PE 304b may generate a convolution output 410a based on a summation of multiplication results between each weight of the filter 402 and each corresponding pixel in the group 408a according to Equation 1” – The changing of the PE to perform multiple computations of Vantrease is the wherein the means to conduct computational processing is a plurality of Multiply-Accumulate units. 
The PE of Vantrease, which sums multiplication results, is the Multiply-Accumulate unit.)
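For illustration only, and not as a characterization of the actual circuitry of Vantrease or of the instant application, the “summation of multiplication results between each weight of the filter 402 and each corresponding pixel” quoted above from paragraph [0038] can be sketched as a multiply-accumulate loop. The function name `multiply_accumulate` and the sample filter and pixel values are assumptions chosen for this sketch; they do not appear in either document.

```python
def multiply_accumulate(weights, pixels):
    """Return the sum of element-wise products of two equal-length sequences."""
    if len(weights) != len(pixels):
        raise ValueError("weights and pixels must have the same length")
    acc = 0
    for w, p in zip(weights, pixels):
        acc += w * p  # one multiply-accumulate step per weight/pixel pair
    return acc


# Hypothetical 2x2 filter and 2x2 pixel block, each flattened row by row.
filter_weights = [1, 0, -1, 2]
pixel_block = [10, 20, 30, 40]
result = multiply_accumulate(filter_weights, pixel_block)
```

A plurality of such units, each given a different pixel block, would correspond to the parallel computations discussed in paragraph [0060]; this sketch shows only a single unit operating sequentially.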

Regarding claim 3:
Vantrease shows the hardware accelerator of claim 2 as claimed and specified above.
And Vantrease shows “wherein multiple Multiply-Accumulate units are deployed concurrently depending on size of input image.” (Paragraph [0060]: “In some embodiments, the computation controller 606 may determine an operating mode of the computing engine 604 based on the data type and the size of the input data set. For example, if the input data set is much larger (e.g., 2000 data elements) than the size of the systolic array (e.g., 16×16), the computation controller 606 may switch the operating mode of the computing engine 604 to an optimization mode. The optimization mode may enable the computing engine 604 to perform multiple computations in parallel for each input data set. For example, each PE can perform four 4-bit computations in parallel for the 4-bit data type, or two 8-bit computations in parallel for the 8-bit data type. It will be understood that based on the size of the PE, the number of input data elements that can be processed concurrently by the PE may vary, without deviating from the scope of the disclosed technologies. For example, for a 32-bit PE, the optimization mode can enable the computing engine 604 to perform four 8-bit computations, eight 4-bit computations, two 16-bit computations, etc. 
In some other instances, if the input data set is smaller or comparable (e.g., 200 data elements) to the size of the systolic array (e.g., 16×16), switching the operating mode of the computing engine 604 to the optimization mode may not be very effective since loading of the weights into the systolic array may not be amortized with the smaller data set.” And in paragraph [0038]: “The PE 304b may generate a convolution output 410a based on a summation of multiplication results between each weight of the filter 402 and each corresponding pixel in the group 408a according to Equation 1” – The changing of the PE to perform multiple computations based on the size of input data of Vantrease is the wherein multiple Multiply-Accumulate units are deployed concurrently depending on size of input image.)

Regarding claim 4:
Vantrease shows the hardware accelerator of claim 1 as claimed and specified above.
And Vantrease shows “wherein the said accelerator is used for a Convolutional Neural Network improving its performance by increasing speed of convolution operation by functioning of convolution processing units in parallel.” (Paragraph [0031]: “FIG. 3 illustrates an example of a prediction model that can use techniques disclosed herein. In the example of FIG. 3, the prediction model 204 may be a multi-layer neural network 300 such as a deep neural network (DNN), a convolutional neural network (CNN), or any suitable neural network.” In paragraph [0035]: “On the other hand, in a case where the prediction model 204 is a CNN, each PE of the layer 304 may generate the sum based on the scaling of pixel values from a group of PEs of the layers 302. The sum may represent a convolution result between a group of pixel values and a filter comprising the weight values.” And in paragraph [0026]: “Thus, the time to process the input data set can be reduced by performing multiple computations in parallel by each PE of the systolic array. In addition, the embodiments can provide two or more output data elements concurrently corresponding to each output data set which can improve the performance of the systolic array. Some embodiments can provide significant improvement in performance for larger input data sets as loading of the weights into the systolic array can be amortized for the larger input data sets.” – The improving performance by conducting computations in parallel of Vantrease is the by increasing speed of convolution operation by functioning of convolution processing units in parallel.)

Regarding claim 5:
Vantrease shows the hardware accelerator of claim 1 as claimed and specified above.
And Vantrease shows “wherein said architecture and means utilize hardware computational resource efficiently by reduced number of pixel and weight loading cycles.” (Paragraph [0031]: “FIG. 3 illustrates an example of a prediction model that can use techniques disclosed herein. In the example of FIG. 3, the prediction model 204 may be a multi-layer neural network 300 such as a deep neural network (DNN), a convolutional neural network (CNN), or any suitable neural network.” And in paragraph [0035]: “On the other hand, in a case where the prediction model 204 is a CNN, each PE of the layer 304 may generate the sum based on the scaling of pixel values from a group of PEs of the layers 302. The sum may represent a convolution result between a group of pixel values and a filter comprising the weight values.” And in paragraph [0058]: “The computation controller 606 may be configured to provide controls to various components of the neural network processor 602 to perform neural network computations. The computation controller 606 may perform scheduling of loading the weights into the computing engine 604. The weights may be stored in the state buffer 608. In one embodiment, the computation controller 606 may schedule loading of the weights for all the PEs in the systolic array sequentially using a respective row data bus. For example, one weight for one PE may be loaded per cycle. In another embodiment, the computation controller 606 may schedule loading of the weights in the systolic array in parallel for each row using a respective column data bus for each PE in a given row. 
For example, weights for each row may be loaded in parallel per cycle.” And in paragraph [0087]: “According to an embodiment, two external sequential input elements may be fed simultaneously to the PE 00 every cycle using a first interface (e.g., the row input data bus 816).” – The loading of weights and inputs in parallel and simultaneously of Vantrease is the reduced number of pixel and weight loading cycles)

Regarding claim 6:
Vantrease shows the hardware accelerator of claim 1 as claimed and specified above.
And Vantrease shows “wherein said architecture and means utilizing hardware computational resource efficiently reduces computation cost.” (Paragraph [0031]: “FIG. 3 illustrates an example of a prediction model that can use techniques disclosed herein. In the example of FIG. 3, the prediction model 204 may be a multi-layer neural network 300 such as a deep neural network (DNN), a convolutional neural network (CNN), or any suitable neural network.” In paragraph [0035]: “On the other hand, in a case where the prediction model 204 is a CNN, each PE of the layer 304 may generate the sum based on the scaling of pixel values from a group of PEs of the layers 302. The sum may represent a convolution result between a group of pixel values and a filter comprising the weight values.” And in paragraph [0026]: “Thus, the time to process the input data set can be reduced by performing multiple computations in parallel by each PE of the systolic array. In addition, the embodiments can provide two or more output data elements concurrently corresponding to each output data set which can improve the performance of the systolic array. Some embodiments can provide significant improvement in performance for larger input data sets as loading of the weights into the systolic array can be amortized for the larger input data sets.” – The reducing of time to process the input data of Vantrease is the reducing of computation cost.)

Regarding claim 8:
Vantrease shows:
“A method of operation of hardware accelerator for image processing comprising: segmentation of input image pixels into a plurality of matrices each with k rows and less than k columns;” (Paragraph [0037]: “In FIG. 4A, a filter 402 may include a two-dimensional array of weights. The weights in the filter 402 may represent a spatial distribution of pixels for certain features to be detected from an input image 404. The input image 404 may include a height of H pixels and a width of W pixels. The filter 402 may have a height of R rows and a width of S columns, and is typically smaller than the input image 404. Each weight in the filter 402 may be mapped to a pixel in a rectangular block of pixel values with the same R rows and S columns. In some implementations, the pixel data in the input image 404 may be referred to as input feature map elements of an input feature map, and may indicate that the pixels are processed by the same filter (or same sets of filters) corresponding to certain feature(s). An output feature map may represent convolution outputs between the filter 402 and the input feature map.” And in paragraph [0038]: “As discussed with reference to FIG. 3, a PE of the layer 304 (e.g., the PE 304b) can receive, from a group of PEs of the input layer 302, a group 408a of pixel values corresponding to a first rectangular block of pixels from the input image 404. The group 408a of pixel values may be presented as a first input data set. The PE 304b may generate a convolution output 410a based on a summation of multiplication results between each weight of the filter 402 and each corresponding pixel in the group 408a according to Equation 1. 
For example, the PE 304b may generate a dot-product between a matrix represented by the filter 402 and a matrix represented by the group 408a.” – The processing of pixel values in rectangular blocks of Vantrease is the image processing comprising: segmentation of input image pixels into a plurality of matrices each with k rows and less than k columns.)
“loading of pixels and weights to computational processing means simultaneously;” (Paragraph [0021]: “Input data (e.g., pixels for an image) and the weights may be received from a host server. Each PE may be capable of performing concurrent arithmetic operations including additions and multiplications on the input data and the weights. The PEs may then pass the input data and the weights to other elements in the systolic array for further processing, e.g., normalization and activation”)
“convolution of spatially adjacent pixels of said input image with weights.” (Paragraph [0038]: “As discussed with reference to FIG. 3, a PE of the layer 304 (e.g., the PE 304b) can receive, from a group of PEs of the input layer 302, a group 408a of pixel values corresponding to a first rectangular block of pixels from the input image 404. The group 408a of pixel values may be presented as a first input data set. The PE 304b may generate a convolution output 410a based on a summation of multiplication results between each weight of the filter 402 and each corresponding pixel in the group 408a according to Equation 1. For example, the PE 304b may generate a dot-product between a matrix represented by the filter 402 and a matrix represented by the group 408a.” And in paragraph [0040]: “As shown in FIG. 4B, the convolution operations can be arranged in a sliding-window such that the second rectangular block for the group 408b overlaps, or is otherwise adjacent to, the first rectangular block for the group 408a in the input image 404.” – The processing of pixel values in rectangular blocks for convolutions by a sliding window of Vantrease show that they are adjacent.)
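For illustration only, the sliding-window arrangement quoted above from paragraphs [0038] and [0040] can be sketched as follows, under the assumption of a stride of one and no padding (neither is stated in the quoted passages). The function name `sliding_window_convolution` and the sample image and filter are hypothetical and are not taken from Vantrease.

```python
def sliding_window_convolution(image, kernel, stride=1):
    """Slide an R x S kernel over an H x W image and sum the products per window."""
    rows, cols = len(image), len(image[0])
    r, s = len(kernel), len(kernel[0])
    output = []
    for top in range(0, rows - r + 1, stride):
        out_row = []
        for left in range(0, cols - s + 1, stride):
            acc = 0
            # Each window is a rectangular block of spatially adjacent pixels.
            for i in range(r):
                for j in range(s):
                    acc += kernel[i][j] * image[top + i][left + j]
            out_row.append(acc)
        output.append(out_row)
    return output


# Hypothetical 3x3 input image and 2x2 filter.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
feature_map = sliding_window_convolution(image, kernel)
```

With a stride smaller than the filter width, consecutive windows share pixels, which is the overlap between rectangular blocks that paragraph [0040] describes.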

Regarding claim 9:
Vantrease shows the method of claim 8 as claimed and specified above.
And Vantrease shows “wherein the convolution is conducted concurrently deploying multiple Multiply-Accumulate units.” (Paragraph [0060]: “In some embodiments, the computation controller 606 may determine an operating mode of the computing engine 604 based on the data type and the size of the input data set. For example, if the input data set is much larger (e.g., 2000 data elements) than the size of the systolic array (e.g., 16×16), the computation controller 606 may switch the operating mode of the computing engine 604 to an optimization mode. The optimization mode may enable the computing engine 604 to perform multiple computations in parallel for each input data set. For example, each PE can perform four 4-bit computations in parallel for the 4-bit data type, or two 8-bit computations in parallel for the 8-bit data type. It will be understood that based on the size of the PE, the number of input data elements that can be processed concurrently by the PE may vary, without deviating from the scope of the disclosed technologies. For example, for a 32-bit PE, the optimization mode can enable the computing engine 604 to perform four 8-bit computations, eight 4-bit computations, two 16-bit computations, etc. In some other instances, if the input data set is smaller or comparable (e.g., 200 data elements) to the size of the systolic array (e.g., 16×16), switching the operating mode of the computing engine 604 to the optimization mode may not be very effective since loading of the weights into the systolic array may not be amortized with the smaller data set.” And in paragraph [0038]: “The PE 304b may generate a convolution output 410a based on a summation of multiplication results between each weight of the filter 402 and each corresponding pixel in the group 408a according to Equation 1” – The changing of the PE to perform multiple computations in parallel of Vantrease is the convolution conducted concurrently deploying multiple Multiply-Accumulate units.
The PE of Vantrease, which sums multiplication results, is the Multiply-Accumulate unit.)

Regarding claim 10:
Vantrease shows the method of claim 8 as claimed and specified above.
And Vantrease shows “wherein the weights correspond to a plurality of matrices of pixels that overlap over each other.” (Paragraph [0040]: “As shown in FIG. 4B, the convolution operations can be arranged in a sliding-window such that the second rectangular block for the group 408b overlaps, or is otherwise adjacent to, the first rectangular block for the group 408a in the input image 404.” – The processing of pixel values in rectangular blocks for convolutions by a sliding window of Vantrease show that they overlap.)

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vantrease in view of Vantrease et al., (US 2019/0294413 A1, hereinafter Vantrease 2).
Regarding claim 7:
Vantrease shows the hardware accelerator of claim 1 as claimed and specified above.
But Vantrease does not appear to explicitly recite “wherein said architecture and means utilizing hardware computational resource efficiently reduces power consumption.”
However, Vantrease 2 teaches “wherein said architecture and means utilizing hardware computational resource efficiently reduces power consumption.” (Paragraph [0080]: “Referring back to FIG. 5, post-processor 528 may be configured to perform post-processing on the outputs of computing engine 524 (which may act as a neural network layer, such as a convolution layer or fully-connected layer) that may be stored in output buffer 526 to generate final outputs for the neural network layer.” In paragraph [0082]: “Read access engine 536 may provide read access to state buffer 522 for a read access requesting device including, for example, computing engine 524 and post-processor 528. Write access engine 538 may provide write access to state buffer 522 for a write access requesting device including, for example, post-processor 528. Each of read access engine 536 and write access engine 538 may convert a sequential series of access operations (e.g., multiple read or write operations across multiple clock cycles) to a single access operation to reduce power and reduce wait latency.” And in paragraph [0124]: “In some implementations, techniques disclosed above can be used to reduce the storage space, transportation bandwidth, and computing power used to perform convolution operations or other matrix multiplications for data” – The using a single access operation to reduce power for processing of a computing engine of Vantrease 2 is the wherein said architecture and means utilizing hardware computational resource efficiently reduces power consumption.)
Vantrease and Vantrease 2 are analogous art because both Vantrease and Vantrease 2 describe performing computational processing.
Therefore, it would have been obvious to one of ordinary skill in the art at the filing date of the instant application, having the teachings of Vantrease and Vantrease 2 before him or her, to modify the teachings of Vantrease to include the teachings of Vantrease 2 in order to increase the efficiency and lower the cost of Vantrease by having its access engines “convert a sequential series of access operations (e.g., multiple read or write operations across multiple clock cycles) to a single access operation to reduce power and reduce wait latency” (see Vantrease 2 paragraph [0082]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kuo et al., (US 2019/0220742 A1), part of the prior art made of record, teaches the use of buffers for weights and pixels and the computational processing of claim 1 in paragraphs [0026], [0027], and [0029], through the use of a convolutional engine on an input image with access to weights and buffers that are loaded from memory, and the storing of data from buffers.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHANE D WOOLWINE whose telephone number is (571)272-4138. The examiner can normally be reached M-F 9:30 AM-6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG can be reached on (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHANE D. WOOLWINE
Primary Examiner
Art Unit 2124



/SHANE D WOOLWINE/Primary Examiner, Art Unit 2124