DETAILED ACTION
Claims 1-19 are pending.
The office acknowledges the following papers:
Oath and Power of Attorney filed on 6/9/2020,
IDS filed on 12/31/2020.

Priority
The effective filing date for the subject matter defined in the pending claims in this application is 7/9/2019.

IDS
The information disclosure statement filed 12/31/2020 fails to comply with 37 CFR 1.98(a)(3)(i) because it does not include a concise explanation of the relevance, as it is presently understood by the individual designated in 37 CFR 1.56(c) most knowledgeable about the content of the information, of each reference listed that is not in the English language.  It has been placed in the application file, but the information referred to therein has not been considered.

Drawings
The Examiner contends that the drawings submitted on 12/31/2019 are acceptable for examination proceedings. 

Specification
The disclosure is objected to because of the following informalities:
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. The Applicant’s cooperation is requested in correcting any errors of which the Applicant may become aware.
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The following title is suggested: “Matrix Data Reuse Techniques in Matrix Convolution Operations”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-17 are rejected under 35 U.S.C. 103 as being unpatentable over Whatmough et al. (U.S. 2019/0311243), in view of Official Notice.
As per claim 1:
Whatmough disclosed a system comprising:

one or more processors configured to perform a convolution of the first matrix and the second matrix to generate a third matrix using a plurality of multiply and accumulate units with data reuse of adjacent values in one or both of the first matrix and second matrix by respective ones of the plurality of multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 100, 104-106, 202-204, 510, and 806-810, paragraphs 20-24, 27, 31, 37, and 61)(The accelerator performs convolution computations using an input weight matrix and an input feature matrix to generate an output feature map matrix. The systolic array uses data pipelining to reuse weight and input feature elements shifted into the systolic array each clock cycle. Each processing cell of the systolic array performs a MAC operation on the received matrix elements.).
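For illustration only, and not as a characterization of Whatmough's circuitry or of the claimed system, the reuse of one loaded value by several multiply and accumulate units can be modeled in software roughly as follows. The names conv2d_mac, ifm, and weights are hypothetical, and a stride of 1 with no padding is assumed; neither reference states these details.

```python
def conv2d_mac(ifm, weights):
    """Valid 2-D convolution (cross-correlation form) built from MAC steps.

    Assumptions (hypothetical, for illustration): stride 1, no padding,
    and one software accumulator standing in for each MAC unit.
    """
    kh, kw = len(weights), len(weights[0])
    oh = len(ifm) - kh + 1
    ow = len(ifm[0]) - kw + 1
    # One accumulator per output position models one MAC unit.
    acc = [[0] * ow for _ in range(oh)]
    for r in range(kh):
        for c in range(kw):
            w = weights[r][c]  # the weight value is read once here ...
            for y in range(oh):
                for x in range(ow):
                    # ... and reused by every accumulator on adjacent inputs.
                    acc[y][x] += w * ifm[y + r][x + c]
    return acc


result = conv2d_mac([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
```

In this sketch each weight is fetched once per kernel position and applied across all output positions, which is one simple way to picture reuse of a loaded value across multiple MAC units.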
As per claim 2:
Whatmough disclosed the system according to Claim 1, wherein a current value of the first matrix is loaded in from the one or more memories to the plurality of multiply and 
As per claim 3:
Whatmough disclosed the system according to Claim 2, further comprising:
a serial shift buffer including a plurality of subsets of buffer elements, wherein respective subsets of the buffer elements are coupled to respective multiply and accumulate units (Whatmough: Figures 1-2 and 7 elements 104-106, 202-204, and 704, paragraphs 20, 31, and 55)(The buffers hold matrix data and shift in columns/rows of matrix data to the systolic array each clock cycle from each matrix buffer element.); and
wherein a value of the second matrix is loaded in from the one or more memories to the serial shift buffer (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The buffer receives input feature and weight matrices that are sent to the systolic array for MAC processing. In view of the above official notice, both matrices are loaded from memory.).
As per claim 4:
Whatmough disclosed the system according to Claim 1, wherein a current value in the second matrix is loaded in from the one or more memories to the plurality of multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for MAC processing at 
As per claim 5:
Whatmough disclosed the system according to Claim 1, wherein:
the first matrix comprises a plurality of weight filters, each weight filter including a plurality of weight input channels, each weight input channel characterized by a weight kernel height and a weight kernel width (Whatmough: Figures 2 and 6 elements 204 and 616, paragraphs 30-31 and 50);
the second matrix comprises a plurality of input feature map input channels, each input feature map input channel characterized by an input feature map height and an input feature map width (Whatmough: Figures 2 and 6 elements 202 and 602-604, paragraphs 30-31 and 49-50); and
the third matrix comprises a plurality of output feature map output channels, each output feature map output channel characterized by an output feature map height and an output feature map width (Whatmough: Figures 2 and 6 elements 106 and 612, paragraphs 30-31 and 51).
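As a rough illustration of how the dimensions recited above relate to one another, the output feature map size can be computed from the kernel and input feature map sizes. The function name output_shape and the stride and padding parameters are assumptions made for this sketch; they are not taken from Whatmough or from the claims.

```python
def output_shape(num_filters, kernel_h, kernel_w, ifm_h, ifm_w,
                 stride=1, padding=0):
    """Return (output channels, output height, output width).

    Uses the standard convolution size relation; stride and padding
    are hypothetical parameters with assumed defaults.
    """
    ofm_h = (ifm_h + 2 * padding - kernel_h) // stride + 1
    ofm_w = (ifm_w + 2 * padding - kernel_w) // stride + 1
    # One output channel is produced per weight filter.
    return (num_filters, ofm_h, ofm_w)


shape = output_shape(8, 3, 3, 32, 32)
```

Under these assumptions the number of output feature map output channels equals the number of weight filters, while the output height and width follow from the kernel and input sizes.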
As per claim 6:
Whatmough disclosed the system according to Claim 5, wherein the one or more memories include: 
a static random access memory (SRAM), resistive random access memory (RRAM), magnetic random access memory (MRAM), phase change random access memory (PCRAM), or flash memory configured to store the plurality of weight filters (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, 
a static random access memory (SRAM), resistive random access memory (RRAM), magnetic random access memory (MRAM), phase change random access memory (PCRAM), or flash memory configured to store the plurality of input feature map input channels (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for MAC processing at the plurality of processing elements of the systolic array. In view of the above official notice, both matrices are loaded from SRAM.).
As per claim 7:
Whatmough disclosed the system according to Claim 6, wherein:
the plurality of input feature map input channels comprise a plurality of image pixel values (Whatmough: Figure 6 elements 602-604, paragraph 22).
As per claim 8:
Whatmough disclosed the system according to Claim 1, further comprising one or more pooling circuits coupled to the plurality of multiply and accumulate units, wherein the one or more pooling circuits are configured to pool a plurality of corresponding values from the plurality of multiply and accumulate units to generate a corresponding pooled value (Whatmough: Figures 5 and 8 elements 510 and 814-816, paragraphs 37 and 61)(A pooling operation is performed on output results generated by the processing 
As per claim 9:
Whatmough disclosed a method comprising:
loading values of a first matrix and values of a second matrix in from one or more memory devices (Whatmough: Figures 1 and 4 elements 112-114 and 202-204, paragraphs 16-18)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for processing. Whatmough discusses systolic arrays reducing accesses to SRAM/main memory, but doesn’t explicitly state that the DMA and MMU elements access SRAM/main memory for fetching input feature and weight matrices. Official notice is given that direct memory access modules can be used to load and store data from and to a SRAM for the advantage of fetching and storing data directly. Thus, it would have been obvious to one of ordinary skill in the art to implement a host with a SRAM accessible via the DMA of the accelerator. In view of the above official notice, both matrices are loaded from memory.); and
performing multiply and accumulate operations in a plurality of multiply and accumulate units on corresponding values of the first matrix and values of the second matrix, with data reuse of adjacent values in one or both of the first matrix and second matrix by respective ones of the plurality of multiply and accumulate units, to generate a third matrix (Whatmough: Figures 1-2, 5, and 8 elements 100, 104-106, 202-204, 510, and 806-810, paragraphs 20-24, 27, 31, 37, and 61)(The accelerator performs convolution computations using an input weight matrix and an input feature matrix to generate an output feature map matrix. The systolic array uses data pipelining to reuse weight and input feature elements shifted into the systolic array each clock cycle. Each 
As per claim 10:
The additional limitation(s) of claim 10 basically recite the additional limitation(s) of claim 5. Therefore, claim 10 is rejected for the same reason(s) as claim 5.
As per claim 11:
The additional limitation(s) of claim 11 basically recite the additional limitation(s) of claim 2. Therefore, claim 11 is rejected for the same reason(s) as claim 2.
As per claim 12:
Whatmough disclosed the method of Claim 11, further comprising:
loading a current weight value from the one or more memory devices into a plurality of multiply and accumulate units, and a plurality of adjacent current input feature map values from the one or more memory devices into respective multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for MAC processing at the plurality of processing elements of the systolic array. In view of the above official notice, both matrices are loaded from memory into the systolic array.);
performing corresponding multiply and accumulate operations using the current weight value and corresponding ones of the plurality current input feature map values to generate corresponding current accumulated values by the respective multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 106, 202-204, 510, and 810, paragraphs 20-24, 27, 31, 37, and 61)(The accelerator performs convolution 
iterating through corresponding input channels of input feature map and corresponding input channels of weights (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(Each clock cycle, input feature matrix data is pipelined from the buffer into the systolic array.); 
iterating through kernel height and kernel width of weights, and corresponding map width and map height in the input feature map (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(Each clock cycle, weight matrix data is pipelined from the buffer into the systolic array.);
outputting corresponding current accumulated values as corresponding output feature map values (Whatmough: Figures 6 and 8 elements 622 and 814, paragraphs 51 and 61)(Accumulated execution results are output.);
resetting the corresponding current accumulated values and iterating through map width and map height of input feature map, and corresponding kernel height and kernel width of weights (Whatmough: Paragraph 45)(Accumulators are reset after all matrix input columns are processed.); and
iterating through filters of weights (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(After a first pooling region completes and the convolution isn’t complete, a second pooling region is processed. Each clock cycle during this second pooling region, weight matrix data is pipelined from the buffer into the systolic array. In addition, it would have been obvious to one of ordinary skill in 
As per claim 13:
Whatmough disclosed the method of Claim 11, further comprising:
shifting values in the input feature map through a serial shift buffer (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The buffer receives input feature and weight matrices that are sent to the systolic array for MAC processing. The buffer values are shifted through each clock cycle in a FIFO manner.); and 
a plurality of values in the input feature map are input from corresponding shift elements of the serial shift buffer to corresponding ones of the plurality of multiply and accumulate units (Whatmough: Figures 1-2, 5, and 7 elements 104-106, 202-204, 510, and 704, paragraphs 20, 31, 37, and 55)(The buffers hold matrix data and shift in columns/rows of matrix data to the systolic array each clock cycle from each matrix buffer element.).
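A minimal software model of a serial shift buffer feeding multiply and accumulate operations is sketched below, for illustration only. The function name shift_buffer_conv1d is hypothetical, the one-dimensional case is a simplification, and the sketch does not purport to describe the buffer structure of Whatmough.

```python
from collections import deque


def shift_buffer_conv1d(values, weights):
    """Slide input values through a fixed-length buffer, one per cycle.

    Each buffer position acts as a tap feeding one multiply; the sum of
    the products models the accumulate step (hypothetical model).
    """
    k = len(weights)
    buf = deque(maxlen=k)  # the oldest value drops out as a new one shifts in
    out = []
    for v in values:
        buf.append(v)  # one shift per modeled clock cycle
        if len(buf) == k:
            # Every value already in the buffer is reused without reloading.
            out.append(sum(w * x for w, x in zip(weights, buf)))
    return out


outputs = shift_buffer_conv1d([1, 2, 3, 4, 5], [1, 0, -1])
```

Each input value is loaded once and then reused in up to len(weights) successive outputs as it moves through the buffer.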
As per claim 14:
Whatmough disclosed the method of Claim 13, further comprising:
loading associated input feature map values into a serial shift buffer, a current weight value into a plurality of multiply and accumulate units, and a plurality of current input feature map values from respective subsets of buffer elements of the serial shift buffer into respective multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The 
performing corresponding multiply and accumulate operations using the current weight value and corresponding ones of the plurality current input feature map values from respective subsets of the buffer elements of the serial shift buffer to generate corresponding current accumulated values by the respective multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 106, 202-204, 510, and 810, paragraphs 20-24, 27, 31, 37, and 61)(The accelerator performs convolution computations using an input weight matrix and an input feature matrix to generate an output feature map matrix. Each processing cell of the systolic array performs a MAC operation on the received matrix elements.);
iterating through corresponding input channels of input feature map and corresponding input channels of weights (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(Each clock cycle, input feature matrix data is pipelined from the buffer into the systolic array.);
iterating through kernel height and kernel width of weights, and corresponding map width and map height in the input feature map (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(Each clock cycle, weight matrix data is pipelined from the buffer into the systolic array.);
outputting corresponding current accumulated values as corresponding output feature map values (Whatmough: Figures 6 and 8 elements 622 and 814, paragraphs 51 and 61)(Accumulated execution results are output.);

iterating through filters of weights (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(After a first pooling region completes and the convolution isn’t complete, a second pooling region is processed. Each clock cycle during this second pooling region, weight matrix data is pipelined from the buffer into the systolic array. In addition, it would have been obvious to one of ordinary skill in the art that multiple convolutions can occur for the advantage of performing additional image processing operations. This also results in weight matrix data being pipelined into the systolic array from the buffer.).
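The iteration order recited in the limitations above (input channels, then kernel height and width, then output and accumulator reset per map position, then filters) can be pictured as a loop nest. The sketch below is illustrative only; the name conv_loop_nest, the nested-list layouts ifm[c][y][x] and weights[f][c][r][s], and the stride 1, no padding assumption are all hypothetical and are not drawn from Whatmough or the application.

```python
def conv_loop_nest(ifm, weights):
    """Multi-channel convolution as an explicit loop nest.

    Assumed layouts (hypothetical): ifm[c][y][x], weights[f][c][r][s].
    Returns ofm[f][y][x] for stride 1 with no padding.
    """
    num_channels = len(ifm)
    kh, kw = len(weights[0][0]), len(weights[0][0][0])
    oh = len(ifm[0]) - kh + 1
    ow = len(ifm[0][0]) - kw + 1
    ofm = []
    for f in range(len(weights)):              # iterate through filters
        plane = []
        for y in range(oh):                    # iterate through map height
            row = []
            for x in range(ow):                # iterate through map width
                acc = 0                        # accumulator reset
                for r in range(kh):            # kernel height
                    for s in range(kw):        # kernel width
                        for c in range(num_channels):  # input channels
                            acc += weights[f][c][r][s] * ifm[c][y + r][x + s]
                row.append(acc)                # output accumulated value
            plane.append(row)
        ofm.append(plane)
    return ofm


ofm = conv_loop_nest(
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],      # 2 channels, 2x2 each
    [[[[1]], [[1]]], [[[2]], [[0]]]],          # 2 filters, 2 channels, 1x1
)
```

The loop order shown is one of several orderings that produce the same result; it was chosen here only to mirror the sequence of the recited steps.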
As per claim 15:
The additional limitation(s) of claim 15 basically recite the additional limitation(s) of claim 4. Therefore, claim 15 is rejected for the same reason(s) as claim 4.
As per claim 16:
Whatmough disclosed the method of Claim 15, further comprising:
loading a plurality of current weight values into respective plurality of multiply and accumulate units, and a current input feature map value into a plurality of multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The buffer receives input feature and weight matrices that are sent to the systolic array for MAC processing. The weight matrix includes a plurality of filters.);

iterating through corresponding input channels of input feature map and corresponding input channels of weights (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(Each clock cycle, input feature matrix data is pipelined from the buffer into the systolic array.);
iterating through kernel height and kernel width of weights, and corresponding map width and map height in the input feature map (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(Each clock cycle, weight matrix data is pipelined from the buffer into the systolic array.);
outputting corresponding current accumulated values as corresponding output feature map values (Whatmough: Figures 6 and 8 elements 622 and 814, paragraphs 51 and 61)(Accumulated execution results are output.);
resetting the corresponding current accumulated values and iterating through map width and map height of input feature map, and corresponding kernel height and kernel width of weights (Whatmough: Paragraph 45)(Accumulators are reset after all matrix input columns are processed.); and

As per claim 17:
The additional limitation(s) of claim 17 basically recite the additional limitation(s) of claim 7. Therefore, claim 17 is rejected for the same reason(s) as claim 7.

Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Whatmough et al. (U.S. 2019/0311243), in view of Official Notice, further in view of Mills et al. (U.S. 2019/0340486).
As per claim 18:
Whatmough disclosed the method according to Claim 9.
Whatmough failed to teach loading values output from the plurality of multiply and accumulate units out to the one or more memory devices as corresponding values of a third matrix.
However, Mills combined with Whatmough disclosed loading values output from the plurality of multiply and accumulate units out to the one or more memory devices as corresponding values of a third matrix (Mills: Figures 3-4 elements 230, 314A-N, and 
The advantage of storing execution results in system memory is that they can be saved upon powering down the system for later use upon rebooting the system. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement storing finished execution results as in Mills within the accelerator of Whatmough.
As per claim 19:
Whatmough disclosed the method according to Claim 9, further comprising:
pooling values output from the plurality of multiply and accumulate units (Whatmough: Figures 5 and 8 elements 510 and 814-816, paragraphs 37 and 61)(A pooling operation is performed on output results generated by the processing elements that include MAC circuitry.); and
Whatmough failed to teach loading the pooled values out to the one or more memory devices as corresponding values of a pooled third matrix.
However, Mills combined with Whatmough disclosed loading the pooled values out to the one or more memory devices as corresponding values of a pooled third matrix (Mills: Figures 3-4 elements 230, 314A-N, and 318, paragraphs 52 and 63)(Whatmough: Figures 5 and 8 elements 510 and 814-816, paragraphs 37 and 61)(Mills disclosed a neural processing circuit performing convolution operations with weight and input 
The advantage of storing execution results in system memory is that they can be saved upon powering down the system for later use upon rebooting the system. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement storing finished execution results as in Mills within the accelerator of Whatmough.
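For illustration only, pooling of values output from multiply and accumulate units can be modeled as reducing each small window of an output plane to a single value. The sketch below assumes max pooling over non-overlapping windows; the references are cited above only for a pooling operation generally, so the choice of max pooling, the window size, and the name max_pool2d are assumptions of this sketch.

```python
def max_pool2d(plane, size=2):
    """Reduce each non-overlapping size x size window to its maximum.

    Hypothetical model: rows and columns that do not fill a complete
    window are dropped.
    """
    rows = len(plane) // size
    cols = len(plane[0]) // size
    return [
        [
            max(
                plane[y * size + dy][x * size + dx]
                for dy in range(size)
                for dx in range(size)
            )
            for x in range(cols)
        ]
        for y in range(rows)
    ]


pooled = max_pool2d([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
```

The pooled plane is a quarter of the size of the input plane in this example, so fewer values would need to be written back to memory.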

Conclusion
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.  
Chen et al. (U.S. 2019/0171448), taught vector register and double buffers holding matrix elements for reuse over multiple clock cycles.
Ginzburg et al. (U.S. 2011/0153707), taught a matrix MAC unit reusing matrix A data and shifting-in matrix B data for execution.

Mansell et al. (U.S. 2020/0117450), taught register-based matrix multiplication.
Narayanamoorthy et al. (U.S. 2020/0265107), taught sparse and dense matrix multiplication.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183