DETAILED ACTION
Claims 1, 5-10, 14, and 17-19 are pending.
The Office acknowledges the following papers:
Claims and remarks filed on 11/3/2022.

New Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-10, 14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Whatmough et al. (U.S. 2019/0311243), in view of Official Notice.
As per claim 1:
Whatmough disclosed a system comprising:
one or more memories configured to store a first matrix and a second matrix (Whatmough: Figures 1 and 4 elements 112-114 and 202-204, paragraphs 16-18)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for processing. Whatmough discusses systolic arrays reducing accesses to SRAM/main memory, but does not explicitly state that the DMA and MMU elements access SRAM/main memory for fetching input feature and weight matrices. Official notice is given that direct memory access modules can be used to load and store data from and to an SRAM for the advantage of fetching and storing data directly. Thus, it would have been obvious to one of ordinary skill in the art to implement a host with an SRAM accessible via the DMA of the accelerator.); and
a plurality of multiply and accumulate units configured to perform a convolution of the first matrix and the second matrix to generate a third matrix (Whatmough: Figures 1-2, 4-5, and 8 elements 100, 104-106, 200-204, 500, 510, and 806-810, paragraphs 20-24, 27, 31, 33-34, 37, and 61)(The accelerator uses a plurality of MAC units to perform convolution computations using an input weight matrix and an input feature matrix to generate an output feature map matrix.);
a serial shift buffer including a plurality of subsets of buffer elements, wherein respective subsets of the buffer elements are coupled to respective multiply and accumulate units (Whatmough: Figures 1-2 and 7 elements 104-106, 202-204, and 704, paragraphs 20, 31, and 55)(The buffers hold matrix data and shift in columns/rows of matrix data to the systolic array each clock cycle from each matrix buffer element. Individual rows/columns of the buffers (i.e. subsets) output data to a given processing element that includes a MAC unit of the systolic array.); and
wherein values of the first matrix are loaded in from the one or more memories to the respective subsets of the plurality of multiply and accumulate units, wherein values of the second matrix are loaded in from the respective subsets of buffer elements to the respective subsets of the multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for MAC processing at the plurality of processing elements of the systolic array. In view of the above official notice, both matrices are loaded from memory.), and wherein current values of the third matrix are concurrently computed from a current value of the first matrix loaded in the subset of the multiply and accumulate units and respective current values of the second matrix loaded in respective ones of the subset of multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 200-204, 510, 516-518, and 806-810, paragraphs 20, 31, 37, and 61)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for MAC processing at the plurality of processing elements of the systolic array. The input features and weight matrices are clocked into the systolic array each clock cycle. The input features and weight matrices are buffered in processing elements for computations and data passing to adjacent processing elements. Thus, computations of the output matrix (i.e., the third matrix) are based on current values of the buffered input features and weight matrices within processing elements.).
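As an aid to reading the mapping above, the following Python sketch models one clock cycle of the dataflow recited in the last limitation of claim 1. It assumes, consistent with claims 5 and 14, that the first matrix supplies weight values and the second matrix supplies input feature map values. The function name and the list-based representation are hypothetical and are not taken from Whatmough or from the application; the sketch models arithmetic only, not timing or hardware.

```python
from typing import List


def concurrent_mac_step(accumulators: List[float], current_weight: float,
                        current_features: List[float]) -> List[float]:
    """One clock cycle of a hypothetical dataflow: the same current weight value
    is applied at every MAC unit, and MAC unit m multiplies it by the current
    feature value supplied by its own subset of buffer elements, current_features[m]."""
    if len(accumulators) != len(current_features):
        raise ValueError("each MAC unit needs exactly one current feature value")
    # In hardware these products are formed in parallel; the list comprehension
    # simply evaluates them one after another.
    return [acc + current_weight * x for acc, x in zip(accumulators, current_features)]
```

For example, starting from three zeroed accumulators, one cycle with weight 1.0 and features [1.0, 2.0, 3.0] followed by one cycle with weight 2.0 and features [2.0, 3.0, 4.0] yields [5.0, 8.0, 11.0], i.e., three output values computed together from a shared weight value and respective feature values.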
As per claim 5:
Whatmough disclosed the system according to Claim 1, wherein:
the first matrix comprises a plurality of weight filters, each weight filter including a plurality of weight input channels, each weight input channel characterized by a weight kernel height and a weight kernel width (Whatmough: Figures 2 and 6 elements 204 and 616, paragraphs 30-31 and 50);
the second matrix comprises a plurality of input feature map input channels, each input feature map input channel characterized by an input feature map height and an input feature map width (Whatmough: Figures 2 and 6 elements 202 and 602-604, paragraphs 30-31 and 49-50); and
the third matrix comprises a plurality of output feature map output channels, each output feature map output channel characterized by an output feature map height and an output feature map width (Whatmough: Figures 2 and 6 elements 106 and 612, paragraphs 30-31 and 51).
As per claim 6:
Whatmough disclosed the system according to Claim 5, wherein the one or more memories include: 
a static random access memory (SRAM), resistive random access memory (RRAM), magnetic random access memory (MRAM), phase change random access memory (PCRAM), or flash memory configured to store the plurality of weight filters (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for MAC processing at the plurality of processing elements of the systolic array. In view of the above official notice, both matrices are loaded from SRAM.); and 
a static random access memory (SRAM), resistive random access memory (RRAM), magnetic random access memory (MRAM), phase change random access memory (PCRAM), or flash memory configured to store the plurality of input feature map input channels (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for MAC processing at the plurality of processing elements of the systolic array. In view of the above official notice, both matrices are loaded from SRAM.).
As per claim 7:
Whatmough disclosed the system according to Claim 6, wherein:
the plurality of input feature map input channels comprise a plurality of image pixel values (Whatmough: Figure 6 elements 602-604, paragraph 22).
As per claim 8:
Whatmough disclosed the system according to Claim 1, further comprising one or more pooling circuits coupled to the plurality of multiply and accumulate units, wherein the one or more pooling circuits are configured to pool a plurality of corresponding values from the plurality of multiply and accumulate units to generate a corresponding pooled value (Whatmough: Figures 5 and 8 elements 510 and 814-816, paragraphs 37 and 61)(A pooling operation is performed on output results generated by the processing elements that include MAC circuitry.).
As per claim 9:
Whatmough disclosed a method comprising:
loading values of a first matrix from one or more memory devices into respective subsets of multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 112-114, 202-204, 510, and 806-810, paragraphs 16-18, 20, 31, 37, and 61)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for MAC processing at the plurality of processing elements of the systolic array. Whatmough discusses systolic arrays reducing accesses to SRAM/main memory, but does not explicitly state that the DMA and MMU elements access SRAM/main memory for fetching input feature and weight matrices. Official notice is given that direct memory access modules can be used to load and store data from and to an SRAM for the advantage of fetching and storing data directly. Thus, it would have been obvious to one of ordinary skill in the art to implement a host with an SRAM accessible via the DMA of the accelerator. In view of the above official notice, both matrices are loaded from memory.); and
shifting values of a second matrix from the one or more memory devices through a serial shift buffer, wherein the serial shift buffer includes a plurality of subsets of buffer elements and respective subsets of the buffer elements are coupled to respective ones of a plurality of multiply and accumulate units (Whatmough: Figures 1-2 and 7 elements 104-106, 202-204, and 704, paragraphs 20, 31, and 55)(The buffers hold matrix data and shift in columns/rows of matrix data to the systolic array each clock cycle from each matrix buffer element. Individual rows/columns of the buffers (i.e. subsets) output data to a given processing element that includes a MAC unit of the systolic array. In view of the above official notice, both matrices are loaded from memory.);
loading values of the second matrix from corresponding subsets of buffer elements into respective ones of the plurality of multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 202-204, 510, and 806-810, paragraphs 20, 31, 37, and 61)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for MAC processing at the plurality of processing elements of the systolic array. In view of the above official notice, both matrices are loaded from memory.); and
performing multiply and accumulate operations concurrently in the plurality of multiply and accumulate units on corresponding values of the first matrix and values of the second matrix (Whatmough: Figures 1-2, 4-5, and 8 elements 100, 104-106, 202-204, 510, and 806-810, paragraphs 20-24, 27, 31, 33-34, 37, and 61)(The accelerator uses a plurality of MAC units to perform concurrent convolution computations using an input weight matrix and an input feature matrix to generate an output feature map matrix.) wherein corresponding values of the second matrix are reused from the serial shift buffer without reshifting the corresponding values of the second matrix from the one or more memory devices into the serial shift buffer (Whatmough: Figures 1-2, 5, and 8 elements 104-106, 200-204, 510, 516-518, and 806-810, paragraphs 20, 31, 37, and 61)(The data buffer receives input feature and weight matrices that are sent to the convolution accelerator for MAC processing at the plurality of processing elements of the systolic array. The input features and weight matrices are clocked into the systolic array each clock cycle. The input features and weight matrices are buffered in processing elements for computations and data passing to adjacent processing elements. This allows for reusing values shifted out of the data buffer into the systolic array without refetching from memory.).
As per claim 10:
The additional limitation(s) of claim 10 substantially recite the additional limitation(s) of claim 5. Therefore, claim 10 is rejected for the same reason(s) as claim 5.
As per claim 14:
Whatmough disclosed the method of Claim 10, further comprising:
performing corresponding multiply and accumulate operations using the current weight value and corresponding ones of the plurality of current input feature map values from respective subsets of the buffer elements of the serial shift buffer to generate corresponding current accumulated values by the respective multiply and accumulate units (Whatmough: Figures 1-2, 5, and 8 elements 106, 202-204, 510, and 810, paragraphs 20-24, 27, 31, 37, and 61)(The accelerator performs convolution computations using an input weight matrix and an input feature matrix to generate an output feature map matrix. Each processing cell of the systolic array performs a MAC operation on the received matrix elements.);
iterating through corresponding input channels of input feature map and corresponding input channels of weights (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(Each clock cycle, input feature matrix data is pipelined from the buffer into the systolic array.);
iterating through kernel height and kernel width of weights, and corresponding map width and map height in the input feature map (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(Each clock cycle, weight matrix data is pipelined from the buffer into the systolic array.);
outputting corresponding current accumulated values as corresponding output feature map values (Whatmough: Figures 6 and 8 elements 622 and 814, paragraphs 51 and 61)(Accumulated execution results are output.);
resetting the corresponding current accumulated values and iterating through map width and map height of input feature map, and corresponding kernel height and kernel width of weights (Whatmough: Paragraph 45)(Accumulators are reset after all matrix input columns are processed.); and
iterating through filters of weights (Whatmough: Figures 2 and 5-6 elements 202-204, 510, and 616, paragraphs 31, 37, and 49-50)(After a first pooling region completes and the convolution is not complete, a second pooling region is processed. Each clock cycle during this second pooling region, weight matrix data is pipelined from the buffer into the systolic array. In addition, it would have been obvious to one of ordinary skill in the art that multiple convolutions can occur for the advantage of performing additional image processing operations. This also results in weight matrix data being pipelined into the systolic array from the buffer.).
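To make the iteration order recited in claim 14 easier to follow, the following Python sketch shows one possible loop nest for a direct convolution in which a group of output positions along the map width is accumulated together. It is a hypothetical illustration, not Whatmough's or Applicant's implementation: the tensor layouts (ifmap[c][y][x], weights[f][c][ky][kx]), stride 1 with no padding, the parameter num_macs, and the placement of the input-channel loop innermost are all assumptions made for this sketch.

```python
from typing import List

# Hypothetical layouts: ifmap[c][y][x] and weights[f][c][ky][kx]; stride 1, no padding.
Tensor3 = List[List[List[float]]]
Tensor4 = List[List[List[List[float]]]]


def conv2d_loop_nest(ifmap: Tensor3, weights: Tensor4, num_macs: int) -> Tensor3:
    """Direct convolution organized, as one possible reading of claim 14, so that
    num_macs output positions along the map width are accumulated together."""
    C, H, W = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
    F, KH, KW = len(weights), len(weights[0][0]), len(weights[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    ofmap = [[[0.0] * OW for _ in range(OH)] for _ in range(F)]
    for f in range(F):                              # iterate through filters of weights
        for oy in range(OH):                        # iterate through map height
            for ox0 in range(0, OW, num_macs):      # ... and map width, num_macs outputs at a time
                n = min(num_macs, OW - ox0)
                acc = [0.0] * n                     # reset the current accumulated values
                for ky in range(KH):                # iterate through kernel height
                    for kx in range(KW):            # ... and kernel width
                        for c in range(C):          # iterate through input channels
                            w = weights[f][c][ky][kx]   # current weight value, shared by all MACs
                            row = ifmap[c][oy + ky]
                            for m in range(n):      # parallel in hardware, sequential here
                                acc[m] += w * row[ox0 + m + kx]
                ofmap[f][oy][ox0:ox0 + n] = acc     # output the accumulated values
    return ofmap
```

The loop order in this sketch is one assumption among several; reordering the channel and kernel loops changes only the order in which products are accumulated, not the resulting output feature map.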
As per claim 17:
The additional limitation(s) of claim 17 substantially recite the additional limitation(s) of claim 7. Therefore, claim 17 is rejected for the same reason(s) as claim 7.

Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Whatmough et al. (U.S. 2019/0311243), in view of Official Notice, further in view of Mills et al. (U.S. 2019/0340486).
As per claim 18:
Whatmough disclosed the method according to Claim 9.
Whatmough failed to teach loading values output from the plurality of multiply and accumulate units out to the one or more memory devices as corresponding values of a third matrix.
However, Mills combined with Whatmough disclosed loading values output from the plurality of multiply and accumulate units out to the one or more memory devices as corresponding values of a third matrix (Mills: Figures 3-4 elements 230, 314A-N, and 318, paragraphs 52 and 63)(Whatmough: Figures 5 and 8 elements 510 and 814-816, paragraphs 37 and 61)(Mills disclosed a neural processing circuit performing convolution operations with weight and input matrices. Mills disclosed a post-processing circuit to output finished data to system memory. The combination allows for pooling results in Whatmough to be stored in system memory.).
The advantage of storing execution results in system memory is that they can be saved upon powering down the system for later use upon rebooting the system. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement storing finished execution results as in Mills within the accelerator of Whatmough.
As per claim 19:
Whatmough disclosed the method according to Claim 9, further comprising:
pooling values output from the plurality of multiply and accumulate units (Whatmough: Figures 5 and 8 elements 510 and 814-816, paragraphs 37 and 61)(A pooling operation is performed on output results generated by the processing elements that include MAC circuitry.).
Whatmough failed to teach loading the pooled values out to the one or more memory devices as corresponding values of a pooled third matrix.
However, Mills combined with Whatmough disclosed loading the pooled values out to the one or more memory devices as corresponding values of a pooled third matrix (Mills: Figures 3-4 elements 230, 314A-N, and 318, paragraphs 52 and 63)(Whatmough: Figures 5 and 8 elements 510 and 814-816, paragraphs 37 and 61)(Mills disclosed a neural processing circuit performing convolution operations with weight and input matrices. Mills disclosed a post-processing circuit to output finished data to system memory. The combination allows for pooling results in Whatmough to be stored in system memory.).
The advantage of storing execution results in system memory is that they can be saved upon powering down the system for later use upon rebooting the system. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to implement storing finished execution results as in Mills within the accelerator of Whatmough.

Response to Arguments
The arguments presented by Applicant in the response received on 11/3/2022 are not considered persuasive.
Applicant argues regarding claims 1 and 9:
“For example, the first subset of buffer element 810-820 are coupled to a first multiply and accumulate unit and the second subset of buffer elements 825-835 are coupled to a second multiply and accumulate unit. It is to be noted that all of the buffer elements within each subset are coupled in series and each of the subsets are coupled in series. Therefore, the plurality of subset of buffer elements, wherein respective subsets of the buffer element are coupled to respective ones of the multiply and accumulate units, comprise "a serial shift buffer." In contrast, Whatmough teaches respective shift registers 202, 204 coupled to respective multiply and accumulate units 200. For example, a first row of buffers 202 holding matrix elements 11, 12, 13 shift into a first PE 200, a second row of buffers 220 holding matrix elements 21, 22, 23 shift into a second PE 200, and a third row of buffer 220 holding matrix elements 31, 32, 33 shift into a third PE 200. The first, second and third rows of buffer 220 are not coupled to each other in series (See FIGS. 2-6 and paragraphs 0030-0054). Therefore, Whatmough teaches a plurality separate shift registers coupled to respective multiply and accumulate unit. Consequently, Whatmough does not teach or suggest "wherein current values of the third matrix are concurrently computed from a current value of the first matrix loaded in the subset of the multiply and accumulate units and respective current values of the second matrix loaded in respective ones of the subset of multiply and accumulate units" as recited in independent Claim 1.”  

This argument is not found to be persuasive for the following reason. Whatmough disclosed in figure 2 and paragraph 31 that input features and weights are clocked into the processing elements of the systolic array each clock cycle. Figure 5 shows an individual processing element performing a MAC operation and output registers 516-518 storing input features and weights. These input features and weights are passed to adjacent processing elements each clock cycle. The data passing allows for concurrent MAC execution of input feature and weight matrix data values to compute the output third matrix result. 
Applicant notes that Figure 8 describes a distinctive feature of the serial shift buffer, namely a buffer element outputting an input feature value both to a PE/MAC cell and to an input of a second shift buffer element 820. The examiner notes that claiming this feature would likely overcome the current rejections, as Whatmough does not describe such a feedback feature within adjacent rows/columns of the data buffer coupled to adjacent PE/MAC cells.
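Purely to illustrate the technical distinction discussed above, the following Python sketch contrasts a chained (serial) shift buffer, in which each element passes its value both to its own MAC unit and to the next buffer element, with independent per-MAC shift registers that are each filled from memory. Both functions, the one-element-per-subset simplification, and the memory-read counts are hypothetical; they do not reproduce Whatmough's Figure 2 or Applicant's Figure 8 in detail.

```python
from typing import List, Tuple


def serial_shift_conv1d(x: List[float], w: List[float], num_macs: int) -> Tuple[List[float], int]:
    """Chained (serial) buffer, one element per MAC unit for brevity.
    Element m+1 feeds both MAC unit m+1 and the input of element m, so a value
    fetched once from memory is later reused by MAC unit m without a new fetch."""
    if len(x) < num_macs + len(w) - 1:
        raise ValueError("x is too short for the requested outputs")
    buf = list(x[:num_macs])          # initial fill: element m holds x[m]
    reads = num_macs
    acc = [0.0] * num_macs
    for k, wk in enumerate(w):
        for m in range(num_macs):     # concurrent MACs in hardware
            acc[m] += wk * buf[m]
        if k + 1 < len(w):
            # Shift by one: every element passes its value to its neighbour and
            # only the tail element receives a newly fetched value.
            buf = buf[1:] + [x[num_macs + k]]
            reads += 1
    return acc, reads


def separate_registers_conv1d(x: List[float], w: List[float], num_macs: int) -> Tuple[List[float], int]:
    """Independent per-MAC shift registers with no chaining between them: each
    register is filled from memory with its own window, so overlapping input
    values are fetched once per register that needs them."""
    if len(x) < num_macs + len(w) - 1:
        raise ValueError("x is too short for the requested outputs")
    acc = [0.0] * num_macs
    reads = 0
    for m in range(num_macs):
        window = x[m:m + len(w)]
        reads += len(window)
        for wk, xv in zip(w, window):
            acc[m] += wk * xv
    return acc, reads
```

With x = [1.0, 2.0, 3.0, 4.0], w = [1.0, 2.0] and three MAC units, both functions return the outputs [5.0, 8.0, 11.0], but in this model the chained buffer performs 4 memory reads against 6 for the independent registers, because values shifted from one element into the next are reused rather than re-fetched.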

	Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183