DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities.  Paragraph [0079] appears to contain a typographical error, namely an incomplete sentence in line 25 reciting “Each column of processing engines 411”.
Appropriate correction is required.

Claim Objections
Claims 7-17 are objected to because of the following informalities.  
Claim 7 line 8 recites “the first subset”.  This limitation lacks antecedent basis.  Antecedent basis is present for “the first subset of input data elements”.  Claims 8-17 inherit the same deficiency as claim 7 by reason of dependence.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 7-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 7 line 15 recites “the first weight data element”.  It is unclear whether this recites the first weight data element of an array of weight data elements, or the first weight data element in a rotated array of weight data elements.  For purposes of examination, Examiner interprets this limitation as the first weight data element of the rotated array of weight data elements.  Claims 8-17 inherit the same deficiency as claim 7 by reason of dependence.
Claim 8 lines 8 and 9 recite “the first address” and “the second address” respectively.  These limitations lack antecedent basis.  It is unclear to which addresses these refer.  Claims 9-17 inherit the same deficiency as claim 8 by reason of dependence.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over WO 2019079102 A1, Delaye et al. (hereinafter “Delaye”), in view of US 20220147797 A1, Mclelland et al. (hereinafter “Mclelland”).

Regarding claim 7, 
A non-transitory computer readable medium storing instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to (fig 1, p 8 line 8 – p 11 line 22, fig 4, fig 6-600 comprising instructions for operating the processor): 
load a first weight data element of an array of weight data elements from a memory into a systolic array, the first weight data element having first coordinates in the array of weight data elements (p. 14 line 8-16, weight matrices for array of weight data elements read from RAM 226, p. 3 line 19-22 filter, filter for weight, having parameters width, horizontal stride, and horizontal dilation for coordinates in the array of weight data elements, p. 16 line 15 – line 33 obtain filter data for load a first weight data element of an array of weight data elements, including filter size, stride, and dilation for first weight data having coordinates in the array, processor 606 includes a systolic array, claim 7, p. 19 line 26 – p. 21 line 8);
extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation (p. 16 lines 19-24 instruction data including convolution parameters including filter size, stride, and dilation, the convolution processor performs a convolution in which the input data matrix may be dilated, hence a transposed convolution operation is performed, also p. 2 lines 10-16, p. 17 line 33 – p. 18 line 12,  p. 19 lines 3-p. 21 line 8, fig 19 p. 23 lines 23- p. 24 line 23); 
based on the information, obtain the first subset of input data elements from the memory (p. 16 lines 19-24 processor 604 obtains blocks of image data 802 from memory, fig 7, 10; p. 17 line 33 – p. 18 line 12,  p. 19 lines 3-p. 21 line 8, fig 19 p. 23 lines 23- p. 24 line 23);
load the first subset of input data elements into the systolic array (p. 16 lines 19-24 the processor 606 receives the sample streams and convolutional filter data); and
control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements (p. 16 lines 29-33 the processor includes a systolic array of data processing units that execute multiply and accumulate operations based on the sample streams and the filter data to generate output image data).
Delaye discloses an architecture that computes generalized convolutions by adapting the order in which input data elements are provided to the systolic array (p. 2 lines 7-16, p. 21 lines 8-27).  Delaye does not, however, explicitly disclose the second coordinates of the first weight data element in a rotated array of weight data elements.  However, in the same field of endeavor, Mclelland discloses performing transposed convolution (fig 20, [0086], fig 22, [0227]).  Mclelland further discloses second coordinates of the first weight data element in a rotated array of weight data elements ([0164], fig 8-10, [0174-0189]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Delaye’s architecture for computing generalized convolutions, which adapts the order in which data elements are provided to the systolic array, to rotate the array of weight data elements according to the coordinates of the first weight data element as disclosed by Mclelland.  It would have been obvious to rotate the array of weight data elements to achieve the benefit of allowing the hardware to quickly calculate the address of the event-weight product to be delivered (Mclelland [0104]).
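By way of illustration only, and not as a characterization of either reference or of the claims, the relationship between coordinates in an array of weight data elements and coordinates in a rotated array can be sketched as follows.  The sketch assumes a 180-degree rotation, which is the rotation commonly associated with transposed convolution; the function names and the sample 3x3 array are hypothetical.

```python
def rotate_180(weights):
    """Return a copy of a 2-D weight array rotated by 180 degrees (assumed rotation)."""
    return [row[::-1] for row in weights[::-1]]


def rotated_coords(r, s, height, width):
    """Map coordinates (r, s) in the original array to coordinates in the rotated array."""
    return height - 1 - r, width - 1 - s


# Hypothetical 3x3 array of weight data elements.
weights = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
rotated = rotate_180(weights)

# The element at first coordinates (0, 0) is found at second coordinates (2, 2).
r2, s2 = rotated_coords(0, 0, 3, 3)
assert rotated[r2][s2] == weights[0][0]
```

Under this assumption, the same weight data element has two coordinate pairs, one in each array, and either pair can be computed from the other given the array dimensions.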

Regarding claim 8, in addition to the teachings addressed in the claim 7 analysis, Delaye teaches the following:
wherein the instructions include: 
a source address of the first subset of input data elements in the memory based on the second coordinates (p. 18 line 22 – p. 19 line 2, fig 7, fig 9, p. 21 line 28 – p. 23 line 22, fig 10 p. 23 line 23 – p. 24 line 9, p. 3 line 19-22), 
wherein the execution of the instructions causes the one or more hardware processors to obtain the first weight data element from the memory and to obtain the first subset of input data elements from the memory based on the second address (p. 18 line 22 – p. 19 line 2, fig 7, fig 9, p. 21 line 28 – p. 23 line 22, fig 10 p. 23 line 23 – p. 24 line 9, p. 3 line 19-22).
Delaye further discloses different patterns of the plurality of storage locations of the weight data, the processor receiving weight data, and operations on specific elements of weight data (p. 3 lines 19-22, p. 16 lines 25-26, fig 8 p. 19 line 26-27).  Delaye does not, however, explicitly disclose a weight address of the first weight data element in the memory based on the first coordinates, and wherein the execution of the instructions causes the one or more hardware processors to obtain the first weight data element from the memory based on the first address.  However, in the same field of endeavor, Mclelland discloses:
a weight address of the first weight data element in the memory based on the first coordinates ([0174-0176]), and 
obtain the first weight data element from the memory based on the first address ([0174-0176], fig 11, [0190], fig 12).
The motivation to combine provided with respect to claim 7 applies equally to claim 8.

Regarding claim 9, Delaye teaches the claim 8 limitations.  Delaye is silent with respect to storing input data elements in a contiguous address space within the memory.  However, in the same field of endeavor, Mclelland discloses:
input data elements are stored in contiguous address space within the memory ([0098], potential for input data element [0177]).
The motivation to combine provided with respect to claim 7 applies equally to claim 9.

Regarding claim 10, in addition to the teachings addressed in the claim 9 analysis, Delaye teaches the following:
wherein the source address is a first source address associated with a first portion of the contiguous address space that stores the first subset of input data elements (p. 17 lines 22-p. 18 line 12 patterns of locations based on widths, strides and dilation, first of the addresses for first source address associated with a first portion, p. 16 lines 15-28 blocks of data for subset, contiguous as in claim 9); 
wherein the instructions include: 
the first source address (p. 16 lines 15-28, in response to instruction data image blocks obtained, fig 10 lines 23-29); 
a first count of input data elements in the first subset of input data elements (fig 13 cycle 1); 
a second source address associated with a second portion of the contiguous address space that stores a second subset of input data elements (p. 16 lines 15-28, in response to instruction data image blocks obtained, p. 23 line 30 – p. 24 line 9, subsequent row of the entire image for second portion with associated second source address, contiguous as in claim 9); and 
a second count of input data elements in the second subset of input data elements (fig 13 cycle 2); 
wherein the instructions, when executed by the one or more hardware processors, further cause the one or more hardware processors to: 
obtain, from the memory, the first subset of the input data elements based on the first source address and the first count (p. 22 line 6 – line 34 first row, p. 23 line 23 – p. 24 line 23,  fig 9, fig 10); 
control the systolic array to perform the first computations between the first subset of input data elements with the first weight data element to generate first partial sums (p. 16 line 15 – line 28); 
obtain, from the memory, the second subset of the input data elements based on the second source address and the second count; obtain a second weight data element from the memory (p. 22 line 6 – line 34 subsequent row, p. 23 line 23 – p. 24 line 23, fig 9, fig 10, fig 8A, 8B); and 
control the systolic array to perform second computations between the second subset of the input data elements with the second weight data element to generate second partial sums (p. 16 line 15 – line 28, p. 1 line 20-28); and 
wherein the output data elements are generated from the first partial sums and the second partial sums (fig 8A, 8B, p. 1 line 20-28).

Allowable Subject Matter
Claims 11-17 would be allowable if rewritten to overcome the rejections under 35 USC 112(b), and if rewritten in independent form including all the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter:  
Applicant claims a non-transitory computer readable medium storing instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to perform operations including control of a systolic array to perform computations, including instructions as in claim 11 wherein the instructions include:
first destination addresses of a summation buffer to receive first partial sums and second destination addresses of the summation buffer to receive second partial sums; 
wherein the first destination addresses are based on a stride pattern being shifted from a reference location by a first offset, the first offset being based on first coordinates, wherein the first coordinates are of a first weight data element in an array of weight data elements; 
wherein the second destination addresses are based on the stride pattern being shifted from the reference location by a second offset, the second offset being based on second coordinates, wherein the second coordinates are of a first weight data element in a rotated array of weight data elements; and 
wherein the stride pattern is based on the stride of a transposed convolution operation.
The primary reasons for the indication of allowable subject matter are the following limitations, in combination with the remaining limitations:
wherein the first destination addresses are based on a stride pattern being shifted from a reference location by a first offset, the first offset being based on first coordinates, wherein the first coordinates are of a first weight data element in an array of weight data elements; 
wherein the second destination addresses are based on the stride pattern being shifted from the reference location by a second offset, the second offset being based on second coordinates, wherein the second coordinates are of a first weight data element in a rotated array of weight data elements.
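For illustration only, the following sketch shows one way a stride pattern shifted from a reference location by a per-weight offset can determine where partial sums accumulate during a transposed convolution.  It is not taken from the application or from any cited reference.  The function names, the flat buffer layout, the absence of padding, and the use of a weight data element's coordinates directly as the offset are all assumptions made for the example.

```python
def destination_addresses(offset, stride, count, row_width):
    """Flat addresses of a stride pattern shifted from address 0 by offset (dr, ds)."""
    dr, ds = offset
    rows, cols = count
    return [(i * stride + dr) * row_width + (j * stride + ds)
            for i in range(rows) for j in range(cols)]


def transposed_conv2d(inputs, weights, stride):
    """Transposed convolution computed by accumulating per-weight partial sums."""
    in_h, in_w = len(inputs), len(inputs[0])
    k_h, k_w = len(weights), len(weights[0])
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    buffer = [0] * (out_h * out_w)  # hypothetical flat summation buffer
    flat_inputs = [x for row in inputs for x in row]
    for r in range(k_h):
        for s in range(k_w):
            # Assumption: the offset equals the weight coordinates (r, s).
            addrs = destination_addresses((r, s), stride, (in_h, in_w), out_w)
            for addr, x in zip(addrs, flat_inputs):
                buffer[addr] += x * weights[r][s]
    # Reshape the flat buffer into rows of output data elements.
    return [buffer[i * out_w:(i + 1) * out_w] for i in range(out_h)]
```

In this sketch every weight data element writes to the same stride pattern, and only the offset changes from one weight data element to the next, so the output data elements are the sums of the partial sums that land on each address.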

Delaye is the closest prior art found.  Delaye discloses the claimed invention as in the above claim mappings.  Delaye discloses an architecture that computes generalized convolutions by adapting the order in which input data elements are provided to the systolic array (p. 2 lines 7-16, p. 21 lines 8-27).  Delaye does not, however, explicitly disclose a rotated array of data elements.  Delaye further does not explicitly disclose first destination addresses of a summation buffer to receive first partial sums and second destination addresses of the summation buffer to receive second partial sums; wherein the first destination addresses are based on a stride pattern being shifted from a reference location by a first offset, the first offset being based on first coordinates, wherein the first coordinates are of a first weight data element in an array of weight data elements; and wherein the second destination addresses are based on the stride pattern being shifted from the reference location by a second offset, the second offset being based on second coordinates, wherein the second coordinates are of a first weight data element in a rotated array of weight data elements.
Mclelland discloses transposed convolution as in the above claim mappings.  Mclelland further discloses memory for storing data representative of at least one kernel, a plurality of spiking neuron circuits, a transformation module to transform a kernel and a convolutional neural processor to perform event-based convolution using memory and at least one of the transformed input spike array and transformed kernel (abstract).  Mclelland further discloses calculating output event SRAM addresses for each row using the inverted kernel format ([0185-0186]).  Mclelland does not, however, explicitly disclose wherein the first destination addresses are based on a stride pattern being shifted from a reference location by a first offset, the first offset being based on first coordinates, wherein the first coordinates are of a first weight data element in an array of weight data elements; and wherein the second destination addresses are based on the stride pattern being shifted from the reference location by a second offset, the second offset being based on second coordinates, wherein the second coordinates are of a first weight data element in a rotated array of weight data elements.
US 20210056396 A1, Majnemer et al. (hereinafter “Majnemer”), discloses general padding support for convolution in systolic arrays (abstract).  Majnemer further discloses data access to and from memory, including access based on stride (fig 3, 4, 7).  Majnemer further discloses instructions including a destination data store address in memory, and identification of stride length for shifting the data transfer from memory ([0045-0047]).  Majnemer does not, however, explicitly disclose wherein the first destination addresses are based on a stride pattern being shifted from a reference location by a first offset, the first offset being based on first coordinates, wherein the first coordinates are of a first weight data element in an array of weight data elements; and wherein the second destination addresses are based on the stride pattern being shifted from the reference location by a second offset, the second offset being based on second coordinates, wherein the second coordinates are of a first weight data element in a rotated array of weight data elements.
US 20190138898 A1 Song et al., (hereinafter “Song”) discloses a method and apparatus for neural network performing of deconvolution (abstract).  Song further discloses a processor configured to obtain from memory, a first kernel, calculate a second kernel by rotating an arrangement of matrix elements and adjustment including a stride comprised in the first kernel, perform a convolution operation between an input feature map and the second kernel to generate convolution results (abstract, [0078], fig 4). Song does not explicitly disclose first destination addresses of a summation buffer to receive first partial sums and second destination addresses of the summation buffer to receive second partial sums; wherein the first destination addresses are based on a stride pattern being shifted from a reference location by a first offset, the first offset being based on first coordinates, wherein the first coordinates are of a first weight data element in an array of weight data elements; and wherein the second destination addresses are based on the stride pattern being shifted from the reference location by a second offset, the second offset being based on second coordinates, wherein the second coordinates are of a first weight data element in a rotated array of weight data elements.
US 20220172033 A1 Ross et al., (hereinafter “Ross”) discloses rotating data for neural network computations including convolutions performed in a systolic array (abstract, fig 3, fig 9). Ross does not explicitly disclose first destination addresses of a summation buffer to receive first partial sums and second destination addresses of the summation buffer to receive second partial sums; wherein the first destination addresses are based on a stride pattern being shifted from a reference location by a first offset, the first offset being based on first coordinates, wherein the first coordinates are of a first weight data element in an array of weight data elements; and wherein the second destination addresses are based on the stride pattern being shifted from the reference location by a second offset, the second offset being based on second coordinates, wherein the second coordinates are of a first weight data element in a rotated array of weight data elements.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMILY E LAROCQUE whose telephone number is (469)295-9289.  The examiner can normally be reached on 10:00am - 12:00pm, 2:00pm - 8:00pm ET, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Jyoti Mehta can be reached on 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/EMILY E LAROCQUE/Primary Examiner, Art Unit 2182