DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on August 27, 2019; May 6, 2020; July 31, 2020; April 12, 2021; and May 20, 2022 were filed on or after the filing date of the application, August 27, 2019. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-4, 7, 11-14, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ross (US010521488B1) in view of Takahashi (US 20180217962A1) and Webb (US006061749A).
As per Claim 1, Ross teaches a processor, comprising: a first tile, a second tile (a first number of activation registers in a first group of cells and a second, lesser number of activation registers in a second group of cells, col. 10, lines 54-58). Since Ross teaches a memory (210), and the memory sends the weights and activations to the first tile and the second tile (dynamic memory 210 can send the sets of weight inputs and the sets of activation inputs to the matrix computation unit 212, the matrix computation unit 212 may be a two-dimensional systolic array of cells, col. 6, line 63-col. 7, line 2; col. 10, lines 54-58), it would have been obvious to one of ordinary skill in the art that there is a bus connected to the memory, the first tile, and the second tile in order for the memory to send the weights and activations to the first tile and the second tile (col. 6, line 63-col. 7, line 2; col. 10, lines 54-58).
Ross teaches the first tile comprising: a first weight register, a second weight register (each cell of the first plurality of cells including: a weight register configured to store a weight input, col. 2, lines 17-19), an activations buffer (activation registers within cell, col. 9, lines 48-51), a first multiplier, and a second multiplier (each cell of the first plurality of cells including multiple activation registers, each activation register of the multiple activation registers configured to store a corresponding activation input, multiplexer circuitry communicatively coupled to the multiple activation registers and configured to select, from the multiple activation registers, one of the activation inputs as a selected activation input, and multiplication circuitry communicatively coupled to the weight register and to the multiplexer, in which the multiplication circuitry is configured to output a product of the weight input and the selected activation input, col. 2, lines 17-28; cell may include multiple activation registers (506a, 506b, 506c) that store activation inputs, col. 10, lines 40-41, Fig. 5), the first tile being configured to perform a convolution of an array of activations with a kernel of weights (col. 6, lines 30-31; col. 9, lines 54-56; col. 1, lines 31-32), the performing of the convolution comprising, in order: forming a tensor product of the kernel with a first subarray of the array of activations; forming a tensor product of the kernel with a second subarray of the array of activations, the second subarray being offset from the first subarray by n array elements in a first direction, n being a positive integer (col. 9, lines 28-35, 47-56; col. 11, lines 33-35; col. 5, lines 5-6; Fig. 5; col. 1, lines 31-32).
	Ross does not teach forming a tensor product of the kernel with a third subarray of the array of activations, the third subarray being offset from the second subarray by one array element in a second direction, perpendicular to the first direction. However, Takahashi teaches forming a tensor product of the weight with a third subarray of the array of activations by shifting the position of the weight in the second direction, perpendicular to the first direction (multiplication operation unit 112 iteratively performs the process of performing the multiplication and thereafter shifting the weight data 221 by one stride in the row direction; if the weight data 221 reaches the end of the row, then, in the following calculation iteration, the multiplication operation unit 112 shifts the position of weight data 221 in the column direction by one stride, [0084]; convolution operation unit receives an input from activation process unit, convolution operation unit performs the convolution operation using weight data, [0113]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ross to include forming a tensor product of the weight with a third subarray of the array of activations by shifting the position of the weight in the second direction, perpendicular to the first direction because Takahashi suggests that in order to do this for the entire array, if the weight reaches the end of the row, then it needs to shift the position of the weight data in the column direction [0084].
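For illustration only, the row-then-column shifting relied upon from Takahashi [0084] may be sketched as follows. The function name `raster_positions` and its parameters are hypothetical and do not appear in Takahashi; the sketch assumes the weight position returns to the start of the row after each column-direction shift, and it is not presented as Takahashi's implementation.

```python
def raster_positions(rows, cols, k_rows, k_cols, stride=1):
    """Return the (row, col) top-left corner of each weight position, in visiting order.

    Hypothetical helper: the weight window moves along a row by `stride`,
    then moves down by `stride` and starts again at the beginning of the row.
    """
    positions = []
    for r in range(0, rows - k_rows + 1, stride):      # shift in the column direction
        for c in range(0, cols - k_cols + 1, stride):  # shift in the row direction
            positions.append((r, c))
    return positions


# A 3x3 weight window over a 4x4 array of activations.
print(raster_positions(4, 4, 3, 3))
```

In this order, every move in the column direction is accompanied by a return to the start of the row, which is the point of difference from the zig-zag order addressed next with respect to Webb.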
However, Ross and Takahashi do not teach forming a tensor product of the kernel with a third subarray of the array of activations, the third subarray being offset from the second subarray by one array element in a second direction, perpendicular to the first direction. However, Webb teaches forming a tensor product (in convolution, the processing block multiplies interpolated pixels with coefficients in a sub-sample weight matrix, col. 96, lines 53-57) of the kernel with a third subarray, the third subarray being offset from the second subarray by one array element in a second direction, perpendicular to the first direction (address generator adds the horizontal delta to the current coordinates until one row of the matrix is finished; after that, address generator adds the vertical delta to the current coordinates to find the coordinates in the next row; after that, address generator subtracts the horizontal delta from the current coordinates to find the next coordinates, until one more row is finished; after that, address generator adds the vertical delta to the current coordinates and the procedure is repeated again; using this scheme, the matrix is traversed in a zig-zag way; the accumulation matrix coefficients must be listed in the kernel descriptor in the same order, col. 102, lines 26-41). Since Takahashi teaches forming a tensor product of the weight with a third subarray of the array of activations by shifting the position of the weight in the second direction, perpendicular to the first direction [0084, 0113], this teaching from Webb can be implemented into the array of activations of Takahashi so that it forms a tensor product of the kernel with a third subarray of the array of activations, the third subarray being offset from the second subarray by one array element in a second direction, perpendicular to the first direction.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ross and Takahashi to include forming a tensor product of the kernel with a third subarray of the array of activations, the third subarray being offset from the second subarray by one array element in a second direction, perpendicular to the first direction because Webb suggests that using this scheme, fewer registers are required (col. 102, lines 37-41).
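For illustration only, the zig-zag traversal relied upon from Webb (col. 102, lines 26-41) may be sketched as follows. The function name `zigzag_positions` and the parameters `h_delta` and `v_delta` are hypothetical labels for the horizontal and vertical deltas; the sketch assumes the traversal starts at coordinates (0, 0) and is not presented as Webb's address generator.

```python
def zigzag_positions(rows, cols, h_delta=1, v_delta=1):
    """Return the (x, y) coordinates visited by a zig-zag traversal of a rows x cols matrix."""
    x, y = 0, 0
    direction = 1  # +1: add the horizontal delta; -1: subtract it
    visited = []
    for _ in range(rows):
        for i in range(cols):
            visited.append((x, y))
            if i < cols - 1:
                x += direction * h_delta  # step along the current row
        y += v_delta            # row finished: add the vertical delta
        direction = -direction  # the next row is traversed the other way
    return visited


print(zigzag_positions(2, 3))
```

Because the move to the next row happens without first returning to the start of the row, the position reached after a row ends differs from the previous position only by the vertical delta.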
8.	As per Claim 2, Ross does not teach the performing of the convolution further comprises, in order, after the forming of the tensor product of the kernel with the third subarray: forming a tensor product of the kernel with a fourth subarray of the array of activations, the fourth subarray being offset from the third subarray by m array elements in a third direction, opposite to the first direction, m being a positive integer, and forming a tensor product of the kernel with a fifth subarray of the array of activations, the fifth subarray being offset from the fourth subarray by one array element in the second direction. However, Takahashi teaches wherein the performing of the convolution comprises forming the tensor product of the weight with each subarray of the array of activations by shifting the position of the weight in the row direction by one stride. If the weight reaches the end of the row, then, in the following calculation iteration, the multiplication operation unit shifts the position of the weight in the column direction by one stride, and the multiplication operation unit returns the position of the weight to the top position of the row direction. The multiplication operation unit again iteratively performs the process of performing multiplication and thereafter shifting the weight by one stride in the row direction. The multiplication operation unit repeats the multiplication until it reaches the bottom row [0084, 0113]. This would be obvious for the reasons given in the rejection for Claim 1.
	However, Ross and Takahashi do not teach wherein the performing of the convolution further comprises, in order, after the forming of the tensor product of the kernel with the third subarray: forming a tensor product of the kernel with a fourth subarray of the array of activations, the fourth subarray being offset from the third subarray by m array elements in a third direction, opposite to the first direction, m being a positive integer, and forming a tensor product of the kernel with a fifth subarray of the array of activations, the fifth subarray being offset from the fourth subarray by one array element in the second direction. However, Webb teaches that, if the array had 2 columns, the performing of the convolution further comprises, in order, after the forming of the tensor product of the kernel with the third subarray: forming a tensor product of the kernel with a fourth subarray, the fourth subarray being offset from the third subarray by m array elements in a third direction, opposite to the first direction, m being a positive integer, and forming a tensor product of the kernel with a fifth subarray, the fifth subarray being offset from the fourth subarray by one array element in the second direction (col. 96, lines 53-57; col. 102, lines 26-41). Thus, this teaching from Webb can be implemented into the array of activations of Takahashi so that the performing of the convolution further comprises, in order, after the forming of the tensor product of the kernel with the third subarray: forming a tensor product of the kernel with a fourth subarray of the array of activations, the fourth subarray being offset from the third subarray by m array elements in a third direction, opposite to the first direction, m being a positive integer, and forming a tensor product of the kernel with a fifth subarray of the array of activations, the fifth subarray being offset from the fourth subarray by one array element in the second direction.
This would be obvious for the reasons given in the rejection for Claim 1.
9.	As per Claim 3, Ross and Takahashi do not teach wherein m equals n.  However, Webb teaches wherein m equals n (col. 102, lines 26-41).  This would be obvious for the reasons given in the rejection for Claim 1.
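For illustration only, the mapping of the claimed offsets n and m onto a back-and-forth scan with two positions per row (the "2 columns" case discussed with respect to Claim 2) may be sketched as follows. The helper names `serpentine` and `step_offsets` are hypothetical, and the sketch illustrates the reading applied in this action rather than any implementation disclosed in Webb.

```python
def serpentine(rows, cols):
    """Positions (x, y) of a back-and-forth scan: even rows left to right, odd rows right to left."""
    path = []
    for y in range(rows):
        xs = range(cols) if y % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((x, y) for x in xs)
    return path


def step_offsets(path):
    """Offset (dx, dy) from each position to the next one in the path."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(path, path[1:])]


path = serpentine(rows=3, cols=2)[:5]  # first through fifth subarray positions
offsets = step_offsets(path)
n = offsets[0][0]   # first -> second: n elements in the first direction
m = -offsets[2][0]  # third -> fourth: m elements in the opposite (third) direction
print(offsets, n, m)
```

Under these assumptions the second and fourth steps are each one element in the second direction, and the first and third steps have equal magnitude in opposite directions, so m equals n.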
10.	As per Claim 4, Ross teaches wherein n equals 1 (weight input at cell 314 may be shifted to a corresponding weight register within cell 318, which is below cell 314, col. 9, lines 51-53; Fig. 4).
11.	As per Claim 7, Ross teaches wherein: the activations buffer is configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the first queue comprises a first register and a second register adjacent to the first register, the first register being an output register of the first queue (each cell of the first plurality of cells including multiple activation registers, each activation register of the multiple activation registers configured to store a corresponding activation input, multiplexer circuitry communicatively coupled to the multiple activation registers and configured to select, from the multiple activation registers, one of the activation inputs as a selected activation input, and multiplication circuitry communicatively coupled to the weight register and to the multiplexer, in which the multiplication circuitry is configured to output a product of the weight input and the selected activation input, col. 2, lines 17-28; cell may include multiple activation registers (506a, 506b, 506c) that store activation inputs, col. 10, lines 40-41, Fig. 5), the first tile is further configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue (mux select value of 0 may result in selection of an activation input from register 506a, a mux select value of 1 may result in selection of an activation input from register 506b, and a mux select value of 2 may result in selection of an activation input from register 506c, multiplication circuitry 508 may be used to multiply the weight input from the weight register 502 with the selected activation input from the activation register 506, col. 11, lines 25-35).
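For illustration only, the multiplexer selection relied upon from Ross (col. 11, lines 25-35) may be sketched as follows. The function name `cell_product` and the register contents are hypothetical; the list stands in for activation registers 506a, 506b, and 506c, and the sketch is a behavioral model rather than Ross's circuitry.

```python
def cell_product(weight, activation_registers, mux_select):
    """Multiply the stored weight by the activation chosen by the multiplexer."""
    selected = activation_registers[mux_select]  # mux select 0, 1, or 2 picks one register
    return weight * selected


registers = [5, 7, 9]  # hypothetical contents standing in for 506a, 506b, 506c
first_state = cell_product(3, registers, mux_select=0)   # activation from the first register
second_state = cell_product(3, registers, mux_select=1)  # activation from the adjacent register
print(first_state, second_state)
```

The same weight is multiplied in both calls; only the register supplying the activation changes, which corresponds to the first state and second state as read onto Ross above.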
12.	As per Claims 11-14 and 17, these claims are similar in scope to Claims 1-4 and 7 respectively, and therefore are rejected under the same rationale.  As per Claim 20, Claim 20 is similar in scope to Claim 11, and therefore is rejected under the same rationale.
13.	Claims 8, 9, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ross (US010521488B1), Takahashi (US 20180217962A1), and Webb (US006061749A) in view of Yan (US 20180218518A1).
14.	As per Claim 8, Ross, Takahashi, and Webb are relied upon for the teachings as discussed above relative to Claim 7.
	However, Ross, Takahashi, and Webb do not teach wherein, in the second state, the output register of the first queue contains zero. However, Yan teaches preventing the input activation registers 262 and weight registers 260 from updating the input activation and weight values output to the input registers 275 when either the input activation or the weight equals zero [0042]. Fig. 2C shows that an input activation is output from an input activation register 262 to the input registers 275, and a weight is output from a weight register 260 to the input registers 275, and the input activation and the weight are output from the input registers 275 to the multiplier 280 where they are multiplied. Thus, it would have been obvious to one of ordinary skill in the art that in a state where the last input activation register 262 contains zero, then the last input activation register 262 is prevented from updating the input activation output to the input register 275 and thus is not output to the multiplier 280, so then a second input activation register 262 that contains a non-zero value updates the input activation output to the input register 275 and thus is output to the multiplier 280 [0042] (Fig. 2C). Thus, Yan teaches wherein, in the second state, the output register of the first queue contains zero.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ross, Takahashi, and Webb so that, in the second state, the output register of the first queue contains zero because Yan suggests that this avoids performing multiplication operations when the product is zero which reduces energy consumption [0020].
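For illustration only, the zero-handling behavior as read onto Yan [0042] above may be sketched as follows. The names `select_activation` and `multiply`, and the two-element list modeling the queue, are hypothetical; the sketch models the reading applied in this action (a zero in the output register is not forwarded to the multiplier, and the adjacent register supplies the activation instead) and is not presented as Yan's gating logic.

```python
def select_activation(queue):
    """Return (state, activation) for a two-register queue; queue[0] is the output register."""
    if queue[0] != 0:
        return "first", queue[0]
    return "second", queue[1]  # output register holds zero: use the adjacent register


def multiply(weight, queue):
    """Multiply the weight by whichever activation the queue supplies."""
    state, activation = select_activation(queue)
    return state, weight * activation


print(multiply(4, [3, 6]))
print(multiply(4, [0, 6]))
```

In the second call the multiplier never receives the zero, so no multiplication with a zero product is performed.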
15.	As per Claim 9, Ross, Takahashi, and Webb do not teach further comprising: a first adder, configured, in the first state: to be connected to an output of the first multiplier, and an output of the second multiplier, and to add: a product received from the output of the first multiplier, and a product received from the output of the second multiplier. However, Yan teaches further comprising: a first adder, configured, in the first state: to be connected to an output of the first multiplier, and an output of the second multiplier, and to add: a product received from the output of the first multiplier, and a product received from the output of the second multiplier (each of the PEs 250 generates a product by multiplying a weight value and an input activation, the products for each pipeline stage are summed by an adder 243 to produce a partial product, the partial products generated by the PEs 250 in the PE array 240 are summed by an adder 286 and the resulting partial sum is output to the accumulator 245, [0037], multiple PEs 250 are included within the PE array 240, each PE 250 includes a multiplier 280, [0038]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ross, Takahashi, and Webb to include a first adder, configured, in the first state: to be connected to an output of the first multiplier, and an output of the second multiplier, and to add: a product received from the output of the first multiplier, and a product received from the output of the second multiplier because Yan suggests that this is needed to output the sum to the accumulator [0037], which is needed to complete the convolution operation [0032], and it is well-known in the art that convolutional neural networks have become the most popular algorithmic approach for deep learning for many domains [0002].
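For illustration only, the summing of multiplier outputs relied upon from Yan [0037] may be sketched as follows. The function name `partial_sum` and the example values are hypothetical; the sketch models a single adder receiving one product from each multiplier and does not reproduce Yan's pipeline stages or accumulator.

```python
def partial_sum(weights, activations):
    """Add the products produced by the multipliers, one multiplier per weight/activation pair."""
    products = [w * a for w, a in zip(weights, activations)]  # one product per multiplier
    return sum(products)  # the adder combines the multiplier outputs


# Two multipliers: 2*4 from the first and 3*5 from the second.
print(partial_sum([2, 3], [4, 5]))
```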
16.	As per Claims 18-19, these claims are similar in scope to Claims 8-9, and therefore are rejected under the same rationale.
Allowable Subject Matter
17.	Claims 5, 6, 10, 15, and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
18.	The prior art, taken singly or in combination, does not teach or suggest the combination of all the limitations of Claim 5 and base Claim 1, and in particular, does not teach wherein the performing of the convolution further comprises, in order, after the forming of the products of the kernel with the first subarray: forming n-1 products of the kernel with n-1 respective subarrays of the array of activations, the subarray in a k-th product, of the n-1 products, being offset from the first subarray by k+1 array elements in the first direction. Claim 6 depends from Claim 5, and therefore also contains allowable subject matter. Claims 15-16 are similar in scope to Claims 5-6 respectively, and therefore also contain allowable subject matter.
19.	The prior art, taken singly or in combination, does not teach or suggest the combination of all the limitations of Claim 10 and base Claim 1 and intervening Claims 7 and 9, and in particular, does not teach a second adder, configured, in the second state, to be connected to the output of the first multiplier.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONI HSU whose telephone number is (571)272-7785. The examiner can normally be reached M-F 10am-6:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JH
/JONI HSU/Primary Examiner, Art Unit 2611