DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 3, 5, 8, 9, 11, 13, 15, 18, 19, and 20 have been amended.
Claims 4 and 14 have been cancelled.
Claims 1-3, 5-13, and 15-20 have been examined.
The claim objections in the previous Office Action have been addressed and are withdrawn.
The § 112 rejections in the previous Office Action have been addressed and are withdrawn.

Information Disclosure Statement
The applicant's submission of the Information Disclosure Statement dated July 13, 2022 is acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending. A copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant office action.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 2, 5, 6, 11, 12, 15, 19, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by US Publication No. 2021/0011732 by Botimer et al. (hereinafter referred to as “Botimer”).
Regarding claims 1 and 11, taking claim 1 as representative, Botimer discloses:
a method for performing a two dimensional (2D) convolution operation, the method comprising: storing, by a processor, a convolution kernel in a first storage device of the processor, the convolution kernel having dimensions x by y, wherein x is a number of rows in the convolution kernel and y is a number of columns in the convolution kernel (Botimer discloses, at ¶ [0033], loading a weight value from a first memory, which discloses the weight values being stored by a processor. As disclosed at ¶ [0032], the weight value is one of a matrix of weight values, which discloses x rows by y columns, being used in a two dimensional convolution operation. Botimer also discloses, at ¶ [0031], implementing the technology with one or more processors and one or more memories.); 
storing, by the processor, in a second storage device of the processor, a first subset of element values of an input feature map having dimensions n by m, wherein n is a number of rows in the input feature map and m is a number of columns in the input feature map (Botimer discloses, at ¶ [0033], loading a plurality of adjacent values of an input feature map (IFM) from a second memory. As disclosed at ¶ [0032], the IFM values are part of a matrix, which discloses n rows by m columns.);
performing a first simultaneous multiplication, by the processor, of each value of the first subset of element values of the input feature map with a first element value from among the x * y elements of the convolution kernel (Botimer discloses, at ¶ [0035], simultaneously multiplying each of the IFM values by the weight value.);
shifting, by the processor, the first subset of element values one register to the left in a plurality of registers of the second storage device (Botimer discloses, at ¶ [0039], a plurality of shift buffer elements, which discloses shifting to the left.); 
based on the shifting, performing a second simultaneous multiplication, by the processor, of each value of a second subset of element values of the input feature map with a second element value from among the x * y elements of the convolution kernel, wherein the first subset of element values of the input feature map comprises values in first to p-th column of a first row of the input feature map, and the second subset of element values of the input feature map comprises values in second to (p+1)-th column of the first row of the input feature map (Botimer discloses, at ¶¶ [0033]-[0036], ¶ [0039], and Figure 6, simultaneously multiplying a second subset of IFM values by a second value of the weight values matrix, where the second subset is shifted with respect to a first subset, e.g., the first subset of input values is in the first and second columns and the second subset is in the second and third columns.);
for each remaining value of the x * y elements of the convolution kernel, performing, by the processor, a simultaneous multiplication of the remaining value with a corresponding subset of element values of the input feature map (Botimer discloses, at ¶ [0036], iteratively performing corresponding multiplications with the remaining elements.); 
for each simultaneous multiplication, storing, by the processor, result of the simultaneous multiplication in an accumulator connected to the processor (Botimer discloses, at ¶ [0035], accumulating the results of each multiplication.); and 
outputting, by the processor, the values of the accumulator as a first row of an output feature map (OFM) (Botimer discloses, at ¶ [0037], outputting the results as corresponding values of an OFM.).
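For illustration only, the sequence of steps recited in claim 1 may be sketched as follows. This sketch is not taken from Botimer or from the Applicant's specification; the function name `conv2d_first_row`, the lane count `p`, and the choice to load p + y - 1 values per kernel row are assumptions made solely to show how a broadcast weight, a one-register left shift, and an accumulator together can yield the first row of an OFM.

```python
def conv2d_first_row(ifm, kernel, p):
    """Return the first p values of the first OFM row.

    ifm    : n-by-m input feature map (list of rows)
    kernel : x-by-y convolution kernel (list of rows)
    p      : number of lanes multiplied "simultaneously" (hypothetical name)
    Assumes m >= p + y - 1 so that every shifted subset is available.
    """
    x, y = len(kernel), len(kernel[0])
    acc = [0] * p  # accumulator, one entry per lane

    for r in range(x):
        # Registers hold the row segment needed for kernel row r.
        regs = list(ifm[r][:p + y - 1])
        for c in range(y):
            w = kernel[r][c]  # one weight value, broadcast to all lanes
            # Conceptually simultaneous: every lane multiplies by the same weight.
            for lane in range(p):
                acc[lane] += regs[lane] * w
            # Shift one register to the left, so the lanes next see
            # columns c+1 .. c+p of the same IFM row.
            regs = regs[1:] + [0]
    return acc


ifm = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
kernel = [[1, 0],
          [0, 1]]
first_row = conv2d_first_row(ifm, kernel, 3)  # [7, 9, 11]
```

In this sketch the first subset occupies columns 1 to p of the row and, after the shift, the second subset occupies columns 2 to p+1, which mirrors the wording of the claim limitation.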

Regarding claims 2 and 12, taking claim 2 as representative, Botimer discloses the elements of claim 1, as discussed above. Botimer also discloses:
wherein outputting the values of the accumulator as the first row of the OFM comprises adding, by the processor, results of plurality of simultaneous multiplications stored in the accumulator to generate the first row of the OFM (Botimer discloses, at ¶ [0035], accumulating the results of each multiplication to produce corresponding OFM values.).

Regarding claims 5 and 15, taking claim 5 as representative, Botimer discloses the elements of claim 1, as discussed above. Botimer also discloses:
broadcasting, by the processor, the first element value from among x * y elements of the convolution kernel to the second storage device (Botimer discloses, at ¶ [0035], using the first weight value with multiple multiply and accumulate units in calculations with each of the subset of input values, which discloses the claimed broadcasting to the second storage device.); and 
broadcasting, by the processor, the second element value from among the x * y elements of the convolution kernel to the second storage device (Botimer discloses, at ¶ [0035], using the second weight value with multiple multiply and accumulate units in calculations with each of the subset of input values, which discloses the claimed broadcasting to the second storage device.).

Regarding claim 6, Botimer discloses the elements of claim 1, as discussed above. Botimer also discloses:
wherein the convolution kernel having the dimension x by y is a weight matrix, the weight matrix comprising a plurality of weight values, the plurality of weight values of the weight matrix are being stored in a cache memory connected to the processor, and wherein the input feature map having the dimension n by m comprises a plurality of activation values (Botimer discloses, at ¶ [0032], the first matrix is a weight matrix stored in a cache (see ¶ [0033]) and the second matrix is a matrix of input feature map (activation) values.).

Regarding claim 19, Botimer discloses:
a method comprising: performing a first simultaneous multiplication, by a processor, of each value of a first subset of element values of a first matrix with a first element value from among x * y elements of a second matrix (Botimer discloses, at ¶ [0035], simultaneously multiplying each of the IFM values by the weight value.);
shifting, by the processor, the first subset of element values one register to the left in a plurality of registers of the second storage device (Botimer discloses, at ¶ [0039], a plurality of shift buffer elements, which discloses shifting to the left.); 
based on the shifting, performing a second simultaneous multiplication, by the processor, of each value of a second subset of element values of the input feature map with a second element value from among the x * y elements of the convolution kernel, wherein the first subset of element values of the input feature map comprises values in first to p-th column of a first row of the input feature map, and the second subset of element values of the input feature map comprises values in second to (p+1)-th column of the first row of the input feature map (Botimer discloses, at ¶¶ [0033]-[0036], ¶ [0039], and Figure 6, simultaneously multiplying a second subset of IFM values by a second value of the weight values matrix, where the second subset is shifted with respect to a first subset, e.g., the first subset of input values is in the first and second columns and the second subset is in the second and third columns.);

for each remaining value of the x * y elements of the second matrix, performing, by the processor, a simultaneous multiplication of the remaining value with a corresponding subset of element values of the first matrix (Botimer discloses, at ¶ [0036], iteratively performing corresponding multiplications with the remaining elements.);
for each simultaneous multiplication, storing, by the processor, result of the simultaneous multiplication in an accumulator connected to the processor (Botimer discloses, at ¶ [0035], accumulating the results of each multiplication.); and 
outputting, by the processor, the values of the accumulator as a first row of an output feature map (OFM) (Botimer discloses, at ¶ [0037], outputting the results as corresponding values of an OFM.).

Regarding claim 20, Botimer discloses the elements of claim 19, as discussed above. Botimer also discloses:
storing, by the processor, the second matrix in a first storage device of the processor, the second matrix having dimensions x by y, wherein x is a number of rows in the second matrix and y is a number of columns in the second matrix (Botimer discloses, at ¶ [0033], loading a weight value from a first memory, which discloses the weight values being stored by a processor. As disclosed at ¶ [0032], the weight value is one of a matrix of weight values, which discloses x rows by y columns, being used in a convolution operation.); 
storing, by the processor, in the second storage device connected to the processor, the first subset of element values of the first matrix having dimensions n by m, wherein n is a number of rows in the first matrix and m is a number of columns in the first matrix (Botimer discloses, at ¶ [0033], loading a plurality of adjacent values of an input feature map (IFM) from a second memory. As disclosed at ¶ [0032], the IFM values are part of a matrix, which discloses n rows by m columns.); and 
storing, by the processor, in the second storage device, the second subset of element values of the first matrix (Botimer discloses, at ¶ [0035], storing a second set of input values.).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Botimer in view of US Patent No. 5,285,403 by Quisquater et al. (hereinafter referred to as “Quisquater”). 
Regarding claims 3 and 13, taking claim 3 as representative, Botimer discloses the elements of claim 1, as discussed above. Botimer also discloses:
wherein the first storage device comprises a cache memory connected to the processor (Botimer discloses, at ¶ [0033], the weights are stored in various types of memory, which discloses a cache.), and 
wherein the second storage device comprises the plurality of registers located in a processing element (PE) of a multiply-accumulate (MAC) tile of the processor (Botimer discloses, at ¶ [0039], a plurality of shift buffer elements, which discloses registers.).
Botimer does not explicitly disclose the plurality of registers are 9-bit registers.
However, in the same field of endeavor (e.g., arithmetic processing) Quisquater discloses 
9-bit registers (Quisquater discloses, at col. 10, lines 44-49, 9-bit registers.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Botimer’s shift elements to utilize 9 bits, as disclosed by Quisquater because the number of bits to use is an arbitrary design choice. Therefore, this modification merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Claims 7-10 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Botimer in view of US Publication No. 2020/0210517 by Baum et al. (hereinafter referred to as “Baum”). 
Regarding claim 7, Botimer discloses the elements of claim 6, as discussed above. Botimer also discloses:
...an adder tree (Botimer discloses, at ¶ [0035], accumulating products, which discloses an adder tree.).
Botimer does not explicitly disclose wherein the processor comprises a plurality of multiply-accumulate (MAC) tiles, each MAC tile comprising an array of processing element (PE) comprising a plurality of PE rows and a plurality of PE columns, each PE column of the plurality of PE columns comprising a plurality of PE....
However, in the same field of endeavor (e.g., matrix computations) Baum discloses:
a plurality of multiply-accumulate (MAC) tiles, each MAC tile comprising an array of processing element (PE) comprising a plurality of PE rows and a plurality of PE columns, each PE column of the plurality of PE columns comprising a plurality of PEs... (Baum discloses, at Figure 21D, an array of PEs comprising rows and columns.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Botimer’s calculations to run on a PE array, as disclosed by Baum, because this modification merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 8, Botimer, as modified, discloses the elements of claim 7, as discussed above. 
Botimer does not explicitly disclose wherein each MAC tile further comprises a cache memory comprising a plurality of parallel activation lanes, each activation lane corresponding to a row of the array of PEs in the MAC tile. 
However, in the same field of endeavor (e.g., matrix computations) Baum discloses:
wherein each MAC tile further comprises a cache memory comprising a plurality of parallel activation lanes, each activation lane corresponding to a row of the array of PEs in the MAC tile (Baum discloses, at Figure 21D, each row comprises storage, which discloses the claimed cache.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Botimer’s calculations to run on a PE array utilizing cache, as disclosed by Baum, because this modification merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claims 9 and 18, taking claim 9 as representative, Botimer, as modified, discloses the elements of claim 7, as discussed above. Botimer also discloses:
wherein each PE of the plurality of PEs comprises the plurality of registers configured to store the plurality of activation values of the input feature map, a multiplier connected to the plurality of registers and configured to multiply activations from the input feature map by the weight values from the cache memory, and a PE accumulator connected to the multiplier and configured to add an output from the multiplier to a value from the plurality of registers and store the result in the plurality of registers (Botimer discloses, at ¶ [0039], a plurality of shift buffer elements, which discloses registers, that store the IFM values and are connected to multiply and accumulate units to multiply and accumulate the input and weight values.).

Regarding claim 10, Botimer, as modified, discloses the elements of claim 7, as discussed above.
Botimer does not explicitly disclose wherein the array of PEs in the MAC tile comprises eight PE columns, wherein each PE column of the eight PE columns of the MAC tile comprises sixteen PEs.
However, in the same field of endeavor (e.g., matrix computations) Baum discloses:
wherein the array of PEs in the MAC tile comprises eight PE columns, wherein each PE column of the eight PE columns of the MAC tile comprises sixteen PEs (Baum discloses, at Figure 21B, an 8x8 array. Baum also discloses, at ¶ [0079], using 16 rows.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Botimer’s calculations to run on a PE array, as disclosed by Baum, because this modification merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 16, Botimer discloses the elements of claim 11, as discussed above. Botimer also discloses:
wherein the convolution kernel having the dimension x by y is a weight matrix, the weight matrix comprising a plurality of weight values, the plurality of weight values of the weight matrix are being stored in a cache memory connected to the processor, and wherein the input feature map having the dimension n by m comprises a plurality of activation values, wherein the processor comprises...an adder tree (Botimer discloses, at ¶ [0032], the first matrix is a weight matrix stored in a cache (see ¶ [0033]) and the second matrix is a matrix of input feature map (activation) values. Botimer also discloses, at ¶ [0035], accumulating products, which discloses an adder tree.).
Botimer does not explicitly disclose wherein the processor comprises a plurality of multiply-accumulate (MAC) tiles, each MAC tile comprising an array of processing element (PE) comprising a plurality of PE rows and a plurality of PE columns, each PE column of the plurality of PE columns comprising a plurality of PE....
However, in the same field of endeavor (e.g., matrix computations) Baum discloses:
a plurality of multiply-accumulate (MAC) tiles, each MAC tile comprising an array of processing element (PE) comprising a plurality of PE rows and a plurality of PE columns, each PE column of the plurality of PE columns comprising a plurality of PEs... (Baum discloses, at Figure 21D, an array of PEs comprising rows and columns.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Botimer’s calculations to run on a PE array, as disclosed by Baum, because this modification merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Regarding claim 17, Botimer discloses the elements of claim 16, as discussed above. Botimer does not explicitly disclose wherein each MAC tile further comprises a cache memory comprising a plurality of parallel activation lanes, each activation lane being corresponding to a row of the array of PEs in the MAC tile, wherein the array of PEs in the MAC tile comprises eight PE columns, wherein each PE column of the eight PE columns of the MAC tile comprises sixteen PEs.
However, in the same field of endeavor (e.g., matrix computations) Baum discloses:
wherein each MAC tile further comprises a cache memory comprising a plurality of parallel activation lanes, each activation lane being corresponding to a row of the array of PEs in the MAC tile, wherein the array of PEs in the MAC tile comprises eight PE columns, wherein each PE column of the eight PE columns of the MAC tile comprises sixteen PEs (Baum discloses, at Figure 21D, each row comprises storage, which discloses the claimed cache. Baum also discloses, at Figure 21B, an 8x8 array. Baum also discloses, at ¶ [0079], using 16 rows).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Botimer’s calculations to run on a PE array utilizing cache, as disclosed by Baum, because this modification merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.

Response to Arguments
On page 14 of the response filed September 21, 2022 (“response”), the Applicant argues, “unlike claim 1, which recites "method for performing a two dimensional (2D) convolution operation" (claim 1), the cited portions of Botimer appear to disclose a 3-dimensional (3D) convolution operation.”
Though fully considered, the Examiner respectfully disagrees. Botimer discloses throughout, e.g., Figure 1, Table 1, etc., the capability of processing matrices having height, width, and a number of channels. These three aspects represent three dimensions. However, the number of channels can be one, which discloses a two dimensional operation. Furthermore, even when there are multiple channels, e.g., a three dimensional operation, performing the three dimensional operation is a sequence of two dimensional operations. Therefore, the Applicant’s arguments are deemed unpersuasive.
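By way of illustration only, the relationship between a multi-channel operation and its two dimensional constituents may be sketched as follows. The sketch assumes the common case in which the kernel has the same number of channels as the input and does not slide along the channel axis; the function names `conv2d` and `conv_multichannel` are hypothetical and are not drawn from Botimer.

```python
def conv2d(ifm, kernel):
    """Plain 2D convolution (no padding, stride 1) of one channel."""
    n, m = len(ifm), len(ifm[0])
    x, y = len(kernel), len(kernel[0])
    return [
        [
            sum(ifm[i + r][j + c] * kernel[r][c]
                for r in range(x) for c in range(y))
            for j in range(m - y + 1)
        ]
        for i in range(n - x + 1)
    ]


def conv_multichannel(ifm_channels, kernel_channels):
    """Multi-channel convolution computed as a sequence of 2D convolutions.

    Each channel is convolved on its own, and the per-channel
    results are then summed element by element.
    """
    per_channel = [conv2d(f, k) for f, k in zip(ifm_channels, kernel_channels)]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [
        [sum(ch[i][j] for ch in per_channel) for j in range(cols)]
        for i in range(rows)
    ]
```

With a single channel, `conv_multichannel` returns the same result as `conv2d`, which corresponds to the case where the number of channels is one.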

On page 14 of the response the Applicant argues, “unlike claim 1, the cited portions of Botimer appear to be completely silent regarding "... performing a first simultaneous multiplication, by the processor, of each value of the first subset of element values of the input feature map with a first element value from among the x * y elements of the convolution kernel; shifting, by the processor, the first subset of element values one register to the left in a plurality of registers of the second storage device; based on the shifting, performing a second simultaneous multiplication, by the processor, of each value of a second subset of element values of the input feature map with a second element value from among the x * y elements of the convolution kernel, wherein the first subset of element values of the input feature map comprises values in first to p-th column of a first row of the input feature map, and the second subset of element values of the input feature map comprises values in second to (p+1)-th column of the first row of the input feature map..."”
Though fully considered, the Examiner respectfully disagrees. Botimer discloses the above features. See, e.g., Figure 6. At T=0, a first subset of input values (0,0,0) and (0,1,0) are multiplied by a first weight value (0,0,0). The input values of the first subset are in the first and second columns of the first row. At T=1, a second subset of input values (0,1,0) and (0,2,0) are multiplied by a second weight value (0,1,0). The input values of the second subset are in the second and third columns of the first row. The Examiner maintains that this discloses the claimed features. Accordingly, the Applicant’s arguments are deemed unpersuasive.
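The timing described above may be restated, for illustration only, as a small schedule. The sketch assumes that the index triples are ordered (row, column, channel), which is inferred from the description above rather than taken from Figure 6 itself; the function name `mac_schedule` and its parameters are hypothetical.

```python
def mac_schedule(p, y, row=0, channel=0):
    """List, per time step T, the input indices and the weight index used.

    p : number of input values multiplied at once (two in the example above)
    y : number of weight columns walked through along one kernel row
    Index triples are assumed to be (row, column, channel).
    """
    steps = []
    for t in range(y):
        # At time t the p lanes see columns t .. t+p-1 of the same row.
        inputs = [(row, t + lane, channel) for lane in range(p)]
        weight = (row, t, channel)
        steps.append((t, inputs, weight))
    return steps


for t, inputs, weight in mac_schedule(p=2, y=2):
    print(f"T={t}: inputs {inputs} x weight {weight}")
```

Under these assumptions, the schedule for two lanes lists inputs (0,0,0) and (0,1,0) with weight (0,0,0) at T=0, and inputs (0,1,0) and (0,2,0) with weight (0,1,0) at T=1, matching the first and second subsets discussed above.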

On page 15 of the response the Applicant argues the remaining claims are allowable for similar reasons. 
Though fully considered, the Examiner respectfully disagrees. The reasons set forth in the remarks and rejections presented above, including those regarding the independent claims, are applicable to these claims.

Conclusion
THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677.  The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAWN DOMAN/
Primary Examiner, Art Unit 2183