EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Predrag Radosavljevic, registration no. 73537, on 8/2/2022.
The application has been amended as follows: 

In the Claims
1. 	(Currently Amended) A method comprising:
accessing, from a buffer, a flattened input stream that includes a set of parallel vectors, each vector in the set of parallel vectors representing a set of input values of a unique kernel-sized tile of an input tensor that is to be convolved with a kernel to generate an output activation;
receiving an expanded kernel generated by permuting values from the kernel, the expanded kernel having a plurality of kernel vectors that each correspond to an output value position of a kernel-sized tile of the output activation;
generating a control pattern by generating a value for each position of the control pattern based on coordinates of that position in the control pattern and a size of each dimension of the kernel;
receiving [[a]] the control pattern that includes a plurality of vectors, 
each of the vectors corresponding to the output value position for the kernel-sized tile of the output activation, 
each of the vectors including delay values that indicate a parallel vector of the set of parallel vectors to access input values for the convolution, a number of the vectors of the control pattern corresponding to a number of output value positions in the kernel-sized tile of the output activation, 
each of the delay values in each of the vectors of the control pattern indicating an amount of delay for which to access an individual input value in the flattened input stream, and each of the delay values specifying one parallel vector of the set of parallel vectors; and
generating, using a hardware accelerated processor, for each output value position of each kernel-sized tile of the output activation, 
a dot product between a first vector that includes values of the flattened input stream as selected by the delay values of each of the vectors of the control pattern, and a second vector corresponding to a kernel vector of the plurality of kernel vectors of the expanded kernel corresponding to the output value position.
5. 	(Currently Amended) The method of claim 1, further comprising:
padding a trailing edge of each dimension of the input tensor with padding values having a width equal to [[a]] the size of the kernel in a corresponding dimension of the kernel.



13. 	(Currently Amended) A system comprising:
a control pattern generator configured to generate a control pattern by generating a value for each position of the control pattern based on coordinates of that position in the control pattern and a size of each dimension of a kernel; and
a processor and a multiply-accumulate unit configured to:
access, from a buffer, a flattened input stream that includes a set of parallel vectors, each vector in the set of parallel vectors representing a set of input values of a unique kernel-sized tile of an input tensor that is to be convolved with [[a]] the kernel to generate an output activation[[;]],
receive an expanded kernel generated by permuting values from the kernel, the expanded kernel having a plurality of kernel vectors that each correspond to an output value position of a kernel-sized tile of the output activation[[;]], 
receive [[a]] the control pattern that includes a plurality of vectors, each of the vectors corresponding to the output value position for the kernel-sized tile of the output activation, each of the vectors including delay values that indicate a parallel vector of the set of parallel vectors to access input values for the convolution, a number of the vectors of the control pattern corresponding to a number of output value positions in the kernel-sized tile of the output activation, each of the delay values in each of the vectors of the control pattern indicating an amount of delay for which to access an individual input value in the flattened input stream, and each of the delay values specifying one parallel vector of the set of parallel vectors[[;]], and
generate, for each output value position of each kernel-sized tile of the output activation, a dot product between a first vector that includes values of the flattened input stream as selected by the delay values of each of the vectors of the control pattern, and a second vector corresponding to a kernel vector of the plurality of kernel vectors of the expanded kernel corresponding to the output value position.
19.	(Currently Amended) A system comprising:
a control pattern generator configured to generate a control pattern by generating a value for each position of the control pattern based on coordinates of that position in the control pattern and a size of each dimension of a kernel; and
a processor and a multiply-accumulate unit configured to:
access, from a buffer, a flattened input stream that includes a set of parallel vectors, each vector in the set of parallel vectors representing a set of input values of a unique kernel-sized tile of an input tensor that is to be convolved with [[a]] the kernel to generate an output activation[[;]],
receive an expanded kernel generated by permuting values from the kernel, the expanded kernel having a plurality of kernel vectors that each correspond to an output value position of a kernel-sized tile of the output activation[[;]],
receive [[a]] the control pattern that includes a plurality of vectors, each of the vectors corresponding to the output value position for the kernel-sized tile of the output activation, each of the vectors including delay values that indicate a parallel vector of the set of parallel vectors to access input values for the convolution [[;]], and
generate, for each output value position of each kernel-sized tile of the output activation, a dot product between a first vector that includes values of the flattened input stream as selected by the delay values of each of the vectors of the control pattern, and a second vector corresponding to a kernel vector of the plurality of kernel vectors of the expanded kernel corresponding to the output value position,
wherein the expanded kernel is formed by:
for a first dimension of the kernel, generating a square block of values for each single dimensional vector of the kernel that includes all rotations of that single dimensional vector;
for each additional dimension of the kernel:
grouping blocks of an immediately preceding dimension of the kernel into sets of blocks, each set of blocks including blocks of the immediately preceding dimension that are aligned along a vector that is parallel to an axis of the additional dimension;
generating, for the additional dimension, one or more blocks of values, each of the one or more blocks including all rotations of blocks within each of the sets of blocks of the immediately preceding dimension; and
outputting a block of values corresponding to a last dimension in the additional dimensions of the kernel as the expanded kernel.
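
The expansion recited in the "wherein" clause of claim 19 can be sketched for a small two-dimensional kernel as follows. This is an illustrative reading only, not the applicant's implementation: the function names `rotations`, `expand_1d`, and `expand_2d` are hypothetical, and the rotation direction (a left cyclic shift) is an assumption, since the claim language does not fix one.

```python
def rotations(items):
    """Return all cyclic (left) rotations of a sequence, starting with the original order."""
    n = len(items)
    return [list(items[i:]) + list(items[:i]) for i in range(n)]


def expand_1d(vector):
    """First dimension: a square block whose rows are all rotations of one kernel vector."""
    return rotations(vector)


def expand_2d(kernel):
    """Second dimension: arrange all rotations of the per-row blocks into one larger block.

    Assumption: each rotation of the list of blocks forms one block-row,
    with the blocks in that rotation laid side by side.
    """
    blocks = [expand_1d(row) for row in kernel]
    expanded = []
    for block_row in rotations(blocks):
        height = len(block_row[0])
        for r in range(height):
            row = []
            for block in block_row:
                row.extend(block[r])
            expanded.append(row)
    return expanded


if __name__ == "__main__":
    for row in expand_2d([[1, 2], [3, 4]]):
        print(row)
```

Under these assumptions, a 2x2 kernel yields a 4x4 block in which each row is a candidate kernel vector for one output value position of a kernel-sized tile.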

Allowable Subject Matter
Claims 1-9 and 12-19 (renumbered 1-17) are allowed.
The following is an examiner’s statement of reasons for allowance: 
The arguments in applicant’s remarks filed on 7/18/2022 were fully considered and are persuasive. The applicant successfully argued that the cited prior art does not teach or suggest the limitation “generating a control pattern by generating a value for each position of the control pattern based on coordinates of that position in the control pattern and a size of each dimension of the kernel” (remarks, pages 11-12).
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Anil Khatri, whose telephone number is (571) 272-3725. The examiner can normally be reached M-F 8:30-5:00.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, W Zhen, can be reached at 571-272-3708. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANIL KHATRI/
Primary Examiner, Art Unit 2191