Examiner Comment
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with applicant’s representative, Attorney for Applicant, Scott Simmons, Registration Number 60206, on 01/20/2022. See also attached Interview Summary.



	In claim 4 at line 13 after the phrase “clocked flip-flop circuit by”
		Delete “a”
		Insert --the--
	In claim 4 at line 17 after the phrase “clocked flip-flop circuit by”
		Delete “a”
		Insert --the--
	In claim 12 at line 12 after the phrase “clocked flip-flop circuit by”
		Delete “a”
		Insert --the--
	In claim 12 at line 16 after the phrase “clocked flip-flop circuit by”
		Delete “a”
		Insert --the--
	In claim 20 at line 13 after the phrase “clocked flip-flop circuit by”
		Delete “a”
		Insert --the--
	In claim 20 at line 17 after the phrase “clocked flip-flop circuit by”
		Delete “a”
		Insert --the--

Reasons for Allowance
The following is an examiner’s statement of reasons for allowance:
Claim 1 is directed to an apparatus comprising: a matrix operations accelerator circuit comprising a two-dimensional grid of multiplier circuits; a first plurality of registers that represents a first two-dimensional matrix coupled to the matrix operations accelerator circuit; a second plurality of registers that represents a second two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction; and an execution circuit of the core to execute the decoded single instruction to: store each element of the first two-dimensional matrix from the first plurality of registers into a respective clocked flip-flop circuit of each multiplier circuit of the two-dimensional grid of multiplier circuits, store a first element of a first proper subset of elements of the second two-dimensional matrix from the second plurality of registers into a single first clocked flip-flop circuit coupled to a first proper subset of multiplier circuits of the two-dimensional grid of multiplier circuits, store a second element of the first proper subset of elements of the second two-dimensional matrix from the second plurality of registers into a single second clocked flip-flop circuit coupled to a second proper subset of multiplier circuits of the two-dimensional grid of multiplier circuits, multiply the first element of the first proper subset of elements from the single first clocked flip-flop circuit by a respective element from the clocked flip-flop circuit of each multiplier circuit of the first proper subset of multiplier circuits to generate a first plurality of resultants, and multiply the second element of the first proper subset of elements from the single second clocked flip-flop circuit by a respective element from the clocked flip-flop circuit of each multiplier circuit of the second proper subset of multiplier circuits to generate a second plurality of resultants.
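For illustration only, and not as part of the claims or the record, the dataflow recited in claim 1 can be sketched in Python. The sketch assumes a 2x2 grid and assumes that each proper subset of multiplier circuits is one row of that grid; the claim itself does not fix either choice. The names `Multiplier`, `load_first_matrix`, and `broadcast_multiply` are hypothetical and do not come from the application.

```python
# Illustrative sketch only; names and the row-wise grouping are assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class Multiplier:
    """One multiplier circuit with its own (modeled) clocked flip-flop."""
    stored: float = 0.0  # element of the first matrix held locally


def load_first_matrix(a: List[List[float]]) -> List[List[Multiplier]]:
    """Store each element of the first matrix in its own multiplier's flip-flop."""
    return [[Multiplier(stored=x) for x in row] for row in a]


def broadcast_multiply(grid: List[List[Multiplier]],
                       b_subset: List[float]) -> List[List[float]]:
    """Multiply one shared second-matrix element per row by every locally
    stored first-matrix element in that row (row grouping is an assumption)."""
    if len(b_subset) != len(grid):
        raise ValueError("need one second-matrix element per row of multipliers")
    resultants = []
    for row, shared in zip(grid, b_subset):
        # 'shared' models the single flip-flop feeding every multiplier in this subset.
        resultants.append([shared * m.stored for m in row])
    return resultants


grid = load_first_matrix([[1.0, 2.0], [3.0, 4.0]])
first_resultants, second_resultants = broadcast_multiply(grid, [10.0, 20.0])
print(first_resultants)   # [10.0, 20.0]
print(second_resultants)  # [60.0, 80.0]
```

In this model, each element of the second matrix is held once per subset of multipliers rather than copied into every multiplier, which is the arrangement the claim describes with the single first and single second clocked flip-flop circuits.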
The primary reason for indicating allowable subject matter is the concept of storing a first element of a first proper subset of elements of the second two-dimensional matrix from the second 
Liao et al. (US-PGPUB 20200142949 A1) is the closest found prior art. Liao discloses an accelerator comprising: a first memory configured to store a first two-dimensional matrix, a second memory configured to store a second two-dimensional matrix, an operation circuit coupled to the first and second memory that includes a plurality of multiplier/operation units, where each operation unit is configured to receive and multiply two pieces of data, one from the first matrix and one from the second matrix, an adder circuit to add calculation results of the operation units, and a controller configured to control operation of the operation circuit by storing each element of the second matrix into a respective operation unit as shown in Fig. 9 where each element/data of matrix B in paragraph [0069] is written in a respective operation unit of an operation group, storing a first element A11 of the first matrix to a first row of the first operation group, storing a second element A12 of the first matrix to a second row of the first operation group as shown in Fig. 11, and multiplying the matrix elements shown in Fig. 12 stored in each of the multiplier units. Further, Liao discloses that the multiply operation is performed at the same time. Further, Liao discloses that the accelerator may be used as a coprocessor 11 and the second element A12 of the first matrix is stored in each operation unit of respective rows of the first operation group of the operation circuit. However, Liao does not teach or suggest using a single first flip-flop circuit connected to the first row of the first operation group of the operation circuit to store the first element A11 of the first matrix, and a single second flip-flop circuit connected to the second row of the first operation group of the operation circuit to store the second element A12 of the first matrix.
Therefore, Liao fails to explicitly teach or suggest storing a first element of a first proper subset of elements of the second two-dimensional matrix from the second plurality of registers into a single first clocked flip-flop circuit coupled to a first proper subset of multiplier circuits of the two-dimensional grid of multiplier circuits, storing a second element of the first proper subset of elements of the second two-dimensional matrix from the second plurality of registers into a single second clocked flip-flop circuit coupled to a second proper subset of multiplier circuits of the two-dimensional grid of multiplier circuits, multiplying the first element of the first proper subset of elements from the single first clocked flip-flop circuit by a respective element from the clocked flip-flop circuit of each multiplier circuit of the first proper subset of multiplier circuits to generate a first plurality of resultants, and multiplying the second element of the first proper subset of elements from the single second clocked flip-flop circuit by a respective element from the clocked flip-flop circuit of each multiplier circuit of the second proper subset of multiplier circuits to generate a second plurality of resultants as recited in claim 1.
Chilappagari et al. (US-PGPUB 20200341772 A1) discloses an apparatus comprising a first register file for storing a first matrix operand A, a second register file for storing a second matrix operand B, an operand A distribution circuit, an operand B distribution circuit, and a SIMD block that includes a plurality of multipliers. Further, Chilappagari discloses multiple different input distribution schemes for performing matrix multiplication using matrix A and matrix B including the configuration 
Mansell et al. (US-PGPUB 20200117450 A1) discloses an apparatus comprising: register storage circuitry having a plurality of registers for storing matrix data elements, decoder circuitry responsive to a matrix multiply instruction to generate control signals, wherein the matrix multiply instruction specifies in the plurality of registers: a first source register, a second source register, and a destination register; and data processing circuitry responsive to the control signals to perform a matrix multiply operation comprising: extracting a first matrix of data elements from the first source register; extracting a second matrix of data elements from the second source register; performing plural dot 
Claims 9 and 17 recite substantially the same limitations as claim 1 and are allowed for the same reasons provided above. Claims 2-8, 10-16, and 18-24 are dependent on claims 1, 9, and 17 respectively and are also allowed for the same reasons provided above.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Carlo Waje whose telephone number is (571)272-5767. The examiner can normally be reached 9:00-6:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at (571) 270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/C.W./
Carlo Waje
Examiner, Art Unit 2182
(571)272-5767




/EMILY E LAROCQUE/
Primary Examiner, Art Unit 2182