DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 2-3, 6-8, and 11-14 are objected to because of the following informalities:
Claims 2 and 13 recite "these registers". It is unclear whether "these registers" refers to "a plurality of packed data registers" or different registers. Examiner interpreted as "the plurality of packed data registers".
Claims 3 and 14 recite "these registers and memory". It is unclear whether "these registers and memory" refers to "a plurality of packed data registers and memory" or different registers and memory. Examiner interpreted as "the plurality of packed data registers and memory". 
Claims 6-8 recite "matrix operations circuitry", which should be "the matrix operations circuitry" as antecedently recited in claim 1.
Claim 11 recites "the system of claim 9, wherein the computational grid is to house at least one of the buffered matrix data from the plurality of data buffers during a matrix manipulation operation". However, "the buffered matrix data from the plurality of data buffers" is not antecedently recited in claim 9, but is antecedently recited in claim 10. For examination purposes, Examiner interpreted claim 11 as "the system of claim 10, wherein the computational grid is to house at least one of the buffered matrix data from the plurality of data buffers during a matrix manipulation operation."
Claim 12 recites "The system of claim 9, wherein the data buffers are a plurality of registers", which should be "The system of claim 10, wherein the plurality of data buffers are a plurality of registers".
Claim 13 recites "the registers", which should be "the plurality of registers" as antecedently recited in claim 12.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claims 14-16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Claim 14 recites the limitation "the storage" in line 1.  There is insufficient antecedent basis for this limitation in the claim. For examination, examiner interpreted as "the plurality of data buffers".

Claim 15 recites the limitation "the matrix operations circuitry" in line 1. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the Examiner interpreted the limitation as "the matrix operations accelerator". Dependent claim 16 is also rejected for inheriting the deficiency of the claim from which it depends.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-7 and 9-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ginzburg (US 20110153707) (IDS dated 08/12/2020).

Regarding claim 1, Ginzburg teaches an apparatus (Ginzburg, figure 1, 100) comprising: matrix operations circuitry (Ginzburg, figure 1, execution unit 136; [0030] the execution unit 136 performs the operation) to execute one or more decoded matrix operation instructions (Ginzburg, figure 1, decoder 128; [0027-0028] and figure 2, decoder 128 receives a matrix operation instruction from the cache and decodes the instruction) on data stored in two-dimensional data structures (Ginzburg, figures 3 and 5 provide examples of 4x4 matrix data and a 4x4 matrix B; [0035] execution unit 136 is a matrix multiply add unit that performs the 2D matrix multiply add operation on data elements stored in registers); and storage to store the two-dimensional data structures (Ginzburg, figures 3 and 5 show the memory and registers [i.e. storage] that store at least matrix A and matrix B).

Regarding claim 2, Ginzburg teaches the apparatus of claim 1, wherein the storage is a plurality of packed data registers and the two-dimensional data structures are overlaid on these registers (Ginzburg, [0024] register file unit 134 includes a plurality of registers. Figure 3 illustrates that matrix A is stored in the registers).

Regarding claim 3, Ginzburg teaches the apparatus of claim 1, wherein the storage is a plurality of packed data registers and memory, and the two-dimensional data structures are overlaid on these registers and memory (Ginzburg, [0024] register file unit 134 includes a plurality of registers. Figure 3 illustrates that matrix A is stored in the registers and memory).

Regarding claim 4, Ginzburg teaches the apparatus of claim 1, wherein the matrix operations circuitry is a plurality of chained fused multiply accumulate circuits (Ginzburg, figure 5-6 [0035-0037] the MMAU includes at least 4 identical sub-units, figure 6 illustrates the implementation of each sub unit as a chained FMA circuit).

Regarding claim 5, Ginzburg teaches the apparatus of claim 4, wherein each of the chained fused multiply accumulate circuits is to include storage for a portion of a two-dimensional data structure that the fused multiply accumulate circuit is to operate on (Ginzburg, figure 5 [0036-0037] illustrates that each sub-unit [i.e. chained fused multiply accumulate circuit] includes storage to store a row of matrix A and a column of matrix B).

Regarding claim 6, Ginzburg teaches the apparatus of claim 1, wherein matrix operations circuitry supports element matrix multiply, subtract, and add instructions (Ginzburg, figures 5-6 [0035-0037] describe the unit for performing the matrix multiply-add instruction. Note that the MMAU performs the addition operation, and subtraction is performed by addition of negative values. [0019, 0031] describe that the element values are represented as 32-bit floating point numbers, and a floating point number includes a sign bit, which means that a data element can be negative or positive. Thus, the MMAU supports the subtraction operation).
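For illustration only, the reasoning above (subtraction realized as addition of a value whose sign bit is set) can be sketched in Python. This is a minimal sketch that assumes IEEE 754 single-precision encoding; the function names are hypothetical and are not taken from Ginzburg.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the 32-bit pattern of x encoded as an IEEE 754 single."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def negate_float32(x: float) -> float:
    """Negate x by flipping only the sign bit (bit 31) of its encoding."""
    flipped = float32_bits(x) ^ 0x80000000
    return struct.unpack(">f", struct.pack(">I", flipped))[0]

def multiply_add(a: float, b: float, c: float) -> float:
    """Multiply-add primitive: a * b + c."""
    return a * b + c

def multiply_subtract(a: float, b: float, c: float) -> float:
    """a * b - c, expressed as a multiply-add with a sign-flipped addend."""
    return multiply_add(a, b, negate_float32(c))
```

Under these assumptions, a unit that only adds can still produce a difference, because the negative operand is carried by the sign bit of the input data.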

Regarding claim 7, Ginzburg teaches the apparatus of claim 1, wherein matrix operations circuitry supports dot product and multiply accumulate operations (Ginzburg, [0036] each sub unit multiplies each row of matrix A and column of matrix B to generate a corresponding row of dot products. A multiply add operation is performed to add the resultant dot product to a corresponding data element in matrix C).
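For illustration only, the cited operation (each sub-unit forms a row of dot products and adds it to the corresponding elements of matrix C) can be sketched as follows. This is a minimal software sketch, not Ginzburg's circuit; the names `dot`, `sub_unit`, and `matrix_multiply_add` are hypothetical.

```python
def dot(row, col):
    """Dot product computed as a chain of multiply-accumulate steps."""
    acc = 0.0
    for a, b in zip(row, col):
        acc = a * b + acc  # one multiply-accumulate step
    return acc

def sub_unit(a_row, b, c_row):
    """Model of one sub-unit: one row of A against every column of B,
    with each dot product added to the matching element of C."""
    n_rows, n_cols = len(b), len(b[0])
    return [
        dot(a_row, [b[k][j] for k in range(n_rows)]) + c_row[j]
        for j in range(n_cols)
    ]

def matrix_multiply_add(a, b, c):
    """Compute A * B + C, one sub-unit invocation per row of A."""
    return [sub_unit(a[i], b, c[i]) for i in range(len(a))]
```

For example, with A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], and C filled with ones, the result is [[20, 23], [44, 51]].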

Regarding claim 9, Ginzburg teaches a system (Ginzburg, figure 1, 100) comprising: a host processor (Ginzburg, figure 1, processing element 110); a matrix operations accelerator coupled to the host processor (Ginzburg, figure 1, the execution unit 136 and register file unit 134 [i.e. a matrix operations accelerator] are coupled to the processing element 110 [i.e. host processor]), wherein the matrix operations accelerator is to perform matrix operations on two-dimensional data structures using a computational grid (Ginzburg, figures 1 and 5 [0035-0037] execution unit 136 performs the 2D multiply-add operation on matrices A and B. Figure 5 shows that the MMAU includes at least 4 sub-units [i.e. computational grid]) based on commands received from the host processor (Ginzburg, [0024] if a uop corresponds to, for example, an arithmetic operation, that uop is dispatched to function unit 136, which then performs the arithmetic operation).

Regarding claim 10, Ginzburg teaches the system of claim 9, wherein the matrix operations accelerator further comprises a plurality of data buffers to buffer matrix data in two-dimensional data structures (Ginzburg, figure 3 shows the memory and register to store at least the matrix A. [0024] register file unit 134 includes a plurality of registers. figure 5 shows the matrix A and matrix B are stored in the registers [i.e. buffered matrix data]).

Regarding claim 11, Ginzburg teaches the system of claim 9, wherein the computational grid is to house at least one of the buffered matrix data from the plurality of data buffers during a matrix manipulation operation (Ginzburg, figure 5, each sub-unit of the MMAU includes registers to store a row of matrix A and a column of matrix B. Note that for examination purposes, claim 11 is interpreted as dependent on system claim 10).

Regarding claim 12, Ginzburg teaches the system of claim 9, wherein the data buffers are a plurality of registers (Ginzburg, [0024] the register unit 134 includes a plurality of registers. Note that for examination purposes, claim 12 is interpreted as dependent on system claim 10).

Regarding claim 13, Ginzburg teaches the system of claim 12, wherein the registers are a plurality of packed data registers and the two-dimensional data structures are overlaid on these registers (Ginzburg, [0024] register file unit 134 includes a plurality of registers. Figure 3 illustrates that matrix A is stored in the registers).

Regarding claim 14, Ginzburg teaches the system of claim 12, wherein the storage is a plurality of packed data registers and memory, and the two-dimensional data structures are overlaid on these registers and the memory (Ginzburg, figure 3 shows the memory and registers [i.e. the storage] that store at least matrix A. [0024] register file unit 134 includes a plurality of registers).

Regarding claim 15, Ginzburg teaches the system of claim 9, wherein the matrix operations circuitry is a plurality of chained fused multiply add circuits (Ginzburg, figure 5-6 [0035-0037] the MMAU includes at least 4 identical sub-units, figure 6 illustrates the implementation of each sub unit as a chained FMA circuit).

Regarding claim 16, Ginzburg teaches the system of claim 15, wherein each of the chained fused multiply add circuits is to include storage for a portion of a two-dimensional data structure that the fused multiply add circuit is to operate on (Ginzburg, figure 5 illustrates that each sub-unit [i.e. chained fused multiply add circuit] includes registers to store a row of matrix A and a column of matrix B).

Regarding claim 17, Ginzburg teaches the system of claim 9, further comprising a coherent memory interface coupled to the matrix operations accelerator and host processor to provide access to shared memory between the host processor and matrix operations accelerator (Ginzburg, figure 1, [0021] illustrates that execution circuitry 136 and components within the processing element 110 are connected using interconnect local bus 124 [i.e. coherent memory interface]. Figure 1 [0024] data from memory, such as 138, 140, 112, 104, and 118, are shared between execution unit 136 [i.e. the matrix operations accelerator] and processing element 110 [i.e. host processor]).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Ginzburg in view of Gholaminejad (US 20180032477).

Regarding claim 8, Ginzburg discloses the claimed invention as in the parent claim above, including a system for performing matrix operations, but Ginzburg does not teach that the matrix operations circuitry supports matrix transpose and diagonal operations. However, Gholaminejad discloses matrix operations circuitry that supports matrix transpose and diagonal operations (Gholaminejad, figures 1 and 4 [0012, 0018-0019] the system includes a GPU configured to perform a transpose operation on a matrix by moving diagonally through tiles of the matrix and, as shown in figure 5 [0027], the matrix is scheduled to be transposed using a staggered diagonal ordering scheme [i.e. matrix transpose and diagonal operations]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ginzburg's system to support matrix transpose and diagonal operations as disclosed by Gholaminejad. This modification would have been obvious because Ginzburg and Gholaminejad both disclose systems for performing matrix operations. Furthermore, Gholaminejad [0001] recognizes that the transpose operation is an important operation in many computing applications, and that matrices are often transposed when performing other operations, for example, as part of a Fourier Transform.
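For illustration only, a transpose that walks the tiles of a matrix along wrapped diagonals can be sketched as follows. The visiting order shown here is an assumed, simplified ordering and is not asserted to be the staggered diagonal scheme of Gholaminejad; the names `diagonal_tile_order` and `tiled_transpose` are hypothetical.

```python
def diagonal_tile_order(tiles_per_side):
    """List tile coordinates (tile_row, tile_col) diagonal by diagonal,
    wrapping around the edge of the tile grid (an assumed ordering)."""
    order = []
    for d in range(tiles_per_side):
        for i in range(tiles_per_side):
            order.append((i, (i + d) % tiles_per_side))
    return order

def tiled_transpose(m, tile):
    """Transpose a square matrix one tile at a time, visiting the tiles
    in the diagonal order produced by diagonal_tile_order."""
    n = len(m)
    if n % tile != 0 or any(len(row) != n for row in m):
        raise ValueError("expected a square matrix divisible into tiles")
    out = [[None] * n for _ in range(n)]
    for ti, tj in diagonal_tile_order(n // tile):
        for r in range(ti * tile, (ti + 1) * tile):
            for c in range(tj * tile, (tj + 1) * tile):
                out[c][r] = m[r][c]  # element (r, c) lands at (c, r)
    return out
```

Every tile is visited exactly once, so the result equals the ordinary transpose regardless of the order in which the tiles are processed; the ordering only affects scheduling.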

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Hansen US – 7932910

Cloutier US – 5892962
Cloutier discloses a system that includes a matrix of processing elements (PEs) to perform matrix operations, wherein the matrix is a set of PEs interconnected in a 2D grid topology (see column 5, lines 7-8 in D2).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUY DUONG whose telephone number is (571)272-2764. The examiner can normally be reached Monday-Friday 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/H.D./Examiner, Art Unit 2182
(571)272-2764


/JYOTI MEHTA/Supervisory Patent Examiner, Art Unit 2182