DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This office action is responsive to the amendments filed on 07/11/2022 and 07/15/2022. Claims 1-17 are pending. Applicant has amended the claims to overcome the rejection under 35 U.S.C. 112(b) and part of the objections previously set forth in the Non-Final Rejection (03/09/2022). However, Applicant fails to address the objections to claims 13-14. Accordingly, the objections to claims 13 and 14 are maintained.

Response to Arguments
Applicant argues, regarding the rejection under 35 U.S.C. 102, in Remarks on pages 5-6: “Ginzburg, as cited, does not appear to describe the use of a configuration. For at least this rationale, Ginzburg, as recited does not appear to describe claim 1.”
Examiner respectfully disagrees because Ginzburg describes the use of a configuration as recited in amended claims 1 and 9. See the rejection under 35 U.S.C. 102 below for details.

Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter.  See 37 CFR 1.75(d)(1) and MPEP § 608.01(o).  See the rejection under 35 U.S.C. 112(a) below.

Claim Objections
Claims 9-17 are objected to because of the following informalities:
Claim 9 line 6 recites “two dimensional data structures”, which should be “two-dimensional data structures”. The dependent claims are also objected to for inheriting the same deficiency from the claim on which they depend.
Claim 13 line 3 recites "these registers". It is unclear whether "these registers" refers to "a plurality of packed data registers" or different registers. Examiner interpreted as "the plurality of packed data registers".
Claim 14 line 3 recites "these registers and the memory". It is unclear whether "these registers and memory" refers to "a plurality of packed data registers and memory" or different registers and memory. Examiner interpreted as "the plurality of packed data registers and memory". 
Appropriate correction is required.

Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

Claim 14 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 14 recites “the two-dimensional data structures are a plurality of packed data registers and memory” and, as antecedently recited in claim 1, “the two-dimensional data structures are to be configured according to a configuration”. The specification fails to describe configuring a plurality of packed data registers and memory according to a configuration. Paragraph [0095] describes that two-dimensional data structures are referred to as tiles. Paragraph [0207] further describes that a TILECONFIG instruction is executed to configure tile usage, including setting a number of rows and columns per tile, and that typically at least one matrix (tile) is loaded from memory. In particular, paragraph [0211] describes that execution of the tileconfig instruction causes a configuration to be retrieved from memory and applied to matrix settings within a matrix accelerator. Accordingly, the configuring of the two-dimensional data structures is the configuring of the matrices, not the configuring of the plurality of packed data registers and the memory as recited in claim 14.

Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claim 14 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 14 line 3 recites “wherein the two-dimensional data structures are a plurality of packed data registers and memory, and the two-dimensional data structures overlaid on these registers and the memory”. As noted in the objection above, the term “these registers and the memory” is interpreted as "the plurality of packed data registers and memory". Thus, when the two-dimensional data structures are themselves a plurality of packed data registers and memory, it is unclear how the two-dimensional data structures can be overlaid on themselves, because “these registers and the memory” refers to the same plurality of packed data registers and memory. For purposes of prior art examination, Examiner interprets the limitation as: the two-dimensional data structures are stored in a plurality of packed data registers and memory, and the two-dimensional data structures are overlaid on these registers and the memory.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-7 and 9-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ginzburg (US 20110153707) (IDS dated 08/12/2020).

Regarding claim 1, Ginzburg teaches an apparatus (Ginzburg, figure 1, 100) comprising: matrix operations circuitry (Ginzburg, figure 1, execution unit 136, [0030] the execution unit 136 performs the operation) to execute one or more decoded matrix operation instructions (Ginzburg, figure 1, decoder 128. [0027-0028] Figure 2, decoder 128 receives a matrix operation instruction from cache and decodes the instruction) on data stored in two-dimensional data structures (Ginzburg, figures 3 and 5 provide an example of 4x4 matrix data and a 4x4 matrix B. [0035] execution unit 136 is a matrix multiply add unit that performs the 2D matrix multiply add operation on data elements stored in registers); and storage to store the two-dimensional data structures according to a configuration, the configuration to at least describe a number of rows and columns per two-dimensional data structure (Ginzburg, figures 3 and 5 show the memory and register [i.e. storage] to store at least the matrix A and matrix B. [0031-0032,0035,0038] register file unit 134 stores the matrix vectors from respective source registers. Figures 3-6: in an m by m matrix operation, a first storage location is loaded with a first vector representing an m by m matrix (A) of data elements in a row ordered format, and a second storage location is loaded with a second vector representing an m by m matrix (B) of data elements in a column ordered format, thereby storing the mxm matrices A and B in row ordered format and column ordered format [i.e. a configuration that describes a number of rows and columns].  Thus, the storage stores the two-dimensional data structures according to a configuration that describes a number of rows and a number of columns, in this case 4 rows and 4 columns).
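
For illustration only, the storage mapping relied upon above can be sketched as follows. This is a minimal Python sketch and not code from Ginzburg; the function names (pack_row_ordered, pack_column_ordered, matrix_multiply_add) and the use of a flat list to stand in for a vector register are assumptions made for this example. It shows an m by m matrix A packed in row ordered format and an m by m matrix B packed in column ordered format, so that each result element is the dot product of two contiguous runs of m elements, added to the corresponding element of C.

```python
from typing import List

Matrix = List[List[float]]


def pack_row_ordered(a: Matrix) -> List[float]:
    """Flatten a matrix row by row (row ordered format)."""
    return [x for row in a for x in row]


def pack_column_ordered(b: Matrix) -> List[float]:
    """Flatten a matrix column by column (column ordered format)."""
    rows, cols = len(b), len(b[0])
    return [b[r][c] for c in range(cols) for r in range(rows)]


def matrix_multiply_add(vec_a: List[float], vec_b: List[float],
                        vec_c: List[float], m: int) -> List[float]:
    """Return C + A*B for m-by-m operands; the result is row ordered.

    vec_a is row ordered, vec_b is column ordered, vec_c is row ordered.
    The value m plays the role of the row/column count in this sketch.
    """
    out = list(vec_c)
    for i in range(m):
        row = vec_a[i * m:(i + 1) * m]      # i-th row of A, contiguous
        for j in range(m):
            col = vec_b[j * m:(j + 1) * m]  # j-th column of B, contiguous
            out[i * m + j] += sum(x * y for x, y in zip(row, col))
    return out
```

With m = 4, each packed vector holds 16 elements, which corresponds to the 4 rows and 4 columns discussed above.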

Regarding claim 2, Ginzburg teaches the apparatus of claim 1, wherein the storage is a plurality of packed data registers and the two-dimensional data structures are overlaid on at least a subset of the plurality of packed data registers (Ginzburg, [0024] register file unit 134 includes a plurality of registers. Figure 3 illustrates that the matrix A is stored in the register. [0038] mxm matrix (A) is loaded into a first storage location, and mxm matrix (B) is loaded into a second storage location).

Regarding claim 3, Ginzburg teaches the apparatus of claim 1, wherein the storage is a plurality of packed data registers and memory, and the two-dimensional data structures are overlaid on at least a subset of the plurality of packed data registers and memory (Ginzburg, [0024] register file unit 134 includes a plurality of registers. Figure 3 illustrates that the matrix A is stored in the register and memory. [0031] data elements are stored in memory and loaded into registers. [0038] mxm matrix (A) is loaded into a first storage location, and mxm matrix (B) is loaded into a second storage location).

Regarding claim 4, Ginzburg teaches the apparatus of claim 1, wherein the matrix operations circuitry is a plurality of chained fused multiply accumulate circuits (Ginzburg, figure 5-6 [0035-0037] the MMAU includes at least 4 identical sub-units, figure 6 illustrates the implementation of each sub unit as a chained FMA circuit).
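
As an illustrative aid only, the notion of chained multiply accumulate stages can be sketched in Python. The function name chained_multiply_accumulate is hypothetical and the code is not taken from Ginzburg; it models only the chaining, in which the output of each stage feeds the accumulator input of the next stage. Python evaluates a * b + acc with two roundings, whereas a fused hardware unit rounds once, so the sketch does not model fused rounding.

```python
from typing import Sequence


def chained_multiply_accumulate(row: Sequence[float],
                                col: Sequence[float],
                                acc: float = 0.0) -> float:
    """Pass an accumulator through one multiply-add stage per element pair."""
    for a, b in zip(row, col):
        # Stage output becomes the accumulator input of the next stage.
        acc = a * b + acc
    return acc
```

Seeding acc with an element of matrix C and passing a row of A and a column of B yields the corresponding multiply-add result element.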

Regarding claim 5, Ginzburg teaches the apparatus of claim 4, wherein each of the chained fused multiply accumulate circuits is to include storage for a portion of a two-dimensional data structure that the fused multiply accumulate circuit is to operate on (Ginzburg, figure 5 [0036-0037] illustrates that each sub-unit [i.e. chained fused multiply accumulate circuit] includes storage to store a row of matrix A and a column of matrix B).

Regarding claim 6, Ginzburg teaches the apparatus of claim 1, wherein the matrix operations circuitry supports element matrix multiply, subtract, and add instructions (Ginzburg, figures 5-6 [0035-0037] describe the unit for performing a matrix multiply-add instruction. Note that the MMAU performs the addition operation, and subtraction is performed by addition of negative values. [0019, 0031] describe that the element values are represented as 32-bit floating point numbers, and a floating point number includes a sign bit, which means that a data element can be negative or positive. Thus, the MMAU supports the subtraction operation).
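
The reasoning that a multiply-add unit operating on signed floating point values also covers subtraction can be illustrated with a minimal Python sketch. The helper names (sign_bit, multiply_add, multiply_subtract) are hypothetical and the code is not taken from Ginzburg; it assumes the IEEE 754 single-precision layout, in which the most significant bit is the sign bit, and shows that a*b - c equals the multiply-add a*b + (-c).

```python
import struct


def sign_bit(x: float) -> int:
    """Return the sign bit of x encoded as a 32-bit IEEE 754 float."""
    # Big-endian packing puts the sign bit in the top bit of byte 0.
    return struct.pack(">f", x)[0] >> 7


def multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c."""
    return a * b + c


def multiply_subtract(a: float, b: float, c: float) -> float:
    """Compute a*b - c by feeding the negated addend to multiply_add."""
    return multiply_add(a, b, -c)
```

Negating the addend only flips its sign bit, so the same adder datapath serves both operations in this sketch.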

Regarding claim 7, Ginzburg teaches the apparatus of claim 1, wherein the matrix operations circuitry supports dot product and multiply accumulate operations (Ginzburg, [0036] each sub unit multiplies each row of matrix A and column of matrix B to generate a corresponding row of dot products. A multiply add operation is performed to add the resultant dot product to a corresponding data element in matrix C).

Regarding claim 9, Ginzburg teaches a system (Ginzburg, figure 1, 100) comprising: a host processor (Ginzburg, figure 1, processing element 110); and a matrix operations accelerator coupled to the host processor (Ginzburg, figure 1, the execution unit 136 and register file unit 134 [i.e. a matrix operations accelerator] are coupled to the processing element 110 [i.e. host processor]), wherein the matrix operations accelerator is to perform matrix operations on two-dimensional data structures using a computational grid (Ginzburg, figures 1 and 5 [0035-0037] execution unit 136 performs the 2D multiply-add operation on matrices A and B. Figure 5 shows that the MMAU includes at least 4 sub-units [i.e. computational grid]) based on commands received from the host processor (Ginzburg, [0024] if a uop corresponds to, for example, an arithmetic operation, that uop is dispatched to function unit 136, which then performs the arithmetic operation), wherein the two dimensional data structures are to be configured according to a configuration, the configuration to at least describe a number of rows and columns per two-dimensional data structure (Ginzburg, [0031-0032,0035,0038] register file unit 134 stores the matrix vectors from respective source registers. Figures 3-6: in an m by m matrix operation, a first storage location is loaded with a first vector representing an m by m matrix (A) of data elements in a row ordered format, and a second storage location is loaded with a second vector representing an m by m matrix (B) of data elements in a column ordered format. The mxm matrices A and B are stored in storage locations or registers, as illustrated in figure 3, in row ordered format and column ordered format according to a configuration of 4 rows and 4 columns [i.e. a configuration that describes a number of rows and columns]).

Regarding claim 10, Ginzburg teaches the system of claim 9, wherein the matrix operations accelerator further comprises a plurality of data buffers to buffer matrix data in two-dimensional data structures (Ginzburg, figure 3 shows the memory and register to store at least the matrix A. [0024] register file unit 134 includes a plurality of registers. figure 5 shows the matrix A and matrix B are stored in the registers [i.e. buffered matrix data]).

Regarding claim 11, Ginzburg teaches the system of claim 10, wherein the computational grid is to house at least one of the buffered matrix data from the plurality of data buffers during a matrix manipulation operation (Ginzburg, figure 5, each sub unit of MMAU includes registers to store row of matrix A and column of matrix B. Note that for examination purposes, claim 11 is interpreted as dependent on system claim 10).

Regarding claim 12, Ginzburg teaches the system of claim 10, wherein the data buffers are a plurality of registers (Ginzburg, [0024] the register unit 134 includes a plurality of registers. Note that for examination purposes, claim 12 is interpreted as dependent on system claim 10).

Regarding claim 13, Ginzburg teaches the system of claim 12, wherein the plurality of registers are a plurality of packed data registers and the two-dimensional data structures are overlaid on these registers (Ginzburg, [0024] register file unit 134 includes a plurality of registers. Figure 3 illustrates that the matrix A is stored in the register. [0038] mxm matrix (A) is loaded into a first storage location, and mxm matrix (B) is loaded into a second storage location).

Regarding claim 14, Ginzburg teaches the system of claim 12, wherein the two-dimensional data structures are a plurality of packed data registers and memory, and the two-dimensional data structures overlaid on these registers and the memory (Ginzburg, [0007] figure 3 illustrates the memory and registers to store at least the matrix A in row ordered format and column ordered format. [0038] mxm matrix (A) is loaded into a first storage location in row ordered format, and mxm matrix (B) is loaded into a second storage location in column ordered format as illustrated in figure 3 [i.e. two-dimensional data structures are stored in a plurality of packed data registers and memory], and the data of matrix A and matrix B stored in the two-dimensional data structures are overlaid on these registers and memory).

Regarding claim 15, Ginzburg teaches the system of claim 9, wherein the matrix operations accelerator comprises a plurality of chained fused multiply add circuits (Ginzburg, figure 5-6 [0035-0037] the MMAU includes at least 4 identical sub-units, figure 6 illustrates the implementation of each sub unit as a chained FMA circuit).

Regarding claim 16, Ginzburg teaches the system of claim 15, wherein each of the chained fused multiply add circuits is to include storage for a portion of a two-dimensional data structure that the fused multiply add circuit is to operate on (Ginzburg, figure 5 illustrates that each sub-unit [i.e. chained fused multiply accumulate circuit] includes registers to store row of matrix A and column of matrix B).

Regarding claim 17, Ginzburg teaches the system of claim 9, further comprising a coherent memory interface coupled to the matrix operations accelerator and host processor to provide access to shared memory between the host processor and matrix operations accelerator (Ginzburg, figure 1, [0021] illustrates that execution circuitry 136 and the components within the processing element 110 are connected by interconnect local bus 124 [i.e. coherent memory interface]. Figure 1 [0024] data from memory, such as 138, 140, 112, 104, and 118, are shared between execution unit 136 [i.e. the matrix operations accelerator] and processing element 110 [i.e. host processor]).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Ginzburg in view of Gholaminejad (US 20180032477).

Regarding claim 8, Ginzburg discloses the claimed invention as in the parent claim above, including a system for performing matrix operations, but Ginzburg does not teach that the matrix operations circuitry supports matrix transpose and diagonal operations. However, Gholaminejad discloses matrix operation circuitry that supports matrix transpose and diagonal operations (Gholaminejad, figures 1 and 4 [0012, 0018-0019] the system includes a GPU configured to perform a transpose operation on a matrix by moving diagonally through tiles of the matrix and, as shown in figure 5 [0027], the matrix is scheduled to be transposed using a staggered diagonal ordering scheme [i.e. matrix transpose and diagonal operations]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ginzburg’s system to support matrix transpose and diagonal operations as disclosed by Gholaminejad. This modification would have been obvious because Ginzburg and Gholaminejad both disclose systems for performing matrix operations. Furthermore, as recognized by Gholaminejad [0001], the transpose operation is an important operation in many computing applications, and matrices are often transposed when performing other operations, for example, as part of a Fourier Transform.
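
For illustration only, a tile-wise transpose that visits tiles along wrapped diagonals can be sketched in Python. This is one possible reading of a staggered diagonal ordering and is an assumption of this example, not Gholaminejad's GPU implementation; the function names (staggered_diagonal_order, tiled_transpose) and the restriction to square matrices whose side is a multiple of the tile size are likewise hypothetical.

```python
from typing import Iterator, List, Tuple

Matrix = List[List[int]]


def staggered_diagonal_order(tiles_per_side: int) -> Iterator[Tuple[int, int]]:
    """Yield (tile_row, tile_col) pairs one wrapped diagonal at a time."""
    for d in range(tiles_per_side):
        for i in range(tiles_per_side):
            yield i, (i + d) % tiles_per_side


def tiled_transpose(src: Matrix, tile: int) -> Matrix:
    """Transpose a square matrix tile by tile in staggered diagonal order."""
    n = len(src)
    if tile <= 0 or n % tile != 0 or any(len(row) != n for row in src):
        raise ValueError("expects a square matrix whose side is a multiple of tile")
    dst = [[0] * n for _ in range(n)]
    for ti, tj in staggered_diagonal_order(n // tile):
        # Element (r, c) of source tile (ti, tj) lands at (c, r).
        for r in range(ti * tile, (ti + 1) * tile):
            for c in range(tj * tile, (tj + 1) * tile):
                dst[c][r] = src[r][c]
    return dst
```

The visiting order does not change the result of the transpose; in this sketch it only determines which tiles are handled together.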

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUY DUONG whose telephone number is (571)272-2764. The examiner can normally be reached Mon-Friday 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/H.D./Examiner, Art Unit 2182 (571)272-2764

/JYOTI MEHTA/Supervisory Patent Examiner, Art Unit 2182