DETAILED ACTION
This is in response to communication filed on December 30, 2020.
Status of Claims
Claims 1 – 20 and 22 are pending, of which claims 1, 8, 15, and 22 are in independent form.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 8/12/2020 and 2/1/2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112
In light of applicant’s amendments to the claims, the examiner withdraws the previous rejection to the claims under 35 USC 112.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1 – 18, 20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Moyer, U.S. Patent Application 2005/0053012 (hereinafter referred to as Moyer) in view of Hughes et al., U.S. Patent Application 2015/0052333 (hereinafter referred to as Hughes), further in view of Yadavalli, U.S. Patent Application 2017/0337156 (hereinafter referred to as Yadavalli).

Referring to claim 1, Moyer discloses “A processor” (Fig. 1 processor 14) and an “instruction having fields for an opcode, a destination” “matrix operand identifier, and source memory information” (Figs. 2-7 are load instructions with opcode, rD, rA, rB); “and execution circuitry” (Fig. 1 execution units 32) “to execute the” “instruction to load groups of strided data elements from memory into configured rows of the identified destination” “matrix operand” ([0045] instruction load stream of vector elements (lstrmvex) loads multiple instances of destination with a 'cnt' number of elements from memory and [0118] lstrmvex loads each row of matrix 102 in turn).
Moyer does not appear to explicitly disclose the processor “comprising: decode circuitry to decode” an instruction and executing “the decoded” “instruction.”
However, Hughes discloses another processor (Fig. 18) that executes an instruction to load strided data into a destination vector ([0041] – [0046]).  Hughes further describes “decode circuitry to decode an instruction” and executing “the decoded instruction” (Fig. 4 instruction is decoded at 403 and Fig. 18 decode unit 1832).

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Moyer and Hughes before him or her, to modify the teachings of Moyer to include the teachings of Hughes so that a decode unit is present to decode the matrix load instruction.
The motivation for doing so would have been to utilize commonly provided components of a processor, such as an instruction decoder.  Instruction decoders are known in the art and are used to translate instruction opcodes into the operations that the CPU performs next (as stated by the provided NPL, lateblt tripod, archived from the internet in 2016).
Neither Moyer nor Hughes appears to explicitly disclose “a single instruction having fields for an opcode, a destination multi-dimensional matrix identifier.” Also, it follows that neither Moyer nor Hughes appears to explicitly disclose “the decoded single instruction” and “the identified destination multi-dimensional matrix operand.”
However, Yadavalli discloses a single instruction for loading multi-dimensional matrix data elements (Figs. 3(a), 3(b), 5(b) along with [0001] a set of machine instructions and methods to load, store and compute with these matrices, [0034] inside a Microprocessor [300] an embedded Random Access Memory (RAM) based storage [301] called Matrix Space is used to hold a plurality of Matrices (Matrixes) [310, 311, 312, 313], Matroids [314] (arrays of higher than 2 dimensions used in mathematics, physics and engineering) or multi-dimensional (numerical and non-numerical) any generic Arrays [315] for computation inside a processing unit. [0038] loading a matrix from system memory into matrix space [0059] - [0060] loading a matrix from system memory, LOAD Matrix instruction is decoded, the transfer can occur either by writing the rows or columns or both, into [301]).
Moyer, Hughes, and Yadavalli are analogous art because they are from the same field of endeavor, which is loading strided data from a memory.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Moyer, Hughes, and Yadavalli before him or her, to modify the teachings of Moyer and Hughes to include the teachings of Yadavalli so that a single instruction loads multi-dimensional matrix data.
The motivation for doing so would have been to avoid complexity and extensibility of prior methods (as stated by Yadavalli at [0020]).
Therefore, it would have been obvious to combine Yadavalli with Moyer and Hughes to obtain the invention as specified in the instant claim.
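For illustration only, and not as a characterization of how any cited reference or the instant application implements the operation, the claimed loading of groups of strided data elements from memory into rows of a destination matrix operand may be sketched as follows. The function name, its parameters, and the flat-list model of memory are hypothetical and appear in neither the claims nor the references.

```python
def load_strided_rows(memory, base, stride, rows, cols):
    """Return a rows x cols matrix read from a flat memory model.

    Hypothetical model: each row is a group of `cols` contiguous
    elements, and consecutive groups start `stride` elements apart.
    """
    matrix = []
    for r in range(rows):
        start = base + r * stride  # start of the r-th strided group
        matrix.append(list(memory[start:start + cols]))
    return matrix


# Example: memory holds 0..31; load 3 rows of 4 elements, stride 8, base 2.
memory = list(range(32))
tile = load_strided_rows(memory, base=2, stride=8, rows=3, cols=4)
```

In this sketch a single call stands in for the single instruction, and the destination is a two-dimensional structure rather than a single vector register.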

	As per claim 2, Moyer discloses “the opcode defines a size of each data element of the destination” “matrix operand” (Figs. 2 – 7 and [0026] ds destination element size).  
Also, Hughes discloses “the opcode defines a size of each data element of the destination” “matrix operand” ([0145] and [0178] N is determined based on the full opcode field 1474).
Further, as above, Yadavalli discloses “the destination multi-dimensional matrix” (Figs. 3(a), 3(b), 5(b) along with [0001] a set of machine instructions and methods to load, store and compute with these matrices, [0034] inside a Microprocessor [300] an embedded Random Access Memory (RAM) based storage [301] called Matrix Space is used to hold a plurality of Matrices (Matrixes) [310, 311, 312, 313], Matroids [314] (arrays of higher than 2 dimensions used in mathematics, physics and engineering) or multi-dimensional (numerical and non-numerical) any generic Arrays [315] for computation inside a processing unit. [0038] loading a matrix from system memory into matrix space [0059] - [0060] loading a matrix from system memory, LOAD Matrix instruction is decoded, the transfer can occur either by writing the rows or columns or both, into [301]).
As above, Moyer, Hughes, and Yadavalli are analogous art because they are from the same field of endeavor, which is loading strided data from a memory.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Moyer, Hughes, and Yadavalli before him or her, to modify the teachings of Moyer and Hughes to include the teachings of Yadavalli so that a single instruction loads multi-dimensional matrix data.
The motivation for doing so would have been to avoid complexity and extensibility of prior methods (as stated by Yadavalli at [0020]).
Therefore, it would have been obvious to combine Yadavalli with Moyer and Hughes to obtain the invention as specified in the instant claim.

	As per claim 3, Moyer discloses “the size of each data element of the destination” “matrix operand is a doubleword” ([0025] vector element may be a word (32 bits).  Note that Applicant’s description states that a word is 16-bit, doubleword is 32-bit at Applicant’s [0089]).
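	Also, Hughes discloses “the size of each data element of the destination” “matrix operand is a doubleword” ([0134] different vector operand sizes including 32 bit or 16 bit).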
Further, as above, Yadavalli discloses “the destination multi-dimensional matrix” (Figs. 3(a), 3(b), 5(b) along with [0001] a set of machine instructions and methods to load, store and compute with these matrices, [0034] inside a Microprocessor [300] an embedded Random Access Memory (RAM) based storage [301] called Matrix Space is used to hold a plurality of Matrices (Matrixes) [310, 311, 312, 313], Matroids [314] (arrays of higher than 2 dimensions used in mathematics, physics and engineering) or multi-dimensional (numerical and non-numerical) any generic Arrays [315] for computation inside a processing unit. [0038] loading a matrix from system memory into matrix space [0059] - [0060] loading a matrix from system memory, LOAD Matrix instruction is decoded, the transfer can occur either by writing the rows or columns or both, into [301]).
As above, Moyer, Hughes, and Yadavalli are analogous art because they are from the same field of endeavor, which is loading strided data from a memory.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Moyer, Hughes, and Yadavalli before him or her, to modify the teachings of Moyer and Hughes to include the teachings of Yadavalli so that a single instruction loads multi-dimensional matrix data.
The motivation for doing so would have been to avoid complexity and extensibility of prior methods (as stated by Yadavalli at [0020]).
Therefore, it would have been obvious to combine Yadavalli with Moyer and Hughes to obtain the invention as specified in the instant claim.


	As per claim 4, Moyer discloses “the size of each data element of the destination” “matrix operand is a word” ([0025] vector element may be a halfword (16 bits).  Note that Applicant’s description states that a word is 16-bit, doubleword is 32-bit at Applicant’s [0089]).
	Also, Hughes discloses “the size of each data element of the destination” “matrix operand is a word” ([0134] different vector operand sizes including 32 bit or 16 bit).
Further, as above, Yadavalli discloses “the destination multi-dimensional matrix” (Figs. 3(a), 3(b), 5(b) along with [0001] a set of machine instructions and methods to load, store and compute with these matrices, [0034] inside a Microprocessor [300] an embedded Random Access Memory (RAM) based storage [301] called Matrix Space is used to hold a plurality of Matrices (Matrixes) [310, 311, 312, 313], Matroids [314] (arrays of higher than 2 dimensions used in mathematics, physics and engineering) or multi-dimensional (numerical and non-numerical) any generic Arrays [315] for computation inside a processing unit. [0038] loading a matrix from system memory into matrix space [0059] - [0060] loading a matrix from system memory, LOAD Matrix instruction is decoded, the transfer can occur either by writing the rows or columns or both, into [301]).
As above, Moyer, Hughes, and Yadavalli are analogous art because they are from the same field of endeavor, which is loading strided data from a memory.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Moyer, Hughes, and Yadavalli before him or her, to modify the teachings of Moyer and Hughes to include the teachings of Yadavalli so that a single instruction loads multi-dimensional matrix data.

The motivation for doing so would have been to avoid complexity and extensibility of prior methods (as stated by Yadavalli at [0020]).
Therefore, it would have been obvious to combine Yadavalli with Moyer and Hughes to obtain the invention as specified in the instant claim.

	As per claim 5, Moyer discloses “the execution circuitry is to store each configured row into the identified destination” “matrix operand and update a counter value as each row is stored” ([0040], [0042], [0045] counter used to keep track of ‘cnt’, and [0118] loads each row of matrix 102 in turn).
Further, as above, Yadavalli discloses “the destination multi-dimensional matrix” (Figs. 3(a), 3(b), 5(b) along with [0001] a set of machine instructions and methods to load, store and compute with these matrices, [0034] inside a Microprocessor [300] an embedded Random Access Memory (RAM) based storage [301] called Matrix Space is used to hold a plurality of Matrices (Matrixes) [310, 311, 312, 313], Matroids [314] (arrays of higher than 2 dimensions used in mathematics, physics and engineering) or multi-dimensional (numerical and non-numerical) any generic Arrays [315] for computation inside a processing unit. [0038] loading a matrix from system memory into matrix space [0059] - [0060] loading a matrix from system memory, LOAD Matrix instruction is decoded, the transfer can occur either by writing the rows or columns or both, into [301]).
As above, Moyer, Hughes, and Yadavalli are analogous art because they are from the same field of endeavor, which is loading strided data from a memory.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Moyer, Hughes, and Yadavalli before him or her, to modify the teachings of Moyer and Hughes to include the teachings of Yadavalli so that a single instruction loads multi-dimensional matrix data.
The motivation for doing so would have been to avoid complexity and extensibility of prior methods (as stated by Yadavalli at [0020]).
Therefore, it would have been obvious to combine Yadavalli with Moyer and Hughes to obtain the invention as specified in the instant claim.
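As a hedged illustration of the claim 5 limitation, the following sketch stores each row into a destination and updates a counter value as each row is stored. The names are hypothetical, and the optional `start_row` parameter is an assumption added only to show one way such a counter could be used; it is not drawn from the claims or the references.

```python
def store_rows_with_counter(dest, rows, start_row=0):
    """Store rows into dest one at a time, updating a counter per row.

    Hypothetical model: the counter records how many rows have been
    stored so far and is returned to the caller when the loop ends.
    """
    counter = start_row
    for r in range(start_row, len(rows)):
        dest[r] = list(rows[r])  # store one configured row
        counter += 1             # update the counter as the row is stored
    return counter


dest = [None, None, None]
rows_done = store_rows_with_counter(dest, [[1, 2], [3, 4], [5, 6]])
```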

	As per claim 6, Moyer discloses “the identified destination” “matrix operand is a plurality of registers configured to represent a matrix” (Figs. 12 – 26 and [0079] register file represents a matrix).
	Also, Hughes discloses “the identified destination” “matrix operand is a plurality of registers configured to represent a matrix” ([0041] destination vector register.  A vector is a single row matrix).
Further, as above, Yadavalli discloses “the destination multi-dimensional matrix” (Figs. 3(a), 3(b), 5(b) along with [0001] a set of machine instructions and methods to load, store and compute with these matrices, [0034] inside a Microprocessor [300] an embedded Random Access Memory (RAM) based storage [301] called Matrix Space is used to hold a plurality of Matrices (Matrixes) [310, 311, 312, 313], Matroids [314] (arrays of higher than 2 dimensions used in mathematics, physics and engineering) or multi-dimensional (numerical and non-numerical) any generic Arrays [315] for computation inside a processing unit. [0038] loading a matrix from system memory into matrix space [0059] - [0060] loading a matrix from system memory, LOAD Matrix instruction is decoded, the transfer can occur either by writing the rows or columns or both, into [301]).
As above, Moyer, Hughes, and Yadavalli are analogous art because they are from the same field of endeavor, which is loading strided data from a memory.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Moyer, Hughes, and Yadavalli before him or her, to modify the teachings of Moyer and Hughes to include the teachings of Yadavalli so that a single instruction loads multi-dimensional matrix data.
The motivation for doing so would have been to avoid complexity and extensibility of prior methods (as stated by Yadavalli at [0020]).
Therefore, it would have been obvious to combine Yadavalli with Moyer and Hughes to obtain the invention as specified in the instant claim.

	As per claim 7, Moyer discloses “the source memory information includes a scale, an index, a base, and a displacement” (Fig. 7 cnt, rcnt, stride, skip, skip_cnt).
	Also, Hughes discloses “the source memory information includes a scale, an index, a base, and a displacement” (Fig. 1 and [0041] base, scale, stride, and displacement).
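For context, a scale, an index, a base, and a displacement are conventionally combined into a memory address as base + index × scale + displacement. The sketch below assumes that conventional x86-style interpretation, including the restriction of scale to 1, 2, 4, or 8. Neither the claims nor the cited references are asserted to use exactly this formula, and the function name is hypothetical.

```python
def effective_address(base, index, scale, displacement):
    """Compute base + index * scale + displacement.

    Assumption: scale is limited to 1, 2, 4, or 8, as in conventional
    x86-style scale-index-base addressing.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return base + index * scale + displacement


# Example: base 0x1000, index 3, scale 4, displacement 16.
addr = effective_address(0x1000, 3, 4, 16)  # 0x101C
```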

Referring to claim 8, claim 8 recites limitations corresponding to those of claim 1.  Therefore, the rejection of claim 1 applies to claim 8.

Note, claim 9 recites the corresponding limitations of claim 2.  Therefore, the rejection of claim 2 applies to claim 9.

Note, claim 10 recites the corresponding limitations of claim 3.  Therefore, the rejection of claim 3 applies to claim 10.

Note, claim 11 recites the corresponding limitations of claim 4.  Therefore, the rejection of claim 4 applies to claim 11.

Note, claim 12 recites the corresponding limitations of claim 5.  Therefore, the rejection of claim 5 applies to claim 12.

Note, claim 13 recites the corresponding limitations of claim 6.  Therefore, the rejection of claim 6 applies to claim 13.

Note, claim 14 recites the corresponding limitations of claim 7.  Therefore, the rejection of claim 7 applies to claim 14.


Referring to claim 15, claim 15 recites limitations corresponding to those of claim 1.  Therefore, the rejection of claim 1 applies to claim 15.
Also, Hughes discloses “A non-transitory machine-readable medium storing an instruction which causes a processor to perform a method” of claim 1 ([0272] – [0274] machine-readable media).

Note, claim 16 recites the corresponding limitations of claim 2.  Therefore, the rejection of claim 2 applies to claim 16.

Note, claim 17 recites the corresponding limitations of claim 3.  Therefore, the rejection of claim 3 applies to claim 17.

Note, claim 18 recites the corresponding limitations of claim 4.  Therefore, the rejection of claim 4 applies to claim 18.

Note, claim 20 recites the corresponding limitations of claim 6.  Therefore, the rejection of claim 6 applies to claim 20.

Referring to claim 22, claim 22 recites limitations corresponding to those of claim 1.  Therefore, the rejection of claim 1 applies to claim 22.
Also, Hughes discloses “A system comprising: a processor; and an accelerator coupled to the processor, the accelerator including” the features of claim 1 (Fig. 19 and [0252] – [0257] processors, additional processors, and graphics accelerators).

Response to Arguments
Applicant’s arguments with respect to claims 1 – 20 and 22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent 9,519,947 discloses surface memory load and surface memory store instructions with multidimensional matrix operands.
U.S. Patent 9,996,350 discloses decoding an instruction for prefetching multidimensional array elements.
Other pertinent art includes WIPO WO 2016105841 A1, US 20190205137, US 20170220352, and US 20160188337.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVEN G SNYDER whose telephone number is (571)270-1971.  The examiner can normally be reached on M-F 8:00am-4:30pm (flexible).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Henry Tsai, can be reached at 571-272-4176.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/STEVEN G SNYDER/Primary Examiner, Art Unit 2184