DETAILED ACTION
Claims 1-24 are pending.
The office acknowledges the following papers:
Claims, specification, drawings, and remarks filed on 3/19/2021,
IDS filed on 5/10/2021.

Withdrawn objections and rejections
The drawing objections have been withdrawn due to amendment.
The specification objections have been withdrawn due to amendment.

New Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9, 11, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Ansari et al. (U.S. 7,590,823), in view of Eaton (U.S. 4,187,539).
As per claim 1:
Ansari and Eaton disclosed a hardware accelerator comprising: 

a programmable instruction schema mapping table implemented as a content addressable memory (CAM) and programmed to map the plurality of opcodes to a plurality of definitions of operands in a plurality of instructions (Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 continued to column 15 lines 1-3, column 15 lines 28-31, column 17 lines 30-35, column 38 lines 18-23, column 38 lines 42-55, and column 39 lines 40-50)(Eaton disclosed a decoder being implemented as a CAM. The combination implements the instruction decoder within the FCM of Ansari as a CAM. The combination maps an UDI opcode to the decoder output from the CAM. The UDI register shows operands that are used for the UDI. These operands are decoded for execution.); 
a hardware execution engine (Ansari: Figure 2 elements 230-232, column 7 lines 63-67); and
a controller configured to:
receive an instruction that includes a first opcode of the plurality of opcodes (Ansari: Figure 2 element 220, column 7 lines 1-12)(The APU controller receives coprocessor UDI instructions containing opcodes.);
control the hardware instruction decoder to extract the first opcode from the 
obtain, from the instruction schema mapping table and based on the first opcode, a first definition of a first operand, wherein the first definition specifies a location and a number of bits of the first operand in the instruction (Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 continued to column 15 lines 1-3, column 15 lines 28-31, column 17 lines 30-35, column 38 lines 18-23, column 38 lines 42-55, and column 39 lines 40-50)(The combination maps an UDI opcode to the decoder output from the CAM. The UDI register shows operands that are used for the UDI. These operands are decoded for execution. The UDI opcode values identify the UDI encoding format that indicates the bit location and number of bits for a first operand (i.e. bits 28-30).); and
forward the instruction and the first definition to the hardware execution engine to control the hardware execution engine (Ansari: Figures 2 and 24 elements 230-233 and 2412)(The decoded instruction controls and operands are sent to the execution units for execution.) to:
extract the first operand from the instruction based on the first definition (Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 
execute the instruction based on the first operand (Ansari: Figures 2 and 24 elements 230-233 and 2412)(The decoded instruction controls and operands are sent to the execution units for execution.).
Ansari disclosed a coprocessor decoder, but doesn’t show the exact implementation of how the decoding occurs. One of ordinary skill in the art would have been motivated by this lack of teaching to find the Eaton reference showing a decoder implemented as a content addressable memory (CAM). Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the decoder of Ansari as a CAM for the advantage of quickly providing the corresponding control signals for a given UDI encoding output by the decoder controller interface.
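For illustration only, the mapping discussed for claim 1 (an opcode selecting an operand definition that gives a location and a number of bits) can be sketched in software. The sketch below models the CAM as a dictionary keyed by opcode. The opcode values, the field positions, the placement of the opcode in the low 8 bits, and all names (OperandDef, SCHEMA_TABLE, execute) are hypothetical and are not taken from Ansari, Eaton, or the application.

```python
from typing import Dict, NamedTuple, Tuple


class OperandDef(NamedTuple):
    """Hypothetical operand definition: where the operand sits and how wide it is."""
    offset: int  # bit position of the operand's least significant bit
    width: int   # number of bits in the operand


# Hypothetical schema mapping table: opcode -> operand definition.
# A dictionary stands in for the content addressable lookup.
SCHEMA_TABLE: Dict[int, OperandDef] = {
    0x04: OperandDef(offset=8, width=3),
    0x05: OperandDef(offset=16, width=8),
}

OPCODE_MASK = 0xFF  # assumption: the opcode occupies the low 8 bits


def extract_opcode(instruction: int) -> int:
    """Decoder step: pull the opcode out of the instruction word."""
    return instruction & OPCODE_MASK


def extract_operand(instruction: int, definition: OperandDef) -> int:
    """Execution-engine step: use the definition to slice the operand out."""
    return (instruction >> definition.offset) & ((1 << definition.width) - 1)


def execute(instruction: int) -> Tuple[int, int]:
    """Controller step: decode, look up the definition, then extract the operand."""
    opcode = extract_opcode(instruction)
    definition = SCHEMA_TABLE[opcode]  # lookup keyed by opcode
    return opcode, extract_operand(instruction, definition)


if __name__ == "__main__":
    word = (0b101 << 8) | 0x04
    print(execute(word))  # (4, 5)
```

Because the operand layout lives in the table rather than in the extraction logic, changing a table entry changes how an instruction is parsed without touching extract_operand.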
As per claim 2:
Ansari and Eaton disclosed the hardware accelerator of claim 1, wherein the hardware instruction decoder is programmed based on an instruction schema program that 
As per claim 3:
Ansari and Eaton disclosed the hardware accelerator of claim 1, wherein each of the plurality of definitions specifies a location and a number of bits of operands in the respective instruction (Ansari: Figures 2, 5, 7C, and 24 elements 222-223, 503, 720, 2402, and 2410, column 7 lines 8-22, column 13 lines 12-19, column 14 lines 16-28, column 14 lines 55-63, column 17 lines 30-35, column 38 lines 18-23, and column 39 lines 4-24)(The instruction decoder and decode/UDI/configurable instruction registers store user-defined instructions. The UDI opcode values identify the UDI encoding format that indicates the bit location and number of bits for a first operand (i.e. bits 28-30).).
As per claim 4:
Ansari and Eaton disclosed the hardware accelerator of claim 3, wherein the instruction schema mapping table is programmable based on the instruction schema program that further specifies locations and sizes of bits of the operands in the instructions (Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 continued to column 
As per claim 5:
Ansari and Eaton disclosed the hardware accelerator of claim 1, wherein the CAM maps the opcodes to addresses of the definitions in the CAM (Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 continued to column 15 lines 1-3, column 15 lines 28-31, column 17 lines 30-35, column 38 lines 18-23, column 38 lines 42-55, and column 39 lines 40-50)(Eaton disclosed a decoder being implemented as a CAM. The combination implements the instruction decoder within the FCM of Ansari as a CAM. The combination maps an UDI opcode to the decoder output from the CAM. The CAM decoder outputs control signals corresponding to the UDI.); and 
wherein the mapping enables the first definition to be retrieved from the CAM based on the opcode (Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 
As per claim 6:
Claim 6 essentially recites the same limitations of claim 1. Therefore, claim 6 is rejected for the same reasons as claim 1.
As per claim 7:
Claim 7 essentially recites the same limitations of claim 2. Claim 7 additionally recites the following limitations:
extract bits of the opcode from either a single byte of the instruction or from multiple bytes of the instruction (Ansari: Figures 7 and 24 elements 711, 2403, and 2411, column 13 lines 10-19, column 17 lines 30-35, and column 39 lines 13-24)(An opcode of the stored UDI instruction is extracted to be compared with opcodes in the configuration instruction registers. The opcode is multiple bytes.).
As per claim 8:
Ansari and Eaton disclosed the hardware accelerator of claim 7, wherein the instruction schema program specifies locations and sizes of a plurality of sets of bits of the opcode in a plurality of bytes of the instruction (Ansari: Figures 2, 5, 7C, and 24 elements 222-223, 503, 720, 2402, and 2410, column 7 lines 8-22, column 13 lines 12-
wherein the hardware instruction decoder is programmed based on the instruction schema program to extract the plurality of sets of bits of the opcode from the plurality of bytes of the instruction and to combine the plurality of sets of bits to extract the opcode (Ansari: Figures 7 and 24 elements 711, 2403, and 2411, column 13 lines 10-19, column 17 lines 30-35, and column 39 lines 13-24)(An opcode of the stored UDI instruction is extracted to be compared with opcodes in the configuration instruction registers. The opcode is multiple bytes.).
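As a hedged illustration of the claim 8 language (extracting several sets of opcode bits from different bytes and combining them), the sketch below assembles an opcode from pieces described by byte index, bit offset, and width. The piece layout in OPCODE_PIECES and the function name are hypothetical and do not come from Ansari or the application.

```python
from typing import List, Tuple

# Hypothetical schema: each piece is (byte_index, bit_offset_within_byte, width),
# listed from the most significant piece of the opcode to the least significant.
OPCODE_PIECES: List[Tuple[int, int, int]] = [
    (0, 0, 4),  # low 4 bits of byte 0
    (2, 4, 2),  # bits 4-5 of byte 2
]


def extract_opcode(instruction: bytes, pieces: List[Tuple[int, int, int]]) -> int:
    """Extract each set of opcode bits and concatenate them into one opcode."""
    opcode = 0
    for byte_index, bit_offset, width in pieces:
        piece = (instruction[byte_index] >> bit_offset) & ((1 << width) - 1)
        opcode = (opcode << width) | piece  # append this piece below the earlier ones
    return opcode


if __name__ == "__main__":
    word = bytes([0xA9, 0x00, 0x30])
    print(extract_opcode(word, OPCODE_PIECES))  # 39
```

With a single piece confined to one byte, the same function covers the single-byte case recited in claim 7.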
As per claim 9:
The additional limitation(s) of claim 9 basically recite the additional limitation(s) of claim 4. Therefore, claim 9 is rejected for the same reason(s) as claim 4.
As per claim 11:
The additional limitation(s) of claim 11 basically recite the additional limitation(s) of claim 5. Therefore, claim 11 is rejected for the same reason(s) as claim 5.
As per claim 16:
Claim 16 essentially recites the same limitations of claim 1. Therefore, claim 16 is rejected for the same reasons as claim 1.
As per claim 17:
The additional limitation(s) of claim 17 basically recite the additional limitation(s) of 
As per claim 18:
The additional limitation(s) of claim 18 basically recite the additional limitation(s) of claim 8. Therefore, claim 18 is rejected for the same reason(s) as claim 8.
As per claim 19:
The additional limitation(s) of claim 19 basically recite the additional limitation(s) of claim 9. Therefore, claim 19 is rejected for the same reason(s) as claim 9.

Claims 10, 12, 20-21, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Ansari et al. (U.S. 7,590,823), in view of Eaton (U.S. 4,187,539), further in view of Official Notice.
As per claim 10:
Ansari and Eaton disclosed the hardware accelerator of claim 9, wherein the instruction schema mapping table is programmable during the execution of an instruction file comprising a first instruction and a second instruction by the hardware execution engine (Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 continued to column 15 lines 1-3, column 15 lines 28-31, column 17 lines 30-35, column 38 lines 18-23, column 38 lines 42-55, and column 39 lines 40-50)(Eaton disclosed a decoder being implemented as a CAM. The combination implements the programmable instruction decoder within the FCM of Ansari as a programmable CAM. Official notice is given that programmable logic can be reprogrammed during execution for the advantage of implementing other types of 
wherein the instruction schema mapping table is programmed based on a first instruction schema program to provide a first instruction schema for the first instruction to the hardware execution engine (Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 continued to column 15 lines 1-3, column 15 lines 28-31, column 17 lines 30-35, column 38 lines 18-23, column 38 lines 42-55, and column 39 lines 40-50)(Eaton disclosed a decoder being implemented as a CAM. The combination implements the programmable instruction decoder within the FCM of Ansari as a programmable CAM. An initial programming of the decoder reads upon the first instruction schema program.); and 
wherein the instruction schema mapping table is programmed based on a second instruction schema program to provide a second instruction schema for the second instruction to the hardware execution engine (Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 continued to column 15 lines 1-3, column 15 lines 28-31, column 17 lines 30-35, column 38 lines 18-23, column 38 lines 42-55, and column 39 lines 40-50)(Eaton disclosed a decoder being implemented as a CAM. The combination implements the programmable instruction decoder within the FCM of Ansari as a programmable CAM. In 
As per claim 12:
Ansari and Eaton disclosed the hardware accelerator of claim 6, wherein the operand is a first operand (Ansari: Figure 7A element 701, column 13 lines 27-46)(A load/store UDI includes at least operands pointing to a base address register and/or offset.);
wherein the hardware accelerator further comprises a memory access circuit (Ansari: Figures 3 and 7A element 308, column 11 lines 61-66)(The decoder decodes load/store UDIs. The logic to process load/store UDIs to access memory reads upon the memory access circuit.);
wherein the controller is configured to: 
obtain, from the instruction schema mapping table and based on the opcode, a first definition of the first operand and a second definition of a second operand, wherein the first definition and the second definition specify a location and a number of bits of, respectively, the first operand and the second operand in the instruction (Ansari: Figures 3 and 7A element 304, column 11 lines 63-65 and column 13 lines 8-46)(Decoding the load/store UDI identifies operands for the source and destination of data to be loaded/stored. Official notice is given that load and store instructions include multiple operands to indicate memory addresses to load from/store data to, as well as operands storing data to be written to memory or operands indicating where data is to be written to from memory. Thus, it would have been obvious to one of ordinary skill in the art that 
forward the instruction and the second definition to the memory access circuit to control the memory access circuit to extract the second operand from the instruction and to perform a memory access operation based on the second operand to support the execution of the instruction by the hardware execution engine (Ansari: Figures 3 and 7A element 308, column 11 lines 61-66)(The decoder decodes load/store UDIs. The logic to process load/store UDIs to access memory reads upon the memory access circuit that performs the load/store functionality.).
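The claim 12 arrangement, in which one instruction carries a first operand for the execution engine and a second operand for the memory access circuit, can be sketched as follows. This is a minimal sketch under stated assumptions: the opcode 0x10, the field layout, the contents of MEMORY, and the choice of a load followed by a multiply are all hypothetical and are not drawn from Ansari or the application.

```python
from typing import Dict, List, NamedTuple, Tuple


class OperandDef(NamedTuple):
    offset: int  # bit position of the operand's least significant bit
    width: int   # number of bits in the operand


def extract(instruction: int, definition: OperandDef) -> int:
    """Slice an operand out of the instruction using its definition."""
    return (instruction >> definition.offset) & ((1 << definition.width) - 1)


# Hypothetical table entry: one opcode maps to two operand definitions.
SCHEMA_TABLE: Dict[int, Tuple[OperandDef, OperandDef]] = {
    0x10: (OperandDef(offset=8, width=4),    # first operand: scale factor
           OperandDef(offset=12, width=8)),  # second operand: memory address
}

# Hypothetical memory contents: address a holds the value 100 + a.
MEMORY: List[int] = list(range(100, 356))


def memory_access(instruction: int, definition: OperandDef) -> int:
    """Memory access circuit: extract its own operand, then perform the load."""
    return MEMORY[extract(instruction, definition)]


def execution_engine(instruction: int, definition: OperandDef, loaded: int) -> int:
    """Execution engine: extract its own operand, then compute on the loaded data."""
    return extract(instruction, definition) * loaded


def controller(instruction: int) -> int:
    """Look up both definitions and forward each to the unit that uses it."""
    opcode = instruction & 0xFF  # assumption: opcode in the low 8 bits
    first_def, second_def = SCHEMA_TABLE[opcode]
    loaded = memory_access(instruction, second_def)
    return execution_engine(instruction, first_def, loaded)


if __name__ == "__main__":
    word = (7 << 12) | (3 << 8) | 0x10
    print(controller(word))  # 321
```

Each unit receives the whole instruction plus only the definition it needs, so neither unit has to know the layout of the other's operand.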
As per claim 20:
The additional limitation(s) of claim 20 basically recite the additional limitation(s) of claim 10. Therefore, claim 20 is rejected for the same reason(s) as claim 10.
As per claim 21:
The additional limitation(s) of claim 21 basically recite the additional limitation(s) of claim 12. Therefore, claim 21 is rejected for the same reason(s) as claim 12.
As per claim 23:
Ansari and Eaton disclosed the hardware accelerator of claim 21, wherein the second operand specifies a condition to be satisfied to perform a write operation with the memory access circuit (Ansari: Figures 3 and 7A element 304, column 11 lines 63-65 and column 13 lines 8-46)(Decoding the load/store UDI identifies operands for the source and destination of data to be loaded/stored. Official notice is given that .

Claims 13 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Ansari et al. (U.S. 7,590,823), in view of Eaton (U.S. 4,187,539), in view of Official Notice, further in view of Gu et al. (U.S. 2020/0184001).
As per claim 13:
Ansari and Eaton disclosed the hardware accelerator of claim 12.
Ansari and Eaton failed to teach further comprising an on-chip memory; wherein the hardware execution engine comprises a systolic array; wherein the opcode controls the systolic array to perform computations to generate intermediate outputs; and wherein the memory access operation fetches input data elements and weight elements from the on-chip memory to the systolic array to perform the computations.
However, Gu combined with Ansari and Eaton disclosed further comprising an on-chip memory (Gu: Figures 11-12 elements 202, 1105, and 1225, paragraphs 5, 58-59, and 61)(Ansari: Figure 2 element 230, column 7 lines 23-25)(Gu disclosed a systolic array accelerator with on-chip memory banks, input buffers, and weight buffers. The combination implements the tensor accelerator of Gu as the coprocessor of Ansari.);
wherein the hardware execution engine comprises a systolic array (Gu: Figure 12 element 1110, paragraph 61)(Ansari: Figure 2 element 232, column 7 lines 23-25)(Gu 
wherein the opcode controls the systolic array to perform computations to generate intermediate outputs (Gu: Figure 12 element 1110, paragraph 61)(Ansari: Figures 2 and 7B elements 232 and 710, column 7 lines 23-25 and column 13 lines 47-65)(Gu disclosed a systolic array accelerator with on-chip memory banks, input buffers, and weight buffers. The combination implements the tensor accelerator of Gu as the coprocessor of Ansari. The combination allows for systolic MAC array processing via the UDI operations of Ansari.); and 
wherein the memory access operation fetches input data elements and weight elements from the on-chip memory to the systolic array to perform the computations (Gu: Figure 12 element 1110, paragraph 61)(Ansari: Figures 2 and 7B elements 232 and 710, column 7 lines 23-25 and column 13 lines 47-65)(Gu disclosed a systolic array accelerator with on-chip memory banks, input buffers, and weight buffers. The combination implements the tensor accelerator of Gu as the coprocessor of Ansari. The combination allows for systolic MAC array processing via the UDI operations of Ansari. These UDI operations fetch data from the input and weight buffers for MAC computations.).
The advantage of the accelerator architecture of Gu is that systolic array and on-chip memories are useful for better performance in deep learning applications. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the accelerator architecture of Gu as the coprocessor of Ansari 
As per claim 15:
Ansari, Eaton, and Gu disclosed the hardware accelerator of claim 13, wherein the systolic array is programmable by a first set of instructions to perform the computations for a first neural network and programmable by a second set of instructions to perform the computations for a second neural network (Gu: Figure 12 element 1110, paragraphs 3-4 and 61)(Ansari: Figures 2 and 7B elements 232 and 710, column 7 lines 23-25 and column 13 lines 47-65)(Gu disclosed a systolic array accelerator with on-chip memory banks, input buffers, and weight buffers. The combination implements the tensor accelerator of Gu as the coprocessor of Ansari. The combination allows for systolic MAC array processing via the UDI operations of Ansari for multiple different applications.);
wherein the instruction schema mapping table is programmed based on a first instruction schema program to provide instruction schemas to the systolic array to control the systolic array to perform the computations for the first neural network (Gu: Figure 12 element 1110, paragraphs 3-4 and 61)(Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 continued to column 15 lines 1-3, column 15 lines 28-31, column 17 lines 30-35, column 38 lines 18-23, column 38 lines 42-55, and column 39 lines 40-50)(Eaton disclosed a decoder being implemented as a CAM. The combination implements the programmable instruction decoder within the FCM of Ansari as a programmable CAM. The combination implements the tensor accelerator of Gu as the coprocessor of Ansari. An initial programming of the decoder reads upon the first instruction schema program 
wherein the instruction schema mapping table is programmed based on a second instruction schema program to provide instruction schemas to the systolic array to control the systolic array to perform the computations for the second neural network (Gu: Figure 12 element 1110, paragraphs 3-4 and 61)(Eaton: Figure 4 element 16, column 6 lines 59-61)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 continued to column 15 lines 1-3, column 15 lines 28-31, column 17 lines 30-35, column 38 lines 18-23, column 38 lines 42-55, and column 39 lines 40-50)(Eaton disclosed a decoder being implemented as a CAM. The combination implements the programmable instruction decoder within the FCM of Ansari as a programmable CAM. The combination implements the tensor accelerator of Gu as the coprocessor of Ansari. Official notice is given that programmable logic can be reprogrammed during execution for the advantage of implementing other types of functions. Thus, it would have been obvious to one of ordinary skill in the art to reprogram the CAM and instruction registers to implement additional UDIs during execution. In view of the above official notice, a reprogramming of the decoder reads upon the second instruction schema program for processing a second neural network application.).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Ansari et al. (U.S. 7,590,823), in view of Eaton (U.S. 4,187,539), in view of Official Notice, in view of Gu et al. (U.S. 2020/0184001), further in view of Kloth (U.S. 2004/0156547).
As per claim 14:

wherein the controller is configured to: 
receive a second instruction (Ansari: Figure 2 element 220, column 7 lines 1-12)(The APU controller receives coprocessor UDI instructions containing opcodes.);
extract, using the instruction decoder, a second opcode from the second instruction (Ansari: Figures 7 and 24 elements 711, 2403, and 2411, column 13 lines 10-19, column 17 lines 30-35, and column 39 lines 13-24)(An opcode of the stored UDI instruction is extracted to be compared with opcodes in the configuration instruction registers.);
obtain, from the instruction schema mapping table and based on the second opcode, a third definition of a third operand and a fourth definition of a fourth operand, wherein the third definition and the fourth definition specify a location and a number of bits of, respectively, the third operand and the fourth operand in the instruction (Ansari: Figures 3 and 7A element 304, column 11 lines 63-65 and column 13 lines 8-46)(Decoding the load/store UDI identifies operands for the source and destination of data to be loaded/stored. In view of the above official notice, the load/store UDI format indicates the location and number of bits for these operands.).
Ansari, Eaton, and Gu failed to teach wherein the hardware accelerator further comprises a post-processing engine; wherein the controller is configured to: forward the 
However, Kloth combined with Ansari, Eaton, and Gu disclosed wherein the hardware accelerator further comprises a post-processing engine (Kloth: Figures 6-7 element 500, paragraph 21)(Gu: Figure 12 element 1110, paragraph 61)(Ansari: Figure 2 element 232, column 7 lines 23-25)(Kloth disclosed a post processing engine coupled to an image processing engine. The combination implements the tensor accelerator of Gu with the post processing engine of Kloth as the coprocessor of Ansari.);
wherein the controller is configured to: 
forward the second instruction and the third definition to the post-processing engine to enable the post-processing engine to extract the third operand from the second instruction and to perform a post-processing operation on the intermediate outputs of the systolic array based on the third operand to generate outputs (Kloth: Figure 6 elements 504 and 510, paragraphs 35-36)(Ansari: Figures 2, 7C, and 24 elements 231, 233, 723-724, 730, and 2412, column 11 lines 66-67 continued to column 12 lines 1-3, column 14 lines 47-50, column 14 lines 65-67 continued to column 15 lines 1-3, column 15 lines 28-31, column 17 lines 30-35, column 38 lines 18-23, column 38 lines 42-55, and column 39 lines 40-50)(The combination implements the tensor accelerator of Gu with the 
forward the second instruction and the fourth definition to the memory access circuit to store the outputs at the on-chip memory (Kloth: Figure 6 elements 504 and 510, paragraph 37)(Gu: Figure 12 element 1110, paragraph 61)(Ansari: Figure 2 element 232, column 7 lines 23-25)(The combination implements the tensor accelerator of Gu with the post processing engine of Kloth as the coprocessor of Ansari. Vector engine outputs from the post processing engine are stored in the DRAM.).
The advantage of post processing engines is that pixel processing can be further refined. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement a post processing engine in Ansari to allow for further refinements of data processing from the systolic array of Gu.

Claims 22 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Ansari et al. (U.S. 7,590,823), in view of Eaton (U.S. 4,187,539), in view of Official Notice, further in view of Garegrat et al. (U.S. 2019/0391811).
As per claim 22:
Ansari and Eaton disclosed the hardware accelerator of claim 21.
Ansari and Eaton failed to teach wherein the second operand specifies parameters for a memory access operation of 4D data.

The advantage of matrix processing is that large amounts of data can be processed in parallel with smaller program sizes. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the matrix processing of Garegrat into the accelerator of Ansari as UDI instructions for the advantage of increased performance from smaller programs.
As per claim 24:
Ansari and Eaton disclosed the hardware accelerator of claim 10.
Ansari and Eaton failed to teach wherein the first instruction schema program defines a third instruction for a memory access operation of 3D data; and wherein the second instruction schema program defines a fourth instruction for a memory access operation of 4D data.
However, Garegrat combined with Ansari and Eaton disclosed wherein the first instruction schema program defines a third instruction for a memory access operation of 3D data (Garegrat: Figures 1-2 elements 110 and 115, paragraph 54)(Ansari: Figures 3 and 7A element 304, column 11 lines 63-65 and column 13 lines 8-46)(Garegrat disclosed a matrix processor with matrix operations using 3D operands. The combination allows for the UDI instructions to be matrix load/store operations using 3D operands.); 
wherein the second instruction schema program defines a fourth instruction for a memory access operation of 4D data (Garegrat: Figures 1-2 elements 110 and 115, paragraph 54)(Ansari: Figures 3 and 7A element 304, column 11 lines 63-65 and column 13 lines 8-46)(Garegrat disclosed a matrix processor with matrix operations using 4D operands. The combination allows for the UDI instructions to be matrix load/store operations using 4D operands.).
The advantage of matrix processing is that large amounts of data can be processed in parallel with smaller program sizes. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the matrix processing of Garegrat into the accelerator of Ansari as UDI instructions for the advantage of increased performance from smaller programs.

Response to Arguments
The arguments presented by Applicant in the response received on 3/19/2021 are considered partially persuasive.
Applicant argues for claims 1, 6, and 16:
“Applicant respectfully submits that the cited references fail to disclose, teach, or suggest all the features of amended independent claim 1. For example, Ansari and Eaton fail to disclose “obtain[ing], from the instruction schema mapping table and based on the opcode, an instruction schema of the instruction, wherein the instruction schema specifies a location and a number of bits of an operand in the instruction; and forward[ing] the instruction and the instruction schema to the hardware execution engine to enable the hardware execution engine to extract the operand from the instruction based on the instruction schema, and to execute the instruction based on the operand” (emphases added). During the examiner interview, Examiner Petranek agreed that these features do not appear to be disclosed in the cited references. Applicant thanks Examiner Petranek for the agreement.”  

This argument is not persuasive for the following reason. On closer examination of the specific claim language, the combination still reads upon it, because the opcode maps the instruction to an encoding format. The encoding format indicates the bit locations and number of bits for an operand of the instruction. In this instance, the UDI opcode maps to the encoding format in figure 7C, which shows bits 28-30 being used for a first operand. Thus, the combination reads upon the newly claimed limitation.
The examiner notes an amendment that would likely overcome the cited references in the above rejections. In figure 5C of the drawings of the application, an instruction schema mapping table is shown that specifies bit offsets and bit lengths for multiple operands. The combination above is able to read upon the currently claimed limitations via the encoding format that shows the bit locations and number of bits used for each operand of the instruction. Figures 7A-B of the drawings show address operands, step operands, and a number of elements for each dimension of the multi-dimensional data. Adding features such as these to the independent claims would likely overcome the currently cited references.
Applicant argues for claim 12:
“In addition, Applicant also disagrees with the Office that Ansari in view of Eaton teaches or suggests the subject matter of claim 12. As discussed during the interview, while Ansari discloses store and load instructions for load module 308, Ansari does not disclose an instruction that has both a first operand for execution units 232 (the alleged “hardware execution engine”) and a second operand for load module 308 (the alleged “memory access circuit”). For at least these reasons, Ansari in view of Eaton does not disclose the subject matter of claim 12.” 
 


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183