DETAILED ACTION
Status of Claims 
Claims 1-15 have been considered. It is hereby acknowledged that the following papers have been received and placed of record in the file:
Abstract - Receipt Date 09/01/2021
Application Data Sheet - Receipt Date 09/01/2021
Claims - Receipt Date 09/01/2021
Drawings-only black and white line drawings - Receipt Date 09/01/2021
Information Disclosure Statement (IDS) - Receipt Date 09/01/2021
Specification - Receipt Date 09/01/2021

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/01/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 


Claim Objections
Claims 4-6 and 14 are objected to because of the following informalities:  
Claim 4, line 2: “the matrix data are” should be changed to “the matrix data is”. Similar corrections should be made in claims 5-6.
Claim 14: delete the comma at the beginning of line 2.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claims 1-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
	Claim 1 recites “store the instruction of the second type as a register file in the memory”; however, it is unclear how an instruction can be stored “as a register file in the memory”, since an instruction is code that causes a processor to perform a specific task (see Wikipedia “Machine code”) while a register file is an array of processor registers (see Wikipedia “Register file”). Since code and registers are two distinct things in a processor, it does not make sense for one to be stored as the other (i.e. for code to be stored as registers). Further, since the term “register file” is being used inconsistently with its ordinary meaning, as evidenced by the fact that it does not make sense to store an instruction “as a register file”, the claim is indefinite as per MPEP 2173.05(a)(III). For purposes of examination this limitation will be interpreted as “store the instruction of the second type in a buffer”.
	Claim 12 recites a similar limitation and is rejected for similar reasons. The dependent claims are further rejected based on their dependence from a rejected base claim. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 10-13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Agarwal (US 5,758,176) in view of Lu (US 2010/0180100).
	Regarding claim 1, Agarwal teaches:
1. A general-purpose computing accelerator (Fig. 2, 140, which is shown in detail in Figs. 3-5, see col 4 lines 18-34) comprising: 
a memory (Fig. 4, 206 and Fig. 5, 236) including an instruction cache (col 5 lines 44-46: L2 cache stores data and instructions, the portion of L2 that stores instructions is an instruction cache; col 7 lines 20-27: memory 206 includes the L2 cache); 
a first executing unit configured to perform a first computation operation (Fig. 4 202 and 204 together form a first executing unit where one of the operations performed by 202 or 204 is a first computation operation); 
a second executing unit configured to perform a second computation operation (the plurality of matrix processing elements 182, see col 5 lines 20-22, and the load data unit in the control unit, see col 6 lines 22-23, form a second executing unit where one of the operations performed by the load data unit or the processing elements 182 is a second computation operation); 
a state control unit (col 6 lines 20-23: the instruction control unit in the control unit 180 is a state control unit) configured to control a path of the instruction depending on an operation state of the second executing unit (col 7 lines 4-7: the command dispatch unit of the instruction control unit dispatches an instruction as commands to the processing elements, i.e. controls a path of the instruction, as dispatch conditions are met; col 8 lines 60-66: the control unit dispatches based on the processing elements' availability, i.e. based on an operating state), 
providing the instruction to the first executing unit when the instruction is of a first type and provides the instruction to the state control unit when the instruction is of a second type (col 6 lines 4-6: instructions are provided to an execution unit to perform the type of operation represented by the instruction), and 
wherein, depending on the operation state of the second executing unit, the state control unit provides the instruction of the second type to the second executing unit or stores the instruction of the second type as a register file in the memory (col 7 lines 4-7 and col 8 lines 60-66: the command dispatch unit of the instruction control unit dispatches/provides the instruction as commands to the processing elements based on the processing elements availability/operating state, the storing limitation is not required in the BRI of this limitation).
Agarwal does not explicitly teach:
an instruction fetching unit configured to fetch an instruction stored in the instruction cache; 
a decoding unit configured to decode the instruction; 
However, Lu teaches:
an instruction fetching unit configured to fetch an instruction ([0044]: fetch unit 115 fetches instructions, see also Fig. 1 showing 115 fetching instructions from instruction memory 110); 
a decoding unit configured to decode the instruction ([0044]: decode unit 125 decodes instructions, see also Fig. 1 125); 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the processor of Agarwal to include a fetch unit and a decode unit as taught by Lu. In this combination, the processor of Agarwal will include a decoding unit that decodes fetched instructions and sends the instructions to the appropriate execution unit after decoding. One of ordinary skill in the art would have been motivated to make this modification because fetching and decoding instructions are well known functions of a general five stage pipeline, and having dedicated units for performing the fetching and decoding would increase performance by enabling pipelining of instructions, where an instruction can be decoded while another instruction is being fetched, see Lu [0013].
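
For illustration only, the instruction path recited in claim 1, as interpreted above for purposes of examination (an instruction of the second type is either provided to the second executing unit or stored in a buffer), can be sketched as follows. This is a minimal sketch under stated assumptions: every name in it (FIRST_TYPE, SECOND_TYPE, SecondUnit, StateControl, route) is hypothetical and is not taken from the claims, Agarwal, or Lu.

```python
from collections import deque

# All names below are hypothetical and used only for illustration.
FIRST_TYPE = "first"    # e.g. an arithmetic/logic or floating point instruction
SECOND_TYPE = "second"  # e.g. a matrix instruction


class SecondUnit:
    """Stand-in for the second executing unit with a simple ready flag."""

    def __init__(self, ready):
        self.ready = ready
        self.executed = []

    def execute(self, instr):
        self.executed.append(instr)


class StateControl:
    """Stand-in for the state control unit: dispatch or hold the instruction."""

    def __init__(self, unit):
        self.unit = unit
        self.buffer = deque()  # the "buffer" of the examination interpretation

    def accept(self, instr):
        if self.unit.ready:
            self.unit.execute(instr)
            return "second_unit"
        self.buffer.append(instr)
        return "buffered"


def route(instr_type, instr, first_executed, state_control):
    """Route a decoded instruction according to its type."""
    if instr_type == FIRST_TYPE:
        first_executed.append(instr)  # stands in for the first executing unit
        return "first_unit"
    return state_control.accept(instr)
```

In this sketch, an instruction of the second type that arrives while the ready flag is false is held in the buffer, and a later instruction of the second type is dispatched once the flag is true.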

Regarding claim 2, Agarwal in view of Lu teaches:
 2. The general-purpose computing accelerator of claim 1, wherein the first executing unit includes: 
at least one arithmetic logic unit configured to perform an arithmetic operation or a logic operation (Agarwal col 6 lines 9-15: 204 includes ALUs to perform arithmetic/logic operations); and 
at least one floating point calculating unit configured to perform a floating point operation (Agarwal col 6 lines 9-15: 202 is a floating point calculating unit that performs floating point operations).

	Regarding claim 3, Agarwal in view of Lu teaches: 
3. The general-purpose computing accelerator of claim 1, wherein the second executing unit includes: 
a multi-precision arithmetic logic unit (Agarwal col 7 lines 58-61: the arithmetic logic units 238 across all the processing elements form a multi-precision arithmetic logic unit) including a plurality of floating point operators configured to perform a matrix operation (Agarwal col 2 lines 59-61 and col 7 lines 58-61: the floating point execution unit of each ALU 238 are a plurality of floating point operators; Agarwal col 7 lines 4-6: the processing elements perform matrix processing commands); and 
an extended cache direct memory access unit configured to move matrix data from an external memory to an extended cache of the memory (Agarwal col 7 lines 11-14: the load data unit, i.e. an extended cache direct memory access unit, loads data from memory into the processing elements; Agarwal col 5 lines 44-46: the portion of L2 that stores data is an extended cache of the memory; Agarwal col 7 lines 20-30: if requested data, i.e. by the load data unit of the SIMD unit 156, is not in L2, the data is retrieved/moved into L2 from a memory external to L2, i.e. an external memory, so that it can be retrieved by the control unit, see also connection from memory data bus to L2 to control unit 180 in Fig. 3 indicating that data is moved from memory into L2; Agarwal col 5 lines 20-22: since 182 is a plurality of matrix processing elements, the data being loaded by the load data unit into 182 is matrix data).

	Regarding claim 4, Agarwal in view of Lu teaches:
	4. The general-purpose computing accelerator of claim 3, wherein, when the matrix data are stored in the memory by the extended cache direct memory access unit and an operation of the multi-precision arithmetic logic unit is possible, the state control unit provides the instruction of the second type to the multi-precision arithmetic logic unit (Agarwal col 8 lines 60-66: the instruction is dispatched to the ALUs of the processing elements when the processing elements are available, i.e. when an operation of the ALUs is possible, and when the data for the instruction is stored in L2 data cache, see also col 7 lines 20-27), and 
wherein the multi-precision arithmetic logic unit performs a matrix operation on the matrix data in response to the instruction of the second type (Agarwal col 5 lines 20-22 and col 7 lines 4-6: the matrix processing elements perform the matrix processing element commands, i.e. matrix operations, using the floating-point execution unit in each processing element, see col 7 lines 58-61 and col 2 lines 59-61).

	Regarding claim 10, Agarwal in view of Lu teaches:
10. The general-purpose computing accelerator of claim 1, wherein a result of the first computation operation of the first executing unit is stored in the memory (Agarwal col 6 lines 9-17: the bidirectional data bus connections 210/214 indicate that results from 202 and 204 are stored in memory), and 
wherein a result of the second computation operation of the second executing unit is stored in the memory (Agarwal col 7 lines 55-57: the register file 236 stores results of operations performed by the processing elements).

	Regarding claim 11, Agarwal in view of Lu teaches:
11. The general-purpose computing accelerator of claim 1, wherein the general-purpose computing accelerator is implemented with a single core (Agarwal Fig. 2, 140 is a single core).

	Regarding claim 12, Agarwal teaches:
12. An operation method of a general-purpose computing accelerator (Fig. 2, 140, which is shown in detail in Figs. 3-5, see col 4 lines 18-34) which includes a first executing unit configured to perform a first computation operation (Fig. 4 202 and 204 together form a first executing unit where one of the operations performed by 202 or 204 is a first computation operation) and a second executing unit configured to perform a second computation operation (the plurality of matrix processing elements 182, see col 5 lines 20-22, and the load data unit in the control unit, see col 6 lines 22-23, form a second executing unit where one of the operations performed by the load data unit or the processing elements 182 is a second computation operation), the method comprising: 
fetching an instruction from a memory (Fig. 4, 206 and Fig. 5, 236) of the general-purpose computing accelerator cache (col 4 lines 49-56: instructions are fetched from memory); 
when the instruction is of a first type, executing the instruction of the first type through the first executing unit (col 6 lines 4-6: instructions are dispatched to an execution unit to perform the type of operation represented by the instruction); and 
when the instruction is of a second type, based on an operation state of the second executing unit, executing the instruction of the second type through the second executing unit or storing the instruction of the second type as a register file in the memory (col 7 lines 4-7: the command dispatch unit of the instruction control unit dispatches an instruction as commands to the processing elements as dispatch conditions are met; col 8 lines 60-66: the control unit dispatches based on the processing elements' availability, i.e. based on an operating state; col 6 lines 4-6: instructions of a second type will be executed by the processing elements of the SIMD execution unit, the storing limitation is not required in the BRI of this limitation), 
wherein the first computation operation includes an arithmetic logic operation (col 6 lines 9-15: 204 includes ALUs to perform arithmetic/logic operations) or a floating point operation (col 6 lines 9-15: 202 is a floating point calculating unit that performs floating point operations), and 
wherein the second computation operation includes a matrix operation (col 2 lines 59-61 and col 7 lines 58-61: the floating-point execution unit of each ALU 238 are a plurality of floating point operators; col 7 lines 4-6: the processing elements perform matrix processing commands).
	Agarwal does not explicitly teach fetching an instruction based on a program counter.
	However, Lu teaches fetching an instruction based on a program counter ([0013] and [0042]: a program counter holds the address of the instruction being fetched for execution, see also Fig. 1 showing the program counter PC being received by the fetch unit 115).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Agarwal to fetch instructions according to a program counter as taught by Lu. One of ordinary skill in the art would have been motivated to make this modification because fetching based on a program counter is a known technique, applied to the known device of a computer processor, for fetching instructions, and would yield the predictable result of enabling execution of instructions in program order.
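
For illustration only, fetching based on a program counter so that instructions are obtained in program order can be sketched as follows. This is a minimal sketch under stated assumptions: the function name fetch_in_program_order, the list used as instruction memory, and the unit address increment are hypothetical and are not taken from Agarwal or Lu; branches are not modeled.

```python
def fetch_in_program_order(instruction_memory, start_pc=0):
    """Fetch instructions one at a time at the address held in the program counter.

    instruction_memory is a hypothetical list indexed by address.
    """
    pc = start_pc
    fetched = []
    while pc < len(instruction_memory):
        fetched.append((pc, instruction_memory[pc]))  # fetch at the current address
        pc += 1  # advance to the next sequential address (no branches modeled)
    return fetched
```

Because the counter only advances sequentially in this sketch, the fetched sequence matches the order in which the instructions are laid out in the hypothetical instruction memory.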

	Regarding claim 13, Agarwal in view of Lu teaches:
13. The method of claim 12, wherein the executing of the instruction of the second type through the second executing unit or the storing of the instruction of the second type as the register file in the memory, based on the operation state of the second executing unit, when the instruction is of the second type includes: 
when matrix data corresponding to the instruction of the second type is not ready in the memory, moving the matrix data from an external memory to the memory and storing the instruction of the second type as the register file in the memory; 
when the matrix data corresponding to the instruction of the second type is ready in the memory and the second executing unit is incapable of operating, storing the instruction of the second type as the register file in the memory; and 
when the matrix data corresponding to the instruction of the second type is ready in the memory and the second executing unit is capable of operating, executing the instruction of the second type through the second executing unit (Agarwal col 8 lines 60-66: the instruction is dispatched to the ALUs of the processing elements when the processing elements are available, i.e. when an operation of the ALUs is possible, and when the data for the instruction is stored in L2 data cache, see also col 7 lines 20-27; the BRI of this method claim does not require the contingent limitations “when the matrix data…” since they are not required to be performed, see MPEP 2111.04(II)).
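
For illustration only, the three contingent conditions recited in claim 13 can be sketched as a single decision function. This is a minimal sketch under stated assumptions: the function name handle_second_type, its two boolean parameters, and the action labels are hypothetical and are not taken from the claims or the cited references.

```python
def handle_second_type(data_ready, unit_capable):
    """Return the actions taken for an instruction of the second type.

    data_ready:   matrix data for the instruction is ready in the memory
    unit_capable: the second executing unit is capable of operating
    """
    if not data_ready:
        # Data not ready: move it from external memory and hold the instruction.
        return ["move_matrix_data", "store_instruction"]
    if not unit_capable:
        # Data ready but the unit cannot operate: hold the instruction.
        return ["store_instruction"]
    # Data ready and the unit can operate: execute on the second executing unit.
    return ["execute_on_second_unit"]
```

Each call takes exactly one of the three branches, so a single pass through this sketch performs only one of the three recited alternatives.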

	Regarding claim 15, Agarwal in view of Lu teaches:
15. The method of claim 12, wherein a result of the first computation operation of the first executing unit and a result of the second computation operation of the second executing unit are stored in the memory (Agarwal col 6 lines 9-17: the bidirectional data bus connections 210/214 indicate that results from 202 and 204 are stored in memory; col 7 lines 55-57: the register file 236 stores results of operations performed by the processing elements).

Prior Art Considerations
	The known prior art, taken alone or in combination, was not found to teach, in combination with other limitations in the claims, storing/writing an instruction of a second type in a register file in memory under the specific conditions required in claims 5, 6, or 14. The known prior art was also not found to teach claims 7-9 since these claims depend from claim 6.
	Examiner notes that while no prior art rejection has been given for claims 5-9 or 14, these claims are rejected under 112(b) and it is likely that amendments to overcome the 112(b) would affect the prior art considerations. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 7,293,160 teaches a deferred instruction queue that holds instructions when there is an unresolved data dependency (Fig. 2)
US 2007/0083742 teaches issuing instructions to execution units 140-160 or an issue queue that then issues instructions to a VMX ALU 182 or FPU ALU 186 (Fig. 1)
US 2021/0089317 teaches scheduling instructions to a plurality of executing circuitries based on instruction type and number of instructions that have been allocated to the execution circuitries (Abstract)
US 8,020,169 teaches saving data from a register file to a context cache and transmitting data of a new thread from the context cache to the register file (Abstract)
US 8,055,886 teaches an instruction register that stores an instruction being sent to a decoder (Fig. 7 3626)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571)270-1476. The examiner can normally be reached Monday - Friday, 9am to 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at (571) 270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KASIM ALLI/Examiner, Art Unit 2183      

/William B Partridge/Primary Examiner, Art Unit 2183