Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	Claims 1-20 are presented for examination.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 9, 11-13, 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., US Patent Application Publication 2022/0058237 (hereinafter Wang) in view of Babich et al., US Patent Application Publication 2020/0050451 (hereinafter Babich).
	Regarding claim 1, Wang teaches:
An apparatus for vector computing incorporating with matrix multiply and accumulation (MMA) calculation, comprising: 
a general matrix multiply (GEMM) calculation unit, comprising an instruction queue and a first arithmetic logical unit (ALU) (see e.g. para. [0002], [0032], fig. 2A, 2B, an accelerator comprising an operation unit and an instruction buffer), wherein the first ALU coupled to local memory is arranged operably to perform MMA calculation according to a GEMM instruction stored in the instruction queue, and store a calculation result in memory (see e.g. para. [0039], [0043], [0045]).
Wang fails to explicitly describe a streaming multiprocessor (SM), comprising a general-purpose register (GPR).
Babich teaches a streaming multiprocessor used with a coprocessor accelerator for certain processes (see e.g. para. [0058]) as well as a general register file for storing operand and result data (see e.g. para. [0077], [0155], [0195], [0295]).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of Wang and Babich to include a streaming multiprocessor (SM) comprising a general-purpose register (GPR). The use of a streaming multiprocessor would have provided the advantage of increased parallelism and processing bandwidth by allowing for massively parallel groups of threads. The use of a general-purpose register would have improved the access speed of storage by using registers rather than a larger but slower memory.
Regarding claim 2, Wang in view of Babich teaches or suggests:
The apparatus of claim 1, wherein the SM comprises a second ALU, the second ALU coupled to the instruction queue is arranged operably to: when a fetched instruction is the GEMM instruction, obtain source data from the GPR, and push the GEMM instruction and the source data into the instruction queue (see e.g. Wang para. [0032], a host system/ALU can push commands into an instruction buffer; see e.g. Babich para. [0168]-[0172], instructions and data from registers are pushed to a coprocessor).
Regarding claim 9, Wang in view of Babich teaches or suggests:
The apparatus of claim 2, wherein the second ALU is arranged operably to: when the fetched instruction is not the GEMM instruction, use a pipeline in the second ALU to execute the fetched instruction (see e.g. Wang para. [0030], the host unit executes other instructions such as x86 instructions; see e.g. Babich para. [0159], [0201], [0249], [0296]).
Regarding claim 11, Wang in view of Babich teaches or suggests:
The apparatus of claim 1, wherein the SM and the GEMM calculation unit are arranged operably to perform different types of computation in parallel (see e.g. Wang para. [0018], [0025]; see e.g. Babich para. [0069], [0158], [0249]).
Claims 12-13, 18-19 are rejected for reasons corresponding to those given above for claims 1-2, 9, 11.


Claims 10, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Babich, further in view of In re Rinehart, 531 F.2d 1048, 189 USPQ 143 (CCPA 1976).
Regarding claim 10, Wang in view of Babich teaches or suggests:
The apparatus of claim 1.
Wang in view of Babich fails to explicitly teach wherein the GEMM calculation unit, coupled to sixteen SMs, is arranged operably to perform 16K MMA calculation in every clock cycle.
In re Rinehart describes that "mere scaling up of a prior art process capable of being scaled up, if such were the case, would not establish patentability in a claim to an old process so scaled." 531 F.2d at 1053, 189 USPQ at 148.
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of Wang, Babich, and In re Rinehart such that the GEMM calculation unit, coupled to sixteen SMs, is arranged operably to perform 16K MMA calculation in every clock cycle. This would have provided the clearly predictable result of performing the exact same functionality on a specific number of SMs. An increase in the number of calculations per second would be desirable to improve the speed of the system. A reduction in the number of calculations per second would allow for a reduction in power requirements.
Claim 20 is rejected for reasons corresponding to those given above for claim 10.

Allowable Subject Matter
Claims 3-8, 14-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN M LINDLOF whose telephone number is (571) 270-1024. The examiner can normally be reached M-F 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JOHN M LINDLOF/
Primary Examiner, Art Unit 2183