DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments, see page 8 of Applicant’s reply, filed 11/10/2020, with respect to the claim objections have been fully considered and are persuasive. The objections to claims 1 and 8 have been withdrawn.
Applicant’s arguments, see page 8 of Applicant’s reply, filed 11/10/2020, with respect to the double patenting rejections of claims 1-20 have been fully considered and are persuasive. That is, Applicant, via filing of the terminal disclaimer, has overcome the double patenting rejections set forth in the previous Office action. Therefore, the double patenting rejections of claims 1-20 have been withdrawn.
Applicant’s arguments, see pages 8-9 of Applicant’s reply, filed 11/10/2020, with respect to the rejection(s) of claim(s) 1-5, 8-10, and 15-20 under 35 U.S.C. 103 as being unpatentable over AMD, “Graphics Core Next Architecture, Generation 3”, in view of Mimar, U.S. Patent 7,873,812, have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Boswell et al., U.S. Patent Application Publication No. 2018/0321938.

Claim Objections
Claim 20 is objected to because of the following informalities: In claim 20, the phrase “the 32-bit intermediate product” should be amended to recite “the intermediate product” to properly refer back to the intermediate product recited in claim 15. Appropriate correction is required.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Boswell et al., U.S. Patent Application Publication No. 2018/0321938 (hereinafter Boswell).

Regarding claims 1, 8, and 15, taking claim 1 as exemplary, Boswell teaches a graphics processing unit [The parallel processing unit (PPU) 200 is a graphics processing unit (GPU). Paragraph 29] to accelerate machine-learning operations, the graphics processing unit comprising: 
a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction across multiple threads of the multiprocessor [The PPU comprises general processing clusters (GPCs) that comprise multiple streaming multiprocessors 340 (SMs) (i.e. multiprocessor). Paragraphs 39-40; FIGS. 2 and 3A. Each SM has a SIMT architecture that executes a single instruction across multiple threads. Paragraph 45]; and 
a first compute unit included within the multiprocessor [Each SM 340 includes one or more processing cores (i.e. compute unit) 450. Paragraph 50], the at least one single instruction to cause the first compute unit to perform a multiply and accumulate operation [Each core (i.e. compute unit) includes an HMMA datapath to execute a half-precision matrix multiply and accumulate (HMMA) operation. Paragraphs 103 and 107], wherein to perform the multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product [The HMMA operation includes multiplying two half-precision (i.e. 16-bit) operands thereby computing an intermediate product and computing a sum based on that intermediate product. Paragraph 117-118; FIG. 11. The computed result is a single-precision floating point value (i.e. a 32-bit sum). Paragraph 122]; wherein to compute a 32-bit sum based on the intermediate product, the first compute unit is to: 
[The half-precision floating point values are multiplied to generate a partial product. Paragraph 118; FIG. 11. The partial product is the product of two 11-bit values and is a 22-bit value (i.e. both are less than 32 bits) (i.e. an intermediate precision). See paragraph 121]; 
compute a sum based on the intermediate product to generate an intermediate sum at a second intermediate precision [The partial products (i.e. intermediate product) are summed by the carry save adder (CSA) and the completion adder to generate a sum, which inherently has a precision (i.e. a second intermediate precision). Paragraphs 121-122; FIG. 11]; and 
compute the 32-bit sum via a conversion of the intermediate sum at the second intermediate precision to a 32-bit precision [The intermediate sum is normalized and rounded to convert the result to a single-precision floating-point value (i.e. 32-bit precision). Paragraph 122].
Further regarding claims 8 and 15, Boswell further teaches a memory communicatively coupled with the graphics processing unit [Memory 204. Paragraph 30] and decoding the single instruction on the graphics processing unit (GPU) [Paragraph 32], respectively.
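
For illustration only, the arithmetic relied upon in the mapping above (half-precision operands, a product of two 11-bit significands that fits in 22 bits, and a final rounding to single precision) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not Boswell's datapath: the helper names to_half, to_single, significand11, and mac_fp16_fp32 are hypothetical, and a Python double stands in for the wide intermediate sum rather than the carry save adder and completion adder of FIG. 11.

```python
import struct

def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_single(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def significand11(h: float) -> int:
    """Return the 11-bit significand (hidden bit included) of a half-precision value."""
    bits = struct.unpack("<H", struct.pack("<e", h))[0]
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    # Normal numbers carry an implicit leading 1; subnormals do not.
    return fraction | (0x400 if exponent != 0 else 0)

def mac_fp16_fp32(a, b, c):
    """Hypothetical helper: sum(a[i] * b[i]) + c with 16-bit inputs and a 32-bit result."""
    acc = c  # assumption: a double plays the role of the wide intermediate sum
    for x, y in zip(a, b):
        # The product of two half-precision values is exact in double precision.
        acc += to_half(x) * to_half(y)
    return to_single(acc)  # single final rounding to 32-bit precision
```

In this sketch, significand11(65504.0) ** 2 occupies 22 bits, and mac_fp16_fp32([1 + 2**-10], [1 + 2**-10], 0.0) returns 1 + 2**-9 + 2**-20, a value that is representable at single precision but would lose its last term if rounded to half precision.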

Regarding claims 2, 9, and 16, taking claim 2 as exemplary, Boswell teaches the graphics processing unit as in claim 1, the multiprocessor to execute parallel threads of a thread group, each thread of the thread group having independent thread state [The SM executes a group of threads (a warp) concurrently (i.e. in parallel) where each thread has an independent state. Boswell at paragraph 45].

Regarding claims 3, 10, and 17, taking claim 3 as exemplary, Boswell teaches the graphics processing unit as in claim 2, the multiprocessor including a scheduler to schedule the parallel threads of the thread group to multiple compute units within the multiprocessor [The SM includes scheduler unit 410 that schedules the warp threads to multiple cores. Boswell at paragraph 51].

Regarding claims 4, 11, and 18, taking claim 4 as exemplary, Boswell teaches the graphics processing unit as in claim 3, the multiple compute units within the multiprocessor including a second compute unit to perform an integer operation [Each core includes an integer arithmetic unit. Boswell at paragraph 54], the scheduler to schedule a floating-point operation to the first compute unit and an integer operation to the second compute unit wherein the multiprocessor is to concurrently execute a floating-point operation on the first compute unit and an integer operation on the second compute unit [Multiple instructions are dispatched each cycle, and therefore the SM executes a floating-point instruction on one core (i.e. first compute unit) concurrently with an integer instruction on another core (i.e. second compute unit). Boswell at paragraphs 51-52].

Regarding claims 5, 12, and 19, taking claim 5 as exemplary, Boswell teaches the graphics processing unit as in claim 4, wherein the multiprocessor is to concurrently execute a first floating-point operation at a first precision on the first compute unit and a second floating-point operation at a second precision [Each core includes a floating point arithmetic unit that operates on floating point values at different precisions. Boswell at paragraphs 54 and 83. Because multiple instructions are dispatched/executed concurrently on different cores, a first core (i.e. first compute unit) executes a floating point instruction at one precision and another core executes a different floating point instruction at a different precision. See Boswell at paragraphs 51-52].

Regarding claims 6 and 13, taking claim 6 as exemplary, Boswell teaches the graphics processing unit as in claim 1, the first compute unit additionally including one or more shifters to normalize or align an intermediate result [The HMMA datapath of the cores (i.e. compute unit) includes shift logic that aligns partial products (i.e. intermediate results). Boswell at paragraph 136; FIG. 13].
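
For illustration only, the general idea of aligning partial products with shifters before they are added can be sketched in Python as follows. This is a generic sketch under stated assumptions, not the shift logic of Boswell's FIG. 13: the function align_and_add, its guard_bits parameter, and the (significand, exponent) pair representation with non-negative significands are hypothetical.

```python
def align_and_add(terms, guard_bits=3):
    """Add values given as (significand, exponent) pairs, each meaning significand * 2**exponent.

    Every significand is widened by guard_bits, then shifted right so that all
    terms share the largest exponent before a plain integer addition.
    Returns (total, exponent) such that the sum is approximately total * 2**exponent.
    """
    max_exp = max(exp for _, exp in terms)
    total = 0
    for sig, exp in terms:
        widened = sig << guard_bits          # keep a few low-order bits below the common exponent
        total += widened >> (max_exp - exp)  # alignment shift; bits shifted out are discarded
    return total, max_exp - guard_bits
```

With the default guard bits, align_and_add([(3, 2), (1, 0)]) returns (26, -1), i.e. 26 * 2**-1 = 13, which equals 3 * 4 + 1. With guard_bits=0 the same call returns (3, 2), i.e. 12, showing how bits shifted out during alignment are lost when no extra width is kept.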

Regarding claims 7, 14, and 20, taking claim 7 as exemplary, Boswell teaches the graphics processing unit as in claim 6, the first compute unit additionally configurable to compute a 16-bit sum based on the intermediate product [The resulting sum is rounded/truncated to compute a half-precision (i.e. 16-bit) output sum. Boswell at paragraph 122].

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN P GEIB whose telephone number is (571)272-8628.  The examiner can normally be reached on Monday - Friday 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BENJAMIN P GEIB/Primary Examiner, Art Unit 2123