Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Status of Claims
This action is in reply to the application filed on August 17, 2021.
Claims 15-34 are currently pending.
The instant application claims priority to application 15/494,887 (now U.S. Patent 11,238,338) filed on April 24, 2017.


Information Disclosure Statement
The information disclosure statements (IDS) submitted on December 1, 2021 and March 1, 2022 have been considered by the examiner.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 15-34 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-8, 10-14, 16-20, 23, and 24 of U.S. Patent No. 11,238,338. Although the claims at issue are not identical, they are not patentably distinct from each other because the independent claims of the ’338 patent are not patentably distinct from the claims of the instant application.
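Claim comparison between Instant Application 17/404,153 and U.S. Patent 11,238,338 (each instant application claim is followed by the corresponding patent claim):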
Instant Application 17/404,153, Claim 15:
An apparatus comprising: 

a general purpose graphics processing unit (GPGPU) comprising a plurality of streaming multiprocessors (SMs), the GPGPU to: 

receive a plurality of data inputs for training a neural network executed by the plurality of SMs, wherein the data inputs comprise training data and weights inputs; 

perform measurements of compute power and latency of the plurality of SMs; 

determine a ratio of the compute power for the plurality of SMs; and 

assign, in accordance with the ratio of the compute power and in accordance with the latency, the training data in a low precision form and the weight inputs in a high precision form among the plurality of SMs, wherein the low precision form is lower than the high precision form.
Patent 11,238,338, Claim 1:
An apparatus comprising: 

an interconnect fabric; and 
a general purpose graphics processing unit (GPGPU) comprising a plurality of streaming multiprocessors (SMs) communicatively coupled to the interconnect fabric and comprising a plurality of execution resources and a local cache memory, the GPGPU to: 

receive a plurality of data inputs for training a neural network executed by the plurality of SMs, wherein the data inputs comprise training data and weights inputs; 

represent the training data in a low precision form in a memory of at least one of the plurality of SMs; 

represent the weight inputs in a high precision form in the memory of at least one of the plurality of SMs, wherein the low precision form is lower relative to the high precision form; 

perform measurements of compute power and latency on the plurality of SMs of the GPGPU; 

determine a ratio of the compute power for the plurality of SMs; and 

assign the training data in the lower precision form and the weight inputs in the high precision form among the plurality of SMs in accordance with the ratio of the compute power and in accordance with the latency of the plurality of SMs.
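Instant Application claim 16 / Patent claim 2
Instant Application claim 17 / Patent claim 3
Instant Application claim 18 / Patent claim 4
Instant Application claim 19 / Patent claim 5
Instant Application claim 20 / Patent claim 6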
Instant Application 17/404,153, Claim 21:
A system, comprising: 

an interconnect fabric; 

a general purpose graphics processing unit (GPGPU) comprising a plurality of streaming multiprocessors (SMs) communicatively coupled to the interconnect fabric, the comprising a plurality of execution resources and a local cache memory, the GPGPU to: 

receive a plurality of data inputs for training a neural network executed by the plurality of SMs, wherein the data inputs comprise training data and weights inputs; 

perform measurements of compute power and latency of the plurality of SMs; 

determine a ratio of the compute power for the plurality of SMs; and 

assign, in accordance with the ratio of the compute power and in accordance with the latency, the training data in a low precision form and the weight inputs in a high precision form among the plurality of SMs, wherein the low precision form is lower than the high precision form.
Patent 11,238,338, Claim 7:
An electronic device, comprising: 

an interconnect fabric; 

a general purpose graphics processing unit (GPGPU) comprising a plurality of streaming multiprocessors (SMs) communicatively coupled to the interconnect fabric and comprising a plurality of execution resources and a local cache memory, the GPGPU to: 

receive a plurality of data inputs for training a neural network executed by the plurality of SMs, wherein the data inputs comprise training data and weights inputs; 

represent the training data in a low precision form in a memory of at least one of the plurality of SMs; 

represent the weight inputs in a high precision form in the memory of at least one of the plurality of SMs, wherein the low precision form is lower relative to the high precision form; 

perform measurements of compute power and latency on the plurality of SMs of the GPGPU; 

determine a ratio of the compute power for the plurality of SMs; and 

assign the training data in the lower precision form and the weight inputs in the high precision form among the plurality of SMs in accordance with the ratio of the compute power and in accordance with the latency of the plurality of SMs; and 

a display device communicably coupled to the GPGPU.
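Instant Application claim 22 / Patent claim 8
Instant Application claim 23 / Patent claim 10
Instant Application claim 24 / Patent claim 11
Instant Application claim 25 / Patent claim 12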
Instant Application 17/404,153, Claim 26:
A method comprising: 

receiving, in a general purpose graphics processing unit (GPGPU) comprising a plurality of streaming multiprocessors (SMs), a plurality of data inputs for training a neural network executed by the plurality of SMs, wherein the data inputs comprise training data and weights inputs; 

performing measurements of compute power and latency of the plurality of SMs; 

determining a ratio of the compute power for the plurality of SMs; and 

assigning, in accordance with the ratio of the compute power and in accordance with the latency, the training data in a low precision form and the weight inputs in a high precision form among the plurality of SMs, wherein the low precision form is lower than the high precision form.
Patent 11,238,338, Claim 13:
A method comprising: 

receiving, in a general purpose graphics processing unit (GPGPU) comprising a plurality of streaming multiprocessors (SMs) communicatively coupled to an interconnect fabric and comprising a plurality of execution resources and a local cache memory, a plurality of data inputs for training a neural network executed by the plurality of SMs, wherein the data inputs comprise training data and weights inputs; 

representing the training data in a low precision form in a memory of at least one of the plurality of SMs; 

representing the weight inputs in a high precision form in the memory of at least one of the plurality of SMs, wherein the low precision form is lower relative to the high precision form; 

performing measurements of compute power and latency on the plurality of SMs of the GPGPU; 

determining a ratio of the compute power for the plurality of SMs; and 

assigning the training data in the lower precision form and the weight inputs in the high precision form among the plurality of SMs in accordance with the ratio of the compute power and in accordance with the latency of the plurality of SMs.
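Instant Application claim 27 / Patent claim 14
Instant Application claim 28 / Patent claim 16
Instant Application claim 29 / Patent claim 17
Instant Application claim 30 / Patent claim 18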
Instant Application 17/404,153, Claim 31:
A non-transitory machine-readable storage medium having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: 

receiving, in a general purpose graphics processing unit (GPGPU) comprising a plurality of streaming multiprocessors (SMs), a plurality of data inputs for training a neural network executed by the plurality of SMs, wherein the data inputs comprise training data and weights inputs; 

performing measurements of compute power and latency of the plurality of SMs; 

determining a ratio of the compute power for the plurality of SMs; and 

assigning, in accordance with the ratio of the compute power and in accordance with the latency, the training data in a low precision form and the weight inputs in a high precision form among the plurality of SMs, wherein the low precision form is lower than the high precision form.
Patent 11,238,338, Claim 19:
A non-transitory machine-readable storage medium having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: 

receiving, in a general purpose graphics processing unit (GPGPU) comprising a plurality of streaming multiprocessors (SMs) communicatively coupled to an interconnect fabric and comprising a plurality of execution resources and a local cache memory, a plurality of data inputs for training a neural network executed by the plurality of SMs, wherein the data inputs comprise training data and weights inputs; 

representing the training data in a low precision form in a memory of at least one of the plurality of SMs; 

representing the weight inputs in a high precision form in the memory of at least one of the plurality of SMs, wherein the low precision form is lower relative to the high precision form; 

performing measurements of compute power and latency on the plurality of SMs of the GPGPU; 

determining a ratio of the compute power for the plurality of SMs; and 

assigning the training data in the lower precision form and the weight inputs in the high precision form among the plurality of SMs in accordance with the ratio of the compute power and in accordance with the latency of the plurality of SMs.
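Instant Application claim 32 / Patent claim 20
Instant Application claim 33 / Patent claim 23
Instant Application claim 34 / Patent claim 24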

Status of Prior Art
The prior art of record, Holt (“Finite Precision Error Analysis of Neural Network Hardware Implementation”), Seide (U.S. Patent 9,477,925), Hsaio (U.S. Patent 9,940,575), and Raina (“Large-scale Deep Unsupervised Learning using Graphics Processors”), does not teach the claimed features. Independent claims 15, 21, 26, and 31 recite:
perform measurements of compute power and latency of the plurality of SMs;
determine a ratio of the compute power for the plurality of SMs; and
assign, in accordance with the ratio of the compute power and in accordance with the latency, the training data in a low precision form and the weight inputs in a high precision form among the plurality of SMs, wherein the low precision form is lower than the high precision form.

The previously cited art does not teach measuring compute power and latency on the SMs to determine a ratio of the compute power and assigning training data in accordance with the ratio.
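For illustration only, and not as a construction of the claims, the measure, ratio, and assign limitations recited above can be sketched as follows. Everything in this sketch is an assumption rather than something taken from the record: the names SMMeasurement, compute_power_ratio, and assign_inputs are hypothetical, half precision and single precision stand in for the "low" and "high" precision forms, and latency is used only to decide which SMs receive leftover samples, since the claims do not state how latency enters the assignment.

```python
import struct
from dataclasses import dataclass


@dataclass
class SMMeasurement:
    sm_id: int
    compute_power: float  # assumed unit, e.g. measured GFLOP/s
    latency: float        # assumed unit, e.g. measured microseconds


def compute_power_ratio(measurements):
    """Return each SM's share of the total measured compute power."""
    total = sum(m.compute_power for m in measurements)
    return [m.compute_power / total for m in measurements]


def to_low_precision(values):
    """Pack values as 16-bit floats (assumed 'low precision form')."""
    return struct.pack(f"<{len(values)}e", *values)


def to_high_precision(values):
    """Pack values as 32-bit floats (assumed 'high precision form')."""
    return struct.pack(f"<{len(values)}f", *values)


def assign_inputs(training_data, weights, measurements):
    """Split training samples across SMs in proportion to compute power."""
    n = len(training_data)
    counts = [int(n * r) for r in compute_power_ratio(measurements)]
    # Assumption: samples left over after rounding down go to the
    # lowest-latency SMs first.
    by_latency = sorted(range(len(measurements)),
                        key=lambda i: measurements[i].latency)
    for i in by_latency[:max(0, n - sum(counts))]:
        counts[i] += 1
    assignment, start = {}, 0
    for m, count in zip(measurements, counts):
        samples = training_data[start:start + count]
        assignment[m.sm_id] = {
            "training_data": [to_low_precision(s) for s in samples],
            "weights": to_high_precision(weights),
        }
        start += count
    return assignment


# Hypothetical measurements for two SMs, ten samples, four weights.
measurements = [SMMeasurement(0, 300.0, 5.0), SMMeasurement(1, 100.0, 2.0)]
training_data = [[float(i), float(i + 1)] for i in range(10)]
weights = [0.5, -0.25, 1.0, 2.0]
assignment = assign_inputs(training_data, weights, measurements)
```

In this sketch the SM with three times the measured compute power receives roughly three times as many samples, and the single sample left over after rounding goes to the SM with the lower measured latency.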

Conclusion
Claims 15-34 are rejected.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL T PELLETT whose telephone number is (571)270-7156.  The examiner can normally be reached on Monday - Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at 571-272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL T PELLETT/
Primary Examiner, Art Unit 2121