Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-3, 5-9, and 11-12 are currently pending in this application. Claims 1, 5-7, and 11-12 have been amended.

Response to Applicant’s Remarks
With respect to the 35 U.S.C. 102(a)(2) rejections:
	Claims 1-3, 5-9, and 11-12 were previously rejected under 35 U.S.C. 102(a)(2) as being anticipated by Sarel et al. (US 2018/0293777). However, Applicant filed a response on April 6, 2021 with a common ownership statement that the Sarel reference and the instant claimed invention are both commonly assigned to Intel Corporation (see statement in Remarks at 5-6). For this reason, the 102(a)(2) rejections have been withdrawn.
With respect to the 35 U.S.C. 103 rejections:
	Applicant’s claim amendments and remarks filed 04/06/2021 overcome the rejections, which have therefore been withdrawn. See Remarks at 6-8. In particular, the claim amendment recites the detailed subject matter of “wherein high precision floating point data is utilized for a first subset of the different layers, low precision floating point data is utilized for a second subset of the different layers, and integer data is utilized for a third subset of the different layers; and load the cast operations at the target precision level and the corresponding data type for each layer of the multi-layer DNN” that Du et al. in view of Judd et al. fails to teach.

Pertinent Art Cited
The following US Patent Applications and/or NPL references reveal the current state of the art:

Du et al. (US 2007/0296729) teaches an electronic device (a mobile device, fig.8), comprising: 

a graphics multiprocessor (graphics processor 812) communicably coupled to the display (fig. 8), the graphics multiprocessor (graphics processor 812 of fig.8; graphics processor 702 of fig.7) comprising: 
an instruction cache (cache memory 114/712, fig.1/7) to receive a stream of instructions (receive plurality of instruction threads 706, fig.7 and par.0064); 
an instruction unit communicatively coupled to the instruction cache (texture engine 126/710 coupled to instruction cache 114/712, fig. 1/7) to issue the stream of instructions (performs specific graphic operations such as texture mapping, fig. 7 and par.0065); 
a plurality of execution units communicably coupled to the instruction unit to execute the stream of instructions (multi-threaded processor 102/704 coupled to texture engine 126/710 of fig.1/6/7, multi-threaded processor of fig. 6 having a processor core that includes four ALUs 624, 626, 628, and 630, par.0060), 
a shared memory (unified register file 118/202/302/604 shared by plurality of threads, fig.1/2/3/6 par.0060-0062 and par.0069) communicatively coupled to the plurality of execution units (coupled to the ALUs 624, 626, 628, and 630, par.0062); and 
a processor communicably coupled to the instruction cache, the instruction unit, the plurality of execution units, and the shared memory (ALU processor cores 624-630 of multi-threaded processor 704 of figs. 6-7, par.0060) wherein the processor to: 
expose operations in at least one of a load instruction or a store instruction of the stream of instructions (fetch and store plurality of threaded operations/instructions into memory space of unified register file and associate instructions into instruction cache, par.0037 and par.0041); and 
load the operations (loads instructions indicating specific operations to be performed for each of plurality of threads where each operation may be an arithmetic operation, an elementary function, a memory access operation, par.0043).

Judd et al. (“Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets”, 2016, University of Toronto, p.1-12) teaches energy and performance improvements in memory (Judd: p.2, 2nd paragraph) comprising expose embedded cast operations (Judd: conversion of numerical representation values to the desired representation and then back to single precision floating point prior to processing in each layer, p.2-3, section 2.1) and determine, for each layer of a multi-layer deep learning neural network, a target precision level for the cast operations and data types of a plurality of different data types (Judd: assigning a different precision to each layer of the convolutional deep learning neural network, section 2, p.2; conversion of integer bits and fractional bits type data across layers, p.3; assigning a particular precision to each layer for integer data and fraction data, p.3), wherein the target precision level is determined from a plurality of different data types that are used to represent various weights in different layers of a multi-layer DNN (Judd: precision level at each of 5 layers of CNN for integer data and fraction data and the associated weights at each of 5 layers, fig.3); wherein the target precision level represents an optimal precision level (Judd: optimal precision level for accuracy as shown in figs.3-5 and between 1-10% as shown in Table 2); wherein the target precision level is determined to match a hardware capability (Judd: a reduced precision representation saves energy in memory and communication channels, improves performance in memory bound systems through better memory bandwidth utilization and effective cache capacity, and supports larger networks on systems with a fixed memory budget, p.2); wherein high precision floating point data is used for one or more lower layers of a neural network (Judd: high precision fraction bits for layer 1 as shown in right portions of figure 3f-i); and wherein lower precision floating point data and integers are used for one or more higher layers of the neural network (Judd: low precision fraction bits for layer 4 as shown in right portions of figure 3f-i). 
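
For illustration only, the per-layer reduced-precision conversion summarized above (a value converted to a representation with a given number of integer and fractional bits, then returned to floating point before processing) can be sketched as follows. This is a minimal sketch and not a reproduction of Judd's code: the function names, the signed fixed-point convention, and the bit widths in LAYER_FORMATS are assumptions chosen for the example and are not taken from the reference.

```python
from typing import List, Tuple

# Hypothetical per-layer formats: (integer bits incl. sign, fractional bits).
# These widths are illustrative assumptions, not values from Judd et al.
LAYER_FORMATS: List[Tuple[int, int]] = [(2, 8), (2, 6), (2, 4), (2, 2)]


def quantize_fixed_point(x: float, int_bits: int, frac_bits: int) -> float:
    """Round x onto a signed fixed-point grid and return it as a float."""
    scale = 2 ** frac_bits
    # Representable range of a signed format with the given bit widths.
    max_val = float(2 ** (int_bits - 1)) - 1.0 / scale
    min_val = -float(2 ** (int_bits - 1))
    # Python's round() rounds exact halves to the nearest even integer.
    q = round(x * scale) / scale
    # Saturate values that fall outside the representable range.
    return max(min_val, min(max_val, q))


def apply_layer_precision(values: List[float], layer: int) -> List[float]:
    """Convert one layer's values to that layer's assumed precision."""
    int_bits, frac_bits = LAYER_FORMATS[layer]
    return [quantize_fixed_point(v, int_bits, frac_bits) for v in values]
```

Under these assumptions, a later layer with fewer fractional bits maps the same input onto a coarser grid than an earlier layer, which is the per-layer precision variation the citation refers to.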

Sharangpani (US 6,108,772) teaches an apparatus and/or electronic device (figs. 1-2), comprising: a processor (processor 210, fig.2) having a plurality of execution units (arithmetic units 230, 235, 240, and 245 of processor 210 of fig.2 and 

However, the prior art cited above, individually or in combination, fails to teach at least “determine, for each layer of a multi-layer deep learning neural network (DNN), a target precision level for the cast operations at each layer and data types of a plurality of different data types for the cast operations at each layer, wherein the target precision level for the cast operations at each layer is determined from the plurality of different data types that are used to represent various weights in different layers of the multi-layer DNN deep learning neural network (DNN), and wherein high precision floating point data is utilized for a first subset of the different layers, low precision floating point data is utilized for a second subset of the different layers, and integer data is utilized for a third subset of the different layers; and load the cast operations at the target precision level and the corresponding data type for each layer of the multi-layer DNN.”
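
For illustration only, the quoted limitation (a data type selected per layer, with high precision floating point, low precision floating point, and integer data each used for a different subset of layers, and the cast applied as part of the load) can be sketched in software as follows. This is a hedged sketch, not the claimed hardware and not Applicant's implementation: the names DType, LAYER_DTYPES, cast_value, and load_with_cast, the three-layer assignment, and the int8 scale factor are all hypothetical.

```python
import struct
from enum import Enum
from typing import Dict, List, Union


class DType(Enum):
    FP32 = "fp32"  # high precision floating point
    FP16 = "fp16"  # low precision floating point
    INT8 = "int8"  # integer


# Hypothetical assignment of one data type to each layer subset.
LAYER_DTYPES: Dict[int, DType] = {0: DType.FP32, 1: DType.FP16, 2: DType.INT8}


def cast_value(x: float, dtype: DType, int8_scale: float = 127.0) -> Union[float, int]:
    """Cast a single value to the target data type."""
    if dtype is DType.FP32:
        # Round-trip through IEEE 754 binary32.
        return struct.unpack("<f", struct.pack("<f", x))[0]
    if dtype is DType.FP16:
        # Round-trip through IEEE 754 binary16; struct raises OverflowError
        # for magnitudes too large for half precision.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    # INT8: scale, round, and saturate to the signed 8-bit range.
    return max(-128, min(127, round(x * int8_scale)))


def load_with_cast(memory: List[float], layer: int) -> List[Union[float, int]]:
    """Model a load whose cast is selected by the layer's target data type."""
    dtype = LAYER_DTYPES[layer]
    return [cast_value(v, dtype) for v in memory]
```

In this sketch the cast is folded into the load routine rather than issued as a separate step, which loosely mirrors the "embedded cast operations in at least one of a load instruction or a store instruction" language; how a target precision level is actually determined for each layer is outside the scope of the example.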

Allowable Subject Matter
Claims 1-3, 5-9, and 11-12 are allowed.
The primary reason for the allowance of claim 1 is that the prior art of record, taken alone or in combination, fails to disclose or render obvious the subject matter of:
“A graphics multiprocessor comprising: an instruction cache to receive a stream of instructions; an instruction unit communicably coupled to the instruction cache to issue the stream of instructions; a plurality of execution units communicably coupled to the instruction unit to execute the stream of instructions; a shared memory communicatively coupled to the plurality of execution units; and a processor communicably coupled to the instruction cache, the instruction unit, the plurality of execution units, and the shared memory, wherein the processor to: expose embedded cast operations in at least one of a load instruction or a store instruction of the stream of instructions; determine, for each layer of a multi-layer deep learning neural network (DNN), a target precision level for the cast operations at each layer and data types of a plurality of different data types for the cast operations at each layer, wherein the target precision level for the cast operations at each layer is determined from the plurality of different data types that are used to represent various weights in different layers of the multi-layer DNN deep learning neural network (DNN), and wherein high precision floating point data is utilized for a first subset of the different layers, low precision floating point data is utilized for a second subset of the different layers, and integer data is utilized for a third subset of the different layers; and load the cast operations at the target precision level and the corresponding data type for each layer of the multi-layer DNN.”
The primary reason for the allowance of claim 7 is that the prior art of record, taken alone or in combination, fails to disclose or render obvious the subject matter of:
“An electronic device, comprising: a display; and a graphics multiprocessor communicably coupled to the display, the graphics multiprocessor comprising: an instruction cache to receive a stream of instructions; an instruction unit communicably coupled to the instruction cache to issue the stream of instructions; a plurality of execution units communicably coupled to the instruction unit to execute the stream of instructions; a shared memory communicatively coupled to the plurality of execution 
Claims 2-3, 5-6, 8-9 and 11-12 are allowed due to their dependency on claims 1 and 7.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HIEN (CINDY) D KHUU whose telephone number is (571)272-8585.  The examiner can normally be reached on Monday-Friday 8am-4:30pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HIEN D KHUU/
Primary Examiner, Art Unit 2116
April 10, 2021