Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This non-final Office action is responsive to U.S. Patent Application No. 16/403,884, filed on May 6, 2019.
Claims 1-20 are pending.
Claims 1-12 and 18-20 are rejected.
Claims 13-17 are objected to.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on November 5, 2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement has been considered by the examiner.
Allowable Subject Matter
Claims 13-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-12 and 18-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Burger et al. (U.S. 2019/0340492).
Regarding claim 1, Burger disclosed a method performed by one or more data processing apparatus, the method comprising: 
training a first machine learning model having a machine learning model architecture using a first computing system having a first configuration (Burger, Fig. 9 and [0098-0100], “The quantized neural network can be retrained by adjusting one or more training parameters that were used to train the normal-precision neural network and then training the quantized neural network model with the adjusted training parameters.” Here, the quantized neural network uses a reduced-precision block floating-point format to represent the values in the neural network model; Burger also disclosed in [0051-0057] that a quantization emulator 140 or a hardware accelerator 180 may be used to train the quantized neural network);
training a second machine learning model having the machine learning model architecture using a second computing system having a second configuration (Burger, Fig. 9 and [0097], “the neural network model 910 may be generated anew by creating a neural network and proceeding through a training process”; note that the model 910 is a normal-precision neural network model, and Burger disclosed in [0028-0039] that a normal-precision neural network model is typically trained using a general-purpose CPU that natively supports a normal-precision floating-point number format; the normal-precision neural network model 910 therefore anticipates the second machine learning model in the claim),
wherein the second configuration of the second computing system is different than the first configuration of the first computing system (Burger’s disclosure in [0028-0030] and [0051-0057] made it clear that the quantized neural network model Wq is trained using a different configuration from the normal-precision neural network model W);
determining, for each of a plurality of shared training operations that are performed by both the first computing system and the second computing system, a respective similarity measure that measures a similarity between: a first training output generated by the first computing system by performing the shared training operation during the training of the first machine learning model, and a second training output generated by the second computing system by performing the shared training operation during the training of the second machine learning model (Burger, Fig. 9 and [0098], “Output tensors generated by the quantized neural network model are provided to a comparison unit 950 that compares the normal-precision output tensors to the quantized precision output tensors”); and 
providing the respective similarity measure determined for each of the plurality of shared training operations for use in comparing: (i) the training of the first machine learning model performed by the first computing system having the first configuration, and (ii) the training of the second machine learning model performed by the second computing system having the second configuration (Burger, [0099], “Based on the results of the comparison, the normal-precision neural network model 910 and/or the quantized precision neural network model 940 may be retrained, have its hyperparameters adjusted, and/or have quantization parameters of the quantized precision neural network adjusted (960 and 960′).”).  
Regarding claim 2, Burger disclosed the method of claim 1.
Burger further disclosed wherein: the first training outputs generated by the first computing system are generated using one or more application-specific integrated circuits (Burger, [0027-0030, 0048-0057], “specialized hardware”, “FPGA”, “TPU”); and the second training outputs generated by the second computing system are generated using one or more central processing units (Burger, [0027-0030, 0048-0057], “general-purpose CPU”).  
Regarding claim 3, Burger disclosed the method of claim 2.
Burger further disclosed wherein the application-specific integrated circuits are artificial intelligence accelerators (Burger, [0028-0030, 0048-0057], “neural network accelerator”).  
Regarding claim 4, Burger disclosed the method of claim 3.
Burger further disclosed wherein one or more of the artificial intelligence accelerator application-specific integrated circuits are tensor processing units (Burger, [0027-0030, 0048-0057], “TPU”).  
Regarding claim 5, Burger disclosed the method of claim 1.
Burger further disclosed wherein: the first training outputs generated by the first computing system are generated using one or more first application-specific integrated circuits having an X-bit architecture (Burger, [0027-0030], “low-precision quantized format”, “reduced precision (floating point) format” on “specialized hardware such as FPGA”; [0029], “lower-precision quantized formats include formats having a reduced bit width (including by reducing the number of bits used to represent a number's mantissa or exponent) and block floating-point formats where two or more numbers share the same single exponent”); and
the second training outputs generated by the second computing system are generated using one or more second application-specific integrated circuits having a Y-bit architecture (Burger, [0027-0030], “normal-precision floating-point format e.g. 16-bit floating point format, 32-bit floating point format, 64-bit … or an 80-bit floating point format”), wherein X and Y are different positive integer values (Burger, [0027-0030] disclosed that the number of bits used to represent a floating-point number differs between a normal-precision processor, such as a CPU, and specialized hardware, such as an FPGA or TPU).
Regarding claim 6, Burger disclosed the method of claim 1.
Burger further disclosed wherein one or more correctness issues occur during the training of the first machine learning model, and no correctness issues occur during the training of the second machine learning model (Burger, [0029], “NN weights and activation values can be represented in a lower-precision quantized format with an acceptable level of error introduced.”).
Regarding claim 7, Burger disclosed the method of claim 6.
Burger further disclosed wherein: parameter values of the first machine learning model do not converge during the training using the first computing system; and parameter values of the second machine learning model converge during the training using the second computing system (Burger, [0101] disclosed that “”, which means that the quantized neural network may not converge initially).  
Regarding claim 8, Burger disclosed the method of claim 6.
Burger further disclosed wherein: one or more special values are generated during the training using the first computing system (Burger, [0027-0030] disclosed that the quantized neural network model uses a quantized-precision format to represent weight values); and no special values are generated during the training using the second computing system (Burger, [0027-0030] disclosed that the normal-precision neural network model uses a normal-precision floating-point format to represent weight values).
Regarding claim 9, Burger disclosed the method of claim 8.
Burger further disclosed wherein the special values are not-a-number values (Burger, [0027-0030], “quantized-precision format”; it is unclear what “not-a-number values” means).  
Regarding claim 10, Burger disclosed the method of claim 1.
Burger further disclosed wherein the machine learning model architecture is a neural network architecture (Burger, Abstract and Title).  
Regarding claim 11, Burger disclosed the method of claim 10.
Burger further disclosed wherein the plurality of shared training operations comprise one or more of: determining a value of an objective function, determining a gradient of the objective function, determining an output of a neural network layer, determining a result of a convolution operation of a neural network layer, determining a result of an activation function of a neural network layer (these are inherent attributes of a neural network training process and therefore are anticipated by Burger’s disclosure of training a neural network model).
Regarding claim 12, Burger disclosed the method of claim 1.
Burger further disclosed wherein the first training output comprises a first matrix and the second training output comprises a second matrix (Burger, Fig. 9 and [0097], “output tensor 925 and 925′”).
Claim 18 recites substantially the same subject matter as claim 1, in computer storage media form rather than method form, and therefore is rejected with the same rationale as claim 1.
Claim 19 recites substantially the same subject matter as claim 1, in apparatus form rather than method form, and therefore is rejected with the same rationale as claim 1.
Claim 20 recites substantially the same subject matter as claim 2, in apparatus form rather than method form, and therefore is rejected with the same rationale as claim 2.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIRLEY X ZHANG whose telephone number is (571)270-5012.  The examiner can normally be reached on 8:30am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Trost can be reached on 571-272-7872.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHIRLEY X ZHANG/Primary Examiner, Art Unit 2442
5/11/2022