DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1-2, 11-13, and 17-18 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Drumond et al., “End-to-End DNN Training with Block Floating Arithmetic” (herein Drumond).
Regarding claim 1, Drumond teaches a method comprising: 
quantizing a normal-precision neural network model comprising tensors of normal-precision floating-point numbers, producing a quantized neural network model in a quantized-precision format [The FP-to-BFP units convert neural network floating point (FP) tensors to block floating point (BFP), thereby quantizing the neural network. Drumond at Abstract; section 4.4, 3rd paragraph; Fig. 5]; 
evaluating the quantized neural network model by applying input tensors to an input layer of the quantized neural network model, producing quantized output [Inferencing (i.e. evaluating the neural network by applying input tensors) is performed using the BFP weights. Drumond at section 4.4, 2nd - 3rd paragraph; Fig. 5]; and 
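comparing the quantized output to output generated by applying the input tensors to the normal-precision floating-point model [The BFP neural network is compared to the FP32 neural network (i.e. the output generated by applying the input tensors to the normal-precision floating-point model). Drumond at section 6, 1st paragraph; section 5.2, “Evaluation Metric”; Fig. 7].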

Regarding claim 2, Drumond teaches the method of claim 1, wherein: the quantized-precision format is a block floating-point format where at least two elements of the quantized neural network model share a common exponent [The format is block floating point (BFP) that shares exponents. Drumond at section 3, 4th paragraph].
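
For illustration only, the shared-exponent format recited in claim 2 can be sketched as follows. The sketch is not taken from Drumond: the function names, the 8-bit default mantissa width, and the choice to derive the shared exponent from the largest magnitude in the block are assumptions made for this example.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize a list of floats so that all elements share one exponent.

    Hypothetical helper. Each value is approximated by
    mantissa * 2**shared_exponent, with a signed integer mantissa.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    # Choose the exponent so the largest magnitude fits the mantissa range.
    shared_exponent = e - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = []
    for x in block:
        m = round(x / 2.0 ** shared_exponent)
        mantissas.append(max(-limit, min(limit, m)))  # clamp to signed range
    return shared_exponent, mantissas


def bfp_dequantize(shared_exponent, mantissas):
    """Convert shared-exponent mantissas back to ordinary floats."""
    return [m * 2.0 ** shared_exponent for m in mantissas]


exponent, mantissas = bfp_quantize([0.5, -0.25, 0.125, 1.0])
restored = bfp_dequantize(exponent, mantissas)
```

In this example every element of the block is stored as an integer mantissa, and the single value `exponent` is common to all four elements.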

Regarding claim 11, Drumond teaches one or more computer-readable storage media storing computer-readable instructions that when executed by a computer, cause the computer to perform the method of claim 1 [Tensorflow (i.e. software comprising instructions to be executed), which is inherently stored on a storage medium, is used to perform the operations. Drumond at section 5.1].

Regarding claim 12, Drumond teaches a quantization-enabled system for modeling a neural network comprising tensors representing node weights and edges, the system comprising: 
memory [Accelerator memory. See Drumond at section 4.4; Fig. 5]; 
at least one processor coupled to the memory [Accelerator. See Drumond at section 4.4; Fig. 5]; and 
one or more computer readable storage media storing computer-readable instructions that when executed by the at least one processor, cause the system to perform a method of evaluating the neural network [Tensorflow (i.e. software comprising instructions to be executed), which is inherently stored on a storage medium, is used to perform the operations. Drumond at section 5.1], the instructions comprising: 
instructions that cause the system to transform a normal-precision neural network model to a block floating-point format neural network model according to a set of quantization parameters, the block floating-point format model including at least one shared exponent [The FP-to-BFP units convert neural network floating point (FP) (i.e. normal-precision) tensors to block floating point (BFP), thereby quantizing the neural network. Drumond at Abstract; section 4.4, 3rd paragraph; Fig. 5], 
instructions that cause the system to apply input tensors to an input layer of the block floating-point format neural network model, producing first output values [Inferencing (i.e. applying input tensors) is performed using the BFP weights, thereby producing output. Drumond at section 4.4, 2nd - 3rd paragraph; Fig. 5], and 
instructions that cause the system to calculate differences between the first output values and second output values generated by applying the input tensors to the normal-precision neural network model [The BFP neural network is compared to the FP32 neural network, thereby calculating the difference. Drumond at section 6, 1st paragraph; section 5.2, “Evaluation Metric”; Fig. 7].
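
For illustration only, the difference calculation recited in the last limitation of claim 12 can be sketched as follows. This is not the evaluation metric of Drumond; the function name and the two summary statistics (maximum and mean absolute difference) are assumptions chosen for this example.

```python
def output_differences(first_outputs, second_outputs):
    """Compare quantized-model outputs against normal-precision outputs.

    Hypothetical helper. Returns the element-wise differences together
    with the maximum and the mean absolute difference.
    """
    if len(first_outputs) != len(second_outputs):
        raise ValueError("outputs must have the same length")
    diffs = [q - r for q, r in zip(first_outputs, second_outputs)]
    max_abs = max((abs(d) for d in diffs), default=0.0)
    mean_abs = sum(abs(d) for d in diffs) / len(diffs) if diffs else 0.0
    return diffs, max_abs, mean_abs


# First values: block floating-point model; second values: FP32 model.
diffs, max_abs, mean_abs = output_differences([1.0, 2.5, -0.5], [1.0, 2.0, 0.0])
```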

Regarding claim 13, Drumond teaches the system of claim 12, wherein the computer-readable instructions further comprise: instructions that cause the system to retrain the normal-precision neural network model by adjusting a hyperparameter and retraining the normal-precision neural network model with the adjusted hyperparameter [The FP32 model (i.e. normal-precision neural network model) is trained (i.e. retrained) using an adjusted learning rate (i.e. hyperparameter), wherein the learning rate is adjusted periodically. Drumond at section 5.2].

Regarding claim 17, Drumond teaches the system of claim 12, further comprising:
a hardware accelerator configured to evaluate the block floating-point format neural network model by receiving input tensors, processing operations for nodes of the block floating-point neural network model representing in the block floating-point format, and produce an output tensor [A hardware accelerator is used for the training, which includes the evaluation, processing, and output production of the neural network. See Drumond at section 4.4; Fig. 5]; and 
[The accelerator is configured to use the block floating point (BFP) model. See Drumond at section 4.4; Fig. 5].

Regarding claim 18, Drumond teaches one or more computer-readable storage media storing computer-readable instructions that when executed by a processor, cause the processor to provide an interface for designing a block floating-point format neural network model [Tensorflow (i.e. software comprising instructions to be executed), which is inherently stored on a storage medium, is used to perform the operations. Drumond at section 5.1], the instructions comprising: 
instructions that cause the system to provide tensors for a normal-precision neural network model [The FP-32 model (i.e. normal-precision neural network model) is evaluated, which requires the model receiving input tensors. See Drumond at section 6, 1st paragraph; section 5.2, “Evaluation Metric”]; 
instructions that cause the system to convert the normal-precision format tensors to a block floating-point format for a block floating-point neural network model [The FP-to-BFP units convert neural network floating point (FP) (i.e. normal-precision) tensors to block floating point (BFP), thereby quantizing the neural network. Drumond at Abstract; section 4.4, 3rd paragraph; Fig. 5]; 
instructions that cause the system to apply at least one input tensor to an input layer of the normal-precision neural network model, producing at least one normal-precision output tensor [The FP-32 model (i.e. normal-precision neural network model) is evaluated, which requires the model receiving input tensors and produces an output to be evaluated. See Drumond at section 6, 1st paragraph; section 5.2, “Evaluation Metric”]; 
instructions that cause the system to apply the at least one input tensor to an input layer of the block floating-point neural network model, producing at least one output tensor [Inferencing (i.e. applying input tensors) is performed using the BFP weights, thereby producing output. Drumond at section 4.4, 2nd - 3rd paragraph; Fig. 5]; and 
instructions that cause the system to compare the at least one normal-precision output tensor and the at least one output tensor [The BFP neural network is compared to the FP32 neural network (i.e. comparing the normal-precision output tensor and the output tensor). Drumond at section 6, 1st paragraph; section 5.2, “Evaluation Metric”; Fig. 7].

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 3-5, 8, and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Drumond in view of Anwar et al., “Fixed Point Optimization of Deep Convolutional Neural Networks for Object Recognition” (herein Anwar).
Regarding claims 3 and 10, taking claim 3 as exemplary, Drumond teaches the method of claim 1. Drumond doesn’t teach based on the comparing, retraining the quantized neural network model by adjusting at least one or more training parameters used to train the normal-precision neural network and training the quantized neural network with the adjusted at least one training parameter. In the same field of neural network quantization, Anwar teaches based on comparing a quantized output to an output of a normal-precision floating-point model, retraining a quantized neural network model by adjusting at least one training parameter and training the quantized neural network with the adjusted at least one training parameter [After quantization and based on the computed output error (i.e. comparing), the quantized neural network is retrained, wherein retraining includes adjusting various parameters (batch size, learning rate, training epochs). Anwar at section 3; Fig. 3; Table 2]. Post-quantization training improves the neural network’s accuracy/performance [Anwar at section 2.1; section 3, 1st paragraph]. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the quantization of Drumond to further comprise: based on the comparing, retraining the quantized neural network model by adjusting at least one or more training parameters used to train the normal-precision neural network and training the quantized neural network with the adjusted at least one training parameter, as taught by Anwar, in order to improve the neural network’s accuracy/performance.

Regarding claim 4, Drumond, as modified, teaches the method of claim 3, wherein the adjusted at least one of the training parameters comprises at least one of the following: a batch size, a momentum value, a number of training epochs, or a drop out rate [Batch size and the number of training epochs are adjusted in retraining. Anwar at section 3; Fig. 3; Table 2].

Regarding claim 5, Drumond, as modified, teaches the method of claim 3, further comprising: 
producing the normal-precision neural network by training an untrained normal-precision neural network according to one or more training parameters at a selected learning rate [The FP32 (i.e. normal) neural network is trained using a selected learning rate. Drumond at section 5.2, “Training” subsection]; and 
wherein the adjusting the at least one of the training parameters comprises adjusting a learning rate to be lower than the selected learning rate used to train the untrained normal-precision neural network [During retraining (See claim 3 rejection above), the selected learning rate is decremented (i.e. adjusted to be lower) after each epoch. Anwar at section 4].
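
For illustration only, a retraining learning rate that starts below the originally selected rate and is lowered after each epoch can be sketched as follows. The multiplicative decay, the function name, and all numeric values are assumptions made for this example and are not taken from Anwar or Drumond.

```python
def retraining_schedule(original_rate, start_fraction, decay, epochs):
    """Return one learning rate per retraining epoch.

    Hypothetical helper. The first rate is a fraction of the rate used
    for the original training, and each later rate is lower still.
    """
    if not (0.0 < start_fraction < 1.0 and 0.0 < decay < 1.0):
        raise ValueError("start_fraction and decay must lie strictly between 0 and 1")
    rate = original_rate * start_fraction
    rates = []
    for _ in range(epochs):
        rates.append(rate)
        rate *= decay  # lower the rate after each epoch
    return rates


rates = retraining_schedule(original_rate=1.0, start_fraction=0.5, decay=0.5, epochs=3)
```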

Regarding claim 8, Drumond teaches the method of claim 1. Drumond doesn’t teach based on the comparing, sparsifying at least one weight of the quantized neural network. In the same field of neural networks, Anwar teaches based on comparing a quantized output to an output of a normal-precision floating-point model, retraining a quantized neural network model including sparsifying at least one weight of the quantized neural network [After quantization and based on the computed output error (i.e. comparing), the quantized neural network is retrained, which reduces more weights and induces sparsity (i.e. sparsifying at least one weight). Anwar at section 1, 4th paragraph; section 3, 2nd paragraph]. Inducing sparsity (i.e. sparsifying) reduces the number of network parameters and improves generalization [Anwar at Abstract]. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the quantization of Drumond to further comprise: based on the comparing, retraining a quantized neural network model thereby sparsifying at least one weight of the quantized neural network, as taught by Anwar, in order to reduce the number of network parameters and improve generalization.


Claims 6, 7, 15, 16, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Drumond in view of Zhou et al., “Adaptive Quantization for Deep Neural Networks” (herein Zhou). 

Regarding claims 6, 15, and 20, taking claim 6 as exemplary, Drumond teaches the method of claim 1. Drumond doesn’t teach: based on the comparing, selecting a new quantized-precision format having at least one quantization parameter different than the quantized-precision format; and quantizing the normal-precision neural network model to produce a re-quantized neural network model in the new quantized-precision format. In the same field of neural network quantization, Zhou teaches based on comparing a quantized output to an output of a normal-precision floating-point model, selecting a new quantized-precision format having at least one quantization parameter different than the quantized-precision format [Based on the accuracy penalty (i.e. the comparison), a new layer-wise bit width (i.e. quantization parameter) is selected. Zhou at page 2, right column, “Quantization optimization”; page 6, left column]; and quantizing the normal-precision neural network model to produce a re-quantized neural network model in the new quantized-precision format [The neural network is quantized using the selected layer-wise bit width (i.e. quantization parameter). Zhou at page 2, right column, “Quantization optimization”; page 6, left column]. This optimization procedure provides high neural network compression while retaining model prediction accuracy [Zhou at Abstract]. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the quantization of Drumond to further comprise: based on the comparing, selecting a new quantized-precision format having at least one quantization parameter different than the quantized-precision format (i.e. selecting a new block floating point format having the selected bit width); and quantizing the normal-precision neural network model to produce a re-quantized neural network model in the new quantized-precision format (i.e. quantizing using the new block floating point format), as taught by Zhou, in order to achieve high neural network compression while retaining model prediction accuracy.
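
For illustration only, selecting a new bit width based on a comparison of outputs, followed by re-quantizing, can be sketched as follows. This is a simple search over candidate widths, not the optimization procedure of Zhou; the function names, the single dot-product "model", and the tolerance are assumptions made for this example.

```python
import math


def bfp_roundtrip(values, mantissa_bits):
    """Quantize values to a shared-exponent format and convert them back."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0] * len(values)
    _, e = math.frexp(max_abs)
    scale = 2.0 ** (e - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1
    return [max(-limit, min(limit, round(v / scale))) * scale for v in values]


def select_mantissa_bits(weights, inputs, candidates, tolerance):
    """Return the smallest candidate width whose output error is within tolerance.

    Hypothetical helper. The model is a single dot product; None is
    returned if no candidate width meets the tolerance.
    """
    reference = sum(w * x for w, x in zip(weights, inputs))
    for bits in sorted(candidates):
        quantized = bfp_roundtrip(weights, bits)
        output = sum(w * x for w, x in zip(quantized, inputs))
        if abs(output - reference) <= tolerance:
            return bits
    return None


weights = [0.5, -0.25, 0.125, 1.0]
inputs = [1.0, 1.0, 1.0, 1.0]
chosen = select_mantissa_bits(weights, inputs, candidates=[2, 4, 8], tolerance=0.2)
```

With these example values the 2-bit format is rejected because its output error exceeds the tolerance, and the 4-bit format is selected.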

Regarding claims 7 and 16, taking claim 7 as exemplary, Drumond, as modified, teaches the method of claim 6, wherein the at least one different quantization parameter comprises, for at least one layer of the quantized neural network, at least one of: a bit width used to represent bit widths of node weight mantissas, a bit width used to represent bit widths of node weight exponents, a bit width used to represent bit widths of activation value mantissas, a bit width used to represent bit widths of activation value exponents, a tile size for a shared exponent, a parameter to share an exponent on a per-row basis, a parameter to share an exponent on a per-column basis, or a parameter specifying a method of common exponent selection [The selected different quantization parameter is the coefficient bit width, which includes a bit width used to represent bit widths of node weight mantissas. See Zhou at page 1, right column; page 2, right column].


Claims 9, 14, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Drumond in view of Shin et al., “Fixed-Point Optimization of Deep Neural Networks with Adaptive Step Size Retraining” (herein Shin).

Regarding claim 9, Drumond teaches the method of claim 1. Drumond doesn’t teach: based on the comparing, changing a hyperparameter used to train the normal-precision neural network or the quantized-precision network and retraining the quantized-precision neural network with the changed hyperparameter; and wherein the changed hyperparameter includes one of a number of hidden layers in the normal-precision neural network, a node type for a layer of the normal-precision neural network, or a learning rate for training the neural network. In the same field of neural network quantization, Shin teaches: based on comparing a quantized output to an output of a normal-precision floating-point model, changing a hyperparameter used to train the normal-precision neural network or the quantized-precision network and retraining the quantized-precision neural network with the changed hyperparameter [After quantization and based on the computed output error (i.e. comparing) the quantized neural network is retrained, wherein retraining includes adjusting of the step size/learning rate (i.e. hyperparameter). Shin at section 2.2; Fig. 1 and caption]; and wherein the changed hyperparameter includes one of a number of hidden layers in the normal-precision neural network, a node type for a layer of the normal-precision neural network, or a learning rate for training the neural network [The step size/learning rate is the adjusted hyperparameter. Shin at section 2.2; Fig. 1 and caption]. Retraining a quantized neural network with step size adjustment improves the performance of the neural network [Shin at Abstract; section 2.2, 1st paragraph]. 
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention to modify the quantization of Drumond to further comprise: based on the comparing, changing a hyperparameter used to train the normal-precision neural network or the quantized-precision network and retraining the quantized-precision neural network with the changed hyperparameter, as taught by Shin, in order to improve the performance of the neural network.

Regarding claim 14, Drumond teaches the system of claim 12. Drumond doesn’t teach that the computer-readable instructions further comprise: instructions that cause the system to retrain the block floating-point format neural network model by adjusting a hyperparameter and retraining the block floating-point format neural network model with the adjusted hyperparameter. In the same field of neural network quantization, Shin teaches instructions that cause the system to retrain the block floating-point format neural network model by adjusting a hyperparameter and retraining the block floating-point format neural network model with the adjusted hyperparameter [After quantization, the quantized neural network is retrained, wherein retraining includes adjusting the step size/learning rate (i.e. hyperparameter). Shin at section 2.2; Fig. 1 and caption]. Retraining a quantized neural network with step size adjustment improves the performance of the neural network [Shin at Abstract; section 2.2, 1st paragraph]. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the quantization of Drumond to further comprise: instructions that cause the system to retrain the block floating-point format neural network model by adjusting a hyperparameter and retraining the block floating-point format neural network model with the adjusted hyperparameter, as taught by Shin, in order to improve the performance of the neural network.

Regarding claim 19, Drumond teaches the computer-readable storage media of claim 18. Drumond doesn’t teach the instructions further comprising: instructions that cause the system to automatically select a hyperparameter to retrain the normal-precision neural network model or the block floating-point neural network model; and instructions that cause the system to retrain the normal-precision neural network model or the block floating-point neural network model based on the selected hyperparameter. In the same field of neural network quantization, Shin teaches instructions that cause the system to automatically select a hyperparameter to retrain a quantized neural network model [A step size/learning rate (i.e. hyperparameter) is selected to retrain the quantized neural network. Shin at section 2.2; Fig. 1 and caption]; and instructions that cause the system to retrain the quantized neural network model based on the selected hyperparameter [The quantized neural network is retrained using the selected step size/learning rate (i.e. hyperparameter). Shin at section 2.2; Fig. 1 and caption]. Retraining a quantized neural network with step size adjustment improves the performance of the neural network [Shin at Abstract; section 2.2, 1st paragraph]. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the quantization of Drumond to further comprise instructions that cause the system to automatically select a hyperparameter to retrain the quantized neural network model (i.e. the block floating-point neural network model); and instructions that cause the system to retrain the quantized neural network model (i.e. the block floating-point neural network model) based on the selected hyperparameter, as taught by Shin, in order to improve the performance of the neural network.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN P GEIB whose telephone number is (571)272-8628. The examiner can normally be reached Monday - Friday 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached on (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BENJAMIN P GEIB/
Primary Examiner, Art Unit 2123