DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Amendments
As per Applicant’s request, claims 5, 10, 16, and 21 are amended in the Examiner’s Amendment. Claims 11-14, 16-17 and 21-22 were amended and claim 20 was canceled in the claim set filed 09/01/2021. Claims 1-19 and 21-22 are pending and have been considered.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee. Authorization for this examiner’s amendment for resolving antecedent basis issues was given in a phone interview with Ryan Carter on October 1, 2021. Email exchanges and versions of examiner’s amendments are attached to this action.
The following claims replace all existing claims.
1.	A method, comprising:
selecting a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations;
modifying the neural network model by inserting a plurality of quantization layers within the neural network model;
associating a cost function with the modified neural network model, wherein the cost function includes a first coefficient corresponding to a first regularization term, and wherein an initial value of the first coefficient is pre-defined; and
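training the modified neural network model to generate quantized weights for a layer by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, further including optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor.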

2.	The method of claim 1, further comprising optimizing the weight scaling factor and the activation scaling factor based on minimizing a mean square quantization error (MSQE).

3.	The method of claim 1, further comprising inserting each quantization layer of the plurality of quantization layers after each activation output in each layer within the neural network model.

4.	The method of claim 1, wherein the cost function includes a second coefficient corresponding to a second regularization term based on the weight scaling factor and the activation scaling factor being power-of-two numbers.

5.	The method of claim 1, further comprising applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network, wherein the fixed-point neural network includes a plurality of convolutional layers, 
wherein each of the plurality of convolutional layers includes a convolution operation configured to perform convolution on feature maps and the quantized weights,
a bias addition operation configured to perform addition on an output of the convolution operation and biases,
a first multiplying operation configured to perform multiplication on an output of the bias addition operation and a first scale factor,
an activation operation configured to apply an activation function to an output of the first multiplying operation,
a second multiplying operation configured to perform multiplication on an output of the activation operation and a second scale factor, and
a quantization operation configured to quantize an output of the second multiplying operation.

6.	The method of claim 1, wherein training the neural network comprises:
updating the weights by a stochastic gradient descent method;
updating the weight scaling factor by the stochastic gradient descent method;
updating the activation scaling factor by the stochastic gradient descent method;
if the weight scaling factor and the activation scaling factor are of a power of two, including additional gradients of the stochastic descent method;
updating regularization coefficients by the stochastic gradient descent method; and
terminating the training if either the regularization coefficient is greater than a pre-determined constant or a number of iterations of the method is greater than a predetermined limit.

7.	The method of claim 5, wherein the weights are fixed-point weights.

8.	The method of claim 5, wherein the first scale factor is a product of the weight scaling factor and the activation scaling factor.

9.	The method of claim 5, wherein the activation operation is a non-linear activation function.  

10.	The method of claim 1, further comprising applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network, wherein the fixed-point neural network includes a plurality of convolutional layers,
wherein each of the plurality of convolutional layers includes a convolution operation configured to perform convolution on feature maps and the quantized weights,
a bias addition operation configured to perform addition on an output of the convolution operation and biases,
a rectified linear unit (ReLU) activation operation configured to apply an ReLU activation function to an output of the bias addition operation, 
a scale-factor multiplying operation configured to perform multiplication on an output of the ReLU activation operation and a scale factor, and
a quantization operation configured to quantize an output of the scale-factor multiplying operation.

11.	The method of claim 10, wherein the scale factor is a product of a weight scale factor and a quantization scale factor.

12.	An apparatus, comprising:
a memory storing instructions; and
a processor, 
wherein the processor is configured to execute the instructions causing the processor to: 
select a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations;
modify the neural network model by inserting a plurality of quantization layers within the neural network model;
associate a cost function with the modified neural network model, wherein the cost function includes a first coefficient corresponding to a first regularization term, and wherein an initial value of the first coefficient is pre-defined; and
train the modified neural network model to generate quantized weights for a layer by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, and optimize a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, wherein the quantized weights are quantized using the optimized weight scaling factor.

13.	The apparatus of claim 12, wherein the processor is further configured to execute the instructions to optimize the weight scaling factor and the activation scaling factor based on minimizing a mean square quantization error (MSQE).

14.	The apparatus of claim 12, wherein the processor is further configured to execute the instructions to insert each quantization layer of the plurality of quantization layers after each activation output in each layer within the neural network model.

15.	The apparatus of claim 12, wherein the cost function includes a second coefficient corresponding to a second regularization term based on the weight scaling factor and the activation scaling factor being power-of-two numbers.

16.	The apparatus of claim 12, wherein the neural network model is a fixed-point neural network to which the quantized weights, the weight scaling factor, and the activation scaling factor are applied, wherein the fixed-point neural network includes a plurality of convolutional layers, 
wherein each of the plurality of convolutional layers is configured to perform a convolution operation on feature maps and the quantized weights, and
wherein the processor is further configured to execute the instructions to:
perform addition on an output of the convolution operation and biases,
perform multiplication on an output of the addition and a first scale factor,
apply an activation function to an output of the first multiplication,
perform multiplication on an output of the activation function and a second scale factor, and
quantize an output of the second multiplication.

17.	The apparatus of claim 12, wherein the processor is further configured to execute the instructions to:
update the weights by a stochastic gradient descent method;
update the weight scaling factor by the stochastic gradient descent method;
update the activation scaling factor by the stochastic gradient descent method;
if the weight scaling factor and the activation scaling factor are of a power of two, include additional gradients of the stochastic descent method;
update regularization coefficients by the stochastic gradient descent method; and
terminate the training if either the regularization coefficient is greater than a pre-determined constant or a number of iterations is greater than a predetermined limit.

18.	The apparatus of claim 16, wherein the weights are fixed-point weights.

19.	The apparatus of claim 16, wherein the first scale factor is a product of the weight scaling factor and the activation scaling factor.

20. (Cancelled)

21.	The apparatus of claim 12, wherein the neural network model is a fixed-point neural network to which is applied the quantized weights, the weight scaling factor, and the activation scaling factor, wherein the fixed-point neural network includes a plurality of convolutional devices,
wherein each of the plurality of convolutional devices is configured to perform convolution on feature maps and the quantized weights, and
wherein the processor is further configured to execute the instructions to:
perform addition on an output of each of the convolution devices and biases,
apply a rectified linear unit (ReLU) activation function to an output of the addition, 
perform multiplication on an output of the ReLU activation function and a scale factor, and
quantize an output of the multiplication.
 
22.	The apparatus of claim 21, wherein the scale factor is a product of a weight scale factor and a quantization scale factor.
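
For illustration only, the per-layer sequence of operations recited in claim 5 can be sketched as follows. This is a minimal sketch and is not part of the claims or of the applicant's disclosure: the function names, the 1-D convolution, the single scalar bias, the signed 8-bit clipping range, and the round-half-up rule are all assumptions chosen to keep the example short.

```python
import math


def relu(x):
    """Rectified linear unit, used here as the example activation."""
    return x if x > 0 else 0.0


def quantize(x, n_bits=8):
    """Round half up, then clip to the signed n_bits integer range (assumed)."""
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return max(lo, min(hi, int(math.floor(x + 0.5))))


def conv1d_valid(feature_map, weights):
    """Plain 1-D 'valid' convolution (no padding, no kernel flip)."""
    k = len(weights)
    return [
        sum(feature_map[i + j] * weights[j] for j in range(k))
        for i in range(len(feature_map) - k + 1)
    ]


def fixed_point_layer(feature_map, q_weights, bias,
                      first_scale, second_scale, activation=relu):
    """Apply the claim 5 ordering: convolution, bias addition, first
    multiplication, activation, second multiplication, quantization."""
    conv_out = conv1d_valid(feature_map, q_weights)
    biased = [v + bias for v in conv_out]
    scaled = [v * first_scale for v in biased]
    activated = [activation(v) for v in scaled]
    rescaled = [v * second_scale for v in activated]
    return [quantize(v) for v in rescaled]


print(fixed_point_layer([1, 2, 3, 4], [2, 1], bias=1,
                        first_scale=0.5, second_scale=2.0))
```

Setting first_scale to 1.0 and keeping ReLU as the activation reduces the sketch to the ordering recited in claim 10 (bias addition, ReLU activation, scale-factor multiplication, quantization).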



REASONS FOR ALLOWANCE
The following is an examiner’s statement of reasons for allowance: Claims 1-19 and 21-22 are considered allowable since when reading the claims in light of the specification, none of the references of record either alone or in combination fairly disclose or suggest the combination of limitations specified in the independent claims, including at least:
For independent claims 1 and 12:
training…by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, with the first coefficient corresponding to a first regularization term.

The closest prior art of record is: U.S. Patent Application Pub. No. 2017/0286830 to El-Yaniv et al., hereinafter El-Yaniv; “TensorQuant - A Simulation Toolbox for Deep Neural Network Quantization” (Oct. 2017) to Loroch et al., hereinafter Loroch; U.S. Patent Application Pub. No. 2014/0073892 to Randloev et al., hereinafter Randloev; and “Improving the speed of neural networks on CPUs” (2011) to Vanhoucke et al., hereinafter Vanhoucke.
	El-Yaniv teaches selecting a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations in ¶ [0057]: “a neural network model is constructed and/or received”, “plurality of neurons each having a quantized activation function”, and, “The neurons are arranged in a plurality of layers and are connected by connections. Each connection has a quantized connection weight function”. El-Yaniv also teaches associating a cost function with the neural network model in ¶ [0065]: C denotes a cost function for a mini-batch gradient descent. El-Yaniv teaches training the neural network model to generate quantized weights for a layer in ¶ [0064] and Fig. 1, numeral 103: “Now, as shown at 103, the QNN, for instance the BNN is trained based on the output of the quantized functions (both quantized activation functions and the quantized connection weight functions)”.
	Loroch teaches modifying the neural network model by inserting a plurality of quantization layers within the neural network model; (P. 4, col. 2, paragraphs captioned “Changes to the User’s Topology File” and “Assigning Layers for Quantization”. Examiner interprets assigning a layer for quantization to be functionally equivalent to inserting a quantization layer.)
Randloev teaches: wherein the cost function includes a first coefficient corresponding to a first regularization term, (Randloev teaches the following cost function $f_{K}^{\lambda}$, where λ is a first coefficient corresponding to a first regularization term $\|f\|_{H}^{2}$)

[Image: media_image1.png]

[Image: media_image2.png]

and wherein an initial value of the first coefficient is pre-defined; and (Randloev teaches in ¶ [0102] an initial parameter $\lambda_{0}$)
Vanhoucke teaches: further including optimizing a weight scaling factor for the quantized weights and (Vanhoucke teaches on P. 5, §4.1, para. 2: “Weights are scaled by taking their maximum magnitude in each layer and normalizing them to fall in the [−128, 127] range.” Examiner interprets optimizing as a reduction to 8-bits. A weight scaling factor is the conversion factor to 8-bits. The quantized weights are the output of this 8-bit quantization.)
[optimizing] an activation scaling factor for quantized activations, and (Vanhoucke teaches on P. 5, §4.1, para. 1: “we used 8-bit quantization to convert activations into unsigned char”. Examiner interprets optimizing as a reduction to 8-bits. An activation scaling factor is the conversion factor to 8-bits.)
wherein the quantized weights are quantized using the optimized weight scaling factor. (The optimized weight scaling factor is interpreted as a quantization conversion factor to 8-bits.)
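
The weight scaling quoted from Vanhoucke above can be pictured with a short sketch. The quotation states only that weights are scaled by their maximum magnitude in each layer and normalized to the [−128, 127] range; the function name, the choice of 127 divided by the maximum magnitude as the conversion factor, and the round-half-up rule are assumptions made here for illustration and are not taken from the reference.

```python
import math


def scale_weights_to_int8(weights):
    """Scale one layer's weights by their maximum magnitude and round
    them into the signed 8-bit range [-128, 127].

    Returns the integer weights and the scale factor that was applied.
    The 127 / max_magnitude factor is an assumed convention.
    """
    max_mag = max(abs(w) for w in weights)
    if max_mag == 0:
        # All-zero layer: nothing to scale.
        return [0] * len(weights), 1.0
    scale = 127.0 / max_mag
    quantized = [
        max(-128, min(127, int(math.floor(w * scale + 0.5))))
        for w in weights
    ]
    return quantized, scale


q, s = scale_weights_to_int8([0.5, -1.0, 0.25])
print(q, s)
```

Under these assumptions, the returned scale plays the role of the "conversion factor to 8-bits" referred to in the Examiner's interpretation, and dividing an integer weight by it gives an approximation of the original value.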

However, none of the prior art of record either alone or in combination teaches: training…by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, with the first coefficient corresponding to a first regularization term.
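
One way to picture the distinguishing limitation is the toy loop below. It is only a sketch under stated assumptions and is not the applicant's disclosed method: the squared-error stand-in for the task loss, the uniform grid of step 0.25, the multiplicative growth schedule, the tolerance used to decide that a weight counts as quantized, and every name and default value are hypothetical.

```python
import math


def nearest_level(w, step):
    """Nearest point on a uniform quantization grid (round half up)."""
    return step * math.floor(w / step + 0.5)


def train_with_increasing_coefficient(weights, targets, step=0.25,
                                      alpha0=0.01, growth=1.5,
                                      alpha_threshold=100.0, lr=0.1,
                                      tol=1e-3, max_iters=1000):
    """Gradient descent on 0.5*(w - t)^2 + 0.5*alpha*(w - Q(w))^2.

    alpha (the regularization coefficient) starts at the pre-defined
    value alpha0 and is increased every iteration. Training stops once
    all weights lie within tol of a grid level AND alpha has reached
    alpha_threshold. Q(w) is treated as a constant in the gradient.
    """
    w = list(weights)
    alpha = alpha0
    it = 0
    for it in range(1, max_iters + 1):
        # Shrink the step as alpha grows so the update stays stable.
        step_size = lr / (1.0 + alpha)
        w = [
            wi - step_size * ((wi - ti) + alpha * (wi - nearest_level(wi, step)))
            for wi, ti in zip(w, targets)
        ]
        all_quantized = all(
            abs(wi - nearest_level(wi, step)) <= tol for wi in w
        )
        if all_quantized and alpha >= alpha_threshold:
            break
        alpha *= growth
    # Snap the (already nearly quantized) weights onto the grid.
    snapped = [nearest_level(wi, step) for wi in w]
    return snapped, alpha, it


float_weights = [0.3, -0.6, 0.9]
print(train_with_increasing_coefficient(float_weights, float_weights))
```

In this sketch the regularization term is weak at first, so the weights stay near their floating-point values, and it dominates as the coefficient grows, pulling each weight onto its nearest grid level before the loop is allowed to stop.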

Dependent claims 2-11, 13-19 and 21-22 are allowed as they depend upon an allowable independent claim.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Claims 1-19 and 21-22 are allowed. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher Jablon whose telephone number is (571)270-7648.  The examiner can normally be reached on Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached on (571)270-316.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.










/ASHER JABLON/Examiner, Art Unit 2127                                                                                                                                                                                                        
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122