Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-22 are pending and have been examined.
Information Disclosure Statement
The information disclosure statement filed 08/09/2018 fails to comply with the provisions of 37 CFR 1.97, 1.98 and MPEP § 609 because multiple publications are missing the month of the publication date and/or the place of publication. It has been placed in the application file, but the information referred to therein has not been considered as to the merits.  Applicant is advised that the date of any re-submission of any item of information contained in this information disclosure statement or the submission of any missing element(s) will be the date of submission for purposes of determining compliance with the requirements based on the time of filing the statement, including all certification requirements for statements under 37 CFR 1.97(e).  See MPEP § 609.05(a). 
See MPEP § 609.04(a), subsection I, and 37 CFR 1.98(b): “Each publication must be identified by publisher, author (if any), title, relevant pages of the publication, and date and place of publication. The date of publication supplied must include at least the month and year of publication, except that the year of publication (without the month) will be accepted if the applicant points out in the information disclosure statement that the year of publication is sufficiently earlier than the effective U.S. filing date and any foreign priority date so that the particular month of publication is not in issue. The place of publication refers to the name of the journal, magazine, or other publication in which the information being submitted was published.”
For example, reference 9 in the information disclosure statement filed 08/09/2018 should read: L.-C. Chen et al., “DeepLab: Semantic image segmentation with deep convolutional nets, atrous Apr. 2017, In IEEE transactions on pattern analysis and machine intelligence vol. 40, no. 4, pp. 834-848.
Claim Objections
Claims 2 and 13 recite “minimum square quantization error (MSQE)” whereas specification ¶ [0018] line 2 recites: “mean square quantization error (MSQE)”. Applicant should ensure that the MSQE has the same meaning in the specification and claims.
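To illustrate why the two expansions differ in meaning: a mean square quantization error averages the squared difference between each original value and its quantized value over all values, whereas “minimum” would instead denote an optimum over some choice. The following minimal sketch assumes a uniform quantizer with a fixed step size; the quantizer and the function names are illustrative assumptions and are not taken from the application.

```python
from typing import Sequence


def uniform_quantize(x: float, step: float) -> float:
    """Round x to the nearest multiple of step (illustrative uniform quantizer)."""
    if step <= 0:
        raise ValueError("step must be positive")
    return step * round(x / step)


def mean_square_quantization_error(values: Sequence[float], step: float) -> float:
    """Average of (x - Q(x))**2 over all values: the 'mean' in MSQE."""
    if not values:
        raise ValueError("values must be non-empty")
    errors = [(x - uniform_quantize(x, step)) ** 2 for x in values]
    return sum(errors) / len(errors)


msqe = mean_square_quantization_error([0.1, 0.4, 0.9], step=0.5)
print(msqe)
```

With a step of 0.5, each of the values 0.1, 0.4 and 0.9 lies 0.1 from its nearest quantization level, so the averaged squared error is approximately 0.01.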
Claim 16 is objected to because of the following informalities: “quantization operation” should read “quantization device” in the last line of claim 16.  Appropriate correction is required. For examination purposes, Examiner is interpreting the quantization operation in claim 16 as a quantization device.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:
“selector” in claim 12
“association device” in claim 12
“quantization device” in claims 16 and 21 (Examiner is interpreting the quantization operation in claim 16 as a quantization device.)
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 5-9, 11, 16-20, and 22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 5, 8, 16, and 19 recite the limitation “the first scale factor”. There is insufficient antecedent basis for this limitation in the claim. Examiner is interpreting “the first scale factor” in claims 5 and 16 as “a first scale factor”. Amending claims 5 and 16 as such would cure the deficiencies of claims 8 and 19, respectively. Claims 7-9 depend on claim 5 and are rejected for failing to cure the deficiencies of claim 5. Claims 18-20 depend on claim 16 and are rejected for failing to cure the deficiencies of claim 16.
Claims 5 and 16 recite the limitation “the second scale factor”. There is insufficient antecedent basis for this limitation in the claim. Examiner is interpreting the limitation as “a second scale factor”.
Claims 6 and 17 recite the limitation “the regularization coefficients” in line 7. There is insufficient antecedent basis for this limitation in the claim because claims 1 and 12 recite a singular regularization coefficient, not plural. Examiner interprets this limitation as “the regularization coefficient”.
Claims 11 and 22 recite the limitations “the scale factor”, “the weight scale factor”, and “the quantization scale factor”. There is insufficient antecedent basis for these limitations in the claims. Claim 11 depends on claims 1, 5, and 9, none of which recites “scale factor”, “weight scale factor”, or “quantization scale factor”. Similarly, claim 22 depends on claims 12, 16, and 20, none of which recites “scale factor”, “weight scale factor”, or “quantization scale factor”. However, claims 10 and 21 both recite the limitation “a scale factor”. Examiner interprets claim 11 as depending on the method of claim 10 and interprets claim 22 as depending on the apparatus of claim 21. In both claims 11 and 22, Examiner interprets “the weight scale factor” as “a weight scale factor” and “the quantization scale factor” as “a quantization scale factor”.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3, 6, 12, 14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Pub. No. 2017/0286830 to El-Yaniv et al., hereinafter El-Yaniv, in view of “TensorQuant - A Simulation Toolbox for Deep Neural Network Quantization” (Oct. 2017) to Loroch et al., hereinafter Loroch, U.S. Patent Application Pub. No. 2014/0073892 to Randloev et al., hereinafter Randloev, and “Improving the speed of neural networks on CPUs” (2011) to Vanhoucke et al., hereinafter Vanhoucke.

Regarding claim 1, El-Yaniv teaches: A method, comprising: selecting a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations; (¶ [0057]: “a neural network model is constructed and/or received”, “plurality of neurons each having a quantized activation function”, and, “The neurons are arranged in a plurality of layers and are connected by connections. Each connection has a quantized connection weight function”.)
associating a cost function with the… neural network model, (¶ [0065]: C denotes a cost function for a mini-batch gradient descent)
training the… neural network model to generate quantized weights for a layer (¶ [0064] and Fig. 1, numeral 103: “Now, as shown at 103, the QNN, for instance the BNN is trained based on the output of the quantized functions (both quantized activation functions and the quantized connection weight functions)”)
However, El-Yaniv does not explicitly teach: modifying the neural network model by inserting a plurality of quantization layers within the neural network model;
wherein the cost function includes a first coefficient corresponding to a first regularization term, and wherein an initial value of the first coefficient is pre-defined; and
training…by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, further including optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor.
But Loroch teaches: modifying the neural network model by inserting a plurality of quantization layers within the neural network model; (P. 4, col. 2, paragraphs captioned “Changes to the User’s Topology File” and “Assigning Layers for Quantization”. Examiner interprets assigning a layer for quantization to be functionally equivalent to inserting a quantization layer.)
Loroch is in the same field of endeavor as El-Yaniv, namely, deep neural network quantization. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Loroch’s system into El-Yaniv’s system by adding quantization layers with a motivation of achieving a significant reduction of the communication bottleneck in distributed DNN training and faster neural network implementations on hardware accelerators like FPGAs. (Loroch, Abstract)
Further, Randloev teaches: wherein the cost function includes a first coefficient corresponding to a first regularization term, (Randloev teaches the following cost function $f_K^{\lambda}$, where λ is a first coefficient corresponding to a first regularization term $\|f\|_H^2$)


and wherein an initial value of the first coefficient is pre-defined; and (Randloev teaches in ¶ [0102] an initial parameter $\lambda_0$)
training…by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, (Randloev teaches in ¶ [0031]: “The approximate optimal magnitude of the regularization parameter may… be determined by starting with an unacceptably small value of the regularization parameter and then gradually increase this by e.g. an exponential factor over a certain range in a geometric sequence.”)
Randloev is in the same field of endeavor as El-Yaniv and Loroch, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Randloev’s system into the combination of El-Yaniv and Loroch’s system by pre-defining a first coefficient corresponding to a first regularization term and increasing its value, with a motivation to find the best shift for the solution. (Randloev teaches in ¶ [0031], col. 2: “…by adding a small shift. If the shift is too small then the kernel will remain nearly ill-conditioned and for numerical reasons a solution will not be found. If on the other hand the shift is too large then the solution will be altered by an unacceptable amount thus injecting error.”)
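The search quoted from Randloev ¶ [0031] (start from a small regularization value, increase it by a constant factor in a geometric sequence, and treat it as a shift added to an ill-conditioned kernel) can be sketched as follows. This is a minimal illustration under stated assumptions: a one-dimensional regularized least-squares fit stands in for Randloev’s kernel problem, and the names `lambda_schedule` and `ridge_weight`, the starting value, the factor and the upper limit are hypothetical choices, not taken from either reference.

```python
from typing import Iterator, Sequence


def lambda_schedule(lam0: float, factor: float, lam_max: float) -> Iterator[float]:
    """Yield lam0, lam0*factor, lam0*factor**2, ... while the value stays <= lam_max."""
    if lam0 <= 0 or factor <= 1:
        raise ValueError("need lam0 > 0 and factor > 1 for an increasing geometric sequence")
    lam = lam0
    while lam <= lam_max:
        yield lam
        lam *= factor


def ridge_weight(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Minimize sum((y - w*x)**2) + lam * w**2 for a single weight w.

    The closed form is w = sum(x*y) / (sum(x*x) + lam): lam is the 'shift'
    added to the (here 1x1) kernel, so a larger lam pulls w toward zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
for lam in lambda_schedule(0.125, 2.0, 2.0):
    print(f"lambda={lam:<6} w={ridge_weight(xs, ys, lam):.4f}")
```

With these data the fitted weight is w = 28 / (14 + λ): it equals 2.0 without regularization and shrinks a little more at each step of the geometric sequence, and the loop ends once λ would exceed the upper limit.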
Further, Vanhoucke teaches: further including optimizing a weight scaling factor for the quantized weights and (Vanhoucke teaches on P. 5, §4.1, para. 2: “Weights are scaled by taking their maximum magnitude in each layer and normalizing them to fall in the [−128, 127] range.” Examiner interprets optimizing as a reduction to 8 bits. A weight scaling factor is the conversion factor to 8 bits. The quantized weights are the output of this 8-bit quantization.)
[optimizing] an activation scaling factor for quantized activations, and (Vanhoucke teaches on P. 5, §4.1, para. 1: “we used 8-bit quantization to convert activations into unsigned char”. Examiner interprets optimizing as a reduction to 8 bits. An activation scaling factor is the conversion factor to 8 bits.)
wherein the quantized weights are quantized using the optimized weight scaling factor. (The optimized weight scaling factor is interpreted as a quantization conversion factor to 8 bits.)
Vanhoucke is in the field of endeavor of quantizing neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Vanhoucke’s system into the combination of El-Yaniv, Loroch, and Randloev’s system by quantizing the activations and weights to 8-bit values, with a motivation to reduce the computational burden of neural network computations. (Vanhoucke, Abstract)
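Under the Examiner’s reading of Vanhoucke §4.1, the weight scaling factor is the per-layer factor that maps the largest weight magnitude onto the signed 8-bit range, and the activation scaling factor maps activations onto the unsigned 8-bit range. The sketch below illustrates that reading only. It assumes activations already lie in [0, 1] (e.g., logistic outputs) and uses round-to-nearest with clamping; the function names and these choices are assumptions, not Vanhoucke’s code.

```python
from typing import List, Sequence, Tuple


def quantize_weights_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Quantize one layer's weights to signed 8-bit codes.

    The weight scaling factor maps the layer's largest magnitude onto 127, so
    every code falls in [-128, 127]; a weight is recovered approximately as
    code / scale.
    """
    max_mag = max(abs(w) for w in weights)
    if max_mag == 0.0:
        return [0] * len(weights), 1.0  # all-zero layer: any scale works
    scale = 127.0 / max_mag
    # round() rounds halves to the nearest even integer; the clamp is a guard.
    codes = [max(-128, min(127, round(w * scale))) for w in weights]
    return codes, scale


def quantize_activations_uint8(activations: Sequence[float]) -> List[int]:
    """Quantize activations assumed to lie in [0, 1] to unsigned char codes 0..255."""
    scale = 255.0  # activation scaling factor under the [0, 1] assumption
    return [max(0, min(255, round(a * scale))) for a in activations]


codes, scale = quantize_weights_int8([0.5, -1.0, 0.25])
print(codes, scale)
print(quantize_activations_uint8([0.0, 0.5, 1.0]))
```

For the weights 0.5, -1.0 and 0.25, the sketch yields a scaling factor of 127.0 and codes 64, -127 and 32 (63.5 rounds to the even integer 64); activations 0.0, 0.5 and 1.0 map to 0, 128 and 255.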

Regarding claim 3, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: The method of claim 1,  
Further, Loroch teaches: further comprising inserting each quantization layer of the plurality of quantization layers after each activation output in each layer within the neural network model. (P. 4, col. 2, paragraphs captioned “Changes to the User’s Topology File” and “Assigning Layers for Quantization”. Assigning a layer for quantization amounts to inserting a quantization layer.)
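The placement recited in claim 3 (a quantization layer after each activation output) can be pictured with a model represented as an ordered list of layer names. This sketch is hypothetical: the list representation, the `is_activation` predicate and the `quant_` naming are assumptions for illustration and are not Loroch’s TensorQuant interface.

```python
from typing import Callable, List


def insert_quantization_layers(layers: List[str], is_activation: Callable[[str], bool]) -> List[str]:
    """Return a new layer list with a quantization layer right after every activation."""
    result: List[str] = []
    for name in layers:
        result.append(name)
        if is_activation(name):
            result.append(f"quant_{name}")  # quantizes this activation's output
    return result


model = ["conv1", "relu1", "conv2", "relu2", "fc"]
print(insert_quantization_layers(model, lambda n: n.startswith("relu")))
```

The original list is left unchanged; layers that are not activations, such as `fc` here, receive no quantization layer.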

Regarding claim 6, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: The method of claim 1,  
Further, El-Yaniv teaches: wherein training the neural network comprises: updating the weights by a stochastic gradient descent method; (El-Yaniv teaches in ¶ 5: “computing a plurality of weight gradients for backpropagation sub-processes”.)
updating the weight scaling factor by the stochastic gradient descent method; (Updating the weights by stochastic gradient descent updates the scaling factor for the respective weights)
updating the activation scaling factor by the stochastic gradient descent method; (El-Yaniv teaches in ¶ 5: “each the neuron gradient is of an output of a respective the quantized activation function in one layer of the plurality of layers with respect to an input of the respective quantized activation function”)
if the weight scaling factor and the activation scaling factor are of a power of two,… (El-Yaniv’s quantized weights and quantized activations are interpreted as powers of two, as taught by “optionally binary” in ¶ 31.)
terminating the training if either the regularization coefficient is greater than a pre-determined constant or a number of iterations of the method is greater than a predetermined limit. (El-Yaniv ¶ 5: “the respective neuron gradient is set as a positive constant value”)
However, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke as combined above does not explicitly teach: 
…including additional gradients of the stochastic descent method;  
updating the regularization coefficients by the stochastic gradient descent method; and 
But Randloev teaches: 
updating the regularization coefficients* by the stochastic gradient descent method; and (*Coefficient. Randloev teaches a method for tuning the regularization coefficient (“parameter”) λ in ¶ 99-100. The parameter λ is shown to be a regularization coefficient by the first equation in ¶ 98.)
…including additional gradients of the stochastic descent method; (The scaling factors are already powers of two from El-Yaniv’s system. In combination with El-Yaniv’s system, the regularization parameter is interpreted as an additional gradient.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Randloev’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by tuning Randloev’s regularization parameter using El-Yaniv’s method of stochastic gradient descent, with a motivation to achieve good performance for the learning algorithm. (Randloev ¶ [0099]: “To achieve a good performance, the regularization parameter λ should be tuned properly.”)
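Two conditions recited in claim 6 reduce to simple predicates: whether a scaling factor is a power of two, and whether training should stop because the regularization coefficient exceeds a constant or the iteration count exceeds a limit. The following minimal sketch assumes scalar scaling factors. The function names are hypothetical, and the loop multiplies the coefficient by a fixed factor only as a stand-in for the stochastic gradient descent updates the claim recites.

```python
import math
from typing import Tuple


def is_power_of_two(x: float) -> bool:
    """True for positive powers of two, including fractional ones such as 0.25."""
    if x <= 0 or not math.isfinite(x):
        return False
    mantissa, _ = math.frexp(x)  # x == mantissa * 2**exp with 0.5 <= mantissa < 1
    return mantissa == 0.5


def should_terminate(reg_coeff: float, reg_limit: float, iteration: int, max_iterations: int) -> bool:
    """Stop if the regularization coefficient or the iteration count passes its limit."""
    return reg_coeff > reg_limit or iteration > max_iterations


def run(reg_coeff: float, growth: float, reg_limit: float, max_iterations: int) -> Tuple[float, int]:
    iteration = 0
    while not should_terminate(reg_coeff, reg_limit, iteration, max_iterations):
        # Placeholder: the SGD updates of the weights, scaling factors and
        # regularization coefficient would go here.
        reg_coeff *= growth
        iteration += 1
    return reg_coeff, iteration


print(is_power_of_two(0.25), is_power_of_two(3.0))
print(run(0.125, 2.0, 1.0, 100))
```

With the coefficient starting at 0.125, doubling on each pass and a limit of 1.0, the loop stops after four passes with the coefficient at 2.0; if the coefficient never grows, the iteration limit ends the loop instead.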

Regarding claim 12, El-Yaniv teaches: An apparatus, comprising: a selector configured to select a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations; (¶ [0057]: “a neural network model is constructed and/or received”, “plurality of neurons each having a quantized activation function”, and, “The neurons are arranged in a plurality of layers and are connected by connections. Each connection has a quantized connection weight function”.)
an association device configured to associate a cost function with the…neural network model, (¶ [0065]: C denotes a cost function for a mini-batch gradient descent)
a training device configured to train the… neural network model to generate quantized weights for a layer (¶ [0064] and Fig. 1, numeral 103: “Now, as shown at 103, the QNN, for instance the BNN is trained based on the output of the quantized functions (both quantized activation functions and the quantized connection weight functions)”)
However, El-Yaniv does not explicitly teach: an insertion device configured to modify the neural network model by inserting a plurality of quantization layers within the neural network model; 
wherein the cost function includes a first coefficient corresponding to a first regularization term, and wherein an initial value of the first coefficient is pre-defined; and 
train… by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, and optimize a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, wherein the quantized weights are quantized using the optimized weight scaling factor.
But Loroch teaches: an insertion device configured to modify the neural network model by inserting a plurality of quantization layers within the neural network model; (P. 4, col. 2, paragraphs captioned “Changes to the User’s Topology File” and “Assigning Layers for Quantization”. Assigning a layer for quantization amounts to inserting a quantization layer.)
Loroch is in the same field of endeavor as El-Yaniv, namely, deep neural network quantization. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Loroch’s system into El-Yaniv’s system by adding quantization layers with a motivation of achieving a significant reduction of the communication bottleneck in distributed DNN training and faster neural network implementations on hardware accelerators like FPGAs. (Loroch, Abstract)
Further, Randloev teaches: wherein the cost function includes a first coefficient corresponding to a first regularization term, (Randloev teaches the following cost function $f_K^{\lambda}$, where λ is a first coefficient corresponding to a first regularization term $\|f\|_H^2$)


and wherein an initial value of the first coefficient is pre-defined; and (Randloev teaches in ¶ [0102] an initial parameter $\lambda_0$)
train… by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, (Randloev teaches in ¶ [0031]: “The approximate optimal magnitude of the regularization parameter may… be determined by starting with an unacceptably small value of the regularization parameter and then gradually increase this by e.g. an exponential factor over a certain range in a geometric sequence.”)
Randloev is in the same field of endeavor as El-Yaniv and Loroch, namely, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Randloev’s system into the combination of El-Yaniv and Loroch’s system by pre-defining a first coefficient corresponding to a first regularization term and increasing its value, with a motivation to find the best shift for the solution. (Randloev teaches in ¶ [0031], col. 2: “…by adding a small shift. If the shift is too small then the kernel will remain nearly ill-conditioned and for numerical reasons a solution will not be found. If on the other hand the shift is too large then the solution will be altered by an unacceptable amount thus injecting error.”)
Further, Vanhoucke teaches: and optimize a weight scaling factor for the quantized weights and (Vanhoucke teaches on P. 5, §4.1, para. 2: “Weights are scaled by taking their maximum magnitude in each layer and normalizing them to fall in the [−128, 127] range.” Examiner interprets optimizing as a reduction to 8 bits. A weight scaling factor is the conversion factor to 8 bits. The quantized weights are the output of this 8-bit quantization.)
[optimize] an activation scaling factor for quantized activations, (Vanhoucke teaches on P. 5, §4.1, para. 1: “we used 8-bit quantization to convert activations into unsigned char”. Examiner interprets optimizing as a reduction to 8 bits. An activation scaling factor is the conversion factor to 8 bits.)
wherein the quantized weights are quantized using the optimized weight scaling factor. (The optimized weight scaling factor is interpreted as a quantization conversion factor to 8 bits.)
Vanhoucke is in the field of endeavor of quantizing neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Vanhoucke’s system into the combination of El-Yaniv, Loroch, and Randloev’s system by quantizing the activations and weights to 8-bit values, with a motivation to reduce the computational burden of neural network computations. (Vanhoucke, Abstract)

Regarding claim 14, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: The apparatus of claim 12, 
Further, Loroch teaches: wherein the insertion device is further comprised to insert each quantization layer of the plurality of quantization layers after each activation output in each layer within the neural network model. (P. 4, col. 2, paragraphs captioned “Changes to the User’s Topology File” and “Assigning Layers for Quantization”. Assigning a layer for quantization amounts to inserting a quantization layer.)

Regarding claim 17, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: The apparatus of claim 12, 
Further, El-Yaniv teaches: wherein the training device is further comprised to: update the weights by a stochastic gradient descent method; (El-Yaniv teaches in ¶ 5: “computing a plurality of weight gradients for backpropagation sub-processes”.)
update the weight scaling factor by the stochastic gradient descent method; (Updating the weights by stochastic gradient descent updates the scaling factor for the respective weights)
update the activation scaling factor by the stochastic gradient descent method; (El-Yaniv teaches in ¶ 19: “each the neuron gradient is of an output of a respective the quantized activation function in one layer of the plurality of layers with respect to an input of the respective quantized activation function”)
if the weight scaling factor and the activation scaling factor are of a power of two,… (El-Yaniv’s quantized weights and activations are interpreted as powers of two, as taught by “optionally binary” in ¶ 33.)
terminate the training if either the regularization coefficient is greater than a pre-determined constant or a number of iterations is greater than a predetermined limit. (El-Yaniv ¶ 19: “the respective neuron gradient is set as a positive constant value”)
However, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke as combined above does not explicitly teach: 
… include additional gradients of the stochastic descent method;
update the regularization coefficients by the stochastic gradient descent method; and 
But Randloev teaches: 
updating the regularization coefficients* by the stochastic gradient descent method; and (*Interpreted as the singular “coefficient.” Randloev teaches a method for tuning the regularization coefficient, which Randloev calls the regularization “parameter” λ, in ¶¶ 99-100. The first equation in ¶ 98 shows that λ is a regularization coefficient.)
… include additional gradients of the stochastic descent method; (The scaling factors are already powers of two in El-Yaniv’s system. In combination with El-Yaniv’s system, the regularization parameter is interpreted as an additional gradient.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Randloev’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by tuning the regularization coefficient, with a motivation to achieve good performance for the learning algorithm. (Randloev ¶ [0099]: “To achieve a good performance, the regularization parameter λ should be tuned properly.”)
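As an illustration of why power-of-two scaling factors are convenient in fixed-point arithmetic, the following Python sketch rounds a positive scaling factor to the nearest power of two in the log2 domain and then applies it to an integer with a bit shift. Rounding in the log2 domain is an assumption made for this sketch; none of the cited references is relied upon for this particular rounding rule, and the function names are hypothetical.

```python
import math

def nearest_power_of_two(s):
    """Round a positive scaling factor s to 2**k, with k = round(log2(s))."""
    if s <= 0:
        raise ValueError("scaling factor must be positive")
    k = round(math.log2(s))
    return 2.0 ** k, k

def scale_by_power_of_two(x_int, k):
    """Multiply integer x_int by 2**k using shifts only.

    For k < 0 this is an arithmetic right shift, which floors toward negative infinity.
    """
    return x_int << k if k >= 0 else x_int >> (-k)

s_pow2, k = nearest_power_of_two(0.3)  # log2(0.3) is about -1.74, so k = -2 and s_pow2 = 0.25
```

Because the scale is 2**k, multiplying a fixed-point value by it needs no hardware multiplier.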

Claims 2 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over El-Yaniv in view of Loroch, Randloev, and Vanhoucke, and further in view of U.S. Patent Application Pub. No. 20180349758 to Pan et al., hereinafter Pan.

Regarding claim 2, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: The method of claim 1, 
Further, Vanhoucke teaches: further comprising optimizing the weight scaling factor and the activation scaling factor (The 8-bit quantization of these scaling factors is interpreted as optimizing)
However, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke does not explicitly teach: based on minimizing a minimum square quantization error (MSQE).
	But Pan teaches: based on minimizing a minimum square quantization error (MSQE) (interpreted as mean square quantization error. Pan teaches a mean square quantization error in equation 4, reproduced below.)

[Image: Pan, equation 4]

	Pan is in the field of endeavor of quantizing neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Pan’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system to optimize the scaling factors based on Pan’s teaching of minimizing MSQE, with a motivation to achieve uniform quantization (Pan ¶ [0040]: “Manner I [which starts at Pan ¶ 37] is an example of uniform quantization methods”).
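For illustration only, the following Python sketch shows one way a scaling factor (quantization step size) can be chosen by minimizing a mean square quantization error, the interpretation adopted above for “MSQE.” It assumes a symmetric uniform quantizer with a given bit width and a simple grid search over candidate step sizes. It is not a reproduction of Pan’s equation 4 or of any procedure in the cited references, and all names are hypothetical.

```python
import numpy as np

def msqe(x, scale, bits=8):
    """Mean square quantization error of a symmetric uniform quantizer with step `scale`."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return float(np.mean((x - scale * q) ** 2))

def msqe_optimal_scale(x, bits=8, num_candidates=200):
    """Grid-search the step size that minimizes MSQE.

    The largest candidate maps max|x| to the largest code (no clipping); smaller
    candidates trade clipping error for finer resolution.
    """
    qmax = 2 ** (bits - 1) - 1
    no_clip_scale = float(np.max(np.abs(x))) / qmax
    if no_clip_scale == 0.0:
        return 1.0  # all-zero input: any scale gives zero error
    candidates = no_clip_scale * np.linspace(0.05, 1.0, num_candidates)
    errors = [msqe(x, s, bits) for s in candidates]
    return float(candidates[int(np.argmin(errors))])

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
s_opt = msqe_optimal_scale(x, bits=4)
```

Because the no-clipping step size is one of the candidates, the selected step size is never worse, in MSQE terms, than simply scaling by the maximum magnitude.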

Regarding claim 13, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: The apparatus of claim 12,
Further, Vanhoucke teaches: further the training device is further configured to optimize the weight scaling factor and the activation scaling factor (The 8-bit quantization of these scaling factors is interpreted as optimizing.)
However, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke does not explicitly teach: based on minimizing a minimum square quantization error (MSQE).
But Pan teaches: based on minimizing a minimum square quantization error (MSQE). (Interpreted as mean square quantization error. Pan teaches a mean square quantization error in equation 4, reproduced below.)

[Image: Pan, equation 4]

	Pan is in the field of endeavor of quantizing neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Pan’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system to optimize the scaling factors based on Pan’s teaching of minimizing MSQE, with a motivation to achieve uniform quantization (Pan ¶ [0040]: “Manner I [which starts at Pan ¶ 37] is an example of uniform quantization methods”).

Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over El-Yaniv in view of Loroch, Randloev, and Vanhoucke, and further in view of “Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters” (Nov. 2016) to Li et al., hereinafter Li.
Regarding claim 4, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: a cost function
Further, El-Yaniv teaches: based on the weight scaling factor and the activation scaling factor being power-of-two numbers (the combination’s weight scaling factor and activation scaling factor are interpreted as being power-of-two numbers, as taught by “optionally binary” in El-Yaniv ¶ 31.)
However, El-Yaniv in view of Loroch, Randloev, and Vanhoucke does not explicitly teach: a second coefficient corresponding to a second regularization term 
But Li teaches: a second coefficient corresponding to a second regularization term (Li p. 323 equation 1 below teaches an elastic net. The first term is an L2 regularization term corresponding to Randloev’s L2 regularization term. The second term is an L1 regularization term, corresponding to claim 4’s second regularization term with a corresponding second coefficient α.)

[Image: Li, p. 323, equation 1]

Li is in the field of endeavor of optimizing neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Li’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by including an L1 regularization term, with a motivation to avoid overfitting.
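For illustration only, the following Python sketch shows a cost function with two regularization terms, each weighted by its own coefficient: an L2 term with a first coefficient and an L1 term with a second coefficient, mirroring the mapping of Li’s elastic net discussed above. The exact parameterization of Li’s equation 1 is not reproduced; the two-coefficient form and the names below are assumptions. A subgradient and one stochastic gradient descent step are included because the claims recite updating by stochastic gradient descent.

```python
import numpy as np

def regularized_cost(data_loss, w, lam1, lam2):
    """Data loss + lam1 * ||w||_2^2 (first term) + lam2 * ||w||_1 (second term)."""
    return data_loss + lam1 * float(np.sum(w ** 2)) + lam2 * float(np.sum(np.abs(w)))

def penalty_subgradient(w, lam1, lam2):
    """Subgradient of the two penalty terms; np.sign returns 0 at w == 0."""
    return 2.0 * lam1 * w + lam2 * np.sign(w)

def sgd_step(w, data_grad, lam1, lam2, lr=0.1):
    """One stochastic gradient descent step on the regularized cost."""
    return w - lr * (data_grad + penalty_subgradient(w, lam1, lam2))

w = np.array([1.0, -2.0])
cost = regularized_cost(0.5, w, lam1=0.1, lam2=0.01)  # 0.5 + 0.1*5 + 0.01*3 = 1.03
```

The L1 term pushes small weights toward exactly zero, while the L2 term shrinks all weights proportionally.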

Regarding claim 15, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: a cost function 
Further, El-Yaniv teaches: based on the weight scaling factor and the activation scaling factor being power-of-two numbers (the combination’s weight scaling factor and activation scaling factor are interpreted as being power-of-two numbers, as taught by “optionally binary” in El-Yaniv ¶ 31.)
However, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke does not explicitly teach: a second coefficient corresponding to a second regularization term 
But Li teaches: a second coefficient corresponding to a second regularization term (Li p. 323 equation 1 below teaches an elastic net. The first term is an L2 regularization term corresponding to Randloev’s L2 regularization term. The second term is an L1 regularization term, corresponding to claim 15’s second regularization term with a corresponding second coefficient α.)

[Image: Li, p. 323, equation 1]

Li is in the field of endeavor of optimizing neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Li’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by including an L1 regularization term, with a motivation to avoid overfitting.

Claims 5, 7-11, 16, and 18-22 are rejected under 35 U.S.C. 103 as being unpatentable over El-Yaniv in view of Loroch, Randloev, and Vanhoucke, and further in view of U.S. Patent Application Pub. No. 20170154632 to Sung et al., hereinafter Sung.

Regarding claim 5, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: The method of claim 1,
Further, El-Yaniv teaches: wherein the fixed-point neural network includes a plurality of convolutional layers, (El-Yaniv teaches a convolutional neural network with a plurality of layers in ¶ [0097], and teaches in ¶ [0098]: “Continuous-valued inputs may be handed as fixed point numbers with m bits of precision”) wherein each of the plurality of convolutional layers includes a convolution operation configured to receive feature maps and the quantized weights, (Taught by El-Yaniv ¶ [0047] first sentence, and the convolutional neural network in ¶ [0050])
El-Yaniv teaches, “Continuous-valued inputs may be handed as fixed point numbers with m bits of precision” in ¶ [0098], but is silent on applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network.
Vanhoucke teaches: further comprising applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network, (Vanhoucke teaches on P. 5, § 4: Fixed-point implementation: “There are several properties of neural networks that make them prime candidates for a fixed-point implementation”)
It would have been obvious to one of ordinary skill in the art before the effective filing date to have incorporated the teachings of Vanhoucke’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by using fixed-point computations with a motivation to represent activations “as unsigned integers without much concern for scaling” (Vanhoucke p. 5, § 4).
However, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke does not explicitly teach: a bias addition operation configured to receive an output of the convolution operation and biases, a first multiplying operation configured to receive an output of the bias addition operation and the first scale factor, an activation operation configured to receive an output of the first multiplier operation, a second multiplying operation configured to receive an output of the activation operation and the second scale factor, and a quantization operation configured to receive an output of the second multiplying operation.
But Sung teaches: a bias addition operation configured to receive an output of the convolution operation and biases, (Sung Fig. 9B reproduced/annotated below, first summation)
a first multiplying operation configured to receive an output of the bias addition operation and the first scale factor, (Sung Fig. 9B, first quantizer 931)
an activation operation configured to receive an output of the first multiplier operation, (Intra-frame prediction 932. Sung ¶ 52 recites a code-excited linear prediction, which can be interpreted as an activation predicting future values linearly based on past values; see ¶ 137: “According to Fig. 9B... is used”)
a second multiplying operation configured to receive an output of the activation operation and the second scale factor, and (Sung Fig. 9B, first quantizer 931)
a quantization operation configured to receive an output of the second multiplying operation. (Sung Fig. 9B, second quantizer 933)

Sung, Figure 9B
Sung is in the field of endeavor of quantizing error vectors. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Sung’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by performing the above operations, with a motivation to achieve quantization by the first and second multiplication operations in 931 “irrespective of whether the second quantizer 933 exists or not” (Sung ¶ [0158]: “The above intra-frame prediction procedure of each embodiment… may be applied irrespective of whether the second quantizer 933 exists or not.”)
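For illustration only, the following Python sketch orders the operations recited in claim 5 (convolution, bias addition, first scale-factor multiplication, activation, second scale-factor multiplication, quantization) for a single 1-D channel. The integer 1-D correlation standing in for the convolution, the sigmoid activation, the treatment of the bias as already expressed in accumulator units, and the unsigned output range are assumptions made for the sketch, not features taken from Sung Fig. 9B or the other cited references. The function name is hypothetical.

```python
import numpy as np

def fixed_point_layer_claim5(x_q, w_q, bias, first_scale, second_scale, bits=8):
    """conv -> bias add -> x first scale -> activation -> x second scale -> quantize."""
    # Convolution operation: integer 1-D correlation of feature-map codes with weight codes.
    acc = np.correlate(x_q.astype(np.int64), w_q.astype(np.int64), mode="valid")
    acc = acc + bias                  # bias addition (bias assumed in accumulator units)
    y = acc * first_scale             # first multiplying operation: back to real-valued units
    y = 1.0 / (1.0 + np.exp(-y))      # activation operation (sigmoid, a non-linear choice)
    y = y * second_scale              # second multiplying operation: into output code units
    qmax = 2 ** bits - 1
    return np.clip(np.round(y), 0, qmax).astype(np.int64)  # quantization operation (unsigned)

x_q = np.array([1, 2, 3, 4])          # hypothetical input feature-map codes
w_q = np.array([1, -1])               # hypothetical quantized weights
out = fixed_point_layer_claim5(x_q, w_q, bias=0, first_scale=0.0, second_scale=254.0)
```

With first_scale set to 0 every pre-activation is 0, the sigmoid returns 0.5, and each output code is 0.5 × 254 = 127.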

Regarding claim 7, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung teaches: The method of claim 5, 
Further, Vanhoucke teaches: wherein the weights are fixed-point weights. (Vanhoucke teaches on P. 5, § 4: Fixed-point implementation: “There are several properties of neural networks that make them prime candidates for a fixed-point implementation”)

Regarding claim 8, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung teaches: The method of claim 5, 
However, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung does not explicitly teach: wherein the first scale factor is a product of the weight scaling factor and the activation scaling factor.
But Vanhoucke teaches: wherein the first scale factor is a product of the weight scaling factor and the activation scaling factor. (Vanhoucke teaches an 8-bit weight scaling factor on P. 5, §4.1, para. 2: “Weights are scaled by taking their maximum magnitude in each layer and normalizing them to fall in the [−128, 127] range.” Vanhoucke also teaches an 8-bit activation scaling factor on P. 5, §4.1, para. 2: “…which a fast, approximate sigmoid implementation then maps to an 8-bit probability.” The first scale factor is interpreted as the sigmoid output, which is a product of the weight and activation scaling factors.)
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Vanhoucke’s system into the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung’s system by multiplying the weight scaling factor and the activation scaling factor, with a motivation to represent activations “as unsigned integers without much concern for scaling” (Vanhoucke p. 5, § 4).
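The arithmetic behind treating the first scale factor as the product of the weight and activation scaling factors can be illustrated with a short Python check. If each weight is approximately s_w·q_w and each activation is approximately s_a·q_a, then a dot product of the dequantized values equals s_w·s_a times the integer dot product of the codes. A single multiplication by s_w·s_a after integer accumulation therefore recovers the real-valued result. The scaling factors and codes below are hypothetical values chosen for illustration, and the function name is likewise hypothetical.

```python
import numpy as np

def dequantized_dot(q_w, q_a, s_w, s_a):
    """Integer accumulation followed by one multiplication by the combined scale s_w * s_a."""
    int_acc = int(np.dot(q_w.astype(np.int64), q_a.astype(np.int64)))
    return (s_w * s_a) * int_acc

s_w, s_a = 0.02, 0.5                 # hypothetical weight and activation scaling factors
q_w = np.array([10, -5, 3])          # hypothetical int8 weight codes
q_a = np.array([2, 4, 6])            # hypothetical uint8 activation codes
float_result = float(np.dot(s_w * q_w, s_a * q_a))
fixed_result = dequantized_dot(q_w, q_a, s_w, s_a)
```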

Regarding claim 9, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung teaches: The method of claim 5, 
Further, Vanhoucke teaches: wherein the activation operation is a non-linear activation function. (Vanhoucke teaches a sigmoid activation operation, which is non-linear)

Regarding claim 10, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: The method of claim 1,
Further, El-Yaniv teaches: wherein the fixed-point neural network includes a plurality of convolutional layers, (El-Yaniv teaches a convolutional neural network with a plurality of layers in ¶ [0097], and teaches in ¶ [0098]: “Continuous-valued inputs may be handed as fixed point numbers with m bits of precision”) wherein each of the plurality of convolutional layers includes a convolution operation configured to receive feature maps and the quantized weights, (Taught by El-Yaniv ¶ [0047] first sentence, and the convolutional neural network in ¶ [0050])
El-Yaniv teaches, “Continuous-valued inputs may be handed as fixed point numbers with m bits of precision” in ¶ [0098], but is silent on applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network. 
Vanhoucke teaches: further comprising applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network, (Vanhoucke teaches on P. 5, § 4: Fixed-point implementation: “There are several properties of neural networks that make them prime candidates for a fixed-point implementation”)
It would have been obvious to one of ordinary skill in the art before the effective filing date to have incorporated the teachings of Vanhoucke’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by using fixed-point computations with a motivation to represent activations “as unsigned integers without much concern for scaling” (Vanhoucke p. 5, § 4).
However, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke does not explicitly teach: a bias addition operation configured to receive an output of the convolution operation and biases, a rectified linear unit (ReLU) activation operation configured to receive an output of the bias addition operation, a scale-factor multiplying operation configured to receive an output of the ReLU activation operation and a scale factor, and a quantization operation configured to receive an output of the scale-factor multiplying operation.
But Sung teaches: a bias addition operation configured to receive an output of the convolution operation and biases, (Sung Fig. 9B reproduced/annotated below, first summation)
a[n]… activation operation configured to receive an output of the bias addition operation, (Intra-frame prediction 932. Sung ¶ 52 recites a code-excited linear prediction, which can be interpreted as an activation predicting future values linearly based on past values; see ¶ 137: “According to... is used”)
a scale-factor multiplying operation configured to receive an output of the… activation operation and a scale factor, and (Sung Fig. 9B, first quantizer 931)
a quantization operation configured to receive an output of the scale-factor multiplying operation. (Sung Fig. 9B, second quantizer 933)

Sung, Figure 9B
Sung is in the field of endeavor of quantizing error vectors.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have incorporated the teachings of Sung’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by performing the above operations, with a motivation to quantize an error (Sung, Abstract).
Sung teaches all of the claim limitations in claim 10 from “a bias addition operation…” onward except for rectified linear unit (ReLU). But Loroch teaches: rectified linear unit (ReLU) (Loroch p. 2, end of § 1.1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Loroch’s system (i.e., a ReLU activation function) into the combination of El-Yaniv, Loroch, Randloev, Vanhoucke, and Sung’s system with a motivation to simplify computations in the presence of noise. (Loroch p. 2, end of § 1.1: “The argument in [18] is that the quantization noise introduced after every rounding step can be transformed into a single source of noise at the end of the layer, since all operations are linear, if the activation function is a rectifying linear unit (ReLU). In other words, it does not matter at which point the noise level is increased, thus rounding at the end is a sufficient approximation.”)
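For illustration only, the following Python sketch orders the operations recited in claim 10 (bias addition, ReLU, scale-factor multiplication, quantization) and checks a property of ReLU that is consistent with the Loroch passage quoted above. For any positive scale factor c, ReLU(c·x) = c·ReLU(x), so a positive scale-factor multiplication gives the same result whether it is applied before or after the ReLU. The function names, the real-valued accumulator input, and the unsigned 8-bit output range are assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fixed_point_layer_claim10(acc, bias, scale, bits=8):
    """Convolution output -> bias addition -> ReLU -> x scale factor -> unsigned quantization."""
    y = relu(acc + bias)
    y = y * scale
    return np.clip(np.round(y), 0, 2 ** bits - 1).astype(np.int64)

acc = np.array([-3.0, 1.0, 10.0])    # hypothetical convolution outputs
out = fixed_point_layer_claim10(acc, bias=1.0, scale=2.0)  # relu([-2, 2, 11]) * 2 = [0, 4, 22]
```

This positive homogeneity does not hold for a sigmoid, which is one reason the claim 5 ordering places a scale-factor multiplication on each side of the activation.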

Regarding claim 11, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung teaches: The method of claim 9, (interpreted as “The method of claim 10”)
However, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung does not explicitly teach: wherein the scale factor is a product of the weight scale factor and the quantization scale factor. 
But Vanhoucke teaches: wherein the scale factor is a product of the weight scale factor and the quantization scale factor. (Vanhoucke teaches an 8-bit weight scale factor on P. 5, §4.1, para. 2: “Weights are scaled by taking their maximum magnitude in each layer and normalizing them to fall in the [−128, 127] range.” Vanhoucke also teaches an 8-bit quantization scale factor of the activation function on P. 5, §4.1, para. 2: “…which a fast, approximate sigmoid implementation then maps to an 8-bit probability.” Passing these factors through the system’s multiplication operation in Sung Fig. 9B, 931, results in the scale factor as claimed.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Vanhoucke’s system into the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung’s system by quantizing the neural network, with a motivation to represent activations “as unsigned integers without much concern for scaling” (Vanhoucke p. 5, § 4).

Regarding claim 16, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: The apparatus of claim 12, 
Further, El-Yaniv teaches: wherein the neural network is a fixed-point neural network …, wherein the fixed-point neural network includes a plurality of convolutional layers, (El-Yaniv teaches a convolutional neural network with a plurality of layers in ¶ [0097], and teaches in ¶ [0098]: “Continuous-valued inputs may be handed as fixed point numbers with m bits of precision”) wherein each of the plurality of convolutional layers includes a convolution device configured to receive feature maps and the quantized weights, (Taught by El-Yaniv ¶ [0047] first sentence, and the convolutional neural network in ¶ [0050])
El-Yaniv teaches, “Continuous-valued inputs may be handed as fixed point numbers with m bits of precision” in ¶ [0098], but is silent on applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network.
Vanhoucke teaches: …to which the quantized weights, the weight scaling factor, and the activation scaling factor are applied (Vanhoucke teaches on P. 5, § 4: Fixed-point implementation: “There are several properties of neural networks that make them prime candidates for a fixed-point implementation”) 
It would have been obvious to one of ordinary skill in the art before the effective filing date to have incorporated the teachings of Vanhoucke’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by using fixed-point computations with a motivation to represent activations “as unsigned integers without much concern for scaling” (Vanhoucke p. 5, § 4).
However, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke does not explicitly teach: a bias addition device configured to receive an output of the convolution operation and biases, a first multiplier configured to receive an output of the bias addition device and the first scale factor, an activation device configured to receive an output of the first multiplier, a second multiplier configured to receive an output of the activation operation and the second scale factor, and  a quantization operation configured to receive an output of the second multiplier.
But Sung teaches: a bias addition device configured to receive an output of the convolution operation and biases, (Sung Fig. 9B reproduced/annotated below, first summation)
a first multiplier configured to receive an output of the bias addition device and the first scale factor, (Sung Fig. 9B, first quantizer 931)
an activation device configured to receive an output of the first multiplier, (Intra-frame prediction 932. Sung ¶ 52 recites a code-excited linear prediction, which can be interpreted as an activation predicting future values linearly based on past values; see ¶ 137: “According to Fig. 9B... is used”)
a second multiplier configured to receive an output of the activation operation and the second scale factor, and (Sung Fig. 9B, first quantizer 931)
a quantization operation configured to receive an output of the second multiplier. (Sung Fig. 9B, second quantizer 933)

Sung, Figure 9B
Sung is in the field of endeavor of quantizing error vectors. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have incorporated the teachings of Sung’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by performing the above operations, with a motivation to achieve quantization by the first and second multiplication operations in 931 “irrespective of whether the second quantizer 933 exists or not” (Sung ¶ [0158]: “The above intra-frame prediction procedure of each embodiment… may be applied irrespective of whether the second quantizer 933 exists or not.”)

Regarding claim 18, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung teaches: The apparatus of claim 16, 
Further, Vanhoucke teaches: wherein the weights are fixed-point weights. (Vanhoucke teaches on P. 5, § 4: Fixed-point implementation: “There are several properties of neural networks that make them prime candidates for a fixed-point implementation”)

Regarding claim 19, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung teaches: The apparatus of claim 16, 
However, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung does not explicitly teach: wherein the first scale factor is a product of the weight scaling factor and the activation scaling factor.
But Vanhoucke teaches: wherein the first scale factor is a product of the weight scaling factor and the activation scaling factor. (Vanhoucke teaches an 8-bit weight scaling factor on P. 5, §4.1, para. 2: “Weights are scaled by taking their maximum magnitude in each layer and normalizing them to fall in the [−128, 127] range.” Vanhoucke also teaches an 8-bit activation scaling factor on P. 5, §4.1, para. 2: “…which a fast, approximate sigmoid implementation then maps to an 8-bit probability.” The first scale factor is interpreted as the sigmoid output, which is a product of the weight and activation scaling factors.)
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Vanhoucke’s system into the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung’s system by multiplying the weight scaling factor and the activation scaling factor, with a motivation to represent activations “as unsigned integers without much concern for scaling” (Vanhoucke p. 5, § 4).

Regarding claim 20, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung teaches: The apparatus of claim 16, 
Further, Vanhoucke teaches: wherein the activation device is a non-linear activation device. (Vanhoucke teaches a sigmoid activation operation, which is non-linear)

Regarding claim 21, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke teaches: The apparatus of claim 12, 
Further, El-Yaniv teaches: wherein the neural network is a fixed-point neural network, … wherein the fixed-point neural network includes a plurality of convolutional devices, (El-Yaniv teaches a convolutional neural network with a plurality of layers in ¶ [0097], and teaches in ¶ [0098]: “Continuous-valued inputs may be handed as fixed point numbers with m bits of precision”) 
wherein each of the plurality of convolutional devices is configured to receive feature maps and the quantized weights, (Taught by El-Yaniv ¶ [0047] first sentence, and the convolutional neural network in ¶ [0050])
El-Yaniv teaches, “Continuous-valued inputs may be handed as fixed point numbers with m bits of precision” in ¶ [0098], but is silent on applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network. 
Vanhoucke teaches: to which is applied the quantized weights, the weight scaling factor, and the activation scaling factor, (Vanhoucke teaches on P. 5, § 4: Fixed-point implementation: “There are several properties of neural networks that make them prime candidates for a fixed-point implementation”)
It would have been obvious to one of ordinary skill in the art before the effective filing date to have incorporated the teachings of Vanhoucke’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by using fixed-point computations with a motivation to represent activations “as unsigned integers without much concern for scaling” (Vanhoucke p. 5, § 4).
However, the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke does not explicitly teach: a bias addition device configured to receive an output of the convolution device and biases, a rectified linear unit (ReLU) activation device configured to receive an output of the bias addition device, a scale-factor multiplier configured to receive an output of the ReLU activation device and a scale factor, and a quantization device configured to receive an output of the scale-factor multiplier.
But Sung teaches: a bias addition device configured to receive an output of the convolution device and biases, (Sung Fig. 9B reproduced/annotated below, first summation)
a[n]… activation device configured to receive an output of the bias addition device, (Intra-frame prediction 932. Sung ¶ 52 recites a code-excited linear prediction, which can be interpreted as an activation predicting future values linearly based on past values; see ¶ 137: “According to... is used”)
a scale-factor multiplier configured to receive an output of the… activation device and a scale factor, and (Sung Fig. 9B, first quantizer 931)
a quantization device configured to receive an output of the scale-factor multiplier. (Sung Fig. 9B, second quantizer 933)

Sung, Figure 9B
Sung is in the field of endeavor of quantizing error vectors.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have incorporated the teachings of Sung’s system into the combination of El-Yaniv, Loroch, Randloev, and Vanhoucke’s system by performing the above operations, with a motivation to quantize an error (Sung, Abstract).
Sung teaches all of the claim limitations in claim 21 from “a bias addition device…” onward except for rectified linear unit (ReLU). But Loroch teaches: rectified linear unit (ReLU) (Loroch p. 2, end of § 1.1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Loroch’s system (i.e., a ReLU activation function) into the combination of El-Yaniv, Loroch, Randloev, Vanhoucke, and Sung’s system with a motivation to simplify computations in the presence of noise. (Loroch p. 2, end of § 1.1: “The argument in [18] is that the quantization noise introduced after every rounding step can be transformed into a single source of noise at the end of the layer, since all operations are linear, if the activation function is a rectifying linear unit (ReLU). In other words, it does not matter at which point the noise level is increased, thus rounding at the end is a sufficient approximation.”)

Regarding claim 22, the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung teaches: The apparatus of claim 20, (interpreted as “The apparatus of claim 21”) 
However, El-Yaniv in view of Loroch, Randloev, Vanhoucke and Sung does not explicitly teach: wherein the scale factor is a product of the weight scale factor and the quantization scale factor.
But Vanhoucke teaches: wherein the scale factor is a product of the weight scale factor and the quantization scale factor. (Vanhoucke teaches an 8-bit weight scale factor on P. 5, §4.1, para. 2: “Weights are scaled by taking their maximum magnitude in each layer and normalizing them to fall in the [−128, 127] range.” Vanhoucke also teaches an 8-bit quantization scale factor of the activation function on P. 5, §4.1, para. 2: “…which a fast, approximate sigmoid implementation then maps to an 8-bit probability.” Passing these factors through the system’s multiplication operation in Sung Fig. 9B, 931, results in the scale factor as claimed.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Vanhoucke’s system into the combination of El-Yaniv, Loroch, Randloev, Vanhoucke and Sung’s system by quantizing the neural network, with a motivation to represent activations “as unsigned integers without much concern for scaling” (Vanhoucke p. 5, § 4).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
“Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding” to Han et al. teaches quantized neural networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648.  The examiner can normally be reached on Monday - Friday, 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ASHER H. JABLON/Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122