DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to amendments and remarks filed on 01/04/2022. In the current amendments, claims 8 and 16 are cancelled, claims 1, 3, 6, 7, 11, 13-15, and 19-20 are amended, and claims 21 and 22 are added. Claims 1-7, 9-15, and 17-22 are pending and have been examined.
In response to amendments and remarks filed on 01/04/2022, the 35 U.S.C. 101 and 35 U.S.C. 112(b) rejections put forth in the previous Office Action have been withdrawn. 

Claim Interpretation
Claims 19-20 recite “computer readable storage medium.” Specification [0098] provides the following, “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire” (emphasis added). Therefore, “computer readable storage medium” in claims 19-20 has been interpreted as “non-transitory computer readable storage medium.”

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

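The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
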
Claims 17-18 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Each of claims 17-18 depends on claim 16, which has been cancelled. Therefore, claims 17-18 are in improper dependent form for failing to contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements. For examination purposes, claims 17-18 are interpreted as being dependent on claim 11.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 9-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over WANG et al. (US 2019/0050710 A1) in view of CHOI et al. (US 2019/0138882 A1).
Regarding Claim 1,
WANG et al. teaches A computer-implemented method, comprising: for a set of weights, determining, by a system operatively coupled to a processor, a quantization scale as a function of a bit precision level, wherein the quantization scale comprises quantization scale values (pg. 4 [0027] teaches computer-implemented and processor; Fig. 2 teaches a system in the memory coupled to the processor; pg. 8 [0066] teaches determining a quantization scale based on a quantization scale as a function of a bit precision level; pg. 9 [0074]: “In some embodiments, a first layer (e.g., i=2) of the plurality of layers in the reduced neural network model (e.g., model 112) has a first reduced bit-width (e.g., 4-bit) that is smaller than the original bit-width (e.g., 32-bit) of the first neural network model, a second layer (e.g., i=3) of the plurality of layers in the reduced neural network model (e.g., model 112) has a second reduced bit-width (e.g., 6-bit) that is smaller than the original bit-width of the first neural network model, and the first reduced bit-width is distinct from the second reduced bit-width in the reduced neural network model” teaches various quantization scale values for a set of weights; pg. 8 [0067]: “The result of the above process is the set of quantized weights Wopt,i with the optimal quantization bit-width(s) for each layer i, and the set of quantized bias bopt,i with the optimal quantization bit-width(s) for each layer i. The adaptive bit-width model 112 is thus obtained” teaches obtaining quantized weights based on quantization bit-width (scale); also see pg. 6 [0057] for 8-bit uniform quantization to the weight parameters);
...and applying, by the system, the quantization scale value to generate the quantized weights in one or more minibatches to facilitate training a deep learning system or producing an inference by the deep learning system (pg. 2 [0019]: “during training, integer (INT) weight regularization and 8-bit quantization techniques are applied to push the values of the full-precision parameters of the deep learning model 106 toward their corresponding integer values, and reduce the value ranges of the parameters such that they fall within the dynamic range of a predefined reduced maximum bit-width (e.g., 8 bits)” teaches quantizing weights based on a quantization scale value to facilitate training a deep learning model; pg. 2 [0020] teaches producing inference by the deep learning system; pg. 2 [0020] further teaches examples of minibatch of training data (selection of training data that is used to train the model); Fig. 2 teaches a system in the memory coupled to the processor).
WANG et al. does not appear to explicitly teach determining, by the system, a quantization scale value that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error; quantizing, by the system, weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value and an offset value.
However, CHOI et al. teaches determining, by the system, a quantization scale value that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights in accordance with a defined quantization criterion relating to the quantization error, which is the particular measure of mean square quantization error (MSQE) for weights; Figs. 1-2 teach system for weight quantization);
quantizing, by the system, weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value and an offset value (pg. 4 [0051]:

[Image: media_image1.png, excerpt of CHOI et al. pg. 4 [0051], including Equations (1) and (2)]
teaches using a quantization function (see Equation (1)) to generate quantized weights based on scaling factor for weights (quantization scale value) and a rounding function that includes an offset value (see Equation (2), offset = 0.5); pg. 1 [0006] teaches weights of at least a portion of a layer).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate determining, by the system, a quantization scale value that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error; quantizing, by the system, weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value and an offset value as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
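
For illustration only, the combination relied upon above (a weight scaling factor chosen to minimize the MSQE, with rounding that applies a 0.5 offset) can be sketched as follows. This is a minimal sketch and not a reproduction of Equations (1) and (2) of CHOI et al.; the function names, the signed clipping range, and the grid search over candidate scales are assumptions made here for readability (CHOI et al. is cited above as learning the scaling factor during training, not as searching a grid).

```python
import math


def round_half_away(x):
    """Round to the nearest integer using a 0.5 offset: sign(x) * floor(|x| + 0.5)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(weights, scale, bits):
    """Uniformly quantize weights onto a signed `bits`-bit grid with step `scale`.

    The clipping range [-2^(bits-1), 2^(bits-1) - 1] is an assumption of this sketch.
    """
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [scale * min(max(round_half_away(w / scale), lo), hi) for w in weights]


def msqe(weights, scale, bits):
    """Mean square quantization error between the weights and their quantized values."""
    quantized = quantize(weights, scale, bits)
    return sum((w - q) ** 2 for w, q in zip(weights, quantized)) / len(weights)


def best_scale(weights, bits, candidates):
    """Pick the candidate scale with the lowest MSQE (hypothetical grid search)."""
    return min(candidates, key=lambda s: msqe(weights, s, bits))
```

With the hypothetical weights `[0.5, -1.0, 1.5]`, 3 bits, and candidate scales `[0.25, 0.5, 1.0]`, the scale 0.5 represents every weight exactly, so it is the candidate that minimizes the MSQE.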

Regarding Claim 2,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 1.
WANG et al. further teaches wherein the quantizing the weights further comprises, for the bit precision level, at least one of symmetrically or uniformly quantizing the weights to generate the quantized weights based on the quantization scale value (pg. 6 [0057]: “When applying the 8-bit uniform quantization to the weight parameters, the weight parameters are still expressed in full-precision format, but the total number of such response levels are bound by 2^8 - 1, with half in the positive and half in the negative. In each iteration of the training phase, this quantization function is applied in each layer in the forward pass” teaches that the quantizing can further comprise, for a specific bit precision level, applying uniform quantization and symmetric quantization (“with half in the positive and half in the negative”) based on the 8-bit quantization (quantization scale value)).
CHOI et al. further teaches wherein rounding is utilized to generate the quantized weights (pg. 4 [0051] teaches rounding is utilized to generate the quantized weights).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein rounding is utilized to generate the quantized weights as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
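
For illustration only, the symmetric uniform quantization cited from WANG et al. pg. 6 [0057] (response levels bounded by 2^8 - 1, half positive and half negative) can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, and deriving the step size from the largest absolute weight is a choice made here, not something the cited passage specifies.

```python
import math


def symmetric_levels(bits):
    """Integer levels of a symmetric quantizer with 2^bits - 1 levels in total."""
    half = (2 ** bits - 1) // 2  # e.g. 127 positive and 127 negative levels for 8 bits
    return list(range(-half, half + 1))


def symmetric_uniform_quantize(weights, bits):
    """Quantize weights onto evenly spaced levels that are symmetric about zero.

    The step size is derived from the largest absolute weight (an assumption of
    this sketch). Results stay in floating point, mirroring the statement that
    the parameters remain expressed in full-precision format.
    """
    half = (2 ** bits - 1) // 2
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0.0 for _ in weights]
    step = max_abs / half
    quantized = []
    for w in weights:
        level = math.floor(w / step + 0.5)    # round to nearest with a 0.5 offset
        level = max(-half, min(half, level))  # keep within the symmetric range
        quantized.append(step * level)
    return quantized
```

For 8 bits this gives 255 levels running from -127 to 127, which is one reading of "bound by 2^8 - 1, with half in the positive and half in the negative."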

Regarding Claim 3,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 1.
CHOI et al. further teaches wherein the determining the quantization scale value comprises estimating the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a linear or non-linear function of a second statistical function of the weight value, and wherein the second statistical function is different from the first statistical function (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) and how that factor affects the mean square quantization error (MSQE), which corresponds to estimating the quantization scale value; pg. 4 [0053]:  
[Image: media_image2.png, excerpt of CHOI et al. pg. 4 [0053], including Equation (6)]
teaches applying a non-linear function of a first statistical function (cost function Equation (6)) of a weight value; pg. 5 [0059]:  

[Image: media_image3.png, excerpt of CHOI et al. pg. 5 [0059], including Equation (10)]
 teaches applying a non-linear function of a second statistical function (cost function Equation (10)) of a weight value; Equation (6) and Equation (10) are different).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the determining the quantization scale value comprises estimating the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a linear or non-linear function of a second statistical function of the weight value, and wherein the second statistical function is different from the first statistical function as taught by CHOI et al. to the disclosed invention of WANG et al.

One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 4,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 3.
CHOI et al. further teaches further comprising: determining, by the system, a first coefficient value associated with the first statistical function based on measurement data relating to weight quantization (pg. 4 [0053]:
[Image: media_image2.png, excerpt of CHOI et al. pg. 4 [0053], including Equation (6)]
teaches determining a regularization coefficient (first coefficient) associated with the first statistical function (Equation (6)) based on weight quantization data); and
determining, by the system, a second coefficient value associated with the second statistical function based on the measurement data (pg. 5 [0059]:  

[Image: media_image3.png, excerpt of CHOI et al. pg. 5 [0059], including Equation (10)]
 teaches determining a regularization coefficient (second coefficient) associated with the second statistical function (Equation (10)) based on weight quantization data).

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate further comprising: determining, by the system, a first coefficient value associated with the first statistical function based on measurement data relating to weight quantization; and determining, by the system, a second coefficient value associated with the second statistical function based on the measurement data as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 5,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 4.
CHOI et al. further teaches further comprising: updating, by the system, at least one of the first coefficient value or the second coefficient value to generate at least one of a third coefficient value associated with the first statistical function or a fourth coefficient value associated with the second statistical function based on additional measurement data relating to the weight quantization (pg. 4 [0053]:  
[Image: media_image2.png, excerpt of CHOI et al. pg. 4 [0053], including Equation (6)]
teaches increasing (updating) the regularization coefficient (first coefficient) associated with the first statistical function (Equation (6)) based on weight quantization data to generate a third coefficient (the increased coefficient); pg. 5 [0059]:  

[Image: media_image3.png, excerpt of CHOI et al. pg. 5 [0059], including Equation (10)]
 teaches learning (updating) the regularization coefficient (second coefficient) associated with the second statistical function (Equation (10)) based on weight quantization data to generate a fourth coefficient (the learned/updated coefficient)).

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate further comprising: updating, by the system, at least one of the first coefficient value or the second coefficient value to generate at least one of a third coefficient value associated with the first statistical function or a fourth coefficient value associated with the second statistical function based on additional measurement data relating to the weight quantization as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 6,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 1.
CHOI et al. further teaches wherein the quantization scale value is associated with a quantization error within a defined value distance of a minimum quantization error (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights; minimizing corresponds to determining a quantization error within a defined value distance of a minimum quantization error).

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the quantization scale value is associated with a quantization error within a defined value distance of a minimum quantization error as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 7,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 1.
CHOI et al. further teaches wherein the determining the quantization scale value comprises estimating the quantization scale value based on the defined quantization criterion, wherein, in accordance with the defined quantization criterion, the quantization scale value is within a defined value distance of a quantization scale value that is able to minimize the quantization error associated with quantizing the weights (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a learnable scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights wherein learning the scaling factor includes determining, in accordance with the MSQE (defined quantization criterion), that the 
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the determining the quantization scale value comprises estimating the quantization scale value based on the defined quantization criterion, wherein, in accordance with the defined quantization criterion, the quantization scale value is within a defined value distance of a quantization scale value that is able to minimize the quantization error associated with quantizing the weights as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 9,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 6.
WANG et al. further teaches further comprising: reducing, by the system, a bit precision of the weights of the set of weights based on the applying of the quantization scale value (pg. 9 [0074]: “In some embodiments, a first layer (e.g., i=2) of the plurality of layers in the reduced neural network model (e.g., model 112) has a first reduced bit-width (e.g., 4-bit) that is smaller than the original bit-width (e.g., 32-bit) of the first neural network model, a second layer (e.g., i=3) of the plurality of layers in the reduced neural network model (e.g., model 112) has a second reduced bit-width (e.g., 6-bit) that is smaller than the original bit-width of the first neural network model, and the first reduced bit-width is distinct from the second reduced bit-width in the reduced neural network model” teaches reducing the bit-width (bit precision) based on applying the quantization scale value (for example, reducing to 6-bit from the original bit-width); Fig. 2 teaches a system in the memory coupled to the processor).
Regarding Claim 10,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 6.
WANG et al. further teaches further comprising: reducing, by the system, at least one of a memory usage or a communication overhead utilized to transfer data between layers of the deep learning system based on the applying of the quantization scale value (pg. 8 [0072]: “The device reduces (504) a footprint (e.g., memory and computation cost) of the first neural network model on the computing device (e.g., both during storage, and, optionally, during deployment of the model) by using respective reduced bit-widths for storing the respective sets of parameters of different layers of the first neural network model, wherein: preferred values (e.g., optimal bit-width values that have been identified using the techniques described herein) of the respective reduced bit-widths are determined through multiple iterations of forward propagation through the first neural network model using a validation data set while each of two or more layers of the first neural network model is expressed with different degrees of quantization corresponding to different reduced bit-widths until a predefined information loss threshold (e.g., as measured by the Jensen-Shannon Divergence described herein) is met by respective response statistics of the two or more layers” teaches reducing footprint in the form of memory usage utilized in the deployment of the neural network; pg. 3 [0024]: “As shown in FIG. 1, once the reduced, adaptive bit-width model 112 is provided to a deployment platform 116 on the model deployment system 104, real-world input data or testing data 114 is fed to the reduced, adaptive bit-width model 112, and final prediction result 118 is generated by the reduced, adaptive bit-width” teaches the reduced adaptive bit-width model (neural network) based on the bit-width Fig. 2 teaches a system in the memory coupled to the processor).
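
For illustration only, the footprint reduction cited from WANG et al. pg. 8 [0072] and pg. 9 [0074] (per-layer reduced bit-widths such as 4-bit and 6-bit in place of an original 32-bit width) amounts to the arithmetic sketched below. The per-layer parameter counts are hypothetical and appear in neither reference; only the bit-widths come from the cited passages.

```python
def storage_bits(param_counts, bit_widths):
    """Total bits needed to store each layer's parameters at its own bit-width."""
    if len(param_counts) != len(bit_widths):
        raise ValueError("one bit-width is required per layer")
    return sum(count * width for count, width in zip(param_counts, bit_widths))


# Hypothetical parameter counts for two layers (not taken from either reference).
param_counts = [1000, 2000]

original_bits = storage_bits(param_counts, [32, 32])  # original 32-bit widths
reduced_bits = storage_bits(param_counts, [4, 6])     # 4-bit and 6-bit per [0074]
compression_ratio = original_bits / reduced_bits
```

Under these assumed counts, storage falls from 96,000 bits to 16,000 bits, a sixfold reduction.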
Regarding Claim 11,
WANG et al. teaches A system, comprising: a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising (Fig. 2 teaches various computer-executable components stored in memory wherein a processor is coupled to a memory):
a quantizer management component that, for a set of weights: determines a quantization scale based on a number of quantization levels, wherein the quantization scale comprises quantization scale values (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer management component); pg. 8 [0066] teaches determining a quantization scale based on a number of quantization levels; pg. 9 [0074]: “In some embodiments, a first layer (e.g., i=2) of the plurality of layers in the reduced neural network model (e.g., model 112) has a first reduced bit-width (e.g., 4-bit) that is smaller than the original bit-width (e.g., 32-bit) of the first neural network model, a second layer (e.g., i=3) of the plurality of layers in the reduced neural network model (e.g., model 112) has a second reduced bit-width (e.g., 6-bit) that is smaller than the original bit-width of the first neural network model, and the first reduced bit-width is distinct from the second reduced bit-width in the reduced neural network model” teaches various quantization scale values for a set of weights; pg. 8 [0067]: “The result of the above process is the set of quantized weights Wopt,i with the optimal quantization bit-width(s) for each layer i, and the set of quantized bias bopt,i with the optimal quantization bit-width(s) for each layer i. The adaptive bit-width model 112 is thus obtained” teaches obtaining quantized weights based on quantization bit-width (scale); also see pg. 6 [0057] for 8-bit uniform quantization to the weight parameters);
...and a quantizer component that quantizes weights of at least a portion of a layer of the set of weights to generate quantized weights (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 8 [0067]: “The result of the above process is the set of quantized weights Wopt,i with the optimal quantization bit-width(s) for each layer i, and the set of quantized bias bopt,i with the optimal quantization bit-width(s) for each layer i. The adaptive bit-width model 112 is thus obtained” teaches quantizing weights based on quantization bit-width (scale) values; also see pg. 6 [0057] for 8-bit uniform quantization to the weight parameters; Fig. 4 teaches the neural network has multiple layers)...
and applies the quantization scale value to generate the quantized weights in one or more minibatches to facilitate training a deep learning model or generating an inference by the deep learning model (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 2 [0019]: “during training, integer (INT) weight regularization and 8-bit quantization techniques are applied to push the values of the full-precision parameters of the deep learning model 106 toward their corresponding integer values, and reduce the value ranges of the parameters such that they fall within the dynamic range of a predefined reduced maximum bit-width (e.g., 8 bits)” teaches quantizing weights based on a quantization scale value to facilitate training a deep learning model; pg. 2 [0020] teaches producing inference by the deep learning system; pg. 2 [0020] further teaches examples of minibatch of training data (selection of training data that is used to train the model)).
WANG et al. does not appear to explicitly teach determines, based on a defined quantization criterion relating to the quantization error, a quantization scale value that reduces a quantization error of weights of the set of weights;...quantizes weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value and an offset value.
CHOI et al. teaches determines, based on a defined quantization criterion relating to the quantization error, a quantization scale value that reduces a quantization error of weights of the set of weights (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights in accordance with a defined quantization criterion relating to the quantization error, which is the particular measure of mean square quantization error (MSQE) for weights; Figs. 1-2 teach system for weight quantization);
...quantizes weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value and an offset value (pg. 4 [0051]:

[Image: media_image1.png, excerpt of CHOI et al. pg. 4 [0051], including Equations (1) and (2)]
teaches using a quantization function (see Equation (1)) to generate quantized weights based on scaling factor for weights (quantization scale value) and a rounding function that includes an offset value (see Equation (2), offset = 0.5)).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate determines, based on a defined quantization criterion relating to the quantization error, a quantization scale value that reduces a quantization error of weights of the set of weights; quantizes weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value and an offset value as taught by CHOI et al. to the disclosed invention of WANG et al.

One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 12,
WANG et al. in view of CHOI et al. teaches the system of claim 11.
WANG et al. further teaches wherein the quantizer component at least one of symmetrically or uniformly quantizes the weights (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 6 [0057]: “When applying the 8-bit uniform quantization to the weight parameters, the weight parameters are still expressed in full-precision format, but the total number of such response levels are bound by 2^8 - 1, with half in the positive and half in the negative. In each iteration of the training phase, this quantization function is applied in each layer in the forward pass” teaches that the quantizing can further comprise, for a specific bit precision level, applying uniform quantization and symmetric quantization (“with half in the positive and half in the negative”) based on the 8-bit quantization (quantization scale value)).
CHOI et al. further teaches utilizes rounding to generate the quantized weights based on the quantization scale value (pg. 4 [0051] teaches rounding is utilized to generate the quantized weights based on scaling factor (quantization scale value)).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate utilizes rounding to generate the quantized weights based on the quantization scale value as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 13,
WANG et al. in view of CHOI et al. teaches the system of claim 11.
CHOI et al. further teaches wherein the set of weights comprises a weight, and wherein the quantizer management component estimates the quantization scale value to apply to the weight as a linear or a non-linear function of a first statistical function of a weight value of the weight and a linear or a non-linear function of a second statistical function of the weight value, and wherein the first statistical function is different from the second statistical function (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) and how that factor affects the mean square quantization error (MSQE), which corresponds to estimating the quantization scale value; pg. 4 [0053]:
[Image media_image2.png: excerpt of CHOI et al. pg. 4 [0053] (see Equation (6))]
teaches applying a non-linear function of a first statistical function (cost function Equation (6)) of a weight value; pg. 5 [0059]:  

[Image media_image3.png: excerpt of CHOI et al. pg. 5 [0059] (see Equation (10))]
 teaches applying a non-linear function of a second statistical function (cost function Equation (10)) of a weight value; Equation (6) and Equation (10) are different; also see Fig. 2).

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the set of weights comprises a weight, and wherein the quantizer management component estimates the quantization scale value to apply to the weight as a linear or a non-linear function of a first statistical function of a weight value of the weight and a linear or a non-linear function of a second statistical function of the weight value, and wherein the first statistical function is different from the second statistical function as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 14,
WANG et al. in view of CHOI et al. teaches the system of claim 11.
CHOI et al. further teaches wherein the quantization scale value is associated with a quantization error within a defined value distance of a minimum quantization error (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights, minimizing corresponds to determining quantization error within a defined value distance of a minimum quantization error).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the quantization scale value is associated with a quantization error within a defined value distance of a minimum quantization error as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 15,
WANG et al. in view of CHOI et al. teaches the system of claim 11.
CHOI et al. further teaches wherein the quantizer management component estimates the quantization scale value based on the defined quantization criterion, wherein the quantization scale value reduces the quantization error of the weights to have the quantization error be within a defined value distance of a minimum quantization error of the weights associated with any quantization scale value (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a learnable scaling factor for weights (quantization scale value) that 
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the quantizer management component estimates the quantization scale value based on the defined quantization criterion, wherein the quantization scale value reduces the quantization error of the weights to have the quantization error be within a defined value distance of a minimum quantization error of the weights associated with any quantization scale value as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
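For illustration only, the notion of a scale value whose quantization error lies "within a defined value distance of a minimum quantization error" can be sketched as below. This is a sketch under stated assumptions: CHOI et al. is cited as learning the scaling factor during training, whereas the example substitutes a plain search over a list of candidate scales. The names `msqe`, `pick_scale`, and `tolerance` are hypothetical and do not appear in the references or the claims.

```python
import math


def msqe(weights, scale, bits=8):
    """Mean square quantization error for one candidate scale (round-half-up)."""
    q_max = 2 ** (bits - 1) - 1
    total = 0.0
    for w in weights:
        level = max(-q_max, min(q_max, math.floor(w / scale + 0.5)))
        total += (w - level * scale) ** 2
    return total / len(weights)


def pick_scale(weights, candidates, bits=8, tolerance=0.0):
    """Return (scale, error) for the first candidate whose error is within
    `tolerance` of the smallest error found among the candidates."""
    errors = {s: msqe(weights, s, bits) for s in candidates}
    best = min(errors.values())
    for s in candidates:
        if errors[s] - best <= tolerance:
            return s, errors[s]


weights = [0.5, -0.5, 1.0]
print(pick_scale(weights, [0.3, 0.5, 1.0], bits=4))                 # exact minimum
print(pick_scale(weights, [0.3, 0.5, 1.0], bits=4, tolerance=1.0))  # near-minimum
```

With `tolerance=0.0` the search returns the minimizing scale itself; a positive tolerance accepts any candidate whose error is close enough to that minimum.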
Regarding Claim 17,
WANG et al. in view of CHOI et al. teaches the system of claim 16 (claim 17 is interpreted as being dependent on claim 11).
WANG et al. further teaches wherein the quantizer component reduces a bit precision associated with the weights based on the application of the quantization scale value (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 9 [0074]: “In some embodiments, a first layer (e.g., i=2) of the plurality of layers in the reduced neural network model (e.g., model 112) has a first reduced bit-width (e.g., 4-bit) that is smaller than the original bit-width (e.g., 32-bit) of the first neural network model, a second layer (e.g., i=3) of the plurality of layers in the reduced neural network model (e.g., model 112) has a second reduced bit-width (e.g., 6-bit) that is smaller than the original bit-width of the first neural network model, and the first reduced bit-width is distinct from the second reduced bit-width in the reduced neural network model” teaches reducing the bit-width (bit precision) based on applying the quantization scale value (for example, reducing to 6-bit from the original bit-width)).
Regarding Claim 18,
WANG et al. in view of CHOI et al. teaches the system of claim 16 (claim 18 is interpreted as being dependent on claim 11).
WANG et al. further teaches wherein the quantizer component reduces at least one of a memory usage or a communication overhead used to transfer data between layers of the deep learning model based on the application of the quantization scale value (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 8 [0072]: “The device reduces (504) a footprint (e.g., memory and computation cost) of the first neural network model on the computing device (e.g., both during storage, and, optionally, during deployment of the model) by using respective reduced bit-widths for storing the respective sets of parameters of different layers of the first neural network model, wherein: preferred values (e.g., optimal bit-width values that have been identified using the techniques described herein) of the respective reduced bit-widths are determined through multiple iterations of forward propagation through the first neural network model using a validation data set while each of two or more layers of the first neural network model is expressed with different degrees of quantization corresponding to different reduced bit-widths until a predefined information loss threshold (e.g., as measured by the Jensen-Shannon Divergence described herein) is met by respective response statistics of the two or more layers” teaches reducing footprint in the form of memory usage utilized in the deployment of the neural network; pg. 3 [0024]: “As shown in FIG. 1, once the reduced, adaptive bit-width model 112 is provided to a deployment platform 116 on the model deployment system 104, real-world input data or testing data 114 is fed to the reduced, adaptive bit-width model 112, and final prediction result 118 is generated by the reduced, adaptive bit-width” teaches the reduced adaptive bit-width model (neural network) based on the bit-width (quantization scale value), when deployed, uses the model to make predictions (which would require passing input throughout the layers of the network, and thus requiring transferring of data between layers, see Fig. 4 for deep learning model)).
Regarding Claim 19,
WANG et al. teaches A computer program product that facilitates quantizing weights, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions are executable by a processor to cause the processor to (pg. 1 [0006] teaches non-transitory computer readable storage medium with instructions executable by processor; pg. 8 [0067] teaches weight quantization):
for a set of weights, determine a quantization scale as a function of a bit precision level, wherein the quantization scale comprises quantization scale values (pg. 4 [0027] teaches computer-implemented and processor; pg. 8 [0066] teaches determining a quantization scale as a function of a bit precision level; pg. 9 [0074]: “In some embodiments, a first layer (e.g., i=2) of the plurality of layers in the reduced neural network model (e.g., model 112) has a first reduced bit-width (e.g., 4-bit) that is smaller than the original bit-width (e.g., 32-bit) of the first neural network model, a second layer (e.g., i=3) of the plurality of layers in the reduced neural network model (e.g., model 112) has a second reduced bit-width (e.g., 6-bit) that is smaller than the original bit-width of the first neural network model, and the first reduced bit-width is distinct from the second reduced bit-width in the reduced neural network model” teaches various quantization scale values for a set of weights; pg. 8 [0067]: “The result of the above process is the set of quantized weights Wopt,i with the optimal quantization bit-width(s) for each layer i, and the set of quantized bias bopt,i with the optimal quantization bit-width(s) for each layer i. The adaptive bit-width model 112 is thus obtained” teaches obtaining quantized weights based on quantization bit-width (scale));
...quantize weights of at least a portion of a layer of the set of weights...to facilitate training a deep learning system or producing an inference by the deep learning system (pg. 2 [0019]: “during training, integer (INT) weight regularization and 8-bit quantization techniques are applied to push the values of the full-precision parameters of the deep learning model 106 toward their corresponding integer values, and reduce the value ranges of the parameters such that they fall within the dynamic range of a predefined reduced maximum bit-width (e.g., 8 bits)” teaches quantizing weights based on a quantization scale value to facilitate training a deep learning model; pg. 2 [0020] teaches producing inference by the deep learning system).
WANG et al. does not appear to explicitly teach determine a quantization scale value of the quantization scale values that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error; quantize weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value.
However, CHOI et al. teaches determine a quantization scale value of the quantization scale values that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights in accordance with a defined quantization criterion relating to the quantization error, which is the particular measure of mean square quantization error (MSQE) for weights; Figs. 1-2 teach system for weight quantization);
quantize weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value (pg. 4 [0051]:

[Image media_image1.png: excerpt of CHOI et al. pg. 4 [0051] (see Equations (1) and (2))]
teaches using a quantization function (see Equation (1)) to generate quantized weights based on scaling factor for weights (quantization scale value) and a rounding function that includes an offset value (see Equation (2), offset = 0.5); pg. 1 [0006] teaches weights of at least a portion of a layer).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate determine a quantization scale value of the quantization scale values that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error; quantize weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value as taught by CHOI et al. to the disclosed invention of WANG et al.

Regarding Claim 20,
WANG et al. in view of CHOI et al. teaches the computer program product of claim 19.
WANG et al. further teaches wherein to facilitate the determining the quantization scale value, the program instructions are executable by a processor to cause the processor to (pg. 1 [0006] teaches non-transitory computer readable storage medium with instructions executable by processor; pg. 8 [0066] teaches determining a quantization scale).
CHOI et al. further teaches estimate the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a linear or a non-linear function of a second statistical function of the weight value, wherein the first statistical function is different from the second statistical function (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) and how that factor affects the mean square quantization error (MSQE), which corresponds to estimating the quantization scale value; pg. 4 [0053]:  
[Image media_image2.png: excerpt of CHOI et al. pg. 4 [0053] (see Equation (6))]
teaches applying a non-linear function of a first statistical function (cost function Equation (6)) of a weight value; pg. 5 [0059]:  

[Image media_image3.png: excerpt of CHOI et al. pg. 5 [0059] (see Equation (10))]
 teaches applying a non-linear function of a second statistical function (cost function Equation (10)) of a weight value; Equation (6) and Equation (10) are different), and
wherein the quantization scale value is associated with a quantization error within a defined value distance of a minimum quantization error (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights, minimizing corresponds to determining quantization error within a defined value distance of a minimum quantization error).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate estimate the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a linear or a non-linear function of a second statistical function of the weight value, wherein the first statistical function is different from the second statistical function, and wherein the quantization scale value is associated with a quantization error within a defined value distance of a minimum quantization error as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).

Claims 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over WANG et al. (US 2019/0050710 A1) in view of CHOI et al. (US 2019/0138882 A1) and further in view of Na et al. (“On-Chip Training of Recurrent Neural Networks with Limited Numerical Precision”).
Regarding Claim 21,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 2.
WANG et al. in view of CHOI et al. does not appear to explicitly teach wherein the offset value and a type of rounding utilized to generate the quantized weights is determined based on a type of application or model associated with the deep learning system.
However, Na et al. teaches wherein the offset value and a type of rounding utilized to generate the quantized weights is determined based on a type of application or model associated with the deep learning system (pg. 3716 last three full paragraphs: “This paper presents limited precision training of RNNs...Low precision training simulation with various rounding options shows that stochastic rounding achieves superior results than other options. We study the effect of fully low precision training by comparing with partial low precision training. We also explore applying piecewise linear activation function with stochastic rounding. Low precision multiplier and accumulator (MAC) with linear-feedback shift register (LFSR) is implemented with 28nm Synopsys PDK for energy and performance analysis. Implementation results show that low precision hardware is 4.7x faster, and energy per task is up to 4.55x lower than that of floating point hardware” teaches determining to use stochastic rounding (type of rounding) to generate quantized weights based on the need for low precision training simulation using low precision devices associated with recurrent neural network (correspond to type of application or model associated with deep learning system); pg. 3719 Section B:
[Image media_image4.png: excerpt of Na et al. pg. 3719 Section B]
teaches stochastic rounding involves determination of offset value [Image media_image5.png: inline symbol, not reproduced]).

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the offset value and a type of rounding utilized to generate the quantized weights is determined based on a type of application or model associated with the deep learning system as taught by Na et al. to the disclosed invention of WANG et al. in view of CHOI et al.
One of ordinary skill in the arts would have been motivated to make this modification because “[l]ow precision training simulation with various rounding options shows that stochastic rounding achieves superior results than other options. We study the effect of fully low precision training by comparing with partial low precision training. We also explore applying piecewise linear activation function with stochastic rounding. Low precision multiplier and accumulator (MAC) with linear-feedback shift register (LFSR) is implemented with 28nm Synopsys PDK for energy and performance analysis. Implementation results show that low precision hardware is 4.7x faster, and energy per task is up to 4.55x lower than that of floating point hardware” (Na et al. pg. 3716 last two full paragraphs).
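For illustration only, stochastic rounding as a "type of rounding" can be sketched as below. This is a sketch of the commonly used formulation and is not taken from Na et al.: it assumes that a value is rounded up with probability equal to its fractional part, which is equivalent to adding a uniformly drawn offset in [0, 1) before taking the floor. The offset symbol used in the cited passage is not reproduced in this action, so the function name `stochastic_round` and the `rng` parameter are hypothetical.

```python
import math
import random


def stochastic_round(x, rng=random):
    """Round x down or up at random, with P(round up) = fractional part of x.

    Assumption: equivalent to floor(x + u) with u uniform on [0, 1),
    i.e. a randomly drawn offset instead of the fixed offset 0.5.
    """
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)


# Averaged over many draws, the rounded values stay close to the input,
# which is why the rounding error does not accumulate a systematic bias.
rng = random.Random(0)
samples = [stochastic_round(2.25, rng) for _ in range(20000)]
print(sum(samples) / len(samples))
```

By contrast, round-half-up with a fixed offset of 0.5 would map 2.25 to 2 every time.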
Regarding Claim 22,
WANG et al. in view of CHOI et al. teaches the system of claim 12.
WANG et al. in view of CHOI et al. does not appear to explicitly teach wherein the quantizer component determines the offset value and a type of rounding utilized to generate the quantized weights based on a type of application or model associated with the deep learning system.
However, Na et al. teaches wherein the quantizer component determines the offset value and a type of rounding utilized to generate the quantized weights based on a type of application or model associated with the deep learning system (pg. 3716 last three full paragraphs: “This paper presents limited precision training of RNNs...Low precision training simulation with various rounding options shows that stochastic rounding achieves superior results than other options. We study the effect of fully low precision training by comparing with partial low precision training. We also explore applying piecewise linear activation function with stochastic rounding. Low precision multiplier and accumulator (MAC) with linear-feedback shift register (LFSR) is implemented with 28nm Synopsys PDK for energy and performance analysis. Implementation results show that low precision hardware is 4.7x faster, and energy per task is up to 4.55x lower than that of floating point hardware” teaches determining to use stochastic rounding (type of rounding) to generate quantized weights based on the need for low precision training simulation using low precision devices associated with recurrent neural network (correspond to type of application or model associated with deep learning system); pg. 3719 Section B:
[Image media_image4.png: excerpt of Na et al. pg. 3719 Section B]
teaches stochastic rounding involves determination of offset value [Image media_image5.png: inline symbol, not reproduced]; MAC devices used for low precision quantization correspond to quantizer component).
WANG et al., CHOI et al., and Na et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the quantizer component determines the offset value and a type of rounding utilized to generate the quantized weights based on a type of application or model associated with the deep learning system as taught by Na et al. to the disclosed invention of WANG et al. in view of CHOI et al.
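One of ordinary skill in the arts would have been motivated to make this modification because “[l]ow precision training simulation with various rounding options shows that stochastic rounding achieves superior results than other options. We study the effect of fully low precision training by comparing with partial low precision training. We also explore applying piecewise linear activation function with stochastic rounding. Low precision multiplier and accumulator (MAC) with linear-feedback shift register (LFSR) is implemented with 28nm Synopsys PDK for energy and performance analysis. Implementation results show that low precision hardware is 4.7x faster, and energy per task is up to 4.55x lower than that of floating point hardware” (Na et al. pg. 3716 last two full paragraphs).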

Response to Arguments
Applicant's arguments filed on 01/04/2022 with respect to the 35 U.S.C. 103 rejection to claims 1-7, 9-15, and 17-20 have been fully considered but they are not persuasive. Applicant asserts that “none of the cited references, alone or in combination, teach, disclose or suggest the elements of claim 1, recited as "...quantizing, by the system, weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value and an offset value..."...As such, neither Wang et al. nor Choi et al., alone or in combination, teach, disclose or suggest the elements of claim 1” (Remarks, pg. 9).
Examiner’s Response:
The Examiner respectfully disagrees. Applicant merely asserts that none of the cited references teaches the amended limitation of claim 1 but does not provide any arguments. Choi et al. teaches quantizing, by the system, weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value and an offset value (pg. 4 [0051]:

[Image media_image1.png: excerpt of CHOI et al. pg. 4 [0051] (see Equations (1) and (2))]
teaches using a quantization function (see Equation (1)) to generate quantized weights based on scaling factor for weights (quantization scale value) and a rounding function that includes an offset value (see Equation (2), offset = 0.5); pg. 1 [0006] teaches weights of at least a portion of a layer). Please see the current rejection for more information.
Since analogous arguments are relied upon for the dependent claims of claim 1, the response above is applicable to those claims.
Since analogous arguments are relied upon for independent claims 11 and 19 (and their respective dependent claims), the response above is applicable to those claims.










Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484. The examiner can normally be reached Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and 





/Y.C./Examiner, Art Unit 2125

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125