DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2022-09-14 has been entered.  Applicant’s amendments to the claims .  The status of the claims is as follows:
Claims 1-20 remain pending in the application.
Claims 1, 3-7, 9, 11-15, 17, and 19-20 are amended.
Response to Arguments
Applicant’s arguments with respect to rejections under 35 USC 112(b) have been fully considered but they are not persuasive.  Applicant argues on Remarks Page 11 that “The algorithm for the MPF element ‘scaling logic’ includes the several actions in question. Thus, there is no requirement that a separate algorithm be disclosed for each step of the algorithm for the MPF element ‘scaling logic’”.  Examiner respectfully disagrees.  Claim 9 recites the following three steps:
Determine…a set of quantization parameters associated with the channel
Determine weights and bias associated with the channel from the meta file 
Generate, from the first layer, an output feature map
Each amounts to purely functional language, in which what the “scaling logic” does is claimed, but not how it does it.  See MPEP 2181(II)(B):  “To claim a means for performing a specific computer-implemented function and then to disclose only a general purpose computer as the structure designed to perform that function amounts to pure functional claiming.” Examiner points out that the 3 steps above amount to claiming 3 steps performed by a general purpose computer, as there is no description of how a computer may have been specially programmed to achieve such functions.
Applicant’s arguments with respect to rejections under 35 USC 101 have been fully considered but they are not persuasive.  Applicant argues on Remarks Page 11 that “the alleged abstract idea would have been incorporated into a practical application of the abstract idea, because it improves the performance of a computing system, i.e., a hardware-based neural network model.”  Examiner respectfully disagrees, and points out that Applicant has not provided any evidence of the alleged improvement, either in the claim language or the specification, as to how the improvement is made.
Applicant’s arguments with respect to rejections under 35 USC 102 and 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.  The amendments have changed the scope of the claims, necessitating new grounds of rejection, and thus a new combination of art is relied upon with the addition of Yu et al. (US 2018/0046913 A1).
Remarks – Prior Art
Cited reference Wu et al. (US 2020/0193270 A1) discloses a channel-wise quantization of a convolutional neural network.  It has a filing date of 2019-08-27, but it claims priority to a provisional application filed 2018-12-12.  The priority application comprises a paper by Wu et al. titled “Low-Precision and Coarse-to-Fine Dynamic Fixed-Point Quantization Design in Convolution Neural Network”, which supports the matter in the subsequent patent application (Page 3, Last Paragraph:  “To solve this problem, we proved the fine-grained dynamic range technique on channel-wised quantization”).
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“Scaling logic configured to” in Claim 9
 “Scaling logic is further configured to” in Claim 13
“Scaling logic is to” in Claim 14
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.  Examiner points out that MPEP 2181(II)(B) states:  “For a computer-implemented 35 U.S.C. 112(f)  claim limitation, the specification must disclose an algorithm for performing the claimed specific computer function, or else the claim is indefinite under 35 U.S.C. 112(b).”  As for the “Scaling logic” of Claim 9, the step of “quantize, based on the set of quantization parameters, the input feature map” is being interpreted as the algorithm described in Instant Specification [0046].  Also, as for the “Scaling logic” of Claim 14, the step of “re-quantize the output feature map” is also being interpreted as the algorithm described in Instant Specification [0046].
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 9-16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
The following claim limitations invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
 “Scaling logic configured to” in Claim 9
“Scaling logic is further configured to” in Claim 13
“Scaling logic is to” in Claim 14
However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.   Examiner points out that MPEP 2181(II)(B) states:  “For a computer-implemented 35 U.S.C. 112(f)  claim limitation, the specification must disclose an algorithm for performing the claimed specific computer function, or else the claim is indefinite under 35 U.S.C. 112(b).”  Examiner points out that for the following steps that the scaling logic is “configured” to do, no algorithm is disclosed in the Specification:
“determine, based on a meta file associated with the neural network model, a set of quantization parameters associated with the channel”; Specification [0037-0040] describes the nature of the parameters, but no specific algorithm detailing how to determine them
 “determine weights and bias associated with the channel from the meta file”; No specific algorithm is recited in the Specification
“generate, from the first layer, an output feature map”; No specific algorithm is recited in the Specification
Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Dependent claims 10-16 are rejected because they inherit the deficiencies of Claim 9. 

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, specifically mathematical concepts (MPEP 2106.04(a)(2)(I)), without significantly more. 
Step 1 Analysis:
Claims 1-8 are directed to a method, and Claims 9-16 are directed to an integrated circuit.  Therefore, each of the claims falls within one of the four statutory categories of patentable subject matter.
Step 2A Prong 1 Analysis:
Claims 1 and 9 recite:
“determining … a set of quantization parameters associated with the channel, wherein the set of quantization parameters specifies a range for integers of the first bit width and a type of integers of a second bit width”; this is a mathematical calculation (see MPEP 2106.04(a)(2)(I)(C))
“quantizing, based on the set of quantization parameters…the input feature map at the channel from a first set of integers of the first bit width to a second set of integers of the second bit width”; this is a mathematical calculation (see MPEP 2106.04(a)(2)(I)(C))
Step 2A Prong 2 Analysis:
	Additional elements “receiving an input feature map at a first layer of a hardware-based neural network model having a plurality of layers implemented within an integrated circuit, wherein the input feature map is represented by integers of a first bit width”, “meta file associated with the neural network model”, and “wherein weights and bias for each channel of each layer of the hardware-based network model have been quantized using an offline quantization tool” do not integrate the abstract idea into a practical application.  “Receiving an input feature map” amounts to insignificant extra-solution activity (mere data gathering, MPEP 2106.05(g)(3)) and “hardware-based neural network model”, “meta file”, and “offline quantization tool” amount to merely applying the abstract idea to a field of use and technological environment (MPEP 2106.05(h)).  These additional elements place no meaningful restrictions on the practice of the abstract idea.
	Step 2B Analysis:
	Additional elements “receiving an input feature map at a first layer of a hardware-based neural network model having a plurality of layers implemented within an integrated circuit, wherein the input feature map is represented by integers of a first bit width”, “meta file associated with the neural network model”, and “wherein weights and bias for each channel of each layer of the hardware-based network model have been quantized using an offline quantization tool” are not sufficient to amount to significantly more than the judicial exception.  “Receiving an input feature map” amounts to insignificant extra-solution activity (mere data gathering, MPEP 2106.05(g)(3)) and “hardware-based neural network model”, “meta file”, and “offline quantization tool” amount to merely applying the abstract idea to a field of use and technological environment (MPEP 2106.05(h)).  The claims are directed to a judicial exception.
	Dependent claims 2-8 and 10-16 are also directed to an abstract idea.  They recite the following:
	Claims 2 and 10 recite the same limitations as Claims 1 and 9, further reciting “wherein the first bit width comprises 32 bits and the second bit width comprises 8 bits”; the claims are still directed to a mathematical concept.
Claims 3 and 11 recite the same limitations as Claims 1 and 9, further reciting “wherein at least two channels of the hardware-based neural network model are associated with different quantization parameters”; the claims are still directed to a mathematical concept.
Claims 4 and 12 recite the same limitations as Claims 1 and 9, further reciting “wherein at least two layers of the hardware-based neural network model are associated with different quantization parameters”; the claims are still directed to a mathematical concept.
Claims 5 and 13 recite the same limitations as Claims 1 and 9, further reciting “for each of the plurality of channels associated with the input feature map received at the first layer, determining the weights and bias associated with the channel from the meta file; generating, from the first layer, an output feature map represented by a third set of integers of the first bit width based on the quantized feature map, the quantized weights, and the quantized bias associated with the channel”; determining values and generating an output feature map are also mathematical calculations.
Claims 6 and 14 recite the same limitations as Claims 5 and 13, further reciting “further comprising re-quantizing the output feature map from the third set of integers of the first bit width to a fourth set of integers of the second bit width before providing the output feature map as an input feature map to a second layer of the hardware-based neural network model”; re-quantizing is still a mathematical calculation.
Claims 7 and 15 recite the same limitations as Claims 5 and 13, further reciting “wherein at least two channels of the hardware-based neural network model are associated with different weights and bias”; the claims are still directed to a mathematical concept.
Claims 8 and 16 recite the same limitations as Claims 1 and 9, further reciting “wherein quantizing the input feature map at each channel includes mapping each of the first set of integers of the first bit width to an integer in the second set of integers of the second bit width based on the set of quantization parameters”; mapping values based on parameters is still based on the mathematical concept.
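For illustration only, the mapping recited in Claims 8 and 16 can be sketched as follows.  This is a minimal sketch, assuming a linear mapping with saturation; the names `QuantParams` and `quantize_channel` are hypothetical and are not taken from the claims, the Instant Specification, or any cited reference.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuantParams:
    """Hypothetical per-channel parameter set: a range for the wide
    (first bit width) integers and the type of the narrow integers."""
    in_min: int              # lower end of the range for first-bit-width integers
    in_max: int              # upper end of the range for first-bit-width integers
    out_bits: int = 8        # second bit width
    out_signed: bool = True  # type of the second-bit-width integers


def quantize_channel(values: List[int], p: QuantParams) -> List[int]:
    """Linearly map integers in [in_min, in_max] onto the narrow integer type."""
    if p.out_signed:
        out_min = -(1 << (p.out_bits - 1))
        out_max = (1 << (p.out_bits - 1)) - 1
    else:
        out_min, out_max = 0, (1 << p.out_bits) - 1
    span = p.in_max - p.in_min
    if span <= 0:
        raise ValueError("in_max must exceed in_min")
    result = []
    for v in values:
        v = min(max(v, p.in_min), p.in_max)  # saturate to the stated range
        # Round half up using integer arithmetic only (assumed rounding rule).
        steps = ((v - p.in_min) * (out_max - out_min) * 2 + span) // (2 * span)
        result.append(out_min + steps)
    return result
```

Under these assumptions, two channels given different `QuantParams` instances map the same input integer to different 8-bit values, which is the situation Claims 3 and 11 describe.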

Claims 17-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, specifically a mathematical concept, without significantly more. 
Step 1 Analysis:
Claims 17-20 are directed to a method.  Therefore, each of the claims falls within one of the four statutory categories of patentable subject matter.
Step 2A Prong 1 Analysis:
Claim 17 recites:
 “quantizing, using an offline quantization tool, weights and bias for each channel of each layer of the neural network model from floating-point values to integer values based on the data distribution for that channel”; quantizing a model is a mathematical calculation (see MPEP 2106.04(a)(2)(I)(C))
“performing, using an offline quantization tool, a plurality of inferences on the subset of data using the neural network model, to generate a data distribution for each channel of each layer of the neural network model”; generating a data distribution is a mathematical calculation (see MPEP 2106.04(a)(2)(I)(C))
“generating, using an offline quantization tool, a quantization model for each channel of each layer based on the data distribution for that channel”; quantizing based on a distribution is a mathematical calculation (see MPEP 2106.04(a)(2)(I)(C))
Step 2A Prong 2 Analysis:
	The following additional elements do not integrate the judicial exception into a practical application: 
“extracting, using an offline quantization tool, a subset of data from a training data set, wherein the training data set includes a first subset used to train the neural network model and a second subset used to validate the neural network model represented by floating point values, the neural network model having a plurality of layers and each of the plurality of layers including a plurality of channels”; this amounts to insignificant extra-solution activity, as it is mere data gathering (see MPEP 2106.05(g)(3))
“storing, using an offline quantization tool, the weights and bias and the quantization model for each channel of each layer of the neural network model in a meta file”; this amounts to mere instructions to apply an exception (see MPEP 2106.05(f))
“wherein the neural network model can be deployed in an integrated circuit to perform data classification operations in integers, and wherein the quantization model for each channel of each layer of the neural network model is utilized to scale feature map data generated in that channel”; this amounts to mere instructions to apply an exception (see MPEP 2106.05(f)).  However, Examiner notes that if the deployment were positively recited (i.e., “deploying the model in an integrated circuit” rather than “model can be deployed”), this would result in eligibility.
The “computer” and the neural network model having a plurality of layers and channels are recited at a high level of generality, i.e., as a generic processor performing generic computer functions.  Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limitations on practicing the abstract idea.  The claim is directed to an abstract idea.
	Step 2B Analysis:
	The step of “extracting data” is considered to be well-understood, routine, and conventional (WURC) activity in accordance with MPEP 2106.05(d)(II)(iv).  As discussed above, the neural network and the storing of data in a meta file amount to mere instructions to apply the exception in a computer environment.  The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.  As discussed above, the additional elements of using a generic processor and generic model to perform the steps identified amount to no more than mere instructions to apply the exception using a generic computer.  The claim is not patent eligible.
	Dependent claims 18-20 are also directed to an abstract idea.  They recite the following:
	Claim 18 recites the same limitations as Claim 17, further reciting “generating a distribution of floating point values at each of the plurality of channels based on the plurality of inferences”; generating a distribution of floating point values can be performed by a human with pen and paper, and thus the claim is still directed to a mathematical concept.
Claim 19 recites the same limitations as Claim 18, further reciting “removing one or more outlier values based on a predetermined percentage from each end of the distribution of floating point values; determining a maximum floating-point value and a minimum floating-point value from the distribution; determining a maximum integer value of a first bit width and a minimum integer value of the first bit width that respectively correspond to the maximum floating-point value and the minimum floating-point value; and constructing a set of quantization parameters for the channel using the maximum integer value, the maximum floating-point value, the minimum floating-point value, the minimum integer value, and an integer type of a second bit width”; removing outliers, determining values, and constructing parameters are also mathematical calculations.
Claim 20 recites the same limitations as Claim 17, further reciting “wherein the training data set includes a first subset used to train the neural network model and a second subset used to validate the neural network model represented by floating point values”; this merely describes what the data was previously used for, and thus the claim is still directed to a mathematical concept.
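For illustration only, the steps recited in Claim 19 can be sketched as follows.  This is a minimal sketch under stated assumptions: the function name `build_quant_params`, the dictionary keys, and the choice of the signed extremes of the first bit width as the maximum and minimum integer values are all hypothetical and are not drawn from the claims or the Instant Specification.

```python
from typing import Dict, List, Union


def build_quant_params(
    samples: List[float],
    trim_fraction: float = 0.01,
    int_bits: int = 32,
    target_type: str = "int8",
) -> Dict[str, Union[float, int, str]]:
    """Construct a hypothetical per-channel parameter set from a distribution
    of floating-point values observed for that channel."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")

    # Remove outliers: drop a predetermined percentage from each end.
    ordered = sorted(samples)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]

    # Maximum and minimum floating-point values of the trimmed distribution.
    f_min, f_max = kept[0], kept[-1]

    # Assumed correspondence: the signed extremes of the first bit width.
    i_min = -(1 << (int_bits - 1))
    i_max = (1 << (int_bits - 1)) - 1

    return {
        "float_min": f_min,
        "float_max": f_max,
        "int_min": i_min,
        "int_max": i_max,
        "int_type": target_type,  # integer type of the second bit width
    }
```

Each step in this sketch is an ordering, a selection of extremes, or a fixed arithmetic expression, which is consistent with the characterization of these limitations as mathematical calculations.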

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-16 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (US 2020/0193270 A1; hereinafter “Wu”) in view of Lee et al. (“Quantization for Rapid Deployment of Deep Neural Networks”; hereinafter “Lee”), Croxford et al. (US 2020/0175338 A1; hereinafter “Croxford”), and Yu et al. (US 2018/0046913 A1; hereinafter “Yu”).
As per Claim 1, Wu teaches a method performed within an integrated circuit, the method comprising (Wu, Para [0004], suggests an integrated circuit:  “There are three major challenges to develop a new CNN accelerator”)
receiving an input feature map at a first layer of a hardware-based neural network model having a plurality of layers implemented within an integrated circuit, wherein the input feature map is represented by [integers of] a first bit width, wherein weights and bias for each channel of each layer of the hardware-based network model have been quantized [using an offline quantization tool] (Recall above Wu [0004] suggests a hardware-based neural network model having a plurality of layers implemented within an integrated circuit (“CNN accelerator”).  Wu, Para [0013], discloses:  “S102: inputting input data to a floating pre-trained convolution neural network (CNN) to generate floating feature maps for each layer of the floating pre-trained CNN mode.”  Here, Wu discloses a feature map comprising input data, that is received at a first layer (“for each layer”).  Since the values of the feature map are stored on an integrated circuit, then they must be stored in a memory with a “first bit width”.  Wu also teaches wherein weights and bias for each channel of each layer of the hardware-based network model have been quantized in Para [0029]:  “In the weight vectors, it has a convolutional vector or batch-normalization scaling vector (ω) and a bias vector (b).”  Here, Wu discloses “weight vectors” as comprising weights (“ω”) and bias (“b”).  Wu, Para [0030], discloses these “weight vectors” as being quantized per channel:  “To solve this problem, the fine quantization technique is used on channel-wised quantization. In the convolutional weight vector, the maximum value for i-th output channel is defined as max.sub.v(w.sub.i)(i∈1, 2, . . . , M). The updated dynamic range per output channel is [−max.sub.v(w.sub.i),max.sub.v(w.sub.i)].”) *Integers and offline quantization tool to be taught by other references below.
and for each of a plurality of channels associated with the input feature map, 
determining[, based on a meta file associated with the hardware-based neural network model,] a set of quantization parameters associated with the channel, [wherein the meta file is generated using the offline quantization tool], wherein the set of quantization parameters specifies a range for [integers] fixed point of the first bit width and a type of [integers] fixed point of a second bit width (Recall above Wu [0004] suggests a hardware-based neural network model having a plurality of layers implemented within an integrated circuit (“CNN accelerator”). Wu, Para [0030], discloses:  “According to the statistical analysis on the convolutional weight vector (w.sub.(k×k×N×M)×1), the values for each output channel (the total number of output channels is M) vary differently. The quantization accuracy will thus be significantly impacted when the dynamic quantization range ([−max.sub.v(w),−max.sub.v(w)]) is used to cover the entire output channels. To solve this problem, the fine quantization technique is used on channel-wised quantization. In the convolutional weight vector, the maximum value for i-th output channel is defined as max.sub.v(w.sub.i)(i∈1, 2, . . . , M). The updated dynamic range per output channel is [−max.sub.v(w.sub.i),max.sub.v(w.sub.i)]. Applying the coarse quantization and the fine quantization by quantization range to generate fixed-point inferred data, it can provide the very low quantization error and provide a quantization result close to 32-bit floating point accuracy for all CNNs.”  Here, Wu discloses for each of a plurality of channels associated with the input feature map (“channel-wised”, “i-th output channel”), determining a set of quantization parameters (“dynamic range per output channel”), wherein the set of quantization parameters specifies a range (“[−max.sub.v(w.sub.i),max.sub.v(w.sub.i)]”).
Note that Wu discloses both a first range and a second range, as Wu [0025] discloses “coarse” and “fine” quantization:  “An embodiment discloses a quantization methodology, a coarse quantization and a fine quantization by the dynamic quantization range on weight vector is described as below.”  Wu, Para [0021], discloses:  “To fully represent the 32-bit floating-point value when using dynamic fixed-point format in the activation vector (x), a scalar factor s is defined as shown in equation 3.

[Equation 3 of Wu; reproduced as an image in the original and not recoverable here]
 
where p represents the quantization bit-width. In equation 3, the dynamic quantization range is [−max.sub.v,max.sub.v].”  Thus, here Wu discloses a range for fixed point value of the first bit width.  Then, as shown above in [0030], Wu discloses “The updated dynamic range per output channel is [−max.sub.v(w.sub.i),max.sub.v(w.sub.i)]”, and thus discloses a type of fixed point value of a second bit width.) *Meta file, integer, and offline quantization tool to be taught by other references below.
and quantizing, based on the set of quantization parameters [determined based on the meta file], the [input feature map] values at the channel from a first set of integers of the first bit width to a second set of integers of the second bit width (Wu [0030] as shown above, discloses using the quantization parameters as specified above in order to perform the quantizing (“Applying the coarse quantization and the fine quantization”) from a first bit width to a second bit width, as Wu discloses “by quantization range to generate fixed-point inferred data”).
Examiner notes that Wu only performs the “coarse” quantization on the feature map (“activation”), while performing both the “coarse” and “fine” (“channel-wise”) quantizations on the “weights vectors” only (Wu [0026]:  “To handle this issue, a coarse quantization and a fine quantization to fixed-point technique on weight vectors is proposed.”)  Thus, Wu does not explicitly teach quantizing the input feature map at the channel level, but only the weight vector.  Examiner asserts that Wu certainly suggests it, as Wu discloses:
Quantizing the input feature map
Quantizing the weight vectors at the channel level
Therefore Examiner asserts that these 2 embodiments of Wu suggest quantizing the input feature map at the channel.  Nevertheless, Examiner will combine with another reference to explicitly teach this limitation for the sake of clarity and precision.
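For illustration only, the per-output-channel dynamic range quoted from Wu [0030] can be sketched as follows.  This is a minimal sketch: the scale factor used here, (2^(p-1) - 1) / max_v, is a common symmetric choice assumed for illustration and is not asserted to be Wu's equation 3, and the function names are hypothetical.

```python
from typing import List, Tuple


def channel_ranges(weights: List[List[float]]) -> List[Tuple[float, float]]:
    """Return the range [-max_v(w_i), max_v(w_i)] for each output channel i."""
    ranges = []
    for channel in weights:
        max_v = max(abs(w) for w in channel)
        ranges.append((-max_v, max_v))
    return ranges


def quantize_per_channel(weights: List[List[float]], p: int = 8) -> List[List[int]]:
    """Quantize each channel with its own range (assumed symmetric scaling)."""
    q_max = (1 << (p - 1)) - 1
    quantized = []
    for channel in weights:
        max_v = max(abs(w) for w in channel)
        scale = q_max / max_v if max_v > 0 else 1.0  # assumed scale factor
        quantized.append(
            [max(-q_max, min(q_max, round(w * scale))) for w in channel]
        )
    return quantized
```

With a range computed per channel, a channel whose values lie in [-1, 1] and a channel whose values lie in [-20, 20] both use the full p-bit range, whereas a single range covering all channels would leave the smaller channel with only a few distinct quantized values.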
	Lee teaches quantizing, based on the set of quantization parameters [determined based on the meta file], the input feature map values at the channel (Lee, Page 2 Section 2.1, discloses:  “In the channel-wise quantization, the fractional lengths for the feature maps and the weights can be customized for each channel to minimize the impact of low-precision rounding.”)
	Lee also teaches integer.  (Recall above that Wu teaches “fixed point” types.  One of ordinary skill in the art will appreciate that a “fixed point” representation includes an integer part.  However, Lee also explicitly recites integer.  Lee, Page 3 Figure 1, discloses:  “Qn.m represents a fixed point integer format with n bits for integer part, m bits for fractional part, and 1 bit for the sign”.  Lee, Page 1 Abstract, concludes:  “The results prove that the networks can be quantized into 8-bit integer precision without fine tuning.”) *Meta file to be taught by other references below.
	Lee and Wu are analogous art because they are both in the field of endeavor of neural network quantization.
	It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Lee and Wu.  One of ordinary skill in the art would be motivated to do so in order to minimize quantization error (Lee, Page 2 Section 2.1 :  “Each channel of the IFMs and the OFMs has an independent fractional length based on its expected dynamic range while each channel of the kernels has a fractional length which tightly fits its known values”, and end of Section 1:  “Our method significantly reduces the accuracy loss caused by quantizing to lower precision without increasing the inference computation cost.”).
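For illustration only, a per-channel fractional length in the Qn.m format quoted from Lee can be sketched as follows.  This is a minimal sketch: the rule used to pick the fractional length from a channel's maximum absolute value is an assumption made for illustration, not Lee's disclosed procedure, and the function names are hypothetical.

```python
import math


def fractional_length(max_abs: float, word_bits: int = 8) -> int:
    """Pick m for a Qn.m word (1 sign bit + n integer bits + m fractional bits)
    so that values up to max_abs fit.  Assumed rule, for illustration only."""
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    int_bits = math.floor(math.log2(max_abs)) + 1  # may be negative for small ranges
    return (word_bits - 1) - int_bits


def to_fixed(x: float, frac_len: int, word_bits: int = 8) -> int:
    """Convert x to a signed fixed-point integer with frac_len fractional bits."""
    q_min = -(1 << (word_bits - 1))
    q_max = (1 << (word_bits - 1)) - 1
    q = math.floor(x * 2 ** frac_len + 0.5)  # round half up
    return max(q_min, min(q_max, q))
```

Under this assumed rule, a channel with an expected dynamic range of about 3.2 and a channel with a range of about 0.1 receive different fractional lengths, so each uses most of the 8-bit word.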
	However, the combination of Wu and Lee does not explicitly teach that determining a set of quantization parameters is based on a meta file associated with the hardware-based neural network model.
	Croxford teaches that determining a set of quantization parameters is based on a meta file associated with the hardware-based neural network model. (Croxford, Para [0091], discloses:  “In some examples, the compression metadata may be generated by a CNN accelerator separately from the compression data. In other examples, the compression metadata may be generated by the CNN accelerator from input data that may be not be compressed. Therefore, when analysis is conducted to identify whether a region of input data corresponds to a data pattern, the metadata of the input data (rather than the input data itself) may be analyzed. As such, the processing power required to analyze the input data may be reduced.”  Here, Croxford discloses storing data about potential compression in a metadata file.)
	Croxford and the combination of Wu and Lee are analogous art because they are both in the field of endeavor of neural networks.
	The combination of Wu and Lee teaches channel-wise quantization of a convolutional neural network.  Croxford teaches storing potential compression data for a convolutional neural network in metadata.  Quantization is a type of compression, and the combination would result in channel-wise quantization of a convolutional neural network in which the quantization data is stored in a meta file.  It would have been obvious to make this combination before the effective filing date of the claimed invention.  One of ordinary skill in the art would be motivated to do so in order to save memory space by providing data about how data may be compressed (Croxford [0061]:  “Said compression metadata information may indicate if regions of the input data correspond to a known data pattern. For example, compression of a static photographic image (e.g. a tree in a field) may produce compression metadata information that indicates that a region of the image data is all the same color (e.g. a green region corresponding to the grass)”).
	However, the combination of Wu, Lee, and Croxford does not explicitly teach wherein weights and bias for each channel of each layer of the hardware-based network model have been quantized using an offline quantization tool; determining, based on a meta file associated with the hardware-based neural network model, a set of quantization parameters associated with the channel, wherein the meta file is generated using the offline quantization tool
	Yu teaches wherein weights and bias for each channel of each layer of the hardware-based network model have been quantized using an offline quantization tool; determining, based on a meta file associated with the hardware-based neural network model, a set of quantization parameters [associated with the channel], wherein the meta file is generated using the offline quantization tool (Recall above that Wu discloses weight and bias parameters for each channel and Croxford discloses storing compression data in a meta file.  Yu discloses weight and bias in [0043-0044]:  “FC layer applies a linear transformation on the input feature vector: fout=Wfin+b  where W is an nout=nin transformation matrix and b is the bias term.”  Yu, Para [0089-0090], discloses:  “The proposed quantization flow mainly consists of two phases: the weight quantization phase, and the data quantization phase.  The weight quantization phase aims to find the optimal fl for weights in one layer.”  Yu, Para [0095], discloses:  “The data quantization phase aims to find the optimal fl for a set of feature maps between two layers.”  Here, Yu discloses a process similar to Instant Specification [0018], which states “In the offline stage, statically generated metadata (e.g., weights and bias) of the neural network model is quantized…Dynamically generated metadata (e.g., an input feature map) is not quantized in the offline stage.”  Yu, Para [0100], discloses:  “In the above example of data quantization, weight quantization is conducted before data quantization”  Here, the “weight quantization” is performed first by Yu’s hardware, which is described in Yu, Abstract:  “In particular, the present invention relates to how to implement and optimize a convolutional neural network based on an embedded FPGA. Specifically, it proposes a CPU+FPGA heterogeneous architecture to accelerate ANNs.”  This hardware may be considered a “tool”, and therefore the portion of Yu’s accelerator that performs the “weight quantization phase” is an “offline quantization tool.”  Furthermore, one of ordinary skill in the art will appreciate that in order to apply the weights during the second phase to calculate the “feature maps” in the “data quantization phase”, the weights will have had to be stored in memory, and such memory may be considered a “meta file”.  In fact, Yu [0123] states:  “In certain embodiment, instead of having separate input data buffer and weight buffer, the input buffer further comprises an input data buffer and a weight buffer. Said weight buffer is for storing weights of the ANN. Said input data buffer might be a line data buffer, for storing data and holding the data with delayers in order to reuse the data.” Also recall above that Croxford discloses a meta file.  Based on this, Yu determines a set of quantization parameters “fl” for the feature maps.)
Yu and the combination of Wu, Lee, and Croxford are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the CNN quantization of Wu, Lee, and Croxford with the two-stage quantization of Yu.  One of ordinary skill in the art would be motivated to do so in order to improve the accuracy of the quantization (Yu [0087-0089]:  “In order to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, it proposes a dynamic-precision data quantization strategy and an automatic workflow. Unlike previous static precision quantization strategies, in the proposed data quantization flow, fl is dynamic for different layers and feature map sets while static in one layer to minimize the truncation error of each layer. The proposed quantization flow mainly consists of two phases: the weight quantization phase, and the data quantization phase.”)
		
	As per Claim 2, the combination of Wu, Lee, Croxford, and Yu teaches the method of claim 1.  Lee teaches wherein the first bit width comprises 32 bits and the second bit width comprises 8 bits (Lee, Page 2 Section 2.1 Para 2, discloses:  “Figure 1 demonstrates how the IFMs and the kernels from different channels having different fractional lengths in the channel-wise quantization scheme are computed through a convolution layer compared to the layer-wise scheme. In this example, the input and the output of the convolution layer and weights are all bound to 8 bits while the partial sums are allowed to be accumulated in 32 bits as to avoid data loss.”  Here, Lee teaches, as any one of ordinary skill in the art would be aware, that 32 bits is the default storage size.  Lee also teaches that 8 bits is the quantized bit width.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Lee with Wu, Croxford, and Yu for at least the reasons recited in Claim 1.
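For illustration only, the arrangement quoted from Lee, in which 8-bit inputs and weights are multiplied and the partial sums are accumulated in 32 bits, may be sketched as follows.  This is a minimal sketch under stated assumptions; the function name `mac_int8` and the explicit range checks are introduced here and are not taken from Lee.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def mac_int8(inputs, weights):
    """Multiply-accumulate 8-bit operands into a 32-bit partial sum.

    Python integers are unbounded, so the bit widths are enforced
    with explicit range checks (an assumption of this sketch).
    """
    acc = 0
    for x, w in zip(inputs, weights):
        if not (INT8_MIN <= x <= INT8_MAX and INT8_MIN <= w <= INT8_MAX):
            raise ValueError("operands must fit in 8 bits")
        acc += x * w
        if not (INT32_MIN <= acc <= INT32_MAX):
            raise OverflowError("partial sum exceeds 32 bits")
    return acc


# Three products of the largest positive 8-bit value already exceed
# the 8-bit range, which is why a wider accumulator is needed.
print(mac_int8([127, 127, 127], [127, 127, 127]))
```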

	As per Claim 3, the combination of Wu, Lee, Croxford, and Yu teaches the method of claim 1.  Wu teaches wherein at least two channels of the hardware-based neural network model are associated with different quantization parameters (Wu, Para [0030], discloses:  “To solve this problem, the fine quantization technique is used on channel-wised quantization. In the convolutional weight vector, the maximum value for i-th output channel is defined as max.sub.v(w.sub.i)(i∈1, 2, . . . , M). The updated dynamic range per output channel is [−max.sub.v(w.sub.i),max.sub.v(w.sub.i)].”)
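For illustration only, the per-output-channel dynamic range quoted from Wu [0030], in which each channel i uses the range from −max(w_i) to max(w_i), may be sketched as follows.  This is a minimal sketch; the names `per_channel_scales` and `quantize_channel`, the symmetric 8-bit target, and the sample weights are assumptions introduced here rather than details of Wu.

```python
def per_channel_scales(weights, bits=8):
    """Return one scale per output channel (hypothetical helper).

    weights: list of channels, each a list of float weights.
    The scale maps the channel's largest magnitude onto the
    largest positive integer of the given bit width.
    """
    qmax = (1 << (bits - 1)) - 1
    scales = []
    for channel in weights:
        max_abs = max(abs(w) for w in channel)
        # Guard against an all-zero channel.
        scales.append(max_abs / qmax if max_abs > 0 else 1.0)
    return scales


def quantize_channel(channel, scale, bits=8):
    """Quantize one channel's weights to signed integers using its own scale."""
    qmax = (1 << (bits - 1)) - 1
    return [max(-qmax, min(qmax, round(w / scale))) for w in channel]


weights = [[0.5, -1.27], [0.01, -0.0254]]
scales = per_channel_scales(weights)
print([quantize_channel(ch, s) for ch, s in zip(weights, scales)])
```

In this sketch the two channels differ in magnitude by a factor of fifty but receive different scales, so both use the full integer range; a single shared scale would leave the second channel with very few distinct integer levels.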

	As per Claim 4, the combination of Wu, Lee, Croxford, and Yu teaches the method of claim 1.  Wu teaches wherein at least two layers of the hardware-based neural network model are associated with different quantization parameters.  (Wu, Para [0014], discloses:  “S104: inputting the floating feature maps to a statistical analysis simulator to generate a dynamic quantization range for each layer of the floating pre-trained CNN model”).

As per Claim 5, the combination of Wu, Lee, Croxford, and Yu teaches the method of claim 1 as well as meta file and integers and bit width and quantized feature map (see Rejection to Claim 1).  Wu teaches further comprising: for each of the plurality of channels associated with the input feature map received at the first layer,
determining the weights and bias associated with the channel from the meta file (Wu, Para [0029], discloses:  “In the weight vectors, it has a convolutional vector or batch-normalization scaling vector (ω) and a bias vector (b).”  Here, Wu discloses “weight vectors” as comprising weights (“ω”) and bias (“b”).  Wu, Para [0030], discloses these “weight vectors” as being quantized per channel:  “To solve this problem, the fine quantization technique is used on channel-wised quantization. In the convolutional weight vector, the maximum value for i-th output channel is defined as max.sub.v(w.sub.i)(i∈1, 2, . . . , M). The updated dynamic range per output channel is [−max.sub.v(w.sub.i),max.sub.v(w.sub.i)].”)
and generating, from the first layer, an output feature map represented by a third set of integers of the first bit width based on the quantized feature map, the quantized weights, and the quantized bias associated with the channel (As shown above, Wu discloses quantized feature map, quantized weights, and quantized bias.  Wu, Para [0020-0021], discloses producing an output feature map based on a quantized feature map:  “An embodiment discloses a quantization methodology, a fixed-precision Representation on activation vector is described as below. To fully represent the 32-bit floating-point value when using dynamic fixed-point format in the activation vector (x), a scalar factor s is defined as shown in equation 3.”  Note that Wu discloses this is done across layers, including the first layer in Wu [0031]:  “The invention provides a method of processing a convolution neural network. The method comprises inputting input data to a pre-trained convolution neural network (CNN) to generate floating feature maps for each layer of the floating pre-trained CNN model.”  Every output of subsequent layers is ultimately generated from the first layer, and this is generated based on quantized feature maps, weights, and biases from the first layer and subsequent layers.  Wu, Para [0030] indicates that the feature map is quantized to the first bit width, as only the weights and biases are quantized to another bit width in “fine quantization”.  These are based on quantized values for each channel, as Wu [0030] discloses “channel-wise quantization”.  Recall above in Claim 1 that Lee also discloses channel-wise quantization for the feature maps.  Thus, in combination, Wu, Lee, and Croxford disclose the claimed limitation of generating, from the first layer, an output feature map represented by a third set of integers of the first bit width based on the quantized feature map, the quantized weights, and the quantized bias associated with the channel.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Lee with Wu, Croxford, and Yu for at least the reasons recited in Claim 1.

As per Claim 6, the combination of Wu, Lee, Croxford, and Yu teaches the method of claim 5 as well as integers (see Rejection to Claim 5).  Wu teaches further comprising re-quantizing the output feature map from the third set of integers of the first bit width to a fourth set of integers of the second bit width before providing the output feature map as an input feature map to a second layer of the hardware-based neural network model.  (Wu discloses an iterative process connecting layers of the network in Wu [0031]:  “The invention provides a method of processing a convolution neural network. The method comprises inputting input data to a pre-trained convolution neural network (CNN) to generate floating feature maps for each layer of the floating pre-trained CNN model.”  Wu, Para [0019], discloses re-quantizing at a per channel level between two sets of data types:  “To minimize the quantization errors in the weighting vectors (ω and b), coarse-to-fine dynamic fixed-point approximation is performed on the weighting vectors”  Here, the “fine” quantization is a re-quantization that is done on a per-channel level, after the initial quantization.  However, Wu discloses doing this to weight and bias, rather than activations.  
As discussed in Claim 1, Lee explicitly teaches a channel-wise quantization of the feature map, as Lee, Page 2 Section 2.1, discloses:  “In the channel-wise quantization, the fractional lengths for the feature maps and the weights can be customized for each channel to minimize the impact of low-precision rounding.”  Thus, in combination, Wu and Lee disclose re-quantizing the output feature map before providing the output feature map as an input feature map to a second layer of the neural network model.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Lee with Wu, Croxford, and Yu for at least the reasons recited in Claim 1.
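For illustration only, re-quantizing a 32-bit output value to 8 bits before it is passed to a second layer may be sketched as follows.  This is a minimal sketch assuming power-of-two scaling by fractional lengths in the manner of Lee's fixed-point format; the function name `requantize` and its parameters are assumptions introduced here and do not appear in Wu or Lee.

```python
def requantize(acc: int, acc_frac_len: int, out_frac_len: int, bits: int = 8) -> int:
    """Rescale a wide accumulator value to a narrower integer (hypothetical helper).

    acc_frac_len: fractional length of the accumulator value.
    out_frac_len: fractional length assigned to the output channel.
    """
    shift = acc_frac_len - out_frac_len
    if shift > 0:
        # Add half of the divisor, then shift right, to round to nearest.
        value = (acc + (1 << (shift - 1))) >> shift
    else:
        # The output has more fractional bits, so shift left instead.
        value = acc << (-shift)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Saturate to the narrower bit width.
    return max(lo, min(hi, value))


print(requantize(1000, acc_frac_len=10, out_frac_len=4))
```

Under these assumptions the same accumulator value yields different 8-bit results for channels with different output fractional lengths, which is what makes the re-quantization channel-specific.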

As per Claim 7, the combination of Wu, Lee, Croxford, and Yu teaches the method of claim 5.  Wu teaches wherein at least two channels of the hardware-based neural network model are associated with different weights and bias.  (Wu, Para [0030], discloses:  “According to the statistical analysis on the convolutional weight vector (w.sub.(k×k×N×M)×1), the values for each output channel (the total number of output channels is M) vary differently. The quantization accuracy will thus be significantly impacted when the dynamic quantization range ([−max.sub.v(w),−max.sub.v(w)]) is used to cover the entire output channels. To solve this problem, the fine quantization technique is used on channel-wised quantization. In the convolutional weight vector, the maximum value for i-th output channel is defined as max.sub.v(w.sub.i)(i∈1, 2, . . . , M). The updated dynamic range per output channel is [−max.sub.v(w.sub.i),max.sub.v(w.sub.i)].”  Here, Wu discloses different quantization parameters for different channels, because the “statistical analysis” will vary between channels.  Therefore, the values of the weights and biases are different between channels.)

	As per Claim 8, the combination of Wu, Lee, Croxford, and Yu teaches the method of claim 1 as well as quantizing the input feature map at each channel and integers and quantization parameters and bit width (see Rejection to Claim 1).  Lee teaches wherein quantizing the input feature map at each channel includes mapping each of the first set of integers of the first bit width to an integer in the second set of integers of the second bit width based on the set of quantization parameters.  (Lee, Page 2 Section 2.1 Para 2, discloses:  “Figure 1 demonstrates how the IFMs and the kernels from different channels having different fractional lengths in the channel-wise quantization scheme are computed through a convolution layer compared to the layer-wise scheme. In this example, the input and the output of the convolution layer and weights are all bound to 8 bits while the partial sums are allowed to be accumulated in 32 bits as to avoid data loss.”  Here, Lee teaches, as any one of ordinary skill in the art would be aware, that 32 bits is the default storage size.  Lee also teaches that 8 bits is the quantized bit width.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Lee with Wu, Croxford, and Yu for at least the reasons recited in Claim 1.

Claims 9-16 are integrated circuit claims corresponding to method Claims 1-8 respectively.  The difference is that they recite scaling logic.  Lee, Page 2 Para 1 Last sentence, discloses scaling to 8 bits:  “The results show that various state-of-the-art DNNs trained on the ImageNet dataset can readily be converted for 8-bit fixed-point accelerators without fine tuning by using a few training samples for profiling.”  Lee, Page 3 Figure 1 discloses MAC units.  For the remaining limitations, Claims 9-16 are rejected for the same reasons as Claims 1-8, respectively.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Lee with Wu, Croxford, and Yu for at least the reasons recited in Claim 1.

Claims 17-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (“Quantization for Rapid Deployment of Deep Neural Networks”; hereinafter “Lee”) in view of Croxford et al. (US 2020/0175338 A1; hereinafter “Croxford”) and Yu et al. (US 2018/0046913 A1; hereinafter “Yu”).
As per Claim 17, Lee teaches A computer-implemented method for quantizing a neural network model, the method including: 
extracting, [using an offline quantization tool], a subset of data from a training data set, wherein the training data set includes a first subset used to train the neural network model and a second subset used to validate the neural network model represented by floating point values, the neural network model having a plurality of layers and each of the plurality of layers including a plurality of channels (Lee, Page 5 Section 3.1, discloses:  “The proposed quantization method was evaluated on various state-of-the-art deep networks trained on the ImageNet dataset containing 1.2M training and 50k validation examples. Pretrained networks were quantized into 8-bit fixed point format by using the profiling dataset sampled from the training set and evaluated on the whole validation dataset (50k examples).”  Here, Lee discloses extracting a subset of data from a training data set (“sampled from the training set”) and second subset used to validate a first neural network (“whole validation dataset (50k examples)”).  Lee, Page 6 Table 1 Caption discloses that the original values are 32-bit floating point:  “No retraining is performed. Reference (Float32) lists baseline accuracies while all other figures are accuracy losses.”  Lee, Page 3 first paragraph discloses a plurality of layers and channels:  “The channel-wise quantization can be applied to a fully-connected (FC) layer by considering each unit as a channel. However, for simplicity, we use the layer-wise quantization for the activations of fully-connected (FC) layers.”) *Offline quantization tool to be taught by other art below
performing, [using an offline quantization tool], a plurality of inferences on the subset of data using the neural network model, to generate a data distribution for each channel of each layer of the neural network model; (Lee, Page 5 Section 3.1, discloses “validate a first neural network”, wherein validating comprises performing a plurality of inferences.  Then Lee continues in the next sentences:  “Uniform linear quantization was used for all the cases. Batch normalization[12] layers were fused into convolution layers before the quantization process. Unsigned integer format was employed for the activation values with the ReLU nonlinearity.”  Here, Lee discloses a plurality of layers.  Lee, Page 2 Para 2 discloses a plurality of channels:  “In this paper, we introduce a novel technique in which fine tuning is not necessary for 8-bit linear quantization which quantizes the feature maps and the parameters for individual channels instead of layers to accommodate for the inter-channel variations in the dynamic range.”  Lee, Page 8 Conclusion discloses distributions for each channel:  “We further improved our method by considering the variations in distribution across the channels.”) *Offline quantization tool to be taught by other art below
quantizing, [using an offline quantization tool], weights and bias for each channel of each layer of the neural network model from floating-point values to integer values based on the data distribution for that channel (Lee, Page 1 Abstract, Last Sentence, discloses:  “The results prove that the networks can be quantized into 8-bit integer precision without fine tuning.”  Lee, Page 6 Para 1, discloses quantizing channels based on distributions:  “We evaluated the channel-wise quantization in four modes depending on the method to determine the fractional lengths: MAX, Laplace, S.Cauchy, and PDF-aware.”  See the caption on the same page for Table 1 that describes Laplace and Cauchy as distributions:  “Laplace (optimal fraction length based on Laplace distribution), S.Cauchy (optimal fraction length based on truncated super Cauchy distribution).”  Lee also discloses weights on Page 2 Section 2.1:  “In the channel-wise quantization, the fractional lengths for the feature maps and the weights can be customized for each channel to minimize the impact of low-precision rounding.”  One of ordinary skill in the art will appreciate that weights and biases are trained together, and Lee notes this in previous work in Page 2 Section 2:  “Previous implementations such as Ristretto[6], a fixed-point quantization simulator based on Caffe, reserves three placeholders for the fractional lengths (defined as the number of required bits for the fractional part of a fixed-point number) per layer, one each for the input and output feature maps (IFM and OFM respectively) and for the layer parameters (weights and biases).”) *Offline quantization tool to be taught by other art below
generating, [using an offline quantization tool], a quantization model for each channel of each layer based on the data distribution for that channel (Lee, Page 6 Para 1, discloses quantizing channels based on distributions:  “We evaluated the channel-wise quantization in four modes depending on the method to determine the fractional lengths: MAX, Laplace, S.Cauchy, and PDF-aware.”  See the caption on the same page for Table 1 that describes Laplace and Cauchy as distributions:  “Laplace (optimal fraction length based on Laplace distribution), S.Cauchy (optimal fraction length based on truncated super Cauchy distribution).”) *Offline quantization tool to be taught by other art below
wherein the neural network model can be deployed in an integrated circuit to perform data classification operations in integers, and wherein the quantization model for each channel of each layer of the neural network model is utilized to scale feature map data generated in that channel. (Lee, Page 2 Section 2.1, discloses:  “In the channel-wise quantization, the fractional lengths for the feature maps and the weights can be customized for each channel to minimize the impact of low-precision rounding. Each channel of the IFMs and the OFMs has an independent fractional length based on its expected dynamic range while each channel of the kernels has a fractional length which tightly fits its known values. Figure 1 demonstrates how the IFMs and the kernels from different channels having different fractional lengths in the channel-wise quantization scheme are computed through a convolution layer compared to the layer-wise scheme. In this example, the input and the output of the convolution layer and weights are all bound to 8 bits while the partial sums are allowed to be accumulated in 32 bits as to avoid data loss.”  Here, Lee discloses generating a set of quantization metadata (“fractional lengths for the feature maps and the weights can be customized”) for each of the channels (“for each channel”), wherein the second neural network model can be deployed in an integrated circuit (see Lee Figure 1 with the “MAC” (multiply accumulator) unit hardware) to perform data classification in integers (Recall Lee Page 1 Abstract “quantized into 8-bit integer precision”) wherein the quantization metadata is utilized to scale data generated in each of the channels of each layer of the neural network model, including for feature maps (“fractional lengths for the feature maps and the weights can be customized for each channel”)).
However, Lee does not explicitly teach storing, using an offline quantization tool, the weights and bias and the quantization model for each channel of each layer of the neural network model in a meta file.
Croxford teaches storing, [using an offline quantization tool], the weights and bias and the quantization model [for each channel of each layer] of the neural network model in a meta file. (Recall above that Lee discloses for each channel of each layer.  Croxford, Para [0091], discloses:  “In some examples, the compression metadata may be generated by a CNN accelerator separately from the compression data. In other examples, the compression metadata may be generated by the CNN accelerator from input data that may be not be compressed. Therefore, when analysis is conducted to identify whether a region of input data corresponds to a data pattern, the metadata of the input data (rather than the input data itself) may be analyzed. As such, the processing power required to analyze the input data may be reduced.”  Here, Croxford discloses storing data about potential compression in a metadata file.  Croxford discloses weight and biases in [0034]:  “In the example of FIG. 2, the region 111 of the input data 110 is multiplied with the weights of the kernel 210 before accumulation 220. In cases where the input data 110 has n number of channels, the kernel 210 will have n number of channels i.e. a matrix of 3×3×n. A bias value may then be applied to the accumulated result 220. The resulting convolved output 225 may then be input into an activation function 230.”) *Offline quantization tool to be taught by other art below
	Croxford and Lee are analogous art because they are both in the field of endeavor of neural networks.
	Lee teaches channel-wise quantization of a convolutional neural network.  Croxford teaches storing potential compression data for a convolutional neural network in metadata.  Quantization is a type of compression, and the combination would result in channel-wise quantization of a convolutional neural network in which the quantization data is stored in a meta file.  It would have been obvious to make this combination before the effective filing date of the claimed invention.  One of ordinary skill in the art would be motivated to do so in order to save memory space by providing data about how data may be compressed (Croxford [0061]:  “Said compression metadata information may indicate if regions of the input data correspond to a known data pattern. For example, compression of a static photographic image (e.g. a tree in a field) may produce compression metadata information that indicates that a region of the image data is all the same color (e.g. a green region corresponding to the grass)”).
However, the combination of Lee and Croxford does not explicitly teach using an offline quantization tool.
Yu teaches using an offline quantization tool. (Yu, Para [0089-0090], discloses:  “The proposed quantization flow mainly consists of two phases: the weight quantization phase, and the data quantization phase.  The weight quantization phase aims to find the optimal fl for weights in one layer.”  Yu, Para [0095], discloses:  “The data quantization phase aims to find the optimal fl for a set of feature maps between two layers.”  Here, Yu discloses a process similar to Instant Specification [0018], which states “In the offline stage, statically generated metadata (e.g., weights and bias) of the neural network model is quantized…Dynamically generated metadata (e.g., an input feature map) is not quantized in the offline stage.”  Yu, Para [0100], discloses:  “In the above example of data quantization, weight quantization is conducted before data quantization”  Here, the “weight quantization” is performed first by Yu’s hardware, which is described in Yu, Abstract:  “In particular, the present invention relates to how to implement and optimize a convolutional neural network based on an embedded FPGA. Specifically, it proposes a CPU+FPGA heterogeneous architecture to accelerate ANNs.”  This hardware may be considered a “tool”, and therefore the portion of Yu’s accelerator that performs the “weight quantization phase” is an “offline quantization tool.”)
Yu and the combination of Lee and Croxford are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the CNN quantization of Lee and Croxford with the two-stage quantization of Yu.  One of ordinary skill in the art would be motivated to do so in order to improve the accuracy of the quantization (Yu [0087-0089]:  “In order to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, it proposes a dynamic-precision data quantization strategy and an automatic workflow. Unlike previous static precision quantization strategies, in the proposed data quantization flow, fl is dynamic for different layers and feature map sets while static in one layer to minimize the truncation error of each layer. The proposed quantization flow mainly consists of two phases: the weight quantization phase, and the data quantization phase.”)

As per Claim 18, the combination of Lee, Croxford, and Yu teaches the method of claim 17.  Lee teaches further comprising generating a distribution of floating point values at each of the plurality of channels based on the plurality of inferences. (Lee, Page 5 Section 2.3, discloses:  “Large variations in distributions across the OFM channels naturally led us to search for the optimal PDF for each channel in determining the fractional length”).
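For illustration only, collecting a distribution of floating point values for each channel over a plurality of inferences may be sketched as follows.  This is a minimal sketch; the function names `collect_channel_values` and `channel_ranges`, the layer name "conv1", and the sample activations are assumptions introduced here and are not taken from Lee.

```python
from collections import defaultdict


def collect_channel_values(inference_outputs):
    """Pool activation values per (layer, channel) across inferences.

    inference_outputs: iterable of dicts, one per inference, mapping
    a (layer, channel) key to a list of float activations.
    """
    dist = defaultdict(list)
    for output in inference_outputs:
        for key, values in output.items():
            dist[key].extend(values)
    return dist


def channel_ranges(dist):
    """Summarize each channel's pooled values by its (min, max) range."""
    return {key: (min(v), max(v)) for key, v in dist.items()}


# Two hypothetical inferences over one layer with two channels.
runs = [
    {("conv1", 0): [0.1, 0.9], ("conv1", 1): [-3.0, 2.0]},
    {("conv1", 0): [0.4], ("conv1", 1): [5.0]},
]
print(channel_ranges(collect_channel_values(runs)))
```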

As per Claim 20, the combination of Lee, Croxford, and Yu teaches the method of claim 17.  Lee teaches wherein the training data set includes a first subset used to train the neural network model and a second subset used to validate the neural network model represented by floating point values (Lee, Page 5 Section 3.1, discloses:  “The proposed quantization method was evaluated on various state-of-the-art deep networks trained on the ImageNet dataset containing 1.2M training and 50k validation examples. Pretrained networks were quantized into 8-bit fixed point format by using the profiling dataset sampled from the training set and evaluated on the whole validation dataset (50k examples).”  Here, Lee discloses extracting a subset of data from a training data set (“sampled from the training set”) and second subset used to validate a first neural network (“whole validation dataset (50k examples)”).  Lee, Page 6 Table 1 Caption discloses that the original values are 32-bit floating point:  “No retraining is performed. Reference (Float32) lists baseline accuracies while all other figures are accuracy losses.”)

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Lee, Croxford, and Yu, further in view of Diril et al. (US 2019/0171927 A1; hereinafter “Diril”).
As per Claim 19, the combination of Lee, Croxford, and Yu teaches the method of claim 18.  Lee teaches further comprising: 
determining a maximum floating-point value and a minimum floating-point value from the distribution (Lee, Page 2 Section 2.1 Para 1, discloses:  “Each channel of the IFMs and the OFMs has an independent fractional length based on its expected dynamic range while each channel of the kernels has a fractional length which tightly fits its known values.”  Here, Lee discloses floating-point (“fractional length”) and a minimum and a maximum (“range”)).	
determining a maximum integer value of a first bit width and a minimum integer value of the first bit width that respectively correspond to the maximum floating-point value and the minimum floating-point value (Lee, Page 1 Abstract, discloses:  “The results prove that the networks can be quantized into 8-bit integer precision without fine tuning.”  Here, Lee discloses a range of integer values (8-bit), and an 8-bit value is finite and thus comprises a minimum and maximum value.)
and constructing a set of quantization parameters for the channel using the maximum integer value, the maximum floating-point value, the minimum floating-point value, the minimum integer value, and an integer type of a second bit width.  (Lee Page 2 Section 2.1 discloses:  “Figure 1 demonstrates how the IFMs and the kernels from different channels having different fractional lengths in the channel-wise quantization scheme are computed through a convolution layer compared to the layer-wise scheme. In this example, the input and the output of the convolution layer and weights are all bound to 8 bits while the partial sums are allowed to be accumulated in 32 bits as to avoid data loss.”  Here, Lee discloses quantizing from a range of 32 bit values to a range of 8 bit values.)
Lee suggests, but does not explicitly teach for each of the plurality of channels of each layer of the neural network model, removing one or more outlier values based on a predetermined percentage from each end of the distribution of floating point values (Lee Page 5 Section 2.3 discloses for each channel:  “Large variations in distributions across the OFM channels naturally led us to search for the optimal PDF for each channel in determining the fractional length.”  Lee, Page 5 Section 3.1, discloses:  “We figured out that the outliers were the major source of accuracy degradation after layer-wise quantization. For example, using the max value of a parameter could significantly overestimate its dynamic range when there are outliers with extraordinarily large values which cannot be seen in the validation set or when deployed. Carefully removing those outliers will significantly improve the quality of quantization even if layer-wise max-based method is used. Thus, there are previous papers showing better results than our baseline layer-wise quantization. However, we did not consider such improvement in the baseline because it requires extra effort and the process itself might taint the dataset since there’s no explicitly clear boundary of the outliers.”  Here, Lee does not “teach away” from removing outliers.  In fact, Lee suggests outliers can cause accuracy degradation and other papers show removing them can improve accuracy, but they don’t remove them because it “requires extra effort”.  They do not expend this effort on determining “an explicitly clear boundary of the outliers”, and therefore by not determining any “explicitly clear boundary”, they risk tainting the dataset.  Lee takes an ambivalent stance on this step.)
Diril explicitly teaches for each of the plurality of channels of each layer of the neural network model, removing one or more outlier values based on a predetermined percentage from each end of the distribution of floating point values (Diril, Para [0042], discloses quantization:  “FIG. 4 is a flow diagram of an exemplary computer-implemented method 400 for providing layer-level quantization in various types of neural networks. The steps shown in FIG. 4 may be performed by any suitable computer-executable code and/or computing system, including the system(s) illustrated in FIGS. 1, 7, and 8. In one example, each of the steps shown in FIG. 4 may represent an algorithm whose structure includes and/or is represented by multiple sub-steps, examples of which will be provided in greater detail below.”  Diril, Para [0044-0045] discloses:  “Returning to FIG. 4, at step 420 one or more of the systems described herein may store a first limit value of the activation layer in a data storage system. For instance, processing unit 765 may store the first limit value in register 790A or within any other part of a data storage subsystem. The first limit value may correspond to a minimum value for the activation layer, such as an absolute sample minimum output (e.g., the lowest value of an activation layer, which may be identified by passing output values through a min-max unit) or an estimated minimum output value (e.g., an approximate minimum that discards outliers, a minimum within a predetermined standard deviation of values for a particular layer, etc.). One of functional units 770 may be a processing element (e.g., a min-max unit or any other suitable processing element) for determining or detecting the minimum value of the activation layer. At step 430, one or more of the systems described herein may store a second limit value of the activation layer in the data storage system. 
For instance, accelerator 700 may store the second limit value in register 790B or in any other part of a data storage subsystem. This second limit value may correspond to a maximum value for the activation layer, such as an absolute maximum weight or filter value (e.g., the highest value of an activation layer, which may be identified by passing output values through a min-max unit) or an estimated maximum weight or filter value (e.g., an approximate maximum that discards outliers, a maximum within a predetermined standard deviation of values for a particular layer, etc.). One of functional units 770 may be a processing element for determining the maximum value of the activation layer. In certain implementations, a single functional unit 770 may determine the minimum value and the maximum value.”  Here, Diril discloses removing outlier values from the min and max ends (“approximate minimum that discards outliers”, “an approximate maximum that discards outliers”).  One of ordinary skill in the art will appreciate that an “outlier” is a specific percentage away from other values on each end of a distribution, for example determined by a “standard deviation” (also disclosed by Diril above) away from the mean, which comprises a percentage.)
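For illustration only, discarding a predetermined percentage of values from each end of a per-channel distribution before taking the minimum and maximum can be sketched as follows.  This is a minimal sketch under stated assumptions and is not code from Diril or Lee:  the function names trimmed_min_max and per_channel_ranges are hypothetical, and trimming by sorted rank is one assumed way of applying a percentage cutoff.

```python
def trimmed_min_max(values, percent):
    """Drop `percent` percent of the samples from each end, then return (min, max)."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    ordered = sorted(values)
    # Number of samples discarded at each end of the distribution.
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return kept[0], kept[-1]


def per_channel_ranges(channels, percent):
    """Apply the same trimming independently to each channel's values."""
    return [trimmed_min_max(ch, percent) for ch in channels]
```

The trimmed minimum and maximum returned for each channel would then serve as the estimated limits from which that channel's quantization range is derived, so that a few extreme values do not inflate the range.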
Diril and the combination of Lee, Croxford, and Yu are analogous art because they are both in the field of endeavor of quantizing neural networks.
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Diril with Lee, Croxford, and Yu.  One of ordinary skill in the art would be motivated to do so in order to minimize quantization error, as noted by Lee:  “Carefully removing those outliers will significantly improve the quality of quantization even if layer-wise max-based method is used. Thus, there are previous papers showing better results than our baseline layer-wise quantization”.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Zhuang et al. (“Towards Effective Low-bitwidth Convolutional Neural Networks”) also discloses a two-phase quantization.  Page 7920 Intro Para 2 discloses:  “The first method is to adopt a two-stage training process. At the first stage, only the weights of a network is quantized. After obtaining a sufficiently good solution of the first stage, the activation of the network is further required to be in low-precision and the network will be trained again.”  Page 7922 Section 3.2 “Two-stage optimization” discloses:  “To reduce the difficulty of training, we devise a two-stage optimization procedure: at the first stage, we only quantize the weights of the network while setting the activations to be full precision. After the converge (or after certain number of iterations) of this model, we further apply the quantization function on the activations as well and retrain the network.”
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571) 272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/L.A.S./Examiner, Art Unit 2126     
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126