DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restrictions
2.	Claims 8 and 9 are withdrawn from further consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected species, there being no allowable generic or linking claim. Election was made without traverse in the reply filed on 10/12/2022.

Drawings
3.	The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: S192, S194, and S198 (see specification [0006]-[0008]). Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

4.	The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: S142, S144, and S148 (FIG. 1).  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


5.	Claims 1-5, 10, 11, 15, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Imber (US 20190227893 A1).

	Regarding claim 1, Imber teaches a method for quantizing artificial neural networks (i.e., “Methods for determining a fixed point format for one or more layers of a DNN”; see Abstract) including:
accessing a floating-point network comprising a set of floating-point layers (i.e., “an instantiation of the DNN that is configured to represent the values input to, and output from, each layer in a floating point number format”; see [0071]; “an example DNN 100 that comprises a plurality of layers 102-1, 102-2, 102-3”; see [0031]);
accessing a set of validation examples for the floating-point network (i.e., “test input data is provided”; see [0067]);
for each floating-point layer in the set of floating-point layers:
calculating a set of example input activations of the floating-point layer based on the set of validation examples and a preceding subset of floating-point layers in the floating-point network (i.e., “test input data is provided to the instantiation of the DNN and the output of the instantiation of the DNN in response to the test input data is recorded. Specifically, the test input data propagates through the DNN until it reaches the output”; see [0067]; “the baseline output may be the output of an instantiation of the DNN that is configured to represent the values input to, and output from, each layer in a floating point number format”; see [0071] and FIG. 7);
calculating a set of example output activations of the floating-point layer based on the set of example input activations of the floating point layer (i.e., “test input data is provided to the instantiation of the DNN and the output of the instantiation of the DNN in response to the test input data is recorded. Specifically, the test input data propagates through the DNN until it reaches the output”; see [0067]; “the baseline output may be the output of an instantiation of the DNN that is configured to represent the values input to, and output from, each layer in a floating point number format”; see [0071] and FIG. 7);
converting the floating-point layer to a low-bit-width layer in a set of low-bit-width layers (i.e., “an instantiation of the DNN that is configured, or initialised, to represent the weights and/or input data values of each layer of the DNN using initial or starting fixed point number format(s) for that layer”; see [0063]; “the output data generated by the floating point instantiation of the DNN may be used as the benchmark or baseline output against which to gauge the accuracy of output data generated by an instantiation of the DNN using fixed point number formats to represent the values input to, and output from, each layer of the DNN”; see [0071]; note that the above indicates that the fixed point DNN is initialized based on the floating point DNN);
calculating a set of low-bit-width output activations of the low-bit-width layer based on the set of example input activations (i.e., “test input data is provided to the instantiation of the DNN and the output of the instantiation of the DNN in response to the test input data is recorded. Specifically, the test input data propagates through the DNN until it reaches the output”; see [0067] and FIG. 7); and
calculating a per-layer deviation statistic of the low-bit-width layer based on the set of low-bit-width output activations of the low-bit-width layer and the set of example output activations of the floating-point layer (i.e., “the portion of the differentiable error calculated in block 606 attributable to the quantisation of the weights of a layer”; see [0075] and FIG. 10);
generating a quantized network representing the floating-point network and comprising the set of low-bit-width layers (i.e., “an instantiation of the DNN that is configured, or initialised, to represent the weights and/or input data values of each layer of the DNN using initial or starting fixed point number format(s) for that layer”; see [0063]);
calculating an accuracy of the quantized network based on the set of validation examples (i.e., “the accuracy of the output may be determined. In some cases, the accuracy may be calculated based on the ground-truth accuracy of the output. For example, the accuracy may be calculated as a Top-1 classification accuracy, or a Top-5 classification accuracy based on known correct classifications or labels for the test input data”; see [0086]); and
while a loss-of-accuracy threshold is below the accuracy of the quantized network, sequentially (i.e., “If it is determined that the accuracy as determined in block 1002 is greater than or equal to the accuracy threshold (ATh) then the adjustment made in block 902 is accepted and the method 1000 proceeds to block [604]”; see [0087] and FIG. 10), from a least deviating low-bit-width layer in the set of low-bit-width layers toward a greatest deviating low-bit-width layer in the set of low-bit-width layers (i.e., “the fixed point number format of the layer with the lowest weight quantisation error portion or the lowest input data value quantisation error portion may be adjusted… the appropriate fixed point number format may be adjusted to reduce the number of mantissa bits”; see [0080]; note that the adjustment is iterative and therefore sequential; see FIG. 10):
converting a floating-point layer represented by the low-bit-width layer to a much lower-bit-width layer (i.e., “the appropriate fixed point number format may be adjusted to reduce the number of mantissa bits”; see [0080]);
replacing the low-bit-width layer with the much lower-bit-width layer in the quantized network (i.e., “then the adjustment made in block 902 is accepted and the method 1000 proceeds to block [604]”; see [0087] and FIG. 10);
updating the accuracy of the quantized network based on the set of validation examples (see block 1002 in the loop of FIG. 10); and
in response to the accuracy of the quantized network is equal to or below the 
	Imber does not explicitly disclose (see the underlined):
while a loss-of-accuracy threshold exceeds the accuracy of the quantized network, sequentially, from a greatest-deviating low-bit-width layer in the set of low-bit-width layers toward a least-deviating low-bit-width layer in the set of low-bit-width layers:
converting a floating-point layer represented by the low-bit-width layer to a high-bit-width layer;
replacing the low-bit-width layer with the high-bit-width layer in the quantized network; and
in response to the accuracy of the quantized network exceeding the loss-of-accuracy threshold, returning the quantized network.
	The difference is that Imber starts with an above-target-accuracy fixed-point network and reduces the bit width of a layer from a least-deviating layer iteratively by bringing the accuracy closer to the limit of the target accuracy, while the claimed invention starts with a below-target-accuracy fixed-point network and increases the bit width of a layer from a greatest-deviating layer iteratively by bringing the accuracy closer to the target accuracy. 
	Both seek to quantize the network so that its accuracy is close to an accuracy threshold (or loss-of-accuracy threshold). Moreover, Imber also suggests, alternatively, starting with a greatest deviating layer for the adjustment of bit width (i.e., “the mantissa bit length of the layer allocated the highest portion of the output error may be increased. Such a method may be iteratively repeated to determine an optimum set of fixed point number formats for the layers of a DNN”; see Abstract, [0006], and [0045]).
	It would have been obvious to one of ordinary skill in the art at the time the application was filed to modify the method by starting with a below-target-accuracy fixed-point network and updating the layer bit width from a greatest deviating layer by bringing the accuracy closer to the target accuracy, by incorporating the steps: while a loss-of-accuracy threshold exceeds the accuracy of the quantized network, sequentially, from a greatest-deviating low-bit-width layer in the set of low-bit-width layers toward a least-deviating low-bit-width layer in the set of low-bit-width layers: converting a floating-point layer represented by the low-bit-width layer to a high-bit-width layer; replacing the low-bit-width layer with the high-bit-width layer in the quantized network; and in response to the accuracy of the quantized network exceeding the loss-of-accuracy threshold, returning the quantized network, as claimed. The motivation would be to obtain a quantized neural network whose accuracy is closer to a target accuracy.
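	For illustration, the loop recited in the limitations quoted above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation from the application or from Imber: the function name `promote_until_accurate`, the `evaluate` callback, the dictionary representation of the network, and the default 8-bit and 16-bit widths are all hypothetical.

```python
from typing import Callable, Dict


def promote_until_accurate(
    deviations: Dict[str, float],
    evaluate: Callable[[Dict[str, int]], float],
    threshold: float,
    low_bits: int = 8,    # assumed low bit width (illustrative only)
    high_bits: int = 16,  # assumed high bit width (illustrative only)
) -> Dict[str, int]:
    """Return a per-layer bit-width assignment (hypothetical representation)."""
    # Start with every layer at the low bit width.
    bit_widths = {name: low_bits for name in deviations}
    accuracy = evaluate(bit_widths)

    # Visit layers from greatest-deviating toward least-deviating.
    ordered = sorted(deviations, key=lambda name: deviations[name], reverse=True)
    for name in ordered:
        # Stop once the threshold no longer exceeds the accuracy.
        if accuracy >= threshold:
            break
        # Replace the low-bit-width layer with the high-bit-width layer.
        bit_widths[name] = high_bits
        # Update the accuracy on the validation examples.
        accuracy = evaluate(bit_widths)
    return bit_widths
```

	Imber's loop, as characterized above, would run in the opposite direction: it starts above the accuracy threshold and reduces the bit width of the least-deviating layer first.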

	Regarding claim 2, Imber does not explicitly disclose:
wherein converting the floating-point layer to the low-bit-width layer in the set of low-bit-width layers comprises converting the floating-point layer to an eight-bit layer in a set of eight-bit layers;
wherein calculating the set of low-bit-width output activations of the low-bit-width layer based on the set of example input activations comprises calculating a set of eight-bit output activations of the eight-bit layer based on the set of example input activations;
wherein calculating the per-layer deviation statistic of the low-bit-width layer based on the set of low-bit-width output activations of the low-bit-width layer and the set of example output activations of the floating-point layer comprises calculating a per-layer deviation statistic of the eight-bit layer based on the set of eight-bit output activations of the eight-bit layer and the set of example output activations of the floating-point layer;
wherein generating the quantized network representing the floating-point network and comprising the set of low-bit-width layers comprises generating the quantized network representing the floating-point network and comprising the set of eight-bit layers; and
further comprising, while the loss-of-accuracy threshold exceeds the accuracy of the quantized network, sequentially, from a greatest-deviating eight-bit layer in the set of eight-bit layers toward a least-deviating eight-bit layer in the set of eight-bit layers:
converting a floating-point layer represented by the eight-bit layer to a sixteen-bit layer;
replacing the eight-bit layer with the sixteen-bit layer in the quantized network;
updating the accuracy of the quantized network based on the set of validation examples; and
in response to the accuracy of the quantized network exceeding the loss-of-accuracy threshold, returning the quantized network.
	Claim 2 implements an eight-bit width as the low bit width and replaces the eight-bit layer with a sixteen-bit layer when adjusting the bit width of a layer. However, Imber indicates that a layer can have an 8-bit or 16-bit width, which can be implemented depending on the particular application (such as hardware implementation; see [0083]). It would have been obvious to one of ordinary skill in the art at the time the application was filed to start with an 8-bit width for the initial quantized network, by incorporating the limitations of claim 2 as claimed. The motivation would be to use specific bit widths, such as 8-bit and 16-bit widths, based on the circumstances of the application.

	Regarding claim 3, Imber does not explicitly disclose:
in response to the accuracy of the quantized network exceeding the loss-of-accuracy threshold, loading the quantized network onto an edge device.
	But Imber teaches determining a fixed point Deep Neural Network “DNN” for use in configuring a hardware implementation of the DNN (see [0007]). It would have been obvious to one of ordinary skill in the art at the time the application was filed to incorporate the steps: in response to the accuracy of the quantized network exceeding the loss-of-accuracy threshold, loading the quantized network onto an edge device, as claimed. The motivation would be to facilitate the implementation of the DNN in a particular field (such as an edge device) in which power consumption, processing capabilities, or silicon area are limited (see Imber [0002]).

	Regarding claim 4, the claim recites the same substantive limitations as claim 1 and is rejected using the same teachings. Note that claim 4 is broader than claim 1 in that claim 4 does not require calculating the set of example input activations of the floating-point layer based on a preceding subset of floating-point layers in the floating-point network and does not require sequentially replacing the layer from a greatest-deviating layer.

	Regarding claim 5, the claim recites the same substantive further limitations as claim 2 and is rejected using the same teachings. 

	Regarding claim 10, Imber further teaches:
wherein converting the floating-point layer to the low-bit-width layer in the set of low-bit-width layers comprises, for each floating-point channel of the floating-point layer, converting the floating-point channel to a low-bit-width channel in the low-bit-width layer in the set of low-bit-width layers (i.e., “an instantiation of the DNN that is configured, or initialised, to represent the weights and/or input data values of each layer of the DNN using initial or starting fixed point number format(s) for that layer”; see [0063], “the output data generated by the floating point instantiation of the DNN may be used as the benchmark or baseline output against which to gauge the accuracy of output data generated by an instantiation of the DNN using fixed point number formats to represent the values input to, and output from, each layer of the DNN”; see [0071]).

	Regarding claim 11, Imber further teaches:
wherein converting the floating-point layer to the low-bit-width layer in the set of low-bit-width layers comprises, for each floating-point output channel of the floating-point layer, converting the floating-point output channel to a low-bit-width output channel in the low-bit-width layer in the set of low-bit-width layers (i.e., “an instantiation of the DNN that is configured, or initialised, to represent the weights and/or input data values of each layer of the DNN using initial or starting fixed point number format(s) for that layer”; see [0063], “the output data generated by the floating point instantiation of the DNN may be used as the benchmark or baseline output against which to gauge the accuracy of output data generated by an instantiation of the DNN using fixed point number formats to represent the values input to, and output from, each layer of the DNN”; see [0071]).

	Regarding claim 15, the claim recites the same substantive further limitations as claim 3 and is rejected using the same teachings. 

	Regarding claim 19, the further limitations correspond to the same features of claim 1, and the claim is rejected using the same teachings.

	Regarding claim 20, the claim recites the same substantive limitations as claims 4, 5 and 15 combined and is rejected using the same teachings. 

6.	Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Imber in view of GUO et al. (US 20200357045 A1; hereinafter “GUO”).

	Regarding claim 6, Imber further teaches:
wherein ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer in the set of low-bit-width layers to generate the ordered set of low-bit-width layers comprises ordering the set of low-bit-width layers, based on the per-layer deviation statistic of each low-bit-width layer in the set of low-bit-width layers 
	Imber does not explicitly disclose (see the underlined):
wherein ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer in the set of low-bit-width layers to generate the ordered set of low-bit-width layers comprises ordering the set of low-bit-width layers, based on the per-layer deviation statistic of each low-bit-width layer in the set of low-bit-width layers and a network position of each low-bit-width layer in the set of low-bit-width layers.
	But GUO teaches:
lower level features recognized by the first several layers are significant for the neural network's accuracy (see [0069]).
	It would have been obvious to one of ordinary skill in the art at the time the application was filed to modify Imber in view of GUO to consider the position of the layer when ordering the layers, such that ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer in the set of low-bit-width layers to generate the ordered set of low-bit-width layers comprises ordering the set of low-bit-width layers, based on the per-layer deviation statistic of each low-bit-width layer in the set of low-bit-width layers and a network position of each low-bit-width layer in the set of low-bit-width layers, as claimed. The motivation would be to give earlier (lower level) layers more priority of precision for better network accuracy.

	Regarding claim 7, the prior art applied to the preceding linking claim(s) teaches the features of the linking claim(s).  
	Imber does not explicitly disclose:
wherein ordering the set of low-bit-width layers, based on the per-layer deviation statistic of each low-bit-width layer in the set of low-bit-width layers and the network position of each low-bit-width layer in the set of low-bit-width layers, to generate the ordered set of low-bit-width layers comprises:
selecting a highest-deviating subset of the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer in the set of low-bit-width layers; and
ordering the highest-deviating subset from an earliest network position to a latest network position based on the network position of each low-bit-width layer in the set of low-bit-width layers to generate the ordered set of low-bit-width layers.
	However, as discussed in rejecting claim 6 above, earlier layers would be given higher priority of precision. Because both the deviating error and the position of a layer are factors for determining the layer’s priority (order) for precision, it would have been obvious to one of ordinary skill in the art at the time the application was filed to modify the method such that ordering the set of low-bit-width layers, based on the per-layer deviation statistic of each low-bit-width layer in the set of low-bit-width layers and the network position of each low-bit-width layer in the set of low-bit-width layers, to generate the ordered set of low-bit-width layers comprises: selecting a highest-deviating subset of the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer in the set of low-bit-width layers; and ordering the highest-deviating subset from an earliest network position to a latest network position based on the network position of each low-bit-width layer in the set of low-bit-width layers to generate the ordered set of low-bit-width layers, as claimed. The motivation would be to order the layers by deviation error and then by position, a known practice for sorting on two factors.
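	For illustration, the two-factor ordering recited in claim 7 can be sketched as follows. This is a minimal sketch only: the function name `order_layers`, the tuple layout, and the subset size `k` are hypothetical, since neither the claim as quoted nor the cited art specifies how large the highest-deviating subset is.

```python
from typing import List, Tuple

# Each layer is (name, network position, per-layer deviation statistic).
Layer = Tuple[str, int, float]


def order_layers(layers: List[Layer], k: int) -> List[str]:
    """Select the k highest-deviating layers, then order them by position."""
    # Step 1: select the highest-deviating subset by deviation statistic.
    highest = sorted(layers, key=lambda layer: layer[2], reverse=True)[:k]
    # Step 2: order that subset from earliest to latest network position.
    by_position = sorted(highest, key=lambda layer: layer[1])
    return [name for name, _, _ in by_position]
```

	With layers at positions 0 to 3 and deviations 0.1, 0.9, 0.5, and 0.7, a subset size of 3 selects the layers at positions 1, 2, and 3 and returns them in that positional order.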

7.	Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Imber in view of Lo et al. (“Fixed-Point Implementation of Convolutional Neural Networks for Image Classification” 2018 International Conference on Advanced Technologies for Communications; hereinafter “Lo”).

	Regarding claim 13, the prior art applied to the preceding linking claim(s) teaches the features of the linking claim(s).
	Imber does not explicitly disclose:
wherein converting the floating-point layer to the low-bit-width layer in the set of low-bit-width layers comprises converting the floating-point layer to the low-bit-width layer in the set of low-bit-width layers, the low-bit-width layer represented in Q-format fixed-point notation.
	However, Imber teaches:
Q format is a common fixed point number format (see [0040]).
	And Lo teaches a fixed point CNN using Q-format notation (i.e., “First we convert the data into appropriate formats. Next, we determine bit widths of the weights and input data. Recall that a Qm.n number format uses one bit to represent sign, m−1 bits to represent the integral value, and n bits to represent the fractional value”; see p. 106, col. 2, sect. C). 
	Because Imber suggests that Q format notation is common and can be used in some other fixed point neural networks (i.e., “in some cases, instead of using the Q format, some hardware implementations may be configured to use a fixed point number format for the input data values and weights wherein each value x is represented by a fixed integer exponent e and an n-bit mantissa m format” see [0042]), and Lo teaches using the Q format notation in its network, it would have been obvious to one of ordinary skill in the art at the time the application was filed to modify Imber in view of Lo to use a Q-format notation, such that the converting the floating-point layer to the low-bit-width layer in the set of low-bit-width layers comprises converting the floating-point layer to the low-bit-width layer in the set of low-bit-width layers, the low-bit-width layer represented in Q-format fixed-point notation, as claimed. The motivation would be to use the Q-format notation in a suitable implementation, such as in Lo.
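	For illustration, the Qm.n format quoted from Lo (one sign bit, m−1 integer bits, and n fractional bits) can be sketched as follows. This is a minimal sketch: the function names `to_q` and `from_q` are hypothetical, and the round-to-nearest and saturation behavior are assumptions that neither reference is quoted as specifying.

```python
def to_q(x: float, m: int, n: int) -> int:
    """Quantize x to a signed Qm.n integer code (m + n bits in total)."""
    # Scale by 2**n so the n fractional bits become integer bits.
    scaled = round(x * (1 << n))
    # Representable integer codes for a signed (m + n)-bit value.
    lowest = -(1 << (m + n - 1))
    highest = (1 << (m + n - 1)) - 1
    # Saturate instead of wrapping on overflow (an assumption).
    return max(lowest, min(highest, scaled))


def from_q(code: int, n: int) -> float:
    """Recover the real value represented by a Qm.n integer code."""
    return code / (1 << n)
```

	Under these assumptions a Q1.7 value occupies eight bits and covers the range from -1.0 to 127/128, so an input of 1.0 saturates to the code 127.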

	Regarding claim 14, the prior art applied to the preceding linking claim(s) teaches the features of the linking claim(s).
	Imber does not explicitly disclose:
wherein converting the floating-point layer to the low-bit-width layer in the set of low-bit-width layers comprises:
converting a set of floating-point weights of the floating-point layer to a set of Q-format fixed-point weights of the low-bit-width layer in the set of low-bit-width layers;
converting a bias of the floating-point layer to a Q-format fixed-point bias of the low-bit-width layer in the set of low-bit-width layers; and
bit-shifting the set of Q-format fixed-point weights to match the Q-format fixed-point bias.
	However, Imber teaches that a bit width is adjusted per layer and each layer may include weight and bias (see [0033]). It is also well known that the weight and bias in a layer may have different data ranges, so that converting the layer from floating point to fixed point may result in different Q-formats (such as different fractional lengths). Therefore, it would have been obvious to one of ordinary skill in the art at the time the application was filed to match the bit widths of the weight and bias when quantizing the layers, by converting the floating-point layer to the low-bit-width layer in the set of low-bit-width layers comprising: converting a set of floating-point weights of the floating-point layer to a set of Q-format fixed-point weights of the low-bit-width layer in the set of low-bit-width layers; converting a bias of the floating-point layer to a Q-format fixed-point bias of the low-bit-width layer in the set of low-bit-width layers; and bit-shifting the set of Q-format fixed-point weights to match the Q-format fixed-point bias, as claimed. The motivation would be to use the same number format for data across the same layer, for easier implementation.
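	For illustration, one reading of "bit-shifting the set of Q-format fixed-point weights to match the Q-format fixed-point bias" is aligning fractional lengths, sketched below. This is an assumed interpretation: the function name `align_fractional_bits` and its parameters are hypothetical, and the record does not state the direction or rounding of the shift.

```python
from typing import List


def align_fractional_bits(codes: List[int], n_from: int, n_to: int) -> List[int]:
    """Shift Q-format integer codes from n_from to n_to fractional bits."""
    shift = n_to - n_from
    if shift >= 0:
        # More fractional bits in the target: shift left, which is exact.
        return [code << shift for code in codes]
    # Fewer fractional bits in the target: arithmetic shift right,
    # which discards low-order bits (rounds toward negative infinity).
    return [code >> -shift for code in codes]
```

	For example, weight codes with 7 fractional bits shifted to a bias format with 5 fractional bits are each shifted right by two bits, so 64 becomes 16 and -32 becomes -8.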

8.	Claims 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Imber in view of Lo et al. (US 20200264876 A1; hereinafter “Lo-876”).

	Regarding claim 16, the prior art applied to the preceding linking claim(s) teaches the features of the linking claim(s). 
	Imber does not explicitly disclose:
calculating a performance metric of the quantized network based on the set of validation examples; and
while the loss-of-accuracy threshold exceeds the accuracy of the quantized network, sequentially, in the ordered set of low-bit-width layers, updating the performance metric of the quantized network based on the set of validation examples.
	But Lo-876 teaches:
calculating a performance metric of the quantized network based on the set of validation examples (i.e., “at process block 620, a performance metric can be determined for the neural network. In some examples, the performance metric indicates accuracy of the neural network, for example, based on a set of training data. In some examples, the performance metric is based on at least one of the following metrics: a number of true positives, a number of true negatives, a number of false positives, or a number of false negatives generated by the neural network. In some examples, the performance metric is based on entropy of one or more layers of the neural network. In some examples, the performance metric is based on a rate distortion function”; see [0105]).

	It would have been obvious to one of ordinary skill in the art at the time the application was filed to modify Imber in view of Lo-876 to consider another performance metric in adjusting the bit width of a layer, by incorporating the steps of calculating a performance metric of the quantized network based on the set of validation examples; and while the loss-of-accuracy threshold exceeds the accuracy of the quantized network, sequentially, in the ordered set of low-bit-width layers, updating the performance metric of the quantized network based on the set of validation examples, as claimed. The motivation would be to consider both accuracy and performance of the network in adjusting the bit width of a layer sequentially to bring the quantized network closer to both the target accuracy and target performance.

	Regarding claim 17, the prior art applied to the preceding linking claim(s) teaches the features of the linking claim(s).
	Imber does not explicitly disclose:
while the loss-of-accuracy threshold exceeds the accuracy of the quantized network and while the performance metric of the quantized network exceeds a performance metric threshold, sequentially, in the ordered set of low-bit-width layers:
converting the floating-point layer represented by the low-bit-width layer to the high-bit-width layer;
replacing the low-bit-width layer with the high-bit-width layer in the quantized network;
updating the accuracy of the quantized network based on the set of validation examples;
updating the performance metric of the quantized network based on the set of validation examples; and
in response to the accuracy of the quantized network exceeding the loss-of-accuracy threshold, returning the quantized network.
	However, as a result of the modification applied to claim 16 above, both accuracy and performance metric will be checked (and thus updated) in the sequence for adjusting the layer. It would have been obvious to one of ordinary skill in the art at the time the application was filed to modify the method such that while the loss-of-accuracy threshold exceeds the accuracy of the quantized network and while the performance metric of the quantized network exceeds a performance metric threshold, sequentially, in the ordered set of low-bit-width layers: converting the floating-point layer represented by the low-bit-width layer to the high-bit-width layer; replacing the low-bit-width layer with the high-bit-width layer in the quantized network; updating the accuracy of the quantized network based on the set of validation examples; updating the performance metric of the quantized network based on the set of validation examples; and in response to the accuracy of the quantized network exceeding the loss-of-accuracy threshold, returning the quantized network, as claimed. The motivation would be to consider both accuracy and performance of the network in adjusting the bit width of a layer sequentially to bring the quantized network closer to both the target accuracy and target performance.

	Regarding claim 18, the prior art applied to the preceding linking claim(s) teaches the features of the linking claim(s). 
	Imber does not explicitly disclose:
rendering a plot of the accuracy of the quantized network and the performance metric of the quantized network for each replacement of a low-bit-width layer with a high-bit-width layer in the quantized network.
	However, as a result of the modification applied to claim 16 above, the accuracy and performance metric are updated in sequence. It would have been obvious to one of ordinary skill in the art at the time the application was filed to incorporate the step of rendering a plot of the accuracy of the quantized network and the performance metric of the quantized network for each replacement of a low-bit-width layer with a high-bit-width layer in the quantized network, as claimed. The motivation would be to visualize the relevant data (such as the accuracy and the performance), which is a well-known practice.

Allowable Subject Matter
9.	Claim 12 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
	Regarding claim 12, the closest prior art of record fails to teach the features of “wherein calculating the per-layer deviation statistic of the low-bit-width layer based on the set of low-bit-width output activations of the low-bit-width layer and the set of example output activations of the floating-point layer comprises: for each example output activation in the set of example output activations, calculating an error metric between a corresponding low-bit-width output activation in the set of low-bit-width output activations and the example output activation; and calculating the per-layer deviation statistic equal to a mean squared error of the error metric for each example output activation in the set of example output activations,”  in combination with the rest of the claim limitations as claimed and defined by the Applicant.
	Note that Imber teaches calculating a per-layer deviation statistic by estimating a portion of the differentiable error attributable to the quantisation of the weights of a layer using a Taylor approximation. This approach is fundamentally different from the claimed features. There is no teaching or suggestion in the prior art of record to make such an implementation as claimed.
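	For illustration only, the claimed calculation quoted above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application: the function name per_layer_deviation is hypothetical, each output activation is treated as a single scalar, and the per-example error metric is assumed to be the signed difference between the low-bit-width output activation and the corresponding example output activation of the floating-point layer.

```python
from typing import Sequence


def per_layer_deviation(
    low_bit_outputs: Sequence[float],
    float_outputs: Sequence[float],
) -> float:
    """Mean squared error between paired output activations of one layer."""
    if len(low_bit_outputs) != len(float_outputs) or not float_outputs:
        raise ValueError("activation sets must be non-empty and equal in length")
    # Assumed error metric: signed difference for each example.
    errors = [q - f for q, f in zip(low_bit_outputs, float_outputs)]
    # Per-layer deviation statistic: mean of the squared per-example errors.
    return sum(e * e for e in errors) / len(errors)


# Errors are 0, -1 and 2, so the statistic is (0 + 1 + 4) / 3.
deviation = per_layer_deviation([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

	Unlike the Taylor approximation of Imber, which estimates the error attributable to weight quantisation, this sketch compares the layer outputs directly.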
	
Conclusion
10.	The claims are eligible under 35 USC 101 because the recited method is directed to fine-tuning a neural network to a different neural network (quantization), similar to training and creating a neural network, which is not an abstract idea.

11.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
	Xu et al. (US 20220067527 A1) teaches a method of compressing a neural network, involving determining a quantized-sparsified subset of weights for a particular layer iteratively by applying a loss function.
	Dikici et al. (US 20210073614 A1) teaches a method of converting a plurality of weights of a filter of a Deep Neural Network (DNN) in a first number format to a second number format that is less precise than the first number format based on comparing a total quantisation error with a threshold.
	LEE et al. (US 20200218962 A1) teaches a method for neural network quantization, involving obtaining weight differences between an initial weight and an updated weight determined by the learning of each cycle for each of layers in the first neural network, analyzing a statistic of the weight differences for each of the layers, determining one or more layers, from among the layers, to be quantized with a lower-bit precision based on the analyzed statistic, and generating a second neural network by quantizing the determined one or more layers with the lower-bit precision.
	WU et al. (US 20200193270 A1) teaches a method of processing a convolution neural network, involving performing coarse-to-fine dynamic fixed-point approximation on weighting vectors (ω and b) to minimize quantization errors in the weighting vectors.
	Weber et al. (US 20190392300 A1) teaches a method for processing a neural network, including performing a decompression step before executing operations associated with a block of layers of the neural network, performing a compression step after executing operations associated with the block of layers of the neural network, gathering performance indicators for executing the operations associated with the block of layers of the neural network, and determining whether target performance metrics have been met with a compression format used for at least one of the decompression step and the compression step.
	Burger et al. (US 20190340499 A1) teaches a method for providing emulation of quantized precision operations, involving quantizing a trained neural network model by adjusting bit width, and comparing output values produced by the quantized model to output values produced using the same model expressed in a normal precision format to determine a quantization loss.
	HOLLAND (US 20190228284 A1) teaches a method of compressing one or more of activations or weights in one or more layers of a neural network. The activations and/or weights may be compressed based on a compression ratio or a system event. The system event may be a bandwidth condition, a power condition, a debug condition, a thermal condition or the like.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN C KUAN whose telephone number is (571)270-7066. The examiner can normally be reached M-F: 9:00AM-5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Schechter, can be reached at (571)272-2302. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JOHN C KUAN/Primary Examiner, Art Unit 2857