Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Application 16/711,376, with preliminary amendments filed 3/5/2020, has been examined.
Claims 15-17 have been cancelled.
Thus, claims 1-14 are currently pending.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an
abstract idea without significantly more.
Claim 1 recites:
determining a quantization parameter of a weight of a corresponding layer.
The limitation of determining a quantization parameter of a weight of a corresponding layer, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting a neural network, nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the neural network language, determining in the context of this claim encompasses the user manually determining a generic “quantization parameter” using generic weights. Similarly, the limitations of obtaining, determining, and quantizing, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. For example, but for the neural network language, obtaining, determining, and quantizing in the context of this claim encompass the user manually obtaining a weight and input data, determining quantization parameters from them, and quantizing the values accordingly. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)).

Further, these concepts also recite “Certain Methods of Organizing Human Activity” (such as
commercial or legal interactions, including agreements in the form of contracts; legal
obligations; advertising, marketing or sales activities or behaviors; and business relations), where
determining a quantization parameter is a method of human activity in advertising/marketing
activities. Accordingly, the claim recites an abstract idea.

This judicial exception is not integrated into a practical application. In particular, the claim only
recites one additional element: using a neural network to perform the obtaining, determining, and quantizing steps. The neural network in these steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of determining quantization parameters) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any
meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more
than the judicial exception. As discussed above with respect to integration of the abstract idea
into a practical application, the additional element of using a neural network to perform
the obtaining, determining, and quantizing steps amounts to no more
than mere instructions to apply the exception using a generic computer component. Mere
instructions to apply an exception using a generic computer component cannot provide an
inventive concept. The claim is not patent eligible.

Dependent claims 2-7 merely add further details of the abstract steps/elements recited in
claim 1 without integrating the idea into a practical application; or including an improvement to
another technology or technical field, an improvement to the functioning of the computer itself,
or meaningful limitations beyond generally linking the use of an abstract idea to a particular
technological environment. Therefore, dependent claims 2-7 are also directed towards
non-statutory subject matter.

As per independent claim 8, this claim is also rejected as ineligible subject matter under 35
U.S.C. 101 for substantially the same reasons as method claim 1. The components (i.e.,
system/medium) described in independent claim 8 do not provide for integrating the
abstract idea into a practical application. At best, the claim merely provides an alternate
environment in which to implement the abstract idea.

Dependent claims 9-14 merely add further details of the abstract steps/elements
recited in claim 8 without integrating the idea into a practical application; or including an
improvement to another technology or technical field, an improvement to the functioning of the
computer itself, or meaningful limitations beyond generally linking the use of an abstract idea to
a particular technological environment. Therefore, dependent claims 9-14 are also
directed towards non-statutory subject matter.

 


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-14 are rejected under 35 U.S.C. 103 as being unpatentable over Diril et al., US Pub. No. 2019/0171927 A1.
 
As to claim 1 (and substantially similar claim 8), Diril discloses a neural network quantization method, 
(Diril [0015] FIG. 1 is a block diagram of an exemplary system for performing layer-level quantization within a neural network.);
comprising:
obtaining a weight and input data of a target quantization layer of an original neural
network,
(Diril [0034] In some examples (e.g., recurrent neural networks and/or feed-forward
neural networks), these artificial neurons may be non-linear functions of a weighted sum of inputs that are arranged in layers, with the outputs of one layer becoming the inputs of a subsequent layer.;
see also [0036] At activation layer 212, a set of weights (i.e., a filter) may be applied to the layer
inputs, and each node may output a weighted sum that may be scaled (e.g., multiplied by a scaling factor sf0) and propagated to activation layer 214.)

wherein the target quantization layer includes at least one computation layer of the
original neural network;
(Diril [0035] FIG. 2A is a block diagram of an exemplary feed-forward neural network 200. Neural network 200 may include an input layer 202, an output layer 204, and a series
of five activation layers: activation layer 212, activation layer 214, activation layer 216, activation layer 218, and activation layer 220.;
See also [0042] FIG. 4 is a flow diagram of an exemplary computer-implemented method 400 for providing layer-level quantization in various types of neural networks.)

determining a quantization parameter of a weight of a corresponding layer by using the
weight of the target quantization layer of the original neural network; 
(Diril [0024] The present disclosure is generally directed to implementing layer-level quantization within neural networks by dynamically adjusting (e.g., for a particular dataset or group of datasets) quantization parameters for network layers.;
See also [0025] For example, the quantization procedures discussed herein may adjust input scaling parameters over time to learn optimal layer-level quantization intervals for various datasets.;
See also [0034] these artificial neurons may be non-linear functions of a weighted sum of inputs that are arranged in layers, with the outputs of one layer becoming the inputs of a subsequent layer.;
See also [0036] As shown, each value from the nodes of input layer 202 may be duplicated and sent to the nodes of activation layer 212. At activation layer 212, a set of weights (i.e., a filter) may be applied to the layer inputs, and each node may output a weighted sum that may
be scaled (e.g., multiplied by a scaling factor sf0) and propagated to activation layer 214.)

determining a quantization parameter of input data of a corresponding layer by using the input data of the target quantization layer of the original neural network, wherein both the weight and the input data of the target quantization layer follow a principle of not distorting a maximum
absolute value; 
(Diril [0036] Neural network 200 may also store and/or update a first limit value (e.g., min0), a
second limit value (e.g., max0), and a scaling factor (sf0) based on the range of the outputs, as discussed in greater detail below. This process may be repeated at each activation layer in sequence to create outputs at layer 204.;
see also [0045] At step 430, one or more of the systems described herein may store a second limit value of the activation layer in the data storage system. For instance, accelerator 700 may
store the second limit value in register 790B or in any other part of a data storage subsystem. This second limit value may correspond to a maximum value for the activation layer,
such as an absolute maximum weight or filter value (e.g., the highest value of an activation layer, which may be identified by passing output values through a min-max unit) or an
estimated maximum weight or filter value (e.g., an approximate maximum that discards outliers, a maximum within a predetermined standard deviation of values for a particular
layer, etc.). One of functional units 770 may be a processing element for determining the maximum value of the activation layer. In certain implementations, a single functional
unit 770 may determine the minimum value and the maximum value.)

and
quantizing the target quantization layer of the original neural network according to the
quantization parameter of the weight and the quantization parameter of the input data
(Diril [0046] At step 440, one or more of the systems described herein may determine a scaling factor based on the first and second limit values. For example, accelerator 700 may use
the minimum value from register 790A and the maximum value from register 790B to determine the scaling factor. The minimum and maximum values may span all or most of the
dynamic values of the activation layer, and the scaling factor may be used to scale numbers between the minimum and maximum values linearly (e.g., in fixed quantization intervals)
or non-linearly (e.g., in variable quantization intervals, such as logarithmic intervals) down to a smaller range, thereby quantizing a range of data to a range that can be represented by within a bit width of the arithmetic operators of a system or subsequent layer. The quantization scheme for determining the scaling factor may be designed to preserve as much accuracy as possible while reducing the bit width to a predetermined size or an optimal size for a dataset.).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to apply quantization parameters/scaling parameters as taught by Diril, since it was known in the art that, after each invocation of the
inference on a specific dataset, the firmware may read the minimum and maximum values for each layer from the registers, compute a new range, and update the quantization procedure with the new range. The firmware may use machine learning techniques to find an ideal interval to optimize the CNN and further improve the efficacy of the machine learning accelerator. Thus, the bit width of the arithmetic operations for the layers may be reduced, which may speed
up computation, reduce memory usage, and (over time) achieve an optimized quantization (Diril [0074]).
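
For illustration only, the layer-level scheme relied upon above (a scaling factor determined from stored minimum and maximum limit values, Diril [0046], with the range recomputed after each inference, Diril [0074]) can be sketched as follows. This is a minimal sketch and is not code from Diril or from the application; the function names `scaling_factor`, `quantize_linear`, and `update_range`, the unsigned integer output range, and the default 8-bit width are all assumptions made for the example.

```python
def scaling_factor(min_val, max_val, bit_width=8):
    """Scale mapping [min_val, max_val] onto integer levels 0..2**bit_width - 1.

    Hypothetical helper: the unsigned range and the 8-bit default are
    assumptions, not details taken from the cited reference.
    """
    span = max_val - min_val
    if span <= 0:
        raise ValueError("max_val must be greater than min_val")
    levels = (1 << bit_width) - 1
    return levels / span


def quantize_linear(values, min_val, max_val, bit_width=8):
    """Quantize values in fixed (linear) intervals between the two limits."""
    sf = scaling_factor(min_val, max_val, bit_width)
    quantized = []
    for v in values:
        # Values outside the stored limits are clipped to the nearest limit.
        clipped = min(max(v, min_val), max_val)
        quantized.append(round((clipped - min_val) * sf))
    return quantized


def update_range(min_val, max_val, outputs):
    """Widen the stored limits so they cover the outputs of a new inference."""
    return min(min_val, min(outputs)), max(max_val, max(outputs))


# Example: limits -1.0 and 1.0 stored for a layer, 8-bit target width.
codes = quantize_linear([-1.0, 1.0, 5.0], -1.0, 1.0)
new_limits = update_range(-1.0, 1.0, [0.5, 2.0])
```

In this sketch the out-of-range value 5.0 is clipped to the stored maximum, and `update_range` shows how a later inference would widen the limits before the scaling factor is recomputed.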

As to claim 2, Diril discloses the neural network quantization method of claim 1, wherein the computation layer includes at least one of a convolution layer, a fully connected layer, an LRN layer, a deconvolution layer, a Reorg layer, and a Normalize layer
(see Diril Fig. 3 Convolution Layer 312;
See also [0004] The system may also include a hardware processing unit programmed to (1)
perform an inference of an activation layer of a neural network;
see also [0041] As explained above in the discussion of FIG. 3, in a CNN each activation layer may be a set of nonlinear functions of spatially nearby subsets of outputs of a prior layer. Neural networks may also operate in a variety of other ways. For example, embodiments of the instant disclosure may be applied to a multi-layer perceptron (MLP), in which each activation layer is a set of nonlinear functions of the weighted sum of each output from a prior layer. Embodiments
of the instant disclosure may also be applied to a recurrent neural network (RNN), ...).

As to claim 3, Diril discloses the neural network quantization method of claim 1, wherein the determining the quantization parameter of the weight of the corresponding layer by using the weight of the target quantization layer of the original neural network includes:
obtaining a maximum absolute value of a weight of each target quantization layer, 
(Diril [0036] Neural network 200 may also store and/or update a first limit value (e.g., min0), a
second limit value (e.g., max0), and a scaling factor (sf0) based on the range of the outputs, as discussed in greater detail below. This process may be repeated at each activation layer in sequence to create outputs at layer 204.;
see also [0045] At step 430, one or more of the systems described herein may store a second limit value of the activation layer in the data storage system. For instance, accelerator 700 may
store the second limit value in register 790B or in any other part of a data storage subsystem. This second limit value may correspond to a maximum value for the activation layer,
such as an absolute maximum weight or filter value (e.g., the highest value of an activation layer, which may be identified by passing output values through a min-max unit) or an
estimated maximum weight or filter value (e.g., an approximate maximum that discards outliers, a maximum within a predetermined standard deviation of values for a particular
layer, etc.). One of functional units 770 may be a processing element for determining the maximum value of the activation layer. In certain implementations, a single functional
unit 770 may determine the minimum value and the maximum value.)
and
determining a first quantization parameter and a second quantization parameter of the
weight of the corresponding layer according to the maximum absolute value of the weight of
each target quantization layer
(Diril [0046] At step 440, one or more of the systems described herein may determine a scaling factor based on the first and second limit values. For example, accelerator 700 may use
the minimum value from register 790A and the maximum value from register 790B to determine the scaling factor. The minimum and maximum values may span all or most of the
dynamic values of the activation layer, and the scaling factor may be used to scale numbers between the minimum and maximum values linearly (e.g., in fixed quantization intervals)
or non-linearly (e.g., in variable quantization intervals, such as logarithmic intervals) down to a smaller range, thereby quantizing a range of data to a range that can be represented by within a bit width of the arithmetic operators of a system or subsequent layer. The quantization scheme for determining the scaling factor may be designed to preserve as much accuracy as possible while reducing the bit width to a predetermined size or an optimal size for a dataset.).
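
For illustration only, determining a weight quantization parameter from the maximum absolute value of a layer's weights can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Diril or from the application: this action does not define the claimed first and second quantization parameters, so the sketch derives only a single symmetric scale, and the names `max_abs`, `symmetric_scale`, and `quantize_symmetric` and the signed 8-bit range are hypothetical.

```python
def max_abs(values):
    """Largest magnitude among a layer's weights."""
    return max(abs(v) for v in values)


def symmetric_scale(values, bit_width=8):
    """Scale chosen so the maximum absolute value maps to the largest code.

    Assumption: signed symmetric range, e.g. -127..127 for 8 bits.
    """
    qmax = (1 << (bit_width - 1)) - 1
    peak = max_abs(values)
    # An all-zero layer gets a scale of 1.0 to avoid division by zero.
    return peak / qmax if peak > 0 else 1.0


def quantize_symmetric(values, bit_width=8):
    """Return (integer codes, scale) for one layer's weights."""
    qmax = (1 << (bit_width - 1)) - 1
    scale = symmetric_scale(values, bit_width)
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


# Example: the weight with the largest magnitude lands on the extreme code,
# so it is represented without clipping.
codes, scale = quantize_symmetric([0.5, -2.0, 2.0])
```

Because the scale is derived from the maximum absolute value itself, multiplying the extreme code by the scale recovers that maximum to within floating-point error.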

As to claim 4, Diril discloses the neural network quantization method of claim 1, wherein the determining the quantization parameter of the input data of the corresponding layer by using the input data of the target quantization layer of the original neural network includes:
obtaining a maximum absolute value of input data of each target quantization layer, 
(Diril [0036] Neural network 200 may also store and/or update a first limit value (e.g., min0), a
second limit value (e.g., max0), and a scaling factor (sf0) based on the range of the outputs, as discussed in greater detail below. This process may be repeated at each activation layer in sequence to create outputs at layer 204.;
see also [0045] At step 430, one or more of the systems described herein may store a second limit value of the activation layer in the data storage system. For instance, accelerator 700 may
store the second limit value in register 790B or in any other part of a data storage subsystem. This second limit value may correspond to a maximum value for the activation layer,
such as an absolute maximum weight or filter value (e.g., the highest value of an activation layer, which may be identified by passing output values through a min-max unit) or an
estimated maximum weight or filter value (e.g., an approximate maximum that discards outliers, a maximum within a predetermined standard deviation of values for a particular
layer, etc.). One of functional units 770 may be a processing element for determining the maximum value of the activation layer. In certain implementations, a single functional
unit 770 may determine the minimum value and the maximum value.)
and
determining a first quantization parameter and a second quantization parameter of input
data of the corresponding layer according to the maximum absolute value of the input data of
each target quantization layer
(Diril [0046] At step 440, one or more of the systems described herein may determine a scaling factor based on the first and second limit values. For example, accelerator 700 may use
the minimum value from register 790A and the maximum value from register 790B to determine the scaling factor. The minimum and maximum values may span all or most of the
dynamic values of the activation layer, and the scaling factor may be used to scale numbers between the minimum and maximum values linearly (e.g., in fixed quantization intervals)
or non-linearly (e.g., in variable quantization intervals, such as logarithmic intervals) down to a smaller range, thereby quantizing a range of data to a range that can be represented by within a bit width of the arithmetic operators of a system or subsequent layer. The quantization scheme for determining the scaling factor may be designed to preserve as much accuracy as possible while reducing the bit width to a predetermined size or an optimal size for a dataset.).

As to claim 5, Diril discloses the neural network quantization method of claim 1, further comprising:
processing each target quantization layer of the original neural network by using a first
quantization method, a second quantization method, or a third quantization method, wherein:
the first quantization method includes:
quantizing the weight of the corresponding layer by using a first quantization parameter
of the weight of each target quantization layer to obtain a weight quantization result of the
corresponding layer,
(Diril [0036] Neural network 200 may also store and/or update a first limit value (e.g., min0), a
second limit value (e.g., max0), and a scaling factor (sf0) based on the range of the outputs, as discussed in greater detail below. This process may be repeated at each activation layer in sequence to create outputs at layer 204.;
see also [0045] At step 430, one or more of the systems described herein may store a second limit value of the activation layer in the data storage system. For instance, accelerator 700 may
store the second limit value in register 790B or in any other part of a data storage subsystem. This second limit value may correspond to a maximum value for the activation layer,
such as an absolute maximum weight or filter value (e.g., the highest value of an activation layer, which may be identified by passing output values through a min-max unit) or an
estimated maximum weight or filter value (e.g., an approximate maximum that discards outliers, a maximum within a predetermined standard deviation of values for a particular
layer, etc.). One of functional units 770 may be a processing element for determining the maximum value of the activation layer. In certain implementations, a single functional
unit 770 may determine the minimum value and the maximum value.)
and
quantizing the input data of the corresponding layer by using a first quantization
parameter of the input data of each target quantization layer to obtain an input data quantization
result of the corresponding layer, 
(Diril [0036] In the example shown in FIG. 2A, data flows from input layer 202 through activation layers 212-220 to output layer 204 (i.e., from left to right). As shown, each value from
the nodes of input layer 202 may be duplicated and sent to the nodes of activation layer 212. At activation layer 212, a set of weights (i.e., a filter) may be applied to the layer inputs, and each node may output a weighted sum that may be scaled (e.g., multiplied by a scaling factor sf0) and propagated to activation layer 214. Neural network 200 may also store and/or update a first limit value (e.g., min0), a second limit value (e.g., max0), and a scaling factor (sf0)
based on the range of the outputs, as discussed in greater detail below. This process may be repeated at each activation layer in sequence to create outputs at layer 204.)

the second quantization method includes:
obtaining a weight quantization intermediate parameter of the corresponding layer by
using the first quantization parameter and a second quantization parameter of the weight of
each target quantization layer,
obtaining the weight quantization result of the corresponding layer according to the
weight quantization intermediate parameter,
obtaining a quantization intermediate parameter of the input data of the corresponding
layer by using the first quantization parameter and a second quantization parameter of the input
data of each target quantization layer, and
obtaining the input data quantization result of the corresponding layer according to the
quantization intermediate parameter of the input data,
(Diril [0024] Embodiments of the instant disclosure may, while performing inference (and/or training) on a dataset, identify minimum and maximum values for activation layers (i.e.,
hidden or intermediate layers) of a neural network and then update scaling factors for the layers based on the identified values.)

the third quantization method includes:
obtaining the weight quantization result of the corresponding layer by using the first
quantization parameter and the second quantization parameter of the weight of each target
quantization layer, and
obtaining the input data quantization result of the corresponding layer by using the first
quantization parameter and the second quantization parameter of the input data of each target
quantization layer
(Diril [0042] FIG. 4 is a flow diagram of an exemplary computer-implemented method 400 for providing layer-level quantization in various types of neural networks. The steps shown in FIG. 4 may be performed by any suitable computer-executable code and/or computing system, including the system(s) illustrated in FIGS. 1, 7, and 8. In one example, each of the steps shown in FIG. 4 may represent an algorithm whose structure includes and/or is represented by
multiple sub-steps, examples of which will be provided in greater detail below.).

As to claim 6, Diril discloses the neural network quantization method of claim 1, further comprising:
obtaining a weight quantization intermediate parameter of a corresponding channel by
using a first weight quantization parameter and a second weight quantization parameter of each
channel of each target quantization layer, wherein the target quantization layer includes a
convolution layer and / or a fully connected layer,
obtaining a weight quantization result of the corresponding channel by using the weight
quantization intermediate parameter of each channel, wherein the weight quantization result of
each channel of each target quantization layer constitutes a weight quantization result of the
corresponding layer,
obtaining a quantization intermediate parameter of the input data of the corresponding
layer by using a first input data quantization parameter and a second input data quantization
parameter of each target quantization layer, and
obtaining an input data quantization result of the corresponding layer by using the
quantization intermediate parameter of the input data of each target quantization layer
(Diril [0024] Embodiments of the instant disclosure may, while performing inference (and/or training) on a dataset, identify minimum and maximum values for activation layers (i.e.,
hidden or intermediate layers) of a neural network and then update scaling factors for the layers based on the identified values.).
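
For illustration only, the difference between a single scale for a whole layer (the layer-level limits described in Diril [0024]) and a separate scale for each channel (as recited in claim 6) can be sketched as below. This is a minimal sketch under stated assumptions, not code from Diril or from the application: the names `quantize_per_layer` and `quantize_per_channel`, the signed 8-bit range, and the representation of a layer as a list of non-empty per-channel weight lists are all hypothetical.

```python
def _qmax(bit_width):
    # Largest code of a signed symmetric range, e.g. 127 for 8 bits (assumed).
    return (1 << (bit_width - 1)) - 1


def quantize_per_layer(channels, bit_width=8):
    """One scale for the whole layer, from the layer-wide maximum magnitude."""
    qmax = _qmax(bit_width)
    peak = max(abs(w) for ch in channels for w in ch)
    scale = (peak / qmax) or 1.0  # fall back to 1.0 for an all-zero layer
    return [[round(w / scale) for w in ch] for ch in channels], scale


def quantize_per_channel(channels, bit_width=8):
    """One scale per channel, each from that channel's maximum magnitude."""
    qmax = _qmax(bit_width)
    scales = [(max(abs(w) for w in ch) / qmax) or 1.0 for ch in channels]
    quantized = [[round(w / s) for w in ch] for ch, s in zip(channels, scales)]
    return quantized, scales


# Example: the second channel holds much smaller weights than the first.
layer = [[1.0, -1.0], [0.005, -0.02]]
layer_codes, layer_scale = quantize_per_layer(layer)
channel_codes, channel_scales = quantize_per_channel(layer)
```

With a layer-wide scale the small-magnitude channel is squeezed into a handful of codes, while a per-channel scale lets each channel use the full integer range.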

As to claim 7, Diril discloses the neural network quantization method of claim 6, further comprising:
processing each target quantization layer of the original neural network by using the first
quantization method, the second quantization method, or the third quantization method,
wherein the target quantization layer further includes at least one layer other than the
convolution layer and / or the fully connected layer in the computation layers of the original
neural network,
(Diril [0041] As explained above in the discussion of FIG. 3, in a CNN each activation layer may be a set of nonlinear functions of spatially nearby subsets of outputs of a prior layer. Neural networks may also operate in a variety of other ways. For example, embodiments of the instant disclosure may be applied to a multi-layer perceptron (MLP), in which each activation layer is a set of nonlinear functions of the weighted sum of each output from a prior layer. Embodiments
of the instant disclosure may also be applied to a recurrent neural network (RNN), in which each activation layer may be a collection of nonlinear functions of weighted sums of outputs and of a previous state. Embodiments of the instant disclosure may also be applied to any other suitable
type or form of neural network.; 
See also [0040] While FIGS. 2A and 2B show one way to conceptualize a neural network, there are a variety of other ways to illustrate and conceptualize neural networks. For example, FIG. 3 shows a neural network 300 with activation layers 310 represented by sets of feature maps 304-308. In neural network 300, which may represent a CNN, an input 302 may undergo transformations for each of activation layers 310, which may be calculated by hardware such as processing unit 160, accelerator 700, and/or processor 814. For example, input 302 may undergo convolutions based on the filters and quantization parameters of convolution layer 312
to produce feature maps 304.) 
the first quantization method includes:
quantizing the weight of the corresponding layer by using the first quantization parameter
of the weight of each target quantization layer to obtain the weight quantization result of the
corresponding layer, and
quantizing the input data of the corresponding layer by using the first quantization
parameter of the input data of each target quantization layer to obtain the quantization result
of the input data of the corresponding layer,
(Diril [0036] Neural network 200 may also store and/or update a first limit value (e.g., min0), a
second limit value (e.g., max0), and a scaling factor (sf0) based on the range of the outputs, as discussed in greater detail below. This process may be repeated at each activation layer in sequence to create outputs at layer 204.;
see also [0045] At step 430, one or more of the systems described herein may store a second limit value of the activation layer in the data storage system. For instance, accelerator 700 may
store the second limit value in register 790B or in any other part of a data storage subsystem. This second limit value may correspond to a maximum value for the activation layer,
such as an absolute maximum weight or filter value (e.g., the highest value of an activation layer, which may be identified by passing output values through a min-max unit) or an
estimated maximum weight or filter value (e.g., an approximate maximum that discards outliers, a maximum within a predetermined standard deviation of values for a particular
layer, etc.). One of functional units 770 may be a processing element for determining the maximum value of the activation layer. In certain implementations, a single functional
unit 770 may determine the minimum value and the maximum value.)

the second quantization method includes:
obtaining a weight quantization intermediate parameter of the corresponding layer by
using the first quantization parameter and the second quantization parameter of the weight of
each target quantization layer,
obtaining a weight quantization result of the corresponding layer according to the weight
quantization intermediate parameter,
obtaining the quantization intermediate parameter of the input data of the corresponding
layer by using the first quantization parameter and the second quantization parameter of the
input data of each target quantization layer, and
obtaining the input data quantization result of the corresponding layer according to the
quantization intermediate parameter of the input data,
(Diril [0024] Embodiments of the instant disclosure may, while performing inference (and/or training) on a dataset, identify minimum and maximum values for activation layers (i.e.,
hidden or intermediate layers) of a neural network and then update scaling factors for the layers based on the identified values)

the third quantization method includes:
obtaining the weight quantization result of the corresponding layer by using the first
quantization parameter and the second quantization parameter of the weight of each target
quantization layer, and
obtaining the input data quantization result of the corresponding layer by using the first
quantization parameter and the second quantization parameter of the input data of each target
quantization layer
(Diril [0042] FIG. 4 is a flow diagram of an exemplary computer-implemented method 400 for providing layer-level quantization in various types of neural networks. The steps shown in FIG. 4 may be performed by any suitable computer-executable code and/or computing system, including the system(s) illustrated in FIGS. 1, 7, and 8. In one example, each of the steps shown in FIG. 4 may represent an algorithm whose structure includes and/or is represented by
multiple sub-steps, examples of which will be provided in greater detail below.).

Referring to claim 9, this dependent claim recites similar limitations as claim 2;
therefore, the arguments above regarding claim 2 are also applicable to claim 9.

Referring to claim 10, this dependent claim recites similar limitations as claim 3;
therefore, the arguments above regarding claim 3 are also applicable to claim 10.

Referring to claim 11, this dependent claim recites similar limitations as claim 4;
therefore, the arguments above regarding claim 4 are also applicable to claim 11.

Referring to claim 12, this dependent claim recites similar limitations as claim 5;
therefore, the arguments above regarding claim 5 are also applicable to claim 12.

Referring to claim 13, this dependent claim recites similar limitations as claim 6;
therefore, the arguments above regarding claim 6 are also applicable to claim 13.

Referring to claim 14, this dependent claim recites similar limitations as claim 7;
therefore, the arguments above regarding claim 7 are also applicable to claim 14.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:


Lee et al., US Pub. No. 2019/0042948 A1, teaches a method of generating a fixed-point quantized neural network that includes analyzing a statistical distribution for each channel of floating-point parameter values of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network, determining a fixed-point expression of each of the parameters for each channel statistically covering a distribution range of the floating-point parameter values based on the statistical distribution for each channel, determining fractional lengths of a bias and a weight for each channel among the parameters of the fixed-point expression for each channel based on a result of performing a convolution operation, and generating a fixed-point quantized neural network in which the bias and the weight for each channel have the determined fractional lengths;

Zhang et al., US Pub. No. 2020/0285933 A1, teaches a method and an apparatus for quantizing an activation volume of a deep neural network. The method includes: obtaining an activation volume of a network layer in the deep neural network, wherein elements in the activation volume are arranged in three directions: height, width, and depth; dividing depth segments in the activation volume in which a difference among element features is smaller than a preset threshold into a same slice group along the depth direction of the activation volume, so as to obtain a plurality of slice groups; and quantizing each slice group respectively by using a quantization parameter corresponding to each slice group obtained through a quantization formula. The quantization error can be reduced through the above method;

Ha et al., US Pub. No. 2020/0026986, teaches a neural network method of parameter quantization that includes obtaining channel profile information for first parameter values of a floating-point type in each channel included in each of feature maps based on an input in a first dataset to a floating-point parameters pre-trained neural network; determining a probability density function (PDF) type, for each channel, appropriate for the channel profile information based on a classification network receiving the channel profile information as a dataset; determining a fixed-point representation, based on the determined PDF type, for each channel, statistically covering a distribution range of the first parameter values; and generating a fixed-point quantized neural network based on the fixed-point representation determined for each channel;

Deisher et al., US Pub. No. 2019/0042935 A1, teaches an apparatus for applying dynamic quantization of a neural network. The apparatus includes a scaling unit and a quantizing unit. The scaling unit is to calculate initial desired scale factors of a plurality of inputs, weights and a bias and apply the input scale factor to a summation node. Also, the scaling unit is to determine a scale factor for a multiplication node based on the desired scale factors of the inputs and select a scale factor for an activation function and an output node. The quantizing unit is to dynamically requantize the neural network by traversing a graph of the neural network;

Desappan et al., US Pub. No. 2019/0012559, teaches a method for dynamically quantizing feature maps of a received image. The method includes convolving an image based on a predicted maximum value, a predicted minimum value, trained kernel weights and the image data. The input data is quantized based on the predicted minimum value and predicted maximum value. The output of the convolution is computed into an accumulator and re-quantized. The re-quantized value is output to an external memory. The predicted min value and the predicted max value are computed based on the previous max values and min values with a weighted average or a pre-determined formula. Initial min value and max value are computed based on known quantization methods and utilized for initializing the predicted min value and predicted max value in the quantization process;

and Yu et al., US Pub. No. 2018/0046896, teaches an invention relating to artificial neural networks, for example, convolutional neural networks. In particular, the invention relates to accelerating a complex neural network by fixed-point data quantization.






CONTACT INFORMATION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EVAN S ASPINWALL, whose telephone number is (571) 270-7723. The examiner can normally be reached Monday-Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached at 571-270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Evan Aspinwall/
Primary Examiner, Art Unit 2152