Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1:  Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, i.e., a process, which is one of the statutory categories.
Step 2A Prong One Analysis:  Claim 1 recites a computer-implemented method of training a neural network which, under its broadest reasonable interpretation, amounts to a series of mental processes and mathematical calculations.  For example, but for the generic computer component language, the following limitations, in the context of this claim, encompass such processing:
generating at least one noise-to-signal metric representing quantization noise present in the tensor (mathematical calculation);
generating a scaled learning rate based on the at least one noise-to-signal metric (mathematical calculation).
Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 1 recites the additional element “quantization-enabled system”. However, this additional element is a computer component recited at a high level of generality, such that it amounts to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea does not integrate the judicial exception into a practical application.  Claim 1 also recites the additional element “performing an epoch of training”, which amounts to generally linking the judicial exception to a particular technology or field of use.  Claim 1 further recites the insignificant extra-solution activity “obtaining a tensor comprising values of one or more parameters of the neural network represented in a quantized-precision format”, which amounts to mere data gathering.  Therefore, claim 1 is directed to a judicial exception.
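Step 2B Analysis:  Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to dependent claims 2-10. The additional limitations of the dependent claims are addressed briefly below: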
Dependent claim 2 recites additional application of the judicial exception to a particular technology or field of use “the one or more parameters are weights used in a forward-propagation phase of a training epoch” as well as additional mathematical calculations “converting values of a first tensor from a normal-precision floating-point format to the quantized-precision format”.
Dependent claim 3 recites additional insignificant extra-solution activity, namely selection of a data type, “the one or more parameters represent edge weights and activation weights of the neural network” as well as additional mathematical calculations “generating a noise-to-signal ratio for the activation weights of the layer and generating a noise-to-signal ratio for the edge weights of the layer”.
Dependent claim 4 recites additional mathematical calculations “generating the noise-to-signal ratio for the activation weights of each of the plurality of layers comprises computing the difference between the activation weights of the second tensor for that layer and the activation weights of the first tensor for that layer, and dividing the difference by the absolute value of the activation weights of the first tensor for that layer” and “generating the noise-to-signal ratio for the edge weights of each of the plurality of layers comprises computing the difference between the edge weights of the second tensor for that layer and the edge weights of the first tensor for that layer, and dividing the difference by the absolute value of the edge weights of the first tensor for that layer”.
Dependent claim 5 recites additional application of the judicial exception to a particular technology or field of use “the neural network comprises a total of L layers” and additional mathematical calculations “the scaling factor for a layer l of the neural network is generated based on an average value of the noise-to-signal ratio for the activation weights of the layer l as well as a sum of average values of the noise-to-signal ratios for the edge weights of layers l+1 through L of the neural network”.
Dependent claim 6 recites additional application of the judicial exception to a particular technology or field of use “training the neural network comprises training the neural network via stochastic gradient descent” and additional mathematical calculations “scaled learning rate for the layer l of the neural network is computed by the equation:”.  The additional elements in the claim are directed to mathematical equations or representations.
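For illustration only, the calculations recited in claims 4 and 5 can be sketched in a few lines of Python. This is a hypothetical sketch, not the applicant's implementation: the function names, the list-of-lists data layout (one list of values per layer), the 0-based layer index, averaging magnitudes rather than signed ratios, and skipping zero-valued full-precision entries are all assumptions. Because the equation of claim 6 is not reproduced in this action, the sketch stops at the two quantities on which claim 5 says the scaling factor is based.

```python
from statistics import fmean


def noise_to_signal(full, quant):
    """Elementwise (quantized - full precision) / |full precision| (claim 4 wording).

    Assumption: entries whose full-precision value is zero are skipped,
    because the claim language does not say how division by zero is handled.
    """
    return [(q - f) / abs(f) for f, q in zip(full, quant) if f != 0]


def mean_abs_nsr(full, quant):
    """Average magnitude of the elementwise ratio.

    Assumption: magnitudes are averaged so that positive and negative
    rounding errors do not cancel; claim 5 says only "average value".
    """
    return fmean([abs(r) for r in noise_to_signal(full, quant)])


def claim5_inputs(act_full, act_quant, edge_full, edge_quant, l):
    """Return (activation term for layer l, sum of edge terms for layers after l).

    Layers are 0-indexed here, whereas claim 5 numbers them 1..L.
    """
    act_term = mean_abs_nsr(act_full[l], act_quant[l])
    edge_term = sum(
        mean_abs_nsr(edge_full[k], edge_quant[k])
        for k in range(l + 1, len(edge_full))
    )
    return act_term, edge_term


# Three-layer toy example with made-up values.
act_full = [[2.0, 4.0], [1.0, 1.0], [1.0, 1.0]]
act_quant = [[2.5, 4.0], [1.0, 1.0], [1.0, 1.0]]
edge_full = [[1.0, 2.0], [2.0, 4.0], [1.0, 1.0]]
edge_quant = [[1.0, 2.0], [2.5, 4.0], [1.0, 1.5]]
print(claim5_inputs(act_full, act_quant, edge_full, edge_quant, 0))  # (0.125, 0.375)
```

Each ratio is built from the difference between the second (quantized) tensor and the first (full-precision) tensor, which is the arithmetic the rejection characterizes as a mathematical calculation.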
Dependent claim 7 recites additional mathematical calculations “wherein computing the one or more gradient updates using the scaled learning rate comprises computing gradient updates for one or more parameters of the layer l using the scaled learning rate”.
Dependent claim 8 recites additional mathematical calculations “wherein computing the one or more gradient updates using the scaled learning rate further comprises computing gradient updates for one or more parameters of one or more other layers of the neural network using the same scaled learning rate generated for the layer l”.
Dependent claim 9 recites additional insignificant extra-solution activity “the normal-precision floating-point format represents the values with a first bit width” and “the quantized-precision format represents the values with a second bit width, the second bit width being lower than the first bit width”, which amounts to selection of a data type.  The claim also recites additional insignificant extra-solution activity “storing the scaling factor” and obtaining the scaling factor, which amounts to gathering and outputting data.  The claim also recites additional mathematical calculations “computing gradient updates”.
Dependent claim 10 recites additional application of the judicial exception to a particular technology or field of use “wherein the epoch of training of the neural network is a second epoch performed after a first epoch of training of the neural network” and “performing the first epoch of training”.  The claim also recites additional mathematical calculations “computing one or more gradient updates” and “generating the scaled learning rate based on the at least one noise-to-signal metric comprises scaling the predetermined learning rate based on the at least one noise-to-signal metric”.

Regarding Claim 11:  Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 11 is directed to a system, i.e., a machine, which is one of the statutory categories.
Step 2A Prong One Analysis:  Claim 11 recites a system for training a neural network which, under its broadest reasonable interpretation, amounts to a series of mental processes and mathematical calculations.  For example, but for the generic computer component language, the following limitations, in the context of this claim, encompass such processing:
to compute at least one metric representing quantization noise present in the values represented in the quantized-precision format (mathematical calculation);
to adjust a learning rate of the neural network based on the at least one metric (mathematical calculation).
Therefore, claim 11 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 11 recites the additional elements “quantization-enabled system”, “processors”, “memory”, and “computer-readable storage media storing computer-readable instructions”. However, these additional elements are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component.  An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea does not integrate the judicial exception into a practical application.  Claim 11 also recites the additional element “cause the system to perform a method of training a neural network”, which amounts to generally linking the judicial exception to a particular technology or field of use.  Claim 11 further recites the insignificant extra-solution activity “to represent values of one or more parameters of the neural network in a quantized-precision format”, which amounts to selection of a data type.  Therefore, claim 11 is directed to a judicial exception.
Step 2B Analysis:  Claim 11 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 11 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
For the reasons above, claim 11 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to dependent claims 12-14. The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 12 recites additional application of the judicial exception to a particular technology or field of use “the one or more parameters of the neural network comprise a plurality of weights of a layer of the neural network” as well as additional mathematical calculations “the at least one metric comprises a noise-to-signal ratio computed by computing a difference between values of the weights represented in the quantized-precision format and values of the weights represented in a normal-precision floating-point format, and dividing the difference by an absolute value of the values of the weights represented in the normal-precision floating-point format”.
Dependent claim 13 recites additional application of the judicial exception to a particular technology or field of use “the one or more parameters comprise activation weights and edge weights of a first layer of the neural network” and “cause the system to train the neural network with at least some values of the parameters represented in the quantized-precision format” as well as additional mathematical calculations “computing the at least one metric comprises computing a first noise-to-signal ratio for the activation weights of the first layer and a second noise-to-signal ratio for the edge weights of the first layer” and “to compute gradient updates for the first layer and at least one other layer of the neural network using the adjusted learning rate”.
Dependent claim 14 recites the additional generic computer component “a neural network accelerator having a tensor processing unit”.

Regarding Claim 15:  Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 15 is directed to a method which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis:  Claim 15 recites a method for compensating for noise during training of a neural network which, under its broadest reasonable interpretation, amounts to a series of mental processes and mathematical calculations.  For example, the following limitations, in the context of this claim, encompass such processing:
computing at least one noise-to-signal ratio representing noise present in the neural network (mathematical calculation);
adjusting a hyper-parameter of the neural network based on the at least one noise-to-signal ratio (mathematical calculation).
Therefore, claim 15 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis:  Claim 15 recites the additional element “training the neural network using the adjusted hyper-parameter”, which amounts to generally linking the judicial exception to a particular technology or field of use.  Therefore, claim 15 is directed to a judicial exception.
Step 2B Analysis:  Claim 15 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional element recited in claim 15 amounts to no more than generally linking the judicial exception to a particular technology or field of use.
For the reasons above, claim 15 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to dependent claims 16-20. The additional limitations of the dependent claims are addressed briefly below:
Dependent claim 16 recites additional insignificant extra-solution activity “the hyper-parameter comprises at least one of: a learning rate, a learning rate schedule, a bias, a stochastic gradient descent batch size, a number of neurons in the neural network, or a number of layers in the neural network” which amounts to selection of a data type.
Dependent claim 17 recites additional insignificant extra-solution activity “obtaining a first tensor comprising values of one or more parameters”, “introducing noise to the neural network”, and “obtaining a second tensor comprising values of the one or more parameters”, which amounts to gathering and outputting data.  Claim 17 also recites additional mathematical calculations “computing a difference between one or more values of the second tensor and one or more corresponding values of the first tensor” and “dividing the difference by the absolute value of the one or more corresponding values of the first tensor”.
Dependent claim 18 recites additional insignificant extra-solution activity “changing a data type of values”, which amounts to selection of a data type.  Claim 18 also recites additional mathematical calculations “decreasing a stochastic gradient descent batch size for one or more layers of the neural network”.  Claim 18 also recites the additional generic computer component “DRAM”.  Claim 18 also recites additional insignificant extra-solution activity “storing values of one or more parameters”, which amounts to gathering and outputting data.  Claim 18 also recites additional application of the judicial exception to a particular technology or field of use “implementing analog-based training of the neural network”.
Dependent claim 19 recites additional mathematical calculations “computing a scaling factor based on the at least one noise-to-signal ratio” and “scaling the hyper-parameter using the scaling factor”.
Dependent claim 20 recites additional mathematical calculations “wherein the hyper-parameter is adjusted to compensate for the effect of the noise present in the neural network on the accuracy of gradient updates computed during the training of the neural network.”

Therefore, when the additional elements are considered both individually and in combination, they do not amount to significantly more than the judicial exception. Accordingly, claims 1-20 are rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 2 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hou (“LOSS-AWARE WEIGHT QUANTIZATION OF DEEP NETWORKS”, 2018).

Regarding claim 1, Hou teaches A method for training a neural network implemented with a quantization-enabled system, the method comprising: with the quantization-enabled system: ([p. 1 Sec. 1] "Another effective approach to compress the network and accelerate training is by quantizing each full-precision weight to a small number of bits").
obtaining a tensor comprising values of one or more parameters of the neural network represented in a quantized-precision format; ([p. 2 Sec. 2] "Let the full-precision weights from all L layers be $w = [w_1^\top, w_2^\top, \ldots, w_L^\top]^\top$, where $w_l = \mathrm{vec}(W_l)$, and $W_l$ is the weight matrix at layer $l$. The corresponding quantized weights will be denoted $\hat{w}$" The weights are interpreted as parameters of the neural network.  See also Wi.).
generating at least one noise-to-signal metric representing quantization noise present in the tensor; ([p. 3 Sec. 3.1] "In weight ternarization, TWN simply finds the closest ternary approximation of the full precision weight at each iteration, while TTQ sets the ternarization threshold heuristically. Inspired by LAB (for binarization), we consider the loss explicitly during quantization and obtain the quantization thresholds and scaling parameter by solving an optimization problem."  Ternarization is interpreted as a form of quantization.  The loss considered explicitly during quantization is interpreted as corresponding to a noise-to-signal metric representing quantization noise.).
generating a scaled learning rate based on the at least one noise-to-signal metric; and ([p. 3 Sec. 3.1] "we consider the loss explicitly during quantization and obtain the quantization thresholds and scaling parameter by solving an optimization problem" [p. 4] "Obviously, this objective can be minimized layer by layer. Each proximal Newton iteration thus consists of two steps: (i) Obtain $w_l^t$ in (7) by gradient descent along $\nabla_l \ell(\hat{w}^{t-1})$, which is preconditioned by the adaptive learning rate...so that the rescaled dimensions have similar curvature" The adaptive learning rate is interpreted as a scaled learning rate. Hou teaches that the adaptive learning rate is scaled relative to the loss or error in response to quantization.).
performing an epoch of training of the neural network using the values of the tensor, including computing one or more gradient updates using the scaled learning rate. ([p. 16 Sec. D.2] "We use a one-layer LSTM with 512 cells. The maximum number of epochs is 200, and the number of time steps is 100. The initial learning rate is 0.002. After 10 epochs, it is decayed by a factor of 0.98 after each epoch. The weights are initialized uniformly in [−0.08, 0.08]. After each iteration, the gradients are clipped to the range [−5, 5]. All the updated weights are clipped to [−1, 1] for binarization and ternarization methods").
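The learning-rate schedule in the passage quoted above from Hou [p. 16 Sec. D.2] can be restated as a short Python sketch. It encodes only the quoted numbers (initial rate 0.002, unchanged for the first 10 epochs, then multiplied by 0.98 after each epoch); the 1-based epoch numbering, the reading of "after 10 epochs" as "starting with epoch 11", and the function name are assumptions for illustration.

```python
def hou_lstm_learning_rate(epoch, initial=0.002, hold_epochs=10, decay=0.98):
    """Learning rate for a 1-indexed epoch under the schedule quoted from Hou.

    Assumed reading of the quote: epochs 1..hold_epochs use the initial rate,
    and each later epoch applies one more multiplicative decay of 0.98.
    """
    if epoch < 1:
        raise ValueError("epoch numbering is assumed to start at 1")
    return initial * decay ** max(0, epoch - hold_epochs)


for e in (1, 10, 11, 12, 50):
    print(e, hou_lstm_learning_rate(e))
```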

Regarding claim 2, Hou teaches The method of claim 1, wherein: the tensor is a second tensor obtained by converting values of a first tensor from a normal-precision floating-point format to the quantized-precision format, and ([p. 3 Sec. 3.1] "In weight ternarization, TWN simply finds the closest ternary approximation of the full precision weight at each iteration" See also Eqn. 3 where a vector (tensor) of weights from 1 to L is disclosed.).
the one or more parameters are weights used in a forward-propagation phase of a training epoch of the neural network. ([p. 4 Sec. 3.1] "LBNN uses full-precision weights in the forward pass, while all other quantization methods including ours use quantized weights (which eliminates most of the multiplications and thus faster training)."). 

Claims 15 and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Dexu Lin (US 2016/0328647 A1).

Regarding claim 15, Dexu Lin teaches A method for compensating for noise during training of a neural network, comprising: ([¶0087] "In some aspects, where the model performance is below the threshold, the noise level may be reduced and the model performance may be reevaluated").
computing at least one noise-to-signal ratio representing noise present in the neural network; ([¶0027] "In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR). That is, in a machine learning model such as a deep convolutional network, the effect of quantizing weights and/or activations is the introduction of quantization noise. Similar to other communication systems, when quantization noise increases, the model performance decreases. Accordingly, the SQNR observed at the output may provide an indication of model performance or accuracy.").
adjusting a hyper-parameter of the neural network based on the at least one noise-to-signal ratio; and ([¶0030] "In some aspects, the bit width selection may be simplified as: $\min -\sum_i \rho_i \log(x_i)$, s.t. $\sum_i \alpha_i x_i = C$, (3) where $\alpha_i$ is the noise amplification or reduction factor from layer $i$ to the output, $C$ is a constant that constrains the $\alpha$ factors, and $\rho_i$ is a scaling factor of the bit width at layer $i$...In some aspects, the constant $C$ may be computed based on the SQNR limit").
training the neural network using the adjusted hyper-parameter. ([¶0035] "In some aspects, additional safety factors may be added to account for non-Gaussian distribution of activations and weights and/or variations for different training and test sets"). 

Regarding claim 19, Dexu Lin teaches The method of claim 15, wherein adjusting the hyper-parameter based on the at least one noise-to-signal ratio comprises: ([¶0030] "In some aspects, the bit width selection may be simplified as: $\min -\sum_i \rho_i \log(x_i)$, s.t. $\sum_i \alpha_i x_i = C$, (3) where $\alpha_i$ is the noise amplification or reduction factor from layer $i$ to the output, $C$ is a constant that constrains the $\alpha$ factors, and $\rho_i$ is a scaling factor of the bit width at layer $i$...In some aspects, the constant $C$ may be computed based on the SQNR limit").
computing a scaling factor based on the at least one noise-to-signal ratio; (See the noise amplification or reduction factor $\alpha_i$ of Eqn. 3, which is interpreted as the claimed scaling factor.).
scaling the hyper-parameter using the scaling factor. ([¶0030] "In some aspects, the bit width selection may be simplified as: $\min -\sum_i \rho_i \log(x_i)$, s.t. $\sum_i \alpha_i x_i = C$, (3)" Eqn. 3 shows that the hyper-parameter is scaled relative to the scaling factor.).
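As an illustration of how the constrained problem of Eqn. 3, as quoted from Dexu Lin, ties each per-stage variable $x_i$ to the factors $\alpha_i$ and $\rho_i$, the problem can be solved with a Lagrange multiplier. This is a sketch under stated assumptions, not a derivation that appears in the quoted passage: all $\rho_i$, $\alpha_i$, $x_i$, and $C$ are assumed positive, and Dexu Lin's precise definition of $x_i$ is not reproduced here.

```latex
\begin{aligned}
\mathcal{L}(x,\lambda) &= -\sum_i \rho_i \log x_i + \lambda\Big(\sum_i \alpha_i x_i - C\Big) \\
\frac{\partial \mathcal{L}}{\partial x_i} &= -\frac{\rho_i}{x_i} + \lambda \alpha_i = 0
  \;\Longrightarrow\; x_i = \frac{\rho_i}{\lambda \alpha_i} \\
\sum_i \alpha_i x_i &= \frac{1}{\lambda} \sum_j \rho_j = C
  \;\Longrightarrow\; \lambda = \frac{\sum_j \rho_j}{C} \\
x_i^{*} &= \frac{C \, \rho_i}{\alpha_i \sum_j \rho_j}
\end{aligned}
```

Because $-\rho_i \log x_i$ is convex for $\rho_i > 0$ and the constraint is linear, this stationary point is the minimizer; each optimal $x_i^{*}$ is proportional to $\rho_i / \alpha_i$.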

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 3, 4, 11, 12, 13, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hou in view of Dexu Lin (US 2016/0328647 A1).

Regarding claim 3, Hou teaches The method of claim 2.  However, Hou does not explicitly teach wherein: the one or more parameters represent edge weights and activation weights of the neural network, and generating the at least one noise-to-signal metric comprises, for each of a plurality of layers of the neural network, generating a noise-to-signal ratio for the activation weights of the layer and generating a noise-to-signal ratio for the edge weights of the layer.

Dexu Lin, who teaches in the related art of quantized neural networks, teaches The method of claim 2, wherein: the one or more parameters represent edge weights ([¶0058] "A deep learning architecture may learn a hierarchy of features. If presented with visual data, for example, the first layer may learn to recognize relatively simple features, such as edges" Dexu Lin explicitly teaches that weights may be edge weights.).
and activation weights of the neural network ([¶0026] “For example, different bit widths may be selected for bias values, activation values”), and generating the at least one noise-to-signal metric comprises, for each of a plurality of layers of the neural network, generating a noise-to-signal ratio for the activation weights of the layer and generating a noise-to-signal ratio for the edge weights of the layer. ([¶0027] "In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR). That is, in a machine learning model such as a deep convolutional network, the effect of quantizing weights and/or activations is the introduction of quantization noise. Similar to other communication systems, when quantization noise increases, the model performance decreases. Accordingly, the SQNR observed at the output may provide an indication of model performance or accuracy." Dexu Lin teaches quantizing weights and/or activations; the inclusive “and/or” language indicates that the SQNR can be generated for either or both of the weights and the activations.).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the quantized neural network of Hou with the signal-to-quantization-noise ratio (SQNR) taught in Dexu Lin. A person of ordinary skill in the art would have recognized that Hou determines the learning rate as a function of the quantization loss or noise, and Dexu Lin teaches that “[i]n some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR)” ([¶0027]).  Therefore, one of ordinary skill in the art would have recognized that the SQNR would be a valuable mathematical tool for optimizing the learning rate as a function of noise.

Regarding claim 4, the combination of Hou and Dexu Lin teaches The method of claim 3, wherein: generating the noise-to-signal ratio for the activation weights of each of the plurality of layers comprises computing the difference between the activation weights of the second tensor for that layer and the activation weights of the first tensor for that layer, and dividing the difference by the absolute value of the activation weights of the first tensor for that layer; and (Dexu Lin [¶0028] "In some aspects, the signal variance (or power) of each stage may be assumed to be normalized to 1 for simplicity of notation. The bit width selection may be subject to certain constraints. For example, in some aspects, the bit width selection may be subject to a threshold of SQNR at the output of the model, which may be expressed as:" See Eqn. 2.  The second tensor is the quantized tensor; therefore, the claim amounts to dividing the difference between the quantized and full-precision activation weights by the absolute value of the full-precision activation weights.  Dexu Lin teaches normalizing the signal variance (or power) of each stage to 1 and dividing by the sum of the signal variance (weights or activations) at each layer.  One of ordinary skill in the art would recognize that the difference between the activation weights of the first tensor and those of the second (quantized) tensor is the quantization noise.).
generating the noise-to-signal ratio for the edge weights of each of the plurality of layers comprises computing the difference between the edge weights of the second tensor for that layer and the edge weights of the first tensor for that layer, and dividing the difference by the absolute value of the edge weights of the first tensor for that layer. (Dexu Lin [¶0028] "In some aspects, the signal variance (or power) of each stage may be assumed to be normalized to 1 for simplicity of notation. The bit width selection may be subject to certain constraints. For example, in some aspects, the bit width selection may be subject to a threshold of SQNR at the output of the model, which may be expressed as:" (See Eqn. 2.)  The second tensor is the quantized tensor; therefore, the claim amounts to dividing the difference between the quantized and full-precision edge weights by the absolute value of the full-precision edge weights.  Dexu Lin teaches normalizing the signal variance (or power) of each stage to 1 and dividing by the sum of the signal variance (weights or activations) at each layer.  One of ordinary skill in the art would recognize that the difference between the edge weights of the first tensor and those of the second (quantized) tensor is the quantization noise.).
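The rejection of claim 4 treats the difference between the full-precision and quantized values as quantization noise. A minimal Python sketch, assuming a hypothetical round-to-nearest uniform quantizer (not taken from Hou, Dexu Lin, or the application) and hypothetical function names, contrasts an aggregate SQNR of the kind Dexu Lin describes with the elementwise ratio recited in claim 4. Both are built from the same difference; they differ in normalization (total signal power versus each element's own magnitude) and in orientation (signal-to-noise versus noise-to-signal).

```python
import math


def uniform_quantize(values, step):
    """Hypothetical round-to-nearest uniform quantizer (illustration only)."""
    return [step * round(v / step) for v in values]


def sqnr_db(full, quant):
    """Aggregate signal-to-quantization-noise ratio, in dB.

    The noise is taken as quant - full, i.e. the difference between the
    quantized and full-precision values.
    """
    signal_power = sum(f * f for f in full)
    noise_power = sum((q - f) ** 2 for f, q in zip(full, quant))
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)


def elementwise_nsr(full, quant):
    """Per-element (quantized - full precision) / |full precision|, as in claim 4.

    Zero-valued full-precision entries are skipped (an assumption).
    """
    return [(q - f) / abs(f) for f, q in zip(full, quant) if f != 0]


full = [0.3, -1.2, 2.0]
quant = uniform_quantize(full, 0.5)  # [0.5, -1.0, 2.0]
print(round(sqnr_db(full, quant), 2))  # approximately 18.4 dB
print([round(r, 3) for r in elementwise_nsr(full, quant)])
```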

Regarding claim 11, Dexu Lin teaches A system for training a neural network implemented with a quantization-enabled system, the system comprising: ([Abstract] "A method for selecting bit widths for a fixed point machine learning model includes evaluating a sensitivity of model accuracy to bit widths at each computational stage of the model. The method also includes selecting a bit width for parameters, and/or intermediate calculations in the computational stages of the model" Selecting a bit width is interpreted as quantizing.).
memory; one or more processors coupled to the memory and adapted to perform quantized-precision operations; ([¶0011] "The apparatus includes a memory and at least one processor coupled to the memory").
one or more computer-readable storage media storing computer-readable instructions that, when executed by the one or more processors, cause the system to perform a method of training a neural network, the instructions comprising: ([¶0007] "Deep neural networks may be trained to recognize a hierarchy of features and so they have increasingly been used in object recognition applications" [¶0053] "Instructions executed at the general-purpose processor 102 may be loaded from a program memory associated with the CPU 102 or may be loaded from a dedicated memory block 118.").
instructions that cause the system to represent values of one or more parameters of the neural network in a quantized-precision format; ([¶0037] "For instance, a DCN with 3 convolutional layers, for the purpose of the SQNR calculation, may have 6 quantization “layers”, or steps as follows: Quantize weights and biases of convolution layer (conv) 1, Quantize activations of conv1, Quantize weights and biases of conv2, Quantize activations of conv2, Quantize weights and biases of conv3, Quantize activations of conv3.").
instructions that cause the system to compute at least one metric representing quantization noise present in the values represented in the quantized-precision format; and ([¶0027] "In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR)"). 
However, Dexu Lin does not explicitly teach instructions that cause the system to adjust a learning rate of the neural network based on the at least one metric.  

Hou, who teaches in the related art of quantized neural networks, teaches instructions that cause the system to adjust a learning rate of the neural network based on the at least one metric. ([p. 3 Sec. 3.1] "we consider the loss explicitly during quantization and obtain the quantization thresholds and scaling parameter by solving an optimization problem" [p. 4] "Obviously, this objective can be minimized layer by layer. Each proximal Newton iteration thus consists of two steps: (i) Obtain $w_l^t$ in (7) by gradient descent along $\nabla_l \ell(\hat{w}^{t-1})$, which is preconditioned by the adaptive learning rate...so that the rescaled dimensions have similar curvature" The adaptive learning rate is interpreted as a scaled learning rate. Hou teaches that the adaptive learning rate is scaled relative to the loss or error in response to quantization.).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the quantized neural network in Hou with the signal-to-quantization-noise ratio taught in Dexu Lin. The combination would have been obvious because a person of ordinary skill in the art would be able to determine that Hou determines the learning rate as a function of the quantization loss or noise.  Similarly, Dexu Lin teaches ([¶0027] “In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR)”).  Therefore, one of ordinary skill in the art would be able to determine that in order to optimize the learning rate as a function of noise, the SQNR would be a valuable mathematical tool.
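For reference, a signal-to-quantization-noise ratio is commonly computed as the ratio of signal power to quantization-noise power, often expressed in decibels. Its reciprocal in linear units is a noise-to-signal ratio. The sketch below assumes that common definition; Dexu Lin's own equations are not reproduced here, and the helper names are hypothetical.

```python
import math


def _powers(full_precision, quantized):
    """Return (signal power, quantization-noise power) as sums of squares."""
    signal = sum(w * w for w in full_precision)
    noise = sum((q - w) ** 2 for w, q in zip(full_precision, quantized))
    return signal, noise


def sqnr_db(full_precision, quantized):
    """SQNR in dB: 10 * log10(signal power / noise power)."""
    signal, noise = _powers(full_precision, quantized)
    if noise == 0.0:
        return math.inf  # quantization was lossless for these values
    return 10.0 * math.log10(signal / noise)


def inverse_sqnr(full_precision, quantized):
    """Noise-to-signal power ratio in linear units (1 / SQNR)."""
    signal, noise = _powers(full_precision, quantized)
    return noise / signal


print(sqnr_db([1.0, -1.0], [1.1, -0.9]))  # approximately 20 dB
```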

Regarding claim 12, the combination of Dexu Lin and Hou teaches The system of claim 11, wherein: the one or more parameters of the neural network comprise a plurality of weights of a layer of the neural network; and (Dexu Lin [¶0026] "For example, different bit widths may be selected for bias values, activation values, and/or weights of each layer of the neural network").
the at least one metric comprises a noise-to-signal ratio computed by computing a difference between values of the weights represented in the quantized-precision format and values of the weights represented in a normal-precision floating-point format, and dividing the difference by an absolute value of the values of the weights represented in the normal-precision floating-point format. (Dexu Lin [¶0028] "In some aspects, the signal variance (or power) of each stage may be assumed to be normalized to 1 for simplicity of notation. The bit width selection may be subject to certain constraints. For example, in some aspects, the bit width selection may be subject to a threshold of SQNR at the output of the model, which may be expressed as:" See Eqn. 2. The weights in the quantized-precision format form a quantized tensor; the claim therefore amounts to dividing the quantization error (the quantized weights minus the full-precision weights) by the magnitude of the full-precision weights. Dexu Lin teaches normalizing the signal variance (or power) of each stage to 1 and dividing by the sum of the signal variance (weights or activations) at each layer. One of ordinary skill in the art would recognize that the difference between the full-precision and quantized weights is the quantization noise.).
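The noise-to-signal ratio recited in claim 12 can be written element-wise as (q − w)/|w|. Here w is a weight in normal-precision floating point and q is the same weight in the quantized-precision format. The sketch below assumes a uniform symmetric quantizer, which is an illustrative choice not taken from the claims or the cited art, and guards against division by zero with a small epsilon. The function names are hypothetical.

```python
def quantize_symmetric(values, num_bits):
    """Uniform symmetric quantization to num_bits (illustrative choice)."""
    max_abs = max(abs(v) for v in values)
    levels = 2 ** (num_bits - 1) - 1  # e.g. 7 positive levels for 4 bits
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(v / scale) * scale for v in values]


def elementwise_nsr(full_precision, quantized, eps=1e-12):
    """(q - w) / |w| for each weight, following the wording of claim 12."""
    return [(q - w) / max(abs(w), eps) for w, q in zip(full_precision, quantized)]


weights = [0.3, -0.7, 1.0]
q = quantize_symmetric(weights, num_bits=4)  # [2/7, -5/7, 1.0]
nsr = elementwise_nsr(weights, q)
# Signed ratios can cancel when averaged, so the mean magnitude is reported.
mean_abs_nsr = sum(abs(r) for r in nsr) / len(nsr)
```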

Regarding claim 13, the combination of Dexu Lin and Hou teaches The system of claim 11, wherein: the one or more parameters comprise activation weights and edge weights of a first layer of the neural network; (Dexu Lin [¶0026] "For example, different bit widths may be selected for bias values, activation values, and/or weights of each layer of the neural network").
computing the at least one metric comprises computing a first noise-to-signal ratio for the activation weights of the first layer and a second noise-to-signal ratio for the edge weights of the first layer; and (Dexu Lin [¶0027] "In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR). That is, in a machine learning model such as a deep convolutional network, the effect of quantizing weights and/or activations is the introduction of quantization noise").
the system further comprises instructions that cause the system to train the neural network with at least some values of the parameters represented in the quantized-precision format, including instructions that cause the system to compute gradient updates for the first layer and at least one other layer of the neural network using the adjusted learning rate. (Dexu Lin [¶0027] "In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR). That is, in a machine learning model such as a deep convolutional network, the effect of quantizing weights and/or activations is the introduction of quantization noise."). 

Regarding claim 16, Dexu Lin teaches The method of claim 15.  However, Dexu Lin does not explicitly teach wherein the hyper-parameter comprises at least one of: a learning rate, a learning rate schedule, a bias, a stochastic gradient descent batch size, a number of neurons in the neural network, or a number of layers in the neural network.  

Hou, who teaches related art in quantized neural networks, teaches wherein the hyper-parameter comprises at least one of: a learning rate, a learning rate schedule, a bias, a stochastic gradient descent batch size, a number of neurons in the neural network, or a number of layers in the neural network. (p. 15, see the learning rate in Algorithm 3).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the quantized neural network in Hou with the signal-to-quantization-noise ratio taught in Dexu Lin. The combination would have been obvious because a person of ordinary skill in the art would be able to determine that Hou determines the learning rate as a function of the quantization loss or noise.  Similarly, Dexu Lin teaches ([¶0027] “In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR)”).  Therefore, one of ordinary skill in the art would be able to determine that in order to optimize the learning rate as a function of noise, the SQNR would be a valuable mathematical tool.

Regarding claim 17, Dexu Lin teaches The method of claim 15, introducing noise to the neural network; ([¶0083] "In some aspects, the process may also inject noise into one or more computational stages of the model." [¶0086] "In block 504, the process determines a model performance. In some aspects, the model performance may comprise a classification accuracy, classification speed, SQNR, other model performance metric or a combination thereof. The model performance may be evaluated by comparing the performance to a threshold").
computing a difference between one or more values of the second tensor and one or more corresponding values of the first tensor; and dividing the difference by the absolute value of the one or more corresponding values of the first tensor. ([¶0028] "In some aspects, the signal variance (or power) of each stage may be assumed to be normalized to 1 for simplicity of notation. The bit width selection may be subject to certain constraints. For example, in some aspects, the bit width selection may be subject to a threshold of SQNR at the output of the model, which may be expressed as:" See Eqn. 2. The second tensor is the quantized tensor; the claim therefore amounts to dividing the difference between the quantized and full-precision weights by the magnitude of the full-precision weights. Dexu Lin teaches normalizing the signal variance (or power) of each stage to 1 and dividing by the sum of the signal variance (weights or activations) at each layer. One of ordinary skill in the art would recognize that the difference between the weights of the first tensor and the second (quantized) tensor is the quantization noise.).
However, Dexu Lin does not explicitly teach wherein computing the at least one noise-to-signal ratio comprises: obtaining a first tensor comprising values of one or more parameters of the neural network before introducing noise to the neural network; 
obtaining a second tensor comprising values of the one or more parameters after the introduction of noise to the neural network;  

Hou, who teaches related art in quantized neural networks, teaches The method of claim 15, wherein computing the at least one noise-to-signal ratio comprises: obtaining a first tensor comprising values of one or more parameters of the neural network before introducing noise to the neural network; ([p. 2 Sec. 2] "Let the full-precision weights from all L layers be $w = [w_1^\top, w_2^\top, \ldots, w_L^\top]^\top$, where $w_l = \mathrm{vec}(W_l)$, and $W_l$ is the weight matrix at layer l. The corresponding quantized weights will be denoted $\hat{w}$" Weights are interpreted as parameters of the neural network. Ternarization is interpreted as a form of quantization. See also $W_i$.).
obtaining a second tensor comprising values of the one or more parameters after the introduction of noise to the neural network; ([p. 2 Sec. 2] "Let the full-precision weights from all L layers be $w = [w_1^\top, w_2^\top, \ldots, w_L^\top]^\top$, where $w_l = \mathrm{vec}(W_l)$, and $W_l$ is the weight matrix at layer l. The corresponding quantized weights will be denoted $\hat{w}$" $\hat{w}$ is interpreted as the second tensor, obtained after the introduction of noise to the neural network.).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the quantized neural network in Hou with the signal-to-quantization-noise ratio taught in Dexu Lin. The combination would have been obvious because a person of ordinary skill in the art would be able to determine that Hou determines the learning rate as a function of the quantization loss or noise.  Similarly, Dexu Lin teaches ([¶0027] “In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR)”).  Therefore, one of ordinary skill in the art would be able to determine that in order to optimize the learning rate as a function of noise, the SQNR would be a valuable mathematical tool.

Regarding claim 20, Dexu Lin teaches The method of claim 15.  However, Dexu Lin does not explicitly teach, wherein the hyper-parameter is adjusted to compensate for the effect of the noise present in the neural network on the accuracy of gradient updates computed during the training of the neural network.  

Hou teaches The method of claim 15, wherein the hyper-parameter is adjusted to compensate for the effect of the noise present in the neural network on the accuracy of gradient updates computed during the training of the neural network. ([p. 3 Sec. 3.1] "we consider the loss explicitly during quantization and obtain the quantization thresholds and scaling parameter by solving an optimization problem" [p. 4] "Obviously, this objective can be minimized layer by layer. Each proximal Newton iteration thus consists of two steps: (i) Obtain $w_l^t$ in (7) by gradient descent along $\nabla_l \ell(\hat{w}^{t-1})$, which is preconditioned by the adaptive learning rate...so that the rescaled dimensions have similar curvature" With respect to the instant specification, scaling a learning rate is interpreted as capable of improving the gradient update accuracy. Hou teaches adjusting the learning rate with respect to the quantization loss (noise).).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the quantized neural network in Hou with the signal-to-quantization-noise ratio taught in Dexu Lin. The combination would have been obvious because a person of ordinary skill in the art would be able to determine that Hou determines the learning rate as a function of the quantization loss or noise.  Similarly, Dexu Lin teaches ([¶0027] “In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR)”).  Therefore, one of ordinary skill in the art would be able to determine that in order to optimize the learning rate as a function of noise, the SQNR would be a valuable mathematical tool.

Claims 5-8 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Hou and Dexu Lin, in further view of Darryl Lin ("Fixed Point Quantization of Deep Convolutional Networks", 2016).

Regarding claim 5, the combination of Hou and Dexu Lin teaches The method of claim 3. Furthermore, Dexu Lin teaches further comprising generating a scaling factor based on the at least one noise-to-signal metric, wherein: the neural network comprises a total of L layers; and ([¶0030] "In some aspects, the bit width selection may be simplified as: $\min -\sum_i \rho_i \log(x_i)$, s.t. $\sum_i \alpha_i x_i = C$, (3) where $\alpha_i$ is the noise amplification or reduction factor from layer i to the output, C is a constant that constrains the α factors, and $\rho_i$ is a scaling factor of the bit width at layer i...In some aspects, the constant C may be computed based on the SQNR limit").
the scaling factor for a layer l of the neural network is generated based on an average value of the noise-to-signal ratio for the activation weights of the layer l as well as a sum of average values of the noise-to-signal ratios ([¶0028] "Equations 1 and 2 may be considered an SQNR budget for a machine learning model." [¶0030] See Eqns. 1 and 2: "the constant C may be computed based on the SQNR limit." Equation 3 can be easily rearranged to solve for the scaling factor, such that the scaling factor would necessarily depend on C, which Dexu Lin teaches may be computed based on the SQNR. Equation 3 shows a summation of the signal variance (weight or activation values), which would then be divided by the average SQNR or SQNR limit).
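The rearrangement of Dexu Lin's Equation 3 referred to above can be checked with a standard Lagrange-multiplier argument. The following derivation is a sketch that assumes $\rho_i > 0$, $\alpha_i > 0$, and $x_i > 0$, with $x_i$ as defined in Dexu Lin (the definition is not reproduced here). Because the objective is convex and the constraint is linear under these assumptions, the stationary point is the minimizer.

```latex
\begin{aligned}
\mathcal{L}(x,\lambda) &= -\sum_i \rho_i \log x_i + \lambda\Big(\sum_i \alpha_i x_i - C\Big) \\
\frac{\partial \mathcal{L}}{\partial x_i} &= -\frac{\rho_i}{x_i} + \lambda \alpha_i = 0
  \;\Rightarrow\; x_i = \frac{\rho_i}{\lambda \alpha_i} \\
\sum_i \alpha_i x_i = C &\;\Rightarrow\; \frac{1}{\lambda}\sum_j \rho_j = C
  \;\Rightarrow\; \lambda = \frac{\sum_j \rho_j}{C} \\
x_i^{*} &= \frac{C\,\rho_i}{\alpha_i \sum_j \rho_j},
\qquad
\frac{\rho_i}{\sum_j \rho_j} = \frac{\alpha_i\, x_i^{*}}{C}
\end{aligned}
```

The last identity expresses the normalized scaling factor of layer i in terms of C.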

However, the combination of Hou and Dexu Lin does not explicitly teach for the edge weights of layers l+1 through L of the neural network.  

Darryl Lin, who teaches related art in quantized neural networks, teaches for the edge weights of layers l+1 through L of the neural network. ([pp. 4-5, Sec. 4.1.3] Eqns. 7 and 8. "Ignoring the bias term for the time being, since $a_i^{(l+1)}$ is simply a sum of terms like $w_{i,j}^{(l+1)} a_j^{(l)}$, which when quantized all have the same SQNR $\gamma_{w^{(l+1)} a^{(l)}}$. Assuming the product terms $w_{i,j}^{(l+1)} a_j^{(l)}$ are independent, it follows that the value of $a_i^{(l+1)}$, before further quantization, has inverse SQNR that equals [Eqn. 8]" Darryl Lin explicitly teaches calculating the SQNR starting from layer l+1.).
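The additivity of inverse SQNRs relied on in this passage can be checked to first order. The sketch below does not reproduce Darryl Lin's Eqn. 8. It assumes that a weight $w$, an activation $a$, and their quantization noises $n_w$ and $n_a$ are mutually independent, that the noises are zero-mean, and that the second-order term $n_w n_a$ is negligible. SQNRs are written here as $\mathrm{SQNR}_w = E[w^2]/E[n_w^2]$ and $\mathrm{SQNR}_a = E[a^2]/E[n_a^2]$.

```latex
\begin{aligned}
(w + n_w)(a + n_a) &= wa + \underbrace{w\,n_a + a\,n_w}_{e} + n_w n_a \\
E[e^2] &= E[w^2]\,E[n_a^2] + E[a^2]\,E[n_w^2] \\
\frac{1}{\mathrm{SQNR}_{wa}} = \frac{E[e^2]}{E[w^2]\,E[a^2]}
  &= \frac{E[n_a^2]}{E[a^2]} + \frac{E[n_w^2]}{E[w^2]}
  = \frac{1}{\mathrm{SQNR}_a} + \frac{1}{\mathrm{SQNR}_w}
\end{aligned}
```

Under the same assumptions, a sum of independent product terms with a common SQNR keeps that SQNR. Repeating the step layer by layer therefore accumulates one inverse-SQNR term per quantization step.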

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the quantized neural network methods of Hou and Dexu Lin with that taught by Darryl Lin. The combination would have been obvious because a person of ordinary skill in the art would be able to determine that all three prior art references deal with optimizing a neural network relative to the quantization noise.  Darryl Lin teaches the rationale of iterating from an index of the layer number plus one in a zero indexed array of layer indices ([p. 4,5 Sec. 4.1.3] “In a DCN with multiple layers, computation of the ith activation in layer l+1 of the DCN can be expressed as follows [Eqn. 7]”).  Darryl Lin shows that specific activations for all layers after the current layer can be selectively processed in this manner.

Regarding claim 6, the combination of Hou, Dexu Lin, and Darryl Lin teaches 
The method of claim 5, wherein: training the neural network comprises training the neural network via stochastic gradient descent; and (Hou, see line 16 of Algorithm 3 on p. 15).
the scaled learning rate for the layer l of the neural network is computed by the equation: $\varepsilon_q = \dfrac{\varepsilon}{1 + E[\xi^{(l)}/X^{(l)}] + \sum_{k=l+1}^{L} E[\gamma^{(k)}/w^{(k)}]}$ (Hou [p. 2 Sec. 2.1] "By minimizing the difference between $w_l$ and $\alpha_l b_l$, the optimal $\alpha_l$, $b_l$ have the simple form: [See Eqn.]" See also the equations on p. 3. Hou teaches a scaling factor α based on binarization. Hou explicitly teaches taking a summation of weights whose magnitude is greater than the gradient, which Hou teaches may be substituted by $E(|W_l^t|)$, which is highly relevant to the instant disclosure. In Eqn. 6, Hou teaches substituting the calculation of the noise for the weight in the learning rate equation; Eqns. 6 and 7 of Hou are interpreted as synonymous with the method of calculating a signal-to-noise ratio, as is well known in the art. Therefore, the equation in the instant claim is seen as simply a mathematical manipulation of the scaled learning rate equation taught in Hou.).
wherein $\varepsilon_q$ represents the scaled learning rate, ε represents a predetermined learning rate of the neural network, (Hou teaches the scaling factor α [p. 2 Sec. 2.1], the adaptive learning rate d [p. 4 Sec. 3.1], and the relationship between the two in Proposition 3.2.).
$E[\xi^{(l)}/X^{(l)}]$ represents the average value of the noise-to-signal ratio for the activation weights of the layer l over a stochastic gradient descent batch size, in the form of a vector, and (Dexu Lin [¶0027] "In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR). That is, in a machine learning model such as a deep convolutional network, the effect of quantizing weights and/or activations is the introduction of quantization noise. Similar to other communication systems, when quantization noise increases, the model performance decreases. Accordingly, the SQNR observed at the output may provide an indication of model performance or accuracy." Dexu Lin explicitly teaches the noise-to-signal ratio for the weights. One of ordinary skill in the art would understand that the expectation E[·] represents an average value of the ratio.).
$E[\gamma^{(k)}/w^{(k)}]$ represents the average value of the noise-to-signal ratio for the edge weights of a layer k of the neural network, per sample, in the form of a matrix. (Dexu Lin [¶0027] "In some aspects, model performance may be evaluated using a signal to quantization noise ratio (SQNR). That is, in a machine learning model such as a deep convolutional network, the effect of quantizing weights and/or activations is the introduction of quantization noise. Similar to other communication systems, when quantization noise increases, the model performance decreases. Accordingly, the SQNR observed at the output may provide an indication of model performance or accuracy." Dexu Lin explicitly teaches the noise-to-signal ratio for the weights. One of ordinary skill in the art would understand that the expectation E[·] represents an average value of the ratio.).
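A minimal sketch of the claim 6 equation, read as $\varepsilon_q = \varepsilon / (1 + E[\xi^{(l)}/X^{(l)}] + \sum_{k=l+1}^{L} E[\gamma^{(k)}/w^{(k)}])$, follows. It assumes the per-layer averages have already been reduced to non-negative scalars, for example mean absolute noise-to-signal ratios. The claim describes them as a vector and a matrix, so this reduction is a simplification made here. Layers are indexed from 0 in the code, so "layers l+1 through L" corresponds to the slice `[layer + 1:]`. The function name is hypothetical.

```python
def scaled_learning_rate(base_lr, act_nsr, edge_nsr, layer):
    """eps_q = eps / (1 + act_nsr[layer] + sum(edge_nsr[layer + 1:])).

    act_nsr[k]:  average noise-to-signal ratio of the activations of layer k
    edge_nsr[k]: average noise-to-signal ratio of the edge weights of layer k
    Both are assumed to be non-negative scalars; indices are 0-based.
    """
    if len(act_nsr) != len(edge_nsr):
        raise ValueError("act_nsr and edge_nsr must have one entry per layer")
    denom = 1.0 + act_nsr[layer] + sum(edge_nsr[layer + 1:])
    return base_lr / denom


act = [0.02, 0.05, 0.01]   # hypothetical per-layer activation ratios
edge = [0.03, 0.04, 0.06]  # hypothetical per-layer edge-weight ratios
lrs = [scaled_learning_rate(0.1, act, edge, l) for l in range(len(act))]
```

In this reading, a noisier layer, or a layer followed by noisier layers, receives a smaller step. With all ratios at zero, the predetermined rate is recovered unchanged.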

Regarding claim 7, the combination of Hou, Dexu Lin, and Darryl Lin teaches 
The method of claim 6, wherein computing the one or more gradient updates using the scaled learning rate comprises computing gradient updates for one or more parameters of the layer l using the scaled learning rate. (Hou, see Algorithm 3, lines 15-24 on p. 15. Hou shows that the gradient of the weight is calculated in line 16 and that the learning rate is then updated at the end of the epoch. Therefore, any subsequent epoch computes the gradient updates for the parameters of the layer using the scaled learning rate.).

Regarding claim 8, the combination of Hou, Dexu Lin, and Darryl Lin teaches 
The method of claim 7, wherein computing the one or more gradient updates using the scaled learning rate further comprises computing gradient updates for one or more parameters of one or more other layers of the neural network using the same scaled learning rate generated for the layer l. (Hou [p. 14 Sec. D.1] "Batch normalization with a minibatch size 100, is used to accelerate learning. The maximum number of epochs is 50. The learning rate starts at 0.01, and decays by a factor of 0.1 at epochs 15 and 25." See also Algorithm 3; each layer at each epoch is taught as using the same learning rate η.).
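Claims 7 and 8 differ in whether the learning rate scaled for layer l is applied only to that layer's parameters or is also reused for other layers. The following plain-SGD sketch contrasts the two readings. The layer names, values, and the helper `sgd_update` are hypothetical and are not taken from Hou or the claims.

```python
def sgd_update(params, grads, lr):
    """Plain SGD step: p <- p - lr * g for each parameter."""
    return [p - lr * g for p, g in zip(params, grads)]


# Hypothetical two-layer example; "conv1" plays the role of layer l.
params = {"conv1": [0.5, -0.2], "conv2": [0.1, 0.4]}
grads = {"conv1": [0.1, 0.2], "conv2": [-0.3, 0.1]}
base_lr = 0.1  # predetermined learning rate
lr_l = 0.08    # learning rate scaled for layer l (assumed value)

# One arrangement under claim 7: layer l uses its scaled rate, the other layer the base rate.
claim7 = {name: sgd_update(p, grads[name], lr_l if name == "conv1" else base_lr)
          for name, p in params.items()}

# Claim 8 reading: the rate scaled for layer l is reused for the other layer as well.
claim8 = {name: sgd_update(p, grads[name], lr_l) for name, p in params.items()}
```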

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Hou and Dexu Lin, in further view of Jacob (Intel Lab Distiller Github Repository, 2018).

Regarding claim 9, the combination of Hou and Dexu Lin teaches The method of claim 2. Furthermore, Dexu Lin teaches further comprising generating a scaling factor based on the at least one noise-to-signal metric, ([¶0030] "In some aspects, the bit width selection may be simplified as: $\min -\sum_i \rho_i \log(x_i)$, s.t. $\sum_i \alpha_i x_i = C$, (3) where $\alpha_i$ is the noise amplification or reduction factor from layer i to the output, C is a constant that constrains the α factors, and $\rho_i$ is a scaling factor of the bit width at layer i...In some aspects, the constant C may be computed based on the SQNR limit").
wherein: the normal-precision floating-point format represents the values with a first bit width; ([¶0025] "in a fixed point representation, a fixed position of the decimal point is chosen such that there are a fixed number of bits to the right and/or the left of the decimal point and used to represent the elements").
the quantized-precision format represents the values with a second bit width, the second bit width being lower than the first bit width; and ([¶0026] "aspects of the present disclosure are directed to changing the bit widths based on performance specifications and system resources." One of ordinary skill in the art would expect that a quantized-precision bit width would be lower than a normal-precision bit width.).
computing gradient updates for one or more other parameters of the neural network represented with the second bit width ([¶0064] "To adjust the weights, a learning algorithm may compute a gradient vector for the weights. The gradient may indicate an amount that an error would increase or decrease if the weight were adjusted slightly.").
computing the gradient updates for the one or more other parameters using the scaling factor for the second bit width. ([¶0030] "In some aspects, the bit width selection may be simplified as: $\min -\sum_i \rho_i \log(x_i)$, s.t. $\sum_i \alpha_i x_i = C$, (3) where $\alpha_i$ is the noise amplification or reduction factor from layer i to the output, C is a constant that constrains the α factors, and $\rho_i$ is a scaling factor of the bit width at layer i...In some aspects, the constant C may be computed based on the SQNR limit" [¶0064] "To adjust the weights, a learning algorithm may compute a gradient vector for the weights." Dexu Lin teaches that the bit widths for the weights are scaled with the scaling factor and, similarly, that the gradient update is a function of the weights.).
However, the combination of Hou and Dexu Lin does not explicitly teach the method further comprises: storing the scaling factor in an entry for the second bit width in a lookup table; 
by accessing the entry for the second bit width in the lookup table to obtain the scaling factor for the second bit width;

Jacob, who teaches a related method for quantized neural networks, teaches the method further comprises: storing the scaling factor in an entry for the second bit width in a lookup table; ([l. 420-494] "def linear_quantize_param(param_fp, param_meta):
            perch = per_channel_wts and param_fp.dim() in [2, 4]
            with torch.no_grad():
                scale, zero_point = _get_tensor_quantization_params(param_fp, param_meta.num_bits, mode, per_channel=perch)" Function shows setting and accessing the map (lookup table) of second bit width to scaling factor for network quantization.).
by accessing the entry for the second bit width in the lookup table to obtain the scaling factor for the second bit width; and ([l. 49-69] "if mode == LinearQuantMode.SYMMETRIC:
        sat_fn = get_tensor_avg_max_abs if clip else get_tensor_max_abs
        sat_val = sat_fn(tensor, dim)
        scale, zp = symmetric_linear_quantization_params(num_bits, sat_val)" Function shows accessing the scale given second bit width via lookup table.). 

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network quantization in Hou and Dexu Lin with the mapping of the scaling factor and number of bits as shown in Jacob. The combination would have been obvious because a person of ordinary skill in the art would be able to determine from Jacob that a mapping is a logical way to implement the method in an instruction set that could be performed by a processor.  The combination would be further obvious since Jacob implements the neural network quantization methods described in Hou and Dexu Lin in code which can be compiled into instructions for a variety of processors.  
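The lookup-table limitations of claim 9 can be pictured as a mapping from bit width to scaling factor. The factor is written once and then read when gradient updates are computed for parameters stored at that bit width. The sketch below is hypothetical: the class name, the rule 1/(1 + NSR) used to derive the factor, and the numbers are assumptions for illustration. They are not taken from Jacob (Distiller), Dexu Lin, or the claims.

```python
class ScaleLookupTable:
    """Maps a bit width to a learning-rate scaling factor (illustrative)."""

    def __init__(self):
        self._table = {}

    def store(self, bit_width, nsr):
        # Hypothetical rule: shrink the step as the noise-to-signal ratio grows.
        self._table[bit_width] = 1.0 / (1.0 + nsr)

    def lookup(self, bit_width):
        return self._table[bit_width]  # raises KeyError if no entry was stored


def scaled_gradient_step(params, grads, base_lr, table, bit_width):
    """Gradient update for parameters held at bit_width, using the stored factor."""
    lr = base_lr * table.lookup(bit_width)
    return [p - lr * g for p, g in zip(params, grads)]


table = ScaleLookupTable()
table.store(bit_width=8, nsr=0.01)
table.store(bit_width=4, nsr=0.25)
updated = scaled_gradient_step([1.0, -0.5], [0.2, -0.4], base_lr=0.1,
                               table=table, bit_width=4)
```

A dictionary keyed by bit width gives constant-time lookup, so the factor does not have to be recomputed at each update.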

Regarding claim 10, Hou teaches The method of claim 1, wherein the epoch of training of the neural network is a second epoch performed after a first epoch of training of the neural network, the method further comprising: ([p. 14 Sec. D.1] "Batch normalization with a minibatch size 100, is used to accelerate learning. The maximum number of epochs is 50. The learning rate starts at 0.01, and decays by a factor of 0.1 at epochs 15 and 25" Hou explicitly teaches using up to 50 epochs in one trial; therefore, a second epoch is necessarily taught. See also t in Algorithm 3.).
prior to generating the scaled learning rate, performing the first epoch of training using the values of the tensor, including computing one or more gradient updates using a predetermined learning rate of the neural network, (See Algorithm 3 lines 15-24 on p. 15. The scaled learning rate is calculated at the end of the epoch after the gradient updates using the predetermined rate.).
wherein generating the scaled learning rate based on the at least one noise-to-signal metric comprises scaling the predetermined learning rate based on the at least one noise-to-signal metric. ([p. 4 Sec. 3.1] See Eqn. 7. Hou teaches that $\hat{w}$ is the binarized weight; rearranging Eqn. 7 allows one to solve for the scaled learning rate based on the noise-to-signal metric. Even without rearrangement, one of ordinary skill in the art could see the dependency.).

Claims 14 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Dexu Lin and Hou, in further view of Yim (US 2019/0130255 A1).

Regarding claim 14, the combination of Dexu Lin and Hou teaches The system of claim 11, wherein the one or more processors comprise a neural network accelerator (Dexu Lin [¶0091] "The various illustrative logical blocks, modules and circuits described in connection with the present disclosure may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array signal (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components or any combination thereof designed to perform the functions described herein" With respect to the instant specification, an FPGA is given as an example of a neural network accelerator.). However, the combination of Dexu Lin and Hou does not explicitly teach a neural network accelerator having a tensor processing unit.

Yim, who teaches related art in quantized neural networks, teaches a neural network accelerator having a tensor processing unit. ([¶0051] "The hardware accelerator may be, for example, but is not limited to, a neural processing unit (NPU), a tensor processing unit (TPU), a neural engine, or the like, which is a dedicated module for driving a neural network.").

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network hardware in Dexu Lin with that in Yim. The combination would have been obvious because a person of ordinary skill in the art would be able to determine that neural networks often take immense processing power and significant amounts of time; therefore, any method of accelerating the training process would be beneficial. Yim reiterates this ([¶0004] “In order for the neural network device to analyze high-quality input in real time and extract information, technology capable of efficiently processing neural network operations may be used.”). Yim further teaches that a tensor processing unit is a dedicated module for driving the neural network ([¶0051]).

Regarding claim 18, the combination of Hou and Dexu Lin teaches The method of claim 17. However, Dexu Lin does not explicitly teach wherein introducing noise to the neural network comprises one or more of the following: changing a data type of values of one or more parameters of the neural network, decreasing a stochastic gradient descent batch size for one or more layers of the neural network, reducing a voltage supplied to hardware implementing the neural network, implementing analog-based training of the neural network, or storing values of one or more parameters of the neural network in DRAM.

Yim, who teaches related art in quantized neural networks, teaches The method of claim 17, wherein introducing noise to the neural network comprises one or more of the following: changing a data type of values of one or more parameters of the neural network, decreasing a stochastic gradient descent batch size for one or more layers of the neural network, reducing a voltage supplied to hardware implementing the neural network, implementing analog-based training of the neural network, or storing values of one or more parameters of the neural network in DRAM. ([¶0034] "The memory 120 is hardware for storing various data processed in the neural network quantization device" [¶0035] "The memory 120 may be dynamic random access memory (DRAM), but is not limited thereto").

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the neural network hardware in Dexu Lin with that in Yim. The combination would have been obvious because a person of ordinary skill in the art would be able to determine that neural networks often take immense processing power and significant amounts of time; therefore, any method of accelerating the training process would be beneficial. Yim reiterates this ([¶0004] “In order for the neural network device to analyze high-quality input in real time and extract information, technology capable of efficiently processing neural network operations may be used.”). Yim further teaches that a tensor processing unit is a dedicated module for driving the neural network ([¶0051]) and that DRAM can be used to store neural network data.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Jain (“Compensated-DNN: Energy efficient low-precision deep neural networks by compensating quantization errors”, 2018).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720.  The examiner can normally be reached on M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SB/Examiner, Art Unit 2124




/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124