Detailed Action
This action is in response to Applicant's communications filed 29 March 2021.  
Claim(s) 1 and 12 was/were amended.  No claims were cancelled. No claims were withdrawn.  No claims were added.  Therefore, claims 1-20 are pending in this Application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments/Arguments
Applicant's amendments/arguments, filed 29 March 2021, regarding the rejections of claims 1 and 12 under 35 USC 102 are directed to the newly amended claims and are addressed in the current rejection.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
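A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
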
Claim(s) 1-5, 7-10, 12-16, and 18-20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lin et al. (US 2016/0328646, hereinafter "Lin").

Regarding Claim 1,
Lin teaches a computation method used in a convolutional neural network ("Convolutional neural networks (CNNs)" [0006]), comprising:
receiving original data ("Quantization efficiency in artificial neural networks, according to aspects of the present disclosure, may be better understood by a review of quantization according to the probability distribution function 400 shown in FIG. 4. For example, an input to the quantizer may be uniformly distributed over [Xmin, Xmax], where Xmin and Xmax define the range of a fixed point representation." [0063]);
determining a first optimal quantization step size according to a distribution of the original data (FIG. 8, Determine quantizer parameters for quantizing values of the floating point network based on the selected moment(s) to obtain corresponding values of a fixed point machine learning network 804; "Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance." [0064]; 

[image: media_image1.png] [0078]),
wherein the step of determining the first optimal quantization step size comprises calculating a mean ("measuring the mean activations, µCl)" [0076]) and a variance (σ², "Equations for determining step size may be specified as: σ′ = |μ| + σ or σ′ = σ or σ′ = √(μ² + σ²) (14) where μ and σ are mean and standard deviation of the input to compute an effective sigma value σ′." [0078]; σ² is the square of the standard deviation, and the square of the standard deviation is the variance) of the distribution of the original data ("FIG. 7A illustrates an input distribution 700 of activation values for an exemplary deep convolutional network also having a mean (μ) and a variance (σ)" [0069]);
calculating a first quantization parameter according to the mean and variance of the distribution of the original data ("Equations for determining step size may be specified as: σ′ = |μ| + σ or σ′ = σ or σ′ = √(μ² + σ²) (14) where μ and σ are mean and standard deviation of the input to compute an effective sigma value σ′." [0078]; σ′ teaches a first quantization parameter);
determining the first optimal quantization step size according to the first quantization parameter ([image: media_image2.png] [0078]; the optimal step size shown therein is 2⁻ⁿ, which is calculated using σ′ in equation 15);
performing fixed-point processing to the original data according to the first optimal quantization step size to generate first data ([image: media_image3.png] [0076]);
inputting the first data to a first layer of the convolutional neural network to generate first output data ("Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes an input to a third layer of neurons, and so on." [0007]);
determining a second optimal quantization step size according to a distribution of the first output data ([image: media_image4.png] [0078]; "The additional adjustment factor, α, is a value that may be adjusted to improve the classification performance." [0080]; "In addition, the step size adjustment factor α may be specified differently throughout the model. For example, α may be specified individually for weights and activations of each layer. In addition, the weights and biases may have very different dynamic ranges. For example, weights and biases may be specified to have different Q number representations and different bit-widths. Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9." [0081]),
wherein the step of determining the second optimal quantization step size comprises calculating a mean ("measuring the mean activations, µCl)" [0076]) and a variance (σ², "Equations for determining step size may be specified as: σ′ = |μ| + σ or σ′ = σ or σ′ = √(μ² + σ²) (14) where μ and σ are mean and standard deviation of the input to compute an effective sigma value σ′." [0078]; σ² is the square of the standard deviation, and the square of the standard deviation is the variance) of the distribution of the first output data ("Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes an input to a third layer of neurons, and so on." [0007]; "the output of each layer may serve as an input of a succeeding layer in the deep convolutional network" [0054]);
calculating a second quantization parameter according to the mean and variance of the distribution of the output data ("Equations for determining step size may be specified as: σ′ = |μ| + σ or σ′ = σ or σ′ = √(μ² + σ²) (14) where μ and σ are mean and standard deviation of the input to compute an effective sigma value σ′." [0078]; σ′ teaches a second quantization parameter); and
determining the second optimal quantization step size according to the second quantization parameter ([image: media_image2.png] [0078]; the optimal step size shown therein is 2⁻ⁿ, which is calculated using σ′ in equation 15);
performing the fixed-point processing to the first output data according to the second optimal quantization step size to generate second data ([image: media_image3.png] [0076]); and
inputting the second data to a second layer of the convolutional neural network ("Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes an input to a third layer of neurons, and so on." [0007]).
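
The per-layer flow mapped to claim 1 above can be illustrated with a minimal Python sketch. It is not Lin's implementation: the three forms of the effective sigma σ′ follow equation (14) as quoted from [0078], but the rule in `step_from_sigma` that turns σ′ into a power-of-two step size (covering roughly ±3σ′ with the available levels) is an assumption standing in for Lin's equation 15, which is not reproduced in this action. The function names, the `bits` parameter, and the toy layers in the usage note are likewise hypothetical.

```python
import math
import statistics


def effective_sigma(values, form="rms"):
    """Effective sigma from the mean and standard deviation (forms quoted from Lin eq. 14)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # square root of the population variance
    if form == "abs":
        return abs(mu) + sigma
    if form == "sigma":
        return sigma
    return math.sqrt(mu * mu + sigma * sigma)


def step_from_sigma(sigma_eff, bits=8):
    """ASSUMPTION (not Lin eq. 15): power-of-two step covering about +/-3 sigma'."""
    if sigma_eff <= 0.0:
        return 1.0  # degenerate input with zero spread: fall back to a unit step
    raw_step = 3.0 * sigma_eff / 2 ** (bits - 1)
    n = -math.ceil(math.log2(raw_step))
    return 2.0 ** (-n)


def quantize(values, step, bits=8):
    """Round each value to a multiple of step and clip to a signed range of `bits` bits."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [min(max(round(v / step), lo), hi) * step for v in values]


def run_quantized(data, layers, bits=8):
    """Before each layer, derive a step size from the current data and quantize with it."""
    x, steps = list(data), []
    for layer in layers:
        step = step_from_sigma(effective_sigma(x), bits)
        steps.append(step)
        x = layer(quantize(x, step, bits))
    return x, steps
```

For example, `run_quantized([0.1, -0.2, 0.3], [lambda xs: [4 * v for v in xs], lambda xs: [v + 1 for v in xs]])` returns the second layer's output together with two step sizes, one per layer, each derived from the data entering that layer.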

Regarding Claim 2,
Lin teaches the method of claim 1.  Lin further teaches wherein the data format of the original data is a floating-point format and the data formats of the first data and the second data are a fixed-point format (FIG. 8, Determine quantizer parameters for quantizing values of the floating point network based on the selected moment(s) to obtain corresponding values of a fixed point machine learning network 804; "In an aspect of the present disclosure, the instructions loaded into the general-purpose processor 102 may comprise code for quantizing a floating point neural network to obtain a fixed point neural network. The instructions loaded into the general-purpose processor 102 may also comprise code for providing a fixed point representation when quantizing weights, biases and activation values in the network." [0036]).

Regarding Claim 3,
Lin teaches the method of claim 1.  Lin further teaches wherein the first optimal quantization step size is different from the second optimal quantization step size ("In addition, the step size adjustment factor a may be specified differently throughout the model. For example, a may be specified individually for weights and activations of each layer. In addition, the weights and biases may have very different dynamic ranges. For example, weights and biases may be specified to have different Q number representations and different bit-widths. Additionally, the bit-width of weights and biases in the same layer may be the same. In one configuration, for a given layer, weights have a format of Q 3.18 and biases have a format of Q 6.9." [0081]).

Regarding Claim 4,
Lin teaches the method of claim 1.  Lin further teaches wherein the first layer and the second layer are convolutional layers of the convolutional neural network ("In some artificial neural networks (ANNs), such as a deep convolutional network (DCN), quantization may be applied to activations of the normalization layer; weights, biases, and activations of the fully connected layer; and/or weights, biases, and activations of the convolution layer." [0032]).

Regarding Claim 5,
Lin teaches the method of claim 4.  Lin further teaches wherein the first data input to the first layer of the convolutional neural network comprises first image data ("A DCN may be trained with supervised learning. During training, a DCN may be presented with an image, such as a cropped image of a speed limit sign, and a “forward pass” may then be computed to produce an output 322." [0044]), first weight data and/or first bias data ("FIG. 6A illustrates an input distribution 600 for an exemplary deep convolutional network. In this example, the input distribution 600 includes a variance (σ) and a mean value (μ). Aspects of the present disclosure are directed towards specifying a zero mean (μ=0) for the distributions of weights, biases and activation values, for example, as shown in FIG. 6B." [0066]).

Regarding Claim 7,
Lin teaches the method of claim 1.  Lin further teaches wherein before performing the fixed-point processing to the first output data according to the second optimal quantization step size, the first output data is output to a rectified linear (ReLU) layer and/or a pooling layer (FIG. 3A; "Quantization, however, need not be applied to the pooling layer if max pooling is specified, and/or the neuron layer if rectified linear units (ReLU) are specified." [0062]).

Regarding Claim 8,
Lin teaches the method of claim 1.  Lin further teaches wherein the first optimal quantization step size and/or the second optimal quantization step size are determined offline ("A processor may also be implemented as a combination of computing devices" [0089]).

Regarding Claim 9,
Lin teaches the method of claim 1.  Lin further teaches wherein the step of performing the fixed-point processing to the original data according to the first optimal quantization step size to generate the first data further comprises: determining a first fixed-point format ("Fixed point numbers may be specified for using less complex software and/or hardware designs at the cost of reduced accuracy because floating point numbers have a greater dynamic range compared to fixed point numbers. Converting floating point numbers to fixed point numbers through the process of quantization may decrease the complexity of hardware and/or software implementations. The floating point numbers may assume a single-precision binary format including a sign bit, an 8-bit exponent, and a 23-bit fraction component. Aspects of the disclosure are directed to using the Q number format to represent fixed point numbers. Still, other formats may be considered. The Q number format is represented as Qm.n, where m is a number of bits for an integer part and n is a number of bits for a fraction part. In one configuration, m does not include a sign bit. Each Qm.n format may use an m+n+1 bit signed integer container with n fractional bits. In one configuration, the range is [−(2ᵐ), 2ᵐ − 2⁻ⁿ] and the resolution is 2⁻ⁿ. For example, a Q14.1 format number may use sixteen bits. In this example, the range is [−2¹⁴, 2¹⁴ − 2⁻¹] (e.g., [−16384.0, +16383.5]) and the resolution is 2⁻¹ (e.g., 0.5)." [0059]-[0060]) according to the first optimal quantization step size; and performing the fixed-point processing to the original data according to the first fixed-point format to generate the first data ([image: media_image3.png] [0076]).
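
The Qm.n arithmetic quoted from Lin [0059]-[0060] can be checked with a short sketch. The helper name `q_format` is hypothetical; it assumes, as in the quoted configuration, that m excludes the sign bit, so the container holds m+n+1 bits.

```python
def q_format(m, n):
    """Return (total_bits, minimum, maximum, resolution) for a signed Qm.n format."""
    resolution = 2.0 ** (-n)          # value of one least-significant bit
    minimum = -(2.0 ** m)
    maximum = 2.0 ** m - resolution   # one step below +2^m
    return m + n + 1, minimum, maximum, resolution


print(q_format(14, 1))  # (16, -16384.0, 16383.5, 0.5)
```

The Q14.1 result agrees with the sixteen-bit example in the quoted passage.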

Regarding Claim 10,
Lin teaches the method of claim 1.  Lin further teaches wherein the original data comprises original image data ("A DCN may be trained with supervised learning. During training, a DCN may be presented with an image, such as a cropped image of a speed limit sign, and a “forward pass” may then be computed to produce an output 322." [0044]), first weight data and/or first bias data ("FIG. 6A illustrates an input distribution 600 for an exemplary deep convolutional network. In this example, the input distribution 600 includes a variance (σ) and a mean value (μ). Aspects of the present disclosure are directed towards specifying a zero mean (μ=0) for the distributions of weights, biases and activation values, for example, as shown in FIG. 6B." [0066]).

Regarding Claim(s) 12-16 and 18-20,
Claim(s) 12-16 and 18-20 recite(s) a computation device including a processor (Lin: "processor" [0089]) and memory (Lin: "memory" [0090]) storing instructions for performing functions corresponding to the method steps recited in claim(s) 1-5 and 7-9, respectively.  Lin teaches the limitations of claim(s) 12-16 and 18-20 as set forth above in connection with claim(s) 1-5 and 7-9.  Therefore, claim(s) 12-16 and 18-20 is/are rejected under the same rationale as respective claim(s) 1-5 and 7-9.

Claim Rejections - 35 USC § 103
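The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
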
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim(s) 6 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 2016/0328646, hereinafter "Lin") in view of Anwar et al. (Fixed Point Optimization of Deep Convolutional Neural Networks for Object Recognition, hereinafter "Anwar").

Regarding Claim 6,
Lin teaches the method of claim 5.  Lin does not explicitly teach wherein the weight data and the bias data are performed fixed-point processing by using a non-uniform quantization method.
Anwar teaches wherein the weight data and the bias data (Sec. 2.2 Layer wise sensitivity analysis for non-uniform quantization; "Figure 2 shows that the weights between the penultimate and final layer, rearlayer7 are most sensitive to quantization" sec. 2.2, p. 1133) are performed fixed-point processing ("optimization method for fixed point deep convolutional neural network" sec. Abstract, p. 1131) by using a non-uniform quantization method ("Table 2 shows a layer wise distribution of weights after undergoing non-uniform quantization and retraining." sec. 3., p. 1133).
Lin and Anwar are analogous art because both are directed towards quantization of convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the quantization method of Lin with the non-uniform quantization of Anwar.  The modification would have been obvious because one of ordinary skill in the art would be motivated to reduce memory storage and improve generalization of the network, as suggested by Anwar (sec. Abstract, p. 1131; sec. Conclusion, p. 1134).
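
For contrast with the uniform, fixed-step quantization discussed for Lin, the sketch below shows a generic non-uniform quantizer that maps each weight to the nearest entry of an unevenly spaced codebook. It is an illustration only and is not taken from Anwar; the codebook values and the name `quantize_to_codebook` are assumptions.

```python
def quantize_to_codebook(weights, codebook):
    """Map each weight to the closest codebook level (levels need not be evenly spaced)."""
    return [min(codebook, key=lambda level: abs(level - w)) for w in weights]


# Hypothetical codebook with finer spacing near zero than near the extremes.
levels = [-1.0, -0.25, -0.0625, 0.0, 0.0625, 0.25, 1.0]
print(quantize_to_codebook([0.05, -0.3, 0.7], levels))  # [0.0625, -0.25, 1.0]
```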

Regarding Claim(s) 17,
Claim(s) 17 recite(s) a computation device including a processor (Lin: "processor" [0089]) and memory (Lin: "memory" [0090]) storing instructions for performing functions corresponding to the method steps recited in claim(s) 6.  The Lin/Anwar combination teaches the limitations of claim(s) 17 as set forth above in connection with claim(s) 6.  Therefore, claim(s) 17 is/are rejected under the same rationale as claim(s) 6.

Claim(s) 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 2016/0328646, hereinafter "Lin") in view of Morcel et al. (FPGA-based Accelerator for Deep Convolutional Neural Networks for the SPARK Environment, hereinafter "Morcel").

Regarding Claim 11,
Lin teaches the method of claim 1.  Lin does not explicitly teach wherein before determining the second optimal quantization step size according to the distribution of the first output data further comprises: converting the data format of the first output data from a fixed-point format to a floating-point format.
Morcel teaches wherein before determining the second optimal quantization step size according to the distribution of the first output data further comprises: converting the data format of the first output data from a fixed-point format to a floating-point format ("Figure 2 shows the architecture of our accelerator. It contains a floating-to-fixed-point conversion circuit, a zero padding circuit, a Pipelined 2D convolution filter, a fixed-to-floating-point conversion circuit, and finally, a Control Unit." sec. IV.B, p. 130).
Lin and Morcel are analogous art because both are directed towards training deep convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the quantization method of Lin with the training accelerator of Morcel.  The modification would have been obvious because one of ordinary skill in the art would be motivated to accelerate training of deep convolutional neural networks, as suggested by Morcel (sec. Abstract, p. 126).
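
The format conversion at issue in claim 11 can be sketched as follows. This is a minimal illustration assuming a signed fixed-point value stored as an integer with n fractional bits; it does not model Morcel's conversion circuits, and the function names are hypothetical.

```python
def float_to_fixed(value, n):
    """Encode a float as an integer count of 2^-n steps (round to nearest)."""
    return round(value * (1 << n))


def fixed_to_float(raw, n):
    """Decode an integer with n fractional bits back to a float."""
    return raw / (1 << n)


raw = float_to_fixed(2.75, 4)       # 2.75 * 16 = 44
print(raw, fixed_to_float(raw, 4))  # 44 2.75
```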

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571)270-7477.  The examiner can normally be reached on M-F: 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached on (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/CHARLES C KUO/Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126