DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments regarding the 101 rejections have been fully considered. The rejections are withdrawn because the claim amendments incorporate claim limitations that were not rejected.
Applicant’s arguments regarding the 103 rejections have been fully considered. The arguments for claims 9-20 are persuasive and the rejections are withdrawn. Applicant’s arguments regarding claims 1-8 are not persuasive. Please see the updated rejection which shows how the current references teach the claim limitations. 
Claim Rejections - 35 USC § 103 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claims 1, 3, 4, and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Li (Li, F., Zhang, B., & Liu, B. (2016). Ternary weight networks. arXiv preprint arXiv:1605.04711) in view of Mao (Mao, Senior, Vanhoucke. “Deep Learning and Unsupervised Feature Learning Workshop.” NIPS 2011) and in view of Saldana (U.S. Patent # 10,643,126). 

Regarding claim 1, Li teaches a computing device comprising:
a ternarization unit including a weight ternarization circuit; wherein the weight ternarization circuit is to convert [threshold-based ternary function (3) shows the conversion to ternary-value weights Wt from full precision weights W; section 2.2 Approximated solution with threshold-based ternary function on p.2] a weight tensor from a floating-point representation [full precision weights W; section 2.1 Problem Formulation on p.2] to a ternary representation including a two-bit ternary weight [ternary-value weights Wt; section 2.1 Problem Formulation on p.2] and [an eight-bit integer] scale factor [scaling factor; section 2.1 Problem Formulation on p.2] and the two-bit ternary weight represents a weight value of one of a negative one, zero, or positive one [abstract “We introduce ternary weight networks (TWNs) - neural networks with weights constrained to +1, 0 and -1.”]
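For illustration only, the threshold-based conversion mapped to Li above can be sketched as follows. This is a minimal sketch, not Li's implementation and not the claimed circuit. The function names, the 0.7 threshold factor (an approximation attributed here to the Li reference and treated as an assumption), and the particular two-bit code assignment are all hypothetical.

```python
def ternarize(weights, threshold_factor=0.7):
    """Threshold-based ternarization of a flat list of float weights.

    Returns (codes, alpha): codes are in {-1, 0, +1}; alpha is a float scale
    so that each weight is approximated by alpha * code.
    threshold_factor is an assumed constant, not taken from the Office action.
    """
    if not weights:
        return [], 0.0
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_factor * mean_abs  # ternarization threshold
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    # Scale factor: mean magnitude of the weights that survived the threshold.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


# Hypothetical two-bit encoding of the three ternary values (an assumption).
TWO_BIT = {0: 0b00, 1: 0b01, -1: 0b11}


def pack_two_bit(codes):
    """Map each ternary code to its assumed two-bit pattern."""
    return [TWO_BIT[c] for c in codes]
```

In this sketch the scale factor remains a floating-point value; storing it as an eight-bit integer is the feature the rejection attributes to Mao rather than to Li.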
Li, however, does not explicitly teach:
an eight-bit integer scale factor;
a parallel processor compute unit to perform a set of parallel integer compute operations;
a ternarization unit including the weight ternarization circuit and an activation quantization circuit; wherein the activation quantization circuit is to convert an activation tensor from a floating-point representation to an integer representation; and wherein the parallel processor compute unit includes one or more circuits to perform the set of parallel integer compute operations on the ternary representation of the weight tensor and the integer representation of the activation tensor.
Mao, analogous to Li, teaches an eight-bit integer scale factor [pg. 5 §4.1 ¶2 “Weights are scaled by taking their maximum magnitude in each layer and normalizing them to fall in the [−128, 127] range. Biases are scaled by the same amount and linearly quantized to 32 bits. The matrix multiplication at each layer produces a 3”]
an activation quantization circuit; wherein the activation quantization circuit is to convert [Mao explains how activations may be compressed into a Sigmoid wherein a 32 bit integer is mapped to an 8 bit probability Page 5 Section 4] an activation tensor from a floating-point representation [Mao recites each initial layer is 32 bits Page 5 Section 4.1] to an integer representation [Mao explains that activations are good candidates for signed integer representation Page 5 Section 4].
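For illustration only, the max-magnitude scaling into the [−128, 127] range quoted from Mao can be sketched as a generic linear quantizer. This is a sketch under stated assumptions, not Mao's exact procedure: the function name, the round-to-nearest step, and the clamping are hypothetical choices, and the same routine is merely assumed to be applicable to an activation tensor.

```python
def quantize_int8(values):
    """Linearly quantize floats to signed 8-bit integers by maximum magnitude.

    Returns (q, scale) such that each value is approximated by q * scale.
    """
    max_mag = max((abs(v) for v in values), default=0.0)
    if max_mag == 0.0:
        # All-zero (or empty) input: nothing to scale.
        return [0] * len(values), 1.0
    scale = max_mag / 127.0
    # Round to nearest integer, then clamp into the signed 8-bit range.
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale
```

For example, `quantize_int8([1.0, -0.25, 0.1, 0.0])` yields the integer codes `[127, -32, 13, 0]` together with a scale of 1/127.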
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the activation quantization circuit of Mao with the conversion circuit of Li. The combination would have been obvious because a person of ordinary skill in the art would know to apply a known technique (i.e., the conversion by Mao) to a known device ready for improvement to yield predictable results.
Saldana, analogous to Li/Mao, teaches a parallel processor compute unit to perform a set of parallel integer compute operations [Saldana recites a compute unit wherein the memory control unit can communicate the quantized weight values in parallel to the data requesting unit Saldana Col. 8 Line 20-25]; wherein the parallel processor compute unit includes one or more circuits to perform the set of parallel integer compute operations on the ternary representation of the weight tensor and the integer representation of the activation tensor [Saldana recites using a compute unit to convert a tensor value into binary or ternary form Col. 3 Line 10-20].
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the parallel processor compute unit as disclosed in Saldana with the activation quantization circuit and conversion circuitry of Li/Mao. The combination would have been obvious because a person of ordinary skill in the art would recognize that a parallel processor would be suited for executing the type of operations described in Li/Mao.
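For illustration only, integer compute operations on a ternary weight representation and an integer activation representation can be sketched as below. This is a minimal sketch and is not drawn from Saldana, Li, or Mao. The function names and the row-per-output layout are hypothetical. Each output row is independent of the others, which is why such work is assumed here to be suitable for parallel execution.

```python
def ternary_dot(codes, activations_q):
    """Integer dot product of ternary codes {-1, 0, +1} with integer activations.

    No multiplications are needed: +1 adds, -1 subtracts, 0 skips.
    """
    acc = 0
    for c, a in zip(codes, activations_q):
        if c == 1:
            acc += a
        elif c == -1:
            acc -= a
    return acc


def ternary_matvec(code_rows, activations_q, weight_scale, act_scale):
    """Apply one ternary weight row per output, then rescale the integer sums.

    Rows do not depend on each other, so hardware could process them in parallel.
    """
    return [ternary_dot(row, activations_q) * weight_scale * act_scale
            for row in code_rows]
```

The integer accumulation is kept separate from the final rescaling so that the inner loop uses only additions and subtractions.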
Regarding claim 3, Li/Mao/Saldana teaches the computing device as in claim 1, the activation quantization circuit to convert the activation tensor from a single-precision floating-point representation to an eight-bit integer representation [Mao recites using 8 bit quantization.  Mao states that the “benefits of quantizing the majority of the network down to 8 bits is that the total memory footprint of the network consequently shrinks by between 3× and 4×.” Page 5 Section 4.1 Paragraph 1].
Li, Mao, and Saldana are analogous art because they are in the same field of invention, neural networks.  Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use an activation quantization circuit to convert an activation tensor from a single precision floating point to an eight bit integer representation.  The combination would have been obvious because a person of ordinary skill in the art would know that a conversion from a single-precision floating point to an eight bit integer representation would be the most reliable and predictable way to convert activation tensor data. 

Regarding claim 4, Saldana additionally teaches the computing device as in claim 1, the weight ternarization circuit to convert the weight tensor from a single-precision floating-point representation to the ternary representation [Saldana recites that for a neural network an interconnect weight may be quantized into one of two values (binary values) or into one of three values (ternary values). Saldana also recites that quantizing a 32- or 64-bit number into a smaller bit number can significantly reduce the size and cost of a multiplier circuit. Col. 3 Line 10-20].
Saldana is analogous art to Li and Mao because it is in the same field of invention, neural networks.  Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use a ternarization circuit to convert the weight tensor from a single precision floating point to a ternary representation.  The combination would have been obvious because a person of ordinary skill in the art would know that a conversion from a single-precision floating point to ternary representation would be the most predictable and organized way to convert weight tensor data. 

Regarding claim 8, Li/Mao/Saldana teaches the computing device as in claim 1, the parallel processor compute unit including one or more ternary logic units to perform one or more operations in the set of parallel integer compute operations [Saldana recites performing operations in parallel. Fig. 8 Element 830 and Col. 8 Line 20-27].

Li, Mao, and Saldana are analogous art because they are in the same field of invention, neural network models. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use one or more ternary logic units to perform parallel integer compute operations. The combination would have been obvious because a person of ordinary skill in the art would know that the most predictable way to perform parallel integer compute operations would be to use one or more logic units capable of ternary computations.

Claims 2 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Li/Mao/Saldana in view of Dally (Dally, W. J., Zhu, C., Han, S., & Mao, H. (2016). “Trained ternary quantization.” arXiv preprint arXiv:1612.01064).
Regarding claim 2, Li/Mao/Saldana recites the computing device as in claim 1.  However, it does not recite the parallel processor compute unit to perform the set of parallel integer compute operations in response to an instruction provided by a machine learning inferencing framework.
Dally, however, recites the parallel processor compute unit to perform the set of parallel integer compute operations in response to an instruction provided by a machine learning inferencing framework [Dally recites that deep neural networks are becoming the preferred approach for many machine learning applications and then goes on to propose Trained Ternary Quantization, which uses two full-precision scaling coefficients $W^p_l$, $W^n_l$ for each layer $l$ and quantizes the weights to $\{-W^n_l, 0, +W^p_l\}$. Page 1 Section 1 Paragraphs 1-2].
Li, Mao, Saldana, and Dally are analogous art because they are in the same field of invention, neural networks. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use a parallel processor compute unit in order to execute parallel operations in response to instructions provided by a machine learning inferencing framework. The combination would have been obvious because a person of ordinary skill in the art would know that a parallel processor compute unit would be the most reliable and predictable processor to use to perform parallel compute operations instructed by a machine learning inferencing framework.
Regarding claim 5, Li/Mao/Saldana teaches the computing device as in claim 4. However, Li/Mao/Saldana does not teach wherein the eight-bit integer scale factor is determined to minimize L2 loss between values of a group of pre-trained weights and a group of ternary weights.
Dally, however, teaches wherein the eight-bit integer scale factor is determined to minimize L2 loss between values of a group of pre-trained weights and a group of ternary weights [pg. 3 §3.3 “They then solve an optimization problem of minimizing L2 distance between full precision and ternary weights to obtain layer-wise values of $W_l$ and $\Delta_l$:”].
Li, Mao, Saldana, and Dally are analogous art because they are in the same field of invention, neural networks. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use a parallel processor compute unit in order to execute parallel operations in response to instructions provided by a machine learning inferencing framework. The combination would have been obvious because a person of ordinary skill in the art would know that a parallel processor compute unit would be the most reliable and predictable processor to use to perform parallel compute operations instructed by a machine learning inferencing framework.
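For illustration only, the L2-loss minimization recited in claim 5 can be sketched for the simplified case in which the ternary codes are already fixed. Setting the derivative of the sum of (w − αt)² with respect to α to zero gives α = Σ(w·t) / Σ(t²). The function names below are hypothetical, the sketch produces a floating-point scale rather than an eight-bit integer, and it is not asserted to be the optimization described in Dally.

```python
def l2_optimal_scale(weights, codes):
    """Scale alpha minimizing sum((w - alpha * t)^2) for fixed ternary codes t."""
    num = sum(w * c for w, c in zip(weights, codes))
    den = sum(c * c for c in codes)  # number of non-zero codes
    return num / den if den else 0.0


def l2_loss(weights, codes, alpha):
    """Squared L2 distance between the weights and their scaled ternary codes."""
    return sum((w - alpha * c) ** 2 for w, c in zip(weights, codes))
```

Because the loss is a convex quadratic in α, any other scale value, such as the optimum shifted by 0.1 in either direction, gives a strictly larger loss whenever at least one code is non-zero.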


Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Li/Mao/Saldana in view of Ni (Ni et al. (2016). “DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients.” arXiv preprint arXiv:1606.06160).

Regarding claim 6, Li/Mao/Saldana teaches the computing device as in claim 4.  However, Li/Mao/Saldana does not teach wherein to convert the weight tensor from the single-precision floating-point representation, the weight ternarization circuit is to decompose the weight tensor into a set of orthogonal vectors and ternarize components of the set of orthogonal vectors.  

Ni, when reciting conversion to ternary form, teaches wherein to convert the weight tensor from the single-precision floating-point representation, the weight ternarization circuit is to decompose the weight tensor into a set of orthogonal vectors and ternarize components of the set of orthogonal vectors.  

Ni discusses decomposing the weight tensor into a set of orthogonal vectors and then ternarizing this set of vectors. Ni teaches wherein to convert the weight tensor from the single-precision floating-point representation, the weight ternarization circuit is to decompose the weight tensor into a set of orthogonal vectors and ternarize components of the set of orthogonal vectors [Ni recites a function designed for k-bit quantization of gradients, $f^k_\gamma(dr) = 2\max_0(|dr|)\left[\operatorname{quantize}_k\left(\frac{dr}{2\max_0(|dr|)} + \frac{1}{2}\right) - \frac{1}{2}\right]$, in order to map it into a set of [0,1] vectors which can then be quantized. Page 4 Section 2.5].
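For illustration only, the k-bit gradient quantization function quoted from Ni above can be sketched as follows. This is a sketch under stated assumptions: the definition of quantize_k as round((2^k − 1)·r) / (2^k − 1) is not reproduced in this Office action and is assumed here, the maximum is taken over the whole list rather than over selected tensor axes, and the function names are hypothetical.

```python
import math


def quantize_k(r, k):
    """Quantize r in [0, 1] to k bits (assumed definition, round half up)."""
    levels = (1 << k) - 1
    return math.floor(r * levels + 0.5) / levels


def quantize_gradient(dr, k):
    """Apply the quoted formula elementwise to a flat list of gradients.

    Each gradient is shifted and scaled into [0, 1], quantized to k bits,
    and then mapped back to the original range.
    """
    max_abs = max((abs(g) for g in dr), default=0.0)
    if max_abs == 0.0:
        return [0.0] * len(dr)
    span = 2.0 * max_abs
    return [span * (quantize_k(g / span + 0.5, k) - 0.5) for g in dr]
```

With k = 1 every output is either the positive or the negative maximum magnitude, which shows that the quoted function is a uniform quantizer over a rescaled range.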

Ni is analogous art to Li, Mao, and Saldana because it is in the same field of invention, neural network models. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to decompose weight tensor data into a set of orthogonal vectors, which are then ternarized, as taught by Ni. The combination would have been obvious because a person of ordinary skill in the art would know that the most reliable and predictable way to organize the activity of transforming data to ternary form would involve first decomposing the weight tensor into a set of orthogonal vectors and then ternarizing this set of orthogonal vectors.

Regarding claim 7, Li/Mao/Saldana teaches the computing device as in claim 1. However, Li/Mao/Saldana does not explicitly teach a ternarization unit to ternarize a first set of weights having a first data distribution into a first set of ternarized weights having a first scale factor and ternarize a second set of weights having a second data distribution into a second set of ternarized weights having a second scale factor different from the first scale factor.

Ni addresses how to use scale factors on sets of ternarized weights. Ni teaches a ternarization unit to ternarize a first set of weights having a first data distribution into a first set of ternarized weights having a first scale factor [Ni recites separating weight data into forward and back-propagated gradients. Page 3 Section 2.3. Ni recites the scaling factor $E_F(|r_i|)$ as the mean of absolute value of each output channel of weights. Ni discusses a scaling factor that increases the value range of weights, while still being able to exploit bit convolution kernels. Page 4 Paragraph 1] and ternarize a second set of weights having a second data distribution into a second set of ternarized weights having a second scale factor different from the first scale factor [Ni recites separating weight data into forward and back-propagated gradients. Page 3 Section 2.3. Ni discusses the need for a separate scale factor for back-propagated gradients. Ni recites that the channel-wise scaling factors will make it impossible to exploit bit convolution kernels when computing the convolution between gradients and the weights during back propagation. Thus, a constant scalar is used for back propagation. Page 4 Paragraph 1].
Li, Mao, Saldana, and Ni are analogous art because they are in the same field of invention, neural networks. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use a ternarization unit that ternarizes two sets of weights using two differing scaling factors as taught by Ni. The combination would have been obvious because a person of ordinary skill in the art would know that the most reliable and ordinary way for a ternarization unit to ternarize weight tensor data would be to use two or more scaling factors on the weight tensor data distributions.
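For illustration only, a scale factor computed as the mean absolute value of each output channel of weights, as attributed to Ni in the claim 7 discussion, can be sketched as follows. The function names and the list-of-lists layout are hypothetical. The sketch shows only that two weight sets with different distributions receive different scale factors; it does not reproduce Ni's treatment of back-propagated gradients.

```python
def channel_scale(channel):
    """Mean absolute value of one output channel of weights."""
    return sum(abs(w) for w in channel) / len(channel) if channel else 0.0


def per_channel_scales(channels):
    """One scale factor per channel; differing distributions give differing scales."""
    return [channel_scale(ch) for ch in channels]
```

For example, `per_channel_scales([[0.5, -0.5], [2.0, -1.0, 3.0]])` returns `[0.5, 2.0]`, a first and a second scale factor that differ because the two sets of weights have different magnitudes.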
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN W FIGUEROA whose telephone number is (571)272-4623. The examiner can normally be reached Monday-Friday, 10AM-6PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

KEVIN W FIGUEROA
Primary Patent Examiner
Art Unit 2124



/Kevin W Figueroa/Primary Examiner, Art Unit 2124