Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: Claim 1 requires among other things: 
A neural inference chip comprising a neural core, the neural core comprising: a vector-matrix multiplier adapted to … and compute a partial sum vector by multiplying the input activation vector by the weight matrix, the partial sum vector having a partial sum vector precision; a vector processor adapted to receive one or more partial sum vector from one or more vector source, the one or more vector source including the vector-matrix multiplier, and perform one or more vector function on the one or more partial sum vector to yield a vector processor output vector, the vector processor output vector having a precision equal to the partial sum vector precision; and an activation unit operatively coupled to the vector processor and adapted to apply an activation function to the vector processor output vector, yielding an output activation vector having an output activation precision, wherein the vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at variable precision.

Claim 19 requires among other things: 
A method comprising: receiving a weight matrix having a first precision; receiving an activation vector having the first precision; computing a vector-matrix multiplication of the weight matrix and the activation vector, yielding a partial sum vector having a second precision; performing one or more vector functions on the partial sum vector to yield a vector processor output vector…; and applying an activation function to the vector processor output vector, yielding an output activation vector having a third precision, wherein at least one of the first, second, and third precision is varied at runtime.
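The method steps recited above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `saturate` and `run_layer` are hypothetical, precisions are modeled as signed integer bit widths, the vector function is assumed to be a bias addition, and the activation function is assumed to be a ReLU followed by saturation. None of these choices is taken from the claims or the cited references.

```python
def saturate(value, bits):
    """Clamp an integer to the signed range representable in `bits` bits."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def run_layer(weights, activations, bias, first_bits, second_bits, third_bits):
    """Hypothetical sketch: the three precisions are runtime arguments.

    weights     -- n rows by m columns, held at first_bits
    activations -- length-n vector, held at first_bits
    bias        -- length-m vector (assumed vector function operand)
    """
    w = [[saturate(x, first_bits) for x in row] for row in weights]
    a = [saturate(x, first_bits) for x in activations]
    n, m = len(a), len(w[0])

    # Vector-matrix multiplication; partial sums held at the second precision.
    partial = [
        saturate(sum(a[i] * w[i][j] for i in range(n)), second_bits)
        for j in range(m)
    ]

    # Vector function (assumed: bias add); output stays at the second precision.
    vec_out = [saturate(p + b, second_bits) for p, b in zip(partial, bias)]

    # Activation (assumed: ReLU), saturated to the third precision.
    return [saturate(max(0, v), third_bits) for v in vec_out]


weights = [[1, 2], [3, -4]]
activations = [2, 3]
bias = [0, 0]

# Same data, two different output precisions chosen at call time.
print(run_layer(weights, activations, bias, 4, 8, 4))  # [7, 0]
print(run_layer(weights, activations, bias, 4, 8, 2))  # [1, 0]
```

Passing the bit widths as arguments is one simple way to model a precision that is "varied at runtime"; a hardware design would more likely expose this as a configuration register or per-layer instruction field.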


The closest prior art includes Appuswamy (patent application publication No. 2019/0303749), Xie (patent application publication No. 2018/0046901), Bourges-Sevenier (patent application publication No. 2019/0325314), and Sekiyama (patent application publication No. 2019/0138888).
 
Appuswamy taught a neural inference chip comprising a neural core (e.g., see fig. 1 and paragraphs 0029, 0032, 0042), the neural core comprising: a vector-matrix multiplier (504) adapted to receive a weight matrix (n x m Weight Matrix in fig. 5) having a weight matrix precision (e.g., see paragraph 0049), receive an input activation vector (n activations in fig. 5) having an input activation vector precision (e.g., see paragraphs 0049 and 0051), and compute a partial sum vector (the m Parallel adders in fig. 4 compute the M Partial Sum Vector sent to the M Partial Sum Vect. Register in fig. 4) by multiplying the input activation vector by the weight matrix (e.g., see figs. 4, 5), the partial sum vector having a partial sum vector precision (e.g., see paragraphs 0049 and 0051); a vector processor adapted to receive one or more partial sum vector from one or more vector source (the outputs of the M Partial sum registers/vector registers in figs. 4, 11, 15 are fed back to the parallel Adders, which process the vectors and therefore provide the vector processor), the one or more vector source including the vector-matrix multiplier (e.g., see figs. 4, 11, 15) [the output of multiplier 1502 is sent to parallel adders 1504], and perform one or more vector function on the one or more partial sum vector to yield a vector processor output vector (see step 2003 in fig. 20 and paragraph 0110), the vector processor output vector having a precision equal to the partial sum vector precision (e.g., 10-bits) (e.g., see paragraphs 0049 and 0053); and an activation unit operatively coupled to the vector processor and adapted to apply an activation function to the vector processor output vector (e.g., see step 2004 of fig. 20 and paragraph 0110), yielding an output activation vector having an output activation precision, wherein the vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at fixed precision(s) (e.g., see paragraphs 0053-0054).

Appuswamy taught at least one network (Network on chip, NOC 102) interconnecting the neural core with at least one additional neural core (e.g., see fig. 1), the at least one network adapted to deliver synaptic weights and/or input activations to the neural cores at a precision (e.g., see figs. 1, 2, 3 and paragraphs 0036 and 0039).
Appuswamy taught wherein the partial sum vector precision is higher than the weight matrix precision and/or the activation vector precision (e.g., see fig. 20 and paragraphs 0110 and 0053-0054). [Note: Appuswamy taught that the output activation function can reduce the precision of the vectors; therefore the vector output as the partial sum in fig. 20 is taught as being reduced in precision by the activation function (step 2004 in fig. 20)].
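As general background on why a partial sum is typically held at a higher precision than its operands, the worst-case bit growth of a dot product can be worked out directly. The sketch below is generic signed fixed-point arithmetic, not a computation taken from Appuswamy or from the application; the function names and the example widths (4-bit operands, 256 terms) are assumptions chosen for illustration.

```python
def partial_sum_bits(weight_bits, activation_bits, n_terms):
    """Signed bit width sufficient to hold a sum of n_terms products.

    Each product of a weight_bits value and an activation_bits value fits in
    weight_bits + activation_bits bits; summing n_terms of them adds at most
    ceil(log2(n_terms)) further bits.
    """
    growth = (n_terms - 1).bit_length()  # ceil(log2(n_terms)) for n_terms >= 1
    return weight_bits + activation_bits + growth


def worst_case_sum(weight_bits, activation_bits, n_terms):
    """Largest magnitude the sum can reach with signed operands."""
    return n_terms * (1 << (weight_bits - 1)) * (1 << (activation_bits - 1))


bits = partial_sum_bits(4, 4, 256)
print(bits)                                            # 16
print(worst_case_sum(4, 4, 256) <= (1 << (bits - 1)) - 1)  # True
```

The bound is conservative: a design may use fewer bits if the operand ranges or the number of accumulated terms are known to be smaller.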

Xie, however, taught wherein the neural core further comprises: at least one memory, the at least one memory being adapted to store weight matrices, input activation vectors, and/or output activation vectors (e.g., see paragraphs 0083-0085 and paragraphs 0091-0093).

Xie taught wherein the weight matrix precision is equal to the activation vector precision (e.g., see paragraph 0095).

Appuswamy taught wherein the partial sum vector precision is not equal to the output activation precision (e.g., see fig. 20 and paragraphs 0110 and 0053-0054). [Note: the output activation function can reduce the precision of the vectors; therefore the vector output as the partial sum in fig. 20 is taught as being reduced in precision by the activation function (step 2004 in fig. 20)].

Xie taught wherein the output activation precision is equal to the weight matrix precision (e.g., see paragraphs 00881-0084 and 0093-0095). [Note: the source and destination register files exchange roles in a next layer (paragraph 0084), which provides the input weight precision being equal to the output precision of the previous layer, since in paragraph 0095 the input activation vector is the same precision as the weight matrix].

Xie taught varying at least one of the first, second, and third precision for computation of each layer of a neural network (e.g., see paragraphs 0047-0048 and paragraphs 0082-0087).
Bourges-Sevenier taught wherein the activation function is adapted to re-range the vector processor output vector (e.g., see fig. 7 and paragraphs 0068-0069). [The output vector is quantized and the buffer size is changed, which re-ranges the output vector. Note that the reduction in the range of bits used to store the vector, and in the number or range of locations used to address or store the output vector, provides a re-range operation].
Sekiyama taught wherein applying the activation function comprises applying a saturating function (e.g., see paragraph 0028).
Sekiyama taught wherein the saturating function has at least one bound corresponding to the output activation precision (e.g., see paragraph 0028). [Note: applying the activation function using a saturating function, taught as including a tangent function, provides at least one bound corresponding to the activation precision].

Appuswamy taught wherein the precision is 2-bit or 4-bit or 32-bit (e.g., see paragraph 0049). Lee taught single bit and multiple bit modes (e.g., see section IV, Lookup table bit-serial processing element, subsection C, Two modes of LBPE, on page 179).

However, the closest prior art did not disclose, among other things, the limitations of claims 1 and 19 as shown above.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ERIC COLEMAN
Primary Examiner
Art Unit 2183



EC
/ERIC COLEMAN/Primary Examiner, Art Unit 2183