DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 2, 17, and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Appuswamy (patent application publication No. 2019/0303749).

As to claim 1, Appuswamy taught the invention as claimed, including a neural inference chip comprising a neural core (e.g., see fig. 1 and paragraphs 0029, 0032, and 0042), the neural core comprising: a vector-matrix multiplier (504) adapted to receive a weight matrix (n x m Weight Matrix in fig. 5) having a weight matrix precision (e.g., see paragraph 0049), receive an input activation vector (n activations in fig. 5) having an input activation vector precision (e.g., see paragraphs 0049 and 0051), and compute a partial sum vector (the m Parallel Adders in fig. 4 compute the M Partial Sum Vector sent to the M Partial Sum Vect. Register in fig. 4) by multiplying the input activation vector by the weight matrix (e.g., see figs. 4 and 5), the partial sum vector having a partial sum vector precision (e.g., see paragraphs 0049 and 0051); a vector processor adapted to receive one or more partial sum vector from one or more vector source (the outputs of the M Partial Sum registers/vector registers in figs. 4, 11, and 15 are fed back to the parallel adders, which process the vectors and therefore provide the vector processor), the one or more vector source including the vector-matrix multiplier (e.g., see figs. 4, 11, and 15) [the output of multiplier 1502 is sent to parallel adders 1504], and perform one or more vector function on the one or more partial sum vector to yield a vector processor output vector (see step 2003 in fig. 20 and paragraph 0110), the vector processor output vector having a precision equal to the partial sum vector precision (e.g., 10 bits) (e.g., see paragraphs 0049 and 0053); and an activation unit operatively coupled to the vector processor and adapted to apply an activation function to the vector processor output vector (e.g., see step 2004 of fig. 20 and paragraph 0110), yielding an output activation vector having an output activation precision, wherein the vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at variable precision (e.g., see paragraphs 0053-0054).
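For orientation only, the data flow recited in claim 1 (vector-matrix multiplier, then vector processor, then activation unit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Appuswamy or of the application: the function names, the choice of element-wise addition as the vector function, and the choice of a clamped ReLU as the activation function are all hypothetical.

```python
def vector_matrix_multiply(activations, weights):
    """Multiply a length-n activation vector by an n x m weight matrix,
    returning a length-m partial sum vector."""
    n = len(weights)
    m = len(weights[0])
    if len(activations) != n:
        raise ValueError("activation length must match weight matrix rows")
    return [sum(activations[i] * weights[i][j] for i in range(n))
            for j in range(m)]


def vector_add(vectors):
    """Hypothetical vector function: element-wise sum of partial sum vectors
    received from one or more sources."""
    return [sum(column) for column in zip(*vectors)]


def activation_unit(vector, out_bits):
    """Hypothetical activation: ReLU, then clamp to the unsigned range
    representable in out_bits (the assumed output activation precision)."""
    upper = (1 << out_bits) - 1
    return [min(max(x, 0), upper) for x in vector]


# Assumed example values: 3 activations, a 3 x 2 weight matrix, and a second
# partial sum vector arriving from another (hypothetical) source.
partial = vector_matrix_multiply([1, 2, 3], [[1, -1], [2, 0], [3, 1]])
combined = vector_add([partial, [5, -10]])
output = activation_unit(combined, out_bits=4)
```

In this sketch the partial sums are kept at full integer width through the vector function, and only the activation step reduces them to the assumed 4-bit output range.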
	Due to the similarities between claims 1 and 19, claim 19 is rejected for the same reasons as claim 1.
As to claim 2, Appuswamy taught the neural inference chip of claim 1, further comprising: at least one network (Network on chip, NOC 102) interconnecting the neural core with at least one additional neural core (e.g., see fig. 1), the at least one network adapted to deliver synaptic weights and/or input activations to the neural cores at variable precision (e.g., see figs. 1, 2, 3 and paragraphs 0036-0036 and 0039). [Note the function in step 2004 of fig. 20 and paragraph 0053 changes the precision of the output vector of a core; therefore the precision of the weights and/or activations was variable. The partial sums from one core are sent to neighboring cores (e.g., see paragraph 0039), where the output of a core includes partial sum(s) that provide weights or input activations (e.g., see figs. 17, 18, 19).]
As to claim 17, Appuswamy taught the neural inference chip of claim 1, wherein the partial sum vector precision is higher than the weight matrix precision and/or the activation vector precision (e.g., see fig. 20 and paragraphs 0110 and 0053-0054). [Note Appuswamy taught that the output activation function can reduce the precision of the vectors; therefore the vector output as the partial sum in fig. 20 is taught as being reduced in precision by the activation function (step 2004 in fig. 20).]
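As background to the precision relationship recited in claim 17, the arithmetic below shows why a sum of products generally needs more bits than its operands. It is a sketch assuming signed two's-complement operands; the helper names are hypothetical and the bit widths are example values, not figures taken from Appuswamy.

```python
def signed_bits_needed(value):
    """Smallest two's-complement width that can represent value."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def worst_case_partial_sum_bits(weight_bits, activation_bits, n):
    """Width needed for the largest-magnitude sum of n products of signed
    weight_bits x activation_bits operands."""
    w_min = -(1 << (weight_bits - 1))      # most negative weight
    a_min = -(1 << (activation_bits - 1))  # most negative activation
    worst = n * w_min * a_min              # largest possible magnitude
    return signed_bits_needed(worst)
```

Under these assumptions, accumulating four products of 4-bit weights and 4-bit activations can require up to `worst_case_partial_sum_bits(4, 4, 4)` bits, which is larger than either operand width.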
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 4, 14-16, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Appuswamy as applied to claim 1 above, and further in view of Xie (patent application publication No. 2018/0046901).
As to claim 4, Appuswamy taught the neural inference chip of claim 1, but did not expressly detail wherein the neural core further comprises: at least one memory, the at least one memory being adapted to store weight matrices, input activation vectors, and/or output activation vectors at variable precision. Xie however taught this limitation (e.g., see paragraphs 0083-0085 and paragraphs 0091-0093).
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Appuswamy and Xie. Both references were directed to hardware neural network accelerators that use variable-length data, including reducing the size of data. One of ordinary skill would have been motivated to incorporate the Xie teachings of providing SRAM memory for storing the differing lengths of data, including activation vectors, at least to enable the system to quickly access the data for processing in a timely manner to increase throughput, versus having to retrieve the data from an external source.
As to claim 14, Appuswamy taught the neural inference chip of claim 1, and Xie taught wherein the variable precision is selectable for each layer of a neural network (e.g., see paragraphs 0047-0048 and paragraphs 0082-0087). [Note that when the source and destination register files exchange their roles for a next layer, the variable precision is selectable for each layer, since the precision of the previous layer is selected for the next layer when the files are exchanged; also, the hidden layer activation calculation provides that the precision of the hidden layers is selected.]

As to claim 15, Appuswamy taught the neural inference chip of claim 1, and Xie taught wherein the weight matrix precision is equal to the activation vector precision (e.g., see paragraph 0095).

As to claim 16, Appuswamy and Xie taught the neural inference chip of claim 15, and Appuswamy taught wherein the partial sum vector precision is not equal to the output activation precision (e.g., see fig. 20 and paragraphs 0110 and 0053-0054). [Note the output activation function can reduce the precision of the vectors; therefore the vector output as the partial sum in fig. 20 is taught as being reduced in precision by the activation function (step 2004 in fig. 20).]

As to claim 18, Appuswamy and Xie taught the neural inference chip of claim 15, and Xie taught wherein the output activation precision is equal to the weight matrix precision (e.g., see paragraphs 0081-0084 and 0093-0095). [Note the source and destination register files exchange roles in a next layer in paragraph 0084, which provides the input weight precision being equal to the output precision of the previous layer, as in paragraph 0095 the input activation vector is the same precision as the weight matrix.]
As to claim 20, Appuswamy and Xie taught the method of claim 19, and Xie taught further comprising: varying at least one of the first, second, and third precision for computation of each layer of a neural network (e.g., see paragraphs 0047-0048 and paragraphs 0082-0087). [Note that when the source and destination register files exchange their roles for a next layer, the variable precision is selectable for each layer, since the precision of the previous layer is selected for the next layer when the files are exchanged; also, the hidden layer activation calculation provides that the precision of the hidden layers is selected.]

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Appuswamy as applied to claim 1 above, and further in view of Bourges-Sevenier (patent application publication No. 2019/0325314).

As to claim 8, Appuswamy taught the neural inference chip of claim 1, but did not expressly detail wherein the activation function is adapted to re-range the vector processor output vector. Bourges-Sevenier however taught this limitation (e.g., see fig. 7 and paragraphs 0068-0069). [The output vector is quantized and the buffer size is changed, which re-ranges the output vector. Note that reducing the range of bits used to store the vector, and the number or range of locations used to address or store the output vector, provides a re-range operation.]
It would have been obvious to one of ordinary skill in the art to combine the teachings of Appuswamy and Bourges-Sevenier. Both references were directed to hardware neural network accelerators that use variable-length data, including reducing the size of data. One of ordinary skill would have been motivated to incorporate the Bourges-Sevenier teachings of quantizing the output activation vector at least to reduce the amount of memory required to store the data, which would reduce system cost.
	
Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Appuswamy as applied to claim 8 above, and further in view of Sekiyama (patent application publication No. 2019/0138888).
As to claim 9, Appuswamy taught the neural inference chip of claim 8, but did not expressly detail wherein applying the activation function comprises applying a saturating function. Sekiyama however taught this limitation (e.g., see paragraph 0028).

It would have been obvious to one of ordinary skill in the art to combine the teachings of Appuswamy and Sekiyama. Both references were directed to hardware neural network accelerators. One of ordinary skill would have been motivated to incorporate the Sekiyama teachings of applying activation using saturation at least to increase the non-linear properties of the decision function without affecting the receptive field of the convolutional layer (e.g., see paragraph 0028). This operation provides mapping of negative activation maps to zero. Therefore the saturation operation enables the system to operate more efficiently when processing non-linear function(s).

As to claim 10, Appuswamy and Sekiyama taught the neural inference chip of claim 9, and Sekiyama taught wherein the saturating function has at least one bound corresponding to the output activation precision (e.g., see paragraph 0028). [Note that applying the activation function using a saturating function, as taught, including a tangent function, provides at least one bound corresponding to the output activation precision.]
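To make the limitation of claims 9 and 10 concrete, the sketch below shows one way a saturating function can have bounds that correspond to an output precision. It is an illustration only, assuming integer outputs in two's-complement or unsigned form; the function names are hypothetical and nothing here is drawn from Sekiyama's disclosure.

```python
def saturation_bounds(out_bits, signed=True):
    """Lower and upper bounds representable at out_bits of precision."""
    if signed:
        return -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return 0, (1 << out_bits) - 1


def saturate(vector, out_bits, signed=True):
    """Clamp each element to the range implied by the output precision."""
    lower, upper = saturation_bounds(out_bits, signed)
    return [min(max(x, lower), upper) for x in vector]
```

For example, with an assumed 8-bit signed output, `saturate([300, -300, 5], 8)` clamps the first two elements to the ends of the representable range and leaves the third unchanged.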

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Appuswamy as applied to claim 8 above, and further in view of Lee, Jinmook, et al. (IEEE paper entitled UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision).

As to claim 12, Appuswamy taught the neural inference chip of claim 1, wherein the precision is selected from 2-bit, 4-bit, or 32-bit (e.g., see paragraph 0049), but did not expressly detail wherein the variable precision is selected from 8-bit and 16-bit. Lee however taught the selection of fully variable precision from 1 bit to 16 bit (e.g., see section IV, lookup table bit-serial processing element, subsection C, Two modes of LBPE, on page 179). Therefore the combination of Appuswamy and the two modes of Lee provides the variable precision selected from 2-bit, 4-bit, 8-bit, 16-bit, or 32-bit.
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Appuswamy and Lee. Both references were directed toward the problems of operating a neural network using variable precision. One of ordinary skill would have been motivated to incorporate the Lee teachings of variable precision, including 1 bit to 16 bit, at least to enable the system to efficiently operate on a wider range of inputs and outputs, reducing the requirement to alter the inputs or outputs for processing and storing, which reduces the hardware necessary for data size conversion and thereby reduces system cost.
As to claim 13, Appuswamy taught the neural inference chip of claim 1, and Lee taught wherein the variable precision is selectable at runtime (e.g., see section IV, lookup table bit-serial processing element, subsection C, Two modes of LBPE, on page 179). [Note that because the variable precisions are modes of the processor, the change in precision is selectable at runtime.]

Allowable Subject Matter

Claims 3, 5-7, and 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: the closest prior art includes Appuswamy, Xie, Sekiyama, and Lee, which taught the limitations of claims 1-2, 4, 8-10, and 12-20 as detailed above. However, the limitations of claims 3, 5-7, and 11 (among other things in the claims and the claims from which each respectively depends) are not disclosed in the closest prior art, as follows:
3. The neural inference chip of claim 2, wherein the at least one network is further adapted to vary the weight matrix precision and dimension, input activation vector precision and dimension, and/or the output activation vector precision and dimension while maintaining constant bandwidth.

5. The neural inference chip of claim 4, wherein the at least one memory is further adapted to vary the weight matrix precision and dimension, input activation vector precision and dimension, and/or the output activation vector precision and dimension while maintaining constant storage utilization.
6. The neural inference chip of claim 1, wherein the vector-matrix multiplier is further adapted to vary the weight matrix precision and dimension and/or the input activation vector precision and dimension while maintaining constant bandwidth.
7. The neural inference chip of claim 6, wherein the vector-matrix multiplier is further adapted to compute a variable number of multiplications per cycle at variable precision, wherein the variable number of multiplications per cycle and variable precision are inversely proportional.
11. The neural inference chip of claim 8, wherein applying the activation function comprises truncating one or more least significant bits.
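Two of the limitations listed above lend themselves to a short arithmetic illustration: the inverse proportionality between multiplications per cycle and precision (claim 7) and truncation of least significant bits (claim 11). The sketch below assumes a fixed-width datapath shared among equal-width operands; the datapath width, function names, and example values are hypothetical and are not taken from the application or the cited art.

```python
def multiplications_per_cycle(datapath_bits, precision_bits):
    """Number of operands of precision_bits that fit in a fixed datapath,
    so that count * precision stays constant as precision varies."""
    if precision_bits <= 0 or datapath_bits % precision_bits != 0:
        raise ValueError("precision must evenly divide the datapath width")
    return datapath_bits // precision_bits


def truncate_lsbs(value, drop_bits):
    """Discard the drop_bits least significant bits with an arithmetic
    right shift (rounds toward negative infinity for negative values)."""
    return value >> drop_bits
```

With an assumed 64-bit datapath, halving the precision doubles the count returned by `multiplications_per_cycle`, and `truncate_lsbs(0b10110110, 4)` keeps only the upper four bits of the 8-bit value.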
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Goyal (patent application publication No. 2017/0316312) disclosed systems for Deep Learning processor (e.g., see abstract).
	Mellempudi (patent application publication No. 2018/0322607) disclosed dynamic precision management for integer Deep learning primitives (e.g., see abstract).
	Young (patent application publication No. 2018/0165574) disclosed performing average pooling in hardware (e.g., see abstract).
	Gupta (patent application publication No. 2006/0217084) disclosed method to compute one shot frequency estimate (e.g., see abstract and fig. 5,7).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ERIC COLEMAN
Primary Examiner
Art Unit 2183



EC
/ERIC COLEMAN/Primary Examiner, Art Unit 2183