DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 07/12/2018 and 06/26/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Specification
The disclosure is objected to because of the following informalities:
In paragraph 0018 line 11, “weighed sum” should read “weighted sum”.
Appropriate correction is required.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

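Claims 11-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.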
Claims 11 and 12 each recite the limitation “The system of claim 2” in line 1. There is insufficient antecedent basis for this limitation in the claims. Claims 11 and 12 depend on claim 2, but claim 2 never mentions any system. Claim 2 recites the neural inference chip of claim 1, not a system of claim 1. For examination purposes, the examiner has interpreted “The system of claim 2” to be the neural inference chip of claim 2.
Claim 13 recites the limitation “The system of claim 1” in line 1. There is insufficient antecedent basis for this limitation in the claim. Claim 13 depends on claim 1, but claim 1 never mentions any system. For examination purposes, the examiner has interpreted “The system of claim 1” to be the neural inference chip of claim 1.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.




Claims 1-14 and 16-25 are rejected under 35 U.S.C. 102(a)(1), (a)(2) as being anticipated by Dally (US 20180046906).
Regarding Claim 1, Dally teaches:
	A neural inference chip, comprising: a plurality of neural cores (Paragraph 0038; The SCNN 200 includes a memory interface 205, layer sequencer 215, and an array of processing elements (PEs) 210. Figure 2A);
	each of the plurality of neural cores comprising a plurality of vector compute units configured to operate in parallel (Paragraph 0060; To exploit the parallelism of many multipliers within a PE 220, a vector of F filter-weights may be fetched from the weight buffer 230 and a vector of I inputs may be fetched from the input activations buffer 235. The vectors are delivered to an array of F×I multipliers 240 to compute a full Cartesian product of output partial sums. Figure 2C);
wherein: each of the plurality of neural cores is configured to compute in parallel output activations by applying its plurality of vector compute units to input activations (Paragraph 0060; a vector of F filter-weights may be fetched from the weight buffer 230 and a vector of I inputs may be fetched from the input activations buffer 235. The vectors are delivered to an array of F×I multipliers 240 to compute a full Cartesian product of output partial sums. Paragraph 0061; The multiplier outputs (e.g., products) are sent to the accumulation unit 245, which updates the partial sums stored in the accumulation buffer 250. Each product is accumulated with a partial sum at the output coordinates in the output activation space that matches (i.e., equals) a position associated with the product.);
each of the plurality of neural cores is assigned a subset of output activations of a layer of a neural network for computation (Paragraph 0063; each PE 210 operates on an exclusive subset of the input and output activation space. In other words, there is no duplication of input activations or output activations between the PEs 210. Paragraph 0065; At a CNN layer boundary, the output activations of the previous layer become the input activations of the next layer.).

Regarding Claim 2, Dally teaches:
	The neural inference chip of claim 1, wherein: upon receipt of a subset of input activations of the layer of the neural network, each of the plurality of neural cores computes a partial sum for each of its assigned output activations (Paragraph 0063; each PE 210 operates on an exclusive subset of the input and output activation space. In other words, there is no duplication of input activations or output activations between the PEs 210.);
	and computes its assigned output activations from at least the computed partial sums (Paragraph 0061; Each product is accumulated with a partial sum at the output coordinates in the output activation space that matches (i.e., equals) a position associated with the product.).

Regarding Claim 3, Dally teaches:
(Paragraph 0064; The halos now contain incomplete partial sums that must be communicated to neighbor PEs 210 for accumulation. Paragraph 0076; exchange partial sums with neighboring PEs 210 for the halo regions at the boundary of the PE's 210 output activations).

Regarding Claim 4, Dally teaches:
	The neural inference chip of claim 1, wherein the vector compute units comprise multiplication and addition units (Paragraph 0061; The number of adders in the adder unit 255 does not necessarily equal the number of multipliers in the F×I multiplier array 240. However, the accumulation unit 245 must employ at least F×I adders in the adder unit 255 to match the throughput of the F×I multiplier array 240.).

Regarding Claim 5, Dally teaches:
	The neural inference chip of claim 1, wherein the vector compute units comprise accumulation units (Paragraph 0061; The number of adders in the adder unit 255 does not necessarily equal the number of multipliers in the F×I multiplier array 240. However, the accumulation unit 245 must employ at least F×I adders in the adder unit 255 to match the throughput of the F×I multiplier array 240.).


Regarding Claim 6, Dally teaches:
	The neural inference chip of claim 2, wherein the plurality of neural cores perform said partial sum computation in parallel (Paragraph 0042; Products generated by the multipliers within each PE 210 are accumulated to produce intermediate values (e.g., partial sums). Paragraph 0104; To increase parallelism beyond a single PE 210, multiple PEs 210 can be operated in parallel with each working on a disjoint three-dimensional tile of input activations.).

Regarding Claim 7, Dally teaches:
	The neural inference chip of claim 2, wherein the plurality of neural cores perform said output activation computation in parallel (Paragraph 0108; At step 410, the non-zero elements are processed in parallel by the PE 210 to produce a plurality of result values.).

Regarding Claim 8, Dally teaches:
	The neural inference chip of claim 2, wherein computing the partial sum comprises applying at least one of the plurality of vector compute units to multiply the input activations and synaptic weights (Paragraph 0044; Importantly, only non-zero weights and input activations are transmitted to the multiplier array within each PE 210. Additionally, the input activation vectors may be reused within each PE 210 in an input stationary fashion against a number of weight vectors to reduce data accesses. The products generated by the multipliers are then summed together to generate the partial sums and the output activations).

Regarding Claim 9, Dally teaches:
	The neural inference chip of claim 2, wherein computing the assigned output activations comprises applying a plurality of addition units (Paragraph 0061; the accumulation unit 245 must employ at least F×I adders in the adder unit 255 to match the throughput of the F×I multiplier array 240.).

Regarding Claim 10, Dally teaches:
	The neural inference chip of claim 2, wherein computing output activations comprises applying a nonlinear function (Paragraph 0076; When the output-channel group is complete, the post-processing unit 345 performs the following tasks: (1) exchange partial sums with neighboring PEs 210 for the halo regions at the boundary of the PE's 210 output activations, (2) apply the non-linear activation (e.g. ReLU).).

Regarding Claim 11, Dally teaches:
	The system of claim 2, wherein the vector compute units are configured to: perform a plurality of multiply operations in parallel; perform a plurality of additions in parallel; and accumulating the partial sum (Paragraph 0066; Parallelism within a PE 210 is accomplished by processing a vector of F non-zero filter weights a vector of I non-zero input activations in within the F×I multiplier array 325.).

Regarding Claim 12, Dally teaches:
(Paragraph 0060; To exploit the parallelism of many multipliers within a PE 220, a vector of F filter-weights may be fetched from the weight buffer 230 and a vector of I inputs may be fetched from the input activations buffer 235. The vectors are delivered to an array of F×I multipliers 240 to compute a full Cartesian product of output partial sums.).

Regarding Claim 13, Dally teaches:
	The system of claim 1, wherein said computation by each of the plurality of neural cores is pipelined (Paragraph 0079; The different logic blocks within the PE 210 may be pipelined as needed to achieve a target clock rate).

Regarding Claim 14, Dally teaches:
	The system of claim 13, wherein each of the plurality of neural cores is configured to concurrently perform each stage of said computation (Paragraph 0079; The different logic blocks within the PE 210 may be pipelined as needed to achieve a target clock rate. However, the pipeline registers between pipeline stages need to freeze if the logic block receiving data output by the pipeline registers is stalled. Alternatively, elastic buffers can be used between the pipeline stages to simplify the distribution of a ready signal that indicates data can be accepted.).

Regarding Claim 16, it is substantially similar to Claim 1 and Claim 2 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 17, it is substantially similar to Claim 3 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 18, it is substantially similar to Claim 4 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 19, it is substantially similar to Claim 5 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 20, it is substantially similar to Claim 6 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 21, it is substantially similar to Claim 7 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 22, it is substantially similar to Claim 8 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 23, it is substantially similar to Claim 9 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 24, it is substantially similar to Claim 10 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.



Regarding Claim 25, it is substantially similar to Claim 11 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Dally (US 20180046906) in view of Henry (US 20180189651).

Regarding claim 15, Dally teaches the system of claim 14, but Dally does not explicitly teach wherein said computation maintains parallelism.
	However, Henry teaches:
(Paragraph 0127; In one embodiment, the NPU 126 is pipelined. Paragraph 0131; For each instruction of the program, all of the NPUs 126 perform the instruction in parallel.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Dally and the method of Henry in order to efficiently compute the neuron output values for a layer of the network (Paragraph 0506).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHAT MINH DANG whose telephone number is (571)272-8665. The examiner can normally be reached Monday - Friday 7:30am - 5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-





/P.M.D./Examiner, Art Unit 2121



/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121