DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over Henry (European patent application publication EP 3330899 A1) (submitted by Applicant).
Henry taught the invention substantially as claimed, including (as to claim 1) a neural engine circuit (neural network unit 121) (e.g., see fig. 1 and paragraph 0006) comprising: a plurality of multiply-add circuits (wide ALU 204A, narrow ALU 204B) (e.g., see fig. 18) configured to perform multiply-add operations of a three dimensional (3D) convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle (e.g., see fig. 18, fig. 58; paragraphs 0403-0404); and an accumulator circuit coupled to the plurality of multiply-add circuits, the accumulator circuit comprising multiple batches of accumulators (accumulators 202A, 202B) (e.g., see fig. 18 and paragraphs 0420-0421), each of the batches of accumulators configured to, after the processing cycle, receive the portion of the output data for each output depth plane of a plurality of output depth planes, the output depth plane comprising the portion of the output data for an output channel having an output width and an output height (e.g., see paragraphs 0404, 0420-0421, and 0119-0122). [Note: as the claim is understood, the accumulator(s) may perform accumulation multiple times after a first accumulation in a first cycle; the accumulators of Henry subsequently accumulating the data of each plane one after another therefore meets the limitations of the claim.]
Henry did not expressly detail that the accumulators stored the output data; however, Henry taught (e.g., see paragraphs 0121-0122, 0382, 0418) loading a result into an accumulator and then outputting the data for combining with other result data in a later cycle. This implicitly requires storage of the previous result(s). One of ordinary skill would have been motivated to store the result data in the accumulator at least to ensure that the data was not lost and therefore would be available for combination/accumulation with results from succeeding cycles of multiplication of input data. Storing the data would simplify the circuitry for properly timed accumulation of multiplication results that were generated repeatedly in a loop, and this would reduce system cost.
Due to the similarities between claims 1 and 10, claim 10 is rejected for the same reasons as claim 1 above.
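For illustration only, and not as a characterization of Henry's circuitry or of Applicant's disclosure, the storage rationale above may be sketched in software. The sketch assumes a single output channel, one hypothetical "processing cycle" per input channel, and one batch of accumulators per output depth plane; the function name conv3d_with_accumulators and the data layout are assumptions introduced here. It shows that each batch must retain its prior partial sum so that the result of a later cycle can be combined with it.

```python
def conv3d_with_accumulators(inp, kernel):
    """Hypothetical sketch: inp[c][d][h][w] and kernel[c][kd][kh][kw]
    produce out[od][oh][ow] for a single output channel."""
    C = len(inp)
    D, H, W = len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    KD, KH, KW = len(kernel[0]), len(kernel[0][0]), len(kernel[0][0][0])
    OD, OH, OW = D - KD + 1, H - KH + 1, W - KW + 1

    # One batch of accumulators per output depth plane (assumed layout).
    batches = [[[0] * OW for _ in range(OH)] for _ in range(OD)]

    for c in range(C):  # one assumed processing cycle per input channel
        for od in range(OD):
            for oh in range(OH):
                for ow in range(OW):
                    # Multiply-add stage: partial sum for this cycle only.
                    partial = 0
                    for kd in range(KD):
                        for kh in range(KH):
                            for kw in range(KW):
                                partial += (inp[c][od + kd][oh + kh][ow + kw]
                                            * kernel[c][kd][kh][kw])
                    # Accumulator stage: the stored value from earlier
                    # cycles is combined with the new partial sum.
                    batches[od][oh][ow] += partial
    return batches


# Two input channels, two input depth planes of 2x2, kernel depth 1.
ones_in = [[[[1, 1], [1, 1]], [[1, 1], [1, 1]]] for _ in range(2)]
ones_k = [[[[1, 1], [1, 1]]] for _ in range(2)]
result = conv3d_with_accumulators(ones_in, ones_k)
```

In this sketch, if the stored value in a batch were discarded between cycles, only the last input channel's contribution would survive, which is the loss of data the stated motivation refers to.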
As to claims 2 and 11, Henry taught the neural engine circuit of claim 1, wherein the plurality of multiply-add circuits is further configured to: perform the multiply-add operations (e.g., see fig. 18) as part of the 3D convolution to generate the output data (e.g., see paragraphs 0403-0405) comprising the plurality of output depth planes having the output width and the output height for each output channel of a plurality of output channels (e.g., see figs. 18, 58 and paragraphs 0404 and 0420-0421) [note that the “F” PxQ outputs 5808 in paragraph 0405, lines 11-13, and fig. 58 provide the output channels].

As to claims 3 and 12, Henry taught the neural engine circuit of claim 2, but did not expressly detail wherein the accumulator circuit is further configured to: store, after the processing cycle, the portion of the output data for a subset of the output channels and for each output depth plane in the corresponding batch of accumulators. One of ordinary skill would have been motivated to store the portion of the output data for a subset of the output channels (e.g., see fig. 58 and paragraphs 0404, 0420, and 0119-0122) and for each output depth plane in the corresponding batch of accumulators at least to ensure that the data was not lost and therefore would be available for combination/accumulation with results from succeeding cycles of multiplication of input data when performing 3-dimensional convolution. Storing the data would simplify the circuitry for properly timed accumulation of multiplication results that were generated repeatedly in a loop, and this would reduce system cost.


As to claims 4 and 13, Henry taught the neural engine circuit of claim 1, wherein the input data comprises multiple input depth planes (C channels of input 5802) having an input width and an input height for each input channel of a plurality of input channels, and the kernel comprises multiple kernel depth planes (C channels of F filters) having a kernel width and a kernel height (e.g., see fig. 58 and paragraph 0404).

As to claims 5 and 14, Henry taught the neural engine circuit of claim 4, including two accumulators (e.g., see fig. 18) and accumulating for each of C channels (e.g., see paragraph 0416), wherein a number of the batches of accumulators is equal to a number of the kernel depth planes (e.g., see paragraph 0421).

As to claims 6 and 15, Henry taught the neural engine circuit of claim 1, wherein the neural engine circuit is configured to: receive, during a clock cycle, a depth slice (channel) of the input data (RAM 122) from a data buffer (RAM 124) located between the neural engine circuit and a system memory external to the neural engine circuit (e.g., see paragraphs 0414-0416 and fig. 1).

As to claims 7 and 16, Henry taught the neural engine circuit of claim 6, wherein the multiply-add circuits and the accumulators are configured to: perform multiply-accumulate operations with partial accumulations as part of the 3D convolution on the work unit of the depth slice of the input data and the kernel to generate partial output sums stored in the batches of accumulators (e.g., see figs. 11, 18, 49, 52 and paragraphs 0404, 0419-0420).

As to claims 8 and 17, Henry taught the neural engine circuit of claim 7, wherein the partial output sums accumulated in the batches of accumulators are associated with all output depth planes of the output data and at least a portion of output channels (e.g., see paragraphs 0421, 0404). One of ordinary skill would have been motivated to store the result data in the accumulator at least to ensure that the data was not lost and therefore would be available for combination/accumulation with results from succeeding cycles of multiplication of input data. Storing the data would simplify the circuitry for properly timed accumulation of multiplication results that were generated repeatedly in a loop, and this would reduce system cost.


As to claims 9 and 18, Henry taught the neural engine circuit of claim 1, further comprising: a post-processor configured to scale the output data for each output depth plane by a scale factor predetermined for the output depth plane (e.g., see paragraphs 0422 and 0404) [the saturation or compression of the accumulator values provides this limitation].
Allowable Subject Matter
Claims 19-20 are allowed.

The following is a statement of reasons for the indication of allowable subject matter.
Claim 19 requires among other things:
 “An electronic device comprising: at least one neural engine circuit including: plurality of multiply-add circuits configured to perform multiply-add operations of a three dimensional (3D) convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle, and an accumulator…height; a planar engine circuit coupled to the at least one neural engine circuit configured to perform at least one planar operation on at least the portion of the output data; a data buffer configured to broadcast, during a clock cycle, a depth slice of the input data to the at least one neural engine circuit; and a kernel fetcher circuit configured to send the kernel to the at least one neural engine circuit.”
The closest prior art includes Henry. Henry taught some of the limitations of independent claim 19, as discussed above. However, Henry did not disclose, among other things:
 	An electronic device comprising: at least one neural engine circuit including: plurality of multiply-add circuits configured to perform multiply-add operations of a three dimensional (3D) convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle, and an accumulator…height; a planar engine circuit coupled to the at least one neural engine circuit configured to perform at least one planar operation on at least the portion of the output data; a data buffer configured to broadcast, during a clock cycle, a depth slice of the input data to the at least one neural engine circuit; and a kernel fetcher circuit configured to send the kernel to the at least one neural engine circuit.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Aliabadi (patent application publication No. 2018/0096226) disclosed data layout for neural networks (e.g., see abstract).
Hu (patent application publication No. 2018/0373981) disclosed method and device for optimizing neural network (e.g., see abstract).
Baum (patent application publication No. 2018/0285727) disclosed neural network processing element incorporating compute and local memory (e.g., see abstract).
Huang (patent application publication No. 2019/0180167) disclosed apparatus for performing convolutional operations in a convolutional neural network (e.g., see abstract).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC COLEMAN whose telephone number is (571)272-4163. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on 0-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ERIC COLEMAN
Primary Examiner
Art Unit 2183



EC
/ERIC COLEMAN/Primary Examiner, Art Unit 2183