DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 08/04/2022 has been entered.

Response to Arguments
Applicant’s arguments, see pages 6-7 of reply, filed 08/04/2022, with respect to the rejection(s) of claim(s) 1-20 under 35 U.S.C. 102(a)(1) as being anticipated by Ding et al., “Designing efficient accelerator of depthwise separable convolutional neural network on FPGA”, have been fully considered and are persuasive. Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Wei et al., “TGPA: Tile-Grained Pipeline Architecture for Low Latency CNN Inference”.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-9 and 11-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wei et al., “TGPA: Tile-Grained Pipeline Architecture for Low Latency CNN Inference” (herein Wei).
Regarding claims 1 and 12, taking claim 1 as exemplary, Wei teaches a device comprising: 
memory configured to store first data for a first layer of a neural network [Off-chip DDR memory stores weights and activations for the neural networks. Wei at section 3.1, 1st paragraph; section 3.2; Fig. 3 and 5]; 
first circuitry comprising a first plurality of processing element (PE) circuits arranged as a first systolic array [The first accelerator die in the architecture, wherein each accelerator comprises a plurality of processing elements (PEs) arranged in a systolic array. Wei at section 3.1, 1st paragraph; section 3.2; Fig. 3], the first plurality of PE circuits configured to read the first data from the memory and to perform computation for the first layer of the neural network using the first data to generate second data, the first plurality of PE circuits including a first PE circuit performing computation for a first node for the first layer and a second PE circuit performing computation for a second node for the first layer [Each accelerator executes a respective single layer, thereby having PEs operating on different nodes (i.e. first and second nodes) of the layer executed. Wei at section 3.1, 1st paragraph; section 3.2; Fig. 5], the first circuitry further comprising a plurality of buffers configured to output the generated second data as input directly to second circuitry to perform computation for a second layer of the neural network without first storing the generated second data in the memory [Each accelerator die comprises a stream buffer comprising a plurality of buffers to store the generated output (i.e. a tile) for the next accelerator. Wei at section 3.1; Fig. 3]; and 
the second circuitry comprising a second plurality of PE circuits arranged as a second systolic array [The second accelerator die in the architecture, wherein each accelerator comprises a plurality of PEs arranged in a systolic array. Wei at section 3.1, 1st paragraph; section 3.2], the second plurality of PE circuits configured to perform computation for the second layer of the neural network using the second data [Each accelerator operates on its own layer in pipelined fashion; therefore, the second accelerator operates on the next layer using the output from the first accelerator. Wei at section 3.1; Fig. 3].

Regarding claims 2 and 13, taking claim 2 as exemplary, Wei teaches the device of claim 1, wherein the first plurality of PE circuits is configured to perform computation for at least one node of the neural network while the second plurality of PE circuits is performing computation for the second layer of the neural network [The first accelerator (i.e. comprising the first plurality of PEs) and the second accelerator (i.e. comprising the second plurality of PEs) operate in pipelined fashion and, therefore, the first accelerator operates on a layer (i.e. at least one node) while the second accelerator operates on the second layer. Wei at section 3.1; Fig. 3 and 5].

Regarding claims 3 and 14, taking claim 3 as exemplary, Wei teaches the device of claim 2, wherein the at least one node is from a third layer of the neural network or from the first layer of the neural network [The first accelerator operates on the next consecutive layer (i.e. a third layer). Wei at section 3.1; Fig. 3].

Regarding claims 4 and 15, taking claim 4 as exemplary, Wei teaches the device of claim 1, wherein the plurality of buffers is configured to output the generated second data as input to the second circuitry by bypassing any transfer of the second data into or out of the memory [The data is stored in the stream/activation buffers, rather than the DRAM, for output to the next accelerator, thereby bypassing the memory. See Wei at section 3.2; Fig. 3 and 5].

Regarding claims 5 and 16, taking claim 5 as exemplary, Wei teaches the device of claim 1, wherein the second plurality of PE circuits is further configured to use the second data to generate third data [The second accelerator operates on the second layer, thereby producing third data. Wei at section 3.1; Fig. 3].

Regarding claims 6 and 17, taking claim 6 as exemplary, Wei teaches the device of claim 5, wherein the second plurality of PE circuits is further configured to store the generated third data to the memory [Each accelerator, including the second accelerator and its plurality of PEs, can output the generated data to DRAM. See Wei at Fig. 3 and 5; section 3.2].

Regarding claims 7 and 18, taking claim 7 as exemplary, Wei teaches the device of claim 5, wherein the second circuitry further comprises a plurality of buffers configured to output the generated third data as input to third circuitry [Each accelerator die, including the second accelerator die, comprises a stream buffer comprising a plurality of buffers to store the generated output (i.e. a tile) for the next accelerator die (i.e. third circuitry). Wei at section 3.1; Fig. 3].

Regarding claims 8 and 19, taking claim 8 as exemplary, Wei teaches the device of claim 1, wherein the first data comprises at least one of weight or activation information for the first layer of the neural network, and the second data comprises at least one of weight or activation information for the second layer of the neural network [The data for each layer comprise weights and activations for the respective layer. See Wei at section 3.1; Fig. 3].

Regarding claims 9 and 20, taking claim 9 as exemplary, Wei teaches the device of claim 1, wherein the first plurality of PE circuits is configured to perform a convolution operation using the first data, and the second plurality of PE circuits is configured to perform dot-product operations using the second data [Each accelerator (i.e. first and second plurality of PEs) performs convolution operations for a CNN, which include dot-product operations. Wei at section 3.1, 1st paragraph].

Regarding claim 11, Wei teaches the device of claim 1, wherein the plurality of buffers is configured with sufficient capacity to buffer the generated second data and output the generated second data to the second circuitry [The stream buffers enable pipelining with multiple accelerators and, therefore, have sufficient capacity to buffer the generated data. See Wei at section 3.1, 1st paragraph; section 3.3, 1st paragraph; Fig. 3].

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Wei in view of Ding et al., “Designing efficient accelerator of depthwise separable convolutional neural network on FPGA” (herein Ding).
Regarding claim 10, Wei teaches the device of claim 1. Wei does not teach that the first circuitry and the second circuitry are formed on a same semiconductor device. In the same field of neural network processing, Ding teaches a device for processing CNNs comprising a first circuitry and a second circuitry, wherein the first circuitry and the second circuitry are formed on a same semiconductor device [The architecture comprises a plurality of computing engines (i.e. first and second circuitry), wherein the architecture is formed on a single FPGA. Ding at section 4.1, 1st - 2nd paragraph; section 6.1; Fig. 4]. Ding teaches that the implementation provides high performance, power, and resource efficiency [Ding at section 6.2, last paragraph; Table 7]. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Wei’s architecture so that the accelerators (i.e. first and second circuitry) are formed on a same semiconductor device, as taught by Ding, in order to provide high performance, power, and resource efficiency.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN P GEIB whose telephone number is (571)272-8628. The examiner can normally be reached Monday - Friday 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BENJAMIN P GEIB/Primary Examiner, Art Unit 2123