Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.  Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lavigueur et al. (US 2017/0236053, herein Lavigueur) in view of Phelps et al. (US 2018/0036165, herein Phelps).

Regarding claim 1, Lavigueur teaches a deep neural network (DNN) accelerator comprising:
an instruction pipeline configured to receive one or more chains of instructions, wherein at least one of one or more chains of instructions includes both instructions for performing a first class of operations corresponding to a neural network model and instructions for performing a second class of operations corresponding to the neural network model ([0025-0026], [0030-0031], specialized instructions for implementing neural network operations);
a first datapath comprising at least one matrix register file and at least one matrix vector multiplier ([0007], [0027], [0029], input matrices and vector registers, [0025], [0029], multiple multiply-accumulators per processing element); and
a second datapath, different from the first datapath, comprising at least one vector register file and at least one function unit (Fig 2, [0025], [0027], [0029], second PE of parallel PEs), wherein each of the first datapath and the second datapath is configured to execute at least one instruction chain locally before outputting any results (Fig 6, [0030], [0049], a single PE may implement a layer of computations), and wherein the instruction pipeline is configured to forward at least a first set of instructions for performing the first class of operations to the first datapath and forward at least a second set of instructions for performing the second class of operations to the second datapath, and wherein the first datapath and the second datapath are configured to overlap in time a performance of at least a subset of the first class of operations corresponding to the first set of the instructions with a performance of at least a subset of the second class of operations corresponding to the second set of the instructions (Figs 2, 3, 6 & [0026-0032], [0046-0049], PEs operate in parallel to implement processing of a given layer, FIFOs provided to pass results to next PE, [0033], convolution and activation performed in parallel).
Lavigueur fails to teach wherein the accelerator comprises an instruction dispatcher configured to receive the instructions.
Phelps teaches a deep neural network (DNN) accelerator comprising an instruction dispatcher to receive one or more chains of instructions ([0039], dispatcher).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Lavigueur and Phelps to utilize an instruction dispatcher for providing instructions to the parallel operations units.  While Lavigueur does not disclose any explicit structural details of the multiple processing elements provided for implementing neural network computation, one of ordinary skill in the art would understand that an instruction dispatcher is a routine and conventional aspect of pipelined instruction processing as performed by the architecture disclosed by Lavigueur (Lavigueur [0026], instruction pipelining).  Given that both Lavigueur and Phelps disclose parallel architectures for implementing neural network models (Lavigueur [0022], Phelps [0042]), and both disclose parallel processing elements including both scalar and vector processing pipelines (Lavigueur Fig 4, [0026] & Phelps Fig 1B, [0036]), combining their teachings by implementing an instruction dispatcher for dispatching the instructions described by Lavigueur would merely entail a combination of known prior art elements to achieve predictable results.
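
For illustration only, the arrangement addressed in the rejection of claim 1 can be sketched in software: a dispatcher receives one mixed instruction chain, forwards the first-class (matrix) instructions to one datapath and the second-class (vector/function) instructions to another, and each datapath executes its whole chain locally before returning results. This is a minimal sketch under stated assumptions, not the structure disclosed by Lavigueur, Phelps, or the application. The opcode names, the `dispatch`, `run_datapath`, and `execute` functions, and the use of threads to model datapaths that may overlap in time are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical opcode sets; the claim only distinguishes two classes of operations.
MATRIX_OPS = {"mv_mul", "mm_mul"}                # first class
VECTOR_OPS = {"relu", "softmax", "layer_norm"}   # second class


def dispatch(chain):
    """Split one mixed chain into a chain per datapath, preserving order."""
    first, second = [], []
    for instr in chain:
        op = instr[0]
        if op in MATRIX_OPS:
            first.append(instr)
        elif op in VECTOR_OPS:
            second.append(instr)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return first, second


def run_datapath(name, chain):
    """Execute a whole chain locally; results leave the datapath only at the end."""
    local_results = []
    for op, operand in chain:
        # Stand-in for the real computation: record what would be executed.
        local_results.append(f"{name}:{op}({operand})")
    return local_results


def execute(chain):
    """Forward each sub-chain to its datapath; the two submissions may run concurrently."""
    first, second = dispatch(chain)
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(run_datapath, "dp1", first)
        f2 = pool.submit(run_datapath, "dp2", second)
        return f1.result(), f2.result()
```

Calling `execute([("mv_mul", "W0"), ("relu", "v0")])` returns one result list per datapath. The thread pool only permits the two chains to overlap in time; it does not guarantee any particular interleaving.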

Regarding claim 2, the combination of Lavigueur and Phelps teaches the DNN accelerator of claim 1, wherein the first class of operations comprises matrix-matrix multiply operations or matrix-vector multiply operations, and wherein the second class of operations comprises activation operations, softmax operations, or layer normalization operations (Lavigueur [0026-0027] & [0033], [0042], matrix multiplication in convolution pipeline & activation operations in activation pipeline, Phelps ).

Regarding claim 3, the combination of Lavigueur and Phelps teaches the DNN accelerator of claim 1, wherein the first datapath is coupled to the second datapath via a first local switch, and wherein the first local switch is configured to forward at least one intermediate result generated from a performance of any operations, using the first datapath, corresponding to a first instruction chain to the second datapath (Lavigueur Fig 5, [0032], [0048-0050], passing results between PEs using FIFOs and ports to implement operations).

Regarding claim 4, the combination of Lavigueur and Phelps teaches the DNN accelerator of claim 3, wherein the second datapath is coupled to the first datapath via a second local switch, and wherein the second local switch is configured to forward at least one intermediate result generated from a performance of any operations, using the second datapath, corresponding to a second instruction chain to the first datapath (Lavigueur Fig 5, [0027], intermediate results & [0032], [0048-0050], passing results between PEs).

Regarding claim 5, the combination of Lavigueur and Phelps teaches the DNN accelerator of claim 1, wherein an instruction chain comprises a set of single-instruction-multiple-data (SIMD) instructions arranged as a single thread, and wherein the single thread starts with a read instruction and ends with a write instruction (Phelps [0045], SIMD instructions & Lavigueur [0025], [0031], instruction groups with read & write DMA instructions).
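
For illustration only, the chain structure recited in claim 5 (a single thread of instructions that starts with a read instruction and ends with a write instruction) can be sketched as a check over a flat instruction stream. This is a minimal sketch under stated assumptions: the opcode strings `"read"` and `"write"` and the function `split_into_chains` are hypothetical and are not taken from either reference or from the application.

```python
def split_into_chains(stream):
    """Group a flat instruction stream into chains.

    Each chain must begin with "read" and is closed by the next "write".
    Instructions in between are not checked further in this sketch.
    """
    chains, current = [], []
    for op in stream:
        if not current and op != "read":
            raise ValueError("chain must start with a read")
        current.append(op)
        if op == "write":
            chains.append(current)
            current = []
    if current:
        raise ValueError("last chain does not end with a write")
    return chains
```

A stream such as `["read", "mv_mul", "relu", "write", "read", "softmax", "write"]` yields two chains under this sketch, each bounded by a read and a write.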

Regarding claim 6, the combination of Lavigueur and Phelps teaches the DNN accelerator of claim 1, wherein the first datapath is coupled to at least a first local memory and the second datapath is coupled to at least a second local memory, and wherein the first datapath is configured to perform operations without accessing the at least the second local memory and the second datapath is configured to perform operations without accessing the at least the first local memory (Lavigueur Fig 2, [0025], [0030], local memory per PE).

Regarding claim 7, the combination of Lavigueur and Phelps teaches the DNN accelerator of claim 6, wherein the neural network model comprises a transformer-based model (Lavigueur [0029], non-linear transformations).

Claims 8-14 refer to a method embodiment of the accelerator embodiment of claims 1-7, respectively.  Therefore, the above rejections for claims 1-7 are applicable to claims 8-14.

Regarding claim 15, Lavigueur teaches a deep neural network (DNN) accelerator comprising:
a first datapath comprising at least one matrix register file and at least one matrix vector multiplier ([0007], [0027], [0029], input matrices and vector registers, [0025], [0029], multiple multiply-accumulators per processing element); and
a second datapath, different from the first datapath, comprising at least one vector register file and at least one function unit (Fig 2, [0025], [0027], [0029], second PE of parallel PEs);
a switch coupled to both a first local memory associated with the first datapath and to a second local memory associated with the second datapath (Fig 2, [0025], [0030], local memory per PE & Fig 5, [0032], [0048-0050], passing results between PEs using FIFOs and ports to implement operations);
an instruction pipeline configured: (1) to access one or more chains of instructions from at least one instruction memory, wherein at least one of one or more chains of instructions includes both instructions for performing a first class of operations corresponding to a neural network model and instructions for performing a second class of operations corresponding to the neural network model ([0025-0026], [0030-0031], specialized instructions for implementing neural network operations) and (2) to split the at least one or more chains of instructions into a first chain comprising instructions for performing only the first class of operations and a second chain comprising instructions for performing only the second class of operations ([0026-0027], different instruction group types per pipeline of PE); and
wherein the instruction pipeline is further configured to forward the first chain to the first datapath and forward the second chain to the second datapath, wherein the first datapath is configured to execute the first chain using the first local memory before outputting any results and the second datapath is configured to execute the second chain using the second local memory before outputting any results, and wherein the first datapath and the second datapath are configured to overlap in time a performance of at least a subset of the first class of operations corresponding to the first set of the instructions with a performance of at least a subset of the second class of operations corresponding to the second set of the instructions (Figs 2, 3, 6 & [0026-0032], [0046-0049], PEs operate in parallel to implement processing of a given layer, FIFOs provided to pass results to next PE, [0033], convolution and activation performed in parallel).
Lavigueur fails to teach wherein the accelerator comprises an instruction dispatcher configured to receive the instructions or an instruction queue.
Phelps teaches a deep neural network (DNN) accelerator comprising an instruction dispatcher to receive one or more chains of instructions and an instruction queue ([0039], dispatcher & [0053], decode & issue unit).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Lavigueur and Phelps to utilize an instruction dispatcher for providing instructions to the parallel operations units.  While Lavigueur does not disclose any explicit structural details of the multiple processing elements provided for implementing neural network computation, one of ordinary skill in the art would understand that an instruction dispatcher is a routine and conventional aspect of pipelined instruction processing as performed by the architecture disclosed by Lavigueur (Lavigueur [0026], instruction pipelining).  Given that both Lavigueur and Phelps disclose parallel architectures for implementing neural network models (Lavigueur [0022], Phelps [0042]), and both disclose parallel processing elements including both scalar and vector processing pipelines (Lavigueur Fig 4, [0026] & Phelps Fig 1B, [0036]), combining their teachings by implementing an instruction dispatcher for dispatching the instructions described by Lavigueur would merely entail a combination of known prior art elements to achieve predictable results.

Claims 16-19 and 20 refer to an alternate accelerator embodiment of claims 2-5 and 7, respectively.  Therefore, the above rejections for claims 2-5 and 7 are applicable to claims 16-19 and 20, respectively.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Takata (US 2019/0171419) discloses a neural network processor for performing activation functions and matrix multiplication in parallel.
Temam (US 2019/0050717) discloses a neural network processor for performing activation functions and matrix multiplication in parallel.
Ouyang (US 2018/0052685) discloses a neural network processor for performing activation functions and matrix multiplication in parallel and in succession.
Feinson (US 2019/0236464) discloses a neural network processor using a transformer network.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J METZGER whose telephone number is (571)272-3105. The examiner can normally be reached Monday-Friday 7:30-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL J METZGER/
Primary Examiner, Art Unit 2182