DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on April 23, 2020 and October 22, 2020 were filed after the mailing date of the application on November 13, 2019. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Specification

The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (US 2018/0174036).

As to claim 1, Han et al. disclose neural network circuitry (Figure 7, with further description in Figure 6) comprising: a first plurality of logic cells that are interconnected to form neural network computation units (e.g., a plurality of channels comprising multiple processing elements (PEs) and various other components) that are configured to perform approximate computations (e.g., Figure 7 illustrates a plurality of channels, each channel comprising multiple PEs, where [0051] and [0059] note that the PEs are computation units for a slice of input vectors with a partial weight matrix and that all PEs are evenly partitioned into several channels; Figure 6 further illustrates each channel comprising multiple PEs and other major components, such as input-vector queue (activation vector queue) 601, sparse matrix-vector multiplier (SpMV) 610, adder tree 620, sigmoid and tanh units (Sigmoid/Tanh) 630, an accumulator, element-wise multipliers (ElemMul) 640 and 650, and a plurality of buffers); and a second plurality of logic cells that are interconnected to form a controller hierarchy (e.g., a plurality of controllers, including an accelerator controller, a PCIE controller, and a MEM controller) that is interfaced with the neural network computation units (e.g., the plurality of channels) to control pipelining of the approximate computations performed by the neural network computation units (e.g., Figure 7 illustrates the accelerator controller with the PCIE controller and the MEM controller, where [0059] notes that the accelerator controller determines the behavior of other circuits on the FPGA chip and schedules the PCIE/MEM controllers for data fetch as well as the LSTM computation pipeline flow of the accelerator).

As noted above, Han et al. describe an accelerator and system that work directly on a compressed long short-term memory (LSTM) model to accelerate the LSTM, where the LSTM is composed of a plurality of cells for computations as described in Figures 1-5 and the associated text. The accelerator implementing the LSTM therefore comprises the “first plurality of logic cells” and the “second plurality of logic cells,” each interconnected to form the components described above. Further, the accelerator and system are optimized across the algorithm-software-hardware boundary, such that complex LSTM operations may be mapped onto an FPGA to achieve parallelism ([0061]).
   
As to claim 2, Han et al. disclose the neural network computation units comprise a neural network ([0051] and [0059] note that the channels include processing elements (PEs) that perform most computation tasks in the long short-term memory (LSTM) model, where [0021] notes that LSTM is a specific recurrent neural network (RNN) architecture).

As to claim 3, Han et al. disclose the neural network is a recurrent neural network of a long short-term memory type ([0021] notes LSTM is a specific recurrent neural network (RNN) architecture).

As to claim 4, Han et al. disclose the neural network computation units (e.g., the plurality of channels) comprise: a plurality of matrix-vector multiplier units (e.g., sparse matrix-vector multipliers (SpMV) 610) that are configured to multiply input data by weight data ([0054] notes that in the SpMV, each element in the input vector is multiplied by its corresponding weight column so that a new vector may be obtained); a plurality of ternary adders (e.g., accumulator 604, assembler 606, and/or adder tree 620) configured to add bias values to products of the plurality of matrix-vector multiplier units (SpMV 610) (e.g., Figure 6 illustrates results passing from SpMV 610 to the accumulator and then to ACT buffer 605, assembler 606, and adder tree 620, where [0056] notes that adder tree 620 performs summation by consuming the intermediate data produced by other units or bias data from the input buffer); a plurality of activation function units (e.g., sigmoid and tanh units (Sigmoid/Tanh) 630) configured to receive output from the plurality of ternary adders (e.g., accumulator 604, assembler 606, and/or adder tree 620) and provide non-linear response values to neurons comprising the neural network (e.g., Figure 6 illustrates results passing from the accumulator to ACT buffer 605, assembler 606, adder tree 620, and Sigmoid/Tanh 630, where [0057] notes that Sigmoid/Tanh 630 are non-linear modules applied as activation functions to some intermediate summation results); a first element-wise approximate multiplier (e.g., element-wise multiplier (ElemMul) 650) and adder unit (e.g., adder tree 620) configured to multiply and accumulate the non-linear response values to provide accumulated values (e.g., Figure 6 illustrates results from ElemMul 650 input to adder tree 620, where [0055] notes that each of ElemMul 640 and 650 generates one vector by consuming two vectors, each element of the output vector being the element-wise product of the two input vectors, and [0056] notes that adder tree 620 performs summation by consuming the intermediate data produced by other units or bias data from the input buffer); and an output element-wise approximate multiplier unit (e.g., element-wise multiplier (ElemMul) 640) configured to multiply feedback values from an output of the neural network computation units with the accumulated values (e.g., Figure 6 illustrates results passing from adder tree 620 to Sigmoid/Tanh 630 and ElemMul 640, and [0055] notes that each of ElemMul 640 and 650 generates one vector by consuming two vectors, each element of the output vector being the element-wise product of the two input vectors).

As to claim 5, Han et al. disclose the controller hierarchy (e.g., the plurality of controllers, including the accelerator controller, the PCIE controller, and the MEM controller) comprises: a matrix-vector multiplier controller configured to control multiplication operations of the plurality of matrix-vector multiplier units; a second element-wise approximate multiplier and adder unit controller configured to control multiplication and addition operations of the element-wise approximate multiplier and adder unit; an output element-wise multiplier controller configured to control multiplication operations of the output element-wise approximate multiplier unit; and a top-level controller to control pipelining between the matrix-vector multiplier controller, the element-wise approximate multiplier and adder unit and the output element-wise approximate multiplier unit ([0059] notes that the accelerator controller determines the behavior of other circuits on the FPGA chip and schedules the PCIE/MEM controllers for data fetch as well as the LSTM computation pipeline flow of the accelerator; the accelerator controller may thus be considered to control each of the components described above, e.g., the components of each of the plurality of channels as outlined in Figure 6 and the associated text; see the rejection of claim 4).

As to claim 6, Han et al. disclose intermediate buffers coupled between the controller hierarchy (e.g., the plurality of controllers, including the accelerator controller, the PCIE controller, and the MEM controller) and the neural network computation units (e.g., the plurality of channels), which are configured to store intermediate results of the neural network computation units to reduce external memory access time (Figure 6 illustrates, and [0051] notes, numerous buffers for storing intermediate results, where it is understood that the intermediate buffers serve as local memories that reduce the time spent accessing an external memory, such as a system memory).

As to claim 7, Han et al. disclose the neural network computation units include approximate multiplier units that are configured to perform approximate multiplications that comprise the approximate computations (e.g., sparse matrix-vector multiplication) (e.g., Figure 7 illustrates a plurality of channels, each channel comprising multiple PEs, where [0051] and [0059] note that the PEs are computation units for a slice of input vectors with a partial weight matrix and that all PEs are evenly partitioned into several channels; Figure 6 further illustrates each channel with multiple PEs and other major components, such as input-vector queue (activation vector queue) 601, sparse matrix-vector multiplier (SpMV) 610, adder tree 620, sigmoid and tanh units (Sigmoid/Tanh) 630, an accumulator, element-wise multipliers (ElemMul) 640 and 650, and a plurality of buffers).

As to claim 15, Han et al. disclose the first plurality of logic cells and the second plurality of logic cells comprise a field-programmable gate array (Figure 7 illustrates hardware accelerator as part of field programmable gate array (FPGA)).

As to claim 16, Han et al. disclose the first plurality of logic cells and the second plurality of logic cells comprise an application-specific integrated circuit ([0060] notes the hardware accelerator may be implemented as an application specific integrated circuit (ASIC) core).

Claims 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (US 2018/0174036) as applied to claim 16 above, and further in view of Zhang et al. (US 2019/0228285).

As to claim 17, Han et al. do not disclose, but Zhang et al. disclose, the application-specific integrated circuit has an efficiency of at least 1.3 giga-operations per second per milliwatt when operated at a frequency of 322 MHz ([0061] notes that the test chip achieves 718 GOPS at 380 MHz, and Figure 11 illustrates a graph having 380 MHz as the target frequency but spanning a range between 100 MHz and 380 MHz, which includes a frequency of 322 MHz, where it would have been obvious that the GOPS may differ at that frequency; [0064] notes that the system may include an application-specific integrated circuit (ASIC)).

NOTE: Zhang et al. describe a configurable convolution neural network processor, but also disclose that the configurable neural network processor may be implemented as a configurable recurrent neural network processor (see Figure 3 and the associated text). Because Zhang et al. describe the neural network processor as “configurable,” which provides advantages such as flexibility and versatility (see [0004]), the processor may accommodate many variations other than those explicitly described, including the one noted in the claim above.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Han et al.’s neural network circuitry, including the ASIC, to achieve the operations described in Zhang et al. in order to provide powerful yet power-efficient circuitry, thereby enhancing the performance of the system (see [0061] of Zhang).

As to claim 18, Han et al. do not disclose, but Zhang et al. disclose, the application-specific integrated circuit is configured to perform between 27 giga-operations per second and 50 giga-operations per second ([0061] notes that the test chip achieves 718 GOPS at 380 MHz, and Figure 11 illustrates a graph having 380 MHz as the target frequency but spanning a range between 100 MHz and 380 MHz, which includes lower frequencies that may result in lower GOPS, where it would have been obvious that these may include between 27 GOPS and 50 GOPS; [0064] notes that the processor may include an application-specific integrated circuit (ASIC)).

NOTE: Zhang et al. describe a configurable convolution neural network processor, but also disclose that the configurable neural network processor may be implemented as a configurable recurrent neural network processor (see Figure 3 and the associated text). Because Zhang et al. describe the neural network processor as “configurable,” which provides advantages such as flexibility and versatility (see [0004]), the processor may accommodate many variations other than those explicitly described, including the one noted in the claim above.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Han et al.’s neural network circuitry, including the ASIC, to achieve the operations described in Zhang et al. in order to provide powerful yet power-efficient circuitry, thereby enhancing the performance of the system (see [0061] of Zhang).

As to claim 19, Han et al. do not disclose, but Zhang et al. disclose, the first plurality of logic cells and the second plurality of logic cells integrated into the application-specific integrated circuit have cell area that is less than one-half of a square millimeter ([0061] notes that the 4.1 mm² test chip is implemented in 40 nm CMOS and that the configurable convolution processor occupies 2.56 mm²; further, a total of 49 VCOs are instantiated, each occupying only 250 µm², so that the VCOs together occupy 49 × 250 µm² = 12,250 µm² (approximately 0.012 mm²), which is less than one-half of a square millimeter; [0064] notes that the processor may include an application-specific integrated circuit (ASIC)).

NOTE: Zhang et al. describe a configurable convolution neural network processor, but also disclose that the configurable neural network processor may be implemented as a configurable recurrent neural network processor (see Figure 3 and the associated text). Because Zhang et al. describe the neural network processor as “configurable,” which provides advantages such as flexibility and versatility (see [0004]), the processor may accommodate many variations other than those explicitly described, including the one noted in the claim above.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Han et al.’s neural network circuitry, including the ASIC, to be implemented in a small area as described in Zhang et al. in order to reduce energy waste and manufacturing cost, thereby enhancing the performance of the system (see [0061] of Zhang).

As to claim 20, Han et al. do not disclose, but Zhang et al. disclose, the application-specific integrated circuit operates on less than 21 milliwatts at a core voltage of less than 1.2 volts (Figure 11 illustrates a graph of power (mW) (y-axis) versus frequency (MHz) (x-axis) and voltage (V), where, at the lower end of the power range (which spans 1 mW to 40 mW), the voltage is 0.6 V, i.e., less than 21 mW at a core voltage less than 1.2 V; [0064] notes that the processor may include an application-specific integrated circuit (ASIC)).

NOTE: Zhang et al. describe a configurable convolution neural network processor, but also disclose that the configurable neural network processor may be implemented as a configurable recurrent neural network processor (see Figure 3 and the associated text). Because Zhang et al. describe the neural network processor as “configurable,” which provides advantages such as flexibility and versatility (see [0004]), the processor may accommodate many variations other than those explicitly described, including the one noted in the claim above.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Han et al.’s neural network circuitry, including the ASIC, to achieve the operations described in Zhang et al. in order to provide powerful yet power-efficient circuitry, thereby enhancing the performance of the system (see [0061] of Zhang).

Allowable Subject Matter

Claims 8-14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including ALL of the limitations of the base claim AND any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter: Regarding claim 8, the prior art of record fails to teach the limitations of the claim as recited. Dependent claims 9, 10, 12, and 13 are indicated allowable because they depend upon claim 8, which is indicated allowable; dependent claim 11 is likewise allowable because it depends upon claim 10, and dependent claim 14 because it depends upon claim 13.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Lei (US 2019/0122101) discloses a system and method for reducing the time needed to train a neural network by modifying the neural network to allow greater parallelization of computations, such that cells do not depend on a previous cell, thereby allowing computations, such as matrix-vector multiplications, to be performed outside of the cells.
Diamos et al. (US 2017/016936) disclose a system comprising a multi-core optimized recurrent neural network (RNN) architecture for mapping onto a modern general-purpose processor to improve performance.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACINTA M CRAWFORD whose telephone number is (571)270-1539. The examiner can normally be reached 9:00 a.m. to 5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at (571)272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JACINTA M CRAWFORD/Primary Examiner, Art Unit 2612