DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 1/14/2021 has been entered.
Claims 1, 3, 11, 13, and 20 have been amended. Claims 1-20 are pending and have been examined.
Response to Arguments/Amendments
The amendments to the specification and drawings have obviated the prior objections, which are withdrawn accordingly.
Applicant’s arguments, see pp. 8-10, filed 1/14/2021, with respect to the rejection(s) of claim(s) 1, 11, and 20 under 35 USC § 103 (as well as each respective dependent claim) have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of U.S. Patent Application Publication 2017/0103298 by Ling et al. 

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 3 and 13 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Claims 3 and 13 both refer to “lane widths in the adders and the multipliers.” ¶ 0088 in the originally filed disclosure refers to “hardware multiply and adder resources”. However, a description of associated “lane widths” was not found in the originally filed disclosure, and Applicant has not pointed out where such description could be found. For the purpose of further examination, this limitation will be interpreted as “bandwidth.”

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-5, 8-11, 13-15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over cited art of record “ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA” by Han et al. (“Han”) in view of U.S. Patent 6,044,211 to Jain (“Jain”), U.S. Patent Application Publication 2017/0103298 by Ling et al. (“Ling”), and U.S. Patent Application Publication 2014/0253575 by Yang et al. (“Yang”).

	In regard to claim 1, Han discloses:
1. A method in a design framework module implemented by an electronic device for generating an accelerator instance optimized to implement a recurrent neural network (RNN) algorithm comprising: See Han, at least section 1, p. 1, e.g. “ESE takes the approach of EIE [5] one step further to address a more general problem of accelerating not only feed forward neural networks but also recurrent neural networks and LSTM. … we design a novel method to optimize across the algorithm, software and hardware.”
obtaining, by the design framework module, a flow … for the RNN algorithm, the flow graph identifying a plurality of operations to be performed to implement the RNN algorithm and further identifying data dependencies between ones of the plurality of operations, See Han, section 1, p. 1, e.g. “we designed a data flow that can effectively schedule the complex RNN operations using multiple EIE cores.” Also see section 3, p. 3, e.g. “LSTM is a complicated dataflow, we want to have more parallelism and meet the data dependency at the same time.” Han does not expressly disclose obtaining a graph. However, Jain teaches this. See Abstract, e.g. “The behavioral level is represented as a Data Dependency Graph (DDG) having a plurality of operations (shown as nodes) and operands (shown as arcs) which connect the nodes.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Han’s data flow with Jain’s graph in order to utilize a user-friendly representation as suggested by Jain (see col. 4, lines 6-13).
wherein the plurality of operations include one or more matrix operations and one or more vector operations; and See Han, section 2, p. 2, e.g. “In our hardware implementation, matrix is divided into different sub-matrices and assigned to the corresponding processing elements (PEs).” Also see Fig. 4 along with associated text in section 3 on p. 3, e.g. “Operations in the fourth line are matrix-vector multiplications.”
… one or more matrix processing units and one or more vector processing units of an accelerator hardware template; See Han, Figs. 5(a) and 5(b) broadly depicting an architecture template including connection of matrix and vector processing units. Also see Han, section 1 on p. 2, e.g. “At hardware level, we design a new architecture that can works directly on the compressed model that could be efficiently mapped to FPGA.” While implying determination of an FPGA, Han does not expressly disclose determining, by the design framework module, hardware components of …. However, this is taught by Ling. See Ling, ¶ 0023, e.g. “According to an embodiment of the present invention, an architecture description of the design for the CNN accelerator is generated in response to the features of the CNN accelerator. The design for the CNN accelerator may be optimized for the target implementing the CNN accelerator.” Also see ¶ 0028, e.g. “At 203, resources available on a target to implement the CNN accelerator are identified. According to an embodiment of the present invention the target may include one or more target devices of one or more target device types. The resources identified may include a number and type of memory blocks, digital signal processors (DSPs), and other components and processing units on a target device.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Han’s processing units with Ling’s accelerator hardware components in order to optimize a design for available resources as suggested by Ling.
mapping, by the design framework module, the plurality of operations of the flow graph to the accelerator hardware template based on the determining to yield the accelerator instance … See Han, section 1 on p. 2, e.g. “At hardware level, we design a new architecture that can works directly on the compressed model that could be efficiently mapped to FPGA.” Also see section 6 on p. 5, e.g. “Then we design a scheduler that can map the complex LSTM operations on FPGA and achieve parallelism.” Also see Ling, ¶ 0034, e.g. “mapping.” Note that Ling’s mapping is associated with Fig. 1, element 103 and is performed based on a determination of hardware elements.
comprising … code that describes how the one or more matrix processing units and the one or more vector processing units are to be arranged to perform the RNN algorithm, wherein at least one of the one or more matrix processing units, as part of implementing the RNN algorithm, is to … [communicate with] one of the one or more vector processing units …. See Han, Fig. 5(b) depicting connection of matrix and vector processing units. Also see related text in section 4 on p. 4, e.g. “Sparse Matrix-vector Multiplication (SpMV) … ElemMul in Fig.5(b) generates one vector by consuming two vectors.” Also see section 6 on p. 5, e.g. “map the complex LSTM operations on FPGA and achieve parallelism.”  Also see Ling, ¶ 0023, “the design for the CNN accelerator may be generated in a high level design language or a hardware description language.”
Han does not expressly disclose register transfer language (RTL) code. However, this is taught by Jain. See Jain, Abstract, e.g. “The attributes, nodes, and arcs are compiled and reduced to a Register Transfer Level ( RTL) simulation model compatible with present VHDL and Verilog formats.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Han’s mapping with Jain’s RTL in order to provide a well-known standardized description representative of structural aspects of the device which includes a data path comprising numerous physical elements (registers and transfer elements) and a control path which provides control signals to the data path as suggested by Jain (see col. 3, lines 12-24).
Han does not expressly disclose that a matrix processing unit is to provide a value to or receive the value from a vector unit. However, Yang clearly teaches this. See Yang, Fig. 1, elements 122 and 124 along with ¶ 0020, e.g. “Such a reading may include reading image data in a Y-tiled-type storage format via matrix module 124 (e.g., via a matrix pattern adapted for video memory 114). Similarly, such a writing of image data into system memory 112 may include writing image data in a linear-type storage format via vector module 122 (e.g., via a vector pattern adapted for system memory 112).” As depicted in the figure, data is transferred directly from the matrix module to the vector module, and vice versa. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Han’s matrix and vector RNN operations with Yang’s data handling in order to provide fast data processing as essentially suggested by Yang (e.g. see Yang, ¶ 0002).

	In regard to claim 3, Han discloses:
3. The method of claim 1, wherein the determining the hardware components comprises determining a number and type of adders and multipliers, and a number of pipeline stages and [bandwidth] in the adders and the multipliers. See Han, section 4 along with Fig. 5, describing hardware components including the number and types of adders (e.g. “Adder Tree”) and multipliers (e.g. SpMV), assembled as pipeline stages as depicted. Also, Han, p. 3, at the end of section 2, describes requiring 16-bit bandwidth for the weights and activations used in the adders and multipliers.


	In regard to claim 4, Han discloses:
4. The method of claim 1, wherein the mapping is based upon optimization goals indicating properties of the accelerator instance that should be optimized for. See Han, at least section 1 on p. 2, e.g. “load balancing and partitioning both the computation and storage.”

	In regard to claim 5, Han discloses:
5. The method of claim 1, wherein the mapping is based upon one or more dataset properties identifying properties of input data for the RNN algorithm to be used with the accelerator instance. See Han, section 2 on p. 3, e.g. “We compress the model by quantizing 32-bit floating point weights into 12-bit integer.” Also see Han, section 3, generally describing dataset feature considerations.

	In regard to claim 8, Han discloses:
8. The method of claim 1, further comprising: validating a performance of and functionalities of the generated accelerator instance against one or more performance and functional models derived from hardware design constraints and optimization goals. See Han, at least Table 1 and Fig. 6 on p. 5 along with the associated discussion explaining the validation of the accelerator instance against a model of performance speedup.


	In regard to claim 9, Han discloses:
9. The method of claim 1, further comprising at least one of: programming a Field Programmable Gate Array (FPGA), using the accelerator instance, to cause the FPGA to become operable to implement the RNN algorithm; or providing the RTL code to be used as an input to a logic synthesis tool to yield a circuit design for an Application-Specific Integrated Circuit (ASIC). See Han, at least section 5 on p. 4, describing programming of an FPGA.

	In regard to claim 10, Han discloses:
10. The method of claim 1, wherein the RNN algorithm is either: a gated recurrent unit (GRU) RNN variant; or a long short term memory (LSTM) RNN variant. See Han, section 1 on p. 1, e.g. “LSTM.”

	In regard to claim 11, Han discloses:
11. A non-transitory machine readable storage medium having instructions which, when executed by one or more processors of a device, cause the device to implement a design framework module to generate an accelerator instance optimized to implement a recurrent neural network (RNN) algorithm by performing operations comprising: See Han, at least Fig. 5(a), depicting a software program associated with CPU and memory storage which are required for implementation with FPGA.
All further limitations have been addressed in the above rejection of claim 1.



	In regard to claim 20, Han discloses:
20. A device comprising: one or more processors; and one or more non-transitory machine readable storage media having instructions which, when executed by the one or more processors, cause the device to implement a design framework module that is to generate an accelerator instance optimized to implement a recurrent neural network (RNN) algorithm by performing operations comprising: See Han, at least Fig. 5(a), depicting a software program associated with CPU and memory storage which are required for implementation with FPGA.
All further limitations have been addressed in the above rejection of claim 1.

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Han in view of Jain, Ling, and Yang as applied above, and further in view of U.S. Patent 5,434,951 to Kuwata (“Kuwata”) and U.S. Patent 8,909,581 to Oka et al. (“Oka”).

	In regard to claim 2, Han does not expressly disclose the claimed limitations. However, Kuwata and Oka teach the following:
2. The method of claim 1, wherein the obtaining comprises: computing, by the design framework module, the flow graph based upon a plurality of equations corresponding to the RNN algorithm. Kuwata teaches that neural networks are associated with equations. See Kuwata, col. 1, lines 41-50, e.g. “The operating characteristic of the neural network shown in FIG. 1 can be represented by the following equations …” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Han’s neural networks with Kuwata’s equations in order to analyze the mathematical properties of a system as is generally taught by Kuwata and known to those of ordinary skill in the art. Oka teaches construction of a graph corresponding to an equation. See Oka, col. 12, lines 15 and 36-42, e.g. “graphical models of the joint factor graph ri for i=1 . . . N could be easily constructed.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Han’s RNN with Oka’s graph in order to provide mathematical analysis of variables in an equation as suggested by Oka (see col. 7, lines 2-18).

	In regard to claim 12, parent claim 11 is addressed above. All further limitations have been addressed in the above rejection of claim 2.

Claims 6-7 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Han in view of Jain, Ling, and Yang as applied above, and further in view of cited art of record “NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors” by Cheung et al. (“Cheung”).


	In regard to claim 6, Han discloses:
6. The method of claim 1, wherein the mapping further yields a compiler that is executable to program an accelerator, generated based upon the accelerator instance, to execute micro-code to implement the RNN algorithm. See Han, at least Fig. 1 on p. 2, depicting a relationship of mapping/scheduling and compiling to produce FPGA acceleration for execution of the instance of RNN code. While the use of a compiler to program an accelerator is well known, Han does not expressly disclose the use of the compiler to program the accelerator. However, Cheung teaches this. See Cheung, Fig. 3 on p. 4; also see related text at p. 6, bottom right column – p. 7, top left column, describing compilation to program an accelerator. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Han’s algorithms with Cheung’s compiler in order to automate the process of translating neural models onto a hardware system as suggested by Cheung.

	In regard to claim 7, Han discloses:
7. The method of claim 6, wherein the compiler is to program the accelerator by causing a control unit of the accelerator to execute at least some of the micro-code. See Han, Fig. 1 on p. 2, depicting the well-known use of a compiler to program/cause execution of an accelerator. Also see Cheung, Fig. 3 on p. 4; also see related text at p. 6, bottom right column – p. 7, top left column as cited above.




Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James D Rutten whose telephone number is (571)272-3703.  The examiner can normally be reached on M-F 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571)272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).




/James D. Rutten/
Primary Examiner, Art Unit 2121