DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-10, 12-18, and 20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Xu et al. US 2010/0076915.
Regarding claims 1, 11, and 20, Xu teaches “a method for pipelining tasks submitted to a neural network accelerator” ([0037] “One or more neural network algorithms may be implemented on one or more of the FPGAs with direct parallel architecture and/or pipelined architecture to exploit both application parallelism and direct functional logic implementation”), the method comprising: “receiving a first task from a neural network application to be processed by the neural network accelerator” ([0139] “In each data block, some bytes are combined to contain the same data entry. Multiple data rows provide a data block for a query, and such multi-row data blocks arriving sequentially representing one query at a time. For example, data format 600 illustrates two data blocks, one for query 0 and the other for query 1”, where a query corresponds to receiving a task);
“generating, using one or more computer processors, a packet that contains information used by multiple stages in a pipeline” ([0139] “FIG. 7 shows an exemplary data format used for data streaming from a host computer (e.g., 514) to an FPGA (e.g., 506). The data format 700 represents a data stream in which each row represents a data row containing eight bytes (64 bits of data) arriving at the same time.”); 
“processing the packet at the multiple stages, wherein at least one of the multiple stages performs a call to a hardware system executing the neural network accelerator” ([0125] “FIG. 6 illustrates an exemplary workflow 600 for using an FPGA accelerator such as shown in FIGS. 4-5 to train a neural network. The workflow 600 shows the actions taken by or occurring in software and hardware in the training processing, as well as the alternation there between”, where the software application and the hardware accelerator are the multiple stages), “and wherein the pipeline processes at least one other packet corresponding to a second task in parallel with processing the packet” ([0114] “Given the training data from host computer, a series of operations are performed in multiple computation units (or processing elements) in the processing engine. This design exploits the parallelism without explicitly managing allocation, synchronization, or communication among the computation units. These computation units may be carefully to enable the pipelined, high-bandwidth processing of the training data.”); and
“returning results of processing the packet using the pipeline to the neural network application” ([0127] “At (hh), the hardware polls the prepare register until the register is pulled up by software, and then sends training results to the software”).

Regarding claims 2 and 12, Xu further teaches “wherein processing the packet at the multiple stages comprises: processing the packet at a pre-processing stage” (fig. 6, excerpt in media_image1.png);
“processing the packet at an execute stage occurring after the pre-processing stage, wherein the call to the hardware system occurs during the execute stage” (fig. 6, excerpt in media_image2.png); and
“processing the packet at a post-processing stage after the execute stage” (fig. 6, excerpt in media_image3.png).

Regarding claims 3 and 13, Xu further teaches “wherein processing the packet at the pre-processing stage comprises: converting data corresponding to the first task from a first format used by the neural network application to a second format used by the hardware system, wherein processing the packet at the post-processing stage comprises converting the results from the second format to the first format” ([0139] “The data format 700 represents a data stream in which each row represents a data row containing eight bytes (64 bits of data) arriving at the same time. In the illustrated example, all data is 64-bit aligned, and the blank parts are complemented by zeros”).
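For illustration only, the format conversion recited in claims 3 and 13 can be sketched as padding application data into fixed-width, zero-filled rows and reversing that step on the way back. This is a minimal sketch under stated assumptions, not the implementation of Xu or of the applicant: the function names, the eight-byte default row width (borrowed from the 64-bit rows quoted from [0139]), and the sample payload are all hypothetical.

```python
def to_hardware_rows(payload: bytes, row_bytes: int = 8) -> list[bytes]:
    """Split payload into fixed-width rows, zero-padding the final row.

    Hypothetical helper: row_bytes=8 mirrors the 64-bit rows quoted above.
    """
    # Round the length up to the next multiple of row_bytes.
    padded_len = -(-len(payload) // row_bytes) * row_bytes
    padded = payload.ljust(padded_len, b"\x00")
    return [padded[i:i + row_bytes] for i in range(0, padded_len, row_bytes)]


def from_hardware_rows(rows: list[bytes], original_len: int) -> bytes:
    """Reverse the conversion: join the rows and drop the zero padding."""
    return b"".join(rows)[:original_len]


# A 12-byte payload occupies two 8-byte rows; the second is zero-filled.
rows = to_hardware_rows(b"query-0-data")
restored = from_hardware_rows(rows, 12)
```

The original length is carried separately because trailing zero bytes in the payload would otherwise be indistinguishable from padding.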
Regarding claim 4, Xu teaches “wherein each of the multiple stages comprises a respective thread that processes the packet independently from the other threads” ([0116] “The PCI board 404, and/or devices thereon, may communicate with the bus 416 through a PCI controller 418. The computation logic blocks of FPGA 406 are programmed to comprise a hidden layer processing engine hPE 460 and the output layer processing engine oPE 456. Each engine is configured to perform computation associated with the respective layer of the LambdaRank algorithm being implemented, hidden layer by hPE 460 and the output layer by oPE 456, respectively. The hidden layer processing engine hPE 460 has a plurality of processing elements PE0-PE19, each representing a hidden node of the hidden layer”).
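For illustration only, the claim language of claims 1, 2, and 4 (a packet passing through pre-processing, execute, and post-processing stages, each stage running in its own thread so that a second packet can be in flight at the same time) can be sketched with standard-library queues and threads. This is a hedged sketch, not the method of Xu or of the applicant: every function name and packet field below is hypothetical, and the execute stage uses a simple sum as a stand-in for the call to the hardware system.

```python
import queue
import threading


def stage(fn, inbox, outbox):
    """Apply fn to each packet from inbox; None is the shutdown signal."""
    while True:
        packet = inbox.get()
        if packet is None:
            outbox.put(None)  # pass the shutdown signal downstream
            return
        outbox.put(fn(packet))


def pre_process(packet):
    # Convert application data into the (assumed) hardware input format.
    return {**packet, "hw_input": [float(x) for x in packet["data"]]}


def execute(packet):
    # Stand-in for the hardware call; a real system would invoke a driver.
    return {**packet, "hw_output": sum(packet["hw_input"])}


def post_process(packet):
    # Reduce the packet to the result returned to the application.
    return {"task": packet["task"], "result": packet["hw_output"]}


def run_pipeline(tasks):
    stages = [pre_process, execute, post_process]
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for task in tasks:
        queues[0].put(task)
    queues[0].put(None)
    results = []
    while (item := queues[-1].get()) is not None:
        results.append(item)
    for t in threads:
        t.join()
    return results


results = run_pipeline([
    {"task": 1, "data": [1, 2]},
    {"task": 2, "data": [3, 4]},
])
```

Because each stage is a single thread reading from a FIFO queue, results come back in submission order while different packets occupy different stages concurrently.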
Regarding claims 5 and 14, Xu teaches “further comprising: generating a memory map that maps allocated blocks of memory for the neural network application to allocated blocks of memory for the neural network accelerator in the hardware system” ([0119] “One suitable commercial example that can be configured as FPGA 406 is Altera Stratix-II FPGA, which has logic array blocks (LABs) that can be programmed as processing elements, and memory block structures (such as M512 RAM, M4K RAM, and M-RAM blocks) that can be used as the internal memory as described herein.”, wherein this functionality is inherent to the FPGA system); and
“converting first memory addresses received from the neural network application based on the memory map into second memory addresses for memory blocks in the hardware system” (previous citation and [0126] “application software calls the write routine in the driver installed on the host computer to write the data to memories on the FPGA accelerator. The write routine may be implemented with a direct memory access (DMA) method to achieve high bandwidth access to the accelerator. At (c), the software sends instructions to set the initialize register (a register for initialization) and starts to write the initialized data onto the accelerator hardware with the DMA write”).
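For illustration only, the memory map and address conversion recited in claims 5 and 14 can be sketched as a table of block pairs consulted on each translation. This is a minimal sketch under stated assumptions, not a mechanism disclosed by Xu: the `MemoryMap` class, its methods, and all addresses are hypothetical.

```python
class MemoryMap:
    """Maps application (host) memory blocks to accelerator memory blocks."""

    def __init__(self):
        # Each entry is (host_base, size, device_base).
        self._blocks = []

    def add_block(self, host_base, size, device_base):
        self._blocks.append((host_base, size, device_base))

    def translate(self, host_addr):
        """Convert a host address into the matching device address."""
        for host_base, size, device_base in self._blocks:
            if host_base <= host_addr < host_base + size:
                # Preserve the position of the address within its block.
                return device_base + (host_addr - host_base)
        raise KeyError(f"address {host_addr:#x} is not mapped")


memory_map = MemoryMap()
memory_map.add_block(host_base=0x1000, size=0x100, device_base=0x8000)
memory_map.add_block(host_base=0x4000, size=0x200, device_base=0x9000)
device_addr = memory_map.translate(0x4010)
```

A linear scan is enough for a handful of blocks; a sorted list with binary search would be a reasonable alternative if many blocks were mapped.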
Regarding claims 6 and 15, Xu teaches “further comprising: transferring weights used to perform multiple layers of a neural network to the hardware system in a matrix format” (fig. 7, wherein the Gains are weights and the format is a matrix format);
“and identifying a subset of the weights corresponding to a new task” (fig. 4, items 460 and 456, and [0116] “computation logic blocks of FPGA 406 are programmed to comprise a hidden layer processing engine hPE 460 and the output layer processing engine oPE 456”, wherein each engine using a different subset of the weights in a computation necessitates identifying the subsets of weights, and [0148] “Each instance of ALU0 has access to corresponding data such as a threshold, an output node weight, an intermediate hidden node output, a hidden node weight O and the term 1-O.sup.2”);
“and transmitting an offset to the hardware system indicating the subset of the weights that are to be used when processing the packet” (previous citation; executing the operations for the corresponding layers necessarily retrieves the weights from the internal memory via the addressing mechanism of the FPGA, which addresses through registers as base plus offset, thus meeting the claim limitation).
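For illustration only, the base-plus-offset reading of claims 6 and 15 can be sketched as selecting one layer's weights from a single flat weight buffer by offset. This is a hedged sketch, not the addressing scheme of Xu or of the applicant: the `layer_weights` helper, the layer shapes, and the weight values are hypothetical.

```python
def layer_weights(flat_weights, offset, rows, cols):
    """Return the rows x cols weight matrix that starts at offset."""
    end = offset + rows * cols
    if offset < 0 or end > len(flat_weights):
        raise ValueError("offset and shape fall outside the weight buffer")
    block = flat_weights[offset:end]
    return [block[r * cols:(r + 1) * cols] for r in range(rows)]


# One buffer holding all layers: a 2x3 hidden layer, then a 2x2 output layer.
all_weights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
hidden = layer_weights(all_weights, offset=0, rows=2, cols=3)
output = layer_weights(all_weights, offset=6, rows=2, cols=2)
```

Under this reading, the weights are transferred once and each later task only needs to carry the offset that selects its subset.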
Regarding claims 7 and 16, Xu teaches “further comprising: obtaining a metric regarding execution of the neural network accelerator on the hardware system” ([0128] “Register access (write/read) is used to communicate status and control signals between the software and the hardware.”);
“outputting for display a visual representation of the metric” (fig. 2 shows displays for displaying the metric); and
“adjusting hardware resources executing the multiple stages in the pipeline to increase utilization of the neural network accelerator” ([0147] “The LambdaRank, for example, mainly include three steps: forward process (FP), lambda calculation, and backward propagation (BP), according to the algorithm described herein. In these three steps, forward process and backward propagation share many basic calculation items. For efficient usage of FPGA board resources, one embodiment combines the FP and BP in one computation unit or processing engine. Moreover, a refined structured pipeline is integrated to the computation unit.”).
Regarding claims 8 and 17, Xu teaches “wherein the multiple stages in the pipeline are defined in a library” (figure 11, which shows multiple pipelines), “wherein the library comprises an application program interface (API) that is configured to permit different types of neural network applications to use the multiple stages in the pipeline to submit tasks to the neural network accelerator” ([0008] “the hardware logic of the FPGA includes a processing element performing computations relating to a hidden layer of the neural network training algorithm. The processing element may have a plurality of arithmetic logic units each representing a hidden node of the hidden layer. Each arithmetic logic unit may include a plurality of multiple-pipeline multipliers and a plurality of multiple-pipeline adders. The multipliers and the adders can be based on floating-point numbers to improve weight precision. The plurality of arithmetic logic units performs parallel computations”).
Regarding claims 9 and 18, Xu further teaches “further comprising: customizing a plurality of fields in the packet used to provide information to the neural network accelerator, wherein the customized plurality of fields vary according to a type of the neural network accelerator, wherein different types of neural network accelerators use different fields” (fig. 7, where each field is customized to the specific processing being performed).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 11 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. US 2010/0076915 in view of Canoy et al. US 2015/0066826.
Regarding claims 11 and 19, the Xu reference has been addressed above. Xu does not explicitly teach debugging. Canoy, however, teaches “further comprising: determining a debugging function is active” (Canoy [0076] “Debugging is the process of detecting the issue in a neural processor. Types of debugging include telemetry monitoring and halting of execution followed by an inspection of neural state.”);
“and switching between submitting the packet to the neural network accelerator to submitting the packet to the one or more computer processors in a host executing the neural network application” ([0078] “This detection is done seamlessly during execution without monitoring from an external entity. In contrast, the monitoring may be accomplished with a breakpoint determination unit, which may be internal to an artificial nervous system. For certain aspects, the breakpoint determination unit may be implemented as a breakpoint neuron, which may behave similarly to a typical spiking neuron in the neural processor, except that the breakpoint neuron may be configured to generate a notification event under a specific condition. This generated notification event may lead to a suspension of spike processing.”).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Xu with those of Canoy, since a combination of known methods would yield predictable results; that is, debugging has been known in the art and thus would operate in a normal and predictable manner with the system of Xu, as described in Canoy.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN W FIGUEROA whose telephone number is (571)272-4623.  The examiner can normally be reached on Monday-Friday, 10AM-6PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG can be reached on (571)270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/Kevin W Figueroa/
Examiner, Art Unit 2124