DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is in response to the application submitted on 7/25/2018.
Claims 1-20 are presented for examination.
Oath/Declaration
For the record, Examiner acknowledges that the Oaths/Declarations submitted on 7/25/2018 have been received.
Information Disclosure Statement
The information disclosure statement submitted on 7/25/2018 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is considered by the examiner.
Drawings
The drawings are acceptable for the purposes of examination.
Specification
In paragraph [0033], line 1, “At a block 420” should be “At block 420”.
In paragraph [0035], line 1, “in order to processing” should be “in order to process”.
In paragraph [0044], line 2, “associated the layer” should be “associated with the layer”.
In paragraph [0047], line 3, “which equals” should be “which is equal to”.
In paragraph [0055], line 3, “the layer” should be “layer”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 12-13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma et al. (From High-Level Deep Network Models to FPGAs, herein Sharma) in view of Wei et al. (Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs, herein Wei).
Regarding claim 1,
	Sharma teaches a computer-implemented method for improving deep neural network performance in a field-programmable gate array, the method comprising: (Sharma, Page 1, Column 1, Paragraph 1, Line 10 “This work tackles these challenges by devising DNNWEAVER, a framework that automatically generates a synthesizable accelerator for a given (DNN, FPGA) pair from a high-level specification in Caffe[1].”  In other words, DNNWEAVER is a computer-implemented method, generates a synthesizable accelerator is improving performance, DNN is deep neural network, and FPGA is field-programmable gate array.)
	in response to receiving a network model describing a deep neural network, determining a plurality of layers associated with the deep neural network; (Sharma, Page 2, Column 2, Paragraph 1, Line 12 “The input to DNNWEAVER is a high-level specification of the DNN in Berkeley Caffe format [1]. Caffe is a widely used open-source deep learning framework that takes the DNN specification as input and computes the given model on CPUs and GPUs. The code snippet in Figure 2. shows how two DNN layers, convolution and pooling, are described and connected in Caffe.” In other words, high-level specification is network model describing a deep neural network, and computes the given model on CPUs and GPUs is determining a plurality of layers associated with the deep neural network.)
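For illustration only, the step of identifying layers from a Caffe-style network model can be sketched as below. This is a minimal sketch and is not taken from Sharma or from the application: the sample specification text, the function name list_layers, and the regular expression are all assumptions, and the pattern only handles layer blocks whose name and type fields appear before any nested parameter block.

```python
import re

# Hypothetical Caffe-style specification with two layers (illustrative only).
PROTOTXT = '''
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
}
'''

# Assumed pattern: a "layer {" opener followed by name and type fields,
# with no intervening braces (nested parameter blocks are not parsed).
LAYER_PATTERN = re.compile(
    r'layer\s*\{[^{}]*?name:\s*"([^"]+)"[^{}]*?type:\s*"([^"]+)"'
)


def list_layers(spec_text):
    """Return (name, type) pairs for each layer block found in spec_text."""
    return LAYER_PATTERN.findall(spec_text)


layers = list_layers(PROTOTXT)
for name, layer_type in layers:
    print(name, layer_type)
```

A production parser would read the specification with Caffe's own protobuf definitions rather than a regular expression; the sketch only shows that a plurality of layers can be enumerated from the model text.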

[Image: media_image1.png]

	Thus far, Sharma does not explicitly teach with respect to a layer in the plurality of layers, determining a parallelism factor for processing operations associated with the layer simultaneously by processing elements in a field-programmable gate array (FPGA) based on a workload associated with the layer and a configuration of the FPGA.
	Wei teaches with respect to a layer in the plurality of layers, determining a parallelism factor for processing operations associated with the layer simultaneously by processing elements in a field-programmable gate array (FPGA) based on a workload associated with the layer and a configuration of the FPGA. (Wei, Page 2, Column 1, Paragraph 7, Line 1 “We present a novel 2-D systolic array architecture for CNN on FPGA in Fig. 1.  As shown in this figure, each PE shifts the data of W and IN horizontally and vertically to the neighboring PEs at each cycle.  This 2-D topology matches the 2-D structure in the FPGA layout so that it can achieve timing constraints easily because of low routing complexity.  In addition, there is a SIMD vector accumulation inside each PE.  The parallelization factor of the SIMD factor is usually power of two due to the dedicated inter-DSP accumulation interconnect in modern FPGAs.” In other words, parallelization factor is parallelism factor, FPGA is FPGA, and due to the dedicated inter-DSP accumulation interconnect is workload associated with the layer and the configuration of the FPGA.)

[Image: media_image2.png]

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Wei with that of Sharma.  The combination would make it possible to determine a parallelism factor for a layer in a deep neural network (DNN).
	Wei and Sharma are both directed to frameworks that optimize deep neural network (DNN) design for implementation on a field-programmable gate array (FPGA).  One of ordinary skill in the art would have been motivated to combine the teaching of Wei with that of Sharma in order to determine the parallelism factor for layers in a DNN.
Regarding claim 2,
	Sharma teaches the computer-implemented method of Claim 1, wherein: the workload associated with the layer comprises an amount of operations associated with the layer; and the configuration of the FPGA comprises a total bandwidth required for processing operations associated with the plurality of layers in the FPGA and a bandwidth of a memory in the FPGA. (Sharma, Page 2, Column 2, Paragraph 2, Line 12 “We choose this abstraction to provide a unified hardware-software interface and enable layer-specific optimization in the accelerator microarchitecture without exposing them to the software.” and Sharma, Page 2, Column 2, Paragraph 3, Line 12 “Our Template Resource Optimization algorithm aims to strike a balance between parallel operations and data reuse by slicing computations and configuring the accelerator to best match the constraints of the FPGA (on-chip memory and external memory bandwidth).” In other words, layer-specific optimization is workload associated with the layer, and constraints of the FPGA (on-chip memory and external memory bandwidth) is configuration of the FPGA comprises a total bandwidth required for processing operations.)
Claims 12 and 13 are computer system claims corresponding to method claims 1 and 2 respectively.  They are otherwise the same.  It is implicit that a computer-implemented method will be implemented on a computer system with at least one computer processor and at least one computer-readable memory unit.  Therefore, claims 12 and 13 are rejected for the same reasons as claims 1 and 2 respectively.
Claim 20 is a computer program product claim, comprising a computer-readable storage medium, that corresponds to method claim 1.  They are otherwise the same.  It is implicit that a computer-implemented method would have at least one computer-readable storage medium.  Therefore, claim 20 is rejected for the same reasons as claim 1.
Allowable Subject Matter
Claim 3 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: Claim 3 requires, among other things, a method for determining a parallelism factor PFi = (Nopsi × ABW) / NTBW, wherein the parallelism factor for a given layer i (PFi) is calculated by the amount of operations per layer (Nopsi) multiplied by the bandwidth (ABW) of the memory in the FPGA, the product of which being divided by the total bandwidth (NTBW) of the FPGA. NTBW is calculated by [equation image: media_image4.png], where BPOi is the amount of bits to be loaded into the FPGA for one operation for layer i.  BPOi is calculated by [equation image: media_image5.png], where DWi is the bit width of the weights, Hi is the height of an output feature map, and Ri is the reuse factor.
	Sharma teaches a method for implementing and accelerating a deep neural network (DNN) in a field-programmable gate array (FPGA) but does not teach calculating a parallelism factor based on the amount of operations associated with the layer and the bandwidth of memory in the FPGA, the product of which is divided by the total bandwidth. Other references also teach accelerating DNNs in an FPGA, see Wei and Zhang et al. (Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks), but all fail to teach calculating a parallelism factor as claimed in the present invention.
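For illustration only, the relationship recited in the statement of reasons above, in which the parallelism factor for layer i equals the amount of operations for that layer multiplied by the memory bandwidth and divided by the total bandwidth, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the numeric values, and the treatment of NTBW as a precomputed input are hypothetical. The expressions for NTBW and BPOi are not reproduced in this action and are therefore not implemented.

```python
def parallelism_factors(nops, abw, ntbw):
    """Compute PFi = Nopsi * ABW / NTBW for every layer i.

    nops: per-layer operation counts (Nopsi)
    abw:  bandwidth of the memory in the FPGA (ABW)
    ntbw: total bandwidth (NTBW), assumed here to be supplied by the caller
    """
    if ntbw <= 0:
        raise ValueError("NTBW must be positive")
    return [n * abw / ntbw for n in nops]


# Hypothetical values chosen only to make the arithmetic easy to follow.
nops = [1000, 4000, 2000]   # operations per layer
abw = 128.0                 # memory bandwidth
ntbw = 64000.0              # total required bandwidth

factors = parallelism_factors(nops, abw, ntbw)
print(factors)  # [2.0, 8.0, 4.0]
```

With these assumed inputs, the layer with four times as many operations receives four times the parallelism factor, which reflects the proportionality between PFi and Nopsi stated above.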
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359.  The examiner can normally be reached on Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on 571-270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/B.I.R./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124