DETAILED ACTION
This Office Action is in response to the remarks entered on 02/01/2021. Claims 1 and 15 were amended. No claims were added. No claims were cancelled.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.

Applicant's arguments filed on 02/01/2021 have been fully considered but they are not persuasive.
In reference to Applicant’s arguments about: Rejections under 35 U.S.C. §103:
Applicant’s Argument: Rejection of Claims under 35 U.S.C. § 103
Claims 1-3, 5-10, 12-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Aydonat et al. (US 20170103299, hereinafter Aydonat). 
Claims 4, 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Aydonat in view of Bolic et al. (US 20160210167, hereinafter Bolic). 
Independent claims 1, 8, and 15 are amended to recite "generating, based on the model, a systolic array that defines a plurality of interconnected processing elements that perform identical processes as defined by the first layer of the neural network." When rejecting the pre-amended claim language, the Office appears to map the PE arrays 901-904 in Aydonat to the systolic array recited in claim 1. However, as shown in Fig. 9 of Aydonat, the PE arrays 901-904 are arrays of multiple layers (e.g., the 
Examiner’s Response: 
Examiner respectfully disagrees with applicant because Aydonat teaches a layer including one or more processing elements to implement a standard convolution layer; therefore, the convolution layer is considered the first layer that defines the one or more processing elements, as can be seen at [Par.0009, lines 1-4], “According to an embodiment of the present invention, a method for implementing a CNN accelerator on a target includes utilizing one or more processing elements to implement a standard convolution layer” and [Par.0010, lines 6-9], “The CNN accelerator also includes a plurality of processing elements that implement a standard convolutional layer during the first configuration” and [Par.0037, lines 6-8], “The parameters of a CNN algorithm may include a number of processing elements to instantiate for each layer identified”. Therefore, the first configuration of the convolutional layer (first layer) is implemented by a plurality of processing elements. Aydonat further discloses the processing element arrays 901-904, which show processing element arrays including one or more processing elements, wherein each processing element is interconnected with 
Examiner respectfully reminds applicant that Bolic is only brought to cure the specific deficiencies of Aydonat regarding the respective dependent claims. Examiner maintains that Aydonat teaches claims 1, 8 and 15, as explained above in this response. Therefore, the argument is not persuasive, and the rejections of the dependent claims are maintained.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the 

Claims 1-3, 5-10, 12-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Aydonat et al. (Pub. No. US 20170103299, hereinafter Aydonat).
Regarding claim 15, Aydonat teaches a computing system, comprising: a processor (Aydonat, [Fig.14, Par.0087, lines 3-5], “The computer system 1400 includes a processor 1410 that process data signals.”);
and a memory comprising a compiler, wherein the compiler, when executed by the processor performs an operation comprising (Aydonat, [Fig.14, Par.0087], “The computer system 1400 includes a memory 1420. The memory 1420 may store instructions and code represented by data signals that may be executed by the processor 1410.” The HDL compilation unit 1560 is considered the compiler.):
receiving a model (Aydonat, [Fig.15, Par.0098, lines 2-4], “The HDL compilation unit 1560 compiles a description of the design for the CNN accelerator for the target.” Furthermore, see [Par.0089, lines 2-7], “According to an embodiment of the present invention, the EDA tool 1421 operates to identify features of a CNN accelerator which includes characteristics and parameters of the CNN accelerator, and resources of a target that the CNN accelerator is to be implemented on.” The target on which the CNN accelerator is to be implemented includes characteristic and parameter features. Therefore, the target is considered the model.)
defining a sequential order of a plurality of pipelined functions performed when executing a first layer in a neural network (Aydonat, [Par.0009, lines 1-4], [Par.0055], “FIG. 8 illustrates a conceptual view of an exemplary CNN algorithm 800 that may be implemented according to an exemplary embodiment of the present invention. The CNN 800 includes a plurality of layers where each layer transforms one volume of activations to another volume through a differentiable function. The CNN 800 includes five convolution layers 811-815. The convolution layer computes an output of neurons that are connected to local regions in an input. The convolution layer computes a dot product between its coefficients (weights) and the region it is connected to in an input volume. According to an .
wherein the neural network comprises a plurality of layers (Aydonat, [Fig.8, Par. 0055, lines 4-6], “The CNN 800 includes a plurality of layers where each layer transforms one volume of activations to another volume through a differentiable function.”);
generating, based on the model (Aydonat, [Par.0006, lines 5-10], “The methodology utilizes an electronic design automation (EDA) tool that generates a design for the CNN accelerator in response to features of a CNN accelerator which may include characteristics and parameters of the CNN accelerator specified by a user, and available resources on a target selected by the user. The target may include one or more target devices of one or more types. The EDA tool assigns resources on the target to implement the CNN accelerator to achieve high performance. For example, resources on the target are assigned to implement appropriately sized buffers to handle the types and sizes of images to be processed by the CNN accelerator. Resources on the target ,
a systolic array that defines a plurality of interconnected processing elements that perform identical processes as defined by the first layer of the neural network (Aydonat, [Par.0010, lines 6-9], “The CNN accelerator also includes a plurality of processing elements that implement a standard convolutional layer during the first configuration” and [Par.0037, lines 6-8], “The parameters of a CNN algorithm may include a number of processing elements to instantiate for each layer identified”. Therefore, the first configuration of the convolutional layer (first layer) is implemented by a plurality of processing elements. Aydonat further discloses the processing element arrays 901-904, which show processing element arrays including one or more processing elements, wherein each processing element is interconnected with the others, as can be seen at [Par.0060]. Aydonat further discloses the plurality of interconnected processing elements that perform identical processes defined by one layer, as can be seen at [Par.0068, lines 4-11], “The output of one convolution layer is streamed into a next convolution layer. Each processing element receives the same streaming feature data that belongs to the same image every cycle to compute an output in the same (x,y) output coordinates in different output planes. Coefficient data is treated as repeated data since the same set of coefficients is used to compute different output feature maps in the same (x,y) output plane.”);
and compiling source code corresponding to the model and the systolic array into a hardware level design (Aydonat, [Fig.6, Par.0043, lines 1-7], “FIG. 6 is a 
that provides a static schedule when executing the neural network in a hardware system (Aydonat, [Par.0061], “A sequencer unit 920 orchestrates the sequencing, addressing, and delivery of data to each of the PE arrays 901-904, kernels in each of the PE arrays 901-904, and components in each of the kernels. The sequencer unit 920 coordinates the transmission of data to appropriate PE arrays 901-904 in order to time multiplex computations on the PE arrays 901-904. The accumulated results from the PE arrays 901-904 may be transmitted to one of the buffers 951-954 which transmits the computed output layer back to kernel and components in the PE arrays 901-904 for a next round of layer computation. The buffers 951-954 reside on a 
Regarding claim 1, the claim is rejected for the same reason as claim 15.
Regarding claim 8, the claim is rejected for the same reason as claim 15.
Additionally, Aydonat teaches a non-transitory computer-readable storage medium storing instructions, which when executed on one or more processing devices, perform an operation for scheduling a neural network, the operation comprising (Aydonat, [Par. 0106, lines 1-7], “It should be appreciated that embodiments of the present invention may be provided as a computer program product, or software, that may include a computer-readable or machine-readable medium having instructions. The instructions on the computer-readable or machine-readable medium may be used to program a computer system or other electronic device.”).
Regarding claim 16, Aydonat teaches the computing system of claim 15, wherein the operation further comprises: configuring a field programmable gate array (FPGA) based on the hardware level design (Aydonat, [Fig.14, Par.0099, lines 5-8], “FIG. 14. The CNN accelerator configuration tool 1600 may be used to configure a system such as a CNN accelerator on one or more target devices such as an FPGA, ASIC, structured ASIC, or other circuitry.”),
wherein the hardware level design comprises register transfer level (RTL) code (Aydonat, [Par.0104], “The CNN accelerator configuration tool 1600 includes a configurable status register unit 1650. The configurable status register unit 1650 sets one or more configurable status registers to support the variation of the CNN .
Regarding claim 2, the claim is rejected for the same reason as claim 16.
Regarding claim 9, the claim is rejected for the same reason as claim 16.
Regarding claim 17, Aydonat teaches the computing system of claim 15, wherein compiling the source code of the systolic array comprises: converting the source code of the systolic array into a two dimensional array of the plurality of interconnected processing elements execute concurrently to process data received at the first layer (Aydonat, [Par.0066, lines 1-6], “One or more processing elements may be used together with off-chip memory interfaces, on-chip buffers and control logic to route data into and out of the one or more processing elements to support computations performed by a variety of algorithms. These computations include matrix multiplication, and 1D/2D/3D convolutions” and [Par.0039, lines 13-16], “According to one embodiment, double buffering is supported to allow writing of new intermediate results from a convolution stage while reading results from a previous stage in a different location in the buffer.” Examiner’s note: because new intermediate results from a convolution stage are written while results from a previous stage are read, examiner interprets the processing elements (PEs) as processing in parallel, i.e., concurrently.).
Regarding claim 3, the claim is rejected for the same reason as claim 17.
Regarding claim 10, the claim is rejected for the same reason as claim 17.
Regarding claim 19, Aydonat teaches the computing system of claim 15, wherein the model comprises a software defined parallelization pragma (Aydonat, [Par.0097], “According to an embodiment of the present invention, information from the buffer allocation unit 1530, computation unit generation unit 1540, and sequencer generation unit 1550 is used to generate a description of the design of the CNN accelerator. The description of the design may be in HDL format or other format”, wherein the HDL format includes a parallelization pragma.)
indicating the sequential order of the plurality of pipelined functions (Aydonat, [Par.0096], “The sequencer generation unit 1550 generates and programs a sequencer unit that coordinates transmission of data to appropriate processing element arrays on the CNN accelerator, kernels in the processing element arrays, and components in the kernels at appropriate times in order to time multiplex computations on the processing element arrays. According to an embodiment of the present invention, the sequencer unit may be programmed to perform the procedures illustrated with reference to FIGS. 11-13.” Furthermore, see [Fig. 8], which depicts the sequential order of the plurality of pipelined functions of the convolutional neural network.).
Regarding claim 5, the claim is rejected for the same reason as claim 19.
Regarding claim 12, the claim is rejected for the same reason as claim 19.
Regarding claim 20, Aydonat teaches the computing system of claim 15, wherein the source code corresponding to the model comprises untimed functional code for the neural network (Aydonat, [Par.0006, lines 12-22], “For . 
Regarding claim 6, the claim is rejected for the same reason as claim 20.
Regarding claim 13, the claim is rejected for the same reason as claim 20.
Regarding claim 7, Aydonat teaches the method of claim 1, wherein the plurality of pipelined functions includes at least one of a convolution unit, a pooling unit (Aydonat, [Par.0060, lines 4-9], “PE array 901 represents a first PE array and PE array 904 represents an nth PE array, where n can be scaled to any number. According to an embodiment of the present invention, each PE array includes hardware components that support layers such as a convolution layer, ReLU layer, normalization layer, and pooling layer.”),
and a matrix multiplier that transmits data to an activation unit in the plurality of pipelined functions (Aydonat, [Par.0066], “One or more processing .
Regarding claim 14, the claim is rejected for the same reason as claim 7.

Claims 4, 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Aydonat in view of Bolic et al. (Pub. No. US 20160210167, hereinafter Bolic).
Regarding claim 18, Aydonat teaches the computing system of claim 17, wherein compiling the source code of the systolic array comprises: identifying a plurality of operations performed by each of the plurality of interconnected processing elements (Aydonat, [Par.0041], “a sequencer unit is generated. The sequencer unit coordinates transmission of data to appropriate processing elements on the CNN accelerator at appropriate times in order to time multiplex computations on the processing elements. According to an embodiment of the present invention, the sequencer unit is programmed to perform the coordination required to support the algorithms performed by the CNN accelerator. The sequencer unit may be generated using logic array blocks, registers, and/or a hard or soft processing unit available on a ,
 […]
and assigning the plurality of operations to different hardware elements in the hardware system such that the plurality of operations are able to perform concurrently (Aydonat, [Par.0061, lines 4-13, and Par.0062], “The sequencer unit 920 coordinates the transmission of data to appropriate PE arrays 901-904 in order to time multiplex computations on the PE arrays 901-904. The accumulated results from the PE arrays 901-904 may be transmitted to one of the buffers 951-954 which transmits the computed output layer back to kernel and components in the PE arrays 901-904 for a next round of layer computation. The buffers 951-954 reside on a target device implementing the CNN accelerator 900 and may be referred to as on-chip buffers...” Therefore, the sequencer unit 920 coordinates and modifies the plurality of operations, which may include parallel performance. See further [Par.0086, lines 2-9], “The procedures described in these figures may be performed by a sequencer unit implemented by a CNN accelerator, and may be used to program the sequencer unit as .
However, Aydonat does not teach wherein each of the plurality of interconnected processing elements perform the same plurality of operations;
On the other hand, Bolic teaches wherein each of the plurality of interconnected processing elements perform the same plurality of operations (Bolic, [Par.0023, lines 2-13], “In some examples, a coprovisor component may be configured to multiplex multiple domains' requests to access a hardware accelerator such as a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or a comparable accelerator in a paravirtualized environment. Hyper-requesting may be employed for hardware acceleration virtualization, where a hardware acceleration module concurrently loads a portion of data of a request for a first accelerator application and a portion of data of another request for a second accelerator application and simultaneously processes the two portions of data.” Furthermore, see [Abstract, lines 7-12], “Hyper-requesting may be employed for hardware acceleration virtualization, where a hardware acceleration module concurrently loads a portion of data of a request for a first accelerator application and a portion of data of another request for a second accelerator application and simultaneously processes the two portions of data.”);
Aydonat and Bolic are analogous art because they are in the same field of endeavor of implementing a convolutional neural network (CNN) accelerator in data processing.

Regarding claim 4, the claim is rejected for the same reason as claim 18.
Regarding claim 11, the claim is rejected for the same reason as claim 18.

 
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

The prior art made of record on the PTO-892 and not relied upon is considered pertinent to applicant’s disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU whose telephone number is (571)272-5747.  The examiner can normally be reached on 7:30 - 5:00 M-TH.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached on (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  

/E.T./ Examiner, Art Unit 2126
/BABOUCARR FAAL/Primary Examiner, Art Unit 2184