Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/01/2021 has been entered.
Response to Arguments
In reference to Applicant’s arguments regarding the rejections under 35 U.S.C. 103:
-Applicant’s Argument:
Claims 1-3, 5-10, 12-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Aydonat et al. (US 20170103299, hereinafter Aydonat). 
Independent claims 1, 8, and 15 are amended to recite "generating, based on the model, a systolic array comprising a plurality of interconnected processing elements forming a matrix that each perform identical processes in order to execute a function defined by the first layer of the neural network." When rejecting the pre-amended claim language, the Office appears to map the PE arrays 901-904 in Aydonat to the systolic array recited in claim 1. However, as shown in Fig. 9 of Aydonat, the PE arrays 901-904 are arrays of multiple layers (e.g., the combination of Conv 911, RELU 921, norm […] the fact multiple PEs can be instantiated for each layer does not teach that the PEs for that layer are part of a "systolic array" comprising "a plurality of interconnected processing elements forming a matrix" as recited in claims 1, 8, and 15. For at least this reason, Applicant submits Aydonat does not render these claims obvious. Further, Aydonat does not teach that the PEs for a layer "each perform identical processes in order to execute a function" defined by the layer. To teach PEs performing identical processes, on page 4 of the Final Office Action the Office quotes from para. [0068] of Aydonat which states: 
The output of one convolution layer is streamed into a next convolution layer. Each processing element receives the same streaming feature data that belongs to the same image every cycle to compute an output in the same (x,y) output coordinates in different output planes. Coefficient data is treated as repeated data since the same set of coefficients is used to compute different output feature maps in the same (x,y) output plane.
The very first sentence in this quote provided by the Office states "the output of one convolution layer is streamed into a next convolution layer." Paragraph [0068] is clearly discussing transmitting data between PEs of different layers, not a plurality of PEs performing identical processes to execute the function of one layer. Even further, paragraph [0068] does not say the processes performed by the processing elements are identical. Rather, it clearly states that each PE receives the same streaming feature data that belongs to the same image. The fact that some PEs receive the same data […]
-Examiner’s Response:
Applicant’s arguments with respect to claims 1, 8 and 15 have been considered and are persuasive in view of the amended limitations of claims 1, 8 and 15. However, upon further consideration, a new ground of rejection is made over Aydonat in view of Ross. Please see the rejection below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:


A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Aydonat et al. (Pub. No. US 20170103299, hereinafter Aydonat) in view of Ross et al. (Patent No. US 9710748, hereinafter Ross).
Regarding claim 15, Aydonat teaches a computing system, comprising: a processor (Aydonat, [Fig.14, Par.0087, lines 3-5], “The computer system 1400 includes a processor 1410 that process data signals.”);
and a memory comprising a compiler, wherein the compiler, when executed by the processor, performs an operation comprising (Aydonat, [Fig.14, Par.0087], “The computer system 1400 includes a memory 1420. The memory 1420 may store instructions and code represented by data signals that may be executed by the processor 1410.” The HDL compilation unit 1560 is considered to be the compiler.):
receiving a model (Aydonat, [Fig.15, Par.0098, lines 2-4], “The HDL compilation unit 1560 compiles a description of the design for the CNN accelerator for the target.” Furthermore, see [Par.0089, lines 2-7], “According to an embodiment of the present invention, the EDA tool 1421 operates to identify features of a CNN accelerator which includes characteristics and parameters of the CNN accelerator, and resources of a target that the CNN accelerator is to be implemented on.” The target that CNN 
defining a sequential order of a plurality of pipelined functions performed when executing a first layer in a neural network (Aydonat, [Par.0009, lines 1-4], “According to an embodiment of the present invention, a method for implementing a CNN accelerator on a target includes utilizing one or more processing elements to implement a standard convolution layer” and [Par.0010, lines 6-9], “The CNN accelerator also includes a plurality of processing elements that implement a standard convolutional layer during the first configuration” and furthermore, see [Par.0035], “FIG. 3 illustrates an example of a standard convolution layer implemented by an exemplary embodiment of the present invention. The standard convolution layer may be one of the layers identified at 201, described with reference to FIG. 2. The standard convolution layer receives input features from an input feature map 310. The standard convolution layer also receives a set of coefficients 321-323 generated through a training of the convolution layer. The coefficients 321-323 apply weights which formulate a filter for the convolution layer. The standard convolution layer performs a 3-dimensional dot product between a region 330 defined within the input features 310 and the coefficients 321-323.”, Examiner’s note, a convolutional layer is considered as the first layer of the neural network.  Furthermore, see [Fig. 8] described the sequential order of the plurality of pipelined functions of the convolutional neural network.), [Par.0055], “FIG. 8 illustrates a conceptual view of an exemplary CNN algorithm 800 that may be implemented according to an exemplary embodiment of the present invention. The CNN 800 includes a plurality of layers where each layer transforms one volume of activations 
wherein the neural network comprises a plurality of layers (Aydonat, [Fig.8, Par. 0055, lines 4-6], “The CNN 800 includes a plurality of layers where each layer transforms one volume of activations to another volume through a differentiable function.”);
generating, based on the model (Aydonat, [Par.0006, lines 5-10], “The methodology utilizes an electronic design automation (EDA) tool that generates a design for the CNN accelerator in response to features of a CNN accelerator which may include characteristics and parameters of the CNN accelerator specified by a user, and available resources on a target selected by the user. The target may include one or more target devices of one or more types. The EDA tool assigns resources on the target to implement the CNN accelerator to achieve high performance. For example,resources on the target are assigned to implement appropriately sized buffers to handle the types and sizes of images to be processed by the CNN accelerator. Resources on the target are also assigned to implement the appropriate types and number of computation units, 
[…]
and compiling source code corresponding to the model and the systolic array into a hardware level design (Aydonat, [Fig.6, Par.0043, lines 1-7], “FIG. 6 is a flow chart illustrating a method for compiling a design for a CNN accelerator on a target according to an exemplary embodiment of the present invention. The target may be one or more field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), structured ASICs, or other programmable device.” Furthermore, see [Par.0044- 0048], “At 601, a design for the CNN accelerator is synthesized. Synthesis includes generating a logic design of the system to be implemented by the target. According to an embodiment of the present invention, synthesis generates an optimized logical representation of the system from an HDL design definition. The optimized logical representation of the system may include a representation that has a minimized number of functional blocks, such as logic gates, logic elements, and registers, required for the system. Synthesis also includes mapping the optimized logical representation… Programming the target physically transforms programmable resources on the target into the design of the CNN accelerator” Therefore the neural network compiling the source code logic.)
that provides a static schedule when executing the neural network in a hardware system (Aydonat, [Par.0061], “A sequencer unit 920 orchestrates the sequencing, addressing, and delivery of data to each of the PE arrays 901-904, kernels in each of the PE arrays 901-904, and components in each of the kernels. The 
Aydonat teaches a systolic array comprising a plurality of interconnected processing elements forming a matrix that each perform identical processes in order to execute a function (Aydonat, [Par.0010, lines 6-9], “The CNN accelerator also includes a plurality of processing elements that implement a standard convolutional layer during the first configuration” and [Par.0037, lines 6-8], “The parameters of a CNN algorithm may include a number of processing elements to instantiate for each layer identified”. Therefore, the first configuration of the convolutional layer (first layer) is implemented by a plurality of processing elements. See also [Par.0060, lines 1-8], “Input image pixels are transmitted into the processing element (PE) arrays 901-904 which may perform independent dot-product operations in a convolution procedure. PE array 901 represents a first PE array and PE array 904.” Aydonat further teaches that the systolic array generates the same stream data by using the same coefficient data, which corresponds to the identical processes, as can be seen at [Par.0068, lines 5-11], “Each processing element receives the same streaming feature data that belongs to the same image every cycle to compute an output in the same (x,y) output coordinates in different 
However, Aydonat does not explicitly teach that the function is as defined by the first layer of the neural network;
On the other hand, Ross teaches a function as defined by the first layer of the neural network (Ross, [Column 1, lines 40-46], “and a vector computation unit communicatively coupled to the matrix computation unit and configured to, for each of the plurality of neural network layers: apply an activation function to each accumulated value generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.” Examiner’s note: the activation function is applied to generate the activated values for each of the neural network layers; therefore, the neural network layers obviously include the first layer. Ross further teaches using a two-dimensional systolic array to implement a layer of the neural network, as can be seen at Ross, [Column 5, lines 23-32], “FIG. 3 shows an example architecture 300 including a matrix computation unit. The matrix computation unit is a two-dimensional systolic array 306. The two-dimensional systolic array 306 can be a square array. The array 306 includes multiple cells 304. In some implementations, a first dimension 320 of the systolic array 306 corresponds to columns of cells and a second dimension 322 of the systolic array 306 corresponds to rows of cells. The systolic array can have more rows than columns, more columns than rows, or an equal number of columns and rows”.);
Aydonat and Ross are analogous art because they are in the same field of endeavor, namely implementing a neural network by using a systolic array.
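For illustration only, the two-dimensional systolic array described in the Ross passage quoted above can be sketched in software as a grid of cells that each run the same multiply-accumulate step. The sketch below is not taken from Aydonat or Ross. The function name systolic_matmul, the output-stationary dataflow, and the skewed input timing are assumptions chosen for clarity, and other dataflows are equally possible.

```python
def systolic_matmul(a, b):
    """Simulate an n x m grid of identical cells computing a @ b.

    Assumed dataflow (hypothetical, for illustration): values of `a`
    enter at the left edge and move one cell right per cycle, values
    of `b` enter at the top edge and move one cell down per cycle,
    and every cell keeps its own running sum.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # running sum held in each cell
    a_reg = [[0] * m for _ in range(n)]  # value each cell passes right
    b_reg = [[0] * m for _ in range(n)]  # value each cell passes down

    for t in range(n + m + k - 2):       # cycles until the last cell finishes
        # Visit cells from bottom-right to top-left so each cell reads
        # its neighbours' registers from the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j > 0:
                    a_in = a_reg[i][j - 1]
                else:                     # left edge: row i is delayed by i cycles
                    a_in = a[i][t - i] if 0 <= t - i < k else 0
                if i > 0:
                    b_in = b_reg[i - 1][j]
                else:                     # top edge: column j is delayed by j cycles
                    b_in = b[t - j][j] if 0 <= t - j < k else 0
                # The identical process performed by every cell.
                acc[i][j] += a_in * b_in
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In this sketch the cells differ only in when operands reach them, not in the operation they perform, and the grid may have more rows than columns or the reverse.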

Regarding claim 1, the claim is rejected for the same reasons as claim 15.
Regarding claim 8, the claim is rejected for the same reasons as claim 15.
Additionally, Aydonat also teaches a non-transitory computer-readable storage medium storing instructions, which when executed on one or more processing devices, perform an operation for scheduling a neural network, the operation comprising (Aydonat, [Par. 0106, lines 1-7], “It should be appreciated that embodiments of the present invention may be provided as a computer program product, 
Regarding claim 16, Aydonat teaches the computing system of claim 15, wherein the operation further comprises: configuring a field programmable gate array (FPGA) based on the hardware level design (Aydonat, [Fig.14, Par.0099, lines 5-8], “FIG. 14. The CNN accelerator configuration tool 1600 may be used to configure a system such as a CNN accelerator on one or more target devices such as an FPGA, ASIC, structured ASIC, or other circuitry.”),
wherein the hardware level design comprises register transfer level (RTL) code (Aydonat, [Par.0104], “The CNN accelerator configuration tool 1600 includes a configurable status register unit 1650. The configurable status register unit 1650 sets one or more configurable status registers to support the variation of the CNN accelerator identified. According to an embodiment of the present invention, setting a configurable status register may add or subtract a convolution layer on the CNN accelerator, add or subtract one or more pooling layers, or change a size of a filter.” Furthermore, see [Par.0042], “a description of the design is generated. According to an embodiment of the present invention, the description of the design may be in a hardware description language (HDL) format or other format.”).
Regarding claim 2, the claim is rejected for the same reasons as claim 16.
Regarding claim 9, the claim is rejected for the same reasons as claim 16.
Regarding claim 17, Aydonat teaches the computing system of claim 15, wherein compiling the source code of the systolic array comprises: converting the source code of the systolic array into a two dimensional array of the plurality of interconnected processing elements execute concurrently to process data received at the first layer (Aydonat, [Par.0010, lines 6-9], “The CNN accelerator also includes a plurality of processing elements that implement a standard convolutional layer during the first configuration”. Examiner’s note: the standard convolutional layer implemented during the first configuration is considered to be the first layer. See also [Par.0066, lines 1-6], “One or more processing elements may be used together with off-chip memory interfaces, on-chip buffers and control logic to route data into and out of the one or more processing elements to support computations performed by a variety of algorithms. These computations include matrix multiplication, and 1D/2D/3D convolutions” and [Par.0039, lines 13-16], “According to one embodiment, double buffering is supported to allow writing of new intermediate results from a convolution stage while reading results from a previous stage in a different location in the buffer.” Examiner’s note: because new intermediate results from a convolution stage are written while results from a previous stage are read, the examiner interprets the processing elements as operating in parallel, which corresponds to the PEs executing concurrently.).
Regarding claim 3, the claim is rejected for the same reasons as claim 17.
Regarding claim 10, the claim is rejected for the same reasons as claim 17.
Regarding claim 18, Aydonat teaches the computing system of claim 17, wherein compiling the source code of the systolic array comprises: identifying a plurality of operations performed by each of the plurality of interconnected processing elements (Aydonat, [Par.0041], “a sequencer unit is generated. The sequencer unit coordinates transmission of data to appropriate processing elements on 
wherein each of the plurality of interconnected processing elements perform the same plurality of operations (Aydonat, [Fig. 9, Par.0060, lines 1-7], “Input image pixels are transmitted into the processing element (PE) arrays 901-904 which may perform independent dot-product operations in a convolution procedure. PE array 901 represents a first PE array and PE array 904 represents an nth PE array, where n can be scaled to any number” Examiner’s note: Fig. 9 shows multiple interconnected processing elements (PE arrays) that perform the same operations.);
and assigning the plurality of operations to different hardware elements in the hardware system such that the plurality of operations are able to perform concurrently (Aydonat, [Par.0061, lines 4-13, and Par.0062], “The sequencer unit 920 coordinates the transmission of data to appropriate PE arrays 901-904 in order to time multiplex computations on the PE arrays 901-904. The accumulated results from the PE arrays 901-904 may be transmitted to one of the buffers 951-954 which transmits the computed output layer back to kernel and components in the PE arrays 901-904 for a next round of layer computation. The buffers 951-954 reside on a target device implementing the CNN accelerator 900 and may be referred to as on-chip buffers...” Therefore, the sequencer unit 920 coordinates and modifies the plurality of operations, which may include performing them in parallel. See further [Par.0086, lines 2-9], “The procedures described in these figures may be performed by a sequencer unit implemented by a CNN accelerator, and may be used to program the sequencer unit as described with reference to 503 in FIG. 5. Some of the techniques illustrated may be performed sequentially, in parallel or in an order other than that which is described and that the procedures described may be repeated.”).
Regarding claim 4, the claim is rejected for the same reasons as claim 18.
Regarding claim 11, the claim is rejected for the same reasons as claim 18.
Regarding claim 19, Aydonat teaches the computing system of claim 15, wherein the model comprises a software defined parallelization pragma (Aydonat, [Par.0097], “According to an embodiment of the present invention, information from the buffer allocation unit 1530, computation unit generation unit 1540, and sequencer generation unit 1550 is used to generate a description of the design of the CNN 
indicating the sequential order of the plurality of pipelined functions (Aydonat, [Par.0096], “The sequencer generation unit 1550 generates and programs a sequencer unit that coordinates transmission of data to appropriate processing element arrays on the CNN accelerator, kernels in the processing element arrays, and components in the kernels at appropriate times in order to time multiplex computations on the processing element arrays. According to an embodiment of the present invention, the sequencer unit may be programmed to perform the procedures illustrated with reference to FIGS. 11-13.” Furthermore, see [Fig. 8] described the sequential order of the plurality of pipelined functions of the convolutional neural network.).
Regarding claim 5, the claim is rejected for the same reasons as claim 19.
Regarding claim 12, the claim is rejected for the same reasons as claim 19.
Regarding claim 20, Aydonat teaches the computing system of claim 15, wherein the source code corresponding to the model comprises untimed functional code for the neural network (Aydonat, [Par.0006, lines 12-22], “For example, resources on the target are assigned to implement appropriately sized buffers to handle the types and sizes of images to be processed by the CNN accelerator. Resources on the target are also assigned to implement the appropriate types and number of computation units, such as processing elements, to support the type of filters and layers applied by the CNN accelerator. The EDA tool also generates a sequencer unit that is programmed to coordinate the transmission of data to appropriate computation units in order to time multiplex computations on the computation units.” 
Regarding claim 6, the claim is rejected for the same reasons as claim 20.
Regarding claim 13, the claim is rejected for the same reasons as claim 20.
Regarding claim 7, Aydonat teaches the method of claim 1, wherein the plurality of pipelined functions includes at least one of a convolution unit, a pooling unit (Aydonat, [Par.0060, lines 4-9], “PE array 901 represents a first PE array and PE array 904 represents an nth PE array, where n can be scaled to any number. According to an embodiment of the present invention, each PE array includes hardware components that support layers such as a convolution layer, ReLU layer, normalization layer, and pooling layer.”),
and a matrix multiplier that transmits data to an activation unit in the plurality of pipelined functions (Aydonat, [Par.0066], “One or more processing elements may be used together with off-chip memory interfaces, on-chip buffers and control logic to route data into and out of the one or more processing elements to support computations performed by a variety of algorithms. These computations include matrix multiplication, and 1D/2D/3D convolutions. One or more processing elements may also be used to implement both a standard convolution layer and a fully connected layer at different instances of time.” Furthermore, [Par.0061, lines 1-4], “A sequencer unit 920 orchestrates the sequencing, addressing, and delivery of data to each of the PE 
Regarding claim 14, the claim is rejected for the same reasons as claim 7.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU whose telephone number is (571)272-5747.  The examiner can normally be reached Monday through Thursday, 7:30 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 
/E.T./
Examiner, Art Unit 2126
/BABOUCARR FAAL/Primary Examiner, Art Unit 2184