Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is in response to the amendment filed on 12/28/2021. Claims 1, 3, 4, 8, 10, 11, 15, 17, and 18 were amended. No claims were added or canceled.
Claims 1-20 are presented for examination.
Response to Argument
Applicant’s Argument:
Aydonat does not teach "a parallelized systolic array" generated "from source code," and Ross fails to cure this deficiency. Ross at column 1, lines 15-16 and 29-30 teaches: "This specification relates to computing neural network inferences in hardware. ... In general, this specification describes a special-purpose hardware circuit that computes neural network inferences." Ross at column 5, lines 1-6 teaches: "In some implementations, the matrix computation unit 212 is a two-dimensional systolic array. The matrix computation unit 212 can also be a one-dimensional systolic array or other circuitry that can perform mathematical operations, e.g., multiplication and addition." Ross teaches a systolic array, which is circuitry. Aydonat and Ross fail to teach "a parallelized systolic array" generated "from source code," which is then compiled by a compiler. Thus, for at least this reason, Applicant submits the combination of references do not render claims 1, 8, and 15 obvious.
Examiner’s Response: 
The Examiner respectfully disagrees with Applicant's argument that Aydonat does not teach "a parallelized systolic array" generated "from source code," because as it can 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:


A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Aydonat et al. (Pub. No. US20170103299, hereinafter Aydonat) in view of Ross et al. (Patent No. US9710748, hereinafter Ross).
Regarding claim 15, Aydonat teaches a computing system, comprising: a processor (Aydonat, [Fig.14, Par.0087, lines 3-5], “The computer system 1400 includes a processor 1410 that process data signals.”);
and a memory comprising a compiler, wherein the compiler, when executed by the processor performs an operation comprising (Aydonat, [Fig.14, Par.0087], “The computer system 1400 includes a memory 1420. The memory 1420 may store instructions and code represented by data signals that may be executed by the processor 1410.” The HDL compilation unit 1560 is considered the compiler.):
receiving a model (Aydonat, [Fig.15, Par.0098, lines 2-4], “The HDL compilation unit 1560 compiles a description of the design for the CNN accelerator for the target.” Furthermore, see [Par.0089, lines 2-7], “According to an embodiment of the present invention, the EDA tool 1421 operates to identify features of a CNN accelerator which includes characteristics and parameters of the CNN accelerator, and resources of a target that the CNN accelerator is to be implemented on.” The target on which the CNN accelerator is to be implemented includes characteristic and parameter features. Therefore, the target is considered the model.)
defining a sequential order of a plurality of pipelined functions performed when executing a first layer in a neural network (Aydonat, [Par.0009, lines 1-4], “According to an embodiment of the present invention, a method for implementing a CNN accelerator on a target includes utilizing one or more processing elements to implement a standard convolution layer” and [Par.0010, lines 6-9], “The CNN accelerator also includes a plurality of processing elements that implement a standard convolutional layer during the first configuration” and furthermore, see [Par.0035], “FIG. 3 illustrates an example of a standard convolution layer implemented by an exemplary embodiment of the present invention. The standard convolution layer may be one of the layers identified at 201, described with reference to FIG. 2. The standard convolution layer receives input features from an input feature map 310. The standard convolution layer also receives a set of coefficients 321-323 generated through a training of the convolution layer. The coefficients 321-323 apply weights which formulate a filter for the convolution layer. The standard convolution layer performs a 3-dimensional dot product between a region 330 defined within the input features 310 and the coefficients 321-323.” Examiner’s note: a convolutional layer is considered the first layer of the neural network. Furthermore, see [Fig. 8], which describes the sequential order of the plurality of pipelined functions of the convolutional neural network, and [Par.0055], “FIG. 8 illustrates a conceptual view of an exemplary CNN algorithm 800 that may be implemented according to an exemplary embodiment of the present invention. The CNN 800 includes a plurality of layers where each layer transforms one volume of activations to another volume through a differentiable function.” and “The sequencer generation unit 1550 generates and programs a sequencer unit that coordinates transmission of 
wherein the neural network comprises a plurality of layers (Aydonat, [Fig.8, Par. 0055, lines 4-6], “The CNN 800 includes a plurality of layers where each layer transforms one volume of activations to another volume through a differentiable function.”);
generating, based on the model (Aydonat, [Par.0006, lines 5-10], “The methodology utilizes an electronic design automation (EDA) tool that generates a design for the CNN accelerator in response to features of a CNN accelerator which may include characteristics and parameters of the CNN accelerator specified by a user, and available resources on a target selected by the user. The target may include one or more target devices of one or more types. The EDA tool assigns resources on the target to implement the CNN accelerator to achieve high performance. For example, resources on the target are assigned to implement appropriately sized buffers to handle the types and sizes of images to be processed by the CNN accelerator. Resources on the target are also assigned to implement the appropriate types and number of computation units, such as processing elements, to support the type of filters and layers applied by the CNN accelerator.”),
a parallelized systolic array from source code (Aydonat, [Par.0007], “According to an embodiment of the present invention, a range of characteristics may be specified by the user to allow the CNN accelerator to execute a plurality of CNN algorithms. In this embodiment, one or more configurable status registers (CSRs) are implemented to allow a user to configure the target to support specified characteristics required for executing one of the plurality of CNN algorithms at runtime, after the CNN accelerator is programmed on the target. When implemented on an field programmable gate array (FPGA), the CSRs effectively allow runtime configuration of the CNN accelerator.” Examiner’s note: the CNN accelerator executes the plurality of CNN algorithms after the CNN accelerator is programmed on the target. The CNN accelerator is considered the parallelized systolic array, and the programmable CSRs (code) are used to generate the CNN accelerator. [Par.0062], “The CNN accelerator 900 includes configurable status registers (CSRs) 960. The CSRs 960 are programmable by a user during runtime to modify various aspects of the CNN accelerator 900. For example, the CSRs 960 may be set to add or subtract a number of convolution layers used by the CNN accelerator 900, add or subtract one or more pooling, ReLU, or other layers used by the CNN accelerator 900, and/or change a size of a filter supported by the CNN accelerator 900.” Examiner’s note: the CNN accelerator is structured to write new intermediate results from a convolution stage while reading results, which corresponds to the parallel operation of the systolic array, as can be seen at [Par.0039], “... According to an embodiment of the present invention, the design for the CNN accelerator architecture is structured such that there is one read ,
the parallelized systolic array comprising a plurality of interconnected processing elements (Aydonat, [Par.0060], “Input image pixels are transmitted into the processing element (PE) arrays 901-904 which may perform independent dot-product operations in a convolution procedure. PE array 901 represents a first PE array and PE array 904 represents an nth PE array, where n can be scaled to any number” Examiner’s note: the processing element arrays 901-904 are considered the parallelized systolic array.)
forming a matrix that each perform identical processes in order to execute a first function of the plurality of pipelined functions (Aydonat, [Par.0068], “When implementing a standard convolution layer using one or more of the processing elements, feature map data is treated as non-repeated data and stored in on-chip buffers 951-954. The output of one convolution layer is streamed into a next convolution layer. Each processing element receives the same streaming feature data that belongs to the same image every cycle to compute an output in the same (x,y) output coordinates in different output planes. Coefficient data is treated as repeated data since the same set of coefficients is used to compute different output feature maps in the same (x,y) output plane.” Examiner’s note: each of the processing elements in the processing element arrays processes the same feature data that belongs to the same image ;
[…]
and compiling source code corresponding to the model (Aydonat, [Fig.6, Par.0043, lines 1-7], “FIG. 6 is a flow chart illustrating a method for compiling a design for a CNN accelerator on a target according to an exemplary embodiment of the present invention. The target may be one or more field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), structured ASICs, or other programmable device.” Furthermore, see [Par.0044-0049], “At 601, a design for the CNN accelerator is synthesized. Synthesis includes generating a logic design of the system to be implemented by the target. According to an embodiment of the present invention, synthesis generates an optimized logical representation of the system from an HDL design definition. The optimized logical representation of the system may include a representation that has a minimized number of functional blocks, such as logic gates, logic elements, and registers, required for the system. Synthesis also includes mapping the optimized logical representation… Programming the target physically transforms programmable resources on the target into the design of the CNN accelerator”. The design for the CNN accelerator is generated based on compiled source code.)
and the parallelized systolic array into a hardware level design (Aydonat, [Par.0032], “At 103, the design for the CNN accelerator is compiled for the target. According to an embodiment of the present invention, compilation involves performing synthesis, placement, routing, and timing analysis procedures on a hardware description language of the design. The compiled design for the CNN accelerator supports a range of CNN variants.”)
that provides a static schedule when executing the neural network in a hardware system (Aydonat, [Par.0061], “A sequencer unit 920 orchestrates the sequencing, addressing, and delivery of data to each of the PE arrays 901-904, kernels in each of the PE arrays 901-904, and components in each of the kernels. The sequencer unit 920 coordinates the transmission of data to appropriate PE arrays 901-904 in order to time multiplex computations on the PE arrays 901-904. The accumulated results from the PE arrays 901-904 may be transmitted to one of the buffers 951-954 which transmits the computed output layer back to kernel and components in the PE arrays 901-904 for a next round of layer computation. The buffers 951-954 reside on a target device implementing the CNN accelerator 900 and may be referred to as on-chip buffers” Therefore, the sequencer unit 920 coordinates and transmits the data to the appropriate PE arrays at specific times.).
However, Aydonat does not explicitly teach the limitation “as defined by the first layer of the neural network”.
On the other hand, Ross teaches the limitation “as defined by the first layer of the neural network” (Ross, [Column 1, lines 40-46], “and a vector computation unit communicatively coupled to the matrix computation unit and configured to, for each of ;
Aydonat and Ross are analogous art because both are in the same field of generating a neural network by using a systolic array.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aydonat’s method in view of Ross by having the systolic array generate the function defined by the first layer of the neural network. The modification would have been obvious because one of ordinary skill in the art would have been motivated to improve efficiency and speed and to reduce the power cost of generating a large data size (Ross, [Column 2, lines 51-63], “Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following 
Claim 1 is rejected for the same reasons as claim 15.
Claim 8 is rejected for the same reasons as claim 15.
Additionally, Aydonat teaches a non-transitory computer-readable storage medium storing instructions, which when executed on one or more processing devices, perform an operation for scheduling a neural network, the operation comprising (Aydonat, [Par. 0106, lines 1-7], “It should be appreciated that embodiments of the present invention may be provided as a computer program product, or software, that may include a computer-readable or machine-readable medium having instructions. The instructions on the computer-readable or machine-readable medium may be used to program a computer system or other electronic device.”).
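For illustration only, and not as part of either cited reference, the time-multiplexed sequencing quoted from Aydonat [Par.0061] in the rejection of claim 15 can be sketched as a schedule that is fixed before execution. The function name build_static_schedule, the notion of a "tile" of work, and the round-robin assignment are assumptions made for this sketch; Aydonat does not specify them.

```python
from typing import List, Tuple


def build_static_schedule(num_tiles: int, num_pe_arrays: int) -> List[Tuple[int, int, int]]:
    """Return (round, pe_array, tile) triples decided entirely ahead of time.

    Hypothetical model: each tile of work is assigned to a PE array in
    round-robin order, so every PE array handles at most one tile per round.
    """
    schedule = []
    for tile in range(num_tiles):
        round_index = tile // num_pe_arrays  # round in which the tile is issued
        pe_array = tile % num_pe_arrays      # PE array that receives the tile
        schedule.append((round_index, pe_array, tile))
    return schedule


# Six tiles time-multiplexed over four PE arrays (cf. PE arrays 901-904).
schedule = build_static_schedule(num_tiles=6, num_pe_arrays=4)
for round_index, pe_array, tile in schedule:
    print(f"round {round_index}: PE array {pe_array} processes tile {tile}")
```

Because the assignment depends only on the tile count and the number of PE arrays, the schedule is known before any data is processed, which is the sense in which it is "static" in this sketch.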
Regarding claim 16, Aydonat teaches the computing system of claim 15, wherein the operation further comprises: configuring a field programmable gate array (FPGA) based on the hardware level design (Aydonat, [Fig.14, Par.0099, lines 5-8], “FIG. 14. The CNN accelerator configuration tool 1600 may be used to configure a 
wherein the hardware level design comprises register transfer level (RTL) code (Aydonat, [Par.0104], “The CNN accelerator configuration tool 1600 includes a configurable status register unit 1650. The configurable status register unit 1650 sets one or more configurable status registers to support the variation of the CNN accelerator identified. According to an embodiment of the present invention, setting a configurable status register may add or subtract a convolution layer on the CNN accelerator, add or subtract one or more pooling layers, or change a size of a filter.” Furthermore, see [Par.0042], “a description of the design is generated. According to an embodiment of the present invention, the description of the design may be in a hardware description language (HDL) format or other format.”).
Claim 2 is rejected for the same reasons as claim 16.
Claim 9 is rejected for the same reasons as claim 16.
Regarding claim 17, Aydonat teaches the computing system of claim 15, wherein generating the parallelized systolic array is performed in response to determining that the first function in the first layer of the neural network performs identical processes (Aydonat, [Par.0068], “When implementing a standard convolution layer using one or more of the processing elements, feature map data is treated as non-repeated data and stored in on-chip buffers 951-954. The output of one convolution layer is streamed into a next convolution layer. Each processing element receives the same streaming feature data that belongs to the same image every cycle to compute an output in the same (x,y) output coordinates in different output planes. Coefficient data is . 
Claim 3 is rejected for the same reasons as claim 17.
Claim 10 is rejected for the same reasons as claim 17.
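For illustration only, the behavior quoted from Aydonat [Par.0068] and [Par.0035], in which every processing element receives the same region of feature data and computes an output at the same (x,y) coordinate in a different output plane, can be sketched as follows. The helper names dot3d and pe_outputs and the nested-list data layout (channels, rows, columns) are assumptions made for this sketch and do not appear in the reference.

```python
def dot3d(region, coeffs):
    """3-dimensional dot product between an input region and one filter."""
    return sum(
        value * weight
        for region_plane, coeff_plane in zip(region, coeffs)
        for region_row, coeff_row in zip(region_plane, coeff_plane)
        for value, weight in zip(region_row, coeff_row)
    )


def pe_outputs(region, coefficient_sets):
    """Each hypothetical PE applies its own filter to the same input region.

    The i-th result is the value at one (x, y) coordinate of output plane i.
    """
    return [dot3d(region, coeffs) for coeffs in coefficient_sets]


# One 2-channel, 2x2 input region shared by two processing elements.
region = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
filter_a = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]  # sums every input value
filter_b = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]  # picks two corner values
outputs = pe_outputs(region, [filter_a, filter_b])
```

Every processing element in this sketch runs the identical dot-product routine and differs only in the coefficients it holds.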
Regarding claim 18, Aydonat teaches the computing system of claim 15, wherein compiling the source code of the systolic array comprises: converting the source code of the systolic array […] of the plurality of interconnected processing elements (Aydonat, [Par.0060], “Input image pixels are transmitted into the processing element (PE) arrays 901-904 which may perform independent dot-product operations in a convolution procedure. PE array 901 represents a first PE array and PE array 904 represents an nth PE array, where n can be scaled to any number” Examiner’s note: the processing element arrays 901-904 are considered the parallelized systolic array. [Par.0007], “According to an embodiment of the present ,
wherein the plurality of interconnected processing elements execute concurrently to process data received at the first layer ([Par.0039, lines 13-16], ;
identifying a plurality of operations performed by each of the plurality of interconnected processing elements (Aydonat, [Par.0041], “a sequencer unit is generated. The sequencer unit coordinates transmission of data to appropriate processing elements on the CNN accelerator at appropriate times in order to time multiplex computations on the processing elements. According to an embodiment of the present invention, the sequencer unit is programmed to perform the coordination required to support the algorithms performed by the CNN accelerator. The sequencer unit may be generated using logic array blocks, registers, and/or a hard or soft processing unit available on a target device.” Therefore, the sequencer unit 920 identifies the operations performed by each processing element (PE arrays 901-904). Furthermore, see [Par.0046, lines 1-5], “the placed design is routed. During routing, routing resources 
wherein each of the plurality of interconnected processing elements perform the same plurality of operations (Aydonat, [Fig. 9, Par.0060, lines 1-7], “Input image pixels are transmitted into the processing element (PE) arrays 901-904 which may perform independent dot-product operations in a convolution procedure. PE array 901 represents a first PE array and PE array 904 represents an nth PE array, where n can be scaled to any number” Examiner’s note: FIG. 9 shows the multiple interconnected processing elements (PE arrays) that perform the same operations.);
and assigning the plurality of operations to different hardware elements in the hardware system such that the plurality of operations are able to perform concurrently (Aydonat, [Par.0061, lines 4-13, and Par.0062], “The sequencer unit 920 coordinates the transmission of data to appropriate PE arrays 901-904 in order to time multiplex computations on the PE arrays 901-904. The accumulated results from the PE arrays 901-904 may be transmitted to one of the buffers 951-954 which transmits the computed output layer back to kernel and components in the PE arrays 901-904 for a next round of layer computation. The buffers 951-954 reside on a target device implementing the CNN accelerator 900 and may be referred to as on-chip buffers...” 
Aydonat teaches two-dimensional convolution in the neural network. However, Aydonat does not explicitly teach the limitation “the systolic array into a two dimensional array” (emphasis added).
On the other hand, Ross teaches the limitation “the systolic array into a two dimensional array” (emphasis added) (Ross, [Column 5, lines 23-32], “FIG.3 shows an example architecture 300 including a matrix computation unit. The matrix computation unit is a two-dimensional systolic array. The two-dimensional array 306 can be a square array. The array includes multiple cells 304. In some implementations, a first dimension 320 of the systolic array 306 corresponds to columns of cells and a second dimension 322 of the systolic array 306 corresponds to rows of cells. The systolic array can have more rows than columns, more columns than rows, or an equal number of columns and rows.”).
Aydonat and Ross are analogous art because both are in the same field of generating a neural network by using a systolic array.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified 
Claim 4 is rejected for the same reasons as claim 18.
Claim 11 is rejected for the same reasons as claim 18.
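For illustration only, a two-dimensional systolic array of the general kind referenced in the rejection of claim 18 (rows and columns of identical cells that pass data to their neighbors) can be simulated in software. The sketch below assumes an output-stationary dataflow, in which each cell keeps one accumulating result; this dataflow and the function name systolic_matmul are assumptions chosen for brevity and are not taken from Ross or Aydonat.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on a simulated n x m grid of cells.

    Rows of `a` enter from the left and columns of `b` enter from the top,
    each skewed by one cycle per row or column. Every cell performs the same
    multiply-accumulate and forwards its operands right and down.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # result held in each cell
    a_reg = [[0] * m for _ in range(n)]  # operand travelling rightwards
    b_reg = [[0] * m for _ in range(n)]  # operand travelling downwards
    for cycle in range(n + m + k - 2):
        # Visit cells bottom-right first so that each cell still sees the
        # previous-cycle value held by its left and upper neighbors.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    step = cycle - i
                    a_in = a[i][step] if 0 <= step < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    step = cycle - j
                    b_in = b[step][j] if 0 <= step < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return acc


product = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The grid need not be square: the same routine handles more rows than columns or more columns than rows, consistent with the array shapes described in the quoted passage of Ross.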
Regarding claim 19, Aydonat teaches the computing system of claim 15, wherein the model comprises a software defined parallelization pragma (Aydonat, [Par.0097], “According to an embodiment of the present invention, information from the buffer allocation unit 1530, computation unit generation unit 1540, and sequencer generation unit 1550 is used to generate a description of the design of the CNN accelerator. The description of the design may be in HDL format or other format” wherein the HDL format includes a parallelization pragma.)
indicating the sequential order of the plurality of pipelined functions (Aydonat, [Par.0096], “The sequencer generation unit 1550 generates and programs a sequencer unit that coordinates transmission of data to appropriate processing element arrays on the CNN accelerator, kernels in the processing element arrays, and components in the kernels at appropriate times in order to time multiplex computations on the processing element arrays. According to an embodiment of the present invention, the sequencer unit may be programmed to perform the procedures illustrated with reference to FIGS. 11-13.” Furthermore, see [Fig. 8], which describes the sequential order of the plurality of pipelined functions of the convolutional neural network.).
Claim 5 is rejected for the same reasons as claim 19.
Claim 12 is rejected for the same reasons as claim 19.
Regarding claim 20, Aydonat teaches the computing system of claim 15, wherein the source code corresponding to the model comprises untimed functional code for the neural network (Aydonat, [Par.0006, lines 12-22], “For example, resources on the target are assigned to implement appropriately sized buffers to handle the types and sizes of images to be processed by the CNN accelerator. Resources on the target are also assigned to implement the appropriate types and number of computation units, such as processing elements, to support the type of filters and layers applied by the CNN accelerator. The EDA tool also generates a sequencer unit that is programmed to coordinate the transmission of data to appropriate computation units in order to time multiplex computations on the computation units.” Furthermore, see [Par.0031, lines 5-10], “The design for the CNN accelerator may be optimized for the target implementing the CNN accelerator. According to an 
Claim 6 is rejected for the same reasons as claim 20.
Claim 13 is rejected for the same reasons as claim 20.
Regarding claim 7, Aydonat teaches the method of claim 1, wherein the plurality of pipelined functions includes at least one of a convolution unit, a pooling unit (Aydonat, [Par.0060, lines 4-9], “PE array 901 represents a first PE array and PE array 904 represents an nth PE array, where n can be scaled to any number. According to an embodiment of the present invention, each PE array includes hardware components that support layers such as a convolution layer, ReLU layer, normalization layer, and pooling layer.”),
and a matrix multiplier that transmits data to an activation unit in the plurality of pipelined functions
Claim 14 is rejected for the same reasons as claim 7.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
 A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU whose telephone number is (571)272-5747. The examiner can normally be reached Monday through Thursday, 7:30 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/E.T./Examiner, Art Unit 2128  

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128