Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination. 
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/31/2022 has been entered.
Response to Arguments
In reference to Applicant’s arguments regarding the rejections under 35 U.S.C. § 103:
Applicant’s Argument: Applicant’s arguments regarding the § 103 rejection of claims 1-20.
Examiner’s Response:
Applicant’s arguments have been considered. However, upon further consideration, a new ground of rejection is presented below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:


A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
 The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 3-8, 10-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al. (“Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs,” Computer Science Department, University of California, Los Angeles, CA, USA; hereinafter, Wei) in view of Ross et al. (US Patent No. 9,710,748; hereinafter, Ross).
Regarding claim 1, Wei teaches a method for scheduling a neural network, the method comprising receiving a model defining a sequential order of a plurality of pipelined functions performed when executing a first layer in the neural network (Wei, [Sec. 2.2, Fig. 1, page 4, left column], “We present a novel 2-D systolic array architecture for CNN on FPGA in Fig. 1. As shown in this figure, each PE shifts the data of W and IN horizontally and vertically to the neighboring PEs at each… As shown in the right part of Fig. 3, PE0,0 gets weight data W from buffer WB and input feature map data IN from buffer IB at the first cycle. PE0,0 performs the multiplication of the two inputs and accumulates the result OUT in the register within the PE along with previous partial accumulated results. Meanwhile, the other PEs are stalled because no data is being received from at least one of its inputs. At cycle 1, the data W from PE0,0 is passed to PE0,1 and the data IN is passed to PE1,0. As a result, both PE0,1 and PE1,0 have the required data to perform an execution. At the same cycle, PE0,0 is able to perform the execution with new data coming from the input buffers as well. As can be seen for the 3 × 3 systolic array example shown in Fig. 3, all PEs are active after five cycles. Thus, they can synchronously read data from their neighboring PEs, perform computation and pass data to the next PEs simultaneously in each cycle. After the in-PE computation has been finished, finally, OUT in the shift register is shifted across vertical PEs to the corresponded OB.” Examiner’s note: the CNN systolic array is considered to be the received model defining the sequential order of a plurality of functions (an input function and an output function) of the input layer and output layer of a neural network, and the operation of PE0,0 of the processing element array on the input data is considered to be a function performed when executing the first layer, i.e., the input layer.),
wherein the neural network comprises a plurality of layers (Wei, [Sec. 2.1], “A convolutional neural network is a typical deep learning neural network that is adopted in applications like image and video processing. In recent years CNN has evolved quickly, especially with a boost from the visual recognition challenge (ImageNet [18]). CNN models, such as AlexNet, VGG, and GoogleNet [18–20], consist of several to hundreds of cascaded layers. Although these networks vary significantly in terms of topology and complexity, the basic computations in each layer are common—such as convolutional, fully connected, pooling, sigmoid and ReLU [21].”);
generating, based on the model, source code defining a parallelized systolic array (Wei, [Sec. 2.1, page 2, left column, Fig. 1], “A convolutional neural network is a typical deep learning neural network that is adopted in applications like image and video processing. In recent years CNN has evolved quickly, especially with a boost from the visual recognition challenge (ImageNet [18])….


[Image reproduced from Wei, Sec. 2.1]


A simplified code of the convolutional layer can be summarized in Code 1. Although the computation pattern is as simple as multiplication and accumulation, the calculation requires a huge volume of both computation power and data transfer bandwidth. On the other hand, the algorithm also provides considerable potential for both the massive parallelism and intensive data reuse. In the original six-level nested loop, three (L1, L4, L3) are parallelizable because they do not have data dependency; the remaining loops (L2, L5, L6) have dependency carried for the accumulation of array out. However, these loops are still parallelizable by leveraging the associative law of the addition operations…” Examiner’s note: Code 1, which implements the CNN on the systolic array, is considered to be the source code defining the systolic array; for further clarification, see [Sec. 3.1, page 3, Fig. 4]:

[Image reproduced from Wei, Fig. 4]
),
the parallelized systolic array comprising a plurality of interconnected processing elements forming a matrix that each perform identical processes (Wei, [Sec.3.1, page 3, Fig.4], 

[Image reproduced from Wei, Fig. 4]

Examiner’s note: the systolic array performs parallel execution in the PE array and is therefore considered to be the parallelized systolic array. Furthermore, each processing element of the interconnected processing elements of the systolic array performs identical processes (multiplication and accumulation) in each cycle, as can be seen at [Sec. 3.4, page 4, right column], “In the CNN systolic design, both computation and data transfer may be the performance bottleneck for different design options. The adoption of double buffering in the input and output enables us to model the throughput in a decoupled way, so the overall throughput T is dominated by the lower one of computation throughput (PT) and external memory transfer throughput (MT). T(s⃗, t⃗) = min(PT(s⃗, t⃗), MT(s⃗, t⃗)) (7) Since the systolic array is executed in the fully pipelined way, each PE will complete two floating point operations (multiplication and accumulation) in each cycle. However, the quantization effect (described in Section 2.3) may lead to wasted computation on the incomplete data blocks on the boundaries of the original loops. By defining the clock frequency as F, the computational throughput is modeled as the number of effective floating operations in the original code performed every second…”.)
in order to execute a first function of the plurality of pipelined functions defined by the first layer of the neural network (Wei, [Sec. 2.2, page 2], “We present a novel 2-D systolic array architecture for CNN on FPGA in Fig. 1. As shown in this figure, each PE shifts the data of W and IN horizontally and vertically to the neighboring PEs at each…”
[Image reproduced from Wei, Sec. 2.2]


Examiner’s note: PE0,0 generates an output of the first (input) layer and passes it to the next processing element in the systolic array. Therefore, generating the output of the first input layer in the processing element array is considered to be executing the first function of the plurality of pipelined functions of the first layer of the neural network.)
[…]
the source code corresponding to the model and the parallelized systolic array into a hardware level design that provides a static schedule when executing the neural network in a hardware system (Wei, [Sec.3.1, page 3, right column],

[Image reproduced from Wei, Fig. 4]

Examiner’s note: the code in Fig. 4 corresponds to a systolic array (the parallelized systolic array) for a neural network that is implemented on an FPGA to execute the processing elements, wherein the FPGA is considered to be a hardware level design. The processing elements are considered to be a hardware system, as can be seen at [Sec. 2.2, page 2, left column], “2.2 Systolic Array Architecture for CNN We present a novel 2-D systolic array architecture for CNN on FPGA in Fig. 1. As shown in this figure, each PE shifts the data of W and IN horizontally and vertically to the neighboring PEs at each…”).
However, Wei does not clearly disclose compiling, using one or more computing processors.
On the other hand, Ross teaches compiling, using one or more computing processors (Ross, [Col. 9, lines 25-64], “The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine or other unit suitable for use in a computing environment... The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).” Examiner’s note: the data processing apparatus comprises multiple processors that are used to compile a programming language or source code, e.g., a program that operates on input data and generates output.).
Wei and Ross are analogous art because they are in the same field of generating systolic arrays for CNNs.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wei’s method in view of Ross by compiling, using one or more computing processors. The modification would have been obvious because one of ordinary skill in the art would have been motivated to operate on input data and generate output by using a computer processor to compile programming languages (Ross, [Col. 9, lines 25-64], “The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine or other unit suitable for use in a computing environment... The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).”).
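For illustration only, the six-level loop nest described in the Code 1 passage of Wei quoted above can be sketched in Python. This is a generic sketch, not a reproduction of Wei’s Code 1 (which appears only as an image): the loop order, the names O, I, R, C, K, and the assumption of stride 1 with no padding are all assumptions, and the function names are hypothetical. The loops over o, r, and c carry no dependency, while the loops over i, p, and q accumulate into out. The second function splits the i reduction into two independent partial sums and adds them, which is valid because addition is associative, as the quoted passage notes.

```python
from typing import List

Tensor3 = List[List[List[int]]]
Tensor4 = List[List[List[List[int]]]]


def conv_layer(inp: Tensor3, weight: Tensor4, O: int, I: int, R: int, C: int, K: int) -> Tensor3:
    """Direct six-level loop nest: out[o][r][c] += weight[o][i][p][q] * inp[i][r+p][c+q].

    Assumes stride 1 and no padding, so inp has shape I x (R+K-1) x (C+K-1).
    """
    out = [[[0] * C for _ in range(R)] for _ in range(O)]
    for o in range(O):                      # no loop-carried dependency
        for i in range(I):                  # reduction into out[o][r][c]
            for r in range(R):              # no loop-carried dependency
                for c in range(C):          # no loop-carried dependency
                    for p in range(K):      # reduction
                        for q in range(K):  # reduction
                            out[o][r][c] += weight[o][i][p][q] * inp[i][r + p][c + q]
    return out


def conv_layer_split_channels(inp: Tensor3, weight: Tensor4, O: int, I: int, R: int, C: int, K: int) -> Tensor3:
    """Split the input-channel reduction into two halves that could run in parallel,
    then add the partial sums (relies on the associativity of addition)."""
    half = I // 2
    part_a = conv_layer(inp[:half], [w[:half] for w in weight], O, half, R, C, K)
    part_b = conv_layer(inp[half:], [w[half:] for w in weight], O, I - half, R, C, K)
    return [[[part_a[o][r][c] + part_b[o][r][c] for c in range(C)]
             for r in range(R)] for o in range(O)]


if __name__ == "__main__":
    O, I, R, C, K = 2, 3, 2, 2, 2
    inp = [[[i + 3 * r + c for c in range(C + K - 1)] for r in range(R + K - 1)] for i in range(I)]
    weight = [[[[o - i + 2 * p + q for q in range(K)] for p in range(K)] for i in range(I)] for o in range(O)]
    assert conv_layer(inp, weight, O, I, R, C, K) == conv_layer_split_channels(inp, weight, O, I, R, C, K)
    print("split reduction matches the direct loop nest")
```

The split version shows why the reduction loops remain parallelizable: the two partial sums are computed independently and combined only at the end.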
Claim 8 is rejected for the same reasons as claim 1.
However, Wei does not clearly disclose a non-transitory computer-readable storage medium storing instructions, which when executed on one or more processing devices, perform an operation for scheduling a neural network, the operation comprising
On the other hand, Ross teaches a non-transitory computer-readable storage medium storing instructions, which when executed on one or more processing devices, perform an operation for scheduling a neural network, the operation comprising (Ross, [Col. 10, lines 17-33], “Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can send input to the computer.”).
Wei and Ross are analogous art because they are in the same field of generating systolic arrays for CNNs.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wei’s method in view of Ross by having a non-transitory computer-readable storage medium storing instructions, which when executed on one or more processing devices, perform an operation for scheduling a neural network (Ross, [Col. 10, lines 17-33], “Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can send input to the computer.”).
Claim 15 is rejected for the same reasons as claim 1.
Wei does not clearly teach a computing system, comprising: a processor and a memory comprising a compiler, wherein the compiler, when executed by the processor performs an operation comprising 
On the other hand, Ross teaches a computing system, comprising: a processor and a memory comprising a compiler, wherein the compiler, when executed by the processor performs an operation comprising (Ross, [Col. 9, lines 25-64], “The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine or other unit suitable for use in a computing environment... The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).”).
Wei and Ross are analogous art because they are in the same field of generating systolic arrays for CNNs.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wei’s method in view of Ross by having a computing system, comprising: a processor and a memory comprising a compiler, wherein the compiler, when executed by the processor, performs an operation comprising the recited steps. The modification would have been obvious because one of ordinary skill in the art would have been motivated to operate on input data and generate output (Ross, [Col. 9, lines 25-64], “The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine or other unit suitable for use in a computing environment... The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).”).
Regarding claim 3, Wei teaches the method of claim 1, wherein generating the source code defining the parallelized systolic array is performed in response to determining that the first function in the first layer of the neural network performs identical processes (Wei, [Sec. 2.1-2.2, pages 2-3, Fig. 1], “A convolutional neural network is a typical deep learning neural network that is adopted in applications like image and video processing. In recent years CNN has evolved quickly, especially with a boost from the visual recognition challenge (ImageNet [18])….


[Image reproduced from Wei, Sec. 2.1]


A simplified code of the convolutional layer can be summarized in Code 1. Although the computation pattern is as simple as multiplication and accumulation, the calculation requires a huge volume of both computation power and data transfer bandwidth. On the other hand, the algorithm also provides considerable potential for both the massive parallelism and intensive data reuse. In the original six-level nested loop, three (L1, L4, L3) are parallelizable because they do not have data dependency; the remaining loops (L2, L5, L6) have dependency carried for the accumulation of array out. However, these loops are still parallelizable by leveraging the associative law of the addition operations…

[Image reproduced from Wei]

” Examiner’s note: Code 1, which implements the CNN on the systolic array, is considered to be the source code defining the systolic array. Generating the output of the first input layer in the processing element array is considered to be executing the first function of the first layer of the neural network. Furthermore, the systolic array performs parallel execution in the PE array and is therefore considered to be the parallelized systolic array. Each processing element of the interconnected processing elements of the systolic array performs identical processes (multiplication and accumulation) in each cycle, as can be seen at [Sec. 3.4, page 4, right column], “In the CNN systolic design, both computation and data transfer may be the performance bottleneck for different design options. The adoption of double buffering in the input and output enables us to model the throughput in a decoupled way, so the overall throughput T is dominated by the lower one of computation throughput (PT) and external memory transfer throughput (MT). T(s⃗, t⃗) = min(PT(s⃗, t⃗), MT(s⃗, t⃗)) (7) Since the systolic array is executed in the fully pipelined way, each PE will complete two floating point operations (multiplication and accumulation) in each cycle. However, the quantization effect (described in Section 2.3) may lead to wasted computation on the incomplete data blocks on the boundaries of the original loops. By defining the clock frequency as F, the computational throughput is modeled as the number of effective floating operations in the original code performed every second…”).
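For illustration only, Eq. (7) and the surrounding description in the quoted Sec. 3.4 passage of Wei can be sketched as follows. The overall throughput is the minimum of the computation throughput PT and the memory transfer throughput MT, as quoted. The computation model itself is an assumption and not Wei’s exact formula: a peak of two floating point operations per PE per cycle at clock frequency F, scaled by the fraction of executed loop iterations that are effective work, where tiling a loop of bound N by tile size t is assumed to execute ceil(N/t)·t iterations (the quantization effect on incomplete boundary blocks). All function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def padded_trip_count(bound: int, tile: int) -> int:
    """Iterations executed when a loop of `bound` iterations is tiled by `tile`:
    the last, incomplete tile is padded to a full tile (the quantization effect)."""
    return -(-bound // tile) * tile  # integer ceiling division, then times tile


def computation_throughput(bounds: Sequence[int], tiles: Sequence[int],
                           num_pes: int, freq_hz: float) -> float:
    """Hypothetical PT model: 2 FLOPs (multiply + accumulate) per PE per cycle,
    scaled by the share of executed iterations that belong to the original loops."""
    effective = math.prod(bounds)
    executed = math.prod(padded_trip_count(b, t) for b, t in zip(bounds, tiles))
    return 2 * num_pes * freq_hz * effective / executed


def overall_throughput(pt: float, mt: float) -> float:
    """Eq. (7) as quoted: with double buffering, T = min(PT, MT)."""
    return min(pt, mt)


if __name__ == "__main__":
    pt = computation_throughput(bounds=[10, 8], tiles=[4, 4], num_pes=16, freq_hz=200e6)
    print(pt, overall_throughput(pt, mt=5e9))
```

In this sketch, a loop bound that is not a multiple of its tile size lowers PT even though every PE still completes two operations per cycle, which is the wasted boundary computation the quoted passage describes.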
Claim 10 is rejected for the same reasons as claim 3.
Claim 17 is rejected for the same reasons as claim 3.
Regarding claim 4, Wei teaches the method of claim 1, wherein compiling the source code of the systolic array comprises: converting the source code of the systolic array into a two dimensional array of the plurality of interconnected processing elements (Wei, [Abstract], “We provide an analytical model for performance and resource utilization and develop an automatic design space exploration framework, as well as source-to-source code transformation from a C program to a CNN implementation using systolic array.” Examiner’s note: the code indicates a CNN implementation using a systolic array, wherein the systolic array is a two-dimensional systolic array, as can be seen at [Sec. 3.1, Fig. 4],

[Image reproduced from Wei, Fig. 4]
),
wherein the plurality of interconnected processing elements executes concurrently to process data received at the first layer (Wei, [Sec.2.2], “ 

[Image reproduced from Wei, Sec. 2.2]

…
As shown in the right part of Fig. 3, PE0,0 gets weight data W from buffer WB and input feature map data IN from buffer IB at the first cycle. PE0,0 performs the multiplication of the two inputs and accumulates the result OUT in the register within the PE along with previous partial accumulated results. Meanwhile, the other PEs are stalled because no data is being received from at least one of its inputs. At cycle 1, the data W from PE0,0 is passed to PE0,1 and the data IN is passed to PE1,0. As a result, both PE0,1 and PE1,0 have the required data to perform an execution. At the same cycle, PE0,0 is able to perform the execution with new data coming from the input buffers as well. As can be seen for the 3 × 3 systolic array example shown in Fig. 3, all PEs are active after five cycles. Thus, they can synchronously read data from their neighboring PEs, perform computation and pass data to the next PEs simultaneously in each cycle. After the in-PE computation has been finished, finally, OUT in the shift register is shifted across vertical PEs to the corresponded OB.” Examiner’s note: PE0,1 and PE1,0 perform an execution at the same time as PE0,0 executes on data received at the first layer.);
identifying a plurality of operations performed by each of the plurality of interconnected processing elements (Wei, [Sec. 3.1, page 3], “
[Image reproduced from Wei, Fig. 4]

” Examiner’s note: Fig. 4 identifies the operations performed by the CNN systolic array through loop tiling.),
wherein each of the plurality of interconnected processing elements perform the same plurality of operations (Wei, [Sec. 3.4, page 4, right column], “In the CNN systolic design, both computation and data transfer may be the performance bottleneck for different design options. The adoption of double buffering in the input and output enables us to model the throughput in a decoupled way, so the overall throughput T is dominated by the lower one of computation throughput (PT) and external memory transfer throughput (MT). T(s⃗, t⃗) = min(PT(s⃗, t⃗), MT(s⃗, t⃗)) (7) Since the systolic array is executed in the fully pipelined way, each PE will complete two floating point operations (multiplication and accumulation) in each cycle.” Examiner’s note: each processing element of the interconnected processing elements performs the same plurality of operations, namely multiplication and accumulation, in each cycle.);
assigning the plurality of operations to different hardware elements in the hardware system such that the plurality of operations are able to perform concurrently (Wei, [Sec. 2.2, page 3, left column], “As shown in the right part of Fig. 3, PE0,0 gets weight data W from buffer WB and input feature map data IN from buffer IB at the first cycle. PE0,0 performs the multiplication of the two inputs and accumulates the result OUT in the register within the PE along with previous partial accumulated results. Meanwhile, the other PEs are stalled because no data is being received from at least one of its inputs. At cycle 1, the data W from PE0,0 is passed to PE0,1 and the data IN is passed to PE1,0. As a result, both PE0,1 and PE1,0 have the required data to perform an execution. At the same cycle, PE0,0 is able to perform the execution with new data coming from the input buffers as well. As can be seen for the 3 × 3 systolic array example shown in Fig. 3, all PEs are active after five cycles. Thus, they can synchronously read data from their neighboring PEs, perform computation and pass data to the next PEs simultaneously in each cycle. After the in-PE computation has been finished, finally, OUT in the shift register is shifted across vertical PEs to the corresponded OB.” Examiner’s note: PE0,0 and PE1,0 perform a plurality of operations at the same time; for example, PE1,0 executes on the required data concurrently with PE0,0 executing on new data received from an input buffer.).
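For illustration only, the data movement described in the quoted Sec. 2.2 passage of Wei (W shifted horizontally, IN shifted vertically, OUT accumulated inside each PE) can be sketched as a cycle-level simulation of an output-stationary array computing OUT = W × IN. This is a simplified model under stated assumptions: operands are assumed to enter the edge PEs with a skew of one cycle per row or column, the final shifting of OUT to the output buffers is not modeled, and the function name is hypothetical. The simulation records the first cycle at which each PE has both inputs. Under this model PE(i, j) first fires at cycle i + j, so in a 3 × 3 array the last PE starts at cycle index 4, the fifth cycle, which is consistent with the quoted statement that all PEs are active after five cycles.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[int]]


def systolic_matmul(W: Matrix, IN: Matrix) -> Tuple[Matrix, Dict[Tuple[int, int], int]]:
    """Cycle-level sketch of an output-stationary 2-D systolic array.

    PE(i, j) keeps OUT[i][j] in a local register. W values move one PE to the
    right per cycle and IN values move one PE down per cycle.
    """
    rows, depth, cols = len(W), len(IN), len(IN[0])
    out = [[0] * cols for _ in range(rows)]
    w_reg: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    in_reg: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    first_active: Dict[Tuple[int, int], int] = {}

    for t in range(depth + rows + cols - 2):
        # Shift W right and IN down by one PE (far end first, so each value moves one hop).
        for i in range(rows):
            for j in range(cols - 1, 0, -1):
                w_reg[i][j] = w_reg[i][j - 1]
        for j in range(cols):
            for i in range(rows - 1, 0, -1):
                in_reg[i][j] = in_reg[i - 1][j]
        # Edge PEs read from the input buffers, skewed by one cycle per row/column.
        for i in range(rows):
            k = t - i
            w_reg[i][0] = W[i][k] if 0 <= k < depth else None
        for j in range(cols):
            k = t - j
            in_reg[0][j] = IN[k][j] if 0 <= k < depth else None
        # Every PE holding both operands performs one multiply-accumulate this cycle.
        for i in range(rows):
            for j in range(cols):
                if w_reg[i][j] is not None and in_reg[i][j] is not None:
                    out[i][j] += w_reg[i][j] * in_reg[i][j]
                    first_active.setdefault((i, j), t)
    return out, first_active


if __name__ == "__main__":
    W = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
    IN = [[7, 8, 9], [10, 11, 12]]    # 2 x 3, so the PE array is 3 x 3
    out, first_active = systolic_matmul(W, IN)
    print(out)
    print(max(first_active.values()))
```

Within a single cycle, several PEs perform their multiply-accumulate in the same loop iteration, which models the concurrent operation of PE0,0, PE0,1, and PE1,0 described in the quotation.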
Claim 11 is rejected for the same reasons as claim 4.
Claim 18 is rejected for the same reasons as claim 4.
Regarding claim 5, Wei teaches the method of claim 1, wherein the model comprises a software defined parallelization pragma indicating the sequential order of the plurality of pipelined functions (Wei, [Sec. 5.1, page 5, Fig. 6], “We implement a push-button design flow framework to generate an executable system on FPGAs from a user-written intuitive CNN program in Fig. 6. A user only needs to specify the nested loop that functions as a CNN layer using a pragma, as shown in the left side of Fig. 6. Our automation flow shown in the right side of Fig. 6 first analyzes the user program using the ROSE compiler infrastructure [25] to obtain necessary information such as iteration domains and data access patterns. Subsequently, we perform design space exploration to identify multiple valid design options with the highest estimated throughput. The design options are parameterized to instantiate template files, including OpenCL systolic array implementation (kernel), as well as the C/C++ software program (host).”

[Image reproduced from Wei, Fig. 6]
 
Examiner’s note: the systolic array (parallelized systolic array) indicates that the plurality of operations are performed sequentially in the systolic array. As can be seen at [Sec. 3.1, page 3, right column, Fig. 4], “Fig. 4 for this purpose, which establishes the link between the architecture and high-level program code. The tiled loops in the intermediate representation contains all the architecture considerations in the systolic array, such as PE array mapping, PE array shape, data reuse strategy, etc. This representation itself is a sequential program, which enables us to perform the modeling in a general way using program analysis techniques and tools such as polyhedral model.”).  
Claim 12 is rejected for the same reasons as claim 5.
Claim 19 is rejected for the same reasons as claim 5.
Regarding claim 6, Wei teaches the method of claim 1, wherein the source code corresponding to the model comprises untimed functional code for the neural network (Wei, [Sec. 3.1, page 3], “

[Wei, Sec. 3.1, page 3, excerpt reproduced as an image] …
is determined by the bounds of inner loops (t⃗ = (t0, t1, t2)); while the systolic array mapping feasibility is determined by the relation between the inner loop iterators and the array access addresses in the loop body.” Examiner’s note: the code sequentially schedules the operations of the interconnected processing elements through loop tiling, but it does not specify an operation time for any block; therefore, the source code is considered untimed functional code.).  
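As a hypothetical illustration of the term “untimed functional code” (this is not Wei’s code), a convolution layer can be written as a plain loop nest in Python. The sketch assumes stride 1, no padding, and a square kernel; the name `conv_layer` and the [channel][row][column] array layout are assumptions. The loop nest fixes what is computed and in what order, but it contains no clock cycles, latencies, or hardware assignments.

```python
def conv_layer(inp, weight):
    """Untimed reference model of one CNN convolution layer (stride 1, no padding).

    inp[c][h][w]        -- input feature map with C channels
    weight[o][c][p][q]  -- O output channels, each with a K x K kernel per input channel
    """
    num_out, num_in = len(weight), len(inp)
    k = len(weight[0][0])
    out_h = len(inp[0]) - k + 1
    out_w = len(inp[0][0]) - k + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(num_out)]
    # Only the order of operations is specified; nothing here says how many
    # cycles each multiply-accumulate takes or which hardware unit performs it.
    for o in range(num_out):
        for h in range(out_h):
            for w in range(out_w):
                for c in range(num_in):
                    for p in range(k):
                        for q in range(k):
                            out[o][h][w] += weight[o][c][p][q] * inp[c][h + p][w + q]
    return out
```

A synthesis flow would later attach timing (for example, by tiling these loops and mapping iterations onto PEs), but the functional description itself remains untimed.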
Claim 13 is rejected for the same reasons as claim 6.
Claim 20 is rejected for the same reasons as claim 6.
Regarding claim 7, Wei teaches the method of claim 1, wherein the plurality of pipelined functions includes at least one of a convolution unit, a pooling unit (Wei, [Sec. 2.1, page 2], “

[Wei, Sec. 2.1, page 2, excerpt reproduced as an image]
”),
and a matrix multiplier that transmits data to an activation unit in the plurality of pipelined functions (Wei, [Sec 2.1 and 2.2], “…

[Wei, Secs. 2.1 and 2.2, excerpt reproduced as an image]
”. Examiner’s note: each processing element performs a matrix multiplication, accumulates the result, and passes data to neighboring PEs. Data processing in the systolic array, which includes the plurality of processing elements, is performed in a pipelined manner. Therefore, the calculation of each processing element is considered a function of the plurality of pipelined functions in the systolic array.).
Claim 14 is rejected for the same reasons as claim 7.

Claims 2, 9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al. (Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs - Computer Science Department, University of California, Los Angeles, CA, USA - hereinafter, Wei) in view of Ross et al. (Patent No.: US 9710748 - hereinafter, Ross), and further in view of Lee et al. (NPL: Implementation of the Super-Systolic Array for Convolution - Dept. of Computer Engineering, Chungbuk National University, Cheongju, Chungbuk 361-763, Korea - hereinafter, Lee).
Regarding claim 2, Wei teaches the method of claim 1, further comprising: configuring a field programmable gate array (FPGA) based on the hardware level design (Wei, [Sec. 1], “Although those implementations utilize FPGA resources well to achieve high throughput, the capacity of hardware resources in the FPGA increases continuously, which provides more than a thousand floating compute units in one FPGA chip.”).
However, Wei and Ross do not disclose wherein the hardware level design comprises register transfer level (RTL) code.
On the other hand, Lee teaches wherein the hardware level design comprises register transfer level (RTL) code (Lee, [Sec. 3, page 3, right column], “Each of the systolic array multiplier and super-systolic array for convolution was modeled and simulated in RT level using VHDL[11], and synthesized to a schematic using Synopsys design compiler[12-13].”; see also [Sec. 3]

[Lee, Sec. 3, excerpt reproduced as an image]
).
Wei, Ross, and Lee are analogous art because they are in the same field of generating systolic arrays for CNNs.
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Wei and Ross, further in view of Lee, such that the hardware level design comprises register transfer level (RTL) code. The modification would have been obvious because one of ordinary skill in the art would have been motivated to apply register transfer level modeling to the systolic array (Lee, [Sec. 3, page 3, right column], “Each of the systolic array multiplier and super-systolic array for convolution was modeled and simulated in RT level using VHDL[11], and synthesized to a schematic using Synopsys design compiler[12-13].”).
Claim 9 is rejected for the same reasons as claim 2.
Claim 16 is rejected for the same reasons as claim 2.

Conclusion
32.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure and is listed below.
Bazlamacci et al. (Pub. No.: 20120257506 - hereinafter, Bazlamacci) teaches a systolic array architecture.
Bolic et al. (Pub. No.: US 20160210167 - hereinafter, Bolic) teaches virtualized hardware acceleration that performs a plurality of operations in parallel.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU whose telephone number is (571)272-5747.  The examiner can normally be reached on 7:30 - 5:00 M-Th.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.T./Examiner, Art Unit 2128  

/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128