DETAILED ACTION
This Office Action is in response to communication made on October 12, 2022. 
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.
Claims 1-20 are pending in this Application.
Claims 1 and 17 have been amended by the Applicant.
Applicant’s amendments necessitate a new ground(s) of rejection.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments to the rejections under 35 USC §103, filed October 12, 2022, have been fully considered.
The Applicant argues on page 11 that “Husseini, alone or in combination with Venkataramani, fails to disclose "generating instructions for a deep learning accelerator to facilitate generation of the output of the first artificial neural network, wherein the instructions are generated based on converting a description of the first artificial neural network using a compiler" and "optimizing, by utilizing the compiler, the instructions by reducing overlapping computations associated with the first artificial neural network or by coordinating a timing of the generation of the output," as claimed in amended claim 1.” The examiner agrees.
Upon further consideration of the applicant’s amendments, a new ground of rejection is made and listed below.  
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over El Husseini et al. (US Patent Application Pub. No. 2019/0286972 A1), hereinafter Husseini, in view of Caballero et al. (US Patent Application Pub. No. 2019/0042224 A1), hereinafter Caballero.
Regarding claims 1 and 5, Husseini teaches: 
A method, comprising: receiving, in a device and from a server system, a computation task of a first artificial neural network, wherein an output of the first artificial neural network is configured as an input to a second artificial neural network implemented in the server system; (see Fig. 4 and ¶ [0002],[0020],[0018], Husseini shows a system which includes a neural network server coupled to a neural network accelerator, for solving complex computational tasks such as recognizing images, analyzing and classifying information, and performing various classification tasks (computation task of a first artificial neural network), by partitioning artificial neural network models into subgraphs provided to the neural network accelerator; Fig. 3, Fig. 4 items 310,320 and [0062],[0063] show a subgraph 320 of the neural network model which executes on the accelerator (first artificial neural network), where the subgraph is generated by the neural nodes 305-306 on the server, which provide inputs to neural nodes 321 and 322 on the accelerator, and outputs of the subgraph 320 generated by the neural nodes are transmitted to the neural nodes 340 and 341 on the server (input to a second artificial neural network implemented in the server system))
generating instructions for a deep learning accelerator to facilitate generation of the output of the first artificial neural network, (see Fig. 4 and ¶ [0002],[0036], Husseini shows the neural network server includes a compiler to partition subgraphs from a deep neural network/DNN model for execution on acceleration hardware and the compiler generates metadata and code (generating instructions for a deep learning accelerator); Fig. 10 steps 1040,1050 and [0125]-[0128] show the generated input values of the subgraph are communicated from the neural network server to the neural network accelerator, and the generated output values are communicated from the neural network accelerator to the server (generation of the output of the first artificial neural network))
executing, by the computing device and the deep learning accelerator, instructions to generate the output of the first artificial neural network responsive to an input to the first artificial neural network; and (see Fig. 4 and ¶ [0002],[0036], Husseini shows the neural network server includes a compiler to partition subgraphs from a deep neural network/DNN model for execution on acceleration hardware (executing, by the computing device) and the compiler generates metadata and code (instructions); Fig. 10 steps 1040,1050 and [0125]-[0128] show the generated input values of the subgraph are communicated from the neural network server to the neural network accelerator, and the generated output values from the accelerator are communicated from the neural network accelerator (the deep learning accelerator) to the server (generate the output of the first artificial neural network responsive to an input to the first artificial neural network))
communicating, by the device and to the server system, the output of the first artificial neural network as the input to a second artificial neural network implemented in the server system (see Fig. 10 step 1050 and [0128], Husseini shows the generated output values from the accelerator are communicated from the neural network accelerator to the server (communicating, by the device and to the server system, the output of the first artificial neural network); Fig. 3 and [0063] show neural network 310 in the server, where the output generated by the accelerated neural node 331 is transmitted by the edges to the server neural node 341, where the edges connecting the nodes 330-331 to the nodes 340-342 are at an output boundary of the subgraph 320 in the accelerated node (output of the first artificial neural network as the input to a second artificial neural network))
Husseini does not explicitly show:
wherein the instructions are generated based on converting a description of the first artificial neural network using a compiler;
optimizing, by utilizing the compiler, the instructions by reducing overlapping computations associated with the first artificial neural network or by coordinating a timing of the generation of the output;
Caballero shows:
wherein the instructions are generated based on converting a description of the first artificial neural network using a compiler; (see Fig. 1 item 199 and [0026], Caballero shows a computing device which includes a compiler that implements optimization algorithms, where the compiler transforms portions and/or an entirety of computer-based programs to produce code that may efficiently utilize a processor and improve execution time, memory usage, code size, etc. (using a compiler); [0038],[0040] show an example compiler that converts example high-level language (HLL) instructions, such as C, C++, or Java, into example low-level language (LLL) instructions that correspond to machine code that can be read and/or otherwise executed by a computer (instructions are generated based on converting a description of); [0150] shows an example processor platform to implement the system, which can be a server, a personal computer, a workstation, or a self-learning machine, e.g., a neural network (the first artificial neural network))
optimizing, by utilizing the compiler, the instructions by reducing overlapping computations associated with the first artificial neural network or by coordinating a timing of the generation of the output (see Fig. 1 item 110, Fig. 3 and [0045]-[0052], Caballero shows the compiler includes an optimizer that improves the efficiency of one or more target processors by reducing the execution time required to implement and/or otherwise execute the example LLL instructions, such as by performing loop collapsing based on an optimization scenario (optimizing, by utilizing the compiler, the instructions by reducing overlapping computations))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Husseini to incorporate the teaching of Caballero such that the system uses a compiler that transforms portions and/or an entirety of computer-based programs to produce code that may efficiently utilize a processor and improve execution time, memory usage, code size, etc., and that includes an optimizer that improves the efficiency of one or more target processors by reducing the execution time required, such as by performing loop collapsing. Doing so would provide faster execution of the complex computation tasks since the system includes an optimizer that improves the efficiency of the target processors by reducing the execution time required.
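For illustration only, and not drawn from the text of Caballero: loop collapsing, as the term is commonly used in compiler optimization, merges a nest of loops into a single loop over a combined index. The following minimal Python sketch shows the idea under that assumption; the function names nested_sum and collapsed_sum are hypothetical and do not appear in any cited reference.

```python
def nested_sum(matrix):
    """Baseline form: two nested loops, one over rows and one over columns."""
    rows, cols = len(matrix), len(matrix[0])
    total = 0
    for i in range(rows):
        for j in range(cols):
            total += matrix[i][j]
    return total


def collapsed_sum(matrix):
    """Collapsed form: a single loop over a flattened index k.

    The original indices (i, j) are recovered from k, so the same
    elements are visited in the same order with one loop instead of two.
    """
    rows, cols = len(matrix), len(matrix[0])
    total = 0
    for k in range(rows * cols):
        i, j = divmod(k, cols)  # i = k // cols, j = k % cols
        total += matrix[i][j]
    return total
```

Both functions return the same value for any rectangular, non-empty input; the collapsed form simply exposes one iteration space to the optimizer.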

Regarding claim 2, Husseini modified by Caballero teaches claim 1.
Husseini shows:
The method of claim 1, wherein the communicating is performed via a network interface to a wired or wireless local area network; and (see Fig. 11 and ¶ [0135], Husseini shows the communication connection(s) 1170 are not limited to wired connections (e.g., megabit or gigabit Ethernet, Infiniband, Fibre Channel over electrical or fiber optic connections) but also include wireless technologies (e.g., RF connections via Bluetooth, WiFi (IEEE 802.11a/b/n), cellular) and other suitable communication connections for providing a network connection (communicating is performed via a network interface to a wired or wireless local area network))
the receiving of the computation task includes receiving, via the network interface, first data representative of parameters of the first artificial neural network, and second data representative of instructions executable to implement matrix computations of the first artificial neural network based at least on the first data (see Fig.3 and ¶ [0069],[0041]-[0045], Husseini shows the neural network data 310 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model (first data representative of parameters of the first artificial neural network), and the NN processor core is programmed to execute a subgraph or an individual node of a neural network, uses a local memory for storing weights, biases, input values, output values, and so forth where the bulk of the processing operations performed in implementing a neural network includes Matrix x Matrix or Matrix x Vector multiplications (second data representative of instructions executable to implement matrix computations of the first artificial neural network based at least on the first data).


Claims 3-4 and 6-15 are rejected under 35 U.S.C. 103 as being unpatentable over Husseini and Caballero, in view of Jantz et al. (US Patent Application Pub. No. 2017/0169208 A1), hereinafter Jantz.
Regarding claim 3, Husseini modified by Caballero teaches claim 2.
Husseini does not explicitly show:
The method of claim 2, further comprising: receiving, via the network interface, third data from a data source, the third data representative of the input the first artificial neural network; and receiving, via the network interface, fourth data from the server system; wherein the output of the first artificial neural network is further responsive to the third data and the fourth data
Jantz shows:
The method of claim 2, further comprising: receiving, via the network interface, third data from a data source, the third data representative of the input the first artificial neural network; and receiving, via the network interface, fourth data from the server system; wherein the output of the first artificial neural network is further responsive to the third data and the fourth data (see Fig. 1, Fig. 2 items 205,210 and [0037],[0064], Jantz shows an autonomous vehicles system, that includes a neural network and sensor inputs such as a LIDAR, a RADAR, a vision system, a positioning system and wireless networks (third data representative of the input the first artificial neural network), Fig.2 item 215 and [0047] shows the neural network receives neural network sensor input and reference sensor data 215 (fourth data from the server system), and based on the action of neural network the host CPU performs functions such as normal vehicle startup operation (the output of the first artificial neural network is further responsive to the third data and the fourth data)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Husseini to incorporate the teaching of Jantz such that the neural network accelerator receives inputs from sensors and from the neural network server to perform tasks such as recognizing images, analyzing and classifying information, and performing various classification tasks. Doing so would provide improved processing speed and reduced latency since the more computationally-intensive portions are performed by the hardware neural network accelerator.

Regarding claim 4, Husseini modified by Caballero and Jantz teaches claim 3.
Husseini shows:
The method of claim 3, further comprising: controlling access, made via the network interface, to random access memory of the device, using a control unit configured on an integrated circuit die of Field-Programmable Gate Array (FPGA) or Application Specific Integrated circuit (ASIC) implementing a Deep Learning Accelerator, the Deep Learning Accelerator comprising at least one processing unit configured to perform matrix computations, and the control unit configured to load the instructions from the random access memory for execution (see Fig.1 and ¶ [0044], Husseini shows a neural network multiprocessor which includes a control unit, memory and network I/O interface, having a plurality neural processing cores, implemented as a custom or application-specific integrated circuit, as a field programmable gate array (FPGA) or other reconfigurable logic (using a control unit configured on an integrated circuit die of Field-Programmable Gate Array (FPGA) or Application Specific Integrated circuit (ASIC)), [0050],[0041],[0047] shows the control unit executing a deep neural network, performing Matrix x Matrix or Matrix x Vector multiplications and a memory interface for RAM/DRAM external or embedded memory and a direct memory access controller allowing transfer of blocks of data in memory (implementing a Deep Learning Accelerator, the Deep Learning Accelerator comprising at least one processing unit configured to perform matrix computations), [0048],[0071] shows the I/O interface 150 includes circuitry for receiving and sending input and output signals to other components (access, made via the network interface), and the memory in addition to the source code includes training data and a set of input data for applying to the neural network model and a desired output from the neural network model (load the instructions from the random access memory for execution).

Regarding claim 6, Husseini modified by Caballero and Jantz teaches claim 5.
Husseini shows: 
The system of claim 5, wherein the device includes at least one interface configured to receive, from the server system, the computation task identified by first data representative of parameters of the first artificial neural network, and second data representative of the instructions executable to implement matrix computations of the first artificial neural network using at least the first data; and (see Fig.3 and ¶ [0069],[0041]-[0045], Husseini shows the neural network data 310 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model (data representative of parameters of the first artificial neural network), and the NN processor core is programmed to execute a subgraph or an individual node of a neural network, uses a local memory for storing weights, biases, input values, output values, and so forth where the bulk of the processing operations performed in implementing a neural network includes Matrix x Matrix or Matrix x Vector multiplications (data representative of the instructions executable to implement matrix computations of the first artificial neural network based at least on the first data)
Husseini does not explicitly show:
wherein the least one interface is further configured to receive third data from a data source; and the output of the artificial neural network is responsive at least to the third data
Jantz shows:
wherein the least one interface is further configured to receive third data from a data source; and the output of the artificial neural network is responsive at least to the third data (see Fig. 1, Fig. 2 items 205,210 and [0037],[0064], Jantz shows an autonomous vehicles system, that includes a neural network and sensor inputs such as a LIDAR, a RADAR, a vision system, a positioning system and wireless networks (receive third data from a data source), Fig.2 item 215 and [0047] shows the neural network receives neural network sensor input and reference sensor data 215 and based on the action of neural network the host CPU performs functions such as normal vehicle startup operation (the output of the first artificial neural network is responsive to the third data)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Husseini to incorporate the teaching of Jantz such that the neural network accelerator receives inputs from sensors and from the neural network server to perform tasks such as recognizing images, analyzing and classifying information, and performing various classification tasks. Doing so would provide improved processing speed and reduced latency since the more computationally-intensive portions are performed by the hardware neural network accelerator.

Regarding claim 7, Husseini modified by Caballero and Jantz teaches claim 6. 
The system of claim 6, wherein the at least one interface includes a network interface connectable to the wired or wireless local area network to communicate the output of the first artificial network to the server system (see Fig.11 and ¶ [0135], Husseini shows the communication connection(s) 1170 are not limited to wired connections (e.g., megabit or gigabit Ethernet, Infiniband, Fibre Channel over electrical or fiber optic connections) but also include wireless technologies (e.g., RF connections via Bluetooth, WiFi (IEEE 802.11a/b/n), cellular) and other suitable communication connections for providing a network connection (connectable to the wired or wireless local area network)

Regarding claim 8, Husseini modified by Caballero and Jantz teaches claim 7. 
Husseini shows:
The system of claim 7, wherein the at least one interface further includes a serial interface connectable to a serial connection to receive the third data from the data source; and the serial connection is in accordance with a standard for Peripheral Component Interconnect express (PCIe), Universal Serial Bus (USB), or Mobile Industry Processor Interface (MIPI) (see ¶ [0083], Husseini shows a communication packet payload of a PCIe protocol transaction for transmission over a PCIe connection between the server and the hardware accelerator (standard for Peripheral Component Interconnect express (PCIe)))

Regarding claim 9, Husseini modified by Caballero and Jantz teaches claim 6.
Husseini shows: 
The system of claim 6, wherein the device is further configured to receive, via the network interface to the wired or wireless local area network, (see Fig.11 and ¶ [0135], Husseini shows the communication connection between the server and the accelerator includes wired and wireless connections (connectable to the wired or wireless local area network))
Husseini does not explicitly show:
the third data from the data source
Jantz shows:
the third data from the data source (see Fig. 1, Fig. 2 items 205,210 and [0037],[0064], Jantz shows the neural network receives sensor inputs such as a LIDAR, a RADAR, a vision system, a positioning system and includes wireless networks (third data from a data source))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Husseini to incorporate the teaching of Jantz such that the neural network accelerator receives inputs from sensors using a wired or wireless connection. Doing so would enable the system to support multiple sensors/sensor fusion since the hardware accelerator would provide the first neural network processing.

Regarding claim 10, Husseini modified by Caballero and Jantz teaches claim 9.
Husseini does not explicitly show:
The system of claim 9, wherein the device is further configured to receive, via the network interface to the wired or wireless local area network, fourth data; wherein the output of the first artificial neural network is further responsive to the fourth data
Jantz shows:
The system of claim 9, wherein the device is further configured to receive, via the network interface to the wired or wireless local area network, fourth data; wherein the output of the first artificial neural network is further responsive to the fourth data (see Fig. 2 item 215 and [0047], Jantz shows the neural network receives sensor inputs and reference sensor data 215 from the host/server (fourth data), and based on the output of the neural network the host performs functions such as normal vehicle startup operation (output of the first artificial neural network is further responsive to the fourth data))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Husseini to incorporate the teaching of Jantz such that the neural network accelerator receives inputs from sensors and from the neural network server to perform tasks such as recognizing images, analyzing and classifying information, and performing various classification tasks. Doing so would provide improved processing speed and reduced latency since the more computationally-intensive portions are performed by the hardware neural network accelerator.

Regarding claim 11, Husseini modified by Caballero and Jantz teaches claim 10.
Husseini does not explicitly show:
The system of claim 10, wherein the device is further configured to receive, from the server system via the network interface to the wired or wireless local area network, the fourth data
Jantz shows:
The system of claim 10, wherein the device is further configured to receive, from the server system via the network interface to the wired or wireless local area network, the fourth data (see Fig. 2 item 215 and [0047], Jantz shows the neural network receives sensor inputs and reference sensor data 215 from the host/server (fourth data))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Husseini to incorporate the teaching of Jantz such that the neural network accelerator receives inputs from sensors and from the neural network server to perform tasks such as recognizing images, analyzing and classifying information, and performing various classification tasks. Doing so would provide improved processing speed and reduced latency since the more computationally-intensive portions are performed by the hardware neural network accelerator.

Regarding claim 12, Husseini modified by Caballero and Jantz teaches claim 11.
Husseini shows:
The system of claim 11, wherein the device further comprises: random access memory; and a control unit connected to the network interface to control access to the random access memory over the wired or wireless local area network (see Fig.5C and ¶ [0087], Husseini shows a neural network multiprocessor which includes a control unit, memory and network I/O interface, [0050],[0041],[0047] shows the control unit executing a deep neural network, and a memory interface for RAM/DRAM external or embedded memory and a direct memory access controller allowing transfer of blocks of data in memory, [0048],[0071] shows the I/O interface 150 includes circuitry for receiving and sending input and output signals to other components, and the memory includes the source code set of input data for applying to the neural network model and a desired output from the neural network model (control access to the random access memory over the wired or wireless local area network).

Regarding claim 13, Husseini modified by Caballero and Jantz teaches claim 12.
Husseini shows:
The system of claim 12, wherein the device further comprises: a central processing unit configured to run an application stored in the random access memory, wherein the application is configured to implement a portion of a server configured to provide services based on the output of the first artificial neural network (see Fig.5C and ¶ [0087], Husseini shows a subgraph of a neural network model that has been mapped to resources of a hardware accelerator for evaluation of the subgraph of the neural network. The resources 550 can include hardware, software, and/or a combination of hardware and software. For example, the resources can be implemented on a programmable logic platform, such as an FPGA. The resources 550 can include configurable logic blocks (such as programmable combinatorial and sequential logic), memory elements (such as block RAMs and register files), application-specific logic (such as hard macros for input/output and processing), and executable code for execution on a hard or soft CPU. (implement a portion of a server configured to provide services based on the output of the first artificial neural network).

Regarding claim 14, Husseini modified by Caballero and Jantz teaches claim 12.
Husseini shows:
The system of claim 12, wherein the device further comprises: an integrated circuit die of a Field-Programmable Gate Array (FPGA) or Application Specific Integrated circuit (ASIC) implementing a Deep Learning Accelerator, the Deep Learning Accelerator comprising at least one processing unit configured to perform matrix computations, and the control unit configured to load the instructions from the random access memory for execution (see Fig.1 and ¶ [0044], Husseini shows a neural network multiprocessor which includes a control unit, memory and network I/O interface, having a plurality neural processing cores, implemented as a custom or application-specific integrated circuit, as a field programmable gate array (FPGA) or other reconfigurable logic (an integrated circuit die of Field-Programmable Gate Array (FPGA) or Application Specific Integrated circuit (ASIC)), [0050],[0041],[0047] shows the control unit executing a deep neural network, performing Matrix x Matrix or Matrix x Vector multiplications and a memory interface for RAM/DRAM external or embedded memory and a direct memory access controller allowing transfer of blocks of data in memory (implementing a Deep Learning Accelerator, the Deep Learning Accelerator comprising at least one processing unit configured to perform matrix computations), [0048],[0071] shows the I/O interface 150 includes circuitry for receiving and sending input and output signals to other components and the memory in addition to the source code includes a set of input data for applying to the neural network model and a desired output from the neural network model (load the instructions from the random access memory for execution).

Regarding claim 15, Husseini modified by Caballero and Jantz teaches claim 14.
Husseini shows:
The system of claim 14, wherein the at least one processing unit includes a matrix-matrix unit configured to operate on two matrix operands of an instruction; wherein the matrix-matrix unit includes a plurality of matrix-vector units configured to operate in parallel; wherein each of the plurality of matrix-vector units includes a plurality of vector- vector units configured to operate in parallel; and wherein each of the plurality of vector-vector units includes a plurality of multiply- accumulate units configured to operate in parallel (see Fig.2 and ¶ [0057],[0080], Husseini shows attributes of the subgraph can also be selected, for example, a number of iterations to run in parallel on the subgraph, where a set of parallel multiply-accumulate units in each convolutional layer can be used to speed up the computation, and the parallel multiplier units can be used in the fully-connected and dense-matrix multiplication stages and a parallel set of classifiers can also be used to speed up the computation (plurality of matrix-vector units configured to operate in parallel;)

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Husseini, in view of Caballero and Jantz, and in further view of Akin et al. (US Patent Application Pub. No. 2019/0042915 A1), hereinafter Akin.
Regarding claim 16, Husseini modified by Caballero and Jantz teaches claim 15.
Husseini shows:
The system of claim 15, wherein the random access memory and the Deep Learning Accelerator are formed on separate integrated circuit and the device further comprises: an integrated circuit package configured to enclose at least the random access memory and the Deep Learning Accelerator (see Fig.1 and ¶ [0044], Husseini shows a neural network multiprocessor which includes a control unit, memory and network I/O interface, having a plurality neural processing cores, implemented as a custom or application-specific integrated circuit, as a field programmable gate array (FPGA) or other reconfigurable logic, [0050],[0041],[0047] shows the control unit executing a deep neural network, performing Matrix x Matrix or Matrix x Vector multiplications (an integrated circuit package configured to enclose at least the random access memory and the Deep Learning Accelerator)
Husseini does not explicitly show:
connected by Through-Silicon Vias (TSVs);
Akin shows:
connected by Through-Silicon Vias (TSVs); (see Fig. 3 and [0044],[0045], Akin shows a neural network accelerator chip which includes multiple components including an Axon Processor tightly coupled to one physical channel of external memory arrangements of DRAM, direct through-silicon via (TSV) die-stacked DRAM, and the like (connected by Through-Silicon Vias))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Husseini to incorporate the teaching of Akin such that the neural network accelerator includes the TSV die stacked DRAM in a separate die. Doing so would provide more scalability and support denser memory since the neural network accelerator would implement the memory on a separate die using through-silicon via.

Claims 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Husseini, in view of Caballero and Jantz.
Regarding claim 17, Husseini teaches: 
An apparatus, comprising: memory; at least one interface configured to receive, over a computer network from a server system, first data representative of parameters of a first artificial neural network, and second data representative of instructions executable to implement matrix computations of the first artificial neural network using at least the first data, (see Fig.4 and ¶ [0002],[0020],[0018], Husseini shows a system which includes a neural network server coupled to a neural network accelerator, for solving complex computational tasks such as recognizing images, analyzing and classifying information, and performing various classification tasks by partitioning artificial neural network models into subgraphs provided to the neural network accelerator, Fig. 3, Fig. 4 items 310,320 and [0062],[0063] shows a subgraph 320 of the neural network model which executes on the accelerator (first artificial neural network), Fig.3 and ¶ [0069],[0041]-[0045] shows the neural network data 310 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model (first data representative of parameters of a first artificial neural network), and the NN processor core is programmed to execute a subgraph or an individual node of a neural network, uses a local memory for storing weights, biases, input values, output values, and so forth where the bulk of the processing operations performed in implementing a neural network includes Matrix x Matrix or Matrix x Vector multiplications (second data representative of instructions executable to implement matrix computations)
a connection to the memory; and a Field-Programmable Gate Array (FPGA) or Application Specific Integrated circuit (ASIC) having: a memory interface configured to access the memory via the connection; and at least one processing unit configured to execute the instructions represented by the second data stored in the random access memory to implement the matrix computations of the first artificial neural network and generate an output of the first artificial neural network responsive to the third data stored in the random access memory; (see Fig.1 and ¶ [0044], Husseini shows a neural network multiprocessor which includes a control unit, memory and network I/O interface, having a plurality of neural processing cores, implemented as a custom or application-specific integrated circuit, as a field programmable gate array (FPGA) or other reconfigurable logic (a Field-Programmable Gate Array (FPGA) or Application Specific Integrated circuit (ASIC) having a memory interface))
wherein the at least one interface is further configured to communicate, to the server system, the output of the first artificial neural network as an input to a second artificial neural network implemented in the server system (see Fig. 10 step 1050 and [0128], Husseini shows the generated output values from the accelerator are communicated from the neural network accelerator to the server (communicate, to the server system, the output of the first artificial neural network), Fig. 3 and [0063] shows neural network 310 in the server, where the output generated by the accelerated neural node 331 is transmitted by the edges to the server neural node 341, where the edges connecting the nodes 330-331 to the nodes 340-342 are at an output boundary of the subgraph 320 in the accelerated node (as the input to a second artificial neural network))
wherein the at least one interface is further configured to store the first data, the second data, and… into the memory (see Fig.3 and ¶ [0069],[0041]-[0045], Husseini shows the neural network data 310 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model, and the NN processor core is programmed to execute a subgraph or an individual node of a neural network, and uses a local memory for storing weights, biases, input values, output values (store the first data, the second data,))
Husseini does not explicitly show:
receive third data from a data source and to store the third data into the memory, wherein the instructions are generated for a deep learning accelerator to facilitate generation of an output of the first artificial neural network, wherein the instructions are generated by a compiler based on converting a description of the first artificial neural network
optimizing, by utilizing the compiler, the instructions by reducing overlapping computations associated with the first artificial neural network or by coordinating a timing of the generation of the output
Jantz shows:
receive third data from a data source and to store the third data into the memory (see Fig. 1, Fig. 2 items 205,210 and [0037],[0064], Jantz shows an autonomous vehicle system that includes a neural network and sensor inputs such as a LIDAR, a RADAR, a vision system, a positioning system and wireless networks (receive third data from a data source), [0018] shows the raw sensor data is stored (store the third data))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Husseini to incorporate the teaching of Jantz such that the neural network accelerator receives inputs from sensors and stores the raw data. Doing so would provide improved system processing speed and reduced latency since the more computationally-intensive portions can be performed by the hardware neural network accelerator.
Caballero shows:
wherein the instructions are generated based on converting a description of the first artificial neural network using a compiler; (see Fig. 1 item 199 and [0026], Caballero shows a computing device which includes a compiler that implements optimization algorithms, where the compiler transforms portions and/or an entirety of computer-based programs to produce code that may efficiently utilize a processor and improve execution time, memory usage, code size, etc. (using a compiler), [0038],[0040] shows an example compiler that converts example high-level language/HLL instructions, such as C, C++, or Java, into example low-level language/LLL instructions that correspond to machine code that can be read and/or otherwise executed by a computer (instructions are generated based on converting a description of), [0150] shows an example processor platform to implement the system which can be a server, a personal computer, a workstation, a self-learning machine, e.g., a neural network (the first artificial neural network))
optimizing, by utilizing the compiler, the instructions by reducing overlapping computations associated with the first artificial neural network or by coordinating a timing of the generation of the output (see Fig. 1 item 110, Fig.3 and [0045]-[0052], Caballero shows the compiler includes an optimizer that improves the efficiency of one or more target processors by reducing the execution time required to implement and/or otherwise execute the example LLL instructions such as performing loop collapsing based on an optimization scenario (optimizing, by utilizing the compiler, the instructions by reducing overlapping computations))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Husseini to incorporate the teaching of Caballero such that the system uses a compiler that transforms portions and/or an entirety of computer-based programs to produce code that may efficiently utilize a processor and improve execution time, memory usage, code size, etc., and that includes an optimizer that improves the efficiency of one or more target processors by reducing the execution time required, such as by performing loop collapsing. Doing so would provide faster execution of the complex computation tasks since the system includes an optimizer that improves the efficiency of the target processors by reducing the execution time required.

Regarding claim 18, Husseini modified by Caballero and Jantz teaches claim 17.
Husseini shows:
The apparatus of claim 17, further comprising: a central processing unit configured to run an application stored in the random access memory, wherein the application is configured to implement a portion of an edge server configured to provide services based on the output of the first artificial neural network (see Fig.5C and ¶ [0087], Husseini shows a subgraph of a neural network model that has been mapped to resources of a hardware accelerator for evaluation of the subgraph of the neural network. The resources 550 can include hardware, software, and/or a combination of hardware and software. For example, the resources can be implemented on a programmable logic platform, such as an FPGA. The resources 550 can include configurable logic blocks (such as programmable combinatorial and sequential logic), memory elements (such as block RAMs and register files), application-specific logic (such as hard macros for input/output and processing), and executable code for execution on a hard or soft CPU. (implement a portion of a server configured to provide services based on the output of the first artificial neural network)).

Regarding claim 19, Husseini modified by Caballero and Jantz teaches claim 17. 
Husseini shows:
The apparatus of claim 17, wherein the at least one interface includes: a network interface to a wired or wireless computer network, the network interface configured to receive the first data representative of the parameters of the first artificial neural network and the second data representative of the instructions executable to implement the matrix computations of the first artificial neural network; and (see Fig.11 and ¶ [0135], Husseini shows the communication connection(s) 1170 are not limited to wired connections (e.g., megabit or gigabit Ethernet, Infiniband, Fibre Channel over electrical or fiber optic connections) but also include wireless technologies (e.g., RF connections via Bluetooth, WiFi (IEEE 802.11a/b/n), cellular) and other suitable communication connections for providing a network connection (a network interface to a wired or wireless computer network), Fig.3 and ¶ [0069],[0041]-[0045] shows the neural network data 310 can include source code, executable code, metadata, configuration data, data structures and/or files for representing the neural network model (first data representative of parameters of a first artificial neural network), and the NN processor core is programmed to execute a subgraph or an individual node of a neural network and uses a local memory for storing weights, biases, input values, output values, and so forth, where the bulk of the processing operations performed in implementing a neural network includes Matrix x Matrix or Matrix x Vector multiplications (second data))
a serial interface to a serial connection to the data source, the serial interface configured to receive commands from the data source to write the third data into the memory (see ¶ [0083], Husseini shows the hardware accelerator supports the PCIe serial connection (a serial interface to a serial connection to the data source))

Regarding claim 20, Husseini modified by Caballero and Jantz teaches claim 17.
Husseini shows:
The apparatus of claim 17, wherein the serial communication connection is in accordance with Peripheral Component Interconnect express (PCIe), Universal Serial Bus (USB), or Mobile Industry Processor Interface (MIPI) (see ¶ [0083], Husseini shows a communication packet payload of a PCIe protocol transaction for transmission over a PCIe connection between the server and the hardware accelerator (standard for Peripheral Component Interconnect express (PCIe))).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANJAN PANT whose telephone number is (571)270-5946.  The examiner can normally be reached on IFW.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kevin T Bates, can be reached at (571)272-3980.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

                                                                                                                                                                                                   
RANJAN PANT
Examiner
Art Unit 2458 

/RP/

/KEVIN T BATES/
Supervisory Patent Examiner, Art Unit 2458