DETAILED ACTION
1.	This office action is in response to the Application No.  filed on 6/28/2019. Claims 1-20 are presented for examination and are currently pending.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101 
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
 

3.	Claims 1-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
	Claim 1 recites “machine-readable storage medium” which does not fall within at least one of the four categories of patent eligible subject matter. Applicant’s specification as filed does not define machine-readable storage medium to exclude signals per se.  The broadest reasonable interpretation of machine-readable storage medium is to include signals per se; thus, claim 1 is directed to non-statutory subject matter.
The dependent claims 2-14 do not contain any subject matter which changes the above analysis.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


4.	Claim 11 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
	Claim 11 recites “graph theory-based analysis”. The term “graph theory-based analysis” is indefinite because it describes an analysis based on graph theory without specifying the graph or the analysis. There are many different types of graphs (e.g. directed graphs, undirected graphs) and it is not clear which type of graph is used. Therefore, the metes and bounds of the term “graph theory-based analysis” are not defined and the limitation is indefinite. Published paragraph [0085] of applicant’s specification only mentions the use of ‘graph-theory based analysis’ to optimize the underlying neural network or optimize utilization of hardware resources. Paragraph [0083] describes optimization by ‘traversing’ the neural network graph. For the purpose of examination, the limitation “graph theory-based analysis” is interpreted as graph traversal.
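	For illustration only, the interpretation adopted above (graph theory-based analysis read as graph traversal) can be sketched as a topological traversal of a small directed acyclic graph. The layer names and the `traverse` helper are hypothetical; they are not taken from applicant's specification or from any cited reference.

```python
from collections import deque

def traverse(edges):
    """Visit every node of a directed acyclic graph in topological order."""
    nodes = {n for edge in edges for n in edge}
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    # Start from nodes with no incoming edges (e.g. the network input).
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order

# Hypothetical network: input -> conv -> pool -> output
layers = [("input", "conv"), ("conv", "pool"), ("pool", "output")]
```

	Other graph analyses (for example, partitioning or coloring) would also fall under the literal wording of the term, which is the source of the indefiniteness noted above.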


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



5.	Claims 1, 2, 4, 6, 14, 15, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Venkataramani et al (US20180136912) in view of Rotem et al (Glow: Graph Lowering Compiler Techniques for Neural Networks, arXiv:1805.00907v3 [cs.PL] 3 Apr 2019).

	Regarding claim 1, Venkataramani teaches at least one machine-readable storage medium with instructions stored thereon, wherein the instructions are executable by a machine to cause the machine to: (computer readable media may also be used to store and execute these program instructions [0063])
	receive, at a compiler (a model compiler 220, [0058], Fig. 2)
	a graph describing a neural network; (the DL (deep learning) network 212 may be a graphical model [0060], Fig. 2; Other representations of a DL network include a Directed Acyclic Graph (DAG) [0055]; In other embodiments, the deep learning code generator 300 may utilize the IR builder 224 of the model compiler 220 to construct in-memory IRs of the DL network 212 [0075])
	access data to describe a target hardware device to implement the neural network; (The deep learning code generator 300 also may receive one or more settings, such as the code generation settings 318, for guiding or controlling the code generation process for the DL network 212, as indicated at step 504. The options may indicate which predefined library is to be used in the generated code 226, such as Nvidia's cuDNN library, among others. Alternatively or additionally, the options may indicate the platform target, for example a CPU target platform, a GPU target platform, a TPU target platform, an FPGA target platform, etc. [0073])
	generate, at the compiler, from the graph and the data, an intermediate representation, (In other embodiments, the deep learning code generator 300 may utilize the IR (intermediate representation) builder 224 of the model compiler 220 to construct in-memory IRs of the DL network 212 [0075]; In some embodiments, one or more IRs may be graph-based, object-oriented structures. For example, one or more IRs may be in the form of a hierarchical Data Flow Graph (DFG) [0077])
	wherein the intermediate representation comprises an operator model (a Control Flow Graph (CFG) that may include objects classified as Statements, Variables, Functions, Data Types, etc. The systems and methods may utilize the subclass types of the class hierarchy, such as the Convolution subclass type, the Pooling subclass type, etc., to transform and/or construct the CFG form of IR for the DL network. The CFG form of IR for the DL network also may include an object of a Network class type. [0048])
	to identify a set of operations to be performed to implement the neural network, (Convolution subclass type, the Pooling subclass type, etc., to transform and/or construct the CFG form of IR for the DL network. The CFG form of IR for the DL network also may include an object of a Network class type. [0048])
	a data model (In some embodiments, one or more IRs may be graph-based, object-oriented structures. For example, one or more IRs may be in the form of a hierarchical Data Flow Graph (DFG) and/or a Parallel Intermediate Representation (PIR), which may include a plurality of IR objects, such as nodes, which may represent layers of the DL network 212, interconnected by edges, which may represent data flow. [0077])
	and a control model to identify a sequencing of the operations; (One or more of the IRs may be in the form of a Data Flow Graph (DFG) and/or a Control Flow Graph (CFG) for the DL network. An IR may include nodes that correspond to the layers of the DL network.  [0046]) 
	and generate a binary executable using each of the operator model, data model, and control model of the intermediate representation (The deep learning code generator 300 may separate the network parameters from one or more IRs for the DL network 212, … In some embodiments, the deep learning code generator 300 may store the separated network parameters in the form of one or more binary data files (as binary executable) [0081]; The generated code 226 may also include or have access to the binary data files that contain the network parameters, e.g., weights and biases, that were separated from AlexNet [0123])
	Venkataramani does not explicitly teach to identify a set of tensors corresponding to the set of operations, 
	Rotem teaches to identify a set of tensors corresponding to the set of operations, (Placeholder nodes are symbolic nodes that may have backing tensors assigned or changed after compilation, pg. 3, right col, last para.; An example of a Placeholder node is the input image data tensor for image classification in a convolutional neural network, pg. 4, left col, first para, Fig. 2)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the storage medium of Venkataramani to incorporate the teaching of Rotem for the benefit of enabling the compiler to support a high number of input operators as well as a large number of hardware targets (Rotem, abstract)
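	As a reading aid only, the claimed three-part intermediate representation can be sketched as a small data structure in which graph nodes populate an operator model, graph edges populate a data model of tensors, and a separate list records sequencing. The class, field, and function names are hypothetical, and the sketch assumes the edges are listed in dependency order; it does not reproduce the implementation of Venkataramani, Rotem, or applicant.

```python
from dataclasses import dataclass, field

@dataclass
class IntermediateRepresentation:
    operators: list = field(default_factory=list)  # operator model: operations to perform
    tensors: dict = field(default_factory=dict)    # data model: tensor name -> (producer, consumer)
    schedule: list = field(default_factory=list)   # control model: order of execution

def build_ir(edges):
    """Build a toy IR from (producer, consumer) edges of a network graph."""
    ir = IntermediateRepresentation()
    for src, dst in edges:
        for op in (src, dst):
            if op not in ir.operators:
                ir.operators.append(op)
        # Each edge is treated as one tensor flowing between two operations.
        ir.tensors[f"{src}->{dst}"] = (src, dst)
    # Assumption: edges arrive in dependency order, so first-seen order is a valid schedule.
    ir.schedule = list(ir.operators)
    return ir
```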

	Regarding claim 2, Venkataramani modified by Rotem teaches the storage medium of claim 1, Venkataramani teaches wherein the operator model identifies, from each node of the graph, a respective one of the set of operations, and (Nodes of the dependency graph may represent execution elements, such as a CUDA kernel, for executing the functionality of a layer of the DL network 212, (dependency graph may be a directed graph having nodes and edges) [0107] ) 
	Rotem teaches further identifies, from each edge of the graph, a respective one of the set of tensors (Placeholder nodes are symbolic nodes that may have backing tensors assigned or changed after compilation, pg. 3, right col, last para.; An example of a Placeholder node is the input image data tensor for image classification in a convolutional neural network, pg. 4, left col, first para, Fig. 2)
	The same motivation to combine as independent claim 1 applies here.

	Regarding claim 4, Venkataramani modified by Rotem teaches the storage medium of claim 1, Venkataramani teaches wherein the control model identifies (A CDFG may capture the control flow as well as the data flow of a source program through data dependency and control dependency edges [0077])

	Regarding claim 6, Venkataramani modified by Rotem modified by McBride teaches the storage medium of claim 5, Venkataramani teaches wherein the target hardware device comprises two or more different types of compute resources (Exemplary target platforms include host computers having one or more single core and/or multicore CPUs and one or more Parallel Processing Units (PPUs), such as Graphics Processing Units (GPUs), and embedded systems including single and/or multicore CPUs, … [0062]) 
	and two or more different types of memory resources (The main memory 1204, which may be a Random Access Memory (RAM) [0133], and the removable medium drive 1210 may accept and read a computer readable medium 1226, such as a CD, DVD, floppy disk, solid state drive, tape, flash memory or other non-transitory medium [0134])

	Regarding claim 14, Venkataramani modified by Rotem modified by Fursin teaches the storage medium of claim 1, Venkataramani teaches wherein the executable binary is to optimize implementation of the neural network using resources of the target hardware device (The deep learning code generator may generate code to call this class hierarchy/API layer. For each target platform, there may be a different implementation of the class-hierarchy/API layer that is optimized to the specified target platform. [0144]; The optimization engine 310 may apply a partitioning algorithm, such as a clique partitioning algorithm, to the dependency graph to reduce or minimize the number of edges between cliques, ... Each dense connection structure, e.g., clique, may be mapped to an execution unit of the target platform that will execute the generated code 226 [0097])

	Regarding claim 15, Venkataramani teaches a method comprising: (The methods may be implemented as a deep learning compiler framework. FIG. 14 is a schematic illustration of a deep learning compiler framework 1400 in accordance with an embodiment [0041])
	receiving, at a compiler, (a model compiler 220, [0058], Fig. 2)
	a graph describing a neural network; (the DL (deep learning) network 212 may be a graphical model [0060], Fig. 2; Other representations of a DL network include a Directed Acyclic Graph (DAG) [0055]; In other embodiments, the deep learning code generator 300 may utilize the IR builder 224 of the model compiler 220 to construct in-memory IRs of the DL network 212 [0075])
	accessing data to describe a target hardware device to implement the neural network; (The deep learning code generator 300 also may receive one or more settings, such as the code generation settings 318, for guiding or controlling the code generation process for the DL network 212, as indicated at step 504. The options may indicate which predefined library is to be used in the generated code 226, such as Nvidia's cuDNN library, among others. Alternatively or additionally, the options may indicate the platform target, for example a CPU target platform, a GPU target platform, a TPU target platform, an FPGA target platform, etc. [0073])
	generating, at the compiler, from the graph and the data, an intermediate representation, (In other embodiments, the deep learning code generator 300 may utilize the IR (intermediate representation) builder 224 of the model compiler 220 to construct in-memory IRs of the DL network 212 [0075]; In some embodiments, one or more IRs may be graph-based, object-oriented structures. For example, one or more IRs may be in the form of a hierarchical Data Flow Graph (DFG) [0077])
	wherein the intermediate representation comprises an operator model (a Control Flow Graph (CFG) that may include objects classified as Statements, Variables, Functions, Data Types, etc. The systems and methods may utilize the subclass types of the class hierarchy, such as the Convolution subclass type, the Pooling subclass type, etc., to transform and/or construct the CFG form of IR for the DL network. The CFG form of IR for the DL network also may include an object of a Network class type. [0048])
	to identify a set of operations to be performed to implement the neural network, (Convolution subclass type, the Pooling subclass type, etc., to transform and/or construct the CFG form of IR for the DL network. The CFG form of IR for the DL network also may include an object of a Network class type. [0048])
	a data model (In some embodiments, one or more IRs may be graph-based, object-oriented structures. For example, one or more IRs may be in the form of a hierarchical Data Flow Graph (DFG) and/or a Parallel Intermediate Representation (PIR), which may include a plurality of IR objects, such as nodes, which may represent layers of the DL network 212, interconnected by edges, which may represent data flow. [0077])
	and a control model to identify a sequencing of the operations; (One or more of the IRs may be in the form of a Data Flow Graph (DFG) and/or a Control Flow Graph (CFG) for the DL network. An IR may include nodes that correspond to the layers of the DL network.  [0046]) 
	and generating a binary executable using each of the operator model, data model, and control model of the intermediate representation. (The deep learning code generator 300 may separate the network parameters from one or more IRs for the DL network 212, … In some embodiments, the deep learning code generator 300 may store the separated network parameters in the form of one or more binary data files (as binary executable) [0081]; The generated code 226 may also include or have access to the binary data files that contain the network parameters, e.g., weights and biases, that were separated from AlexNet [0123])
	Venkataramani does not explicitly teach to identify a set of tensors corresponding to the set of operations, 
	Rotem teaches to identify a set of tensors corresponding to the set of operations, (Placeholder nodes are symbolic nodes that may have backing tensors assigned or changed after compilation, pg. 3, right col, last para.; An example of a Placeholder node is the input image data tensor for image classification in a convolutional neural network, pg. 4, left col, first para, Fig. 2)
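	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Venkataramani to incorporate the teaching of Rotem for the benefit of enabling the compiler to support a high number of input operators as well as a large number of hardware targets (Rotem, abstract)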

	Regarding claim 17, Venkataramani teaches a system comprising: a data processor; (For example, for a target platform that includes a GPU (graphic processing unit) (as data processor), the implementation may include cuDNN library calls, handwritten CUDA code, and/or a BLAS implementation [0070])
	a memory; and (The IRs may be stored in memory, such as a main memory or a persistent memory of a data processing device [0078])
	a compiler, executable by the data processor to: (In some embodiments, a compiler, such as the compiler 230, may compile the generated code 226 to produce the executable 232 [0088], Fig. 2)
	receive a graph describing a neural network; (the DL (deep learning) network 212 may be a graphical model [0060], Fig. 2; Other representations of a DL network include a Directed Acyclic Graph (DAG) [0055]; In other embodiments, the deep learning code generator 300 may utilize the IR builder 224 of the model compiler 220 to construct in-memory IRs of the DL network 212 [0075])
	access data to describe a target hardware device to implement the neural network; (The deep learning code generator 300 also may receive one or more settings, such as the code generation settings 318, for guiding or controlling the code generation process for the DL network 212, as indicated at step 504. The options may indicate which predefined library is to be used in the generated code 226, such as Nvidia's cuDNN library, among others. Alternatively or additionally, the options may indicate the platform target, for example a CPU target platform [0073])
	generate from the graph and the data, an intermediate representation, (In other embodiments, the deep learning code generator 300 may utilize the IR (intermediate representation) builder 224 of the model compiler 220 to construct in-memory IRs of the DL network 212 [0075]; In some embodiments, one or more IRs may be graph-based, object-oriented structures [0077])
	wherein the intermediate representation comprises an operator model (a Control Flow Graph (CFG) that may include objects classified as Statements, Variables, Functions, Data Types, etc. The systems and methods may utilize the subclass types of the class hierarchy, such as the Convolution subclass type, the Pooling subclass type, etc., to transform and/or construct the CFG form of IR for the DL network. The CFG form of IR for the DL network also may include an object of a Network class type. [0048])
	to identify a set of operations to be performed to implement the neural network, (Convolution subclass type, the Pooling subclass type, etc., to transform and/or construct the CFG form of IR for the DL network. The CFG form of IR for the DL network also may include an object of a Network class type. [0048])
	a data model (In some embodiments, one or more IRs may be graph-based, object-oriented structures. For example, one or more IRs may be in the form of a hierarchical Data Flow Graph (DFG) and/or a Parallel Intermediate Representation (PIR), which may include a plurality of IR objects, such as nodes, which may represent layers of the DL network 212, interconnected by edges, which may represent data flow. [0077])
	and a control model to identify a sequencing of the operations; (FIG. 8 is a flow diagram of an example method for performing parallel execution scheduling in accordance with an embodiment [0107]; A scheduling order may be created that specifies an order of execution of all or at least some of the layers of the DL network 212, and the scheduling order may be incorporated in the code generated for the DL network 212 [0100]; A CDFG may capture the control flow as well as the data flow of a source program through data dependency and control dependency edges [0077]) 
	and generate a binary executable using each of the operator model, data model, and control model of the intermediate representation. (The deep learning code generator 300 may separate the network parameters from one or more IRs for the DL network 212, … In some embodiments, the deep learning code generator 300 may store the separated network parameters in the form of one or more binary data files (as binary executable) [0081]; The generated code 226 may also include or have access to the binary data files that contain the network parameters, e.g., weights and biases, that were separated from AlexNet [0123])
	Venkataramani does not explicitly teach to identify a set of tensors corresponding to the set of operations, 
	Rotem teaches to identify a set of tensors corresponding to the set of operations, (Placeholder nodes are symbolic nodes that may have backing tensors assigned or changed after compilation, pg. 3, right col, last para.; An example of a Placeholder node is the input image data tensor for image classification in a convolutional neural network, pg. 4, left col, first para, Fig. 2)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Venkataramani to incorporate the teaching of Rotem for the benefit of enabling the compiler to support a high number of input operators as well as a large number of hardware targets (Rotem, abstract)

	Regarding claim 18, Venkataramani modified by Rotem teaches the system of claim 17, Venkataramani teaches wherein the compiler is further to: access second data to describe a second, different target hardware device to implement the neural network; (The deep learning code generator 300 also may receive one or more settings, such as the code generation settings 318, for guiding or controlling the code generation process for the DL network 212, as indicated at step 504. The options may indicate which predefined library is to be used in the generated code 226, such as Nvidia's cuDNN library, among others. Alternatively or additionally, the options may indicate the platform target, for example a GPU target platform (different from a CPU target platform) [0073])
	generate from an instance of the graph and the second data, a second intermediate representation, (In other embodiments, the deep learning code generator 300 may utilize the IR (intermediate representation) builder 224 of the model compiler 220 to construct in-memory IRs of the DL network 212 [0075]; In some embodiments, one or more IRs may be graph-based, object-oriented structures. For example, one or more IRs may be in the form of a hierarchical Data Flow Graph (DFG) [0077], as second intermediate representation because the IR is for a GPU target platform)
	wherein the second intermediate representation comprises a respective operator model, (a Control Flow Graph (CFG) that may include objects classified as Statements, Variables, Functions, Data Types, etc [0048])
	data model, (Parallel Intermediate Representation (PIR), which may include a plurality of IR objects, such as nodes, which may represent layers of the DL network 212, interconnected by edges, which may represent data flow. [0077])
	and control model, wherein the second intermediate representation is different from the intermediate representation; (One or more of the IRs may be in the form of a Data Flow Graph (DFG) and/or a Control Flow Graph (CFG) for the DL network. An IR may include nodes that correspond to the layers of the DL network [0046], which is different because the IR is for a GPU target platform)
	and generate a second binary executable using the second intermediate representation, wherein the second binary executable is different from the binary executable. (The deep learning code generator 300 may separate the network parameters from one or more IRs for the DL network 212, … In some embodiments, the deep learning code generator 300 may store the separated network parameters in the form of one or more binary data files (as binary executable) [0081]; The generated code 226 may also include or have access to the binary data files that contain the network parameters, e.g., weights and biases, that were separated from AlexNet [0123])


6.	Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Venkataramani et al (US20180136912) in view of Rotem et al (Glow: Graph Lowering Compiler Techniques for Neural Networks, arXiv:1805.00907v3 [cs.PL] 3 Apr 2019) and further in view of Evans et al (US20190197018)

	Regarding claim 3, Venkataramani modified by Rotem teaches the storage medium of claim 1, Venkataramani teaches wherein the data model identifies a set of buffers to be allocated in memory of the target hardware device (As described herein, one optimization that may be performed by the optimization engine 310 is buffer allocation minimization. Another optimization that may be performed is mapping portions of the IR for the DL network 212, such as the portions that correspond to the layers of the DL network 212, to execution units of a target platform [0082])
	Venkataramani does not explicitly teach that the data model maps each of the set of tensors to a respective one of the set of buffers.
	Evans teaches data model maps each of the set of tensors to a respective one of the set of buffers (data flow graph can be based on a plurality of subgraphs … and tensor can queue within the input buffers [0068])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the storage medium of Modified Venkataramani to incorporate the teachings of Evans for the benefit of data flow graph representing a deep learning network [0085] and data flow graphs that represent matrix computations, and tensor manipulations (Evans [0058])
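	The limitation at issue in claim 3 (each tensor mapped to a respective buffer) can be pictured with the following minimal sketch, offered only to clarify the claim language. The tensor names, byte sizes, and the `map_tensors_to_buffers` helper are hypothetical and assume the simplest possible policy of one dedicated buffer per tensor, laid out back to back; neither Venkataramani nor Evans is asserted to use this layout.

```python
def map_tensors_to_buffers(tensor_sizes):
    """Assign each tensor its own buffer, placed contiguously in memory."""
    buffers = {}
    offset = 0
    for name, size in tensor_sizes.items():
        # One buffer per tensor: record where it starts and how large it is.
        buffers[name] = {"offset": offset, "size": size}
        offset += size
    return buffers

# Hypothetical tensors with sizes in bytes.
example = map_tensors_to_buffers({"input": 4, "conv_out": 8, "pool_out": 2})
```

	A buffer allocation minimization of the kind mentioned in Venkataramani [0082] would instead let tensors with non-overlapping lifetimes share a buffer; that refinement is omitted here.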

7.	Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Venkataramani et al (US20180136912) in view of Rotem et al (Glow: Graph Lowering Compiler Techniques for Neural Networks, arXiv:1805.00907v3 [cs.PL] 3 Apr 2019) and further in view of McBride et al (US20180299943)

	Regarding claim 5, Venkataramani modified by Rotem teaches the storage medium of claim 1, Venkataramani teaches wherein the data comprises to identify memory and compute resources of the target hardware device (The options may indicate which predefined library is to be used in the generated code 226, such as Nvidia's cuDNN library, among others. Alternatively or additionally, the options may indicate the platform target, for example a CPU target platform, a GPU target platform, a TPU target platform, an FPGA target platform, etc. The options also may indicate the identity of a compiler tool chain, such as Nvidia's nvcc compiler, a C/C++ compiler, etc. Other options may indicate whether the generated code should be optimized for speed of execution or to minimize memory usage. Other options may indicate whether to run the DL network 212 on a single input at a time or with batches of inputs at a time. [0073])
	Modified Venkataramani does not explicitly teach a target descriptor.

	McBride teaches a target descriptor (Generally, there can be several main classes of descriptors: memory-to-memory move (“M2M”) descriptors, configuration descriptors, and operation descriptors. M2M descriptors can be used to move data to/from the main memory to/from a local buffer (i.e. the line buffer 125 described below) for consumption by the operation descriptors. M2M descriptors follow a different execution pipeline than the operation descriptors. The target pipeline for M2M descriptors can be the internal DMA engine 105B or the configuration registers 105G, whereas the target pipeline for the operation descriptors can be the neurons 105F [0039])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the storage medium of Modified Venkataramani to incorporate the teachings of McBride for the benefit of software tools and/or compilers which can be executed on devices external to the DNN module 105 to create the descriptor lists that are executed on the DNN module 105 (McBride [0038])

8.	Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Venkataramani et al (US20180136912) in view of Rotem et al (Glow: Graph Lowering Compiler Techniques for Neural Networks, arXiv:1805.00907v3 [cs.PL] 3 Apr 2019) and further in view of Pratas et al (US20170277658)

	Regarding claim 7, Venkataramani modified by Rotem modified by McBride teaches the storage medium of claim 6, Modified Venkataramani does not explicitly teach wherein the target hardware device comprises a hardware accelerator, one of the two or more different types of compute resources is implemented on the hardware accelerator and another one of the two or more different types of compute resources is implemented outside the hardware accelerator.
	Pratas teaches wherein the target hardware device comprises a hardware accelerator, (The embodiments of the invention described below enable the integration of neuromorphic accelerators in low-power devices [0094], neuromorphic accelerator 1230, Fig. 12B)
	one of the two or more different types of compute resources is implemented on the hardware accelerator (neuromorphic accelerator 1230 includes a set of processing units PUs 1300-1303 [0101] implemented on accelerator 1230 Fig. 13)
	and another one of the two or more different types of compute resources is implemented outside the hardware accelerator (Uncore 1210 is implemented outside neuromorphic accelerator 1230, Fig. 12B, [0099])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the storage medium of Modified Venkataramani to incorporate the teachings of Pratas for the benefit of optimized architecture to compute fully-connected neural networks very efficiently (Pratas, [0092])

	Regarding claim 8, Venkataramani modified by Rotem teaches the storage medium of claim 6, Venkataramani teaches another one of the two or more different types of memory resources comprises random access memory (RAM) (The main memory 1204, which may be a Random-Access Memory (RAM) [0133])
	Venkataramani does not explicitly teach wherein one of the two or more different types of memory resources comprises local scratchpad memory
	Pratas teaches wherein one of the two or more different types of memory resources comprises local scratchpad memory (The embodiments of the invention described below include a scratchpad memory design for hardware convolvers and neural network accelerators [0114])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the storage medium of Modified Venkataramani to incorporate the teachings of Pratas for the benefit of better scratchpad memory utilization which results in a significant external-memory bandwidth reduction, and therefore lower power and energy consumption (Pratas, [0121])

9.	Claims 9-12, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Venkataramani et al (US20180136912) in view of Rotem et al (Glow: Graph Lowering Compiler Techniques for Neural Networks, arXiv:1805.00907v3 [cs.PL] 3 Apr 2019) and further in view of Fursin et al (MILEPOST GCC: machine learning based research compiler. GCC Summit, Jun 2008, Ottawa, Canada. inria-00294704)

	Regarding claim 9, Venkataramani modified by Rotem teaches the storage medium of claim 1, Venkataramani teaches wherein the instructions are further executable by a machine to cause the machine (computer readable media may also be used to store and execute these program instructions [0063])
	to generate the binary executable.  (The deep learning code generator 300 may separate the network parameters from one or more IRs for the DL network 212,…In some embodiments, the deep learning code generator 300 may store the separated network parameters in the form of one or more binary data files (as binary executable) [0081]; The generated code 226 may also include or have access to the binary data files that contain the network parameters, e.g., weights and biases, that were separated from AlexNet [0123])
	Modified Venkataramani does not explicitly teach to perform a set of compilation passes using the operator model, data model, and control model 
	Fursin teaches to perform a set of compilation passes using the operator model, data model, and control model (During the compilation, the program is represented by several data structures, implementing the intermediate representation (tree-SSA, RTL etc), control flow graph (CFG), def-use chains, the loop hierarchy, etc, the data structures available depend on the compilation pass currently being performed pg. 7, left col, second para.; selecting pass sequences, Fig. 2b, pg. 4) 
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the storage medium of Modified Venkataramani to incorporate the teachings of Fursin for the benefits of iterative compilation while reducing the number of executions needed (Fursin, pg. 1, right col, second para.)

	Regarding claim 10, Venkataramani modified by Rotem modified by Fursin teaches the storage medium of claim 9, Fursin teaches wherein performing the set of compilation passes comprises: selecting, for each one of the set of compilation passes, one of the operator model, data model, or control model based on the respective compilation pass; (selecting pass sequences, Fig. 2b, pg. 4) and
	using the selected one of the operator model, data model, or control model to perform the corresponding compilation pass. (To verify that we can change the default optimization pass orders using ICI, we recompiled the same benchmark with the -O3 flag but selecting passes shown in Figure 4c. However, note that the GCC internal function execute_one_pass shown in Figure 3c has gate control (pass->gate()) to execute the pass only if the associate optimization flags is selected, pg. 6, left col, last para.)
	The same motivation to combine as set forth for dependent claim 9 applies here.

	Regarding claim 11, Venkataramani modified by Rotem modified by Fursin teaches the storage medium of claim 10, Fursin teaches wherein each of the operator model, data model, and control model comprise a respective graph, (During the compilation, the program is represented by several data structures, implementing the intermediate representation (tree-SSA, RTL etc), control flow graph (CFG), def-use chains, the loop hierarchy, etc, The data structures available depend on the compilation pass currently being performed, pg. 7, left col, second para.)
	and one or more of the set of compilation passes (selecting pass sequences, Fig. 2b, pg. 4) comprises 
	a graph theory-based analysis of a corresponding one of the operator model, data model, or control model (During the compilation, the program is represented by several data structures, implementing the intermediate representation (tree-SSA, RTL etc), control flow graph (CFG), pg. 7, left col, second para.)
	The same motivation to combine as set forth for dependent claim 9 applies here.

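	Regarding claim 12, Venkataramani teaches wherein the instructions are further executable by a machine to cause the machine (computer readable media may also be used to store and execute these program instructions [0063])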
	Fursin teaches to receive a compilation descriptor to identify the set of compilation passes to be used by the compiler (This version can now transparently monitor execution of passes or replace the GCC Controller (Pass Manager) (as compilation descriptor), if desired, pg. 3, 3.1 Internal structure, Fig. 2b)
	in generating the binary executable (The execution of the generated binary shows that we improve its execution time, pg. 6, right col, first para.)
	The same motivation to combine as set forth for dependent claim 9 applies here.

	Regarding claim 16, Venkataramani modified by Rotem teaches the method of claim 15, Venkataramani teaches to generate a translated version of the graph, wherein the binary executable is generated based on the translated version of the graph. (The deep learning code generator 300 also may generate a code generation report 228. The generated code 226 may be provided to a compiler 230, such as a C compiler, Nvidia's nvcc compiler, etc., which may translate the generated code 226 and the library functions called by the generated code to produce executable code 232. The executable code 232, which may be in the form of assembly code, may be deployed on a deployed system, such as a target platform [0057])

	Fursin teaches to perform a set of compilation passes using the operator model, data model, and control model (During the compilation, the program is represented by several data structures, implementing the intermediate representation (tree-SSA, RTL etc), control flow graph (CFG), def-use chains, the loop hierarchy, etc, the data structures available depend on the compilation pass currently being performed pg. 7, left col, second para.; selecting pass sequences, Fig. 2b, pg. 4) 
	The same motivation to combine as set forth for dependent claim 9 applies here.

	Regarding claim 20, Venkataramani modified by Rotem teaches the system of claim 17, Venkataramani teaches wherein the target computing device, when executing the binary executable (The executable code 232, which may be in the form of assembly code, may be deployed on a deployed system, such as a target platform [0057])
	Rotem teaches a memory allocation pass, and performing the memory allocation pass comprises: (In the first section of the IR we declare a number of memory regions that live throughout the lifetime of the program, pg. 5, right col, third para.)
	determining, for a particular one of the set of tensors, attributes of the particular tensor; (The graph is strongly typed, which means that inputs and output have a known tensor type (consisting of the tensor’s shape and element type), and that the types of nodes are verified by the compiler, pg. 3, right col, first para.)
(Memory regions are strongly typed, which means that the kind of type of tensor that the region represents is known, pg. 5, right col, fifth para.)
	based on one or more of the attributes; (Some of the parameters that an optimized implementation may care about are the specific tensor sizes, the exact addresses of buffers in memory, pg. 8, 5.1 Standard Library)
	and allocate a particular buffer for the particular tensor in the particular memory resource based on one or more of the attributes, (This 5-dimensional tensor layout allows for consecutive SIMD memory access, pg. 9, right col, first para.; the low-level optimizer optimizes the instruction stream by shrinking the lifetime of memory allocations for the activations, and then performs static memory allocation for the whole network into a single buffer, pg. 9, right col, second to the last para.; Fig. 6)
	is to use the particular buffer to store the particular tensor. (Some of the parameters that an optimized implementation may care about are the specific tensor sizes, the exact addresses of buffers in memory, pg. 8, 5.1 Standard Library)
	Modified Venkataramani does not explicitly teach wherein the compiler is to perform a plurality of compilation passes to generate the binary executable, and the plurality of compilation passes comprises a memory allocation pass, 
	Fursin teaches wherein the compiler is to perform a plurality of compilation passes (This version can now transparently monitor execution of passes or replace the GCC Controller (Pass Manager) (as compilation descriptor), if desired, pg. 3, 3.1 Internal structure, Fig. 2b)
	to generate the binary executable, (The execution of the generated binary shows that we improve its execution time, pg. 6, right col, first para.)
	and the plurality of compilation passes (This version can now transparently monitor execution of passes or replace the GCC Controller (Pass Manager) (as compilation descriptor), if desired, pg. 3, 3.1 Internal structure, Fig. 2b)
	The same motivation to combine as set forth for dependent claim 9 applies here.

10.	Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Venkataramani et al (US20180136912) in view of Rotem et al (Glow: Graph Lowering Compiler Techniques for Neural Networks, arXiv:1805.00907v3 [cs.PL] 3 Apr 2019) and further in view of Packes et al (US20150242747)

	Regarding claim 13, Venkataramani modified by Rotem teaches the storage medium of claim 1, Venkataramani teaches wherein the executable binary (In some embodiments, the deep learning code generator 300 may store the separated network parameters in the form of one or more binary data files (as binary executable) [0081]; The generated code 226 may also include or have access to the binary data files that contain the network parameters, e.g., weights and biases, that were separated from AlexNet [0123]) comprises
	Modified Venkataramani does not explicitly teach serialized data to be provided to the target hardware device
	Packes teaches serialized data to be provided to the target hardware device (A neural network may be an object, such as an instantiation of a C# class that may be serialized and saved in the system data store [0051])
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the storage medium of Modified Venkataramani to incorporate the teachings of Packes for the benefit of loading and instantiating a memory object representing the neural network (Packes, [0054])

11.	Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Venkataramani et al (US20180136912) in view of Rotem et al (Glow: Graph Lowering Compiler Techniques for Neural Networks, arXiv:1805.00907v3 [cs.PL] 3 Apr 2019) in view of McBride et al (US20180299943) and further in view of Fursin et al (MILEPOST GCC: machine learning based research compiler. GCC Summit, Jun 2008, Ottawa, Canada. inria-00294704)

	Regarding claim 19, Venkataramani modified by Rotem teaches the system of claim 17, Venkataramani teaches identifying attributes of a set of memory resources of a target computing device (Alternatively or additionally, the options may indicate the platform target, for example a CPU target platform, a GPU …. etc. The options also may indicate the identity of a compiler tool chain, such as Nvidia's nvcc compiler, a C/C++ compiler, etc. Other options may indicate whether the generated code should be optimized for speed of execution or to minimize memory usage. [0073])
(a model compiler 220, [0058], Fig. 2)
	wherein the intermediate representation is generated based on the attributes; (The IR generator 304 may transform one or more of the IRs for the DL network 212 to a new form that incorporates the selected target agnostic API layer 400 [0083])
	Modified Venkataramani does not explicitly teach wherein the data comprises a target descriptor file, the compiler is further to: receive the target descriptor as an input, wherein the intermediate representation is generated based on the attributes; receive a compilation descriptor identifying a plurality of compilation passes; perform the plurality of compilation passes based on the compilation descriptor to generate the binary executable.
	McBride teaches wherein the data comprises a target descriptor file (Generally, there can be several main classes of descriptors: memory-to-memory move (“M2M”) descriptors, configuration descriptors, and operation descriptors. M2M descriptors can be used to move data to/from the main memory to/from a local buffer (i.e. the line buffer 125 described below) for consumption by the operation descriptors. M2M descriptors follow a different execution pipeline than the operation descriptors. The target pipeline for M2M descriptors can be the internal DMA engine 105B or the configuration registers 105G, whereas the target pipeline for the operation descriptors can be the neurons 105F)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the storage medium of Modified Venkataramani to incorporate the teachings of McBride for the benefit of (McBride [0038])
	Fursin teaches receive a compilation descriptor identifying a plurality of compilation passes; perform the plurality of compilation passes based on the compilation descriptor (This version can now transparently monitor execution of passes or replace the GCC Controller (Pass Manager) (as compilation descriptor), if desired, pg. 3, 3.1 Internal structure, Fig. 2b)
	to generate the binary executable (The execution of the generated binary shows that we improve its execution time, pg. 6, right col, first para.)
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the storage medium of Modified Venkataramani to incorporate the teachings of Fursin for the benefits of iterative compilation while reducing the number of executions needed (Fursin, pg. 1, right col, second para.)


Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.G./Examiner, Art Unit 2121                                    


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121