DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/29/2019 and 10/11/2019 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because reference character “336” has been used to designate two different MVM units in the figure on page 3 of the drawings (one in first tile 302 and one in second tile 304).  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: Fig. 3 in specification paragraph [0005].  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: 336, 338, 340, 342, 344, 346, 348, 350, 352, and 354, all in the figure on page 3 of the drawings.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b), are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
The disclosure is objected to because of the following informalities:
In paragraph [0033], line 7, “and a construction corresponding to the MVM operation class” should read “and a constructor corresponding to the MVM operation class”
In paragraph [0036], line 9, “network program 204, record the matrix M” should read “network program 204, records the matrix M”
Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 7-9, 12-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Murray et al. (US 2019/0034785 A1) in view of Ambrosi et al. ("Hardware-Software Co-Design for an Analog-Digital Accelerator for Machine Learning"), in view of Ravishankar et al. (US 2019/0278574 A1), and further in view of Datta et al. (US 2020/0042856 A1).
Regarding Claim 1,
	Murray et al. teaches a system (Fig. 3; [0087]: "As an example, FIG. 3 is a diagram illustrating elements or components that may be present in a computer device or system 300 configured to implement a method, process, function, or operation in accordance with an embodiment of the invention" teaches a system 300 for implementing the methods, processes, or operation of the invention) comprising: 
a processor (Fig. 3; [0087]: "The interconnection via the system bus 302 allows one or more processors 320 to communicate with each subsystem and to control the execution of instructions" teaches that the system 300 comprises processors 320); 
a machine-readable storage medium comprising instructions executable by the processor (Fig. 3; [0087]: "… to control the execution of instructions that may be stored in a system memory 322 and/or the fixed disk 308, as well as the exchange of information between subsystems. The system memory 322 and/or the fixed disk 308 may embody a tangible computer-readable medium" teaches that the system 300 comprises system memory 322, which may be a computer-readable medium (machine-readable storage medium)) to: 
generate a computation graph corresponding to the neural network model, the computation graph comprising a first plurality of nodes, each of the first plurality of nodes representing one of a MVM operation, a matrix, and a vector ([0039]: "Embodiments or implementations of the approach(es) described herein may be used to generate a program or process for making a decision that is derived at least in part from data or other outputs produced by a neural network. The program or process may be specified by a computation graph, which is a form of representing a computation or computing program by the data flow (e.g., expressed as data tensors) through a series of operations (indicated by nodes in the graph)" teaches generating a computation graph representing an input computing program for a neural network (i.e. a neural network program). [0039-0041]: " The operations or graph nodes include an operator, where the operator is used to make a nondeterministic choice that influences the program logic. The combination of a computation graph and the operator represents a collection of decisions and information about the computing architecture used to make each decision. The computation graph describes how a program will execute or how a neural network will operate to process an input. A computation graph structure is typically described in terms of the following aspects: a node is (or has) a value (e.g., tensor, matrix, vector, scalar)" teaches that the computation graph includes nodes that represent operations and values including tensor, matrix, vector and scalar. [0048]: “The network parameters determine how an output is produced from the input data. As tensors, these parameters and values can be manipulated with standard operations, such as matrix-vector multiplication” teaches that the operations include matrix-vector multiplication operations).
	Murray et al. does not appear to explicitly teach receive, in a programming environment, a neural network program corresponding to a neural network model, expressed using a domain specific language (DSL), and comprising a plurality of matrices, a plurality of vectors, and a plurality of matrix-vector multiplication (MVM) operations, wherein the plurality of matrices, the plurality of vectors, and the plurality of MVM operations are declared using a matrix class, a vector class, and a MVM operation class, respectively, defined by the DSL; populate a class model corresponding to the neural network model with a data structure pointing to the computation graph; traverse the computation graph based on the class model; assign, based on traversal of the computation graph, the plurality of MVM operations to MVM units of a neural network accelerator that is to execute the neural network model, each MVM unit being capable of performing a MVM operation; and generate, based on assignment of the plurality of MVM operations, an executable file corresponding to the neural network model for execution by the neural network accelerator.
	However, Ambrosi et al. teaches receive, in a programming environment, a neural network program corresponding to a neural network model, expressed using a domain specific language (DSL) (Section III, sixth-ninth paragraphs: "The method prepare has as input the model and the graph in the ONNX representation format. Its main objective is to translate the model from ONNX format to the appropriate format of a framework or execution platform, cross-compiling the ONNX code to the desired back-end implementation … The method run provides the interface for loading the model into the platform, input preparation, inference execution, and finally, the collection of the outputs … In our solution, an initial version of the prepare method was developed to interpret ONNX models and generate a native graph representation that can be operated on by our compiler (Section V) to generate ISA code" teaches that an ONNX (Open Neural Network Exchange Format) model (neural network model) is input (received) in the ONNX format. Section III, first-third paragraphs: "The popularity of ML applications, especially neural networks and deep learning, has stimulated the emergence of multiple programming frameworks for supporting these applications. Effectively, these frameworks are Domain Specific Languages (DSLs) for building execution models … To solve neural network model interoperability, a couple of initiatives where launched to help define an exchangeable format for neural network models [41], [42]. The Open Neural Network Exchange Format (ONNX) [41] has resulted in an initial interest and engagement of the open source community and industry, supporting most of the known frameworks" teaches that the ONNX framework is a domain-specific language (DSL) for building execution models from a programming framework for a neural network (i.e. a neural network program in a programming environment)); 
assign … the plurality of MVM operations to MVM units of a neural network accelerator that is to execute the neural network model, each MVM unit being capable of performing a MVM operation (Fig. 2; Fig. 3; Section II. B, second paragraph: "Fig. 3(a) shows the ISA encoding of an MVM instruction which orchestrates the execution of an MVM operation on a crossbar, including digital-to-analog conversion, analog MVM execution, and analog-to-digital conversion. The mask operand specifies the crossbars in the core that will be active during the MVM operation" teaches that a MVM operation is assigned to a memristor crossbar (MVM unit) of an accelerator in order for the crossbar to execute (perform) the MVM operation of the neural model); and 
generate, based on assignment of the plurality of MVM operations, an executable file corresponding to the neural network model for execution by the neural network accelerator (Fig. 7; Section V, first paragraph: "The compiler generates ISA code from a graph representation of a neural network model that is either constructed by the ONNX back-end or specified by the programmer via our custom API" teaches that ISA code (executable file) corresponding to a neural network model is generated based on a graph representation of the neural network model (i.e. a computation graph). Fig. 7 teaches that the computation graph is partitioned into crossbars (MVM units), cores, and tiles. Fig. 7; Section V. F: "The final stage is to generate assembly code for each tile and core from the linearized instruction sequences" teaches that the assembly (ISA) code (executable file) is generated for the assigned tiles and cores (i.e. based on the assignment of crossbars (MVM units) in cores and tiles)).
	Murray et al. and Ambrosi et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate receive, in a programming environment, a neural network program corresponding to a neural network model, expressed using a domain specific language (DSL); assign … the plurality of MVM operations to MVM units of a neural network accelerator that is to execute the neural network model, each MVM unit being capable of performing a MVM operation; and generate, based on assignment of the plurality of MVM operations, an executable file corresponding to the neural network model for execution by the neural network accelerator as taught by Ambrosi et al. to the disclosed invention of Murray et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
	Murray et al. in view of Ambrosi et al. does not appear to explicitly teach a neural network program … comprising a plurality of matrices, a plurality of vectors, and a plurality of matrix-vector multiplication (MVM) operations, wherein the plurality of matrices, the plurality of vectors, and the plurality of MVM operations are declared using a matrix class, a vector class, and a MVM operation class, respectively, defined by the DSL; populate a class model corresponding to the neural network model with a data structure pointing to the computation graph; traverse the computation graph based on the class model; assign, based on traversal of the computation graph, the plurality of MVM operations.
	However, Ravishankar et al. teaches a neural network program … comprising a plurality of matrices, a plurality of vectors, and a plurality of matrix-vector multiplication (MVM) operations, wherein the plurality of matrices, the plurality of vectors, and the plurality of MVM operations are declared using a matrix class, a vector class, and a MVM operation class, respectively, defined by the DSL (Fig. 2A; [0033]-[0035]: "FIGS. 2A-2B illustrate an example of how the compiler of FIG. 1 generates and partitions a graph of nodes based on program code, according to various embodiments. The program code discussed in conjunction with the example shown in FIGS. 2A-2B is listed below: 

[Listing 1: example program code; reproduced as an image (media_image1.png) in the source and not recoverable here]

In one embodiment, listing 1 sets forth example program code written in the Python programming language. The example program code creates variables W, a, b, and x (a matrix and three arrays, respectively) and then evaluates an expression based on these variables, setting the result to the variable output. The example program code of Listing 1 can be executed by a serial processor. However, when W, a, b, and x have very large dimensions, the computation of output may take an excessive amount of time. In this situation, compiler 100 can be implemented to analyze this example program code and generate a graph of nodes, as shown in FIG. 2A" teaches an example input program code (i.e. a neural network program) is written in a programming language (defined by the DSL) including a declared matrix W, declared vectors a and b, and a declared MVM operation x).
Murray et al., Ambrosi et al., and Ravishankar et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a neural network program … comprising a plurality of matrices, a plurality of vectors, and a plurality of matrix-vector multiplication (MVM) operations, wherein the plurality of matrices, the plurality of vectors, and the plurality of MVM operations are declared using a matrix class, a vector class, and a MVM operation class, respectively, defined by the DSL as taught by Ravishankar et al. to the disclosed invention of Murray et al. in view of Ambrosi et al.
	One of ordinary skill in the art would have been motivated to make this modification because "At least one technological advantage of the techniques described herein is that a serial computer program designed for serial execution can be automatically converted into a parallel computer program that is optimized for parallel execution. Accordingly, serial computer programs can quickly and easily be accelerated via parallel processing hardware" (Ravishankar et al. [0095]).
Murray et al. in view of Ambrosi et al., and further in view of Ravishankar et al. does not appear to explicitly teach populate a class model corresponding to the neural network model with a data structure pointing to the computation graph; traverse the computation graph based on the class model; and assign, based on traversal of the computation graph, the plurality of MVM operations.
However, Datta et al. teaches populate a class model corresponding to the neural network model with a data structure pointing to the computation graph ([0054]-[0056]: "Schedulers according to various embodiments of the present disclosure are able to ingest any network trained across any deep learning framework. For example, networks may originate from TensorFlow, PyTorch, Caffe2, M×Net, or other frameworks known in the art. In various embodiments, the scheduler is compliant with the ONNX Network Interface standard … In various embodiments, the scheduler includes a scheme library. The scheme library comprises a plurality of schemes, and maps various deep learning operations to those schemes. A scheme comprises computation and communication primitives necessary to implement a neural network layer … In various embodiments, a scheme includes parameters defining the data format and interaction of a neural network" teaches a scheduler (class model) that is populated by the schemes of a neural network model, where the schemes include neural network operations and input parameters. [0058]: "In various embodiments, the scheduler includes a graph sequencer. In some embodiments, each node in the computation graph corresponds to a scheme. Such a graph can be used to establish precedence relationship between logical cores. For some networks, the graph may be DAG. The precedence relationship between cores is used to ensure that each operation is scheduled for computation on a physical core only after all incoming edges to it are already scheduled" teaches that the schemes of the scheduler (class model) correspond (point) to nodes of a computation graph (i.e. point to the computation graph)); 
traverse the computation graph based on the class model ([0058]: "A network is traversed using a breadth first traversal. A breadth-first schedule (BFS) is preferred in various embodiments to a depth-first schedule (DFS). In particular, during graph traversal, BFS schedules nodes in the graph on the cores in the order that they are discovered. For example, with BFS, nodes within each layer are scheduled first, before nodes in the next layer of the graph" teaches that the scheduler (class model) is used to traverse the computation graph using a breadth first traversal); and 
assign, based on traversal of the computation graph, the plurality of MVM operations ([0058]: "A network is traversed using a breadth first traversal. A breadth-first schedule (BFS) is preferred in various embodiments to a depth-first schedule (DFS). In particular, during graph traversal, BFS schedules nodes in the graph on the cores in the order that they are discovered. For example, with BFS, nodes within each layer are scheduled first, before nodes in the next layer of the graph" teaches that during breadth first traversal (BFS) of the graph using the scheduler (class model), the operations of the nodes of the graph are assigned to cores. [0052]: "As set out above, each core computes a vector matrix multiplication" teaches that each core is capable of performing vector matrix multiplication (MVM operation)).
Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate populate a class model corresponding to the neural network model with a data structure pointing to the computation graph; traverse the computation graph based on the class model; and assign, based on traversal of the computation graph, the plurality of MVM operations as taught by Datta et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., and further in view of Ravishankar et al.
	One of ordinary skill in the art would have been motivated to make this modification "to efficiently map an arbitrary dimensional tensor onto any set of neural cores such that the computational efficiency is maximized, memory access is efficient, and re-shuffling of activations is minimized" (Datta et al. [0088]).
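For illustration only, the breadth-first traversal and assignment relied upon above (Datta et al. [0058]) can be sketched in Python as follows. This is a minimal sketch and not code from any cited reference or from the application: the dictionary encoding of the graph, the function name assign_mvm_ops, the "mvm" node label, and the round-robin choice of unit are all assumptions made for the example, and precedence between nodes is not enforced.

```python
from collections import deque


def assign_mvm_ops(graph, kinds, roots, num_units):
    """Walk a computation graph breadth-first and give each MVM node a unit.

    graph: dict mapping node -> list of successor nodes (hypothetical encoding)
    kinds: dict mapping node -> "mvm", "matrix", or "vector" (hypothetical labels)
    roots: nodes with no incoming edges
    num_units: number of MVM units, assumed interchangeable
    """
    assignment = {}
    seen = set(roots)
    queue = deque(roots)
    next_unit = 0
    while queue:
        node = queue.popleft()
        if kinds.get(node) == "mvm":
            # Round-robin placement is an assumption, not taken from the references.
            assignment[node] = next_unit % num_units
            next_unit += 1
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return assignment


# Two chained MVM operations: y1 = W1 * x, then y2 = W2 * y1.
graph = {"W1": ["mvm1"], "x": ["mvm1"], "mvm1": ["mvm2"], "W2": ["mvm2"], "mvm2": []}
kinds = {"W1": "matrix", "x": "vector", "W2": "matrix", "mvm1": "mvm", "mvm2": "mvm"}
result = assign_mvm_ops(graph, kinds, ["W1", "x", "W2"], num_units=4)
```

In this sketch, nodes are visited in the order they are discovered, so MVM nodes nearer the inputs receive a unit before MVM nodes in later layers.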
Regarding Claim 2,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the system of claim 1.
	Additionally, Ambrosi et al. further teaches comprising the neural network accelerator, wherein the neural network accelerator comprises a plurality of tiles, each tile comprises a plurality of cores (Fig. 2; Section II. A, first paragraph: "In the first stage of compilation, the graph is hierarchically partitioned and the operations in the graph are distributed to different crossbars, cores, and tiles. The hierarchical partitioning uses a bottom-up approach whereby the graph is first partitioned to crossbars, then to cores, then to tiles" teaches a neural network accelerator comprising a plurality of tiles, with each tile comprising a plurality of cores),
each core comprises a plurality of MVM units, wherein the plurality of MVM units of each core is part of the MVM units of the neural network accelerator (Fig. 2; Section II. A, first paragraph: "In the first stage of compilation, the graph is hierarchically partitioned and the operations in the graph are distributed to different crossbars, cores, and tiles. The hierarchical partitioning uses a bottom-up approach whereby the graph is first partitioned to crossbars, then to cores, then to tiles" teaches that each core of the neural network accelerator comprises a plurality of memristor crossbars (MVM units). Fig. 2; Section I, second paragraph: "The reason memristive crossbars have been particularly attractive is their ability to perform low-energy and low-latency Matrix Vector Multiplication (MVM) operations" teaches that memristor crossbars are used for performing MVM operations (i.e. memristor crossbars can be considered MVM units)), and 
in response to assignment of a MVM operation to one of the MVM units of the neural network accelerator, the MVM unit is to perform the MVM operation (Fig. 2; Fig. 3; Section II. B, second paragraph: "Fig. 3(a) shows the ISA encoding of an MVM instruction which orchestrates the execution of an MVM operation on a crossbar, including digital-to-analog conversion, analog MVM execution, and analog-to-digital conversion. The mask operand specifies the crossbars in the core that will be active during the MVM operation" teaches that an MVM instruction is used to assign an MVM operation to a memristor crossbar (MVM unit) of the accelerator in order for the crossbar to execute (perform) the MVM operation).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate comprising the neural network accelerator, wherein the neural network accelerator comprises a plurality of tiles, each tile comprises a plurality of cores, each core comprises a plurality of MVM units, wherein the plurality of MVM units of each core is part of the MVM units of the neural network accelerator, and in response to assignment of a MVM operation to one of the MVM units of the neural network accelerator, the MVM unit is to perform the MVM operation as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
Regarding Claim 3,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the system of claim 2.
	Additionally, Ambrosi et al. further teaches wherein each MVM unit is a memristor crossbar (Fig. 2; Section II. A, first paragraph: "In the first stage of compilation, the graph is hierarchically partitioned and the operations in the graph are distributed to different crossbars, cores, and tiles. The hierarchical partitioning uses a bottom-up approach whereby the graph is first partitioned to crossbars, then to cores, then to tiles" teaches that each core of the neural network accelerator comprises a plurality of memristor crossbars (MVM units). Fig. 2; Section I, second paragraph: "The reason memristive crossbars have been particularly attractive is their ability to perform low-energy and low-latency Matrix Vector Multiplication (MVM) operations" teaches that memristor crossbars are used for performing MVM operations (i.e. memristor crossbars can be considered MVM units)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein each MVM unit is a memristor crossbar as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
Regarding Claim 7,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the system of claim 1.
	Additionally, Ambrosi et al. further teaches wherein, to assign the plurality of MVM operations to MVM units, the instructions are executable by the processor to assign a first MVM operation and second MVM operation that are related to each other to a first MVM unit and a second MVM unit (Section V. A, first paragraph: "In the first stage of compilation, the graph is hierarchically partitioned and the operations in the graph are distributed to different crossbars, cores, and tiles. The hierarchical partitioning uses a bottom-up approach whereby the graph is first partitioned to crossbars, then to cores, then to tiles. The partitioning process starts by assigning all MVM operations that use the same constant matrix to the same virtual crossbar. Virtual crossbars, cores, and tiles are used at this stage for separation of concerns; they are mapped to physical crossbars, cores, and tiles at a later stage" teaches that each operation in a graph is assigned to a crossbar (MVM unit). Section V. A, second paragraph: "The second level of partitioning treats each sub-graph as a node in a graph and aggregates the edges across sub-graphs into a single edge. This new graph is then partitioned, grouping together nodes (i.e., virtual crossbars) that communicate frequently into the same sub-graph (i.e., virtual cores)" teaches that operations that communicate frequently (are related to each other) have their individual crossbars (MVM units) grouped together (i.e. related operations are in separate crossbars, but are still grouped)), 
wherein the first MVM unit and the second MVM unit are present in a single core of the neural network accelerator (Section V. A, second paragraph: "The second level of partitioning treats each sub-graph as a node in a graph and aggregates the edges across sub-graphs into a single edge. This new graph is then partitioned, grouping together nodes (i.e., virtual crossbars) that communicate frequently into the same sub-graph (i.e., virtual cores)" teaches that operations that communicate frequently (are related to each other) are assigned to the same core).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein, to assign the plurality of MVM operations to MVM units, the instructions are executable by the processor to assign a first MVM operation and second MVM operation that are related to each other to a first MVM unit and a second MVM unit, wherein the first MVM unit and the second MVM unit are present in a single core of the neural network accelerator as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
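For illustration only, the two partitioning levels quoted above from Ambrosi et al. Section V. A can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the compiler of the reference: the function names, the (operation, matrix) pair encoding, the traffic dictionary, and the greedy grouping heuristic are all hypothetical. Only the rule that MVM operations using the same constant matrix share a virtual crossbar, and the goal of placing frequently communicating crossbars in the same virtual core, come from the quoted text.

```python
from collections import defaultdict


def partition_to_virtual_crossbars(mvm_ops):
    """First level: MVM operations that use the same constant matrix share a crossbar.

    mvm_ops: list of (operation name, matrix name) pairs (hypothetical encoding)
    """
    crossbars = defaultdict(list)
    for op, matrix in mvm_ops:
        crossbars[matrix].append(op)
    return dict(crossbars)


def group_into_virtual_cores(crossbar_ids, traffic, crossbars_per_core):
    """Second level: co-locate crossbars that communicate frequently.

    Greedy heuristic (an assumption): visit crossbar pairs in order of
    decreasing traffic and join a partner's core while it has room.
    """
    core_of = {}  # crossbar id -> core index
    cores = []    # core index -> list of crossbar ids

    def place(xbar, core_idx=None):
        if core_idx is None:
            cores.append([])
            core_idx = len(cores) - 1
        cores[core_idx].append(xbar)
        core_of[xbar] = core_idx

    for (a, b), _weight in sorted(traffic.items(), key=lambda kv: -kv[1]):
        for first, second in ((a, b), (b, a)):
            if first in core_of:
                continue
            partner = core_of.get(second)
            if partner is not None and len(cores[partner]) < crossbars_per_core:
                place(first, partner)
            else:
                place(first)
    # Crossbars with no recorded traffic each get their own core.
    for xbar in crossbar_ids:
        if xbar not in core_of:
            place(xbar)
    return cores


mvm_ops = [("mvm_a", "W1"), ("mvm_b", "W1"), ("mvm_c", "W2"), ("mvm_d", "W3")]
crossbars = partition_to_virtual_crossbars(mvm_ops)
traffic = {("W1", "W2"): 10, ("W2", "W3"): 1}
cores = group_into_virtual_cores(list(crossbars), traffic, crossbars_per_core=2)
```

In this example the crossbars holding W1 and W2 exchange the most data and land in one virtual core, so the related operations sit on separate MVM units within a single core.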
Regarding Claim 8,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the system of claim 1.
	Additionally, Ravishankar et al. further teaches wherein the neural network model comprises a first vector-vector addition (VVA) operation, the first VVA operation involves a vector that is a result of a first MVM operation of the plurality of MVM operations (Fig. 2A; [0035]-[0036]: "Referring now to FIG. 2A, in one embodiment, compiler 100 generates graph 200 based on the example program code shown in Listing 1. As shown, graph 200 includes nodes 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, and 224 connected by various edges. A given node of graph 200 corresponds to an operation or a value specified in the example program code. A given incoming edge of graph 200 corresponds to an operand specified in or generated via the program code. In one embodiment, graph 200 is a directed acyclic graph. In one embodiment, the nodes of graph 200 are coupled together via edges to represent that some nodes receive input from other nodes. For example, node 214 (representing an addition operation) receives input from node 216 (representing the value of b) and node 218 (representing the output of a dot product between the transpose of W and a) via the edges shown. Node 214 computes the sum of the outputs of nodes 216 and node 218" teaches that the neural network program of the neural model comprises a first vector addition operation (node 214) that involves a result of a MVM operation (218)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the neural network model comprises a first vector-vector addition (VVA) operation, the first VVA operation involves a vector that is a result of a first MVM operation of the plurality of MVM operations as taught by Ravishankar et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification because "At least one technological advantage of the techniques described herein is that a serial computer program designed for serial execution can be automatically converted into a parallel computer program that is optimized for parallel execution. Accordingly, serial computer programs can quickly and easily be accelerated via parallel processing hardware" (Ravishankar et al. [0095]).
	Furthermore, Ambrosi et al. further teaches the first MVM operation is assigned to a first MVM unit of a first core of the neural network accelerator (Section V. A, first paragraph: "In the first stage of compilation, the graph is hierarchically partitioned and the operations in the graph are distributed to different crossbars, cores, and tiles. The hierarchical partitioning uses a bottom-up approach whereby the graph is first partitioned to crossbars, then to cores, then to tiles. The partitioning process starts by assigning all MVM operations that use the same constant matrix to the same virtual crossbar. Virtual crossbars, cores, and tiles are used at this stage for separation of concerns; they are mapped to physical crossbars, cores, and tiles at a later stage" teaches that each MVM operation in a graph is assigned to a crossbar (MVM unit) of a core (i.e. the first MVM operation would be assigned to the first crossbar (MVM unit) of the first core)), and
the instructions are executable by the processor to assign the first VVA operation to a first VVA unit in the first core of the neural network accelerator (Section V. A, second paragraph: "The second level of partitioning treats each sub-graph as a node in a graph and aggregates the edges across sub-graphs into a single edge. This new graph is then partitioned, grouping together nodes (i.e., virtual crossbars) that communicate frequently into the same sub-graph (i.e., virtual cores)" teaches that operations that communicate frequently (are related to each other) are assigned to the same core (i.e. since the first MVM operation is assigned to the first core, the related first vector operation would also be assigned to the first core)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the first MVM operation is assigned to a first MVM unit of a first core of the neural network accelerator, and the instructions are executable by the processor to assign the first VVA operation to a first VVA unit in the first core of the neural network accelerator as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
Regarding Claim 9,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the system of claim 1.
	Additionally, Ambrosi et al. further teaches wherein, subsequent to assignment of the plurality of MVM operations to the MVM units, to generate the executable file, the instructions are executable by the processor to: convert the computation graph into a sequential stream of instructions (Section V. E, first paragraph: "The next stage of compilation is linearization. In this stage, the graph is linearized into a sequence of instructions for each tile and core" teaches that the graph is linearized (converted) into a sequence of instructions in order to generate the executable ISA instructions (executable file)); and 
subject the sequential stream of instructions to a code optimization technique (Section V. G, first paragraph: "The DSL and the compiler provide several value adds and failsafes to aid rapid software development, while safeguarding the programmer from common programming pitfalls … Further, the compiler diagnoses unused tensors, thus avoiding inadvertent programming errors and conserving precious device memory" teaches that the instructions when executed are able to diagnose unused tensors (code optimization technique) in order to avoid errors and conserve memory).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein, subsequent to assignment of the plurality of MVM operations to the MVM units, to generate the executable file, the instructions are executable by the processor to: convert the computation graph into a sequential stream of instructions; and subject the sequential stream of instructions to a code optimization technique as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
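For illustration only, the linearization and unused-tensor diagnosis quoted above from Ambrosi et al. Sections V. E and V. G can be sketched as follows. The sketch assumes a standard topological ordering (Kahn's algorithm); the reference does not state which ordering its compiler uses, and the function names and the toy graph are hypothetical.

```python
from collections import deque


def linearize(graph):
    """Return node names in an order where every producer precedes its consumers.

    `graph` maps each node name to the list of node names it reads from.
    """
    pending = {node: len(inputs) for node, inputs in graph.items()}
    consumers = {node: [] for node in graph}
    for node, inputs in graph.items():
        for src in inputs:
            consumers[src].append(node)
    ready = deque(node for node, count in pending.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for user in consumers[node]:
            pending[user] -= 1
            if pending[user] == 0:
                ready.append(user)
    if len(order) != len(graph):
        raise ValueError("graph contains a cycle")
    return order


def unused_tensors(graph, outputs):
    """Report nodes that no other node reads and that are not declared outputs."""
    read = {src for inputs in graph.values() for src in inputs}
    return sorted(node for node in graph if node not in read and node not in outputs)


# Hypothetical graph: an MVM of W and a, followed by an addition with b.
graph = {
    "W": [], "a": [], "b": [], "scratch": [],
    "mvm": ["W", "a"],
    "add": ["mvm", "b"],
}
order = linearize(graph)
unused = unused_tensors(graph, outputs={"add"})
```

Here `order` places the MVM before the addition that consumes its result, and `unused` flags the tensor `scratch`, which is declared but never read.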
Regarding Claim 12,
	Murray et al. teaches a method ([0087]: "As noted, the system, apparatus, methods, processes, functions, and/or operations for implementing an embodiment of the invention may be wholly or partially implemented in the form of a set of instructions executed by one or more programmed computer processors such as a central processing unit (CPU), GPU, or microprocessor" teaches a method that may be implemented by a set of instructions executed on a processor) comprising: 
in response to an instruction to execute the neural network program: generating, by the processing resource, a computation graph corresponding to the neural network model, the computation graph comprising a first plurality of root nodes and a first plurality of leaf nodes, each of the first plurality of root nodes representing a MVM operation … ([0039]: "Embodiments or implementations of the approach(es) described herein may be used to generate a program or process for making a decision that is derived at least in part from data or other outputs produced by a neural network. The program or process may be specified by a computation graph, which is a form of representing a computation or computing program by the data flow (e.g., expressed as data tensors) through a series of operations (indicated by nodes in the graph)." teaches generating a computation graph representing an input computing program for a neural network (i.e. a neural network program). [0039]-[0041]: "The operations or graph nodes include an operator, where the operator is used to make a nondeterministic choice that influences the program logic. The combination of a computation graph and the operator represents a collection of decisions and information about the computing architecture used to make each decision. The computation graph describes how a program will execute or how a neural network will operate to process an input. A computation graph structure is typically described in terms of the following aspects: a node is (or has) a value (e.g., tensor, matrix, vector, scalar)" teaches that the computation graph includes nodes that represent operations and values including tensor, matrix, vector and scalar. [0048]: “The network parameters determine how an output is produced from the input data. As tensors, these parameters and values can be manipulated with standard operations, such as matrix-vector multiplication” teaches that the operations include matrix-vector multiplication operations (root nodes) based on the input tensor data (leaf nodes)).
	Murray et al. does not appear to explicitly teach providing, by a processing resource, a programming environment in which a neural network program is to be expressed using a domain specific language (DSL), wherein the DSL defines a matrix class, a vector class, and a matrix-vector multiplication (MVM) operation class; receiving, by the processing resource, in the programming environment, the neural network program corresponding to a neural network model and comprising a plurality of matrices, a plurality of vectors, and a plurality of MVM operations; … each of the first plurality of leaf nodes representing one of: a matrix and a vector; and populating, by the processing resource, a class model corresponding to the neural network model with a data structure pointing to the computation graph; and in response to an instruction to generate an executable file corresponding to the neural network model: traversing, by the processing resource, the computation graph based on the class model; assigning, by the processing resource, based on traversal of the computation graph, the plurality of MVM operations to MVM units of a neural network accelerator that is to execute the neural network model, each MVM unit being capable of performing a MVM operation; and generating, by the processing resource, the executable file executable by the neural network accelerator based on assignment of the plurality of MVM operations.
	However, Ambrosi et al. teaches providing, by a processing resource, a programming environment in which a neural network program is to be expressed using a domain specific language (DSL) (Section V, first paragraph: "The compiler generates ISA code from a graph representation of a neural network model that is either constructed by the ONNX back-end or specified by the programmer via our custom API" teaches that a neural network model is expressed using the ONNX (DSL) back end (e.g. as a neural network program in a programming environment) for use in generating the ISA code (executable file) in a compiler (processing resource)); 
receiving, by the processing resource, in the programming environment, the neural network program corresponding to a neural network model (Section III, sixth-ninth paragraphs: "The method prepare has as input the model and the graph in the ONNX representation format. Its main objective is to translate the model from ONNX format to the appropriate format of a framework or execution platform, cross-compiling the ONNX code to the desired back-end implementation … The method run provides the interface for loading the model into the platform, input preparation, inference execution, and finally, the collection of the outputs … In our solution, an initial version of the prepare method was developed to interpret ONNX models and generate a native graph representation that can be operated on by our compiler (Section V) to generate ISA code" teaches that an ONNX (Open Neural Network Exchange Format) model (neural network model) is input (received) in the ONNX format in a compiler (processing resource). Section III, first-third paragraphs: "The popularity of ML applications, especially neural networks and deep learning, has stimulated the emergence of multiple programming frameworks for supporting these applications. Effectively, these frameworks are Domain Specific Languages (DSLs) for building execution models … To solve neural network model interoperability, a couple of initiatives where launched to help define an exchangeable format for neural network models [41], [42]. The Open Neural Network Exchange Format (ONNX) [41] has resulted in an initial interest and engagement of the open source community and industry, supporting most of the known frameworks" teaches that the ONNX framework is a domain-specific language (DSL) for building execution models from a programming framework for a neural network (i.e. a neural network program in a programming environment));
assigning, by the processing resource, … the plurality of MVM operations to MVM units of a neural network accelerator that is to execute the neural network model, each MVM unit being capable of performing a MVM operation (Fig. 2; Fig. 3; Section II. B, second paragraph: "Fig. 3(a) shows the ISA encoding of an MVM instruction which orchestrates the execution of an MVM operation on a crossbar, including digital-to-analog conversion, analog MVM execution, and analog-to-digital conversion. The mask operand specifies the crossbars in the core that will be active during the MVM operation" teaches that a MVM operation is assigned to a memristor crossbar (MVM unit) of an accelerator in order for the crossbar to execute (perform) the MVM operation of the neural model); and
generating, by the processing resource, the executable file executable by the neural network accelerator based on assignment of the plurality of MVM operations (Fig. 7; Section V, first paragraph: "The compiler generates ISA code from a graph representation of a neural network model that is either constructed by the ONNX back-end or specified by the programmer via our custom API" teaches ISA code (executable file) corresponding to a neural network model is generated based on a graph representation of the neural network model (i.e. a computation graph). Fig. 7; teaches that the computation graph is partitioned into crossbars (MVM units), cores, and tiles. Fig. 7; Section V. F: "The final stage is to generate assembly code for each tile and core from the linearized instruction sequences" teaches that the assembly (ISA) code (executable file) is generated for the assigned tiles and cores (i.e. based on the assignment of crossbars (MVM units) in cores and tiles)).
	Murray et al. and Ambrosi et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate providing, by a processing resource, a programming environment in which a neural network program is to be expressed using a domain specific language (DSL); receiving, by the processing resource, in the programming environment, the neural network program corresponding to a neural network model; assigning, by the processing resource, … the plurality of MVM operations to MVM units of a neural network accelerator that is to execute the neural network model, each MVM unit being capable of performing a MVM operation; and generating, by the processing resource, the executable file executable by the neural network accelerator based on assignment of the plurality of MVM operations as taught by Ambrosi et al. to the disclosed invention of Murray et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
	Murray et al. in view of Ambrosi et al. does not appear to explicitly teach wherein the DSL defines a matrix class, a vector class, and a matrix-vector multiplication (MVM) operation class; the neural network program … comprising a plurality of matrices, a plurality of vectors, and a plurality of MVM operations; … each of the first plurality of leaf nodes representing one of: a matrix and a vector; and populating, by the processing resource, a class model corresponding to the neural network model with a data structure pointing to the computation graph; and in response to an instruction to generate an executable file corresponding to the neural network model: traversing, by the processing resource, the computation graph based on the class model; assigning, by the processing resource, based on traversal of the computation graph, the plurality of MVM operations.
	However, Ravishankar et al. teaches wherein the DSL defines a matrix class, a vector class, and a matrix-vector multiplication (MVM) operation class (Fig. 2A; [0033]-[0035]: "FIGS. 2A-2B illustrate an example of how the compiler of FIG. 1 generates and partitions a graph of nodes based on program code, according to various embodiments. The program code discussed in conjunction with the example shown in FIGS. 2A-2B is listed below: 

[Listing 1: example program code, reproduced in the original as image media_image1.png]

In one embodiment, listing 1 sets forth example program code written in the Python programming language. The example program code creates variables W, a, b, and x (a matrix and three arrays, respectively) and then evaluates an expression based on these variables, setting the result to the variable output. The example program code of Listing 1 can be executed by a serial processor. However, when W, a, b, and x have very large dimensions, the computation of output may take an excessive amount of time. In this situation, compiler 100 can be implemented to analyze this example program code and generate a graph of nodes, as shown in FIG. 2A" teaches an example input program code (i.e. a neural network program) is written in a programming language (defined by the DSL) including a declared matrix W, declared vectors a and b, and a declared MVM operation x); 
the neural network program … comprising a plurality of matrices, a plurality of vectors, and a plurality of MVM operations (Fig. 2A; [0033]-[0035]: "FIGS. 2A-2B illustrate an example of how the compiler of FIG. 1 generates and partitions a graph of nodes based on program code, according to various embodiments. The program code discussed in conjunction with the example shown in FIGS. 2A-2B is listed below: 

[Listing 1: example program code, reproduced in the original as image media_image1.png]

In one embodiment, listing 1 sets forth example program code written in the Python programming language. The example program code creates variables W, a, b, and x (a matrix and three arrays, respectively) and then evaluates an expression based on these variables, setting the result to the variable output. The example program code of Listing 1 can be executed by a serial processor. However, when W, a, b, and x have very large dimensions, the computation of output may take an excessive amount of time. In this situation, compiler 100 can be implemented to analyze this example program code and generate a graph of nodes, as shown in FIG. 2A" teaches an example input program code (i.e. a neural network program) is written in a programming language (defined by the DSL) including a declared matrix W, declared vectors a and b, and a declared MVM operation x); and 
… each of the first plurality of leaf nodes representing one of: a matrix and a vector (Fig. 2A; teaches that the leaf nodes represent a matrix or a vector).
	Murray et al., Ambrosi et al., and Ravishankar et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the DSL defines a matrix class, a vector class, and a matrix-vector multiplication (MVM) operation class; the neural network program … comprising a plurality of matrices, a plurality of vectors, and a plurality of MVM operations; and … each of the first plurality of leaf nodes representing one of: a matrix and a vector as taught by Ravishankar et al. to the disclosed invention of Murray et al. in view of Ambrosi et al.
	One of ordinary skill in the art would have been motivated to make this modification because "At least one technological advantage of the techniques described herein is that a serial computer program designed for serial execution can be automatically converted into a parallel computer program that is optimized for parallel execution. Accordingly, serial computer programs can quickly and easily be accelerated via parallel processing hardware" (Ravishankar et al. [0095]).
Murray et al. in view of Ambrosi et al., and further in view of Ravishankar et al. does not appear to explicitly teach populating, by the processing resource, a class model corresponding to the neural network model with a data structure pointing to the computation graph; and in response to an instruction to generate an executable file corresponding to the neural network model: traversing, by the processing resource, the computation graph based on the class model; assigning, by the processing resource, based on traversal of the computation graph, the plurality of MVM operations.
However, Datta et al. teaches populating, by the processing resource, a class model corresponding to the neural network model with a data structure pointing to the computation graph ([0054]-[0056]: "Schedulers according to various embodiments of the present disclosure are able to ingest any network trained across any deep learning framework. For example, networks may originate from TensorFlow, PyTorch, Caffe2, MXNet, or other frameworks known in the art. In various embodiments, the scheduler is compliant with the ONNX Network Interface standard … In various embodiments, the scheduler includes a scheme library. The scheme library comprises a plurality of schemes, and maps various deep learning operations to those schemes. A scheme comprises computation and communication primitives necessary to implement a neural network layer … In various embodiments, a scheme includes parameters defining the data format and interaction of a neural network" teaches a scheduler (class model) that is populated by the schemes of a neural network model, where the schemes include neural network operations and input parameters. [0058]: "In various embodiments, the scheduler includes a graph sequencer. In some embodiments, each node in the computation graph corresponds to a scheme. Such a graph can be used to establish precedence relationship between logical cores. For some networks, the graph may be DAG. The precedence relationship between cores is used to ensure that each operation is scheduled for computation on a physical core only after all incoming edges to it are already scheduled" teaches that the schemes of the scheduler (class model) correspond (point) to nodes of a computation graph (i.e. point to the computation graph)); and
in response to an instruction to generate an executable file corresponding to the neural network model: traversing, by the processing resource, the computation graph based on the class model ([0058]: "A network is traversed using a breadth first traversal. A breadth-first schedule (BFS) is preferred in various embodiments to a depth-first schedule (DFS). In particular, during graph traversal, BFS schedules nodes in the graph on the cores in the order that they are discovered. For example, with BFS, nodes within each layer are scheduled first, before nodes in the next layer of the graph" teaches that the scheduler (class model) is used to traverse the computation graph using a breadth first traversal); 
assigning, by the processing resource, based on traversal of the computation graph, the plurality of MVM operations ([0058]: "A network is traversed using a breadth first traversal. A breadth-first schedule (BFS) is preferred in various embodiments to a depth-first schedule (DFS). In particular, during graph traversal, BFS schedules nodes in the graph on the cores in the order that they are discovered. For example, with BFS, nodes within each layer are scheduled first, before nodes in the next layer of the graph" teaches that during breadth first traversal (BFS) of the graph using the scheduler (class model), the operations of the nodes of the graph are assigned to cores. [0052]: "As set out above, each core computes a vector matrix multiplication" teaches that each core is capable of performing vector matrix multiplication (MVM operation)).
Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate populating, by the processing resource, a class model corresponding to the neural network model with a data structure pointing to the computation graph; and in response to an instruction to generate an executable file corresponding to the neural network model: traversing, by the processing resource, the computation graph based on the class model; assigning, by the processing resource, based on traversal of the computation graph, the plurality of MVM operations as taught by Datta et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., and further in view of Ravishankar et al.
	One of ordinary skill in the art would have been motivated to make this modification "to efficiently map an arbitrary dimensional tensor onto any set of neural cores such that the computational efficiency is maximized, memory access is efficient, and re-shuffling of activations is minimized" (Datta et al. [0088]).
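For illustration only, the breadth-first scheduling quoted above from Datta et al. [0058] can be sketched as follows. The sketch assumes round-robin selection among cores, which the reference does not specify, and it does not model the precedence check on incoming edges described in the same paragraph; the function name `bfs_schedule` and the toy network are hypothetical.

```python
from collections import deque


def bfs_schedule(successors, sources, num_cores):
    """Assign nodes to cores in breadth-first discovery order.

    Round-robin core selection is an assumption made for this sketch.
    """
    seen = set(sources)
    queue = deque(sources)
    assignment = {}
    while queue:
        node = queue.popleft()
        assignment[node] = len(assignment) % num_cores
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return assignment


# Hypothetical three-layer network: two inputs, two hidden nodes, one output.
successors = {
    "in0": ["h0", "h1"],
    "in1": ["h0", "h1"],
    "h0": ["out"],
    "h1": ["out"],
}
schedule = bfs_schedule(successors, ["in0", "in1"], num_cores=2)
```

With this input, every node of a layer is scheduled before any node of the next layer, which is the layer-by-layer ordering the quoted passage attributes to a breadth-first schedule.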
Regarding Claim 13,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the method of claim 12.
	Additionally, Ambrosi et al. further teaches comprising execution of the executable file by the neural network accelerator (Abstract, second paragraph: "The compiler generates executable ISA code that the underlying accelerator can run" teaches that the accelerator runs (executes) the executable ISA code (executable file)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate comprising execution of the executable file by the neural network accelerator as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
Regarding Claim 14,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the method of claim 13.
	Additionally, Ambrosi et al. further teaches wherein each MVM unit is a memristor crossbar (Fig. 2; Section II. A, first paragraph: "In the first stage of compilation, the graph is hierarchically partitioned and the operations in the graph are distributed to different crossbars, cores, and tiles. The hierarchical partitioning uses a bottom-up approach whereby the graph is first partitioned to crossbars, then to cores, then to tiles" teaches that each core of the neural network accelerator comprises a plurality of memristor crossbars (MVM units). Fig. 2; Section I, second paragraph: "The reason memristive crossbars have been particularly attractive is their ability to perform low-energy and low-latency Matrix Vector Multiplication (MVM) operations" teaches that memristor crossbars are used for performing MVM operations (i.e. memristor crossbars can be considered MVM units)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein each MVM unit is a memristor crossbar as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
Regarding Claim 15,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the method of claim 12.
	Additionally, Ravishankar et al. further teaches wherein the computation graph comprises a second plurality of root nodes, each of the second plurality of root nodes representing one of: a vector-vector addition (VVA) operation, a matrix-matrix multiplication (MMM) operation, and scalar-scalar addition (SSA) operation in the neural network model (Fig. 2A; [0035]-[0036]: "Referring now to FIG. 2A, in one embodiment, compiler 100 generates graph 200 based on the example program code shown in Listing 1. As shown, graph 200 includes nodes 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, and 224 connected by various edges. A given node of graph 200 corresponds to an operation or a value specified in the example program code. A given incoming edge of graph 200 corresponds to an operand specified in or generated via the program code. In one embodiment, graph 200 is a directed acyclic graph. In one embodiment, the nodes of graph 200 are coupled together via edges to represent that some nodes receive input from other nodes. For example, node 214 (representing an addition operation) receives input from node 216 (representing the value of b) and node 218 (representing the output of a dot product between the transpose of W and a) via the edges shown. Node 214 computes the sum of the outputs of nodes 216 and node 218" teaches that the generated computation graph on the neural network model includes nodes (second plurality of nodes) representing operations other than an MVM operation, including one of scalar addition (SSA) (node 206) and vector addition (node 214)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the computation graph comprises a second plurality of root nodes, each of the second plurality of root nodes representing one of: a vector-vector addition (VVA) operation, a matrix-matrix multiplication (MMM) operation, and scalar-scalar addition (SSA) operation in the neural network model as taught by Ravishankar et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification because "At least one technological advantage of the techniques described herein is that a serial computer program designed for serial execution can be automatically converted into a parallel computer program that is optimized for parallel execution. Accordingly, serial computer programs can quickly and easily be accelerated via parallel processing hardware" (Ravishankar et al. [0095]).
	Furthermore, Datta et al. further teaches and the method comprises assigning each VVA operation, MMM operation, and SSA operation to a core of the neural network accelerator ([0058]: "A network is traversed using a breadth first traversal. A breadth-first schedule (BFS) is preferred in various embodiments to a depth-first schedule (DFS). In particular, during graph traversal, BFS schedules nodes in the graph on the cores in the order that they are discovered. For example, with BFS, nodes within each layer are scheduled first, before nodes in the next layer of the graph" teaches that during breadth first traversal (BFS) of the graph using the scheduler (class model), the operation of each node (including the operations of the second plurality of nodes) of the graph is assigned to a core).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate and the method comprises assigning each VVA operation, MMM operation, and SSA operation to a core of the neural network accelerator as taught by Datta et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., and further in view of Ravishankar et al.
	One of ordinary skill in the art would have been motivated to make this modification "to efficiently map an arbitrary dimensional tensor onto any set of neural cores such that the computational efficiency is maximized, memory access is efficient, and re-shuffling of activations is minimized" (Datta et al. [0088]).
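	For illustration only, the breadth-first scheduling quoted from Datta et al. [0058] can be sketched as follows. This is a minimal sketch and is not code from any cited reference or from the application: the dictionary encoding of the graph, the function name bfs_assign, and the round-robin choice of core are all assumptions made for the example.

```python
from collections import deque

def bfs_assign(graph, roots, num_cores):
    """Assign each node to a core in breadth-first discovery order.

    graph: dict mapping node -> list of successor nodes (hypothetical encoding).
    roots: nodes with no incoming edges.
    Round-robin core selection is an assumption made for illustration.
    """
    assignment = {}
    queue = deque(roots)
    seen = set(roots)
    while queue:
        node = queue.popleft()
        # Nodes are placed in the order in which they are discovered.
        assignment[node] = len(assignment) % num_cores
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return assignment

# Hypothetical graph: an MVM feeds a vector addition, which feeds a scalar addition.
example_graph = {
    "W": ["mvm"], "a": ["mvm"], "b": ["vva"],
    "mvm": ["vva"], "vva": ["ssa"], "ssa": [],
}
example_assignment = bfs_assign(example_graph, ["W", "a", "b"], num_cores=2)
```

	In this sketch every node, including the non-MVM operation nodes, receives a core during the traversal. A production scheduler would additionally enforce the precedence constraints that Datta et al. describe.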
Regarding Claim 16,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the method of claim 15.
	Additionally, Ravishankar et al. further teaches wherein the neural network model comprises a first VVA operation, the first VVA operation is related to a first MVM operation of the plurality of MVM operations (Fig. 2A; [0035]-[0036]: "Referring now to FIG. 2A, in one embodiment, compiler 100 generates graph 200 based on the example program code shown in Listing 1. As shown, graph 200 includes nodes 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, and 224 connected by various edges. A given node of graph 200 corresponds to an operation or a value specified in the example program code. A given incoming edge of graph 200 corresponds to an operand specified in or generated via the program code. In one embodiment, graph 200 is a directed acyclic graph. In one embodiment, the nodes of graph 200 are coupled together via edges to represent that some nodes receive input from other nodes. For example, node 214 (representing an addition operation) receives input from node 216 (representing the value of b) and node 218 (representing the output of a dot product between the transpose of W and a) via the edges shown. Node 214 computes the sum of the outputs of nodes 216 and node 218" teaches that the neural network program of the neural model comprises a first vector addition operation (node 214) that involves a result of a MVM operation (218)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the neural network model comprises a first VVA operation, the first VVA operation is related to a first MVM operation of the plurality of MVM operations as taught by Ravishankar et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification because "At least one technological advantage of the techniques described herein is that a serial computer program designed for serial execution can be automatically converted into a parallel computer program that is optimized for parallel execution. Accordingly, serial computer programs can quickly and easily be accelerated via parallel processing hardware" (Ravishankar et al. [0095]).
	Furthermore, Ambrosi et al. further teaches the first MVM operation is assigned to a first MVM unit of a first core of the neural network accelerator (Section V. A, first paragraph: "In the first stage of compilation, the graph is hierarchically partitioned and the operations in the graph are distributed to different crossbars, cores, and tiles. The hierarchical partitioning uses a bottom-up approach whereby the graph is first partitioned to crossbars, then to cores, then to tiles. The partitioning process starts by assigning all MVM operations that use the same constant matrix to the same virtual crossbar. Virtual crossbars, cores, and tiles are used at this stage for separation of concerns; they are mapped to physical crossbars, cores, and tiles at a later stage" teaches that each MVM operation in a graph is assigned to a crossbar (MVM unit) of a core (i.e. the first MVM operation would be assigned to the first crossbar (MVM unit) of the first core)), and 
the method comprises assigning the first VVA operation to the first core (Section V. A, second paragraph: "The second level of partitioning treats each sub-graph as a node in a graph and aggregates the edges across sub-graphs into a single edge. This new graph is then partitioned, grouping together nodes (i.e., virtual crossbars) that communicate frequently into the same sub-graph (i.e., virtual cores)" teaches that operations that communicate frequently (are related to each other) are assigned to the same core (i.e. since the first MVM operation is assigned to the first core, the related first vector operation would also be assigned to the first core)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the first MVM operation is assigned to a first MVM unit of a first core of the neural network accelerator, and the method comprises assigning the first VVA operation to the first core as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
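	For illustration only, the two partitioning steps relied upon above (MVM operations sharing a constant matrix placed on one virtual crossbar, and a related vector addition placed on the same core as its MVM operation) can be sketched as follows. This is a simplified sketch, not the partitioning algorithm of Ambrosi et al.: the function names, the (operation, matrix) pair encoding, and the fixed number of crossbars per core are assumptions made for the example.

```python
def partition_to_crossbars(mvm_ops):
    """Group MVM operations that share a constant matrix onto one virtual crossbar.

    mvm_ops: list of (op_name, matrix_name) pairs (hypothetical encoding).
    Returns a dict mapping op_name -> virtual crossbar index.
    """
    crossbar_of_matrix = {}
    placement = {}
    for op_name, matrix_name in mvm_ops:
        if matrix_name not in crossbar_of_matrix:
            crossbar_of_matrix[matrix_name] = len(crossbar_of_matrix)
        placement[op_name] = crossbar_of_matrix[matrix_name]
    return placement

def colocate_vector_ops(crossbar_placement, crossbars_per_core, vva_inputs):
    """Place each vector addition on the core of the MVM whose result it consumes.

    vva_inputs: dict mapping a VVA op name -> the MVM op name feeding it.
    Packing consecutive crossbars into a core is an assumption for illustration.
    """
    core_of = {op: xbar // crossbars_per_core for op, xbar in crossbar_placement.items()}
    for vva_name, mvm_name in vva_inputs.items():
        core_of[vva_name] = core_of[mvm_name]
    return core_of

# Hypothetical operations: mvm0 and mvm2 reuse the same constant matrix W0.
crossbars = partition_to_crossbars(
    [("mvm0", "W0"), ("mvm1", "W1"), ("mvm2", "W0"), ("mvm3", "W2")]
)
cores = colocate_vector_ops(crossbars, crossbars_per_core=2, vva_inputs={"vva0": "mvm3"})
```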
Regarding Claim 18,
	Murray et al. teaches a non-transitory computer-readable medium comprising instructions …, the instructions being executable by a processing resource ([0090]: "For example, as noted in some embodiments, one or more of the operations, functions, processes, or methods described herein may be implemented by one or more suitable processing elements (such as a processor, microprocessor, CPU, GPU, controller, etc.) that is part of a client device, server, network element, or other form of computing or data processing device/platform and that is programmed with a set of executable instructions (e.g., software instructions), where the instructions may be stored in a suitable data storage element (for example, a non-transitory computer readable medium, examples of which are provided herein)" teaches that the invention may be implemented by a set of instructions executed by a processor (processing resource), where the set of instructions are stored in a non-transitory computer-readable medium. [0011]-[0014]: "In another embodiment, the invention is directed to an apparatus for generating a computation graph for a computation, where the apparatus includes: a processor programmed to execute a set of instructions; a data storage element in which the set of instructions are stored, wherein when executed by the processor the set of instructions cause the apparatus to represent the computation by a set of graph nodes and edges, wherein each graph node is associated with a corresponding value and each edge represents a relationship between a pair of nodes" teaches that the instructions are used for generating a computation graph describing a neural network computation) to: 
generate a computation graph corresponding to the neural network model based on execution of the neural network program, the computation graph comprising a first plurality of nodes, each of the first plurality of nodes representing one of: a MVM operation, a matrix, and a vector ([0039]: "Embodiments or implementations of the approach(es) described herein may be used to generate a program or process for making a decision that is derived at least in part from data or other outputs produced by a neural network. The program or process may be specified by a computation graph, which is a form of representing a computation or computing program by the data flow (e.g., expressed as data tensors) through a series of operations (indicated by nodes in the graph)" teaches generating a computation graph representing an input computing program for a neural network (i.e. a neural network program). [0039-0041]: "The operations or graph nodes include an operator, where the operator is used to make a nondeterministic choice that influences the program logic. The combination of a computation graph and the operator represents a collection of decisions and information about the computing architecture used to make each decision. The computation graph describes how a program will execute or how a neural network will operate to process an input. A computation graph structure is typically described in terms of the following aspects: a node is (or has) a value (e.g., tensor, matrix, vector, scalar)" teaches that the computation graph includes nodes that represent operations and values including tensor, matrix, vector and scalar. [0048]: “The network parameters determine how an output is produced from the input data. As tensors, these parameters and values can be manipulated with standard operations, such as matrix-vector multiplication” teaches that the operations include matrix-vector multiplication operations).
	Murray et al. does not appear to explicitly teach instructions for generating an executable file corresponding to a neural network model, the instructions being executable by a processing resource to: provide a programming environment in which a neural network program corresponding to a neural network model is to be expressed; receive, in the programming environment, the neural network program comprising: a plurality of matrices, a plurality of vectors, and a plurality of matrix-vector multiplication (MVM) operations, wherein the plurality of matrices, plurality of vectors, and plurality of MVM operations are declared using a matrix class, a vector class, and a MVM operation class, respectively; populate a class model corresponding to the neural network model with a data structure pointing to the computation graph; traverse the computation graph based on the class model; assign, based on traversal of the computation graph, the plurality of MVM operations to MVM units of a neural network accelerator that is to execute the neural network model, each MVM unit being capable of performing a MVM operation; and generate, based on assignment of the plurality of MVM operations, an executable file corresponding to the neural network model for execution by the neural network accelerator.
	However, Ambrosi et al. teaches instructions for generating an executable file corresponding to a neural network model, the instructions being executable by a processing resource (Fig. 7; Section V, first paragraph: "The compiler generates ISA code from a graph representation of a neural network model that is either constructed by the ONNX back-end or specified by the programmer via our custom API. The compilation flow is shown in Fig. 7." teaches a graph representation of the neural network model (i.e. a computation graph) is used to generate ISA code (executable file) corresponding to a neural network model) to: 
provide a programming environment in which a neural network program corresponding to a neural network model is to be expressed (Section V, first paragraph: "The compiler generates ISA code from a graph representation of a neural network model that is either constructed by the ONNX back-end or specified by the programmer via our custom API" teaches that a neural network model is expressed using the ONNX back end (e.g. as a neural network program in a programming environment) for use in generating the ISA code (executable file)); 
receive, in the programming environment, the neural network program … (Section III, sixth-ninth paragraphs: "The method prepare has as input the model and the graph in the ONNX representation format. Its main objective is to translate the model from ONNX format to the appropriate format of a framework or execution platform, cross-compiling the ONNX code to the desired back-end implementation … The method run provides the interface for loading the model into the platform, input preparation, inference execution, and finally, the collection of the outputs … In our solution, an initial version of the prepare method was developed to interpret ONNX models and generate a native graph representation that can be operated on by our compiler (Section V) to generate ISA code" teaches that an ONNX (Open Neural Network Exchange Format) model (neural network model) is input (received) in the ONNX format. Section III, first-third paragraphs: "The popularity of ML applications, especially neural networks and deep learning, has stimulated the emergence of multiple programming frameworks for supporting these applications. Effectively, these frameworks are Domain Specific Languages (DSLs) for building execution models … To solve neural network model interoperability, a couple of initiatives where launched to help define an exchangeable format for neural network models [41], [42]. The Open Neural Network Exchange Format (ONNX) [41] has resulted in an initial interest and engagement of the open source community and industry, supporting most of the known frameworks" teaches that the ONNX framework is a domain-specific language (DSL) for building execution models from a programming framework for a neural network (i.e. a neural network program in a programming environment)); 
assign … the plurality of MVM operations to MVM units of a neural network accelerator that is to execute the neural network model, each MVM unit being capable of performing a MVM operation (Fig. 2; Fig. 3; Section II. B, second paragraph: "Fig. 3(a) shows the ISA encoding of an MVM instruction which orchestrates the execution of an MVM operation on a crossbar, including digital-to-analog conversion, analog MVM execution, and analog-to-digital conversion. The mask operand specifies the crossbars in the core that will be active during the MVM operation" teaches that a MVM operation is assigned to a memristor crossbar (MVM unit) of an accelerator in order for the crossbar to execute (perform) the MVM operation of the neural model); and 
generate, based on assignment of the plurality of MVM operations, an executable file corresponding to the neural network model for execution by the neural network accelerator (Fig. 7; Section V, first paragraph: "The compiler generates ISA code from a graph representation of a neural network model that is either constructed by the ONNX back-end or specified by the programmer via our custom API" teaches ISA code (executable file) corresponding to a neural network model is generated based on a graph representation of the neural network model (i.e. a computation graph). Fig. 7 teaches that the computation graph is partitioned into crossbars (MVM units), cores, and tiles. Fig. 7; Section V. F: "The final stage is to generate assembly code for each tile and core from the linearized instruction sequences" teaches that the assembly (ISA) code (executable file) is generated for the assigned tiles and cores (i.e. based on the assignment of crossbars (MVM units) in cores and tiles)).
	Murray et al. and Ambrosi et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate instructions for generating an executable file corresponding to a neural network model, the instructions being executable by a processing resource to: provide a programming environment in which a neural network program corresponding to a neural network model is to be expressed; receive, in the programming environment, the neural network program …; assign … the plurality of MVM operations to MVM units of a neural network accelerator that is to execute the neural network model, each MVM unit being capable of performing a MVM operation; and generate, based on assignment of the plurality of MVM operations, an executable file corresponding to the neural network model for execution by the neural network accelerator as taught by Ambrosi et al. to the disclosed invention of Murray et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
	Murray et al. in view of Ambrosi et al. does not appear to explicitly teach the neural network program comprising: a plurality of matrices, a plurality of vectors, and a plurality of matrix-vector multiplication (MVM) operations, wherein the plurality of matrices, plurality of vectors, and plurality of MVM operations are declared using a matrix class, a vector class, and a MVM operation class, respectively; populate a class model corresponding to the neural network model with a data structure pointing to the computation graph; traverse the computation graph based on the class model; and assign, based on traversal of the computation graph, the plurality of MVM operations.
	However, Ravishankar et al. teaches the neural network program comprising: a plurality of matrices, a plurality of vectors, and a plurality of matrix-vector multiplication (MVM) operations, wherein the plurality of matrices, plurality of vectors, and plurality of MVM operations are declared using a matrix class, a vector class, and a MVM operation class, respectively (Fig. 2A; [0033]-[0035]: "FIGS. 2A-2B illustrate an example of how the compiler of FIG. 1 generates and partitions a graph of nodes based on program code, according to various embodiments. The program code discussed in conjunction with the example shown in FIGS. 2A-2B is listed below: 

    [Listing 1: example program code, reproduced as an image (media_image1.png)]

In one embodiment, listing 1 sets forth example program code written in the Python programming language. The example program code creates variables W, a, b, and x (a matrix and three arrays, respectively) and then evaluates an expression based on these variables, setting the result to the variable output. The example program code of Listing 1 can be executed by a serial processor. However, when W, a, b, and x have very large dimensions, the computation of output may take an excessive amount of time. In this situation, compiler 100 can be implemented to analyze this example program code and generate a graph of nodes, as shown in FIG. 2A" teaches an example input program code (i.e. a neural network program) is written in a programming language (defined by the DSL) including a declared matrix W, declared vectors a and b, and a declared MVM operation x).
	Murray et al., Ambrosi et al., and Ravishankar et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the neural network program comprising: a plurality of matrices, a plurality of vectors, and a plurality of matrix-vector multiplication (MVM) operations, wherein the plurality of matrices, plurality of vectors, and plurality of MVM operations are declared using a matrix class, a vector class, and a MVM operation class, respectively as taught by Ravishankar et al. to the disclosed invention of Murray et al. in view of Ambrosi et al.
	One of ordinary skill in the art would have been motivated to make this modification because "At least one technological advantage of the techniques described herein is that a serial computer program designed for serial execution can be automatically converted into a parallel computer program that is optimized for parallel execution. Accordingly, serial computer programs can quickly and easily be accelerated via parallel processing hardware" (Ravishankar et al. [0095]).
	Murray et al. in view of Ambrosi et al., and further in view of Ravishankar et al. does not appear to explicitly teach populate a class model corresponding to the neural network model with a data structure pointing to the computation graph; traverse the computation graph based on the class model; and assign, based on traversal of the computation graph, the plurality of MVM operations.
	However, Datta et al. teaches populate a class model corresponding to the neural network model with a data structure pointing to the computation graph ([0054]-[0056]: "Schedulers according to various embodiments of the present disclosure are able to ingest any network trained across any deep learning framework. For example, networks may originate from TensorFlow, PyTorch, Caffe2, MXNet, or other frameworks known in the art. In various embodiments, the scheduler is compliant with the ONNX Network Interface standard … In various embodiments, the scheduler includes a scheme library. The scheme library comprises a plurality of schemes, and maps various deep learning operations to those schemes. A scheme comprises computation and communication primitives necessary to implement a neural network layer … In various embodiments, a scheme includes parameters defining the data format and interaction of a neural network" teaches a scheduler (class model) that is populated by the schemes of a neural network model, where the schemes include neural network operations and input parameters. [0058]: "In various embodiments, the scheduler includes a graph sequencer. In some embodiments, each node in the computation graph corresponds to a scheme. Such a graph can be used to establish precedence relationship between logical cores. For some networks, the graph may be DAG. The precedence relationship between cores is used to ensure that each operation is scheduled for computation on a physical core only after all incoming edges to it are already scheduled" teaches that the schemes of the scheduler (class model) correspond (point) to nodes of a computation graph (i.e. point to the computation graph)); 
traverse the computation graph based on the class model ([0058]: "A network is traversed using a breadth first traversal. A breadth-first schedule (BFS) is preferred in various embodiments to a depth-first schedule (DFS). In particular, during graph traversal, BFS schedules nodes in the graph on the cores in the order that they are discovered. For example, with BFS, nodes within each layer are scheduled first, before nodes in the next layer of the graph" teaches that the scheduler (class model) is used to traverse the computation graph using a breadth first traversal); and 
assign, based on traversal of the computation graph, the plurality of MVM operations … ([0058]: "A network is traversed using a breadth first traversal. A breadth-first schedule (BFS) is preferred in various embodiments to a depth-first schedule (DFS). In particular, during graph traversal, BFS schedules nodes in the graph on the cores in the order that they are discovered. For example, with BFS, nodes within each layer are scheduled first, before nodes in the next layer of the graph" teaches that during breadth first traversal (BFS) of the graph using the scheduler (class model), the operations of the nodes of the graph are assigned to cores. [0052]: "As set out above, each core computes a vector matrix multiplication" teaches that each core is capable of performing vector matrix multiplication (MVM operation)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate populate a class model corresponding to the neural network model with a data structure pointing to the computation graph; traverse the computation graph based on the class model; and assign, based on traversal of the computation graph, the plurality of MVM operations … as taught by Datta et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., and further in view of Ravishankar et al.
	One of ordinary skill in the art would have been motivated to make this modification "to efficiently map an arbitrary dimensional tensor onto any set of neural cores such that the computational efficiency is maximized, memory access is efficient, and re-shuffling of activations is minimized" (Datta et al. [0088]).
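	For illustration only, the claimed combination of a matrix class, a vector class, an MVM operation class, and a class model holding a data structure that points to the computation graph can be sketched as follows. This is a minimal sketch and does not reproduce the applicant's implementation or any cited reference: every class name, the use of the @ operator to declare an MVM operation, and the depth-first collection of MVM nodes are assumptions made for the example.

```python
class Node:
    """A computation-graph node; `kind` is 'matrix', 'vector', or 'mvm'."""
    def __init__(self, kind, name, inputs=()):
        self.kind = kind
        self.name = name
        self.inputs = list(inputs)

class Matrix(Node):
    def __init__(self, name):
        super().__init__("matrix", name)

    def __matmul__(self, vector):
        # Declaring W @ a records an MVM node instead of computing a value.
        return MVM(self, vector)

class Vector(Node):
    def __init__(self, name):
        super().__init__("vector", name)

class MVM(Node):
    def __init__(self, matrix, vector):
        super().__init__("mvm", f"mvm({matrix.name},{vector.name})", [matrix, vector])

class Model:
    """Holds references to the graph's output nodes (the 'pointer' to the graph)."""
    def __init__(self, outputs):
        self.outputs = list(outputs)

    def mvm_nodes(self):
        # Traverse from the outputs back through the operands, collecting MVM nodes.
        found, stack, seen = [], list(self.outputs), set()
        while stack:
            node = stack.pop()
            if id(node) in seen:
                continue
            seen.add(id(node))
            if node.kind == "mvm":
                found.append(node)
            stack.extend(node.inputs)
        return found

# Hypothetical program: one matrix, one vector, and one declared MVM operation.
W, a = Matrix("W"), Vector("a")
model = Model([W @ a])
```

	The MVM nodes returned by model.mvm_nodes() are the operations that a later stage would assign to MVM units.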
Regarding Claim 19,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the non-transitory computer-readable medium of claim 18.
	Additionally, Ravishankar et al. further teaches wherein the neural network model comprises a first vector-vector addition (VVA) operation related to a first MVM operation of the plurality of MVM operations (Fig. 2A; [0035]-[0036]: "Referring now to FIG. 2A, in one embodiment, compiler 100 generates graph 200 based on the example program code shown in Listing 1. As shown, graph 200 includes nodes 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, and 224 connected by various edges. A given node of graph 200 corresponds to an operation or a value specified in the example program code. A given incoming edge of graph 200 corresponds to an operand specified in or generated via the program code. In one embodiment, graph 200 is a directed acyclic graph. In one embodiment, the nodes of graph 200 are coupled together via edges to represent that some nodes receive input from other nodes. For example, node 214 (representing an addition operation) receives input from node 216 (representing the value of b) and node 218 (representing the output of a dot product between the transpose of W and a) via the edges shown. Node 214 computes the sum of the outputs of nodes 216 and node 218" teaches that the neural network program of the neural model comprises a first vector addition operation (node 214) that involves a result of a MVM operation (218)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the neural network model comprises a first vector-vector addition (VVA) operation related to a first MVM operation of the plurality of MVM operations as taught by Ravishankar et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification because "At least one technological advantage of the techniques described herein is that a serial computer program designed for serial execution can be automatically converted into a parallel computer program that is optimized for parallel execution. Accordingly, serial computer programs can quickly and easily be accelerated via parallel processing hardware" (Ravishankar et al. [0095]).
	Furthermore, Ambrosi et al. further teaches the first MVM operation is assigned to a first MVM unit of a first core of the neural network accelerator (Section V. A, first paragraph: "In the first stage of compilation, the graph is hierarchically partitioned and the operations in the graph are distributed to different crossbars, cores, and tiles. The hierarchical partitioning uses a bottom-up approach whereby the graph is first partitioned to crossbars, then to cores, then to tiles. The partitioning process starts by assigning all MVM operations that use the same constant matrix to the same virtual crossbar. Virtual crossbars, cores, and tiles are used at this stage for separation of concerns; they are mapped to physical crossbars, cores, and tiles at a later stage" teaches that each MVM operation in a graph is assigned to a crossbar (MVM unit) of a core (i.e. the first MVM operation would be assigned to the first crossbar (MVM unit) of the first core)), and 
the instructions are executable by the processing resource to assign the first VVA operation to the first core (Section V. A, second paragraph: "The second level of partitioning treats each sub-graph as a node in a graph and aggregates the edges across sub-graphs into a single edge. This new graph is then partitioned, grouping together nodes (i.e., virtual crossbars) that communicate frequently into the same sub-graph (i.e., virtual cores)" teaches that operations that communicate frequently (are related to each other) are assigned to the same core (i.e. since the first MVM operation is assigned to the first core, the related first vector operation would also be assigned to the first core)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the first MVM operation is assigned to a first MVM unit of a first core of the neural network accelerator, and the instructions are executable by the processing resource to assign the first VVA operation to the first core as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
Regarding Claim 20,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the non-transitory computer-readable medium of claim 18.
	Additionally, Ambrosi et al. further teaches wherein, subsequent to assignment of MVM operations to the MVM units, to generate the executable file, the instructions are executable by the processing resource to: convert the computation graph into a sequential stream of instructions (Section V. E, first paragraph: "The next stage of compilation is linearization. In this stage, the graph is linearized into a sequence of instructions for each tile and core" teaches that the graph is linearized (converted) into a sequence of instructions in order to generate the executable ISA instructions (executable file)); and 
subject the sequential stream of instructions to a code optimization technique (Section V. G, first paragraph: "The DSL and the compiler provide several value adds and failsafes to aid rapid software development, while safeguarding the programmer from common programming pitfalls … Further, the compiler diagnoses unused tensors, thus avoiding inadvertent programming errors and conserving precious device memory" teaches that the instructions when executed are able to diagnose unused tensors (code optimization technique) in order to avoid errors and conserve memory).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein, subsequent to assignment of MVM operations to the MVM units, to generate the executable file, the instructions are executable by the processing resource to: convert the computation graph into a sequential stream of instructions; and subject the sequential stream of instructions to a code optimization technique as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).


Claims 4, 10, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Murray et al. (US 2019/0034785 A1) in view of Ambrosi et al. ("Hardware-Software Co-Design for an Analog-Digital Accelerator for Machine Learning"), in view of Ravishankar et al. (US 2019/0278574 A1), in view of Datta et al. (US 2020/0042856 A1), and further in view of Norrie et al. (US 2018/0336456 A1).
Regarding Claim 4,
Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the system of claim 3.
	Additionally, Ambrosi et al. further teaches wherein each core comprises … a matrix-matrix addition unit to perform … a matrix-matrix addition operation, respectively, of the neural network model (Section VIII. B, first paragraph: "In each step we accumulate in a digital adder the result from the previous step with a product of the present row input and column weight, computed by a digital multiplier" teaches that in each operation step an accumulation (addition) operation is performed by a digital adder (matrix-matrix addition unit) for accumulating (adding) the output of the previous operation step with the current multiplier output).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein each core comprises … a matrix-matrix addition unit to perform … a matrix-matrix addition operation, respectively, of the neural network model as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. does not appear to explicitly teach wherein each core comprises a vector-vector addition unit, and a scalar-scalar addition unit … to perform a vector-vector addition operation, and a scalar-scalar addition operation …, respectively, of the neural network model.
	However, Norrie et al. teaches wherein each core comprises a vector-vector addition unit, and a scalar-scalar addition unit … to perform a vector-vector addition operation, and a scalar-scalar addition operation …, respectively, of the neural network model (Fig. 3; [0031]: "FIG. 3 shows a high-level example of compute core (300). The compute core can be a machine, i.e., a VLIW machine, that controls several compute units in parallel. Each compute core (300) contains: a scalar memory (304), a vector memory (308), a scalar processor (303), a vector processor (306), and extended vector units (i.e., a matrix multiply unit (MXU) (313) a transpose unit (XU) (314), and a reduction and permutation unit (RPU) (316))" teaches that a computation core comprises a scalar processor 303 (scalar-scalar addition unit) for performing scalar arithmetic operations (scalar-scalar addition) and a vector processor 306 (vector-vector addition unit) for performing vector arithmetic operations (vector-vector addition)).
Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
Norrie et al. is analogous to the claimed invention because it is directed to neural network compute cores for neural network operations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein each core comprises a vector-vector addition unit, and a scalar-scalar addition unit … to perform a vector-vector addition operation, and a scalar-scalar addition operation …, respectively, of the neural network model as taught by Norrie et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al.
One of ordinary skill in the art would have been motivated to make this modification "to efficiently perform training and inference calculations" (Norrie et al. [0025]).
Regarding Claim 10,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the system of claim 1.
Additionally, Murray et al. further teaches wherein the neural network model comprises a plurality of tensors and a plurality of tensor operations, the plurality of tensors comprises the plurality of matrices, the plurality of vectors, and a plurality of scalars ([0039]-[0040]: "Embodiments or implementations of the approach(es) described herein may be used to generate a program or process for making a decision that is derived at least in part from data or other outputs produced by a neural network. The program or process may be specified by a computation graph, which is a form of representing a computation or computing program by the data flow (e.g., expressed as data tensors) through a series of operations (indicated by nodes in the graph). The operations or graph nodes include an operator, where the operator is used to make a nondeterministic choice that influences the program logic. The combination of a computation graph and the operator represents a collection of decisions and information about the computing architecture used to make each decision. The computation graph describes how a program will execute or how a neural network will operate to process an input. A computation graph structure is typically described in terms of the following aspects: a node is (or has) a value (e.g., tensor, matrix, vector, scalar)" teaches that the neural network computing program comprises data values, expressed as a plurality of tensors (including scalars, vectors, and matrices), and a plurality of operations (tensor operations) for the data values), 
the plurality of tensor operations comprises the plurality of MVM operations … ([0048]: "The network parameters determine how an output is produced from the input data. As tensors, these parameters and values can be manipulated with standard operations, such as matrix-vector multiplication" teaches that the tensor operations comprise matrix-vector multiplication (MVM) operations).
	Furthermore, Ravishankar et al. further teaches … the computation graph comprises a second plurality of nodes, each of the second plurality of nodes representing one of: a scalar, a VVA operation, a MMM operation, and a SSA operation (Fig. 2A; [0035]-[0036]: "Referring now to FIG. 2A, in one embodiment, compiler 100 generates graph 200 based on the example program code shown in Listing 1. As shown, graph 200 includes nodes 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, and 224 connected by various edges. A given node of graph 200 corresponds to an operation or a value specified in the example program code. A given incoming edge of graph 200 corresponds to an operand specified in or generated via the program code. In one embodiment, graph 200 is a directed acyclic graph. In one embodiment, the nodes of graph 200 are coupled together via edges to represent that some nodes receive input from other nodes. For example, node 214 (representing an addition operation) receives input from node 216 (representing the value of b) and node 218 (representing the output of a dot product between the transpose of W and a) via the edges shown. Node 214 computes the sum of the outputs of nodes 216 and node 218" teaches that the generated computation graph includes nodes (second plurality of nodes) representing operations other than an MVM operation, including one of a scalar (node 204), scalar addition (SSA) (node 206), and vector addition (node 214)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate … the computation graph comprises a second plurality of nodes, each of the second plurality of nodes representing one of: a scalar, a VVA operation, a MMM operation, and a SSA operation as taught by Ravishankar et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification because "At least one technological advantage of the techniques described herein is that a serial computer program designed for serial execution can be automatically converted into a parallel computer program that is optimized for parallel execution. Accordingly, serial computer programs can quickly and easily be accelerated via parallel processing hardware" (Ravishankar et al. [0095]).
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. does not appear to explicitly teach the plurality of tensor operations comprises … VVA operation, matrix-matrix multiplication (MMM) operation, and scalar-scalar addition (SSA) operation.
	However, Norrie et al. teaches the plurality of tensor operations comprises … VVA operation, matrix-matrix multiplication (MMM) operation, and scalar-scalar addition (SSA) operation (Fig. 3; [0031]: "FIG. 3 shows a high-level example of compute core (300). The compute core can be a machine, i.e., a VLIW machine, that controls several compute units in parallel. Each compute core (300) contains: a scalar memory (304), a vector memory (308), a scalar processor (303), a vector processor (306), and extended vector units (i.e., a matrix multiply unit (MXU) (313) a transpose unit (XU) (314), and a reduction and permutation unit (RPU) (316))" teaches that the operations comprise vector-vector addition (VVA) operations (executed via the vector processor 306), scalar-scalar addition (SSA) operations (via the scalar processor 303), and matrix-matrix multiplication (MMM) operations (via the matrix multiply unit 313)).
Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
Norrie et al. is analogous to the claimed invention because it is directed to neural network compute cores for neural network operations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the plurality of tensor operations comprises … VVA operation, matrix-matrix multiplication (MMM) operation, and scalar-scalar addition (SSA) operation as taught by Norrie et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al.
One of ordinary skill in the art would have been motivated to make this modification "to efficiently perform training and inference calculations" (Norrie et al. [0025]).
Regarding Claim 11,
Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., in view of Datta et al., and further in view of Norrie et al. teaches the system of claim 10.
Additionally, Ambrosi et al. further teaches wherein the instructions are executable by the processor to at least one of: deduce whether a tensor among the plurality of tensors is an input tensor or an output tensor based on a pattern of usage of the tensor; detect tensors that are defined, but left unused in the neural network model; and detect tensors that are used in the neural network model with an invalid lifetime (Section V. G, first paragraph: "The DSL and the compiler provide several value adds and failsafes to aid rapid software development, while safeguarding the programmer from common programming pitfalls. For example, the compiler automatically detects input/output tensors without the programmer having to explicitly declare them as such. Further, the compiler diagnoses unused tensors, thus avoiding inadvertent programming errors and conserving precious device memory. The DSL also implements safeguards against out-of-scope tensors used in the model, thus preventing hard to detect runtime issues" teaches that the instructions when executed are able to detect input/output tensors without the tensors having to be declared as such, detect unused tensors, and detect out-of-scope tensors (e.g. tensors with an invalid lifetime)).
Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
Norrie et al. is analogous to the claimed invention because it is directed to neural network compute cores for neural network operations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the instructions are executable by the processor to at least one of: deduce whether a tensor among the plurality of tensors is an input tensor or an output tensor based on a pattern of usage of the tensor; detect tensors that are defined, but left unused in the neural network model; and detect tensors that are used in the neural network model with an invalid lifetime as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., in view of Datta et al., and further in view of Norrie et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Murray et al. (US 2019/0034785 A1) in view of Ambrosi et al. ("Hardware-Software Co-Design for an Analog-Digital Accelerator for Machine Learning"), in view of Ravishankar et al. (US 2019/0278574 A1), in view of Datta et al. (US 2020/0042856 A1), and further in view of Barham et al. (US 2017/0124451 A1).
Regarding Claim 5,
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. teaches the system of claim 1.
	Additionally, Ambrosi et al. further teaches … to assign a MVM operation to a MVM unit … (Section V. A, first paragraph: "In the first stage of compilation, the graph is hierarchically partitioned and the operations in the graph are distributed to different crossbars, cores, and tiles. The hierarchical partitioning uses a bottom-up approach whereby the graph is first partitioned to crossbars, then to cores, then to tiles. The partitioning process starts by assigning all MVM operations that use the same constant matrix to the same virtual crossbar. Virtual crossbars, cores, and tiles are used at this stage for separation of concerns; they are mapped to physical crossbars, cores, and tiles at a later stage" teaches that each MVM operation in a graph is assigned to a crossbar (MVM unit)).
	Murray et al., Ambrosi et al., Ravishankar et al., and Datta et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate … to assign a MVM operation to a MVM unit …  as taught by Ambrosi et al. to the disclosed invention of Murray et al. in view of Ravishankar et al., and further in view of Datta et al.
	One of ordinary skill in the art would have been motivated to make this modification to allow "flexibility in resources allocation for partition, distribution and parallelization of models and layers to cores and tiles, as well as for communication and power management of the software stack targets ISA-programmable hybrid accelerators" (Ambrosi et al. Section IX, third paragraph).
	Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al. does not appear to explicitly teach wherein … the instructions are executable by the processor to: assign a matrix involved in the MVM operation to the MVM unit; and assign a vector involved in the MVM operation to the MVM unit.
	However, Barham et al. teaches wherein … the instructions are executable by the processor to: assign a matrix involved in the MVM operation to the MVM unit (Fig. 2; [0051]: "The system provides the instructions and the data to the device (step 210)" teaches that when an operation is assigned to a device for execution, the system (e.g. by a processor) sends the device (core) the data required to implement the operation (i.e. for an MVM operation, the device (core) is sent (assigned) the matrix required for the MVM operation). See paragraphs [0019] and [0022]-[0023] for support that a matrix is involved in MVM operations); and 
assign a vector involved in the MVM operation to the MVM unit (Fig. 2; [0051]: "The system provides the instructions and the data to the device (step 210)" teaches that when an operation is assigned to a device for execution, the system (e.g. by a processor) sends the device (core) the data required to implement the operation (i.e. for an MVM operation, the device (core) is sent (assigned) the vector required for the MVM operation). See paragraphs [0019] and [0022]-[0023] for support that a vector is involved in MVM operations).
Murray et al., Ambrosi et al., Ravishankar et al., Datta et al., and Barham et al. are analogous to the claimed invention because they are directed to using computation graphs for neural network computations.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein … the instructions are executable by the processor to: assign a matrix involved in the MVM operation to the MVM unit; and assign a vector involved in the MVM operation to the MVM unit as taught by Barham et al. to the disclosed invention of Murray et al. in view of Ambrosi et al., in view of Ravishankar et al., and further in view of Datta et al.
One of ordinary skill in the art would have been motivated to make this modification to "improve efficiency when operating on disjoint streams" (Barham et al. [0057]).
Allowable Subject Matter
Claims 6 and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN J HALES whose telephone number is (571)272-0878. The examiner can normally be reached M-Th 8:00am - 5:00pm and F 8:00am - 2:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRIAN J HALES/Examiner, Art Unit 2125                                                                                                                                                                                                        
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125