DETAILED ACTION
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 10, and 17 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/14/2022 has been entered.
 
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 04/08/2022, 09/21/2022, and 10/21/2022 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 7-8, 10-12 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Abadi et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint arXiv:1603.04467 (2016) (“Abadi”) in view of Vasudevan et al., US 2017/0124454 A1 (“Vasudevan”), in view of Malaya et al., US 2019/0171420 A1 (“Malaya”), in view of Venieris, Stylianos I., et al., "fpgaConvNet: A toolflow for mapping diverse convolutional neural networks on embedded FPGAs," arXiv preprint arXiv:1711.08740 (2017) (“Venieris”), and further in view of Ji, Yu, et al., "Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler," arXiv preprint arXiv:1801.00746v3 (2018) (“Ji”).
Regarding claim 1, Abadi teaches a computer-readable memory storing computer-executable instructions that when executed by a processor, configure the processor to perform at least:
identifying the first subgraph of the neural network model to partition from the neural network model (Abadi, pg. 7, left-column, see also fig. 6, “Two arguments to the Run call help define the exact subgraph of the computation graph that will be executed. First, the Run call accepts inputs, an optional mapping of name:port names to “fed” tensors values. Second, the Run call accepts output names, a list of output name[:port] specifications indicating which nodes should be executed, and, if the port portion is present in a name, that that particular output tensor value for the node should be returned to the client if the Run call completes successfully. The graph is transformed based on the values of inputs and outputs. Each node:port specified in inputs is replaced with a feed node, which will pick up the provided input tensor from specially-initialized entries in a Rendezvous object used for the Run call. Similarly, each output name with a port is connected to a special fetch node that arranges to save the output tensor and return it to the client when the Run call is complete.”);
compiling the first subgraph to a neural network accelerator to generate configuration information for the neural network accelerator, the neural network accelerator comprising one or more specialized neural network processors (Abadi, pg. 7, right-column, “TensorFlow clients can control the placement of nodes on devices by providing partial constraints for a node about which devices it can execute on. For example, [‘]only place this node on a device of type GPU[’]… [w]ithin the confines of these constraints, the placement algorithm is responsible for choosing an assignment of nodes to devices that provides fast execution of the computation and also satisfies various constraints imposed by the devices themselves, such as limiting the total amount of memory needed on a device in order to execute its subset of graph nodes.” Abadi teaches “the placement algorithm is responsible for choosing an assignment of nodes to devices” (i.e. compiling the first subgraph); “only place this node on a device of type GPU” (i.e. to a neural network accelerator); “fast execution of the computation and also satisfies various constraints imposed by the devices themselves, such as limiting the total amount of memory needed on a device in order to execute its subset of graph nodes” (i.e. to generate configuration information for the neural network accelerator); and “only place this node on a device of type GPU” (i.e. the neural network accelerator comprising one or more specialized neural network processors)); and
configuring a general-purpose processor in communication with the neural network accelerator to evaluate the subgraph (Abadi, pg. 5, right-column, see also fig. 4, “Once the node placement has been computed, the graph is partitioned into a set of subgraphs, one per device. Any cross-device edge from x to y is removed and replaced by an edge from x to a new Send node in x’s subgraph and an edge from a corresponding Receive node to y in y’s subgraph… [a]t runtime, the implementations of the Send and Receive nodes coordinate to transfer data across devices. This allows us to isolate all communication inside Send and Receive implementations, which simplifies the rest of the runtime.” Abadi teaches “[o]nce the node placement has been computed, the graph is partitioned into a set of subgraphs, one per device. Any cross-device edge from x to y is removed and replaced by an edge from x to a new Send node in x’s subgraph and an edge from a corresponding Receive node to y in y’s subgraph” (i.e. configuring a general-purpose processor in communication with the neural network accelerator); and “[a]t runtime, the implementations of the Send and Receive nodes coordinate to transfer data across devices. This allows us to isolate all communication inside Send and Receive implementations, which simplifies the rest of the runtime” (i.e. to evaluate the second subgraph)).
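For illustration only, the Send/Receive partitioning described in the Abadi passage quoted above can be sketched as follows. This is a minimal Python sketch and is not TensorFlow's implementation: the function name `partition_with_send_recv`, the node-naming scheme, and the device labels are hypothetical, and the sketch creates one Send/Receive pair per cross-device edge without any further optimization.

```python
from collections import defaultdict


def partition_with_send_recv(edges, placement):
    """Split a placed graph into one subgraph (edge list) per device.

    edges:     list of (src, dst) node-name pairs.
    placement: dict mapping each node name to a device label.
    """
    subgraphs = defaultdict(list)
    for src, dst in edges:
        src_dev, dst_dev = placement[src], placement[dst]
        if src_dev == dst_dev:
            # Same-device edge: kept unchanged in that device's subgraph.
            subgraphs[src_dev].append((src, dst))
        else:
            # Cross-device edge: replaced by src -> Send on the source device
            # and Receive -> dst on the destination device.
            # The "send:"/"recv:" names are a hypothetical naming scheme.
            send = f"send:{src}->{dst}"
            recv = f"recv:{src}->{dst}"
            subgraphs[src_dev].append((src, send))
            subgraphs[dst_dev].append((recv, dst))
    return dict(subgraphs)


# Hypothetical three-node graph with one edge crossing from CPU to GPU.
edges = [("a", "b"), ("b", "c")]
placement = {"a": "cpu:0", "b": "cpu:0", "c": "gpu:0"}
parts = partition_with_send_recv(edges, placement)
```

In this example the edge from `b` to `c` crosses devices, so the CPU subgraph ends in a Send node and the GPU subgraph begins with the corresponding Receive node.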
Abadi does not teach: inserting a marker node that is not a neural node and does not perform any neural node operations at a boundary of the first subgraph, the marker node configured to perform at least, during an initial training mode, pass values unchanged between the first subgraph and the neural network model; and during a fine-tune training mode, perform at least one of (i) reducing a precision of values passed to the first subgraph, or (ii) reducing a precision of values output from the first subgraph.  
However, Vasudevan teaches: inserting a marker node that is not a neural node and does not perform any neural node operations at a boundary of the first subgraph, the marker node configured to perform at least, during an initial training mode, pass values unchanged between the first subgraph and the neural network model (Vasudevan, paras. 0062-0063, see also fig. 2C, “As demonstrated by the modified computational graph 200C in FIG. 2C, the system 100 may modify the allocation such that each send node is allocated to one respective subgraph and each receive node is allocated to another respective subgraph. For instance, the first send node $S_1$ may be allocated to third device included in machine 130, along with node 201, as part of subgraph 240 that the system has assigned to the third device. Similarly, the first receive node $R_1$ and second send node $S_2$ may be allocated to the second device included in machine 126, along with nodes 206, 208, and 210, as part of subgraph 246 that the system has assigned to the second device…[a]t execution time, the operation represented by the first send node $S_1$ may include a relaying of the output of node 201 to the first receive node $R_1$.”); and during a fine-tune training mode, perform at least one of (i) reducing a precision of values passed to the first subgraph, or (ii) reducing a precision of values output from the first subgraph (Vasudevan, para. 0073, see also fig. 3, “In some implementations, data exchanged between devices in association with send and receive nodes $S_3$ and $R_3$ may be compressed. That is, the operations 330 represented by send node $S_3$ may act to perform one or more compression processes upon the output of the operation represented by node 310. Similarly, the operations 340 represented by receive node $R_3$ may act to perform one or more decompression processes upon compressed data provided as output by way of execution of the operations 330 represented by send node $S_3$. The compression operations performed may include any conventional compression algorithm that is appropriate for transmitting data between the two devices. For example, the data exchanged between devices may be downconverted, truncated, or a combination thereof. Similarly, values conveyed by such data may also be subject to probabilistic rounding.”).1 
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Abadi’s computer-readable memory in view of Vasudevan. The motivation to do so would be to reduce the costs associated with processing a computational graph in a distributed environment (Vasudevan, para. 0006, “Send and receive nodes serve to compartmentalize subgraphs in a manner that allows for a neural network or a portion of a neural network represented by such subgraphs to be trained on one device, and later on allocated to another device. For at least these reasons, modifying computational graphs to include pairs of send and receive nodes may help reduce time costs and the amount of network communication required to process a computational graph in a distributed fashion.”).
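For illustration only, the two behaviors recited for the marker node (passing values unchanged in an initial training mode, and reducing their precision in a fine-tune training mode) can be sketched as follows. This is a minimal Python sketch under stated assumptions and does not reproduce the claimed implementation or Vasudevan's send and receive nodes: the names `Mode`, `to_half_precision`, and `marker_node` are hypothetical, and rounding to IEEE 754 half precision is assumed as one example of the "downconverted" data Vasudevan mentions.

```python
import struct
from enum import Enum


class Mode(Enum):
    INITIAL = "initial"      # values pass through unchanged
    FINE_TUNE = "fine_tune"  # values are reduced in precision


def to_half_precision(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value and return it as a float.

    Assumption: half precision stands in for the reduced-precision format.
    struct raises OverflowError for magnitudes outside the binary16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def marker_node(values, mode):
    """Hypothetical boundary node: performs no neural computation itself."""
    if mode is Mode.INITIAL:
        return list(values)
    return [to_half_precision(v) for v in values]


# The same boundary values under each mode.
initial_out = marker_node([0.1, 1.0, 0.5], Mode.INITIAL)
fine_tune_out = marker_node([0.1, 1.0, 0.5], Mode.FINE_TUNE)
```

Values that are exactly representable in half precision (here 1.0 and 0.5) are unaffected in either mode, while 0.1 is rounded to the nearest half-precision value in the fine-tune mode.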
Abadi does not teach: configuring the neural network accelerator with the configuration information to evaluate the first subgraph to provide an accelerated version of the first subgraph.
However, Malaya teaches: configuring the neural network accelerator with the configuration information to evaluate the first subgraph to provide an accelerated version of the first subgraph (Malaya, paras. 0030-0031, fig. 4, “The reconfigurable nature of FPGA devices is also well suited for supporting variable precision calculations. The neural network 400 includes neurons 401-420, which are implemented by configuring the configurable logic blocks of FPGA devices 121-123. Neural network 400 is illustrated as including twenty neurons 401-420 and can represent the entire neural network or a portion of a larger network. In some embodiments, a neural network can include any number of neurons, with some neural networks having up to $10^7$ neurons or more.” Malaya teaches “[t]he neural network 400 includes neurons 401-420, which are implemented by configuring the configurable logic blocks of FPGA devices 121-123” (i.e. configuring the neural network accelerator); “[t]he reconfigurable nature of FPGA devices is also well suited for supporting variable precision calculations” (i.e. with the configuration information to provide an accelerated version); and “Neural network 400 is illustrated as including twenty neurons 401-420 and can represent the entire neural network or a portion of a larger network. In some embodiments, a neural network can include any number of neurons, with some neural networks having up to $10^7$ neurons or more” (i.e. of the subgraph)).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Abadi’s computer-readable memory in view of Malaya. The motivation to do so would be to have a device that can support a wide range of numerical precisions for different machine learning applications (Malaya, para. 0013, “[T]he reconfigurable nature of field programmable gate array (FPGA) devices in a computing system allows the system to support a wide range of numerical precisions and to dynamically vary the precision for key computations at run time. This capability enables optimally efficient computation and provides a competitive advantage (via reductions in energy use, memory access, and computational cost achieved by reducing unnecessary precision) over conventionally operating devices.”).
Abadi does not teach: partitioning a neural network model into a plurality of subgraphs, the plurality of subgraphs comprising at least a first subgraph and a second subgraph, wherein partitioning the neural network model comprises. 
However, Venieris teaches: partitioning a neural network model into a plurality of subgraphs, the plurality of subgraphs comprising at least a first subgraph and a second subgraph, wherein partitioning the neural network model comprises (Venieris, pg. 2-3, see also fig. 2a, “Synchronous Dataflow (SDF)…is widely used for the analysis and design of parallel systems. Under this scheme, a computing system is modelled as a directed graph (SDFG), with the nodes representing computations and with arcs in place of data streams between them… To modify the configurable parameters of each hardware building block, fpgaConvNet introduces four transformations: (1) graph partitioning with reconfiguration, (2) coarse-grained folding, (3) fine-grained folding and (4) weights reloading. Graph partitioning with reconfiguration is tailored to high-throughput applications and achieves high throughput by partitioning a ConvNet along its depth and constructing one SDF subgraph per partition. One distinct architecture is generated per subgraph tailored to the subgraph’s workload, which can exploit all the resources of the target FPGA to reach high performance. The execution of each subgraph requires the full reconfiguration of the FPGA and fpgaConvNet amortises the reconfiguration time overhead by means of batch processing, leading to high-throughput mappings (Fig. 2a).” Venieris teaches, with reference to fig. 2a, “[g]raph partitioning with reconfiguration is tailored to high-throughput applications and achieves high throughput by partitioning a ConvNet along its depth and constructing one SDF subgraph per partition. One distinct architecture is generated per subgraph tailored to the subgraph’s workload, which can exploit all the resources of the target FPGA to reach high performance” (i.e. partitioning a neural network model into a plurality of subgraphs, the plurality of subgraphs comprising at least a first subgraph and a second subgraph, wherein partitioning the neural network model comprises)).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Abadi in view of Venieris. The motivation to do so would be to have a tool that is able to translate deep learning models such as convolutional neural networks to an FPGA without sacrificing latency and/or throughput requirements (Venieris, pg. 2, “FPGAs offer the benefits of customisability and reconfigurability by means of a set of heterogeneous hardware resources with programmable interconnections between them. With FPGAs’ size and resource specifications advancing at a fast pace and with ConvNets becoming more complex, the possible mappings of a ConvNet on an FPGA lie in a large multidimensional design space that cannot be explored manually. At the same time, the diversity of ConvNet application domains results in a wide spectrum of performance needs. To this end, there is a need for tools that abstract the low-level resource details of a particular FPGA and automate the mapping of ConvNets on FPGAs in a principled manner while satisfying the application-level performance needs.”).
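For illustration only, partitioning a network "along its depth" into one subgraph per partition, as described in the Venieris passage quoted above, can be sketched as follows. This is a minimal Python sketch and is not the fpgaConvNet toolflow: the function name `partition_along_depth`, the layer names, and the near-equal split are assumptions, and the quoted passage does not state how partition points are chosen.

```python
def partition_along_depth(layers, num_partitions):
    """Split an ordered list of layers into contiguous partitions.

    Each partition is a run of consecutive layers, so the partitions taken
    in order reproduce the original network. The near-equal split is an
    assumption made for this sketch.
    """
    if not 1 <= num_partitions <= len(layers):
        raise ValueError("num_partitions must be between 1 and len(layers)")
    base, extra = divmod(len(layers), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional layer.
        size = base + (1 if i < extra else 0)
        partitions.append(layers[start:start + size])
        start += size
    return partitions


# Hypothetical five-layer ConvNet split into a first and a second subgraph.
layers = ["conv1", "pool1", "conv2", "pool2", "fc"]
first_subgraph, second_subgraph = partition_along_depth(layers, 2)
```

Concatenating the partitions in order yields the original layer sequence, which is the property that makes a depth-wise split a partition of the model rather than a restructuring of it.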
	Abadi does not teach: the marker node being connected to a first edge that is connected to a first neural node of the neural network model that is external to the first subgraph, and (2) a second edge that is connected to a second neural node of the neural network model that is internal to the first subgraph.
	However, Ji teaches: the marker node being connected to a first edge that is connected to a first neural node of the neural network model that is external to the first subgraph, and (2) a second edge that is connected to a second neural node of the neural network model that is internal to the first subgraph (Ji, pgs. 3-4, right-column, see also fig. 3, “According to the above description, it constructs $G = (V, E)$ based on the input NN’s information that includes the trained parameters, network topology, vertex information and training dataset…we transform the intermediate graph $\hat{G} = (\hat{V}, \hat{E})$ to the hardware execution model $G' = (V', E')$. We use the original graph $G$ to supervise the fine-tuning progress of the generated $G'$. The graph $G$ can provide not only labels of the output but also supervised signals of all intermediate data between operations. Each edge $e \in E$ can provide supervised signal for graph $G'$.” Ji teaches that $G = (V, E)$, based on the input NN’s information, is used to supervise the fine-tuning progress of the hardware execution model $G' = (V', E')$. The graph $G$ can provide not only labels of the output but also supervised signals of all intermediate data between operations. Each edge $e \in E$ can provide supervised signal for graph $G'$ (i.e. the marker node being connected to a first edge that is connected to a first neural node of the neural network model that is external to the first subgraph, and (2) a second edge that is connected to a second neural node of the neural network model that is internal to the first subgraph)).
	Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Abadi in view of Ji. The motivation to do so would be to map a trained neural network into an equivalent network on different hardware (Ji, pg. 2, left-column, “In this paper we propose a new method with flexibility, better applicability, and easy convergence. First, we decouple the neuromorphic computer system into two levels for better flexibility, software programming model and hardware execution model. We use computational graph (CG), which is widely used in many popular NN frameworks…as the programming model for NN models. We also provide the hardware/software (HW/SW) interface and the minimum hardware functionality that an NN hardware should provide. We propose a transformation workflow to convert a trained NN, expressed as a CG, into an equivalent representation of HW/SW interface through the fine-tuning method.”).
Regarding claim 2, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the computer-readable memory of claim 1, wherein the processor is configured to perform computations at a higher precision than the neural network accelerator (Abadi, pg. 9, sec. 5.5 Lossy Compression, “Some machine learning algorithms, including those typically used for training neural networks…we often use lossy compression of higher precision internal representations when sending data between devices…[f]or example, we often insert special conversion nodes that convert 32-bit floating point representations into a 16-bit floating point representation.” Abadi teaches “we often use lossy compression of higher precision internal representations when sending data between devices…[f]or example, we often insert special conversion nodes that convert 32-bit floating point representations into a 16-bit floating point representation” (i.e. wherein the processor is configured to perform computations at a higher precision than the neural network accelerator)).
Regarding claim 3, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the computer-readable memory of claim 1, wherein the neural network model is specified using source code of a machine learning native framework (Abadi, pg. 2, fig. 1, fig. 2, “We have open-sourced the TensorFlow API and a reference implementation under the Apache 2.0 license… [c]lients typically construct a computational graph using one of the supported frontend languages (C++ or Python). An example fragment to construct and then execute a TensorFlow graph using the Python front end is shown in Figure 1, and the resulting computation graph in Figure 2.” Abadi teaches “[c]lients typically construct a computational graph using one of the supported frontend languages (C++ or Python)” (i.e. wherein the neural network model is specified using source code) and “[w]e have open-sourced the TensorFlow API and a reference implementation under the Apache 2.0 license” (i.e. of a machine learning native framework)).
Regarding claim 4, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the computer-readable memory of claim 1, wherein the marker node reduces a precision of the values passed to the first subgraph during a fine-tune training mode (Vasudevan, para. 0073, see also fig. 3, “In some implementations, data exchanged between devices in association with send and receive nodes S_3 and R_3 may be compressed. That is, the operations 330 represented by send node S_3 may act to perform one or more compression processes upon the output of the operation represented by node 310. Similarly, the operations 340 represented by receive node R_3 may act to perform one or more decompression processes upon compressed data provided as output by way of execution of the operations 330 represented by send node S_3. The compression operations performed may include any conventional compression algorithm that is appropriate for transmitting data between the two devices. For example, the data exchanged between devices may be downconverted, truncated, or a combination thereof. Similarly, values conveyed by such data may also be subject to probabilistic rounding.”).
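For illustration only, the truncation and probabilistic rounding named in the quoted passage can be sketched as follows. This is a minimal sketch and is not taken from Vasudevan: the function names, the `step` grid-spacing parameter, and the seeded random generator are assumptions introduced here.

```python
import math
import random


def truncate(value, step):
    """Round toward zero to a multiple of `step` (hypothetical helper)."""
    return math.trunc(value / step) * step


def probabilistic_round(value, step, rng):
    """Round to one of the two neighbouring multiples of `step`.

    The upper neighbour is chosen with probability equal to the fractional
    position of `value` between the two neighbours, so the result is
    unbiased on average.
    """
    scaled = value / step
    lower = math.floor(scaled)
    frac = scaled - lower
    rounded = lower + (1 if rng.random() < frac else 0)
    return rounded * step


# Assumed demonstration values: round 0.3 to a grid of spacing 1.0 many times.
rng = random.Random(0)
samples = [probabilistic_round(0.3, 1.0, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
```

Each individual sample is either 0.0 or 1.0, while the average over many samples stays close to the original value 0.3, which is the property that distinguishes probabilistic rounding from plain truncation.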
Regarding claim 5, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the computer-readable memory of claim 1, wherein the marker node reduces a precision of the values output from the first subgraph during a fine-tune training mode (Vasudevan, para. 0073, see also fig. 3, “In some implementations, data exchanged between devices in association with send and receive nodes S_3 and R_3 may be compressed. That is, the operations 330 represented by send node S_3 may act to perform one or more compression processes upon the output of the operation represented by node 310. Similarly, the operations 340 represented by receive node R_3 may act to perform one or more decompression processes upon compressed data provided as output by way of execution of the operations 330 represented by send node S_3. The compression operations performed may include any conventional compression algorithm that is appropriate for transmitting data between the two devices. For example, the data exchanged between devices may be downconverted, truncated, or a combination thereof. Similarly, values conveyed by such data may also be subject to probabilistic rounding.”).
Regarding claim 7, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the computer-readable memory of claim 1, wherein the first subgraph comprises a quantization node interposed between a first internal neural node of the first subgraph and a second internal neural node of the first subgraph, and the quantization node reduces a precision of values passed between the first internal neural node and the second internal neural node during a fine-tune training mode (Abadi, pg. 9, sec. 5.5 Lossy Compression, “Some machine learning algorithms, including those typically used for training neural networks…we often use lossy compression of higher precision internal representations when sending data between devices…[f]or example, we often insert special conversion nodes that convert 32-bit floating point representations into a 16-bit floating point representation.” Note: It is being interpreted that the special conversion node is interposed between the first internal neural node and second internal node of the subgraph and the 32-bit floating point representation represents the precision of the processor and the 16-bit floating point representation represents the precision of the hardware accelerator); implemented on a machine learning native framework executing on the processor (Abadi, pg. 7, sec. 4.2 Partial Execution, fig. 6, “Often a client wants to execute just a subgraph of the entire execution graph. To support this, once the client has set up a computation graph in a Session, our Run method allows them to execute an arbitrary subgraph of the whole graph, and to inject arbitrary data along any edge in the graph, and to retrieve data flowing along any edge in the graph.” Note: It is being interpreted that the client is executing the machine learning framework (i.e., TensorFlow) on his or her processor).
Regarding claim 8, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the computer-readable memory of claim 1, wherein the marker node comprises metadata specifying a format for communicating values between the accelerated version of the first subgraph and the neural network model executing on the processor in communication with the neural network accelerator (Abadi, pg. 5, sec. 3.2.2 Cross-Device Communication, fig. 4, “Once the node placement has been computed, the graph is partitioned into a set of subgraphs, one per device… [a]t runtime, the implementations of the Send and Receive nodes coordinate to transfer data across devices. This allows us to isolate all communication inside Send and Receive implementations, which simplifies the rest of the runtime. When we insert Send and Receive nodes, we canonicalize all users of a particular tensor on a particular device to use a single Receive node, rather than one Receive node per downstream user on a particular device. This ensures that the data for the needed tensor is only transmitted once between a source device → destination device pair.” Note: It is being interpreted that the source device represents the neural network model executing on the processor and the destination device pair represents the neural network accelerator and the Send and Receive nodes represent metadata specifying a format for communicating values between the accelerated version of the subgraph and the neural network model).
Independent claim 10 is rejected on the same basis as independent claim 1, since they are analogous claims.
Regarding dependent claim 11, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the method of claim 10, wherein the programming interface is an application programming interface (API)(Abadi, pg. 2, see also fig. 1 and fig. 2,  “We have open-sourced the TensorFlow API and a reference implementation under the Apache 2.0 license… [c]lients typically construct a computational graph using one of the supported frontend languages (C++ or Python). An example fragment to construct and then execute a TensorFlow graph using the Python front end is shown in Figure 1, and the resulting computation graph in Figure 2.”).
Regarding claim 12, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the method of claim 10, further comprising: using a processor to train the neural network model to generate training data for the first subgraph of the neural network model (Abadi, pg. 11, sec. 7 Common Programming Idioms, see also fig. 7 (top portion), “One simple technique for speeding up SGD is to parallelize the computation of the gradient for a mini-batch across mini-batch elements. For example, if we are using a mini-batch size of 1000 [training] elements, we can use 10 replicas of the model to each compute the gradient for 100 [training] elements, and then combine the gradients and apply updates to the parameters synchronously, in order to behave exactly as if we were running the sequential SGD algorithm with a batch size of 1000 [training] elements. In this case, the TensorFlow graph simply has many replicas of the portion of the graph that does the bulk of the model computation, and a single client thread drives the entire training loop for this large graph.” Abadi teaches a single client thread drives the entire training loop for this large graph (i.e. using a processor to train the neural network model to generate training data) and the TensorFlow graph simply has many replicas of the portion of the graph that does the bulk of the model computation (i.e. for the first subgraph of the neural network model)); the processor configured to perform computations at a higher precision than the hardware accelerator (Abadi, pg. 9, sec. 5.5 Lossy Compression, “Some machine learning algorithms, including those typically used for training neural networks…we often use lossy compression of higher precision internal representations when sending data between devices… we often insert special conversion nodes that convert 32-bit floating point representations into a 16-bit floating point representation.” Note: It is being interpreted that the 32-bit floating point representation represents the precision of the processor and the 16-bit floating point representation represents the precision of the hardware accelerator).
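For illustration only, a conversion of a 32-bit floating point value into a 16-bit representation at a send point, and back at a receive point, can be sketched as follows. The quoted passage does not specify the 16-bit format; this sketch assumes a format obtained by keeping the upper 16 bits of the IEEE 754 single-precision encoding and zero-filling on the receiving side. The names `send` and `receive` are hypothetical helpers and are not the TensorFlow API.

```python
import struct


def send(value):
    """Encode `value` as float32 and keep only the upper 16 bits (2 bytes).

    Assumption: the 16-bit format is a truncated float32 (sign, exponent,
    and the top 7 mantissa bits); the low 16 mantissa bits are dropped.
    """
    return struct.pack(">f", value)[:2]


def receive(payload):
    """Restore a float32 by filling the dropped low 16 bits with zeros."""
    return struct.unpack(">f", payload + b"\x00\x00")[0]


exact = receive(send(1.5))   # 1.5 needs few mantissa bits, so it survives
lossy = receive(send(0.1))   # 0.1 loses its low mantissa bits in transit
```

The payload is half the size of the original encoding, and values that need more than the retained mantissa bits come back slightly changed, which is the sense in which the conversion is lossy.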
Regarding claim 14, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the method of claim 10, wherein the marker node (Abadi, pg. 7, sec. 4.2 Partial Execution, fig. 6, “Two arguments to the Run call help define the exact subgraph of the computation graph that will be executed. First, the Run call accepts inputs, an optional mapping of name:port names to “fed” tensors values. Second, the Run call accepts output names, a list of output name[:port] specifications indicating which nodes should be executed, and, if the port portion is present in a name, that that particular output tensor value for the node should be returned to the client if the Run call completes successfully. The graph is transformed based on the values of inputs and outputs. Each node:port specified in inputs is replaced with a feed node, which will pick up the provided input tensor from specially-initialized entries in a Rendezvous object used for the Run call. Similarly, each output name with a port is connected to a special fetch node that arranges to save the output tensor and return it to the client when the Run call is complete.” Abadi teaches inputs, an optional mapping of name:port names to “fed” tensors values and output names, a list of output name[:port] specifications indicating which nodes should be executed (i.e. the marker node)); converts the value from the higher precision of the processor to the lower precision of the hardware accelerator (Abadi, pg. 9, sec. 5.5 Lossy Compression, “Some machine learning algorithms, including those typically used for training neural networks…we often use lossy compression of higher precision internal representations when sending data between devices…[f]or example, we often insert special conversion nodes that convert 32-bit floating point representations into a 16-bit floating point representation.” Note: It is being interpreted that the 32-bit floating point representation represents the precision of the processor and the 16-bit floating point representation represents the precision of the hardware accelerator); when the value is passed from the neural network model to the first subgraph (Malaya, para. 0050-0052, fig. 2, fig. 4, fig. 6, “At block 617 the adjustment logic 230 reconfigures each of the computational units in the neural network 400 to use the next number representation as determined for the computational unit at block 609...[a]t block 607, each computational unit performs calculations using the current number representation for the computational unit. The output values 426(2) and 427(2) for the second iteration (i=2) are generated based on these calculations by the computational units in network 400.” Note: It is being interpreted that each of the computational units in the neural network 400 represents nodes in the first subgraph).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Abadi with the above teachings of Malaya for the same rationale stated at Claim 1.
Regarding claim 15, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the method of claim 10, wherein implementing code of the programming interface comprises a quantization node between a first internal neural node and a second internal neural node of the first subgraph, and the quantization node converts a value from the higher precision of the processor to the lower precision of the hardware accelerator (Abadi, pg. 9, sec. 5.5 Lossy Compression, “Some machine learning algorithms, including those typically used for training neural networks…we often use lossy compression of higher precision internal representations when sending data between devices…[f]or example, we often insert special conversion nodes that convert 32-bit floating point representations into a 16-bit floating point representation.” Note: It is being interpreted that the special conversion node is interposed between the first internal neural node and second internal node of the subgraph and the 32-bit floating point representation represents the precision of the processor and the 16-bit floating point representation represents the precision of the hardware accelerator) when the value is generated by the first internal neural node and passed to second internal neural node during a second phase of the training (Malaya, para. 0050-0052, fig. 2, fig. 4, fig. 6, “At block 617 the adjustment logic 230 reconfigures each of the computational units in the neural network 400 to use the next number representation as determined for the computational unit at block 609...[a]t block 607, each computational unit performs calculations using the current number representation for the computational unit. The output values 426(2) and 427(2) for the second iteration (i=2) are generated based on these calculations by the computational units in network 400.” Note: It is being interpreted that each of the computational units in the neural network 400 represents nodes in the first subgraph).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Abadi with the above teachings of Malaya for the same rationale stated at Claim 1. 
Regarding claim 16, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the method of claim 11, further comprising: using the processor to evaluate the neural network model and the first subgraph of the neural network model before the hardware accelerator is configured using the configuration information (Abadi, pgs. 4-5, sec. 3.2.1 Node Placement, “Given a computation graph, one of the main responsibilities of the TensorFlow implementation is to map the computation onto the set of available devices… one input to the placement algorithm is a cost model, which contains estimates of the sizes (in bytes) of the input and output tensors for each graph node, along with estimates of the computation time required for each node when presented with its input tensors. This cost model is either statically estimated based on heuristics associated with different operation types, or is measured based on an actual set of placement decisions for earlier executions of the graph. The placement algorithm first runs a simulated execution of the graph. The simulation is described below and ends up picking a device for each node in the graph using greedy heuristics. The node to device placement generated by this simulation is also used as the placement for the real execution.” Abadi teaches a cost model, which contains estimates of the sizes (in bytes) of the input and output tensors for each graph node, along with estimates of the computation time required for each node when presented with its input tensors (i.e. using the processor to evaluate the neural network model and the first subgraph of the neural network model). The placement algorithm first runs a simulated execution of the graph. The simulation is described below and ends up picking a device for each node in the graph using greedy heuristics (i.e. before the hardware accelerator is configured using the configuration information)), including the processor converting computations of the first subgraph from the higher precision of the processor to the lower precision of the hardware accelerator (Abadi, pg. 9, sec. 5.5 Lossy Compression, “Some machine learning algorithms, including those typically used for training neural networks…we often use lossy compression of higher precision internal representations when sending data between devices… we often insert special conversion nodes that convert 32-bit floating point representations into a 16-bit floating point representation.” Note: It is being interpreted that the 32-bit floating point representation represents the precision of the processor and the 16-bit floating point representation represents the precision of the hardware accelerator).
Independent claim 17 is rejected on the same basis as independent claim 1, since they are analogous claims.
Regarding claim 18, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the system of claim 17, wherein compiling the neural network model comprises identifying the marker node at the boundary of the first subgraph (Abadi, pg. 7, sec. 4.2 Partial Execution, fig. 6, “The graph is transformed based on the values of inputs and outputs. Each node:port specified in inputs is replaced with a feed node… Similarly, each output name with a port is connected to a special fetch node… Finally, once the graph has been rewritten with the insertion of these special feed and fetch nodes, the set of nodes to execute can be determined by starting at each of the nodes named by any output and working backwards in the graph using the graph dependencies to determine the full set of nodes that must be executed in the rewritten graph in order to compute the outputs.”).
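For illustration only, the backward traversal described in the quoted passage (starting at the output nodes and following graph dependencies to find every node that must run) can be sketched as follows. The function name, the dictionary-based graph encoding, and the example node names are assumptions introduced here and are not drawn from Abadi.

```python
def nodes_to_execute(dependencies, outputs):
    """Return every node reachable backwards from `outputs`.

    `dependencies` maps a node name to the names of the nodes it reads from.
    Nodes absent from the mapping are treated as having no inputs.
    """
    needed = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in needed:
            continue  # already visited via another path
        needed.add(node)
        stack.extend(dependencies.get(node, ()))
    return needed


# Hypothetical rewritten graph with one feed node and one fetch node.
graph = {
    "fetch": ["relu"],
    "relu": ["add"],
    "add": ["matmul", "b"],
    "matmul": ["feed", "W"],
    "unused": ["W"],  # not on any path to the fetch node
}
required = nodes_to_execute(graph, ["fetch"])
```

In this example the node named `unused` is left out of the result because no requested output depends on it, so the feed and fetch nodes effectively delimit the subgraph that is executed.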
Regarding claim 19, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the system of claim 17, wherein compiling the neural network model comprises assigning training data of respective neural nodes of the first subgraph to respective memory elements of the neural network accelerator (Abadi, pg. 8, sec. 4.5 Input Operations, “Although input data can be provided to a computation via feed nodes, another common mechanism used for training large-scale machine learning models is to have special input operation nodes in the graph, which are typically configured with a set of filenames and which yield a tensor containing one or more examples from the data stored in that set of files each time they are executed. This allows data to be read directly from the underlying storage system into the memory of the machine that will perform subsequent processing on the data. In configurations where the client process is separate from the worker process, if the data were fed, it typically would require an extra network hop (from the storage system to the client and then from the client to the worker vs. directly from the storage system to the[] worker when using an input node).” Note: It is being interpreted that the worker represents the neural network accelerator).
Regarding claim 20, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the system of claim 19, wherein the training data is generated by using calculations of multiple precisions for the computations of the first subgraph during training of the neural network model (Malaya, para. 0055, fig. 1, fig. 4, fig. 6,  “The process 600 thus repeats blocks 607-619 for each of multiple iterations to generate outputs 426(i) and 427(i) for each iteration i, while adjusting the number precision used for performing computations in the computational units functioning as neurons 401-420 in the network 400. At each iteration of block 609, the adjustment logic 230 determines a new number representation [i.e., precision] based on the output accuracy, power consumption, values in LUT 130, or other signals.” Malaya teaches The process 600 thus repeats blocks 607-619 for each of multiple iterations to generate outputs 426(i) and 427(i) for each iteration i, while adjusting the number precision used for performing computations in the computational units (i.e. wherein the training data is generated by using calculations of multiple precisions for the computations of the first subgraph during training) functioning as neurons 401-420 in the network 400 (i.e. of the neural network model)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Abadi with the above teachings of Malaya for the same rationale stated at Claim 1.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Abadi et al. "Tensorflow: Large-scale machine learning on heterogeneous distributed systems." arXiv preprint arXiv:1603.04467 2016 (“Abadi”) in view of Vasudevan et al. US 2017/0124454 A1 (“Vasudevan”) and in view of Malaya et al. US 2019/0171420 A1 (“Malaya”) and in view of Venieris, Stylianos I., et al. "fpgaConvNet: A toolflow for mapping diverse convolutional neural networks on embedded FPGAs." arXiv preprint arXiv:1711.08740 (2017) (“Venieris”) and Ji, Yu, et al. "Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler." arXiv preprint arXiv:1801.00746v3 (2018) (“Ji”) and further in view of Zhuang et al. “Towards Effective Low-bitwidth Convolutional Neural Networks.” arXiv preprint arXiv:1711.00205v2, 2017 (“Zhuang”).
Regarding claim 9, Abadi in view of Vasudevan and in view of Malaya and in view of Venieris and further in view of Ji teaches the computer-readable memory of claim 1, but does not teach: wherein the configuration information comprises training data, and the training data of the first subgraph is generated using higher precision computations during early training and lower precision computations during later training.
However, Zhuang teaches: wherein the configuration information comprises training data, and the training data of the first subgraph is generated using higher precision computations during early training and lower precision computations during later training (Zhuang, pg. 3, sec. 3.2 Two-stage optimization, “To reduce the difficulty of training, we devise a two-stage optimization procedure: at the first stage, we only quantize the weights of the network while setting the activations to be full precision. After the converge (or after certain number of iterations) of this model, we further apply the quantization function on the activations as well….”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Abadi’s method in view of Vasudevan and in view of Malaya and in view of Venieris and in view of Ji and further in view of Zhuang. The motivation to do so would be to improve the accuracy of a low-precision neural network (Zhuang, pg. 1, Abstract, “Optimizing a low-precision network is very challenging since the training process can easily get trapped in a poor local minima, which results in substantial accuracy loss. To mitigate this problem…we propose to use a two-stage optimization strategy to progressively find good local minima. Specially, we propose to first optimize a net with quantized weights and then quantized activations. This is in contrast to the traditional methods which optimize them simultaneously.”).
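For illustration only, the two-stage schedule described in the quoted Zhuang passages (weights quantized from the start, activations left at full precision until a switch point and quantized afterwards) can be sketched as follows. The uniform quantizer on the range [0, 1], the 2-bit width, the `switch_step` parameter, and the function names are assumptions introduced here; the quoted passages do not specify the quantization function.

```python
def quantize(x, bits):
    """Uniformly quantize `x` to 2**bits levels on [0, 1] (assumed quantizer)."""
    levels = 2 ** bits - 1
    clipped = min(max(x, 0.0), 1.0)
    return round(clipped * levels) / levels


def forward(weight, activation, step, switch_step, bits=2):
    """One multiply under the two-stage schedule.

    Stage 1 (step < switch_step): quantized weight, full-precision activation.
    Stage 2 (step >= switch_step): quantized weight and quantized activation.
    """
    w = quantize(weight, bits)
    a = quantize(activation, bits) if step >= switch_step else activation
    return w * a


stage_one = forward(0.4, 0.55, step=0, switch_step=100)    # (1/3) * 0.55
stage_two = forward(0.4, 0.55, step=100, switch_step=100)  # (1/3) * (2/3)
```

The same inputs produce different results before and after the switch point, because only the second stage applies the lower-precision representation to the activations.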

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-7PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Adam Clark Standke
Assistant Examiner
Art Unit 2129



/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129

1 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.