DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Amendments
This action is in response to amendments filed October 7th, 2021, in which Claims 1-3, 6-9, 13, 14, 17, 19, and 20 are amended.  Claims 5, 11, 12, and 15 are cancelled.  No claims have been added.  The amendments have been entered.  Claims 1-4, 6-10, 13, 14, and 16-20 are currently pending.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
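3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.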

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-4, 6-10, 13, 14, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Vasudevan et al. (US 2017/0124454 A1); herein Vasudevan, in view of Wang et al., “DeepBurning: Automatic Generation of FPGA-based Learning Accelerators for the Neural Network Family,” herein Wang, and further in view of Mehr et al. (US 2018/0341248 A1); herein Mehr.

Regarding Claim 1,
	Vasudevan teaches a method for compiling a neural network model, comprising: 
	identifying a subgraph of the neural network model to partition from the neural network model;([0002], [0056], “The system may, for instance, determine to allocate computational graph 200A across three different devices. To make this determination the system may analyze the computational graph 200A to identify one or more nodes that can be partitioned into subgraphs and allocated to the devices available.” Discloses the identification of nodes and associated subgraphs to partition from the neural network model. [0006] discloses information about the neural networks.)
	 inserting an interface between the neural network model and a partitioned version of the subgraph, the partitioned version being adapted to be evaluated with a neural network accelerator; ([0053] – [0054], “In some implementations, to allow the devices to communicate independently of the system 100, the system 100 modifies the computational graph such that it includes additional nodes that represent communication operations between nodes. In particular, a device's respective subgraph may include a node representing an operation which, when executed by the device, allows the device to seamlessly communicate with another device that is executing a counterpart operation.” Discloses that the system inserts send and receive nodes that are used to communicate between devices. [0073] discloses data exchange between devices. [0041] discloses that placer 108 determines a target device to perform the operations. [0056] discloses that subgraph allocations can be subdivided among devices.)
	compiling the subgraph to the neural network accelerator to generate configuration information for the neural network accelerator; and ([0007] discloses data identifying an allocation of the computational graph among the devices. [0056] discloses that the nodes of the computational graph are partitioned into subgraphs and allocated to the devices available. [0053] discloses that subgraphs are modified to include additional nodes (i.e., the configuration information is the additional nodes and timings). [0041] uses a placer 108 to determine devices for operations.)
	configuring the neural network accelerator with the configuration information to provide an accelerated version of the subgraph. ([0041]-[0042] disclose that the placer 108 determines a target device for each operation and that, after the devices (the GPUs/CPUs are interpreted as the neural network accelerators) perform the operations allocated by placer 108, they generate outputs that are retrieved by the executor 106. [0006] discloses that these operations performed by the devices on the subgraphs reduce the overall time required to perform operations of the neural network (i.e., an accelerated subgraph is generated after the operations are performed, as the overall time required to perform operations of the neural network is reduced).) … ([0051]-[0053] disclose that the additional nodes are included in the devices to allow for communication.)
	…evaluating the subgraph by the neural network accelerator during an inference mode ([0042], the devices perform the operations allocated by the placer 108 to generate outputs .. then the executor 106 can return the response to the client).
	The configuration information of Vasudevan provides information on which devices to place the subgraphs (Vasudevan, [0041]) but Vasudevan is silent with respect to the configuration information assigning training data associated with the subgraph to particular memory elements of the neural network accelerator to load the training data into particular memory elements of the neural network accelerator, prior to evaluating; nor the training data including weights … corresponding to nodes of the subgraph.  However, Wang teaches these limitations with respect to accelerating nodes of a neural network on an FPGA (Wang, pg. 3, 1st column, last paragraph, “the compiler will also be responsible for pre-processing the … weight layout into proper ‘tiles’ by analyzing the computer throughput and on-chip memory size of the NN accelerator.  In this way, the hardware part (RTL) and software part (control flow and data layout) are generated by the NN-Gen at the same time.  Another important component … is the address flow that is used to fetch on-chip and off-chip memory data automatically.  In DeepBurning, the memory address flow is generated deterministically by the DeepBurning compiler” & pg. 4, 1st column, last three paragraphs, “the AGUs know how to load, store and forward the data sets including input feature data, NN weight data … weight AGU handles the weight fetch and store for the on-chip buffer …Each of the AGUs supports memory access patterns …the compiler can generate the memory access stream of addresses” also see pg. 5, last paragraph, Method1, “align the tiles of one map [i.e. kernel weights] in memory”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the automatic neural network layout generation system of Wang to generate layouts on FPGAs for the subgraphs of Vasudevan.  
The motivation to do so is that “DeepBurning allow[s] the application developers to build from scratch learning accelerators that targets their specific NN models with custom configurations and optimized performance … [to] exhibit great power efficiency” (Wang, Abstract).
	Vasudevan and Wang are silent on biases in the neural network models implemented, but Mehr teaches, in an analogous system, training data comprises…biases([0156], “The weighting factors, bias values, and threshold values, or other computational parameters of the neural network, can be "taught" or "learned" in a training phase using one or more sets of training data. For example, the parameters may be trained using the input data from a training data set and a gradient descent or backward propagation method so that the output value(s) (e.g., a set of predicted adjustments to process control parameter settings) that the ANN computes are consistent with the examples included in the training data set.” Discloses that bias values are learned from training data and therefore the training data must comprise biases.)
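	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vasudevan to include the biases as training data, as taught by Mehr. One of ordinary skill in the art would have been motivated to make the combination in order to train parameters of a neural network so that the values the network computes are consistent with the training data. (Mehr, [0156])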

Regarding Claim 2,
	Vasudevan/Wang/Mehr teaches the method of claim 1 (and thus the rejection of claim 1 is incorporated herein.).
	Vasudevan further teaches wherein inserting the interface comprises identifying a group of edges at a boundary of the subgraph.([0054], discloses that locations to insert pairs of send and receive nodes in graphs are identified by any cross-devices directed edges in the graph.)
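	By way of illustration only, the identification of cross-device directed edges and the insertion of send and receive node pairs discussed above with respect to Vasudevan [0054] can be sketched as follows. This is a simplified sketch and not code from the reference; the Edge type, the insert_send_recv function, the node names, and the device names are hypothetical.

```python
from collections import namedtuple

# Hypothetical representation of a directed edge between two named nodes.
Edge = namedtuple("Edge", ["src", "dst"])


def insert_send_recv(edges, placement):
    """Replace every cross-device edge with a send/receive node pair.

    edges: list of Edge(src, dst).
    placement: dict mapping node name -> device name.
    Returns (new_edges, new_placement).
    """
    new_edges = []
    new_placement = dict(placement)
    for e in edges:
        src_dev, dst_dev = placement[e.src], placement[e.dst]
        if src_dev == dst_dev:
            # Edge stays inside one device: no interface node is needed.
            new_edges.append(e)
            continue
        # Edge crosses a device boundary: add a send node on the source
        # device and a receive node on the destination device.
        send = f"send:{e.src}->{e.dst}"
        recv = f"recv:{e.src}->{e.dst}"
        new_placement[send] = src_dev
        new_placement[recv] = dst_dev
        new_edges += [Edge(e.src, send), Edge(send, recv), Edge(recv, e.dst)]
    return new_edges, new_placement


edges = [Edge("a", "b"), Edge("b", "c")]
placement = {"a": "gpu0", "b": "gpu0", "c": "gpu1"}
new_edges, new_placement = insert_send_recv(edges, placement)
```

	In this sketch only the edge from node "b" to node "c" crosses a device boundary, so it is the only edge in the group of edges at which a send/receive pair is inserted.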

Regarding Claim 3,
	Vasudevan/Wang/Mehr teaches the method of claim 2 (and thus the rejection of claim 2 is incorporated herein.).
	 Vasudevan further teaches wherein inserting the interface comprises generating a data structure for passing tensor values between the neural network model and the partitioned version of the subgraph across the group of edges.([0053], discloses that each node represents an operation in which tensor data is relayed to a receive node that is assigned to a different device than the send node.)

Regarding Claim 4,
	Vasudevan/Wang/Mehr teaches the method of claim 3 (and thus the rejection of claim 3 is incorporated herein.).
	Vasudevan further teaches wherein generating the data structure comprises specifying an order of tensor values within the data structure, each tensor value corresponding to a different respective edge of the group of edges.([0023], “Generally, the input and outputs flowing along directed edges in the computational graph are tensors. A tensor is a multidimensional array of numeric values or other values, e.g., strings, having a specific order that corresponds to the dimensionality of the array. For example, a scalar value is a 0th-order tensor, a vector of numeric values is a 1st-order tensor, and a matrix is a 2nd-order tensor.” Discloses that tensors specify their specific order. (i.e. a scalar – 0th order, vector – 1st order, matrix – 2nd order.). Furthermore these inputs and outputs flow along directed edges (i.e. the values correspond to different edges). [0027] discloses different edges for the tensor values.)


Regarding Claim 6,
	Vasudevan/Wang/Mehr teaches the method of claim 1 (and thus the rejection of claim 1 is incorporated herein.). 
	Vasudevan further teaches wherein compiling the subgraph comprises assigning a particular region of configurable logic of the neural network accelerator to evaluate a particular neural node of the subgraph. ([0036], [0041], discloses that a placer 108 determines a respective target device to perform each operation. (Which corresponds to assigning a region of configurable logic of the accelerator (gpus)) Fig. 2c shows that the neural nodes 206-210 exist within the subgraphs 246.)

Regarding Claim 7,
	Vasudevan/Wang/Mehr teaches the method of claim 6 (and thus the rejection of claim 6 is incorporated herein.).
	The combination, through the compiler of Wang already incorporated, further teaches wherein compiling the subgraph comprises assigning training data corresponding to the particular neural node of the subgraph to a memory element that is locally accessible to the particular region of configurable logic of the neural network accelerator (Wang, pg. 4, 1st column, last three paragraphs, “the AGUs know how to load, store and forward the data sets including input feature data, NN weight data … weight AGU handles the weight fetch and store for the on-chip buffer …Each of the AGUs supports memory access patterns …the compiler can generate the memory access stream of addresses”).
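	By way of illustration only, pre-processing a weight layout into tiles sized to an on-chip buffer and generating the corresponding memory addresses, as discussed above with respect to Wang, can be sketched as follows. This is a simplified sketch under stated assumptions and is not the DeepBurning compiler; the tile_weights function, the flat weight list, and the buffer_words parameter are hypothetical.

```python
def tile_weights(weights, buffer_words):
    """Split a flat weight list into tiles that each fit the on-chip buffer.

    Returns a list of (base_address, tile) pairs, where base_address is the
    word offset of the tile within a hypothetical off-chip weight array.
    """
    if buffer_words <= 0:
        raise ValueError("buffer_words must be positive")
    return [
        (base, weights[base:base + buffer_words])
        for base in range(0, len(weights), buffer_words)
    ]


# Ten weights and a four-word buffer give two full tiles and one partial tile.
tiles = tile_weights(list(range(10)), buffer_words=4)
```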

Regarding Claim 8,
	Vasudevan teaches a method for evaluating a neural network model, comprising: 
	using a neural network accelerator to evaluate a subgraph of the neural network model in the inference mode to generate output values corresponding to a first boundary of the subgraph; ([0002], [0056]-[0057] disclose that the system partitions the graph 200A across three different devices (the neural network accelerators). Fig. 2A and [0057] disclose that the edges cross the boundaries by extending from node 208 (one device) to node 212 (a second device). [0064] discloses relaying the outputs ([0067] discloses that the outputs are tensor values) from one node to another.)
	using a neural network server including a general-purpose central processing unit (CPU) (Fig. 1b, [0038] discloses that the system includes a server 132 that has a cpu device 118. As the server is used as the framework for operations of neural networks represented as graphs, it is considered to be a neural network server.) to evaluate the neural network model to generate input values corresponding to a second boundary of the subgraph; and ([0051], “For example, a node that represents an operation being executed on a first device may receive, as input, an output of another node that represents an operation being executed on a second, remotely located device.” discloses that the output values of one node is an input to another node. [0057] discloses that the edges cross the boundaries from one node to another.)
	communicating the input values of the subgraph from the neural network server to the neural network accelerator using a packet comprising an identifier identifying the second boundary and the input values. ([0053] discloses that the devices communicate between one another using inserted additional nodes. [0074]-[0076] disclose that the send node in graph 400b is assigned to a device, where the devices include a CPU or GPU (accelerators) and reside within machine 420b (the server). [0060] discloses address information (identifier) of each device and of the nodes being executed by the devices, as well as different communication processes (i.e., communication processes as the packets). [0054] and [0063] disclose the outputs as tensor values and that, during execution time, the operations of a first send node relay the output of node 201 to the receive node (i.e., corresponding to the input values of the packet). [0057] discloses that edges between nodes cross boundaries.)
(Wang, pg. 3, 1st column, last paragraph, “the compiler will also be responsible for pre-processing the … weight layout into proper ‘tiles’ by analyzing the computer throughput and on-chip memory size of the NN accelerator.  In this way, the hardware part (RTL) and software part (control flow and data layout) are generated by the NN-Gen at the same time.  Another important component … is the address flow that is used to fetch on-chip and off-chip memory data automatically.  In DeepBurning, the memory address flow is generated deterministically by the DeepBurning compiler” & pg. 4, 1st column, last three paragraphs, “the AGUs know how to load, store and forward the data sets including input feature data, NN weight data … weight AGU handles the weight fetch and store for the on-chip buffer …Each of the AGUs supports memory access patterns …the compiler can generate the memory access stream of addresses” also see pg. 5, last paragraph, Method1, “align the tiles of one map [i.e. kernel weights] in memory”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the automatic neural network layout generation system of Wang to generate layouts on FPGAs for the subgraphs of Vasudevan.  The motivation to do so is that “DeepBurning allow[s] the application developers to build from scratch learning accelerators that targets their specific NN models with custom configurations and optimized performance … [to] exhibit great power efficiency” (Wang, Abstract).
	Vasudevan and Wang are silent on biases in the neural network models implemented, but Mehr teaches, in an analogous system, training data comprises…biases([0156], “The weighting factors, bias values, and threshold values, or other computational parameters of the neural network, can be "taught" or "learned" in a training phase using one or more sets of training data. For example, the parameters may be trained using the input data from a training data set and a gradient descent or backward propagation method so that the output value(s) (e.g., a set of predicted adjustments to process control parameter settings) that the ANN computes are consistent with the examples included in the training data set.” Discloses that bias values are learned from training data and therefore the training data must comprise biases.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vasudevan to include the biases as training data, as taught by Mehr. One of ordinary skill in the art would have been motivated to make the combination in order to train parameters of a neural network so that the values the network computes are consistent with the training data. (Mehr, [0156])
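	By way of illustration only, a packet comprising an identifier identifying a boundary together with the input values, as recited in claim 8, could be laid out as in the following sketch. The byte layout, field widths, and function names are assumptions made for this example; they are not taken from the claims, from Vasudevan, or from any other cited reference.

```python
import struct


def pack_boundary_packet(boundary_id, values):
    """Pack a boundary identifier and float32 tensor values into bytes.

    Assumed layout (little-endian): uint32 identifier, uint32 value count,
    followed by the values as 32-bit floats.
    """
    header = struct.pack("<II", boundary_id, len(values))
    return header + struct.pack(f"<{len(values)}f", *values)


def unpack_boundary_packet(packet):
    """Recover the boundary identifier and the list of values."""
    boundary_id, count = struct.unpack_from("<II", packet, 0)
    values = struct.unpack_from(f"<{count}f", packet, 8)
    return boundary_id, list(values)


# Values chosen to be exactly representable as 32-bit floats.
packet = pack_boundary_packet(2, [0.5, -1.0, 2.0])
boundary_id, values = unpack_boundary_packet(packet)
```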

Regarding Claim 9,
		Vasudevan/Wang/Mehr teaches the method of claim 8 (and thus the rejection of claim 8 is incorporated herein). 
	Vasudevan further teaches wherein the identifier identifying the second boundary is associated with particular memory elements of the neural network accelerator and the input values of the subgraph are stored in the particular memory elements in response to receiving the packet. ([0037], [0074], “As described above, the communication protocols leveraged by operations represented by pairs of send and receive nodes may depend on one or more characteristics of the devices, machines, nodes, and networks associated with the execution of the subgraphs at hand. FIGS. 4A-B depict two portions of computational graphs 400A and 400B that include send and receive nodes and are allocated to devices. It can be seen that the send node included in computational graph 400A has been assigned to device 412A, which in this example is a GPU, and that the receive node included in computational graph 400A has been assigned to device 414A, which in this example is also a GPU. In this example, GPU 412A and GPU 414A reside within a same machine 410A. Since devices that the send and receive nodes of computational graph 400A are both GPUs and both reside within a same machine 410A, it may be advantageous for their exchanges to be conducted under a remote procedure call (RPC) or other localized request/response protocol.” Discloses that send and receive nodes reside within different machines and devices. [0037] discloses that these devices and machines include memory elements (and therefore the subgraphs are stored on the memory elements). [0063] discloses that an output from one node is the input of another node, as the data input values. The edges between nodes are considered to cross boundaries as disclosed in [0057].)

Regarding Claim 10,
	Vasudevan/Wang/Mehr teaches the method of claim 9 (and thus the rejection of claim 9 is incorporated herein). 
	 Vasudevan further teaches wherein the particular memory elements are block RAMs associated with neural node processing elements that are configured to evaluate nodes of the subgraph that are connected to the second boundary of the subgraph.([0037], discloses RAM memory elements which are associated with neural node processing elements (devices and associated operations) . [0063] discloses outputs and inputs between nodes are calculated and relayed. [0054] discloses that the edges are the connections between nodes and [0057] discloses that the edges cross boundaries between the nodes.)


Regarding Claim 13,
	Vasudevan/Wang/Mehr teaches the method of claim 8 (and thus the rejection of claim 8 is incorporated herein).  
	Vasudevan further teaches communicating the output values of the subgraph from the neural network accelerator to the neural network server using a packet comprising an identifier identifying the first boundary and the output values. ([0053] discloses that the devices communicate between one another using inserted additional nodes. [0074]-[0076] disclose that the send node in graph 400b is assigned to a device, where the devices include a CPU or GPU (accelerators) and reside within machine 420b (the server). [0060] discloses address information (identifier) of each device and of the nodes being executed by the devices, as well as different communication processes (i.e., communication processes as the packets). [0054] and [0063] disclose the outputs as tensor values and that, during execution time, the operations of a first send node relay the output of node 201 to the receive node (i.e., corresponding to the output and input values of the packet). [0057] discloses that edges between nodes cross boundaries.)

Regarding Claim 14,
	Vasudevan teaches a system, comprising: a neural network server in communication with a neural network accelerator, the neural network server comprising: ([0074], discloses a neural network server(machines 420b) in communication with neural network accelerators (device 422B))
	at least one processor, and ([0037], disclose a processor)
	a computer-readable memory storing computer-executable instructions that when executed by the at least one processor, cause the neural network server to perform a method, the computer-executable instructions comprising:([0091]-[0092] disclose the computer readable medium storing program instructions) instructions to compile a neural network model for execution on the system, wherein compiling the neural network model comprises partitioning a subgraph of the neural network model for execution on the neural network accelerator and generating configuration data…([0007], discloses data identifying an allocation of the computational graph among the devices. [0056] discloses that the computational graph partitions its nodes into subgraphs and allocated the nodes to the devices available.[0053] discloses that subgraphs are modified to include additional nodes.(i.e.  the configuration information are the additional nodes and timings.) [0041] uses a placer 108 to determine devices for operations.)
([0042] discloses that after the devices (the GPUs/CPUs are interpreted as the neural network accelerators) perform the operations allocated by placer 108, they generate outputs that are retrieved by the executor 106. [0006] discloses that these operations performed by the devices on the subgraphs reduce the overall time required to perform operations of the neural network (i.e., an accelerated subgraph is generated after the operations are performed, as the overall time required to perform operations of the neural network is reduced). Using the devices (accelerators) to perform operations is considered to be during a deployment mode, as the devices are used/deployed.)
	instructions to evaluate the neural network model during the inference mode, including passing tensor values between the neural network server and the neural network accelerator; and ([0035] discloses a neural network inference request where the client can request output values representing inference operations from one or more nodes of the computational graph. [0054] discloses that each node represents an operation in which data, such as a tensor, is relayed to a receive node that is assigned to a different device than that of the send node. Fig. 4b shows the communication of send and receive nodes between different machines (servers) and devices (accelerators).)
	wherein the neural network accelerator comprises: configurable logic that is configurable…, the configurable logic comprising a plurality of regions, a respective region configured to perform an operation of a respective node of the subgraph; and ([0037], where devices 116-122 include a memory such as RAM, wherein the devices can include CPUs and GPUs (as the configurable logic; see [0039], where the devices are configured to perform operations). [0074] discloses that GPU 412A and GPU 414A reside within the same machine 410A (i.e., different regions). [0036] and [0041] disclose that a placer 108 determines a respective target device to perform each operation (which corresponds to the respective regions performing operations). Fig. 2c shows that the neural nodes 206-210 exist within the subgraphs 246.)
	a plurality of memory elements, wherein a respective memory element is locally accessible by a respective region of the configurable logic. ([0037] and Fig. 4A disclose multiple GPUs in a machine 410A, where each GPU includes RAM (as the plurality of memory elements). [0089] discloses that servers (the machine) include computer readable media (i.e., memory as disclosed in [0092]). [0074] discloses that GPU 412A and GPU 414A reside within the same machine 410A (i.e., different regions locally accessible as they reside on the same machine).)
	The configuration information of Vasudevan provides information on which devices to place the subgraphs (Vasudevan, [0041]) but Vasudevan is silent with respect to the configuration information assigning training data associated with the subgraph to particular memory elements of the neural network accelerator, the training data including weights … corresponding to nodes of the subgraph nor to load the training data into particular memory elements of the neural network accelerator, prior to evaluating.  However, Wang teaches these limitations with respect to accelerating nodes of a neural network on an FPGA (Wang, pg. 3, 1st column, last paragraph, “the compiler will also be responsible for pre-processing the … weight layout into proper ‘tiles’ by analyzing the computer throughput and on-chip memory size of the NN accelerator.  In this way, the hardware part (RTL) and software part (control flow and data layout) are generated by the NN-Gen at the same time.  Another important component … is the address flow that is used to fetch on-chip and off-chip memory data automatically.  In DeepBurning, the memory address flow is generated deterministically by the DeepBurning compiler” & pg. 4, 1st column, last three paragraphs, “the AGUs know how to load, store and forward the data sets including input feature data, NN weight data … weight AGU handles the weight fetch and store for the on-chip buffer …Each of the AGUs supports memory access patterns …the compiler can generate the memory access stream of addresses” also see pg. 5, last paragraph, Method1, “align the tiles of one map [i.e. kernel weights] in memory”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the automatic neural network layout generation system of Wang to generate layouts on FPGAs for the subgraphs of Vasudevan.  
The motivation to do so is that “DeepBurning allow[s] the application developers to build from scratch learning accelerators that targets their specific NN models with custom configurations and optimized performance … [to] exhibit great power efficiency” (Wang, Abstract).
	Vasudevan and Wang are silent on biases in the neural network models implemented, but Mehr teaches, in an analogous system, training data comprises…biases([0156], “The weighting factors, bias values, and threshold values, or other computational parameters of the neural network, can be "taught" or "learned" in a training phase using one or more sets of training data. For example, the parameters may be trained using the input data from a training data set and a gradient descent or backward propagation method so that the output value(s) (e.g., a set of predicted adjustments to process control parameter settings) that the ANN computes are consistent with the examples included in the training data set.” Discloses that bias values are learned from training data and therefore the training data must comprise biases.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vasudevan to include the biases as training data, as taught by Mehr. One of ordinary skill in the art would have been motivated to make the combination in order to train parameters of a neural network so that the values the network computes are consistent with the training data. (Mehr, [0156])


Regarding Claim 16,
	Vasudevan/Wang/Mehr teaches the system of claim 14 (and thus the rejection of claim 14 is incorporated herein). Vasudevan/Wang/Mehr teaches partitioning the subgraph of the neural network model for execution on the neural network accelerator (see the rejection of claim 14).
	Vasudevan further teaches identifying input edges of the subgraph and generating a data structure for passing values from the input edges of the subgraph to neural nodes of the subgraph. ([0054] discloses that locations to insert pairs of send and receive nodes in graphs are identified by any cross-device directed edges in the graph. [0058] and [0063] disclose the input edges as the edges between receive and send nodes in Fig. 2c. [0053] discloses that the devices communicate between one another using inserted additional nodes. [0060] discloses address information (identifier) of each device and of the nodes being executed by the devices, as well as different communication processes (i.e., communication processes as the data structure for passing values). [0073] discloses that data values are conveyed between the devices.)

Regarding Claim 17,
	Vasudevan/Wang/Mehr teaches the system of claim 16 (and thus the rejection of claim 16 is incorporated herein).
	Vasudevan further teaches wherein the tensor values are passed between the neural network server and the neural network accelerator using a packet comprising the tensor values formatted according to the data structure. ([0053] discloses that the devices communicate between one another using inserted additional nodes. [0074]-[0076] disclose that the send node in graph 400b is assigned to a device, where the devices include a CPU or GPU (accelerators) and reside within machine 420b (the server). [0060] discloses address information (identifier) of each device and of the nodes being executed by the devices, as well as different communication processes (i.e., communication processes as the packets), where the communication processes are the data structure for passing values. [0054] and [0063] disclose the outputs as tensor values and that, during execution time, the operations of a first send node relay the output of node 201 to the receive node (i.e., corresponding to values formatted according to the communication processes, as the communication processes pass the values).)

Regarding Claim 18,
	Vasudevan/Wang/Mehr teaches the system of claim 14 (and thus the rejection of claim 14 is incorporated herein).
	Vasudevan teaches wherein the tensor values are passed between the neural network server and the neural network accelerator ([0053] discloses that the devices communicate between one another using inserted additional nodes. [0074]-[0076] disclose that the send node in graph 400b is assigned to a device, where the devices include a CPU or GPU (accelerators) and reside within machine 420b (the server). [0060] discloses address information (identifier) of each device and of the nodes being executed by the devices, as well as different communication processes (i.e., communication processes as the packets).) using an application-layer packet consisting of only an identifier identifying the subgraph and the tensor values. ([0034] discloses that executor 106 can retrieve and store addresses of modified weights, where the system accesses the weights using the addresses (i.e., an application-layer data packet consisting only of an address (identifier) and modified weights). [0027] discloses the tensor weights.)

Regarding Claim 19,
	Vasudevan/Wang/Mehr teaches the system of claim 14 (and thus the rejection of claim 14 is incorporated herein).
	Vasudevan further teaches wherein the configurable logic of the neural network accelerator comprises support logic for broadcasting the tensor values passed to the neural network accelerator to memory elements associated with input neural nodes of the subgraph. (In [0060], the communication processes are the data structure for passing values. [0054] and [0063] disclose the outputs as tensor values and that, during execution, the operations of a first send node relay the output of node 201 to the receive node (i.e., this corresponds to values passed according to the communication processes, where the communication using the send and receive nodes is considered to be the support logic that broadcasts tensor values to memory elements; each device includes RAM for storing instructions and data, as disclosed in [0037]).)


Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Vasudevan et al. (US 2017/0124454 A1), herein Vasudevan, in view of Wang et al., “DeepBurning: Automatic Generation of FPGA-based Learning Accelerators for the Neural Network Family,” herein Wang, and Mehr et al. (US 2018/0341248 A1), herein Mehr, and further in view of Chung et al. (US 2016/0379108 A1), herein Chung.

Regarding Claim 20,
	Vasudevan/Wang/Mehr teaches the system of claim 14 (and thus the rejection of claim 14 is incorporated herein).
	Vasudevan further teaches wherein the configurable logic of the neural network accelerator is configured to implement a …central processing unit (CPU) for processing at least a portion of the subgraph as a hardware accelerated subgraph. ([0037] discloses that devices (accelerators) 116-122 are CPUs. [0006] discloses that the devices process the subgraphs.)
	Vasudevan/Wang/Mehr does not explicitly teach, but Chung teaches, in an analogous system, using a soft central processing unit (CPU)… ([0109] discloses using soft CPUs provided by single processing components.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vasudevan/Wang/Mehr to include the teachings of Chung. One of ordinary skill in the art would have been motivated to make the combination in order to send and retrieve information using constructed packets. (Chung, [0203]).
Response to Arguments
Applicant’s arguments filed October 7th, 2021 have been fully considered, but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.  Specifically, Wang teaches a compiler for determining FPGA layout information (including memory addresses and data routing and tiling for weights) for an accelerated neural network.
Conclusion
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN M SMITH, whose telephone number is (469)295-9104. The examiner can normally be reached Monday - Friday, 8:30am - 5pm Central.
Examiner interviews are available via telephone, in person, and by video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRIAN M SMITH/            Primary Examiner, Art Unit 2122