DETAILED ACTION
This Office Action is in response to the remarks entered on 02/22/2021. Claims 1 and 13 were amended. No claims were added. Claims 2, 6, 8, 9 and 19 were cancelled.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.
Applicant’s arguments filed 02/22/2021 have been fully considered but they are not persuasive. 

In reference to Applicant’s arguments about: Rejections under 35 U.S.C. §103:
 	Applicant’s Argument: 
Nonobviousness Under 35 USC §103(a)
Claims 1-20 are understood to be patentable, and the rejection under 35 USC §103(a) over "Barham" (Pub. No. US20170124451 by Barham et al.) in view of "Xu" (Pub. No. US2010/0076915 by Xu et al.) is traversed, because a prima facie case of obviousness has not been established. 
Notably, claim 1 is amended to include the features of claims 7, 8, and 9, and claim 13 is similarly amended. The amendments involve selection of a preferred path in a directed graph that specifies dependencies between neural network operations, and the addition of dependencies to the directed graph from vertices in the preferred path to vertices not in the preferred path. Notably, the added dependencies place additional 
Applicant notes that Barham's paragraph 0054, which describes FIG. 3 and is cited in the OA, does not show or suggest adding dependencies in a preferred path. Barham's FIG. 3 shows the same dependencies in subgraph 316 as are in streams 304 and 306; no dependencies are added. 
For at least the reasons set forth above, claims 1 and 13 are non-obvious over the Barham-Xu combination. Claims 3-5, 7, and 10-12 have claim 1 as a base claim, and claims 14-18 and 20 have claim 13 as a base claim (claims 2, 6, 8-9, and 19 are cancelled). Therefore, the dependent claims are also non-obvious over the Barham-Xu combination. Applicant respectfully requests that the rejection under 35 USC §103(a) be withdrawn. 
Examiner’s Response: 
Examiner respectfully disagrees with Applicant’s argument because claims 1 and 13 are obvious in view of Barham and Xu. Barham discloses a work request (input) to process the computational graph, wherein the graph includes a plurality of nodes (operations of the neural network) and the edges represent the relationships/dependencies between the nodes. An outgoing edge from a node represents a flow of an output of the operation. As can be seen at ([Par.0040], “The system receives a request from a client to process a computational graph (step 202). For example, the request can be a request to perform a 
Examiner respectfully reminds Applicant that Barham in view of Xu teaches each and every limitation of the dependent claims. Therefore, the argument is not persuasive, and the rejections of claims 3-5, 7, 10-12, 14-18 and 20 are maintained. 

Information Disclosure Statement

6.	The information disclosure statements (IDS) filed 10/17/2017, 04/26/2019, and 10/23/2019 are in compliance with the provisions of 37 CFR 1.97 and 1.98. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Barham et al. (Pub. No. US20170124451, hereinafter Barham) and further in view of Xu et al. (Pub. No. US2010/0076915, hereinafter Xu).
Regarding claim 1, Barham teaches a neural network processing system, comprising (Barham, [Par.0002], “processing computational graphs representing neural networks using an accelerator device, e.g., a graphical processing unit (GPU).”):
a host computer system (Barham, [Fig 1], “illustrates an example computational graph system for distributing operations for neural networks represented as computational graphs.”);
a plurality of RAMs coupled to the host computer system; a plurality of neural network accelerators coupled to the plurality of RAMs, respectively (Barham, [Par.0033], “Any devices performing neural network operations, e.g., devices 116-122, can include a memory, e.g., a random access memory (RAM), for storing instructions and data and a processor for executing stored instructions. Generally, each device is a hardware resource that performs operations independent of other devices. For example, each device can have its own processing unit. The devices can be Graphical Processing Units (GPUs), Central Processing Units (CPUs), or other accelerators. By way of illustration, one machine can host one or more devices, e.g., multiple CPUs and GPUs.” wherein each of the devices includes a memory and a processing unit or accelerator; therefore, the system has a plurality of neural network accelerators coupled to the plurality of RAMs. Furthermore, see [Par.0053], “…This symmetry may not be available of all accelerator devices. For example, on specific accelerator devices certain streams must be used to perform operations that copy data between host and device memory.” For further clarification, see [Par.0056, Par.0061], “);
wherein the host computer system is configured with software that when executed causes the host computer system to (Barham, [Par.0033], “Any devices performing neural network operations, e.g., devices 116-122, can include a memory, e.g., a random access memory (RAM), for storing instructions and data and a processor for executing stored instructions.”  Furthermore, see [Par.0006, lines 1-10], “In general, ):
input a directed graph that specifies dependencies between neural network operations, wherein vertices of the graph represent respective ones of the neural network operations, and the edges represent dependencies between the neural network operations (Barham, [Par.0040], “The system receives a request from a client to process a computational graph (step 202). For example, the request can be a request to perform a neural network inference represented by the computational graph on a specified input, a request to perform neural network training operations represented by the computational graph on a specified set of training data, or a request to perform other neural network operations represented by the computational graph” and [Par.0018, lines 2-12], “Each node in the computational graph represents an operation. An incoming edge to a node represents a flow of an input into the node, i.e., an input to the operation represented by the node. An outgoing edge from a node represents a flow of an output of the operation represented by the node to be used as an input to an operation represented by another node. Thus, a directed edge connecting a first node in the graph to a second node in the graph indicates that an output generated by the operation represented by the first node is used as an input to the operation represented by the ;
select a preferred compute path from the directed graph based on optimizing throughput (Barham, [Par.0046-0047], “…An example system may assign computations of some hardware accelerators to streams in a particular way ( e.g., if one operation executes on stream A, then a later, related operation must also execute on stream A.) For example, a first operation may be stateful and execute on stream A. By executing, the first operation may change the internal state of the hardware in a way that must happen before a second operation executes. The second operation may then execute on stream A after the first operation is complete.” Examiner’s note: the path or stream is preferred based on the work flow of the output node.); 
add dependencies from vertices in the preferred path to vertices not in the preferred path (Barham, [Par.0057-0058], “…By way of illustration, the Accelerator 302 receives the subgraph 316. The instructions received by the system cause the Accelerator 302 to assign the initial node 308 to a first stream 306. The initial node 308 has two outputs----one directed edge to node 310 and one directed edge to node 314. Therefore, using the second rule, the instructions cause the Accelerator 302 to assign nodes 310 and 314 to different streams…” Examiner’s note: the initial node has two ; 
write input data that include input data matrices to the RAMs (Barham, [Par. 0022-Par.0024], “By way of illustration, a neural network layer that receives an input from a previous layer can use a parameter matrix to perform a matrix multiplication between the parameter matrix and the input. In some cases, this matrix multiplication can be represented as multiple nodes in the computational graph. For example, a matrix multiplication can be divided into multiple multiplication and addition operations, and each operation can be represented by a different node in the computational graph. The operation represented by each node can generate a respective output, which flows on a directed edge to a subsequent node. After the operation represented by a final node generates a result of the matrix multiplication, the result flows, on a directed edge, to another node. The result is equivalent to an output of the neural network layer that performs the matrix multiplication…” and [Par.0033, lines 1-4], “Any devices performing neural network operations, e.g., devices 116-122, can include a memory, e.g., a random access memory (RAM), for storing instructions and data and a processor for executing stored instructions” Examiner’s note: the memory (RAM) stores data and instructions to be executed; therefore, the input is written into the memory for performing a work request, see [Par.0064], “The system can assign a particular node to a stream based on an amount of memory resources consumed by the node or by previously assigned nodes. For example, the system can calculate a dimension of a tensor on each directed edge to and from each node of the subgraph. The dimensions of the tensors indicate a size of memory that would be consumed by a device to perform an  ;
write work requests to the RAMs, wherein each work request specifies a subset of the neural network operations to perform (Barham, [Par. 0051], “The system provides the instructions and the data to the device (step 210). In some implementations, the system sends the device a request to start the operations. The device receives the request and in response, executes the instructions received from the system.” see further [Par.0033, lines 1-4], “Any devices performing neural network operations, e.g., devices 116-122, can include a memory, e.g., a random access memory (RAM), for storing instructions and data and a processor for executing stored instructions” therefore, the request is written into the device, where the device includes a memory (RAM) for executing stored instructions.  Furthermore, see [Par. 0002, 0006], “ In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving, by a computational graph system, a request to process a computational graph; obtaining data representing a subgraph of the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node, the subgraph assigned to a first device by a placer in the computational graph system; determining that the first device comprises a 
[…]
and that include the respective weights matrix, a respective input data matrix of the input data matrices, and a respective output matrix (Barham, [Par. 0022-Par.0024], “By way of illustration, a neural network layer that receives an input from a previous layer can use a parameter matrix to perform a matrix multiplication between the parameter matrix and the input. In some cases, this matrix multiplication can be represented as multiple nodes in the computational graph. For example, a matrix multiplication can be divided into multiple multiplication and addition operations, and each operation can be represented by a different node in the computational graph. The operation represented by each node can generate a respective output, which flows on a directed edge to a subsequent node. After the operation represented by a final node generates a result of the matrix multiplication, the result flows, on a directed edge, to another node. The result is equivalent to an output of the neural network layer that performs the matrix multiplication…”), 
and wherein the work requests are written in the order specified by the dependency graph, beginning with the preferred compute path (Barham, [Par.0047- Par.0050], “An example system may assign computations of some hardware accelerators to streams in a particular way ( e.g., if one operation executes on stream A, then a later, related operation must also execute on stream A.) For example, a first ;
and wherein each neural network accelerator is configured to read a work request from the respective RAM and perform the subset of neural network operations on the input data using the parameters (Barham, [Par.0006], “In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving, by a computational graph system, a request to process a computational graph; obtaining data representing a subgraph of the computational graph, the computational graph comprising a plurality of nodes and directed edges, wherein each node represents a respective operation, wherein each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node, the subgraph assigned to a first device by a placer in the computational graph system; determining that the first device comprises a graphical processing unit having a plurality of streams; in response to determining that the first device comprises a graphical processing unit having a plurality of streams, generating instructions that when executed by the first device cause the first device to: assign the operation represented by each node in the subgraph to a respective stream in the plurality of streams of the graphical processing unit; and perform the operations .
However, Barham does not teach that the work request specifies memory locations in a RAM of the plurality of RAMs of the input data and parameters for performing the work request.
On the other hand, Xu teaches and specifies memory locations in a RAM of the plurality of RAMs of the input data and parameters for performing the work request (Xu, [Fig.6, Par 0126], “At (a), the application software prepares and parses the training data and parameters of neural networks model stored on the host computer. The possible data processing at this point may include organizing the data in the sequence of how the FPGA logic will access and utilize it. At (b ), application software calls the write routine in the driver installed on the host computer to write the data to memories on the FPGA accelerator. The write routine may be implemented with a direct 
Barham and Xu are analogous art because they are in the same field of endeavor, namely using accelerator systems, including FPGA technology, to perform parallel data processing. 
Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Barham’s method, in view of Xu, by having a module to write the input data to a specific memory location in a RAM of the plurality of RAMs. The modification would have been obvious because one of ordinary skill in the art would be motivated to achieve high bandwidth access to the accelerator, which will help to improve performance speed and reduce cost. (Xu, [Par.0007], “Disclosed are accelerator systems and methods that utilize FPGA technology to achieve better parallelism and processing 
Claim 13 is rejected for the same reasons as claim 1.
Regarding claim 3, Barham, as modified in view of Xu, teaches the neural network processing system of claim 1, wherein the host computer system is configured with software that when executed causes the host computer system to partition, reshape (Barham, [Par. 0032], “The system 100 performs the operations to generate the particular output by partitioning the operations represented by the computational graph across multiple devices 116-122. The system 100 partitions the operations to the multiple devices 116-122 over a data communication network 114, e.g., local area network (LAN) or wide area network (WAN).” Furthermore, see [Par. 0037, lines 7-13], “Although FIG. 1 illustrates one executor 106, in one implementation, there is an executor per device. This executor issues operations to the device when they become runnable (i.e., all of their inputs have been computed). This ,
Claim 14 is rejected for the same reasons as claim 3.
Regarding claim 4, Barham, as modified in view of Xu, teaches the neural network processing system of claim 3, wherein the host computer system is further configured with software that when executed causes the host computer system to: input descriptions of the neural network accelerators (Barham, [Par.0025], “Representing a neural network as a computational graph provides for a flexible and granular way to efficient implement the neural network, especially if the operations for the neural network are distributed across multiple devices with different hardware profiles” and also see [0026-002], “…Representing a neural network as a computational graph provides for a flexible and granular way to efficient implement the neural network, especially if the operations for the neural network are distributed across multiple devices with different hardware profiles… For example, the request can identify a computational graph representing an inference for a particular neural network and can identify an input on which the inference should be performed.” Examiner’s note: an operation of the neural network accelerator is identified based on the input data.);
and partition the input data set into the input data matrices based on the descriptions of the neural network accelerators (Barham, [Par.0034-Par.0035], “For example, the request can identify a computational graph representing an inference for a particular neural network and can identify an input on which the inference should .
Claim 15 is rejected for the same reasons as claim 4.
Regarding claim 5, Barham, as modified in view of Xu, teaches the neural network processing system of claim 4, wherein: the plurality of neural network accelerators include at least a first neural network accelerator and a second neural network accelerator (Barham, [Fig.1, Par. 0033], “Any devices performing neural network operations, e.g., devices 116-122, can include a memory, e.g., a random access memory (RAM), for storing instructions and data and a processor for executing stored instructions. Generally, each device is a hardware resource that performs operations independent of other devices. For example, each device can have its own processing unit. The devices can be Graphical Processing Units (GPUs), Central Processing Units (CPUs), or other accelerators. By way of illustration, one machine can ;
and the respective input data matrix written to the respective RAM that is coupled to the first neural network accelerator is larger than the input data matrix written to the respective RAM that is coupled to the second neural network accelerator (Barham, [Par.0071], “…If a first and second operation require a respective estimated amount of memory less than or equal to the amount of memory that will be free, the system selects the operation that maximizes usage of the amount of memory that will be freed. In other words, in this case, the system determines the node representing the selected operation as the first unassigned node. An example system does not enqueue an operation on the stream until it can determine which regions of accelerator memory will be used to hold the temporary working space and outputs of the operation. In the event that memory is scarce, an example system may choose to enqueue operations that require smaller amounts of memory to execute or to preferentially enqueue operations that will consume large input tensors allowing them to be deallocated.” Examiner’s note: when memory is scarce, operations that require smaller amounts of memory are enqueued for execution. Therefore, the input matrix of the first neural network accelerator is larger than that of the next accelerator (second accelerator)).
Claim 16 is rejected for the same reasons as claim 5.
Regarding claim 17, Barham, as modified in view of Xu, teaches the neural network processing system of claim 14, wherein the subsets of neural network operations in the work requests are independent of one another (Barham, [Par.0033], “Any devices performing neural network operations, e.g., devices 116-122, .
Regarding claim 7, Barham, as modified in view of Xu, teaches the neural network processing system of claim 3, wherein the host computer system is further configured with software that when executed causes the host computer system to: wait to write each un-submitted work request of the work requests to a RAM of the plurality of RAMs until completion of a previously submitted work request that specifies a neural network operation on which a neural network operation of the un-submitted work request depends (Barham, [Par.0037], “…the executor 106 can generate an appropriate response to the request, e.g., an output or an indication that the processing has been completed. Then, the executor 106 can return the response to the client 102. Although FIG. 1 illustrates one executor 106, in one implementation, there is an executor per device. This executor issues operations to the device when they become runnable (i.e., all of their inputs have been computed). This implementation also has a graph manager that partitions a graph to run on multiple .
Claim 18 is rejected for the same reasons as claim 7.
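For illustration only, the behavior addressed in claims 7 and 18 (holding each un-submitted work request until the work requests it depends on have completed, comparable to an executor that issues operations once all of their inputs have been computed) can be sketched as below. This is a minimal sketch and is not taken from Barham, Xu, or the application: the function name `dispatch_order`, the dictionary-of-sets input format, and the request names in the usage note are all hypothetical, and each request is assumed to complete as soon as it is issued.

```python
from collections import deque


def dispatch_order(deps):
    """Return an order in which work requests can be issued.

    `deps` (hypothetical format) maps each work request name to the set of
    requests it depends on. A request is issued only after every request it
    depends on has been issued, which this sketch treats as "completed".
    """
    # Copy the dependency sets so the caller's mapping is not mutated.
    waiting = {req: set(d) for req, d in deps.items()}
    # Requests that appear only as dependencies have no dependencies themselves.
    for d in deps.values():
        for req in d:
            waiting.setdefault(req, set())

    # Reverse index: for each request, the requests that wait on it.
    dependents = {req: [] for req in waiting}
    for req, d in waiting.items():
        for dep in d:
            dependents[dep].append(req)

    # Start with the requests that have no outstanding dependencies.
    runnable = deque(sorted(r for r, d in waiting.items() if not d))
    order = []
    while runnable:
        req = runnable.popleft()
        order.append(req)  # issue the request
        for nxt in sorted(dependents[req]):
            waiting[nxt].discard(req)
            if not waiting[nxt]:  # all dependencies done: now runnable
                runnable.append(nxt)

    if len(order) != len(waiting):
        raise ValueError("dependency cycle: some work requests can never run")
    return order
```

With hypothetical requests `{"conv1": set(), "conv2": {"conv1"}, "pool": {"conv1"}, "fc": {"conv2", "pool"}}`, the request `fc` is issued only after both `conv2` and `pool`, and those two only after `conv1`.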
Regarding claim 10, Barham, as modified in view of Xu, teaches the neural network processing system of claim 1, wherein the host computer system is further configured with software that when executed causes the host computer system to specify dependencies between events of the neural network accelerators in writing the input data matrices and work requests to the RAMs (Xu, [Fig.6, Par 0126], “…The write routine may be implemented with a direct memory access (DMA) method to achieve high bandwidth access to the accelerator. At (c), the software sends instructions to set the initialize register (a register for initialization) and starts to write the initialized data onto the accelerator hardware with the DMA write. At (d), the software continues to poll or wait for the "DMA write done" register (a register to indicate the status of "DMA write done") until the register is pulled up by hardware. At (e), the software pulls up the round-start register to declare a new round.” Wherein the initialized data is considered as input data.).

Accordingly, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Barham’s method, in view of Xu, by having a module to write the input data to the RAMs. The modification would have been obvious because one of ordinary skill in the art would be motivated to achieve high bandwidth access to the accelerator, which will help to improve performance speed and reduce cost. (Xu, [Par.0007], “Disclosed are accelerator systems and methods that utilize FPGA technology to achieve better parallelism and processing speed. An FPGA-based accelerator has a Field Programmable Gate Array (FPGA) configured to have a hardware logic performing computations associated with a neural network training algorithm, especially a web relevance ranking algorithm such as LambdaRank. The training data is first processed and organized by a host computing device, and then streamed to the FPGA for direct access by the FPGA in order to perform high-bandwidth computation with increased training speed. Thus, large data sets such as that related to web relevance ranking can be processed. The FPGA may include a processing element performing computations of a hidden layer of the neural network training algorithm. Parallel computing may be realized using a single instruction multiple data streams (SIMD) architecture with multiple arithmetic logic units in the FPGA.”).
Regarding claim 11, Barham, as modified in view of Xu, teaches the neural network processing system of claim 1, wherein the plurality of neural network accelerators are disposed on a plurality of integrated circuit dies (Barham, [Par.0077], “The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).” Furthermore, see [Par.0002], “This specification relates to processing computational graphs representing neural networks using an accelerator device, e.g., a graphical processing unit (GPU).”).
Claim 20 is rejected for the same reasons as claim 11.
Regarding claim 12, Barham, as modified in view of Xu, teaches the neural network processing system of claim 1, wherein the plurality of neural network accelerators have respective arrays of multiplier-accumulator having different dimensions (Barham, [Par. 0077], “The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).” Furthermore, see [Par. 0064], “The system can assign a particular node to a stream based on an amount of memory resources consumed by the node or by previously assigned nodes. For example, the system can calculate a dimension of a tensor on each directed edge to and from each node of the subgraph. The dimensions of the .

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record on the PTO-892 and not relied upon is considered pertinent to applicant’s disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EM N TRIEU whose telephone number is (571)272-5747.  The examiner can normally be reached on 7:30 - 5:00 M-TH.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached on (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/E.T./Examiner, Art Unit 2126
/BABOUCARR FAAL/Primary Examiner, Art Unit 2184