DETAILED ACTION
Claims 1-20 are pending in this application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.
Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function.
Claim elements in this application that use the word “means” (or “step for”) are presumed to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.
Claims 1-7 and 15-20 do not invoke 35 U.S.C. 112(f) because they recite sufficiently definite structure as described in the specification.
Claims 1-7 and 15-20 recite “a first processor configured to...” and “a second processor configured to...” together with their respective functional language, and therefore meet two prongs of the three-prong analysis.
However, claims 1-7 and 15-20 recite sufficiently definite structure because the structures (“a first processor configured to...”, “a second processor configured to...”) are described in the specification (see FIGs. 1 and 2, paragraphs 29 and 37-40) as structures for performing the respective functions. As such, they are not generic placeholders (for instance, “means to”, “means for”, “module for” and the like), do not meet the third prong of the analysis, and are presumed not to invoke 35 U.S.C. 112(f).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 8-10 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Das et al. (US 2018/0322606, hereinafter Das) in view of Minsoo Rhu et al., “vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design”, 14-Dec-2016 (hereinafter Minsoo).

As to claim 1, Das teaches a system (See FIG. 2A, “200”) comprising: 
 	a first processor (e.g., “202”, FIG. 2A) configured to execute a runtime manager (e.g., “210”, FIG. 2A);  
 	a second processor (e.g., “212”, FIG. 2A and “234”, FIG. 2C) configured to execute a neural network (e.g., para 137, “This is particularly useful for optimizing the training of neural networks, as the computations performed in adjusting the coefficients in neural networks lend themselves naturally to parallel implementations. Specifically, many machine learning algorithms and software applications have been adapted to make use of 
 	a heterogeneous memory subsystem (e.g., “222”, FIG. 2A) comprising at least a first memory (e.g., “224A”, FIG. 2A) and a second memory (e.g., “224”, FIG. 2A); 
 	wherein when executed by the first processor, the runtime manager is configured to: 
 	 	manage buffer transfers (e.g., “buffers to allow the intermediate data to be transmitted”) between the first memory and the second memory asynchronously with respect to layer execution during training of the neural network (e.g., see FIG. 15A, para 203, “the communication module 1517 includes logic to ensure forward progress of distributed compute operations by enabling asynchronous communication between processing nodes. The asynchronous communication enabled by the communication module 1517 allows overlapping compute and communication operations that efficiently interleave to optimize both compute and communication efficiency and throughput”, for “Intermediate data produced by one or more of the clusters 214A-214N may be stored in buffers to allow the intermediate data to be transmitted between clusters 214A-214N for further processing” in para 54); 
 	Das further teaches a forward propagation pass (e.g., see FIGs. 9A-9B, paras 151 and 155, “The figures described below present exemplary feedforward, CNN, and RNN networks, as well as describe a general process for respectively training and deploying each of those types of networks” and “Forward propagation is performed independently on each node” in para 188); and determine how to transfer buffers between the first memory and the second memory during a backward propagation pass (e.g., para 192, “During back 
 	wherein the system is configured to deploy a trained neural network (e.g., “complex neural networks”) to generate a classification of a first dataset (e.g., para 179, “deployed machine learning platforms generally include lower power parallel processors suitable for use in products such as cameras, autonomous robots, and autonomous vehicles” for “datasets that define the appropriate responses to specific training input. The parallel processors described herein can enable rapid training of the increasingly complex neural networks used for autonomous driving solutions and enables the deployment of low power inferencing processors in a mobile platform suitable for integration into autonomous vehicles” in para 176. Also, see FIG. 11).  
 	However, Das does not teach monitoring buffer usage, or determining how to transfer buffers between the first memory and the second memory during a backward propagation pass based on the buffer usage monitored during the forward propagation pass.
 	Minsoo teaches monitor buffer usage (e.g., “a runtime memory manager called vDNN”) during a forward propagation pass (e.g., “forward and backward propagation algorithms”); determine how to transfer buffers between the first memory and the second 


As to claim 2, Das teaches wherein a plurality of buffers (e.g., “buffers”) utilized by the neural network are activation buffers (e.g., para 54, “one or more of the clusters 214A-214N may be stored in buffers to allow the intermediate data to be transmitted between clusters 214A-214N for further processing” and “to allocate buffers for gradient with respect to input activation” in para 236).  

As to claim 3, Das teaches wherein when executed by the first processor, the runtime manager is further configured to: maintain one or more tables (e.g., “a set of page table entries (PTEs)”) to track: an order of buffer usage by the neural network (e.g., para 68, “a set of page table entries (PTEs) used to map a virtual address to a physical address of a tile and optionally a cache line index. The MMU 245 may include address translation lookaside buffers (TLB) or caches that may reside within the graphics multiprocessor 234 or the L1 cache or processing cluster 214”); transfer status of buffers between the first memory and the second memory; and pending usage of buffers (e.g., para 68, “The cache line index may be used to determine whether a request for a cache line is a hit or miss”); and determine when to transfer buffers between the first memory and the second memory based on entries stored in the one or more tables (e.g., para 193, “to transfer data for distributed training of a neural network for machine learning operations”, see FIGs. 14A-14E).
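
For illustration only, the table-based tracking recited in claim 3 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Das, Minsoo, or the application: the class name `BufferTables`, its fields, and the rule that a buffer becomes eligible for transfer once it has no pending uses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BufferTables:
    """Hypothetical tables a runtime manager might keep (names are assumptions)."""
    order: list = field(default_factory=list)             # order of buffer usage
    in_second_memory: dict = field(default_factory=dict)  # transfer status per buffer
    pending_uses: dict = field(default_factory=dict)      # pending usage count per buffer

    def record_use(self, buf: str) -> None:
        # A layer is about to use this buffer: log the order and mark a pending use.
        self.order.append(buf)
        self.in_second_memory.setdefault(buf, False)
        self.pending_uses[buf] = self.pending_uses.get(buf, 0) + 1

    def finish_use(self, buf: str) -> None:
        # The layer that was using the buffer has completed.
        self.pending_uses[buf] -= 1

    def ready_to_transfer(self) -> list:
        # Assumed policy: transfer buffers that are still in the first memory
        # and have no pending uses, in the order they were first used.
        seen, ready = set(), []
        for buf in self.order:
            if buf in seen:
                continue
            seen.add(buf)
            if not self.in_second_memory[buf] and self.pending_uses[buf] == 0:
                ready.append(buf)
        return ready
```

In this sketch the decision of when to transfer is driven entirely by the table entries, which is the relationship the claim language describes; the specific eligibility rule is only one possible choice.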

As to claim 8, see rejection of claim 1 above. 

As to claims 9-10, see rejection of claims 2-3 above.

  
As to claim 15, see rejection of claim 1 above. Das further teaches an apparatus (see FIG. 2A).


Claims 4-6, 11-13 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Das et al. (US 2018/0322606, hereinafter Das) in view of Minsoo Rhu et al., “vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design”, 14-Dec-2016 (hereinafter Minsoo), as applied to claims 3, 10 and 17 above, and further in view of Beard et al. (US 2019/0303143, hereinafter Beard).

As to claim 4, Das further teaches wherein the one or more tables comprise a push table (e.g., para 55, “processing cluster array 212 is configured to a valid state before the workload specified by incoming command buffers (e.g., batch-buffers, push buffers, etc.) is initiated” and “The MMU 245 includes a set of page table entries (PTEs) used to map a virtual address to a physical address of a tile and optionally a cache line index” in para 68. Also, see FIGs. 14C and 14D). However, Das and Minsoo do not teach a pop table and an order table. Beard teaches a pop table and an order table (see FIGs. 2 and 3 and “the instruction window may consist of all instructions which are in a re-order buffer (ROB)”, “POP instructions access one or more input FIFOs 514, while PUSH instructions access one or more output FIFOs 516. Write-back unit 518 may be used to return results in output registers (such as the result of the MUL operation) back to the reorder buffer 508” in para 47). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Das and Minsoo by adopting the teachings of Beard to have “a data processor that can perform efficient vector processing in a multi-thread execution environment” (see Beard, Abstract).


As to claim 5, Das and Minsoo do not teach wherein when executed by the first processor, the runtime manager is further configured to: clear a completion indicator for each new entry added to the push table; pick an oldest entry from the push table with a cleared completion indicator to initiate a buffer transfer of a corresponding buffer from the first memory to the second memory, wherein the buffer transfer is initiated during execution of a given layer of the plurality of layers; set the completion indicator for the oldest entry responsive to the buffer transfer completing; and allocate a buffer to corresponding locations of the oldest entry responsive to the completion indicator being set and responsive to an occupancy level of the first memory being above a threshold. However, Beard teaches wherein when executed by the first processor, the runtime manager is further configured to: clear a completion indicator for each new entry added to the push table; pick an oldest entry from the push table with a cleared completion indicator to initiate a buffer transfer of a corresponding buffer from the first memory to the second memory, wherein the buffer transfer is initiated during execution of a given layer of the plurality of layers; set the completion indicator for the oldest entry responsive to the buffer transfer completing (e.g., see FIGs. 6 and 7, paras 59-60, “Success is achieved when a data path from vector input to vector output has been completed. When the data path is interrupted for any reason (for example, an input FIFO becomes empty) the entry in column 626 is set to zero”, “the POP_QR instruction writes to a zero flag in the processor. This indicates an "end-of-input" condition on the POP_QR instructions which can be tested with the "B. EQ" instruction. This allows the microarchitecture to handle the terminating case of the loop appropriately” and “When the PUSH_QR instruction is reached at time 4, the data-

As to claim 6, Das and Minsoo do not teach wherein when executed by the first processor, the runtime manager is further configured to cause a kernel to stall if the kernel is issued for execution and there is not enough free capacity in the first memory to store an output buffer of the kernel. However, Beard teaches wherein when executed by the first processor, the runtime manager is further configured to cause a kernel to stall if the kernel is issued for execution and there is not enough free capacity in the first memory to store an output buffer of the kernel (e.g., para 63, “The POP_QR_V sets the `zero` flag when the FIFO includes less elements than those in the vector. In this case, the POP_QR_V instruction does not remove elements from the FIFO unless it can remove all elements to fill the vector capacity. The code at pop_failed_1 and pop_failed_2 executes the original scalar code to process the elements remaining in the FIFO.”). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Das and Minsoo by adopting the teachings of Beard to have “a data processor that can perform efficient vector processing in a multi-thread execution environment” (see Beard, Abstract).
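
For illustration only, the push-table handling recited in claim 5 and the stall condition recited in claim 6 can be sketched together as below. This is a minimal sketch under stated assumptions, not the applicant's or Beard's implementation: `PushEntry`, `RuntimeManagerSketch`, and all method names are hypothetical, memory is modeled as a simple byte counter, and the claimed allocation of a buffer to the oldest entry's locations is simplified to releasing that space for reuse.

```python
from dataclasses import dataclass


@dataclass
class PushEntry:
    buffer_id: str
    size: int
    done: bool = False  # completion indicator, cleared for each new entry


class RuntimeManagerSketch:
    """Hypothetical model of the first memory and its push table."""

    def __init__(self, capacity: int, threshold: int):
        self.capacity = capacity    # total size of the first memory
        self.threshold = threshold  # occupancy level that triggers reuse
        self.used = 0
        self.push_table = []        # entries kept oldest first

    def try_issue_kernel(self, buffer_id: str, output_size: int) -> bool:
        # Claim 6 condition: report a stall (False) when the kernel's
        # output buffer does not fit in the first memory's free capacity.
        if self.used + output_size > self.capacity:
            return False
        self.used += output_size
        self.push_table.append(PushEntry(buffer_id, output_size))
        return True

    def next_transfer(self):
        # Pick the oldest entry whose completion indicator is still cleared.
        for entry in self.push_table:
            if not entry.done:
                return entry
        return None

    def complete_transfer(self, entry: PushEntry) -> None:
        # Set the completion indicator once the transfer has finished, then
        # (simplification) release the space if occupancy exceeds the threshold.
        entry.done = True
        if self.used > self.threshold:
            self.used -= entry.size
```

A stalled kernel in this sketch would simply be retried after `complete_transfer` frees space; how a real runtime resumes the kernel is outside what the sketch models.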

As to claim 7, Das further teaches wherein when executed by the first processor, the runtime manager is configured to manage buffer transfer between the first memory and the second memory during a backward propagation iteration based on the plurality of entries (see rejection of claim 1 above). However, Das and Minsoo do not teach wherein one or more tables maintained by the runtime manager comprise a pop table and an order table, wherein the pop table and the order table are populated with a plurality of entries during a forward propagation iteration. Beard teaches wherein one or more tables maintained by the runtime manager comprise a pop table and an order table, wherein the pop table and the order table are populated with a 

As to claims 11-14, see rejection of claims 1-7 above.
As to claims 18-20, see rejection of claims 2-6 above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDOU K SEYE whose telephone number is (571)270-1062. The examiner can normally be reached M-F 9-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Dennis Chow, can be reached at (571) 272-7767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ABDOU K SEYE/Examiner, Art Unit 2194


/CHARLES E ANYA/Primary Examiner, Art Unit 2194