Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Minsoo Rhu et al., “vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design”, 2016 (hereinafter “Minsoo”), in view of Sumeet S. Kumar et al., “Low Overhead Message Passing for High Performance Many-Core Processors”, 2013 (hereinafter “Sumeet”).

As to claim 1, Minsoo teaches a system comprising: 
a processor (e.g., see abstract, wherein “a runtime memory manager that virtualizes the memory usage of DNNs such that both GPU and CPU memory can simultaneously be utilized for training larger DNNs. Our virtualized DNN (vDNN) reduces the average GPU memory usage of AlexNet by up to 89%”; thus, a system comprising a processor would have been inherent); and wherein the processor is configured to manage buffer transfers between a first memory (e.g., “CPU memory”) and a second memory (e.g., “GPU memory”) in a heterogeneous memory subsystem (e.g., “the GPU memory”, “CPU memory”) during training of a neural network (e.g., “vDNN memory manager”) comprising a plurality of layers (e.g., see right column of page 2, “I. INTRODUCTION”, “vDNN”, “….releases or moves these intermediate data between GPU and CPU memory”, “vDNN either 1) aggressively releases these feature maps from the GPU memory if no further reuse exists, or 2) offloads (and later prefetches) to (from) CPU memory if further reuse does exist but is not immediately required”, “vDNN memory manager intelligently overlaps the normal DNN computations with the offload/prefetch/release operations”, see Fig. 2, “An optional temporary buffer, called workspace in cuDNN [8] (yellow arrow, WS)”), wherein to manage the buffer transfers the processor is configured to:
	allocate space (e.g., “temporary buffer, called workspace”) in the second memory for each buffer used as either an input or an output during processing of a current layer of the plurality of layers (e.g., see FIG. 2, “Memory allocations”, “An optional temporary buffer, called workspace in cuDNN [8] (yellow arrow, WS)” for “Input image batch”. Also, see Fig. 3);
	store, in a first data structure (e.g., one of the “data structures”), an identification of a first buffer (e.g., one of “WS”, FIG. 2) to be transferred from the second memory to the first memory, in response to detecting the first buffer is used as an input during processing (e.g., “Input image batch”, FIG. 2, “data structures” for “WS” of “Layer (1)”, “during backward propagation. An optional temporary buffer, called workspace in cuDNN [8] (yellow arrow, WS), is needed in certain convolutional algorithms”);
	store, in a second data structure (e.g., another one of “data structures”), an identification of the first buffer and a second buffer (e.g., another one of “WS” of “layer”, FIG. 2), in response to detecting both the first buffer and the second buffer are used as either an input or an output during the processing (e.g., see FIG. 2, “data structures” for “WS” of “Layer (1)”, “during backward propagation. An optional temporary buffer, called workspace in cuDNN [8] (yellow arrow, WS), is needed in certain convolutional algorithms” for “Input image batch”, FIG. 2).
	However, Minsoo does not teach transfer the first buffer from the second memory to the first memory, in response to determining the first buffer corresponds to an oldest entry in the first data structure.
Sumeet teaches transfer the first buffer from the second memory to the first memory (e.g., see pages 346-347, “Data blocks are moved between tile-local memories using hardware managed MPBs over the R3 network-on-chip interconnect”, see FIG. 1, “Fig. 2. NagaM tile containing a ρ-VEX processing element, local memories, Pronto message passing interface and a network interface”), in response to determining the first buffer corresponds to an oldest entry (e.g., “the oldest waiting message entry”) in the first data structure (e.g., see page 347, “Fig. 3. Illustration of buffer management and message ordering in the Message Passing Buffer (MPB)”, “Data blocks are moved between tile-local memories using hardware managed MPBs over the R3 network-on-chip interconnect”, “A. Buffer Management”, “data can actually be transmitted, it is essential for the sending node”, “A pointer indicates the oldest waiting message entry in the table,”, “to the upstream node indicating that the transfer may commence” and “ensuring that the oldest received block is popped from the buffer when requested by the executing task” in right column of page 348). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Minsoo by adopting the teachings of Sumeet to “ensure that performance critical data move in a direction orthogonal to potentially contentious interconnect traffic” (See Sumeet, VI. CONCLUSION).
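
For illustration only, the ordering discussed above (record buffer identifications while a layer is processed, then transfer the buffer corresponding to the oldest entry) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Minsoo, or Sumeet: the class name, method names, and the use of Python dictionaries to stand in for the two memories are all hypothetical.

```python
from collections import deque


class BufferTransferSketch:
    """Hypothetical model of the two data structures recited in claim 1."""

    def __init__(self):
        # First data structure: identifications of buffers queued for
        # transfer out of the second memory, oldest entry at the left.
        self.first_structure = deque()
        # Second data structure: identifications of buffers used as an
        # input or an output while processing layers.
        self.second_structure = []
        # Dictionaries stand in for the memories (assumption).
        self.second_memory = {}  # e.g., GPU memory
        self.first_memory = {}   # e.g., CPU memory

    def process_layer(self, inputs, outputs):
        """Allocate space and record buffer identifications for one layer."""
        for buf in inputs + outputs:
            self.second_memory.setdefault(buf, f"data:{buf}")
            if buf not in self.second_structure:
                self.second_structure.append(buf)
        for buf in inputs:
            if buf not in self.first_structure:
                self.first_structure.append(buf)

    def transfer_oldest(self):
        """Move the buffer for the oldest queued entry to the first memory."""
        while self.first_structure:
            buf = self.first_structure.popleft()
            if buf in self.second_memory:
                self.first_memory[buf] = self.second_memory.pop(buf)
                return buf
        return None


sketch = BufferTransferSketch()
sketch.process_layer(["in0"], ["out0"])
sketch.process_layer(["out0"], ["out1"])
first_moved = sketch.transfer_oldest()  # oldest entry is handled first
```

In this sketch the queue is first-in, first-out, so `in0` is moved before `out0`; any other eviction policy would be a different design.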

As to claim 2, Minsoo teaches wherein the first memory has a lower bandwidth than the second memory (e.g., see page 8, wherein “B. GPU Node Topology We conducted experiments on NVIDIA’s Titan X [40], which provides the highest math throughput (single precision throughput of 7 TFLOPS), memory bandwidth (max 336 GB/sec), and memory capacity (12 GB) in the family of Maxwell GPUs. The GPU communicates with an Intel i7-5930K (containing 64 GB of DDR4 memory) via a PCIe switch (gen3), which provides a maximum 16 GB/sec data transfer bandwidth.” Thus, wherein the first memory has a lower bandwidth than the second memory would have been inherent and well known).


As to claim 3, Minsoo does not teach wherein the processor is configured to transfer the first buffer from the second memory to the first memory in further response to determining transfer of the first buffer has not yet been completed. However, Sumeet teaches wherein the processor is configured to transfer the first buffer from the second memory to the first memory in further response to determining transfer of the first buffer has not yet been completed (e.g., see right column of page 347, wherein “In the event of insufficient MPB space, the corresponding envelope is buffered until the requested space becomes available.” and “The MP send function, on the other hand, is nonblocking except for when the local MPB’s output buffer is full in addition to the downstream MPB’s input buffer. In this case, execution is stalled by clock-gating the local PE” in right column of page 348). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Minsoo by adopting the teachings of Sumeet to “ensure that performance critical data move in a direction orthogonal to potentially contentious interconnect traffic” (See Sumeet, VI. CONCLUSION).

As to claim 4, Minsoo teaches wherein the processor is configured to transfer the first buffer subsequent (e.g., “linear networks”) to completing processing of the current layer (e.g., see page 3, “Fig. 2: Memory allocations required for linear networks using the baseline memory manager (bold arrows).” “Forward Propagation. Forward propagation is performed from the first (input) layer to the last (output) layer”).

Application Serial No. 16/194,958 - Filed November 19, 2018

As to claim 5, Minsoo does not explicitly teach wherein the first data structure comprises a plurality of entries with each of the plurality of entries being configured to store an identification of a given buffer and an indication as to whether transfer of the given buffer has been completed. However, Sumeet teaches wherein the first data structure comprises a plurality of entries (e.g., see page 347, “Fig. 3. Illustration of buffer management and message ordering in the Message Passing Buffer (MPB)”) with each of the plurality of entries being configured to store an identification of a given buffer (See FIGs. 2 and 3) and an indication as to whether transfer of the given buffer has been completed (e.g., See FIGs. 2 and 3, page 346, “Each task executes asynchronously on a ρ-VEX PE upon its input data becoming ready, and produces data that similarly triggers the next task in the process network”). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Minsoo by adopting the teachings of Sumeet to “ensure that performance critical data move in a direction orthogonal to potentially contentious interconnect traffic” (See Sumeet, VI. CONCLUSION).

As to claim 6, Minsoo does not explicitly teach wherein the processor is configured to transfer the first buffer from the second memory to the first memory in further response to determining an occupancy level of the second memory is above a threshold. However, Sumeet teaches wherein the processor is configured to transfer the first buffer from the second memory to the first memory in further response to determining an occupancy level of the second memory is above a threshold (e.g., see left column of page 348, “Multiple tasks communicating concurrently with a downstream task would result in the latter’s MPB being inundated with only parts of messages, necessitating a buffer of a larger capacity”). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Minsoo by adopting the teachings of Sumeet to “ensure that performance critical data move in a direction orthogonal to potentially contentious interconnect traffic” (See Sumeet, VI. CONCLUSION).


As to claim 7, Minsoo teaches wherein the processor is configured to generate a classification based on the neural network (e.g., “…extraction layers so that a deep hierarchy of features are trained for robust image classification”).

As to claim 8, see rejection of claim 1 above. 

As to claims 9-14, see rejection of claims 2-7 above. 

As to claim 15, see rejection of claim 1 above. Minsoo further teaches an apparatus comprising: a first processor configured to execute a runtime manager and a second processor configured to execute a neural network comprising a plurality of executable layers, wherein when executed by the first processor, the runtime manager is configured to (e.g., see abstract, wherein “a runtime memory manager that virtualizes the memory usage of DNNs such that both GPU and CPU memory can simultaneously be utilized for training larger DNNs. Our virtualized DNN (vDNN) reduces the average GPU memory usage of AlexNet by up to 89%”. Thus, an apparatus comprising a first processor configured to execute a runtime manager and a second processor configured to execute a neural network comprising a plurality of executable layers, wherein the runtime manager is so configured when executed by the first processor, would have been inherent).

As to claims 16-17 and 19, see rejection of claims 2-3 and 5 above.

As to claim 18, Minsoo does not teach wherein the first data structure is a push table and the second data structure is a pop table. However, Sumeet teaches wherein the first data structure is a push table (e.g., see page 347, “Fig. 3. Illustration of buffer management and message ordering in the Message Passing Buffer (MPB)”) and the second data structure is a pop table (e.g., see right column of page 348, “B. Ordering of Messages at Destination The buffer manager preserves the entry order of incoming data blocks using the status table, ensuring that the oldest received block is popped from the buffer when requested by the executing task”). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Minsoo by adopting the teachings of Sumeet to “ensure that performance critical data move in a direction orthogonal to potentially contentious interconnect traffic” (See Sumeet, VI. CONCLUSION).

As to claim 20, Minsoo does not teach wherein when executed by the first processor, the runtime manager is further configured to cause a kernel to stall if the kernel is issued for execution and there is not enough free capacity in the first memory to store a buffer. However, Sumeet teaches wherein when executed by the first processor, the runtime manager is further configured to cause a kernel to stall if the kernel is issued for execution and there is not enough free capacity in the first memory to store a buffer (e.g., see right column of page 348, “The MP send function, on the other hand, is nonblocking except for when the local MPB’s output buffer is full in addition to the downstream MPB’s input buffer. In this case, execution is stalled by clock-gating the local PE. Proper load-balancing of tasks to ensure that they incur similar execution times minimizes the occurrence of such buffer overflow/underflow related stalls.”). Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Minsoo by adopting the teachings of Sumeet to “ensure that performance critical data move in a direction orthogonal to potentially contentious interconnect traffic” (See Sumeet, VI. CONCLUSION).

Response to Arguments
Applicant argues that:
“Neither does the remaining cited art disclose or suggest such features. Accordingly, claim 1 is patentable over the cited art for at least the above reasons. As each of claims 8 and 15, as amended, include similar features”. 
Applicant’s arguments have been considered but are moot in view of the new ground(s) of rejection based on Minsoo Rhu et al., “vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design”, and Sumeet S. Kumar et al., “Low Overhead Message Passing for High Performance Many-Core Processors”.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDOU K SEYE whose telephone number is (571)270-1062. The examiner can normally be reached M-F 9-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hyung SOUGH, can be reached at (571)272-6799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ABDOU K SEYE/ Examiner, Art Unit 2194
/S. SOUGH/SPE, AU 2192/2194