DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is in response to the response filed on 8/25/2021. This action is Non-Final.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-7, 9-15, 17-19 and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Awan et al. (“OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-Of-Core DNN Training”), further in view of Rao et al. (US 2014/0049548 A1) and Riguer et al. (US 2016/0162190 A1).
As per claim 1, Awan et al. teaches the invention as claimed including, “A computer-implemented method comprising:
receiving, by a host, a call from an application;
determining that the call includes a device allocation command, wherein the device allocation command is configured to allocate a set of data on a graphical processing unit (GPU);
intercepting the call;
initiating an alternate data allocation command, wherein the alternate data allocation command allocates coherent memory, wherein the coherent memory is located on the GPU, and data in the coherent memory is accessible to the host through an address translation service located on the host;”
Awan et al. teaches an interception library (IL). Any existing application (that uses cudaMalloc) can transparently leverage the proposed out-of-core design schemes without any change to the code. Legacy applications need some interception or modification to take advantage of the current managed-memory interface. The interception library intercepts CUDA memory allocation and management calls (determines allocation command) and redefines their behavior (alternate data allocation) (address translation service), by using cuMemAllocManaged instead of cuMemAlloc inside of the IL code (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation). This can further be seen in figure 1. Figure 1 shows CUDA applications that send requests to the legacy CUDA interface. The proposed interception library shown in figure 1 is able to intercept CUDA memory allocation and management calls from the legacy CUDA interface and redefine their behavior. Therefore the redefined behavior will now be performed instead of the original behavior. The proposed primitive allows CUDA kernels to directly access managed buffers (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, A). Also see figures 2a and 2b. The unified memory buffer is allocated using the cudaMallocManaged interface. This buffer (unified managed virtual memory in figure 2b) is accessible to both the file system (host) and the CUDA kernels that need to operate on this data (VI. Design Details: Out-of-Core Caffe (OC-Caffe), A. OC-Caffe Basic Design, 1).
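For illustration only, the interception behavior described above may be sketched as follows. This is a conceptual Python model, not CUDA code and not code taken from Awan et al.; the class and method names (LegacyRuntime, InterceptionLayer, device_alloc, managed_alloc) are hypothetical stand-ins for the roles played by cudaMalloc and cudaMallocManaged.

```python
# Conceptual model only: these classes stand in for a CUDA runtime and an
# interception layer. None of the names below are real CUDA APIs.

class LegacyRuntime:
    """Hypothetical stand-in for the legacy allocation interface."""

    def __init__(self):
        self._next_handle = 0

    def _new_handle(self):
        self._next_handle += 1
        return self._next_handle

    def device_alloc(self, nbytes):
        # Models a device-only allocation (the role cudaMalloc plays).
        return {"handle": self._new_handle(), "size": nbytes, "kind": "device"}

    def managed_alloc(self, nbytes):
        # Models a managed allocation (the role cudaMallocManaged plays).
        return {"handle": self._new_handle(), "size": nbytes, "kind": "managed"}


class InterceptionLayer:
    """Exposes the same call name the application already uses."""

    def __init__(self, runtime):
        self._runtime = runtime
        self.intercepted = 0

    def device_alloc(self, nbytes):
        # Intercept the device allocation call and redefine its behavior
        # by forwarding it to the managed allocation instead.
        self.intercepted += 1
        return self._runtime.managed_alloc(nbytes)


def application(alloc_api):
    # Unmodified application code: it only knows about device_alloc.
    return alloc_api.device_alloc(1024)


# Same application, first against the legacy runtime, then against the layer.
direct = application(LegacyRuntime())
redirected_layer = InterceptionLayer(LegacyRuntime())
redirected = application(redirected_layer)
```

In this model the application function is identical in both runs; only the object it is handed differs, which corresponds to the statement that existing applications need no change to their code.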
Awan et al. does not appear to explicitly teach,
“wherein the alternate data allocation command allocates data to a coherent memory”
“wherein the coherent memory is located on the GPU, and data in the coherent memory is accessible to the host through an address translation service located on the host;
completing the alternate data allocation command; and
returning the completed alternate data allocation command to the application.”
Rao et al. teaches a unified memory architecture in which the CPU and GPU share the same physical memory (coherent memory). The virtual memory address space of the CPU is mapped to the same physical memory pages (coherent memory) as the graphics virtual memory address space of the GPU. The CPU and GPU are physically located on the same die. Thus the CPU and GPU share the data contained within the physical memory (coherent memory) (0012-0013). The virtual memory addresses from the CPU page table and the graphics virtual memory addresses from the GPU page table are mapped to the physical memory pages of the surface via a translation procedure (0028-0029). Data may be shared between the GPU and CPU (host) without copying the data from the CPU to the GPU or vice versa (0032). The memory management unit 126 (address translation service) is located on the host 100 as shown in figure 1.
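For illustration only, the shared mapping described above may be modeled as two page tables whose entries point at the same physical frames. This is a conceptual Python sketch under stated assumptions, not the implementation of Rao et al.; the page size, addresses, frame numbers, and function names are all hypothetical.

```python
# Conceptual model only: page tables are plain dicts and physical memory is a
# list of bytearrays. All names and values are hypothetical.

PAGE_SIZE = 4096
physical_pages = [bytearray(PAGE_SIZE) for _ in range(4)]

# Two separate virtual address spaces whose entries refer to the same frames.
cpu_page_table = {0x1000: 2, 0x2000: 3}  # CPU virtual page base -> frame
gpu_page_table = {0x9000: 2, 0xA000: 3}  # GPU virtual page base -> frame


def translate(page_table, vaddr):
    """Return (frame, offset) for a virtual address in the given table."""
    offset = vaddr % PAGE_SIZE
    page_base = vaddr - offset
    return page_table[page_base], offset


def write_byte(page_table, vaddr, value):
    frame, offset = translate(page_table, vaddr)
    physical_pages[frame][offset] = value


def read_byte(page_table, vaddr):
    frame, offset = translate(page_table, vaddr)
    return physical_pages[frame][offset]


# The CPU writes through its own mapping; the GPU reads the same byte through
# its mapping, with no copy between separate buffers.
write_byte(cpu_page_table, 0x1010, 0x7F)
seen_by_gpu = read_byte(gpu_page_table, 0x9010)
```

Because both tables resolve to the same frame and offset, the value written through the CPU mapping is visible through the GPU mapping without any copy step.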
Rao et al. further teaches that the GPU is configured to perform any number of graphics operations within the computing device (0022). An operation is offloaded to a GPU. The GPU processes the operation and alerts the host that the operation is complete. After completion, the output of the GPU may be processed by the CPU (0049-0051).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Awan et al. with Rao et al. because both teach the mapping/translating of calls to a unified memory architecture using unified virtual memory, wherein the calls now perform direct access to unified memory. Rao et al. teaches a unified memory architecture in which the CPU and GPU share the same physical memory (coherent memory). The virtual memory address space of the CPU is mapped to the same physical memory pages (coherent memory) as the graphics virtual memory address space of the GPU. Thus the CPU and GPU share the data contained within the physical memory (coherent memory) (0012-0013). Data may be shared between the GPU and CPU (host) without copying the data from the CPU to the GPU or vice versa (0032). Therefore the unified memory of Rao et al. is coherent memory, and it would have been obvious to one of ordinary skill in the art for the unified memory of Awan et al. to also be coherent memory. This will allow both the host and the GPU to share memory without performing a copy of that data. Awan et al. teaches an interception library (address translation service) that intercepts and redirects calls. However, Awan et al. does not appear to explicitly teach completing the alternate data allocation command, returning the completed alternate data allocation command, and the interception library being located on the host. Rao et al. teaches that the virtual 
However, neither Awan et al. nor Rao et al. explicitly teaches that the coherent memory is located on the GPU.
Riguer et al. teaches memory management in graphics and compute application programming interfaces. To take advantage of the functionality of a GPU, a program running on a CPU or other computing device may store data in a memory location dedicated for use by the GPU. Such memory is referred to as GPU memory. Such memory may be located on or off the GPU itself, on or off the graphics card or daughterboard incorporating the GPU, in a portion of CPU memory or main memory, or in another location depending upon the desired implementation (0020).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Awan et al. and Rao et al. with Riguer et al. because all teach the use of a GPU to process data. Both Awan et al. and Rao et al. teach mapping/redirecting/translating memory calls to GPUs to unified memory in order to remove copying steps between the CPU and GPU. However, neither appears to explicitly teach the memory being located on the GPU. Riguer et al. teaches storing data for use by the GPU in memory on the GPU. Riguer et 
As per claim 2, Awan et al. teaches, “The method of claim 1, further comprising:
updating a software stack, wherein the updated software stack is configured to identify the device allocation command, intercept the data allocation command, and initiate the alternate data allocation command.”
Awan et al. teaches an interception library (IL). Any existing application (that uses cudaMalloc) can transparently leverage the proposed out-of-core design schemes without any change to the code. Legacy applications need some interception or modification to take advantage of the current managed-memory interface. The interception library intercepts CUDA memory allocation and management calls and redefines their behavior (alternate data allocation), by using cuMemAllocManaged instead of cuMemAlloc inside of the IL code (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation).
Also see figure 1 showing the interception library in between the CUDA Runtime/Driver and the HPC Platform layers. Therefore the IL is added in between the layers (software stack).
As per claim 3, Awan et al. teaches, “The method of claim 2, wherein the application has a first state at a first time prior to updating, and a second state at a second time after the updating; and
wherein the first state and the second state are the same.”
Awan et al. teaches an interception library (IL). Any existing application (that uses cudaMalloc) can transparently leverage the proposed out-of-core design schemes without any change to the code. Legacy applications need some interception or modification to take advantage of the current managed-memory interface. The interception library intercepts CUDA memory allocation and management calls and redefines their behavior (alternate data allocation), by using cuMemAllocManaged instead of cuMemAlloc inside of the IL code (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation). Also see figure 1 showing the interception library in between the CUDA Runtime/Driver and the HPC Platform layers. Therefore the application is never changed, so its state will always be the same.
As per claim 4, Awan et al. further teaches, “The method of claim 1 where the alternate data allocation command includes a first command and a second command.”
Awan et al. teaches the performance of F2M to replace F2H + H2D, M2M to replace D2D and M2F to replace D2H + H2F (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, A. Categorization and Exploration of Primitives for Out-of-Core DNN training on GPUs). Also see figure 2b. Figure 2b shows the performance of all the operations. Therefore multiple operations/commands can be performed by the interception library for the intercepted memory allocation and management calls. Figure 2b shows the F2H, H2D and D2H being replaced with F2M and M2M. Therefore the alternate data allocation command includes two commands, the F2M and the M2M.
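For illustration only, the primitive replacements recited above (F2M for F2H + H2D, M2M for D2D, and M2F for D2H + H2F) may be sketched as a rewrite over a sequence of operation names. This is a conceptual Python sketch; the replacement pairs come from the rejection above, while the rewrite function and the sample operation sequence are hypothetical and are not taken from Awan et al.

```python
# Replacement pairs as recited in the rejection. The rewrite routine and the
# sample sequence below are hypothetical illustrations.
REPLACEMENTS = [
    (("F2H", "H2D"), "F2M"),  # file -> host -> device becomes file -> managed
    (("D2H", "H2F"), "M2F"),  # device -> host -> file becomes managed -> file
    (("D2D",), "M2M"),        # device -> device becomes managed -> managed
]


def rewrite(ops):
    """Replace each legacy primitive pattern with its managed counterpart."""
    out = []
    i = 0
    while i < len(ops):
        for pattern, replacement in REPLACEMENTS:
            if tuple(ops[i:i + len(pattern)]) == pattern:
                out.append(replacement)
                i += len(pattern)
                break
        else:
            # No pattern matched at this position: keep the operation as-is.
            out.append(ops[i])
            i += 1
    return out


legacy = ["F2H", "H2D", "D2D", "D2H", "H2F"]
managed = rewrite(legacy)
```

In this sketch a five-step legacy sequence collapses to three managed primitives, which mirrors the reduction in explicit copy steps that the replacements are said to provide.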
As per claim 5, Awan et al. further teaches, “The method of claim 4, wherein the first command allocates data on the host, and the second command allocates data on the GPU.”
Figure 2b shows a file-system-to-managed (F2M) allocation to unified memory of a CPU host and also shows a managed-source-to-managed-destination (M2M) allocation to unified memory of the GPU device.

As per claim 6, Awan et al. further teaches, “The method of claim 5, wherein the first command includes an mmap command, and the second command includes a mbind command.”
	The examiner states that mmap and mbind commands are well known GPU commands.  It would have been an obvious design choice to substitute memory allocation commands for these well-known allocation commands. 
As per claim 7, Awan et al. further teaches, “The method of claim 5, wherein the first command includes a glibc malloc command, and the second command includes a cudaMemPrefetchAsync command.”
Awan et al. teaches replacing cudaMalloc calls with cuMemAllocManaged.  Awan further teaches performing optimization like prefetching (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation).  Also see Section VI B. 1.
	The examiner states that glibc malloc and cudaMemPrefetchAsync commands are well known GPU commands.  It would have been an obvious design choice to substitute memory allocation commands for these well-known allocation commands.
As per claim 9, Awan et al. further teaches, “The method of claim 1, wherein the call is a cudaMalloc call.”
Awan et al. teaches an interception library (IL). Any existing application (that uses cudaMalloc) can transparently leverage the proposed out-of-core design schemes without any change to the code. Legacy applications need some interception or modification to take advantage of the current managed-memory interface. The interception library intercepts CUDA memory allocation and management calls and redefines their behavior (alternate data allocation), by using cuMemAllocManaged instead of cuMemAlloc inside of the IL code (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation).
As per claim 10, Awan et al. further teaches, “The method of claim 1, wherein the call is a cudaMallocManaged call.”
Awan et al. teaches intercepting CUDA memory allocation and management calls and redefining their behavior (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation). The cudaMallocManaged call is a well-known CUDA memory allocation and management call. Therefore it would have been an obvious design choice to substitute the call for the known memory allocation command.
As per claim 11, Awan et al. further teaches, “The method of claim 1, wherein the host transfers data to the GPU via a NVLink.”
Section III. Background, B. CUDA Unified Memory and Pascal/Volta GPUs, mentions the use of NVLink for better performance for GPUs.
As per claims 12-15, claims 12-15 contain similar limitations to claims 1, 2, 4 and 5. Therefore claims 12-15 are rejected for the same reasons as claims 1, 2, 4 and 5.
As per claims 17-19, claims 17-19 contain similar limitations to claims 1, 2, and 4.  Therefore claims 17-19 are rejected for the same reasons as claims 1, 2 and 4.
As per claim 22 (Amended), “The method of claim 1 


As per claim 23 (Amended), “The method of claim 22 
Rao et al. teaches a unified memory architecture in which the CPU and GPU share the same physical memory. The virtual memory address space of the CPU is mapped (translated) to the same physical memory pages as the graphics virtual memory address space of the GPU. The CPU and GPU are physically located on the same die. Thus the CPU and GPU share the data contained within the physical memory (0012-0013). The examiner states that since the CPU and GPU are on the same die and share the same physical memory, the physical memory is located on the GPU. The virtual memory addresses from the CPU page table and the graphics virtual memory addresses from the GPU page table are mapped to the physical memory pages of the surface via a translation procedure (address translation service) (0028-0029).

Response to Arguments
Applicant's arguments filed 8/25/2021 have been fully considered but they are not persuasive. Regarding claims 1, 12 and 17, please see the above rejection over Awan et al., further in view of Rao et al. and Riguer et al.
As per claims 4, 14 and 9, applicant states that Awan et al. does not teach the limitation as claimed. The examiner disagrees. Awan et al. teaches the performance of F2M to replace F2H + H2D, M2M to replace D2D and M2F to replace D2H + H2F (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, A. Categorization and Exploration of Primitives for Out-of-Core DNN training on GPUs). Also see figure 2b. Figure 2b shows the performance of all the operations. Therefore multiple operations/commands can be performed by the interception library for the intercepted memory allocation and management calls. Figure 2b shows the F2H, H2D and D2H being replaced with F2M and M2M. Therefore the alternate data allocation command includes two commands, the F2M and the M2M.
Regarding claim 10, the applicant states that Awan does not teach intercepting the call, wherein the call is cudaMallocManaged as required by the claims. The examiner respectfully disagrees. As stated above, Awan et al. teaches intercepting CUDA memory allocation and management calls and redefining their behavior (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation). The cudaMallocManaged call is a CUDA memory allocation and management call. The examiner agrees that Awan et al. teaches specific embodiments in which other CUDA memory allocation and management calls are intercepted; however, Awan et al. still teaches the use of cudaMallocManaged and that it is a 
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK A GOORAY whose telephone number is (571)270-7805. The examiner can normally be reached Monday - Friday 10:00am - 6:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock can be reached on 571-272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARK A GOORAY/Examiner, Art Unit 2199                                                                                                                                                                                         

/LEWIS A BULLOCK  JR/Supervisory Patent Examiner, Art Unit 2199