DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is in response to the response filed on 2/8/2021.  This action is Non-Final.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 1 recites the limitation "the alternate command"  in line 7 of the claim.  There is insufficient antecedent basis for this limitation in the claim.
Claim 1 recites the limitation "the process data"  in line 10 of the claim.  There is insufficient antecedent basis for this limitation in the claim.
Claim 12 recites the limitation "the alternate command"  in line 11 of the claim.  There is insufficient antecedent basis for this limitation in the claim.
Claim 12 recites the limitation "the process data"  in line 14 of the claim.  There is insufficient antecedent basis for this limitation in the claim.
Claim 17 recites the limitation "the process data"  in line 12 of the claim.  There is insufficient antecedent basis for this limitation in the claim.
Claim 17 recites the limitation "the alternate command" in line 11 of the claim.  There is insufficient antecedent basis for this limitation in the claim.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-7, 9-15 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Awan et al. (“OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-Of-Core DNN Training”).
As per claim 1, Awan et al. teaches the invention as claimed including, “A computer-implemented method comprising:
receiving, by a host, a call from an application;
determining that the call includes a device allocation command, wherein the device allocation command is configured to allocate a set of data on a graphical processing unit (GPU);

initiating an alternate data allocation command, wherein the alternate command allocate data to a coherent memory;
completing the alternate data allocation command; and
returning the processed data to the application.”
Awan et al. teaches an interception library (IL).  Any existing application (that uses cudaMalloc) can transparently leverage the proposed out-of-core design schemes without any change to the code.  Legacy applications need some interception or modification to take advantage of the current managed-memory interface.  The interception library intercepts CUDA memory allocation and management calls (determines allocation command) and redefines their behavior (alternate data allocation), by using cuMemAllocManaged instead of cuMemAlloc inside of the IL code (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation).  This can further be seen in figure 1.  Figure 1 shows CUDA Applications that send requests to the Legacy CUDA interface.  The Proposed Interception Library shown in figure 1 is able to intercept CUDA memory allocation and management calls from the Legacy CUDA interface and redefine their behavior.  Therefore, the redefined behavior will now be performed instead of the original intercepted CUDA memory allocation and management calls.  File system to managed memory (F2M) replaces F2H + H2D.  Therefore a copy step is removed and the new memory allocation F2M is allocating to unified memory, which is coherent memory.  The proposed primitive allows CUDA kernels to directly access managed buffers (coherent memory) (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, A).  Also see figures 2a and 2b.  The unified memory buffer is allocated using the cudaMallocManaged interface.  This buffer (unified managed virtual memory in figure 2b) is accessible to both (coherent) the file system (host) as well as the CUDA kernels that need to operate on this data (VI. Design Details: Out-of-Core Caffe (OC-Caffe), A. OC-Caffe Basic Design, 1).
As per claim 2, Awan et al. teaches, “The method of claim 1, further comprising:
updating a software stack, wherein the updated software stack is configured to identify the device allocation command, intercept the data allocation command, and initiate the alternate data allocation command.”
Awan et al. teaches an interception library (IL).  Any existing application (that uses cudaMalloc) can transparently leverage the proposed out-of-core design schemes without any change to the code.  Legacy applications need some interception or modification to take advantage of the current managed-memory interface.  The interception library intercepts CUDA memory allocation and management calls and redefines their behavior (alternate data allocation), by using cuMemAllocManaged instead of cuMemAlloc inside of the IL code (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation).  Also see figure 1 showing the interception library in-between the CUDA Runtime/Driver and the HPC Platform layers.  Therefore the IL is added in-between the layers (software stack).
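For illustration only, the interception behavior attributed to Awan et al. above can be sketched as follows.  This is a minimal Python simulation under stated assumptions, not CUDA code and not the reference's implementation; the names `Runtime`, `device_alloc`, `managed_alloc`, and `install_interception` are hypothetical stand-ins for the runtime, cuMemAlloc, cuMemAllocManaged, and the interception library.

```python
class Runtime:
    """Hypothetical stand-in for a GPU runtime with two allocation entry points."""

    def __init__(self):
        self.log = []  # records which allocator actually ran

    def device_alloc(self, nbytes):
        # Stand-in for a device-only allocation (e.g. cuMemAlloc).
        self.log.append(("device", nbytes))
        return {"kind": "device", "size": nbytes}

    def managed_alloc(self, nbytes):
        # Stand-in for a managed/unified allocation (e.g. cuMemAllocManaged).
        self.log.append(("managed", nbytes))
        return {"kind": "managed", "size": nbytes}


def install_interception(runtime):
    """Rebind the device allocation entry point to the managed allocator."""
    def intercepted(nbytes):
        return runtime.managed_alloc(nbytes)
    # The instance attribute shadows the original method for this runtime only.
    runtime.device_alloc = intercepted


def legacy_application(runtime):
    # Unmodified application code: it still calls the device allocator.
    return runtime.device_alloc(1024)


rt = Runtime()
before = legacy_application(rt)   # served by the device allocator
install_interception(rt)          # the "software stack" is updated
after = legacy_application(rt)    # same call, now served by the managed allocator
print(before["kind"], after["kind"])
```

In this sketch `legacy_application` is identical before and after the interception is installed; only the layer beneath it changes.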
As per claim 3, Awan et al. teaches, “The method of claim 2, wherein the application has a first state at a first time prior to updating, and a second state at a second time after the updating; and
wherein the first state and the second state are the same.”
Awan et al. teaches an interception library (IL).  Any existing application (that uses cudaMalloc) can transparently leverage the proposed out-of-core design schemes without any change to the code.  Legacy applications need some interception or modification to take advantage of the current managed-memory interface.  The interception library intercepts CUDA memory allocation and management calls and redefines their behavior (alternate data allocation), by using cuMemAllocManaged instead of cuMemAlloc inside of the IL code (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation).  Also see figure 1 showing the interception library in-between the CUDA Runtime/Driver and the HPC Platform layers.  Therefore the application is never changed, so its state will always be the same.
As per claim 4, Awan et al. further teaches, “The method of claim 1 where the alternate data allocation command includes a first command and a second command.”
Awan et al. teaches the performance of F2M to replace F2H + H2D, M2M to replace D2D, and M2F to replace D2H + H2F (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, A. Categorization and Exploration of Primitives for Out-of-Core DNN training on GPUs).  Also see figure 2b.  Figure 2b shows the performance of all the operations.  Therefore multiple operations/commands can be performed by the interception library for the intercepted memory allocation and management calls.
As per claim 5, Awan et al. further teaches, “The method of claim 4, wherein the first command allocates data on the host, and the second command allocates data on the GPU.”
Figure 2b shows a file system to managed memory (F2M) allocation to unified memory of a CPU host and also shows a managed source to managed destination (M2M) allocation to unified memory of the GPU device.
As per claim 6, Awan et al. further teaches, “The method of claim 5, wherein the first command includes an mmap command, and the second command includes a mbind command.”
The examiner states that mmap and mbind commands are well-known GPU commands.  It would have been an obvious design choice to substitute memory allocation commands for these well-known allocation commands.
As per claim 7, Awan et al. further teaches, “The method of claim 5, wherein the first command includes a glibc malloc command, and the second command includes a cudaMemPrefetchAsync command.”
Awan et al. teaches replacing cudaMalloc calls with cuMemAllocManaged.  Awan further teaches performing optimizations like prefetching (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation).  Also see Section VI B. 1.
The examiner states that glibc malloc and cudaMemPrefetchAsync commands are well-known GPU commands.  It would have been an obvious design choice to substitute memory allocation commands for these well-known allocation commands.
As per claim 9, Awan et al. further teaches, “The method of claim 1, wherein the call is a cudaMalloc call.”
Awan et al. teaches an interception library (IL).  Any existing application (that uses cudaMalloc) can transparently leverage the proposed out-of-core design schemes without any change to the code.  Legacy applications need some interception or modification to take advantage of the current managed-memory interface.  The interception library intercepts CUDA memory allocation and management calls and redefines their behavior (alternate data allocation), by using cuMemAllocManaged instead of cuMemAlloc inside of the IL code (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation).
As per claim 10, Awan et al. further teaches, “The method of claim 1, wherein the call is a cudaMallocManaged call.”
Awan et al. teaches intercepting CUDA memory allocation and management calls and redefining their behavior (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation).  cudaMallocManaged is a well-known CUDA memory allocation and management call.  Therefore it would have been an obvious design choice to substitute the call for the known memory allocation command.
As per claim 11, Awan et al. further teaches, “The method of claim 1, wherein the host transfers data to the GPU via a NVLink.”
Section III. Background, B. CUDA Unified Memory and Pascal/Volta GPUs, mentions the use of NVLink for better performance for GPUs.
As per claims 12-15, claims 12-15 contain similar limitations to claims 1, 2, 4 and 5.  Therefore claims 12-15 are rejected for the same reasons as claims 1, 2, 4 and 5.
As per claims 17-19, claims 17-19 contain similar limitations to claims 1, 2, and 4.  Therefore claims 17-19 are rejected for the same reasons as claims 1, 2 and 4.

Claims 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Awan et al. (“OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-Of-Core DNN Training”) as applied to claim 1 above, further in view of Rao et al. (US 2014/0049548 A1).
As per claim 21 (NEW), Awan et al. teaches a unified virtual memory (coherent memory) for the GPU devices as shown in figure 2b.  However, Awan et al. does not explicitly appear to teach, “The method of claim 1, wherein the coherent memory is located on the GPU, the set of data in the coherent memory is accessible to the host though an address translation service.”
Rao et al. teaches a unified memory architecture in which the CPU and GPU share the same physical memory.  Virtual memory address space of the CPU is mapped to the same physical memory pages as the graphics virtual memory address space of the GPU.  The CPU and GPU are physically located on the same die.  Thus the CPU and GPU share the data contained within the physical memory (0012-0013).  The examiner states that since the CPU and GPU are on the same die and share the same physical memory, the physical memory is located on the GPU.  The virtual memory address from the CPU page table and the graphics virtual memory address from the GPU page table are mapped to the physical memory pages of the surface via a translation procedure (address translation service) (0028-0029).  Also see 0021, 0024, and 0037-0040.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Awan et al. with Rao et al. because both teach the use of a unified memory architecture using unified virtual memory.  Rao et al. teaches that the virtual memory address is mapped to a physical address that is shared by the CPU and GPU.  This will allow data to be shared between the GPU and CPU without copying the data from the CPU to the GPU or vice versa, and would have been obvious to try.
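For illustration only, the shared-page mapping described above for Rao et al. can be sketched as a toy model.  This is a minimal Python simulation under stated assumptions, not the reference's implementation; the page size, page numbers, and the names `cpu_page_table`, `gpu_page_table`, and `translate` are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model

# One physical page (number 7 is arbitrary) backing the shared data.
physical_memory = {7: bytearray(PAGE_SIZE)}

# Two separate page tables that map different virtual pages
# to the same physical page.
cpu_page_table = {0x10: 7}  # CPU virtual page -> physical page
gpu_page_table = {0x80: 7}  # GPU virtual page -> physical page


def translate(page_table, virtual_address):
    """Translate a virtual address into (physical page, offset)."""
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    return page_table[virtual_page], offset


def write_byte(page_table, virtual_address, value):
    page, offset = translate(page_table, virtual_address)
    physical_memory[page][offset] = value


def read_byte(page_table, virtual_address):
    page, offset = translate(page_table, virtual_address)
    return physical_memory[page][offset]


cpu_va = 0x10 * PAGE_SIZE + 5
gpu_va = 0x80 * PAGE_SIZE + 5

# The CPU writes through its mapping; the GPU reads the same byte through
# its own mapping, with no copy between separate buffers.
write_byte(cpu_page_table, cpu_va, 42)
value_seen_by_gpu = read_byte(gpu_page_table, gpu_va)
print(value_seen_by_gpu)
```

In this sketch both translations resolve to the same physical page and offset, which is why no host-to-device copy step appears.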
As per claim 22 (NEW), “The method of claim 21, the wherein the address translation allows the host to access the set of data stored on GPU without mirroring the data to host memory.”


As per claim 23 (NEW), “The method of claim 21, the wherein the address translation service allows the host to access the set of data stored on GPU by translating a virtual GPU address into a physical address usable by the host.”
Rao et al. teaches a unified memory architecture in which the CPU and GPU share the same physical memory.  Virtual memory address space of the CPU is mapped (translated) to the same physical memory pages as the graphics virtual memory address space of the GPU.  The CPU and GPU are physically located on the same die.  Thus the CPU and GPU share the data contained within the physical memory (0012-0013).  The examiner states that since the CPU and GPU are on the same die and share the same physical memory, the physical memory is located on the GPU.  The virtual memory address from the CPU page table and the graphics virtual memory address from the GPU page table are mapped to the physical memory pages of the surface via a translation procedure (address translation service) (0028-0029).  Also see 0021, 0024, and 0037-0040.

Response to Arguments
Applicant's arguments filed 2/8/2021 have been fully considered but they are not persuasive.  Applicant states that Awan et al. does not teach or suggest the limitation of “initiating an alternative data allocation command, wherein the alternate command allocate data to a coherent memory” of claim 1.  Paragraph 0016 of the specification states “Various embodiments of modern system include GPU coherent memory, GPU coherent memory is memory which is accessible by the host and GPU without mirroring/transferring the data between the two components (0016).”  However GPU coherent memory is not claimed.  Applicant states that Awan’s reference to “unified memory” does not teach coherent memory.  The examiner respectfully disagrees.
Awan et al. teaches that file system to managed memory (F2M) replaces F2H + H2D.  Therefore a copy step (host to device) is removed and the new memory allocation F2M is allocating to unified memory, which is coherent memory.  The proposed primitive allows CUDA kernels to directly access managed buffers (coherent memory) (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, A).  Also see figures 2a and 2b.  The unified memory buffer is allocated using the cudaMallocManaged interface.  This buffer (unified managed virtual memory in figure 2b) is accessible to both (coherent) the file system (host) as well as the CUDA kernels that need to operate on this data (VI. Design Details: Out-of-Core Caffe (OC-Caffe), A. OC-Caffe Basic Design, 1).  As can be seen in figure 2b, there exists a unified (managed) virtual memory.  This memory is accessible to the file system, CPU (host), and the GPU (device).  An M2M (memory to memory) function can also be performed to transfer the managed buffers 
Regarding claim 10, the applicant states that Awan does not teach intercepting the call, wherein the call is cudaMallocManaged as required by the claims.  The examiner respectfully disagrees.  As stated above, Awan et al. teaches intercepting CUDA memory allocation and management calls and redefining their behavior (Section: V. PROPOSED DESIGN: OC-DNN FRAMEWORK, B. Interception Library (IL) for Rapid experimentation).  cudaMallocManaged is a CUDA memory allocation and management call.  The examiner agrees that Awan et al. teaches specific embodiments in which other CUDA memory allocation and management calls are intercepted; however, Awan et al. still teaches the use of cudaMallocManaged and that it is a type of CUDA memory allocation and management call, and therefore it would have been obvious for this call to be intercepted too.  It is nothing more than a design choice and would have been obvious to try.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock, can be reached on 571-272-3759.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARK A GOORAY/
Examiner, Art Unit 2199
/WYNUEL S AQUINO/
Primary Examiner, Art Unit 2199