DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 2-23 are pending and are addressed in this Office action.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 2-23 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 10,546,361. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the instant application are anticipated by, or would have been obvious over, the corresponding claims of the '361 patent, as shown in the following mapping table.

Application No. 16/919,954 (Instant Application) compared with U.S. Patent No. 10,546,361 B2 ('361 Patent):

Instant Application: 2. A method comprising: allocating a pointer to a memory location of a first processing unit, the pointer accessible by a second processing unit; and copying, without a user request, data from the memory location of the first processing unit to a memory location of the second processing unit.

'361 Patent: 1. ... performing a process of automatically managing accesses to the managed pointers across the plurality of processors and corresponding memories, including ensuring consistent information associated with the managed pointers is automatically copied, in a kernel mode, directly from the first portion of memory to a second portion of memory associated with a second one of the plurality of processors based upon initiation of an access to the managed pointers from the second one of the plurality of processors.

Instant Application: 4. The method of claim 2, further comprising: mapping virtual address spaces with physical memories of the first processing unit and of the second processing unit; enabling the virtual address spaces as the memory location of the first processing unit and the memory location of the second processing unit; causing an operation to be launched on the first processing unit; flushing at least a page that was migrated to the second processing unit back to the first processing unit; and unmapping a virtual address space mapped to the second processing unit.

'361 Patent: 6. The method of claim 5 wherein when the CPU attempts to access the pointer, space in the central processing unit physical addresses (CPU PA) is allocated, the portion of the GPU PA is automatically copied to the CPU PA, and an address in the CPU VA is mapped to the allocated CPU PA.

'361 Patent: 9. The system of claim 8 wherein the accesses associated with the pointer are automatically managed back and forth between the first processor and the second processor according to which processor is accessing the pointer.

'361 Patent: 2. The method of claim 1 wherein establishing space for managed pointers comprises reserving a region from the first processor's virtual address space and reserving a region from the second processor's virtual address space, wherein the regions are reserved for allocations of the managed pointers.

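Instant Application: 6. The method of claim 2, further comprising: creating a second managed pointer in a reserved virtual address space of the second processing unit in response to an attempted access to the pointer; and allocating the second managed pointer to the memory location of the second processing unit.

'361 Patent: 1. ... directly from the first portion of memory to a second portion of memory associated with a second one of the plurality of processors based upon initiation of an access to the managed pointers from the second one of the plurality of processors.
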
Instant Application: 7. The method of claim 2, further comprising: triggering a page fault in response to an attempted access to the pointer; and handling the page fault using a driver in a kernel mode that supports the copying of the data without the user request in response to an attempted access to the pointer.

'361 Patent: 20. The tangible computer readable medium of claim 15 wherein there is support for page faults to the pointer associated with accesses by the second processor.

Instant Application: 8. The method of claim 2, further comprising: using a call to a managed memory location to perform the allocating of the pointer to the memory location of the first processing unit.

'361 Patent: 10. The system of claim 8 wherein an API managed memory allocation call triggers the automatic management of the pointer and a driver manages the memories associated with the pointer.

Instant Application: 9. The method of claim 2, further comprising: receiving a request to perform, on the first processing unit, an operation using altered data in the memory location of the second processing unit; and copying, in the kernel mode, the altered data from the memory location of the second processing unit to the memory location of the first processing unit in response to the received request.

'361 Patent: 3. The method of claim 1 wherein data coherency and concurrency across the memories are automatically maintained.

'361 Patent: 12. The system of claim 8 wherein movement or copying of information between processors is automated and transparent to the user utilizing a single managed pointer without having to be concerned about concurrency or coherency of data between the different processors or memories.

Instant Application: 10. The method of claim 9, further comprising: creating, using the second processing unit, the altered data based at least in part on the data.

'361 Patent: 9. The system of claim 8 wherein the accesses associated with the pointer are automatically managed back and forth between the first processor and the second processor according to which processor is accessing the pointer.

Instant Application: 11. The method of claim 2, wherein a kernel mode supports the copying of the data without the user request in response to an attempted access to the pointer, and wherein the kernel mode operates independently of user code operating on at least one of the first processing unit and the second processing unit.

'361 Patent: 1. ... including ensuring consistent information associated with the managed pointers is automatically copied, in a kernel mode, directly from the first portion of memory to a second portion of memory associated with a second one of the plurality of processors based upon initiation of an access to the managed pointers from the second one of the plurality of processors.

Instant Application: 12. A system comprising: a first processing unit including a first memory location; a second processing unit including a second memory location; and a non-transitory computer-readable medium storing instructions executable by at least one of the first processing unit and the second processing unit to: allocate a pointer to a memory location of a first processing unit, the pointer accessible by a second processing unit; and copy, without a user request, data from the memory location of the first processing unit to a memory location of the second processing unit.

'361 Patent: 15. ... allocating a pointer to a first portion of memory associated with a first processor, wherein the pointer is also utilized by a second processor; and managing accesses to the pointer automatically, including making sure appropriate consistent information associated with the pointer is copied directly to a second portion of physical memory associated with the second processor, wherein the copying is automatically done, in a kernel mode, based on attempts to access the information by the second processor.

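Instant Application: 13. The system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to: enable a kernel mode that supports the copying of the data without the user request in response to an attempted access to the pointer.
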
Instant Application: ... map virtual address spaces with physical memories of the first processing unit and of the second processing unit; enable the virtual address spaces as the memory location of the first processing unit and the memory location of the second processing unit; ... unmap a virtual address space mapped to the second processing unit.

'361 Patent: 6. The method of claim 5 wherein when the CPU attempts to access the pointer, space in the central processing unit physical addresses (CPU PA) is allocated, the portion of the GPU PA is automatically copied to the CPU PA, and an address in the CPU VA is mapped to the allocated CPU PA.

'361 Patent: 9. The system of claim 8 wherein the accesses associated with the pointer are automatically managed back and forth between the first processor and the second processor according to which processor is accessing the pointer.

'361 Patent: 2. The method of claim 1 wherein establishing space for managed pointers comprises reserving a region from the first processor's virtual address space and reserving a region from the second processor's virtual address space, wherein the regions are reserved for allocations of the managed pointers.

Instant Application: 16. The system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to: create a second managed pointer in a reserved virtual address space of the second processing unit in response to an attempted access to the pointer; and allocate the second managed pointer to the memory location of the second processing unit.

'361 Patent: 1. establishing space for managed pointers across a plurality of memories, including allocating one of the managed pointers with a first portion of memory associated with a first one of a plurality of processors, ... directly from the first portion of memory to a second portion of memory associated with a second one of the plurality of processors based upon initiation of an access to the managed pointers from the second one of the plurality of processors.

Instant Application: 18. The system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to: use a call to a managed memory location to perform the allocating of the pointer to the memory location of the first processing unit.

'361 Patent: 15. allocating a pointer to a first portion of memory associated with a first processor, wherein the pointer is also utilized by a second processor ...

Instant Application: 19. The system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to: ... copy, in the kernel mode, the altered data from the memory location of the second processing unit to the memory location of the first processing unit in response to the received request.

'361 Patent: 12. The system of claim 8 wherein movement or copying of information between processors is automated and transparent to the user utilizing a single managed pointer without having to be concerned about concurrency or coherency of data between the different processors or memories.

Instant Application: ... create, using the second processing unit, the altered data based at least in part on the data.

'361 Patent: 9. The system of claim 8 wherein the accesses associated with the pointer are automatically managed back and forth between the first processor and the second processor according to which processor is accessing the pointer.

'361 Patent: 15. allocating a pointer to a first portion of memory associated with a first processor, wherein the pointer is also utilized by a second processor; and ... wherein the copying is automatically done, in a kernel mode, based on attempts to access the information by the second processor.

Instant Application: 22. A computing device, comprising: a first processing unit with a first memory location and having an allocated pointer that is accessible by a second processing unit having a second memory location; and a driver to copy the data from the first memory location to the second memory ...

'361 Patent: 15. allocating a pointer to a first portion of memory associated with a first processor, wherein the pointer is also utilized by a second processor; and managing accesses to the pointer automatically, including making sure appropriate consistent information ...

Instant Application: ... a kernel mode that supports the copying of the data without the user request in response to an attempted access to the pointer.

'361 Patent: 15. ... wherein the copying is automatically done, in a kernel mode, based on attempts to access the information by the second processor.

Claim 2 of the instant application is drawn to a method comprising: allocating a pointer to a memory location of a first processing unit, the pointer accessible by a second processing unit; and copying, without a user request, data from the memory location of the first processing unit to a memory location of the second processing unit.
Although the wording of claim 1 of the '361 patent is not identical to that of claim 2 of the instant application, there is no significant difference in scope between the two claims.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-23 are rejected under 35 U.S.C. 103 as being unpatentable over Kaminski et al. (US 20110161619 A1) in view of Chen et al. (US 20100118041 A1), and further in view of Ganapathy et al. (US 20050125572 A1).
Regarding claim 2, Kaminski teaches a method (See Kaminski: Fig. 2, and [0016], "Systems and methods are provided that can allow for an accelerator device to share physical memory of a computer system that is managed by and operates under control of an operating system") comprising:
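allocating a pointer to a memory location of a first processing unit, the pointer accessible by a second processing unit (See Kaminski: Fig. 2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ... N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250"); and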
copying, without a user request, data from the memory location of the first processing unit to a memory location of the second processing unit.
However, Kaminski does not explicitly disclose copying, without a user request, data from the memory location of the first processing unit to a memory location of the second processing unit.
Chen teaches copying, without a user request, data from the memory location of the first processing unit to a memory location of the second processing unit (See Chen: Fig. 4, and [0041], "This problem may be solved by leveraging the PCI aperture in a novel way. FIG. 4 is a flow chart for one embodiment of the shared memory model that leverages the PCI aperture. A sequence 400 may be implemented in firmware, software, or hardware. During initialization, a portion of the PCI aperture space may be mapped into the user space of the application and instantiated with a task queue, a message queue, and copy buffers (block 410). When there is a need to copy pages (block 420), for example from the CPU to GPU, the runtime copies the pages into the PCI aperture copy buffers and tags the buffers with the virtual address and the process identifier (block 430). On the GPU side, the daemon thread copies the contents of the buffers into its address space by using the virtual address tag (block 440). Thus the copy may be performed in a 2 step process--the CPU copies from its address 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kaminski to copy, without a user request, data from the memory location of the first processing unit to a memory location of the second processing unit, as taught by Chen, in order to make the application stack more efficient (See Chen: [0022], "Enable user-level communication between the CPU and GPU thus making the application stack much more efficient"). Kaminski teaches a system that shares memory through page table entries, while Chen teaches a system that shares memory through a pointer for greater efficiency. Replacing Kaminski's page-table-based sharing with Chen's pointer-based sharing is a simple substitution of one known element for another to obtain predictable results.
However, Kaminski as modified by Chen does not explicitly disclose that the copying is performed without a user request.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kaminski to perform the copying without a user request, as taught by Ganapathy, in order to simplify arbitration (See Ganapathy: [0051], "The preferred embodiments of the present invention are thus described. As those of ordinary skill will recognize, the present invention has many advantages. One advantage of the present invention is that the bandwidth to the global buffer memory is increased due to the wide system bus, the remapping of serial data, and compression/decompression of data on the fly. Another advantage of the present invention is that arbitration is simplified by using common 
Regarding claim 3, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 2 as outlined above. Further, Kaminski and Ganapathy teach the method of claim 2, further comprising: enabling a kernel mode (See Kaminski: Fig. 2, and [0071], "The computer system 210 includes an operating system kernel 220, a plurality of CPU processor core devices 230-1 ... N, a kernel mode device driver (KMDD) 260 (referred to below simply as a device driver 260 or driver 260) for the various accelerator devices 290, and a shared physical memory 250 (e.g., RAM) that operates in accordance with virtual memory (VM) address translation techniques (e.g., translating virtual memory addresses used by the CPU (and its cores) to memory addresses at the memory 250). As used herein, the term "kernel mode device driver" refers to a driver that runs in protected or privileged mode, and has full, unrestricted access to the system memory, devices, processes and other protected subsystems of the OS. Operation of the computer system's operating system kernel 220, the device driver 260 and the accelerator devices 290 will be described below with reference to FIGS. 4, 6, 8 and 9") that supports the copying of the data without the user request in response to an attempted access 
Regarding claim 4, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 2 as outlined above. Further, Kaminski and Chen teach the method of claim 2, further comprising:
mapping virtual address spaces with physical memories of the first processing unit and of the second processing unit (See Kaminski: Figs. 1-2, and [0072], “When a process requests access to its virtual memory, it is the responsibility of the OS to map the virtual memory address provided by the process to the physical memory address where that virtual memory is mapped to.  The OS stores its mappings of virtual memory addresses to physical memory addresses in a page table”);

causing an operation to be launched on the first processing unit (See Chen: Figs. 1-2, and [0029], “In one embodiment, shared data may be privatized by copying from shared space to the private space.  Non-pointer containing data structures may be privatized simply by copying the memory contents.  While copying pointer containing data structures, pointers into shared data must be converted to pointers into private data”; and [0030], “Private data may be globalized by copying from the private space to the shared space and made visible to other computations.  Non-pointer containing data structures may be globalized simply by copying the memory contents.  While copying pointer containing data structures, pointers into private data must be converted as pointers into shared data (converse of the privatization example)”);
flushing at least a page that was migrated to the second processing unit back to the first processing unit (See Kaminski: Figs. 1-2, and [0077], “The driver 260 also includes an independent memory management unit 280 (i.e., that is independent of the main kernel MMU 
unmapping a virtual address space mapped to the second processing unit (See Kaminski: Figs. 1-2, and [0110], “In response to the TLB flush, each accelerator device must determine if its TLB table contains any address translation entries corresponding to the page table entries that were invalidated.  If so, the affected accelerator devices must delete such entries from their respective TLB tables.  Finally, the potentially affected accelerator devices must signal the driver that they have finished handling the TLB flush operation”).
Regarding claim 5, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 2 as outlined above. Further, Chen teaches the method of claim 2, wherein the pointer is a first managed pointer in a reserved virtual address space of the first processing unit (See Chen: Fig. 1, and [0023], "The system may provide a special malloc function that allocates data in this space 130. Static variables may be annotated with a type quantifier to have them allocated in the shared window 130. However, unlike PGAS languages there is no notion of 
Regarding claim 6, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 2 as outlined above. Further, Kaminski and Chen teach the method of claim 2, further comprising:
creating a second managed pointer in a reserved virtual address space of the second processing unit in response to an attempted access to the pointer (See Kaminski: Fig. 2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ... N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250"); and
allocating the second managed pointer to the memory location of the second processing unit (See Chen: Fig. 1, and [0029]-[0030], "In one embodiment, shared data may be privatized 
Regarding claim 7, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 2 as outlined above. Further, Kaminski and Chen teach the method of claim 2, further comprising: triggering a page fault in response to an attempted access to the pointer (See Kaminski: [0018], "The driver can monitor for page fault notifications generated by the accelerator device and handle any page fault notifications received from the accelerator device. When a request for access to the physical memory causes the accelerator device to generate a page fault notification, the driver can determine a memory address space and virtual memory location of a process that contains a virtual memory address specified in the request for access to the physical memory. The driver can then determine whether the request for access to physical memory is a valid request. If the request is determined to be valid, the driver "pins" a limited amount of memory pages of the physical memory for use by the accelerator device to prevent the process from releasing limited amount of memory pages of the physical memory. To update the non-shared page table for the memory pages being used by the accelerator 
handling the page fault using a driver in a kernel mode that supports the copying of the data without the user request in response to an attempted access to the pointer (See Chen: Fig. 6, and [0044], “FIG. 6 is a flow chart for one embodiment of the shared memory model in operation.  A sequence 500 may be implemented in firmware, software, or hardware.  In one embodiment, a sequence 600 may be implemented in firmware, software, or hardware.  When the GPU performs an acquire operation (block 610), the corresponding pages may be set to no-access on the GPU (620).  At a subsequent read operation the page fault handler on the GPU copies the page from the CPU (block 640) if the page has been updated and released by the CPU since the last GPU acquire (block 630).  The directory and private version numbers may be used to determine this.  The page is then set to read-only (block 650). At a subsequent write operation the page fault handler creates the backup copy of the page, marks the page as read-write and increments the local version number of the page (block 660).  At a release point, a diff is performed with the backup copy of the page and the changes transmitted to the home location, while incrementing the directory version number (block 670). The diff operation computes the differences in the memory locations between the two pages (i.e. the page and its 
Regarding claim 8, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 2 as outlined above. Further, Chen teaches the method of claim 2, further comprising:
using a call to a managed memory location to perform the allocating of the pointer to the memory location of the first processing unit (See Chen: Fig. 1, and [0023], "The system may provide a special malloc function that allocates data in this space 130. Static variables may be annotated with a type quantifier to have them allocated in the shared window 130. However, unlike PGAS languages there is no notion of affinity in the shared window. This is because data in the shared space 130 migrates between the CPU and GPU caches as it gets used by each processor.  Also unlike PGAS implementations, the representation of pointers does not change between the shared and private spaces. The remaining virtual address space is private to the CPU 110 and GPU 120. By default data gets allocated in this space 130, and is not visible to the other side. This partitioned address space approach may cut down on the amount of memory that needs to be kept coherent and enables a more efficient implementation for discrete devices").
Regarding claim 9, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 2 as outlined above. Further, Chen teaches the method of claim 2, further comprising:
copying, in the kernel mode, the altered data from the memory location of the second processing unit to the memory location of the first processing unit in response to the received request (See Chen: Fig. 4, and [0041], “This problem may be solved by leveraging the PCI aperture in a novel way.  FIG. 4 is a flow chart for one embodiment of the shared memory model that leverages the PCI aperture.  A sequence 400 may be implemented in firmware, software, or hardware.  During initialization, a portion of the PCI aperture space may be mapped into the user space of the application and instantiated with a task queue, a message queue, and copy buffers (block 410).  When there is a need to copy pages (block 420), for example from the CPU to GPU, the runtime copies the pages into the PCI aperture copy buffers and tags the buffers with the virtual address and the process identifier (block 430).  On the GPU side, the daemon thread copies the contents of the buffers into its address space by using the virtual address tag (block 440).  Thus the copy may be performed in a 2 step process--the CPU 
Regarding claim 10, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 9 as outlined above. Further, Chen teaches the method of claim 9, further comprising: creating, using the second processing unit, the altered data based at least in part on the data (See Chen: Fig. 5, and [0043], "The metadata says whether the CPU or GPU holds the golden copy of a page (home for the page), contains a version number that tracks the number of updates to the page, mutexes that are acquired before updating the page, and miscellaneous metadata. The directory may be indexed by the virtual address of a page (block 520). Both the CPU and the GPU runtime systems maintain a similar private structure that contains the local access permissions for the pages, and the local version numbers of the pages". Updating the content of a page by the GPU corresponds to creating the altered data).
Regarding claim 11, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 2 as outlined above. Further, Kaminski and Ganapathy teach the method of claim 
Regarding claim 12, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 2 as outlined above. Further, Kaminski, Chen, and Ganapathy teach a system (See Kaminski: Fig. 2, and [0016], "Systems and methods are provided that can allow for an accelerator device to share physical memory of a computer system that is managed by and operates under control of an operating system") comprising:
a first processing unit including a first memory location (See Kaminski: Fig. 1, and [0063], “Components of computer 110 may include, but are not limited to, one or more processing units 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120”); 
a second processing unit including a second memory location (See Kaminski: Fig. 1, and [0067], “In this regard, GPUs 184 generally include on-chip memory storage, such as register storage and GPUs 184 communicate with a video memory 186”); and 
a non-transitory computer-readable medium storing instructions executable by at least one of the first processing unit and the second processing unit to (See XXX: Fig. 1, and [0064], “Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.  Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or 
allocate a pointer to a memory location of a first processing unit, the pointer accessible by a second processing unit (See Kaminski: Fig. 2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1. .. N). Each of the OS page tables 240-1. .. N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250"); and
copy, without a user request (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets up a DMA with one of the core processors 202A-202N in order to process a frame or block of data for a given communication channel. It communicates with the one core processor the starting address of the descriptor list in the global buffer memory. The one core processor reads through each line in the descriptor list and performs the DMA of the data from the global buffer memory into the core processor's local memory". The data stored in the global buffer memory is automatically copied to the local 
Regarding claim 13, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 12 as outlined above. Further, Kaminski and Ganapathy teach that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
enable a kernel mode (See Kaminski: Fig. 2, and [0071], "The computer system 210 includes an operating system kernel 220, a plurality of CPU processor core devices 230-1 ... N, a kernel mode device driver (KMDD) 260 (referred to below simply as a device driver 260 or driver 260) for the various accelerator devices 290, and a shared physical memory 250 (e.g., RAM) that operates in accordance with virtual memory (VM) address translation techniques (e.g., translating virtual memory addresses used by the CPU (and its cores) to memory addresses at the memory 250). As used herein, the term "kernel mode device driver" refers to a driver that runs in protected or privileged mode, and has full, unrestricted access to the system memory, devices, processes and other protected subsystems of the OS. Operation of the computer system's operating system kernel 220, the device driver 260 and the accelerator devices 290 will be described below with reference to FIGS. 4, 6, 8 and 9") that supports the copying of the data without the user request in response to an attempted access to the pointer (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets 
Regarding claim 14, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 12 as outlined above. Further, Kaminski and Chen teach that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
map virtual address spaces with physical memories of the first processing unit and of the second processing unit (See Kaminski: Figs. 1-2, and [0072], “When a process requests access to its virtual memory, it is the responsibility of the OS to map the virtual memory address provided by the process to the physical memory address where that virtual memory is mapped to.  The OS stores its mappings of virtual memory addresses to physical memory addresses in a page table”);
enable the virtual address spaces as the memory location of the first processing unit and the memory location of the second processing unit (See Kaminski: Figs. 1-2, and [0073], “Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor 
cause an operation to be launched on the first processing unit (See Chen: Figs. 1-2, and [0029], “In one embodiment, shared data may be privatized by copying from shared space to the private space.  Non-pointer containing data structures may be privatized simply by copying the memory contents.  While copying pointer containing data structures, pointers into shared data must be converted to pointers into private data”; and [0030], “Private data may be globalized by copying from the private space to the shared space and made visible to other computations.  Non-pointer containing data structures may be globalized simply by copying the memory contents.  While copying pointer containing data structures, pointers into private data must be converted as pointers into shared data (converse of the privatization example)”); 
flush at least a page that was migrated to the second processing unit back to the first processing unit (See Kaminski: Figs. 1-2, and [0077], “The driver 260 also includes an independent memory management unit 280 (i.e., that is independent of the main kernel MMU 225 of the main OS kernel 220).  The primary role of driver 260 is to handle the page faults (when the accelerator device 290 tries to access virtual memory area that is not currently in physical memory) and page table related tasks.  The MMU 280 includes a process termination detection module 284 that detects when the process terminates (e.g., closes its last open 
unmap a virtual address space mapped to the second processing unit (See Kaminski: Figs. 1-2, and [0110], “In response to the TLB flush, each accelerator device must determine if its TLB table contains any address translation entries corresponding to the page table entries that were invalidated.  If so, the affected accelerator devices must delete such entries from their respective TLB tables.  Finally, the potentially affected accelerator devices must signal the driver that they have finished handling the TLB flush operation”).
Regarding claim 15, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 12 as outlined above. Further, Chen teaches that the system of claim 12, wherein the pointer is a first managed pointer in a reserved virtual address space of the first processing unit (See Chen: Fig. 1, and [0023], "The system may provide a special malloc function that allocates data in this space 130. Static variables may be annotated with a type quantifier to have them allocated in the shared window 130. However, unlike PGAS languages there is no notion of affinity in the shared window. This is because data in the shared space 130 migrates between the CPU and GPU caches as it gets used by each processor. Also unlike PGAS implementations, the representation of pointers does not change between the shared and private spaces. The remaining virtual address space is private to the CPU 110 and GPU 120. By default data gets 
Regarding claim 16, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 12 as outlined above. Further, Kaminski and Chen teach that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
create a second managed pointer in a reserved virtual address space of the second processing unit in response to an attempted access to the pointer (See Kaminski: Fig. 2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ... N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250"); and
allocate the second managed pointer to the memory location of the second processing unit (See Chen: Fig. 1, and [0029]-[0030], "In one embodiment, shared data may be privatized by copying from shared space to the private space. Non-pointer containing data structures may be privatized simply by copying the memory contents. While copying pointer containing data structures, pointers into shared data must be converted to pointers into 
Regarding claim 17, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 12 as outlined above. Further, Kaminski and Chen teach that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
trigger a page fault in response to an attempted access to the pointer (See Kaminski: [0018], "The driver can monitor for page fault notifications generated by the accelerator device and handle any page fault notifications received from the accelerator device. When a request for access to the physical memory causes the accelerator device to generate a page fault notification, the driver can determine a memory address space and virtual memory location of a process that contains a virtual memory address specified in the request for access to the physical memory. The driver can then determine whether the request for access to physical memory is a valid request.  If the request is determined to be valid, the driver "pins" a limited amount of memory pages of the physical memory for use by the accelerator device to prevent the process from releasing limited amount of memory pages of the physical memory. To update the non-shared page table for the memory pages being used by the accelerator device, the driver can add new page table entries to the non-shared page table or edit existing page table 
handle the page fault using a driver in a kernel mode that supports the copying of the data without the user request in response to an attempted access to the pointer (See Chen: Fig. 6, and [0044], “FIG. 6 is a flow chart for one embodiment of the shared memory model in operation.  A sequence 500 may be implemented in firmware, software, or hardware.  In one embodiment, a sequence 600 may be implemented in firmware, software, or hardware.  When the GPU performs an acquire operation (block 610), the corresponding pages may be set to no-access on the GPU (620).  At a subsequent read operation the page fault handler on the GPU copies the page from the CPU (block 640) if the page has been updated and released by the CPU since the last GPU acquire (block 630).  The directory and private version numbers may be used to determine this.  The page is then set to read-only (block 650). At a subsequent write operation the page fault handler creates the backup copy of the page, marks the page as read-write and increments the local version number of the page (block 660).  At a release point, a diff is performed with the backup copy of the page and the changes transmitted to the home location, while incrementing the directory version number (block 670). The diff operation computes the differences in the memory locations between the two pages (i.e. the page and its backup) to find out the changes that have been made.  The CPU operations are done in a 
Regarding claim 18, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 12 as outlined above. Further, Chen teaches that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
use a call to a managed memory location to perform the allocating of the pointer to the memory location of the first processing unit (See Chen: Fig. 1, and [0023], "The system may provide a special malloc function that allocates data in this space 130. Static variables may be annotated with a type quantifier to have them allocated in the shared window 130. However, unlike PGAS languages there is no notion of affinity in the shared window. This is because data in the shared space 130 migrates between the CPU and GPU caches as it gets used by each processor.  Also unlike PGAS implementations, the representation of pointers does not change between the shared and private spaces. The remaining virtual address space is private to the CPU 110 and GPU 120. By default data gets allocated in this space 130, and is not visible to the other side. This partitioned address space approach may cut down on the amount of memory that needs to be kept coherent and enables a more efficient implementation for discrete devices").
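Regarding claim 19, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 12 as outlined above. Further, Chen teaches that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to: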
receive a request to perform, on the first processing unit, an operation using altered data in the memory location of the second processing unit (See Chen: Figs. 1-2, and [0027], “Embodiments of the invention may provide these ownership rights to leverage common CPU-GPU usage models.  For example, the CPU first accesses some data (e.g. initializing a data structure), and then hands it over to the GPU (e.g. computing on the data structure in a data parallel manner), and then the CPU analyzes the results of the computation and so on.  The ownership rights allow an application to inform the system of this temporal locality and optimize the coherence implementation.  Note that these ownership rights are optimization hints and it is legal for the system to ignore these hints”); and
copy, in the kernel mode, the altered data from the memory location of the second processing unit to the memory location of the first processing unit in response to the received request (See Chen: Fig. 4, and [0041], “This problem may be solved by leveraging the PCI aperture in a novel way.  FIG. 4 is a flow chart for one embodiment of the shared memory model that leverages the PCI aperture.  A sequence 400 may be implemented in firmware, software, or hardware.  During initialization, a portion of the PCI aperture space may be mapped into the user space of the application and instantiated with a task queue, a message queue, and copy buffers (block 410).  When there is a need to copy pages (block 420), for example from the CPU to GPU, the runtime copies the pages into the PCI aperture copy buffers and tags the buffers with the virtual address and the process identifier (block 430).  On the GPU 
Regarding claim 20, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 19 as outlined above. Further, Chen teaches that the system of claim 19, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
create, using the second processing unit, the altered data based at least in part on the data (See Chen: Fig. 5, and [0043], “The metadata says whether the CPU or GPU holds the golden copy of a page (home for the page), contains a version number that tracks the number of updates to the page, mutexes that are acquired before updating the page, and miscellaneous metadata.  The directory may be indexed by the virtual address of a page (block 520).  Both the CPU and the GPU runtime systems maintain a similar private structure that contains the local 
Regarding claim 21, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 12 as outlined above. Further, Kaminski and Ganapathy teach that the system of claim 12, wherein a kernel mode supports the copying of the data without the user request in response to an attempted access to the pointer (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets up a DMA with one of the core processors 202A-202N in order to process a frame or block of data for a given communication channel. It communicates with the one core processor the starting address of the descriptor list in the global buffer memory. The one core processor reads through each line in the descriptor list and performs the DMA of the data from the global buffer memory into the core processor's local memory". The data stored in the global buffer memory is automatically copied to the local memory of any of core processors 1 to N, which may be mapped to the claimed copying without a user request), and wherein the kernel mode operates independently of user code operating on at least one of the first processing unit and the second processing unit (See Kaminski: [0056], “In accordance with some of the disclosed embodiments, a kernel mode device driver creates and maintains a set of page tables to be used by the accelerator device to provide a consistently 
Regarding claim 22, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 2 as outlined above. Further, Kaminski, Chen, and Ganapathy teach that a computing device (See Kaminski: Fig. 2, and [0016], "Systems and methods are provided that can allow for an accelerator device to share physical memory of a computer system that is managed by and operates under control of an operating system"), comprising:
a first processing unit with a first memory location and having an allocated pointer that is accessible by a second processing unit having a second memory location (See Kaminski: Fig. 2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ... N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250"); and
a driver to copy the data from the first memory location to the second memory location (See Chen: Fig. 4, and [0041], "This problem may be solved by leveraging the PCI aperture in a novel way. FIG. 4 is a flow chart for one embodiment of the shared memory model that 
Regarding claim 23, Kaminski, Chen, and Ganapathy teach all the features with respect to claim 22 as outlined above. Further, Kaminski and Ganapathy teach that the computing device of claim 22, further comprising:
a kernel mode (See Kaminski: Fig. 2, and [0071], "The computer system 210 includes an operating system kernel 220, a plurality of CPU processor core devices 230-1 ... N, a kernel mode device driver (KMDD) 260 (referred to below simply as a device driver 260 or driver 260) for the various accelerator devices 290, and a shared physical memory 250 (e.g., RAM) that operates in accordance with virtual memory (VM) address translation techniques (e.g., translating virtual memory addresses used by the CPU (and its cores) to memory addresses at the memory 250). As used herein, the term "kernel mode device driver" refers to a driver that runs in protected or privileged mode, and has full, unrestricted access to the system memory, devices, processes and other protected subsystems of the OS. Operation of the computer system's operating system kernel 220, the device driver 260 and the accelerator devices 290 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382.  The examiner can normally be reached on Monday - Friday 8:00-5:00.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached on 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/GORDON G LIU/             Primary Examiner, Art Unit 2612