DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 2-23 are pending in this Office action.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 4/4/2022 has been entered.


Response to Amendment
Applicant's arguments filed on April 4, 2022, have been fully considered.
Applicant argues that independent claims 2, 12, and 22 have been amended to add the new limitation of "copying, without a user request, data indicated by the pointer from the memory location of the first processing unit to a memory location of the second processing unit" (emphasis added). Applicant argues that the prior art of record does not disclose or suggest the features recited in independent claim 2.
The examiner replies that the newly added limitations may overcome the current rejection. However, a new reference has been found: “Transfer Time Reduction of Data Transfers Between CPU and GPU” (by Julius Sandgren, UPPSALA Universitet, July 2012, hereinafter referred to as Sandgren), which teaches copying, without a user request, data indicated by the pointer from the memory location of the first processing unit to a memory location of the second processing unit (See Sandgren: Section 3.2.1 “Upload”, Page 20, "1. Generate a PBO on the GPU using glGenBuffers. 2. Bind PBO to unpack buffer target. 3. Allocate buffer space on GPU according to data size using glBufferData. 4. Map PBO to CPU memory denying GPU access for now. glMapBuffer returns a pointer to a place in GPU memory where the PBO resides. 5. Copy data from CPU to GPU using pointer from glMapBuffer. 6. Unmap PBO (glUnmapBuffer) to allow GPU full access of the PBO again. 7. Transfer data from buffer to a texture target. 8. Unbind the PBO to allow for normal operation again”). Note that data are copied back and forth between the CPU and the GPU through a pointer.

Double Patenting
Claims 2-23 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 10,546,361. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims read on one another. The detailed mapping is omitted because applicant is willing to file a Terminal Disclaimer once the application is in condition for allowance.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-23 are rejected under 35 U.S.C. 103 as being unpatentable over Kaminski et al. (US 20110161619 A1) in view of Chen et al. (US 20100118041 A1), further in view of Ganapathy et al. (US 20050125572 A1) and “Transfer Time Reduction of Data Transfers Between CPU and GPU” (by Julius Sandgren, UPPSALA Universitet, July 2012, hereinafter referred to as Sandgren).
Regarding claim 2, Kaminski teaches a method (See Kaminski: Fig. 2, and [0016], "Systems and methods are provided that can allow for an accelerator device to share physical memory of a computer system that is managed by and operates under control of an operating system") comprising:
allocating a pointer to a memory location of a first processing unit, the pointer accessible by a second processing unit (See Kaminski: Fig. 2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ... N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250"); and
copying, without a user request, data indicated by the pointer from the memory location of the first processing unit to a memory location of the second processing unit.
However, Kaminski does not explicitly disclose copying, without a user request, data indicated by the pointer from the memory location of the first processing unit to a memory location of the second processing unit.
However, Chen teaches copying data from the memory location of the first processing unit to a memory location of the second processing unit (See Chen: Fig. 4, and [0041], "This problem may be solved by leveraging the PCI aperture in a novel way. FIG. 4 is a flow chart for one embodiment of the shared memory model that leverages the PCI aperture. A sequence 400 may be implemented in firmware, software, or hardware. During initialization, a portion of the PCI aperture space may be mapped into the user space of the application and instantiated with a task queue, a message queue, and copy buffers (block 410). When there is a need to copy pages (block 420), for example from the CPU to GPU, the runtime copies the pages into the PCI aperture copy buffers and tags the buffers with the virtual address and the process identifier (block 430). On the GPU side, the daemon thread copies the contents of the buffers into its address space by using the virtual address tag (block 440). Thus the copy may be performed in a 2 step process--the CPU copies from its address space into a common buffer (PCI aperture) that both CPU and GPU may access, while the GPU picks up the pages from the common buffer into its address space. GPU-CPU copies are done in a similar way. Since the aperture is pinned memory, the contents of the aperture are not lost if the CPU or GPU process gets context switched out. This allows the two processors to execute asynchronously which may be critical since the two processors may have different operating systems and hence the context switches may not be synchronized. Furthermore, the aperture space may be mapped into the user space of the applications thus enabling user level CPU-GPU communication. This makes the application stack vastly more efficient than going through the OS driver stack").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kaminski to copy data from the memory location of the first processing unit to a memory location of the second processing unit as taught by Chen in order to make the application stack more efficient (See Chen: [0022], "Enable user-level communication between the CPU and GPU thus making the application stack much more efficient"). Kaminski teaches a system that shares memory through page table entries, while Chen teaches a system that shares memory through a pointer to make it more efficient. Therefore, it would have been obvious to one of ordinary skill in the art to modify Kaminski by Chen. The motivation to modify Kaminski by Chen is "Simple substitution of one known element for another to obtain predictable results".
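For illustration only, the two-step copy quoted from Chen [0041] can be sketched as follows. This is a minimal model under stated assumptions, not Chen's implementation: plain Python dictionaries stand in for the CPU and GPU address spaces, a list stands in for the pinned PCI aperture copy buffers, and the names `Aperture`, `cpu_stage_pages`, and `gpu_drain_pages` are hypothetical.

```python
class Aperture:
    """Common buffer both sides can access (stands in for the PCI aperture)."""

    def __init__(self):
        # Each entry is (virtual_address, process_id, page_bytes).
        self.copy_buffers = []


def cpu_stage_pages(cpu_memory, addresses, pid, aperture):
    """Step 1: the CPU-side runtime copies pages into the aperture and tags
    each buffer with the virtual address and the process identifier."""
    for va in addresses:
        aperture.copy_buffers.append((va, pid, bytes(cpu_memory[va])))


def gpu_drain_pages(gpu_memory, aperture):
    """Step 2: the GPU-side daemon copies each buffer into its own address
    space, using the virtual address tag to decide where the page goes."""
    while aperture.copy_buffers:
        va, pid, payload = aperture.copy_buffers.pop(0)
        gpu_memory[(pid, va)] = bytearray(payload)
```

In this sketch neither side touches the other's memory directly; the only shared object is the aperture, which mirrors the asynchronous hand-off described in the quoted paragraph.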
However, Kaminski, as modified by Chen, does not explicitly disclose copying, without a user request, data indicated by the pointer.
However, Ganapathy teaches copying without a user request (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets up a DMA with one of the core processors 202A-202N in order to process a frame or block of data for a given communication channel. It communicates with the one core processor the starting address of the descriptor list in the global buffer memory. The one core processor reads through each line in the descriptor list and performs the DMA of the data from the global buffer memory into the core processor's local memory". The data stored in the global buffer memory is automatically copied to the local memory of any core processor 1 to N, which may be mapped to copying without a user request).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kaminski to copy without a user request as taught by Ganapathy in order to improve the arbitration (See Ganapathy: [0051], "The preferred embodiments of the present invention are thus described. As those of ordinary skill will recognize, the present invention has many advantages. One advantage of the present invention is that the bandwidth to the global buffer memory is increased due to the wide system bus, the remapping of serial data, and compression/decompression of data on the fly. Another advantage of the present invention is that arbitration is simplified by using common standards for bus arbitration and is improved due to the distribution of direct memory access controllers"). Kaminski teaches a system that shares memory through page table entries, while Ganapathy teaches a system that uses a DMA (direct memory access) controller to access global shared memory and copy data without a user request to local core processor memory. Therefore, it would have been obvious to one of ordinary skill in the art to modify Kaminski by Ganapathy to copy data without user requests. The motivation to modify Kaminski by Ganapathy is "Use of known technique to improve similar devices (methods, or products) in the same way".
However, Kaminski, as modified by Chen and Ganapathy, does not explicitly disclose copying data indicated by the pointer.
However, Sandgren teaches copying data indicated by the pointer (See Sandgren: Section 3.2.1 “Upload”, Page 20, "1. Generate a PBO on the GPU using glGenBuffers. 2. Bind PBO to unpack buffer target. 3. Allocate buffer space on GPU according to data size using glBufferData. 4. Map PBO to CPU memory denying GPU access for now. glMapBuffer returns a pointer to a place in GPU memory where the PBO resides. 5. Copy data from CPU to GPU using pointer from glMapBuffer. 6. Unmap PBO (glUnmapBuffer) to allow GPU full access of the PBO again. 7. Transfer data from buffer to a texture target. 8. Unbind the PBO to allow for normal operation again”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kaminski to copy data indicated by the pointer as taught by Sandgren in order to reduce the data transfer time (See Sandgren: Section 7 “Conclusion”, Page 49, "The average upload time for the standard method across all computers in the test set was ~8 ms, and the download time was ~21 ms rendering download impossible in the given time constraint (40 ms including video-processing). When using the decision algorithms to determine the fastest method the times can be cut down to ~2 ms for upload and ~2 ms for download. This is an overall speedup of 7,25 and most importantly enables download to be performed in Imint's products without spending the entire time budget"). Kaminski teaches a system that shares memory through page table entries, while Sandgren teaches a system that may transfer data between the CPU and the GPU through a pointer to minimize the data transfer time. Therefore, it would have been obvious to one of ordinary skill in the art to modify Kaminski by Sandgren to copy data from the CPU to the GPU through a pointer. The motivation to modify Kaminski by Sandgren is "Use of known technique to improve similar devices (methods, or products) in the same way".
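For illustration only, the descriptor-driven transfer quoted from Ganapathy [0021] can be sketched as follows. This is a simplified model, not Ganapathy's hardware: the quoted passage states only that a descriptor includes a starting address and a number of bytes, so the two-field `DmaDescriptor` type, the function name `run_descriptor_list`, and the use of a byte string for the global buffer memory are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class DmaDescriptor:
    start_address: int  # offset of the data in global buffer memory
    byte_count: int     # number of bytes to transfer


def run_descriptor_list(global_memory, descriptors):
    """Read through each line of the descriptor list and copy the described
    region of global buffer memory into a new local-memory buffer."""
    local_memory = bytearray()
    for d in descriptors:
        end = d.start_address + d.byte_count
        local_memory += global_memory[d.start_address:end]
    return local_memory
```

Once the descriptor list has been set up, the loop runs to completion with no further request from application code, which is the behavior the rejection maps to copying without a user request.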
Regarding claim 3, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 2 as outlined above. Further, Kaminski and Ganapathy teach that the method of claim 2, further comprising: enabling a kernel mode (See Kaminski: Fig. 2, and [0071], "The computer system 210 includes an operating system kernel 220, a plurality of CPU processor core devices 230-1 ... N, a kernel mode device driver (KMDD) 260 (referred to below simply as a device driver 260 or driver 260) for the various accelerator devices 290, and a shared physical memory 250 (e.g., RAM) that operates in accordance with virtual memory (VM) address translation techniques (e.g., translating virtual memory addresses used by the CPU (and its cores) to memory addresses at the memory 250). As used herein, the term "kernel mode device driver" refers to a driver that runs in protected or privileged mode, and has full, unrestricted access to the system memory, devices, processes and other protected subsystems of the OS. Operation of the computer system's operating system kernel 220, the device driver 260 and the accelerator devices 290 will be described below with reference to FIGS. 4, 6, 8 and 9") that supports the copying of the data without the user request in response to an attempted access to the pointer (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets up a DMA with one of the core processors 202A-202N in order to process a frame or block of data for a given communication channel. It communicates with the one core processor the starting address of the descriptor list in the global buffer memory. The one core processor reads through each line in the descriptor list and performs the DMA of the data from the global buffer memory into the core processor's local memory". The data stored in the global buffer memory is automatically copied to the local memory of any core processor 1 to N, which may be mapped to copying, without a user request).
Regarding claim 4, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 2 as outlined above. Further, Kaminski and Chen teach that the method of claim 2, further comprising:
mapping virtual address spaces with physical memories of the first processing unit and of the second processing unit (See Kaminski: Figs. 1-2, and [0072], "When a process requests access to its virtual memory, it is the responsibility of the OS to map the virtual memory address provided by the process to the physical memory address where that virtual memory is mapped to. The OS stores its mappings of virtual memory addresses to physical memory addresses in a page table");
enabling the virtual address spaces as the memory location of the first processing unit and the memory location of the second processing unit (See Kaminski: Figs. 1-2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ... N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250");
causing an operation to be launched on the first processing unit (See Chen: Figs. 1-2, and [0029], "In one embodiment, shared data may be privatized by copying from shared space to the private space. Non-pointer containing data structures may be privatized simply by copying the memory contents. While copying pointer containing data structures, pointers into shared data must be converted to pointers into private data"; and [0030], "Private data may be globalized by copying from the private space to the shared space and made visible to other computations. Non-pointer containing data structures may be globalized simply by copying the memory contents. While copying pointer containing data structures, pointers into private data must be converted as pointers into shared data (converse of the privatization example)"); 
flushing at least a page that was migrated to the second processing unit back to the first processing unit (See Kaminski: Figs. 1-2, and [0077], "The driver 260 also includes an independent memory management unit 280 (i.e., that is independent of the main kernel MMU 225 of the main OS kernel 220). The primary role of driver 260 is to handle the page faults (when the accelerator device 290 tries to access virtual memory area that is not currently in physical memory) and page table related tasks. The MMU 280 includes a process termination detection module 284 that detects when the process terminates (e.g., closes its last open handle), a page fault notification module 286 that receives page fault notifications and a page fault handler module 288 that handles the page fault notifications. These modules will be described in detail below. As will be described in detail below, the memory management unit 280 also issues translation lookaside buffer (TLB) flush indicators to appropriate ones of the accelerator devices 290"); and
unmapping a virtual address space mapped to the second processing unit (See Kaminski: Figs. 1-2, and [0110], "In response to the TLB flush, each accelerator device must determine if its TLB table contains any address translation entries corresponding to the page table entries that were invalidated. If so, the affected accelerator devices must delete such entries from their respective TLB tables. Finally, the potentially affected accelerator devices must signal the driver that they have finished handling the TLB flush operation").
Regarding claim 5, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 2 as outlined above. Further, Chen teaches that the method of claim 2, wherein the pointer is a first managed pointer in a reserved virtual address space of the first processing unit (See Chen: Fig. 1, and [0023], "The system may provide a special malloc function that allocates data in this space 130. Static variables may be annotated with a type quantifier to have them allocated in the shared window 130. However, unlike PGAS languages there is no notion of affinity in the shared window. This is because data in the shared space 130 migrates between the CPU and GPU caches as it gets used by each processor. Also unlike PGAS implementations, the representation of pointers does not change between the shared and private spaces. The remaining virtual address space is private to the CPU 110 and GPU 120. By default data gets allocated in this space 130, and is not visible to the other side. This partitioned address space approach may cut down on the amount of memory that needs to be kept coherent and enables a more efficient implementation for discrete devices").
Regarding claim 6, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 2 as outlined above. Further, Kaminski and Chen teach that the method of claim 2, further comprising:
creating a second managed pointer in a reserved virtual address space of the second processing unit in response to an attempted access to the pointer (See Kaminski: Fig. 2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ... N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250"); and
allocating the second managed pointer to the memory location of the second processing unit (See Chen: Fig. 1, and [0029]-[0030], "In one embodiment, shared data may be privatized by copying from shared space to the private space. Non-pointer containing data structures may be privatized simply by copying the memory contents. While copying pointer containing data structures, pointers into shared data must be converted to pointers into private data"; "Private data may be globalized by copying from the private space to the shared space and made visible to other computations. Non-pointer containing data structures may be globalized simply by copying the memory contents. While copying pointer containing data structures, pointers into private data must be converted as pointers into shared data (converse of the privatization example)").
Regarding claim 7, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 2 as outlined above. Further, Kaminski and Chen teach that the method of claim 2, further comprising: triggering a page fault in response to an attempted access to the pointer (See Kaminski: [0018], "The driver can monitor for page fault notifications generated by the accelerator device and handle any page fault notifications received from the accelerator device. When a request for access to the physical memory causes the accelerator device to generate a page fault notification, the driver can determine a memory address space and virtual memory location of a process that contains a virtual memory address specified in the request for access to the physical memory. The driver can then determine whether the request for access to physical memory is a valid request. If the request is determined to be valid, the driver "pins" a limited amount of memory pages of the physical memory for use by the accelerator device to prevent the process from releasing limited amount of memory pages of the physical memory. To update the non-shared page table for the memory pages being used by the accelerator device, the driver can add new page table entries to the non-shared page table or edit existing page table entries in the non-shared page table. When the shared page table is updated, the driver can notify the accelerator device that the page fault has been successfully handled and that the accelerator device is permitted to resume processing. When processing resumes the accelerator device can then use the updated page table entries from the non-shared page table to perform virtual address translation"); and
handling the page fault using a driver in a kernel mode that supports the copying of the data without the user request in response to an attempted access to the pointer (See Chen: Fig. 6, and [0044], "FIG. 6 is a flow chart for one embodiment of the shared memory model in operation. A sequence 500 may be implemented in firmware, software, or hardware. In one embodiment, a sequence 600 may be implemented in firmware, software, or hardware. When the GPU performs an acquire operation (block 610), the corresponding pages may be set to no-access on the GPU (620). At a subsequent read operation the page fault handler on the GPU copies the page from the CPU (block 640) if the page has been updated and released by the CPU since the last GPU acquire (block 630). The directory and private version numbers may be used to determine this. The page is then set to read-only (block 650). At a subsequent write operation the page fault handler creates the backup copy of the page, marks the page as read-write and increments the local version number of the page (block 660). At a release point, a diff is performed with the backup copy of the page and the changes transmitted to the home location, while incrementing the directory version number (block 670). The diff operation computes the differences in the memory locations between the two pages (i.e. the page and its backup) to find out the changes that have been made. The CPU operations are done in a symmetrical way. Thus, between acquire and release points the GPU and CPU operate out of their local memory and caches and communicate with each other only at the explicit synchronization points").
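For illustration only, the acquire, read, write, and release sequence quoted from Chen [0044] can be sketched as follows. This is a simplified model under stated assumptions, not Chen's implementation: the class names `Directory` and `GpuSide` are hypothetical, page permissions are plain strings rather than hardware page protections, and two behaviors the quoted passage does not address are assumed (a write to a no-access page first takes the read-fault path, and a released page returns to read-only).

```python
class Directory:
    """Shared metadata: the home (CPU) copy and a version number per page."""

    def __init__(self, pages):
        self.home = {p: bytearray(d) for p, d in pages.items()}
        self.version = {p: 0 for p in pages}


class GpuSide:
    """GPU-side private structure: local copies, permissions, versions."""

    def __init__(self, directory):
        self.dir = directory
        self.data = {p: bytearray(d) for p, d in directory.home.items()}
        self.version = dict(directory.version)
        self.perm = {p: "none" for p in directory.home}
        self.backup = {}

    def acquire(self, page):
        self.perm[page] = "none"  # block 620: page set to no-access

    def read(self, page, offset):
        if self.perm[page] == "none":  # read fault
            # Blocks 630/640: copy from the CPU only if the home copy is newer.
            if self.dir.version[page] > self.version[page]:
                self.data[page] = bytearray(self.dir.home[page])
                self.version[page] = self.dir.version[page]
            self.perm[page] = "ro"  # block 650: page set to read-only
        return self.data[page][offset]

    def write(self, page, offset, value):
        if self.perm[page] == "none":
            self.read(page, offset)  # assumption: refresh before writing
        if self.perm[page] == "ro":  # write fault, block 660
            self.backup[page] = bytearray(self.data[page])
            self.perm[page] = "rw"
            self.version[page] += 1
        self.data[page][offset] = value

    def release(self, page):
        """Block 670: diff against the backup and send changes to the home."""
        if page not in self.backup:
            return []
        old = self.backup.pop(page)
        new = self.data[page]
        diff = [(i, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]
        for i, b in diff:
            self.dir.home[page][i] = b
        self.dir.version[page] += 1
        self.perm[page] = "ro"  # assumption: next write takes a new backup
        return diff
```

In this sketch the copy in `read` is triggered by the access itself rather than by an explicit copy call in application code, which is the aspect of the quoted passage the rejection relies on.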
Regarding claim 8, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 2 as outlined above. Further, Chen teaches that the method of claim 2, further comprising:
using a call to a managed memory location to perform the allocating of the pointer to the memory location of the first processing unit (See Chen: Fig. 1, and [0023], "The system may provide a special malloc function that allocates data in this space 130. Static variables may be annotated with a type quantifier to have them allocated in the shared window 130. However, unlike PGAS languages there is no notion of affinity in the shared window. This is because data in the shared space 130 migrates between the CPU and GPU caches as it gets used by each processor. Also unlike PGAS implementations, the representation of pointers does not change between the shared and private spaces. The remaining virtual address space is private to the CPU 110 and GPU 120. By default data gets allocated in this space 130, and is not visible to the other side. This partitioned address space approach may cut down on the amount of memory that needs to be kept coherent and enables a more efficient implementation for discrete devices").
Regarding claim 9, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 2 as outlined above. Further, Chen teaches that the method of claim 2, further comprising: receiving a request to perform, on the first processing unit, an operation using altered data in the memory location of the second processing unit (See Chen: Figs. 1-2, and [0027], "Embodiments of the invention may provide these ownership rights to leverage common CPU-GPU usage models. For example, the CPU first accesses some data (e.g. initializing a data structure), and then hands it over to the GPU (e.g. computing on the data structure in a data parallel manner), and then the CPU analyzes the results of the computation and so on. The ownership rights allow an application to inform the system of this temporal locality and optimize the coherence implementation. Note that these ownership rights are optimization hints and it is legal for the system to ignore these hints"); and
copying, in the kernel mode, the altered data from the memory location of the second processing unit to the memory location of the first processing unit in response to the received request (See Chen: Fig. 4, and [0041], "This problem may be solved by leveraging the PCI aperture in a novel way. FIG. 4 is a flow chart for one embodiment of the shared memory model that leverages the PCI aperture. A sequence 400 may be implemented in firmware, software, or hardware. During initialization, a portion of the PCI aperture space may be mapped into the user space of the application and instantiated with a task queue, a message queue, and copy buffers (block 410). When there is a need to copy pages (block 420), for example from the CPU to GPU, the runtime copies the pages into the PCI aperture copy buffers and tags the buffers with the virtual address and the process identifier (block 430). On the GPU side, the daemon thread copies the contents of the buffers into its address space by using the virtual address tag (block 440). Thus the copy may be performed in a 2 step process--the CPU copies from its address space into a common buffer (PCI aperture) that both CPU and GPU may access, while the GPU picks up the pages from the common buffer into its address space. GPU-CPU copies are done in a similar way. Since the aperture is pinned memory, the contents of the aperture are not lost if the CPU or GPU process gets context switched out. This allows the two processors to execute asynchronously which may be critical since the two processors may have different operating systems and hence the context switches may not be synchronized. Furthermore, the aperture space may be mapped into the user space of the applications thus enabling user level CPU-GPU communication. This makes the application stack vastly more efficient than going through the OS driver stack").
Regarding claim 10, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 9 as outlined above. Further, Chen teaches that the method of claim 9, further comprising: creating, using the second processing unit, the altered data based at least in part on the data (See Chen: Fig. 5, and [0043], "The metadata says whether the CPU or GPU holds the golden copy of a page (home for the page), contains a version number that tracks the number of updates to the page, mutexes that are acquired before updating the page, and miscellaneous metadata. The directory may be indexed by the virtual address of a page (block 520). Both the CPU and the GPU runtime systems maintain a similar private structure that contains the local access permissions for the pages, and the local version numbers of the pages". Updating the content by the GPU may correspond to creating the altered data).
Regarding claim 11, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 2 as outlined above. Further, Kaminski and Ganapathy teach that the method of claim 2, wherein a kernel mode supports the copying of the data without the user request in response to an attempted access to the pointer (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets up a DMA with one of the core processors 202A-202N in order to process a frame or block of data for a given communication channel. It communicates with the one core processor the starting address of the descriptor list in the global buffer memory. The one core processor reads through each line in the descriptor list and performs the DMA of the data from the global buffer memory into the core processor's local memory". The data stored in the global buffer memory is automatically copied to the local memory of any core processor 1 to N, which may be mapped to copying, without a user request), and wherein the kernel mode operates independently of user code operating on at least one of the first processing unit and the second processing unit (See Kaminski: [0056], "In accordance with some of the disclosed embodiments, a kernel mode device driver creates and maintains a set of page tables to be used by the accelerator device to provide a consistently correct view of main system memory. These page tables will be referred to herein as separate "non-shared" page tables. These separate non-shared page tables are independent from the OS (i.e., the page tables used by the accelerator device are independent of the page tables used by the CPU for accessing process virtual memory)").
Regarding claim 12, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 2 as outlined above. Further, Kaminski, Chen, Ganapathy, and Sandgren teach that a system (See Kaminski: Fig. 2, and [0016], "Systems and methods are provided that can allow for an accelerator device to share physical memory of a computer system that is managed by and operates under control of an operating system") comprising:
a first processing unit including a first memory location (See Kaminski: Fig. 1, and [0063], "Components of computer 110 may include, but are not limited to, one or more processing units 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120");
a second processing unit including a second memory location (See Kaminski: Fig. 1, and [0067], "In this regard, GPUs 184 generally include on-chip memory storage, such as register storage and GPUs 184 communicate with a video memory 186"); and
a non-transitory computer-readable medium storing instructions executable by at least one of the first processing unit and the second processing unit to (See Kaminski: Fig. 1, and [0064], "Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110"):
allocate a pointer to a memory location of a first processing unit, the pointer accessible by a second processing unit (See Kaminski: Fig. 2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ... N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250"); and
copy, without a user request (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets up a DMA with one of the core processors 202A-202N in order to process a frame or block of data for a given communication channel. It communicates with the one core processor the starting address of the descriptor list in the global buffer memory. The one core processor reads through each line in the descriptor list and performs the DMA of the data from the global buffer memory into the core processor's local memory". The data stored in the global buffer memory is automatically copied to the local memory of any core processor 1 to N, which may be mapped to copying, without a user request), data indicated by the pointer (See Sandgren: Section 3.2.1 "Upload", Page 20, "1. Generate a PBO on the GPU using glGenBuffers. 2. Bind PBO to unpack buffer target. 3. Allocate buffer space on GPU according to data size using glBufferData. 4. Map PBO to CPU memory denying GPU access for now. glMapBuffer returns a pointer to a place in GPU memory where the PBO resides. 5. Copy data from CPU to GPU using pointer from glMapBuffer. 6. Unmap PBO (glUnmapBuffer) to allow GPU full access of the PBO again. 7. Transfer data from buffer to a texture target. 8. Unbind the PBO to allow for normal operation again") from the memory location of the first processing unit to a memory location of the second processing unit (See Chen: Fig. 4, and [0041], "This problem may be solved by leveraging the PCI aperture in a novel way. FIG. 4 is a flow chart for one embodiment of the shared memory model that leverages the PCI aperture. A sequence 400 may be implemented in firmware, software, or hardware. During initialization, a portion of the PCI aperture space may be mapped into the user space of the application and instantiated with a task queue, a message queue, and copy buffers (block 410). When there is a need to copy pages (block 420), for example from the CPU to GPU, the runtime copies the pages into the PCI aperture copy buffers and tags the buffers with the virtual address and the process identifier (block 430). On the GPU side, the daemon thread copies the contents of the buffers into its address space by using the virtual address tag (block 440). Thus the copy may be performed in a 2 step process--the CPU copies from its address space into a common buffer (PCI aperture) that both CPU and GPU may access, while the GPU picks up the pages from the common buffer into its address space. GPU-CPU copies are done in a similar way. Since the aperture is pinned memory, the contents of the aperture are not lost if the CPU or GPU process gets context switched out. This allows the two processors to execute asynchronously which may be critical since the two processors may have different operating systems and hence the context switches may not be synchronized. Furthermore, the aperture space may be mapped into the user space of the applications thus enabling user level CPU-GPU communication. This makes the application stack vastly more efficient than going through the OS driver stack").
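For illustration only, the two-step copy quoted from Chen [0041] (blocks 430 and 440) may be modeled roughly as follows. This is a minimal sketch and not code from any cited reference; the names `CopyBuffer`, `cpu_stage_page`, and `gpu_drain`, the page size, and the use of Python dictionaries to stand in for the two address spaces are all assumptions made for the example.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size, not stated in the quoted passage


@dataclass
class CopyBuffer:
    """One slot of the shared staging area, tagged as described for block 430."""
    pid: int
    virtual_address: int
    payload: bytes


def cpu_stage_page(aperture, cpu_memory, pid, vaddr):
    """Step 1: the CPU-side runtime copies a page into the common buffer."""
    aperture.append(CopyBuffer(pid, vaddr, bytes(cpu_memory[vaddr])))


def gpu_drain(aperture, gpu_memory):
    """Step 2: the GPU-side daemon copies each buffer to its tagged address."""
    while aperture:
        buf = aperture.pop(0)
        gpu_memory[buf.virtual_address] = bytearray(buf.payload)


# Hypothetical address spaces keyed by virtual address.
cpu_memory = {0x1000: bytearray(b"\x2a" * PAGE_SIZE)}
gpu_memory = {}
aperture = []

cpu_stage_page(aperture, cpu_memory, pid=7, vaddr=0x1000)
gpu_drain(aperture, gpu_memory)
```

In this model the application code never addresses the destination memory directly; it only stages pages, and the receiving side places them by virtual address tag.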
Regarding claim 13, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 12 as outlined above. Further, Kaminski and Ganapathy teach that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
enable a kernel mode (See Kaminski: Fig. 2, and [0071], "The computer system 210 includes an operating system kernel 220, a plurality of CPU processor core devices 230-1 ... N, a kernel mode device driver (KMDD) 260 (referred to below simply as a device driver 260 or driver 260) for the various accelerator devices 290, and a shared physical memory 250 (e.g., RAM) that operates in accordance with virtual memory (VM) address translation techniques (e.g., translating virtual memory addresses used by the CPU (and its cores) to memory addresses at the memory 250). As used herein, the term "kernel mode device driver" refers to a driver that runs in protected or privileged mode, and has full, unrestricted access to the system memory, devices, processes and other protected subsystems of the OS. Operation of the computer system's operating system kernel 220, the device driver 260 and the accelerator devices 290 will be described below with reference to FIGS. 4, 6, 8 and 9") that supports the copying of the data without the user request in response to an attempted access to the pointer (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets up a DMA with one of the core processors 202A-202N in order to process a frame or block of data for a given communication channel. It communicates with the one core processor the starting address of the descriptor list in the global buffer memory. The one core processor reads through each line in the descriptor list and performs the DMA of the data from the global buffer memory into the core processor's local memory". The data stored in the global buffer memory is automatically copied to the local memory of any core processor 1 to N, which may be mapped to copying, without a user request).
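For illustration only, the descriptor-list walk quoted from Ganapathy [0021] may be sketched as below. The passage states only that each descriptor carries a starting address and a byte count; the `DmaDescriptor` type, the function name `run_descriptor_list`, and the choice to pack the transferred blocks contiguously into local memory are assumptions made for the example.

```python
from typing import NamedTuple


class DmaDescriptor(NamedTuple):
    """One line of the descriptor list: where the data starts and how long it is."""
    start: int   # starting address in the global buffer memory
    length: int  # number of bytes to transfer


def run_descriptor_list(descriptors, global_memory, local_memory):
    """Read each descriptor in order and copy its block into local memory.

    Blocks are packed back to back in local memory (an assumption).
    Returns the total number of bytes copied.
    """
    offset = 0
    for d in descriptors:
        local_memory[offset:offset + d.length] = (
            global_memory[d.start:d.start + d.length]
        )
        offset += d.length
    return offset


global_memory = bytearray(range(64))  # stand-in for the global buffer memory
local_memory = bytearray(16)          # stand-in for one core's local memory
descriptors = [DmaDescriptor(start=8, length=4), DmaDescriptor(start=32, length=4)]

copied = run_descriptor_list(descriptors, global_memory, local_memory)
```

Once the list has been set up, the transfer proceeds from the descriptors alone, with no further request naming the individual blocks.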
Regarding claim 14, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 12 as outlined above. Further, Kaminski and Chen teach that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
map virtual address spaces with physical memories of the first processing unit and of the second processing unit (See Kaminski: Figs. 1-2, and [0072], "When a process requests access to its virtual memory, it is the responsibility of the OS to map the virtual memory address provided by the process to the physical memory address where that virtual memory is mapped to. The OS stores its mappings of virtual memory addresses to physical memory addresses in a page table");
enable the virtual address spaces as the memory location of the first processing unit and the memory location of the second processing unit (See Kaminski: Figs. 1-2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ...N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ...N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ...N with locations at the shared physical memory 250");
cause an operation to be launched on the first processing unit (See Chen: Figs. 1-2, and [0029], "In one embodiment, shared data may be privatized by copying from shared space to the private space. Non-pointer containing data structures may be privatized simply by copying the memory contents. While copying pointer containing data structures, pointers into shared data must be converted to pointers into private data"; and [0030], "Private data may be globalized by copying from the private space to the shared space and made visible to other computations. Non-pointer containing data structures may be globalized simply by copying the memory contents. While copying pointer containing data structures, pointers into private data must be converted as pointers into shared data (converse of the privatization example)");
flush at least a page that was migrated to the second processing unit back to the first processing unit (See Kaminski: Figs. 1-2, and [0077], "The driver 260 also includes an independent memory management unit 280 (i.e., that is independent of the main kernel MMU 225 of the main OS kernel 220). The primary role of driver 260 is to handle the page faults (when the accelerator device 290 tries to access virtual memory area that is not currently in physical memory) and page table related tasks. The MMU 280 includes a process termination detection module 284 that detects when the process terminates (e.g., closes its last open handle), a page fault notification module 286 that receives page fault notifications and a page fault handler module 288 that handles the page fault notifications. These modules will be described in detail below. As will be described in detail below, the memory management unit 280 also issues translation lookaside buffer (TLB) flush indicators to appropriate ones of the accelerator devices 290"); and
unmap a virtual address space mapped to the second processing unit (See Kaminski: Figs. 1-2, and [0110], "In response to the TLB flush, each accelerator device must determine if its TLB table contains any address translation entries corresponding to the page table entries that were invalidated. If so, the affected accelerator devices must delete such entries from their respective TLB tables. Finally, the potentially affected accelerator devices must signal the driver that they have finished handling the TLB flush operation").
Regarding claim 15, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 12 as outlined above. Further, Chen teaches that the system of claim 12, wherein the pointer is a first managed pointer in a reserved virtual address space of the first processing unit (See Chen: Fig. 1, and [0023], "The system may provide a special malloc function that allocates data in this space 130. Static variables may be annotated with a type quantifier to have them allocated in the shared window 130. However, unlike PGAS languages there is no notion of affinity in the shared window. This is because data in the shared space 130 migrates between the CPU and GPU caches as it gets used by each processor. Also unlike PGAS implementations, the representation of pointers does not change between the shared and private spaces. The remaining virtual address space is private to the CPU 110 and GPU 120. By default data gets allocated in this space 130, and is not visible to the other side. This partitioned address space approach may cut down on the amount of memory that needs to be kept coherent and enables a more efficient implementation for discrete devices").
Regarding claim 16, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 12 as outlined above. Further, Kaminski and Chen teach that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
create a second managed pointer in a reserved virtual address space of the second processing unit in response to an attempted access to the pointer (See Kaminski: Fig. 2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ... N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250"); and
allocate the second managed pointer to the memory location of the second processing unit (See Chen: Fig. 1, and [0029-0030], "In one embodiment, shared data may be privatized by copying from shared space to the private space. Non-pointer containing data structures may be privatized simply by copying the memory contents. While copying pointer containing data structures, pointers into shared data must be converted to pointers into private data"; "Private data may be globalized by copying from the private space to the shared space and made visible to other computations. Non-pointer containing data structures may be globalized simply by copying the memory contents. While copying pointer containing data structures, pointers into private data must be converted as pointers into shared data (converse of the privatization example)").
Regarding claim 17, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 12 as outlined above. Further, Kaminski and Chen teach that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
trigger a page fault in response to an attempted access to the pointer (See Kaminski: [0018], "The driver can monitor for page fault notifications generated by the accelerator device and handle any page fault notifications received from the accelerator device. When a request for access to the physical memory causes the accelerator device to generate a page fault notification, the driver can determine a memory address space and virtual memory location of a process that contains a virtual memory address specified in the request for access to the physical memory. The driver can then determine whether the request for access to physical memory is a valid request. If the request is determined to be valid, the driver "pins" a limited amount of memory pages of the physical memory for use by the accelerator device to prevent the process from releasing limited amount of memory pages of the physical memory. To update the non-shared page table for the memory pages being used by the accelerator device, the driver can add new page table entries to the non-shared page table or edit existing page table entries in the non-shared page table. When the shared page table is updated, the driver can notify the accelerator device that the page fault has been successfully handled and that the accelerator device is permitted to resume processing. When processing resumes the accelerator device can then use the updated page table entries from the non-shared page table to perform virtual address translation"); and
handle the page fault using a driver in a kernel mode that supports the copying of the data without the user request in response to an attempted access to the pointer (See Chen: Fig. 6, and [0044], "FIG. 6 is a flow chart for one embodiment of the shared memory model in operation. A sequence 500 may be implemented in firmware, software, or hardware. In one embodiment, a sequence 600 may be implemented in firmware, software, or hardware. When the GPU performs an acquire operation (block 610), the corresponding pages may be set to no-access on the GPU (620). At a subsequent read operation the page fault handler on the GPU copies the page from the CPU (block 640) if the page has been updated and released by the CPU since the last GPU acquire (block 630). The directory and private version numbers may be used to determine this. The page is then set to read-only (block 650). At a subsequent write operation the page fault handler creates the backup copy of the page, marks the page as read-write and increments the local version number of the page (block 660). At a release point, a diff is performed with the backup copy of the page and the changes transmitted to the home location, while incrementing the directory version number (block 670). The diff operation computes the differences in the memory locations between the two pages (i.e. the page and its backup) to find out the changes that have been made. The CPU operations are done in a symmetrical way. Thus, between acquire and release points the GPU and CPU operate out of their local memory and caches and communicate with each other only at the explicit synchronization points").
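For illustration only, the acquire, fault, and release sequence quoted from Chen [0044] (blocks 610-670) may be modeled roughly as follows. This is a simplified sketch and not code from the reference; the class `GpuPage`, the handler names, and the representation of the diff as a dictionary of changed byte offsets are assumptions made for the example.

```python
class GpuPage:
    """GPU-side view of one shared page (hypothetical structure)."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.backup = None
        self.local_version = 0
        self.access = "no-access"  # state after an acquire (block 620)


def on_read_fault(page, home_data, directory_version):
    """Blocks 630-650: copy the page only if the home copy is newer."""
    if directory_version > page.local_version:
        page.data[:] = home_data
        page.local_version = directory_version
    page.access = "read-only"


def on_write_fault(page):
    """Block 660: keep a backup, allow writes, bump the local version."""
    page.backup = bytes(page.data)
    page.access = "read-write"
    page.local_version += 1


def release(page, home_data):
    """Block 670: diff against the backup and send only the changes home."""
    changes = {
        i: new
        for i, (old, new) in enumerate(zip(page.backup, page.data))
        if old != new
    }
    for i, value in changes.items():
        home_data[i] = value
    page.backup = None
    return changes


home = bytearray(b"abcd")  # home (CPU) copy of the page
directory_version = 1      # CPU has updated and released the page once

page = GpuPage(len(home))
on_read_fault(page, home, directory_version)  # first GPU read faults
on_write_fault(page)                          # first GPU write faults
page.data[2] = ord("X")
changes = release(page, home)
directory_version += 1                        # incremented at the release point
```

In this model the copy from the home location is triggered by the faulting read itself, and only the diffed bytes travel back at the release point.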
Regarding claim 18, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 12 as outlined above. Further, Chen teaches that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
use a call to a managed memory location to perform the allocating of the pointer to the memory location of the first processing unit (See Chen: Fig. 1, and [0023], "The system may provide a special malloc function that allocates data in this space 130. Static variables may be annotated with a type quantifier to have them allocated in the shared window 130. However, unlike PGAS languages there is no notion of affinity in the shared window. This is because data in the shared space 130 migrates between the CPU and GPU caches as it gets used by each processor. Also unlike PGAS implementations, the representation of pointers does not change between the shared and private spaces. The remaining virtual address space is private to the CPU 110 and GPU 120. By default data gets allocated in this space 130, and is not visible to the other side. This partitioned address space approach may cut down on the amount of memory that needs to be kept coherent and enables a more efficient implementation for discrete devices").
Regarding claim 19, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 12 as outlined above. Further, Chen teaches that the system of claim 12, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to:
receive a request to perform, on the first processing unit, an operation using altered data in the memory location of the second processing unit (See Chen: Figs. 1-2, and [0027], "Embodiments of the invention may provide these ownership rights to leverage common CPU-GPU usage models. For example, the CPU first accesses some data (e.g. initializing a data structure), and then hands it over to the GPU (e.g. computing on the data structure in a data parallel manner), and then the CPU analyzes the results of the computation and so on. The ownership rights allow an application to inform the system of this temporal locality and optimize the coherence implementation. Note that these ownership rights are optimization hints and it is legal for the system to ignore these hints"); and
copy, in the kernel mode, the altered data from the memory location of the second processing unit to the memory location of the first processing unit in response to the received request (See Chen: Fig. 4, and [0041], "This problem may be solved by leveraging the PCI aperture in a novel way. FIG. 4 is a flow chart for one embodiment of the shared memory model that leverages the PCI aperture. A sequence 400 may be implemented in firmware, software, or hardware. During initialization, a portion of the PCI aperture space may be mapped into the user space of the application and instantiated with a task queue, a message queue, and copy buffers (block 410). When there is a need to copy pages (block 420), for example from the CPU to GPU, the runtime copies the pages into the PCI aperture copy buffers and tags the buffers with the virtual address and the process identifier (block 430). On the GPU side, the daemon thread copies the contents of the buffers into its address space by using the virtual address tag (block 440). Thus the copy may be performed in a 2 step process--the CPU copies from its address space into a common buffer (PCI aperture) that both CPU and GPU may access, while the GPU picks up the pages from the common buffer into its address space. GPU- CPU copies are done in a similar way. Since the aperture is pinned memory, the contents of the aperture are not lost if the CPU or GPU process gets context switched out. This allows the two processors to execute asynchronously which may be critical since the two processors may have different operating systems and hence the context switches may not be synchronized. Furthermore, the aperture space may be mapped into the user space of the applications thus enabling user level CPU-GPU communication. This makes the application stack vastly more efficient than going through the OS driver stack").
Regarding claim 20, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 19 as outlined above. Further, Chen teaches that the system of claim 19, wherein the instructions executable by the at least one of the first processing unit and the second processing unit are further configured to: 
create, using the second processing unit, the altered data based at least in part on the data (See Chen: Fig. 5, and [0043], "The metadata says whether the CPU or GPU holds the golden copy of a page (home for the page), contains a version number that tracks the number of updates to the page, mutexes that are acquired before updating the page, and miscellaneous metadata. The directory may be indexed by the virtual address of a page (block 520). Both the CPU and the GPU runtime systems maintain a similar private structure that contains the local access permissions for the pages, and the local version numbers of the pages". Updating the content by GPU may be corresponding to creating altered data).
Regarding claim 21, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 12 as outlined above. Further, Kaminski and Ganapathy teach that the system of claim 12, wherein a kernel mode supports the copying of the data without the user request in response to an attempted access to the pointer (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets up a DMA with one of the core processors 202A-202N in order to process a frame or block of data for a given communication channel. It communicates with the one core processor the starting address of the descriptor list in the global buffer memory. The one core processor reads through each line in the descriptor list and performs the DMA of the data from the global buffer memory into the core processor's local memory". The data stored in the global buffer memory is automatically copied to the local memory of any core processor 1 to N, which may be mapped to copying, without a user request), and wherein the kernel mode operates independently of user code operating on at least one of the first processing unit and the second processing unit (See Kaminski: [0056], "In accordance with some of the disclosed embodiments, a kernel mode device driver creates and maintains a set of page tables to be used by the accelerator device to provide a consistently correct view of main system memory. These page tables will be referred to herein as separate "non-shared" page tables. These separate non-shared page tables are independent from the OS (i.e., the page tables used by the accelerator device are independent of the page tables used by the CPU for accessing process virtual memory)").
Regarding claim 22, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 2 as outlined above. Further, Kaminski, Chen, Ganapathy, and Sandgren teach that a computing device (See Kaminski: Fig. 2, and [0016], "Systems and methods are provided that can allow for an accelerator device to share physical memory of a computer system that is managed by and operates under control of an operating system"), comprising:
a first processing unit with a first memory location and having an allocated pointer that is accessible by a second processing unit having a second memory location (See Kaminski: Fig. 2, and [0073], "Each of the CPU processor cores 230 can be associated with a corresponding one of the OS page tables 240-1 ... N (as indicated by the arrows linking particular ones of the CPU processor cores 230 with corresponding ones of the OS page tables 240-1 ... N). Each of the OS page tables 240-1 ... N include a plurality of page table entries (not shown) that are each mapped to particular locations in the shared physical memory 250 as indicated by the arrows linking a particular one of the OS page tables 240-1 ... N with locations at the shared physical memory 250"); and
a driver to copy the data indicated by the pointer (See Sandgren: Section 3.2.1 "Upload", Page 20, "1. Generate a PBO on the GPU using glGenBuffers. 2. Bind PBO to unpack buffer target. 3. Allocate buffer space on GPU according to data size using glBufferData. 4. Map PBO to CPU memory denying GPU access for now. glMapBuffer returns a pointer to a place in GPU memory where the PBO resides. 5. Copy data from CPU to GPU using pointer from glMapBuffer. 6. Unmap PBO (glUnmapBuffer) to allow GPU full access of the PBO again. 7. Transfer data from buffer to a texture target. 8. Unbind the PBO to allow for normal operation again") from the first memory location to the second memory location (See Chen: Fig. 4, and [0041], "This problem may be solved by leveraging the PCI aperture in a novel way. FIG. 4 is a flow chart for one embodiment of the shared memory model that leverages the PCI aperture. A sequence 400 may be implemented in firmware, software, or hardware. During initialization, a portion of the PCI aperture space may be mapped into the user space of the application and instantiated with a task queue, a message queue, and copy buffers (block 410). When there is a need to copy pages (block 420), for example from the CPU to GPU, the runtime copies the pages into the PCI aperture copy buffers and tags the buffers with the virtual address and the process identifier (block 430). On the GPU side, the daemon thread copies the contents of the buffers into its address space by using the virtual address tag (block 440). Thus the copy may be performed in a 2 step process--the CPU copies from its address space into a common buffer (PCI aperture) that both CPU and GPU may access, while the GPU picks up the pages from the common buffer into its address space. GPU-CPU copies are done in a similar way. Since the aperture is pinned memory, the contents of the aperture are not lost if the CPU or GPU process gets context switched out. This allows the two processors to execute asynchronously which may be critical since the two processors may have different operating systems and hence the context switches may not be synchronized. Furthermore, the aperture space may be mapped into the user space of the applications thus enabling user level CPU-GPU communication. This makes the application stack vastly more efficient than going through the OS driver stack"), without a user request for the copying (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets up a DMA with one of the core processors 202A-202N in order to process a frame or block of data for a given communication channel. It communicates with the one core processor the starting address of the descriptor list in the global buffer memory. The one core processor reads through each line in the descriptor list and performs the DMA of the data from the global buffer memory into the core processor's local memory". The data stored in the global buffer memory is automatically copied to the local memory of any core processor 1 to N, which may be mapped to copying, without a user request).
Regarding claim 23, Kaminski, Chen, Ganapathy, and Sandgren teach all the features with respect to claim 22 as outlined above. Further, Kaminski and Ganapathy teach that the computing device of claim 22, further comprising:
a kernel mode (See Kaminski: Fig. 2, and [0071], "The computer system 210 includes an operating system kernel 220, a plurality of CPU processor core devices 230-1 ... N, a kernel mode device driver (KMDD) 260 (referred to below simply as a device driver 260 or driver 260) for the various accelerator devices 290, and a shared physical memory 250 (e.g., RAM) that operates in accordance with virtual memory (VM) address translation techniques (e.g., translating virtual memory addresses used by the CPU (and its cores) to memory addresses at the memory 250). As used herein, the term "kernel mode device driver" refers to a driver that runs in protected or privileged mode, and has full, unrestricted access to the system memory, devices, processes and other protected subsystems of the OS. Operation of the computer system's operating system kernel 220, the device driver 260 and the accelerator devices 290 will be described below with reference to FIGS. 4, 6, 8 and 9") that supports the copying of the data without the user request in response to an attempted access to the pointer (See Ganapathy: Figs. 2-4, and [0021], "The DMA descriptor list includes a starting address for the data and the number of bytes to be transferred. The DMA descriptor list includes other information which is described in greater detail below. The one or more core processors 202A-202N can also form DMA descriptors in the global memory 210 especially for core DMA transfers, in addition to the microcontroller 223. As an example, the microcontroller 223 sets up a DMA with one of the core processors 202A-202N in order to process a frame or block of data for a given communication channel. It communicates with the one core processor the starting address of the descriptor list in the global buffer memory. The one core processor reads through each line in the descriptor list and performs the DMA of the data from the global buffer memory into the core processor's local memory". The data stored in the global buffer memory is automatically copied to the local memory of any core processor 1 to N, which may be mapped to copying, without a user request).






Conclusion



Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GORDON G LIU/
Primary Examiner, Art Unit 2612