DETAILED ACTION
Claims 1-20 are pending.
Priority: March 29, 2019
Assignee: AMD


Response to Arguments
Applicant's arguments filed 11/4/2020 have been fully considered but they are not persuasive. The applicant argues that, regarding claims 1, 10 and 19, the prior art of Duluk (2017/0249254, Duluk-2) does not disclose:
“…wherein the first processing unit selectively provides the atomic operation to the trap handler in response to detecting that the memory access request is directed to the second memory via the interface…”.
The USPTO disagrees with this assertion. The applicant does not elaborate how the second memory is structurally or functionally different from the first memory, except for stating that “…the second memory and the interface does not support atomicity of the atomic operations…”.
The “trap handler” is equivalent to the “fault handler” of Duluk-2, where para. 0066 explains that the sequence migrates memory pages from one memory to another. In para. 0104, “…Fault buffer entries 502 may include, for example, the type of access that was attempted (e.g., read, write, or atomic), the virtual memory address for which an attempted access caused a page fault, the virtual address space, and an indication of a unit or thread that caused a page fault…”.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-7, 10-16, 19 are rejected under AIA 35 U.S.C. 103(a) as being unpatentable over Duluk et al (20140267334) in view of Duluk et al (20170249254, hereinafter Duluk-2).

As per Claim 1, Duluk discloses an apparatus (Duluk, [0030 – In Fig. 2, unified virtual memory system 200 includes CPU 102, system memory 104, and PPU 202 coupled to PPU memory 204. CPU 102 and system memory 104 are coupled to each other and to PPU 202 via memory bridge 105]) comprising:
a first processing unit configured to access a first memory that supports atomic operations and a second memory (Duluk, [Abstract - A first processing unit performs an atomic operation on a memory page associated with a page table entry corresponding to the first processing unit]; [0008 - To support atomic operations in a system, they are handled on a localized basis, i.e., multiple cores in the CPU execute atomic operations and specialized hardware included in the CPU ensures the integrity of the results. This is similar to the recitation in Para-0025 of the spec]; [0031 – CPU 102 executes threads that request data stored in system memory 104 or PPU memory 204 via a virtual memory address]; [0047 - A memory page is CPU-shared if it is stored in system memory 104 and a mapping to it exists in PPU page table 208 to allow access by PPU 202]) via an interface (Duluk, [0030 – CPU 102 and system memory 104 are coupled to each other and to PPU 202 via memory bridge 105]; [0107 - Including an atomic enable/disable mechanism in PPU page table entries enables efficient support for atomic operations across the processing units included in a computer system, implying that the first processing unit is also included. The above citations are also applicable to PPU 202 operating as a first processing unit, and this is a valid interpretation because the claim does not explicitly recite if the first processing unit is a CPU or PPU/GPU]), 
wherein at least one of the second memory and the interface does not support atomicity of the atomic operations (Duluk, [0008 - The atomicity of an ‘atomic’ operation initiated by a PPU/GPU is preserved with respect to the particular GPU, but is not ensured with respect to the CPU]; [0047 - A memory page is PPU-owned if PPU/GPU 202 can access the page via a virtual address, and if CPU 102 cannot access the memory page via a virtual address without causing a page fault]; [0006, 0007 - If not handled appropriately, atomic operations like read-modify-write to a shared memory page may produce unintended results. For example, suppose a particular memory location holds a value of 10, and a CPU and a GPU each perform an atomic increment operation on the memory location in parallel. If the GPU and CPU are permitted to simultaneously access the memory page, then the memory location could be updated to a value of 11 instead of the desired value of 12]); 
a trap handler configured to trap an atomic operation (Duluk, [0032 - CPU 102 includes CPU fault handler 211, which executes steps in response to CPU MMU 209 generating a page fault, to make requested data available to the CPU 102, implying that the atomic access operation is trapped]; [0034 - CPU fault/trap handler 211 and PPU fault handler 215 can be a unified software program that is invoked by a fault on either the CPU 102 or the PPU 202. Since the claim does not recite the components of the trap/fault handler, it is valid to interpret the trap handler as a unified component]) that includes a memory access request (Duluk, [0078 - An operation executing in the CPU 102 attempts to access memory at a virtual memory address that is not mapped in CPU page table 206, which causes a CPU-based page fault. CPU fault handler 211 reads the PSD 210 entry corresponding to the virtual memory address and identifies the memory page associated with the virtual memory address]) and enforce atomicity of the atomic operation (Duluk, [0012 – A computer system such as Fig. 2, that implements a unified VM architecture common to both the CPU and PPU/GPU ensures the atomicity of atomic operations on memory that is shared between the processing units]; [0011 - Receiving a request from a first processing unit to perform an atomic operation on the memory page, determining that an atomic permission bit in a first page table entry included in a first page table associated with the first processing unit is inactive, activating the atomic permission bit, and performing the atomic operation on the memory page while denying memory write and atomic accesses to the memory page by any processing unit except the first processing unit]), 
Duluk-2 further discloses,
wherein the first processing unit selectively (Duluk-2, [Abstract, 0058 - Whether the CPU page table 206 includes a mapping corresponding to the virtual memory address determines the selective action]) provides the atomic operation to the trap handler (Duluk-2, [0098 - Atomic operations and transition states are used to effectively manage a page fault sequence]; [0094 - A fault by CPU 102 initiates a transition from PPU-owned to CPU-owned, implying that the CPU-based page fault results in providing the atomic access operation to the trap handler. Since the claim does not define ‘atomic operation’ or its steps, it is valid to interpret that the atomic access operation also includes a compare-and-swap atomic operation]; [0088 - When CPU MMU 209 generates a page fault, the thread that requested the data at the virtual memory address stalls, and a local fault handler like the CPU fault handler 211 remedies the page fault by executing a page fault sequence]; [0104 - The fault buffer 216 stores fault buffer entries 502 that indicate information related to page faults generated by the PPU 202. Fault buffer entries 502 may include, for example, the type of access that was attempted (e.g., read, write, or atomic), the virtual memory address for which an attempted access caused a page fault, the virtual address space, and an indication of a unit or thread that caused a page fault]) in response to detecting that the memory access request is directed to the second memory via the interface (Duluk-2, [0094 - An operation executing in CPU 102 attempts to access memory at a virtual memory address that is not mapped in the CPU page table 206, which causes a CPU-based page fault. The CPU fault handler 211 reads the PSD 210 entry corresponding to the virtual memory address and identifies the memory page associated with the virtual memory address. After reading the PSD 210, the CPU fault handler 211 determines that the current ownership state for the memory page associated with the virtual memory address is PPU-owned/second memory]).
Therefore it would have been obvious to a person of ordinary skill at the time of filing to incorporate the virtual memory management of Duluk-2 into the CPU-to-GPU atomic operations of Duluk, for the benefit of using a processing unit to execute an operation that references a virtual memory address, wherein an MMU associated with the processing unit is configured to generate a page fault upon determining that a page table stored in a memory unit associated with the processing unit does not include a mapping corresponding to the virtual memory address. An associated copy engine 
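The hazard cited from Duluk [0006, 0007] in the rejection of claim 1 can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the function names and the dictionary standing in for the shared memory location are hypothetical and do not appear in Duluk or in the application. The first function interleaves the read and write steps of two increments, as could happen if the CPU and GPU access the page simultaneously; the second lets each increment complete before the next begins.

```python
def run_interleaved(initial):
    """Two non-atomic increments whose read and write steps interleave."""
    memory = {"x": initial}     # hypothetical stand-in for the shared location
    cpu_read = memory["x"]      # CPU reads the current value
    gpu_read = memory["x"]      # GPU reads before the CPU writes back
    memory["x"] = cpu_read + 1  # CPU writes its incremented value
    memory["x"] = gpu_read + 1  # GPU overwrites it; one increment is lost
    return memory["x"]


def run_serialized(initial):
    """The same two increments, each completing before the next begins."""
    memory = {"x": initial}
    for _unit in ("cpu", "gpu"):
        memory["x"] = memory["x"] + 1
    return memory["x"]


print(run_interleaved(10))  # 11
print(run_serialized(10))   # 12
```

Starting from 10, the interleaved run yields 11 and the serialized run yields 12, matching the values given in the cited passage.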

As per Claim 2, the rejection of claim 1 is incorporated, and Duluk, Duluk-2 further disclose, 
wherein the first processing unit is configured to determine whether the memory access request (Duluk-2, [0056 – In Fig. 2, when a thread executing in CPU 102 requests data via a virtual memory address, CPU 102 requests translation of the virtual memory address/memory access request to a physical memory address from CPU MMU 209. In response, CPU MMU 209 attempts to translate the virtual memory address into a physical memory address]; [0058 - For any given virtual memory address, CPU page table 206 may or may not include a mapping between the virtual memory address and a physical memory address]) is directed to the second memory via the interface based on a translation of a virtual address in the memory access request to a physical address (Duluk-2, [0058 - If the CPU page table 206 does not include a mapping associated with the virtual memory address, then CPU MMU 209 is unable to translate the virtual memory address into a physical memory address, implying that the request is directed to the second memory]).


As per Claim 3, the rejection of claim 2 is incorporated, and Duluk, Duluk-2 further disclose,
wherein the first processing unit provides the atomic operation (Duluk-2, [0049 - CPU 102 updates PSD 210, wherein updates to a PSD 210 page in system memory 104 are accomplished by using atomic compare-and-swap across the PCI-E bus]) to the trap handler in response to the physical address being in the second memory (Duluk-2, [0058 - If the CPU page table 206 does not include a mapping associated with the virtual memory address, then CPU MMU 209 is unable to translate the virtual memory address into a physical memory address, and the CPU MMU 209 generates a page fault]; [0060 - When the CPU MMU 209 generates a page fault, the CPU fault/trap handler 211 executes a sequence of atomic operations for the appropriate page fault sequence to remedy the page fault. As per Para-0036, CPU 102 includes CPU fault handler 211 which executes steps in response to CPU MMU 209 generating a page fault. This implies that the first processing unit provides the atomic operation to the fault handler]).

As per Claim 4, the rejection of claim 2 is incorporated, and Duluk, Duluk-2 further disclose,
wherein the first processing unit does not provide the atomic operation to the trap handler in response to the physical address being in the first memory (Duluk-2, [0058 - If the CPU page table 206 includes a mapping, then the CPU MMU 209 reads that mapping to determine a physical memory address associated with the virtual memory address and provides that physical memory address to the CPU 102. Since the physical address was in the first memory, it implies that the atomic operation is not provided to the fault/trap handler]).

As per Claim 5, the rejection of claim 2 is incorporated, and Duluk, Duluk-2 further disclose,
a memory management unit (MMU) configured to translate the virtual address to the physical address (Duluk-2, [0036 - CPU 102 includes CPU MMU 209, which processes requests from the CPU 102 for translating virtual memory addresses to physical memory addresses]) and selectively provide the atomic operation directly to the interface or to the trap handler based on whether the physical address is in the first memory or the second memory (Duluk-2, [0058 - If the CPU page table 206 includes a mapping, then CPU MMU 209 reads that mapping to determine a physical memory address associated with the virtual memory address and provides that physical memory address to CPU 102. However, if the CPU page table 206 does not include a mapping associated with the virtual memory address, then CPU MMU 209 is unable to translate the virtual memory address into a physical memory address, and the CPU MMU 209 generates a page fault, which is remedied via a page fault sequence that provides the atomic operation/atomic access to the fault handler]).
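The selective behavior mapped to claims 2-5 can be sketched roughly as follows. This is an illustrative model, not the mechanism of Duluk-2 or of the application: the names PageFault, translate and issue_atomic, the 4096-byte page size, and the dictionary page table are all assumptions introduced here. A mapped virtual address is translated and the operation goes directly to the interface; an unmapped one raises a page fault and the operation is handed to the fault (trap) handler instead.

```python
class PageFault(Exception):
    """Raised when the page table holds no mapping (hypothetical name)."""


def translate(page_table, virtual_address, page_size=4096):
    """Translate a virtual address, or raise PageFault if it is unmapped."""
    page_number, offset = divmod(virtual_address, page_size)
    if page_number not in page_table:
        raise PageFault(page_number)
    return page_table[page_number] * page_size + offset


def issue_atomic(page_table, virtual_address, fault_handler, interface):
    """Send the operation to the interface, or to the handler on a fault."""
    try:
        physical_address = translate(page_table, virtual_address)
    except PageFault as fault:
        return fault_handler(fault.args[0])  # trapped path
    return interface(physical_address)       # direct path


page_table = {0: 7}  # virtual page 0 -> physical frame 7 (assumed values)
direct = lambda physical: ("direct", physical)
trap = lambda page: ("trap", page)
print(issue_atomic(page_table, 16, trap, direct))    # ('direct', 28688)
print(issue_atomic(page_table, 4112, trap, direct))  # ('trap', 1)
```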

As per Claim 6, the rejection of claim 1 is incorporated, and Duluk, Duluk-2 further disclose,
wherein the trap handler is configured to perform a compare-and-swap operation (Duluk-2, [0085 - For modifications to PSD 210 entries, the CPU fault handler 211 or the PPU fault handler 215 issues an atomic compare-and-swap operation to modify the page state of a particular entry in the PSD 210. Consequently, the modification is done without interference by operations from other units]) to verify that a value at a memory location indicated by the memory access request has not changed between an initiation of the atomic operation and performance of the compare-and-swap operation (Duluk-2, [0099 - The PPU fault handler 215 reads PSD 210 to determine which memory page is associated with the virtual memory address and to determine the state for the virtual memory address]; [0100 - Based on the state information obtained from PSD 210, PPU fault handler 215 determines that the new state for the memory page should be CPU-shared. The PPU fault handler 215 changes the state to ‘transitioning to CPU-shared’. This state indicates that the page is currently in the process of being transitioned to CPU-shared. When the PPU fault handler 215 runs on a microcontroller in the memory management unit, then two processors update the PSD 210 asynchronously, using atomic compare-and-swap/CAS operations on the PSD 210 to change the state to ‘transitioning to GPU visible’/CPU-shared; This shows that, although the state of the memory page changes, the value at the memory location of the request remains unchanged between the initiation of the atomic operation and the completion of the CAS]).

As per Claim 7, the rejection of claim 6 is incorporated, and Duluk, Duluk-2 further disclose,
wherein the trap handler is configured to allow the atomic operation to modify or write to the memory location in response to the value at the memory location being unchanged (Duluk-2, [0101 - The PPU 202 updates the PPU page table 208 to associate the virtual address with the memory page. The PPU 202 also invalidates the TLB cache entries. Next, the PPU 202 performs another atomic compare-and-swap operation on the PSD 210 to change the ownership state associated with the memory page to CPU-shared. Finally, the page fault sequence ends, and the thread that requested the data via the virtual memory address resumes execution; Changing the ownership state is a valid modification of the memory location because the claim does not recite the contents/format of the memory location]).
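The limitations of claims 6 and 7 describe the familiar compare-and-swap (CAS) retry pattern, which the following minimal sketch illustrates. It is not taken from Duluk-2 or from the application: the Cell class, its methods, and trapped_atomic_add are hypothetical names, and a threading.Lock merely stands in for whatever makes a single CAS indivisible in hardware. The write is allowed only if the value read at the start is still in place; otherwise the operation is retried.

```python
import threading


class Cell:
    """A memory location with an emulated compare-and-swap (hypothetical)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Write `new` only if the cell still holds `expected`."""
        with self._lock:
            if self._value != expected:
                return False  # value changed since it was read
            self._value = new
            return True


def trapped_atomic_add(cell, delta):
    """Emulate an atomic add; returns the value seen before the update."""
    while True:
        observed = cell.load()  # value at initiation
        if cell.compare_and_swap(observed, observed + delta):
            return observed     # unchanged, so the write was allowed


cell = Cell(10)
trapped_atomic_add(cell, 1)
trapped_atomic_add(cell, 1)
print(cell.load())  # 12
```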

As per Claim 10, Duluk discloses a method (Duluk, [0018 - Figs. 5A-5B show method steps for performing an atomic operation on a memory page shared by a CPU and PPU]) comprising:
receiving, at a first processing unit (Duluk, [0020 – In Fig. 1, I/O bridge 107 receives user input and forwards it to CPU 102 via communication path 106 and memory bridge 105]) configured to access a first memory that supports atomic operations and a second memory (Duluk, [Abstract - A first processing unit performs an atomic operation on a memory page associated with a page table entry corresponding to the first processing unit]; [0008 - To support atomic operations in a system, they are handled on a localized basis, i.e., multiple cores in the CPU execute atomic operations and specialized hardware included in the CPU ensures the integrity of the results. This is similar to the recitation in Para-0025 of the spec]; [0031 – CPU 102 executes threads that request data stored in system memory 104 or PPU memory 204 via a virtual memory address]; [0047 - A memory page is CPU-shared if it is stored in system memory 104 and a mapping to it exists in PPU page table 208 to allow access by PPU 202]) via an interface (Duluk, [0030 – CPU 102 and system memory 104 are coupled to each other and to PPU 202 via memory bridge 105]; [0107 - Including an atomic enable/disable mechanism in PPU page table entries enables efficient support for atomic operations across the processing units included in a computer system, implying that the first processing unit is also included. The above citations are also applicable to PPU 202 operating as a first processing unit, and this is a valid interpretation because the claim does not explicitly recite if the first processing unit is a CPU or PPU/GPU]), an atomic operation including a memory access request (Duluk, [0011 - Receiving a request from a first processing unit to perform an atomic operation on the memory page]), 


As per Claim 11, it is similar to claim 2 and therefore the same rejections are incorporated.

As per Claim 12, it is similar to claim 3 and therefore the same rejections are incorporated.

As per Claim 13, the rejection of claim 12 is incorporated, and Duluk further discloses,
determining an initial value at a memory location indicated by the memory access request (Duluk, [0011 - Receiving a request from a first processing unit to perform an atomic operation on the memory page]; [0032 - CPU 102 includes CPU MMU 209, which processes requests from CPU 102 for translating virtual memory addresses to physical memory addresses, thereby helping to determine the initial value at the memory location for reading]) prior to an initiation of the atomic operation (Duluk, [0006 - When a read-modify-write atomic operation to a memory location is performed, the processing unit prevents other accesses to the memory location while the process executing the atomic operation reads a value from the memory location]).

As per Claim 14, the rejection of claim 13 is incorporated, and Duluk further discloses,
verifying that a value at the memory location prior to completion of the atomic operation is unchanged from the initial value (Duluk, [0011 - Receiving a request from a first processing unit to perform an atomic operation on the memory page, determining that an atomic permission bit in a first page table entry included in a first page table associated with the first processing unit is inactive, activating the atomic permission bit, and performing the atomic operation on the memory page while denying memory write and atomic accesses to the memory page by any processing unit except the first processing unit; The claim does not recite what the atomic operation is. So it is valid to interpret that if the atomic operation was a read-modify-write, then these steps verify that a value at the memory location prior to completion of the atomic operation is unchanged from the initial value]).
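The atomic permission bit sequence quoted from Duluk [0011] can be modeled with a short sketch. This is a simplification under assumptions: the grant_atomic function, the per-unit dictionaries, and the "write" and "atomic" field names are hypothetical and are not the page table format of Duluk. When the requester's bit is inactive, write and atomic access by every other unit is revoked and the requester's bit is then activated, so the value cannot be changed by another unit while the operation runs.

```python
def grant_atomic(page_tables, requester, page):
    """Activate the requester's atomic bit for `page`, revoking other units."""
    entry = page_tables[requester][page]
    if entry["atomic"]:
        return False  # bit already active; nothing to change
    for unit, table in page_tables.items():
        if unit != requester and page in table:
            table[page]["write"] = False   # deny memory writes
            table[page]["atomic"] = False  # deny atomic accesses
    entry["atomic"] = True
    return True


tables = {
    "cpu": {0: {"write": True, "atomic": False}},
    "ppu": {0: {"write": True, "atomic": True}},
}
grant_atomic(tables, "cpu", 0)
print(tables["cpu"][0])  # {'write': True, 'atomic': True}
print(tables["ppu"][0])  # {'write': False, 'atomic': False}
```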

As per Claim 15, it is similar to claim 7 and therefore the same rejections are incorporated.

As per Claim 16, it is similar to claim 4 and therefore the same rejections are incorporated.

As per Claim 19, Duluk discloses an apparatus (Duluk, [0030 – In Fig. 2, unified virtual memory system 200 includes CPU 102, system memory 104, and PPU 202 coupled to PPU memory 204. CPU 102 and system memory 104 are coupled to each other and to the PPU 202 via memory bridge 105]) comprising:
a first processing unit configured to access a first memory that supports atomic operations (Duluk, [Abstract - A first processing unit performs an atomic operation on a memory page associated with a page table entry corresponding to the first processing unit]; [0008 - To support atomic operations in a system, they are handled on a localized basis, i.e., multiple cores in the CPU execute atomic operations and specialized hardware included in the CPU ensures the integrity of the results. This is similar to the recitation in Para-0025 of the spec]; [0031 – CPU 102 executes threads that request data stored in system memory 104 or PPU memory 204 via a virtual memory address]; [0047 - A memory page is CPU-shared if it is stored in system memory 104 and a mapping to it exists in PPU page table 208 to allow access by PPU 202]; [0107 - Including an atomic enable/disable mechanism in PPU page table entries enables efficient support for atomic operations across the processing units included in a computer system, implying that the first processing unit is also included. The above citations are also applicable to PPU 202 operating as a first processing unit, and this is a valid interpretation because the claim does not explicitly recite if the first processing unit is a CPU or PPU/GPU]);
a second processing unit configured to access a second memory (Duluk, [0034 – In Fig. 2, PPU 202 executes instructions that request data stored in PPU memory 204 via a virtual memory address. PPU 202 includes a PPU MMU 213, which processes requests from PPU 202 for translating virtual memory addresses to physical memory addresses]); 
an interface configured to support memory access requests from the first processing unit to the second memory (Duluk, [0030 – CPU 102 and system memory 104 are coupled to each other and to PPU 202 via memory bridge 105]), 
The remaining limitations are similar to claim 1 and therefore the same rejections are incorporated.


Claims 8-9, 17-18, 20 are rejected under AIA 35 U.S.C. 103(a) as being unpatentable over Duluk et al (20140267334) in view of Duluk et al (20170249254, hereinafter Duluk-2) and Duluk et al (20170371822, hereinafter Duluk-3).

As per Claim 8, the rejection of claim 1 is incorporated, and Duluk, Duluk-2 disclose a command queue.
Duluk-3 further discloses,
wherein the first processing unit is configured to detect a frequency of traps that result from atomic operations that include memory access requests to a page stored in the second memory (Duluk-3, [0006 – If the first processor accesses certain shared memory pages more often than the second processor, then shared memory access time is improved by migrating such pages from the local memory space of the second processor to the local memory space of the first processor. These pages are identified by invalidating all page table entries associated with the first processor's shared memory page addresses for a specified measurement interval. During the measurement interval, page faults/traps resulting from accesses by the first processor to shared/second memory are counted]; [0005 - The shared memory pages are mapped to a local memory of a second processor]).
Therefore it would have been obvious to a person of ordinary skill at the time of filing to incorporate the trap frequency of Duluk-3 into the CPU-to-GPU atomic operations of Duluk, Duluk-2 for the benefit of intelligent migration wherein a CPU-shared memory page associated with a cache entry that has a high access count may be a good candidate for migration from the CPU memory system to the GPU memory system. By contrast, a CPU-shared memory page associated with a cache entry that has a low access count may remain in the CPU memory system (Duluk-3, 0018).

As per Claim 9, the rejection of claim 8 is incorporated, and Duluk, Duluk-2 disclose a command queue.
Duluk-3 further discloses,
wherein the first processing unit is configured to transfer the page from the second memory to the first memory in response to the frequency exceeding a threshold (Duluk-3, [0006 - At the end of the measurement interval, shared memory pages accessed with relatively high frequency by the first processor are migrated from the local memory of the second processor to the local memory of the first processor]; [0021 - The cache tracker includes a threshold value, where the access tracker indicates when a counter has reached the threshold value]; [0019 - The access tracker includes a cache memory that includes a quantity of cache entries, where each cache entry includes a count of the number of times a first processor accesses a particular memory page]).
Therefore it would have been obvious to a person of ordinary skill at the time of filing to incorporate the migration of Duluk-3 into the CPU-to-GPU atomic operations of Duluk, Duluk-2 for the benefit of improved performance wherein a shared memory page may reside in the CPU memory system, even though both the CPU and the GPU access the memory page. Typically, CPU accesses of such a shared memory page have relatively low access times, while GPU accesses of a CPU-shared memory page may have relatively higher access times. If the GPU frequently accesses such a shared memory page, then GPU performance may be reduced. But GPU accesses of GPU-owned memory pages typically have relatively low access times. As a result, performance may be improved if CPU-shared memory pages that are accessed relatively frequently by the GPU are migrated to the GPU memory system (Duluk-3, 0018).
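The counting and migration behavior mapped to claims 8 and 9 can be summarized in a brief sketch. It rests on assumptions: the pages_to_migrate function, the list of faulting page numbers, and the threshold value are hypothetical and are not drawn from Duluk-3. Faults recorded during a measurement interval are counted per page, and a page is selected for migration only when its count exceeds the threshold.

```python
from collections import Counter


def pages_to_migrate(faulting_pages, threshold):
    """Return the pages whose fault count in the interval exceeds threshold."""
    counts = Counter(faulting_pages)  # fault count per page
    return sorted(page for page, n in counts.items() if n > threshold)


# Hypothetical interval: page 3 faults three times, page 5 twice, page 9 once.
print(pages_to_migrate([3, 3, 5, 3, 5, 9], threshold=2))  # [3]
```

Only page 3 is selected here because its count of three exceeds the threshold of two, while pages 5 and 9 stay where they are.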

As per Claim 17, it is similar to claim 8 and therefore the same rejections are incorporated.

As per Claim 18, it is similar to claim 9 and therefore the same rejections are incorporated.

As per Claim 20, it is similar to claims 8, 9 and therefore the same rejections are incorporated.



Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ARVIND TALUKDAR whose telephone number is (571)270-3177.  The examiner can normally be reached on M-F, 10 am-6pm EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached on 571-270-7519.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


Arvind Talukdar
Primary Examiner
Art Unit 2132