DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present Office action is in response to Applicant’s amendment/request for reconsideration submitted on January 25, 2021, hereinafter “Reply”, after the non-final rejection of September 24, 2020, hereinafter “Non-Final Rejection”. Claims 1 and 14-15 have been amended. No claims have been added. Claim 7 has been cancelled. Claims 1-6 and 8-20 remain pending in the application.

Response to Amendments and Arguments
The Reply has been fully considered, with the Examiner’s response set forth below.
Applicant’s arguments on pp. 7-11 of the Reply with respect to the amended independent claims 1 and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant’s general arguments on p. 11 of the Reply with respect to the dependent claims have been fully considered but are not persuasive because the dependent claims depend on independent claims 1 and 15 and do not cure the deficiencies thereof, for the reasons stated in the rejections of the independent claims in the claim analysis below.
Another iteration of claim analysis has been made due to the amendments to the claims. Refer to the corresponding sections of the claim analysis below for details. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6, 8-9, and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over Levandoski et al. (US 2016/0357791 A1), hereinafter “Levandoski”, in view of Schwartz (a document retrieved from the Internet titled "LOCK prefix vs MESI protocol?"; January 8, 2016), hereinafter “Schwartz”, Moses et al. (US 2007/0150658 A1), hereinafter “Moses”, and Alexander et al. (US 2006/0212653 A1), hereinafter “Alexander”.

Regarding claim 1, Levandoski teaches a processor comprising:
a cache ([0071], “a TLB may refer to a cache that memory management hardware uses to improve virtual address translation speed”), the cache comprising a cache line ([0072], “conflict detection may be done at the granularity of a cache line, which may lead to cases of false sharing where aborts occur due to threads accessing and modifying separate items on the same cache line”); and
an execution circuit (FIG. 11, [0162], “processing unit 1120”) to execute an atomic primitive, wherein the atomic primitive is implemented in hardware and is defined in an instruction set of architecture (ISA) of the processor, the atomic primitive comprising built-in operations to ([0062], “atomic CPU hardware primitives such as compare-and-swap (CAS)”):
responsive to executing a read instruction to retrieve a data item from a memory location, cause to store a copy of the data item in the cache line (FIG. 2A, [0045], “pages 212 (i.e., data item) may be read from secondary storage into a main memory cache on demand”; [0139], “this technique may ;
execute a lock instruction to cause a cache controller to lock the cache line to the processor (FIG. 1, [0025]-[0026], [0032], [0034], cache layer 104 (i.e., cache controller); [0073], “a programmer may utilize lock elision by using a set of CPU operations such as: AcquireElided (Lock) (i.e. lock instruction)”; [0097], “When a transaction falls back and acquires the lock, all other transactions in the critical section abort and cannot restart until the lock is released. The effect is that execution is fully serialized until the lock is released—even if the other transactions operate on non-conflicting cache lines”; FIG. 9, [0147], “exclusivity may be achieved while avoiding a global shared lock by assigning a thread-local read/write lock to each thread, as depicted on the right-hand side of FIG. 9”; [0148], “a lockaside table may be implemented as an array of cache-line sized entries, where each entry is assigned to a thread executing reads or updates against the index”; [0189]-[0192], “[0189] … the table state control module may be structured to maintain a respective thread-local read/write lock for each thread of a plurality of concurrently executing threads, and before starting an operation by one of the threads, acquire exclusive access for the one of the threads, to the respective thread-local read/write lock for the one of the threads, wherein the lockaside operations include obtaining exclusive access to the mapping table by acquiring all respective thread-local read/write locks from other threads of the plurality of , wherein to lock the cache line, the cache controller is to set a lock status flag in the cache line (FIG. 1, [0025]-[0026], [0032], [0034], cache layer 104 (i.e., cache controller); [0073], “a programmer may utilize lock elision by using a set of CPU operations such as: AcquireElided (Lock) (i.e. 
lock instruction)”; [0097], “When a transaction falls back and acquires the lock, all other transactions in the critical section abort and cannot restart until the lock is released. The effect is that execution is fully serialized until the lock is released—even if the other transactions operate on non-conflicting cache lines”; note that the status of the lock of a cache line must necessarily be stored in a flag, bit, or register in order to determine when the cache line is locked or released) and is to prevent other processors from modifying the data item stored at the memory location;

responsive to determining that the execution of the lock instruction is successful, 
guarantee that the memory location is locked to the processor until completion of executing an unlock instruction; and
execute at least one instruction while the cache line is locked to the processor ([0073], “a programmer may utilize lock elision by using a set of CPU operations such as: …Withdraw (A, X) (2) Deposit (B, X) (i.e., at least one instruction)”; [0147], “Once the MWCAS acquires all such locks, it may modify its mapping table entries and may then release all locks”; [0076], “Hardware Lock Elision (HLE) adds two new instruction prefixes (XACQUIRE and XRELEASE) for use in conjunction with instructions that implement a lock”; [0097], “When a transaction falls back and acquires the lock, all other transactions in the critical section abort and cannot restart until the lock is released. The effect is that execution is fully serialized until the lock is released—even if the other transactions operate on non-conflicting cache lines”; [0192], “the controlling including initiating an atomic multi-word compare-and-swap (MWCAS) operation on a plurality of words using a hardware transactional memory (HTM) resident in a device processor, the MWCAS operation performed using hardware primitive operations of the HTM, via the device processor”); and
execute the unlock instruction to cause the cache controller to release the lock of the cache line from the processor (FIG. 1, [0025]-[0026], [0032], [0034], cache layer 104 (i.e., cache controller); [0073], “a programmer may utilize lock elision by using a set of CPU operations such as: … ReleaseElided (Lock) (i.e., unlock instruction)”; [0097], “When a transaction falls back and acquires the lock, all other transactions in the critical section abort and cannot restart until the lock is released. The effect is that execution is fully serialized until the lock is released—even if the other transactions operate on non-conflicting cache lines”; [0147], “Once the MWCAS acquires all such locks, it may modify its mapping table entries and may then release all locks”; [0191], “the table state control module may be structured to modify target mapping table entries, by the one of the threads, and release all the respective thread-local read/write locks from the other threads, after the modifying of the target mapping table entries”; [0192], “the controlling including initiating an atomic multi-word compare-and-swap (MWCAS) operation on a plurality of words using a hardware transactional memory (HTM) resident in a device processor, the MWCAS operation performed using hardware primitive operations of the HTM, via the device processor”).

Levandoski teaches execute an atomic primitive; store a copy of the data item in the cache line; and wherein to lock the cache line, the cache controller is to set a lock status flag.  Nevertheless, Levandoski does not expressly teach execute an atomic primitive comprising built-in operations; responsive to executing a read instruction to retrieve a data item from a memory location, cause to store a copy of the data item in 

However, Schwartz teaches:
execute an atomic primitive comprising built-in operations (p. 2; “Locked increment [atomic primitive]: 1. Acquire cache line exclusive (if not already E or M) and lock it. 2. Read value. 3. Add one to it. 4. Write the new value to the cache line. 5. Change the cache line to modified and unlock it.”; note that the built-in operations are described in steps 1-5, such as acquire cache line exclusive and lock it, read value, add one to it, write the new value to the cache line, and change the cache line to modified and unlock it); 
responsive to executing a read instruction to retrieve a data item from a memory location, cause to store a copy of the data item in the cache (p. 2; “Locked increment: 1. Acquire cache line exclusive (if not already E or M) and lock it. 2. Read value. 3. Add one to it. 4. Write the new value to the cache line”; note that the Locked increment primitive reads the value from the memory (see step 2), increments the read value (see step 3), and then eventually writes [store] the incremented value to the cache line (see step 4)); and
wherein to lock the cache line, the cache controller is to set a lock status flag and is to prevent other processors from modifying the data item stored at the memory location (p. 2; “On modern CPUs, the LOCK prefix locks the cache line so that the read-modify-write operation is logically atomic. … Locked increment: 1. Acquire cache line exclusive (if not already E or M) and lock it. 2. Read value. 3. Add one to it. 4. Write the new value to the cache line. 5. Change the cache line to modified and unlock it. … In the locked increment, the cache line is held across the entire instruction, all the way from the read operation to the write operation and including during the increment itself.”; note that in the locked increment, the cache line is held locked across the entire instruction, all the way from the read operation to the write operation and including during the increment itself, and thus would prevent other CPUs from modifying the value of the memory location associated with the cache line being locked).
	
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Levandoski to incorporate the teachings of Schwartz by providing the device processor of Levandoski (FIG. 11, [0192]) with the CPUs of Schwartz. Doing so would lock the cache line so that the read-modify-write operation is logically atomic, with the cache line held locked across the entire instruction, from the read operation through the increment itself to the write operation, in order to handle memory operation ordering issues (Schwartz, p. 2).
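For illustration only, the five steps of the locked increment quoted from Schwartz can be sketched as a toy software model. The `CacheLine` class, its field names, the `run_workers` helper, and the use of a Python `threading.Lock` to stand in for the hardware cache-line lock are assumptions made for this sketch; they do not appear in Schwartz, Levandoski, or the claims.

```python
import threading


class CacheLine:
    """Toy model of one cache line (hypothetical; not a hardware interface)."""

    def __init__(self, value=0):
        self.value = value
        self.state = "I"  # MESI state: "M", "E", "S", or "I"
        self._line_lock = threading.Lock()  # stands in for the hardware line lock


def locked_increment(line):
    # Step 1: acquire the cache line exclusive (if not already E or M) and lock it.
    with line._line_lock:
        if line.state not in ("E", "M"):
            line.state = "E"
        value = line.value   # Step 2: read value.
        value += 1           # Step 3: add one to it.
        line.value = value   # Step 4: write the new value to the cache line.
        line.state = "M"     # Step 5: change the cache line to modified ...
    # ... and unlock it (the lock is released on leaving the with-block).


def run_workers(line, workers=4, increments=1000):
    """Hypothetical driver: several threads increment the same line."""
    def worker():
        for _ in range(increments):
            locked_increment(line)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return line.value
```

Because the line is held locked from the read through the write, no increment is lost in this model, which mirrors the point for which Schwartz is cited: other CPUs cannot modify the value while the line is locked.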

Levandoski teaches set a lock for the cache line.  Nevertheless, the combination of Levandoski does not explicitly teach set a lock status flag in the cache line. 

However, Moses teaches:
set a lock status flag in the cache line (FIG. 2; [0013], “The shared cache 108 may also include one or more lock/monitor status bits (204) [status flag] for each of the cache lines (202) … one bit may be utilized to indicate whether the corresponding cache line is locked”; note that FIG. 2 illustrates that each of the cache lines (202) includes the lock/monitor status bits (204) [status flag]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Levandoski to incorporate the teachings of Moses to provide the device processor of Levandoski (FIG. 11, [0192]) with the processors of Moses, each having one or more processor cores that accesses a shared cache and a cache controller for locking one or more cache lines in the shared cache and monitoring one or more addresses in the shared cache that correspond to one or more pinned and locked cache lines.  Doing so with the processors having the cache controller of Moses would provide efficient mechanisms for pinning locks in a shared cache, which may reduce the amount of snoop traffic generated in computing systems that include multiple processor cores (Moses, [0009]).
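As a hedged illustration of the per-line lock status bit for which Moses is cited, the following toy model keeps one status word per cache line and suspends requests from other cores while the lock bit is set. The `SharedCache` class, the `LOCK_BIT` constant, and the method names are hypothetical; the monitor (pin) bit described in Moses is deliberately omitted, and the first-come notification order on unlock is an assumption of this sketch.

```python
LOCK_BIT = 0b01  # hypothetical encoding: bit 0 set means the line is locked


class SharedCache:
    """Toy shared cache with a lock status bit per line (illustrative only)."""

    def __init__(self, num_lines):
        self.status = [0] * num_lines               # status bits per cache line
        self.owner = [None] * num_lines             # core currently holding the lock
        self.pending = [[] for _ in range(num_lines)]  # suspended requesters

    def lock(self, line, core):
        """Set the lock bit; return False if the line is already locked."""
        if self.status[line] & LOCK_BIT:
            return False
        self.status[line] |= LOCK_BIT
        self.owner[line] = core
        return True

    def request(self, line, core):
        """Grant access now (True) or suspend the request (False)."""
        if self.status[line] & LOCK_BIT and self.owner[line] != core:
            self.pending[line].append(core)
            return False
        return True

    def unlock(self, line, core):
        """Clear the lock bit and return the next core to notify, if any."""
        if self.owner[line] != core:
            raise PermissionError("only the owning core may unlock the line")
        self.status[line] &= ~LOCK_BIT
        self.owner[line] = None
        waiting = self.pending[line]
        return waiting.pop(0) if waiting else None
```

In this model, a second core that requests a locked line is held off until the owning core unlocks the line, at which point the waiting core is the one notified.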

The combination of Levandoski does not explicitly teach wherein the atomic primitive is implemented in hardware and is defined in an instruction set of architecture (ISA) of the processor, the atomic primitive comprising built-in operations to: determine whether execution of the lock instruction is successful; responsive to determining that the execution of the lock instruction is successful, guarantee that the memory location is 

However, Alexander teaches:
wherein the atomic primitive is implemented in hardware and is defined in an instruction set of architecture (ISA) of the processor, the atomic primitive comprising built-in operations to ([0004], “load-reserve and store-conditional instructions have been implemented in the PowerPC® instruction set architecture with operation codes (opcodes) associated with the LWARX and STWCX mnemonics, respectively (referred to hereafter as LARX and STCX) [atomic primitive]”):
determine whether execution of the lock instruction is successful ([0009], “RC logic 142 stores the address of the reservation granule (e.g., cache line) containing the target address in reservation address field 148 and sets [lock instruction] reservation flag 150”; [0039], “At step 514, reservation logic 346 determines whether a reservation exists for address A”; note that the setting [lock instruction] of the reservation flag is successful if a reservation exists for address A);
responsive to determining that the execution of the lock instruction is successful ([0009], “RC logic 142 stores the address of the reservation granule (e.g., cache line) containing the target address in reservation address field 148 and sets [lock instruction] reservation flag 150”; [0012], “RC logic 142 obtains owner permission for the target cache line and then determines at block 226 whether or not reservation flag 150 is still set (i.e., whether or not any other determines whether a reservation exists for address A. If a reservation exists for address A, then the STCWX instruction passes, and the process next proceeds to step 516, which depicts an RC machine 342 of L2 storage array and directory 340 updating the L2 cache 330 data with the data for address A sent with the STCWX operation”; note that the setting [lock instruction] of the reservation flag is successful if a reservation exists for address A), 
guarantee that the memory location is locked to the processor until completion of executing an unlock instruction ([0038], “LSU 325 rejects or flushes loads to address A from all threads on the current processor core 320, in this case processor core 320 a.”; [0039], “The process then proceeds to step 522, which illustrates LSU 325 releasing [unlock instruction] the block on loads to address A [memory location]. The process next moves to step 523. At step 523, LSU 325 optionally releases the block on stores from other threads, if those other threads were optionally blocked at step 506”; note that the address A [memory location] is guaranteed to be locked to the current processor core 320 (e.g., in this case processor core 320 a) until LSU 325 releases [unlock instruction] the block on loads to address A [memory location]); and
execute at least one instruction while the cache line is locked to the processor ([0009], “RC logic 142 stores the address of the reservation granule (e.g., cache line) containing the target address in reservation address field 148 and sets [lock instruction] reservation flag 150”; [0039], “If a reservation exists for address A, then the STCWX instruction passes, and the process next proceeds to step 516, which depicts an RC machine 342 of L2 storage array and directory 340 updating the L2 cache 330 data with the data for address A sent with the STCWX operation. The process then proceeds to step 518, which depicts reservation logic 346, returning a STCWX pass indication on pass/fail bus 374a to L1 cache 326. The process next moves to step 520. At step 520, LSU 325 updates appropriate status information for a pass as necessary and completes the STWCX instruction”; note that the address A is locked to the current processor core 320 until LSU 325 releases the block on loads to address A; note also that steps 516-520 perform instructions, such as updating the L2 cache 330 data, returning a STCWX pass indication, and updating appropriate status information, while the reservation exists, i.e., while the reservation flag is set [lock instruction] for address A associated with the reservation granule (e.g., cache line) containing the target address in the reservation address field).
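The load-reserve and store-conditional behavior for which Alexander is cited can be sketched, under stated assumptions, as a small software model. The `ReservationUnit` class, the `snoop_store` method, and the `atomic_add` helper are hypothetical names introduced for this sketch; the model tracks a reservation per address rather than per reservation granule, and it is not the PowerPC implementation described in Alexander.

```python
class ReservationUnit:
    """Toy model of a reservation address field and reservation flag."""

    def __init__(self):
        self.memory = {}
        self.reservation_address = None
        self.reservation_flag = False

    def load_reserve(self, address):
        """Load a value and set the reservation for its address."""
        self.reservation_address = address
        self.reservation_flag = True
        return self.memory.get(address, 0)

    def snoop_store(self, address, value):
        """A store by another core; it cancels a matching reservation."""
        self.memory[address] = value
        if self.reservation_flag and self.reservation_address == address:
            self.reservation_flag = False

    def store_conditional(self, address, value):
        """Store only if the reservation still exists; return pass/fail."""
        passed = self.reservation_flag and self.reservation_address == address
        if passed:
            self.memory[address] = value
        self.reservation_flag = False  # the reservation is consumed either way
        return passed


def atomic_add(unit, address, delta):
    """Retry the load-reserve/store-conditional pair until the store passes."""
    while True:
        old = unit.load_reserve(address)
        if unit.store_conditional(address, old + delta):
            return old + delta
```

The pass/fail result of `store_conditional` plays the role of the determination of whether a reservation exists for the address: the store takes effect only when no intervening store by another core has cancelled the reservation.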



Regarding claim 15, the claimed method comprises the steps for carrying out the same steps in claim 1. Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 1 above. 

Regarding claim 2, the combination of Levandoski teaches the processor of claim 1.
Levandoski further teaches wherein the cache is one of an L1 data cache ([0093], “One example HTM implementation may leverage its 32 KB L1 cache to buffer a transaction's writes and to track its read and write set”) or an L2 data cache.

Regarding claim 16, the claimed method comprises the steps for carrying out the same steps in claim 2. Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 2 above.

Regarding claim 3, the combination of Levandoski teaches the processor of claim 1.
Levandoski further teaches wherein the cache controller associated with the processor ([0073], “a programmer may utilize lock elision by using a set of CPU operations”) is to:
determine that the cache line is in an exclusive state under a cache coherence protocol ([0070], “HTM piggybacks on existing features in CPU micro-architectures to support transactions.  For example, CPU caches may be used to store transaction buffers and provide isolation.  For example, the CPU cache coherence protocol can be used to detect conflicting transactional accesses”; [0139], “both singleton reads and writes may be removed from HTM transactions, which may be referred to herein as an example "infinite retry" technique.  For example, this technique may advantageously utilize an example property of singleton reads or updates (that are non-transactional), that they may still trigger the cache coherence protocol for their target cache lines”; FIG. 9, [0147], “exclusivity may be achieved while avoiding a global shared lock by assigning a thread-local read/write lock to each thread, as depicted on the right-hand side of FIG. 9”); and
responsive to determining that the cache line is in the exclusive state, mark the cache line as locked to the processor ([0139], “both singleton reads and writes may be removed from HTM transactions, which may be referred to herein as an example "infinite retry" technique.  For example, this technique may advantageously utilize an example property of singleton reads or updates (that are non-transactional), .

Regarding claim 17, the claimed method comprises the steps for carrying out the same steps in claim 3. Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 3 above.

Regarding claim 4, the combination of Levandoski teaches the processor of claim 1.

Moses further teaches wherein the execution circuit is to mark the cache line as locked to the processor responsive to setting the lock status flag in the cache line (FIG. 2; [0013], “FIG. 2 illustrates a block diagram of portions of a shared cache 108 and other components of a processor core … The shared cache 108 may also include one or more lock/monitor status bits (204) [lock status flag] for each of the cache lines (202) … one bit may be utilized to indicate whether the corresponding cache line is locked and another bit may be used to indicate whether the corresponding cache line is monitored (or pinned) in the shared cache 108”, the execution circuit is a portion of the first microprocessor that executes the program to set the lock variable).
	
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Levandoski to incorporate the teachings of Moses to provide the device processor of Levandoski (FIG. 11, [0192]) with the processors of Moses, each having one or more processor cores that accesses a shared cache and a cache controller for locking one or more cache lines in the shared cache and monitoring one or more addresses in the shared cache that correspond to one or more pinned and locked cache lines.  Doing so with the processors having the cache controller of Moses would provide efficient mechanisms for pinning locks in a shared cache, which may reduce the amount of snoop traffic generated in computing systems that include multiple processor cores (Moses, [0009]).

Regarding claim 18, the claimed method comprises the steps for carrying out the same steps in claim 4. Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 4 above.

Regarding claim 5, the combination of Levandoski teaches the processor of claim 4.

Moses further teaches wherein to release the cache line from the processor, the cache controller is to unset the lock status flag in the cache line (FIG. 4; [0014], “the cache controller 206 may include a locking logic 208 (e.g., to lock one or more cache lines 202 in the shared cache 108), a monitoring logic 210 (e.g., to monitor one or more addresses in the shared cache 108 that correspond to one or more pinned and locked cache lines)”; [0022], “at an operation 402, the monitoring logic 210 may determine whether one or more locks present in the shared cache 108 have been released (or otherwise unlocked) [unset], e.g., by referring to the value stored in the corresponding lock/monitor status bit(s) 204 [lock status flag]”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Levandoski to incorporate the teachings of Moses to provide the device processor of Levandoski (FIG. 11, [0192]) with the processors of Moses, each having one or more processor cores that accesses a shared cache and a cache controller for locking one or more cache lines in the shared cache and monitoring one or more addresses in the shared cache that correspond to one or more pinned and locked cache lines.  Doing so with the processors having the cache controller of Moses would provide efficient mechanisms for pinning locks in a shared cache, which may reduce the amount of snoop traffic generated in computing systems that include multiple processor cores (Moses, [0009]).

Regarding claim 19, the claimed method comprises the steps for carrying out the same steps in claim 5. Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 5 above.

Regarding claim 6, the combination of Levandoski teaches the processor of claim 1.

Moses further teaches further comprising:
identifying a request to access the cache line by a second processor (FIG. 3; [0014], “a lock forwarding logic 212 (e.g., to determine which one of a plurality of processor cores is notified when one or more locked cache lines of the shared cache 108 are unlocked or released, as will be further discussed with reference to FIG. 4)”; [0020], “At an operation 324, the monitoring logic 210 may monitor the locked cache lines of operation 316, e.g., to suspend one or more memory requests to these cache lines until the cache lines are unlocked or released”; [0023], “At operation 408, the lock forwarding logic 212 may notify a processor core (e.g., one of the cores 106 that are contending for the locked cache lines) that the locked cache line(s) of the operation 316 are unlocked. As discussed with reference to FIG. 2, the lock forwarding logic 212 may determine which one of a plurality of processor cores 106 is notified (408) when one or more locked cache lines of the shared cache 108 are unlocked. The plurality of processor cores (106) may be cores that execute a plurality of threads that are contending for the one or more locked cache lines in the shared cache 108”; note that the second processor is another processor core, e.g., one of the cores 106 that are contending for the locked cache lines); and
causing the cache controller to delay granting access by the second processor to the cache line until the execution circuit completes execution of the at least one instruction (FIG. 4; [0023], “At operation 408, the lock forwarding logic 212 may notify a processor core (e.g., one of the cores 106 that are contending for the locked cache lines) that the locked cache line(s) of the operation 316 are unlocked. As discussed with reference to FIG. 2, the lock forwarding logic 212 may determine which one of a plurality of processor cores 106 is notified (408) when one or more locked cache lines of the shared cache 108 are unlocked. The plurality of processor cores (106) may be cores that execute a plurality of threads that are contending for the one or more locked cache lines in the shared cache 108”; note that the second processor is another processor core, e.g., one of the cores 106 that are contending for the locked cache lines; also, execution of the at least one instruction is completed for the processor core associated with the locked cache line before the locked cache line(s) of the operation 316 are unlocked; the second processor cannot access the cache line until the cache line is unlocked).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Levandoski to incorporate the teachings of Moses to provide the device processor of Levandoski (FIG. 

Regarding claim 20, the claimed method comprises the steps for carrying out the same steps in claim 6. Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 6 above.

Regarding claim 8, the combination of Levandoski teaches the processor of claim 1.
Levandoski further teaches wherein the cache controller is to access a pool to store a plurality of memory addresses that correspond to locked cache lines, and wherein to lock the cache line to the processor, the cache controller is to place a memory address associated with the cache line in the pool ([0039], “replacing a prior (or current) state of a page with a new (or updated) state of the page may include at least one of replacing a physical address of a first storage object … with a physical address of a delta record that is associated with the new state of the page”; [0040], “replacing a physical address of the current page with a physical address of the new state of the page (e.g., the modified version or the other page for replacement), via .

Regarding claim 9, the combination of Levandoski teaches the processor of claim 8.
Levandoski further teaches wherein the cache controller is to remove the memory address from the pool to unlock the cache line (FIG. 1, [0025]-[0026], [0032], [0034], cache layer 104 (i.e., cache controller); [0022], “techniques discussed .

Regarding claim 12, the combination of Levandoski teaches the processor of claim 1.
	Levandoski further teaches wherein at least one processor other than the processor is denied access to the cache line while the cache line is locked to the processor ([0049], “transactions may be characterized as ACID: Atomic (meaning all or nothing), Consistent … Isolated (changes that the transaction makes cannot be seen until the transaction is finished (committed) or which disappear if the transaction is aborted) and Durable … Transactions provided by current HTM implementations apply to operations performed in volatile memory only; therefore current HTM transactions encompass only a subset of the ACID guarantees … since HTM transactions are memory-only, they may not provide durability (the "D" in ACID) nor recoverability guarantees”; [0050], “Computer systems using transaction processing may employ a commit protocol to insure that no permanent change is made in a data item, or no change visible to other nodes of the system (i.e., processors other than the processor), until a specified "commit" is executed.  In this context, to "commit" a transaction generally may refer to installing the results of a transaction in a data base”; [0055], “For systems that support locks, a "data lock" is a mechanism for assigning exclusive rights to a datum or record in a data base.  For such systems, a first transaction may lock a particular piece of data so as to ensure that no other .

Regarding claim 13, the combination of Levandoski teaches the processor of claim 1.
Levandoski further teaches wherein a second processor is to access the cache line while the cache line is unlocked ([0021], “Recent developments in hardware platforms have exploited multi-core processors, multi-tiered memory hierarchies … central processing unit (CPU) changes have included multi-core processors and main memory access that involves multiple levels of caching”; [0097], “When a transaction falls back and acquires the lock, all other transactions in the critical section abort and cannot restart until the lock is released (i.e., unlocked). The effect is that execution is fully serialized until the lock is released--even if the other transactions operate on non-conflicting cache lines”; [0105], “In lock-coupling a pair of locks are held as a worker traverses pages: one on a "source" page and another on a "target" page.  As the traversal proceeds, a lock on the target page in the traversal is first acquired and only afterward the lock on the source page is released”; [0147], .

Regarding claim 14, the combination of Levandoski teaches the processor of claim 1.
Levandoski further teaches wherein to execute the atomic primitive, the execution circuit (FIG. 11, [0162], processing unit 1120) is further to: 
responsive to determining that the execution of the lock instruction fails:
return a status flag indicating that the execution fails;
branch to a pre-determined memory address; and
trigger an exception to notify an exception handler; or
repeatedly execute the lock instruction until successfully acquiring a lock to the processor ([0145], “speculative transactional execution may suppress the page fault event, so that retrying the transaction speculatively may always fail without some outside help.  For example, an HTM designer may indicate that synchronous exception events, including page faults, "are suppressed as if they had never occurred"”, repeatedly executing the lock instruction is performed by retrying the transaction; [0146], “when omitting a fallback lock, a fallback code path may need to (at least) pre-fault the addresses that the transaction intends to access”; [0058], “an example "infinite retry" approach may be utilized to achieve a multi-word compare-and-swap that only brackets multi-slot updates to the mapping table in hardware transactions”; [0059], “techniques may be utilized for guaranteeing progress in the "infinite retry" approach that aim to reduce spurious aborts of hardware transactions”; FIG. 6, [0098], “performing lock-elision with retry using the example RTM instructions.  For example, FIG. 6 depicts an example lock elision technique using RTM with a configurable retry threshold 602.  As shown in FIG. 6, a lock 604 is utilized for executing a critical section.  For example, in a retry section 606, if the lock 604 is already taken, or if an abort occurs, a fallback section 608 is executed for handling a critical section”).
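For illustration only, the retry-then-fallback control flow quoted above from Levandoski [0098] (retry section 606, configurable retry threshold 602, fallback section 608) may be sketched as follows. This is a minimal software model and not the reference's implementation: the names `run_with_elision`, `make_flaky_transaction` and `try_speculative` are hypothetical, and the speculative hardware transaction is simulated by a callable that reports a commit or an abort.

```python
import threading


def make_flaky_transaction(aborts_before_commit):
    """Hypothetical stand-in for a hardware transaction.

    The returned callable aborts the given number of times, then commits.
    It returns (True, result) on commit and (False, None) on abort.
    """
    remaining = [aborts_before_commit]

    def try_speculative(fn):
        if remaining[0] > 0:
            remaining[0] -= 1
            return False, None  # simulated abort
        return True, fn()       # simulated commit

    return try_speculative


def run_with_elision(critical_section, try_speculative, lock, retry_threshold=3):
    """Run critical_section speculatively, falling back to the real lock.

    Returns (result, path, attempts), where path is "speculative" or "fallback".
    """
    attempts = 0
    while attempts < retry_threshold:
        attempts += 1
        # Retry section: do not speculate while another worker holds the lock.
        if lock.locked():
            continue
        committed, result = try_speculative(critical_section)
        if committed:
            return result, "speculative", attempts
    # Fallback section: the threshold is exhausted, so serialize on the lock.
    with lock:
        return critical_section(), "fallback", attempts
```

In this model, a transaction that aborts fewer times than the threshold completes on the speculative path, while one that keeps aborting is executed under the lock once the threshold is reached.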

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Levandoski et al. (US 2016/0357791 A1), hereinafter “Levandoski”, in view of Schwartz (a document retrieved from the Internet titled "LOCK prefix vs MESI protocol?"; January 8, 2016), hereinafter “Schwartz”, Moses et al. (US 2007/0150658 A1), hereinafter “Moses”, and Alexander et al. (US 2006/0212653 A1), hereinafter “Alexander”, as applied to claim 1 above, and further in view of Vorbach (US 2016/0004639 A1), hereinafter “Vorbach”.

Regarding claim 10, the combination of Levandoski teaches the processor of claim 1.

Levandoski further teaches wherein the lock instruction ([0073], “a programmer may utilize lock elision by using a set of CPU operations such as: AcquireElided (Lock) (i.e. lock instruction)”) is a privileged instruction of the operating system (FIG. 11, [0168], operating system 1134), and wherein the unlock instruction ([0073], “a programmer may utilize lock elision by using a set of CPU operations such as: … ReleaseElided (Lock) (i.e., unlock instruction)”) is a privileged instruction of the operating system (FIG. 11, [0168], operating system 1134; [0128], “the SMO thread may be scheduled out by the operating system (OS)”).

Levandoski teaches the lock instruction and the unlock instruction. Nevertheless, the combination of Levandoski does not expressly teach a privileged instruction.

However, Vorbach teaches a privileged instruction ([0209], “For enabling the operating system to take control, instructions may be implemented in the core and/or processor to modify any TAGs and/or Locks in higher privileged code, e g running in 
	
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Levandoski to incorporate the teachings of Vorbach to provide the device processor of Levandoski (FIG. 11, [0192]) with the instructions of Vorbach, implemented in the core and/or processor, that modify any TAGs and/or Locks in higher privileged code. Doing so would be advantageous in multi-processor arrangements, where the ability to access multiple memories or memory sections through a plurality of rather independent Load/Store-units and/or Address-Generators significantly increases bandwidth and coherence problems for state-of-the-art implementations of the memory hierarchy (Vorbach, [0041]).

Regarding claim 11, the combination of Levandoski teaches the processor of claim 10.
Levandoski further teaches wherein the processor (FIG. 11, [0162], “processing unit 1120”) is to support the operating system (FIG. 11, [0168], operating system 1134) to call the atomic primitive (FIG. 11, [0022], “Example techniques discussed herein utilize hardware primitives (e.g., using hardware transactional memory (HTM)) to atomically modify multiple values (e.g., memory addresses)”; [0024], “a "compare and swap" operation, or a "CAS" may refer to an atomic instruction or operation (i.e. atomic primitive) that may be used in a multithreading environment to achieve synchronization”; [0168], “data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 1120 … FIG. 11 illustrates operating system 1134”).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tong B Vo whose telephone number is (571)272-7568.  The examiner can normally be reached on M-F 8:00 AM - 4:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached at (571)272-4085.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/T.B.V./Patent Examiner, Art Unit 2136