DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to Applicant’s Amendment and Remarks filed on 06 April 2022. 
Claims 1-20 are pending in this application. 


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f): 
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
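For illustration only, the three-prong test set out above can be sketched as a simple predicate. This is a minimal sketch and not part of MPEP § 2181: the `Limitation` type and its field names are hypothetical, and each flag stands in for a legal determination that is made on the record, not computed.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical summary of one claim limitation (field names are illustrative)."""
    uses_means_or_step: bool        # the literal word "means" or "step"
    uses_generic_placeholder: bool  # a nonce term used as a substitute for "means"
    has_functional_language: bool   # e.g. linked by "for", "configured to", "so that"
    has_sufficient_structure: bool  # structure, material, or acts for the function


def interpreted_under_112f(lim: Limitation) -> bool:
    # Prong (A): "means"/"step" or a generic placeholder is used.
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    # Prong (B): the term is modified by functional language.
    prong_b = lim.has_functional_language
    # Prong (C): the term is NOT modified by sufficient structure.
    prong_c = not lim.has_sufficient_structure
    return prong_a and prong_b and prong_c
```

Under these assumptions, a limitation that uses a generic placeholder with functional language and no sufficient structure satisfies all three prongs even though the word “means” is absent.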
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) are: “a compute unit”, “an advanced controller”, “a workgroup scheduler” and “cache controller” in claims 10, 13-14, 17 and 19.

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.  The corresponding structure can be found in paragraph [0049], which discloses: “The various functional units illustrated in the figures and/or described herein (including, but not limited to, the processor 102, the input driver 112, the input devices 108, the output driver 114, the output devices 110, the accelerated processing device 116, the scheduler 136, the graphics processing pipeline 134, the compute units 132, the SIMD units 138, the advanced controller 306, the workgroup scheduler 308, or the cache controller 304) may be implemented as a general purpose computer, a processor, or a processor core, or as a program, software, or firmware, stored in a non-transitory computer readable medium or in another medium, executable by a general purpose computer, a processor, or a processor core.”

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
As per claims 1, 10 and 19 (line# refers to claim 1):
Lines 3-10 recite the steps of “detecting execution of a wait instruction by a workgroup…monitoring…in response to detecting that the condition is met” and “the detecting that the condition is met comprises determining…after the wait instruction is executed by each workgroup of a set of workgroups”. These recitations render the claim indefinite because it is unclear how the condition is determined to be met, since the determination is based on both (a) a wait instruction executed by a single workgroup and (b) a wait instruction executed by each workgroup of a set of workgroups. The claim does not clearly indicate how the determination is to be performed on the basis of both (a) and (b).

Lines 9-10 recite the phrase “each workgroup of a set of workgroups”. However, line 3 earlier recites “a workgroup”. Thus, it is unclear whether the “workgroup” of line 3 is the same as or different from the workgroups recited in “each workgroup of a set of workgroups”. It is uncertain whether the term “workgroup” in line 3 is intended to refer to one of the workgroups of the “set of workgroups”, since they are all executing the wait instruction.

As per claims 2-9, 11-18 and 20:
They are method and system claims that depend from claims 1, 10 and 19, respectively; therefore, they have the same deficiencies as claims 1, 10 and 19 above.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Uhrenholt et al. (US Pub. 2020/0211146 A1) in view of Marejka (US Patent 7,512,950 B1).
Uhrenholt and Marejka were cited in the previous Office Action.

As per claim 1, Uhrenholt teaches the invention substantially as claimed including A method for scheduling on an accelerated processing device ("APD") (Uhrenholt, Fig. 1 (as accelerated processing device); [0025] lines 1-3, a graphics processor comprising a programmable execution unit operable to execute programs to perform processing operations; [0070] lines 1-7, the program can, and in an embodiment does, include one or more sections of instructions to be executed (and that in an embodiment are executed) by only a subset of execution threads (e.g. and in an embodiment, a single execution thread) at any one time), the method comprising: 
detecting execution of a wait instruction by a workgroup (Uhrenholt, Fig. 4, T0 (as workgroup), SBARRIER; [0069] lines 4-6, the program may contain a single set or plural sets of one or more instructions, each set associated with a corresponding thread exclusivity instruction; [0070] lines 2-5, one or more sections of instructions to be executed…by only a subset of execution threads (e.g. and in an embodiment, a single execution thread) at any one time; [0267] lines 1-11, When a thread reaches "SBARRIER", in response to that instruction (as detecting the execution of “SBARRIER” instruction (as wait instruction)), the thread sends a message to the thread exclusivity control unit 303 ("SBARRIER Control Unit") to determine whether the thread satisfies a condition for proceeding with the execution of instructions beyond "SBARRIER" in the program…The thread exclusivity instruction ("SBARRIER") (as wait instruction) in the program thus acts as a "barrier", through which only a single execution thread is allowed to pass at any one time.), the wait instruction specifying a condition value and a condition to be met for the condition value (Uhrenholt, [0126] lines 1-2, A thread exclusivity condition can be associated with a thread exclusivity instruction; [0128] lines 1-8, the subset consists of only a single execution thread from the set of plural execution threads. Thus, in an embodiment, the condition is such that only a single execution thread will satisfy the condition at any one time… this means that the associated set of one or more instructions can be executed by (only) one execution thread at a time (as specifying that only one (as condition value) execution thread will be satisfying/met for the condition)); 
monitoring the condition value to detect the condition being met (Uhrenholt, [0086] lines 1-5, determine (as monitoring) whether the execution thread satisfies a condition associated with the instruction, the condition being such that only a subset of a set of plural execution threads will satisfy the condition at any one time; [0133] lines 1-6, the thread exclusivity condition is a condition that can be (and is), at any one time, only satisfied by a subset (in an embodiment consisting of only a single execution thread) of a set of plural execution threads that are to execute the shader program that includes the thread exclusivity instruction); 
in response to detecting that the condition is met, determining that the wait instruction is part of a high contention scenario (Uhrenholt, [0128] lines 3-8, the condition is such that only a single execution thread will satisfy the condition at any one time… this means that the associated set of one or more instructions can be executed by (only) one execution thread at a time; [0267] lines 8-11, The thread exclusivity instruction ("SBARRIER") (as wait instruction) in the program thus acts as a "barrier", through which only a single execution thread is allowed to pass at any one time; [0054] lines 2-13, serial processing of processing items…only a, e.g. single execution thread satisfying the thread exclusivity condition is able to be "awake" to execute the set of instructions associated with the thread exclusivity instruction at any one time, while (all) other threads "sleep". Accordingly, the technology described herein can save processing power, as compared to e.g., "spinlock" arrangements. This is generally advantageous, but may be particularly beneficial in modern mobile devices such as smart phones, tablets, and the like where system resources are restricted; [0055] lines 1-9, when using a "spinlock" arrangement, for example, to ensure exclusive access to a resource by a thread, when the lock is obtained by one thread, other threads waiting to use the resource will repeatedly attempt to obtain the lock. Such repeated attempts, or "spinning"…in which many threads may be trying to acquire the lock at any given time [Examiner noted: only one thread is allowed (serial processing) due to resource restriction (many threads trying to acquire the same resource, as high contention scenario)]); wherein the set of workgroups includes two or more workgroups (Uhrenholt, Fig. 4, t0, t1 (as number of workgroups is greater than one), warp 1 (as a set of workgroups); SBARRIER (t0 and t1 hit the SBARRIER and wait)), and
waking up a single workgroup of the set of workgroups in response to the condition being met and in response to the wait instruction being part of the high contention scenario (Uhrenholt, Fig. 4, SBARRIER, Wakeup t0; Fig. 5, 503, Thread satisfies thread exclusivity condition, Yes to 504; [0267] lines 1-6, When a thread reaches "SBARRIER", in response to that instruction, the thread sends a message to the thread exclusivity control unit 303 ("SBARRIER Control Unit") to determine whether the thread satisfies a condition for proceeding with the execution of instructions beyond "SBARRIER" in the program; [0270] lines 5-8, the control unit 303 sends "wakeup" messages such that, at any one time, only a single execution thread will continue to execute instructions in the program following "SBARRIER"; [0298] lines 6-11,  When the thread encounters the thread exclusivity instruction in the program, it is determined whether the thread satisfies a thread exclusivity condition (step 503), which condition can only be satisfied by a single execution thread (subset) at any one time. If the thread satisfies the condition, the thread executes a set of instructions in the program associated with the thread exclusivity instruction; [Examiner noted; in response the condition is met, only one workgroup is wakeup due to the high contention scenario]).
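For illustration only, the one-at-a-time wakeup behavior described in the passages quoted above can be sketched in software. This is a minimal sketch under stated assumptions, not Uhrenholt's hardware design: the class `ExclusivityControl`, its methods, and the helper `run_exclusive` are hypothetical names, and ordinary Python threads stand in for execution threads.

```python
import threading


class ExclusivityControl:
    """Hypothetical stand-in for a control unit that admits one thread at a time."""

    def __init__(self):
        self._cond = threading.Condition()
        self._busy = False      # True while some thread is inside the section
        self.inside = 0         # threads currently inside (kept for checking)
        self.max_inside = 0     # highest value of `inside` ever observed

    def enter(self):
        with self._cond:
            # Threads that do not satisfy the condition sleep here.
            while self._busy:
                self._cond.wait()
            self._busy = True
            self.inside += 1
            self.max_inside = max(self.max_inside, self.inside)

    def leave(self):
        with self._cond:
            self.inside -= 1
            self._busy = False
            # Wake a single sleeping thread, not all of them.
            self._cond.notify(1)


def run_exclusive(num_threads=4):
    ctrl = ExclusivityControl()
    order = []

    def worker(tid):
        ctrl.enter()
        order.append(tid)   # the section that only one thread may execute
        ctrl.leave()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ctrl.max_inside, sorted(order)
```

In this sketch every thread eventually passes through the section, but `max_inside` never exceeds one because each departure wakes at most one waiter.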

Uhrenholt fails to specifically teach that the condition value is a condition variable, and wherein the detecting that the condition is met comprises determining that a single update occurs to the condition variable after the wait instruction is executed by each workgroup of a set of workgroups.

However, Marejka teaches the conditional value is condition variable (Marejka, Col 2, lines 50-53, For all threads except the last thread to enter the barrier, the mygeneration variable will represent the current value of the barrier's generation variable (as condition variable) (e.g., zero in the specific example); Col 2, line 65-Col 3 line 3, The last to arrive thread may also execute instructions (as include wait instruction) to prepare the barrier for the next iteration, for example by incrementing the generation variable and resetting the counter value to equal the limit variable. Expressed in pseudocode, the above steps may be represented as shown in Table 1; Col 3, lines 5-20, Table 1, Initialize barrier for N thread usage, generation =0 (Table 1, top portion (initialize barrier to wait) as whole as wait instruction include condition variable); wait);
wherein the detecting that the condition is met comprises determining that a single update occurs to the condition variable after the wait instruction is executed by each workgroup of a set of workgroups (Marejka, Col 2, lines 55-56, While its mygeneration variable remains equal to the barrier's generation variable the thread will continue to wait. The last to arrive thread will change the barrier's generation variable value (as one update). Col 3, lines 5-20, Table 1, Initialize barrier for N thread usage, generation =0, wait, Else mygeneration=generation, counter-- for each thread [Examiner noted: when each thread entering barrier, it will wait (wait instruction), as after the wait instruction is executed by each workgroup]; Col 3, lines 21-28, each of the awakened threads must acquire the barrier's lock, however, only one thread can own the lock at any time.  The awakened threads will attempt to acquire the lock as many times as necessary. Because they are all trying to acquire the lock concurrently, most of the threads will have to make multiple attempts to acquire the lock.  After each failed attempt, the thread will go back into a wait state; line 35, one thread will leave the barrier; Col 2, lines 60-65, When the last to arrive thread enters the barrier…The last to arrive thread signals the waiting thread using, for example, a cond_broadcast instruction which signals all of the waiting threads to resume.  
It is this nearly simultaneous awakening that leads to the contention as the barrier is released; also see Col 3, lines 5-20, Table 1, generation =0 (as condition variable =0, no update), wait, if counter = =1 (detect last to arrive thread), generation ++ (as a single update to the condition variable), else,  mygeneration= generation (no update to the generation variable), counter --; [Examiner noted: each workgroup (thread, including the last thread) of the set of workgroups enters the barrier, and the high contention occurs because all threads need to acquire the barrier's lock to exit the barrier; when the last thread enters the barrier, after wait (see Col 3, Table 1), if it is determined that the thread is the last thread (Col 3, Table 1, counter ==1), then the generation variable (as condition variable) is updated (generation ++)]).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Uhrenholt with Marejka because Marejka’s teaching of providing a generation variable and determining whether a high contention scenario exists based on the arrival of the last thread (the generation variable value being updated one time) would have provided Uhrenholt’s system with the advantage and capability of easily identifying the high contention scenario, which improves system efficiency.
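For illustration only, the counter-and-generation barrier described in the passages of Marejka quoted above can be sketched as follows. This is a minimal sketch reconstructed from those quotations, not a copy of Marejka's Table 1: the class `GenerationBarrier`, the `updates` field, and the helper `run_round` are hypothetical, and `threading.Condition.notify_all` is assumed as the analogue of the quoted cond_broadcast.

```python
import threading


class GenerationBarrier:
    """Sketch of a counter-plus-generation barrier; names follow the quoted passages."""

    def __init__(self, limit):
        self.limit = limit
        self.counter = limit    # threads still expected in this round
        self.generation = 0     # changed only by the last thread to arrive
        self.updates = 0        # number of writes to `generation` (kept for checking)
        self._cond = threading.Condition()

    def wait(self):
        with self._cond:
            if self.counter == 1:
                # Last to arrive: one update to the generation, then wake everyone.
                self.generation += 1
                self.updates += 1
                self.counter = self.limit
                self._cond.notify_all()
            else:
                my_generation = self.generation
                self.counter -= 1
                # Keep waiting while the generation is unchanged.
                while my_generation == self.generation:
                    self._cond.wait()


def run_round(n=4):
    barrier = GenerationBarrier(n)
    threads = [threading.Thread(target=barrier.wait) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return barrier.generation, barrier.updates
```

In one round of this sketch the generation variable is written exactly once, by the last thread to arrive, and all waiting threads are then awakened together and must reacquire the same lock.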

As per claim 2, Uhrenholt and Marejka teach the invention according to claim 1 above. Marejka further teaches wherein the number of updates to the condition variable before the condition is met is equal to one (Marejka, Col 2, lines 55-56, While its mygeneration variable remains equal to the barrier's generation variable the thread will continue to wait. The last to arrive thread will change the barrier's generation variable value (as equal to one update). Col 3, lines 21-28, each of the awakened threads must acquire the barrier's lock, however, only one thread can own the lock at any time.  The awakened threads will attempt to acquire the lock as many times as necessary. Because they are all trying to acquire the lock concurrently, most of the threads will have to make multiple attempts to acquire the lock.  After each failed attempt, the thread will go back into a wait state; line 35, one thread will leave the barrier; Col 2, lines 60-65, When the last to arrive thread enters the barrier…The last to arrive thread signals the waiting thread using, for example, a cond_broadcast instruction which signals all of the waiting threads to resume.  It is this nearly simultaneous awakening that leads to the contention as the barrier is released; also see Col 3, lines 5-20, Table 1, generation =0 (as condition variable =0, no updated), wait, if counter = =1 (detect last to arrive thread), generation ++ (as generation variable (condition variable is updated one time), else, mygeneration= generation (no update to the generation variable), counter --).

As per claim 10, it is a system claim of the claim 1 above. Therefore, it is rejected for the same reason as claim 1 above. In addition, Uhrenholt further teaches a compute unit configured to execute workgroups (Uhrenholt, Fig. 1; [0025] lines 1-3, a graphics processor comprising a programmable execution unit operable to execute programs to perform processing operations), an advanced controller configured to monitor the condition value (Uhrenholt, [0147] lines 4-6, a thread exclusivity control unit (as an advanced controller) operable to determine whether a thread or threads (and which thread) satisfies the thread exclusivity condition); and a workgroup scheduler (Uhrenholt, Fig. 3, 3, 301, 302, 303 (as whole as workgroup scheduler);   [0147] lines 4-6, a thread exclusivity control unit (as an advanced controller) operable to determine whether a thread or threads (and which thread) satisfies the thread exclusivity condition; [0260] lines 1-3, The thread spawner 301 is operable to spawn execution threads for execution by the programmable execution unit 302 for (graphics) processing items). In addition, Marejka teaches the conditional value is condition variable (Marejka, Col 2, lines 50-53, For all threads except the last thread to enter the barrier, the mygeneration variable will represent the current value of the barrier's generation variable (as condition variable) (e.g., zero in the specific example));

As per claim 11, it is a system claim of claim 2 above. Therefore, it is rejected for the same reason as claim 2 above.


Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Uhrenholt and Marejka, as applied to claim 1 above, and further in view of SONG et al. (US Pub. 2019/0356622 A1).
SONG was cited in the previous Office Action.

As per claim 3, Uhrenholt and Marejka teach the invention according to claim 1 above. Uhrenholt and Marejka fail to specifically teach wherein the monitoring includes monitoring accesses to a cache in which the condition variable is stored.

However, SONG teaches wherein the monitoring includes monitoring accesses to a cache in which the condition variable is stored (SONG, [0052] lines 5-10, The rule-checking module 510 may check each type of message (e.g., PointHistory Grain 511, AFD/SystemStatus/ModelChange Subscription Grain 512, Alam/Event Grain 513) against rules (as condition variable) stored in a storage 523 (e.g., cache) using annotation rule checker 521).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Uhrenholt and Marejka with SONG because SONG’s teaching of storing the rules within the cache and checking the message against the rules stored in the cache (as monitoring access, since the rule is read) would have provided Uhrenholt and Marejka’s system with the advantage and capability of easily determining whether the condition/rule is satisfied, which improves system performance and efficiency.
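For illustration only, the claim 3 limitation of monitoring accesses to a cache in which a value is stored can be sketched as a toy cache that records accesses to one watched address. This is a minimal sketch under stated assumptions and is not drawn from SONG or from the specification: the class `MonitoredCache`, its methods, and the address `0x40` are hypothetical.

```python
class MonitoredCache:
    """Toy cache that records every access to one watched address."""

    def __init__(self, watched_addr):
        self._data = {}
        self.watched_addr = watched_addr
        self.log = []   # (operation, value) for each access to the watched address

    def write(self, addr, value):
        self._data[addr] = value
        if addr == self.watched_addr:
            self.log.append(("write", value))

    def read(self, addr):
        value = self._data.get(addr)
        if addr == self.watched_addr:
            self.log.append(("read", value))
        return value


cache = MonitoredCache(watched_addr=0x40)
cache.write(0x10, 7)    # unrelated address, not recorded
cache.write(0x40, 1)    # recorded as a write to the watched address
cache.read(0x40)        # recorded as a read of the watched address
```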

As per claim 12, it is a system claim of claim 3 above. Therefore, it is rejected for the same reason as claim 3 above.


Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Uhrenholt and Marejka, as applied to claims 1 and 10 above, and further in view of Yadav (US Pub. 2015/0286586 A1).
Yadav was cited in the previous Office Action.

As per claim 4, Uhrenholt and Marejka teach the invention according to claim 1 above. Uhrenholt further teaches in response to the condition being met and the wait instruction being part of the high contention scenario, scheduling one or more workgroups to be woken up (Uhrenholt, [0128] lines 3-8, the condition is such that only a single execution thread will satisfy the condition at any one time… this means that the associated set of one or more instructions can be executed by (only) one execution thread at a time; [0267] lines 8-11, The thread exclusivity instruction ("SBARRIER") (as wait instruction) in the program thus acts as a "barrier", through which only a single execution thread is allowed to pass at any one time; [0054] lines 2-13, serial processing of processing items…only a, e.g. single execution thread satisfying the thread exclusivity condition is able to be "awake" to execute the set of instructions associated with the thread exclusivity instruction at any one time, while (all) other threads "sleep". Accordingly, the technology described herein can save processing power, as compared to e.g., "spinlock" arrangements. This is generally advantageous, but may be particularly beneficial in modern mobile devices such as smart phones, tablets, and the like where system resources are restricted; Fig. 4, SBARRIER, Wakeup t0; Fig. 5, 503, Thread satisfies thread exclusivity condition, Yes to 504; [0270] lines 5-8, the control unit 303 sends "wakeup" messages such that, at any one time, only a single execution thread will continue to execute instructions in the program following "SBARRIER"). 
In addition, Marejka teaches in response to the condition being met and the wait instruction being part of the high contention scenario, scheduling a second set of one or more workgroups to be woken up after a time associated with a synchronization primitive associated with the condition variable (Marejka, Col 2, lines 55-56, While its mygeneration variable remains equal to the barrier's generation variable the thread will continue to wait. The last to arrive thread will change the barrier's generation variable value (as one update). Col 3, lines 21-28, each of the awakened threads must acquire the barrier's lock, however, only one thread can own the lock at any time.  The awakened threads will attempt to acquire the lock as many times as necessary. Because they are all trying to acquire the lock concurrently, most of the threads will have to make multiple attempts to acquire the lock.  After each failed attempt, the thread will go back into a wait state; line 35, one thread will leave the barrier; Col 2, lines 60-65, When the last to arrive thread enters the barrier…The last to arrive thread signals the waiting thread using, for example, a cond_broadcast instruction which signals all of the waiting threads to resume.  It is this nearly simultaneous awakening that leads to the contention as the barrier is released; also see Col 3, lines 5-20, Table 1, generation =0 (as condition variable =0, no update), wait, if counter = =1 (detect last to arrive thread), generation ++ (as generation variable (condition variable) is updated one time), else,  mygeneration= generation (no update to the generation variable), counter --; Col 8, lines 36-38, the N threads will concurrently or substantially concurrently leave the barrier 201 (as synchronization primitive). Because each thread is accessing its own semaphore [Examiner noted: the high contention still exists; however, the semaphore technique is used, which allows the second set of one or more workgroups to be woken up]).

Uhrenholt and Marejka fail to specifically teach the time is an estimated time of completion of a critical section.

However, Yadav teaches the time is an estimated time of completion of a critical section (Yadav, [0004] lines 6-14, The thread that owns the lock is permitted to enter a critical section of code protected by the lock or otherwise access a shared resource protected by the lock. If a second thread attempts to obtain ownership of a lock while the lock is held by a first thread, the second thread will not be permitted to proceed into the critical section of code (or access the shared resource) until the first thread releases the lock and the second thread successfully claims ownership of the lock; [0010] lines 13-18, writer thread that wishes to acquire a lock in write mode that has been acquired by one or more reader threads may spin for a pre-determined amount of time (as estimated time of completion of a critical section, since the lock will allow the thread to enter the critical section of code) and re-try its attempt to acquire the lock).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Uhrenholt and Marejka with Yadav because Yadav’s teaching of allowing a thread to proceed into the critical section of code only after the previous thread has finished its critical section and released the lock would have provided Uhrenholt and Marejka’s system with the advantage and capability of preventing potential system failure due to resource restriction, which improves system stability.
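For illustration only, the spin-and-retry behavior quoted from Yadav [0010] can be sketched as follows. This is a minimal sketch under stated assumptions, not Yadav's implementation: the function `acquire_with_spin` and the parameters `spin_seconds` and `max_attempts` are hypothetical, and a plain mutex stands in for the reader-writer lock discussed in the reference.

```python
import threading
import time


def acquire_with_spin(lock, spin_seconds=0.01, max_attempts=100):
    """Try the lock; on failure, spin for a fixed interval and try again.

    `spin_seconds` and `max_attempts` are illustrative parameters, not values
    taken from any cited reference. Returns the number of attempts made, or 0
    if the lock was never obtained.
    """
    for attempt in range(1, max_attempts + 1):
        if lock.acquire(blocking=False):
            return attempt
        deadline = time.monotonic() + spin_seconds
        while time.monotonic() < deadline:
            pass  # busy-wait for the pre-determined amount of time
    return 0
```

A caller that obtains the lock is responsible for calling `lock.release()` once its critical section is complete.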

As per claim 13, it is a system claim of claim 4 above. Therefore, it is rejected for the same reason as claim 4 above.

Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Uhrenholt, Marejka and Yadav, as applied to claims 4 and 13 respectively above, and further in view of Saito et al. (US Patent 5,887,143).
Saito was cited in the previous Office Action. 

As per claim 5, Uhrenholt, Marejka and Yadav teach the invention according to claim 4 above. Yadav teaches determining the estimated time of completion of the critical section (Yadav, [0004] lines 6-14, The thread that owns the lock is permitted to enter a critical section of code protected by the lock or otherwise access a shared resource protected by the lock. If a second thread attempts to obtain ownership of a lock while the lock is held by a first thread, the second thread will not be permitted to proceed into the critical section of code (or access the shared resource) until the first thread releases the lock and the second thread successfully claims ownership of the lock; [0010] lines 13-18, writer thread that wishes to acquire a lock in write mode that has been acquired by one or more reader threads may spin for a pre-determined amount of time (as estimated time of completion of a critical section, since the lock will allow the thread to enter the critical section of code) and re-try its attempt to acquire the lock).

Uhrenholt, Marejka and Yadav fail to specifically teach the time is determined by measuring a time period between detecting execution of the wait instruction and detecting that the condition is met.

However, Saito teaches the time is determined by measuring a time period between detecting execution of the wait instruction and detecting that the condition is met (Saito, Col 6, lines 13-17, Real-time barrier synchronization causes barrier synchronization of the execution of programs (threads) to be predictable by placing bounds on two important time values. One of the time values is earliest release time which is the delay from the time when the last program participating in the barrier synchronization issues a barrier message (as wait) until one of the participating programs resumes its execution (as the time period between detecting execution of the wait instruction and detecting that the condition is met, since the programs resume execution)).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Uhrenholt, Marejka and Yadav with Saito because Saito’s teaching of measuring the time between the wait instruction and the resumption of execution would have provided Uhrenholt, Marejka and Yadav’s system with the advantage and capability of easily identifying the time that a particular thread spends executing its code section while another thread is waiting (due to barrier synchronization), allowing the system to schedule threads more efficiently.
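For illustration only, measuring the time period between the start of a wait and the release of the waiter can be sketched as follows. This is a minimal sketch under stated assumptions, not Saito's real-time barrier: the class `TimedWait`, the helper `measure`, and the use of `threading.Event` as the release signal are hypothetical choices.

```python
import threading
import time


class TimedWait:
    """Records the interval between starting to wait and being released."""

    def __init__(self):
        self._event = threading.Event()
        self.elapsed = None   # seconds between wait start and release

    def wait(self):
        start = time.monotonic()                   # the wait begins
        self._event.wait()
        self.elapsed = time.monotonic() - start    # the waiter has been released

    def release(self):
        self._event.set()


def measure(delay=0.05):
    tw = TimedWait()
    waiter = threading.Thread(target=tw.wait)
    waiter.start()
    time.sleep(delay)   # stands in for the work done before release
    tw.release()
    waiter.join()
    return tw.elapsed
```

The measured interval depends on thread scheduling, so the sketch reports whatever was observed and does not guarantee any particular value.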

As per claim 14, it is a system claim of claim 5 above. Therefore, it is rejected for the same reason as claim 5 above.

Claims 6-7 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Uhrenholt and Marejka, as applied to claims 1 and 10 respectively above, and further in view of Wang et al. (US Pub. 2019/0042615 A1).
Wang was cited in the previous Office Action.

As per claim 6, Uhrenholt and Marejka teach the invention according to claim 1 above. Uhrenholt further teaches detecting execution of a second wait instruction by a second workgroup, the second wait instruction specifying a second condition value and a second condition to be met for the second condition value (Uhrenholt, Fig. 4, right side, t2 (as second workgroup), After Time (wakeup t2 and wakeup t3, t2 is waiting for t3 until t3 finishes (as second wait instruction)), wakeup W2; [0291] lines 1-4, once threads t2 and t3 of Warp 2 have both completed executing the "barriered" section of code (as second wait instruction specifying a second condition value and a second condition to be met, since t2 is waiting for t3 to finish), the control unit 303 sends a "warp wakeup" message (" Wakeup w2") to the threads of Warp 2);
in response to detecting that the second condition is met, determining that the second wait instruction is part of a non-high contention scenario after the wait instruction is executed by each workgroup of a set of workgroups (Uhrenholt, [0279] lines 4-7, at any one time, only a single execution thread can execute instructions in the program in between "SBARRIER" and "SBARRIER end" at any one time (as high-contention scenario, since only one workgroup is woken up); [0291] lines 1-4, once threads t2 and t3 of Warp 2 have both completed executing the "barriered" section of code, the control unit 303 sends a "warp wakeup" message (" Wakeup w2") to the threads of Warp 2 (as non-high contention scenario));
in response to the second condition being met, and the second wait instruction being part of a non-high contention scenario, waking up each of the workgroups of a second set of workgroups waiting on the second condition value (Uhrenholt, Fig. 4, warp 2 (as second set of workgroups); [0291] lines 1-4, once threads t2 and t3 of Warp 2 have both completed executing the "barriered" section of code, the control unit 303 sends a "warp wakeup" message (" Wakeup w2") (as waking up each of the workgroups of a second set) to the threads of Warp 2).
In addition, Marejka teaches the condition value is a condition variable (Marejka, Col 2, lines 50-53, For all threads except the last thread to enter the barrier, the mygeneration variable will represent the current value of the barrier's generation variable (as condition variable) (e.g., zero in the specific example)).

Uhrenholt and Marejka fail to specifically teach the detecting that the condition is met comprises determining that a plurality of updates occur to the condition variable after the wait instruction is executed by each workgroup of a set of workgroups.

However, Wang teaches the detecting that the condition is met comprises determining that a plurality of updates occur to the condition variable (Wang, [0028] lines 8-20, The contention value 204 can include, for example, a Boolean set to indicate high contention or low contention, a contention amount integer representative of an amount of contention, or another dynamically tracked access statistic. In one example, a database object under high contention includes a contention amount that surpasses a predefined contention threshold amount. For example, the predefined contention threshold amount may be "10," and contention values of 10 (or more) indicate the database object is to be accessed under the pessimistic concurrency control protocol 210 at 304. (In this example, contention values of 0 to 9 indicate the database object is under low contention.) (as determining that a plurality of updates occur to the condition variable (updates 0-9) and determining that the wait instruction is part of a non-contention scenario)).
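
For illustration only, the threshold comparison in the quoted Wang passage may be sketched as follows. The function name classify_contention and the constant name CONTENTION_THRESHOLD are hypothetical and do not appear in Wang; only the example threshold of 10 and the low-contention range of 0 to 9 come from the quoted passage.

```python
# Example threshold value taken from the quoted passage; the constant name is hypothetical.
CONTENTION_THRESHOLD = 10


def classify_contention(contention_amount, threshold=CONTENTION_THRESHOLD):
    """Return "high" when the amount is at or above the threshold, else "low"."""
    return "high" if contention_amount >= threshold else "low"
```

Under these assumptions, contention amounts of 0 through 9 are classified as low contention and amounts of 10 or more as high contention.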

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Uhrenholt and Marejka with Wang because Wang’s teaching of determining the non-contention scenario based on its contention value range (as if the contention value is greater than one) would have provided Uhrenholt and Marejka’s system with the advantage and capability to identify the different contention scenarios in order to perform the necessary wakeup processes for the threads.

As per claim 7, Uhrenholt, Marejka and Wang teach the invention according to claim 6 above. Uhrenholt further teaches wherein the number of workgroups of the second set of workgroups is greater than one (Uhrenholt, Fig. 4, warp 2 (as second set of workgroups), t2 and t3 (the number of threads within warp 2 is greater than one)).

As per claims 15-16, they are system claims of claims 6-7 respectively above. Therefore, they are rejected for the same reason as claims 6-7 respectively above.

Claims 8-9 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Uhrenholt and Marejka, as applied to claims 1 and 10 respectively above, and further in view of Dice (US Patent. 7,318,128 B1).
Dice was cited in the previous Office Action.

As per claim 8, Uhrenholt and Marejka teach the invention according to claim 1 above. Uhrenholt teaches wherein the monitoring is performed by an advanced controller (Uhrenholt, [0147] lines 4-6, a thread exclusivity control unit (as an advanced controller) operable to determine whether a thread or threads (and which thread) satisfies the thread exclusivity condition).

Uhrenholt and Marejka fail to specifically teach the advanced controller associated with a cache or cache bank in which the condition variable is stored.

However, Dice teaches the advanced controller associated with a cache or cache bank in which the condition variable is stored (Dice, Fig. 1, 111 Cache; Col 1, lines 38-42, a cache controller in the processing device stores data, such as values for variables and/or other execution state information associated with the thread, in the cache for faster access when this information is needed during execution of that thread).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Uhrenholt and Marejka with Dice because Dice’s teaching of storing the condition variables and execution state of the threads in the cache would have provided Uhrenholt and Marejka’s system with the advantage and capability to improve the access speed for the condition variables and execution state (see Dice, Col 1, lines 41-42, faster access).

As per claim 9, Uhrenholt and Marejka teach the invention according to claim 1 above. Uhrenholt teaches the condition being met (Uhrenholt, [0086] lines 1-5, determine (as monitoring) whether the execution thread satisfies a condition associated with the instruction, the condition being such that only a subset of a set of plural execution threads will satisfy the condition at any one time; [0133] lines 1-6, the thread exclusivity condition is a condition that can be (and is), at any one time, only satisfied by a subset (in an embodiment consisting of only a single execution thread) of a set of plural execution threads that are to execute the shader program that includes the thread exclusivity instruction). In addition, Marejka teaches the condition value is a condition variable (Marejka, Col 2, lines 50-53, For all threads except the last thread to enter the barrier, the mygeneration variable will represent the current value of the barrier's generation variable (as condition variable) (e.g., zero in the specific example)).

Uhrenholt and Marejka fail to specifically teach wherein the condition being met comprises the condition variable being equal to, less than, or greater than a comparison value.

However, Dice teaches the condition being met comprises the condition variable being equal to, less than, or greater than a comparison value (Dice, Col 9, lines 10-16, determine if the execution behavior pattern statistically meets a threshold associated with patterns of access to the shared data, and if the execution behavior pattern statistically meets the threshold (as including being equal to, less than, or greater than a comparison value) associated with patterns of access to the shared data, the synchronization subsystem responsible for controlling access to the shared data).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Uhrenholt and Marejka with Dice because Dice’s teaching of determining whether the execution behavior pattern (as condition variable) is met based on a comparison with the threshold (as comparison value) would have provided Uhrenholt and Marejka’s system with the advantage and capability to easily determine when to allow the workgroups/threads to access the data (or wake up), which improves the system efficiency.

As per claims 17-18, they are system claims of claims 8-9 respectively above. Therefore, they are rejected for the same reason as claims 8-9 respectively above.


Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Uhrenholt et al. (US. Pub. 2020/0211146 A1) in view of Dice (US Patent. 7,318,128 B1) and further in view of Marejka (US Patent. 7,512,950 B1).
Uhrenholt, Dice and Marejka were cited in the previous Office Action.

As per claim 19, Uhrenholt teaches the invention substantially as claimed including A system for scheduling on an accelerated processing device (Uhrenholt, Fig. 1 (as accelerated processing device); [0025] lines 1-3, a graphics processor comprising a programmable execution unit operable to execute programs to perform processing operations; [0070] lines 1-7, the program can, and in an embodiment does, include one or more sections of instructions to be executed (and that in an embodiment are executed) by only a subset of execution threads (e.g. and in an embodiment, a single execution thread) at any one time), the system comprising: 
a compute unit configured to execute workgroups (Uhrenholt, Fig. 1; [0025] lines 1-3, a graphics processor comprising a programmable execution unit operable to execute programs to perform processing operations), wherein a workgroup of the workgroups is configured to execute a wait instruction specifying a condition value and a condition to be met for the condition value (Uhrenholt, [0126] lines 1-2, A thread exclusivity condition can be associated with a thread exclusivity instruction; [0128] lines 1-8, the subset consists of only a single execution thread from the set of plural execution threads. Thus, in an embodiment, the condition is such that only a single execution thread will satisfy the condition at any one time… this means that the associated set of one or more instructions can be executed by (only) one execution thread at a time (as specifying that only one (as condition value) execution thread will satisfy the condition));
an advanced controller, the advanced controller being configured to monitor the condition value to detect the condition being met (Uhrenholt, [0147] lines 4-6, a thread exclusivity control unit (as an advanced controller) operable to determine whether a thread or threads (and which thread) satisfies the thread exclusivity condition; [0086] lines 1-5, determine (as monitoring) whether the execution thread satisfies a condition associated with the instruction, the condition being such that only a subset of a set of plural execution threads will satisfy the condition at any one time; [0133] lines 1-6, the thread exclusivity condition is a condition that can be (and is), at any one time, only satisfied by a subset (in an embodiment consisting of only a single execution thread) of a set of plural execution threads that are to execute the shader program that includes the thread exclusivity instruction); and
a workgroup scheduler configured to (Uhrenholt, Fig. 3, 3, 301, 302, 303 (as whole as workgroup scheduler);   [0147] lines 4-6, a thread exclusivity control unit (as an advanced controller) operable to determine whether a thread or threads (and which thread) satisfies the thread exclusivity condition; [0260] lines 1-3, The thread spawner 301 is operable to spawn execution threads for execution by the programmable execution unit 302 for (graphics) processing items).
in response to detecting that the condition is met, determine that the wait instruction is part of a high contention scenario (Uhrenholt, [0128] lines 3-8, the condition is such that only a single execution thread will satisfy the condition at any one time… this means that the associated set of one or more instructions can be executed by (only) one execution thread at a time; [0267] lines 8-11, The thread exclusivity instruction ("SBARRIER") (as wait instruction) in the program thus acts as a "barrier", through which only a single execution thread is allowed to pass at any one time; [0054] lines 2-13, serial processing of processing items…only a, e.g. single execution thread satisfying the thread exclusivity condition is able to be "awake" to execute the set of instructions associated with the thread exclusivity instruction at any one time, while (all) other threads "sleep". Accordingly, the technology described herein can save processing power, as compared to e.g., "spinlock" arrangements. This is generally advantageous, but may be particularly beneficial in modern mobile devices such as smart phones, tablets, and the like where system resources are restricted; [0055] lines 1-9, when using a "spinlock" arrangement, for example, to ensure exclusive access to a resource by a thread, when the lock is obtained by one thread, other threads waiting to use the resource will repeatedly attempt to obtain the lock. Such repeated attempts, or "spinning"…in which many threads may be trying to acquire the lock at any given time [Examiner noted: only one thread is allowed (serial processing) due to resource restriction (many threads trying to acquire the same resource, as high contention scenario)]); wherein the set of workgroups includes two or more workgroups (Uhrenholt, Fig. 4, t0, t1 (as number of workgroups is greater than one), warp 1 (as a set of workgroups); SBARRIER (t0 and t1 hit the SBARRIER and wait)), and
wake up a single workgroup of the set of workgroups in response to the condition being met and in response to the wait instruction being part of the high contention scenario (Uhrenholt, Fig. 4, SBARRIER, Wakeup t0; Fig. 5, 503, Thread satisfies thread exclusivity condition, Yes to 504; [0270] lines 5-8, the control unit 303 sends "wakeup" messages such that, at any one time, only a single execution thread will continue to execute instructions in the program following "SBARRIER"; [0298] lines 6-11,  When the thread encounters the thread exclusivity instruction in the program, it is determined whether the thread satisfies a thread exclusivity condition (step 503), which condition can only be satisfied by a single execution thread (subset) at any one time. If the thread satisfies the condition, the thread executes a set of instructions in the program associated with the thread exclusivity instruction; [Examiner noted: in response to the condition being met, only one workgroup is woken up due to the high contention scenario]).

Uhrenholt fails to specifically teach the system comprising a cache system having a cache controller, and wherein the advanced controller associated with a cache or cache bank of the system.

However, Dice teaches the system comprising a cache system having a cache controller (Dice, Col 1, lines 38-39, a cache controller in the processing device stores data), and wherein the advanced controller associated with a cache or cache bank of the system (Dice, Fig. 1, 110-1, 111 Cache; Col 1, lines 38-42, a cache controller in the processing device stores data, such as values for variables and/or other execution state information associated with the thread, in the cache for faster access when this information is needed during execution of that thread).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Uhrenholt with Dice because Dice’s teaching of a cache controller and of storing the condition variables and execution state of the threads in the cache would have provided Uhrenholt’s system with the advantage and capability to improve the access speed for the condition variables and execution state (see Dice, Col 1, lines 41-42, faster access).

Uhrenholt and Dice fail to specifically teach the condition value is a condition variable, and wherein the detecting that the condition is met comprises determining that a single update occurs to the condition variable after the wait instruction is executed by each workgroup of a set of workgroups.

However, Marejka teaches the condition value is a condition variable (Marejka, Col 2, lines 50-53, For all threads except the last thread to enter the barrier, the mygeneration variable will represent the current value of the barrier's generation variable (as condition variable) (e.g., zero in the specific example); Col 2, line 65-Col 3 line 3, The last to arrive thread may also execute instructions (as include wait instruction) to prepare the barrier for the next iteration, for example by incrementing the generation variable and resetting the counter value to equal the limit variable. Expressed in pseudocode, the above steps may be represented as shown in Table 1; Col 3, lines 5-20, Table 1, Initialize barrier for N thread usage, generation =0; wait); and
wherein the detecting that the condition is met comprises determining that a single update occurs to the condition variable after the wait instruction is executed by each workgroup of a set of workgroups (Marejka, Col 2, lines 55-56, While its mygeneration variable remains equal to the barrier's generation variable the thread will continue to wait. The last to arrive thread will change the barrier's generation variable value (as one update). Col 3, lines 5-20, Table 1, Initialize barrier for N thread usage, generation =0, wait, Else mygeneration=generation, counter-- for each thread [Examiner noted: when each thread entering barrier, it will wait (wait instruction), as after the wait instruction is executed by each workgroup]; Col 3, lines 21-28, each of the awakened threads must acquire the barrier's lock, however, only one thread can own the lock at any time.  The awakened threads will attempt to acquire the lock as many times as necessary. Because they are all trying to acquire the lock concurrently, most of the threads will have to make multiple attempts to acquire the lock.  After each failed attempt, the thread will go back into a wait state; line 35, one thread will leave the barrier; Col 2, lines 60-65, When the last to arrive thread enters the barrier…The last to arrive thread signals the waiting thread using, for example, a cond_broadcast instruction which signals all of the waiting threads to resume.  
It is this nearly simultaneous awakening that leads to the contention as the barrier is released; also see Col 3, lines 5-20, Table 1, generation =0 (as condition variable =0, not updated), wait, if counter = =1 (detect last to arrive thread), generation ++ (as a single update to the condition variable), else,  mygeneration= generation (no update to the generation variable), counter --; [Examiner noted: each workgroup (thread) of the set of workgroups enters the barrier, and the high contention occurs because all the threads need to acquire the barrier's lock for exiting the barrier, and when the last thread enters the barrier, after wait (see Col 3, Table 1), if it is determined that the thread is the last thread (Col 3, Table 1, counter ==1), then the generation variable (as condition variable) is updated (generation ++)]).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Uhrenholt and Dice with Marejka because Marejka’s teaching of providing a generation variable and determining whether there is high contention based on the last thread's arrival (the generation variable value being updated one time) would have provided Uhrenholt and Dice’s system with the advantage and capability to easily identify the high contention scenario, which improves the system efficiency.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Uhrenholt, Dice and Marejka, as applied to claim 19 above, and further in view of SONG et al. (US Pub. 2019/0356622 A1).
SONG was cited in the previous Office Action.

As per claim 20, Uhrenholt, Dice and Marejka teach the invention according to claim 19 above. Uhrenholt, Dice and Marejka fail to specifically teach wherein the monitoring includes monitoring accesses to a cache in which the condition variable is stored.

However, SONG teaches wherein the monitoring includes monitoring accesses to a cache in which the condition variable is stored (SONG, [0052] lines 5-10, The rule-checking module 510 may check each type of message (e.g., PointHistory Grain 511, AFD/SystemStatus/ModelChange Subscription Grain 512, Alam/Event Grain 513) against rules (as condition variable) stored in a storage 523 (e.g., cache) using annotation rule checker 521).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined the teaching of Uhrenholt, Dice and Marejka with SONG because SONG’s teaching of storing the rules within the cache and checking the message against the rules (as monitoring access, since the rule is read) stored in the cache would have provided Uhrenholt, Dice and Marejka’s system with the advantage and capability to easily determine whether the condition/rule is satisfied, which improves the system performance and efficiency.


Response to Arguments  
In the Remarks, Applicant argues in substance:
(a). Applicants disagree that the generation variable is the same as a condition variable for the following reasons. The generation variable is not a condition variable because in Marejka, a condition variable is explicitly described as being separate from a generation variable. For example, column 2, lines 14-17 list a condition variable ("cv") as a separate thing than the "generation variable." Thus it is clear that the condition variable is separate from the generation variable.

(b). Claims 1, 10, and 19 recite that the wait instruction specifies a condition variable. The generation variable in Marejka is not specified by a wait instruction. For example, in Table 1, the wait instruction ("wait") is subsequent to the generation variable declaration ("generation=0"). Thus, the generation variable specified in Marejka cannot be the recited condition variable.

Examiner respectfully disagrees with Applicant’s arguments for the following reasons:
As to point (a), Examiner would like to point out that Applicant is mischaracterizing the mapping of the Marejka reference. Examiner has explicitly mapped the “generation variable” of Marejka to the “condition variable” of the instant application, because the “generation variable” of Marejka corresponds to the condition variable as claimed.
For example, Marejka teaches a mechanism in which, when threads arrive at the barrier, they attempt to acquire the lock as many times as necessary. Because they are all trying to acquire the lock concurrently (as high contention), most of the threads will have to make multiple attempts to acquire the lock. After each failed attempt, the thread will go back into a wait state, and only one thread will leave the barrier (see Marejka, Col 2, lines 60-65, When the last to arrive thread enters the barrier…The last to arrive thread signals the waiting thread using, for example, a cond_broadcast instruction which signals all of the waiting threads to resume.  It is this nearly simultaneous awakening that leads to the contention as the barrier is released (condition to be met); Col 3, lines 21-28). In addition, when a thread arrives at the barrier, it executes instructions to prepare the barrier, and if that thread is the last thread entering the barrier, the generation variable is updated/incremented (see Marejka, Col 2, line 65-Col 3 line 3, The last to arrive thread may also execute instructions (as include wait instruction) to prepare the barrier for the next iteration, for example by incrementing the generation variable and resetting the counter value to equal the limit variable). That is, the instructions executed to prepare the barrier and wait in Marejka correspond to the wait instruction, and when it is determined that the condition is met, which is when the last thread enters the barrier, the generation variable is updated and all the waiting threads resume, which leads to the contention (see Marejka, Col 3, Table 1, initialize barrier, generation=0, wait, if counter == 1 (that means this is the last thread), generation ++ (updated), Else, mygeneration=generation (not updated), and counter --, cond-wait (wait until next iteration)). Therefore, the “generation variable” of Marejka does correspond to the “condition variable” as claimed.
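
For illustration only, the generation-variable barrier behavior discussed above may be sketched in Python as follows. This sketch is not reproduced from Marejka's Table 1; the class name GenerationBarrier, the helper run_once, and the use of Python's threading.Condition in place of the cond_wait and cond_broadcast operations are assumptions made for the example. It shows the last-arriving thread making a single update to the generation variable and then signaling all of the waiting threads to resume.

```python
import threading


class GenerationBarrier:
    """Hypothetical generation-counter barrier for n threads."""

    def __init__(self, n):
        self.limit = n        # number of threads the barrier expects
        self.counter = n      # threads still to arrive in this iteration
        self.generation = 0   # updated once per barrier release
        self.cv = threading.Condition()

    def wait(self):
        with self.cv:
            if self.counter == 1:
                # Last thread to arrive: single update to the generation
                # variable, reset the counter, wake every waiting thread.
                self.generation += 1
                self.counter = self.limit
                self.cv.notify_all()
            else:
                # Earlier arrivals record the current generation and wait
                # until the generation variable changes.
                my_generation = self.generation
                self.counter -= 1
                while my_generation == self.generation:
                    self.cv.wait()


def run_once(n):
    """Run n threads through the barrier once; return (generation, counter)."""
    barrier = GenerationBarrier(n)
    threads = [threading.Thread(target=barrier.wait) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return barrier.generation, barrier.counter
```

In this sketch, every awakened thread must reacquire the lock held by the condition object before returning from its wait, so the threads leave the barrier one at a time after the single update to the generation variable.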

As to point (b), Examiner would like to point out that the rejection is based on 103 rejection using multiple references, and “a wait instruction” was taught by Uhrenholt. For example, Uhrenholt teaches execution of a wait instruction (see Uhrenholt, Fig. 4, T0 (as workgroup), SBARRIER; [0069] lines 4-6, the program may contain a single set or plural sets of one or more instructions, each set associated with a corresponding thread exclusivity instruction; [0070] lines 2-5, one or more sections of instructions to be executed…by only a subset of execution threads (e.g. and in an embodiment, a single execution thread) at any one time; [0267] lines 1-11, When a thread reaches "SBARRIER", in response to that instruction (as detecting the execution of “SBARRIER” instruction (as wait instruction)), the thread sends a message to the thread exclusivity control unit 303 ("SBARRIER Control Unit") to determine whether the thread satisfies a condition for proceeding with the execution of instructions beyond "SBARRIER" in the program…The thread exclusivity instruction ("SBARRIER") (as wait instruction) in the program thus acts as a "barrier", through which only a single execution thread is allowed to pass at any one time.). Furthermore, Marejka was used for teaching the condition variable of the wait instruction (see Marejka, Col 2, lines 50-53, For all threads except the last thread to enter the barrier, the mygeneration variable will represent the current value of the barrier's generation variable (as condition variable) (e.g., zero in the specific example); Col 2, line 65-Col 3 line 3, The last to arrive thread may also execute instructions (as include wait instruction) to prepare the barrier for the next iteration, for example by incrementing the generation variable and resetting the counter value to equal the limit variable. 
Expressed in pseudocode, the above steps may be represented as shown in Table 1; Col 3, lines 5-20, Table 1, Initialize barrier for N thread usage, generation =0 (Table 1, top portion (initialize barrier to wait) as whole as wait instruction include condition variable); wait).
To the extent that applicants are arguing against the references individually, the examiner reminds the applicants that one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

In addition, in response to Applicant’s argument that “the wait instruction ("wait") is subsequent to the generation variable declaration ("generation=0"). Thus, the generation variable specified in Marejka cannot be the recited condition variable,” Examiner respectfully disagrees. The generation variable declaration in Marejka is before the wait instruction (‘wait’); however, the updating of the generation variable is after the wait instruction (‘wait’), which corresponds to the determining that a single update occurs to the condition variable after the wait instruction is executed by each workgroup of a set of workgroups (see Marejka, Col 2, lines 55-56, While its mygeneration variable remains equal to the barrier's generation variable the thread will continue to wait. The last to arrive thread will change the barrier's generation variable value (as one update). Col 3, lines 5-20, Table 1, Initialize barrier for N thread usage, generation =0, wait, Else mygeneration=generation, counter-- for each thread [Examiner noted: when each thread enters the barrier, it will wait (wait instruction), as after the wait instruction is executed by each workgroup]; Col 2, lines 60-65; also see Col 3, lines 5-20, Table 1, generation =0 (as condition variable =0, not updated), wait, if counter = =1 (detect last to arrive thread), generation ++ (as a single update to the condition variable), else,  mygeneration= generation (no update to the generation variable), counter --; [Examiner noted: each workgroup (threads including the last thread) of the set of workgroups enters the barrier, and the high contention occurs because all threads need to acquire the barrier's lock for exiting the barrier, and when the last thread enters the barrier, after wait (see Col 3, Table 1), if it is determined that the thread is the last thread (Col 3, Table 1, counter ==1), then the generation variable (as condition variable) is updated (generation ++)]).
For the reasons above, Applicant’s argument has not been found to be persuasive, and therefore the rejections are maintained. 



Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUJIA XU whose telephone number is (571)272-0954. The examiner can normally be reached M-F 9:00-5:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An can be reached on (571) 272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MENG AI T AN/Supervisory Patent Examiner, Art Unit 2195                                                                                                                                                                                                        




/Z.X./Examiner, Art Unit 2195