DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This final office action is responsive to the amendments filed on 02/23/2022.
Claims 1-20 are pending.

Response to Amendment

Applicant has amended independent claims 1, 9, and 14 and dependent claims 5-8 and 20 to include new/old limitations in a form not previously presented, necessitating a new search and consideration.


Specification

The title of the invention filed on 02/23/2022 is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The following is an example of a title suggested by the Examiner:

-- MULTIPLE INDEPENDENT SYNCHRONIZATION NAMED BARRIER WITHIN A THREAD GROUP --


Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.



Claims 1-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or joint inventor regards as the invention.

The following claim language is not clearly understood:

Claim 1, lines 6-8, recites “allocate a first/second set of the plurality of buffers associated with a first/second named barrier”. It is unclear to what the sets of buffers are allocated.

Claims 9 and 14 recite elements of claim 1 and have deficiencies similar to those of claim 1. Therefore, they are rejected for the same rationale. The remaining dependent claims are also rejected due to their dependency on the rejected independent claims.


Claim Rejections - 35 USC § 101
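35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
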
Claims 1-8 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.

Independent claim 1 recites an “apparatus” comprising … buffers … processing resources, which is not defined by the specification. Buffers and processing resources are described in the specification without specifically reciting whether they are hardware and/or software; e.g., return buffers ([0050] [0151]), command buffer ([0052]), TLBs ([0062]), color buffers, depth buffers, stencil buffers ([0112]), memory buffers ([0158]), graphics processing resources ([0058]), parallel processing resources ([0207]), and processing resources ([0244] [0259] [0267] [0272] fig. 21) are described as examples and are not limiting. The broadest reasonable interpretation of a claim drawn to the claimed “apparatus” covers forms of software and/or hardware per se in view of the ordinary and customary meaning of computer usable program product, particularly when the specification is silent. Applicant is advised to recite at least one hardware element (e.g., processor, GPU, memory) to overcome the 35 USC 101 non-statutory rejection.
Claims 2-8 depend from claim 1 and do not cure the deficiency of the independent claim. Therefore, they are rejected for the same reason.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more or integrating into practical application.  

Based upon at least the decision by the United States Supreme Court in Alice Corp. v. CLS Bank Int'l, 134 S. Ct. 2347, 2354 (2014), post-Alice precedential court decisions, and the 2019 Revised Patent Subject Matter Eligibility Guidance, claims 1-20 are determined to be directed to an abstract idea. Examples of abstract ideas include at least mathematical concepts, mental processes, and certain methods of organizing human activity.
Step 1: Statutory category? - 
	Claims 1-8: No
	Claims 9-20: Yes

Step 2A prong 1: Recites a judicial exception? - Yes
	Claim 1 recites “allocate a first set of buffers associated with a first named barrier, allocate a second set of the plurality of buffers associated with a second named barrier,”, “assign a first set of plurality of execution threads in the thread workgroup to the first named barrier, assign a second set of plurality of execution threads in the thread workgroup to the second named barrier”, “each thread in first set of execution threads is assigned a first identifier indicating that the thread is associated with the first named barrier and each thread in the second set of execution threads is assigned a second identifier indicating that the thread is 


Step 2A prong 2: Integrate judicial exception into practical application? - No
	Additional elements of claim 1: apparatus, plurality of buffers; synchronizing execution of the first set of execution threads via the first named barrier; synchronizing execution of the second set of execution threads via the second named barrier; (generic computing component/method tied to technological environments without any improvement in the technology/technical field).

Step 2B: Amount to significantly more than judicial exception? - No
Additional claim elements same as step 2A prong 2 (WURC)

First, claims 1-8 are directed to an apparatus without any hardware and do not pass Step 1 (Step 1: No), whereas claims 9-20 fall within a statutory category (Step 1: Yes). The analysis then moves to Step 2A, the two-prong inquiry of the Mayo/Alice two-part framework.

	Claim 1 is directed to “synchronizing execution of group of threads by assigning different sets of threads of a thread workgroup to different named barriers” at a high level of generality. The claim elements of “allocate a first set of buffers associated with a first named barrier, allocate a second set of the plurality of buffers associated with a second 

The judicial exception is not integrated into a practical application. In particular, claim 1 only recites the additional claim elements of “apparatus”, “plurality of buffers”, and synchronizing execution of the first/second set of execution threads via the first/second named barrier. These additional elements recite generic computing components or generic computing methods that, when considered alone or in combination, either fall into insignificant pre/post-solution activities or merely tie the generic components/methods to a technological environment under the Mayo/Alice two-part framework.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional claim elements of the processing unit for the data processing system are an example of pre/post-solution activity and merely link the abstract idea to a particular technological environment. For example, dispatching threads to processing resources and synchronizing execution threads are well-understood, routine, and conventional (specification: Background, IDS filed in the instant application, references cited in PTO-892) and are not sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
		



Dependent claims 2-8, 10-13, and 15-20 recite claim elements that are either abstract ideas or additional claim elements that, individually or in combination, are either generic computing methods/components or insignificant pre/post-solution activity, and that neither integrate the exception into a practical application nor amount to significantly more, based on an analysis similar to that above with respect to claim 1.

Therefore, claims 1-20 are rejected under 35 U.S.C. 101 as being directed to a judicial exception without integration into a practical application or significantly more.




Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1, 7, 9, 14, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lindholm et al. (US Pub. No. 2014/0282566A1, hereafter Lindholm) in view of Lindholm.


As per claim 1, Lindholm teaches the invention substantially as claimed including an apparatus, comprising: 

a plurality of buffers ([0095] fig. 5C FIFO 505 [0057] registers 304, each register is allocated to one thread);
a plurality of processing resources to execute a plurality of execution threads included in a thread workgroup ([0041] fig 3B, GPC 208, number M of SM 310, process one or more thread groups),
a first set of the plurality of buffers associated with a first named barrier ([0095] fig. 5C FIFO 505 buffers barriers, barrier popped from the FIFO 505 i.e. barrier is associated with the buffer [0068] barrier state [0069] thread state, barrier identifier),
a second set of the plurality of buffers associated with a second named barrier ([0095] fig. 5C FIFO 505 buffers barriers, barrier popped from the FIFO 505 i.e. barrier is associated with the buffer [0068] barrier state [0069] thread state, barrier identifier), 

assign a first set of the plurality of execution threads in the thread workgroup to the first named barrier ([0004] first sub-group of threads in the plurality of threads is associated with a first sub-barrier index, barrier instruction scheduled for execution [0042] thread group, one or more threads, concurrently executing, each SM 310 can support up to G thread groups [0043] collection of thread groups, CTA, each CTA, number of warps, executing in the same SM310 [0061] fig. 4 each CTA can use eight different barriers, unique barrier identifier [0064] fig 4 [0077] assign, each sub-groups, different barrier identifier), assign a second set of the plurality of execution threads in the thread workgroup to a second named barrier ([0004] second sub-group of threads in the plurality of threads is associated with a second sub-barrier index, barrier instruction scheduled for execution [0042] thread group, one or more threads, concurrently executing, each SM 310 can support up to G thread groups [0043] collection of thread groups, CTA, each CTA, number of warps, executing in the same SM310 [0061] fig. 4 each CTA can use eight different barriers, unique barrier identifier [0064] fig 4 [0077] assign, each sub-groups, different barrier identifier), 

synchronize execution of the first set of execution threads via the first named barrier ([0074]-[0076] barrier to synchronize execution of participating threads [0061] fig. 4 each CTA can use eight different barriers, unique barrier identifier [0064] fig 4) and synchronize execution of the second set of execution threads via the second named barrier ([0074]-[0076] barrier to synchronize execution of participating threads [0061] fig. 4 each CTA can use eight different barriers, unique barrier identifier [0064] fig 4), wherein each thread in the first set of execution threads is assigned a first identifier ([0069] thread, state, barrier identifier) indicating that the thread is associated with the first named barrier ([0070] each thread that participates in a particular barrier specifies a barrier identifier corresponding to the particular barrier and a thread may participate in one barrier at a time) and each thread in the second set of execution threads is assigned a second identifier ([0069] thread, state, barrier identifier) indicating that the thread is associated with the second named barrier ([0070] each thread that participates in a particular barrier specifies a barrier identifier corresponding to the particular barrier and a thread may participate in one barrier at a time).

Lindholm does not specifically teach allocating buffers. Lindholm, however, teaches allocating registers exclusively to threads ([0057]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have realized that the teaching of Lindholm of allocating registers for exclusive use by the threads is equivalent to the claim limitation of allocating buffers as in the instant invention.
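
The arrangement addressed in claim 1, in which two sets of threads of one workgroup each synchronize on their own named barrier, can be sketched as follows. This is a minimal illustration under stated assumptions, not a characterization of Applicant's or Lindholm's implementation: it uses Python's standard threading.Barrier on a CPU rather than GPU hardware, and the names run_workgroup, barrier_A, and barrier_B are hypothetical.

```python
import threading


def run_workgroup(num_threads=8):
    """Split one workgroup into two sub-groups, each with its own named barrier."""
    half = num_threads // 2
    # Hypothetical barrier names; each barrier waits only for its own sub-group.
    barriers = {
        "barrier_A": threading.Barrier(half),
        "barrier_B": threading.Barrier(num_threads - half),
    }
    # Per-thread identifier recording which named barrier the thread belongs to.
    assignment = {
        tid: "barrier_A" if tid < half else "barrier_B"
        for tid in range(num_threads)
    }
    arrived = {name: 0 for name in barriers}
    seen_at_release = {name: [] for name in barriers}
    lock = threading.Lock()

    def worker(tid):
        name = assignment[tid]
        with lock:
            arrived[name] += 1  # record arrival before waiting
        barriers[name].wait()   # blocks until the whole sub-group has arrived
        with lock:
            # After release, every member of this sub-group must have arrived.
            seen_at_release[name].append(arrived[name])

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assignment, seen_at_release
```

In this sketch neither sub-group waits on the other: a thread released from barrier_A only observes that all barrier_A threads have arrived, regardless of the progress of barrier_B.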

As per claim 7, Lindholm teaches a register to track the first named barrier and the second named barrier ([0078] 2 bits, sub-barrier index, stored in a per-thread register [0094] tracking unit, updates barrier state information stored in the barrier state 502, each barrier, allocated to CTA).


Claim 9 recites a method with limitations similar to those of claim 1. Therefore, it is rejected for the same rationale.


Claim 14 recites a graphics processing unit (GPU) comprising a plurality of slices (Lindholm: Fig 3B GPC 208 SM 310), each having a plurality of sub-slices (Lindholm: [0041] each SM includes identical set of function execution units), including: a plurality of processing resources to execute a plurality of execution threads (Lindholm: [0042] collection, concurrently executing threads, across parallel processing engines within SM), with limitations similar to those of claim 1. Therefore, it is rejected for the same rationale.

Claim 20 recites the GPU of claim 14 with limitations similar to those of claim 7. Therefore, it is rejected for the same rationales.


Claims 2-3, 6, 8, 10-11, 13, 15-16 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lindholm et al. (US Pub. No. 2014/0282566A1, hereafter Lindholm) in view of Lindholm and further in view of Champseix et al. (US Pub. No. 2015/0339173 A1, hereafter Champseix).
Champseix was cited in the last office action.
As per claim 2, Lindholm does not specifically teach a message gateway to facilitate communication between the plurality of execution threads.

Champseix teaches further comprising a message gateway to facilitate communication between the plurality of execution threads ([0048] fig 5 transmission, content, cores, tri-state gates 58 [0018] core allocated to the thread).  

It would have been obvious to one of ordinary skills in the art before the effective filing date of the claimed invention was made to combine the teachings of Lindholm with the teachings of Champseix of transmission of contents through the tri-state gates among 


As per claim 3, Lindholm teaches wherein each of the first set of execution threads signals to the first named barrier ([0061] fig. 4 each CTA can use eight different barriers, unique barrier identifier [0064] fig 4 [0077] assign, each sub-groups, different barrier identifier [0042] thread group, one or more threads, concurrently executing, one thread of the group being assigned to a different processing engine with SM310, each SM 310 can support up to G thread groups [0043] collection of thread groups, CTA, each CTA, number of warps, executing in the same SM310 [0103] thread, reaches, barrier 565 [0115] fig 6 636) and each of the second set of execution threads signals to the second named barrier ([0061] fig. 4 each CTA can use eight different barriers, unique barrier identifier [0064] fig 4 [0077] assign, each sub-groups, different barrier identifier [0042] thread group, one or more threads, concurrently executing, one thread of the group being assigned to a different processing engine with SM310, each SM 310 can support up to G thread groups [0043] collection of thread groups, CTA, each CTA, number of warps, executing in the same SM310 [0103] thread, reaches, barrier 565 [0115] fig 6 636).

Champseix teaches remaining claim elements of first set transmit signal to first barrier via the message gateway ([0019] cores, exchange, information,  through message [0019] thread, reached synchronization point, message sent to the synchronization mailbox [0048] fig 5 58 20 54 barrier, participant register, transmission, content, register, cores, through tri-state gates) and second set transmit signal to second barrier ([0019] cores, exchange, information,  through message [0019] thread, reached synchronization point, message sent to the synchronization mailbox [0048] fig 5 58 20 54 barrier, participant register, transmission, content, register, cores, through tri-state gates).


As per claim 6, Lindholm teaches wherein the first set of execution threads and the second set of execution threads each comprise producer threads and consumer threads ([0042] thread group, one or more threads, concurrently executing, one thread of the group being assigned to a different processing engine with SM310, each SM 310 can support up to G thread groups [0043] collection of thread groups, CTA, each CTA , number of warps, executing in the same SM310 [0050] thread, read/update data [0058] read from or write to any location [0049] CTA, read from/write to, thread, produce/write).  

As per claim 8, Lindholm teaches wherein the first counter is reset upon completion of execution of the first set of execution threads ([0068] barrier, changes state, idle , execution of barrier is done [0076] last participating thread reaches the bottom barrier, execution of the barrier is complete, state, idle [0086] last threads, particular sub-group, reaches, barrier, state, barrier identifier, idle [0111] barrier, reset) and the second counter is reset upon completion of execution of the second set of execution threads ([0068] barrier, changes state, idle , execution of barrier is done [0076] last participating thread reaches the bottom barrier, execution of the barrier is complete, state, idle [0086] last threads, particular sub-group, reaches, barrier, state, barrier identifier, idle [0111] barrier, reset).  

Claim 10 recites the method of claim 9 with limitations similar to those of claim 2. Therefore, it is rejected for the same rationale.
Claim 11 recites the method of claim 9 with limitations similar to those of claim 3. Therefore, it is rejected for the same rationale.
Claim 13 recites the method of claim 9 with limitations similar to those of claim 8. Therefore, it is rejected for the same rationale.
Claim 15 recites the GPU of claim 14 with limitations similar to those of claim 2. Therefore, it is rejected for the same rationales.
Claim 16 recites the GPU of claim 14 with limitations similar to those of claim 3. Therefore, it is rejected for the same rationales.
Claim 19 recites the GPU of claim 14 with limitations similar to those of claim 6. Therefore, it is rejected for the same rationales.



Claims 4-5, 12, and 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lindholm in view of Lindholm and further in view of Champseix, as applied to above claims, and further in view of Williamson (US Pub. No. 2009/0157817 A1).
Williamson was cited in the last office action.

As per claim 4, Lindholm teaches wherein the message gateway comprises: 
a first counter associated with the first named barrier ([0061] counter, appended to barrier identifier); and 
a second counter associated with the second named barrier ([0061] counter, appended to barrier identifier).  
Champseix teaches remaining claim elements of message gateway ([0019] cores, exchange, information,  through message [0019] thread, reached synchronization point, message sent to the synchronization mailbox [0048] fig 5 58 20 54 barrier, participant register, transmission, content, register, cores, through tri-state gates).
Lindholm and Champseix, in combination, do not specifically teach a message gateway comprising a counter.
Williamson, however, teaches a message gateway comprising a counter (fig 1 event driven instant message gateway 105 event object pool 115 fig. 2 object pool 210 index counter 225 230).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the analogous prior art of Lindholm and Champseix (Lindholm [0001]; Champseix [0001]) with the teachings of Williamson of a gateway comprising counters, in order to improve efficiency (Lindholm [0002] [0003]; Champseix [0026] [0027]) and to provide a message gateway comprising counters in the method of Lindholm and Champseix, as in the instant invention.
As per claim 5, Lindholm teaches wherein the first counter is incremented upon receiving a signal from one of the first set of execution threads and the second counter is incremented upon receiving a signal from one of the second set of execution threads ([0061] counter, appended to barrier identifier, incrementing counter, unique barrier identifiers [0103] thread, reaches, barrier 565 [0115] fig 6 636).  
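
The per-barrier counter behavior discussed for claims 4 and 5 (a counter incremented on each thread signal), together with the reset discussed for claim 8, can be sketched as follows. This is a simplified single-threaded illustration under stated assumptions: the class name BarrierCounters and its methods are hypothetical, and the reset is modeled as occurring when the last expected signal arrives, which is one possible reading and is not drawn from the cited references.

```python
class BarrierCounters:
    """Hypothetical gateway-side bookkeeping: one counter per named barrier."""

    def __init__(self, expected):
        # expected maps each barrier name to the number of threads assigned to it.
        self.expected = dict(expected)
        self.count = {name: 0 for name in self.expected}

    def signal(self, name):
        """Record one thread's signal; return True when the barrier is satisfied."""
        self.count[name] += 1
        if self.count[name] == self.expected[name]:
            # All assigned threads have signaled: reset the counter for reuse.
            self.count[name] = 0
            return True
        return False


gateway = BarrierCounters({"barrier_A": 2, "barrier_B": 3})
first = gateway.signal("barrier_A")    # one of two threads has signaled
second = gateway.signal("barrier_A")   # last thread signals; counter resets
```

Signals addressed to barrier_A leave the barrier_B counter untouched, so the two named barriers are tracked independently.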

Claim 12 recites the method of claim 9 with limitations similar to those of claims 4 and 5. Therefore, it is rejected for the same rationale.

Claim 17 recites the GPU of claim 14 with limitations similar to those of claim 4. Therefore, it is rejected for the same rationales.
 
Claim 18 recites the GPU of claim 14 with limitations similar to those of claim 5. Therefore, it is rejected for the same rationales.


Examiner's Note
Applicant is further reminded that the paragraphs cited in the references as applied to the claims above are provided for the convenience of the applicant(s). Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. Applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.


Response to Arguments

The previous claim objections have been withdrawn.
The previous specification objections have been maintained.
The previous 112(a) rejection has been withdrawn.
The previous 112(b) rejection has been withdrawn. However, new 112(b) rejections have been made.
The previous 112(f) interpretations have been cancelled. 
The previous 35 USC 101 rejections have been maintained and the previous invitation to DSMER pilot has been withdrawn.
Applicant's arguments filed on 02/03/2020 have been fully considered but they are not persuasive. In Applicant’s response filed on 02/03/2020, Applicant argues the following:
a. Applicant submits that a thread block that can use any of multiple barriers is not equivalent to assigning threads within a single thread group to multiple barriers.

Examiner has thoroughly considered Applicant’s arguments but respectfully finds them unpersuasive for at least the following reasons:
With respect to point a: Examiner respectfully disagrees. Lindholm teaches that a first sub-group of threads in the plurality of threads is associated with a first sub-barrier index ([0004]) and that each sub-group is associated with a different barrier identifier ([0077]), i.e., within a given group of threads, multiple sub-groups are associated with different barriers, which is equivalent to the claimed limitation of assigning threads within a single thread group to multiple barriers. Applicant argues that the CTA include one or more thread blocks that can use eight different barriers and is .

Allowable Subject Matter

Examiner proposed the combination of claims 1, 2, and 4 as allowable subject matter on 03/21/2022. However, Applicant instead requested an Office action.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Bourd; Alexei Vladimirovich (US-20170316076-A1) Inter-Subgroup Data Sharing
Diamos; Gregory Frederick (US-20160019066-A1)  Execution Of Divergent Threads Using A Convergence Barrier
Liu; Yanxun (US-20140130025-A1) Compiler Optimization Based On Collectivity Analysis
Lopez; Ricardo Jorge (US-20180314460-A1) INCLUSION MONITORS Using CAM For Naming Barriers
Tørudbakken; Ola (US-20200014560-A1) Gateway Fabric Ports

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABU GHAFFARI whose telephone number is (571)270-3799.  The examiner can normally be reached on Monday-Thursday 14:00 - 15:00 Hrs.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai AN can be reached on 571-272-3756.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.




/ABU ZAR GHAFFARI/Primary Examiner, Art Unit 2195