DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This office action is responsive to the amendment received 1/3/2022.

In the response to the Non-Final Office Action of 10/1/2021, the applicant states that Claims 21, 23, 27, 30, 32, 36, and 38 have been amended. Claim 26 has been cancelled. Claims 21-40 are currently pending.

Claims 21, 23, 27, 30, 32, 36, and 38 have been amended. Claim 26 has been cancelled. In summary, claims 21-25 and 27-40 are now presented for examination.

Response to Arguments
Applicant's arguments filed 1/3/2022 have been fully considered but they are not persuasive. 

The amendment has cured the basis of the objections to the drawings. Therefore, the objections to the drawings are hereby withdrawn.

Regarding claim 21, the applicant argues that the Sampson and Kiel references do not teach the elements of amended claim 21. In particular, it is submitted that Sampson and Kiel do not teach reserving a block of memory for a plurality of potential memory-based software barriers, and assigning a first memory-based software barrier to a first thread group of the one or more thread groups including allocating a first memory block to support the first memory-based software barrier, the first memory block for the first memory-based software barrier being allocated in the reserved block of memory, and setting a first register to identify the first memory-based software barrier, the first register including a plurality of bits to indicate a virtual address for the assigned first memory block in the reserved block of memory. The arguments have been fully considered, but they are not persuasive. The examiner cannot concur with the applicant for the following reasons:
Sampson discloses “reserving a block of memory for a plurality of potential memory-based software barriers”. For example, on page 2, section 1. Introduction, Sampson teaches the shared portions of the memory subsystem. On page 5, section 3.3.1. Registering a Barrier, Sampson teaches the barrier routines are located within a barrier library in a memory; Sampson further teaches a block of memory is reserved and is occupied by a barrier library in a memory; Sampson furthermore teaches receiving the virtual addresses corresponding to the arrival address and the exit address as the response. On page 5, section 3.3.2. Initializing the Barrier, Sampson teaches the operating system allocates and reserves the cache line addresses, i.e. a memory, for a barrier filter in such a way that the lower bits of the arrival and exit address can 
Sampson further discloses “assigning a first memory-based software barrier to a first thread group of the one or more thread groups including”. For example, on page 3, section 3.1. Barrier Overview, Sampson teaches providing barrier synchronization for a set of threads by making those threads access specific cache lines; Sampson further teaches a thread uses this barrier mechanism; Sampson furthermore teaches denoting the thread’s arrival at the barrier and that the thread does not continue to execute instructions after the arrival at the barrier. On page 4, section 3.2. Barrier Filter Architecture, Sampson teaches each barrier filter contains the number of threads participating in the barrier. On page 4, Sampson teaches memory fence or synchronization instructions; Sampson further teaches each thread executes the following abstract code sequence to perform the barrier. On page 5, section 3.3.1. Registering a Barrier, Sampson teaches the barrier routines are located within a barrier library in a memory; Sampson further teaches a block of memory is allocated and is occupied by a barrier library in a memory; Sampson furthermore teaches receiving the virtual addresses corresponding to the arrival address and the exit address as the response. On page 7, section 3.4.2. Data Cache Barriers, Sampson teaches each thread executes the following code sequence to perform the memory barrier.
Sampson furthermore discloses “allocating a first memory block to support the first memory-based software barrier”. For example, on page 4, section 3.2. Barrier Filter Architecture and Fig. 2, Sampson teaches the operating system allocates the cache line addresses, i.e. a memory, for a barrier filter. On page 4, Sampson teaches memory fence or synchronization instructions; Sampson further teaches each thread executes the following abstract code sequence to perform the barrier. On page 5 and Fig. 3, Sampson teaches allocating the small size 
Sampson suggests “the first memory block for the first memory-based software barrier being allocated in the reserved block of memory”. For example, on page 2, section 1. Introduction, Sampson teaches the shared portions of the memory subsystem. On page 5, section 3.3.1. Registering a Barrier, Sampson teaches the barrier routines are located within a barrier library in a memory; Sampson further teaches receiving the virtual addresses corresponding to the arrival address and the exit address as the response. On page 5, section 3.3.2. Initializing the Barrier, Sampson teaches the operating system allocates the cache line addresses for a barrier filter in such a way that the lower bits of the arrival and exit address can be used to distinguish which thread is accessing the barrier filter.
Sampson further suggests “setting a first register to identify the first memory-based software barrier”. For example, on page 4, Fig. 2 and section 3.2. Barrier Filter Architecture, Sampson teaches distinguishing which thread is accessing the barrier filter; Sampson further teaches each thread entry contains a valid bit. On page 5, section 3.2.1. MSHR Utilization, Sampson teaches Miss Status Holding Registers; Sampson further teaches that outstanding fill requests to barrier filters thus occupy an MSHR slot in the core originating the request. On page 5, section 3.3.1. Registering a Barrier, Sampson teaches each thread uses the OS interface to register itself with the filter using the barrier’s handle. In section 3.3.1/3.3.3, Sampson teaches 
Sampson furthermore suggests “the first register including a plurality of bits to indicate a virtual address for the assigned first memory block in the reserved block of memory”. For example, on page 4 and Fig. 2, Sampson teaches the operating system allocates the cache line addresses for a barrier filter in such a way that the lower bits of the arrival and exit address are used to distinguish which thread is accessing the barrier filter. On page 5, section 3.3.1. Registering a Barrier, Sampson teaches receiving the virtual addresses corresponding to the arrival address. On page 8, section 4. Results, Sampson teaches the processor is checking and resetting a local status register.

Independent claims 30 and 36 have been amended to include claim limitations similar to those of claim 21. Therefore, claims 30 and 36 are not allowable for reasons similar to those discussed above.

Claim Objections
Claim 30 is objected to because of the following informalities: the claim language “for for a plurality of” in line 2 contains a duplicated word. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21-25 and 27-40 are rejected under 35 U.S.C. 103 as being unpatentable over Sampson (Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers) in view of Kiel (US 20140146062 A1).
Regarding claim 21 (Currently amended), Sampson discloses an apparatus (page 1; section 1. Introduction: Large-scale chip multiprocessors; multi-chip processors; chip multi-processors; use many-core CMPs to exploit fine-grained data parallelism; software-only barriers when distributing kernels across a 16-core CMP) comprising: 
one or more processors (page 1; section 1. Introduction: Large-scale chip multiprocessors; multi-chip processors; chip multi-processors; use many-core CMPs to exploit fine-grained data parallelism; software-only barriers when distributing kernels across a 16-core CMP), the one or more processors to process a plurality of threads (page 3; section 3.1. Barrier Filter Overview: provide barrier synchronization for a set of threads; processors make those threads access specific cache lines; all of the threads have arrived at the barrier); and 
the one or more processors including a plurality of hardware barriers (page 2; section 2. Related Work: hardware implementations of barriers have been around for a long time; page 5; section 3.3.1: this library provides a fall-back software based barrier; page 6; section 3.3.3 
wherein the one or more processors are to:
reserve a block of memory for a plurality of potential memory-based software barriers (Sampson; page 2; section 1. Introduction: the shared portions of the memory subsystem; page 5; section 3.3.1. Registering a Barrier: the barrier routines are located within a barrier library in a memory; receive the virtual addresses corresponding to the arrival address and the exit address as the response; page 5; section 3.3.2: Initializing the Barrier: Section 3.2, the operating system allocates the cache line addresses for a barrier filter in such a way that the lower bits of the arrival and exit address can be used to distinguish which thread is accessing the barrier filter); 
assign one or more barriers to each of one or more thread groups, wherein the one or more barriers assigned to each of the one or more thread groups include one or more hardware barriers of the plurality of hardware barriers, one or more memory-based software barriers, or both (page 2; section 2. Related Work: hardware implementations of barriers have been around for a long time; page 3; section 3.1. Barrier Overview: provide barrier synchronization for a set of threads by making those threads access specific cache lines; a thread uses this barrier mechanism; denote the thread’s arrival at the barrier; the thread does not continue to execute instructions after the arrival at the barrier; page 5; section 3.3.1: this library provides a fall-back software based barrier; page 5; section 3.3.2 Initializing the Barrier: a given filter is aware of all addresses assigned to threads participating in the barrier it supports; the operating system allocates the cache line addresses for a barrier filter; page 6; section 3.3.2 
 allocating a first memory block to support the first memory-based software barrier (page 4; section 3.2. Barrier Filter Architecture; Fig. 2: the operating system allocates the cache line addresses for a barrier filter; page 5; Fig. 3: allocate the small size of the barrier filter state; page 5; section 3.3.1 Registering a Barrier: the barrier routines are located within a barrier library in a memory; each thread uses the OS interface to register itself with the filter using the barrier’s handle, and receive the virtual addresses corresponding to the arrival address and the exit address as the response), the first memory block for the first memory-based software barrier being allocated in the reserved block of memory (Sampson; page 2; section 1. Introduction: the shared portions of the memory subsystem; page 5; section 3.3.1. Registering a Barrier: the barrier routines are located within a barrier library in a memory; receive the virtual addresses corresponding to the arrival address and the exit address as the response; page 5; section 3.3.2: Initializing the Barrier: Section 3.2, the operating system allocates the cache line addresses for a barrier filter in such a way that the lower bits of the arrival and exit address can be used to distinguish which thread is accessing the barrier filter), and

the first register including a plurality of bits to indicate a virtual address for the assigned first memory block in the reserved block of memory (Sampson; page 4; Fig. 2: the operating system allocates the cache line addresses for a barrier filter in such a way that the lower bits of the arrival and exit address are used to distinguish which thread is accessing the barrier filter; page 5; section 3.3.1 Registering a Barrier: receive the virtual addresses corresponding to the arrival address; page 8; section 4. Results: the processor is checking and resetting a local status register).
Sampson fails to explicitly disclose including a graphics processor.
In the same field of endeavor, Kiel teaches including a graphics processor ([0016]: debug graphics programs on a system having a single GPU; Fig. 3; [0027]: a GPU 306).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sampson to include a graphics processor as taught by Kiel. The motivation for doing so would have been to debug graphics programs on a system having a single GPU as taught by Kiel in paragraph [0016].

Regarding claim 22 (Previously presented), Sampson in view of Kiel discloses the apparatus of claim 21, wherein the one or more processors are further to track barrier resources for thread scheduling (Sampson; page 3; section 3.1. Barrier Filter Overview: the barrier filter keeps track of the arrival address assigned to each thread; page 4; Table 2: Figure 2 shows the state kept track of for a single barrier filter; the barrier filter has an arrival address tag and an exit address tag and a table containing T entries, where T is the maximum number of threads supported for a barrier), including tracking of whether each assigned barrier is a hardware barrier or a software barrier (Sampson; page 4; Fig. 2; section 3.2. Barrier Filter Architecture: distinguish which thread is accessing the barrier filter; each thread entry contains a valid bit; page 4; Fig. 2; Figure 2 shows the state kept track of for a single barrier filter; page 6; section 3.3.3 Context Switch and Swapping Out a Barrier: the transitive set of threads assigned to a group of barriers needs enough concurrent hardware barrier support to allow these barriers to be loaded into the hardware at the same time; page 5; section 3.3.1 Registering a Barrier: once all the threads have registered, then the barrier will be completely constructed and ready to use; page 6; 3.3.3 Context Switch and Swapping Out a Barrier). 

Regarding claim 23 (Currently amended), Sampson in view of Kiel discloses the apparatus of claim 21, wherein the first register includes at least one bit to indicate a memory-based software barrier (Sampson; page 4; Fig. 2: the operating system allocates the cache line addresses for a barrier filter in such a way that the lower bits of the arrival and exit address are used to distinguish which thread is accessing the barrier filter; page 5; section 3.3.1 Registering . 

Regarding claim 24 (Previously presented), Sampson in view of Kiel discloses the apparatus of claim 21, wherein the first memory block includes: a set of bits for counting a number of threads that have not yet reached the first memory-based software barrier (Sampson; page 3; section 3. Barrier Filter; 3.1. Barrier Filter Overview: count all threads; after a thread has arrived at a barrier, it is blocked until all of the threads have arrived at the barrier; once all of the threads have arrived, the barrier filter will allow the fill requests to be serviced; page 4; section 3.2. Barrier Filter Architecture: when a barrier is created, it starts off with arrived-counter set to zero, num-threads set to the number of threads in the barrier, i.e. the number of threads that have not yet reached the barrier; page 5; Fig. 3: arrived-counter is equal to zero; page 8; section 4. Results: the processor is checking and resetting a local status register). 

Regarding claim 25 (Previously presented), Sampson in view of Kiel discloses the apparatus of claim 21, wherein the first memory block includes: a set of bits to indicate which threads of the first thread group have reached the first memory-based software barrier (Sampson; page 2; 2. Related Work: resets the global bit-vector associated with the satisfied barrier; page 3; 3. Barrier Filter; 3.1. Barrier Filter Overview: count all threads; after a thread has arrived at a barrier, it is blocked until all of the threads have arrived at the barrier; page 4; 3.2. Barrier Filter Architecture: each thread contains a valid bit; a counter represents the 

Regarding claim 27 (Currently amended), Sampson in view of Kiel discloses the apparatus of claim 21, wherein the reserved block of memory comprises global memory of the apparatus (Sampson; page 2; section 1. Introduction: the shared portions of the memory subsystem; page 2; section 2. Related Work: use the existing memory interconnect network; page 3; section 2. Related Work: large shared memory systems with a barrier Algorithm; page 3; section 3.1. Barrier Overview: provide barrier: placement of the barrier filter to be in the controller for the first shared level of memory). 

Regarding claim 28 (Previously presented), Sampson in view of Kiel discloses the apparatus of claim 21, wherein the one or more processors are further to: perform one or more atomic operations to update the first memory-based software barrier (Sampson; page 3; section 2. Related Work: that work offers fast atomic access to lock variables via a dedicated hardware unit; page 3; section 3.1 Barrier Overview: a thread needs to inform the filter that it has proceeded past the barrier and is now eligible to enter a barrier again; page 5; section 3.3.1: memory update; page 11; section 5. Summary: update shared barrier state variables). 

Regarding claim 29 (Previously presented), Sampson in view of Kiel discloses the apparatus of claim 21, wherein the one or more processors include a plurality of streaming 

Regarding claim 30 (Currently amended), Sampson discloses a method (page 1; section 1. Introduction: Large-scale chip multiprocessors; multi-chip processors; chip multi-processors; use many-core CMPs to exploit fine-grained data parallelism; software-only barriers when distributing kernels across a 16-core CMP) comprising: 
reserving a block of memory for for a plurality of  potential memory-based barriers in a computing system (page 2; section 1. Introduction: the shared portions of the memory subsystem; page 5; section 3.3.1. Registering a Barrier: the barrier routines are located within a barrier library in a memory; receive the virtual addresses corresponding to the arrival address and the exit address as the response; page 5; section 3.3.2: Initializing the Barrier: the operating system allocates the cache line addresses for a barrier filter in such a way that the lower bits of the arrival and exit address can be used to distinguish which thread is accessing the barrier filter),
the remaining limitations are similar to the claim limitations recited in claim 21. Therefore, the same rationale used to reject claim 21 is also used to reject claim 30. 

Regarding claim 31 (Previously presented), the claim limitations are similar to the claim limitations recited in claim 22. Therefore, the same rationale used to reject claim 22 is also used to reject claim 31.



Regarding claim 33 (Previously presented), the claim limitations are similar to the claim limitations recited in claim 24. Therefore, the same rationale used to reject claim 24 is also used to reject claim 33.

Regarding claim 34 (Previously presented), the claim limitations are similar to the claim limitations recited in claim 25. Therefore, the same rationale used to reject claim 25 is also used to reject claim 34.

Regarding claim 35 (Previously presented), the claim limitations are similar to the claim limitations recited in claim 28. Therefore, the same rationale used to reject claim 28 is also used to reject claim 35.

Regarding claim 36 (Currently amended), Sampson discloses at least one non-transitory machine-readable medium comprising instructions that, when executed by a processing device, cause the processing device to perform operations including (page 1; section 1. Introduction: Large-scale chip multiprocessors; multi-chip processors; chip multi-processors; use many-core CMPs to exploit fine-grained data parallelism; software-only barriers when distributing kernels across a 16-core CMP): 
a plurality of potential memory-based barriers in a computing system (page 2; section 1. Introduction: the shared portions of the memory subsystem; page 5; section 3.3.1. Registering a Barrier: the barrier routines are located within a barrier library in a memory; receive the virtual addresses corresponding to the arrival address and the exit address as the response; page 5; section 3.3.2: Initializing the Barrier: the operating system allocates the cache line addresses for a barrier filter in such a way that the lower bits of the arrival and exit address can be used to distinguish which thread is accessing the barrier filter), 
the remaining limitations are similar to the claim limitations recited in claim 21. Therefore, the same rationale used to reject claim 21 is also used to reject claim 36.

Regarding claim 37 (Previously presented), the claim limitations are similar to the claim limitations recited in claim 22. Therefore, the same rationale used to reject claim 22 is also used to reject claim 37.

Regarding claim 38 (Currently amended), the claim limitations are similar to the claim limitations recited in claim 23. Therefore, the same rationale used to reject claim 23 is also used to reject claim 38.

Regarding claim 39 (Previously presented), the claim limitations are similar to the claim limitations recited in claim 24 and claim 25. Therefore, the same rationale used to reject claims 24 and 25 is also used to reject claim 39.

Regarding claim 40 (Previously presented), the claim limitations are similar to the claim limitations recited in claim 28. Therefore, the same rationale used to reject claim 28 is also used to reject claim 40.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hai Tao Sun whose telephone number is (571)272-5630. The examiner can normally be reached 9:00AM-6:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Gregory Tryder, can be reached at 571-270-7365. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HAI TAO SUN/Primary Examiner, Art Unit 2616