DETAILED ACTION
Claims 1-20 are presented for examination.

Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of copending Application No. 16/746714.

This is a provisional nonstatutory double patenting rejection.

As to claim 1, every limitation of instant claim 1 is recited within claim 1 of Application No. 16/746714; that is, the limitations of instant claim 1 are a subset of the limitations of that reference claim. See the table for comparison below:
Claim 1 of Instant Application: [claim text image not reproduced]
Claim 1 (claims of 7/29/22) of Application No. 16/746714: [claim text image not reproduced]

Similarly, every limitation of instant independent claim 11 is recited within claim 11 of Application No. 16/746714.
The limitations of dependent claims 2-10 and 12-20 are identical to those of dependent claims 2-10 and 12-20 of Application No. 16/746714.

Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a), which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-10 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Schuster (US 2013/0125133 A1).

As to claim 1, Schuster teaches a computer-readable storage medium having stored thereon instructions, which, if performed by one or more processors (program of instructions stored on a computer-readable storage media that is executable by one or more CPUs and/or GPUs, including one or more multi-core or multi-threaded processors), cause the one or more processors to at least (Figs. 1, 3, and 5; [0013]; [0023]; [0119]):
execute a parent thread (parent thread or given thread) within a first multiprocessor (executing a given/parent thread within a multi-core processor or multiprocessor, wherein each of a number of threads, including the given/parent thread, may execute concurrently or in parallel) (Figs. 1, 3, and 5; [0013]; [0023]; [0040]; [0062]; [0118]-[0119]; [0123]);
launch a child thread (wherein a parent/given thread spawns one or more child threads) within a second multiprocessor (nested parallelism involving a second multi-core processor or multiprocessor from among the plurality of multi-core processors or multiprocessors) (Abstract; Figs. 1, 3, and 5; [0013]; [0023]; [0033]; [0040]; [0062]; [0118]-[0119]; [0123]); and
in response to a synchronization function call, block execution of the parent thread while waiting for the child thread to complete (in response to a sync function call, the parent thread is suspended and waits until all of its child threads are completed before continuing/resuming execution) ([0031]; [0095]).
Although Schuster does not literally disclose “blocking” execution of the parent thread, one of ordinary skill in the art at the time the invention was made would have recognized that Schuster’s teaching of suspending the parent thread until all of its child threads are completed before resuming execution serves the same function as blocking. It would have been obvious to include the feature of blocking execution of the parent thread because doing so would provide the predictable result of fully strict thread-level parallel programs with load balancing among concurrently executing threads that efficiently distribute work among themselves.
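For illustration only, the spawn/sync behavior relied upon above (a parent thread that is suspended at a sync call until its children complete) can be sketched in Python. This is a minimal sketch under stated assumptions, not Schuster's implementation: ordinary `threading.Thread` objects stand in for threads running on separate multiprocessors, `Thread.join()` stands in for the synchronization function call, and the names `child`, `parent`, and `trace` are hypothetical. The sketch shows that a suspended parent and a blocked parent behave the same way: neither proceeds past the sync point until every child has finished.

```python
import threading


def child(i, results, trace, lock):
    # Hypothetical child work: square the index, then log completion.
    results[i] = i * i
    with lock:
        trace.append(f"child {i} done")


def parent(n_children=4):
    results = [None] * n_children
    trace = ["parent: spawn"]
    lock = threading.Lock()
    # Launch the children; the OS may schedule them on other cores.
    kids = [
        threading.Thread(target=child, args=(i, results, trace, lock))
        for i in range(n_children)
    ]
    for k in kids:
        k.start()
    # "Sync": the parent blocks (is suspended) until each child has completed.
    for k in kids:
        k.join()
    # Execution continues at the instruction following the sync point.
    trace.append("parent: resumed after sync")
    return trace, results


if __name__ == "__main__":
    trace, results = parent()
    print(trace[-1])  # parent: resumed after sync
    print(results)    # [0, 1, 4, 9]
```

Because `join()` does not return until the child thread has terminated, the parent's resume entry is always the last item in `trace`, regardless of the order in which the children finish.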

As to claim 2, Schuster teaches wherein the one or more processors comprise a graphics processing unit (GPU) (Computer System 500 includes a plurality of GPU(s) 540 and CPU(s) 530) (Fig. 5).

As to claim 3, Schuster teaches wherein the instructions, if performed by the one or more processors, cause the one or more processors to resume execution of the parent thread after completion of execution of the child thread (in response to a sync function call, the parent thread is blocked and waits until all of its child threads are completed before continuing/resuming execution) ([0031]).

As to claim 4, Schuster teaches wherein the instructions, if performed by the one or more processors, cause the one or more processors to store execution state of the parent thread in response to the synchronization function call ([0034]; [0031]).

As to claim 5, Schuster teaches wherein the instructions that cause the one or more processors to block execution of the parent thread, if performed by the one or more processors, cause the one or more processors to ensure memory coherence between the parent thread and the child thread ([0062]).

As to claim 6, Schuster teaches wherein the instructions, if performed by the one or more processors, cause the one or more processors to resume execution of the parent thread in response to notification that the child thread has completed execution (in response to a sync function call, the parent thread is blocked and waits until it is notified that all of its child threads are completed before continuing/resuming execution) ([0031]).

As to claim 7, Schuster teaches wherein: the one or more processors comprise a graphics processing unit (GPU) (GPU(s) 540); and the instructions, if performed by the one or more processors, cause the one or more processors to: store execution state of the parent thread in response to the synchronization function call ([0034]; [0031]); receive a notification that execution of the child thread completed (in response to a sync function call, the parent thread is blocked and waits until it is notified that all of its child threads are completed before continuing/resuming execution) ([0031]); and resume execution of the parent thread in response to notification that the child thread has completed execution (in response to a sync function call, the parent thread is blocked and waits until it is notified that all of its child threads are completed before continuing/resuming execution) ([0031]).
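For illustration only, the notification-driven sequence recited in claims 6, 7, and 9 (store the parent's execution state, receive a notification that the children have completed, then resume at the instruction following the sync) can be sketched with a condition variable. This minimal Python sketch assumes that a dictionary (`saved_state`) stands in for the parent's saved execution state and that a hypothetical `JoinCounter` helper delivers the completion notification; neither name comes from Schuster or the claims. The last child to finish calls `notify_all()`, which wakes the blocked parent.

```python
import threading


class JoinCounter:
    """Hypothetical helper: tracks outstanding children and notifies the
    blocked parent when the last child completes."""

    def __init__(self, pending):
        self._pending = pending
        self._cond = threading.Condition()

    def child_done(self):
        # Each child calls this as its final action.
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()  # completion notification to parent

    def wait_all(self):
        # Synchronization call: block until notified that no children remain.
        with self._cond:
            while self._pending > 0:
                self._cond.wait()


def run_parent(n_children=3):
    counter = JoinCounter(n_children)
    partial = [0] * n_children

    def child(i):
        partial[i] = i + 1
        counter.child_done()

    # Store the parent's execution state before it blocks at the sync point.
    saved_state = {"accumulator": 100}
    for i in range(n_children):
        threading.Thread(target=child, args=(i,)).start()
    counter.wait_all()
    # On notification, resume from the saved state at the next instruction.
    return saved_state["accumulator"] + sum(partial)


if __name__ == "__main__":
    print(run_parent())  # 106
```

The `while` loop around `wait()` guards against spurious wakeups, so the parent resumes only after the final child has reported completion.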

As to claim 8, Schuster teaches wherein the one or more processors comprise a graphics processing unit (GPU) and wherein the GPU comprises the first multiprocessor and second multiprocessor (Fig. 5; [0013]; [0040]; [0118]).

As to claim 9, Schuster teaches wherein the parent thread comprises an instruction following the synchronization function call and wherein the instructions of the computer-readable storage medium, if performed by the one or more processors, cause the one or more processors to continue execution at the instruction following the synchronization function call (in response to a sync function call, the parent thread is blocked and waits until it is notified that all of its child threads are completed before continuing/resuming execution) ([0031]).

As to claim 10, Schuster teaches wherein the first multiprocessor and second multiprocessor are in the same parallel processing unit (PPU) (Fig. 5; [0013]; [0040]; [0118]).

Claims 11-20 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Schuster in view of Aingaran et al. (hereinafter Aingaran) (US 2006/0136915 A1).

As to claim 11, Schuster teaches a processor, comprising: 
a plurality of cores (multi-core) ([0013]; [0040]; [0118]); 
an L1 cache ([0122]); 
an instruction cache ([0122]); 
a scheduler ([0003]); and 
the processor to execute instructions to: 
execute a parent thread within a first multiprocessor (executing a given/parent thread, wherein each of a number of threads, including the given/parent thread, may execute concurrently on the same number of processors or cores in a multi-core processor) (Abstract; Figs. 1, 3, and 5, items 105 and 305; [0040]; [0123]);
launch a child thread within a second multiprocessor (nested parallelism in which the parent/given thread spawns one or more children that execute concurrently/in parallel on one or more CPUs and/or GPUs, wherein a CPU or GPU that is a multi-core or multi-threaded processor can serve as the second multiprocessor) (Abstract; [0013]; [0023]; [0119]; Figs. 1, 3, and 5, items 105 and 305; [0040]; [0123]); and
in response to a synchronization function call (sync function call), block execution of the parent thread while waiting for the child thread to complete (in response to a sync function call, the parent thread is suspended and waits until all of its child threads are completed before continuing/resuming execution) ([0031]; [0095]).
Although Schuster does not literally disclose blocking execution of the parent thread, one of ordinary skill in the art at the time the invention was made would have recognized that Schuster’s teaching of suspending the parent thread until all of its child threads are completed before resuming execution serves the same function as blocking. It would have been obvious to include the feature of blocking execution of the parent thread because doing so would provide the predictable result of fully strict thread-level parallel programs with load balancing among concurrently executing threads that efficiently distribute work among themselves.
Schuster does not explicitly teach that its processor has a register file and a crossbar unit. However, Aingaran teaches a multiprocessor that includes items such as a plurality of cores 36a-h, a crossbar 34, L1 cache 42, L1 instruction cache 43, scheduler 216, register files 210, etc. (Figs. 3 and 8). Schuster and Aingaran are analogous to the claimed invention because they are in the same field of endeavor, namely thread processing. It would have been obvious to one of ordinary skill in the art at the time the invention was made to modify Schuster’s processor to include a register file, a crossbar unit, etc., as taught by Aingaran. The suggestion/motivation for doing so would have been the predictable result of providing the computer architectural structure needed to schedule multiple threads for execution.

As to claim 12, Schuster teaches wherein the processor comprises a graphics processing unit (GPU) to execute the instructions (Computer System 500 includes a plurality of GPU(s) 540 and CPU(s) 530) (Fig. 5).

As to claim 13, Schuster teaches wherein the instructions, if performed by the processor, cause the processor to resume execution of the parent thread after completion of execution of the child thread (in response to a sync function call, the parent thread is blocked and waits until all of its child threads are completed before continuing/resuming execution) ([0031]).

As to claim 14, Schuster teaches wherein the instructions, if performed by the processor, cause the processor to store execution state of the parent thread in response to the synchronization function call ([0034]; [0031]).

As to claim 15, Schuster teaches wherein the instructions, if executed by the processor, cause the processor to ensure memory coherence between the parent thread and the child thread ([0062]).

As to claim 16, Schuster teaches wherein the instructions, if executed by the processor, cause the processor to resume execution of the parent thread in response to notification that the child thread has completed execution (in response to a sync function call, the parent thread is blocked and waits until it is notified that all of its child threads are completed before continuing/resuming execution) ([0031]).

As to claim 17, Schuster teaches wherein: the processor comprises a graphics processing unit (GPU) (GPU(s) 540); and the instructions, if performed by the processor, cause the processor to: store execution state of the parent thread in response to the synchronization function call ([0034]; [0031]); receive a notification that execution of the child thread completed (in response to a sync function call, the parent thread is blocked and waits until it is notified that all of its child threads are completed before continuing/resuming execution) ([0031]); and resume execution of the parent thread in response to notification that the child thread has completed execution (in response to a sync function call, the parent thread is blocked and waits until it is notified that all of its child threads are completed before continuing/resuming execution) ([0031]).

As to claim 18, Schuster teaches wherein the processor comprises a graphics processing unit (GPU) and wherein the GPU comprises the first multiprocessor and second multiprocessor (Fig. 5; [0013]; [0040]; [0118]).

As to claim 19, Schuster teaches wherein the parent thread comprises an instruction following the synchronization function call and wherein the instructions, if performed by the processor, cause the processor to continue execution at the instruction following the synchronization function call (in response to a sync function call, the parent thread is blocked and waits until it is notified that all of its child threads are completed before continuing/resuming execution) ([0031]).

As to claim 20, Schuster teaches wherein the first multiprocessor and second multiprocessor are in the same parallel processing unit (PPU) (Fig. 5; [0013]; [0040]; [0118]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Nickolls et al. (“Scalable Parallel Programming”, ACM QUEUE, March/April 2008) teaches nested parallelism involving a plurality of streaming multiprocessors and using barrier synchronization.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KENNETH TANG whose telephone number is (571)272-3772. The examiner can normally be reached Monday-Friday 7AM-3PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock can be reached on 571-272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KENNETH TANG/Primary Examiner, Art Unit 2199