DETAILED ACTION
Claims 1-10 are pending in the present application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 02/22/2019 and 09/01/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Election/Restrictions
Claims 11-20 are canceled from further consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected Species, there being no allowable generic or linking claim. Election was made without traverse in the reply filed on 05/03/2022.
Applicant’s election without traverse of claims 1-10 in the reply filed on 05/03/2022 is acknowledged.

Allowable Subject Matter
Claims 1-10 are allowed.
Regarding independent claim 1, the closest prior art, Durant in view of Duluk, JR. et al. and Tardif et al., teaches a cascade of graph streaming processors (Durant: Fig 2, par 0033-0034, “a parallel processing subsystem includes a number U of PPUs, where U ≥ 1. (Herein, multiple instances of like objects are denoted with reference numbers identifying the object and parenthetical numbers identifying the instance where needed.) PPUs 202 and parallel processing memories 204 may be implemented using one or more integrated circuit devices, such as programmable processors, application specific integrated circuits (ASICs), or memory devices, or in any other technically feasible fashion”, Figs 4 and 6, par 0071-0072, “FIG. 4 illustrates nested task execution on parallel processing subsystem 112, according to one embodiment of the present invention. As shown, CPU 102 initiates execution of exemplary tasks 420 on parallel processing subsystem 112. After task 420(0) completes, task 420(1) executes. After task 420(1) completes, task 420(2) executes” … a graphics processing unit includes multiple parallel processing subsystems, each of which is used to perform a series of tasks), comprising:
a plurality of graph streaming processors (Durant: Figs 2 and 4, par 0033-0034, “a parallel processing subsystem includes a number U of PPUs, where U ≥ 1. (Herein, multiple instances of like objects are denoted with reference numbers identifying the object and parenthetical numbers identifying the instance where needed.) PPUs 202 and parallel processing memories 204 may be implemented using one or more integrated circuit devices, such as programmable processors, application specific integrated circuits (ASICs), or memory devices, or in any other technically feasible fashion”), wherein each of the graph streaming processors comprises:
a processor array including a plurality of processors (Durant: Fig 2, par 0038-0039, “PPU 202(0) includes a processing cluster array 230 that includes a number C of general processing clusters (GPCs) 208, where C ≥ 1. Each GPC 208 is capable of executing a large number (e.g., hundreds or thousands) of threads concurrently, where each thread is an instance of a program.” … processing cluster array);
a thread manager (Durant: Figs 2 and 3A-3B, par 0045-0046, “FIG. 3A is a block diagram of the task/work unit 207 of FIG. 2, according to one embodiment of the present invention. The task/work unit 207 includes a task management unit 300 and the work distribution unit 340. The task management unit 300 organizes tasks to be scheduled based on execution priority levels” … pipeline manager, scheduler, and task management), the thread manager comprising a plurality of stages (Durant: Figs 4 and 7, par 0070-0071, par 0098-0106, “FIG. 4 illustrates nested task execution on parallel processing subsystem 112, according to one embodiment of the present invention. As shown, CPU 102 initiates execution of exemplary tasks 420 on parallel processing subsystem 112. After task 420(0) completes, task 420(1) executes. After task 420(1) completes, task 420(2) executes” … tasks are performed one after another) and a plurality of command buffers located between each of the plurality of stages, wherein each stage includes physical hardware operative to schedule each of a plurality of threads of the stage for processing on the processor array (Durant: Figs 4 and 6-8, par 0070-0071, par 0098-0108, “FIG. 8 illustrates a related hierarchical execution graph including associated TMDQs and tasks, according to another embodiment of the present invention. As shown, the hierarchical execution graph includes thread group 810 at nesting depth 1, TMDQs 812, tasks 820 830 840 850 860, an execution graph 880 at nesting depth 2, and an execution graph 890 at nesting depth 3. The components of the hierarchical execution graph function substantially as described above in conjunction with FIG. 7 except as detailed below. As shown, each TMDQ 812 of thread group 810 has one or more pending tasks. In one example, task 820(0) associated with stream 870 could have been launched into TMDQ 812(0), but task 860(0) associated with stream 875 would not yet have been launched. Tasks 830 associated with one stream could have been launched into TMDQ 812(1)” … a series of tasks is performed by the processing array based on the scheduler), including an input command buffer parser operative to interpret commands within a corresponding input command buffer and generate the plurality of threads (Durant: par 0035, “A pointer to each data structure is written to a pushbuffer to initiate processing of the stream of commands in the data structure. The PPU 202 reads command streams from one or more pushbuffers and then executes commands asynchronously relative to the operation of CPU 102. Execution priorities may be specified for each pushbuffer by an application program via the device driver 103 to control scheduling of the different pushbuffers”, par 0037-0039, “The pointers to TMDs are included in the command stream that is stored as a pushbuffer and received by the front end unit 212 from the host interface 206. Processing tasks that may be encoded as TMDs include indices of data to be processed, as well as state parameters and commands defining how the data is to be processed (e.g., what program is to be executed)”, Duluk, JR. et al.: par 0082, “Push buffer 590 includes configuration commands 501 or 503, data processing commands 502 or 504. Configuration commands 501 or 503 specify the configuration for programmable units within PPU 202, SCC 525, and front end 517. Configuration commands 501 or 503 may be used to set up a particular processing state for use when processing a set of samples”, par 0127, “The device driver 103 programs the MME 910 with macros that, when executed, generate constant buffer update commands to update pipeline state in constant buffer 906. The device driver 103 then transmits pipeline state received from the application directly to the MME 910. The device driver 103 also invokes the macros in the MME 910 that cause the MME 910 to generate constant buffer update commands based on the pipeline state, and in some embodiments, additional commands that update the pipeline state in the graphics processing pipeline 400. In one embodiment, the device driver 103 compresses the pipeline state, and a macro executed by the MME 910 decompresses the pipeline state before updating the constant buffer 906 in addition to updating the hardware pipeline state”);
the cascade of graph streaming processors further comprising (Durant: Fig 2, par 0033-0034, “a parallel processing subsystem includes a number U of PPUs, where U ≥ 1. (Herein, multiple instances of like objects are denoted with reference numbers identifying the object and parenthetical numbers identifying the instance where needed.) PPUs 202 and parallel processing memories 204 may be implemented using one or more integrated circuit devices, such as programmable processors, application specific integrated circuits (ASICs), or memory devices, or in any other technically feasible fashion”, Figs 4 and 7-8, par 0070-0071, par 0098-0108, “As shown, CPU 102 initiates execution of exemplary tasks 420 on parallel processing subsystem 112. After task 420(0) completes, task 420(1) executes. After task 420(1) completes, task 420(2) executes. During the course of execution, task 420(1) invokes tasks 430(0) through 430(2), for example, to compute an intermediate result used by task 420(1). To maintain proper instruction execution order, task 420(1) should wait until tasks 430 complete before continuing” … a graphics processing unit includes multiple parallel processing subsystems, each of which is used to perform a series of tasks);
one or more shared command buffers located between each of the plurality of graph streaming processors (Duluk, JR. et al.: Fig 9A, par 0124-0126, a buffer 906 is shared among, and accessed by, the shader engines), wherein each shared command buffer includes a buffer address, a write pointer, and a read pointer (Tardif et al.: Fig 6, par 0066, command data include address, input pointer and out pointers).
However, the prior art, alone or in combination, fails to disclose or render obvious the claimed invention as a whole, in particular the limitation “wherein for each of the shared command buffers a first graph streaming processor of the plurality of graph streaming processors operates to write commands to the shared command buffer as indicated by the write pointer of the shared command buffer and a second graph streaming processor of the plurality of graph streaming processors operates to read commands from the shared command buffer as indicated by the read pointer by interpreting commands of the shared command buffer by an input command buffer parser of a first stage of the second graph streaming processor; wherein for each one of the shared command buffers, at least one graph streaming processor scheduler operates to manage the write pointer and the read pointer to avoid overwriting unused commands of the shared command buffer”. Claims 2-10 each ultimately depend from claim 1 and are therefore allowed at least by virtue of their dependency on allowable claim 1.
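For illustration only, the write-pointer and read-pointer management recited in the quoted limitation can be pictured as a single-writer, single-reader ring buffer. The following Python sketch is not taken from the application or from the cited references; the class name SharedCommandBuffer, its methods, the fixed capacity, and the example address are all hypothetical, and the claimed hardware may be arranged quite differently.

```python
class SharedCommandBuffer:
    """Hypothetical ring buffer shared by one writer and one reader."""

    def __init__(self, buffer_address, capacity):
        self.buffer_address = buffer_address  # base address (illustrative only)
        self.capacity = capacity
        self.slots = [None] * capacity
        self.write_ptr = 0  # count of commands written so far
        self.read_ptr = 0   # count of commands read so far

    def unread(self):
        # Commands written but not yet consumed by the reader
        return self.write_ptr - self.read_ptr

    def write(self, command):
        # Refuse the write when every slot still holds an unread command,
        # so unused commands are never overwritten
        if self.unread() == self.capacity:
            return False
        self.slots[self.write_ptr % self.capacity] = command
        self.write_ptr += 1
        return True

    def read(self):
        # Return None when there is nothing new to read
        if self.unread() == 0:
            return None
        command = self.slots[self.read_ptr % self.capacity]
        self.read_ptr += 1
        return command


# A first processor writes; a second processor reads.
buf = SharedCommandBuffer(buffer_address=0x1000, capacity=2)
results = [buf.write("cmd0"), buf.write("cmd1"), buf.write("cmd2")]
first = buf.read()          # frees one slot
retry = buf.write("cmd2")   # now succeeds
remaining = [buf.read(), buf.read(), buf.read()]
```

In this sketch the third write is rejected because both slots still hold unread commands; it succeeds only after the reader advances the read pointer.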
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jin Ge whose telephone number is (571)272-5556. The examiner can normally be reached 8:00 to 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee M Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JIN GE
Examiner
Art Unit 2616



/JIN GE/Primary Examiner, Art Unit 2616