DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.
Claims 1-24 are rejected under 35 U.S.C. 103 as being unpatentable over Shad et al. (U.S. 2013/0132711 A1) in view of Harris (U.S. 2017/0024848 A1).
Regarding Claim 1, Shad discloses a method of operating a data processor that includes a programmable processor operable to execute programs, and in which when executing a program (Shad, [0008] “a method of the invention for preempting execution of program instructions in a multi-threaded system”; Shad teaches a method of executing program instructions in a multi-threaded system by the programmable processors), the programmable processor executes the program for respective groups of one or more execution threads, each execution thread in a group of execution threads corresponding to a respective program of an output being generated (Shad, [0025] “parallel processing subsystem 112 includes one or more parallel processing units (PPUs) 202..may be implemented using one or more integrated circuit”), and each execution thread having an associated set of registers for storing data for the execution thread (Shad, [0043] “a particular GPC 208 constitutes a thread, collection of a certain number of concurrently executing threads across the parallel processing engines within an SM 310 is referred to as a "thread group"” and [0064] “The context state that is unloaded and stored includes registers within the SMs 310”; the context state of a thread group executing on an SM 310 is stored in registers), the method comprising:
in response to a command to suspend the processing of an output being generated by the data processor (Shad, [0051] “the sequence of per-thread instructions might include an instruction to suspend execution of operations for the representative thread at a particular point in the sequence until such time as one or more of the other threads reach that particular point”):
for a group of one or more execution threads currently executing a program for the output being generated: 
stopping the issuing of program instructions for execution by the group of one or more execution threads (Shad, [0051] “the terms "CTA" and "thread array" are used synonymously herein” and [0055] “FIG.4, A first phase (phase 1) stops the processing in the current context. For CTA level stopping work at a CTA task boundary. For instruction level stopping work at an SM 310. If an interrupt or fault occurs after initiated preemption is and during phase 1”);
waiting for any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads for the group of one or more execution threads to complete (Shad, [0058] “When the preempt command is received by a processing unit, the processing unit stops outputting work to a downstream unit... Assertion of the context freeze signal ensures that the processing pipeline does not perform any operation based on the transactions used to save the context state” and [0064] “The context state that is stored includes registers within SMs 310” and Fig. 5A [0080] “At step 525 the SMs 310 stop executing instructions and in step 530 the SMs 310 wait for any outstanding memory transactions to complete”; Shad teaches waiting for the outstanding memory transactions that affect the context state, which is stored in registers associated with execution of the CTA (thread array), to complete); and
when any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads for the group of one or more execution threads have completed: storing to memory (Shad, Fig. 5, [0080] “The SMs 310 indicate to the pipeline manager 305 whether each thread group exited or was preempted. When all of the outstanding memory transactions are complete, at step 535 the context state maintained in the SMs 310 is stored into a context buffer and the context state maintained in the pipeline managers 305 is also stored into the context buffer”):
the content of the registers associated with the threads of the group of one or more execution threads (Shad, [0064] “The context state that is stored includes registers within SMs 310” and [0080] “The SMs 310 indicate to the pipeline manager 305 whether each thread group exited or was preempted”; Shad teaches storing the content of the registers (the context state) associated with the execution threads); and
a set of state information for the group of one or more execution threads, the set of state information including at least an indication of the last instruction in the program that was executed for the threads of the group of one or more execution threads (Shad, [0060] “Even after the task/work unit 207 stops launching work, the task/work unit 207 may receive additional work that may be generated by the GPCs 208 during execution of previous instructions. The task/work unit 207 buffers the additional work to be stored by the front end 212 as part of the context state for the task/work unit 207”; Shad teaches that, after work launching stops (suspended execution threads), additional information is generated by the GPCs during execution of previous instructions (an indication of the last instruction in the program)).
However, Shad does not explicitly teach a programmable execution unit, with each execution thread in a group of execution threads corresponding to a respective work item of an output being generated.
Harris teaches each execution thread in a group of execution threads corresponding to a respective work item of an output being generated (Harris, [0005] “A graphics processing pipeline shader thus performs processing by running small programs for each "work item" in an output to be generated” and [0057] “The execution threads that are issued to the shading stage to execute the shader program will represent appropriate "work items" for the shader program”; Harris teaches execution threads corresponding to appropriate work items in an output to be generated for the shader program).
Shad and Harris are combinable because they are from the same field of endeavor, systems and methods for image processing, and address similar problems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Shad to incorporate work items (as taught by Harris) in a group of execution threads, because Harris provides execution threads corresponding to appropriate work items in an output to be generated for the shader program (Harris, [0005], [0057]). Doing so may improve the execution of shader programs in graphics processing pipelines that include one or more shader stages (Harris, [0010]).
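As an illustrative sketch only, the suspend sequence recited in claim 1 (stop issuing instructions, wait for outstanding transactions that affect the registers, then store the register content together with state information that includes the last executed instruction) may be modelled as below. The names ThreadGroup, SuspendRecord, and suspend_thread_group are hypothetical and appear in neither Shad, Harris, nor the application; the model assumes each outstanding transaction simply writes one value to one register.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadGroup:
    """Hypothetical model of a group of execution threads (not from Shad or Harris)."""
    group_id: int
    registers: dict                 # per-thread register content, keyed by thread index
    last_instruction: int = -1      # index of the last program instruction executed
    issuing: bool = True            # whether new instructions may still be issued
    pending: list = field(default_factory=list)  # outstanding (thread, reg, value) writes

    def drain_one(self):
        """Complete one outstanding transaction by writing its result to a register."""
        thread, reg, value = self.pending.pop(0)
        self.registers[thread][reg] = value


@dataclass
class SuspendRecord:
    """Assumed layout of what is written to memory for a suspended group."""
    group_id: int
    registers: dict
    last_instruction: int


def suspend_thread_group(group, memory):
    # Step 1: stop issuing program instructions for the group.
    group.issuing = False
    # Step 2: wait for outstanding transactions that affect the registers.
    while group.pending:
        group.drain_one()
    # Step 3: store the register content and the state information to memory.
    record = SuspendRecord(
        group_id=group.group_id,
        registers={t: dict(regs) for t, regs in group.registers.items()},
        last_instruction=group.last_instruction,
    )
    memory[group.group_id] = record
    return record
```

In this sketch the registers are copied only after the pending list is empty, so a value that arrives late from an outstanding transaction is captured in the stored record.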
Regarding Claim 2, the method of claim 1, 
Shad teaches the sequence of per-thread…to suspend execution of operation for the representative threads..(Shad, [0051], L. 4-8).
However, Shad does not explicitly teach wherein the suspending of the processing for the output being generated is performed by and under the control of the driver for the data processor for the output that is being suspended.  
Harris teaches wherein the suspending of the processing for the output being generated is performed by and under the control of the driver for the data processor for the output that is being suspended (Harris, [0149] “the application will generate API (Application Programming Interface) calls that are interpreted by a driver 4 for the graphics process pipeline 3 that is running on the host processor 1 to generate appropriate commands to the graphics processor 3 to generate graphics output required by the application 2”; the combination of Shad and Harris teaches that the suspending of the output (as taught by Shad) is performed by and under the control of the driver for the data processor (as taught by Harris)).
Shad and Harris are combinable; see the rationale in claim 1.
Regarding Claim 3, the method of claim 1, Shad does not explicitly teach comprising: storing the content of the registers and the set of state information for the group of one or more execution threads to memory once any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads for the group of one or more execution threads have completed, without waiting for any outstanding barrier dependencies for the group of one or more execution threads to be met.  
However, Harris teaches storing the content of the registers and the set of state information for the group of one or more execution threads to memory once any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads for the group of one or more execution threads have completed, without waiting for any outstanding barrier dependencies for the group of one or more execution threads to be met (Harris, Fig. 8, [0211] “If at step 95 it is determined that there are work group common expressions, then the thread executes the instructions for the work group common expressions that are not dependent upon the common expressions (step 96)” and [0215] “then proceeds to execute the instruction in the main instruction sequence that are not dependent on any of the common expressions (step 101)” and [0216] “it executes the remaining main instruction sequence instruction that are dependent upon the work group common expressions (step 107)”; Harris teaches that an execution thread that is not dependent upon outstanding barrier dependencies (e.g. the common expression, main instruction sequence) is completed without waiting for any outstanding barrier dependencies (Fig. 8)).
Shad and Harris are combinable; see the rationale in claim 1.
Regarding Claim 4, the method of claim 1, Shad does not explicitly teach comprising: including in the set of state information that is stored for the thread group an indication of any outstanding barrier dependencies for the threads of the thread group at the time that the thread group was suspended.
Harris teaches including in the set of state information that is stored for the thread group an indication of any outstanding barrier dependencies for the threads of the thread group at the time that the thread group was suspended (Harris, Fig. 6, [0202] “In this Figure, solid black lines show thread group paths, and dotted lines show cross-thread control signalling. Thus, as can be seen, when a thread completes the global expressions 71, that will be signalled to the wait points 79, 83, so that any threads that reach or are stalled at those points can then be allowed to continue their execution”; Harris teaches a control signal (the dotted lines at the wait points) as an indication of outstanding barrier dependencies (threads that reach the wait points) at the time that the thread was stalled).
Shad and Harris are combinable; see the rationale in claim 1.
Regarding Claim 5, Shad as modified discloses the method of claim 1, comprising: storing the content of the registers and the set of state information for the group of one or more execution threads together in a suspend data buffer for the group of one or more execution threads (Shad, [0064] “Once the SMs 310 in the GPCs 208 have stopped issuing instructions and each SM 310 becomes idle, the trap handler unloads the context state for the CTAs running on the GPCs 208. The context state that is unloaded and stored includes registers within the SMs 310, registers within the GPCs 208, is saved to a predefined buffer in graphics memory”; Shad teaches that the context state for the stopped CTAs (thread arrays) running on the GPCs, including the registers within the SMs 310, is stored in a predefined buffer in graphics memory).
Regarding Claim 6, Shad as modified discloses the method of claim 1, comprising: the data processor storing the register content and the state information for the thread group in the memory by executing a sequence of instructions that store the register content and the thread group state information in memory (Shad, [0051] “the sequence of per-thread instructions might include an instruction to suspend execution of operations for the representative thread at a particular point in the sequence” and [0064] “the SMs 310 in the GPCs 208 have stopped issuing instructions. The context state that is unloaded and stored includes registers within the SMs 310, registers within the GPCs 208, shared memory, and the like, is saved to a predefined buffer in graphics memory”; Shad teaches executing a sequence of instructions that suspends operation of the threads and stores the context state, including the register content, in memory).
Regarding Claim 7, Shad as modified discloses the method of claim 1, comprising: also storing an indication that the thread group's processing has been suspended (Shad, Fig. 5A, [0081] “At step 540 the pipeline managers 305 report to the work distribution unit 340 that the instruction level portion of the processing pipeline, e.g., the SMs 310 and the GPCs 208, are idle and the work distribution unit 340 then saves the CTA level state that is maintained in the work distribution unit 340 for the current context. At step 550 the front end 212 then stores an indication that the saved context state is for a preempted context and resets the processing pipeline”; Shad teaches storing, with the context state, an indication that the CTAs (thread arrays) are idle (suspended)).
Regarding Claim 8, Shad as modified discloses the method of claim 1, comprising: Attorney Docket No. DEHN-16239US0148032 dehn/16239/16239-app- 76 – 
suspending the processing of a set of plural groups of one or more execution threads; and for each group of execution threads in the set, storing a processing status of the thread group at the time the processing of the set of plural groups of one or more execution threads was suspended (Shad, [0064] “Once the SMs 310 in the GPCs 208 have stopped issuing instructions and each SM 310 becomes idle, the trap handler unloads the context state for the CTAs running on the GPCs 208. The context state that is unloaded and stored includes registers within the SMs 310, registers within the pipeline manager 305”; Shad teaches suspending (stopping) the processing of the CTAs (thread arrays) on the GPCs and saving the context state, including registers, for the CTAs).
Regarding Claim 9, Shad discloses the method of claim 1, further comprising:
 the data processor receiving a command to resume processing of the suspended output, and, in response to receiving the command to resume processing of the suspended output: 
for a thread group whose program execution was suspended partway through when processing of the output was suspended (Shad, [0060] “Even after the task/work unit 207 stops launching work, the task/work unit 207 may receive additional work that may be generated by the GPCs 208 during execution of previous instructions. The task/work unit 207 buffers the additional work to be stored by the front end 212 as part of the context state for the task/work unit 207”; Shad teaches that, after work launching stops (suspended execution threads), additional information is generated by the GPCs during execution of previous instructions, i.e., partway through when processing of the output was suspended):
loading the register content for the threads of the thread group that was written out when processing of the thread group was suspended to registers associated with threads of the issued thread group (Shad, [0064] “Once the SMs 310 in the GPCs 208 have stopped issuing instructions, and the front end 212 stores the context state. The context state that is unloaded and stored includes registers within the SMs 310” and Fig. 5B, [0083] “At step 565 the front end 212 initiates restoration of a saved context for a context selected by the host interface 206”; Shad teaches restoring (loading) the register content for the threads (CTAs) that were stopped and whose context state, including registers, was saved);
loading the thread group state information for the thread group including at least the indication of the last instruction in the program that was executed for the threads of the thread group (Shad, [0060] “Even after the task/work unit 207 stops launching work, the task/work unit 207 may receive additional work that may be generated by the GPCs 208 during execution of previous instructions. The task/work unit 207 buffers the additional work to be stored by the front end 212 as part of the context state for the task/work unit 207” and Fig. 5B, [0083] “At step 575 the selected context state is read from a context buffer by the front end 212 and task/work unit 207, and restored at the task and CTA level”; Shad teaches restoring (loading) state that includes the execution of the previous instruction (the last instruction) for the stopped threads (CTA, thread array));
and, after the register content and thread group state data has been loaded: 
resuming execution of the program for the issued thread group after the indicated last instruction in the program that was executed for the threads of the thread group (Shad, [0068] “When a context is selected to be executed, the host interface 206 needs to determine if the selected context is a context that was previously preempted. A context reload (ctx_reload) flag indicating whether a context was preempted is maintained by the host interface 206. When the host interface 206 recognizes that the selected context was preempted, the previously unloaded and stored context state is reloaded before execution of the selected context resumes” and [0075] “Finally, The front end 212 ACKs the original preemption command to the host interface 206. Any previously preempted CTAs have resumed execution in the Task/Work Unit 207 and the GPCs 208. When instruction level preemption is used, any previously preempted threads have resumed execution on the SMs 310”; Shad teaches resuming execution of threads (preempted CTAs) of the thread group); and
using the loaded content of the registers for the threads of the thread group when executing the program for the issued thread group (Shad, [0071] “The work distribution unit 340 within the task/work unit 207 receives the preempt restore command and restores the selected context state, replaying the restored tasks into the GPCs 208, and restoring preempted CTAs and thread groups back into the pipeline managers 305 and the SMs 310, respectively”; Shad teaches using the restored context state of the registers for the threads (CTAs, thread arrays) when executing the program for the thread group (tasks in GPCs 208)).
However, Shad does not explicitly teach issuing a corresponding group of one or more execution threads to the programmable execution unit to execute the program.
Harris teaches issuing a corresponding group of one or more execution threads to the programmable execution unit to execute the program (Harris, [0055] “the programmable execution unit will receive execution threads to be executed, and execute appropriate shading programs for those threads to generate the desired output”; Harris teaches issuing execution threads to a programmable execution unit, which executes the shading programs for those threads).
Shad and Harris are combinable; see the rationale in claim 1.
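As an illustrative sketch only, the resume sequence recited in claim 9 (load the saved register content, load the state information including the indicated last instruction, then continue execution after that instruction using the loaded registers) may be modelled as below. SavedGroup, resume_thread_group, and the three toy instructions are hypothetical names introduced for illustration; a program is assumed to be a list of functions that each update one thread's register dictionary.

```python
from dataclasses import dataclass


@dataclass
class SavedGroup:
    """Hypothetical record written at suspend time (names are assumptions)."""
    registers: dict          # saved per-thread register content
    last_instruction: int    # index of the last instruction executed before suspend


def resume_thread_group(saved, program):
    """Resume a toy program after the indicated last instruction.

    `program` is a list of callables, each taking and mutating a register dict.
    """
    # Issue a corresponding thread group and load the saved register content.
    registers = {t: dict(regs) for t, regs in saved.registers.items()}
    # Load the state information and continue after the last executed instruction.
    next_pc = saved.last_instruction + 1
    for instruction in program[next_pc:]:
        for regs in registers.values():
            instruction(regs)
    return registers


# Toy instructions used only to exercise the sketch.
def add_one(regs):
    regs["r0"] += 1


def double(regs):
    regs["r0"] *= 2


def negate(regs):
    regs["r0"] = -regs["r0"]
```

Because execution restarts at the instruction following the stored index, instructions that completed before the suspend are not repeated on the restored register values.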
Regarding Claim 10, Shad as modified discloses the method of claim 9, comprising: in response to receiving the command to resume processing of the suspended output: 
first identifying one or more groups of one or more execution threads whose execution of a program for the suspended output was suspended when the processing of the output was suspended from a previously stored set of thread group suspend status indications for the output (Shad, Fig. 5A, [0080] “At step 525 the SMs 310 stop executing instructions and in step 530 the SMs 310 wait for any outstanding memory transactions to complete. The SMs 310 indicate to the pipeline manager 305 whether each thread group exited or was preempted. When all of the outstanding memory transactions are complete, at step 535 the context state maintained in the SMs 310 is stored into a context buffer”; Shad teaches identifying the groups of execution threads that were preempted when the SMs 310 were stopped, and storing the context state into a context buffer); and
then, for each such identified group of one or more execution threads, resuming the execution of the program for that group of one or more execution threads (Shad, Fig. 5B, [0084] “At step 585 the CTAs are launched in the preempted order and at step 590 execution is resumed using the restored context state for the selected context”; Shad teaches, for each identified group of execution threads, launching the CTAs in the preempted order and resuming execution using the restored context state for the selected context).
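As an illustrative sketch only, the two-stage resume of claim 10 (first identify the suspended thread groups from previously stored status indications, then resume each identified group) may be modelled as below. The function names and the representation of the status indications as (group id, status) pairs with the values "suspended" and "exited" are assumptions made for illustration and are not taken from Shad, Harris, or the application.

```python
def groups_to_resume(status_indications):
    """Return ids of groups marked suspended, in the order they were recorded.

    `status_indications` is a hypothetical list of (group_id, status) pairs,
    where status is "suspended" or "exited".
    """
    return [gid for gid, status in status_indications if status == "suspended"]


def resume_all(status_indications, resume_one):
    # First identify the suspended groups, then resume each in recorded order.
    resumed = []
    for gid in groups_to_resume(status_indications):
        resume_one(gid)
        resumed.append(gid)
    return resumed
```

Groups recorded as exited are skipped, so only thread groups that were stopped partway through a program are reissued.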
Regarding Claim 11, Shad as modified discloses the method of claim 1, wherein the data processor is a graphics processor and the program is a shader program (Shad, Fig. 2, [0035] “a PPU 202 can be a graphics processor in a unified memory architecture (UMA)” and [0034] “GPCs 208 can be programmed to execute processing tasks including image rendering operations (e.g., pixel shader programs)”; Shad teaches a graphics processor and a shader program).
Regarding Claim 12, Shad as modified discloses a data processor, the data processor comprising: 
a programmable execution unit operable to execute programs, and in which when executing a program, the programmable execution unit executes the program for respective groups of one or more execution threads, each execution thread in a group of one or more execution threads corresponding to a respective work item of an output being generated; and 
the data processor further comprising a processing circuit (Shad, [0025] “parallel processing memories 204 may be implemented using one or more integrated circuit devices, such as programmable processors”) configured to
in response to a command to suspend the processing of an output being generated by the data processor: 
for a group of one or more execution threads currently executing a program for the output being generated: 
stop the issuing of program instructions for execution by the group of one or more execution threads; 
wait for any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads for the group of one or more execution threads to complete; and 
when any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads for the group of one or more execution threads have completed: 
store to memory: 
the content of the registers associated with the threads of the group of one or more execution threads; and
a set of state information for the group of one or more execution threads, the set of state information including at least an indication of the last instruction in the program that was executed for the threads of the group of one or more execution threads.  
However, Shad does not explicitly teach a plurality of registers for storing data for execution threads executing a program, each execution thread when executing a program having an associated set of registers of the plurality of registers for storing data for the execution thread; or
each execution thread in a group of execution threads corresponding to a respective work item of an output being generated.
Harris teaches a plurality of registers for storing data for execution threads executing a program, each execution thread when executing a program having an associated set of registers of the plurality of registers for storing data for the execution thread (Harris, [0218], [0219] “each thread that the shader program is to be executed for. In this process, the threads that execute the common expression instructions save their results to appropriate registers 46”; Harris teaches registers (46) that store the results of execution threads executing a shader program).
Shad and Harris are combinable; see the rationale in claim 1.
Harris also teaches each execution thread in a group of execution threads corresponding to a respective work item of an output being generated (see the analysis of claim 1).
Claim 12 is substantially similar to claim 1 and is rejected based on a similar analysis.
Regarding Claim 13, Shad as modified discloses the data processor of claim 12, wherein the suspending of the processing for the output being generated is performed by and under the control of the driver for the data processor for the output that is being suspended.  
Claim 13 is substantially similar to claim 2 and is rejected based on a similar analysis.
Regarding Claim 14, Shad as modified discloses the data processor of claim 12, wherein the processing circuit is configured to: 
store the content of the registers and the set of state information for the group of one or more execution threads to memory once any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads for the group of one or more execution threads have completed, without waiting for any outstanding barrier dependencies for the group of one or more execution threads to be met.  
Claim 14 is substantially similar to claim 3 and is rejected based on a similar analysis.
Regarding Claim 15, Shad as modified discloses the data processor of claim 12, wherein the processing circuit is configured to:  
 	include in the set of state information that is stored for the thread group an indication of any outstanding barrier dependencies for the threads of the thread group at the time that the thread group was suspended.  
Claim 15 is substantially similar to claim 4 and is rejected based on a similar analysis.
Regarding Claim 16, Shad as modified discloses the data processor of claim 12, wherein the processing circuit is configured to: 
 	store the content of the registers and the set of state information for the group of one or more execution threads together in a suspend data buffer for the group of one or more execution threads.  
Claim 16 is substantially similar to claim 5 and is rejected based on a similar analysis.
Regarding Claim 17, Shad as modified discloses the data processor of any one of claim 12, wherein the processing circuit is configured to: store the register content and the state information for the thread group in the memory by executing a sequence of instructions that store the register content and the thread group state information in memory.  
Claim 17 is substantially similar to claim 6 and is rejected based on a similar analysis.
Regarding Claim 18, Shad as modified discloses the data processor of any one of claim 12, wherein the processing circuit is configured to: 
also store an indication that the thread group's processing has been suspended.  
Claim 18 is substantially similar to claim 7 and is rejected based on a similar analysis.
Regarding Claim 19, Shad as modified discloses the data processor of any one of claim 12, wherein the processing circuit is configured to: 
suspend the processing of a set of plural groups of one or more execution threads; and 
for each group of execution threads in the set, store a processing status of the thread group at the time the processing of the set of plural groups of one or more execution threads was suspended.  
Claim 19 is substantially similar to claim 8 and is rejected based on a similar analysis.
Regarding Claim 20, Shad as modified discloses the data processor of claim 12, comprising a processing circuit configured to: 
in response to receiving the command to resume processing of the suspended output: 
for a thread group whose program execution was suspended partway through when processing of the output was suspended: 
issue a corresponding group of one or more execution threads to the programmable execution unit to execute the program; 
load the register content for the threads of the thread group that was written out when processing of the thread group was suspended to registers associated with threads of the issued thread group; 
load the thread group state information for the thread group including at least the indication of the last instruction in the program that was executed for the threads of the thread group; 
and, after the register content and thread group state data has been loaded: 
resume execution of the program for the issued thread group after the indicated last instruction in the program that was executed for the threads of the thread group; and 
use the loaded content of the registers for the threads of the thread group when executing the program for the issued thread group.  
Claim 20 is substantially similar to claim 9 and is rejected based on a similar analysis.
Regarding Claim 21, Shad as modified discloses the data processor of claim 20, wherein the processing circuit is configured to:   
in response to receiving the command to resume processing of the suspended output:
first identify one or more groups of one or more execution threads whose execution of a program for the suspended output was suspended when the processing of the output was suspended from a previously stored set of thread group suspend status indications for the output; and 
then, for each such identified group of one or more execution threads, resume the execution of the program for that group of one or more execution threads.
Claim 21 is substantially similar to claim 10 and is rejected based on a similar analysis.
Regarding Claim 22, Shad discloses a data processor (Shad, [0025] “programmable processors”), the data processor comprising:
a programmable processor operable to execute programs, and in which when executing a program, the programmable processor executes the program for respective groups of one or more execution threads, each execution thread in a group of one or more execution threads corresponding to a respective work item of an output being generated (Shad, [0025] “parallel processing subsystem 112 includes one or more parallel processing units (PPUs) 202..may be implemented using one or more integrated circuit devices, programmable processors” and Fig. 2, [0030] “PPU 202(0) includes a processing cluster array 230 of general processing clusters (GPCs) 208. Each GPC 208 is capable of executing a large number (e.g., hundreds) of threads concurrently, where each thread is an instance of a program” and [0033] “Crossbar unit 210 is configured to route the output of each GPC 208 to the input of any partition unit 215 or to another GPC 208 for further processing”; Shad teaches that the programmable processor executes each thread of multiple threads corresponding to a respective program of an output being generated (output of each GPC 208, Fig. 2)); and
the data processor further comprising a processing circuit (Shad, [0025] “parallel processing memories 204 may be implemented using one or more integrated circuit devices, such as programmable processors”) configured to, in response to a command to resume the processing of an output being generated by the data processor whose processing was previously suspended C9:
for a group of one or more execution threads whose execution of a program for the output whose processing is being resumed was stopped when the processing of the output was suspended (Shad, [0060] “Even after the task/work unit 207 stops launching work, the task/work unit 207 may receive additional work that may be generated by the GPCs 208 during execution of previous instructions. The task/work unit 207 buffers the additional work to be stored by the front end 212 as part of the context state for the task/work unit 207”; Shad teaches that, after the launching of work stops (suspended execution threads), additional work is generated by the GPCs during execution of previous instructions, i.e., by threads that were partway through execution when processing of the output was suspended): 
load from memory into registers associated with threads of the issued thread group a set of register content for the threads of the thread group that was written out to memory when processing of the thread group was suspended (Shad, [0064] “Once the SMs 310 in the GPCs 208 have stopped issuing instructions, and the front end 212 stores the context state. The context state that is unloaded and stored includes registers within the SMs 310, registers within the GPCs 208, shared memory, is saved to a predefined buffer in graphics memory” and Fig. 5B, [0083] “At step 565 the front end 212 initiates restoration of a saved context for a context selected by the host interface 206”; Shad teaches restoring (loading) the register content for the threads (CTAs) that were stopped, the context state, including the registers, having been saved to a predefined buffer in graphics memory); 
load from memory a set of thread group state information for the thread group including at least an indication of the last instruction in the program that was executed for the threads of the thread group when processing of the thread group was suspended (Shad, [0060] “Even after the task/work unit 207 stops launching work, the task/work unit 207 may receive additional work that may be generated by the GPCs 208 during execution of previous instructions. The task/work unit 207 buffers the additional work to be stored by the front end 212 as part of the context state for the task/work unit 207” and Fig. 5B, [0083] “At step 575 the selected context state is read from a context buffer by the front end 212 and task/work unit 207, and restored at the task and CTA level”; Shad teaches restoring (loading), from graphics memory, context state that reflects the execution of the previous instructions (the last instruction) for the stopped threads (CTAs, thread arrays)); and 
after the register content and the thread group state data has been loaded: 
resume execution of the program for the issued thread group after the indicated last instruction in the program that was executed for the threads of the thread group (Shad, [0068] “When a context is selected to be executed, the host interface 206 needs to determine if the selected context is a context that was previously preempted. A context reload (ctx_reload) flag indicating whether a context was preempted is maintained by the host interface 206. When the host interface 206 recognizes that the selected context was preempted, the previously unloaded and stored context state is reloaded before execution of the selected context resumes” and [0075] “Finally, The front end 212 ACKs the original preemption command to the host interface 206. Any previously preempted CTAs have resumed execution in the Task/Work Unit 207 and the GPCs 208. When instruction level preemption is used, any previously preempted threads have resumed execution on the SMs 310”; Shad teaches resuming execution of the threads (preempted CTAs) of the thread group); and
use the loaded content of the registers for the threads of the issued thread group when executing the program for the issued thread group (Shad, [0071] “The work distribution unit 340 within the task/work unit 207 receives the preempt restore command and restores the selected context state, replaying the restored tasks into the GPCs 208, and restoring preempted CTAs and thread groups back into the pipeline managers 305 and the SMs 310, respectively”; Shad teaches using the restored context state of the registers for the threads (CTAs, thread arrays) when executing the program for the thread group (tasks in GPCs 208)).
Shad does not explicitly teach a plurality of registers for storing data for execution threads executing a program, each execution thread when executing a program having an associated set of registers of the plurality of registers for storing data for the execution thread;
issuing a corresponding group of one or more execution threads to the programmable execution unit to execute the program;
a programmable execution unit operable to execute programs, and in which when executing a program, the programmable execution unit executes the program for respective groups of one or more execution threads, each execution thread in a group of one or more execution threads corresponding to a respective work item of an output being generated.
Harris teaches a plurality of registers for storing data for execution threads executing a program, each execution thread when executing a program having an associated set of registers of the plurality of registers for storing data for the execution thread (Harris, [0218]-[0219] “each thread that the shader program is to be executed for. In this process, the threads that execute the common expression instructions save their results to appropriate registers 46”; Harris teaches a plurality of registers (46) that store the results of execution threads executing a shader program);
issuing a corresponding group of one or more execution threads to the programmable execution unit to execute the program (Harris, [0055] “the programmable execution unit will receive execution threads to be executed, and execute appropriate shading programs for those threads to generate the desired output”; Harris teaches issuing execution threads to a programmable execution unit, which executes the shading programs for those threads);
a programmable execution unit operable to execute programs, and in which when executing a program, the programmable execution unit executes the program for respective groups of one or more execution threads, each execution thread in a group of one or more execution threads corresponding to a respective work item of an output being generated (Harris, [0055] “the programmable execution unit execute appropriate shading programs for those threads to generate the desired output” and [0057] “The execution threads that are issued to the shading stage to execute the shader program will represent appropriate "work items" for the shader program”; Harris teaches a programmable execution unit that executes execution threads, each corresponding to an appropriate work item of an output to be generated by the shader program).
Shad and Harris are combinable; see the rationale in claim 1.
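For illustration only, the ordering recited in claim 22 (load the saved register content, load the thread group state including the indication of the last instruction executed, and only then resume after that instruction) can be sketched as below. This is a minimal Python model under stated assumptions, not an implementation taken from Shad, Harris, or the application: the names SavedGroup, ThreadGroup, and resume_thread_group, and the dictionary standing in for memory, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SavedGroup:
    """Hypothetical record written to memory when a thread group was suspended."""
    registers: Dict[int, List[int]]  # thread id -> saved register values
    last_instruction: int            # index of the last instruction executed


@dataclass
class ThreadGroup:
    """Hypothetical thread group as issued to the execution unit."""
    registers: Dict[int, List[int]] = field(default_factory=dict)
    next_instruction: int = 0
    running: bool = False


def resume_thread_group(memory: Dict[str, SavedGroup], key: str) -> ThreadGroup:
    saved = memory[key]
    group = ThreadGroup()
    # Step 1: load the saved register content for every thread of the group.
    group.registers = {tid: list(vals) for tid, vals in saved.registers.items()}
    # Step 2: load the thread group state (indication of the last instruction).
    last = saved.last_instruction
    # Step 3: only after both loads, resume after the indicated instruction.
    group.next_instruction = last + 1
    group.running = True
    return group


# Example: a group of two threads suspended after instruction 7.
memory = {"group0": SavedGroup(registers={0: [1, 2], 1: [3, 4]}, last_instruction=7)}
resumed = resume_thread_group(memory, "group0")
```

In this sketch the resumed group continues at instruction 8 with the restored register values, which mirrors the claimed use of the loaded register content when executing the program.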
Regarding Claim 23, Shad as modified discloses the data processor of claim 12, wherein the data processor is a graphics processor and the program is a shader program.  
Claim 23 is substantially similar to claim 11 and is rejected based on a similar analysis.
Regarding Claim 24, Shad as modified discloses a non-transitory computer readable storage medium (Shad, [0096] “a variety of computer-readable storage media”) storing computer software which when executing on a processor performs a method of operating a data processor (Shad, [0027] “CPU 102 is the master processor of computer system 100, controlling and coordinating operations of other system components”) that includes a programmable execution unit operable to execute programs, and in which when executing a program, the programmable execution unit executes the program for respective groups of one or more execution threads, each execution thread in a group of execution threads corresponding to a respective work item of an output being generated (Harris, [0055] “the programmable execution unit will receive execution threads to be executed, and execute appropriate shading programs for those threads to generate the desired output”), and each execution thread having an associated set of registers for storing data for the execution thread (Harris, [0064] “the threads that execute the common expression instructions save their results to appropriate registers 46”), the method comprising: 
in response to a command to suspend the processing of an output being generated by the data processor: 
for a group of one or more execution threads currently executing a program for the output being generated: 
stopping the issuing of program instructions for execution by the group of one or more execution threads; 
waiting for any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads for the group of one or more execution threads to complete; and 
when any outstanding transactions that affect the content of the registers associated with the threads of the group of one or more execution threads for the group of one or more execution threads have completed: 
storing to memory: 
the content of the registers associated with the threads of the group of one or more execution threads; and 
a set of state information for the group of one or more execution threads, the set of state information including at least an indication of the last instruction in the program that was executed for the threads of the group of one or more execution threads.
Claim 24 is substantially similar to claim 1 and is rejected based on a similar analysis.
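For illustration only, the suspend sequence recited in claim 24 (stop issuing instructions, wait for outstanding transactions that affect the registers to complete, then store the register content and the state information to memory) can be sketched as below. This is a minimal Python model under stated assumptions, not an implementation taken from Shad, Harris, or the application: ThreadGroup, suspend_thread_group, pending_load, and the dictionary standing in for memory are all hypothetical, and outstanding transactions are modeled as simple callables that write to registers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ThreadGroup:
    """Hypothetical thread group currently executing a program."""
    registers: Dict[int, List[int]]  # thread id -> register values
    last_instruction: int            # index of the last instruction executed
    issuing: bool = True
    # Pending transactions (e.g. memory loads) that will still write registers.
    outstanding: List[Callable[["ThreadGroup"], None]] = field(default_factory=list)


def suspend_thread_group(group: ThreadGroup, memory: Dict[str, dict], key: str) -> None:
    # Step 1: stop issuing program instructions for the group.
    group.issuing = False
    # Step 2: wait for outstanding transactions that affect the registers;
    # here "waiting" is modeled by running each pending write to completion.
    while group.outstanding:
        transaction = group.outstanding.pop(0)
        transaction(group)
    # Step 3: store the register content and the state information to memory.
    memory[key] = {
        "registers": {tid: list(vals) for tid, vals in group.registers.items()},
        "last_instruction": group.last_instruction,
    }


def pending_load(group: ThreadGroup) -> None:
    """Hypothetical in-flight load that writes 99 into register 1 of thread 0."""
    group.registers[0][1] = 99


memory: Dict[str, dict] = {}
group = ThreadGroup(registers={0: [5, 0]}, last_instruction=3, outstanding=[pending_load])
suspend_thread_group(group, memory, "group0")
```

The point of the ordering is visible in the example: because the pending load completes before the store, the saved register content contains the value 99 rather than the stale value 0.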
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Valerio et al. (U.S. 2020/0286201 A1) and Cuadra et al. (U.S. 2017/0249151 A1). 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KHOA VU whose telephone number is (571) 272-5994. The examiner can normally be reached 8:00-4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KHOA VU/ Examiner, Art Unit 2611

/SING-WAI WU/ Primary Examiner, Art Unit 2611