Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Detailed Action
Claims 1-20 are pending and are presented for examination.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite a method of executing a shader program having divergent control flow, including detecting whether a wavefront has entered a divergent section of control flow.
The claim limitations of executing a shader program having divergent control flow, namely detecting entry of a wavefront into a divergent section; in response to the detecting, storing one or more task list entries; selecting a plurality of task list entries for execution as a wavefront; and scheduling the wavefront for execution, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components.
That is, other than reciting “A method for executing a shader program” in claims 1, 10 and 19, nothing in the claims precludes the steps from practically being performed in the mind. For example, the detecting of entry of a first wavefront of the shader program into a divergent section, and the storing, selecting and scheduling steps, encompass the user performing mental calculations, perhaps with the aid of pen and paper. Finally, the scheduling step can encompass the user conveying the selection of a plurality of task list entries for execution and the scheduling of the wavefront for execution. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claims recite only one additional element: the compute unit comprising a processor of claim 10, i.e., a processing unit configured to execute a shader program having divergent control flow, wherein the processor is configured to perform the detecting, selecting and scheduling steps (i.e., as a generic computer system performing generic computer functions), such that it amounts to no more than mere instructions to apply the exception using a generic component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processing unit configured to execute a shader program having divergent control flow, wherein the processor is configured to perform the detecting, selecting and outputting (scheduling) steps, amounts to no more than mere instructions to apply the exception using a generic computer system. Mere instructions to apply an exception using a generic computer system cannot provide an inventive concept. Accordingly, the claims are not patent eligible.
Claims 2-9, 11-18 and 20 are rejected for the same reasons as claims 1, 10 and 19 above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 10, 11, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over HARTOG (US 2013/0160017 A1) in view of Bharadwaj (US 5,894,576 A), further in view of Agarwal (US 2015/0124608 A1).
As to claim 1, HARTOG teaches detecting entry of a first wavefront of the shader program into a divergent section (Instructions across a wavefront are issued one at a time, and when all work-items follow the same control flow, each work-item executes the same program. An execution mask and work-item predication are used to enable divergent control flow within a wavefront, paragraphs [36]-[38]), wherein, in the divergent section, a first set of one or more work-items is slated to execute in a first control flow path and additional work-items are slated to execute in a second control flow path different from the first control flow path (an execution mask and work-item predication are used to enable divergent control flow where each individual work-item can actually take a unique code path through a kernel driver, paragraphs [36]-[38]);
in response to the detecting, storing one or more task list entries corresponding to the first set of one or more work-items into one or more task lists for the first control flow path (when all work-items follow the same control flow (divergent), each work-item executes the same program… each individual work-item can actually take a unique code path through the kernel… A plurality of command buffers 125 can be maintained with each process (small programs can aid running specific processes) scheduled for execution on the APD, paragraphs [63], [78]-[83]; system also includes a hardware scheduler (HWS) 128 for selecting a process from a run list for execution on APD 104. HWS can select processes from run list using round robin methodology, priority level, or based on other scheduling policies, paragraphs [45-55]);
selecting, from the one or more task lists for the first control path, a plurality of task list entries for execution as a second wavefront (each of the CPs 124 can have one or more tasks to submit as inputs to other resources within APD 104, where each task can represent multiple wavefronts. After a first task is submitted as an input, this task may be allowed to ramp up, over a period of time, to utilize all the APD resources necessary for completion of the task. By itself, this first task may or may not reach a maximum APD utilization threshold, paragraphs [72-75]); and
scheduling the second wavefront for execution (a scheduler populates the RL of processes, paragraphs [75]-[83]).
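For orientation only, the execution-mask mechanism described in the HARTOG citations above can be pictured with a small Python sketch. It is not taken from HARTOG or from the application; the function name run_with_execution_mask, the four-lane wavefront, and the two arithmetic paths are assumptions chosen purely for illustration.

```python
def run_with_execution_mask(values):
    """Execute both sides of a branch over one wavefront using an execution mask.

    Hypothetical model: each list element is the input of one work-item (lane).
    """
    # Per-lane branch condition; True means the lane takes the first path.
    mask = [v % 2 == 0 for v in values]
    results = list(values)

    # Pass 1: only lanes enabled by the mask execute the first path.
    for lane, enabled in enumerate(mask):
        if enabled:
            results[lane] = values[lane] // 2

    # Pass 2: the mask is inverted and the remaining lanes execute the second path.
    for lane, enabled in enumerate(mask):
        if not enabled:
            results[lane] = values[lane] * 3 + 1

    # Fraction of lanes doing useful work in each of the two passes.
    active = sum(mask)
    utilization = (active / len(mask), (len(mask) - active) / len(mask))
    return results, mask, utilization


results, mask, utilization = run_with_execution_mask([4, 7, 10, 3])
```

In this toy model each pass leaves some lanes idle whenever the work-items of one wavefront take different paths, which is the kind of under-utilization that divergent control flow can cause.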

HARTOG does not teach selecting, from the one or more task lists for the first control path, a plurality of task list entries for execution as a second wavefront, wherein the plurality of task list entries includes one or more tasks from the first wavefront and one or more entries from an additional wavefront. However, Bharadwaj teaches selecting, from the one or more task lists for the first control path, a plurality of task list entries for execution as a second wavefront, wherein the plurality of task list entries includes one or more tasks from the first wavefront and one or more entries from an additional wavefront (The wavefront initializer 74 also chooses a block on the wavefront to be the active block, which is the block on the wavefront into which instructions are currently being scheduled. Once the wavefront is chosen, the instruction scheduler 76 may begin scheduling instructions. Instructions are always scheduled into the active block. The instruction scheduler 76 determines which instruction should be scheduled next by looking at the data ready list for the active block. If no suitable candidate instruction is available, … whenever the active block becomes closed (the closing of blocks is discussed below), the wavefront updater 78 attempts to update the wavefront. The wavefront updater 78 uses information generated by other parts of the scheduler to determine which blocks should be removed from the wavefront and which should be added to the wavefront, col. 4, line 61-col. 5, line 16).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of selecting, from the one or more task lists for the first control path, a plurality of task list entries for execution as a second wavefront, wherein the plurality of task list entries includes one or more tasks from the first wavefront and one or more entries from an additional wavefront, as taught by Bharadwaj, into HARTOG in order to efficiently and simultaneously launch two or more tasks to resources within the accelerated processing device, enabling all work-items to access various accelerated processing device resources.
HARTOG and Bharadwaj do not teach wherein each of the plurality of task list entries is slated to execute in the first control flow path. However, Agarwal teaches wherein each of the plurality of task list entries is slated to execute in the first control flow path (centralized flow scheduler 180 adds a table entry in master weighting table 190 for each available path, which includes a path identifier, the path's corresponding links, and an initial random path weighting (see FIG. 2A and corresponding text for further details). In turn, centralized flow scheduler 180 sends table entries to each host that corresponds to the particular host. For example, centralized flow scheduler 180 sends table entries to host A 100 corresponding to paths AB1, AB2, and AB3, which host A 100 stores in local weighting table 135 (see FIG. 2B and corresponding text for further details), paragraphs [32-33]; Host A 100's virtual machine 115 generates data packet 300, which has a destination at a virtual machine executing on host B, paragraphs [40-44]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of wherein each of the plurality of task list entries is slated to execute in the first control flow path, as taught by Agarwal, into HARTOG and Bharadwaj in order to achieve efficient resource utilization in a data center network through adaptive data flow scheduling.
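To make the claim 1 limitations discussed above easier to follow, the sequence of detecting, storing, selecting and scheduling can be sketched in Python. This is only an illustrative reading of the claim language, not an implementation drawn from the application or from any cited reference; WAVEFRONT_WIDTH, record_divergence, select_wavefront and the path labels "A" and "B" are hypothetical names.

```python
from collections import defaultdict

WAVEFRONT_WIDTH = 4  # assumed width for illustration only


def record_divergence(task_lists, wavefront_id, targets):
    """Detect divergence and store one task list entry per work-item.

    targets maps a lane index to the control flow path that lane will take.
    Returns False, storing nothing, when every lane takes the same path.
    """
    if len(set(targets.values())) <= 1:
        return False
    for lane, path in sorted(targets.items()):
        task_lists[path].append((wavefront_id, lane))
    return True


def select_wavefront(task_lists, path, width=WAVEFRONT_WIDTH):
    """Remove up to `width` entries from one path's task list to form a new wavefront."""
    entries = task_lists[path][:width]
    task_lists[path] = task_lists[path][width:]
    return entries


task_lists = defaultdict(list)
# Two wavefronts each enter the divergent section and split across paths A and B.
record_divergence(task_lists, 0, {0: "A", 1: "B", 2: "A", 3: "B"})
record_divergence(task_lists, 1, {0: "A", 1: "A", 2: "B", 3: "B"})
# The new wavefront holds only path-A entries, drawn from both original wavefronts.
second_wavefront = select_wavefront(task_lists, "A")
```

Under these assumptions the selected entries all belong to path A and originate from wavefronts 0 and 1, which mirrors the "first wavefront" and "additional wavefront" wording of the claim.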

As to claim 10, HARTOG teaches a compute unit comprising:
a command processor; and
a processing unit configured to execute a shader program having divergent control flow (shader core 122 can simultaneously execute a predetermined number of wavefronts 136, each wavefront 136 comprising multiple work-items, paragraphs []; instructions across a wavefront are issued one at a time, and when all work-items follow the same control flow, each work-item executes the same program (or shader program), paragraphs [36-40]), wherein the command processor is configured to perform:
detecting entry of a first wavefront of the shader program into a divergent section (Instructions across a wavefront are issued one at a time, and when all work-items follow the same control flow, each work-item executes the same program. An execution mask and work-item predication are used to enable divergent control flow within a wavefront, paragraphs [36]-[38]), wherein, in the divergent section, a first set of one or more work-items is slated to execute in a first control flow path and additional work-items are slated to execute in a second control flow path different from the first control flow path (an execution mask and work-item predication are used to enable divergent control flow where each individual work-item can actually take a unique code path through a kernel driver, paragraphs [36]-[38]);
in response to the detecting, storing one or more task list entries corresponding to the first set of one or more work-items into one or more task lists for the first control flow path (when all work-items follow the same control flow (divergent), each work-item executes the same program… each individual work-item can actually take a unique code path through the kernel… a plurality of command buffers 125 can be maintained with each process (small programs can aid running specific processes) scheduled for execution on the APD, paragraphs [63], [78]-[83]; system also includes a hardware scheduler (HWS) 128 for selecting a process from a run list for execution on APD 104. HWS can select processes from run list using round robin methodology, priority level, or based on other scheduling policies, paragraphs [45-55]);
selecting, from the one or more task lists for the first control path, a plurality of task list entries for execution as a second wavefront (each of the CPs 124 can have one or more tasks to submit as inputs to other resources within APD 104, where each task can represent multiple wavefronts. After a first task is submitted as an input, this task may be allowed to ramp up, over a period of time, to utilize all the APD resources necessary for completion of the task. By itself, this first task may or may not reach a maximum APD utilization threshold, paragraphs [72-75]); and
scheduling the second wavefront for execution (a scheduler populates the RL of processes, paragraphs [75]-[83]).
HARTOG does not teach selecting, from the one or more task lists for the first control path, a plurality of task list entries for execution as a second wavefront, wherein the plurality of task list entries includes one or more tasks from the first wavefront and one or more entries from an additional wavefront. Bharadwaj teaches selecting, from the one or more task lists for the first control path, a plurality of task list entries for execution as a second wavefront, wherein the plurality of task list entries includes one or more tasks from the first wavefront and one or more entries from an additional wavefront (the wavefront initializer 74 also chooses a block on the wavefront to be the active block, which is the block on the wavefront into which instructions are currently being scheduled. Once the wavefront is chosen, the instruction scheduler 76 may begin scheduling instructions. Instructions are always scheduled into the active block. The instruction scheduler 76 determines which instruction should be scheduled next by looking at the data ready list for the active block. If no suitable candidate instruction is available, … whenever the active block becomes closed (the closing of blocks is discussed below), the wavefront updater 78 attempts to update the wavefront. The wavefront updater 78 uses information generated by other parts of the scheduler to determine which blocks should be removed from the wavefront and which should be added to the wavefront. The operational flow of the wavefront updater is illustrated in FIGS. 11 and 12 and will be described below, col. 4, line 61-col. 5, line 16).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of selecting, from the one or more task lists for the first control path, a plurality of task list entries for execution as a second wavefront, wherein the plurality of task list entries includes one or more tasks from the first wavefront and one or more entries from an additional wavefront, as taught by Bharadwaj, into HARTOG in order to efficiently and simultaneously launch two or more tasks to resources within the accelerated processing device, enabling all work-items to access various accelerated processing device resources.
HARTOG and Bharadwaj do not teach wherein each of the plurality of task list entries is slated to execute in the first control flow path. However, Agarwal teaches wherein each of the plurality of task list entries is slated to execute in the first control flow path (centralized flow scheduler 180 adds a table entry in master weighting table 190 for each available path, which includes a path identifier, the path's corresponding links, and an initial random path weighting (see FIG. 2A and corresponding text for further details). In turn, centralized flow scheduler 180 sends table entries to each host that corresponds to the particular host. For example, centralized flow scheduler 180 sends table entries to host A 100 corresponding to paths AB1, AB2, and AB3, which host A 100 stores in local weighting table 135 (see FIG. 2B and corresponding text for further details), paragraphs [32-33]; Host A 100's virtual machine 115 generates data packet 300, which has a destination at a virtual machine executing on host B, paragraphs [40-44]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of wherein each of the plurality of task list entries is slated to execute in the first control flow path, as taught by Agarwal, into HARTOG and Bharadwaj in order to achieve efficient resource utilization in a data center network through adaptive data flow scheduling.
As to claim 19, it is rejected for the same reasons as claims 1 and 10 above.
As to claims 2, 11 and 20, Bharadwaj teaches exiting the shader program at the divergent section (claims 1-3).

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over HARTOG (US 2013/0160017 A1) in view of Bharadwaj (US 5,894,576 A), further in view of Agarwal (US 2015/0124608 A1), and further in view of Diamos (US 2016/0019066 A1).
As to claims 3 and 12, Hartog, Bharadwaj and Agarwal do not teach detecting entry of the shader program into the divergent section comprises detecting execution of a function pointer call by the shader program. However, Diamos teaches detecting entry of the shader program into the divergent section comprises detecting execution of a function pointer call by the shader program (Inline function calls are equivalent to branches. Function-calls that use function-pointers are implemented using a branch instruction (BRX). For example, BRX R0 causes each thread to branch to a location that is determined by the per-thread register value R0. In the node 520, threads are added to the convergence barrier B0 at the entry point to the region of the program. The barrier participation mask 425 for the convergence barrier B0 is updated to indicate the threads that participate in the convergence barrier B0. The convergence barrier B0 synchronizes threads after the function-calls complete. The threads participating in the convergence barrier B0 may diverge when the branch instruction is executed, paragraphs [83]-[94]). 

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of detecting entry of the shader program into the divergent section comprises detecting execution of a function pointer call by the shader program, as taught by Diamos, into HARTOG, Bharadwaj and Agarwal in order to improve execution efficiency by allowing threads to check on a volatile value and then be suspended for a specified duration so that other threads can execute.
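As a rough illustration of why a function pointer call can be a point of divergence, the following Python sketch groups threads by the target each one holds. It is an assumption-laden model rather than anything described in Diamos or the application: Python callables stand in for per-thread branch targets, and shade_red, shade_blue, group_threads_by_target and call_diverges are hypothetical names.

```python
def shade_red(x):
    return ("red", x)


def shade_blue(x):
    return ("blue", x)


def group_threads_by_target(targets):
    """Group thread indices by the function each thread is about to call."""
    groups = {}
    for thread_id, fn in enumerate(targets):
        groups.setdefault(fn.__name__, []).append(thread_id)
    return groups


def call_diverges(targets):
    """An indirect call diverges when the threads do not all share one target."""
    return len(group_threads_by_target(targets)) > 1


# Each thread holds its own target, loosely analogous to a per-thread register value.
per_thread_targets = [shade_red, shade_blue, shade_red, shade_red]
groups = group_threads_by_target(per_thread_targets)
diverges = call_diverges(per_thread_targets)
```

When every thread holds the same target, call_diverges returns False and the group can keep executing together.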

Claims 4, 8, 9, 13, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over HARTOG (US 2013/0160017 A1) in view of Bharadwaj (US 5,894,576 A), further in view of Agarwal (US 2015/0124608 A1), and further in view of Coon (US 2009/0240931).
As to claims 4 and 13, Hartog, Bharadwaj and Agarwal do not teach detecting entry of the shader program into the divergent section comprises detecting a return from a function call. However, Coon teaches detecting entry of the shader program into the divergent section comprises detecting a return from a function call (individual threads within a thread group to branch independently from other threads in the thread group while allowing multiple threads in the thread group to be executed in parallel when the threads take the same branch path or function call path and when they return from diverging branches and function calls, paragraphs [7]-[8]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of detecting entry of the shader program into the divergent section comprises detecting a return from a function call, as taught by Coon, into HARTOG and Bharadwaj in order to allow rapid switching from one thread to another so that instructions from different threads can be issued in any sequence without loss of efficiency.

As to claims 8 and 17, Coon teaches each task list is associated with a different segment of a shader program (the first task is removed from the shader core and stored. Then, a task with the next highest priority in the RL is executed within the shader core, paragraphs [77]-[83]).

As to claims 9 and 18, Coon teaches wherein scheduling the second wavefront comprises: determining that the one of the one or more task lists includes a plurality of tasks for the segment of the shader program associated with the task list (the first task is removed from the shader core and stored. Then, a task with the next highest priority in the RL is executed within the shader core, paragraphs [77]-[83]); and
scheduling at least some of the plurality of tasks for execution as a wavefront (a scheduler populates the RL of processes, paragraphs [75]-[83]).

Allowable Subject Matter
Claims 5, 6, 7, 14, 15 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAMQUY TRUONG whose telephone number is (571)272-3773. The examiner can normally be reached M-F, 8:30 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An, can be reached at (571)272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CAMQUY TRUONG/Primary Examiner, Art Unit 2195