DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 08/11/2022 has been entered. Claims 1-4, 6-21, and 23-36 filed 08/11/2022 are presented for examination.
Response to Arguments
Applicant’s arguments with respect to amended claims 1, 9, 18, 26, 27, 35, and 36, filed on 08/11/2022, have been considered but are moot in view of the new ground(s) of rejection.
Claim Interpretation - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 
The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Claim 35 in this application is given its broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that use the word “means” and are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) couple(s) the term “means” with functional language without reciting sufficient structure to perform the recited function, and the term “means” is not preceded by a structural modifier. Such claim limitation(s) is/are: “means for determining whether to divide a group of threads”; “means for dividing, upon determining to divide the group of threads”; and “means for executing, upon dividing the group of threads” in claim 35. Each of these limitations uses the term “means” linked by the word “for” to functional language without reciting sufficient structure, and the term “means for” is not preceded by a structural modifier.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6-21, and 23-36 are rejected under 35 U.S.C. 103 as being unpatentable over Vembu et al. (U.S. 2018/0308195 A1) in view of Dejanovic et al. (U.S. 2021/0157600 A1), and further in view of Chen et al. (U.S. 9,799,094 B1) and Johnson et al. (U.S. 2020/0081748 A1).
Regarding Claim 1, Vembu discloses a method of graphics processing (Vembu, [0002] “methods developed to perform specific operations on graphics data”), comprising: 
determining whether to distribute a group of threads into a plurality of sub-groups of threads, each thread of the group of threads being associated with a shader program (Vembu, Fig. 13, [0140] “the launch unit 1319 will attempt to launch the sub-groups of the thread group based on available execution resources within the execution pipeline” and Fig. 14, [0145] “thread group 1404, thread group 1406, and thread group 1408, have been dispatched for execution on the graphics multiprocessor 1410, all sub-groups of a thread group can be launched on the graphics multiprocessor 1410”). Vembu teaches determining whether to distribute (attempting to launch) thread groups (1404, 1406, 1408) into thread sub-groups (1404A, 1404B, Fig. 14) that are associated with a shader program through the instructions and the shader unit (graphics multiprocessor 1410).
distributing, upon determining to divide the group of threads into the plurality of sub-groups of threads, the group of threads into the plurality of sub-groups of threads (Vembu, [0240] “the pipeline manager is to distribute the thread group as multiple thread sub-groups” and Fig. 14, [0145] “thread group 1404, thread group 1406, and thread group 1408, have been dispatched for execution on the graphics multiprocessor 1410, all sub-groups of a thread group can be launched on the graphics multiprocessor 1410”), where Vembu teaches distributing thread groups (1404, 1406, 1408) into thread sub-groups (1404A, 1404B, Fig. 14) that are associated with a shader program through the instructions and the shader unit (graphics multiprocessor 1410); and
executing, upon distributing the group of threads into the plurality of sub-groups of threads, a subsection of the shader program for each sub-group of threads of the plurality of sub-groups of threads, the subsection of the shader program completing execution for one sub-group of threads before commencing execution for a subsequent sub-group of threads (Vembu, Fig. 15, [0149] “the logic is configured to launch thread sub-groups for a thread group on a multiprocessor unit at 1502. When starting from an idle state, all sub-groups of a thread group can be launched on the graphics multiprocessor. Once a thread sub-group of a thread group completes at 1503” and [0150] “the logic 1500 can check the dependencies of pending thread sub-groups at 1506. If the logic 1500 determines at 1507 that a pending thread sub-group is ready to launch (e.g., has no unsatisfied dependencies), the logic 1500 can launch one or more pending sub-groups at 1508”). Vembu teaches that, upon distributing the group of threads, the instructions of the shader program and the shader unit complete execution for one thread sub-group (Fig. 15, step 1503) before starting to execute the next thread sub-group (step 1508).
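For illustration only, the execution order recited in this limitation (the subsection finishes for one sub-group before it begins for the next) may be pictured with the following minimal Python sketch. It is a hypothetical model, not code from Vembu or from the application; the names `run_subsection`, `sub_groups`, and `subsection` are assumptions introduced here.

```python
def run_subsection(sub_groups, subsection):
    """Run `subsection` for every thread, one sub-group at a time.

    Hypothetical model: a sub-group is a list of thread ids, and
    `subsection` is a callable standing in for part of a shader program.
    Returns an event log showing that each sub-group completes before
    the next one starts.
    """
    log = []
    for index, sub_group in enumerate(sub_groups):
        log.append(("start", index))
        for thread_id in sub_group:
            subsection(thread_id)  # execute the subsection for this thread
        log.append(("complete", index))
    return log


# Two sub-groups of two threads each (illustrative values only).
executed = []
log = run_subsection([[0, 1], [2, 3]], executed.append)
```

In this sketch the log alternates start and complete events per sub-group, which is the ordering the limitation describes.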
However, Vembu does not explicitly teach determining whether to divide a group of threads into a plurality of sub-groups of threads; and dividing, upon determining to divide the group of threads into the plurality of sub-groups of threads.
Dejanovic teaches determining whether to divide a group of threads into a plurality of sub-groups of threads; and dividing, upon determining to divide the group of threads into the plurality of sub-groups of threads (Dejanovic, [0051] “The division of the thread group into thread group sub-sets can be as desired, the number of threads within a (and each) thread group sub-set is equal to a (the same) power of two, for example two, four or eight, a thread group ("warp") is divided into thread group sub-sets that each comprise (exactly) four execution threads” and [0052] “for example, the thread group ("warp") comprises a total of sixteen execution threads, divided into four thread group sub-sets, each such thread group sub-set comprising four threads”). Dejanovic teaches determining whether to divide a group of threads into a plurality of sub-groups of threads whose size is equal to a power of two (e.g., two, four, or eight) and, upon that determination, dividing a group of threads (16 threads) into four sub-groups of four threads.
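For illustration only, the division described in the cited passages (a sixteen-thread group split into four sub-sets of four threads, each sub-set size being a power of two) may be sketched as follows. This is a hypothetical Python model, not code from Dejanovic; the function name `divide_thread_group` and its parameters are assumptions introduced here.

```python
def divide_thread_group(threads, subset_size=4):
    """Split a thread group into equal sub-sets of a power-of-two size.

    Hypothetical model: `threads` is a list of thread ids. The checks
    below are assumptions that mirror the "power of two" wording.
    """
    # A positive integer is a power of two when it has exactly one set bit.
    if subset_size <= 0 or subset_size & (subset_size - 1) != 0:
        raise ValueError("subset_size must be a power of two")
    if len(threads) % subset_size != 0:
        raise ValueError("thread group does not divide evenly")
    return [threads[i:i + subset_size]
            for i in range(0, len(threads), subset_size)]


# A sixteen-thread group ("warp") divided into four sub-sets of four threads.
warp = list(range(16))
subsets = divide_thread_group(warp, 4)
```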
Vembu and Dejanovic are combinable because they are from the same field of endeavor, systems and methods for image processing, and try to solve similar problems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Vembu to include determining whether to divide a group of threads into a plurality of sub-groups of threads (as taught by Dejanovic) in order to divide a group of threads into a plurality of sub-groups of threads, because Dejanovic provides, upon determining a power-of-two sub-group size, dividing a group of threads (16 threads) into four sub-groups of four threads (Dejanovic, [0051], [0052]). Doing so, it may provide (Dejanovic, Col. 1, lines 27-29).
However, Vembu and Dejanovic do not explicitly teach inserting a loop marker for each of the plurality of sub-groups of threads based on a subsection of the shader program.
Chen teaches inserting a loop marker for each of the plurality of sub-groups of threads based on a subsection of the shader program (Chen, Col. 5, lines 37-41 “The per-instance shader preamble may enable the shader processor to allocate memory for instance constants for use by different groups of threads (also referred to as a wave or warp) to execute main shader code” and Col. 11, lines 10-13 “Compiler 38 may utilize instructions such as a per-instance preamble start to mark the beginning of the per-instance shader preamble and a preamble end instruction to mark the end of the per-instance shader preamble” and Col. 11, lines 32-36 “(iii) a flag for each wave (up to the maximum number of waves) denoting whether that particular wave is the first wave of the instance to be executed, and (iv) a flag denoting the per-instance shader preamble has completed execution” and Fig. 3, Col. 12, lines 55-62 “per-instance shader preamble code block 39 comprises pseudocode for a per-instance shader preamble start instruction (per_instance_preamble_start) called shader code and ends with an end per-instance shader preamble instruction (end_preamble), a loop that loads constants (via instruction "ldck") to a destination address (dst) from a source address (src), load by compiler 38”). Chen teaches that a compiler can insert a loop marker, e.g., a loop with a marker (a flag for each wave, up to the maximum number of waves) denoting the “per-instance shader preamble,” which defines a range including start/end instructions (a subsection) of shader code for different sub-groups of threads (waves/warps) (Fig. 3).
Vembu, Dejanovic, and Chen are combinable because they are from the same field of endeavor, systems and methods for image processing, and try to solve similar problems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Vembu to include a loop marker for sub-groups of threads (as taught by Chen) in order to insert a loop marker for each of the plurality of sub-groups of threads based on the subsection of the shader program, because Chen provides a compiler that can insert a loop marker, e.g., a for loop with marker i, which defines a range including start/end instructions (a subsection) of shader code for different sub-groups of threads (waves/warps) (Chen, Fig. 2, Col. 7, lines 15-23). Doing so may reduce the need to increase the amount of on-chip RAM, reduce memory traffic, increase performance, and reduce power consumption, as there may be a reduction of traffic on the bus (Chen, Col. 2, lines 9-11).
Furthermore, Vembu teaches that pixel shader 2102 uses texture sampling logic (Vembu, [0191] “To execute the shader program, … pixel shader 2102 uses texture sampling logic in the sampler 2110 to access texture data … for each geometric fragment”), and Dejanovic teaches that fragment shader programs apply textures to the fragments (Dejanovic, [0191]). Accordingly, in the combination of Vembu, Dejanovic, and Chen, the shader code of Chen is interpreted to include texture processing.
However, Vembu, Dejanovic, and Chen do not explicitly teach wherein the loop marker is a texture loop marker associated with texture instructions, and wherein the texture loop marker is inserted if successive texture instructions for the group of threads are associated with a same surface.
Johnson teaches wherein the loop marker is a texture loop marker associated with texture instructions, and wherein the texture loop marker is inserted if successive texture instructions for the group of threads are associated with a same surface (Johnson, [0005] “Threads executing common code sections (e.g., inner loop bodies) are urged to converge using instructions inserted at strategic locations in computer code sections. In support of this various instructions and/or code markers (e.g., compiler directives, ISA extensions) are introduced: Predict, Join, Wait, Confirm, Rejoin, and Cancel” and [0115] “FIG. 3. The loop comprises a common code section 302 (DoSomething( )) with a divergent code branch 304 that evaluates to TRUE (i.e., is taken) in a different iteration for each of many threads all with different thread IDs 308. The common code section 302 is therefore serially executed… it may be more efficient to wait for all threads to arrive at body inside the if block (just before DoSomething()) before executing the common code section 302” and [0117] “FIG. 20, the streaming multiprocessors 2000 may provide aspects of the thread scheduler and code profiler…execution instructions (Predict instruction…)” and [0179] “The streaming multiprocessors 2000 utilize a high performance L1 data cache and a SIMT thread… The GV100 includes an L1 instruction cache 2006, a 128 KB L1 data cache/shared memory 2002 and four TEX blocks 2004”). Johnson teaches a texture loop marker associated with texture instructions, inserted as, e.g., Predict, Join, or Wait, wherein the marker is inserted if successive texture instructions for the group of threads (IF = TRUE, Fig. 3) are associated with a same surface, e.g., Fig. 20 shows the same four TEX blocks 2004 (referred to as the texture surface) in memory (2002).
wherein the loop marker is a texture loop marker associated with texture instructions, and wherein the texture loop marker is inserted if successive texture instructions for the group of threads are associated with a same surface (Chen, Col. 9, lines 2-6 “code for shader programs (e.g., a vertex shader, a pixel or fragment shader, tessellation-related shaders, a compute shader, etc.) that execute on a shader core (also referred to as a shader processor or kernel) of GPU” and Col. 11, lines 10-13 “Compiler 38 may utilize instructions such as a per-instance preamble start to mark the beginning of the per-instance shader preamble and a preamble end instruction to mark the end of the per-instance shader preamble”). A combination of Vembu (pixel shader uses texture sampling logic), Dejanovic (shader program applying textures to the fragments), and Chen (shader programs, e.g., a pixel or fragment shader, that execute on a shader core), in which the shader code includes texture processing instructions, can be used to teach wherein the loop marker is a texture loop marker associated with texture instructions, and wherein the texture loop marker is inserted if successive texture instructions for the group of threads are associated with a same surface.
Vembu, Dejanovic, Chen, and Johnson are combinable because they are from the same field of endeavor, systems and methods for image processing, and try to solve similar problems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Vembu to include a loop marker for sub-groups of threads (as taught by Johnson) in order to insert a texture loop marker if successive texture instructions are associated with a same surface, because Johnson provides a texture loop marker associated with texture instructions, inserted as, e.g., Predict, Join, or Wait, if successive texture instructions for the group of threads (IF = TRUE, Fig. 3) are associated with a same surface, e.g., Fig. 20 shows the same four TEX blocks 2004 (referred to as the texture surface) in memory (2002) (Johnson, [0005], Fig. 3, [0115], Fig. 20, [0117]). Doing so may influence the operation of a thread scheduler on multiple threads executing the code segment (Johnson, [0007]).
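For illustration only, the condition recited in the claim (a texture loop marker is inserted if successive texture instructions are associated with a same surface) may be pictured with the following minimal Python sketch. It is a hypothetical model and does not reproduce any code from Vembu, Dejanovic, Chen, Johnson, or the application; the instruction encoding `(op, surface)`, the opcode string `"tex"`, and the function name `find_texture_loop_ranges` are all assumptions introduced here.

```python
def find_texture_loop_ranges(instructions):
    """Return (start, end) index ranges that a marker could enclose.

    Hypothetical model: each instruction is an (op, surface) tuple. A
    range is reported only where two or more successive "tex"
    instructions refer to the same surface.
    """
    ranges = []
    start = None  # index where the current run of texture instructions began
    for i, (op, surface) in enumerate(instructions):
        is_tex = op == "tex"
        if start is not None:
            if is_tex and surface == instructions[start][1]:
                continue  # same surface: the run keeps growing
            if i - start >= 2:
                ranges.append((start, i - 1))  # close a run of length >= 2
            start = None
        if is_tex:
            start = i  # begin a new run at this texture instruction
    if start is not None and len(instructions) - start >= 2:
        ranges.append((start, len(instructions) - 1))
    return ranges


# Illustrative instruction stream with surfaces "A", "B", and "C".
program = [("tex", "A"), ("tex", "A"), ("alu", None),
           ("tex", "B"), ("tex", "C"), ("tex", "C")]
marker_ranges = find_texture_loop_ranges(program)
```

In this sketch the lone texture instruction on surface "B" receives no range, because it is not followed by another texture instruction on the same surface.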
Regarding Claim 2, Vembu as modified discloses the method of claim 1, wherein the group of threads is a wave of threads and the plurality of sub-groups of threads is a plurality of sub-waves of threads (Vembu, [0071] “The instruction unit 404 can dispatch instructions as thread groups (e.g. warps)”). Vembu teaches that a group of threads is a warp (wave) of threads; in other words, a sub-group of threads can be a sub-warp (sub-wave) of threads.
Regarding Claim 3, Vembu as modified discloses the method of claim 1, wherein the group of threads is at least one of a group of pixels, a group of vertices, or a group of work items (Vembu, [0072] “the register file 408 is divided between the different warps being executed by the graphics multiprocessor 400” and [0190] “the thread execution logic 2100 to cache thread instructions for the execution units” and [0189] “the 256 bits of the vector are stored in a register and the execution unit operates on the vector as four separate 64-bit, 32-bit, 16-bit, 8-bit data”). Vembu teaches that a group of threads (warp) stored in a register is at least one of a group of pixels (64-bit, 32-bit, 16-bit, 8-bit data).
Regarding Claim 4, Vembu as modified discloses the method of claim 3, wherein the group of pixels is a group of texture pixels or a group of texels (Vembu, Fig. 21, [0191] “To execute the shader program, the shader processor 2102 dispatches threads to an execution unit (e.g., 2108A) via thread dispatcher 2104, pixel shader 2102 uses texture sampling logic in the sampler 2110 to access texture data in texture maps stored in memory”). Vembu teaches that a pixel shader dispatches the threads to an execution unit and accesses texture data (pixel data) stored in memory.
Regarding Claim 5 (Canceled).
Regarding Claim 6, the method of claim 1, Vembu as modified does not explicitly teach wherein the loop marker defines a range of the subsection of the shader program.  
However, Chen teaches wherein the loop marker defines a range of the subsection of the shader program (Chen, Fig. 3, for (i = 0; i < FOOTPRINT; i++), Col. 12, lines 55-59 “Register reg_instance_footprint may store a value that denotes the number of uniforms in the constant memory 44 and may be received from compiler 38 (for example, in a 8-bit vec4 register with a value of up to 256 (or 4 KB))”). Chen teaches that the loop marker defines a range of the subsection, i < FOOTPRINT, from 0 to 256 bits of the shader program.
Vembu, Dejanovic, and Chen are combinable; see the rationale in claim 1.
Regarding Claim 7, the method of claim 6, Vembu as modified does not explicitly teach wherein the range includes a start of the subsection of the shader program and an end of the subsection of the shader program.  
However, Chen teaches wherein the range includes a start of the subsection of the shader program and an end of the subsection of the shader program (Chen, Fig. 3 “per_instance_preamble_start shader_code … end_preamble;”). Chen teaches that the range includes a start and an end of the instructions of the shader program (shader code).
Vembu, Dejanovic, and Chen are combinable; see the rationale in claim 1.
Regarding Claim 8, the method of claim 1, Vembu as modified does not explicitly teach wherein the loop marker is associated with an initiation mechanism or a hardware initiation mechanism.  
However, Chen teaches wherein the loop marker is associated with an initiation mechanism or a hardware initiation mechanism (Chen, Col. 7, lines 26-29 “Processor 12 (via, e.g., a compiler) may compose a per-instance shader preamble to, when executed on GPU 14, load and store the attributes from the UBO to a section of constant RAM”). Chen teaches that a loop marker (i) is associated with a hardware initiation mechanism (loading and storing the attributes to RAM) that is composed via a compiler and executed on the GPU.
Vembu, Dejanovic, and Chen are combinable; see the rationale in claim 1.
Regarding Claim 9 (Currently amended), Vembu as modified discloses the method of claim 1, further comprising: determining whether successive instructions of a set of instructions for the group of threads are associated with [[a]] the same surface (Vembu, Fig. 3B, [0064] “The series of instructions transmitted to the processing cluster 314 constitutes a thread, a thread group refers to a group of threads concurrently executing the same program on different input data” and Fig. 21, [0191] “the graphics and media pipelines send thread initiation requests to thread execution logic 2100, the shader processor 2102 is invoked to further compute output information and cause results to be written to output surfaces”). Vembu teaches that the series of instructions is transmitted, via a pipeline manager (Fig. 3B), to a group of threads concurrently executing the same program, that the media pipelines send thread initiation requests to the thread execution logic, and that the shader processor computes the threads and writes the output to a same surface.
Regarding Claim 10 (Currently amended), Vembu as modified discloses the method of claim 9, wherein the set of instructions [[are]] is the texture instructions and the same surface is a texture surface (Vembu, [0067] “a texture unit 306 for performing texture mapping operations, e.g., determining texture sample, reading texture data” and [0053] “For graphics processing operations, processing tasks can include indices of data to be processed, e.g., surface (patch) data”). Vembu teaches that a texture unit performs instructions for texture mapping, which include processing surface (patch) data.
Regarding Claim 11, the method of claim 9, Vembu does not explicitly teach wherein the set of instructions correspond to downsampling instructions.
However, Dejanovic teaches wherein the set of instructions correspond to downsampling instructions (Dejanovic, [0197] “The downsampling and writeout unit 31 downsamples the fragment data stored in the tile buffer 30 to the appropriate resolution for the output buffer (device)”). Dejanovic teaches that the downsampling instruction downsamples the fragment data for the output buffer.
Vembu and Dejanovic are combinable; see the rationale in claim 1.
Regarding Claim 12, the method of claim 1, Vembu as modified does not explicitly teach wherein a memory footprint of the subsection of the shader program for each of the plurality of sub-groups of threads is associated with a size of a cache.  
However, Chen teaches wherein a memory footprint of the subsection of the shader program for each of the plurality of sub-groups of threads is associated with a size of a cache (Chen, Col. 16, lines 62-63 “a memory (such as constant memory 44) or cache” and Col. 13, lines 55-67 and Col. 14, line 1 “Register reg_instance_footprint may store a value that denotes the number of uniforms in the constant memory 44. Register reg_max_waves_by_instance may store a value of the number of instances that are able to fit within the allocated space within constant memory 44 (for example, a 5-bit register with a value up to 16) that may be precalculated by compiler 38”). Chen teaches that a memory footprint of the shader program (constant memory 44) for each sub-group of threads/waves is associated with a size of a cache (5-16 bits).
Vembu, Dejanovic, and Chen are combinable; see the rationale in claim 1.
Regarding Claim 13, the method of claim 12, Vembu as modified does not explicitly teach wherein the memory footprint of all subsections of the shader program for the group of threads exceeds the size of the cache.
However, Chen teaches wherein the memory footprint of all subsections of the shader program for the group of threads exceeds the size of the cache (Chen, Col. 14, lines 47-64 “Graphics driver 40 or compiler 38 may alert…set the instance_head to 0 where the instance_head+reg_instance_footprint >= reg_instance_ram_size (i.e., where incrementing the instance head would be greater than or equal to the total allocated instance space in memory, treating that portion of memory as a wrap-around ring buffer)”). Chen teaches that the compiler may alert when the memory footprint for the group of threads exceeds the size of the cache (instance_head+reg_instance_footprint >= reg_instance_ram_size).
Vembu, Dejanovic, and Chen are combinable; see the rationale in claim 1.
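For illustration only, the wrap-around condition quoted from Chen (instance_head + reg_instance_footprint >= reg_instance_ram_size, upon which instance_head is set to 0) may be sketched as follows. This is a hypothetical Python model, not code from Chen; the function name `advance_instance_head` is an assumption, and advancing the head by the footprint when no wrap occurs is likewise an assumption made to complete the example.

```python
def advance_instance_head(instance_head, footprint, ram_size):
    """Return the next instance head in a wrap-around ring buffer.

    Hypothetical model of the quoted comparison: the head resets to 0
    when advancing it by `footprint` would reach or pass `ram_size`.
    """
    if instance_head + footprint >= ram_size:
        return 0  # wrap around, treating the memory as a ring buffer
    # Assumption: otherwise the head simply moves forward by one footprint.
    return instance_head + footprint


# Illustrative values: a 256-unit allocation and a 64-unit footprint.
head_after_first = advance_instance_head(0, 64, 256)
head_after_wrap = advance_instance_head(192, 64, 256)
```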
Regarding Claim 14, the method of claim 12, Vembu as modified does not explicitly teach further comprising: storing, in the cache, the memory footprint of the subsection of the shader program for each of the plurality of sub-groups of threads.  
However, Chen teaches storing, in the cache, the memory footprint of the subsection of the shader program for each of the plurality of sub-groups of threads (Chen, Col. 16, lines 62-63 “a memory (such as constant memory 44) or cache” and Col. 13, lines 55-67 “Register reg_instance_footprint may store a value that denotes the number of uniforms in the constant memory 44. Register reg_max_waves_by_instance may store a value of the number of instances that are able to fit within the allocated space within constant memory 44 (for example, a 5-bit register with a value up to 16) that may be precalculated by compiler 38”). Chen teaches storing reg_instance_footprint for threads/waves in constant memory 44.
Vembu, Dejanovic, and Chen are combinable; see the rationale in claim 1.
Regarding Claim 15, Vembu discloses the method of claim 12, wherein the cache is a texture cache or a level 1 (L1) texture cache (Vembu, [0067] “Texture data is read from an internal texture L1 cache (not shown) or in some embodiments from the L1 cache within graphics multiprocessor 304 and is fetched from an L2 cache”). Vembu teaches a level 1 (L1) texture cache.
Regarding Claim 16, the method of claim 1, Vembu does not explicitly teach wherein the determination whether to divide the group of threads into the plurality of sub-groups of threads is performed by a compiler.
However, Dejanovic teaches wherein the determination whether to divide the group of threads into the plurality of sub-groups of threads is performed by a compiler (Dejanovic, Fig. 3, [0203] “the shader program being provided by the application 2 to the driver 4 which then compiles 302 the shader program to the binary code 303 for the graphics processing pipeline” and [0205] “threads that are to execute a shader program can be organized into groups ("warps") of threads” and [0229] “FIG. 8 shows one way in which the sixteen threads T0-T15 of the thread group ("warp") 601 could be divided between the four execution lanes 41 for execution”). Dejanovic teaches that a compiler can compile a shader program that is executed by groups of threads (warps), which can be divided into a plurality of sub-groups of threads (Fig. 8).
Vembu and Dejanovic are combinable; see the rationale in claim 1.
Regarding Claim 17, Vembu as modified discloses the method of claim 1, wherein the subsection of the shader program for each of the plurality of sub-groups of threads is executed by a graphics shader in a graphics processing unit (GPU) (Vembu, [0122] “a shader unit (e.g., graphics multiprocessor 304 of FIG. 3)” and Fig. 14, [0145] “multiple thread groups, for example, thread group 1404, thread group 1406, and thread group 1408, have been dispatched for execution on the graphics multiprocessor 1410”). Vembu teaches that each of the plurality of sub-groups of threads is executed by a graphics shader (1410, Fig. 14) in a GPU.
Regarding Claim 18 (Currently amended), Vembu as modified discloses an apparatus for graphics processing (Vembu, [0021] “FIG.16 a diagram of a processing system”; Fig. 16 is a diagram of a graphics processing system), comprising:
a memory; and 
at least one processor coupled to the memory and configured to: 
determine whether to divide a group of threads into a plurality of sub-groups of threads, each thread of the group of threads being associated with a shader program;
divide, upon determining to divide the group of threads into the plurality of sub-groups of threads, the group of threads into the plurality of sub-groups of threads; and 
insert a loop marker for each of the plurality of sub-groups of threads based on a subsection of the shader program, wherein the loop marker is a texture loop marker associated with texture instructions, and wherein the texture loop marker is inserted if successive texture instructions for the group of threads are associated with a same surface;
execute, upon dividing the group of threads into the plurality of sub-groups of threads, a subsection of the shader program for each sub-group of threads of the plurality of sub-groups of threads, the subsection of the shader program completing execution for one sub-group of threads before commencing execution for a subsequent sub-group of threads.  
Claim 18 is substantially similar to claim 1 and is rejected based on a similar analysis.
Regarding Claim 19, Vembu as modified discloses the apparatus of claim 18, wherein the group of threads is a wave of threads and the plurality of sub-groups of threads is a plurality of sub-waves of threads.
Claim 19 is substantially similar to claim 2 and is rejected based on a similar analysis.
Regarding Claim 20, Vembu as modified discloses the apparatus of claim 18, wherein the group of threads is at least one of a group of pixels, a group of vertices, or a group of work items.  
Claim 20 is substantially similar to claim 3 and is rejected based on a similar analysis.
Regarding Claim 21, Vembu as modified discloses the apparatus of claim 20, wherein the group of pixels is a group of texture pixels or a group of texels.  
Claim 21 is substantially similar to claim 4 and is rejected based on a similar analysis.
Regarding Claim 22 (Canceled).
Regarding Claim 23 (Currently amended), Vembu as modified discloses the apparatus of claim [[22]] 18, wherein the loop marker defines a range of the subsection of the shader program.  
Claim 23 is substantially similar to claim 6 and is rejected based on a similar analysis.
Regarding Claim 24, Vembu as modified discloses the apparatus of claim 23, wherein the range includes a start of the subsection of the shader program and an end of the subsection of the shader program.  
Claim 24 is substantially similar to claim 7 and is rejected based on a similar analysis.
Regarding Claim 25 (Currently amended), Vembu as modified discloses the apparatus of claim [[22]] 18, wherein the loop marker is associated with an initiation mechanism or a hardware initiation mechanism.  
Claim 25 is substantially similar to claim 8 and is rejected based on a similar analysis.
Regarding Claim 26 (Currently amended), Vembu as modified discloses the apparatus of claim 18, wherein the at least one processor is further configured to: determine whether successive instructions of a set of instructions for the group of threads are associated with [[a]] the same surface.  
Claim 26 is substantially similar to claim 9 and is rejected based on a similar analysis.
Regarding Claim 27 (Currently amended), Vembu as modified discloses the apparatus of claim 26, wherein the set of instructions [[are]] is the texture instructions and the same surface is a texture surface.  
Claim 27 is substantially similar to claim 10 and is rejected based on a similar analysis.
Regarding Claim 28, Vembu as modified discloses the apparatus of claim 26, wherein the set of instructions correspond to downsampling instructions.  
Claim 28 is substantially similar to claim 11 and is rejected based on a similar analysis.
Regarding Claim 29, Vembu as modified discloses the apparatus of claim 18, wherein a memory footprint of the subsection of the shader program for each of the plurality of sub-groups of threads is associated with a size of a cache.  
Claim 29 is substantially similar to claim 12 and is rejected based on a similar analysis.
Regarding Claim 30, Vembu as modified discloses the apparatus of claim 29, wherein the memory footprint of all subsections of the shader program for the group of threads exceeds the size of the cache.  
Claim 30 is substantially similar to claim 13 and is rejected based on a similar analysis.
Regarding Claim 31, Vembu as modified discloses the apparatus of claim 29, wherein the at least one processor is further configured to: store, in the cache, the memory footprint of the subsection of the shader program for each of the plurality of sub-groups of threads.  
Claim 31 is substantially similar to claim 14 and is rejected based on a similar analysis.
Regarding Claim 32, Vembu as modified discloses the apparatus of claim 29, wherein the cache is a texture cache or a level 1 (L1) texture cache.  
Claim 32 is substantially similar to claim 15 and is rejected based on a similar analysis.
Regarding Claim 33, Vembu as modified discloses the apparatus of claim 18, wherein the determination whether to divide the group of threads into the plurality of sub-groups of threads is performed by a compiler.  
Claim 33 is substantially similar to claim 16 and is rejected based on a similar analysis.
Regarding Claim 34, Vembu as modified discloses the apparatus of claim 18, wherein the subsection of the shader program for each of the plurality of sub-groups of threads is executed by a graphics shader in a graphics processing unit (GPU).  
Claim 34 is substantially similar to claim 17 and is rejected based on a similar analysis.
Regarding Claim 35 (Currently amended), Vembu as modified discloses an apparatus for graphics processing, comprising: 
means for determining whether to divide a group of threads into a plurality of sub-groups of threads, each thread of the group of threads being associated with a shader program; 
means for dividing, upon determining to divide the group of threads into the plurality of sub-groups of threads, the group of threads into the plurality of sub-groups of threads; 
insert a loop marker for each of the plurality of sub-groups of threads based on a subsection of the shader program, wherein the loop marker is a texture loop marker associated with texture instructions, and wherein the texture loop marker is inserted if successive texture instructions for the group of threads are associated with a same surface;
means for executing, upon dividing the group of threads into the plurality of sub-groups of threads, a subsection of the shader program for each sub-group of threads of the plurality of sub-groups of threads, the subsection of the shader program completing execution for one sub-group of threads before commencing execution for a subsequent sub-group of threads.  
Claim 35 is substantially similar to claim 1 and is rejected based on a similar analysis.
Regarding Claim 36 (Currently amended), Vembu as modified discloses a non-transitory computer-readable medium storing computer executable code for graphics processing, the code when executed by a processor causes the processor to: 
determine whether to divide a group of threads into a plurality of sub-groups of threads, each thread of the group of threads being associated with a shader program; 
divide, upon determining to divide the group of threads into the plurality of sub-groups of threads, the group of threads into the plurality of sub-groups of threads; 
insert a loop marker for each of the plurality of sub-groups of threads based on a subsection of the shader program, wherein the loop marker is a texture loop marker associated with texture instructions, and wherein the texture loop marker is inserted if successive texture instructions for the group of threads are associated with a same surface;
execute, upon dividing the group of threads into the plurality of sub-groups of threads, a subsection of the shader program for each sub-group of threads of the plurality of sub-groups of threads, the subsection of the shader program completing execution for one sub-group of threads before commencing execution for a subsequent sub-group of threads.
Claim 36 is substantially similar to claim 1 and is rejected based on a similar analysis.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KHOA VU, whose telephone number is (571) 272-5994. The examiner can normally be reached 8:00-4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KHOA VU/Examiner, Art Unit 2611
/SING-WAI WU/Primary Examiner, Art Unit 2611