Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Amendment
The amendment filed on December 13, 2021 has been entered.
A substitute specification has been acknowledged.
In view of the amendment to the claims, the amendment of claims 35-40 has been acknowledged.
In view of the amendment of claim 35, Applicant removed “one or more processors including”. Accordingly, the objections to claim 35 have been withdrawn.

Response to Arguments
Applicant’s arguments, see pages 7-9 of Remarks, filed December 13, 2021, have been fully considered but are not persuasive.

Regarding Claim 21, Applicants state on pages 7-9 of Remarks that “Cox and Parker are cited against the rejected claims. Cox is directed towards a technique for tuning a digital camera that makes use of a parallel processing unit that includes a number C of general processing clusters (Cox, par. 25). Parker describes a technique to enable the yielding by threads executing in a processing unit to transfer control to a host processor. (Parker, par. 5). 

As to claim 21, neither Cox nor Parker describe limiting contexts to a specified portion of available threads. Specifically, Cox is cited as allegedly teaching "upon determining that a limitation on usage of the one or more graphics processors is set, limiting usage of the one or more graphics processors by one or more contexts of the plurality of contexts." However, Cox does not describe any form of limitation of usage and furthermore does not describe any type of context specific limitation ….”.

Examiner replies:
The examiner disagrees with Applicant’s premises and conclusion. Claim 21 recites the limitations “upon determining that a limitation on usage of the one or more graphics processors is set, limiting usage of the one or more graphics processors by one or more contexts of the plurality of contexts, wherein limiting usage of the one or more graphics processors includes limiting threads for the one or more contexts to a specified portion of available threads of the one or more graphics processors, the specified portion being less than all available threads of the one or more graphics processors”. Thus, the claim limitations can be interpreted during examination as “a limitation on usage of one graphics processor by one context”. Therefore, the examiner respectfully maintains that the prior art rejections in this case are proper for the following reasons. In response to Applicant’s arguments, the examiner relies on the combination of COX and Parker to address the issue.
COX discloses a parallel processing subsystem for graphics and video processing that constitutes a graphics processing unit (GPU); the parallel processing subsystem includes one or more general processing clusters (GPCs) that receive processing tasks to be executed via a work distribution unit, which is configured to fetch the indices corresponding to the tasks and distribute the tasks in the parallel processing subsystem (FIGS. 1 and 2; paragraphs [0018] and [0026]). More specifically, paragraph [0025] of COX discloses “PPU 202(0) includes a processing cluster array 230 that includes a number C of general processing clusters (GPCs) 208, where C≥1. Each GPC 208 is capable of executing a large number (e.g., hundreds or thousands) of threads concurrently, where each thread is an instance of a program. In various applications, different GPCs 208 may be allocated for processing different types of programs or for performing different types of computations. For example, in a graphics application, a first set of GPCs 208 may be allocated to perform tessellation operations and to produce primitive topologies for patches, and a second set of GPCs 208 may be allocated to perform tessellation shading to evaluate patch parameters for the primitive topologies and to determine vertex positions and other per-vertex attributes. The allocation of GPCs 208 may vary dependent on the workload arising for each type of program or computation”. Thus, COX discloses the limitations of “upon determining that a limitation on usage of the one or more graphics processors is set, limiting usage of the one or more graphics processors by one or more contexts of the plurality of contexts, wherein limiting usage of the one or more graphics processors includes limiting threads for the one or more contexts” recited in claim 21.
In the previous Office Action, Examiner stated that COX discloses that each different set of GPCs can be allocated to perform different types of computations, but that COX does not specifically disclose “limiting threads for the one or more contexts” to a specified portion of available threads of the one or more graphics processors, the specified portion being less than all available threads of the one or more graphics processors. In addition, Examiner cited the prior art reference Parker to teach those limitations. Parker discloses that a parallel processing unit (PPU) is configured to execute a plurality of threads concurrently in two or more streaming multi-processors (SMs) for processing graphics data (FIG. 2; paragraphs [0017] and [0025]). More specifically, paragraph [0023] of Parker describes “the PPU 200 may include 15 distinct SMs 250. Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently” and paragraph [0026] of Parker describes “the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data. In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently. For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program”. Thus, the total number of available threads is 15x32 (for example, based on the above information), the first subset of SMs 250 specifies a portion of the available threads for a vertex shader program, and the specified subset of threads of the first subset of SMs 250 is less than all available threads.

Regarding Claim 22, Applicants state on page 9 of Remarks that “Further as to claim 22, Cox in view of Parker fails also fails to teach or suggest, "specifying the portion of available threads for the limitation on usage of the one or more graphics processors, wherein limiting threads for the one or more contexts to the specified portion of available threads includes limiting the one or more contexts to a subset of the one or more graphics processors." Parker, even in view of Cox, does not address limiting "contexts" to "a portion of available threads." Claims 28 and 36 include similar limitations and overcome this rejection for similar reasons”.

Examiner replies:
The examiner disagrees with Applicant’s premises and conclusion. Claim 22 recites the limitations “specifying the portion of available threads for the limitation on usage of the one or more graphics processors, wherein limiting threads for the one or more contexts to the specified portion of available threads includes limiting the one or more contexts to a subset of the one or more graphics processors, the subset being less than all available graphics processors”. The claim limitations can be interpreted during examination as “a limitation on usage of one graphics processor” and “limiting threads for one context”. As discussed above, Parker discloses that a parallel processing unit (PPU) is configured to execute a plurality of threads concurrently in two or more streaming multi-processors (SMs) for processing graphics data (FIG. 2; paragraphs [0017] and [0025]). More specifically, paragraph [0023] of Parker describes “the PPU 200 may include 15 distinct SMs 250. Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently” and paragraph [0026] of Parker describes “the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data. In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently. For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program”. Thus, the total number of available threads is 15x32 (for example, based on the above information), and the first subset of SMs 250 specifies a portion of the “15x32” available threads for a vertex shader program. Accordingly, Parker discloses the limitations recited in claim 22.
Therefore, the combination of COX and Parker discloses the limitations recited in claim 22 and addressed in the above arguments. The same reasoning applies to claims 28 and 36.
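The “15x32” figure used in the replies above can be worked through as a minimal sketch. The SM count and per-SM thread count are the example values quoted from Parker, paragraph [0023]; the size of the “first subset” is not stated in Parker, so the value below is a hypothetical chosen only for illustration, and the function name is likewise an assumption.

```python
# Example figures quoted from Parker, paragraph [0023].
SM_COUNT = 15         # "15 distinct SMs 250"
THREADS_PER_SM = 32   # "e.g., 32 threads" executed concurrently per SM


def available_threads(sm_count: int, threads_per_sm: int = THREADS_PER_SM) -> int:
    """Concurrent threads offered by a group of SMs (hypothetical helper)."""
    return sm_count * threads_per_sm


# All threads available across the PPU in the quoted example: 15 x 32.
total_threads = available_threads(SM_COUNT)

# ASSUMPTION: Parker does not say how many SMs form the "first subset";
# 5 is an arbitrary illustrative value smaller than SM_COUNT.
first_subset_sms = 5
subset_threads = available_threads(first_subset_sms)

print(total_threads, subset_threads)  # 480 160
```

Under these assumptions, any subset of fewer than 15 SMs yields fewer than 480 threads, which is the arithmetic behind the statement that the specified portion is less than all available threads.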
	

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 21-22, 28-29 and 35-36 are rejected under 35 U.S.C. 103 as being unpatentable over COX (U.S. Patent Application Publication 2014/0152848 A1) in view of Parker et al. (U.S. Patent Application Publication 2015/0145871 A1).

	Regarding claim 21, COX discloses a method comprising: 
scheduling shared resources in a system (FIG. 2 is a block diagram of a parallel processing subsystem for the computer system of FIG. 1; paragraph [0026], work distribution unit 200 may be configured to fetch the indices corresponding to the tasks and distribute the tasks in the parallel processing subsystem) for a plurality of contexts of clients of the system (Paragraph [0026], GPCs 208 receive processing tasks to be executed via a work distribution unit 200, which receives commands defining processing tasks from front end unit 212.  Processing tasks include indices of data to be processed, e.g., surface (patch) data, primitive data, vertex data, and/or pixel data, as well as state parameters and commands defining how the data is to be processed (e.g., what program is to be executed)), wherein the shared resources include one or more graphics processors (Paragraph [0025], each PPU 202 advantageously implements a highly parallel processing architecture and includes one or more GPCs 208; paragraph [0027], when PPU 202 is used for graphics processing, for example, the processing workload for each patch is divided into approximately equal sized tasks to enable distribution of the tessellation processing to multiple GPCs 208 … in some embodiments of the present invention, portions of GPCs 208 are configured to perform different types of processing); and 
upon determining that a limitation on usage of the one or more graphics processors is set (Paragraph [0025], as shown in detail, PPU 202(0) includes a processing cluster array 230 that includes a number C of general processing clusters (GPCs) 208, where C≥1), limiting usage of the one or more graphics processors by one or more contexts of the plurality of contexts (Paragraph [0025], for example, in a graphics application, a first set of GPCs 208 may be allocated to perform tessellation operations and to produce primitive topologies for patches, and a second set of GPCs 208 may be allocated to perform tessellation shading to evaluate patch parameters for the primitive topologies and to determine vertex positions and other per-vertex attributes.  The allocation of GPCs 208 may vary dependent on the workload arising for each type of program or computation), wherein limiting usage of the one or more graphics processors includes limiting threads for the one or more contexts (Paragraph [0025], each GPC 208 is capable of executing a large number (e.g., hundreds or thousands) of threads concurrently, where each thread is an instance of a program for performing different types of computations).

In a similar field of endeavor, Parker discloses “limiting threads for the one or more contexts” (Paragraph [0017], FIG. 2 illustrates a parallel processing unit (PPU) 200 … In one embodiment, the PPU 200 is configured to execute a plurality of threads concurrently in two or more streaming multi-processors (SMs) 250; paragraph [0023], in one embodiment, the PPU 200 comprises X SMs 250(X). For example, the PPU 200 may include 15 distinct SMs 250. Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently; paragraph [0025], in one embodiment, the PPU 200 comprises a graphics processing unit (GPU). The PPU 200 is configured to receive commands that specify shader programs for processing graphics data. Graphics data may be defined as a set of primitives such as points, lines, triangles, quads, triangle strips, and the like) to a specified portion of available threads of the one or more graphics processors (Paragraph [0026], for example, the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data. In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently. For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program; paragraph [0023], for example, the PPU 200 may include 15 distinct SMs 250. Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently. Thus, a first subset of SMs 250 specifies a portion of available threads for a vertex shader program), the specified portion being less than all available threads of the one or more graphics processors (Paragraph [0023], for example, the PPU 200 may include 15 distinct SMs 250. Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently; paragraph [0026], a first subset of SMs 250 may be configured to execute a vertex shader program. Thus, the specified subset of threads of the first subset of SMs 250 is less than all available threads of the 15 distinct SMs 250).
COX and Parker are analogous art because both pertain to the use of parallel computing programs. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the parallel computing program taught by COX to incorporate the teachings of Parker, utilizing the parallel processing method and applying the parallel computing program taught by Parker to determine the available resources within the parallel system and to allocate a subset of the available resources to execute the specified shader program. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of Parker to obtain the invention as specified in the claim.
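The allocation relied upon in the rejection above, in which different subsets of SMs are configured to execute different shader programs, can be pictured with the following minimal sketch. It is an illustration only and not code from either reference: the function name, the program labels, and the subset sizes (6 and 9) are hypothetical, and only the 15-SM and 32-thread figures come from the quoted example in Parker, paragraph [0023].

```python
from typing import Dict, List

SM_COUNT = 15         # example value from Parker, paragraph [0023]
THREADS_PER_SM = 32   # example value from Parker, paragraph [0023]


def assign_subsets(sizes: Dict[str, int], sm_count: int = SM_COUNT) -> Dict[str, List[int]]:
    """Give each program a disjoint run of SM indices (hypothetical helper)."""
    if sum(sizes.values()) > sm_count:
        raise ValueError("requested more SMs than are available")
    allocation: Dict[str, List[int]] = {}
    next_sm = 0
    for program, size in sizes.items():
        allocation[program] = list(range(next_sm, next_sm + size))
        next_sm += size
    return allocation


# ASSUMPTION: the subset sizes are illustrative; Parker does not specify them.
allocation = assign_subsets({"vertex_shader": 6, "pixel_shader": 9})

# Thread ceiling implied for each program by the SMs assigned to it.
thread_caps = {name: len(sms) * THREADS_PER_SM for name, sms in allocation.items()}
total_threads = SM_COUNT * THREADS_PER_SM

print(thread_caps, total_threads)
```

In this sketch each program is confined to its own SMs, so its thread ceiling is a portion of the 15x32 total rather than the whole.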

Regarding claim 22, the combination of COX in view of Parker discloses everything claimed as applied above (see claim 21).
However, COX does not specifically disclose further comprising: 
specifying the portion of available threads for the limitation on usage of the one or more graphics processors, wherein limiting threads for the one or more contexts to the specified portion of available threads includes limiting the one or more contexts to a subset of the one or more graphics processors, the subset being less than all available graphics processors.
In a similar field of endeavor, Parker discloses further comprising: 
specifying the portion of available threads (FIG. 2; paragraph [0026], for example, the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data. In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently. For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program) for the limitation on usage of the one or more graphics processors (Paragraph [0017], the PPU 200 is configured to execute a plurality of threads concurrently in two or more streaming multi-processors (SMs) 250; paragraph [0023], in one embodiment, the PPU 200 comprises X SMs 250(X). For example, the PPU 200 may include 15 distinct SMs 250. Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently), wherein limiting threads for the one or more contexts (Paragraph [0025], in one embodiment, the PPU 200 comprises a graphics processing unit (GPU). The PPU 200 is configured to receive commands that specify shader programs for processing graphics data. Graphics data may be defined as a set of primitives such as points, lines, triangles, quads, triangle strips, and the like) to the specified portion of available threads (Paragraph [0026], for example, the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data. In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently. For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program) includes limiting the one or more contexts to a subset of the one or more graphics processors (Paragraph [0026], for example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program), the subset being less than all available graphics processors (Paragraph [0023], in one embodiment, the PPU 200 comprises X SMs 250(X). For example, the PPU 200 may include 15 distinct SMs 250; paragraph [0026], for example, a first subset of SMs 250 may be configured to execute a vertex shader program. Thus, the first subset of SMs 250 is less than all 15 distinct SMs 250).
COX and Parker are analogous art because both pertain to the use of parallel computing programs. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the parallel computing program taught by COX to incorporate the teachings of Parker, utilizing the parallel processing method and 

	Regarding claim 28, COX discloses a system comprising: 
	a memory device (FIG. 1; system memory 104); and 
one or more processors (FIG. 2; paragraph [0021], PPUs 202 operate as graphics processors) including one or more graphics processors (Paragraph [0025], PPU 202(0) includes a processing cluster array 230 that includes a number C of general processing clusters (GPCs) 208), the one or more processors to execute instructions stored on the memory device (Paragraph [0021], some or all of PPUs 202 in parallel processing subsystem 112 are graphics processors with rendering pipelines that can be configured to perform various tasks related to generating pixel data from graphics data supplied by CPU 102 and/or system memory 104 via memory bridge 105 and bus 113, interacting with local parallel processing memory 204 (which can be used as graphics memory including, e.g., a conventional frame buffer) to store and update pixel data, delivering pixel data to display device 110, and the like), wherein the instructions cause the one or more processors to perform operations comprising: 
Paragraph [0026], work distribution unit 200 may be configured to fetch the indices corresponding to the tasks and distribute the tasks in the parallel processing subsystem), the system including a plurality of clients (Paragraph [0026], GPCs 208 receive processing tasks to be executed via a work distribution unit 200, which receives commands defining processing tasks from front end unit 212.  Processing tasks include indices of data to be processed, e.g., surface (patch) data, primitive data, vertex data, and/or pixel data, as well as state parameters and commands defining how the data is to be processed (e.g., what program is to be executed)); and 
accessing scheduled resources of the system for the one or more contexts (Paragraph [0026], GPCs 208 receive processing tasks to be executed via a work distribution unit 200, which receives commands defining processing tasks from front end unit 212.  Processing tasks include indices of data to be processed, e.g., surface (patch) data, primitive data, vertex data, and/or pixel data, as well as state parameters and commands defining how the data is to be processed (e.g., what program is to be executed).  Work distribution unit 200 may be configured to fetch the indices corresponding to the tasks, or work distribution unit 200 may receive the indices from front end 212.  Front end 212 ensures that GPCs 208 are configured to a valid state before the processing specified by the pushbuffers is initiated), the resources being shared by the plurality of clients, wherein shared resources of the system include the one or more graphics processors (Paragraph [0025], each PPU 202 advantageously implements a highly parallel processing architecture and includes one or more GPCs 208; paragraph [0027], when PPU 202 is used for graphics processing, for example, the processing workload for each patch is divided into approximately equal sized tasks to enable distribution of the tessellation processing to multiple GPCs 208 … in some embodiments of the present invention, portions of GPCs 208 are configured to perform different types of processing); 
wherein access to the one or more graphics processors by the one or more contexts is subject to a limitation on usage set by the system (Paragraph [0025], as shown in detail, PPU 202(0) includes a processing cluster array 230 that includes a number C of general processing clusters (GPCs) 208, where C≥1; for example, in a graphics application, a first set of GPCs 208 may be allocated to perform tessellation operations and to produce primitive topologies for patches, and a second set of GPCs 208 may be allocated to perform tessellation shading to evaluate patch parameters for the primitive topologies and to determine vertex positions and other per-vertex attributes.  The allocation of GPCs 208 may vary dependent on the workload arising for each type of program or computation); and 
wherein the limitation on usage includes a limitation on threads of the one or more graphics processors for the one or more contexts (Paragraph [0025], each GPC 208 is capable of executing a large number (e.g., hundreds or thousands) of threads concurrently, where each thread is an instance of a program for performing different types of computations).
COX discloses that each different set of GPCs can be allocated to perform different types of computations. However, “the limitation on usage includes a limitation 
In a similar field of endeavor, Parker discloses “a limitation on threads of the one or more graphics processors” for the one or more contexts (Paragraph [0017], FIG. 2 illustrates a parallel processing unit (PPU) 200 … In one embodiment, the PPU 200 is configured to execute a plurality of threads concurrently in two or more streaming multi-processors (SMs) 250; paragraph [0023], in one embodiment, the PPU 200 comprises X SMs 250(X). For example, the PPU 200 may include 15 distinct SMs 250. Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently; paragraph [0025], in one embodiment, the PPU 200 comprises a graphics processing unit (GPU). The PPU 200 is configured to receive commands that specify shader programs for processing graphics data. Graphics data may be defined as a set of primitives such as points, lines, triangles, quads, triangle strips, and the like) to a specified portion of available threads (Paragraph [0026], for example, the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data. In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently. For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program; paragraph [0023], for example, the PPU 200 may include 15 distinct SMs 250. Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently. Thus, a first subset of SMs 250 specifies a portion of available threads for a vertex shader program), the specified portion being less than all available threads of the one or more graphics processors (Paragraph [0023], for example, the PPU 200 may include 15 distinct SMs 250. Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently; paragraph [0026], a first subset of SMs 250 may be configured to execute a vertex shader program. Thus, the specified subset of threads of the first subset of SMs 250 is less than all available threads of the 15 distinct SMs 250).
COX and Parker are analogous art because both pertain to the use of parallel computing programs. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the parallel computing program taught by COX to incorporate the teachings of Parker, utilizing the parallel processing method and applying the parallel computing program taught by Parker to determine the available resources within the parallel system and to allocate a subset of the available resources to execute the specified shader program. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of Parker to obtain the invention as specified in the claim.

Regarding claim 29, the combination of COX in view of Parker discloses everything claimed as applied above (see claim 28).
However, COX does not specifically disclose wherein the portion of available threads for the limitation on usage of the one or more graphics processors is specified by the system and the limitation on usage limits the one or more contexts to a subset of the one or more graphics processors, the subset being less than all available graphics processors.
In a similar field of endeavor, Parker discloses wherein the portion of available threads (FIG. 2; paragraph [0026], for example, the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data. In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently. For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program) for the limitation on usage of the one or more graphics processors is specified by the system (Paragraph [0017], the PPU 200 is configured to execute a plurality of threads concurrently in two or more streaming multi-processors (SMs) 250; paragraph [0023], in one embodiment, the PPU 200 comprises X SMs 250(X). For example, the PPU 200 may include 15 distinct SMs 250. Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently) and the limitation on usage limits the one or more contexts (Paragraph [0025], in one embodiment, the PPU 200 comprises a graphics processing unit (GPU). The PPU 200 is configured to receive commands that specify shader programs for processing graphics data. Graphics data may be defined as a set of primitives such as points, lines, triangles, quads, triangle strips, and the like) to a subset of the one or more graphics processors (Paragraph [0026], for example, the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data. In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently. For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program), the subset being less than all available graphics processors (Paragraph [0023], in one embodiment, the PPU 200 comprises X SMs 250(X). For example, the PPU 200 may include 15 distinct SMs 250; paragraph [0026], for example, a first subset of SMs 250 may be configured to execute a vertex shader program. Thus, the first subset of SMs 250 is less than all 15 distinct SMs 250).
COX and Parker are analogous art because both pertain to the use of parallel computing programs. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the parallel computing program taught by COX to incorporate the teachings of Parker, utilizing the parallel processing method and applying the parallel computing program taught by Parker to determine the available resources within the parallel system and to allocate a subset of the available resources to execute the specified shader program. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify 

Regarding claim 35, COX discloses an apparatus comprising: 
a processing system (FIG. 1; paragraph [0017], a parallel processing subsystem 112, as shown in FIG. 2) including one or more processors (Paragraph [0020], parallel processing subsystem 112 includes one or more parallel processing units (PPUs) 202), the one or more processors including one or more graphics processors (Paragraph [0025], PPU 202(0) includes a processing cluster array 230 that includes a number C of general processing clusters (GPCs) 208); 
a scheduler to schedule shared resources in the system (FIG. 2 is a block diagram of a parallel processing subsystem for the computer system of FIG. 1; paragraph [0026], work distribution unit 200 may be configured to fetch the indices corresponding to the tasks and distribute the tasks in the parallel processing subsystem) for a plurality of contexts of clients of the processing system (Paragraph [0026], GPCs 208 receive processing tasks to be executed via a work distribution unit 200, which receives commands defining processing tasks from front end unit 212.  Processing tasks include indices of data to be processed, e.g., surface (patch) data, primitive data, vertex data, and/or pixel data, as well as state parameters and commands defining how the data is to be processed (e.g., what program is to be executed)); and 
wherein the processing system has a capability to limit usage of processing resources of the one or more graphics processors (Paragraph [0025], as shown in detail, PPU 202(0) includes a processing cluster array 230 that includes a number C of general processing clusters (GPCs) 208, where C≥1) by the plurality of contexts (Paragraph [0025], for example, in a graphics application, a first set of GPCs 208 may be allocated to perform tessellation operations and to produce primitive topologies for patches, and a second set of GPCs 208 may be allocated to perform tessellation shading to evaluate patch parameters for the primitive topologies and to determine vertex positions and other per-vertex attributes.  The allocation of GPCs 208 may vary dependent on the workload arising for each type of program or computation), wherein the limitation on usage of the one or more graphics processors includes to limit threads for one or more contexts of the plurality of contexts (Paragraph [0025], each GPC 208 is capable of executing a large number (e.g., hundreds or thousands) of threads concurrently, where each thread is an instance of a program for performing different types of computations) 
COX discloses that each different set of GPCs can be allocated to perform different types of computations. COX does not specifically disclose “limit threads for one or more contexts of the plurality of contexts” to a specified portion of available threads of the one or more graphics processors, the specified portion being less than all available threads of the one or more graphics processors.
In the similar field of endeavor, Parker discloses “limit threads for one or more contexts of the plurality of contexts” (Paragraph [0017], FIG. 2 illustrates a parallel processing unit (PPU) 200 … In one embodiment, the PPU 200 is configured to execute a plurality of threads concurrently in two or more streaming multi-processors (SMs) 250; paragraph [0023], in one embodiment, the PPU 200 comprises X SMs 250(X).  For example, the PPU 200 may include 15 distinct SMs 250.  Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently; paragraph [0025], in one embodiment, the PPU 200 comprises a graphics processing unit (GPU).  The PPU 200 is configured to receive commands that specify shader programs for processing graphics data.  Graphics data may be defined as a set of primitives such as points, lines, triangles, quads, triangle strips, and the like) to a specified portion of available threads of the one or more graphics processors (Paragraph [0026], for example, the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data.  In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently.  For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program; paragraph [0023], for example, the PPU 200 may include 15 distinct SMs 250.  Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently.
Thus, a first subset of SMs 250 specifies a portion of available threads for a vertex shader program), the specified portion being less than all available threads of the one or more graphics processors (Paragraph [0023], for example, the PPU 200 may include 15 distinct SMs 250.  Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently; paragraph [0026], a first subset of SMs 250 may be configured to execute a vertex shader program. Thus, the threads of the first subset of SMs 250 are fewer than all available threads of the 15 distinct SMs 250).
COX and Parker are analogous art because both pertain to parallel computing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the parallel computing program taught by COX to incorporate the teachings of Parker, applying the parallel processing method taught by Parker to determine the available resources within the parallel system and to allocate a subset of the available resources to execute the specified shader program. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of Parker to obtain the invention as specified in the claim.
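For illustration only, the thread arithmetic underlying the above mapping can be sketched as follows. The sketch assumes the example figures quoted from Parker paragraph [0023] (15 SMs 250, each executing 32 threads concurrently). The subset size of 5 SMs and all function and constant names are hypothetical and are not taken from either reference.

```python
# Illustrative sketch only. The constants repeat the example figures quoted
# from Parker paragraph [0023]; every name below is hypothetical.
SM_COUNT = 15        # example number of SMs 250 in the PPU 200
THREADS_PER_SM = 32  # example number of threads each SM 250 executes concurrently


def available_threads(sm_count: int, threads_per_sm: int = THREADS_PER_SM) -> int:
    """Return the number of concurrent threads offered by sm_count SMs."""
    if sm_count < 0:
        raise ValueError("sm_count must be non-negative")
    return sm_count * threads_per_sm


def is_less_than_all(subset_sms: int, total_sms: int = SM_COUNT) -> bool:
    """Return True when a non-empty subset of SMs offers fewer threads than all SMs."""
    if subset_sms <= 0:
        return False
    return available_threads(subset_sms) < available_threads(total_sms)


if __name__ == "__main__":
    subset = 5  # hypothetical size of the first subset of SMs
    print(available_threads(SM_COUNT))  # threads of all 15 SMs
    print(available_threads(subset))    # threads of the hypothetical subset
    print(is_less_than_all(subset))
```

Under these assumptions, any first subset smaller than the full set of 15 SMs corresponds to a thread count below the total, which is the relationship relied upon in the mapping above.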

Regarding claim 36, the combination of COX in view of Parker discloses everything claimed as applied above (see claim 35).
However, COX does not specifically disclose wherein the system is to specify the portion of available threads for the limitation on usage of the one or more graphics processors and the limitation on usage is to limit the one or more contexts to a subset of the one or more graphics processors, the subset being less than all available graphics processors.
	In the similar field of endeavor, Parker discloses wherein the system is to specify the portion of available threads (FIG. 2; paragraph [0026], for example, the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data.  In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently.  For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program) for the limitation on usage of the one or more graphics processors (Paragraph [0017], the PPU 200 is configured to execute a plurality of threads concurrently in two or more streaming multi-processors (SMs) 250; paragraph [0023], in one embodiment, the PPU 200 comprises X SMs 250(X).  For example, the PPU 200 may include 15 distinct SMs 250.  Each SM 250 is multi-threaded and configured to execute a plurality of threads (e.g., 32 threads) from a particular thread block concurrently) and the limitation on usage is to limit the one or more contexts (Paragraph [0025], in one embodiment, the PPU 200 comprises a graphics processing unit (GPU).  The PPU 200 is configured to receive commands that specify shader programs for processing graphics data.  Graphics data may be defined as a set of primitives such as points, lines, triangles, quads, triangle strips, and the like) to a subset of the one or more graphics processors (Paragraph [0026], for example, the TMU 215 may configure one or more SMs 250 to execute a vertex shader program that processes a number of vertices defined by the model data.  In one embodiment, the TMU 215 may configure different SMs 250 to execute different shader programs concurrently.  
For example, a first subset of SMs 250 may be configured to execute a vertex shader program while a second subset of SMs 250 may be configured to execute a pixel shader program), the subset being less than all available graphics processors (Paragraph [0023], in one embodiment, the PPU 200 comprises X SMs 250(X).  For example, the PPU 200 may include 15 distinct SMs 250; paragraph [0026], for example, a first subset of SMs 250 may be configured to execute a vertex shader program. Thus, the first subset of SMs 250 is less than all 15 distinct SMs 250).
COX and Parker are analogous art because both pertain to parallel computing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the parallel computing program taught by COX to incorporate the teachings of Parker, applying the parallel processing method taught by Parker to determine the available resources within the parallel system and to allocate a subset of the available resources to execute the specified shader program. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of Parker to obtain the invention as specified in the claim.

	
Claims 23-25, 30-32 and 37-38 are rejected under 35 U.S.C. 103 as being unpatentable over COX (U.S. Patent Application Publication 2014/0152848 A1) in view of Parker et al (U.S. Patent Application Publication 2015/0145871 A1) in view of ZHU et al (U.S. Patent Application Publication 2018/0210732 A1).

	Regarding claim 23, the combination of COX in view of Parker discloses everything claimed as applied above (see claim 22). 
However, COX does not specifically disclose wherein usage of the one or more graphics processors by one or more contexts is limited in part to increase utilization of the one or more graphics processors.

In the similar field of endeavor, ZHU discloses (FIG. 1; paragraphs [0014]-[0017], the GPU 104 includes a plurality of compute units 122 (also known as stream processors, cores, or single-instruction-multiple-data (SIMD) engines), including the illustrated compute units 122(1), 122(2), and 122(N) … each compute unit 122 includes a thread dispatch controller 132 and a plurality of arithmetic logic units (ALUs) 134, such as the depicted ALUs 134(1), 134(2), and 134(N), supported by a register file 136 comprising a plurality of physical GPRs 138, such as the depicted GPRs 138(1), 138(2), and 138(N) … As a general operational overview, a shader or other compute kernel dispatched to the compute unit 122 illustrated by detailed view 130 is received at the thread dispatch controller 132, which in turn dispatches a corresponding thread to each of the ALUs 134, with the totality of threads dispatched concurrently referred to as a wavefront of L threads, with L representing the number of threads in the wavefront; FIG. 2; paragraph [0019], before initiating execution of the wavefront of threads, the GPR allocator 142 determines various GPR-usage metrics for the threads of the wavefront.  In particular, at block 204, the GPR allocator 142 determines the total number of GPRs that will be used or requested by the threads of the wavefront during its entire allocation, this number being identified herein as value "N".  At block 206, the GPR allocator 142 determines the maximum number of GPRs that are to be used by the threads of the wavefront concurrently, that is the maximum number of GPRs that are to be employed simultaneously by the threads at the compute unit 122.  This maximum number is identified herein as value "M".  At block 208, the GPR allocator 142 determines the minimum number of GPRs required to allow each and every thread of the wavefront (that is, all L threads) to initiate execution.  This number is identified herein as value "K".
Note that these GPR-usage metrics may be determined in another order, or may be determined concurrently.  Further, the GPR allocator 142 is informed of the number of physical GPRs 138 implemented at the compute unit 122, with this number being identified herein as value "S") wherein usage of the one or more graphics processors by one or more contexts is limited in part to increase utilization of the one or more graphics processors (Paragraph [0022], at block 210, the GPR allocator 142 determines the relationships between the GPR-usage metrics and the number of physical GPRs 138 by comparing these values with each other; paragraph [0024], conversely, in response to determining that the number of physical GPRs 138 is less than the maximum number of GPRs used concurrently but greater than or equal to the number of GPRs required to initiate execution of all of the threads of the wavefront (that is, M>S>=K), then the GPR allocator 142 sets the compute unit 122 to a thread initialization allocation mode (represented by block 216) in which each thread of the wavefront is allocated a number of physical GPRs 138 required to initiate execution of the thread (that is, K/L physical GPRs 138) and then execution of the threads of the wavefront initiates or otherwise commences with the initial set of allocated physical GPRs 138.  Additional physical GPRs 138 then may be dynamically allocated on an as-needed basis as execution of the threads progresses.  An example of the thread initialization allocation mode is described in greater detail below with reference to FIG. 4; paragraph [0030], if there is at least one unallocated and unassigned physical GPR 138, at block 414 the GPR allocator 142 dynamically allocates a selected unallocated and unassigned physical GPR 138 to the requesting thread).
COX and ZHU are analogous art because both pertain to parallel computing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the parallel computing program taught by COX to incorporate the teachings of ZHU, applying the selective resource allocation method taught by ZHU to monitor the allocated resources within the parallel system and to deallocate a resource when the instruction has completed execution, so that the resource becomes available for reallocation. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of ZHU to obtain the invention as specified in the claim.
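For illustration only, the comparison quoted from ZHU paragraph [0024] can be sketched as follows. The sketch covers only the case quoted above (M > S >= K) and the initial per-thread allocation of K/L physical GPRs. The function names, the return value used for cases that are not quoted, and the sample numbers are hypothetical and are not taken from ZHU.

```python
from typing import Optional

# Illustrative sketch only. M, S, K and L follow the values named in the quoted
# passages of ZHU; all function names and sample numbers are hypothetical.


def select_allocation_mode(m: int, s: int, k: int) -> Optional[str]:
    """Return the mode for the single case quoted from paragraph [0024].

    m: maximum number of GPRs used concurrently by the wavefront ("M")
    s: number of physical GPRs at the compute unit ("S")
    k: minimum number of GPRs needed for all threads to start ("K")
    """
    if m > s >= k:
        return "thread_initialization"
    # Other relationships are not described in the quoted passage.
    return None


def initial_gprs_per_thread(k: int, l: int) -> float:
    """Return K/L, the initial number of physical GPRs allocated to each thread."""
    if l <= 0:
        raise ValueError("wavefront must contain at least one thread")
    return k / l


if __name__ == "__main__":
    # Hypothetical sample values: M=96, S=64, K=32, L=16.
    print(select_allocation_mode(96, 64, 32))
    print(initial_gprs_per_thread(32, 16))
```

With the hypothetical values shown, fewer physical GPRs exist than the wavefront would use at its peak, so each thread starts with only its K/L share and any further GPRs are requested as execution progresses.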

	Regarding claim 24, the combination of COX in view of Parker in view of ZHU discloses everything claimed as applied above (see claim 23). 
However, COX does not specifically disclose further comprising monitoring utilization of the one or more graphics processors.
In the similar field of endeavor, ZHU discloses further comprising monitoring utilization of the one or more graphics processors (FIGS. 1 and 2; paragraph [0025], the GPR monitor 144 monitors the allocation and deallocation of physical GPRs 138 by the GPR allocator 142 during wavefront execution by monitoring updates to the free list 148 made by the GPR allocator 142 as physical GPRs 138 are allocated and deallocated).
COX and ZHU are analogous art because both pertain to parallel computing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the parallel computing program taught by COX to incorporate the teachings of ZHU, applying the selective resource allocation method taught by ZHU to monitor the allocated resources within the parallel system and to deallocate a resource when the instruction has completed execution, so that the resource becomes available for reallocation. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of ZHU to obtain the invention as specified in the claim.

Regarding claim 25, the combination of COX in view of Parker in view of ZHU discloses everything claimed as applied above (see claim 23). 
However, COX does not specifically disclose additionally comprising adjusting a limitation on threads for the one or more contexts based in part on a control target and a demand from a scheduler.
In the similar field of endeavor, ZHU discloses additionally comprising adjusting a limitation on threads for the one or more contexts based in part on a control target and a demand from a scheduler (FIGS. 1 and 2; paragraph [0025], the GPR monitor 144 monitors the allocation and deallocation of physical GPRs 138 by the GPR allocator 142 during wavefront execution by monitoring updates to the free list 148 made by the GPR allocator 142 as physical GPRs 138 are allocated and deallocated).  
COX and ZHU are analogous art because both pertain to parallel computing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the parallel computing program taught by COX to incorporate the teachings of ZHU, applying the selective resource allocation method taught by ZHU to monitor the allocated resources within the parallel system and to deallocate a resource when the instruction has completed execution, so that the resource becomes available for reallocation. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of ZHU to obtain the invention as specified in the claim.

	Regarding claim 30, the combination of COX in view of Parker discloses everything claimed as applied above (see claim 29). 
However, COX does not specifically disclose wherein the limitation on threads for the one or more contexts is provided at least in part to increase utilization of the one or more graphics processors.
In the similar field of endeavor, ZHU discloses (FIG. 1; paragraphs [0014]-[0017], the GPU 104 includes a plurality of compute units 122 (also known as stream processors, cores, or single-instruction-multiple-data (SIMD) engines), including the illustrated compute units 122(1), 122(2), and 122(N) … each compute unit 122 includes a thread dispatch controller 132 and a plurality of arithmetic logic units (ALUs) 134, such as the depicted ALUs 134(1), 134(2), and 134(N), supported by a register file 136 comprising a plurality of physical GPRs 138, such as the depicted GPRs 138(1), 138(2), and 138(N) … As a general operational overview, a shader or other compute kernel dispatched to the compute unit 122 illustrated by detailed view 130 is received at the thread dispatch controller 132, which in turn dispatches a corresponding thread to each of the ALUs 134, with the totality of threads dispatched concurrently referred to as a wavefront of L threads, with L representing the number of threads in the wavefront; FIG. 2; paragraph [0019], before initiating execution of the wavefront of threads, the GPR allocator 142 determines various GPR-usage metrics for the threads of the wavefront.  In particular, at block 204, the GPR allocator 142 determines the total number of GPRs that will be used or requested by the threads of the wavefront during its entire allocation, this number being identified herein as value "N".  At block 206, the GPR allocator 142 determines the maximum number of GPRs that are to be used by the threads of the wavefront concurrently, that is the maximum number of GPRs that are to be employed simultaneously by the threads at the compute unit 122.  This maximum number is identified herein as value "M".  At block 208, the GPR allocator 142 determines the minimum number of GPRs required to allow each and every thread of the wavefront (that is, all L threads) to initiate execution.  This number is identified herein as value "K".
Note that these GPR-usage metrics may be determined in another order, or may be determined concurrently.  Further, the GPR allocator 142 is informed of the number of physical GPRs 138 implemented at the compute unit 122, with this number being identified herein as value "S") wherein the limitation on threads for the one or more contexts is provided at least in part to increase utilization of the one or more graphics processors (Paragraph [0022], at block 210, the GPR allocator 142 determines the relationships between the GPR-usage metrics and the number of physical GPRs 138 by comparing these values with each other; paragraph [0024], conversely, in response to determining that the number of physical GPRs 138 is less than the maximum number of GPRs used concurrently but greater than or equal to the number of GPRs required to initiate execution of all of the threads of the wavefront (that is, M>S>=K), then the GPR allocator 142 sets the compute unit 122 to a thread initialization allocation mode (represented by block 216) in which each thread of the wavefront is allocated a number of physical GPRs 138 required to initiate execution of the thread (that is, K/L physical GPRs 138) and then execution of the threads of the wavefront initiates or otherwise commences with the initial set of allocated physical GPRs 138.  Additional physical GPRs 138 then may be dynamically allocated on an as-needed basis as execution of the threads progresses.  An example of the thread initialization allocation mode is described in greater detail below with reference to FIG. 4; paragraph [0030], if there is at least one unallocated and unassigned physical GPR 138, at block 414 the GPR allocator 142 dynamically allocates a selected unallocated and unassigned physical GPR 138 to the requesting thread).
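COX and ZHU are analogous art because both pertain to parallel computing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the parallel computing program taught by COX to incorporate the teachings of ZHU, applying the selective resource allocation method taught by ZHU to monitor the allocated resources within the parallel system and to deallocate a resource when the instruction has completed execution, so that the resource becomes available for reallocation. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of ZHU to obtain the invention as specified in the claim.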

Regarding claim 31, the combination of COX in view of Parker in view of ZHU discloses everything claimed as applied above (see claim 30). 
However, COX does not specifically disclose wherein utilization of the one or more graphics processors is monitored by the system.
In the similar field of endeavor, ZHU discloses wherein utilization of the one or more graphics processors is monitored by the system (FIGS. 1 and 2; paragraph [0025], the GPR monitor 144 monitors the allocation and deallocation of physical GPRs 138 by the GPR allocator 142 during wavefront execution by monitoring updates to the free list 148 made by the GPR allocator 142 as physical GPRs 138 are allocated and deallocated).
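COX and ZHU are analogous art because both pertain to parallel computing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the parallel computing program taught by COX to incorporate the teachings of ZHU, applying the selective resource allocation method taught by ZHU to monitor the allocated resources within the parallel system and to deallocate a resource when the instruction has completed execution, so that the resource becomes available for reallocation. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of ZHU to obtain the invention as specified in the claim.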

Regarding claim 32, the combination of COX in view of Parker in view of ZHU discloses everything claimed as applied above (see claim 30). 
However, COX does not specifically disclose the operations additionally comprising adjusting the limitation on threads for the one or more contexts based in part on a control target and a demand from a scheduler.
In the similar field of endeavor, ZHU discloses the operations additionally comprising adjusting the limitation on threads for the one or more contexts based in part on a control target and a demand from a scheduler (FIGS. 1 and 2; paragraph [0025], the GPR monitor 144 monitors the allocation and deallocation of physical GPRs 138 by the GPR allocator 142 during wavefront execution by monitoring updates to the free list 148 made by the GPR allocator 142 as physical GPRs 138 are allocated and deallocated).  
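COX and ZHU are analogous art because both pertain to parallel computing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the parallel computing program taught by COX to incorporate the teachings of ZHU, applying the selective resource allocation method taught by ZHU to monitor the allocated resources within the parallel system and to deallocate a resource when the instruction has completed execution, so that the resource becomes available for reallocation. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of ZHU to obtain the invention as specified in the claim.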

Regarding claim 37, the combination of COX in view of Parker discloses everything claimed as applied above (see claim 36). 
However, COX does not specifically disclose wherein the limitation on threads for the one or more contexts is provided by the system at least in part to increase utilization of the one or more graphics processors.
In the similar field of endeavor, ZHU discloses (FIG. 1; paragraphs [0014]-[0017], the GPU 104 includes a plurality of compute units 122 (also known as stream processors, cores, or single-instruction-multiple-data (SIMD) engines), including the illustrated compute units 122(1), 122(2), and 122(N) … each compute unit 122 includes a thread dispatch controller 132 and a plurality of arithmetic logic units (ALUs) 134, such as the depicted ALUs 134(1), 134(2), and 134(N), supported by a register file 136 comprising a plurality of physical GPRs 138, such as the depicted GPRs 138(1), 138(2), and 138(N) … As a general operational overview, a shader or other compute kernel dispatched to the compute unit 122 illustrated by detailed view 130 is received at the thread dispatch controller 132, which in turn dispatches a corresponding thread to each of the ALUs 134, with the totality of threads dispatched concurrently referred to as a wavefront of L threads, with L representing the number of threads in the wavefront; FIG. 2; paragraph [0019], before initiating execution of the wavefront of threads, the GPR allocator 142 determines various GPR-usage metrics for the threads of the wavefront.  In particular, at block 204, the GPR allocator 142 determines the total number of GPRs that will be used or requested by the threads of the wavefront during its entire allocation, this number being identified herein as value "N".  At block 206, the GPR allocator 142 determines the maximum number of GPRs that are to be used by the threads of the wavefront concurrently, that is the maximum number of GPRs that are to be employed simultaneously by the threads at the compute unit 122.  This maximum number is identified herein as value "M".  At block 208, the GPR allocator 142 determines the minimum number of GPRs required to allow each and every thread of the wavefront (that is, all L threads) to initiate execution.  This number is identified herein as value "K".
Note that these GPR-usage metrics may be determined in another order, or may be determined concurrently.  Further, the GPR allocator 142 is informed of the number of physical GPRs 138 implemented at the compute unit 122, with this number being identified herein as value "S") wherein the limitation on threads for the one or more contexts is provided by the system at least in part to increase utilization of the one or more graphics processors (Paragraph [0022], at block 210, the GPR allocator 142 determines the relationships between the GPR-usage metrics and the number of physical GPRs 138 by comparing these values with each other; paragraph [0024], conversely, in response to determining that the number of physical GPRs 138 is less than the maximum number of GPRs used concurrently but greater than or equal to the number of GPRs required to initiate execution of all of the threads of the wavefront (that is, M>S>=K), then the GPR allocator 142 sets the compute unit 122 to a thread initialization allocation mode (represented by block 216) in which each thread of the wavefront is allocated a number of physical GPRs 138 required to initiate execution of the thread (that is, K/L physical GPRs 138) and then execution of the threads of the wavefront initiates or otherwise commences with the initial set of allocated physical GPRs 138.  Additional physical GPRs 138 then may be dynamically allocated on an as-needed basis as execution of the threads progresses.  An example of the thread initialization allocation mode is described in greater detail below with reference to FIG. 4; paragraph [0030], if there is at least one unallocated and unassigned physical GPR 138, at block 414 the GPR allocator 142 dynamically allocates a selected unallocated and unassigned physical GPR 138 to the requesting thread).
COX and ZHU are analogous art because both pertain to parallel computing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the parallel computing program taught by COX to incorporate the teachings of ZHU, applying the selective resource allocation method taught by ZHU to monitor the allocated resources within the parallel system and to deallocate a resource when the instruction has completed execution, so that the resource becomes available for reallocation. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of ZHU to obtain the invention as specified in the claim.

Regarding claim 38, the combination of COX in view of Parker in view of ZHU discloses everything claimed as applied above (see claim 37). 
However, COX does not specifically disclose wherein the system is further to monitor utilization of the one or more graphics processors.
In the similar field of endeavor, ZHU discloses wherein the system is further to monitor utilization of the one or more graphics processors (FIGS. 1 and 2; paragraph [0025], the GPR monitor 144 monitors the allocation and deallocation of physical GPRs 138 by the GPR allocator 142 during wavefront execution by monitoring updates to the free list 148 made by the GPR allocator 142 as physical GPRs 138 are allocated and deallocated).
COX and ZHU are analogous art because both pertain to parallel computing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the parallel computing program taught by COX to incorporate the teachings of ZHU, applying the selective resource allocation method taught by ZHU to monitor the allocated resources within the parallel system and to deallocate a resource when the instruction has completed execution, so that the resource becomes available for reallocation. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of ZHU to obtain the invention as specified in the claim.

Claims 26-27, 33-34 and 39-40 are rejected under 35 U.S.C. 103 as being unpatentable over COX (U.S. Patent Application Publication 2014/0152848 A1) in view of Parker et al (U.S. Patent Application Publication 2015/0145871 A1) in view of Pajak et al (U.S. Patent Application Publication 2015/0206504 A1).

Regarding claim 26, the combination of COX in view of Parker discloses everything claimed as applied above (see claim 21). 
However, COX does not specifically disclose wherein the one or more graphics processors include a single instruction multiple thread (SIMT) architecture.
In the similar field of endeavor, Pajak discloses wherein the one or more graphics processors (Paragraph [0046], FIG. 2 is a block diagram of an example of a computer system 100 capable of implementing embodiments according to the present invention.  In one embodiment, the integrated, end-to-end optimization framework for image reconstruction of the present invention may be implemented on a GPU 135; paragraph [0051], the GPU 135 generates pixel data for output images from rendering commands.  The physical GPU 135 can be configured as multiple virtual GPUs that may be used in parallel (concurrently) by a number of applications or processes executing in parallel) include a single instruction multiple thread (SIMT) architecture (Paragraph [0117], for example, a GPU processor can have 12 groups of 16 stream processors (cores).  Each core can execute a sequential thread, but typically the cores execute in a SIMT (Single Instruction, Multiple Thread) fashion; all cores in the same group can execute the same instruction at the same time).
COX and Pajak are analogous art because both pertain to parallel computing. COX discloses that the parallel processing subsystem includes one or more parallel processing units. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of COX and Pajak to support the single instruction multiple thread (SIMT) architecture. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of Pajak to obtain the invention as specified in the claim.

Regarding claim 27, the combination of COX in view of Parker in view of Pajak discloses everything claimed as applied above (see claim 26). 
However, COX does not specifically disclose wherein the SIMT architecture includes hardware multithreading.
In the similar field of endeavor, Pajak discloses wherein the SIMT architecture includes hardware multithreading (Paragraph [0117], GPUs typically have a number of multiprocessors, each of which execute in parallel with the others.  For example, a GPU processor can have 12 groups of 16 stream processors (cores).  Each core can execute a sequential thread, but typically the cores execute in a SIMT (Single Instruction, Multiple Thread) fashion; all cores in the same group can execute the same instruction at the same time).


Regarding claim 33, the combination of COX in view of Parker discloses everything claimed as applied above (see claim 28). 
However, COX does not specifically disclose wherein the one or more graphics processors include a single instruction multiple thread (SIMT) architecture.
In a similar field of endeavor, Pajak discloses wherein the one or more graphics processors (Paragraph [0046], FIG. 2 is a block diagram of an example of a computer system 100 capable of implementing embodiments according to the present invention. In one embodiment, the integrated, end-to-end optimization framework for image reconstruction of the present invention may be implemented on a GPU 135; paragraph [0051], the GPU 135 generates pixel data for output images from rendering commands. The physical GPU 135 can be configured as multiple virtual GPUs that may be used in parallel (concurrently) by a number of applications or processes executing in parallel) include a single instruction multiple thread (SIMT) architecture (Paragraph [0117], for example, a GPU processor can have 12 groups of 16 stream processors (cores). Each core can execute a sequential thread, but typically the cores execute in a SIMT (Single Instruction, Multiple Thread) fashion; all cores in the same group can execute the same instruction at the same time).
COX and Pajak are analogous art because both pertain to the use of parallel computing programs. COX discloses that the parallel processing subsystem includes one or more parallel processing units. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of COX and Pajak to support the single instruction multiple thread (SIMT) architecture. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of Pajak to obtain the invention as specified in the claim.

Regarding claim 34, the combination of COX in view of Parker in view of Pajak discloses everything claimed as applied above (see claim 33). 
However, COX does not specifically disclose wherein the SIMT architecture includes hardware multithreading.
In a similar field of endeavor, Pajak discloses wherein the SIMT architecture includes hardware multithreading (Paragraph [0117], GPUs typically have a number of multiprocessors, each of which execute in parallel with the others. For example, a GPU processor can have 12 groups of 16 stream processors (cores). Each core can execute a sequential thread, but typically the cores execute in a SIMT (Single Instruction, Multiple Thread) fashion; all cores in the same group can execute the same instruction at the same time).
COX and Pajak are analogous art because both pertain to the use of parallel computing programs. COX discloses that the parallel processing subsystem includes one or more parallel processing units. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of COX and Pajak to support the single instruction multiple thread (SIMT) architecture. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of Pajak to obtain the invention as specified in the claim.

Regarding claim 39, the combination of COX in view of Parker discloses everything claimed as applied above (see claim 35). 
However, COX does not specifically disclose wherein the one or more graphics processors include a single instruction multiple thread (SIMT) architecture.
In a similar field of endeavor, Pajak discloses wherein the one or more graphics processors (Paragraph [0046], FIG. 2 is a block diagram of an example of a computer system 100 capable of implementing embodiments according to the present invention. In one embodiment, the integrated, end-to-end optimization framework for image reconstruction of the present invention may be implemented on a GPU 135; paragraph [0051], the GPU 135 generates pixel data for output images from rendering commands. The physical GPU 135 can be configured as multiple virtual GPUs that may be used in parallel (concurrently) by a number of applications or processes executing in parallel) include a single instruction multiple thread (SIMT) architecture (Paragraph [0117], for example, a GPU processor can have 12 groups of 16 stream processors (cores). Each core can execute a sequential thread, but typically the cores execute in a SIMT (Single Instruction, Multiple Thread) fashion; all cores in the same group can execute the same instruction at the same time).
COX and Pajak are analogous art because both pertain to the use of parallel computing programs. COX discloses that the parallel processing subsystem includes one or more parallel processing units. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of COX and Pajak to support the single instruction multiple thread (SIMT) architecture. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of Pajak to obtain the invention as specified in the claim.

Regarding claim 40, the combination of COX in view of Parker in view of Pajak discloses everything claimed as applied above (see claim 39). 
However, COX does not specifically disclose wherein the SIMT architecture includes hardware multithreading.
In a similar field of endeavor, Pajak discloses wherein the SIMT architecture includes hardware multithreading (Paragraph [0117], GPUs typically have a number of multiprocessors, each of which execute in parallel with the others. For example, a GPU processor can have 12 groups of 16 stream processors (cores). Each core can execute a sequential thread, but typically the cores execute in a SIMT (Single Instruction, Multiple Thread) fashion; all cores in the same group can execute the same instruction at the same time).
COX and Pajak are analogous art because both pertain to the use of parallel computing programs. COX discloses that the parallel processing subsystem includes one or more parallel processing units. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of COX and Pajak to support the single instruction multiple thread (SIMT) architecture. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify COX according to the relied-upon teachings of Pajak to obtain the invention as specified in the claim.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Gregory J Tryder, can be reached at 571-270-7365. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/XILIN GUO/
Primary Examiner, Art Unit 2616