DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 7, 16, and 18-23 have been amended.
Claims 1-23 have been examined.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on December 8, 2020 has been entered.

Information Disclosure Statement
The Applicant's submission of the Information Disclosure Statements dated January 7, 2021 and January 25, 2021 is acknowledged by the examiner and the cited references have, except where otherwise indicated, been considered in the examination of the claims now pending. The citation in the January 25, 2021 IDS of the communication regarding the EP application does not comply with 37 CFR 1.98(a)(2), which requires that a legible copy of documents other than U.S. patents and U.S. applications be submitted, as no copy of the document was received. Copies of the PTOL-1449s initialed and dated by the Examiner are attached to the instant office action.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


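The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
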
Claims 7 and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the Applicant, regards as the invention.
Claim 7 recites, at lines 3-4, “an operation from a set of operations including a 16-bit and a 32-bit floating point operation.” This is an improper Markush claim, as explained at MPEP § 2173.05(h), which states, “A Markush grouping is a closed group of alternatives, i.e., the selection is made from a group "consisting of" (rather than "comprising" or "including") the alternative members. Abbott Labs., 334 F.3d at 1280, 67 USPQ2d at 1196. If a Markush grouping requires a material selected from an open list of alternatives (e.g., selected from the group "comprising" or "consisting essentially of" the recited alternatives), the claim should generally be rejected under 35 U.S.C. 112(b)  as indefinite because it is unclear what other alternatives are intended to be encompassed by the claim.” For purposes of examination, the claim is interpreted as though the claim read, “an operation from a set of operations consisting of a 16-bit and a 32-bit floating point operation.” Claim 16 has similar language and is similarly rejected. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 9, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over “NVIDIA’s Next Generation CUDA Compute Architecture: Fermi” (hereinafter referred to as “Fermi”) in view of NVIDIA Tesla P100 (hereinafter referred to as “Tesla”). 
Regarding claim 1, Fermi discloses:
a general-purpose graphics processing unit comprising: a streaming multiprocessor having a single instruction, multiple thread (SIMT) architecture including hardware multithreading, wherein the streaming multiprocessor comprises: (Fermi discloses, at page 7, second paragraph, a GPU comprising 16 streaming multiprocessors. As described at page 4, Fermi is based on the SIMT execution model. Fermi also discloses, at page 7, first paragraph, that the threads are executed by cores, which are hardware, thus disclosing hardware multithreading.);
a first processing block including a first processing core having a first floating-point data path and a second processing core having a first integer data path, the first integer data path independent of the first floating-point data path and the first processing block additionally including a first register file coupled with the first processing core and the second processing core, wherein the first integer data path is to enable execution of a first instruction and the first floating-point data path is to enable execution of a second instruction, the first instruction to be executed concurrently with the second instruction (Fermi discloses, at page 10, first paragraph, two groups, each having 16 cores, a scheduler, and an instruction dispatch unit. This discloses the first processing block and first and second processing cores. As disclosed at page 8, second paragraph, each core has a floating point unit (path) and an integer unit (path) to execute instructions, which discloses the first floating point path and the first integer path. Fermi discloses, at the Figure on page 8 a register file coupled to both blocks of cores. As disclosed at page 10, second paragraph, integer and floating point instructions can be concurrently executed.); 
a second processing block including a third processing core having a second floating-point data path and a fourth processing core having a second integer data path, the second integer data path independent of the second floating-point data path…wherein the second integer data path is to enable execution of a third instruction and the second floating-point data path is to enable execution of a fourth instruction, the third instruction to be executed concurrently with the fourth instruction (Fermi discloses, at page 10, first paragraph, two groups, each having 16 cores, a scheduler, and an instruction dispatch unit. This discloses the second processing block and third and fourth processing cores. As disclosed at page 8, second paragraph, each core has a floating point unit (path) and an integer unit (path) to execute instructions, which discloses the second floating point path and the second integer path. As disclosed at page 10, second paragraph, integer and floating point instructions can be concurrently executed.); and
a memory coupled with the first processing block and the second processing block (Fermi discloses, at the Figure on page 8 and description on pages 10-11, a shared memory coupled to the blocks.).
Fermi does not explicitly disclose the second processing block additionally including a second register file coupled with the third processing core and the fourth processing core, the first register file different from the second register file.
However, in the same field of endeavor (e.g., GPUs) Tesla discloses:
the second processing block additionally including a second register file coupled with the third processing core and the fourth processing core, the first register file different from the second register file (Tesla discloses, at pages 12-13, Figure 8, two register files, each associated with different groups of cores.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi to include two register files, as taught by Tesla, because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143. 

Regarding claim 2, Fermi discloses the elements of claim 1, as discussed above. Fermi also discloses:
wherein the memory is shared between the first processing block and the second processing block (Fermi discloses, at the Figure on page 8 and description on pages 10-11, a shared memory coupled to the blocks.).

Regarding claim 3, Fermi discloses the elements of claim 2, as discussed above. Fermi also discloses:
(Fermi discloses, at the Figure on page 8 and description on pages 10-11, the shared memory coupled to the blocks includes cache accessible to the cores.).

Regarding claim 4, Fermi discloses the elements of claim 1, as discussed above. Fermi also discloses:
the streaming multiprocessor additionally comprising one or more hardware schedulers to schedule a first group of threads to at least the first processing core and the third processing core (Fermi discloses, at the Figure on page 8 and description on pages 10-11, dual schedulers to schedule threads to the cores.).

Regarding claim 5, Fermi discloses the elements of claim 4, as discussed above. Fermi also discloses:
the one or more hardware schedulers additionally to schedule a second group of threads to at least the second processing core and the fourth processing core (Fermi discloses, at the Figure on page 8 and description on pages 10-11, dual schedulers to schedule threads to both groups of cores.).

Regarding claim 6, Fermi discloses the elements of claim 5, as discussed above. Fermi also discloses:
the first group of threads is associated with the first instruction; and the second group of threads is associated with the second instruction (Fermi discloses, at page 5, that instructions are associated with warps, which are, as disclosed at page 7, groups of threads.).

Regarding claim 7, Fermi discloses the elements of claim 1, as discussed above. Fermi also discloses:
wherein the first processing core is to perform a 64-bit floating-point operation and the third processing core is to perform an operation selected from a set of operations including…a 32-bit floating-point operation (Fermi discloses, at page 8, second paragraph, each core can perform single or double precision floating point operations, which discloses the first core performing double (64 bit) and the third core performing single (32 bit).).
Fermi does not explicitly disclose performing a 16-bit operation.
However, in the same field of endeavor (e.g., GPUs) Tesla discloses:
performing 16-bit operations (Tesla discloses, at page 12, performing 16-bit operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi to include 16-bit operations, as taught by Tesla, because doing so enables increased operation throughput. See Tesla, page 12. 

Regarding claim 9, Fermi discloses:
a graphics processor system comprising: a graphics double data rate memory; and a general-purpose graphics processor coupled with the graphics double data rate memory via two or more memory controllers, the general-purpose graphics processor comprising a hardware multithreading compute unit having a single instruction, multiple thread (SIMT) architecture, wherein the hardware multithreading compute unit comprises: (Fermi discloses, at page 7, second paragraph, a GPU comprising 16 streaming multiprocessors and GDDR5 DRAM, which is high speed DDR memory. The memory includes 6 partitions coupled via an interface. Each portion of the interface dedicated to a respective partition is considered a respective controller. As described at page 4, Fermi is based on the SIMT execution model. Fermi also discloses, at page 7, first paragraph, that the threads are executed by cores, which are hardware, thus disclosing hardware multithreading.);
a first register file and a first hardware scheduler, the first register file and the first hardware scheduler each coupled with a first processing core and a second processing core, the first processing core having a first floating-point data path and the second processing core having a first integer data path, the first integer data path independent of the first floating-point data path, wherein the first integer data path is to enable execution of a first instruction and the first floating-point data path is to enable execution of a second instruction, the first instruction to be executed concurrently with the second instruction (Fermi discloses, at the Figure on page 8, a register file coupled to both blocks of cores. Fermi also discloses, at the Figure on page 8 and description on pages 10-11, dual schedulers to schedule threads to the cores. Fermi also discloses, at page 10, first paragraph, two groups, each having 16 cores, a scheduler, and an instruction dispatch unit. This discloses the first processing block and first and second processing cores. As disclosed at page 8, second paragraph, each core has a floating point unit (path) and an integer unit (path) to execute instructions, which discloses the first floating point path and the first integer path. As disclosed at page 10, second paragraph, integer and floating point instructions can be concurrently executed.);
…a second hardware scheduler different from the first hardware scheduler…the second hardware scheduler…coupled with a third processing core and a fourth processing core, the third processing core having a second floating-point data path and the fourth processing core having a second integer data path, the second integer data path independent of the second floating-point data path, wherein the second integer data path is to enable execution of a third instruction and the second floating-point data path is to enable execution of a fourth instruction, the third instruction to be executed concurrently with the fourth instruction (Fermi discloses, at the Figure on page 8, a register file coupled to both blocks of cores. As described on page 6, first paragraph, each thread has registers. This discloses logically dividing the register file into multiple different register files such that each block has its own register file. Fermi also discloses, at the Figure on page 8 and description on pages 10-11, dual schedulers to schedule threads to the cores. Fermi also discloses, at page 10, first paragraph, two groups, each having 16 cores, a scheduler, and an instruction dispatch unit. This discloses the second processing block and third and fourth processing cores. As disclosed at page 8, second paragraph, each core has a floating point unit (path) and an integer unit (path) to execute instructions, which discloses the second floating point path and the second integer path. As disclosed at page 10, second paragraph, integer and floating point instructions can be concurrently executed.); and
an internal memory coupled with the first processing core, the second processing core, the third processing core, and the fourth processing core (Fermi discloses, at the Figure on page 8 and description on pages 10-11, a shared memory coupled to the blocks. The shared memory is internal.).
Fermi does not explicitly disclose a second register file different from the first register file coupled with the third and fourth processing cores.
However, in the same field of endeavor (e.g., GPUs) Tesla discloses:
 (Tesla discloses, at pages 12-13, Figure 8, two register files, each associated with different groups of cores.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi to include two register files, as taught by Tesla, because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143. 

Regarding claim 18, Fermi discloses:
a method executed on a general-purpose graphics processing unit, the method comprising: scheduling a first group of threads via a first hardware scheduler to a first processing core and a third processing core via a second hardware scheduler, the first group of threads to perform a set of floating-point operations, the first processing core and the third processing core included within a floating-point data path of a first processing block of the general-purpose graphics processing unit, the general-purpose graphics processing unit comprising a hardware multithreading compute unit having a single instruction, multiple thread (SIMT) architecture, wherein the first group of threads is associated with a first instruction (Fermi discloses, at page 7, second paragraph, using a GPU. As disclosed at the Figure on page 8 and description on pages 10-11, the GPU includes dual schedulers to schedule groups of threads to groups of cores. Fermi also discloses, at page 10, first paragraph, two groups, each having 16 cores, a scheduler, and an instruction dispatch unit. This discloses the first processing block. As disclosed at page 8, second paragraph, each core has a floating point unit (path). As described at page 4, Fermi is based on the SIMT execution model. Fermi also discloses, at page 7, first paragraph, that the threads are executed by cores, which are hardware, thus disclosing hardware multithreading. As disclosed at page 6, first paragraph, the cores execute kernels, which include instructions.);
scheduling a second group of threads to a second processing core via the first hardware scheduler and a fourth processing core via the second hardware scheduler, the second hardware scheduler different from the first hardware scheduler, the second group of threads to perform a set of… (Fermi discloses, at the Figure on page 8 and description on pages 10-11, that the GPU includes dual schedulers (i.e., two different schedulers) to schedule groups of threads to groups of cores. Fermi also discloses, at page 10, first paragraph, two groups, each having 16 cores, a scheduler, and an instruction dispatch unit. This discloses the second processing block. As disclosed at page 8, second paragraph, each core has an integer unit (path). As disclosed at page 6, first paragraph, the cores execute kernels, which include instructions.);
executing a first thread associated with the first instruction using processing cores of the floating-point data path wherein executing the first thread includes allocating a first register and a third register within a first register file associated with the first processing block (Fermi discloses, at page 8, second paragraph, each core has a floating point unit (path). Fermi also discloses, at the Figure on page 8, a register file coupled to both blocks of cores, which discloses allocating registers as needed for execution of instructions, including a first and third register. As disclosed at page 6, first paragraph, the cores execute kernels, which include instructions, which discloses the first thread of the first instruction being executed by the floating point data path.); and 
executing a second thread associated with the second instruction using processing cores of the integer data path, the second thread executed independently and contiguously with the first thread, wherein executing the second thread includes allocating a second register and a fourth register (Fermi discloses, at page 8, second paragraph, each core has an integer unit (path). Fermi also discloses, at the Figure on page 8, a register file coupled to both blocks of cores, which discloses allocating registers as needed for execution of instructions, including a second and fourth register. As disclosed at page 6, first paragraph, the cores execute kernels, which include instructions, which discloses the second thread of the second instruction being executed by the integer data path. Fermi also discloses, at page 10, second paragraph, integer and floating point instructions can be concurrently executed, which discloses independent and contiguous thread execution.).

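Fermi does not explicitly disclose the aforementioned second and fourth registers are within a second register file associated with the second processing block, the first register file different from the second register file.
However, in the same field of endeavor (e.g., GPUs) Tesla discloses: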
a second register file associated with the second processing block, the first register file different from the second register file (Tesla discloses, at pages 12-13, Figure 8, two register files, each associated with different groups of cores.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi to include two register files, as taught by Tesla, because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143. 

Claims 8 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Fermi in view of Tesla in view of US Publication No. 2012/0233444 by Stephens et al. (hereinafter referred to as “Stephens”).
Regarding claim 8, Fermi discloses the elements of claim 1, as discussed above. Fermi also discloses:
wherein the second processing core is to perform a 32-bit integer operation and the fourth processing core is to perform one or more…integer operations (Fermi discloses, at page 8, last paragraph, each core can perform 32 bit integer operations.).
Fermi does not explicitly disclose 8 bit integer operations.
However, in the same field of endeavor (e.g., data processing) Stephens discloses:
8 bit operations (Stephens discloses, at ¶ [0059], 8 bit operands.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi’s cores to support 8 bit operations, as disclosed by Stephens, because it is known to do so in order to support more efficient operation by not using a larger operand size than is actually needed. Stephens, ¶ [0005].

Regarding claim 21, Fermi discloses:
scheduling a first group of threads to a first processing core via a first hardware scheduler and a third processing core via a second hardware scheduler, the first group of threads to perform a set of floating-point operations, the first processing core and the third processing core included within a floating-point data path of a first processing block of a general-purpose graphics processing unit, the general-purpose graphics processing unit comprising a hardware multithreading compute unit having a single instruction, multiple thread (SIMT) architecture, wherein the first group of threads is associated with a first instruction (Fermi discloses, at page 7, second paragraph, using a GPU. As disclosed at the Figure on page 8 and description on pages 10-11, the GPU includes dual schedulers to schedule groups of threads to groups of cores. Fermi also discloses, at page 10, first paragraph, two groups, each having 16 cores, a scheduler, and an instruction dispatch unit. This discloses the first processing block. As disclosed at page 8, second paragraph, each core has a floating point unit (path). As described at page 4, Fermi is based on the SIMT execution model. Fermi also discloses, at page 7, first paragraph, that the threads are executed by cores, which are hardware, thus disclosing hardware multithreading. As disclosed at page 6, first paragraph, the cores execute kernels, which include instructions.);
scheduling a second group of threads to a second processing core via the first hardware scheduler and a fourth processing core via the second hardware scheduler, the second hardware scheduler different from the first hardware scheduler, the second group of threads to perform a set of integer operations, the second processing core and the fourth processing core included within an integer data path of a second processing block of the general-purpose graphics processing unit, wherein the second group of threads is associated with a second instruction (Fermi discloses, at the Figure on page 8 and description on pages 10-11, that the GPU includes dual schedulers (i.e., two different schedulers) to schedule groups of threads to groups of cores. Fermi also discloses, at page 10, first paragraph, two groups, each having 16 cores, a scheduler, and an instruction dispatch unit. This discloses the second processing block. As disclosed at page 8, second paragraph, each core has an integer unit (path). As disclosed at page 6, first paragraph, the cores execute kernels, which include instructions.);
(Fermi discloses, at page 8, second paragraph, each core has a floating point unit (path). Fermi also discloses, at the Figure on page 8, a register file coupled to both blocks of cores, which discloses allocating registers as needed for execution of instructions, including a first and third register. As disclosed at page 6, first paragraph, the cores execute kernels, which include instructions, which discloses the first thread of the first instruction being executed by the floating point data path.); and 
executing a second thread associated with the second instruction using processing cores of the integer data path, the second thread executed independently and contiguously with the first thread, wherein executing the second thread includes allocating a second register and a fourth register (Fermi discloses, at page 8, second paragraph, each core has an integer unit (path). Fermi also discloses, at the Figure on page 8, a register file coupled to both blocks of cores, which discloses allocating registers as needed for execution of instructions, including a second and fourth register. As disclosed at page 6, first paragraph, the cores execute kernels, which include instructions, which discloses the second thread of the second instruction being executed by the integer data path. Fermi also discloses, at page 10, second paragraph, integer and floating point instructions can be concurrently executed, which discloses independent and contiguous thread execution.).
Fermi does not explicitly disclose the aforementioned second and fourth registers are within a second register file associated with the second processing block, the first register file different from the second register file.
However, in the same field of endeavor (e.g., GPUs) Tesla discloses:
a second register file associated with the second processing block, the first register file different from the second register file (Tesla discloses, at pages 12-13, Figure 8, two register files, each associated with different groups of cores.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi to include two register files, as taught by Tesla, because doing so merely entails a combination of prior art elements (cited above) according to known methods to yield predictable results, which is an exemplary rationale to support a conclusion of obviousness, as per MPEP § 2143.
Fermi does not explicitly disclose a non-transitory machine-readable medium storing instructions to cause one or more processors to perform operations. 
However, in the same field of endeavor (e.g., data processing) Stephens discloses:
a non-transitory machine-readable medium storing instructions to cause one or more processors to perform operations (Stephens discloses, at ¶ [0048], storing instructions in non-transitory form, which discloses machine readable media.). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi’s GPU to include non-transitory storage of instructions, as disclosed by Stephens, because doing so represents combining prior art elements according to known methods to yield predictable results. That is, one of ordinary skill in the art would have been motivated to make such a combination because all of the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods with no change in their respective functions and the combination would have yielded predictable results to one of ordinary skill in the art before the effective filing date of the claimed invention.

Claims 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Fermi in view of Tesla in view of US Publication No. 2015/0160872 by Chen (hereinafter referred to as “Chen”).
Regarding claim 10, Fermi, as modified, discloses the elements of claim 9, as discussed above. Fermi, as modified, also discloses:
wherein the internal memory is to be shared between the first processing core, the second processing core, the third processing core, and the fourth processing core (Fermi discloses, at the Figure on page 8 and description on pages 10-11, a shared memory coupled to the blocks. The shared memory is internal.).
Fermi does not explicitly disclose GDDR6 memory.
However, in the same field of endeavor (e.g., data processing) Chen discloses:
GDDR6 memory (Chen discloses, at ¶ [0024], GDDR6 memory.).


Regarding claim 11, Fermi, as modified, discloses the elements of claim 10, as discussed above. Fermi, as modified, also discloses:
wherein the internal memory includes a data cache accessible by the first processing core, the second processing core, the third processing core, and the fourth processing core (Fermi discloses, at the Figure on page 8 and description on pages 10-11, the shared memory coupled to the blocks includes cache accessible to the cores.).

Claims 12-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Fermi in view of Tesla in view of “The Compute Architecture of Intel® Processor Graphics Gen9” by Stephen Junkins (as disclosed by Applicant and hereinafter referred to as “Junkins”).
Regarding claim 12, Fermi, as modified, discloses the elements of claim 9, as discussed above. Fermi, as modified, also discloses:
wherein the first hardware scheduler is to schedule threads of a first thread group to the first processing core and threads of a second thread group to the second processing core, wherein the second hardware scheduler is to schedule threads of the first thread group to the third processing core and threads of the second thread group to the fourth processing core (Fermi discloses, at the Figure on page 8 and description on pages 10-11, dual schedulers, where the schedulers schedule threads from two warps to the groups of cores. As disclosed at page 6, first paragraph, the cores execute kernels, which execute across sets of threads, and as disclosed at page 18, multiple kernels can be executed concurrently, which discloses multiple sets of threads being concurrently scheduled.).
Fermi does not explicitly disclose wherein execution context for threads within the first thread group and the second thread group is to be maintained on-chip during execution.
However, in the same field of endeavor (e.g., GPUs) Junkins discloses:
execution context for threads within the first thread group and the second thread group is to be maintained on-chip during execution (Junkins discloses, at § 5.3, page 7, storing context in a dedicated register file that is, as shown in Figure 3, on-chip.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi’s GPU to include the dedicated on chip context storage disclosed by Junkins because doing so would have been obvious to try. That is, Fermi discloses context switching, which necessarily involves context information that must be stored somewhere. Fermi is silent about where the GPU stores the context information. However, there are a finite number of identified, predictable potential solutions, which could have been pursued with a reasonable expectation of success. One of the options one of ordinary skill in the art could have pursued with a reasonable expectation of success would have been to store the context information on chip, as evidenced by Junkins.

Regarding claim 13, Fermi, as modified, discloses the elements of claim 12, as discussed above. Fermi, as modified, also discloses:
wherein the first processing core includes a first functional unit associated with the first floating-point data path and the second processing core includes a second functional unit associated with the first integer data path (Fermi discloses, at page 8, second paragraph, each core includes a floating point unit (first functional unit) and an integer unit (second functional unit).).

Regarding claim 14, Fermi, as modified, discloses the elements of claim 13, as discussed above. Fermi, as modified, also discloses:
the first functional unit to perform a floating-point operation and the second functional unit to perform an integer operation independently of the first functional unit (Fermi discloses, at page 10, first paragraph, that warps execute independently, which means that the cores executing the respective warps operate independently.).

Regarding claim 15, Fermi, as modified, discloses the elements of claim 14, as discussed above. Fermi, as modified, also discloses:
wherein the first functional unit is to perform one or more operations selected from a set of operations including…a 32-bit floating-point operation (Fermi discloses, at page 8, second paragraph, each core can perform single precision (32 bit) floating point operations.).

Regarding claim 16, Fermi, as modified, discloses the elements of claim 14, as discussed above. Fermi, as modified, also discloses:
wherein the second functional unit is to perform one or more of an 8-bit, 16-bit, and a 32-bit integer operation (Fermi discloses, at page 8, last paragraph, each core can perform 32 bit integer operations.).
Fermi does not explicitly disclose performing a 16-bit operation.
However, in the same field of endeavor (e.g., GPUs) Tesla discloses:
performing 16-bit operations (Tesla discloses, at page 12, performing 16-bit operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi to include 16-bit operations, as taught by Tesla, because doing so enables increased operation throughput. See Tesla, page 12. 

Regarding claim 17, Fermi, as modified, discloses the elements of claim 14, as discussed above. Fermi, as modified, also discloses:
wherein the third processing core includes a third functional unit to perform a 64-bit floating-point operation (Fermi discloses, at page 8, second paragraph, each core can perform double precision (64 bit) floating point operations.).

Regarding claim 19, Fermi discloses the elements of claim 18, as discussed above. Fermi also discloses:
storing first data associated with the first instruction and second data associated with the second instruction within a memory coupled with the first processing block and the second processing block (Fermi discloses, at the Figure on page 8 and description on pages 10-11, threads storing data in a shared memory coupled to the blocks.).
Fermi does not explicitly disclose maintaining execution context of the first and the second thread within hardware of the general-purpose graphics processing unit during execution of the first instruction and the second instruction.
However, in the same field of endeavor (e.g., GPUs) Junkins discloses:
maintaining execution context of the first and the second thread within hardware of the general-purpose graphics processing unit during execution of the first instruction and the second instruction (Junkins discloses, at § 5.3, page 7, storing context in a dedicated register file that is, as shown in Figure 3, on-chip.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi’s GPU to include the dedicated on-chip context storage disclosed by Junkins because doing so would have been obvious to try. That is, Fermi discloses context switching, which necessarily involves context information that must be stored somewhere. Fermi is silent about where the GPU stores the context information. However, there are a finite number of identified, predictable potential solutions, which could have been pursued with a reasonable expectation of success. One of the options one of ordinary skill in the art could have pursued with a reasonable expectation of success would have been to store the context information on-chip, as evidenced by Junkins.

Regarding claim 20, Fermi, as modified, discloses the elements of claim 19, as discussed above. Fermi also discloses:
allocating the first register for use by the first processing core; allocating the second register for use by the second processing core; allocating the third register for use by the third processing core; and allocating the fourth register for use by the fourth processing core (Fermi also discloses, at the Figure on page 8, a register file coupled to both blocks of cores, which discloses allocating registers as needed for execution of instructions, including first, second, third, and fourth registers.).

Claims 22 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Fermi in view of Tesla in view of Stephens in view of Junkins.
Regarding claim 22, Fermi, as modified, discloses the elements of claim 21, as discussed above. Fermi, as modified, also discloses:
storing first data associated with the first instruction and second data associated with second instruction within a memory coupled with the first processing block and the second processing block (Fermi discloses, at the Figure on page 8 and description on pages 10-11, storing data, which includes data associated with instructions, in a shared memory coupled to the blocks.).
Fermi does not explicitly disclose maintaining execution context of the first and the second thread within hardware of the general-purpose graphics processing unit during execution of the first instruction and the second instruction.
However, in the same field of endeavor (e.g., GPUs) Junkins discloses:
maintaining execution context of the first and the second thread within hardware of the general-purpose graphics processing unit during execution of the first instruction and the second instruction (Junkins discloses, at § 5.3, page 7, storing context in a dedicated register file that is, as shown in Figure 3, on-chip.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Fermi’s GPU to include the dedicated on-chip context storage disclosed by Junkins because doing so would have been obvious to try. That is, Fermi discloses context switching, which necessarily involves context information that must be stored somewhere. Fermi is silent about where the GPU stores the context information. However, there are a finite number of identified, predictable potential solutions, which could have been pursued with a reasonable expectation of success. One of the options one of ordinary skill in the art could have pursued with a reasonable expectation of success would have been to store the context information on-chip, as evidenced by Junkins.

Regarding claim 23, Fermi, as modified, discloses the elements of claim 22, as discussed above. Fermi also discloses:
allocating the first register for use by the first processing core; allocating the second register for use by the second processing core; allocating the third register for use by the third processing core; and allocating the fourth register for use by the fourth processing core (Fermi also discloses, at the Figure on page 8, a register file coupled to both blocks of cores, which discloses allocating registers as needed for execution of instructions, including first, second, third, and fourth registers.).

Response to Arguments
On pages 13-14 of the response filed December 8, 2020 (“response”), the Applicant argues that Fermi does not teach the claimed first and second register files. 
These remarks have been fully considered and are deemed persuasive. Please see above for new grounds of rejection of the amended claims. Specifically, Tesla discloses separate register files for separate groups of cores.

On page 14 of the response the Applicant argues that Fermi does not teach 16-bit floating point operations. 
These remarks have been fully considered and, in light of the claim amendments presented in the response, are deemed persuasive. Please see above for new grounds of rejection of the amended claims. Specifically, Tesla discloses 16-bit FP operations.

On page 14 of the response the Applicant argues, “the dual warp schedulers of Fermi function differently than detailed in claim 18. Specifically, Fermi, pg. 10 states, "Fermi's dual warp scheduler selects two warps, and issues one instruction from each warp to a group of sixteen cores, sixteen load/store units, or four SFUs." (emphasis added). Thus, one scheduler is associated with one group of cores. In claim 18, as previously presented, each of the two hardware schedulers explicitly schedule to both groups.”


On page 15 of the response the Applicant argues that claim 18 and the remaining claims are allowable for similar reasons to those presented above.
Though these remarks have been fully considered, the Examiner respectfully disagrees. The remarks and rejections presented above apply similarly to these claims.

Conclusion
The following prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
US 20160062947 by Chetlur discloses the integer unit of an SM generating an image tile while the FP unit works on another tile. 
Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance by Meng discloses addressing idle processing cycles in a SIMT architecture.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677.  The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAWN DOMAN/Examiner, Art Unit 2183