DETAILED ACTION
It is hereby acknowledged that the following papers have been received and placed of record in the file:
Amended Claims						-Receipt Date 11/20/2020
Applicant Arguments						-Receipt Date 11/20/2020		
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This office action is in response to the amendment filed on 11/20/2020. Claims 1-24 are pending. Claims 1, 9, and 17 are amended. Applicant's amendments to the claims have not overcome the previous objections. Examiner notes that Applicant’s amendments to Claim 17 do not properly address the previous objections, raise new objections, and raise issues of non-compliance since the amendments do not indicate deletions using strikethrough or brackets.

Response to Arguments
Applicant's arguments filed 11/20/2020 have been fully considered but they are not persuasive. 
Applicant submits:
“The cited reference does not anticipate an arrangement in which a general-purpose graphics processor unit (GPU) comprises a plurality of graphics processing cores, wherein at least one of the plurality of graphics processing cores to: host an extension mechanism within a register file of the general-purpose GPU, the register file comprising registers, wherein the 
However, Applicant’s arguments are not persuasive. Amended claim 1 recites:
wherein the extended register file to receive instructions of the stream of instructions that request two or more ports for servicing, wherein the extended register file utilizes only a single write port to service the instructions that request two or more ports
Nunamaker teaches instructions that may have three source operands and one destination operand that refer to registers in the vector register file (col 8 lines 53-56), and teaches that the vector register file has three read ports and one write port for the instruction's three source operands and one destination operand (col 10 lines 3-13). The office action maps the vector subunit and vector register file of Nunamaker to the extended register file. Nunamaker therefore teaches the extended register file to receive instructions of the stream of instructions that request two or more ports for servicing, since the vector subunit receives instructions that have three source operands and one destination operand, which require/request three read ports and one write port, i.e. two or more ports for servicing. Nunamaker likewise teaches that the extended register file utilizes only a single write port to service the instructions that request two or more ports, since a write port is used by the vector subunit for the destination operand of the instruction, i.e. to service the instruction.
Examiner suggests clarifying the claim language to indicate that the instruction may require two or more “write” ports which the extended register file services by using a single write port, as described in [0025] for example. 
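For illustration only, the port-counting rationale above can be sketched as follows. This is a minimal toy model, not a description of Nunamaker's hardware: the names Instruction, RegisterFile, ports_requested, and service are hypothetical, and the assumption that each source operand requests one read port and each destination operand requests one write port is taken from the mapping discussed above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instruction:
    """Hypothetical instruction holding register indices for its operands."""
    sources: Tuple[int, ...]
    destinations: Tuple[int, ...]

    def ports_requested(self) -> int:
        # Assumption: one read port per source, one write port per destination.
        return len(self.sources) + len(self.destinations)


class RegisterFile:
    """Toy register file with a fixed number of read and write ports."""

    def __init__(self, size: int, read_ports: int = 3, write_ports: int = 1):
        self.regs: List[int] = [0] * size
        self.read_ports = read_ports
        self.write_ports = write_ports

    def service(self, instr: Instruction, op) -> int:
        """Read the sources, apply op, write the result.

        Returns the number of write ports actually used.
        """
        if len(instr.sources) > self.read_ports:
            raise ValueError("more source operands than read ports")
        if len(instr.destinations) > self.write_ports:
            raise ValueError("more destination operands than write ports")
        values = [self.regs[i] for i in instr.sources]
        result = op(*values)
        for d in instr.destinations:
            self.regs[d] = result
        return len(instr.destinations)


rf = RegisterFile(size=8)
rf.regs[0:3] = [2, 3, 4]

# Three source operands and one destination operand.
fma = Instruction(sources=(0, 1, 2), destinations=(3,))
write_ports_used = rf.service(fma, lambda a, b, c: a * b + c)
```

In this sketch the instruction requests four ports in total but only one write port is used. An instruction with two destination operands would raise an error in the toy model, which reflects the distinction between requesting two or more ports and requiring two or more "write" ports.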

Claim Objections
Claims 1-3, 6, 8-9, and 17 are objected to because of the following informalities:  
Claim 1- “wherein at least one of the plurality of graphics processing cores to:” should be “wherein at least one of the plurality of graphics processing cores is to:”. Similar corrections should be made for other instances of infinitive phrases such as:
claims 1, 9, 17- “wherein the hosting of the extension mechanism to convert the register file”
claim 2- “wherein the GPU execution unit to host the extended register file”
claim 3- “wherein the arithmetic logic unit to host the extended register file”
claim 6- “wherein the GPU execution unit to”
claim 8- “wherein the arithmetic logic unit to”
Claim 9 line 20- “the extension execution unit of the extension mechanism” should be “an extension execution unit of the extension mechanism” since the extension mechanism is not previously introduced as having an extension execution unit
Claim 17 line 21- “the extension execution units of the extension mechanism” should be “an extension execution unit of the extension mechanism” since the extension mechanism is not previously introduced as having an extension execution unit
Claim 17 line 11- “executed by  extension execution unit that resides in the extended register file of an extension execution unit” should be “executed by an extension execution unit that resides in the extended register file of the execution unit”
Claim 17 line 14- “the extension execution unit” is a non-compliant amendment because it does not properly indicate the amendment being made as “the extension execution unit[[s]]”
Claim 17 line 21- “the extension execution units” is a non-compliant amendment because it does not properly indicate the amendment being made as “the extension execution units”
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 are rejected under 35 U.S.C. 103 as being unpatentable over Nunamaker et al. US 7,284,092 (hereinafter, Nunamaker) in view of Wu et al. US 8,400,458 (hereinafter, Wu).
Regarding claim 1, Nunamaker teaches:
1. An apparatus comprising:
an instruction cache to receive a stream of instructions (col 6 lines 4-8 and lines 29-30: L1 I-cache is an instruction cache that receives instructions, i.e. a stream of instructions, from L2 cache);
an instruction unit to execute the stream of instructions (col 5 lines 61-62 and col 6 lines 27-46: instruction unit 201 executes instructions from the L1-cache by decoding and dispatching instructions for execution);
 a processor unit comprising a plurality of processing cores (col 6 lines 13-17: a processor unit contains two processor cores 101), wherein at least one of the plurality of processing cores to: 
host an extension mechanism within a register file, the register file comprising registers, wherein the hosting of the extension mechanism to convert the register file to an extended register file (col 6 lines 55-64, col 9 lines 1-12, and Fig. 2: vector subunit 216 and register file 217 form an extended register file, where register file 305 comprises registers, and the circuitry placing/connecting the register file physically close to the vector subunit is an extension mechanism which the register file “hosts” to convert the register file to a register file that is closely connected to the vector subunit, i.e. to convert the register file to an extended register file), and wherein the extended register file to receive instructions of the stream of instructions that request two or more ports for servicing, wherein the extended register file utilizes only a single write port to service the instructions that request two or more ports (col 8 lines 53-56 and col 10 lines 3-13: an instruction may have three source operands and one destination operand which refers to registers in the register file, i.e. the instruction requests 3 read ports and 1 write which are two or more ports for servicing since each port of the register file provides a register for the corresponding operand in the instruction, and the vector subunit/extended register file uses only a single write port for the one destination operand to service the instruction that requests the 3 read ports and 1 write port); 
wherein the extended register file comprising an extension execution unit to, in response to determining that an instruction of the instructions comprises one or more tasks that can be executed by the extension execution unit, execute, inside the extended register file, one or more tasks relating to the instruction using the registers and using processing units of the extension execution unit of the extension mechanism (col 8 lines 32-41 and col 9 lines 1-25: vector execution subunit 216 is an extension execution unit that executes instructions in response to instruction unit determining that an instruction can be executed by it, and vector execution subunit executes the instruction using function units 301-304, i.e. processing units, and the execution is done inside the extended register file since the extended register file is the vector subunit placed close together with its register file); and 
wherein executing the one or more tasks inside the extended register file is to preclude reading or writing of the registers of the extended register file by components outside of the extended register file (col 9 lines 1-25: the function execution units inside the vector subunit of the extended register file execute the instruction, i.e. tasks relating to the instruction, by reading the register file 305 in the extended register file, thus the instruction is executed without external components outside the extended register file reading the registers); and
a shared memory communicatively coupled to the plurality of processing cores (col 6 lines 13-20: L2 cache is shared memory coupled to two processors, i.e. a plurality of processing cores).

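Although Nunamaker teaches that its CPU 101 may be a core of a plurality of cores (col 6 lines 13-20), Nunamaker does not explicitly teach the cores being graphics processing cores. That is, Nunamaker does not explicitly teach:
a general-purpose graphics processor unit (GPU) comprising a plurality of graphics processing cores;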
wherein at least one of the plurality of graphics processing cores to: 
host an extension mechanism within a register file of the general-purpose GPU
a shared memory communicatively coupled to the plurality of graphics processing cores
	However, Wu teaches a GPU having a plurality of cores and being used for general-purpose computing (col 1 line 61- col 2 line 10). In particular, Wu teaches:
a general-purpose graphics processor unit (GPU) comprising a plurality of graphics processing cores (col 1 line 61- col 2 line 10 and col 2 lines 28-33: GPU 140 comprises a plurality of graphics processing cores 176);
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the CPU cores of a CPU of Nunamaker to be GPU cores of a GPU as taught by Wu. This combination would teach the cores of Nunamaker being GPU cores to host an extension mechanism within a register file of a GPU and the L2 cache of Nunamaker would be coupled to the graphics processing cores. One of ordinary skill in the art would have been motivated to make this modification to run general business purpose applications at a lower capital equipment cost and at much higher power efficiency (Wu col 1 line 61-col 2 line 10).

	Regarding claim 2, Nunamaker in view of Wu teaches: 
2. (Previously Presented) The apparatus of claim 1, wherein the general-purpose GPU comprises a GPU execution unit, wherein the GPU execution unit to host the extended register file (Nunamaker col 5 lines 58-62: processor 101, which is a GPGPU in the combination, has execution unit 211, i.e. a GPU execution unit, which hosts the extended register file in vector subunit 216).

	Regarding claim 3, Nunamaker in view of Wu teaches:
3. (Currently amended) The apparatus of claim 1, wherein the general-purpose GPU comprises an arithmetic logic unit, wherein the arithmetic logic unit to host the extended register file (Nunamaker col 5 lines 58-62 and col 8 lines 32-41: processor 101, which is a GPGPU in the combination, has execution unit 211, i.e. an arithmetic logic unit since it performs arithmetic operations in its subunits, which hosts the extended register file in vector subunit 216).

	Regarding claim 4, Nunamaker in view of Wu teaches:
4. (Previously Presented) The apparatus of claim 1, further comprising: 
detection/reading unit to detect the instruction (Nunamaker col 6 lines 34-39: decode/dispatch unit 201 is a unit to detect the instruction); and 
processing/decision unit to process the one or more tasks, wherein processing of the one or more tasks includes managing one or more operations relating to contents of one or more of the registers, wherein the one or more operations include one or more of a comparison operation, a swapping operation, an arithmetic operation (Nunamaker col 9 lines 34-51: a function execution unit, i.e. processing/decision unit, performs arithmetic operations on register inputs, i.e. manages an arithmetic operation relating to contents of the input registers), and a decision-making operation.

	Regarding claim 5, Nunamaker in view of Wu teaches:
5. (Previously Presented) The apparatus of claim 4, further comprising an execution/forwarding unit to execute results associated with the one or more operations to complete performance of the one or more tasks relating to the instruction, wherein the execution/forwarding unit is further to facilitate communication of at least one of the results, the contents, and other relevant data within or between one or more of the extension mechanism, the extended register file, a processor execution unit, and an arithmetic logic unit (Nunamaker col 9 lines 19-28 and col 12 lines 49-53: logic 307 and 308 together form an execution/forwarding unit which selects the outputs of function execution units to write to the register file, i.e. to complete performance of tasks relating to an instruction and to facilitate communication of the results within the extended register file).

Claims 3, 6-9, 11-24 are rejected under 35 U.S.C. 103 as being unpatentable over Nunamaker et al. US 7,284,092 (hereinafter, Nunamaker) in view of Wu et al. US 8,400,458 (hereinafter, Wu), and Miller et al. US 6,405,303 (hereinafter, Miller).
	Regarding claim 3, Nunamaker in view of Wu teaches:
3. (Previously Presented) The apparatus of claim 1, further comprising an application processor having an arithmetic logic unit (Nunamaker col 4 lines 38-42 and col 6 lines 42-46: processor 101 is an application processor since it executes instructions of programs/applications stored in memory, processor 101 includes ALU 213 of execution unit 211), 
	Although Nunamaker teaches the execution unit having an ALU and execution subunits having their own register files (col 6 lines 42-64), under a narrower interpretation, Nunamaker does not teach the ALU hosting the execution subunits. That is, Nunamaker does not explicitly teach:
wherein the arithmetic logic unit to host the extended register file.
	However, Miller teaches decoder/execution units which are able to identify simple instructions that they can execute and which forward complex instructions to further functional units (Miller, Abstract). That is, Miller teaches:
a logic unit to host a functional unit (col 6 lines 51-62: decode/execution unit executes simple instructions and forwards unexecuted instructions to functional units).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the execution unit of Nunamaker to decode and execute simple instructions in its ALU as taught by Miller and to forward unexecuted instructions to the further subunits to be executed. This combination would teach:
wherein the arithmetic logic unit to host the extended register file (ALU of Nunamaker sending unexecuted instructions to the vector subunit would be considered “hosting” the vector subunit/extended register file).
One of ordinary skill in the art would have been motivated to make this modification to simplify the hardware of decoder/execution units when supporting variable length instructions (Miller col 7 lines 6-10).

	Regarding claim 6, Nunamaker in view of Wu teaches: 
6. (Previously Presented) The apparatus of claim 2, wherein the processor to determine whether the instruction is qualified to be processed by the extension mechanism inside the extended register file, wherein if the instruction is qualified, the one or more tasks relating to the instruction are performed by the extension mechanism inside the extended register file (Nunamaker col 6 lines 34-39, col 7 lines 51-61, col 8 lines 36-38 and lines 49-41: decode/dispatch unit 203 determines the operations to be performed for an instruction and dispatches that instruction to the vector subunit 216/extended register file if the instruction is for the subunit 216, i.e. if the instruction is qualified to be processed by the function units inside 216).
	Nunamaker in view of Wu does not teach:
the GPU execution unit to determine whether the instruction is qualified
	However, Miller teaches:
a unit to determine whether an instruction is qualified (col 6 lines 51-62: decode/execution unit determines if an instruction is qualified to be forwarded to a functional unit or is a simple instruction to be executed by the execution unit).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the execution unit of Nunamaker in view of Wu to decode and execute simple instructions in its ALU as taught by Miller and to forward unexecuted instructions to the further subunits to be executed. This combination would teach:
the GPU execution unit to determine whether the instruction is qualified (execution unit 211 of Nunamaker determining whether instructions are qualified to be processed by the vector subunit 216).
One of ordinary skill in the art would have been motivated to make this modification to simplify the hardware of decoder/execution units when supporting variable length instructions (Miller col 7 lines 6-10).

	Regarding claim 7, Nunamaker in view of Wu and Miller teaches: 
7. (Previously Presented) The apparatus of claim 6, wherein if the instruction is not qualified, the one or more tasks relating to the instruction are performed by a processing engine of the GPU execution unit, wherein the processing engine of the GPU execution unit includes an arithmetic/logic unit located outside the extended register file (Miller col 6 lines 51-62: if an instruction is a simple instruction, i.e. not qualified, the decode/execution unit performs the instruction, i.e. the execution unit 211 of Nunamaker would perform a simple instruction using a processing engine of the execution unit).

	Regarding claim 8, Nunamaker in view of Wu and Miller teaches:
8. (Original) The apparatus of claim 3, wherein the arithmetic logic unit to determine whether the instruction is qualified to be processed by the extension mechanism inside the extended register file, wherein if the instruction is qualified, the one or more tasks relating to the instruction are performed by the extension mechanism inside the extended register file, and wherein if the instruction is not qualified, the one or more tasks relating to the instruction are performed by a processing engine of the arithmetic logic unit, wherein the arithmetic logic unit- based processing engine includes an arithmetic/logic engine located outside the extended register file (Nunamaker col 6 lines 42-46 and Miller col 6 lines 51-62: the ALU of Nunamaker would determine if the instructions are qualified to be processed by the vector subunit 216, and if so the instructions are sent to 216, or if the instructions are simple instructions which are not qualified to be processed by the vector subunit 216, and in this case the instructions are performed by the ALU 213 which is outside the vector subunit 216).
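For illustration only, the qualified/not-qualified routing relied on in the combination above can be sketched as follows. This is a minimal sketch under stated assumptions: the opcode names, the VECTOR_OPCODES set, and the functions is_qualified, vector_subunit, alu, and dispatch are hypothetical and are not taken from Nunamaker or Miller. The sketch only shows one unit deciding whether an instruction is forwarded to a functional unit or executed locally as a simple instruction.

```python
from typing import List, Tuple

# Hypothetical set of opcodes treated as "qualified" for the vector subunit.
VECTOR_OPCODES = {"vadd", "vmul"}


def is_qualified(opcode: str) -> bool:
    """Decide whether an instruction may be processed by the vector subunit."""
    return opcode in VECTOR_OPCODES


def vector_subunit(opcode: str, a: List[int], b: List[int]) -> List[int]:
    """Stand-in for the unit located inside the extended register file."""
    if opcode == "vadd":
        return [x + y for x, y in zip(a, b)]
    if opcode == "vmul":
        return [x * y for x, y in zip(a, b)]
    raise ValueError(f"unsupported vector opcode: {opcode}")


def alu(opcode: str, a: int, b: int) -> int:
    """Stand-in for the processing engine outside the extended register file."""
    if opcode == "add":
        return a + b
    if opcode == "sub":
        return a - b
    raise ValueError(f"unsupported scalar opcode: {opcode}")


def dispatch(opcode: str, a, b) -> Tuple[str, object]:
    """Route the instruction and report which unit performed it."""
    if is_qualified(opcode):
        # Qualified: forwarded to the unit inside the extended register file.
        return "vector_subunit", vector_subunit(opcode, a, b)
    # Not qualified: a simple instruction performed by the ALU itself.
    return "alu", alu(opcode, a, b)
```

For example, dispatch("vadd", [1, 2], [3, 4]) is routed to the vector subunit stand-in, while dispatch("sub", 5, 3) is performed by the ALU stand-in.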

	Regarding claim 9, Nunamaker teaches: 
9. (Currently Amended) A method comprising: 
hosting an extension mechanism within a register file of a general-purpose graphics processing unit (GPU), the register file comprising registers, wherein the hosting of the extension mechanism to convert the register file to an extended register file (col 6 lines 55-64, col 9 lines 1-12, and Fig. 2: vector subunit 216 and register file 217 form an extended register file, where register file 305 comprises registers, and the circuitry placing/connecting the register file physically close to the vector subunit is an extension mechanism which the register file “hosts” to convert the register file to a register file that is closely connected to the vector subunit, i.e. to convert the register file to an extended register file); 
receiving, in an instruction cache, a stream of instructions (col 6 lines 4-8 and lines 29-30: L1 I-cache is an instruction cache that receives instructions, i.e. a stream of instructions, from L2 cache);
initiating, in an instruction unit, execution of the stream of instructions (col 5 lines 61-62 and col 6 lines 27-46: instruction unit 201 executes instructions from the L1-cache by decoding and dispatching instructions for execution); 
receiving an instruction of the stream of instructions for execution (col 6 lines 42-56: execution unit 211 receives instructions for execution); 
determining whether the instruction comprises one or more tasks to be executed by an extension execution unit that resides in the extended register file of the execution unit (col 6 lines 34-39, col 7 lines 51-61, col 8 lines 36-38 and lines 49-41: decode/dispatch unit 203 determines whether an instruction may be executed by the vector subunit 216, i.e. an extended register file); and 
in response to a determination that the instruction comprises the one or more tasks to be executed by the extension execution unit:
receiving the instruction at the extended register file, wherein the instruction requests two or more ports for servicing (col 8 lines 53-56 and col 10 lines 3-13: an instruction may have three source operands and one destination operand which refers to registers in the register file, i.e. the instruction requests 3 read ports and 1 write which are two or more ports for servicing since each port of the register file provides a register for the corresponding operand in the instruction);
utilizing, by the extended register file, only a single write port to service the instruction that requests two or more ports (col 8 lines 53-56 and col 10 lines 3-13: the vector subunit/extended register file uses only a single write port for the one destination operand to service the instruction that requests the 3 read ports and 1 write port); and
processing the instruction inside the extended register file, wherein the one or more tasks are performed using the registers and using processing units of the extension execution unit of the extension mechanism (col 8 lines 32-41 and col 9 lines 1-25: vector execution subunit 216 is an extension execution unit that executes instructions in response to instruction unit determining that an instruction can be executed by it, and vector execution subunit executes the instruction using function units 301-304, i.e. processing units, and the execution is done inside the extended register file since the extended register file is the vector subunit placed close together with its register file), wherein executing the one or more tasks inside the extended register file is to preclude reading or writing of the registers of the extended register file by components outside of the extended register file (col 9 lines 1-25: the function execution units inside the vector subunit of the extended register file executes the instruction, i.e. tasks relating to the instruction, by reading the register file 305 in the extended register file, thus the instruction is executed without external components outside the extended register file reading the registers).
Although Nunamaker teaches the execution unit having an ALU and execution subunits having their own register files (col 6 lines 42-64), Nunamaker does not teach the ALU determining instructions to be executed by the execution subunits. Further, although Nunamaker teaches that its CPU 101 may be a core of a plurality of cores (col 6 lines 13-20) Nunamaker does not explicitly teach the cores being graphics processing cores. That is, Nunamaker does not explicitly teach:
receiving, in the general-purpose GPU comprising a plurality of graphics processing cores, an instruction of the stream of instructions for execution; 
determining, in an execution unit, whether the instruction comprises one or more tasks to be executed by an extension execution unit
	However, Wu teaches a GPU having a plurality of cores and being used for general-purpose computing (col 1 line 61- col 2 line 10). In particular, Wu teaches:
a general-purpose graphics processor unit (GPU) comprising a plurality of graphics processing cores (col 1 line 61- col 2 line 10 and col 2 lines 28-33: GPU 140 comprises a plurality of graphics processing cores 176);
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the CPU cores of a CPU of Nunamaker to be GPU cores of a GPU as taught by Wu. This combination would teach the cores of Nunamaker being GPU cores to host an extension mechanism within a register file of a GPU. One of ordinary skill in the art would have been motivated to make this modification to run general business purpose applications at a lower capital equipment cost and at much higher power efficiency (Wu col 1 line 61-col 2 line 10).

	Further, Miller teaches:
a unit, in the execution unit, to determine whether an instruction may be executed (Miller col 6 lines 51-62: decode/execution unit determines if an instruction is to be forwarded to a functional unit or is a simple instruction to be executed by the execution unit).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the execution unit of Nunamaker to decode and execute simple instructions in its ALU as taught by Miller and to forward unexecuted instructions to the further subunits to be executed. One of ordinary skill in the art would have been motivated to make this modification to simplify the hardware of decoder/execution units when supporting variable length instructions (Miller col 7 lines 6-10).

	Regarding claim 11, Nunamaker in view of Wu and Miller teaches:
11. (Previously Presented) The method of claim 9, further comprising: facilitating an arithmetic logic unit of the general-purpose GPU to host the extended register file (Nunamaker col 4 lines 38-42 and col 6 lines 42-46: processor 101 is an application processor since it executes instructions of programs/applications stored in memory, processor 101 includes ALU 213 of execution unit 211; Miller col 6 lines 51-62: decode/execution unit, i.e. ALU of Nunamaker, executes simple instructions and forwards unexecuted instructions to functional units, i.e. hosts the vector subunit/extended register file).

	Regarding claim 12, Nunamaker in view of Wu and Miller teaches:
12. (Original) The method of claim 9, further comprising: 
detecting the instruction (Nunamaker col 6 lines 34-39: decode/dispatch unit 201 is logic to detect the instruction); and 
processing the one or more tasks, wherein processing of the one or more tasks includes managing one or more operations relating to contents of one or more of the registers, wherein the one or more operations include one or more of a comparison operation, a swapping operation, an arithmetic operation (Nunamaker col 9 lines 34-51: a function execution unit, i.e. processing/decision unit, performs arithmetic operations on register inputs, i.e. manages an arithmetic operation relating to contents of the input registers), and a decision-making operation.

Regarding claim 13, Nunamaker in view of Wu and Miller teaches:
13. (Previously Presented) The method of claim 12, further comprising: 
executing results associated with the one or more operations to complete the performance of the one or more tasks relating to the instruction (Nunamaker col 9 lines 19-28 and col 12 lines 49-53: logic 307 and 308 is execution/forwarding logic which selects the outputs of function execution units to write to the register file, i.e. to complete performance of tasks relating to an instruction); and 
facilitating communication of at least one of the results, the contents, and other relevant data within or between one or more of the extension mechanism, the extended register file, the execution unit, and an arithmetic logic unit (Nunamaker col 9 lines 19-28 and col 12 lines 49-53: logic 307 and 308 is execution/forwarding logic which facilitates communication of the results within the extended register file).

	Regarding claim 14, Nunamaker in view of Wu and Miller teaches:
14. (Previously Presented) The method of claim 9, further comprising: determining, by the execution unit, whether the instruction is qualified to be processed by the extension mechanism inside the extended register file, wherein if the instruction is qualified, the one or more tasks relating to the instruction are performed by the extension mechanism inside the extended register file (Nunamaker col 6 lines 34-39, col 7 lines 51-61, col 8 lines 36-38 and lines 49-41, and Miller col 6 lines 51-62: decode/dispatch unit 203, which is included in the execution unit when modified by Miller, determines the operations to be performed for an instruction and dispatches that instruction to the vector subunit 216/extended register file if the instruction is for the subunit 216, i.e. if the instruction is qualified to be processed by the function units inside 216).

	Regarding claim 15, Nunamaker in view of Wu and Miller teaches:
15. (Original) The method of claim 14, wherein if the instruction is not qualified, the one or more tasks relating to the instruction are performed by a processing engine of the execution unit, wherein the execution unit-based processing engine includes an arithmetic/logic unit located outside the extended register file (Nunamaker col 6 lines 42-46 and Miller col 6 lines 51-62: the ALU of Nunamaker would determine if the instructions are qualified to be processed by the vector subunit 216, and if so the instructions are sent to 216, or if the instructions are simple instructions which are not qualified to be processed by the vector subunit 216, and in this case the instructions are performed by the ALU 213 which is outside the vector subunit 216).

	Regarding claim 16, Nunamaker in view of Wu and Miller teaches:
16. (Original) The method of claim 11, further comprising: determining, by the arithmetic logic unit, whether the instruction is qualified to be processed by the extension mechanism inside the extended register file, wherein if the instruction is qualified, the one or more tasks relating to the instruction are performed by the extension mechanism inside the extended register file, and wherein if the instruction is not qualified, the one or more tasks relating to the instruction are performed by a processing engine of the arithmetic logic unit, wherein the arithmetic logic unit- based processing engine includes an arithmetic/logic engine located outside the extended register file (Nunamaker col 6 lines 42-46 and Miller col 6 lines 51-62: the ALU of Nunamaker would determine if the instructions are qualified to be processed by the vector subunit 216, and if so the instructions are sent to 216, or if the instructions are simple instructions which are not qualified to be processed by the vector subunit 216, and in this case the instructions are performed by the ALU 213 which is outside the vector subunit 216).

	Regarding claim 17, Nunamaker teaches:
17. (Currently Amended) At least one machine-readable storage medium comprising a plurality of instructions, executed on a computing device, to facilitate the computing device to: 
host an extension mechanism within a register file of a general-purpose graphics processing unit (GPU), the register file comprising registers, wherein the hosting of the extension mechanism to convert the register file to an extended register file (col 6 lines 55-64, col 9 lines 1-12, and Fig. 2: vector subunit 216 and register file 217 form an extended register file, where register file 305 comprises registers, and the circuitry placing/connecting the register file physically close to the vector subunit is an extension mechanism which the register file “hosts” to convert the register file to a register file that is closely connected to the vector subunit, i.e. to convert the register file to an extended register file)
receive, in an instruction cache, a stream of instructions (col 6 lines 4-8 and lines 29-30: L1 I-cache is an instruction cache that receives instructions, i.e. a stream of instructions, from L2 cache);
initiate, in an instruction unit, execution of the stream of instructions (col 5 lines 61-62 and col 6 lines 27-46: instruction unit 201 executes instructions from the L1 I-cache by decoding and dispatching instructions for execution);
receive an instruction of the stream of instructions for execution (col 6 lines 42-56: execution unit 211 receives instructions for execution); 
determine whether the instruction comprises one or more tasks to be executed by extension execution unit that resides in the extended register file of an extension execution unit (col 6 lines 34-39, col 7 lines 51-61, col 8 lines 36-38 and lines 49-41: decode/dispatch unit 203 determines whether an instruction may be executed by the vector subunit 216, i.e. an extended register file); and 
in response to a determination that the instruction comprises the one or more tasks to be executed by the extension execution unit:
receive the instruction at the extended register file, wherein the instruction requests two or more ports for servicing (col 8 lines 53-56 and col 10 lines 3-13: an instruction may have three source operands and one destination operand which refer to registers in the register file, i.e. the instruction requests 3 read ports and 1 write port, which are two or more ports for servicing since each port of the register file provides a register for the corresponding operand in the instruction);
utilize, by the extended register file, only a single write port to service the instruction that requests two or more ports (col 8 lines 53-56 and col 10 lines 3-13: the vector subunit/extended register file uses only a single write port for the one destination operand to service the instruction that requests the 3 read ports and 1 write port); and
process the instruction inside the extended register file, wherein the one or more tasks are performed using the registers and using processing units of the extension execution unit of the extension mechanism (col 8 lines 32-41 and col 9 lines 1-25: vector execution subunit 216 is an extension execution unit that executes instructions in response to the instruction unit determining that an instruction can be executed by it, and the vector execution subunit executes the instruction using function units 301-304, i.e. processing units, and the execution is done inside the extended register file since the extended register file is the vector subunit placed close together with its register file), wherein executing the one or more tasks inside the extended register file is to preclude reading or writing of the registers of the extended register file by components outside of the extended register file (col 9 lines 1-25: the function execution units inside the vector subunit of the extended register file execute the instruction, i.e. tasks relating to the instruction, by reading the register file 305 in the extended register file, thus the instruction is executed without external components outside the extended register file reading the registers).
Although Nunamaker teaches the execution unit having an ALU and execution subunits having their own register files (col 6 lines 42-64), Nunamaker does not teach the ALU determining instructions to be executed by the execution subunits. Further, although Nunamaker teaches that its CPU 101 may 
receive, in the general-purpose GPU comprising a plurality of graphics processing cores, an instruction of the stream of instructions for execution; 
determine, in an execution unit, whether the instruction comprises one or more tasks to be executed by an extension execution unit
	However, Wu teaches a GPU having a plurality of cores and being used for general-purpose computing (col 1 line 61- col 2 line 10). In particular, Wu teaches:
a general-purpose graphics processor unit (GPU) comprising a plurality of graphics processing cores (col 1 line 61- col 2 line 10 and col 2 lines 28-33: GPU 140 comprises a plurality of graphics processing cores 176);
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the CPU cores of a CPU of Nunamaker to be GPU cores of a GPU as taught by Wu. This combination would teach the cores of Nunamaker being GPU cores to host an extension mechanism within a register file of a GPU. One of ordinary skill in the art would have been motivated to make this modification to run general business purpose applications at a lower capital equipment cost and at much higher power efficiency (Wu col 1 line 61-col 2 line 10).
	Further, Miller teaches decoder/execution units which are able to identify simple instructions that they can execute themselves and which forward complex instructions to further functional units (Miller, Abstract). That is, Miller teaches:
a unit, in the execution unit, to determine whether an instruction may be executed (col 6 lines 51-62: decode/execution unit determines if an instruction is to be forwarded to a functional unit or is a simple instruction to be executed by the execution unit).
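For illustration only, the determination and port usage mapped above for claim 17 can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Nunamaker, Wu, or Miller: the `Instruction` type, the opcode names in `VECTOR_OPCODES`, and the functions `ports_requested` and `dispatch` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical set of opcodes treated as "qualified" for the vector
# subunit (the extended register file). The opcode names are assumptions.
VECTOR_OPCODES = {"vmadd", "vperm"}


@dataclass(frozen=True)
class Instruction:
    opcode: str
    sources: tuple  # names of source operand registers
    dest: str       # name of the single destination register


def ports_requested(instr):
    """Return (read_ports, write_ports) implied by the operands.

    Each source operand needs one read port; the one destination
    operand needs one write port.
    """
    return len(instr.sources), 1


def dispatch(instr):
    """Decide where the instruction is executed.

    Qualified instructions go to the vector subunit; simple
    instructions stay in the ALU outside the vector subunit.
    """
    if instr.opcode in VECTOR_OPCODES:
        return "vector_subunit"
    return "alu"


vec = Instruction("vmadd", ("v1", "v2", "v3"), "v4")
simple = Instruction("add", ("r1", "r2"), "r3")
```

Under these assumptions, `vec` requests three read ports and one write port, i.e. two or more ports, while only the single write port is used for its destination operand, and `simple` is handled by the ALU rather than being forwarded.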


	Regarding claim 18, Nunamaker in view of Wu and Miller teaches:
18. (Previously Presented) The machine-readable storage medium of claim 17, wherein the computing device is further to: 
facilitate the execution unit to host the extended register file (col 5 lines 58-62: processor 101 has execution unit 211 which hosts the vector subunit 216/extended register file; col 6 lines 46-50: the vector subunit 216 is a Vector/SIMD Multimedia Extension, indicating that the processor is capable of multimedia processing which would include types of graphics processing).

	Regarding claim 19, Nunamaker in view of Wu and Miller teaches: 
19. (Previously Presented) The machine-readable storage medium of claim 17, wherein the computing device is further to: 
facilitate an arithmetic logic unit of the general-purpose GPU to host the extended register file (Nunamaker col 4 lines 38-42, col 6 lines 42-46, and Miller col 6 lines 51-62: processor 101 is an application processor since it executes instructions of programs/applications stored in memory, and processor 101 includes ALU 213 of execution unit 211, which sends instructions to the vector subunit, i.e. hosts the extended register file).

	Regarding claim 20, Nunamaker in view of Wu and Miller teaches:
20. (Original) The machine-readable storage medium of claim 17, wherein the computing device is further to: 
detect the instruction (Nunamaker col 6 lines 34-39: decode/dispatch unit 203 is logic to detect the instruction); and 
process the one or more tasks, wherein processing of the one or more tasks includes managing one or more operations relating to contents of one or more of the registers, wherein the one or more operations include one or more of a comparison operation, a swapping operation, an arithmetic operation (Nunamaker col 9 lines 34-51: a function execution unit, i.e. processing/decision unit, performs arithmetic operations on register inputs, i.e. manages an arithmetic operation relating to contents of the input registers), and a decision-making operation.

	Regarding claim 21, Nunamaker in view of Wu and Miller teaches:
21. (Previously Presented) The machine-readable storage medium of claim 20, wherein the computing device is further to: 
execute results associated with the one or more operations to complete the performance of the one or more tasks relating to the instruction (Nunamaker col 9 lines 19-28 and col 12 lines 49-53: logic 307 and 308 is execution/forwarding logic which selects the outputs of function execution units to write to the register file, i.e. to complete performance of tasks relating to an instruction); and 
facilitate communication of at least one of the results, the contents, and other relevant data within or between one or more of the extension mechanism, the extended register file, the execution unit, and an arithmetic logic unit (Nunamaker col 9 lines 19-28 and col 12 lines 49-53: logic 307 and 308 is execution/forwarding logic which facilitates communication of the results within the extended register file).

	Regarding claim 22, Nunamaker in view of Wu and Miller teaches:
22. (Original) The machine-readable storage medium of claim 18, wherein the computing device is further to: determine, by the execution unit, whether the instruction is qualified to be processed by the extension mechanism inside the extended register file, wherein if the instruction is qualified, the one or more tasks relating to the instruction are performed by the extension mechanism inside the extended register file (Nunamaker col 6 lines 34-39, col 7 lines 51-61, col 8 lines 36-38 and lines 49-41, and Miller col 6 lines 51-62: decode/dispatch unit 203, which is included in the execution unit when modified by Miller, determines the operations to be performed for an instruction and dispatches that instruction to the vector subunit 216/extended register file if the instruction is for the subunit 216, i.e. if the instruction is qualified to be processed by the function units inside 216).

	Regarding claim 23, Nunamaker in view of Wu and Miller teaches:
23. (Original) The machine-readable storage medium of claim 22, wherein if the instruction is not qualified, the one or more tasks relating to the instruction are performed by a processing engine of the execution unit, wherein the execution unit-based processing engine includes an arithmetic/logic unit located outside the extended register file  (Miller col 6 lines 51-62: if an instruction is a simple instruction, i.e. not qualified, the decode/execution unit performs the instruction, i.e. the execution unit 211 of Nunamaker would perform a simple instruction using a processing engine of the execution unit).

	Regarding claim 24, Nunamaker in view of Wu and Miller teaches:
24. (Original) The machine-readable storage medium of claim 19, wherein the computing device is further to: determine, by the arithmetic logic unit, whether the instruction is qualified to be processed by the extension mechanism inside the extended register file, wherein if the instruction is qualified, the one or more tasks relating to the instruction are performed by the extension mechanism inside the extended register file, and wherein if the instruction is not qualified, the one or more tasks relating to the instruction are performed by a processing engine of the arithmetic logic unit, wherein the arithmetic logic unit-based processing engine includes an arithmetic/logic engine located outside the extended register file (Nunamaker col 6 lines 42-46 and Miller col 6 lines 51-62: the ALU of Nunamaker would determine whether the instructions are qualified to be processed by the vector subunit 216; if so, the instructions are sent to 216, and if the instructions are simple instructions which are not qualified to be processed by the vector subunit 216, the instructions are performed by the ALU 213, which is outside the vector subunit 216).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571)270-1476.  The examiner can normally be reached on Monday - Friday, 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached on (571)272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.A./Examiner, Art Unit 2183
/William B Partridge/Primary Examiner, Art Unit 2183