DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments and Arguments
The present Office action is in response to Applicant’s amendment/request for reconsideration submitted on August 16, 2022, hereinafter “Reply”, after the non-final rejection of March 10, 2022, hereinafter “Non-Final Rejection”.  In the Reply, claims 1, 3-4, 8, 10-11, 15, and 17-18 were amended, and no claims were cancelled or added.  Claims 1-20 remain pending in the application.
The Reply has been fully considered, with the examiner’s response set forth below.
(1)	In view of the amendments to the claims, the claim objections are withdrawn.
(2)	Applicant’s arguments on pp. 7-9 with respect to independent claims 1, 8, and 15 and dependent claims thereof have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
(3)	Another iteration of claim analysis has been made due to the amendments to the claims in the Reply.  Refer to the corresponding sections of the claim analysis below for details.  

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 3-4, 8, 10-11, 15, and 17-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Choi (US 2017/0060588 A1), hereinafter “Choi”.

Regarding claim 1, Choi discloses:
An apparatus configured to: 
receive an operation targeting a processing device; (FIGs. 1-8, 11-12; [0051], “[0053] The host processor 10 corresponds to hardware that processes various operations.”, [0055], “[0057] In the example of FIG. 1, the memory controller 20 is hardware that accesses [operation] the memory 30 according to memory requests issued by the host processor 10 and performs control operations to load or store various kinds of data from or into the memory 30. Here, the data that is loaded from the memory 30 is possibly, for example, host instructions, source codes, and data associated with various types of operations.”, [0058], “[0059] For example, the internal processor 31 [processing device] corresponds to a processor-in-memory (PIM). In this example, the PIM is a device configured to process data of the memory array 35 without having latency by using dedicated pins to connect a processor implemented as hardware logic to the memory array 35. … For example, the memory 30 having the internal processor 31 such as the PIM is also possibly referred to as an intelligent random access memory (RAM), computational RAM, or smart memory.”, [0060]-[0061], “[0062] When an offloading instruction is included in the host instructions, the host processor 10 offloads processing of an operation corresponding to the offloading instruction to the internal processor 31 [processing device] of the memory 30. The term “offloading” as used herein denotes that the internal processor 31 [processing device] performs processing of a certain operation instead of the host processor 10.”)
convert the operation into at least one memory access command, responsive to one or more conditions for applying an optimization not satisfied.  (FIGs. 1-8, 11-12; [0051], [0053], [0055], “[0057] In the example of FIG. 1, the memory controller 20 is hardware that accesses [operation] the memory 30 according to memory requests issued by the host processor 10 and performs control operations to load or store various kinds of data from or into the memory 30.”, [0058]-[0062], “[0074] The determiner 120 determines [one or more conditions, such as] whether an offloading instruction is included in the host instructions. For example, the offloading instruction refers to an instruction for performing a certain type of operation, and examples of the certain type of operation include an SQRT operation, a reciprocal operation, a log operation, an exponential operation, a power series operation, and a trigonometric operation as discussed further, above. For example, [an optimization is applied when using] the PIM 310 of FIG. 2 is implemented as a dedicated processor for processing the certain type of operation corresponding to the offloading instruction.”, [0109], “[0110] In operation S1101, the compiler analyzes a given source code. Here, the source code to be analyzed possibly refers to the code shown in the examples, FIG. 4A or 6”, “[0111] In operation S1102, the compiler determines whether [one or more conditions, such as] a certain type of operation is included in the source code as a result of the analysis of the source code. Here, the certain type of operation corresponds to an SFU including, but is not limited to, an SQRT operation, a reciprocal operation, a log operation, an exponential operation, a power series operation, a trigonometric operation, and so on.”, “[0114] In operation S1105 [applied for the optimization], when the use of the PIM 310 is determined as being more efficient, the compiler generates a code for using the PIM 310. For example, the compiler generates the assembly code 500 described in the example of FIG. 5 or the assembly code 700 described in the example of FIG. 7, but is not limited thereto.”, “[0115] In operation S1106, when [one or more conditions, such as] the certain type of operation is not included [not satisfied] in the source code, the PIM is not available in the computing system 1, or the use of the PIM 310 is inefficient, the compiler generates a normal code, and merely compiles the instruction in a standard, alternative manner. For example, the compiler generate a code for calling an existing software library in order to process a certain operation, such as the assembly code 420 described in FIG. 4B and the assembly code 620 described in FIG. 6B.”;  In operation S1106 of FIG. 11, when [one or more conditions, such as] the certain type of operation is not included [not satisfied] in the source code, the compiler generates a code for calling an existing software library in order to process a certain operation, such as the assembly code 420 described in FIG. 4B and the assembly code 620 described in FIG. 6B.  The assembly code in FIGs. 4B and 6B contains “LOAD R2 [R7]”, which requires at least one memory access command -- a memory access command that reads a value of a[i] at a location in the memory 30 specified by address R7 and another memory access command that loads the value of a[i] into a location in the memory 30 specified by address R2, before a SQRT operation is performed and a result b[i] is stored in the memory 30.)
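For illustration only, the compile-time flow of FIG. 11 of Choi, as characterized above, can be sketched roughly as follows.  The sketch is written in Python; the function name, parameter names, and return labels are hypothetical and do not appear in Choi.  The step label for the efficiency determination is not quoted above and is therefore left unnamed.

```python
def select_code_path(has_certain_op: bool,
                     pim_available: bool,
                     pim_more_efficient: bool) -> str:
    """Return "pim" (operation S1105) or "normal" (operation S1106).

    Hypothetical helper: each argument stands for the outcome of one
    determination described in the quoted paragraphs [0111]-[0115].
    """
    # S1102: is a certain type of operation (e.g., SQRT) in the source code?
    if not has_certain_op:
        return "normal"  # S1106: generate normal code
    # S1103: is the PIM available in the computing system?
    if not pim_available:
        return "normal"  # S1106
    # Efficiency determination described in [0114]-[0115].
    if not pim_more_efficient:
        return "normal"  # S1106
    return "pim"         # S1105: generate code for using the PIM


if __name__ == "__main__":
    # Certain operation absent: conditions not satisfied, normal code path.
    print(select_code_path(False, True, True))
    # All determinations favorable: code for using the PIM is generated.
    print(select_code_path(True, True, True))
```

Under this reading, any one unfavorable determination leads to operation S1106, which is the path that the analysis above maps to the claimed "conditions ... not satisfied".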

Regarding claim 8, the claimed method comprises substantially the same steps or elements as those in claim 1.  Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 1 above.

Further regarding claim 8, Choi further discloses:
receiving, by a control unit, an operation targeting a processing device.  (FIGs. 1-8, 11-12; [0051], “[0053] The host processor 10 [control unit] corresponds to hardware that processes various operations.”, [0055], “[0057] In the example of FIG. 1, the memory controller 20 is hardware that accesses [operation] the memory 30 according to memory requests issued by the host processor 10 [control unit] and performs control operations to load or store various kinds of data from or into the memory 30. Here, the data that is loaded from the memory 30 is possibly, for example, host instructions, source codes, and data associated with various types of operations.”, [0058], “[0059] For example, the internal processor 31 [processing device] corresponds to a processor-in-memory (PIM). In this example, the PIM is a device configured to process data of the memory array 35 without having latency by using dedicated pins to connect a processor implemented as hardware logic to the memory array 35. … For example, the memory 30 having the internal processor 31 such as the PIM is also possibly referred to as an intelligent random access memory (RAM), computational RAM, or smart memory.”, [0060]-[0061], “[0062] When an offloading instruction is included in the host instructions, the host processor 10 [control unit] offloads processing of an operation corresponding to the offloading instruction to the internal processor 31 [processing device] of the memory 30. The term “offloading” as used herein denotes that the internal processor 31 [processing device] performs processing of a certain operation instead of the host processor 10 [control unit].”)

Regarding claim 3, Choi discloses the apparatus as recited in claim 1.

Choi further discloses:
wherein the memory access command is not generated for the operation responsive to determining that the one or more conditions are satisfied.  (FIGs. 1-8, 11-12; [0051], [0053], [0055], [0057]-[0062], [0074], “[0085] In the example of FIG. 5, the instruction “PIM_INTRINSIC” is an offloading instruction, which is an instruction for offloading the SQRT operation to the internal processor 31 that is, the PIM 310, and processing the SQRT operation in the internal processor 31, that is, the PIM 310.”, [0088]-[0089], “[0111] In operation S1102, the compiler determines whether [one or more conditions, such as] a certain type of operation is included in the source code as a result of the analysis of the source code.”, “[0112] In operation S1103, when [one or more conditions, such as] the certain type of operation is included [satisfied] in the source code, the compiler determines whether the PIM is available in the computing system in which the source code is to be executed.”, “[0114] In operation S1105, when the use of the PIM 310 is determined as being more efficient, the compiler generates a code for using the PIM 310. For example, the compiler generates the assembly code 500 described in the example of FIG. 5 or the assembly code 700 described in the example of FIG. 7, but is not limited thereto.”;  Through S1101-S1105 of FIG. 11, when [one or more conditions, such as] the certain type of operation is included [satisfied] in the source code, the compiler generates a code for using the PIM 310.  This code contains PIM_INTRINSIC, but not the memory access commands such as the LOAD command in the assembly code of FIGs. 4B and 6B.)

Regarding claims 10 and 17, the claimed method and the claimed system comprise substantially the same steps or elements as those in claim 3.  Accordingly, the claims are also rejected for the same reasons as set forth for those in claim 3 above.

Regarding claim 4, Choi discloses the apparatus as recited in claim 1.

Choi further discloses:
convert the operation into N commands responsive to determining that the one or more conditions are not satisfied, wherein N is a positive integer greater than one, and wherein the N commands comprise the memory access command and an arithmetic command; and  (FIGs. 1-8, 11-12; [0051], [0053], [0055], “[0057] In the example of FIG. 1, the memory controller 20 is hardware that accesses [operation] the memory 30 according to memory requests issued by the host processor 10 and performs control operations to load or store various kinds of data from or into the memory 30.”, [0058]-[0062], “[0074] The determiner 120 determines [one or more conditions, such as] whether an offloading instruction is included in the host instructions. For example, the offloading instruction refers to an instruction for performing a certain type of operation, and examples of the certain type of operation include an SQRT operation, a reciprocal operation, a log operation, an exponential operation, a power series operation, and a trigonometric operation as discussed further, above. For example, [an optimization is applied when using] the PIM 310 of FIG. 2 is implemented as a dedicated processor for processing the certain type of operation corresponding to the offloading instruction.”, [0109], “[0110] In operation S1101, the compiler analyzes a given source code. Here, the source code to be analyzed possibly refers to the code shown in the examples, FIG. 4A or 6”, “[0111] In operation S1102, the compiler determines whether [one or more conditions, such as] a certain type of operation is included in the source code as a result of the analysis of the source code. Here, the certain type of operation corresponds to an SFU including, but is not limited to, an SQRT operation, a reciprocal operation, a log operation, an exponential operation, a power series operation, a trigonometric operation, and so on.”, “[0114] In operation S1105 [applied for the optimization], when the use of the PIM 310 is determined as being more efficient, the compiler generates a code for using the PIM 310. For example, the compiler generates the assembly code 500 described in the example of FIG. 5 or the assembly code 700 described in the example of FIG. 7, but is not limited thereto.”, “[0115] In operation S1106, when [one or more conditions, such as] the certain type of operation is not included [not satisfied] in the source code, the PIM is not available in the computing system 1, or the use of the PIM 310 is inefficient, the compiler generates a normal code, and merely compiles the instruction in a standard, alternative manner. For example, the compiler generate a code for calling an existing software library in order to process a certain operation, such as the assembly code 420 described in FIG. 4B and the assembly code 620 described in FIG. 6B.”;  In operation S1106 of FIG. 11, when [one or more conditions, such as] the certain type of operation is not included [not satisfied] in the source code, the compiler generates a code for calling an existing software library in order to process a certain operation, such as the assembly code 420 described in FIG. 4B and the assembly code 620 described in FIG. 6B.  The assembly code in FIGs. 4B and 6B contains “LOAD R2 [R7]”, which requires a memory access command that reads a value of a[i] at a location in the memory 30 specified by address R7, and a SQRT operation [arithmetic command], where N is considered to be 2 in this case.)
convert the operation into N-1 commands responsive to determining that the one or more conditions are satisfied, wherein the N-1 commands comprise only the arithmetic command.  (FIGs. 1-8, 11-12; [0051], [0053], [0055], “[0057] In the example of FIG. 1, the memory controller 20 is hardware that accesses [operation] the memory 30 according to memory requests issued by the host processor 10 and performs control operations to load or store various kinds of data from or into the memory 30.”, [0058]-[0062], [0074], “[0085] In the example of FIG. 5, the instruction “PIM_INTRINSIC” is an offloading instruction, which is an instruction for offloading the SQRT operation to the internal processor 31 that is, the PIM 310, and processing the SQRT operation in the internal processor 31, that is, the PIM 310.”, [0088]-[0089], “[0114] In operation S1105, when [one or more conditions, such as] the use of the PIM 310 is determined as [satisfied] being more efficient, the compiler generates a code for using the PIM 310. For example, the compiler generates the assembly code 500 described in the example of FIG. 5 or the assembly code 700 described in the example of FIG. 7, but is not limited thereto.”;  In operation S1105 of FIG. 11, when [one or more conditions, such as] the use of the PIM 310 is determined as [satisfied] being more efficient, the compiler generates a code for using the PIM 310.  This code contains only the PIM_INTRINSIC [arithmetic command], but not the memory access commands such as the LOAD command in the assembly code in FIGs. 4B and 6B.  The PIM_INTRINSIC [arithmetic command] is for performing the SQRT operation in the internal processor 31.  N is considered to be 2 in this case.)
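For illustration only, the N versus N-1 command counts relied upon in the mapping above can be sketched as follows.  The sketch is written in Python; the function name and the use of plain strings to stand for commands are hypothetical.  The strings “LOAD R2 [R7]” and “PIM_INTRINSIC” are taken from the passages cited above, while “SQRT” is only a placeholder for the SQRT operation and is not an instruction quoted from Choi.

```python
from typing import List

N = 2  # value of N adopted in the analysis above


def convert_operation(conditions_satisfied: bool) -> List[str]:
    """Hypothetical conversion of one operation into a command list."""
    if conditions_satisfied:
        # Satisfied (S1105 path): only the arithmetic command remains,
        # which the analysis maps to PIM_INTRINSIC.
        return ["PIM_INTRINSIC"]
    # Not satisfied (S1106 path): a memory access command plus an
    # arithmetic command ("SQRT" is a placeholder label).
    return ["LOAD R2 [R7]", "SQRT"]


if __name__ == "__main__":
    print(len(convert_operation(False)))  # number of commands, not satisfied
    print(len(convert_operation(True)))   # number of commands, satisfied
```

With N taken as 2, the not-satisfied path yields two commands and the satisfied path yields one, which is the relationship recited in claim 4.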

Regarding claims 11 and 18, the claimed method and the claimed system comprise substantially the same steps or elements as those in claim 4.  Accordingly, the claims are also rejected for the same reasons as set forth for those in claim 4 above.

Regarding claim 15, Choi discloses:
A system comprising: 
a processing in memory (PIM) device; and  (FIGs. 1-8, 11-12; “[0059] For example, the internal processor 31 [processing in memory (PIM) device] corresponds to a processor-in-memory (PIM).”)
a memory controller coupled to the PIM device, wherein the memory controller is configured to:  (FIGs. 1-8, 11-12; [0051], [0055], “[0057] In the example of FIG. 1, the memory controller 20 is hardware that accesses the memory 30 according to memory requests issued by the host processor 10 and performs control operations to load or store various kinds of data from or into the memory 30.”, “[0059] For example, the internal processor 31 [processing in memory (PIM) device] corresponds to a processor-in-memory (PIM). … the memory 30 having the internal processor 31 such as the PIM is also possibly referred to as an intelligent random access memory (RAM), computational RAM, or smart memory.”)
receive a PIM operation targeting the PIM device; (FIGs. 1-8, 11-12; [0051], “[0053] The host processor 10 corresponds to hardware that processes various operations.”, [0055], “[0057] In the example of FIG. 1, the memory controller 20 is hardware that accesses [PIM operation] the memory 30 according to memory requests issued by the host processor 10 and performs control operations to load or store various kinds of data from or into the memory 30. Here, the data that is loaded from the memory 30 is possibly, for example, host instructions, source codes, and data associated with various types of operations.”, [0058], “[0059] For example, the internal processor 31 [PIM device] corresponds to a processor-in-memory (PIM). In this example, the PIM is a device configured to process data of the memory array 35 without having latency by using dedicated pins to connect a processor implemented as hardware logic to the memory array 35. … For example, the memory 30 having the internal processor 31 such as the PIM is also possibly referred to as an intelligent random access memory (RAM), computational RAM, or smart memory.”, [0060]-[0061], “[0062] When an offloading instruction is included in the host instructions, the host processor 10 offloads processing of an operation corresponding to the offloading instruction to the internal processor 31 [PIM device] of the memory 30. The term “offloading” as used herein denotes that the internal processor 31 [PIM device] performs processing of a certain operation instead of the host processor 10.”)
convert the operation into at least two commands, responsive to one or more conditions for applying an optimization not being satisfied.  (FIGs. 1-8, 11-12; [0051], [0053], [0055], “[0057] In the example of FIG. 1, the memory controller 20 is hardware that accesses [operation] the memory 30 according to memory requests issued by the host processor 10 and performs control operations to load or store various kinds of data from or into the memory 30.”, [0058]-[0062], “[0074] The determiner 120 determines [one or more conditions, such as] whether an offloading instruction is included in the host instructions. For example, the offloading instruction refers to an instruction for performing a certain type of operation, and examples of the certain type of operation include an SQRT operation, a reciprocal operation, a log operation, an exponential operation, a power series operation, and a trigonometric operation as discussed further, above. For example, [an optimization is applied when using] the PIM 310 of FIG. 2 is implemented as a dedicated processor for processing the certain type of operation corresponding to the offloading instruction.”, [0109], “[0110] In operation S1101, the compiler analyzes a given source code. Here, the source code to be analyzed possibly refers to the code shown in the examples, FIG. 4A or 6”, “[0111] In operation S1102, the compiler determines whether [one or more conditions, such as] a certain type of operation is included in the source code as a result of the analysis of the source code. Here, the certain type of operation corresponds to an SFU including, but is not limited to, an SQRT operation, a reciprocal operation, a log operation, an exponential operation, a power series operation, a trigonometric operation, and so on.”, “[0114] In operation S1105 [applied for the optimization], when the use of the PIM 310 is determined as being more efficient, the compiler generates a code for using the PIM 310. For example, the compiler generates the assembly code 500 described in the example of FIG. 5 or the assembly code 700 described in the example of FIG. 7, but is not limited thereto.”, “[0115] In operation S1106, when [one or more conditions, such as] the certain type of operation is not included [not satisfied] in the source code, the PIM is not available in the computing system 1, or the use of the PIM 310 is inefficient, the compiler generates a normal code, and merely compiles the instruction in a standard, alternative manner. For example, the compiler generate a code for calling an existing software library in order to process a certain operation, such as the assembly code 420 described in FIG. 4B and the assembly code 620 described in FIG. 6B.”;  In operation S1106 of FIG. 11, when [one or more conditions, such as] the certain type of operation is not included [not satisfied] in the source code, the compiler generates a code for calling an existing software library in order to process a certain operation, such as the assembly code 420 described in FIG. 4B and the assembly code 620 described in FIG. 6B.  The assembly code in FIGs. 4B and 6B contains at least two commands -- “LOAD R2 [R7]” and a SQRT operation performed on R2.)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2, 5-7, 9, 12-14, 16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Choi (US 2017/0060588 A1), hereinafter “Choi”, as applied to claims 1, 8, and 15 above, and further in view of Gierach et al. (US 2020/0175741 A1), hereinafter “Gierach”.

Regarding claim 2, Choi teaches the apparatus as recited in claim 1.

Choi does not explicitly teach wherein applying the optimization causes a reduction in a number of commands that are executed by the processing device.

However, Gierach teaches:
wherein applying the optimization causes a reduction in a number of commands that are executed by the processing device.  (FIGs. 4, 14B, 15, 16A, 18-19; [0147]-[0148], [0182]-[0188], [0190]-[0191]; when the conditions are determined to be satisfied, the simple constant folding pass can optimize the shader by removing the memory accesses and replacing them with the immediate values (e.g., if the value of CB0 is zero); that is, the above block of code can simply be replaced [converting] with an instruction “R6=10” [one or more commands], so that the number of instructions [commands] in the above block of code is reduced from 5 to 1; the above block of code is associated with the shading processed for the initial target shader executed by the constant folding unit 1521 in the GPGPU 1520 [processing device] if the conditions are determined to be satisfied)
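For illustration only, the kind of constant folding relied upon above can be sketched as follows.  The sketch is written in Python and is a generic constant folding pass, not the pass of Gierach.  The instruction encoding, the function name, the five-instruction block, and the constant-buffer contents are all hypothetical and are not reproduced from Gierach; they are chosen only so that the block folds to a single instruction setting R6 to 10.

```python
def fold_block(block, known_constants, output_reg):
    """Replace `block` with one "set" instruction when every value in it
    can be computed from known constants; otherwise return it unchanged.

    Hypothetical encoding: ("load", dst, offset) reads a constant buffer;
    ("add" | "mul", dst, a, b) takes register names or integer immediates.
    """
    values = {}  # register name -> value known at folding time

    def resolve(operand):
        # Integer immediates are known; registers only if already folded.
        return operand if isinstance(operand, int) else values.get(operand)

    for op, dst, *args in block:
        if op == "load":
            (offset,) = args
            if offset not in known_constants:
                return list(block)  # condition not satisfied: no folding
            values[dst] = known_constants[offset]
        elif op in ("add", "mul"):
            a, b = (resolve(x) for x in args)
            if a is None or b is None:
                return list(block)
            values[dst] = a + b if op == "add" else a * b
        else:
            return list(block)  # unknown instruction: leave block as-is
    if output_reg not in values:
        return list(block)
    return [("set", output_reg, values[output_reg])]


# Hypothetical five-instruction block and constant buffer (offset -> value).
HYPOTHETICAL_BLOCK = [
    ("load", "R1", 0),
    ("load", "R2", 4),
    ("add", "R3", "R1", "R2"),
    ("mul", "R4", "R3", 2),
    ("add", "R6", "R4", 0),
]
HYPOTHETICAL_CONSTANTS = {0: 0, 4: 5}

if __name__ == "__main__":
    print(fold_block(HYPOTHETICAL_BLOCK, HYPOTHETICAL_CONSTANTS, "R6"))
```

In this hypothetical example the five instructions collapse to one when both constant-buffer values are known, and the block is left at five instructions when either value is missing.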

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Gierach to provide a computing system of Choi having a host processor that may be a graphics processing unit (GPU) that offloads processing of an operation corresponding to an offloading instruction to an internal processor, with a general purpose graphics processing device of Gierach that processes a workload including graphics or compute operations and a constant folding unit for reducing a number of instructions by simply replacing a block of code with an instruction.  Doing so with the system of Choi would provide a graphics processor unit (GPU) using constant folding with hardware and software approaches that combine runtime constants with a shader to produce an improved, or optimized, shader entirely on the GPU.  (Gierach, [0153])

Regarding claims 9 and 16, the claimed method and the claimed system comprise substantially the same steps or elements as those in claim 2.  Accordingly, the claims are also rejected for the same reasons as set forth for those in claim 2 above.

Regarding claim 5, Choi teaches the apparatus as recited in claim 1.

Choi does not explicitly teach wherein the one or more conditions comprise the operation targeting a constant value.

However, Gierach teaches:
wherein the one or more conditions comprise the operation targeting a constant value.  (FIGs. 4, 14B, 15, 16A, 18-19; [0147]-[0150], [0182]-[0188], [0190]; “[0170] Constant folding unit 1521 may comprise an intermediate storage 1830 which provides storage [i.e., an operation such as memory access] for data including initial and intermediate shaders, runtime constant values, and metadata structures generated by the shader compiler in the software stack”; in the sample sequence above, the optimization of the shader is processed for the shading of the initial target shader using the constant folding unit 1521 in the GPGPU 1520 if the offset exists in the list of known constant values as one of the one or more conditions being considered to be satisfied where the constant folding operations are determined to be offloaded and performed on a GPU rather than on a CPU.)

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Gierach to provide a computing system of Choi having a host processor that may be a graphics processing unit (GPU) that offloads processing of an operation corresponding to an offloading instruction to an internal processor, with a general purpose graphics processing device of Gierach that processes a workload including graphics or compute operations and a constant folding unit for reducing a number of instructions by simply replacing a block of code with an instruction.  Doing so with the system of Choi would provide a graphics processor unit (GPU) using constant folding with hardware and software approaches that combine runtime constants with a shader to produce an improved, or optimized, shader entirely on the GPU.  (Gierach, [0153])

Regarding claim 12, the claimed method comprises substantially the same steps or elements as those in claim 5.  Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 5 above.

Regarding claim 6, Choi teaches the apparatus as recited in claim 1.

Choi does not explicitly teach wherein the one or more conditions comprise a constant value cache lookup for a targeted value resulting in a hit.

However, Gierach teaches:
wherein the one or more conditions comprise a constant value cache lookup for a targeted value resulting in a hit.  (FIGs. 4, 14B, 15, 16A, 18-19; [0147]-[0150], [0182]-[0188], [0190]; “[0155] … constant folding unit 1521 will receives as an input an original input shader 1612 as well as a set of runtime constants from an input constant buffer 1614 and produces an improved or optimized shader 1630”; “[0170] Constant folding unit 1521 may comprise an intermediate storage 1830 which provides storage [i.e., an operation such as memory access] for data including initial and intermediate shaders, runtime constant values, and metadata structures generated by the shader compiler in the software stack”; in the sample sequence above, the optimization of the shader is processed for the shading of the initial target shader using the constant folding unit 1521 in the GPGPU 1520 if the offset exists in the list of known constant values as one of the one or more conditions being considered to be satisfied where the constant folding operations are determined to be offloaded and performed on a GPU rather than on a CPU; when the conditions are determined to be satisfied, the simple constant folding pass can optimize the shader by removing the memory accesses and replacing them with the immediate values (e.g., if the value of CB0 [target value] is zero); a constant value cache lookup is considered to be a search of the input constant buffer 1614 for a value of CB0 [target value] of 0; the checking step of determining if the offset exists in the list of known constant values from the input constant buffer 1614 is one of the one or more conditions being considered to be satisfied, the result of the checking step would be a hit when the value of CB0 [target value] of 0 is found from the list of known constant values)

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Gierach to provide a computing system of Choi having a host processor that may be a graphics processing unit (GPU) that offloads processing of an operation corresponding to an offloading instruction to an internal processor, with a general purpose graphics processing device of Gierach that processes a workload including graphics or compute operations and a constant folding unit for reducing a number of instructions by simply replacing a block of code with an instruction and performs a search of an input constant buffer of known constant values.  Doing so with the system of Choi would provide a graphics processor unit (GPU) using constant folding with hardware and software approaches that combine runtime constants with a shader to produce an improved, or optimized, shader entirely on the GPU.  (Gierach, [0153])

Regarding claims 13 and 20, the claimed method and the claimed system comprise substantially the same steps or elements as those in claim 6.  Accordingly, the claims are also rejected for the same reasons as set forth for those in claim 6 above.

Regarding claim 7, Choi teaches the apparatus as recited in claim 1.

Choi does not explicitly teach wherein the processing device is a processing in memory (PIM) device, and wherein the one or more conditions comprise the operation being called by a kernel that invokes a loop with an invariant variable.

However, Gierach teaches:
wherein the processing device is a processing in memory (PIM) device, and wherein the one or more conditions comprise the operation being called by a kernel that invokes a loop with an invariant variable.  (FIGs. 4, 14.B, 15, 16A, 18-19; “[0145] … The processor 1502 and the GPGPU 1520 [processing device] can be any of the processors and GPGPU/parallel processors as described herein. … In some embodiments the GPGPU memory 1518 includes GPGPU local memory 1528 within the GPGPU 1520 [processing device] and can also include some or all of system memory 1512”; [0147]-[0150], “[0166] This process is similar to a function call and call stack in c++. The function gets called, the function data and instruction pointers get pushed on the call stack during execution. When execution is done, the call and data are popped off the callstack”; [0170], “[0177] At operation 2045 the pass is marked as having been executed. Control then passes back to operation 2020. Thus, operations 2020 through 2045 define a loop pursuant to which any constant folding passes loaded into the constant folding unit 1521 during the initialization process are executed by the processing units 1812, 1814, 1816”; “[0203] In some examples, GPU-based constant [invariant variable] folding can be implemented on existing GPU execution units using compute shader kernels to execute the desired constant folding passes”; the GPGPU 1520 [processing device] is considered to be a processing in memory (PIM) device since the GPGPU 1520 [processing device] has processors and the GPGPU local memory 1528; the operations 2020 through 2045 define a loop pursuant to which any constant [invariant variable] (e.g., CB0 of 0) folding passes loaded into the constant folding unit 1521 are executed; note that the constant [invariant variable] value, e.g., CB0, is used inside the loop, in a manner similar to that described in paragraphs [0038], [0043], etc. of Applicant’s specification; in the sample sequence above, the optimization of the shader is processed for the shading of the initial target shader using the constant folding unit 1521 in the GPGPU 1520 if the offset exists in the list of known constant values as one of the one or more conditions being considered to be satisfied where the constant folding operations are determined to be offloaded and performed on a GPU rather than on a CPU)

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Gierach to provide a computing system of Choi having a host processor that may be a graphics processing unit (GPU) that offloads processing of an operation corresponding to an offloading instruction to an internal processor, with a general purpose graphics processing device of Gierach having processors and a local memory for processing a workload including graphics or compute operations and a constant folding unit for reducing a number of instructions by simply replacing a block of code with an instruction and performing a search of an input constant buffer of known constant values.  Doing so with the system of Choi would provide a graphics processor unit (GPU) using constant folding with hardware and software approaches that combine runtime constants with a shader to produce an improved, or optimized, shader entirely on the GPU.  (Gierach, [0153])

Regarding claim 14, the claimed method comprises substantially the same steps or elements as those in claim 7.  Accordingly, the claim is also rejected for the same reasons as set forth for those in claim 7 above.

Regarding claim 19, Choi teaches the system as recited in claim 15.

Choi does not explicitly teach wherein the one or more conditions comprise the PIM operation targeting a constant value.

However, Gierach teaches:
wherein the one or more conditions comprise the PIM operation targeting a constant value.  (FIGs. 4, 14.B, 15, 16A, 18-19; [0147]-[0148], [0152], [0182]-[0188], [0190]; “[0170] Constant folding unit 1521 may comprise an intermediate storage 1830 which provides storage [i.e., an operation such as memory access] for data including initial and intermediate shaders, runtime constant values, and metadata structures generated by the shader compiler in the software stack”; in the sample sequence above, the optimization of the shader is processed for the shading of the initial target shader using the constant folding unit 1521 in the GPGPU 1520 if the offset exists in the list of known constant values as one of the one or more conditions being considered to be satisfied where the constant folding operations are determined to be offloaded and performed on a GPU rather than on a CPU)

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Gierach to provide a computing system of Choi having a host processor that may be a graphics processing unit (GPU) that offloads processing of an operation corresponding to an offloading instruction to an internal processor, with a general purpose graphics processing device of Gierach that processes an optimization of a shader for a workload including graphics or compute operations using a constant folding unit for reducing a number of instructions and performing a search of an input constant buffer of known constant values.  Doing so with the system of Choi would provide a graphics processor unit (GPU) using constant folding with hardware and software approaches that combine runtime constants with a shader to produce an improved, or optimized, shader entirely on the GPU.  (Gierach, [0153])

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tong B Vo whose telephone number is (571)272-7568.  The examiner can normally be reached on M-F 9:00 AM - 5:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached on (571)272-4085.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/T.B.V./Patent Examiner, Art Unit 2136


/CHARLES RONES/Supervisory Patent Examiner, Art Unit 2136