DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This Office Action has been issued in response to the amendment filed 06/08/2022. Applicant's arguments have been carefully and fully considered, but they are not persuasive. Accordingly, this action has been made FINAL.
 
Claim Status
Claims 1, 3-5, 7, 9-12, 14, 16-18, 20, and 22-25 have been amended. Claims 2, 8, 15, and 21 were canceled. Claims 1, 3-7, 9-14, 16-20, and 22-25 remain pending and are ready for examination.

Rejections not based on Prior Art
In view of Applicant’s amendments, the previous 35 U.S.C. § 101 rejection has been withdrawn.

Claim Objections
Claims 1, 7, and 20 are objected to because of the following informalities:  
Regarding claim 1, it is suggested to put “;” instead of “,” at the end of lines 13 and 15.
Regarding claim 7, it is suggested to put “;” instead of “,” at the end of lines 9 and 11.
Regarding claim 7, line 6 recites “that is to be determined” which should be “that is determined”.
Regarding claim 20, line 4 recites “that is to be determined” which should be “that is determined”.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 3-7, 9-14, 16-20, and 22-25 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. Claims 1, 7, 14, and 20 contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The Specification [0034] discloses “The RL agent 52 may also receive input information related to a power mode (e.g., performance mode, normal mode, power saving mode, etc.), reward information, and/or penalty information. The reward and/or penalty information may be different between the various power modes to encourage the RL agent 52 to adopt different policies based on the power mode” while the Specification [0016] discloses “the reinforcement information may include one or more of reward information and penalty information.” That is, the Specification discloses the input information related to a power mode and the reinforcement information. There is no disclosure of “identify reinforcement information that is determined based on the electronic processing system being identified as being operated in a first power mode of a plurality of power modes associated with the electronic processing system”. As such, there is no indication in the specification that the inventor had possession of identifying reinforcement information based on operating in a first power mode.
 
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 and 3-6 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "the logic" in line 8. There is insufficient antecedent basis for this limitation in the claim. In particular, there are three instances of the term ‘logic’ previously recited in the claim, thus rendering it unclear as to what logic the claim is referring to with this recitation. For the purpose of examination, the Examiner will interpret the claim to read, "the machine learning logic".
Claims 3-6 depend upon claim 1 and thus inherit its deficiencies; they are therefore rejected as well.
   
Rejections based on Prior Art
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-4, 6-7, 9-10, 12, 14, 16-17, 19-20, 22-23, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Nakagawa et al. (US 20200403554 A1 – hereinafter Nakagawa) in view of Kim (US 20170168532 A1 – hereinafter Kim) further in view of Ansorregui et al. (US 20170031430 A1 – hereinafter Ansorregui).
  Regarding Claim 1, Nakagawa teaches an electronic processing system, comprising: 
a processor; (see [0106]; Nakagawa: “a central processing unit (CPU) 200”)
memory communicatively coupled to the processor; (see [0106]; Nakagawa: “Programs that are read by the CPU 200 are saved in the memory 202.”)
a sensor communicatively coupled to the processor; (see [0173] and Fig. 9; Nakagawa: “9 a temperature sensor (first temperature sensor); 9 b temperature sensor (second temperature sensor)”)
a cooling subsystem communicatively coupled to the processor (see [0173] and Fig. 9; Nakagawa: “10, 10A cooling fan control unit”); and 
a machine learning agent communicatively coupled to the processor, the sensor, and the cooling subsystem, the machine learning agent including machine learning logic (see [0119]; Nakagawa: “The machine learning device 310 includes a learning unit 311 and a state observing unit 312 as illustrated in FIG. 15.” See [0120]; Nakagawa: “The information detected by the temperature sensor 9 a is input to the state observing unit 312 from the power conversion device 2.” See [0108]; Nakagawa: “the processing circuit 203 is a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination thereof.”)
identify reinforcement information (see [0120]; Nakagawa: “The information detected by the temperature sensor 9 a is input to the state observing unit 312 from the power conversion device 2. The state observing unit 312 observes and outputs the “capacitor life” and the “fan life” as state variables.”)
learn thermal behavior information of the electronic processing system based on the reinforcement information, and (see [0120]; Nakagawa: “The state observing unit 312 observes and outputs the “capacitor life” and the “fan life” as state variables.” See [0121]; Nakagawa: “The learning unit 311 receives the state variable “capacitor life” and the state variable “fan life”.” See [0122]; Nakagawa: “The learning unit 311 includes a reward calculation unit 311 a and a function update unit 311 b. The reward calculation unit 311 a calculates a reward r based on the state variables, namely the capacitor life and the fan life.”)
adjust (see [0121]; Nakagawa: “The learning unit 311 feeds back the fan rotational speed, i.e. information on the rotational speed of the cooling fan 8, to the power conversion device 2.” See [0169]; Nakagawa: “The power conversion device 2 controls the rotational speed N of the cooling fan 8 based on the rotational speed N of the cooling fan 8 fed back from the machine learning device 310.”)
However, Nakagawa does not explicitly teach including at least partly implemented in one or more of … logic hardware; identify reinforcement information that is determined based on the electronic processing system being identified as being operated in a first power mode of a plurality of power modes associated with the electronic processing system, the reinforcement information being associated with one or more of the processor, the sensor, or the cooling subsystem, adjust one or more of a parameter of the processor … based on the learned thermal behavior information and information from one or more of the processor, the sensor, and the cooling subsystem.
	Kim from the same or similar field of endeavor teaches:
including at least partly implemented in one or more of …logic hardware (see [0035]; Kim: “Each of the operation processing cores of the operation processor 110 may perform various arithmetic operations or logical operations or both arithmetic and logical operations to operate the electronic device 100.” See [0109]; Kim: “The determination of operation 520 may be performed by an operation processing core that is operating. The reference value Thr may have a value suitable to manage operations of the operation processing cores 111. The suitable value may be obtained by any combination of test(s), experiment(s), and machine learning.” That is, the machine learning is implemented in the operation processing cores 111);
adjust one or more of a parameter of the processor … based on the learned thermal behavior information and information from one or more of the processor, the sensor, and the cooling subsystem. (see [0011]; Kim: “A system control device is in communication with each of the thermal sensors and is configured to migrate at least one task from a first processor core to a second processor core in response to a temperature of the first processor core exceeding a first threshold, reduce an operating frequency of the first processor core in response to the temperature of the first processor core exceeding a second threshold, …” See [0088]; Kim: “the operation frequency of the specific operation processing core may decrease to prevent the temperature of the operation processing core from continuing to increase. When the operation frequency of the operation processing core decreases, the temperature of the operation processing core may not increase rapidly.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Nakagawa to include Kim’s features of including at least partly implemented in one or more of logic hardware and adjusting one or more of a parameter of the processor based on the learned thermal behavior information and information from one or more of the processor, the sensor, and the cooling subsystem. Doing so would improve operating performance of the operation processor to provide various services within the device. (Kim, [0004])
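For illustration only, the two-threshold behavior quoted from Kim [0011] can be sketched as follows. This is a minimal sketch and not Kim's implementation: the Core class, the function name apply_thermal_policy, the threshold values, and the frequency step are all hypothetical, and the two threshold checks are assumed to be independent because the quoted passage does not state how the thresholds are ordered.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical model of one processor core (not taken from Kim)."""
    temperature_c: float
    frequency_mhz: int
    tasks: list


def apply_thermal_policy(hot, cool, migrate_threshold_c, throttle_threshold_c,
                         frequency_step_mhz=200, min_frequency_mhz=400):
    """Apply the two threshold checks to the 'hot' core and report the actions taken."""
    actions = []
    # First threshold: move one task from the hot core to the cooler core.
    if hot.temperature_c > migrate_threshold_c and hot.tasks:
        cool.tasks.append(hot.tasks.pop())
        actions.append("migrate")
    # Second threshold: lower the operating frequency of the hot core.
    if hot.temperature_c > throttle_threshold_c:
        hot.frequency_mhz = max(min_frequency_mhz,
                                hot.frequency_mhz - frequency_step_mhz)
        actions.append("throttle")
    return actions
```

In this sketch a core at 95 degrees with thresholds of 80 and 90 degrees would both lose a task and have its frequency reduced; all numeric values are assumptions chosen only to exercise both branches.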
However, neither Nakagawa nor Kim explicitly teaches identify reinforcement information that is determined based on the electronic processing system being identified as being operated in a first power mode of a plurality of power modes associated with the electronic processing system.
Ansorregui from the same or similar field of endeavor teaches:
identify reinforcement information that is determined based on the electronic processing system (see [0122]; Ansorregui: “the electronic device may calculate a reward according to Formula 3 below.” See [0122]; Ansorregui: “p represents power, t represents time, f represents a frequency, T represents a temperature, n represents a current reward value, n−1 represents a previous reward value, L represents a load value of GPU, and R represents a next reward value.”) being identified as being operated in a first power mode of a plurality of power modes associated with the electronic processing system (see [0090]; Ansorregui: “an electronic device may determine an operation mode of the electronic device… the operation mode may include a normal mode and a power saving mode”. See [0091]; Ansorregui: “when the electronic device determines the operation mode to be a normal mode, the electronic device may determine whether an input indicating a temperature exceeds a certain threshold value.” See [0095]; Ansorregui: “, when the electronic device determines the operation mode to be the power saving mode, the electronic device may then determine whether an input indicating a temperature exceeds a certain threshold value.”), 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the teachings of Nakagawa and Kim to include Ansorregui’s features of identifying reinforcement information that is determined based on the electronic processing system being identified as being operated in a first power mode of a plurality of power modes associated with the electronic processing system, the reinforcement information being associated with one or more of the processor, the sensor, or the cooling subsystem. Doing so would balance performance or power consumption of hardware. (Ansorregui, [0011])

Regarding claim 3, the combination of Nakagawa, Kim, and Ansorregui teaches all the limitations of claim 1 above. Nakagawa further teaches wherein the reinforcement information includes one or more of reward information or penalty information. (see [0122]; Nakagawa: “The learning unit 311 includes a reward calculation unit 311 a and a function update unit 311 b. The reward calculation unit 311 a calculates a reward r based on the state variables, namely the capacitor life and the fan life.”)

Regarding claim 4, the combination of Nakagawa, Kim, and Ansorregui teaches all the limitations of claim 3 above. Nakagawa further teaches wherein the machine learning logic is further to:
learn the thermal behavior of the system based on adjustments to increase the reward information and decrease the penalty information. (see Fig. 16 and [0147]; Nakagawa: “The first process is the process of determining whether to increase the reward r or reduce the reward r based only on |ΔL (N)|, which is the absolute value of the difference between the life Lc (N) and the life Lf (N).” That is, ‘increase reward’ reads on ‘increase the reward information’ and ‘reduce reward’ reads on ‘decrease the penalty information’)
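For illustration only, a reward adjustment driven solely by |ΔL (N)| might be sketched as below. The quoted passage of Nakagawa [0147] states that the reward r is increased or reduced based on the absolute life difference, but it does not state the comparison used here; the rule that a shrinking gap increases the reward and a growing gap reduces it, the function name update_reward, and the step size are assumptions made for this sketch.

```python
def update_reward(reward, capacitor_life, fan_life, previous_gap, step=1.0):
    """Return (new_reward, gap), where gap = |capacitor_life - fan_life|.

    Assumed rule (not confirmed by the cited reference): a gap smaller than
    the previous one increases the reward, a larger gap reduces it, and an
    unchanged gap leaves the reward as it is.
    """
    gap = abs(capacitor_life - fan_life)
    if gap < previous_gap:
        reward += step   # lives are converging: increase the reward
    elif gap > previous_gap:
        reward -= step   # lives are diverging: reduce the reward
    return reward, gap
```

The returned gap would be passed back in as previous_gap on the next call, so each adjustment depends only on the current and prior life difference.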

Regarding claim 6, the combination of Nakagawa, Kim, and Ansorregui teaches all the limitations of claim 1 above. Nakagawa further teaches wherein the machine learning agent includes a deep reinforcement learning agent with Q-learning. (see [0125]; Nakagawa: “In a case where Q-learning is applied to the present embodiment, the action a_t is the fan rotational speed, namely the rotational speed of the cooling fan 8.”)
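For illustration only, the standard tabular Q-learning update, with the action taken to be a fan rotational speed as in the quoted passage of Nakagawa [0125], might look like the sketch below. It is a tabular simplification and not a deep reinforcement learning agent; the candidate fan speeds, the state labels, the learning rate alpha, and the discount factor gamma are hypothetical values that do not come from any cited reference.

```python
from collections import defaultdict

# Hypothetical discrete set of fan speeds (rpm) that serve as actions.
FAN_SPEEDS_RPM = (1000, 2000, 3000)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step and return the updated Q(state, action).

    Update rule: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in FAN_SPEEDS_RPM)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Q-table that defaults every unseen (state, action) pair to 0.0.
q_table = defaultdict(float)
```

A call such as q_update(q_table, "warm", 2000, 1.0, "cool") would move the value for running the fan at 2000 rpm in the hypothetical "warm" state toward the observed reward.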

Regarding Claim 7, Nakagawa teaches:
identify reinforcement information (see [0120]; Nakagawa: “The information detected by the temperature sensor 9 a is input to the state observing unit 312 from the power conversion device 2. The state observing unit 312 observes and outputs the “capacitor life” and the “fan life” as state variables.”)
learn thermal behavior information of the system based on the reinforcement information, and (see [0120]; Nakagawa: “The state observing unit 312 observes and outputs the “capacitor life” and the “fan life” as state variables.” See [0121]; Nakagawa: “The learning unit 311 receives the state variable “capacitor life” and the state variable “fan life”.” See [0122]; Nakagawa: “The learning unit 311 includes a reward calculation unit 311 a and a function update unit 311 b. The reward calculation unit 311 a calculates a reward r based on the state variables, namely the capacitor life and the fan life.”)
provide information to adjust one or more of (see [0121]; Nakagawa: “The learning unit 311 feeds back the fan rotational speed, i.e. information on the rotational speed of the cooling fan 8, to the power conversion device 2.” See [0169]; Nakagawa: “The power conversion device 2 controls the rotational speed N of the cooling fan 8 based on the rotational speed N of the cooling fan 8 fed back from the machine learning device 310.”)
However, Nakagawa does not explicitly teach:
a semiconductor package apparatus, 
one or more substrates; and
logic coupled to the one or more substrates, wherein the logic is at least partly implemented in one or more of configurable logic or fixed-functionality logic hardware, the logic coupled to the one or more substrates to: 
identify reinforcement information that is to be determined based on a system being identified as being operated in a first power mode of a plurality of power modes associated with the system, 
provide information to … a parameter of a cooling subsystem based on the learned thermal behavior information and the input information.
Kim from the same or similar field of endeavor teaches:
a semiconductor package apparatus, (see [0044]; Kim: “the operation processing cores 111 may share a single die in a single semiconductor package.”)
one or more substrates; and (see [0043]; Kim: “the operation processor 110 includes a plurality of operation processing cores on a monolithic substrate.” See [0134]; Kim: “a stacked substrate 115 may be provided between the first layer and the second layer.”)
logic coupled to the one or more substrates, wherein the logic is at least partly implemented in one or more of configurable logic or fixed-functionality logic hardware, the logic coupled to the one or more substrates to: (see [0035]; Kim: “Each of the operation processing cores of the operation processor 110 may perform various arithmetic operations or logical operations or both arithmetic and logical operations to operate the electronic device 100.”)
provide information to adjust one or more of a parameter of a processor … based on the learned thermal behavior information and the input information. (see [0011]; Kim: “A system control device is in communication with each of the thermal sensors and is configured to migrate at least one task from a first processor core to a second processor core in response to a temperature of the first processor core exceeding a first threshold, reduce an operating frequency of the first processor core in response to the temperature of the first processor core exceeding a second threshold, …” See [0088]; Kim: “the operation frequency of the specific operation processing core may decrease to prevent the temperature of the operation processing core from continuing to increase. When the operation frequency of the operation processing core decreases, the temperature of the operation processing core may not increase rapidly.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Nakagawa to include Kim’s features of a semiconductor package apparatus, one or more substrates; and logic coupled to the one or more substrates, wherein the logic is at least partly implemented in one or more of configurable logic and fixed-functionality hardware logic, the logic coupled to the one or more substrates to: provide information to adjust one or more of a parameter of the processor based on the learned thermal behavior information and information from one or more of the processor, the sensor, and the cooling subsystem. Doing so would improve operating performance of the operation processor to provide various services within the device. (Kim, [0004])
However, neither Nakagawa nor Kim explicitly teaches identify reinforcement information that is to be determined based on a system being identified as being operated in a first power mode of a plurality of power modes associated with the system.
Ansorregui from the same or similar field of endeavor teaches:
identify reinforcement information that is to be determined based on a system (see [0122]; Ansorregui: “the electronic device may calculate a reward according to Formula 3 below.” See [0122]; Ansorregui: “p represents power, t represents time, f represents a frequency, T represents a temperature, n represents a current reward value, n−1 represents a previous reward value, L represents a load value of GPU, and R represents a next reward value.”) being identified as being operated in a first power mode of a plurality of power modes associated with the system (see [0090]; Ansorregui: “an electronic device may determine an operation mode of the electronic device… the operation mode may include a normal mode and a power saving mode”. See [0091]; Ansorregui: “when the electronic device determines the operation mode to be a normal mode, the electronic device may determine whether an input indicating a temperature exceeds a certain threshold value.” See [0095]; Ansorregui: “, when the electronic device determines the operation mode to be the power saving mode, the electronic device may then determine whether an input indicating a temperature exceeds a certain threshold value.”), 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the teachings of Nakagawa and Kim to include Ansorregui’s features of identifying reinforcement information that is determined based on the electronic processing system being identified as being operated in a first power mode of a plurality of power modes associated with the electronic processing system, the reinforcement information being associated with one or more of the processor, the sensor, or the cooling subsystem. Doing so would balance performance or power consumption of hardware. (Ansorregui, [0011])

Regarding Claim 9, the limitations in this claim are taught by the combination of Nakagawa, Kim, and Ansorregui as discussed in connection with claim 3.

Regarding Claim 10, the limitations in this claim are taught by the combination of Nakagawa, Kim, and Ansorregui as discussed in connection with claim 4.

Regarding Claim 12, the limitations in this claim are taught by the combination of Nakagawa, Kim, and Ansorregui as discussed in connection with claim 6.

Regarding Claim 14, Nakagawa teaches a method of managing a thermal system, comprising:
 identifying reinforcement information (see [0120]; Nakagawa: “The information detected by the temperature sensor 9 a is input to the state observing unit 312 from the power conversion device 2. The state observing unit 312 observes and outputs the “capacitor life” and the “fan life” as state variables.”);
learn thermal behavior information of the system based on reinforcement information, and (see [0120]; Nakagawa: “The state observing unit 312 observes and outputs the “capacitor life” and the “fan life” as state variables.” See [0121]; Nakagawa: “The learning unit 311 receives the state variable “capacitor life” and the state variable “fan life”.” See [0122]; Nakagawa: “The learning unit 311 includes a reward calculation unit 311 a and a function update unit 311 b. The reward calculation unit 311 a calculates a reward r based on the state variables, namely the capacitor life and the fan life.”)
provide information to adjust one or more of (see [0121]; Nakagawa: “The learning unit 311 feeds back the fan rotational speed, i.e. information on the rotational speed of the cooling fan 8, to the power conversion device 2.” See [0169]; Nakagawa: “The power conversion device 2 controls the rotational speed N of the cooling fan 8 based on the rotational speed N of the cooling fan 8 fed back from the machine learning device 310.”)
However, Nakagawa does not explicitly teach:
identifying reinforcement information that is determined based on a system being identified as being operated in a first power mode of a plurality of power modes associated with the system,
provide information to … a parameter of a cooling subsystem based on the learned thermal behavior information and the input information.
Kim from the same or similar field of endeavor teaches:
provide information to adjust one or more of a parameter of a processor … based on the learned thermal behavior information and the input information. (see [0011]; Kim: “A system control device is in communication with each of the thermal sensors and is configured to migrate at least one task from a first processor core to a second processor core in response to a temperature of the first processor core exceeding a first threshold, reduce an operating frequency of the first processor core in response to the temperature of the first processor core exceeding a second threshold, …” See [0088]; Kim: “the operation frequency of the specific operation processing core may decrease to prevent the temperature of the operation processing core from continuing to increase. When the operation frequency of the operation processing core decreases, the temperature of the operation processing core may not increase rapidly.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Nakagawa to include Kim’s features of adjusting one or more of a parameter of the processor based on the learned thermal behavior information and information from one or more of the processor, the sensor, and the cooling subsystem. Doing so would improve operating performance of the operation processor to provide various services within the device. (Kim, [0004])
However, neither Nakagawa nor Kim explicitly teaches identifying reinforcement information that is determined based on a system being identified as being operated in a first power mode of a plurality of power modes associated with the system.
Ansorregui from the same or similar field of endeavor teaches:
identifying reinforcement information that is determined based on a system (see [0122]; Ansorregui: “the electronic device may calculate a reward according to Formula 3 below.” See [0122]; Ansorregui: “p represents power, t represents time, f represents a frequency, T represents a temperature, n represents a current reward value, n−1 represents a previous reward value, L represents a load value of GPU, and R represents a next reward value.”) being identified as being operated in a first power mode of a plurality of power modes associated with the system, (see [0090]; Ansorregui: “an electronic device may determine an operation mode of the electronic device… the operation mode may include a normal mode and a power saving mode”. See [0091]; Ansorregui: “when the electronic device determines the operation mode to be a normal mode, the electronic device may determine whether an input indicating a temperature exceeds a certain threshold value.” See [0095]; Ansorregui: “, when the electronic device determines the operation mode to be the power saving mode, the electronic device may then determine whether an input indicating a temperature exceeds a certain threshold value.”), 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the teachings of Nakagawa and Kim to include Ansorregui’s features of identifying reinforcement information that is determined based on the electronic processing system being identified as being operated in a first power mode of a plurality of power modes associated with the electronic processing system, the reinforcement information being associated with one or more of the processor, the sensor, or the cooling subsystem. Doing so would balance performance or power consumption of hardware. (Ansorregui, [0011])

Regarding Claim 16, the limitations in this claim are taught by the combination of Nakagawa, Kim, and Ansorregui as discussed in connection with claim 3.

Regarding Claim 17, the limitations in this claim are taught by the combination of Nakagawa, Kim, and Ansorregui as discussed in connection with claim 4.

Regarding Claim 19, the limitations in this claim are taught by the combination of Nakagawa, Kim, and Ansorregui as discussed in connection with claim 6.

Regarding Claim 20, Nakagawa teaches at least one computer readable storage medium, comprising a set of instructions, which when executed by a computing device, cause the computing device to:
identify reinforcement information (see [0120]; Nakagawa: “The information detected by the temperature sensor 9 a is input to the state observing unit 312 from the power conversion device 2. The state observing unit 312 observes and outputs the “capacitor life” and the “fan life” as state variables.”);
learn thermal behavior information of the system based on the reinforcement information, and (see [0120]; Nakagawa: “The state observing unit 312 observes and outputs the “capacitor life” and the “fan life” as state variables.” See [0121]; Nakagawa: “The learning unit 311 receives the state variable “capacitor life” and the state variable “fan life”.” See [0122]; Nakagawa: “The learning unit 311 includes a reward calculation unit 311 a and a function update unit 311 b. The reward calculation unit 311 a calculates a reward r based on the state variables, namely the capacitor life and the fan life.”)
provide information to adjust one or more of (see [0121]; Nakagawa: “The learning unit 311 feeds back the fan rotational speed, i.e. information on the rotational speed of the cooling fan 8, to the power conversion device 2.” See [0169]; Nakagawa: “The power conversion device 2 controls the rotational speed N of the cooling fan 8 based on the rotational speed N of the cooling fan 8 fed back from the machine learning device 310.”)
However, Nakagawa does not explicitly teach:
identify reinforcement information that is to be determined based on a system being identified as being operated in a first power mode of a plurality of power modes associated with the system,
provide information to … a parameter of a cooling subsystem based on the learned thermal behavior information.
Kim from the same or similar field of endeavor teaches:
provide information to adjust one or more of a parameter of a processor … based on the learned thermal behavior information. (see [0011]; Kim: “A system control device is in communication with each of the thermal sensors and is configured to migrate at least one task from a first processor core to a second processor core in response to a temperature of the first processor core exceeding a first threshold, reduce an operating frequency of the first processor core in response to the temperature of the first processor core exceeding a second threshold, …” See [0088]; Kim: “the operation frequency of the specific operation processing core may decrease to prevent the temperature of the operation processing core from continuing to increase. When the operation frequency of the operation processing core decreases, the temperature of the operation processing core may not increase rapidly.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Nakagawa to include Kim’s features of adjusting one or more of a parameter of the processor based on the learned thermal behavior information and information from one or more of the processor, the sensor, and the cooling subsystem. Doing so would improve operating performance of the operation processor to provide various services within the device. (Kim, [0004])
However, neither Nakagawa nor Kim explicitly teaches identify reinforcement information that is to be determined based on a system being identified as being operated in a first power mode of a plurality of power modes associated with the system.
Ansorregui from the same or similar field of endeavor teaches:
identify reinforcement information that is determined based on the electronic processing system (see [0122]; Ansorregui: “the electronic device may calculate a reward according to Formula 3 below.” See [0122]; Ansorregui: “p represents power, t represents time, f represents a frequency, T represents a temperature, n represents a current reward value, n−1 represents a previous reward value, L represents a load value of GPU, and R represents a next reward value.”) being identified as being operated in a first power mode of a plurality of power modes associated with the electronic processing system (see [0090]; Ansorregui: “an electronic device may determine an operation mode of the electronic device… the operation mode may include a normal mode and a power saving mode”. See [0091]; Ansorregui: “when the electronic device determines the operation mode to be a normal mode, the electronic device may determine whether an input indicating a temperature exceeds a certain threshold value.” See [0095]; Ansorregui: “, when the electronic device determines the operation mode to be the power saving mode, the electronic device may then determine whether an input indicating a temperature exceeds a certain threshold value.”), 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the teachings of Nakagawa and Kim to include Ansorregui’s features of identifying reinforcement information that is determined based on the electronic processing system being identified as being operated in a first power mode of a plurality of power modes associated with the electronic processing system, the reinforcement information being associated with one or more of the processor, the sensor, or the cooling subsystem. Doing so would balance performance or power consumption of hardware. (Ansorregui, [0011])

Regarding claim 22, the limitations in this claim are taught by the combination of Nakagawa, Kim, and Ansorregui as discussed in connection with claim 3.

Regarding claim 23, the limitations in this claim are taught by the combination of Nakagawa, Kim, and Ansorregui as discussed in connection with claim 4.

Regarding claim 25, the limitations in this claim are taught by the combination of Nakagawa, Kim, and Ansorregui as discussed in connection with claim 6.

Claims 5, 11, 18, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Nakagawa in view of Kim in view of Ansorregui further in view of Shen et al. (NPL: “Learning Based DVFS for Simultaneous Temperature, Performance and Energy Management” – hereinafter Shen).
Regarding claim 5, the combination of Nakagawa, Kim, and Ansorregui teaches all the limitations of claim 4 above; Ansorregui from the same or similar field of endeavor teaches wherein increased reward information corresponds to one or more of increased processor frequencies or reduced active cooling, and (see [0059]; Ansorregui: “the electronic device may increase the frequency and/or voltage in response to an increase in the load”. See [0100]; Ansorregui: “When it is determined that FPS is less than the sum of the fourth FPS threshold value and the F parameter, the electronic device may raise the frequency of the hardware in operation 750.”)
The same motivation to combine Nakagawa, Kim, and Ansorregui set forth for Claim 1 equally applies to Claim 5.
However, the combination of Nakagawa, Kim, and Ansorregui does not explicitly teach wherein increased penalty information corresponds to processor temperatures above a threshold temperature.
Shen from the same or similar field of endeavor teaches wherein increased penalty information corresponds to processor temperatures above a threshold temperature. (fourth page, right column; “we use the change of the temperature as temperature penalty: If T > Told, a positive temperature penalty will be given.” That is, ‘positive temperature penalty’ reads on ‘increased penalty information’ and ‘T > Told’ reads on ‘processor temperatures above a threshold temperature’.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Nakagawa, Kim, and Ansorregui to include Shen’s features of increased penalty information corresponding to processor temperatures above a threshold temperature. Doing so would perform dynamic thermal management using a reinforcement learning algorithm in order to improve the reliability and performance of the system.
Regarding claim 11, the limitations in this claim are taught by the combination of Nakagawa, Kim, Ansorregui, and Shen as discussed in connection with claim 5.

Regarding claim 18, the limitations in this claim are taught by the combination of Nakagawa, Kim, Ansorregui, and Shen as discussed in connection with claim 5.

Regarding claim 24, the limitations in this claim are taught by the combination of Nakagawa, Kim, Ansorregui, and Shen as discussed in connection with claim 5.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Nakagawa in view of Kim in view of Ansorregui further in view of Thaploo et al. (US 20180301120 A1 – hereinafter Thaploo).
Regarding claim 13, the combination of Nakagawa, Kim, and Ansorregui teaches all the limitations of claim 7 above; however, it does not explicitly teach wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates.
Thaploo from the same or similar field of endeavor teaches wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates. (see [0141]; Thaploo: “With particular reference to repeater 6101, a buffer is formed of a plurality of inverters, namely a first inverter including metal oxide semiconductor field effect transistors (MOSFETs), namely a first inverter formed of a p-channel MOSFET (PMOS) P1,1 and an n-channel MOSFET (NMOS) N1,1.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Nakagawa, Kim, and Ansorregui to include Thaploo’s features of the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates. Doing so would consume significantly less power, increase performance, and achieve high manufacturing yield. (Thaploo, [0002]-[0003])

Response to Arguments
Applicant's arguments filed 06/08/2022 have been fully considered but they are not persuasive. 
With respect to applicant’s argument located within the second page of the remarks (numbered as page 9) which recites:
“Nakagawa does not disclose or suggest identifying reinforcement information that is determined based on the electronic processing system being identified as being operated in a first power mode of a plurality of power modes associated with the electronic processing system, the reinforcement information being associated with one or more of the processor, the sensor, or the cooling subsystem, and learning thermal behavior information of the electronic processing system based on the reinforcement information as claimed. The other cited art does not remedy the deficiencies of Nakagawa.”
Examiner respectfully disagrees. Examiner interprets the state variables of Nakagawa as ‘reinforcement information’. Since the state variables are detected from the sensor (Nakagawa, [0120]), Nakagawa reads on ‘identify reinforcement information, …the reinforcement information being associated with one or more of the processor, the sensor, or the cooling subsystem’. Moreover, Nakagawa [0121] discloses “The learning unit 311 receives the state variable “capacitor life” and the state variable “fan life”.” That is, Nakagawa still reads on “learning thermal behavior information of the electronic processing system based on the reinforcement information”. Examiner also notes that the argument is moot in view of the new grounds of rejection, as necessitated by the amendment. Ansorregui has been relied upon to reject the limitations incorporated in the amendment. Ansorregui [0122] discloses that the electronic device calculates a reward based on frequency and temperature, and Ansorregui ([0090]-[0091] and [0095]) discloses that the electronic device determines an operation mode in order to check a temperature against a threshold. That is, the combination of Nakagawa, Kim, and Ansorregui reads on the limitation. The claims as presently presented do not preclude this interpretation.
With respect to applicant’s argument located within the third page of the remarks (numbered as page 10) which recites:
“Applicant traverses this rejection and respectfully asserts that Ansorregui and Shen do not remedy the deficiencies of Nakagawa and Kim.”
The arguments do not provide any details or evidence as to why Ansorregui fails to teach the limitations. Because rationale and evidence have been provided in the section regarding 35 U.S.C. 103, more than a simple assertion is required for this argument to be persuasive. As explained above, Ansorregui [0122] discloses that the electronic device calculates a reward (reinforcement information) based on frequency and temperature, and Ansorregui ([0090]-[0091] and [0095]) discloses that the electronic device determines an operation mode in order to check a temperature against a threshold. That is, the combination of Nakagawa, Kim, and Ansorregui reads on the limitation. The claims as presently presented do not preclude this interpretation.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Iwane (US 20190302708 A1) discloses a reinforcement learning device that includes a processor that determines a first action on a control target by using a basic controller that defines an action on the control target depending on a state of the control target.
Kutty (US20180252593) discloses activating at least one cooling device housed in the rack or device chassis to provide cooling when a temperature sensor provides temperature data that satisfies a temperature condition.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VI N TRAN whose telephone number is (571)272-1108. The examiner can normally be reached Mon-Fri 7:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ROCIO PEREZ-VELEZ can be reached on 571-270-5935. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/V.N.T./Examiner, Art Unit 2117
/ROCIO DEL MAR PEREZ-VELEZ/Supervisory Patent Examiner, Art Unit 2117