DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
	Receipt is acknowledged of the amendment filed 9/19/2022.  Claims 1, 11 and 16 have been amended. Claims 3-4, 13-14 and 18-19 have been cancelled. No claims have been added. Claims 1-2, 5-12, 15-17 and 20 are pending and an action is as follows.

Response to Arguments
Applicant’s arguments with respect to claims 1-2, 5-12, 15-17 and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 10-12, 15-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Calmon et al. US 2020/0241921 (hereinafter Calmon), in view of de Silva et al. US 2022/0156639 (hereinafter Silva) and Chalupka et al. US 2021/0004735 (hereinafter Chalu).

Regarding claim 1, Calmon teaches a computer-implemented method comprising: 
determining, by a machine learning model, a predicted workload for a system (employing a reinforcement learning module for determining an iterative workload via an obtaining step 210 of Fig. 2, wherein each simulated iteration of the workload is the next forecasted (arguably akin to “predicted”) workload for the system as per ¶34) and a current system state of the system (the current state is also determined [Calmon, Figs. 2 & 3, Steps 220 & 310 and ¶36]); 
determining an action to be enacted for the system based at least in part on the predicted workload and the current system state (during the simulated iteration of the iterative workload (interpreted as based on the predicted workload), an action is selected from the set of available actions for the current state (interpreted as the current system state) of the simulated iteration of the iterative workload [Calmon, ¶34-¶36]); 
enacting the action for the system (the selected action is applied to the iteration and executed [Calmon, ¶36 & ¶48]); 
evaluating a state of the system after the action has been enacted (the state after applying the executed action is evaluated with a numerical score [Calmon, ¶48]); 
determining a reward for the machine learning model based at least in part on the state of the system after the action has been enacted (the numerical score is associated with a reward that is determined for the machine learning model based on the state after applying the executed action [Calmon, ¶48 and Figure 3, 310-320]); and 
updating the machine learning model based on the reward (through updating based on the rewards, the reinforcement learning agent learns which action is substantially optimal for each state [Calmon, Figure 3, 320, ¶48]). However, Calmon does not explicitly state that the workload is “predicted.”
Silva teaches wherein the workload is a predicted workload [Silva, Figure 1, Element 102, ¶30-¶36].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Calmon, indicating that the workload for a system and the system state may be determined and used to update a machine learning model based on a reward determined for the outcome of the workload, with the teachings of Silva, indicating that the workload may be a predicted workload. The resulting benefit would have been the ability to improve efficiency and/or speed in processing future workloads and possibly to reduce the processing time they require [Silva, ¶30]. The combination of Calmon and Silva does not explicitly teach wherein the machine learning model comprises a reinforcement learning model that is trained utilizing historical system state data that includes a historic state, a taken action, and a gained reward.
However, Chalu teaches wherein the machine learning model comprises a reinforcement learning model that is trained utilizing historical system state data that includes a historic state, a taken action, and a gained reward [Chalu, ¶17].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Calmon, in view of Silva, indicating that the workload for a system and the system state may be determined and used to update a machine learning model based on a reward determined for the outcome of the workload, with the teachings of Chalu, indicating that the machine learning model comprises a reinforcement learning model that is trained utilizing historical system state data that includes a historic state, a taken action, and a gained reward. The resulting benefit would have been the ability to enable the agent to “learn” about the long-term consequences of its actions, which can be used to formulate policy functions (or simply “policies”) on which to base future decisions [Chalu, ¶17].

Regarding claim 11, Calmon teaches a system comprising: a memory having computer readable instructions and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations [Calmon, ¶105 (software programs stored in memory and executed by a processor of a processing device such as a computer)] comprising:
determining, by a machine learning model, a predicted workload for a system (employing a reinforcement learning module for determining an iterative workload via an obtaining step 210 of Fig. 2, wherein each simulated iteration of the workload is the next forecasted/predicted workload for the system as per ¶34) and a current system state of the system (the current state is also determined [Calmon, Figs. 2 & 3, Steps 220 & 310 and ¶36]); 
determining an action to be enacted for the system based at least in part on the predicted workload and the current system state (during the simulated iteration of the iterative workload (interpreted as based on the predicted workload), an action is selected from the set of available actions for the current state (interpreted as the current system state) of the simulated iteration of the iterative workload [Calmon, ¶34-¶36]); 
enacting the action for the system (the selected action is applied to the iteration and executed [Calmon, ¶36 & ¶48]); 
evaluating a state of the system after the action has been enacted (the state after applying the executed action is evaluated with a numerical score [Calmon, ¶48]); 
determining a reward for the machine learning model based at least in part on the state of the system after the action has been enacted (the numerical score is associated with a reward that is determined for the machine learning model based on the state after applying the executed action [Calmon, ¶48 and Figure 3, 310-320]); and 
updating the machine learning model based on the reward (through updating based on the rewards, the reinforcement learning agent learns which action is substantially optimal for each state [Calmon, Figure 3, 320, ¶48]). However, Calmon does not explicitly state that the workload is predicted.
Silva teaches wherein the workload is a predicted workload [Silva, Figure 1, Element 102, ¶30-¶36].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Calmon, indicating that the workload for a system and the system state may be determined and used to update a machine learning model based on a reward determined for the outcome of the workload, with the teachings of Silva, indicating that the workload may be a predicted workload. The resulting benefit would have been the ability to improve efficiency and/or speed in processing future workloads and possibly to reduce the processing time they require [Silva, ¶30]. The combination of Calmon and Silva does not explicitly teach wherein the machine learning model comprises a reinforcement learning model that is trained utilizing historical system state data that includes a historic state, a taken action, and a gained reward.
However, Chalu teaches wherein the machine learning model comprises a reinforcement learning model that is trained utilizing historical system state data that includes a historic state, a taken action, and a gained reward [Chalu, ¶17].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Calmon, in view of Silva, indicating that the workload for a system and the system state may be determined and used to update a machine learning model based on a reward determined for the outcome of the workload, with the teachings of Chalu, indicating that the machine learning model comprises a reinforcement learning model that is trained utilizing historical system state data that includes a historic state, a taken action, and a gained reward. The resulting benefit would have been the ability to enable the agent to “learn” about the long-term consequences of its actions, which can be used to formulate policy functions (or simply “policies”) on which to base future decisions [Chalu, ¶17].

Regarding claim 16, Calmon teaches a computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform operations [Calmon, ¶105 (software programs stored in memory and executed by a processor of a processing device such as a computer)] comprising: 
determining, by a machine learning model, a predicted workload for a system (employing a reinforcement learning module for determining an iterative workload via an obtaining step 210 of Fig. 2, wherein each simulated iteration of the workload is the next forecasted/predicted workload for the system as per ¶34) and a current system state of the system (the current state is also determined [Calmon, Figs. 2 & 3, Steps 220 & 310 and ¶36]); 
determining an action to be enacted for the system based at least in part on the predicted workload and the current system state (during the simulated iteration of the iterative workload (interpreted as based on the predicted workload), an action is selected from the set of available actions for the current state (interpreted as the current system state) of the simulated iteration of the iterative workload [Calmon, ¶34-¶36]); 
enacting the action for the system (the selected action is applied to the iteration and executed [Calmon, ¶36 & ¶48]); 
evaluating a state of the system after the action has been enacted (the state after applying the executed action is evaluated with a numerical score [Calmon, ¶48]); 
determining a reward for the machine learning model based at least in part on the state of the system after the action has been enacted (the numerical score is associated with a reward that is determined for the machine learning model based on the state after applying the executed action [Calmon, ¶48 and Figure 3, 310-320]); and 
updating the machine learning model based on the reward (through updating based on the rewards, the reinforcement learning agent learns which action is substantially optimal for each state [Calmon, Figure 3, 320, ¶48]). However, Calmon does not explicitly state that the workload is predicted.
Silva teaches wherein the workload is a predicted workload [Silva, Figure 1, Element 102, ¶30-¶36].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Calmon, indicating that the workload for a system and the system state may be determined and used to update a machine learning model based on a reward determined for the outcome of the workload, with the teachings of Silva, indicating that the workload may be a predicted workload. The resulting benefit would have been the ability to improve efficiency and/or speed in processing future workloads and possibly to reduce the processing time they require [Silva, ¶30]. The combination of Calmon and Silva does not explicitly teach wherein the machine learning model comprises a reinforcement learning model that is trained utilizing historical system state data that includes a historic state, a taken action, and a gained reward.
However, Chalu teaches wherein the machine learning model comprises a reinforcement learning model that is trained utilizing historical system state data that includes a historic state, a taken action, and a gained reward [Chalu, ¶17].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Calmon, in view of Silva, indicating that the workload for a system and the system state may be determined and used to update a machine learning model based on a reward determined for the outcome of the workload, with the teachings of Chalu, indicating that the machine learning model comprises a reinforcement learning model that is trained utilizing historical system state data that includes a historic state, a taken action, and a gained reward. The resulting benefit would have been the ability to enable the agent to “learn” about the long-term consequences of its actions, which can be used to formulate policy functions (or simply “policies”) on which to base future decisions [Chalu, ¶17].

	Regarding claim 2, the combination of Calmon, in view of Silva and Chalu, teaches the computer-implemented method of claim 1, further comprising: determining, by the machine learning model, a second predicted workload for the system and a second current system state of the system; determining a second action to enact for the system based at least in part on the second predicted workload and the second current system state; enacting the second action for the system; evaluating a second state of the system after the second action has been enacted; determining a second reward for the machine learning model based at least in part on the second state of the system after the second action has been enacted; and updating the machine learning model based on the second reward (Figure 3, Step 330 of Calmon suggests that a new allocation of resources for the simulation of the iterative workload is performed such that steps 310-320 are repeated for the new iterative workload. Thus, Calmon discloses that the reinforcement learning agents again select a second action for the second current state, whereby the selected second action is executed for the second state based in part on the new allocation of resources for the simulation of the iterative workload, interpreted as the second predicted workload. This second action is evaluated and scored to obtain its corresponding reward, which is used to update the reinforcement learning module/agents [Calmon, Figures 2-3, ¶34-¶36 & ¶48]).

	Regarding claim 5, the combination of Calmon, in view of Silva and Chalu teaches the computer-implemented method of claim 1, wherein the action comprises an allocation of system resources within the system [Calmon, ¶83 (The action is defined as an increment or decrement of the amount of resources dedicated to the controlled workload)].

	Regarding claim 10, the combination of Calmon, in view of Silva and Chalu, teaches the computer-implemented method of claim 1, wherein determining the reward is further based on an execution cost on the system for the predicted workload after the action has been enacted (the numerical score is associated with a reward that is determined for the machine learning model based on the state after applying the executed action [Calmon, ¶48 and Figure 3, 310-320]. Calmon further teaches wherein the reward is further based on an execution cost on the system for the predicted workload after the action has been enacted [Calmon, Figure 10, ¶98 & ¶100 (indicating that the job does not need the amount of resources allocated to it and that reducing the allocation can decrease cost and even make room for other jobs to run)]).

	Regarding claim 12, the combination of Calmon, in view of Silva and Chalu teaches the system of claim 11, further comprising: determining, by the machine learning model, a second predicted workload for the system and a second current system state of the system; determining a second action to enact for the system based at least in part on the second predicted workload and the second current system state; enacting the second action for the system; evaluating a second state of the system after the second action has been enacted; determining a second reward for the machine learning model based at least in part on the second state of the system after the second action has been enacted; and updating the machine learning model based on the second reward.
(The rationale applied to claim 2 is hereby applied to the rejection of claim 12.)

Regarding claim 15, the combination of Calmon, in view of Silva and Chalu teaches the system of claim 11, wherein the action comprises an allocation of system resources within the system.
(The rationale applied to claim 5 is hereby applied to the rejection of claim 15.)

Regarding claim 17, the combination of Calmon, in view of Silva and Chalu teaches the computer program product of claim 16, further comprising: determining, by the machine learning model, a second predicted workload for the system and a second current system state; determining a second action to enact for the system based at least in part on the second predicted workload and the second current system state; enacting the second action for the system; evaluating a second state of the system after the second action has been enacted; determining a second reward for the machine learning model based at least in part on the second state of the system after the second action has been enacted; and updating the machine learning model based on the second reward.
(The rationale applied to claim 2 is hereby applied to the rejection of claim 17.)

Regarding claim 20, the combination of Calmon, in view of Silva and Chalu teaches the computer program product of claim 16, wherein the action comprises an allocation of system resources within the system.
(The rationale applied to claim 5 is hereby applied to the rejection of claim 20.)


Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Calmon, in view of Silva and Chalu as applied to claim 1 above, and further in view of Gong et al. US 2018/0276050 (hereinafter Gong).
Regarding claim 6, the combination of Calmon, in view of Silva and Chalu, teaches the computer-implemented method of claim 1, wherein actions are enacted for the system [Calmon, ¶45-¶48], but it does not teach wherein enacting the action is performed by a workload manager.
However, Gong teaches wherein enacting the action for the system is performed by a workload manager (WLM) [Gong, ¶21-¶23 (cluster manager 110 may identify or group one or more workloads into a workload cluster; cluster manager 110 may also include functionality for managing a workload cluster, e.g., initiating actions that are to be performed with regard to each workload within the workload cluster)].
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Calmon, in view of Silva and Chalu, indicating a machine learning model that performs the optimization of workload resources, with the teachings of Gong, indicating that the machine learning model is a reinforcement learning model trained utilizing historical system state data. The resulting benefit of the combination would have been the ability to guarantee robust actions without complete knowledge, if any, of the situational environment. A main advantage of reinforcement learning according to illustrative embodiments, compared to other learning approaches, is that it requires no information about the environment except for a reinforcement signal (which is a signal that reflects the success or failure of the entire system after it has performed some sequence of actions) [Gong, Col. 3, line 63 - Col. 4, line 6], whereby the historical states accumulated with each iteration further refine the overall accuracy.

Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Calmon, in view of Silva and Chalu as applied to claim 1 above, and further in view of Gopalan et al. US 2018/0034920 (hereinafter Gopa).

Regarding claim 7, the combination of Calmon, in view of Silva and Chalu, teaches the computer-implemented method of claim 1, wherein the reward is based on optimization [Calmon, ¶48], but it does not teach wherein the reward is further based on a customer goal for the system.
However, Gopa teaches wherein the reward is further based on a customer goal for the system [Gopa, ¶19 (for machine learning, e.g., reinforcement learning (RL), rewards are defined broadly to encompass a broad range of possible goals for customer services of the system)]. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Calmon, in view of Silva and Chalu, indicating a machine learning model that performs the optimization of workload resources, with the teachings of Gopa, indicating a reinforcement learning model that is trained utilizing a reward system tailored to the customer's goals. The resulting benefit of the combination would have been the ability to create a system that addresses customers' concerns and costs and increases customer satisfaction [Gopa, ¶19].

Regarding claim 8, the combination of Calmon, in view of Silva, Chalu and Gopa, teaches the computer-implemented method of claim 7, wherein the customer goal comprises a low-cost workload execution goal [Gopa, ¶19 (for machine learning, e.g., reinforcement learning (RL), rewards are defined broadly to encompass a broad range of possible goals for customer services of the system as they relate to cost)].
The motivation to combine the applied references is the same as that expressed in the rejection of claim 7 above.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Calmon, in view of Silva, Chalu and Gopa as applied to claim 7 above, and further in view of Babu et al. US 2019/0370146 (hereinafter Babu).

Regarding claim 9, the combination of Calmon, in view of Silva, Chalu and Gopa, teaches the computer-implemented method of claim 7, wherein the reward and goal are based on optimization [Calmon, ¶48], but it does not teach wherein the customer goal comprises a high throughput workload execution goal.
However, Babu teaches wherein the customer goal comprises a high throughput workload execution goal [Babu, ¶58 (goals comprise workload priorities for improving workload throughput)].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Calmon, in view of Silva, Chalu and Gopa, indicating a machine learning model that performs the optimization of workload resources based on customer goals, with the teachings of Babu, indicating that the customer goals are directed to increased throughput. The resulting benefit of the combination would have been the ability to create a system that addresses customers' concerns regarding the system's ability to handle the demands of large or multiple applications and to improve latency [Babu, ¶58].


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LONNIE V SWEET whose telephone number is (571)270-3622. The examiner can normally be reached Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hassan Phillips can be reached on 571-272-3940. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LONNIE V SWEET/Primary Examiner, Art Unit 2467