DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 have been examined.

Specification
The disclosure is objected to because of the following informalities: Page 8, line 13 refers to a “U.S. Patent Application Serial No. ___”. This appears to be a placeholder for a particular application serial number. Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 15-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because: Claim 15 is directed to “A computer program product, comprising a tangible machine-readable storage medium”. Page 23 of the originally filed specification provides a discussion of “computer program products” and “processor-readable storage media”. Page 23 also recites “The term "article of manufacture" as …” In re Nuijten, 500 F.3d 1346, 84 USPQ2d 1495 (Fed. Cir. 2007). In contrast, a computer-readable medium (e.g. magnetic or optical disk) claimed as a "non-transitory" medium encoded with a data structure defines structural and functional interrelationships between the data structure and the computer software and hardware components which permit the data structure’s functionality to be realized, and is thus statutory. See MPEP 2106.03. Claims 16-20 are dependent upon claim 15 and are rejected for the same reasons provided above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 4-5, 9-12, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 10,417,556 to Fairbank et al. (“Fairbank”) in view of U.S. Patent Application Publication 2017/0063974 by Wang et al. ("Wang") and U.S. Patent Application Publication 2008/0097816 by Freire et al. ("Freire").

In regard to claim 1, Fairbank discloses:
1. A method, comprising: See Fairbank, Fig. 3, broadly depicting a method.
obtaining a specification …, wherein the specification comprises a plurality of states … and one or more control variables …; See Fairbank, detx 28, Fig. 1 and col. 4, lines 27-30, e.g. “In some embodiments, learning model 106 can be used to generate time series predictions based on time series training data 102 and input data 104, where these predictions are input into reinforcement learning model 108.” 
Fairbank does not expressly disclose a specification of at least one workflow of a plurality of concurrent workflows in a shared computing environment … states of the at least one workflow … variables for the at least one workflow in the shared computing environment. However, this is taught by Wang. See Wang, ¶ 0018, e.g. “Examples described herein provide a system for monitoring concurrent workflow execution across distributed nodes.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Wang’s concurrent workflows with Fairbank’s model/specification in order to facilitate and manage workflow execution as essentially suggested by Wang (see Wang, ¶ 0021).
obtaining a simulation model of the at least one workflow of the plurality of concurrent workflows representing a plurality of different configurations of the one or more control variables of the at least one workflow of the concurrent workflows …; See Fairbank, col. 4, lines 31-38, e.g. “Reinforcement learning model 108 can be configured by configuration 110, where the reinforcement learning model can simulate conditions according to the generated time series predictions and artificial intelligence agent 112. For example, artificial intelligence agent 112 can iterate over multiple steps of the simulation to arrive at parameters that optimize a defined reward function, which can be output as output parameters 114.” Also see Fairbank, col. 26, lines 33-35, e.g. “As the simulations are performed, and the Q-values can be updated in the Q-table based on the outcomes of each simulation step.”
Fairbank does not expressly disclose by mapping the states of the at least one workflow based on a similarity given by one or more state similarity functions. However, this is taught by Freire. See Freire, Abstract, e.g. “In workflow matching, a mapping from the context of one workflow to another is determined. To do so, the workflows are converted to labeled graphs and a scoring function is defined for nodes based on their labels.” Also see ¶ 0118-0119, e.g. “In workflow matching, a mapping from the context of one workflow to another is determined. To do so, the workflows are converted to labeled graphs and a scoring function is defined for nodes based on their labels.” … “In an exemplary embodiment, the similarity score strikes a balance between the locality of pairwise compatibility and the overall similarity of the neighborhood.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Freire’s workflow mapping with Fairbank’s model in order to quickly build a workflow as suggested by Freire (see ¶ 0014).
evaluating, using at least one processing device, a plurality of values of the one or more control variables for an execution of said plurality of concurrent workflows using at least one reinforcement learning agent, See Fairbank, Fig. 1, element 112 and col. 4, lines 34-38, e.g. “For example, artificial intelligence agent 112 can iterate over multiple steps of the simulation to arrive at parameters that optimize a defined reward function, which can be output as output parameters 114.”
wherein said evaluating comprises 
observing said plurality of states, including a current state comprising a current configuration of said plurality of concurrent workflows and said shared computing environment, and See Fairbank, Fig. 1, element 112 and col. 4, lines 34-38, e.g. “For example, artificial intelligence agent 112 can iterate over multiple steps of the simulation to arrive at parameters that optimize a defined reward function, which can be output as output parameters 114.”
obtaining an expected utility score for a plurality of combinations of said control variables for the execution of said plurality of concurrent workflows given an allocation of one or more resources of the shared computing environment corresponding to said combination of said control variables in said current state, See Fairbank, col. 11, lines 23-25, e.g. “Subsequently, the set of rules with the best reward function score can be output from system 100, for example as parameter output 114.”
wherein the at least one reinforcement learning agent performs, using the simulation model, one or more of (i) the evaluating, (ii) the obtaining the expected utility score, and (iii) a training of a reinforcement learning model used by the at least one reinforcement learning agent; and  See Fairbank, col. 26, lines 33-35, e.g. “As the simulations are performed, and the Q-values can be updated in the Q-table based on the outcomes of each simulation step. For example, a positive reward causes the Q-value to be adjusted upward, a negative reward causes the Q-value to be adjusted downward, and so on.”
providing an allocation of the one or more resources of the shared computing environment reflecting the combination of the control variables having the expected utility score that satisfies a predefined score criteria. See Fairbank, col. 6, lines 49-51, e.g. “In some embodiments, reinforcement learning model 108 can iterate through steps of the simulation based on the predicted time series data. The result can generate parameters that meet a selection criteria, such as an optimized reward function.”
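
For illustration only, the Q-table update quoted above from Fairbank (col. 26) and the final selection of the best-scoring combination can be sketched as follows. This is a minimal sketch of standard tabular Q-learning, not code taken from Fairbank or from the application. The state names, the action names, and the values of the learning rate and discount factor are hypothetical assumptions.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value, not from any cited reference)
GAMMA = 0.9  # discount factor (assumed value, not from any cited reference)


def update_q(q_table, state, action, reward, next_state, actions):
    """Apply one standard tabular Q-learning update and return the new Q-value."""
    # Best value currently estimated for the state reached after the step.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    # Move the stored Q-value part of the way toward the target.
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


def best_action(q_table, state, actions):
    """Return the action with the highest Q-value in the given state."""
    return max(actions, key=lambda a: q_table[(state, a)])


# Hypothetical control-variable settings used as actions.
actions = ["2_cores", "4_cores"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

up = update_q(q, "s0", "4_cores", 1.0, "s1", actions)     # positive reward
down = update_q(q, "s0", "2_cores", -1.0, "s1", actions)  # negative reward
chosen = best_action(q, "s0", actions)
```

In this sketch a positive reward raises the stored Q-value and a negative reward lowers it, and the action with the highest Q-value stands in for the combination whose score satisfies the selection criteria.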

In regard to claim 2, Fairbank discloses:
2. The method of claim 1, wherein the evaluating the plurality of values of the one or more control variables for the execution of said plurality of concurrent workflows using the at least one reinforcement learning agent further comprises observing the current state and selecting an action based on a path in the simulation model that substantially maximizes at least one utility function for one or more nodes in the simulation model. See Fairbank, col. 11, lines 23-25, e.g. “Subsequently, the set of rules with the best reward function score can be output from system 100, for example as parameter output 114.”

In regard to claim 4, Fairbank discloses:
4. The method of claim 1, wherein estimated values of the expected utility score are given by observing the current state and the estimated values of the expected utility score are estimated based on a path in the simulation model that substantially maximizes at least one utility function for one or more nodes in the simulation model for a predefined number of training epochs. See Fairbank, col. 11, lines 23-25, e.g. “Subsequently, the set of rules with the best reward function score can be output from system 100, for example as parameter output 114.” Also see col. 21, lines 27-30, e.g. “For example, over “n” iterations of model training, hyperparameters such as a number of hidden layers, a number of epocs, a learning rate, a number of input units, and the like can be varied.”

In regard to claim 5, Fairbank discloses:
5. The method of claim 1, wherein the reinforcement learning model used by the at least one reinforcement learning agent is trained using input/output training pairs generated from the simulation model as a training batch for a predefined number of training epochs. See Fairbank, col. 10, lines 64-66, e.g. “produce training and testing inputs and outputs for the balance predictor mode.” Also see Fairbank, col. 21, lines 27-30, e.g. “For example, over “n” iterations of model training, hyperparameters such as a number of hidden layers, a number of epocs, a learning rate, a number of input units, and the like can be varied.”

In regard to claim 9, Fairbank discloses:
9. A system, comprising: a memory; and at least one processing device, coupled to the memory, operative to implement the following steps: See Fairbank, Fig. 2, depicting a system comprising memory coupled to a processor.


In regard to claims 10-12, parent claim 9 is addressed above. All further limitations have been addressed in the above rejections of claims 2 and 4-5, respectively.

In regard to claim 15, Fairbank discloses:
15. A computer program product, comprising a tangible machine-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by at least one processing device perform the following steps:  See Fairbank, Fig. 2, element 214 and col. 5, lines 29-31, e.g. “computer-readable medium.”
All further limitations of claim 15 have been addressed in the above rejection of claim 1.

In regard to claims 16-18, parent claim 15 is addressed above. All further limitations have been addressed in the above rejections of claims 2 and 4-5, respectively.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Fairbank in view of Wang and Freire as applied above, and further in view of U.S. Patent Application Publication 2017/0140270 by Mnih et al. (“Mnih”).

In regard to claim 3, Fairbank, Wang, and Freire do not expressly disclose:
3. The method of claim 2, wherein the action is selected based on the path in the simulation model when a configurable threshold satisfies a predefined value criteria. However, this is taught by Mnih. See Mnih, ¶ 0045, e.g. “As another example, the criteria may specify that the worker update the current values when the total number of reinforcement learning iterations performed by all of the multiple workers since the most recent time the workers updated the parameter values exceeds a specified threshold value.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Fairbank’s model with Mnih’s threshold in order to determine when parameter conditions have been satisfied as suggested by Mnih.

Claims 6, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Fairbank in view of Wang and Freire as applied above, and further in view of U.S. Patent Application Publication 2013/0263117 by Konik et al. ("Konik").

In regard to claim 6, Fairbank, Wang, and Freire do not expressly disclose:
6. The method of claim 1, wherein said expected utility score further comprises an expected cost depending on one or more of an execution time of the at least one workflow and a consumption of resources in said shared computing environment. However, Konik teaches this. See Konik ¶ 0052, e.g. “The estimated cost 425, in an entry, is the cost (e.g., the estimated time), that the optimizer 315 estimates executing the execution plan 325 for the query will take using the current allocated resources (the current allocated memory 410 and the current allocated CPU 415) plus the estimated amount of resources requested (the estimated CPU requested 440 and the estimated RAM requested 445).” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Fairbank’s score with Konik’s estimated cost in order to optimize execution and save costs as essentially suggested by Konik.

In regard to claim 13, parent claim 9 is addressed above. All further limitations have been addressed in the above rejection of claim 6.

In regard to claim 19, parent claim 15 is addressed above. All further limitations have been addressed in the above rejection of claim 6.

Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Fairbank in view of Wang and Freire as applied above, and further in view of U.S. Patent Application Publication 2015/0100530 by Mnih et al. (“Mnih2015”). 

In regard to claim 7, Fairbank, Wang, and Freire do not expressly disclose the claimed limitations. However, Mnih2015 teaches this as follows:
7. The method of claim 1, wherein said at least one reinforcement learning agent comprises a Deep Q-Learning agent using a Q-Deep Neural Network (QDNN) as a representation of a Q-Function, and See Mnih2015, ¶ 0041, e.g. “FIGS. 3a and 3b show alternative example configurations of a Q-learning deep neural network according to an embodiment of the invention;” Also see ¶ 0080, e.g. “We refer to convolutional networks trained with the described approach as Deep Q-Networks (DQN)”
wherein said obtaining the expected utility score for the plurality of combinations of said control variables comprises selecting an action at random and computing a cost-to-go from the expected utility score of the selected action updated by an observation of the current state, and See Mnih2015, ¶ 0011, e.g. “In embodiments, the first neural network generates a target action-value parameter, such as a target Q-value, and the second neural network is updated based on the target generated by the first.”
wherein an updating of the at least one reinforcement learning agent comprises a training of the QDNN given new samples in iterative epochs. See Mnih2015, Figs. 6a and 6b, depicting reward score training over iterative epochs.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Fairbank’s agent with Mnih2015’s DQN in order to compute Q-values for all possible actions in a given state with only a single forward pass through the network as suggested by Mnih2015 (see ¶ 0079).
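
For illustration only, the single-forward-pass property relied upon above, together with random action selection and a target value computed from the observed reward, can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Mnih2015 or from the application. A small linear map stands in for the deep neural network, the same Q-values are reused for the target instead of a separate target network, and the weights, state, and discount factor are hypothetical.

```python
import random

GAMMA = 0.9  # discount factor (assumed value, not from any cited reference)


def forward(weights, state):
    """Return one Q-value per action from a single pass over the state."""
    # Each row of weights produces the Q-value of one action.
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def select_action(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done):
    """Return the target Q-value: reward plus discounted best next value."""
    if done:
        return reward
    return reward + GAMMA * max(next_q_values)


# Hypothetical two-action, two-feature example.
weights = [[1.0, 0.0], [0.0, 2.0]]
state = [1.0, 1.0]

q_values = forward(weights, state)
greedy = select_action(q_values, 0.0, random.Random(0))    # never explores
explored = select_action(q_values, 1.0, random.Random(0))  # always explores
target = td_target(1.0, q_values, False)
```

Training would then adjust the weights so that the Q-value of the selected action moves toward the target over repeated epochs; that update step is omitted here.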

In regard to claim 14, parent claim 9 is addressed above. All further limitations have been addressed in the above rejection of claim 7.

In regard to claim 20, parent claim 15 is addressed above. All further limitations have been addressed in the above rejection of claim 7.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Fairbank in view of Wang and Freire as applied above, and further in view of U.S. Patent Application Publication 2013/0254402 by Vibhor et al. (“Vibhor”). 

In regard to claim 8, Fairbank, Wang, and Freire do not expressly disclose the claimed limitations. However, Vibhor teaches the following:
8. The method of claim 1, wherein the one or more control variables comprise one or more of a number of processing cores allocated to a given workflow and an amount of memory allocated to the given workflow. See Vibhor, ¶ 0303, e.g. “The utilization rate can be based on the amount, size and/or speed of the resources available to the workflow engine (e.g., processing speed, number of processors or processor cores, memory size and speed, communication rates, etc.) compared with how close to capacity the resources are operated.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use .

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent Application Publication 2018/0285794 by Gray-Donald et al. See ¶ 0064, e.g. “Optimizer instance 620 may simulate the one or more service workflows to estimate their associated costs and forward the estimated costs for the one or more service workflows to service recommendation engine 630 for analysis at step 665.”
U.S. Patent Application Publication 2009/0007063 by Szpak et al. teaches workflow management providing a customized graphical workflow for focused and efficient workflow design. See ¶ 0023 and 0032.
“Reinforcement Learning for Robots Using Neural Networks” by Lin. See p. 19, section 2.6, e.g. “One way to reduce the risk it to utilize an action model, which may come from the agent’s previous experience with solving a related task.” Also see p. 49, section 4.4.3, e.g. “If a perfect model is available, it is always useful to use the model. If a sufficiently good model can be learned faster than a good control policy …, it is worthwhile to learn the model.”
U.S. Patent Application Publication 2018/0121766 by McCord et al. See ¶ 0055, e.g. “The model manager 380 then prompts the reinforcement learning server 210 to process the recorded observations and actions 450 to find the best parameters to match the observations 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to James D Rutten whose telephone number is (571)272-3703. The examiner can normally be reached M-F 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/James D. Rutten/
Primary Examiner, Art Unit 2121