DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
The present application having Application No. 16/736,476 filed on 01/07/2020 presents claims 1-20 for examination.

Examiner Notes
Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.


Drawings
The applicant’s drawings submitted are acceptable for examination purposes.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01/08/2020 has been acknowledged and the cited references have been considered by the examiner.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Such claim limitation(s) is/are: “means for selecting…” in claims 16, 18 and 19.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-5, 8, 11-12, 14 and 16-20 are rejected under AIA 35 U.S.C. 103 as being unpatentable over Mallya Kasaragod et al. (US 2020/0167686 A1) (hereinafter Kasaragod) in view of Padala et al. (US 2015/0058265 A1) (hereinafter Padala).

As per claim 1, Kasaragod discloses In a resource management digital medium environment, a method implemented by at least one computing device across multiple iterations, and in each iteration the method comprising: identifying, by an application, a previous action performed in a previous iteration of the multiple iterations to manage computing device resource usage by the application (e.g. Kasaragod: [0042] [0064] attainment of an average reward value for the simulation through execution of actions in the simulation environment over a minimum number of iterations of the simulation.  Also see [0040] [0052] [0070] [0079] [0096] [0012].); determining a current state of the application indicating a current health of the application, the current state being one of multiple states for the application (e.g. Kasaragod: [0040] discloses a training application container that performs training of the reinforcement learning model based on actions performed within the simulation environment.  The training of the reinforcement learning model may take into account the reward value, as determined via the reinforcement function, corresponding to the action performed, the initial state, and the state attained via execution of the action.  The training container may provide the updated reinforcement learning model to a simulation application container to utilize in the simulation of the application and to obtain new state-action-reward data that may be used to continue updating the reinforcement learning model.  Also see [0107]); determining a reward value to apply based at least in part on the current state of the application (e.g. Kasaragod: [0021] discloses based on the simulation environment state achieved through execution of the action, the application may determine, based on the reinforcement function, a reward value.  
[0040] [0059] The training of the reinforcement learning model may further take into account the reward value, as determined via the custom-designed reinforcement function, corresponding to the action performed, the initial state, and the state attained via execution of the action. The training application container may provide the updated reinforcement learning model to a simulation application container to utilize in the simulation of the application and to obtain new state-action-reward data that may be used to continue updating the reinforcement learning model. [0069] [0077-0078] discloses using the reinforcement function, the simulation application container may determine the corresponding reward value for the tuple comprising the initial state, action performed, and resulting state of the simulation environment. [0033] discloses customer may provide reinforcement learning function, which may be used to define a set of reward values corresponding to actions performable by the device based on an initial state of the simulation environment and the resulting state.  Also see [0067] [0088] [0107-0108] [0110] [0114].); updating a reinforcement learning model by distributing the reward value across action values associated with at least one action, the reinforcement learning model associating each of multiple actions with each of the multiple states (e.g. Kasaragod: [0021-0022] discloses based on the simulation environment state achieved through execution of the action, the application may determine, based on the reinforcement function, a reward value.  The simulation application may transmit this information to the training application to update the reinforcement learning model.  The training application uses the data from the simulation application to update the reinforcement learning model.  
[0040] discloses the training of the reinforcement learning model may further take into account the reward value, as determined via the custom-designed reinforcement function, corresponding to the action performed, the initial state, and the state attained via execution of the action. The training application container may provide the updated reinforcement learning model to a simulation application container to utilize in the simulation of the application and to obtain new state-action-reward data that may be used to continue updating the reinforcement learning model. Also see [0042] [0059] [0067-0068] [0070] [0074] [0078] [0107-0109] [0112-0113] [claim 2].); selecting, based on the reinforcement learning model, an action of the multiple actions associated with the current state (e.g. Kasaragod: [0069] [0108] the simulation application container may initiate the simulation using a randomized reinforcement learning model, whereby the simulation application container uses the model to select, based on an initial state of the simulation environment, a random action to be performed. The simulation application container may execute the action and determine the resulting state of the simulation environment. Using the reinforcement function, the simulation application container may determine the corresponding reward value for the tuple comprising the initial state, action performed, and resulting state of the simulation environment. The simulation application container may store this data point in the memory buffer to provide the performance data to the training application and execute another action based on the current state of the simulation environment. Through this process, the simulation application container may continue to add data points to the memory buffer.  Also see [0076-0077][Claim 2].).
As discussed above, Kasaragod discloses selecting and performing an action but does not expressly disclose performing, by the application, the selected action to modify usage of at least one computing device resource.
However, Padala discloses selecting, based on the reinforcement learning model, an action of the multiple actions associated with the current state; and performing, by the application, the selected action to modify usage of at least one computing device resource (e.g. Padala: [Abstract] [0003-0005] discloses recommending and selecting a scaling action from a plurality of possible actions for the multi-tier application in the current state. [0034] [0038-0041] discloses that the automatic scaling module operates to automatically scale the multi-tier application as needed. [0071-0072] discloses selecting a scaling action and applying scale-up or scale-down policies based on the selected action. Also see [0043] [0075].). Padala also discloses determining a reward value to apply based at least in part on the current state of the application (e.g. Padala: [0039] [0043-0044] [0048].).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of scaling resources for a multi-tier application based on a selected action, as taught by Padala, into Kasaragod because doing so would help ensure that application performance meets the SLO while preventing excessive usage of resources (See Padala: [0043]).
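
For illustration only, the iteration recited in claim 1 can be sketched as follows. The sketch is hypothetical and is not drawn from the application, Kasaragod, or Padala: the state names, action names, the function run_iteration, and the simple additive update are all assumptions chosen to make the sequence of steps concrete.

```python
import random

# Hypothetical states and actions; neither the claims nor the references name these.
STATES = ["healthy", "degraded", "unhealthy"]
ACTIONS = ["no_op", "reduce_cache", "reduce_threads"]


def new_model():
    """Build a model that associates every action with every state (all values start at 0)."""
    return {s: {a: 0.0 for a in ACTIONS} for s in STATES}


def run_iteration(model, prev_state, prev_action, current_state, reward, rng):
    """One iteration: update the model for the previous action, then select the next action."""
    # Update step (assumed form): add the reward to the value of the previous state/action pair.
    if prev_state is not None and prev_action is not None:
        model[prev_state][prev_action] += reward
    # Selection step: choose the highest-valued action for the current state,
    # breaking ties at random.
    values = model[current_state]
    best = max(values.values())
    candidates = [a for a in ACTIONS if values[a] == best]
    return rng.choice(candidates)


model = new_model()
rng = random.Random(0)
# The previous action "reduce_cache" was taken in state "degraded" and earned a reward of 1.0.
first_action = run_iteration(model, "degraded", "reduce_cache", "healthy", 1.0, rng)
# In a later iteration the application is "degraded" again, so the rewarded action is selected.
second_action = run_iteration(model, "healthy", first_action, "degraded", 0.0, rng)
# The caller would then perform the selected action to modify resource usage.
```

The final "performing" step is left to the caller because it depends on the particular resource being managed.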

As per claim 2, the combination of Kasaragod and Padala discloses The method as recited in claim 1 [See rejection to claim 1 above], wherein the determining the current state of the application comprises determining the current state of the application based on at least one of a nature of a workflow being performed by the application, a health of the application, and user interface activity for the application (e.g. Kasaragod: [0040] discloses attaining new/current state via execution of the action in the simulation of the application.  [0073] discloses simulation agent provides multiple simulation application containers to allow performance of multiple simulations to provide data to train the reinforcement learning model.  Also see [0025] [0059] [0067] [0069] [0077] [0108] [0110-0111].  Padala: [0004] discloses determining current state of application based on the operational metrics.  [0039] discloses agent obtains current state of the application at certain time interval.  Also see [0070] [0073] [0075].).

As per claim 3, the combination of Kasaragod and Padala discloses The method as recited in claim 1 [See rejection to claim 1 above], wherein determining the reward value comprises determining the reward value based on the current state being different than a previous state in the previous iteration, the reward value being greater if the current state is an improved state over the previous state (e.g. Kasaragod: [0021-0022] discloses that, based on the simulation environment state achieved through execution of the action, the application may determine, based on the reinforcement function, a reward value. This information is used to update the reinforcement model until a maximum reward value has been attained over the last several simulation attempts. [0042] [0059] discloses attaining an average reward value for the simulation through execution of actions over a number of iterations of the simulation. [0069] discloses that, using the reinforcement function, the simulation application container may determine the corresponding reward value for the tuple comprising the initial state, action performed and resulting state of the simulation. The container may store this data point and execute another action based on the current state. [0079] discloses that the model training application determines that the reinforcement learning model has converged on an optimal solution and a determination is made that the reward value is not going to improve beyond the average reward value. While average reward values are used throughout the disclosure, other statistics involving reward value over a set of previous simulation iterations may be used. Thus, the reward value is determined for each iteration while the current state is improving from the past iteration. Also see [0076-0078] [0097-0098] [0115-0116].).

As per claim 4, the combination of Kasaragod and Padala discloses The method as recited in claim 1 [See rejection to claim 1 above], wherein determining the reward value comprises determining the reward value based on a change in resources consumed by the application if the current state is the same as a previous state in the previous iteration (e.g. Kasaragod: [0070] [0079] discloses that the simulation application container may obtain an updated reinforcement learning model from the training application container. In response to obtaining the updated reinforcement learning model, the simulation application container may perform another iteration of the simulation to generate new data points usable to continue updating the reinforcement learning model. The training application container may evaluate the reinforcement learning model to determine whether a termination condition has been met. For instance, if based on the data points obtained from the memory buffer, the training application container determines that the reinforcement learning model has converged on an optimal solution, the training application container may transmit a notification to the simulation agent to indicate completion of the simulation. Convergence indicates that the reward value will remain constant and will not improve over the previous iteration. Padala: [0039] [0042-0043].).
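
For illustration only, the reward determinations recited in claims 3 and 4 can be sketched together. The sketch is hypothetical: the health ranking, the function compute_reward, and the particular reward magnitudes are assumptions and do not appear in the application or in either reference.

```python
# Hypothetical ordering of health states; a higher rank means a healthier application.
HEALTH_RANK = {"unhealthy": 0, "degraded": 1, "healthy": 2}


def compute_reward(prev_state, cur_state, prev_usage, cur_usage):
    """Return a reward from the state transition, or from resource change if the state is unchanged."""
    prev_rank = HEALTH_RANK[prev_state]
    cur_rank = HEALTH_RANK[cur_state]
    if cur_rank != prev_rank:
        # Claim 3 scenario: the state changed, so the reward is greater when the
        # current state is an improvement (positive) and lower when it is worse (negative).
        return float(cur_rank - prev_rank)
    # Claim 4 scenario: the state is the same, so the reward reflects the change in
    # resources consumed (assumed here to be the fractional reduction in usage).
    return (prev_usage - cur_usage) / max(prev_usage, 1e-9)


improved = compute_reward("degraded", "healthy", 100.0, 100.0)
worsened = compute_reward("healthy", "degraded", 100.0, 100.0)
same_state = compute_reward("healthy", "healthy", 200.0, 150.0)
```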

As per claim 5, the combination of Kasaragod and Padala discloses The method as recited in claim 1 [See rejection to claim 1 above], wherein the reinforcement learning model comprises a table including multiple columns and multiple rows corresponding to the multiple states and the multiple actions, updating the reinforcement learning model comprises distributing the reward value across a first cell of the table corresponding to the previous action and a previous state in the previous iteration, as well as one or more cells of the table corresponding to the previous state that are adjacent to the first cell (e.g. Kasaragod: [0040] [0042] a training application container that performs training of the reinforcement learning model based on actions performed by the simulated robotic device within the simulation environment based on the state of the robotic device and simulation environment prior to and after execution of the action. The training of the reinforcement learning model may further take into account the reward value, as determined via the custom-designed reinforcement function, corresponding to the action performed, the initial state, and the state attained via execution of the action. The training application container may provide the updated reinforcement learning model to a simulation application container to utilize in the simulation of the application and to obtain new state-action-reward data that may be used to continue updating the reinforcement learning model.  [0069] Using the reinforcement function, the simulation application container may determine the corresponding reward value for the tuple comprising the initial state, action performed, and resulting state of the simulation environment. The simulation application container may store this data point in the memory buffer and execute another action based on the current state of the simulation environment. 
Through this process, the simulation application container may continue to add data points to the memory buffer. It is understood that state-action-reward data stored in the memory buffer may be stored using any well-known data structure such as a table including multiple columns and multiple rows. Also see [0022] [0059] [0067] [0070] [0077] [0108].).
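
For illustration only, the table update recited in claim 5 can be sketched as follows, assuming rows correspond to states and columns to actions. The function distribute_reward and the 50/50 split between the first cell and its neighbors are assumptions; the claim does not specify the proportions, and neither reference is asserted to use this particular scheme.

```python
def distribute_reward(table, state_row, action_col, reward, primary_share=0.5):
    """Add reward to table[state_row][action_col] and to the adjacent cells in the same row."""
    row = table[state_row]
    # Adjacent cells are the columns immediately left and right, if they exist.
    neighbors = [c for c in (action_col - 1, action_col + 1) if 0 <= c < len(row)]
    if not neighbors:
        # Single-column table: the whole reward goes to the first cell.
        row[action_col] += reward
        return table
    # The first cell receives primary_share of the reward (an assumed proportion).
    row[action_col] += reward * primary_share
    # The remainder is split evenly across the adjacent cells for the same state.
    neighbor_share = reward * (1.0 - primary_share) / len(neighbors)
    for c in neighbors:
        row[c] += neighbor_share
    return table


# Three states (rows) by four actions (columns), all action values starting at zero.
table = [[0.0] * 4 for _ in range(3)]
distribute_reward(table, state_row=1, action_col=2, reward=1.0)
```

Only the row for the previous state is modified; rows for other states are left unchanged.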

As per claim 8, the combination of Kasaragod and Padala discloses The method as recited in claim 1 [See rejection to claim 1 above], wherein selecting the action comprises selecting an action using a first policy and a second policy, the first policy comprising selecting the action based on which action in the reinforcement learning model corresponding to the current state has the greatest action value, the second policy comprising selecting an action from the reinforcement learning model randomly (e.g. Kasaragod: [0069] [0108] [0110] the simulation application container uses the model to select, based on an initial state of the simulation environment, a random action to be performed. [0076] discloses selecting a pairing of initial simulation environment states and corresponding actions. During the initial execution of the simulation application, the system agent may select this pairing at random. Padala: [Abstract] [0003] discloses selecting one of reinforced learning and heuristic operation based on a policy to recommend a scaling action from a current state of the application. If reinforced learning is selected, the reinforced learning is applied to select the scaling action from a plurality of possible actions. If heuristic operation is selected, the heuristic operation is applied to select the scaling action. [0039] discloses that the agent chooses the action based on the optimal policy to achieve maximum reward. Also see [0046-0048] [0050] [0059] [0070]).
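
For illustration only, the two policies recited in claim 8 can be sketched as follows. The claim does not state how the two policies are combined; the probability parameter epsilon, the function select_action, and the action names are assumptions, and the sketch is not asserted to be the method of the application or of either reference.

```python
import random


def select_action(action_values, epsilon, rng):
    """Select an action for the current state using a greedy policy or a random policy."""
    # Second policy (assumed to apply with probability epsilon): pick any action at random.
    if rng.random() < epsilon:
        return rng.choice(list(action_values))
    # First policy: pick the action with the greatest action value for the current state.
    return max(action_values, key=action_values.get)


# Hypothetical action values for the current state.
values = {"no_op": 0.1, "reduce_cache": 0.7, "reduce_threads": 0.3}
greedy_choice = select_action(values, epsilon=0.0, rng=random.Random(1))
random_choice = select_action(values, epsilon=1.0, rng=random.Random(1))
```

With epsilon set to 0.0 the first policy always applies, and with epsilon set to 1.0 the second policy always applies.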

As per claims 11, 12 and 14, these are device claims having similar limitations as cited in method claims 1, 5 and 8, respectively. Thus, claims 11, 12 and 14 are also rejected under the same rationale as cited in the rejection of claims 1, 5 and 8, respectively.

As per claims 16-20, these are system claims having similar limitations as cited in method claims 1-5, respectively. Thus, claims 16-20 are also rejected under the same rationale as cited in the rejection of claims 1-5, respectively.

Allowable Subject Matter
Claims 6-7, 9-10, 13 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, subject to 35 U.S.C. 101 and/or 112 rejections detailed above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hiren Patel whose telephone number is (571) 270-3366.  The examiner can normally be reached on Monday to Friday 9:30 AM to 6:00 PM. If attempts to reach the above noted Examiner by telephone are unsuccessful, the Examiner’s supervisor, Emerson Puente, can be reached at the following telephone number: (571) 272-3652. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

May 21, 2022

/HIREN P PATEL/Primary Examiner, Art Unit 2196