Detailed Action
This action is in response to Applicant's communications filed 15 July 2022.  
Claims 1, 8, 14, and 20 were amended.  No claims were cancelled.  No claims were withdrawn.  No claims were added.  Claims 1-4, 7-9, 11-16, 18-21, and 23-25 are pending in this Application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments/Arguments
Applicant's arguments, filed 15 July 2022, regarding the rejections of claims 1-4, 7-9, 11-16, 18-21, and 23-25 under 35 USC 103 have been fully considered but are not persuasive.
Regarding independent claims 1, 8, 14, and 20, Applicant argues (Remarks, pp. 9-10) that Liu does not teach "identify one or more states in a code, the one or more states to comprise virtual resources identified in the code to assign to physical resources."  Applicant argues that Liu is directed to assigning jobs to virtual machines in a cloud storage system and does not teach "identifying states within program code."  It is noted that the features upon which Applicant relies (i.e., "program code") are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).  It is understood that virtual machines on server clusters using cloud computing would necessarily be implemented in computer code (Liu: "Cloud computing with virtualization technology enables computational resources (including CPU, memory, disk, communication bandwidth, etc.) in data centers or server clusters to be shared by allocating virtual machines (VMs) in an on-demand manner." sec. I, p. 372).  Liu further teaches modeling the state space S and available actions A (sec. II.A, p. 373) for its virtual machine resource allocation ("we will describe the global and local tiers of the overall cloud resource allocation and power management framework. The global tier of cloud resource (VM) allocation exhibits high dimensions in state and action spaces" sec. III, p. 375).  Thus, Liu teaches the limitations of the claims.
Applicant argues (Remarks, pp. 11-12) that Barrett does not teach "generate at least one additional sequence based on a random assignment of the virtual resources..." because Barrett is not random.  Applicant argues that Barrett's use of random mutations does not make the assignment random because, while the mutations may be random, the parent selection is not.  Examiner notes that the claim language does not require that the parent selection be random, and also disagrees that the parent selection is not random.  Barrett teaches randomness in its parent selection, crossover, and mutation.  As discussed with regard to Algorithm 1, parents are probabilistically selected using roulette wheel selection ("Roulette wheel selection involves ranking chromosomes in terms of their fitness and probabilistically selecting them." sec. IV.B, p. 98), wherein probabilistic selection is random.  Thus, Barrett teaches the claim limitations.
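For illustration only, the probabilistic character of roulette wheel selection can be sketched in Python as follows.  The sketch is not taken from Barrett; it assumes that the selection probability is proportional to a non-negative fitness value, and the function name roulette_wheel_select and the sample fitness values are hypothetical.

```python
import random


def roulette_wheel_select(fitnesses, rng):
    """Return the index of one individual, chosen with probability
    proportional to its non-negative fitness (an assumed weighting)."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one fitness must be positive")
    spin = rng.random() * total  # random point on the wheel
    running = 0.0
    for index, fitness in enumerate(fitnesses):
        running += fitness
        if spin < running:
            return index
    return len(fitnesses) - 1  # guard against floating-point round-off


# Hypothetical population of three chromosomes with fitness 1, 3 and 6.
rng = random.Random(0)
picks = [roulette_wheel_select([1.0, 3.0, 6.0], rng) for _ in range(10000)]
counts = [picks.count(i) for i in range(3)]
```

Because the selected parent depends on a random draw, repeated calls on the same population can return different parents; fitter chromosomes are only more likely to be chosen, not certain to be chosen.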
The rejection of the dependent claims for depending from rejected claims is maintained.
For the aforementioned reasons, claims 1-4, 7-9, 11-16, 18-21, and 23-25 are rejected under 35 USC 103.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


Claim(s) 1-2, 4, 7-9, 11-15, 18-21, and 23-25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning, hereinafter "Liu") in view of Barrett et al. (A Learning Architecture for Scheduling Workflow Applications in the Cloud, hereinafter "Barrett").

Regarding Claim 1,
Liu teaches an apparatus to determine a physical resource assignment, comprising:
a compiler logic circuitry ("computational resources (including CPU, memory, disk, communication bandwidth, etc.)" sec. I, p. 372) arranged to identify one or more states in a code, the one or more states to comprise virtual resources identified in the code to assign to physical resources ("However, a complete resource allocation framework in the cloud computing systems exhibits high dimensions in state and action spaces. For example, a state in the state space may be the Cartesian product of characteristics and current resource utilization level of each server (for hundreds of servers) as well as current workload level (number and characteristics of VMs for allocation)." sec. I, p. 372; "Cloud computing with virtualization technology enables computational resources (including CPU, memory, disk, communication bandwidth, etc.) in data centers or server clusters to be shared by allocating virtual machines (VMs) in an on-demand manner." sec. I, p. 372; "we will describe the global and local tiers of the overall cloud resource allocation and power management framework. The global tier of cloud resource (VM) allocation exhibits high dimensions in state and action spaces" sec. III, p. 375);
each policy of the policies comprising the one or more states, each state having a status of the code (Algorithm 2; "RL state", sec. VI.B, p. 379), an action (Algorithm 2; "action set A" sec. VI.B, p. 379), and an expected reward (Algorithm 2; "reward rate" sec. VI.B, p. 379) for assignment of the virtual resources to the physical resources ("a deep Q-learning framework was also proposed to derive the optimal action a at each state s in order to maximize (or minimize) the corresponding Q(s, a) value." sec. I, p. 373; "The deep Q-learning technique is adopted for the online control based on the offline-trained DNN. More specifically, at each decision epoch tk of an execution sequence, the system under control is in a state sk. The DRL agent performs inference using the DNN to derive the Q(sk,a) estimate of each state-action pair (sk,a), and uses ε-greedy policy to derive the action with the highest Q(sk,a) with probability 1 − ε and choose the other actions randomly with total probability ε. The chosen action is denoted by ak. At the next decision epoch tk+1, the DRL agent performs Q-value updating based on the total reward (or cost) rk(sk,ak) observed during this time period [tk,tk+1). At the end of the execution sequence, the DRL agent updates the DNN using the newly observed Q-value estimates, and the updated DNN will be utilized in the next execution sequence. More detailed procedures are shown in Algorithm 1." sec. IV, p. 376); and
generate training data for a neural network logic of the compiler logic circuitry ("Based on the stored state transition proﬁles and Q(s, a) value estimates, the DNN is constructed with weight set θ trained using standard training algorithms [27]." sec. IV, p. 376; "In this work, for the global tier, we perform ofﬂine training, including the experience memory initialization and training of the autoencoder, using the whole Google cluster traces. To obtain DNN model in the global tier of the proposed framework, we use workload traces for ﬁve different M-machine clusters. We generate four new state transition proﬁles using the ε-greedy policy [20] and store the transition proﬁles in the memory before sampling the minibatch for training the DNN." sec. VII.A, p. 380) from the at least one sequence and the at least one additional sequence (Algorithm 1, "for each execution sequence"; p. 376), the training data to comprise the policies ("An effective dynamic power management (DPM) policy should minimize power consumption while maintaining performance degradation within an acceptable level" Abstract, par. 1, p. 372; "It has been proven that the Q-learning technique will gradually converge to the optimal policy" sec. V.B, p. 378); and
the neural network logic arranged to model two or more physical resource assignments as actions ("A scheduling scheme must be developed for dynamically assigning the jobs to servers and allocating resources in each server" sec. III, p. 374; "An action in the action space may be the allocation of VMs to the servers (a.k.a. physical machines) and allocating resources in the servers for VM execution." sec. I, p. 372; " Action space: The action of the DRL agent for cloud resource allocation is deﬁned as the index of server for VM (job) allocation. The action space for a cluster with M servers is deﬁned" sec. V.A, p. 377);
determine an approximated value function based on the training data ("At the next decision epoch tk+1, the DRL agent performs Q-value updating based on the total reward (or cost) rk(sk,ak) observed during this time period [tk,tk+1). At the end of the execution sequence, the DRL agent updates the DNN using the newly observed Q-value estimates, and the updated DNN will be utilized in the next execution sequence. More detailed procedures are shown in Algorithm 1." sec. IV, p. 376); and
determine the physical resource assignment based on the approximated value function ("we present a generalized form of DRL technique compared with the prior work, which could be utilized for resource allocation and other problems as well... At the end of the execution sequence, the DRL agent updates the DNN using the newly observed Q-value estimates, and the updated DNN will be utilized in the next execution sequence." sec. IV, p. 376).
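As an illustrative aid only, the ε-greedy action selection quoted above from Liu (sec. IV, p. 376) can be sketched in Python.  The function name epsilon_greedy, the uniform choice among the non-greedy actions, and the sample Q-values are assumptions made for this sketch and do not appear in Liu.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick the action with the highest Q estimate with probability
    1 - epsilon; otherwise pick one of the other actions at random
    (uniformly, which is an assumption of this sketch)."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    others = [a for a in range(len(q_values)) if a != best]
    if others and rng.random() < epsilon:
        return rng.choice(others)
    return best


# Hypothetical Q(s, a) estimates for three candidate servers in one state.
rng = random.Random(0)
q_estimates = [0.2, 0.9, 0.5]
chosen_server = epsilon_greedy(q_estimates, epsilon=0.1, rng=rng)
```

In this sketch each action index stands for the server to which a VM (job) would be allocated, consistent with the action space definition quoted from Liu (sec. V.A, p. 377).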

Liu does not explicitly teach determine at least one sequence of assignments of the virtual resources to the physical resources;
generate at least one additional sequence based on a random assignment of the virtual resources to the physical resources,
wherein the at least one sequence and the at least one additional sequence are identified as a policies.

Barrett teaches determine at least one sequence of assignments of the virtual resources to the physical resources ("In creating feasible workﬂow schedules, a task to resource mapping {Ti,Cj } is represented as a single gene in the chromosome. A valid chromosome contains a sequence of genes, mapping every task in the workﬂow to a corresponding resource. The order of the genes represents the schedule execution order on the chosen resources. A feasible solution to the scheduling problem must maintain the precedence constraints between the tasks speciﬁed in the workﬂow." p. 85, sec. II.B);
generate at least one additional sequence based on a random assignment of the virtual resources to the physical resources ("if (random > crossoverRate), then Apply single point crossover, end if, if (random > mutationRate), then Apply mutation" Algorithm 1, sec. II.B., p. 85; "Mutation involves randomly altering the bit string altering aspects of a chromosome. Mutation occurs on the assigned service in a given task-service mapping as seen in Figure 5" p. 87, para. 6; "Genetic algorithms are stochastic search and optimization techniques based on evolution. In their simplest form, a set of possible solutions to a particular problem are evaluated in an iterative manner. From the ﬁttest of these solutions, the next generation is created and the evaluation process begins once more. A solutions suitability to its environment is determined using a ﬁtness function. By iterating through successive generations good approximate solutions can be found for the given environment." p. 85, sec. II.B), 
wherein the at least one sequence and the at least one additional sequence are identified as a policies ("In solving a MDP two algorithms, value iteration or policy iteration algorithms from dynamic programming can be used. In this work we choose the value iteration algorithm to calculate the state value function. Approximations of V π(s) which are indicative as to the benefit of being state s under policy π are calculated after each time interval. Actions are chosen based on π the policy being followed. The policy denotes the optimal mapping from states to actions." p. 88, sec. IV.C).
Liu and Barrett are analogous art because both are directed towards cloud resource management. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the cloud resource management of Liu with the workflow scheduling of Barrett.  The modification would have been obvious because one of ordinary skill in the art would be motivated to minimize cost and makespan, as suggested by Barrett (Abstract, p. 83).
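For illustration only, the chromosome encoding and the crossover and mutation operators cited from Barrett can be sketched in Python.  This is not Barrett's implementation: the function names, the task and resource labels, and the assumption that both parents list the tasks in the same precedence-respecting order are all hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Combine two schedules of equal length (at least two genes) at a
    random cut point. Both parents are assumed to list tasks in the same
    order, so the child still maps every task exactly once."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, resources, rng):
    """Randomly reassign the resource of one randomly chosen gene."""
    child = list(chromosome)
    position = rng.randrange(len(child))
    task, _ = child[position]
    child[position] = (task, rng.choice(resources))
    return child


# Each gene is a (task, resource) mapping; the labels are hypothetical.
resources = ["C1", "C2", "C3"]
parent_a = [("T1", "C1"), ("T2", "C2"), ("T3", "C1"), ("T4", "C3")]
parent_b = [("T1", "C3"), ("T2", "C1"), ("T3", "C2"), ("T4", "C2")]

rng = random.Random(0)
child = single_point_crossover(parent_a, parent_b, rng)
mutated = mutate(child, resources, rng)
```

Both operators draw on a random number generator, so the additional sequences they produce are based on random assignments of tasks to resources.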

Regarding Claim 2,
The Liu/Barrett combination teaches claim 1.  Liu further teaches wherein the compiler logic circuitry comprises the neural network logic to determine the approximated value function by iterative determination of a gradient descent of the approximated value function and backpropagation of error to incrementally converge to the approximated value function; and to determine the physical resource assignment based on the approximated value function ("In the training process, ﬁrst we initialize the weights for the input layer and output layer as a normal distribution with a mean value of 0 and standard deviation of 1. The bias for both layers is set as a constant value 0.1. The initial state of LSTM cell is set as 0 for all cells. In response to the back propagated errors, the network is updated by adopting Adam optimization [27], a method for efﬁcient stochastic optimization that only requires ﬁrst-order gradients with little memory requirement. The method computes individual adaptive learning rates from estimates of the ﬁrst and second moments of the gradients [33]. The state of the LSTM cell and weights will be trained for minimizing the propagated errors." sec. VI.A, p. 378).

Regarding Claim 4,
The Liu/Barrett combination teaches claim 1.  Liu further teaches wherein the neural network logic comprises a mini-batch logic configured to determine a sample set of training data with which to perform a gradient descent ("We generate four new state transition proﬁles using the ε-greedy policy [20] and store the transition proﬁles in the memory before sampling the minibatch for training the DNN. In addition, we clip the gradients to make their norm values less than or equal to 10. The gradient clipping method has been introduced to DRL in [37]." sec. VII.A, p. 380).
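As an illustrative aid only, minibatch sampling from a stored set of transition profiles and gradient clipping to a norm of at most 10 can be sketched in Python.  The function names, the use of the L2 norm, and the sample values are assumptions of this sketch and are not drawn from Liu.

```python
import math
import random


def sample_minibatch(memory, batch_size, rng):
    """Draw a random sample set (without replacement) from stored data."""
    return rng.sample(memory, min(batch_size, len(memory)))


def clip_by_norm(gradient, max_norm=10.0):
    """Rescale a gradient vector so that its L2 norm is at most max_norm
    (the L2 norm is an assumption; the cited passage says only 'norm')."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]


# Hypothetical memory of 100 stored transitions and a gradient of norm 50.
rng = random.Random(0)
minibatch = sample_minibatch(list(range(100)), 8, rng)
clipped = clip_by_norm([30.0, 40.0])
```

Only the sampled subset is used for a given gradient step, which is the sense in which a mini-batch determines a sample set of training data.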

Regarding Claim 7,
The Liu/Barrett combination teaches claim 1.  Liu further teaches wherein the physical resource assignment comprises an assignment of a register class or an assignment of a task to a processor ("We demonstrate how jobs are executed on an active server with only CPU resource usage in Fig. 3 as an example. Job 1 consumes 50% of CPU, while each of job 2 and 3 requires 40%. Job 1, 2 and 3 arrive at t1, t2 and t3, respectively, and complete at t4, t5 and t6, respectively. We assume that the server is in active mode at time 0. When job 1 and 2 arrive, there are enough CPU resources so their requirements are satisﬁed immediately. When job 3 arrives, it waits until the job 1 is completed, and the waiting time is t4 − t3. The job latency is deﬁned as the duration between the job arrival and completion. Therefore the latency of job 3 is t6 − t3, which is longer than the job duration. To reduce the job latency, the job broker should avoid overloading servers. A scheduling scheme must be developed for dynamically assigning the jobs to servers and allocating resources in each server." sec. III, p. 374).
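For illustration only, the job latency example quoted above can be reproduced with a small first-come, first-served simulation in Python.  The CPU percentages (50%, 40%, 40%) come from the quoted passage; the arrival times, durations, and function name are hypothetical values chosen for this sketch, since the passage gives only symbolic times t1 through t6.

```python
def simulate_fcfs(jobs, capacity=100):
    """Assign jobs to one server in arrival order.

    jobs: list of (name, arrival, duration, cpu_percent), sorted by arrival.
    A job waits until enough CPU is free; latency = completion - arrival.
    """
    running = []  # (end_time, cpu_percent) of jobs already placed
    result = {}
    for name, arrival, duration, cpu in jobs:
        if cpu > capacity:
            raise ValueError("job needs more CPU than the server has")
        start = arrival
        while True:
            active = [(end, used) for end, used in running if end > start]
            if sum(used for _, used in active) + cpu <= capacity:
                break
            start = min(end for end, _ in active)  # wait for a completion
        end_time = start + duration
        running.append((end_time, cpu))
        result[name] = {"start": start, "end": end_time,
                        "latency": end_time - arrival}
    return result


# Hypothetical times: job 3 arrives at t=3 but cannot start until job 1 ends.
schedule = simulate_fcfs([
    ("job1", 1, 5, 50),
    ("job2", 2, 6, 40),
    ("job3", 3, 4, 40),
])
# job3 starts at 6 (when job1 completes) and has latency 7, which is
# longer than its duration of 4.
```

With these assumed values, jobs 1 and 2 start immediately, while job 3 waits for job 1 to complete, mirroring the assignment of tasks to a processor described in the quoted passage.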


Regarding Claim 8,
Liu teaches a method to determine a physical resource assignment, the method comprising:
identifying, by a compiler logic circuitry ("computational resources (including CPU, memory, disk, communication bandwidth, etc.)" sec. I, p. 372), one or more states in a code, the one or more states to comprise virtual resources identified in the code to assign to physical resources ("However, a complete resource allocation framework in the cloud computing systems exhibits high dimensions in state and action spaces. For example, a state in the state space may be the Cartesian product of characteristics and current resource utilization level of each server (for hundreds of servers) as well as current workload level (number and characteristics of VMs for allocation)." sec. I, p. 372; "Cloud computing with virtualization technology enables computational resources (including CPU, memory, disk, communication bandwidth, etc.) in data centers or server clusters to be shared by allocating virtual machines (VMs) in an on-demand manner." sec. I, p. 372; "we will describe the global and local tiers of the overall cloud resource allocation and power management framework. The global tier of cloud resource (VM) allocation exhibits high dimensions in state and action spaces" sec. III, p. 375);
generating, by the compiler logic circuitry, training data for a neural network of the compiler logic circuitry ("Based on the stored state transition proﬁles and Q(s, a) value estimates, the DNN is constructed with weight set θ trained using standard training algorithms [27]." sec. IV, p. 376; "In this work, for the global tier, we perform ofﬂine training, including the experience memory initialization and training of the autoencoder, using the whole Google cluster traces. To obtain DNN model in the global tier of the proposed framework, we use workload traces for ﬁve different M-machine clusters. We generate four new state transition proﬁles using the ε-greedy policy [20] and store the transition proﬁles in the memory before sampling the minibatch for training the DNN." sec. VII.A, p. 380), 
the training data to comprise more than one policy ("An effective dynamic power management (DPM) policy should minimize power consumption while maintaining performance degradation within an acceptable level" Abstract, par. 1, p. 372), 
each policy comprising the one or more states, each state having a status of the code (Algorithm 2; "RL state", sec. VI.B, p. 379), an action (Algorithm 2; "action set A" sec. VI.B, p. 379), and an expected reward (Algorithm 2; "reward rate" sec. VI.B, p. 379) for assignment of a virtual resource to a physical resource ("a deep Q-learning framework was also proposed to derive the optimal action a at each state s in order to maximize (or minimize) the corresponding Q(s, a) value." sec. I, p. 373; "The deep Q-learning technique is adopted for the online control based on the ofﬂine-trained DNN. More speciﬁcally, at each decision epoch tk of an execution sequence, the system under control is in a state sk. The DRL agent performs inference using the DNN to derive the Q(sk,a) estimate of each state-action pair (sk,a), and uses ε-greedy policy to derive the action with the highest Q(sk,a) with probability 1 − ε and choose the other actions randomly with total probability ε. The chosen action is denoted by ak. At the next decision epoch tk+1, the DRL agent performs Q-value updating based on the total reward (or cost) rk(sk,ak) observed during this time period [tk,tk+1). At the end of the execution sequence, the DRL agent updates the DNN using the newly observed Q-value estimates, and the updated DNN will be utilized in the next execution sequence. More detailed procedures are shown in Algorithm 1." sec. IV, p. 376), 
the neural network to model two or more physical resource assignments as actions ("A scheduling scheme must be developed for dynamically assigning the jobs to servers and allocating resources in each server" sec. III, p. 374; "An action in the action space may be the allocation of VMs to the servers (a.k.a. physical machines) and allocating resources in the servers for VM execution." sec. I, p. 372; " Action space: The action of the DRL agent for cloud resource allocation is deﬁned as the index of server for VM (job) allocation. The action space for a cluster with M servers is deﬁned" sec. V.A, p. 377), 
training, by the compiler logic circuitry, the neural network logic by determining an approximated value function based on the training data ("At the next decision epoch tk+1, the DRL agent performs Q-value updating based on the total reward (or cost) rk(sk,ak) observed during this time period [tk,tk+1). At the end of the execution sequence, the DRL agent updates the DNN using the newly observed Q-value estimates, and the updated DNN will be utilized in the next execution sequence. More detailed procedures are shown in Algorithm 1." sec. IV, p. 376); and
determining the physical resource assignment based on the approximated value function ("we present a generalized form of DRL technique compared with the prior work, which could be utilized for resource allocation and other problems as well... At the end of the execution sequence, the DRL agent updates the DNN using the newly observed Q-value estimates, and the updated DNN will be utilized in the next execution sequence." sec. IV, p. 376).  

Liu does not explicitly teach wherein generating the training data comprises performing a genetic logic function to determine virtual resource assignments to physical resources, the genetic logic function to combine sequences of assignments from two or more different instances of the code, to introduce a mutation into one or more sequences of assignments of virtual resources to physical resources to generate additional sequences of assignments, or to both combine the sequences and to introduce the mutation.
Barrett teaches wherein generating the training data comprises performing a genetic logic function ("Genetic algorithms are stochastic search and optimization techniques based on evolution. In their simplest form, a set of possible solutions to a particular problem are evaluated in an iterative manner. From the ﬁttest of these solutions, the next generation is created and the evaluation process begins once more. A solutions suitability to its environment is determined using a ﬁtness function. By iterating through successive generations good approximate solutions can be found for the given environment." p. 85, sec. II.B) to determine virtual resource assignments to physical resources (Fig. 2, Workflow Management System Architecture), 
the genetic logic function to combine sequences of assignments from two or more different instances of the code (Algorithm 1, Genetic Algorithm, "Select parents using roulette wheel selection", p. 85, sec. II.B), to introduce a mutation into one or more sequences of assignments of virtual resources to physical resources to generate additional sequences of assignments (Algorithm 1, Genetic Algorithm, "Apply mutation", p. 85, sec. II.B), or to both combine the sequences and to introduce the mutation (Algorithm 1, Genetic Algorithm, Select parents using roulette wheel selection, Apply single point crossover, Apply mutation, Create next generation, p. 85, sec. II.B).
Liu and Barrett are analogous art because both are directed towards cloud resource management. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the cloud resource management of Liu with the workflow scheduling of Barrett.  The modification would have been obvious because one of ordinary skill in the art would be motivated to minimize cost and makespan, as suggested by Barrett (Abstract, p. 83).

Regarding Claim 9,
The Liu/Barrett combination teaches claim 8.  Liu further teaches wherein the generating the training data further comprises executing multiple instances of one or more different codes, wherein each of the one or more different codes is compiled with multiple different sequences of assignments of the virtual resources to the physical resources, and measuring objective metrics associated with the approximated value function for each instance ("The deep Q-learning technique is adopted for the online control based on the ofﬂine-trained DNN. More speciﬁcally, at each decision epoch tk of an execution sequence, the system under control is in a state sk. The DRL agent performs inference using the DNN to derive the Q(sk,a) estimate of each state-action pair (sk,a), and uses ε-greedy policy to derive the action with the highest Q(sk,a) with probability 1 − ε and choose the other actions randomly with total probability ε. The chosen action is denoted by ak. At the next decision epoch tk+1, the DRL agent performs Q-value updating based on the total reward (or cost) rk(sk,ak) observed during this time period [tk,tk+1). At the end of the execution sequence, the DRL agent updates the DNN using the newly observed Q-value estimates, and the updated DNN will be utilized in the next execution sequence. More detailed procedures are shown in Algorithm 1." sec. IV, p. 376).  

Regarding Claim 11,
The Liu/Barrett combination teaches claim 8.  Barrett further teaches wherein generating the training data comprises performing a genetic logic to select sequences of assignments from two or more different instances of the code based on evaluation scores for the sequences of assignments produced by the approximated value function (Algorithm 1, Genetic Algorithm, "Evaluate chromosome fitness, Rank chromosomes according to overall population fitness, Select parents using roulette wheel selection", p. 85, sec. II.B).  

Regarding Claim 12,
The Liu/Barrett combination teaches claim 8.  Liu further teaches wherein training the neural logic comprises training the neural network logic by determining the approximated value function by iterative determination of a gradient descent of the approximated value function and backpropagation of error to incrementally converge to the approximated value function ("In the training process, ﬁrst we initialize the weights for the input layer and output layer as a normal distribution with a mean value of 0 and standard deviation of 1. The bias for both layers is set as a constant value 0.1. The initial state of LSTM cell is set as 0 for all cells. In response to the back propagated errors, the network is updated by adopting Adam optimization [27], a method for efﬁcient stochastic optimization that only requires ﬁrst-order gradients with little memory requirement. The method computes individual adaptive learning rates from estimates of the ﬁrst and second moments of the gradients [33]. The state of the LSTM cell and weights will be trained for minimizing the propagated errors." sec. VI.A, p. 378).  

Regarding Claim 13,
The Liu/Barrett combination teaches claim 8.  Liu further teaches wherein the physical resource assignment comprises an assignment of a register class or an assignment of a task to a processor ("We demonstrate how jobs are executed on an active server with only CPU resource usage in Fig. 3 as an example. Job 1 consumes 50% of CPU, while each of job 2 and 3 requires 40%. Job 1, 2 and 3 arrive at t1, t2 and t3, respectively, and complete at t4, t5 and t6, respectively. We assume that the server is in active mode at time 0. When job 1 and 2 arrive, there are enough CPU resources so their requirements are satisﬁed immediately. When job 3 arrives, it waits until the job 1 is completed, and the waiting time is t4 − t3. The job latency is deﬁned as the duration between the job arrival and completion. Therefore the latency of job 3 is t6 − t3, which is longer than the job duration. To reduce the job latency, the job broker should avoid overloading servers. A scheduling scheme must be developed for dynamically assigning the jobs to servers and allocating resources in each server." sec. III, p. 374).

Regarding Claim(s) 14-15 and 19,
Claim(s) 14-15 and 19 recite(s) a system including memory storing instructions for performing functions corresponding to the apparatus performing functions recited in claim(s) 1-2 and 7, respectively.  The Liu/Barrett combination teaches the limitations of claim(s) 14-15 and 19 as set forth above in connection with claim(s) 1-2 and 7.  Therefore, claim(s) 14-15 and 19 is/are rejected under the same rationale as respective claim(s) 1-2 and 7.

Regarding Claim 18,
The Liu/Barrett combination teaches the system of claim 14.  Barrett further teaches wherein the compiler logic circuitry comprises the training logic to execute multiple instances of the code and multiple instances of other code, each instance of the code having different sequences of assignments of virtual resources to physical resources for the code and each instance of the other code having different sequences of assignments of virtual resources to physical resources for the other code (Algorithm 1, Genetic Algorithm, Initialise population of feasible schedules, p. 85; "A number of solvers with ranging conﬁgurations are instantiated to produce schedules of varying cost and makespan. From these schedules an agent utilising a MDP computes the optimal schedule based on the current state of the cloud environment. The scheduling plan is then executed on the cloud via the Executor module." p. 86, sec. III).
The motivation to combine Liu and Barrett is the same as the motivation for claim 1.

Regarding Claim(s) 20 and 23-25,
Claim(s) 20 and 23-25 recite(s) a system including memory storing instructions for performing functions corresponding to the method recited in claim(s) 8 and 11-13, respectively.  The Liu/Barrett combination teaches the limitations of claim(s) 20 and 23-25 as set forth above in connection with claim(s) 8 and 11-13.  Therefore, claim(s) 20 and 23-25 is/are rejected under the same rationale as respective claim(s) 8 and 11-13.

Regarding Claim 21,
The Liu/Barrett combination teaches claim 20.  Liu further teaches wherein the operations further comprise determining, for a different code, an optimal assignment ("in each decision epoch the DRL agent needs to enumerate all possible actions at current state and perform inference using DNN to derive the optimal Q(s, a) value estimate" sec. IV, p. 376) of the virtual resource to the physical resource ("Cloud computing with virtualization technology enables computational resources (including CPU, memory, disk, communication bandwidth, etc.) in data centers or server clusters to be shared by allocating virtual machines (VMs) in an on-demand manner." sec. I, p. 372) based on the training data ("In this work, for the global tier, we perform ofﬂine training, including the experience memory initialization and training of the autoencoder, using the whole Google cluster traces. To obtain DNN model in the global tier of the proposed framework, we use workload traces for ﬁve different M-machine clusters. We generate four new state transition proﬁles using the ε-greedy policy [20] and store the transition proﬁles in the memory before sampling the minibatch for training the DNN." sec. VII.A, p. 380) and a current state of the different code ("in each decision epoch the DRL agent needs to enumerate all possible actions at current state and perform inference using DNN to derive the optimal Q(s, a) value estimate" sec. IV, p. 376).

Claims 3 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning, hereinafter "Liu") in view of Barrett et al. (A Learning Architecture for Scheduling Workflow Applications in the Cloud, hereinafter "Barrett"), and further in view of Palmes et al. (Mutation-Based Genetic Neural Network, hereinafter "Palmes").

Regarding Claims 3 and 16,
The Liu/Barrett combination teaches claims 1 and 14, from which claims 3 and 16 depend, respectively.
Liu further teaches wherein the neural network logic comprises a weight and a bias ("In the training process, first we initialize the weights for the input layer and output layer as a normal distribution with a mean value of 0 and standard deviation of 1. The bias for both layers is set as a constant value 0.1. The initial state of LSTM cell is set as 0 for all cells. In response to the back propagated errors, the network is updated by adopting Adam optimization [27], a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates from estimates of the first and second moments of the gradients [33]. The state of the LSTM cell and weights will be trained for minimizing the propagated errors." sec. VI.A, p. 378).
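
For illustration only, the initialization and update quoted from Liu above (weights drawn from a normal distribution with mean 0 and standard deviation 1, biases set to the constant 0.1, and an Adam update built from first- and second-moment estimates of the gradient) can be sketched in Python as follows.  This is a minimal sketch under stated assumptions, not Liu's implementation: the names init_layer and adam_step are hypothetical, the update is shown for a single scalar parameter, and the learning rate and decay constants are the commonly used Adam defaults rather than values stated in the quoted passage.

```python
import math
import random
from typing import List, Tuple


def init_layer(
    n_in: int, n_out: int, rng: random.Random
) -> Tuple[List[List[float]], List[float]]:
    """Weights ~ Normal(mean 0, std 1); every bias is the constant 0.1."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.1] * n_out
    return weights, biases


def adam_step(
    param: float,
    grad: float,
    m: float,
    v: float,
    t: int,
    lr: float = 0.001,     # assumed default, not from the quoted passage
    beta1: float = 0.9,    # assumed default
    beta2: float = 0.999,  # assumed default
    eps: float = 1e-8,     # assumed default
) -> Tuple[float, float, float]:
    """One Adam update of a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)               # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    weights, biases = init_layer(3, 2, random.Random(0))
    w, m, v = adam_step(weights[0][0], grad=2.0, m=0.0, v=0.0, t=1)
    print(biases, w)
```

On the first step the bias-corrected moments reduce to the gradient and its square, so the parameter moves by approximately the learning rate in the direction opposite the gradient.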

Liu does not explicitly teach wherein the neural network logic comprises a transitivity layer configured to apply an activation function to the approximated value function.
Palmes teaches wherein the neural network logic comprises a transitivity layer configured to apply an activation function to the approximated value function ("The output value in the second term of the fitness function follows the typical feed-forward computation that uses sigmoidal activation function and threshold values of ANN" sec. V, p. 591).
Liu and Palmes are analogous art because they are both directed to using neural networks to find optimal solutions. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the resource allocation neural network of the Liu/Barrett combination with the mutation-based genetic neural network of Palmes.  The modification would have been obvious because one of ordinary skill in the art would be motivated to use a popular approach (Palmes: "The use of evolutionary algorithms (EAs) to aid in artificial neural networks (ANNs) learning has been a popular approach to address the shortcomings of back-propagation (BP)" sec. I, p. 587) for fast operation and more robust search coverage (Palmes: "important EA operations are fast and makes feasible the use of a bigger population size for a more robust search coverage" sec. I, p. 587).
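
For illustration only, the feed-forward computation with a sigmoidal activation function and threshold values, as quoted from Palmes above, can be sketched in Python as follows.  This is a minimal sketch of a generic thresholded sigmoid unit under stated assumptions; it is not reproduced from Palmes and is not a construction of the claim language.  The names sigmoid and neuron_output and the sample inputs are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written so that large negative inputs do not overflow."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def neuron_output(
    inputs: Sequence[float], weights: Sequence[float], threshold: float
) -> float:
    """Weighted sum of the inputs, minus the threshold, passed through the sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    net = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return sigmoid(net)


if __name__ == "__main__":
    # Hypothetical values: the weighted sum equals the threshold, so net input is 0.
    print(neuron_output([1.0, 1.0], [0.5, 0.5], threshold=1.0))
```

When the weighted sum equals the threshold, the net input is zero and the unit outputs 0.5; larger net inputs move the output toward 1 and smaller ones toward 0.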

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571)270-7477.  The examiner can normally be reached M-F, 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in person, and by video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/CHARLES C KUO/ Examiner, Art Unit 2126
/ANN J LO/ Supervisory Patent Examiner, Art Unit 2126