DETAILED ACTION
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This action is in response to the following communication: Non-provisional Application No. 16/625,223 filed on 12/20/2019.
3.	Claims 1-19 are cancelled.  

Claims 20-36 are pending.  

Claims 20, 33, 35 and 36 are independent claims.  

Claim Objections
4.	It is noted that Claims 24, 25 and 31 recite alternative/optional use language:
Claim 24:  Line 3 references “and/or…”
Claim 25:  Line 4 references “and/or…”
Claim 31:  Line 4 references “and/or…”
5.	Claims 20, 25, 26, 33, 35 and 36 are objected to because of the following informalities:  
Claims 20 and 33, typographical error, “is deter-mined”; the examiner suggests using “is determined”.
Claim 25, typographical error, “the control vari-able”; the examiner suggests using “the control variable”.
Claim 26, typographical error, “at at least”; the examiner suggests using “at least”.
Claim 33, typographical error, “the associated sup-porting”; the examiner suggests using “the associated supporting”.
Claims 35 and 36, typographical error, “a prob-ability distribution”; the examiner suggests using “a probability distribution”.
Claims 35 and 36, typographical error, “is selected de-pending”; the examiner suggests using “is selected depending”.
6.	Claims 20, 33, 35 and 36 are objected to because of the following informalities:  
Claim 20: “an actuator” in line 3 should read “the actuator”.
Claim 33: “an actuator” in line 3 should read “the actuator”.
Claim 35: “an actuator” in line 4 should read “the actuator”.
Appropriate corrections are required.
Claim Rejections - 35 USC § 101
7.	35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

8.	Claims 35-36 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory and/or abstract subject matter.  
	Claim 35 recites a “machine-readable storage medium”. However, it appears that this “machine-readable storage medium” encompasses a signal per se. A product is a tangible physical article or object, some form of matter, which a signal is not. A signal, a form of energy, does not fall within any of the four statutory classes of § 101. As such, the claimed “machine-readable storage medium” is not limited to embodiments that fall within a statutory category of invention (see Interim Guidelines for Examination of Patent Applications for Patent Subject Matter Eligibility, Annex IV(c), 1300 OG 142, signed 26OCT2005). Consequently, claim 35 is rejected as non-statutory.
	Claim 36 is rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter because it recites an “actuator control system …” that has been reasonably interpreted as a computer program or software listing per se (see p. 1, 1st para. of the specification). The claimed “actuator control system …” therefore does not fall within any of the four categories of patent-eligible subject matter recited in 35 U.S.C. 101 (process, machine, manufacture, or composition of matter), since the claim is directed to software. Therefore, claim 36 is rejected as non-statutory; see MPEP 7.05.01.
Allowable Subject Matter

9.	Regarding claim 29, the cited prior art, taken alone or in combination, fails to teach, in combination with the other claimed limitations, that the density of supporting points is increased if a quotient of the average density of supporting points (&) and the smallest value (min-Var) falls below a predefinable threshold value. The art of record does not expressly disclose such features.
Claim Rejections - 35 USC § 102

10.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

6.	Claims 20-22, 24, 25, 31-33, and 35-36 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bischoff et al., "Policy search for learning robot control using sparse data" (hereinafter Bischoff), published May 31, 2014. 
   Regarding claim 20, Bischoff teaches:
Method for automatically setting at least one parameter (θ) of an actuator control system (45) (p. 2, 1st column, 1st para., see Fig. 1, see a mobile service robot with a pneumatic arm, Fig. 3, throttle valve system to regulate a gas or fluid flow) and (p. 4, 1st column, 5th para., see comparative Evaluation on a Throttle Valve Simulation. The throttle valve shown in Figure 3 is an important technical device that allows flow regulation of gas and fluids… The valve system basically consists of a DC-motor, a spring and a valve with position sensors… α and ω are the valve angle and corresponding angular velocity T is the actuator input) (emphasis added).  
which is set up for controlling a control variable (x) of an actuator (20) to a predefinable target value (xd) (desired angle g) (p. 4, 1st column, 5th para., see comparative Evaluation on a Throttle Valve Simulation. The throttle valve shown in Figure 3 is an important technical device that allows flow regulation of gas and fluids… The valve system basically consists of a DC-motor, a spring and a valve with position sensors… α and ω are the valve angle and corresponding angular velocity T is the actuator input) and (p. 4, 2nd column, 2nd para., see we describe the system state as valve angle α and velocity ω, the voltage u(t) corresponds to the action space. The learning goal is to move the valve to a desired angle g) (emphasis added).  
the actuator control system (45) is set up, depending on the at least one parameter (θ) (velocity ω), the target value (xd) (desired angle g) and the control variable (x) (valve angle α), to generate a manipulated variable (u) (voltage u(t)) (p. 4, 1st column, 5th para., see the dynamics of the throttle valve system can be analytically approximated by the model… here α and ω are the valve angle and corresponding angular velocity T is the actuator input) and (p. 4, 2nd column, 2nd para., see we describe the system state as valve angle α and velocity ω, the voltage u(t) corresponds to the action space. The learning goal is to move the valve to a desired angle g). 
depending on this manipulated variable (u) to control the actuator (20) (p. 4, 1st column, 5th para., see comparative Evaluation on a Throttle Valve Simulation. The throttle valve shown in Figure 3 is an important technical device that allows flow regulation of gas and fluids… The valve system basically consists of a DC-motor, a spring and a valve with position sensors… the dynamics of the throttle valve system can be analytically approximated by the model … α and ω are the valve angle and corresponding angular velocity T is the actuator input), and (p. 4, 2nd column, 2nd para., see we describe the system state as valve angle α and velocity ω, the voltage u(t) corresponds to the action space. The learning goal is to move the valve to a desired angle g).
a new value (θ*) of the at least one parameter (θ) is selected depending on a long-term cost function (R) (p. 2, 1st column, 2nd para., see it is the goal of the learning agent to find a controller that minimizes the expected long-term cost J(π) … The learning algorithm uses samples si,ai,si+1 to optimize the controller with respect to the expected long-term cost. … the learning algorithm directly operates on the parameters θ of a controller πθ to minimize the expected long-term cost. On the other hand, value-function approaches learn a long-term cost estimate for each state. Using this estimate, a controller can be determined) and (p. 2, 2nd column, 2nd para., see when the policy parameters have been learned, the corresponding policy is applied to the robot. The data from this experiment is collected and used to update the learned dynamics model. Then, this new model is used to update the policy. The process of model learning, policy learning, and application of the policy to the robot is repeated until a good policy is found). 
this long-term cost function (R) is determined depending on a predicted temporal evolution (F) of a probability distribution (P) of the control variable (x) of the actuator (20) (p. 2, 1st column, last para., see for policy evaluation, we use the GP dynamics model to iteratively compute Gaussian approximations to the long-term predictions p(s0|θ),…,p(sT|θ) for a given policy parametrization θ. Since PILCO explicitly accounts for model uncertainty in this process, it reduces the effect of model bias [2]. With the predicted state distributions p(st|θ),t=1,…,T, an approximation to the expected long-term cost J(π) in Eq. (1) and the gradients dJ(θ)/dθ can be computed analytically) (emphasis added).
and the parameter (θ) is then set to this new value (θ*) (p. 2, 2nd column, 2nd para., see when the policy parameters have been learned, the corresponding policy is applied to the robot. The data from this experiment is collected and used to update the learned dynamics model. Then, this new model is used to update the policy. The process of model learning, policy learning, and application of the policy to the robot is repeated until a good policy is found) and (p. 1, 2nd column, last para., see RL considers a learning agent and its interactions with the environment [3], [7]. In each state s∈S the agent can apply an action a∈A and, subsequently, moves to a new state s′. The system dynamics define the next state probability). 
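For illustration only, the claimed sequence as mapped above (an actuator control system generating u from θ, xd and x, and selection of a new value θ* by a long-term cost R computed from a predicted evolution of the distribution of x) can be sketched in Python. The scalar linear actuator model, the proportional control law u = θ(xd − x), the quadratic cost and the candidate grid are assumptions introduced for this sketch; they are not taken from Bischoff or from the application.

```python
# Hypothetical scalar actuator model (assumption): x_{t+1} = A*x_t + B*u_t + w_t,
# with Gaussian noise w_t ~ N(0, NOISE_VAR).
A, B, NOISE_VAR = 0.9, 0.5, 0.01


def predict_evolution(theta, x_d, m0, v0, horizon):
    """Predict p(x_t) = N(m_t, v_t), t = 1..horizon, under u_t = theta * (x_d - x_t)."""
    m, v = m0, v0
    evolution = []
    for _ in range(horizon):
        # Closed loop: x_{t+1} = (A - B*theta) * x_t + B*theta*x_d + w_t is linear
        # in x_t, so a Gaussian stays Gaussian and is propagated exactly here.
        gain = A - B * theta
        m = gain * m + B * theta * x_d
        v = gain * gain * v + NOISE_VAR
        evolution.append((m, v))
    return evolution


def long_term_cost(theta, x_d, m0, v0, horizon=20):
    """R(theta) = sum_t E[(x_t - x_d)^2] = sum_t ((m_t - x_d)^2 + v_t)."""
    return sum((m - x_d) ** 2 + v
               for m, v in predict_evolution(theta, x_d, m0, v0, horizon))


def select_new_parameter(candidates, x_d, m0, v0):
    """Select theta* as the candidate with the lowest predicted long-term cost."""
    return min(candidates, key=lambda th: long_term_cost(th, x_d, m0, v0))


candidates = [i / 10 for i in range(41)]  # theta in [0.0, 4.0]
theta_star = select_new_parameter(candidates, x_d=1.0, m0=0.0, v0=0.1)
theta = theta_star  # the parameter is then set to the new value
```

A grid search is used only to keep the sketch short; the quoted passage of Bischoff instead refers to analytically computed gradients dJ(θ)/dθ.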

   Regarding claim 21, Bischoff teaches:
the predicted temporal evolution (F) is determined as a function of a model (g), in particular a Gaussian process, advantageously a sparse Gaussian process, of the actuator (20) (p. 1, 1st column, 1st para., see the probabilistic inference for learning control method (PILCO), can be tailored to cope with sparse training data to speed up the learning process… The policy search RL algorithm Pilco [2] employs Gaussian processes (GP) to model the system dynamics) and (p. 2, 1st column, last para., see for policy evaluation, we use the GP dynamics model to iteratively compute Gaussian approximations to the long-term predictions p(s0|θ),…,p(sT|θ) for a given policy parametrization θ. Since PILCO explicitly accounts for model uncertainty in this process, it reduces the effect of model bias [2]. With the predicted state distributions p(st|θ),t=1,…,T, an approximation to the expected long-term cost J(π) in Eq. (1) and the gradients dJ(θ)/dθ can be computed analytically) (emphasis added).
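As a rough illustration of a Gaussian process model of actuator dynamics, the following pure-Python sketch computes the GP posterior mean and variance of the next state from a few one-dimensional transition samples. The squared-exponential kernel, its hyperparameters and the sin-shaped example dynamics are assumptions. The sketch predicts only from a deterministic input; it does not reproduce PILCO's propagation of Gaussian input distributions, nor a sparse GP approximation.

```python
import math


def sq_exp_kernel(x1, x2, signal_var=1.0, length_scale=1.0):
    """Squared-exponential covariance k(x1, x2); hyperparameters are assumptions."""
    return signal_var * math.exp(-0.5 * (x1 - x2) ** 2 / length_scale ** 2)


def cholesky(K):
    """Lower-triangular L with K = L L^T for a symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_from_lower(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X, Y, x_star, noise_var=1e-4):
    """GP posterior mean and variance of f(x_star), given samples Y = f(X) + noise."""
    n = len(X)
    K = [[sq_exp_kernel(X[i], X[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_from_lower(L, solve_lower(L, Y))  # alpha = K^{-1} Y
    k_star = [sq_exp_kernel(x_star, x) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)  # v = L^{-1} k_star
    var = sq_exp_kernel(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var


# Hypothetical transition samples x_{t+1} = sin(x_t) of a 1-D system.
X = [-2.0, -1.0, 0.0, 1.0, 2.0]
Y = [math.sin(x) for x in X]
mean_near, var_near = gp_predict(X, Y, 0.5)  # inside the data: small variance
mean_far, var_far = gp_predict(X, Y, 10.0)   # far from the data: close to the prior
```

The growth of the posterior variance away from the data is the model uncertainty that, per the quoted passage, PILCO accounts for when predicting state distributions.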
   Regarding claim 22, Bischoff teaches:
the model (g), depending on the manipulated variable (u), which is supplied to the actuator (20) with the actuator control system (45) in a control of the actuator (20), and then adapted to the resulting control variable (x) (p. 4, 1st column, 5th para., see the dynamics of the throttle valve system can be analytically approximated by the model… here α and ω are the valve angle and corresponding angular velocity T is the actuator input) and (p. 4, 2nd column, 2nd para., see we describe the system state as valve angle α and velocity ω, the voltage u(t) corresponds to the action space. The learning goal is to move the valve to a desired angle g).
after the adaptation of the model (g) a new value (θ*) of the at least one parameter (θ) is again determined, depending on the predicted evolution (F) of the probability distribution (p) of the control variable (x) of the actuator (20), wherein the redetermination of the new value (θ*) of the at least one parameter (θ) is determined depending on the now adapted model (g) (p. 2, 1st column, 2nd para., see it is the goal of the learning agent to find a controller that minimizes the expected long-term cost J(π) … The learning algorithm uses samples si, ai, si+1 to optimize the controller with respect to the expected long-term cost. … the learning algorithm directly operates on the parameters θ of a controller πθ to minimize the expected long-term cost. On the other hand, value-function approaches learn a long-term cost estimate for each state. Using this estimate, a controller can be determined) and (p. 2, 2nd column, 2nd para., see when the policy parameters have been learned, the corresponding policy is applied to the robot. The data from this experiment is collected and used to update the learned dynamics model. Then, this new model is used to update the policy. The process of model learning, policy learning, and application of the policy to the robot is repeated until a good policy is found). 
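The iteration mapped above for claim 22 (adapt the model to newly observed data, then redetermine θ* with the adapted model, then apply the controller again) can be sketched as follows. For brevity, a least-squares linear model stands in for the GP model g, the "real" actuator is simulated, and only the mean trajectory enters the predicted cost; these simplifications, the constants and the names real_step and learning_loop are hypothetical.

```python
import random

TRUE_A, TRUE_B = 0.8, 0.6  # hypothetical actuator, unknown to the learner


def real_step(x, u, rng):
    """Simulated actuator response to the manipulated variable u (small noise)."""
    return TRUE_A * x + TRUE_B * u + rng.gauss(0.0, 0.01)


def fit_linear_model(data):
    """Adapt the model x' = a*x + b*u to all (x, u, x') samples by least squares."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    return (sxy * suu - suy * sxu) / det, (suy * sxx - sxy * sxu) / det


def predicted_cost(theta, a, b, x_d, x0, horizon=20):
    """Long-term cost predicted with the current model (mean trajectory only)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        x = a * x + b * theta * (x_d - x)
        cost += (x - x_d) ** 2
    return cost


def learning_loop(episodes=4, x_d=1.0, seed=0):
    rng = random.Random(seed)
    data, x = [], 0.0
    for _ in range(15):  # initial random excitation of the actuator
        u = rng.uniform(-1.0, 1.0)
        x_next = real_step(x, u, rng)
        data.append((x, u, x_next))
        x = x_next
    candidates = [i / 20 for i in range(61)]  # theta in [0.0, 3.0]
    for _ in range(episodes):
        a, b = fit_linear_model(data)  # adapt the model to the collected data
        theta = min(candidates,        # redetermine theta* with the adapted model
                    key=lambda th: predicted_cost(th, a, b, x_d, 0.0))
        x = 0.0
        for _ in range(15):            # apply the controller and record the response
            u = theta * (x_d - x)
            x_next = real_step(x, u, rng)
            data.append((x, u, x_next))
            x = x_next
    return theta, (a, b)  # (a, b): model used for the last selection of theta


theta, (a_hat, b_hat) = learning_loop()
```

The random excitation phase keeps the regression well posed, since closed-loop data alone (u proportional to xd − x) would make x and u collinear.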

   Regarding claim 24, Bischoff teaches:
a density of the supporting points (&) depends on a determined temporal evolution (x1 ... xT), determined in particular by means of the model (g) and/or the actuator control system (45), of the control variable (x), starting from a randomly determined initial value (x0) of the control variable (x) from an initial probability distribution (p(x0)) (p. 4, 2nd column, 2nd para., see we set the start state to be α0 = 10°, the desired angle is g = 90°. In each controller rollout, 15 dynamics samples s,a,s′ are collected. This results in a sparse data set of 75 samples after initial random movements and 4 learning episodes) and (p. 5, 2nd column, last para., see the initial robot pose has a displacement of approximately 50 cm to the mug. We start with application of random actions to generate initial dynamics data. Here, we collect four random trajectories of 15 seconds starting from the initial pose. This results in an initial dynamics data set of 120 samples. Now, we iterate the steps: learn dynamics model, improve controller, apply controller to collect additional data. In each rollout, the controller is applied for 10 seconds resulting in 20 additional dynamics samples). 

   Regarding claim 25, Bischoff teaches:
the density of supporting points (&) also depends on a determined temporal evolution (x'1 ... x'T) of the control variable (x), determined in particular by means of the model (g) and/or the actuator control system (45), starting from the target value (xd) as the initial value (x'0) of the control variable (x) (p. 4, 2nd column, 2nd para., see we set the start state to be α0 = 10°, the desired angle is g = 90°. In each controller rollout, 15 dynamics samples s,a,s′ are collected. This results in a sparse data set of 75 samples after initial random movements and 4 learning episodes) and (p. 5, 2nd column, last para., see the initial robot pose has a displacement of approximately 50 cm to the mug. We start with application of random actions to generate initial dynamics data. Here, we collect four random trajectories of 15 seconds starting from the initial pose. This results in an initial dynamics data set of 120 samples. Now, we iterate the steps: learn dynamics model, improve controller, apply controller to collect additional data. In each rollout, the controller is applied for 10 seconds resulting in 20 additional dynamics samples). 
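For illustration, the dependence of the supporting-point density on two predicted trajectories, one starting from a random initial value drawn from p(x0) and one starting from the target value xd, might be sketched as below. The nominal model, the controller gain, the Gaussian p(x0), the neighbourhood radius and the names rollout, supporting_points and local_density are hypothetical and are not taken from Bischoff or the application.

```python
import random


def rollout(x0, theta, x_d, horizon, a=0.9, b=0.5):
    """Nominal trajectory x_0..x_T of a hypothetical model under u = theta*(x_d - x)."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(a * xs[-1] + b * theta * (x_d - xs[-1]))
    return xs


def supporting_points(theta, x_d, init_mean, init_std, horizon=10, seed=0):
    """Place supporting points along both predicted trajectories."""
    rng = random.Random(seed)
    x0 = rng.gauss(init_mean, init_std)              # random initial value from p(x0)
    from_random = rollout(x0, theta, x_d, horizon)   # x_0, x_1, ..., x_T
    from_target = rollout(x_d, theta, x_d, horizon)  # x'_0 = x_d, x'_1, ..., x'_T
    return sorted(from_random + from_target)


def local_density(points, x, radius=0.1):
    """Number of supporting points within +/- radius of x (a crude density measure)."""
    return sum(1 for p in points if abs(p - x) <= radius)


pts = supporting_points(theta=1.5, x_d=1.0, init_mean=0.0, init_std=0.5)
```

Under these assumptions the points cluster where both trajectories settle, so the density is highest in the region the closed loop actually visits.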

   Regarding claim 31, Bischoff teaches:
the long-term cost function (R) is selected as a function of a local cost function (r), the local cost function (r) being selected as a function of a Gaussian function and/or a polynomial function which depends on a difference between the manipulated variable (x) and the predefinable target value (xd) (p. 2, 1st column, last para., see for policy evaluation, we use the GP dynamics model to iteratively compute Gaussian approximations to the long-term predictions… an approximation to the expected long-term cost J(π) in Eq. (1) and the gradients dJ(θ)/dθ can be computed analytically) and (p. 2, 1st column, 2nd para., see it is the goal of the learning agent to find a controller that minimizes the expected long-term cost J(π) … where st is the resulting state distribution when the controller π is applied for t timesteps. The learning algorithm uses samples si,ai,si+1 to optimize the controller with respect to the expected long-term cost… the learning algorithm directly operates on the parameters θ of a controller πθ to minimize the expected long-term cost). 
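For illustration, a Gaussian-shaped (saturating) local cost r depending on the difference between x and xd, and a long-term cost R formed by summing its expectation over predicted Gaussian state distributions, can be written in closed form. The width c, the example distributions and the function names are assumptions; a saturating cost of this shape is commonly used with PILCO, but the sketch is not asserted to be the cost used in Bischoff or in the application.

```python
import math


def expected_local_cost(m, v, x_d, c=0.5):
    """E[r(x)] for x ~ N(m, v), with r(x) = 1 - exp(-(x - x_d)^2 / (2 c^2)).

    Closed form: 1 - sqrt(c^2 / (c^2 + v)) * exp(-(m - x_d)^2 / (2 (c^2 + v))).
    """
    s = c * c + v
    return 1.0 - math.sqrt(c * c / s) * math.exp(-0.5 * (m - x_d) ** 2 / s)


def long_term_cost(predicted, x_d, c=0.5):
    """R = sum over t of E[r(x_t)] for predicted Gaussians given as (m_t, v_t) pairs."""
    return sum(expected_local_cost(m, v, x_d, c) for m, v in predicted)


R = long_term_cost([(0.0, 0.2), (0.6, 0.1), (0.95, 0.05)], x_d=1.0)
```

Because r is bounded in [0, 1), each term of R is bounded as well, and a wider predicted distribution (larger v) raises the expected cost even when the mean sits on the target.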

   Regarding claim 32, Bischoff teaches:
the manipulated variable (u) is limited to values within a predefinable manipulated variable range by means of a limitation function (0) (p. 4, 2nd column, 4th para., see the Festo Robotino XT shown in Figure 1, is a mobile service robot with an omni-directional drive and an attached pneumatic arm. The design of the arm is inspired biologically by the trunk of an elephant. The arm itself consists of two segments with three bellows each. The pressure in each bellow can be regulated in a range of 0.0 to 1.5 bar to move the arm) and (p. 5, 1st column, 4th para., see 2) Learning Object Grasping: to learn object grasping in the policy search setting, we need to define states, actions, cost, and a control structure… The action space is also 9-dimensional, A ⊂ R^9, with 3 dimensions for the base movement and 6 dimensions to modify the bellow pressure in a range of −0.3 bar to 0.3 bar. The dynamics f: S × A → S is hence an 18 to 9 dimensional mapping).
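As an illustration only, a smooth limitation function that keeps the manipulated variable within a predefinable range might look like the following. The scaled tanh form and the name limit are assumptions; PILCO-style implementations may use a different squashing function. The default range mirrors the −0.3 bar to 0.3 bar bellow-pressure range quoted above.

```python
import math


def limit(u, u_min=-0.3, u_max=0.3):
    """Smoothly map an unbounded controller output u into [u_min, u_max].

    A scaled tanh is used here as an illustrative limitation function.
    """
    center = 0.5 * (u_max + u_min)
    half_width = 0.5 * (u_max - u_min)
    return center + half_width * math.tanh((u - center) / half_width)


limited = [limit(u) for u in (-5.0, -0.1, 0.0, 0.1, 5.0)]
absolute = limit(3.0, u_min=0.0, u_max=1.5)  # e.g. the 0.0 to 1.5 bar range
```

A smooth limiter, unlike a hard clip, remains differentiable, which matters when the controller parameters are optimized with gradients.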

   Regarding claim 33, Bischoff teaches:
Learning system (40) for automatically setting at least one parameter (θ) of an actuator control system (45) (p. 2, 1st column, 1st para., see Fig. 1, see a mobile service robot with a pneumatic arm, Fig. 3, throttle valve system to regulate a gas or fluid flow) and (p. 4, 1st column, 5th para., see comparative Evaluation on a Throttle Valve Simulation. The throttle valve shown in Figure 3 is an important technical device that allows flow regulation of gas and fluids… The valve system basically consists of a DC-motor, a spring and a valve with position sensors… α and ω are the valve angle and corresponding angular velocity T is the actuator input) (emphasis added).  
which is set up to control a control variable (x) of an actuator (20) to a predefinable target value (xd) (p. 4, 1st column, 5th para., see comparative Evaluation on a Throttle Valve Simulation. The throttle valve shown in Figure 3 is an important technical device that allows flow regulation of gas and fluids… The valve system basically consists of a DC-motor, a spring and a valve with position sensors… α and ω are the valve angle and corresponding angular velocity T is the actuator input) and (p. 4, 2nd column, 2nd para., see we describe the system state as valve angle α and velocity ω, the voltage u(t) corresponds to the action space. The learning goal is to move the valve to a desired angle g) (emphasis added).  
said learning system (40) is set up to carry out a method (p. 2, 1st column, 2nd para., see it is the goal of the learning agent to find a controller that minimizes the expected long-term cost J(π) … The learning algorithm uses samples si,ai,si+1 to optimize the controller with respect to the expected long-term cost. … the learning algorithm directly operates on the parameters θ of a controller πθ to minimize the expected long-term cost. On the other hand, value-function approaches learn a long-term cost estimate for each state. Using this estimate, a controller can be determined) and (p. 2, 2nd column, 2nd para., see when the policy parameters have been learned, the corresponding policy is applied to the robot. The data from this experiment is collected and used to update the learned dynamics model. Then, this new model is used to update the policy. The process of model learning, policy learning, and application of the policy to the robot is repeated until a good policy is found).
the actuator control system (45) is set up, depending on the at least one parameter (θ) (velocity ω), the target value (xd) and the control variable (x) (valve angle α), to generate a manipulated variable (u) (voltage u(t)) and depending on this manipulated variable (u) to control the actuator (20) (p. 4, 1st column, 5th para., see the dynamics of the throttle valve system can be analytically approximated by the model… here α and ω are the valve angle and corresponding angular velocity T is the actuator input) and (p. 4, 2nd column, 2nd para., see we describe the system state as valve angle α and velocity ω, the voltage u(t) corresponds to the action space. The learning goal is to move the valve to a desired angle g). 
a new value (θ*) of the at least one parameter (θ) is selected depending on a long-term cost function (R) (p. 2, 1st column, 2nd para., see it is the goal of the learning agent to find a controller that minimizes the expected long-term cost J(π) … The learning algorithm uses samples si,ai,si+1 to optimize the controller with respect to the expected long-term cost. … the learning algorithm directly operates on the parameters θ of a controller πθ to minimize the expected long-term cost. On the other hand, value-function approaches learn a long-term cost estimate for each state. Using this estimate, a controller can be determined) and (p. 2, 2nd column, 2nd para., see when the policy parameters have been learned, the corresponding policy is applied to the robot. The data from this experiment is collected and used to update the learned dynamics model. Then, this new model is used to update the policy. The process of model learning, policy learning, and application of the policy to the robot is repeated until a good policy is found).
this long-term cost function (R) is determined depending on a predicted temporal evolution (F) of a probability distribution (P) of the control variable (x) of the actuator (20) and the parameter (θ) is then set to this new value (θ*) (p. 2, 1st column, last para., see for policy evaluation, we use the GP dynamics model to iteratively compute Gaussian approximations to the long-term predictions p(s0|θ),…,p(sT|θ) for a given policy parametrization θ. Since PILCO explicitly accounts for model uncertainty in this process, it reduces the effect of model bias [2]. With the predicted state distributions p(st|θ),t=1,…,T, an approximation to the expected long-term cost J(π) in Eq. (1) and the gradients dJ(θ)/dθ can be computed analytically) (emphasis added).

   Regarding claim 35, Bischoff teaches:
A machine-readable storage medium (42) storing a computer program set up to carry out a method for automatically setting at least one parameter (θ) of an actuator control system (45) (p. 2, 1st column, 1st para., see Fig. 1, see a mobile service robot with a pneumatic arm, Fig. 3, throttle valve system to regulate a gas or fluid flow) and (p. 4, 1st column, 5th para., see comparative Evaluation on a Throttle Valve Simulation. The throttle valve shown in Figure 3 is an important technical device that allows flow regulation of gas and fluids… The valve system basically consists of a DC-motor, a spring and a valve with position sensors… α and ω are the valve angle and corresponding angular velocity T is the actuator input) (emphasis added).  
which is set up for controlling a control variable (x) of an actuator (20) to a predefinable target value (xd) (desired angle g) (p. 4, 1st column, 5th para., see comparative Evaluation on a Throttle Valve Simulation. The throttle valve shown in Figure 3 is an important technical device that allows flow regulation of gas and fluids… The valve system basically consists of a DC-motor, a spring and a valve with position sensors… α and ω are the valve angle and corresponding angular velocity T is the actuator input) and (p. 4, 2nd column, 2nd para., see we describe the system state as valve angle α and velocity ω, the voltage u(t) corresponds to the action space. The learning goal is to move the valve to a desired angle g) (emphasis added).  
the actuator control system (45) is set up, depending on the at least one parameter (θ) (velocity ω), the target value (xd) and the control variable (x) (valve angle α), to generate a manipulated variable (u) (voltage u(t)) and depending on this manipulated variable (u) to control the actuator (20) (p. 4, 1st column, 5th para., see the dynamics of the throttle valve system can be analytically approximated by the model… here α and ω are the valve angle and corresponding angular velocity T is the actuator input) and (p. 4, 2nd column, 2nd para., see we describe the system state as valve angle α and velocity ω, the voltage u(t) corresponds to the action space. The learning goal is to move the valve to a desired angle g). 
a new value (θ*) of the at least one parameter (θ) is selected depending on a long-term cost function (R) (p. 2, 1st column, 2nd para., see it is the goal of the learning agent to find a controller that minimizes the expected long-term cost J(π) … The learning algorithm uses samples si,ai,si+1 to optimize the controller with respect to the expected long-term cost. … the learning algorithm directly operates on the parameters θ of a controller πθ to minimize the expected long-term cost. On the other hand, value-function approaches learn a long-term cost estimate for each state. Using this estimate, a controller can be determined) and (p. 2, 2nd column, 2nd para., see when the policy parameters have been learned, the corresponding policy is applied to the robot. The data from this experiment is collected and used to update the learned dynamics model. Then, this new model is used to update the policy. The process of model learning, policy learning, and application of the policy to the robot is repeated until a good policy is found).
this long-term cost function (R) is determined depending on a predicted temporal evolution (F) of a probability distribution (P) of the control variable (x) of the actuator (20) and the parameter (θ) is then set to this new value (θ*) (p. 2, 1st column, last para., see for policy evaluation, we use the GP dynamics model to iteratively compute Gaussian approximations to the long-term predictions p(s0|θ),…,p(sT|θ) for a given policy parametrization θ. Since PILCO explicitly accounts for model uncertainty in this process, it reduces the effect of model bias [2]. With the predicted state distributions p(st|θ),t=1,…,T, an approximation to the expected long-term cost J(π) in Eq. (1) and the gradients dJ(θ)/dθ can be computed analytically) (emphasis added).

   Regarding claim 36, Bischoff teaches:
Actuator control system (45) which is set up to control a control variable (x) of an actuator (20) to a predefinable target value (xd) (desired angle g) (p. 2, 1st column, 1st para., see Fig. 1, see a mobile service robot with a pneumatic arm, Fig. 3, throttle valve system to regulate a gas or fluid flow) and (p. 4, 1st column, 5th para., see comparative Evaluation on a Throttle Valve Simulation. The throttle valve shown in Figure 3 is an important technical device that allows flow regulation of gas and fluids… The valve system basically consists of a DC-motor, a spring and a valve with position sensors… α and ω are the valve angle and corresponding angular velocity T is the actuator input) and (p. 4, 1st column, 5th para., see the dynamics of the throttle valve system can be analytically approximated by the model… here α and ω are the valve angle and corresponding angular velocity T is the actuator input) and (p. 4, 2nd column, 2nd para., see we describe the system state as valve angle α and velocity ω, the voltage u(t) corresponds to the action space. The learning goal is to move the valve to a desired angle g) (emphasis added).  
the actuator control system (45) is set up, depending on at least one parameter (θ) (velocity ω), target value (xd) and the control variable (x) (valve angle α), to generate a manipulated variable (u) (voltage u(t)) and depending on this manipulated variable (u) to control the actuator (20) (p. 4, 1st column, 5th para., see the dynamics of the throttle valve system can be analytically approximated by the model… here α and ω are the valve angle and corresponding angular velocity T is the actuator input) and (p. 4, 2nd column, 2nd para., see we describe the system state as valve angle α and velocity ω, the voltage u(t) corresponds to the action space. The learning goal is to move the valve to a desired angle g). 
the at least one parameter (θ) has been set with a method for automatically setting the at least one parameter (θ) of the actuator control system (45) (p. 2, 1st column, 1st para., see Fig. 1, see a mobile service robot with a pneumatic arm, Fig. 3, throttle valve system to regulate a gas or fluid flow) and (p. 4, 1st column, 5th para., see comparative Evaluation on a Throttle Valve Simulation. The throttle valve shown in Figure 3 is an important technical device that allows flow regulation of gas and fluids… The valve system basically consists of a DC-motor, a spring and a valve with position sensors… α and ω are the valve angle and corresponding angular velocity T is the actuator input) (emphasis added).  
a new value (θ*) of the at least one parameter (θ) is selected depending on a long-term cost function (R) (p. 2, 1st column, 2nd para., see it is the goal of the learning agent to find a controller that minimizes the expected long-term cost J(π) … The learning algorithm uses samples si,ai,si+1 to optimize the controller with respect to the expected long-term cost. … the learning algorithm directly operates on the parameters θ of a controller πθ to minimize the expected long-term cost. On the other hand, value-function approaches learn a long-term cost estimate for each state. Using this estimate, a controller can be determined) and (p. 2, 2nd column, 2nd para., see when the policy parameters have been learned, the corresponding policy is applied to the robot. The data from this experiment is collected and used to update the learned dynamics model. Then, this new model is used to update the policy. The process of model learning, policy learning, and application of the policy to the robot is repeated until a good policy is found).
this long-term cost function (R) is determined depending on a predicted temporal evolution (F) of a probability distribution (P) of the control variable (x) of the actuator (20) and the parameter (θ) is then set to this new value (θ*) (p. 2, 1st column, last para., see for policy evaluation, we use the GP dynamics model to iteratively compute Gaussian approximations to the long-term predictions p(s0|θ),…,p(sT|θ) for a given policy parametrization θ. Since PILCO explicitly accounts for model uncertainty in this process, it reduces the effect of model bias [2]. With the predicted state distributions p(st|θ),t=1,…,T, an approximation to the expected long-term cost J(π) in Eq. (1) and the gradients dJ(θ)/dθ can be computed analytically) (emphasis added).

Claim Rejections - 35 USC § 103

11.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

12.	Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Bischoff in view of Julia Vinogradska et al., "Stability of Controllers for Gaussian Process Forward Models" (hereinafter Vinogradska), published 2016. 
   Regarding claim 20, the rejection above is incorporated herein.
   Regarding claim 23, Bischoff does not explicitly teach:
the expected temporal evolution (F) of the probability distribution (p) of the control variable (x) is determined by an approximation of an integration over possible values of the control variable (x), this approximation being done through numerical quadrature.
However, Vinogradska teaches this limitation (p. 6, 4.2.1 Numerical Quadrature, see we propose Gaussian product integration, which extends univariate Gaussian quadrature to a multivariate rule using a product grid X of evaluation points and positive weights w_n for all nodes ξ_n ∈ X. Integral (7) is then approximated … resulting in a weighted sum of Gaussian distributions. The approximate state distribution at time t + 1 can be given … the state distribution at time t is represented by the weight vector α^(t). To propagate any distribution multiple steps through the GP, the basis functions φ_n must be calculated only once and the task reduces to sequential updates of the weight vector α. Our algorithm aims to find a stability region Q_s, i.e., a region where the success probability is at least 1 − λ for a given time horizon T, target region Q, and λ > 0. Computing m = ∫_Q φ(x) dx, the probability for x^(T) to be in Q is approximately m^⊤ α^(T)). 
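The weight-vector propagation described in the quoted passage can be illustrated with the following minimal one-dimensional sketch. It is not Vinogradska's code and makes several assumptions: Gauss-Legendre nodes on a bounded interval serve as the quadrature rule, the hypothetical functions pred_mean and pred_std stand in for the GP predictive mean and standard deviation, and the indexing convention (α^(t)_n = w_n p_t(ξ_n), so that the mixture Σ_n α^(t)_n φ_n(x) approximates the next-step density) is chosen for illustration. The basis functions φ_n(x) = N(x; pred_mean(ξ_n), pred_std(ξ_n)^2) are evaluated on the grid once, each step is a matrix-vector update of α, and the probability of lying in Q = [lo, hi] is m^⊤ α with m_n = ∫_Q φ_n(x) dx.

```python
import math

import numpy as np
from numpy.polynomial.legendre import leggauss


def normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def quadrature_grid(a, b, n):
    """Gauss-Legendre nodes and weights mapped from [-1, 1] to [a, b]."""
    z, w = leggauss(n)
    return 0.5 * (b - a) * z + 0.5 * (b + a), 0.5 * (b - a) * w


def transition_matrix(nodes, weights, pred_mean, pred_std):
    """M[n, k] = w_n * phi_k(xi_n); computed once and reused at every step."""
    phi = normal_pdf(nodes[:, None], pred_mean(nodes)[None, :], pred_std(nodes)[None, :])
    return weights[:, None] * phi


def propagate(alpha, M, steps):
    """Sequential weight updates alpha^(t+1) = M @ alpha^(t)."""
    for _ in range(steps):
        alpha = M @ alpha
    return alpha


def prob_in_region(alpha, nodes, pred_mean, pred_std, lo, hi):
    """m^T alpha, where m_n is the mass of phi_n on [lo, hi]."""
    def cdf(x, m, s):
        return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
    means, stds = pred_mean(nodes), pred_std(nodes)
    m = np.array([cdf(hi, mu, s) - cdf(lo, mu, s) for mu, s in zip(means, stds)])
    return float(m @ alpha)


# Hypothetical linear-Gaussian dynamics standing in for a learned GP model.
def pred_mean(x):
    return 0.5 * x


def pred_std(x):
    return np.full_like(x, 0.3)


nodes, weights = quadrature_grid(-6.0, 6.0, 80)
alpha0 = weights * normal_pdf(nodes, 0.0, 1.0)   # w_n * p_0(xi_n) with p_0 = N(0, 1)
M = transition_matrix(nodes, weights, pred_mean, pred_std)
alpha1 = propagate(alpha0, M, 1)                 # w_n * p_1(xi_n)
print(prob_in_region(alpha0, nodes, pred_mean, pred_std, -1.0, 1.0))  # P(|x_1| < 1)
```

For these linear dynamics the exact one-step distribution is N(0, 0.34), so the printed probability can be checked against erf(1/√0.68); a learned GP model would only change pred_mean and pred_std.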
Bischoff and Vinogradska are analogous art because they are from the same field of endeavor, program control.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Bischoff and Vinogradska before him or her, to modify the system of Bischoff to include the teachings of Vinogradska, which is directed to the stability of controllers using Gaussian process models, and accordingly the modification would enhance the system of Bischoff, which is focused on policy search for control learning, because it would provide Bischoff with the ability to utilize a probability distribution, as suggested by Vinogradska (p. 6, 4.2.1; p. 1, abstract).

13.	Claims 26 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Bischoff in view of Vinogradska and further in view of Waldock et al., US 20090125274 (hereinafter Waldock).
   In regard to claims 20, 23 and 24, the rejections above are incorporated accordingly.
   In regard to claim 26, the combination of Bischoff and Vinogradska, in particular Bischoff, does not explicitly teach:
a density of supporting points (ξ) is selected depending on a variable (Var) which characterizes a smoothness of the model (g) at at least one value (T0 ... TT, T'0 ... T'T) of the control variable (x) in the one or more determined temporal evolutions of the control variable (x).
However, Waldock teaches this limitation (p. 5, [0066], see FIG. 3 illustrates steps performed during step 206 of FIG. 2. At step 302 initialisation takes place by specifying the probability of the target being measured by the sensor over the range of the sensor's allowed control or reconfiguration parameters (based on the sensor's predicted position as computed at step 204). In general this can be a flat distribution, or it could be biased toward a specific control parameter if there is good prior knowledge or operational reasons to support this) and (p. 5, [0074], see At step 308 the set of probability distributions about the optimal control parameters are sharpened. This can be achieved using an iterative process that is terminated by a convergence criterion relating to a judgement about how sharp those distributions need to be in practice. In practice, this iterative process is likely to be controlled by two parameters: an upper limit on the time taken to perform the optimisation, and the accuracy of the sensor actuation. For example, if the sensor can only orientate to within +/-5 degrees then this will determine the variance (sharpness) of the target probability distribution required). It is noted that the range reads on “a region”, the sharpened distribution reads on “which characterize a smoothness of the model”, and the iterative process being terminated by the convergence criterion reads on “a smallest value”.
Bischoff, Vinogradska and Waldock are analogous art because they are from the same field of endeavor, program control.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Bischoff, Vinogradska and Waldock before him or her, to modify the system of Bischoff and Vinogradska, in particular Bischoff, to include the teachings of Waldock, which is directed to sensor control, and accordingly the modification would enhance the system of Bischoff, which is focused on policy search for control learning, because it would provide Bischoff with the ability to choose points in a range, as suggested by Waldock (p. 5, [0066]; p. 1, [0002]).

   In regard to claim 27, the combination of Bischoff and Vinogradska, in particular Bischoff, does not explicitly teach:
the density of supporting points (ξ) is chosen in a range (X_i) as a function of a minimum value (minVar), wherein the smallest value (minVar) is the smallest value of the variables (Var) characterizing a smoothness of the model on those values (T0 ... TT, T'0 ... T'T) of the control variable (x) which are in this range (X_i).
However, Waldock teaches this limitation (p. 5, [0066], see FIG. 3 illustrates steps performed during step 206 of FIG. 2. At step 302 initialisation takes place by specifying the probability of the target being measured by the sensor over the range of the sensor's allowed control or reconfiguration parameters (based on the sensor's predicted position as computed at step 204). In general this can be a flat distribution, or it could be biased toward a specific control parameter if there is good prior knowledge or operational reasons to support this) and (p. 5, [0074], see At step 308 the set of probability distributions about the optimal control parameters are sharpened. This can be achieved using an iterative process that is terminated by a convergence criterion relating to a judgement about how sharp those distributions need to be in practice. In practice, this iterative process is likely to be controlled by two parameters: an upper limit on the time taken to perform the optimisation, and the accuracy of the sensor actuation. For example, if the sensor can only orientate to within +/-5 degrees then this will determine the variance (sharpness) of the target probability distribution required). It is noted that the range reads on “a region”, the sharpened distribution reads on “which characterize a smoothness of the model”, and the iterative process being terminated by the convergence criterion reads on “a smallest value”.
Bischoff, Vinogradska and Waldock are analogous art because they are from the same field of endeavor, program control.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Bischoff, Vinogradska and Waldock before him or her, to modify the system of Bischoff and Vinogradska, in particular Bischoff, to include the teachings of Waldock, which is directed to sensor control, and accordingly the modification would enhance the system of Bischoff, which is focused on policy search for control learning, because it would provide Bischoff with the ability to choose points in a range, as suggested by Waldock (p. 5, [0066]; p. 1, [0002]).

14.	Claims 28, 30 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Bischoff in view of Vinogradska and further in view of Huang, US 20170200089 (hereinafter Huang). 
   In regard to claims 20 and 23, the rejections above are incorporated accordingly.
   In regard to claim 28, the combination of Bischoff and Vinogradska, in particular Bischoff, does not explicitly teach:
the density of the supporting points (ξ) of a range (X_i) is also selected depending on a mean density of the supporting points (ξ) in this range (X_i).
However, Huang teaches this limitation (p. 1, [0008], see the analyzing processor is connected the measurement processor for decomposing the data into a plurality of IMFs by utilizing an EMD method, obtains a plurality of probability density functions based on accumulating the distribution of each IMF according to a longest mean time scale) and (p. 1, [0005], see the present invention provides a method for data analyzing in a data analyzing processor, comprises receiving a data, then decomposing the data into a plurality of intrinsic mode functions (IMFs) by utilizing an empirical mode decomposition (EMD) method, wherein the intrinsic mode functions are a value changes over time of the data in different frequencies, obtaining a plurality of probability density functions based on accumulating the distribution of each IMF according to a longest mean time scale, and generating an intrinsic probability distribution function (iPDF) component spectrum, wherein the iPDF component spectrum comprises the distribution of probability density functions between a frequency dimensional and a standard deviation dimensional). 
Bischoff, Vinogradska and Huang are analogous art because they are from the same field of endeavor, program control.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Bischoff, Vinogradska and Huang before him or her, to modify the system of Bischoff and Vinogradska, in particular Bischoff, to include the teachings of Huang, which is directed to analyzing data by using an intrinsic probability distribution function, and accordingly the modification would enhance the system of Bischoff, which is focused on policy search for control learning, because it would provide Bischoff with the ability to use supporting points for a probability distribution evaluation, as suggested by Huang (p. 1, [0001]; p. 5, [0073]).

   In regard to claim 30, the combination of Bischoff and Vinogradska, in particular Bischoff, does not explicitly teach:
the determination of a result of the numerical quadrature is dependent on a determination of a temporal evolution of weights (ω_i), wherein the weights (ω_i) are given in each case by the product of support weights (w_i) and the respective values of the probability density (p) at the associated supporting point (ξ_i).
However, Huang teaches this limitation (p. 1, [0001], see the invention relates to the method and system for data analyzing. In particular, a method and system to extract intrinsic information from any data, stationary or non-stationary to define probability distribution as the intrinsic probability distribution), (p. 2, [0037], see “FIG. 3 illustrates the probability density functions accumulated from each of IMFs in accordance with various embodiments of the present disclosure. The analyzing processor 120 accumulates the distribution of each IMF in different frequencies according to a longest mean time scale to obtain a plurality of probability density functions. And the function 310, 320, 330, 340, 350, 360, 370, 380, 390 provides standard deviations as horizontal axes corresponding to probability density as vertical axes to determine relation points”) and (p. 3, [0049], see “FIG. 7 illustrates an iPDF partial sum spectrum of the white noise data. The iPDF partial sum spectrum 700 provides time 702 as horizontal axes corresponding to standard deviation 704 as vertical axes”). It is noted that the relation points read on “the supporting points”.
Bischoff, Vinogradska and Huang are analogous art because they are from the same field of endeavor, program control.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Bischoff, Vinogradska and Huang before him or her, to modify the system of Bischoff and Vinogradska, in particular Bischoff, to include the teachings of Huang, which is directed to analyzing data by using an intrinsic probability distribution function, and accordingly the modification would enhance the system of Bischoff, which is focused on policy search for control learning, because it would provide Bischoff with the ability to use supporting points for a probability distribution evaluation, as suggested by Huang (p. 1, [0001]; p. 5, [0073]).

   In regard to claim 34, Bischoff does not explicitly teach:
the method is performed using a GPU (43), wherein the determination of a result of the numerical quadrature is dependent on a determination of a temporal evolution of weights (ω_i), wherein the weights (ω_i) are given in each case by the product of support weights (w_i) and the respective values of the probability density (p) at the associated supporting point (ξ_i).
However, Huang teaches this limitation (p. 1, [0001], see the invention relates to the method and system for data analyzing. In particular, a method and system to extract intrinsic information from any data, stationary or non-stationary to define probability distribution as the intrinsic probability distribution), (p. 2, [0037], see “FIG. 3 illustrates the probability density functions accumulated from each of IMFs in accordance with various embodiments of the present disclosure. The analyzing processor 120 accumulates the distribution of each IMF in different frequencies according to a longest mean time scale to obtain a plurality of probability density functions. And the function 310, 320, 330, 340, 350, 360, 370, 380, 390 provides standard deviations as horizontal axes corresponding to probability density as vertical axes to determine relation points”) and (p. 3, [0049], see “FIG. 7 illustrates an iPDF partial sum spectrum of the white noise data. The iPDF partial sum spectrum 700 provides time 702 as horizontal axes corresponding to standard deviation 704 as vertical axes”). It is noted that the relation points read on “the supporting points”.
Bischoff, Vinogradska and Huang are analogous art because they are from the same field of endeavor, program control.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Bischoff, Vinogradska and Huang before him or her, to modify the system of Bischoff and Vinogradska, in particular Bischoff, to include the teachings of Huang, which is directed to analyzing data by using an intrinsic probability distribution function, and accordingly the modification would enhance the system of Bischoff, which is focused on policy search for control learning, because it would provide Bischoff with the ability to use supporting points for a probability distribution evaluation, as suggested by Huang (p. 1, [0001]; p. 5, [0073]).
Conclusion

15.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US Patent Application Publications
Rashev 20040198268

US Patents
Krosschell 9429235

16.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Evral Bodden whose telephone number is 571-272-3455.  The examiner can normally be reached on Monday to Friday, 8:30 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat Do, can be reached at 571-272-3721. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EVRAL E BODDEN/Primary Examiner, Art Unit 2193