Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities: in paragraph [0059] the phrase “The goal of the stabilizing controller is push all the states in this set back to the original nominal trajectory” should be corrected to “The goal of the stabilizing controller is to push all the states in this set back to the original nominal trajectory”.  
The specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors the applicant may become aware of in the specification. Appropriate correction is required.

Claim Objections
Claims 1, 6, and 20 are objected to because of the following informalities:  
Claim 1 recites the limitation “determine a local set of deviations of the system, using the learned stochastic system model”. This should be corrected to “the learned stochastic predictive model” following the antecedent basis in the claim.
Claim 6 recites the limitation “wherein the time-invariant local policy is configured to satisfy the robustness constraint which is push a current state of the system in a worst-case deviation state”. This should be corrected to “wherein the time-invariant local policy is configured to satisfy the robustness constraint which is configured to push a current state of the system in a worst-case deviation state” or an equivalent that would match the language in paragraphs [0065] and [0066] in Applicant’s specification.
Claim 6 also recites “the time-invariant local policy” which should be corrected to match “the local time-invariant feedback policy” from claim 1 on which it depends.
Claim 20 recites “The method of claim 3”. This should be corrected to “The method of claim 17” or possibly “The method of claim 19”, because claim 3 is interpreted as a system claim directed to “the controller of claim 1”, and claim 20 corresponds to controller claim 4, which depends on claim 3.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1 and 17 recite the limitation "determine a gradient of the robustness constraint".  There is insufficient antecedent basis for a robustness constraint in the claims.

Claims 4 and 20 recite the limitation "the discrete-time dynamical system".  There is insufficient antecedent basis for this limitation in the claim or in claims 1 and 3 on which claim 4 depends or in claims 17 or 19 on which Examiner is interpreting that claim 20 depends. The discrete-time dynamical system is claimed in claims 2 and 18.

Claim 6 recites the limitation "an error-tolerance around the trajectory".  There is insufficient antecedent basis for “the trajectory” in the claim, as claim 1 recites “a control trajectory”, “a nominal trajectory”, “an optimal system state trajectory” and “a state-control trajectory”. It is not clear which trajectory “the trajectory” refers to, though for purposes of prior art examination, Examiner is interpreting that this claim is describing the process of determining an optimal system state trajectory that will move the state with the worst-case deviation back towards the nominal trajectory.
Claim 9 recites the limitations "the additional robustness constraint" and “the additional time-constant feedback controller”.  There is insufficient antecedent basis for these limitations in the claim or in claim 1 on which it depends. For purposes of prior art examination, Examiner is interpreting that these limitations refer to “the robustness constraint” and “the time-invariant feedback controller” from claim 1.
Claim 13 recites the limitation "the 3D camera".  There is insufficient antecedent basis for this limitation in the claim or in claims 1 and 12 on which it depends. The 3D camera is first claimed in claim 11.
Dependent claims 2-16 and 18-20 are also rejected because they fail to correct the deficiencies of their respective independent claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Koc et al (“Optimizing the Execution of Dynamic Robot Movements with Learning Control”, herein Koc) in view of Zeman et al (“Robust Model Predictive Control of Linear Time Invariant Systems with Disturbances”, herein Zeman).
Regarding claim 1, Koc teaches a controller for optimizing a local control policy of a system for trajectory-centric reinforcement learning (page 909 right column para. 2 recites “Iterative learning control (ILC) is a control theoretic learning framework restricted to tracking (time varying) reference trajectories [3]. In ILC, the goal is to improve the tracking performance, reducing the future deviations along the fixed trajectory, and ultimately driving them to the minimum possible. After observing the deviations from the reference trajectory at each iteration, the errors are fed back to the (feedforward) control inputs for the next iteration. Any available dynamics models can be incorporated easily during these updates, see e.g., [4] and [5]. ILC has been used successfully in several robotics tasks to improve trajectory tracking performance under unknown repeating disturbances and model mismatch”. Page 910 left column para. 3 recites “Our contributions can be stated succinctly as follows: We propose a new adaptive and cautious model-based ILC (i.e. iterative learning control) algorithm, that is implemented efficiently using a recursive formulation. More specifically, the existing model-based recursive ILC approach of Amann et al. [5], introduced briefly in Section II, is extended to include adaptation (by using Linear Bayes Regression on the errors) and caution (or in other terms, robustness to modeling errors, which shows itself as learning stability in the iteration domain)” (i.e. optimizing control using trajectory-centric reinforcement learning)), comprising:
an interface configured to receive data including tuples of system states, control data and state transitions measured by sensors (figs. 1 and 10 show the robotic system; the description of fig. 1 recites the robot table tennis platform where a seven degree of freedom Barrett WAM arm is shown facing a ball-launcher. The ball is tracked using four cameras on the ceiling. Whenever a ball is approaching the robot, reference trajectories are computed online in order to return the ball to a desired location on the opponent’s court (i.e. the interface receives state, control, and transition data as measured by visual sensors). Page 911 right column para. 4 recites the goal in trajectory tracking is to track a given reference r(t), 0 ≤ t ≤ T , by applying the control inputs u(t). In dynamic robotic tasks, the references are often in the combined state space of joint positions and velocities (qT, q.T)T ϵ Q ⊂ R2n, and the control inputs u ϵ U ⊂ Rm are applied for each joint of the robot, i.e., m = n. (i.e. the system state, control, and transition data));
a memory to store processor executable programs including a stochastic predictive learned model for generating a nominal state and control trajectory for a desired time-horizon as a function of time steps, in response to a task command for the system received via the interface (page 910 left column para. 3 recites a new adaptive and cautious model-based ILC algorithm, that is implemented efficiently using a recursive formulation. More specifically, the existing model-based recursive ILC approach of Amann et al. [5], introduced briefly in Section II, is extended to include adaptation (by using Linear Bayes Regression on the errors) and caution (or in other terms, robustness to modeling errors, which shows itself as learning stability in the iteration domain). Page 913 right column para. 3 recites the posterior model covariances Σkj can be used to make more cautious decisions within a stochastic control framework (i.e. a stochastic predictive model for generating state and control trajectories over time)), a control policy including machine learning method algorithms and an initial random control policy (page 915 shows Algorithm 1 for the iterative learning control, and page 916 shows Algorithm 2 for the use case of robotic tennis striking (i.e. a control policy including machine learning algorithms). Page 918 left column para. 1-2 recite the realistic simulation environment SL [31] acts as both a simulator and as a real-time interface to the Barrett WAMs in our experiments. The initial positioning is given by a PD controller with high gains on the shoulder joints, which is then toggled off during the experiments with the striking movements, as summarized in Algorithm 2. The high-gain PD controller used to initialize the robots was also tested for tracking the striking movements, see Fig. 9 (i.e. an initial random control policy)), a local policy for regulating deviations along a nominal trajectory (page 912 right column para. 3 recites the observed deviations from the trajectory at iteration k, ekj can be used to update the discrete-time LTV model matrices Akj Bkj that describe the nonlinear dynamics around the trajectory, to first order. Instead of estimating all the parameters together in a costly estimation procedure, the model matrices Akj Bkj can rather be updated separately for each j = 1, . . . ,N, given the smoothened errors ekj (i.e. a policy for regulating trajectory deviations));
at least one processor configured to:
learn the stochastic predictive model for the system using a set of the data collected during trial and error experiments performed using the initial random control policy (page 918 left column para. 1-2 recite the realistic simulation environment SL [31] acts as both a simulator and as a real-time interface to the Barrett WAMs in our experiments. The initial positioning is given by a PD controller with high gains on the shoulder joints, which is then toggled off during the experiments with the striking movements, as summarized in Algorithm 2. The high-gain PD controller used to initialize the robots was also tested for tracking the striking movements, see Fig. 9 (i.e. the predictive model is based on data sets from experiments using the random control policy));
estimate mean prediction and uncertainty associated with the stochastic predictive model (page 913 para. 3 – page 914 para. 1 recite the uncertainty of the model parameters can be seen as a multiplicative noise model and the ILC optimality criterion (9) can be extended to include expectations over them. The multiplicative noise model, unlike the additive noise case, does not lead to certainty-equivalence: the covariance estimates are incorporated in the decision rule. Page 915 left column para 2 recites during the cautious ILC update the feedback control law as well as the feedforward control inputs are updated recursively (line 9). From the first iteration onwards, the means and the covariances of the model matrices are updated (line 14) before computing the feedforward input δukj compensations and the feedback matrices Kkj (i.e. estimating the mean and uncertainty of the predictive model));
formulate a trajectory-centric controller synthesis problem to compute the nominal trajectory along with a feedforward control and a stabilizing [time-invariant] feedback control simultaneously (page 914 left column para. 3 recites the ILC update is decomposed into two components: a current iteration feedback term ufb = KkjEk+1j calculated using the iteration dependent Riccati equations and a feedforward, purely predictive term uff = -φ-1kjℓkj, solved backwards for each j = 1, . . . ,N. The feedforward terms are responsible for compensating for the estimated random disturbances, calculated using eq (10) (i.e. computing the trajectory with a feedforward control and a stabilizing feedback control simultaneously. Examiner’s note: the equations for calculating the feedforward and feedback compensations are also shown as equation 24 and its simplified form equation 29));
determine a local set of deviations of the system, using the learned stochastic system model, from a nominal system state upon use of a control input at a current time-step (page 913 right column para. 3 – page 914 left column para. 1 recite the posterior model covariances Σkj can be used to make more cautious decisions within a stochastic control framework. The uncertainty of the model parameters can be seen as a multiplicative noise model and the ILC optimality criterion (9) can be extended to include expectations over them. The multiplicative noise model, unlike the additive noise case, does not lead to certainty-equivalence: the covariance estimates are incorporated in the decision rule. To see how the expected cost minimization leads to caution, note that [eq (22)] follows from Markov’s inequality. Page 914, left column, paragraph 3 recites the feedforward terms are responsible for compensating for the estimated random disturbances dj, calculated using eq (10) (i.e. a set of deviations from the nominal system trajectory));
determine a system state with a worst-case deviation from the nominal system state in the local set of deviations of the system (page 913 right column para. 3 – page 914 left column para. 1 recite the posterior model covariances Σkj can be used to make more cautious decisions within a stochastic control framework. The uncertainty of the model parameters can be seen as a multiplicative noise model and the ILC optimality criterion (9) can be extended to include expectations over them. The multiplicative noise model, unlike the additive noise case, does not lead to certainty-equivalence: the covariance estimates are incorporated in the decision rule. To see how the expected cost minimization leads to caution, note that [eq (22)] follows from Markov’s inequality. Minimizing the upper bound forces the probability of nonmonotonicity to be low as well. Figs 4 and 5 also illustrate the set of local deviations from the nominal trajectory over a set of iterations (i.e. determining a local set of deviations using the stochastic predictive model at a current time step; the upper bound in this case would be the worst case deviation));
determine a gradient of the robustness constraint by computing a first-order derivative of the robustness constraint at the system state with worst-case deviation (page 914 shows equation 24 and its simplified version equation 29 which determine the gradient of the robustness constraint for the set of deviated trajectories, which would include the worst case deviation; these equations are derived step by step in Appendix A on page 922);
determine the optimal system state trajectory, the feedforward control input and a local [time-invariant] feedback policy that regulates the system state to the nominal trajectory by minimizing a cost of a state-control trajectory while satisfying state and input constraints (page 915 left column para. 1-2 recite the online learning algorithm is readily applicable to similar dynamic tasks with real-time constraints, such as throwing, catching skills in sports or fast, demanding manufacturing tasks (i.e. state and input constraints). The proposed ILC framework is summarized in Algorithm 1. Before entering the main loop (lines 7−16), the trajectory is executed with inverse dynamics and time-varying LQR feedback (line 5). The errors along the trajectory are filtered with a zero phase filter (line 6). During the cautious ILC update the feedback control law as well as the feedforward control inputs are updated recursively (line 9) (i.e. the optimal trajectory is determined with a feedforward control input and a feedback policy). From the first iteration onwards, the means and the covariances of the model matrices are updated (line 14) before computing the feedforward input compensations δuk,j and the feedback matrices Kk,j. If the variant adaptation laws discussed in Section III are employed, it will be enough to store the means and covariances of the relevant model parameters. These parameters can then be transformed, as discussed before, to form the discrete-time model matrix means and covariances, which are used in the cautious ILC update (line 9). Page 915 left column para. 4 – page 915 right column para. 1 recite the practitioner, wary of the model inaccuracies, can increase robustness and ensure stability by setting large diagonal terms for the initial covariance of model uncertainty, Σ0j = ϒI, ϒ >> 1, j = 1, . . . N. Moreover, setting large covariances initially helps to observe the inaccuracies of the model and the noise statistics. The covariance will be suitably decreased over the iterations, as adaptation (18) updates the linear models (i.e. minimizing the costs of a deviated trajectory in order to return to the optimal trajectory));
solve a robust policy optimization using non-linear programming (Page 914 left column para. 2 recites for the expected cost case, where the expectation is taken over the random variables Akj and Bkj, for each j, the optimality criterion [eq (23)] can be solved recursively using dynamic programming [eq (24)]. Page 914 also shows eq (29), which optimizes the feedforward and feedback components using nonlinear programming);
update the control data according to the solved optimization problem (Page 915 shows Algorithm 1 line 14 and page 915 left column para. 2 recites the means and the covariances of the model matrices are updated (line 14) before computing the feedforward input compensations δukj and the feedback matrices Kkj); and
output the updated control data via the interface (the description of fig. 6 on page 918 recites one of the desired trajectories, shown in dashed red on the RHS, is tracked very closely in the final iteration. The blue markers correspond to the time profile of the motion, which are drawn uniformly spaced, one for each 80 ms. The final hitting positions reached are shown as filled circles. Figs. 6 and 7 show the output trajectories of the updated control data for a striking movement).
However, Koc does not explicitly teach that the stabilizing feedback policy may be a time invariant stabilizing feedback policy.
Zeman teaches a time invariant stabilizing feedback policy (page 326 right column para. 3 recites let us consider a dual mode control law similar to Scokaert and Mayne (1998). For the dual mode control it is necessary to design the inner and outer controller as follows:
u_{k+i} = u(x_k), i = 0, ..., N-1, x ∉ T
u_{k+i} = -K x_{k+i}, ∀ i ≥ N, x ∈ T
The outer controller operates when the state is outside the invariant set and steers the state of the system to the target robust invariant set T. The inner controller operates when the state is in the robust control invariant set T. As the inner controller, the state feedback control law u_{k+i} = -K x_{k+i}, ∀ i ≥ N will be further considered (i.e. a time invariant stabilizing feedback policy)).
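For illustration only, the dual mode control law quoted from Zeman can be sketched in Python roughly as follows. The system matrices, the gain K, the ball-shaped stand-in for the target set T, and the saturated outer controller are all assumptions chosen for this sketch and do not come from Zeman; in particular, the ball is not verified to be a robust invariant set.

```python
import numpy as np

def dual_mode_control(x, K, in_target_set, outer_controller):
    """Return the control input for state x under a dual mode law.

    Inside the target set T the time-invariant feedback u = -K x is used;
    outside T an outer controller is used to steer the state toward T.
    """
    if in_target_set(x):
        return -K @ x              # inner controller: u = -K x
    return outer_controller(x)     # outer controller: u = u(x)

# Hypothetical example values (assumptions, not taken from the reference).
K = np.array([[1.0, 1.5]])                      # assumed feedback gain
in_T = lambda x: np.linalg.norm(x) <= 1.0       # placeholder target set T
outer = lambda x: np.clip(-K @ x, -0.5, 0.5)    # placeholder outer controller

u_inside = dual_mode_control(np.array([0.2, 0.0]), K, in_T, outer)
u_outside = dual_mode_control(np.array([2.0, 0.0]), K, in_T, outer)
```

The point of the sketch is only that the inner law uses one fixed gain K at every time step, which is what makes the feedback time invariant.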
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the time-invariant feedback policy from Zeman to replace the time-varying feedback policy from Koc. Whether the feedback is time-varying or time-invariant depends largely on the type of trajectory being tracked (i.e. Koc teaches a use case for table tennis where the trajectory is varying over time vs a use case where the trajectory is the same each time), so one of ordinary skill would know to substitute a known element (the time-invariant feedback policy) for another (the time-varying feedback policy) in order to obtain predictable results.
Regarding claim 2, the combination of Koc and Zeman teaches the controller of claim 1, wherein the system is a discrete-time dynamical system (Koc page 912 left column para. 1 recites in the error dynamics (5), the additional (unknown) term d(t, u) accounts for the disturbances and the effects of the linearization (i.e., higher order terms). We can discretize (5)–(6) with step size δ, N = T/δ and step index j = 1, . . . , N to get the following discrete-time linear system: e_{j+1} = A_j e_j + B_j δu_j + d_{j+1} (7); where the matrices A_j, B_j are the discretizations of (6). Conventional (discrete) ILC algorithms learn to compensate for the errors incurred along the trajectory by updating the control inputs δu_j iteratively. Whenever we refer to the outcome of a particular iteration k, we will use the first subindex for iterations and the second subindex will be used to denote the (discrete) time step, i.e., the vectors e_{k,j} ∈ R^{2n}, δu_{k,j} ∈ R^m denote the deviations and control input compensations at the time step j during iteration k, respectively (i.e. a discrete-time dynamical system)).
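For illustration only, the discrete-time error recursion of Koc equation (7) can be rolled out in Python as sketched below. The function name, the zero-based array layout (d[j] holds the disturbance entering the state at step j+1), and the numeric values are assumptions made for this sketch and are not taken from Koc.

```python
import numpy as np

def rollout_errors(A, B, du, d, e0):
    """Propagate e[j+1] = A[j] @ e[j] + B[j] @ du[j] + d[j] for j = 0..N-1.

    A: (N, n, n) model matrices, B: (N, n, m) input matrices,
    du: (N, m) control compensations, d: (N, n) disturbances,
    e0: (n,) initial deviation. Returns an (N + 1, n) array of deviations.
    """
    N = len(A)
    e = np.zeros((N + 1, e0.shape[0]))
    e[0] = e0
    for j in range(N):
        e[j + 1] = A[j] @ e[j] + B[j] @ du[j] + d[j]
    return e

# Hypothetical scalar example with N = 2 time steps (values are assumptions).
A = np.full((2, 1, 1), 0.5)
B = np.ones((2, 1, 1))
du = np.array([[1.0], [0.0]])
d = np.array([[0.0], [0.25]])
errors = rollout_errors(A, B, du, d, np.array([2.0]))
```

Because the matrices are indexed by the time step j, the same routine covers both time-varying and time-invariant models.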
Regarding claim 3, the combination of Koc and Zeman teaches the controller of claim 1, wherein a trajectory-centric control policy is synthesized by a time-dependent feedforward control and a local time-invariant feedback control that stabilizes the time-dependent feedforward control (page 914 left column para. 3 recites the ILC update is decomposed into two components: a current iteration feedback term ufb = KkjEk+1j calculated using the iteration dependent Riccati equations and a feedforward, purely predictive term uff = -φ-1kjℓkj, solved backwards for each j = 1, . . . ,N. The feedforward terms are responsible for compensating for the estimated random disturbances, calculated using eq (10) (i.e. computing the trajectory with a feedforward control and a stabilizing feedback control simultaneously. Examiner’s note: the equations for calculating the feedforward and feedback compensations are also shown as equation 24 and its simplified form equation 29)).
Regarding claim 4, the combination of Koc and Zeman teaches the controller of claim 3, wherein a synthesis of the trajectory-centric control policy for the discrete-time dynamical system is formulated as a non-linear optimization program with non-linear constraints (Koc page 912 left column para. 1 recites in the error dynamics (5), the additional (unknown) term d(t, u) accounts for the disturbances and the effects of the linearization (i.e., higher order terms). We can discretize (5)–(6) with step size δ, N = T/δ and step index j = 1, . . . , N to get the following discrete-time linear system: e_{j+1} = A_j e_j + B_j δu_j + d_{j+1} (7); where the matrices A_j, B_j are the discretizations of (6). Page 914 shows equation 24 (and its simplified form equation 29), which is the non-linear optimization equation for the optimality criterion in equation 23, which teaches non-linear constraints).
Regarding claim 5, the combination of Koc and Zeman teaches the controller of claim 4, wherein the non-linear constraints are system dynamics and stabilizing constraints for the local time-invariant feedback policy (Koc page 919 right column para. 2 recites desired Cartesian position, velocity, and orientations of the racket at the hitting time T impose constraints on the seven joint angles and seven joint velocities of the robot arm at T. Along with the desired hitting time T (or the time until impact), these 15 parameters are used to generate third-order joint space polynomials (i.e. non-linear constraints include the system dynamics). Stabilizing constraints are taught by Koc equation 23 on page 914. The time invariant feedback policy is taught by Zeman (see analysis of claim 1)).
Regarding claim 6, the combination of Koc and Zeman teaches the controller of claim 1, wherein the time-invariant local policy is configured to satisfy the robustness constraint which is push a current state of the system in a worst-case deviation state at a current time step into an error-tolerance around the trajectory at a next time step (minimizing or pushing the worst-case state deviation towards an optimal trajectory is taught by equation 24 on page 914 and figs. 4-6 of Koc; Zeman teaches the time-invariant feedback policy (see analysis of claim 1)).
Regarding claim 7, the combination of Koc and Zeman teaches the controller of claim 1, wherein local sets of uncertainty along the nominal trajectory are obtained by a stochastic function approximator used to learn a forward dynamics model of the system (Koc page 913 para. 2 recites the forward dynamics model (3) can then be used to sample the means and variances of the continuous LTV matrices, e.g., using Monte Carlo sampling (i.e. a forward dynamics model of the system). Page 921 right column para. 1 recites the [ILC] algorithm was then recast in a more efficient form (derived in Appendix A), which does not require the estimation of disturbances and can be implemented as a recursive ILC update. The update law makes it easy to introduce caution with respect to modeling uncertainties and online adaptation of the linearized model matrices. Unlike typical ILC updates, feedback matrices for the tracking of striking trajectories are adapted as well, which are useful for rejecting noise and varying initial conditions (i.e. the uncertainty of the trajectory is calculated as part of the model)).
Regarding claim 8, the combination of Koc and Zeman teaches the controller of claim 1, wherein the worst-case deviation state for the system at every state along a nominal trajectory in a known set is obtained by solving an optimization problem (Koc page 912 left column para. 3-4 recite norm-optimal ILC uses the discrete LTV model in (7) to minimize the next iteration errors, where the computed control inputs are optimal with respect to some vector norm. These approaches based on optimality criteria can learn efficiently by taking advantage of the inaccurate models. Batch methods that compute the next iteration compensations stack the model matrices together to compute (a possibly weighted and dampened) pseudoinverse of this block lower-diagonal matrix. As an alternative, some methods use convex programming to compute these optimal compensations under additional constraints. The condition of this lifted model matrix typically grows exponentially with the horizon size N and computing the pseudoinverse stably becomes very difficult. Downsampling trajectories restores the condition number and a stable inversion becomes much more manageable, at the cost of reduced tracking performance. As a better alternative, optimization-based approaches, depending on the particular optimizer, may avoid computing the pseudoinverse (i.e. a deviation state (including the worst case deviation) can be obtained by solving an optimization problem)).
Regarding claim 9, the combination of Koc and Zeman teaches the controller of claim 1, wherein the formulated nonlinear program with the additional robustness constraint is solved to obtain the feedforward control along with the additional time-constant feedback controller (see 112(b) rejection of claim 9 for interpretation of “the additional robustness constraint” and “the additional time-constant feedback controller”) using the gradient of the robustness constraint at the worst-case deviation state (Koc page 912 left column para. 1-2 recite conventional (discrete) ILC algorithms learn to compensate for the errors incurred along the trajectory (i.e. the deviation state, which includes the worst case deviation state) by updating the control inputs δu_j iteratively. Whenever we refer to the outcome of a particular iteration k, we will use the first subindex for iterations and the second subindex will be used to denote the (discrete) time step, i.e., the vectors e_{k,j} ∈ R^{2n}, δu_{k,j} ∈ R^m denote the deviations and control input compensations at the time step j during iteration k, respectively. The control commands to be applied at iteration k + 1 as u_{k+1,j} = u_{k,j} + δu_{k,j} (8) are computed using the deviations e_{k,j} at iteration k. Page 914 shows equations 23, 24, and 29, which calculate the feedforward and feedback components using nonlinear programming).
Regarding claim 10, the combination of Koc and Zeman teaches the controller of claim 1, wherein at least one of the sensors performs a wireless communication via the interface (fig. 1, 10, and page 919 left column para. 2 recites we perform experiments on our robotic table tennis platform, see Fig. 10, where two seven DoF cable-driven, torque controlled Barrett WAM arms (Ping and Pong) are hanging from the ceiling. The custom made Barrett WAM arms are capable of high speeds and accelerations (approx. up to 10 m/s2 in task space). Standard size rackets (16 cm diameter) are mounted on the end-effector of the arms as can be seen in Fig. 10. A vision system consisting of four cameras hanging from the ceiling around each corner of the table is used for tracking the ball [30]. Page 919 left column para. 3 recites the realistic simulation environment SL [31] acts as both a simulator and as a real-time interface to the Barrett WAMs in our experiments. The initial positioning is given by a PD controller with high gains on the shoulder joints, which is then toggled off during the experiments with the striking movements, as summarized in Algorithm 2. The high-gain PD controller used to initialize the robots was also tested for tracking the striking movements, see Fig. 9 (i.e. the sensors communicate wirelessly with the interface of the robotic system)).
Regarding claim 11, the combination of Koc and Zeman teaches the controller of claim 1, wherein at least one of the sensors is a three dimensional (3D) camera providing moving pictures including depth images (Koc page 919 left column para 2-3 recite a vision system consisting of four cameras hanging from the ceiling around each corner of the table is used for tracking the ball [30]. A ball launcher (see Fig. 1) is available to throw balls accurately to a fixed position inside the workspace of the robots. The incoming ball arrives with low-variability in desired positions and higher-variability in ball velocities. The whole area to be covered amounts to about 1 m2 circular region surrounding a centered posture of the robots. The realistic simulation environment SL [31] acts as both a simulator and as a real-time interface to the Barrett WAMs in our experiments. The initial positioning is given by a PD controller with high gains on the shoulder joints, which is then toggled off during the experiments with the striking movements, as summarized in Algorithm 2. The high-gain PD controller used to initialize the robots was also tested for tracking the striking movements, see Fig. 9 (i.e. the sensor can be a 3D camera, having multiple cameras would provide depth images since a 3D position of the ball would be required for the robot to strike accurately)).
Regarding claim 12, the combination of Koc and Zeman teaches the controller of claim 1, wherein the sensors are arranged in the system and predetermined peripheral positions (Koc figs. 1, 10 and page 919 left column para 2 recite a vision system consisting of four cameras hanging from the ceiling around each corner of the table is used for tracking the ball [30] (i.e. the visual sensors are arranged in the system and provide peripheral positions)).
Regarding claim 13, the combination of Koc and Zeman teaches the controller of claim 12, wherein at least one of the predetermined peripheral positions is determined by a view-angle such that the 3D camera captures a moving range of the system (Koc figs. 1, 9, 10 and page 919 left column para. 2-3 recite a vision system consisting of four cameras hanging from the ceiling around each corner of the table is used for tracking the ball [30]. A ball launcher (see Fig. 1) is available to throw balls accurately to a fixed position inside the workspace of the robots. The incoming ball arrives with low-variability in desired positions and higher-variability in ball velocities. The whole area to be covered amounts to about 1 m² circular region surrounding a centered posture of the robots. The realistic simulation environment SL [31] acts as both a simulator and as a real-time interface to the Barrett WAMs in our experiments. The initial positioning is given by a PD controller with high gains on the shoulder joints, which is then toggled off during the experiments with the striking movements, as summarized in Algorithm 2. The high-gain PD controller used to initialize the robots was also tested for tracking the striking movements, see Fig. 9 (i.e. the positions of the visual sensors enable capture of a moving range of the system)).
Regarding claim 14, the combination of Koc and Zeman teaches the controller of claim 1, wherein the trajectory-centric controller synthesis problem is a non-linear program (Koc page 914 shows equations 23, 24, and 29, which use dynamic programming (i.e. non-linear programming) to solve for an optimal trajectory).
Regarding claim 15, the combination of Koc and Zeman teaches the controller of claim 1, wherein the local policy is a time-invariant feedback policy or a local stabilizing controller (Koc page 910 left column para. 3 recites in the closed-form solution, the covariances of the learned local linear models are employed as adaptive regularizers (i.e. a stabilizing controller). Zeman teaches a time-invariant feedback policy (see claim 1)).
Regarding claim 16, the combination of Koc and Zeman teaches the controller of claim 1, wherein the control trajectory is an open-loop trajectory (Koc page 914 right column para. 3 recites typically ILC is used to feed the past errors along the trajectory (filtered and multiplied with a learning matrix) back to the system for the next trial as feedforward compensations. A well designed feedback controller, whenever available, is only used to reject nonrepeating disturbances and to stabilize the system in the time domain. The recursive implementation (29), on the other hand, readily provides and updates a feedback controller based on past performance. From here on, we will refer to the feedforward part of (29) as δukj, keeping the feedback control separate (i.e. the control trajectory can be an open loop trajectory)).
Claim 17 is a method claim and its limitation is included in claim 1. The only difference is that claim 17 requires a method (Koc page 910 left column para. 3 recites “Our contributions can be stated succinctly as follows: We propose a new adaptive and cautious model-based ILC (i.e. iterative learning control) algorithm, that is implemented efficiently using a recursive formulation.” Pages 915 and 916 also show Algorithm 1 and Algorithm 2, respectively (i.e. a method)). Therefore, claim 17 is rejected for the same reasons as claim 1.
Claim 18 is a method claim and its limitation is included in claim 2. Claim 18 is rejected for the same reasons as claim 2.
Claim 19 is a method claim and its limitation is included in claim 3. Claim 19 is rejected for the same reasons as claim 3.
Claim 20 is a method claim and its limitation is included in claim 4. Claim 20 is rejected for the same reasons as claim 4.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
“Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning” (Chebotar et al) teaches reinforcement learning for robotic systems using a time varying linear Gaussian policy to combine a model-free and model-based framework for path planning improvement.
“Stabilizing a linear system with quantized state feedback” (Delchamps) teaches stabilizing an unstable, time-invariant, discrete-time, linear system by means of state feedback.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571)272-8350. The examiner can normally be reached on M-F 0800-1700.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/L.M.F./ Examiner, Art Unit 2121

/Li B. Zhen/ Supervisory Patent Examiner, Art Unit 2121