DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant summarizes the claims and the mapping of prior art.  Applicant then argues that Wang does not teach or suggest that the initial control policy or adjusted control policy include "design parameters for configuring the robot".  The examiner maintains that the cited training samples, the cited policy-generated and expert trajectories, and the training of the neural network generally in the reference are equivalent to design parameters.
Applicant argues that Wang is directed to methods for controlling a robot during runtime, after design, and is not directed towards methods related to the design phase of a robot.  A control system for a robot that is updated (real-time or otherwise) during operation of the robot has an initial design which is being updated.  As such, absent further definition, the scope of "design" would include changes made to a control system while the robot is operating, such as updates based on data.  This is especially true in the case of machine learning systems (i.e. neural networks) which are updated.  
In view of the above, absent further definition in the claim that clearly excludes the training and verification of a machine learning system from falling within the scope of "design", applicant's arguments are respectfully not persuasive.  It is irrelevant in terms of applicability of the reference that some operations may be performed during runtime, as a machine learning control system can have its design altered by being trained during runtime (see above).  The prior-made rejections are maintained.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 6, 9-14, 16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (Wang, Z., Merel, J., Reed, S., Wayne, G., de Freitas, N., & Heess, N. (2017). Robust imitation of diverse behaviors. arXiv preprint arXiv:1707.02747.) in view of Kobayashi (US 20090106177 A1).
Regarding Claim 1:
Wang teaches:
generating a plurality of design samples, (p.2, The generator attempts to generate samples; p.3, GAIL ... constructs a reward function using GANs to measure the similarity between the policy-generated trajectories and the expert trajectories.; p.5, for j ∈ {1, · · · , n} do ... Sample trajectory τj from the demonstration set and sample ...)
wherein a first design sample included in the plurality of design samples includes a first set of design parameter values associated with a first robot model; (p.2, These trajectories may have been produced by either an artificial or natural agent.; p.5, for j ∈ {1, · · · , n} do ... Sample trajectory τj from the demonstration set and sample ...)
generating a plurality of behavioral metrics based on the plurality of design samples, (p.1, The end product of this is a robust neural network policy that can imitate a large and diverse set of behaviors using few training demonstrations; p.3, It constructs a reward function using GANs to measure the similarity between the policy-generated trajectories and the expert trajectories. As in GANs, GAIL adopts the following objective function; p.3 Behavioral cloning with variational autoencoders suited for control)
wherein a first behavioral metric associated with the first design sample indicates a first expression level with which the first robot model performs a first behavior when configured according to the first set of design parameter values; (p.3, GAIL “constructs a reward function using GANs to measure the similarity between the policy-generated trajectories and expert trajectories” using an objective function illustrated in equation 3. The “trajectories” are motions of the robotic arm; the expert trajectories are the motions of the first robot, while the policy-generated trajectories are the motions of the second robot. The measure of the similarity between these two motions is the claimed accuracy as expressed by the Error term in the objective function of equation 3. Moreover, because the training process is iterative and the objective function operates to reduce the error on each iteration, one of ordinary skill in the art understands that as the training progresses the policy-generated trajectories at time n are compared to the policy-generated trajectories at time n+1. Therefore the comparison between two simulated robots is made obvious. Figure 4 clearly illustrates the claimed language: the simulated avatars are analogous to the claimed robot, performing a walking-type behavior in which the speed of the walk is an expression. Further, Figure 4 clearly illustrates clustering: “trajectories nearby in the plot tend to correspond to similar movement styles even when differing in speed.” Therefore the plot is a clear illustration of the claim limitation in question.)
generating a first mapping, based on the plurality of design samples and the plurality of behavioral metrics, that indicates a second expression level with which the first robot model performs the first behavior when configured according to a second set of design parameter values; and (page 3: “… the action decoder is an MLP that maps the concatenation of the state and the embedding to the parameter of a Gaussian policy…”; page 5 algorithm 1: “… update policy parameters…”)
generating a third set of design parameter values based on the first mapping, wherein the first robot model performs the first behavior when configured according to the third set of design parameter values. (page 5 algorithm 1: “… run policy and update policy…”; page 5 section 4: “robotic arm reaching”; Figure 4: walking)
Wang does not teach in particular, but Kobayashi teaches:
A computer-implemented method  (¶10 using a genetic algorithm; ¶23 causing a computer of the information processing device to execute processing; ¶149 With this computer 100, a CPU (Central Processing Unit) 101, ROM (Read Only Memory) 102, and RAM (Random Access Memory) 103 are mutually connected by a bus 104.; Fig. 13)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the robot learning system of Wang with the genetic algorithms of Kobayashi. Wang teaches using Generative Adversarial Imitation Learning (GAIL) algorithms to compare a simulated robot’s designed behavior (expression) to a target robot behavior (expression) (page 5 algorithm 1), where the target robot behavior may also be simulated (i.e., artificial; see page 2, last sentence), and Kobayashi teaches using a computer (FIG. 13) in a process that includes the calculation of a feature amount resulting from genetic algorithms (FIGS. 1, 8, 11). Kobayashi would therefore provide a practical implementation for the features of Wang.

Regarding Claim 2:
Wang teaches:
discretizing the first range of design parameter values to produce a discretized first range of design parameter values; (p.2 For continuous latent variables, this bound can be optimized efficiently via the re-parameterization trick [15, 26]. This class of models are often referred to as VAEs.)
sampling the discretized first range of design parameter values to generate a first parameter value that is included in the first set of parameter values. (p.2  The generator attempts to generate samples that are indistinguishable from real data. The job of the discriminator is then to tell apart the data and the samples, predicting 1 with high probability if the sample is real and 0 otherwise. More precisely, GANs optimize the following objective function)
Wang does not teach in particular, but Kobayashi teaches:
obtaining a first range of design parameter values that includes a maximum value for a first design parameter and a minimum value for the first design parameter; (Figure 1, Figure 2: “existing feature amounts”; Figure 7 current generation evaluation level; Figure 11; ¶56 maximum value (MaxIndex); ¶83 equal to or greater than a predetermined threshold to the minimum value of the setting range thereof)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the robot learning system of Wang with the genetic algorithms of Kobayashi. Wang teaches using Generative Adversarial Imitation Learning (GAIL) algorithms to compare a simulated robot’s designed behavior (expression) to a target robot behavior (expression) (page 5 algorithm 1), where the target robot behavior may also be simulated (i.e., artificial; see page 2, last sentence), and Kobayashi teaches using a computer (FIG. 13) in a process that includes the calculation of a feature amount resulting from genetic algorithms (FIGS. 1, 8, 11). Kobayashi would therefore provide a practical implementation for the features of Wang.
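For illustration only, the obtaining, discretizing, and sampling steps recited in claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Wang, Kobayashi, or the application: the function names, the evenly spaced grid, the uniform random choice, and the example joint-angle bounds of -30 to 30 are all hypothetical.

```python
import random


def discretize_range(min_value, max_value, num_steps):
    """Return num_steps evenly spaced values from min_value to max_value, inclusive.

    Even spacing is an assumption; the claim only requires a discretized range.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (max_value - min_value) / (num_steps - 1)
    return [min_value + i * step for i in range(num_steps)]


def sample_parameter(discretized_values, rng):
    """Pick one value from the discretized range (uniform choice is an assumption)."""
    return rng.choice(discretized_values)


# Hypothetical design parameter: a joint angle bounded by a minimum and a maximum value.
rng = random.Random(0)
joint_angle_grid = discretize_range(-30.0, 30.0, 7)
first_parameter_value = sample_parameter(joint_angle_grid, rng)
```

The sampled value is always a member of the discretized range, which is the only property the claim language relies on.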

Regarding Claim 3:
Wang teaches:
comparing the first design sample to at least one other design sample included in the plurality of design samples, (page 3: GAIL “constructs a reward function using GANs to measure the similarity between the policy-generated trajectories and expert trajectories” using an objective function illustrated in equation 3. The “trajectories” are motions of the robotic arm; the expert trajectories are the motions of the first robot, while the policy-generated trajectories are the motions of the second robot. The measure of the similarity between these two motions is the claimed accuracy as expressed by the Error term in the objective function of equation 3. Moreover, because the training process is iterative and the objective function operates to reduce the error on each iteration, one of ordinary skill in the art understands that as the training progresses the policy-generated trajectories at time n are compared to the policy-generated trajectories at time n+1. Therefore the comparison between two simulated robots is made obvious.)
wherein the at least one other design sample includes at least one set of design parameter values associated with the first robot model; and (page 5: “… on a reaching task with a simulated Jaco arm. The physical Jaco is a robotics arm developed by Kinova Robotics…”)
Wang does not teach in particular, but Kobayashi teaches:
determining that the first expression level exceeds any expression level with which the first robot model performs the first behavior when configured according to the at least one set of design parameter values. (¶83 equal to or greater than a predetermined threshold to the minimum value of the setting range thereof; see also Wang, page 3: “… the action decoder is an MLP that maps the concatenation of the state and the embedding to the parameter of a Gaussian policy…”; page 5 algorithm 1: “… update policy parameters…”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the robot learning system of Wang with the genetic algorithms of Kobayashi. Wang teaches using Generative Adversarial Imitation Learning (GAIL) algorithms to compare a simulated robot’s designed behavior (expression) to a target robot behavior (expression) (page 5 algorithm 1), where the target robot behavior may also be simulated (i.e., artificial; see page 2, last sentence), and Kobayashi teaches using a computer (FIG. 13) in a process that includes the calculation of a feature amount resulting from genetic algorithms (FIGS. 1, 8, 11). Kobayashi would therefore provide a practical implementation for the features of Wang.

Regarding Claim 4:
Wang teaches:
generating an initial mapping between the first set of design parameter values and the first behavioral metric; (page 3: “… policy gradient algorithms are used to train the policy…”; page 5 algorithm 1: “… gradient…”; page 8: “… 250 random trajectories from 6 different neural network controller that were trained…”; page 1: “… the end product of this is a robust neural network…”)
evaluating the initial mapping based on the first set of design parameter values to generate an estimated expression level with which the first robot model performs the first behavior when configured according to the first set of design parameter values; (p.6 To obtain demonstrations, we trained 60 independent policies to reach to random target locations in the workspace starting from the same initial configuration. We generated 30 trajectories from each of the first 50 policies. These serve as training data for the VAE model (1500 training trajectories in total). The remaining 10 policies were used to generate test data.)
determining that the estimated expression level is not equal to the first expression level; and performing one or more operations to generate a modified mapping from the initial mapping, wherein the modified mapping does indicate the first expression level when evaluated based on the first set of design parameter values. (p.3 To avoid differentiating through the system dynamics, policy gradient algorithms are used to train the policy by maximizing the discounted sum of rewards)
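For illustration only, the evaluate, compare, and modify sequence recited in claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of Wang or of the application: the linear mapping, the squared-error gradient step (used here in place of the policy gradient training cited from Wang), and all names and numeric values are hypothetical.

```python
def evaluate_mapping(weights, design_params):
    """Estimated expression level under a hypothetical linear mapping."""
    return sum(w * p for w, p in zip(weights, design_params))


def modify_mapping(weights, design_params, observed_level,
                   learning_rate=0.1, tolerance=1e-9, max_iters=1000):
    """Adjust the mapping until its estimate matches the observed expression level.

    Uses plain gradient steps on squared error; this is an assumption made
    for illustration only.
    """
    weights = list(weights)
    for _ in range(max_iters):
        error = evaluate_mapping(weights, design_params) - observed_level
        if abs(error) <= tolerance:
            break  # the estimate now indicates the observed expression level
        weights = [w - learning_rate * error * p
                   for w, p in zip(weights, design_params)]
    return weights


# Hypothetical first set of design parameter values and measured expression level.
first_design_params = [1.0, 2.0]
first_expression_level = 0.8

initial_mapping = [0.0, 0.0]
estimated_level = evaluate_mapping(initial_mapping, first_design_params)
needs_update = estimated_level != first_expression_level
modified_mapping = modify_mapping(initial_mapping, first_design_params,
                                  first_expression_level)
```

With these example values each step halves the error, so the modified mapping reproduces the observed level to within the tolerance well before the iteration cap.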

Regarding Claim 6:
Wang teaches:
wherein the first behavior comprises a first physical action, and (page 5: “… on a reaching task with a simulated Jaco arm. The physical Jaco is a robotics arm developed by Kinova Robotics…”)
wherein the first robot model does not perform the first physical action when configured according to a fourth set of design parameter values. (p.8, "Although not always successful the learned controller often transitions robustly, despite not having been trained to do so.", examiner notes that failed transitions equate to not performing the action)

Regarding Claim 9:
Wang teaches:
wherein the first robot model defines a multi-pedal robot, and (Fig.5 illustrates a multi-pedal robot)
the first set of design parameter values includes a first joint angle associated with a first leg of the multi-pedal robot. (p.5 56-actuated joint angles)

Regarding Claim 10:
Wang teaches:
wherein the first robot model defines a robotic arm, and (Figure 3: Interpolation in the latent space for the Jaco arm. Each column shows three frames of a target-reach trajectory (time increases across rows). The left and right most columns correspond to the demonstration trajectories in between which we interpolate. Intermediate columns show trajectories generated by our VAE policy conditioned on embeddings which are convex combinations of the embeddings of the demonstration trajectories. Interpolating in the latent space indeed correspond to interpolation in the physical dimensions.)
the first set of design parameter values includes a first flocking rule that governs how a group of agents move relative to one another to describe a path the robotic arm traverses during operation. (Figure 3: Interpolation in the latent space for the Jaco arm. Each column shows three frames of a target-reach trajectory (time increases across rows). The left and right most columns correspond to the demonstration trajectories in between which we interpolate. Intermediate columns show trajectories generated by our VAE policy conditioned on embeddings which are convex combinations of the embeddings of the demonstration trajectories. Interpolating in the latent space indeed correspond to interpolation in the physical dimensions.)

Claims 11-14, 16, and 19-20 are rejected under the same grounds as equivalent claims 1-4, 6, and 9-10 above (which are substantially similar and correspond in numerical order, i.e. 1 to 11).

Claims 5, 8, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (Wang, Z., Merel, J., Reed, S., Wayne, G., de Freitas, N., & Heess, N. (2017). Robust imitation of diverse behaviors. arXiv preprint arXiv:1707.02747.) in view of Kobayashi (US 20090106177 A1), and further in view of Mazzoldi (US 20170061319 A1).
Regarding Claim 5:
Wang teaches:
generating an initial set of design parameter values, wherein the first robot model performs the first behavior with an initial expression level when configured according to the initial set of design parameter values; (page 3: “… policy gradient algorithms are used to train the policy…”; page 5 algorithm 1: “… gradient…”; page 8: “… 250 random trajectories from 6 different neural network controller that were trained…”; page 1: “… the end product of this is a robust neural network…”)
wherein the first robot model performs the first behavior with the final expression level when configured according to the third set of design parameter values. (page 5: “… reaching task with a simulated Jaco arm…”)
Wang does not teach in particular, but Mazzoldi teaches:
determining that the initial expression level should be modified to a final expression level based on an interaction with a user; and (abstract: “… receives a preference indicator for a selected one of the intermediate designs, where a user inputs the preference indicator…”; Figure 1, 4, 5, 6, 7)
modifying the initial set of design parameter values based on the first mapping to produce the third set of design parameter values, (Figure 1, 4, 5, par 32 – 33: “genetic algorithm”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the robot learning system of Wang with the design optimization system of Mazzoldi for the benefit of exploring the design space (Mazzoldi ¶4).

Regarding Claim 8:
Wang does not teach in particular, but Mazzoldi teaches:
wherein the first mapping comprises a function that classifies sets of design parameter values to produce different expression levels and is evaluated based on a plurality of weighted variables. (Abstract, the device generates a second plurality of intermediate designs using the adjusted corresponding weight of the selected one of the first plurality of intermediate designs; see also Wang, p.5, One possible route is to initialize the weights)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the robot learning system of Wang with the design optimization system of Mazzoldi for the benefit of exploring the design space (Mazzoldi ¶4).

Claims 15 and 18 are rejected under the same grounds as equivalent claims 5 and 8 above (which are substantially similar and correspond in numerical order, i.e. 5 to 15).


Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (Wang, Z., Merel, J., Reed, S., Wayne, G., de Freitas, N., & Heess, N. (2017). Robust imitation of diverse behaviors. arXiv preprint arXiv:1707.02747.) in view of Kobayashi (US 20090106177 A1), and further in view of Kubota (Kubota, N., Nojima, Y., Baba, N., Kojima, F., & Fukuda, T. (2000, July). Evolving pet robot with emotional model. In Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No. 00TH8512) (Vol. 2, pp. 1231-1237). IEEE.).
Regarding Claim 7:
Wang teaches:
wherein the first robot model does not express the first emotional state when configured according to a fourth set of design parameter values. (p.8, "Although not always successful the learned controller often transitions robustly, despite not having been trained to do so.", examiner notes that failed transitions equate to not performing the action)
Wang does not teach in particular, but Kubota teaches:
wherein the first behavior comprises expressing a first emotional state, and (Abstract, The pet robot performs tricks by using fuzzy controller, and further acquires tricks by a delta rule as online learning and a genetic algorithm as offline learning.; p.3, Furthermore, we use the concept of mood as a basis value for the feeling. This value is used at the steady-state of the pet robot and the feeling converges to the basic value ... p.4, Here the degree of emotion reacted by the external stimulus is decided by the fuzzy inference designed experimentally by us.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the robot learning system of Wang with the robot emotional model of Kubota, as this provides "very important" communication with the robot's owner (Kubota, p.1 col 1).

Claim 17 is rejected under the same grounds as equivalent claim 7 above (which is substantially similar).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BIJAN MAPAR whose telephone number is (571)270-3674. The examiner can normally be reached Monday - Thursday, 11:00-8:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Boris Gorney, can be reached at 571-270-5626. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BIJAN MAPAR/
Primary Examiner, Art Unit 2147