DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner’s Notes
The 35 U.S.C. § 103 rejection made in the previous action has been withdrawn.
Allowable Subject Matter
Claims 12-14, 16, 18, 20-27, and 29-33 are allowed.

EXAMINER'S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given by Michael Portnov (Reg. No. 61,225) on February 25, 2022.
Replace claims 12-14, 16, 18, 20-27, and 29-33 with the following:
1-11.	(Cancelled)
12.	(Currently Amended) A computer-implemented method of encoding behaviours for recall, the method comprising:
	obtaining a training trajectory representing an example behaviour, the training trajectory comprising, for each of a plurality of time steps during performance of the behaviour, (i) an observation representing a state of an environment at the time step and (ii) a training action performed at the time step;
	for a particular time step t of the plurality of time steps:
	generating action data a_t for the time step t from (i) an observation s_t representing the state of the environment at the time step t and one or more observations [s_{t+1}, …, s_{t+k}] that represent states of the environment at time steps t+1, …, t+k after the time step t in the training sequence, comprising:
		generating an encoder input x_t for the particular time step, wherein x_t comprises the observation at the time step t and the one or more observations representing states of the environment at the one or more future time steps from the training trajectory such that x_t = [s_t, s_{t+1}, …, s_{t+k}];
		encoding the encoder input using an encoder neural network to determine parameters of a posterior distribution q_t(z_t | x_t) over a set of motor primitive latent variables;
		sampling from the posterior distribution q_t(z_t | x_t) to determine a multi-dimensional motor primitive latent variable z_t for the particular time step; and
		decoding (i) the multi-dimensional motor primitive latent variable z_t for the particular time step and (ii) the observation s_t at the particular time step using a generative neural network to generate action data a_t for the time step; and
	training the encoder neural network and the generative neural network using an objective function dependent upon (i) the action data a_t output by the generative neural network for the particular time step and upon (ii) data representing the training action in the training trajectory at the particular time step.
13.	(Original) The method as claimed in claim 12 wherein the objective function further comprises a term dependent upon a difference between the posterior distribution and a prior distribution for the motor primitive latent variables.
14.	(Original) The method as claimed in claim 13 wherein the prior distribution comprises an autoregressive distribution such that at each time step the prior distribution depends on a combination of α times the prior distribution at a previous time step where |α|<1, and a noise component.
15.	(Cancelled)

17.	(Cancelled)
18.	(Previously Presented) The method as claimed in claim 12, wherein the encoder input for the particular time step further comprises the multi-dimensional motor primitive latent variable for the time step preceding the particular time step in the training trajectory.
19.	(Cancelled) 
20.	(Previously Presented) The method as claimed in claim 12, wherein the observations in the training trajectory are generated by applying first perturbations to observations in a nominal trajectory for the behaviour wherein the nominal trajectory is given by a sequence of nominal state action pairs {(s_t^*, a_t^*)}_{1…T} obtained by executing μ_E(s) (the mean action of an expert in state s) recursively.
21.	(Previously Presented) The method as claimed in claim 20, wherein the actions in the training trajectory are generated by applying second perturbations to actions in the nominal trajectory for the behaviour.
22.	(Previously Presented) The method as claimed in claim 21, wherein the first perturbations are based on perturbations drawn from a perturbation distribution, and wherein the second perturbations are based on a state-action Jacobian of a policy used to generate the nominal trajectory and the perturbations drawn from the perturbation distribution.
23.	(Currently Amended) A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations for encoding behaviours for recall, the operations comprising:
	obtaining a training trajectory representing an example behaviour, the training trajectory comprising, for each of a plurality of time steps during performance of the behaviour, (i) an observation representing a state of an environment at the time step and (ii) a training action performed at the time step;
	for a particular time step t of the plurality of time steps:
	generating action data a_t for the time step t from (i) an observation s_t representing the state of the environment at the time step t and one or more observations [s_{t+1}, …, s_{t+k}] that represent states of the environment at time steps t+1, …, t+k after the time step t in the training sequence, comprising:
		generating an encoder input x_t for the particular time step, wherein x_t comprises the observation at the time step t and the one or more observations representing states of the environment at the one or more future time steps from the training trajectory such that x_t = [s_t, s_{t+1}, …, s_{t+k}];
		encoding the encoder input using an encoder neural network to determine parameters of a posterior distribution q_t(z_t | x_t) over a set of motor primitive latent variables;
		sampling from the posterior distribution q_t(z_t | x_t) to determine a multi-dimensional motor primitive latent variable z_t for the particular time step; and
		decoding (i) the multi-dimensional motor primitive latent variable z_t for the particular time step and (ii) the observation s_t at the particular time step using a generative neural network to generate action data a_t for the time step; and
	training the encoder neural network and the generative neural network using an objective function dependent upon (i) the action data a_t output by the generative neural network for the particular time step and upon (ii) data representing the training action in the training trajectory at the particular time step.
24.	(Previously Presented) The system as claimed in claim 23 wherein the objective function further comprises a term dependent upon a difference between the posterior distribution and a prior distribution for the motor primitive latent variables.
25.	(Previously Presented) The system as claimed in claim 24 wherein the prior distribution comprises an autoregressive distribution such that at each time step the prior distribution depends on a combination of α times the prior distribution at a previous time step where |α|<1, and a noise component.

27.	(Previously Presented) The system as claimed in claim 23, wherein the encoder input for the particular time step further comprises the multi-dimensional motor primitive latent variable for the time step preceding the particular time step in the training trajectory.
28.	(Cancelled) 
29.	(Previously Presented) The system as claimed in claim 23, wherein the observations in the training trajectory are generated by applying first perturbations to observations in a nominal trajectory for the behaviour wherein the nominal trajectory is given by a sequence of nominal state action pairs {(s_t^*, a_t^*)}_{1…T} obtained by executing μ_E(s) (the mean action of the expert in state s) recursively.
30.	(Previously Presented) The system as claimed in claim 29, wherein the actions in the training trajectory are generated by applying second perturbations to actions in the nominal trajectory for the behaviour.
31.	(Previously Presented) The system as claimed in claim 30, wherein the first perturbations are based on perturbations drawn from a perturbation distribution, and wherein the second perturbations are based on a state-action Jacobian of a policy used to generate the nominal trajectory and the perturbations drawn from the perturbation distribution.
32.	(Currently Amended) One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for encoding behaviours for recall, the operations comprising:
	obtaining a training trajectory representing an example behaviour, the training trajectory comprising, for each of a plurality of time steps during performance of the behaviour, (i) an observation representing a state of an environment at the time step and (ii) a training action performed at the time step;
	for a particular time step t of the plurality of time steps:
	generating action data a_t for the time step t from (i) an observation s_t representing the state of the environment at the time step t and one or more observations [s_{t+1}, …, s_{t+k}] that represent states of the environment at time steps t+1, …, t+k after the time step t in the training sequence, comprising:
		generating an encoder input x_t for the particular time step, wherein x_t comprises the observation at the time step t and the one or more observations representing states of the environment at the one or more future time steps from the training trajectory such that x_t = [s_t, s_{t+1}, …, s_{t+k}];
		encoding the encoder input using an encoder neural network to determine parameters of a posterior distribution q_t(z_t | x_t) over a set of motor primitive latent variables;
		sampling from the posterior distribution q_t(z_t | x_t) to determine a multi-dimensional motor primitive latent variable z_t for the particular time step; and
		decoding (i) the multi-dimensional motor primitive latent variable z_t for the particular time step and (ii) the observation s_t at the particular time step using a generative neural network to generate action data a_t for the time step; and
	training the encoder neural network and the generative neural network using an objective function dependent upon (i) the action data a_t output by the generative neural network for the particular time step and upon (ii) data representing the training action in the training trajectory at the particular time step.
33.	(Previously Presented) The one or more non-transitory computer-readable storage media as claimed in claim 32, wherein the objective function further comprises a term dependent upon a difference between the posterior distribution and a prior distribution for the motor primitive latent variables.


Reasons for Allowance
The following is an examiner's statement of reasons for allowance:
Claims 12-14, 16, 18, 20-27, and 29-33 are considered allowable because, when the claims are read in light of the specification as per MPEP 2111.01, none of the references of record, alone or in combination, discloses or suggests the limitations found within independent claims 12, 23, and 32, namely: “… $x_t$ for the particular time step, wherein $x_t$ comprises the observation at the time step $t$ and the one or more observations representing states of the environment at the one or more future time steps from the training trajectory such that $x_t = [s_t, s_{t+1}, \ldots, s_{t+k}]$ … encoding the encoder input using an encoder neural network to determine parameters of a posterior distribution $q_t(z_t \mid x_t)$ over a set of motor primitive latent variables; … decoding (i) the multi-dimensional motor primitive latent variable $z_t$ for the particular time step and (ii) the observation $s_t$ at the particular time step using a generative neural network to generate action data $a_t$ for the time step; and training the encoder neural network and the generative neural network using an objective function dependent upon (i) the action data $a_t$ output by the generative neural network for the particular time step and upon (ii) data representing the training action in the training trajectory at the particular time step.” (in exemplar claim 12).
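For illustration only, the limitations quoted above can be read as the following per-time-step computation. This is a minimal sketch, not the applicant's implementation: the affine maps standing in for the encoder and generative neural networks, the diagonal Gaussian posterior, the standard normal prior, the squared-error action term, the weight `beta`, and all function and parameter names are assumptions introduced here. It only evaluates an objective of the recited kind (the KL term corresponds to the posterior/prior difference recited in claim 33); the gradient-based parameter update is omitted.

```python
import math
import random


def encoder_input(states, t, k):
    """Build x_t = [s_t, s_{t+1}, ..., s_{t+k}] by concatenating observations.

    Assumes t + k < len(states); near the end of a trajectory the slice
    would simply be shorter.
    """
    x = []
    for s in states[t:t + k + 1]:
        x.extend(s)
    return x


def linear(weights, bias, x):
    """Affine map used here as a stand-in for a neural network (assumption)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def encode(params, x):
    """Encoder: parameters (mean, log-variance) of an assumed diagonal
    Gaussian posterior q_t(z_t | x_t)."""
    mu = linear(params["W_mu"], params["b_mu"], x)
    log_var = linear(params["W_lv"], params["b_lv"], x)
    return mu, log_var


def sample_latent(mu, log_var, rng):
    """Draw z_t from the posterior as mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def decode(params, z, s):
    """Generative network: maps (z_t, s_t) to action data a_t."""
    return linear(params["W_dec"], params["b_dec"], z + s)


def kl_to_standard_normal(mu, log_var):
    """KL(q || N(0, I)) for a diagonal Gaussian q (assumed prior: N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def objective(params, states, actions, t, k, rng, beta=1.0):
    """Squared error between decoded and training action, plus beta * KL."""
    x_t = encoder_input(states, t, k)
    mu, log_var = encode(params, x_t)
    z_t = sample_latent(mu, log_var, rng)
    a_t = decode(params, z_t, states[t])
    recon = sum((ah - a) ** 2 for ah, a in zip(a_t, actions[t]))
    return recon + beta * kl_to_standard_normal(mu, log_var)
```

In this reading the encoder sees the future observations through $x_t$, while the generative network sees only $s_t$ and $z_t$, so any information about upcoming states must pass through the latent variable.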
 
The closest prior art references, listed below, disclose:
Agaian et al. (US Pub. No. US 20160253466): the updating of training sample features associated with an ensemble of neural network classifiers based on distance-based learning techniques, such as support vector machines and K-nearest neighbors, that use the smallest distances between similar data samples to determine a class. Agaian does not disclose the specifically claimed process for selecting existing samples to remove from memory for updating a first set of existing samples used to retrain the original neural network.

Watter et al. (NPL: “Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images”): obtaining a training trajectory as image frames for controlling dynamic systems, using observations of actions taken from an image as $x_t$ and encoding $x_t$ to generate actions $u_t$.

Ali Ghadirzadeh et al. (NPL: “Deep Predictive Policy Training using Reinforcement Learning”): making observations as demonstrations captured using Hidden Markov Models for generating input for an encoder time sequence of actions, using time steps from one or more future times from a time t, for processing state-action transitions over time using a generative neural network model to process the data for controlling a learning agent. Ghadirzadeh does not expressly teach training using an encoder and a generative neural network using the encoded input data as required by applicant’s claims.

Billard (NPL: “Handbook of Robotics”): teaches that a large body of work uses a symbolic representation for both the learning and the encoding of skills and tasks, and that using symbolic representations of the task environment as a way of encoding skills may take several forms. One common way is to segment and encode the task according to sequences of predefined actions, described symbolically, and to encode and regenerate the sequences of these actions using Hidden Markov Models and Gaussian Mixture Models. Billard does not expressly teach training using an encoder and a generative neural network using the encoded input data as required by applicant’s claims.

Chen et al. (NPL: “Dynamic movement primitives in latent space of time-dependent variational autoencoders”): teaches capturing action and state over time steps as movement observations that can be encoded and learned using a variational autoencoder. Chen does not expressly teach the use of an input that represents actions having $x_t = [s_t, \ldots, s_{t+k}]$ as required by applicant’s claims.
In summary, the references made of record fail to disclose the claimed technical features as recited by the independent claim limitations as a whole; see also the remarks filed 10/29/2021.

Furthermore, the references of record alone or in combination fail to disclose or suggest the combination of limitations found within the independent claims as a whole without hindsight reasoning.
The dependent claims, being further limiting to the independent claims, definite, and enabled by the Specification, are also allowed.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled "Comments on Statement of Reasons for Allowance."

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571) 272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/OLUWATOSIN O ALABI/ Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/ Supervisory Patent Examiner, Art Unit 2129