Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given by Applicant’s Representative, Martin Moynihan, in a telephone communication with the Examiner on 05/19/2022.
The application has been amended as follows. Note that only claims 1, 14, and 17 have been amended.

1. (Currently Amended) A method for training a computerized mechanical device's neural network dataset, comprising: 
receiving data documenting a plurality of actions demonstrated by a demonstrating actuator performing a target task in a plurality of initial iterations; 
calculating using said data a neural network dataset having a plurality of neural network parameters and used for mimicking said demonstrated plurality of actions in performing said target task; 
gathering, in a plurality of reward training iterations of a robotic actuator performing said target task according to said neural network dataset, a plurality of scores given by an instructor to a plurality of world states, each world state comprising at least one sensor output value associated with said performance of said robotic actuator; 
calculating, using said plurality of scores, a reward neural network dataset having a second plurality of neural network parameters; 
computing, through machine learning, a reward function from said reward neural network dataset; 
receiving in each of a plurality of policy training iterations a reward value computed by applying said reward function to another world state, while said robotic actuator performs said target task according to said neural network dataset, wherein said another world state comprising at least one sensor output value; 
gathering in a plurality of safety training iterations a plurality of safety scores given by a safety instructor to a plurality of safety states, each safety state comprising at least one other sensor output value, while said robotic actuator performs said target task according to said neural network dataset; and 
calculating using said plurality of safety scores a safety neural network dataset having a third plurality of neural network parameters and used for computing a safety function;
updating at least some of said plurality of neural network parameters based on said received reward value of each of said plurality of policy training iteration; and 
outputting said updated neural network dataset; 
wherein updating said at least some of said plurality of neural network parameters further comprises: 
receiving in each of said plurality of policy training iterations a safety value computed by applying said safety function to said another world state, while said robotic actuator performs said target task according to said neural network dataset; and 
identifying at least one safe controller action subject to said safety value being less than an identified threshold safety value.

14. (Currently Amended) A system for training a computerized mechanical device's neural network dataset, comprising: 
at least one hardware processor, executing at least one neural network comprising a plurality of convolutional layers; 
at least one sensor electrically connected to an input of said at least one hardware processor; and 
at least one controller, connected to an output of said at least one hardware processor; wherein said at least one hardware processor is adapted to: 
receive data documenting a plurality of actions demonstrated by a demonstrating actuator performing a target task in a plurality of initial iterations; 
calculate using said data a neural network dataset having a plurality of neural network parameters and used for mimicking said demonstrated plurality of actions in performing said target task; 
gather, in a plurality of reward training iterations of a robotic actuator performing said target task according to said neural network dataset, a plurality of scores given by an instructor to a plurality of world states, each world state comprising at least one sensor output value received from said at least one sensor and associated with said performance of said robotic actuator; 
calculate, using said plurality of scores, a reward neural network dataset having a second plurality of neural network parameters; 
compute, through machine learning, a reward function from said reward neural network dataset; 
receive in each of a plurality of policy training iterations a reward value computed by applying said reward function to another world state, while said robotic actuator performs said target task according to said neural network dataset, wherein said another world state comprising at least one sensor output value received from said at least one sensor; 
gather in a plurality of safety training iterations a plurality of safety scores given by a safety instructor to a plurality of safety states, each safety state comprising at least one other sensor output value, while said robotic actuator performs said target task according to said neural network dataset; and 
calculate using said plurality of safety scores a safety neural network dataset having a third plurality of neural network parameters and used for computing a safety function; 
update at least some of said plurality of neural network parameters based on said received reward value of each of said plurality of policy training iteration; and 
output said updated neural network dataset; wherein said at least some of said plurality of neural network parameters are updated by: 
receiving in each of said plurality of policy training iterations a safety value computed by applying said safety function to said another world state, while said robotic actuator performs said target task according to said neural network dataset; and 
identifying at least one safe controller action subject to said safety value being less than an identified threshold safety value.

17. (Currently Amended) A method for a computerized mechanical device, comprising: accessing a neural network data set generated by: 
receiving data documenting a plurality of actions demonstrated by a demonstrating actuator performing a target task in a plurality of initial iterations; 
calculating using said data a neural network dataset having a plurality of neural network parameters and used for mimicking said demonstrated plurality of actions in performing said target task; 
gathering, in a plurality of reward training iterations of a robotic actuator performing said target task according to said neural network dataset, a plurality of scores given by an instructor to a plurality of world states, each world state comprising at least one sensor output value associated with said performance of said robotic actuator; 
calculating, using said plurality of scores, a reward neural network dataset having a second plurality of neural network parameters; 
computing, through machine learning, a reward function from said reward neural network dataset; 
receiving in each of a plurality of policy training iterations a reward value computed by applying said reward function to another world state while said robotic actuator performs said target task according to said neural network dataset, wherein said another world state comprising at least one sensor output value; 
gathering in a plurality of safety training iterations a plurality of safety scores given by a safety instructor to a plurality of safety states, each safety state comprising at least one other sensor output value, while said robotic actuator performs said target task according to said neural network dataset; and 
calculating using said plurality of safety scores a safety neural network dataset having a third plurality of neural network parameters and used for computing a safety function; 
updating at least some of said plurality of neural network parameters based on said received reward value of each of said plurality of policy training iteration; and 
outputting said updated neural network dataset; 
receiving a plurality of sensor output values; and 
instructing at least one controller to perform one or more of an identified set of controller actions according to said updated neural network dataset in response to receiving said plurality of sensor output values; 
wherein updating said at least some of said plurality of neural network parameters further comprises: 
receiving in each of said plurality of policy training iterations a safety value computed by applying said safety function to said another world state, while said robotic actuator performs said target task according to said neural network dataset; and 
identifying at least one safe controller action subject to said safety value being less than an identified threshold safety value.

Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: Claims 1-4 and 6-18 are considered allowable because, when the claims are read in light of the specification, none of the references of record, either alone or in combination, fairly discloses or suggests the combination of limitations specified in the independent claims, including at least:

From independent claims 1, 14, and 17:
gathering in a plurality of safety training iterations a plurality of safety scores given by a safety instructor to a plurality of safety states, each safety state comprising at least one other sensor output value, while said robotic actuator performs said target task according to said neural network dataset; and 
calculating using said plurality of safety scores a safety neural network dataset having a third plurality of neural network parameters and used for computing a safety function;
…
wherein updating said at least some of said plurality of neural network parameters further comprises: 
receiving in each of said plurality of policy training iterations a safety value computed by applying said safety function to said another world state, while said robotic actuator performs said target task according to said neural network dataset; and 
identifying at least one safe controller action subject to said safety value being less than an identified threshold safety value.

The closest prior art of record, Guenter et al. (Reinforcement Learning for Imitating Constrained Reaching Movements), discloses how a robot can adapt its execution of a learned task when confronted with a new situation, based on the constraints taught by the demonstrator, during the exploration process of the RL module.

Suleman et al. (Learning from demonstration in robots: Experimental comparison of neural architectures) discloses a demonstration mode based on sensor readings and actuator commands, and a learning mode via supervised ANN training as a learner controller. 

Daniel et al. (Active Reward Learning) discloses an active reward learning approach that shares the policy learning component with vanilla RL but maintains a probabilistic reward model that is updated by asking for expert ratings.

Mnih et al. (US 2015/0100530 A1) discloses training multiple neural networks using experience data (which includes reward data relating to a reward of the action in moving from a current state to a subsequent state), the experience data being used in conjunction with the first neural network to train the second neural network.

Uehara et al. (US 2013/0190964 A1) discloses identifying safe controller actions by enabling the navigation processor to determine that the current navigation task cannot be continued if the safety factor is below the predefined threshold value.

However, none of the references discloses in detail 

From independent claims 1, 14, and 17:
gathering in a plurality of safety training iterations a plurality of safety scores given by a safety instructor to a plurality of safety states, each safety state comprising at least one other sensor output value, while said robotic actuator performs said target task according to said neural network dataset; and 
calculating using said plurality of safety scores a safety neural network dataset having a third plurality of neural network parameters and used for computing a safety function;
…
wherein updating said at least some of said plurality of neural network parameters further comprises: 
receiving in each of said plurality of policy training iterations a safety value computed by applying said safety function to said another world state, while said robotic actuator performs said target task according to said neural network dataset; and 
identifying at least one safe controller action subject to said safety value being less than an identified threshold safety value.

as recited in the claims, for the purpose of training a computerized mechanical device by calculating the neural network dataset with reward iterations so that the robotic actuator may perform given tasks according to the neural network.
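
For illustration only, the two distinguishing limitations (calculating a safety function from instructor-given safety scores, and identifying safe controller actions subject to the safety value being less than a threshold) might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: a linear scorer fitted by gradient descent stands in for the safety neural network dataset, and the names fit_safety_function and identify_safe_actions, the one-dimensional states, and the numeric values are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]  # a world state as a tuple of sensor output values


def fit_safety_function(
    safety_states: Sequence[State],
    safety_scores: Sequence[float],
    learning_rate: float = 0.05,
    epochs: int = 500,
) -> Callable[[State], float]:
    """Fit a scorer to safety scores given by a safety instructor.

    Assumption: a linear model stands in for the safety neural network
    dataset; its weights and bias play the role of the parameters.
    """
    weights = [0.0] * len(safety_states[0])
    bias = 0.0
    for _ in range(epochs):
        for state, score in zip(safety_states, safety_scores):
            prediction = sum(w * x for w, x in zip(weights, state)) + bias
            error = prediction - score
            # Stochastic gradient step on the squared error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, state)]
            bias -= learning_rate * error

    def safety_function(state: State) -> float:
        return sum(w * x for w, x in zip(weights, state)) + bias

    return safety_function


def identify_safe_actions(
    state: State,
    candidate_actions: Sequence[str],
    safety_function: Callable[[State], float],
    threshold: float,
) -> List[str]:
    """Return the candidate actions only when the safety value computed
    for the current world state is less than the threshold."""
    safety_value = safety_function(state)
    if safety_value < threshold:
        return list(candidate_actions)
    return []


# Hypothetical data: one sensor value per state; higher score = less safe.
safety_fn = fit_safety_function([(0.0,), (1.0,), (2.0,)], [2.0, 1.0, 0.0])
allowed = identify_safe_actions((1.8,), ["advance", "stop"], safety_fn, 0.5)
```

In this reading, the safety value gates which controller actions are considered during each policy training iteration; how the gated actions then feed the parameter update is left out of the sketch.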

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Claims 1-4 and 6-18 are allowed.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM, whose telephone number is (571) 270-7409. The examiner can normally be reached Mon - Thu, 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.K./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129