Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 06/12/2020 and 05/26/2021 are considered by the examiner.
Drawings
The drawings submitted on 06/12/2020 are considered by the examiner.
Allowable Subject Matter
Claims 1-19 are allowed.
The following is an examiner’s statement of reasons for allowance: The prior art of record, Keselman et al. (US 2020/0097015 A1), teaches:
[0007] Embodiments of the present invention may include a method of producing a motion planning policy for an Autonomous Driving Machine (ADM) by at least one processor. The method may include:
[0008] a) creating a first node, such as a root node of a search tree, that may include a temporal data set, that may correspond to at least a condition (e.g., speed, location, orientation and the like) of the ADM;
[0009] b) selecting, by a neural network (NN), a quality factor associated with the first node (e.g., root node) and with an action from a set of actions;
[0010] c) producing, by a simulator at least one second node and a respective reward factor, where the second node may correspond with a predicted condition of the ADM following application of the selected action, and where the at least one second node may be associated with the first node by the selected action and by the reward factor;
[0011] d) repeating steps b and c, to expand the search tree until a predefined termination condition is met;
[0012] e) updating at least one quality factor by computing optimal sums of rewards along one or more trajectories in the expanded search tree; and
[0013] f) training the NN to select at least one action according to at least one of: the temporal data set and the at least one updated quality factor.
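For illustration only, steps a) through e) of the passage quoted from Keselman et al. may be sketched as follows. The sketch is not taken from the reference: the one-dimensional state, the two-action set, the goal position, the depth limit used as the termination condition, and all names are assumptions. The neural-network selection of step b) is replaced here by exhaustive expansion of every action, and the training of step f) is omitted.

```python
from dataclasses import dataclass, field

ACTIONS = (-1, +1)  # hypothetical action set: move left or move right


@dataclass
class Node:
    state: int                                    # stand-in for the temporal data set
    children: dict = field(default_factory=dict)  # action -> (reward factor, child Node)
    q: dict = field(default_factory=dict)         # action -> quality factor


def simulate(state, action, goal=3):
    """Toy simulator: return the predicted next state and a reward factor."""
    next_state = state + action
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward


def expand(node, depth):
    """Steps b) to d): grow the tree until the depth limit is reached.

    Assumption: every action is expanded, instead of an NN selecting one.
    """
    if depth == 0:
        return
    for action in ACTIONS:
        next_state, reward = simulate(node.state, action)
        child = Node(next_state)
        node.children[action] = (reward, child)
        expand(child, depth - 1)


def backup(node):
    """Step e): set each quality factor to the best sum of rewards below it."""
    for action, (reward, child) in node.children.items():
        backup(child)
        best_below = max(child.q.values(), default=0.0)
        node.q[action] = reward + best_below


root = Node(state=0)      # step a): root node
expand(root, depth=3)
backup(root)
best_action = max(root.q, key=root.q.get)
```

In this toy setting only the trajectory that moves right three times reaches the goal, so the backed-up quality factor at the root favors the rightward action.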
The prior art of record, alone or in combination, fails to teach, for claims 1, 10 and 19, “for each selected feature, learning an option policy from second transition tuples collected in the environment that maximizes a cumulative feature reward of the selected feature and storing the learned option policy for the selected feature in an augmented action space, wherein each second transition tuple includes state, action, feature reward and next state; and learning a second policy to maximize a second cumulative reward for a second task, the second policy learned by choosing one of the learned option policies in the augmented action space and using a reinforcement learning algorithm and third transition tuples collected in an environment, wherein each third transition tuple includes state, the chosen option policy, reward of the chosen learned option policy after taking the action generated by the chosen learned option policy, and next state.”
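For illustration only, the two tuple layouts and the two learning stages recited in the quoted limitation may be sketched as follows. The sketch does not represent applicant's implementation: the five-cell corridor environment, the two features, the task reward, the use of tabular Q-learning as the reinforcement learning algorithm, the hyperparameters, and all names are assumptions. Tuples are collected by sweeping every state rather than by running an exploring agent.

```python
from collections import namedtuple

# Tuple layouts named in the quoted claim language.
SecondTuple = namedtuple("SecondTuple", "state action feature_reward next_state")
ThirdTuple = namedtuple("ThirdTuple", "state option reward next_state")

STATES = range(5)          # hypothetical five-cell corridor
ACTIONS = (-1, +1)         # primitive actions: step left, step right
GAMMA, ALPHA = 0.9, 0.5    # illustrative discount factor and learning rate

FEATURES = {               # hypothetical selected features
    "at_left": lambda s: s == 0,
    "at_right": lambda s: s == 4,
}


def step(state, action):
    """Toy environment dynamics: move one cell and clip to the corridor."""
    return min(max(state + action, 0), 4)


def q_learn(tuples, choices, sweeps=500):
    """Tabular Q-learning over (state, choice, reward, next_state) tuples."""
    q = {(s, c): 0.0 for s in STATES for c in choices}
    for _ in range(sweeps):
        for s, c, r, s2 in tuples:
            target = r + GAMMA * max(q[(s2, c2)] for c2 in choices)
            q[(s, c)] += ALPHA * (target - q[(s, c)])
    return q


def greedy(q, choices):
    """Map each state to the choice with the highest learned value."""
    return {s: max(choices, key=lambda c: q[(s, c)]) for s in STATES}


# Stage 1: learn one option policy per feature from second transition tuples
# and store it in the augmented action space.
augmented_action_space = {}
for name, feature in FEATURES.items():
    second = [SecondTuple(s, a, float(feature(step(s, a))), step(s, a))
              for s in STATES for a in ACTIONS]
    augmented_action_space[name] = greedy(q_learn(second, ACTIONS), ACTIONS)


def task_reward(next_state):
    """Hypothetical second task: reward for arriving at cell 4."""
    return 1.0 if next_state == 4 else 0.0


# Stage 2: learn the second policy over options from third transition tuples.
options = tuple(augmented_action_space)
third = []
for s in STATES:
    for o in options:
        a = augmented_action_space[o][s]   # action generated by the chosen option
        s2 = step(s, a)
        third.append(ThirdTuple(s, o, task_reward(s2), s2))
second_policy = greedy(q_learn(third, options), options)
```

In this toy setting the second policy selects among stored option policies rather than primitive actions, which is the distinction the quoted limitation draws between the second and third transition tuples.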
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record, Stelian et al. (“Robust Task-based Control Policies for Physics-based Characters,” published December 2009), teaches: Abstract: As input, the method assumes an abstract action vocabulary consisting of balance-aware, step-based controllers. A novel constrained state exploration phase is first used to define a character dynamics model as well as a finite volume of character states over which the control policy will be defined. An optimized control policy is then computed using reinforcement learning. The final policy spans the cross-product of the character state and task state, and is more robust than the controllers it is constructed from. We demonstrate real-time results for six locomotion-based tasks and on three highly-varied bipedal characters. We further provide a game-scenario demonstration.
Introduction: The task goal is specified using a reward or cost function, such as the distance to a goal point. The optimized control policy is then computed using fitted value iteration, which is a model-based reinforcement learning algorithm. Importantly, the control policy is defined over the cross product of the character state and the task state. These are both continuously-valued in our case and results in a high dimensional domain for the control policy. We show the feasibility of modeling control policies and value functions in such a case. The control policies also naturally integrate the need to maintain balance with the task objectives. For example, if a character is in a falling state and only one action can be taken to avoid a fall, this is captured by a control policy which is then invariant with respect to the task state for that particular character state.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/
Primary Examiner, Art Unit 2656