Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is responsive to the application filed on 11/16/2017.
The First-Action Interview (FAI) Pilot Program telephone interview was conducted on 03/01/2021.
Claims 1-20 are pending.
Claims 1-20 are rejected.

Claim Objections
Claim 5 is objected to because of the following informalities:
Claim 5 contains a typographical error: it depends on itself, reciting “The method of claim 5”. One way to overcome this objection is to change the dependency to claim 4 (as reflected in analogous claim 13).
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


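The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
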
Claims 2, 10, and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claims 2, 10, and 18 recite the limitation “wherein applying the one or more properties to the empirical data comprises: determining a first state and a corresponding first operation from the empirical data; and deriving a second state and a corresponding second operation by calculating the one or more properties for the first state and the first operation”. Under its broadest interpretation, it is unclear to the examiner how the one or more properties are applied to calculate themselves. Claim 1 recites that the “one or more properties” have already been determined for application “to the empirical data” (see limitation “determining…one or more properties of the device that can be applied to empirical data of the device”). Specification paragraph 0007 merely repeats the claim language and therefore does not clarify the limitation; it remains unclear how the “properties” can be calculated if they have already been determined.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
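(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.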

Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Noda et al. (US Pub 20100318479), hereinafter Noda.
Regarding claims 1, 9, and 17, Noda teaches a computer-implemented method, non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer, and computer system; comprising: a storage device; a processor; a non-transitory computer-readable storage medium storing instructions, which when executed by the processor causes the processor to perform a method for facilitating comprehensive control data for a device (paragraphs 1002-1014 teach a program (instructions) stored on a “recording medium” (CRM/storage device) that is executed by a CPU (processor) for performing the embodiments of the disclosure), the method comprising: 
determining, by a computer, one or more properties of the device that can be applied to empirical data of the device, wherein the empirical data is obtained based on experiments performed on the device (paragraph 0008 teaches a “computer” executing a program to, as taught in paragraphs 0011, 0073, have an agent (device) “observing externally observable information (empirical data)” when (based on) the agent performs “an action” (experiments performed on the device), in order to determine “an action plan as an action series (properties of the device)” to reach a “goal state” by calculating “an action to be performed next by the agent (properties of the device) according to the action plan” based on observation data (applied to empirical data). Further, paragraph 0072 teaches an “agent is a device for example as a robot”);
applying the one or more properties to the empirical data to obtain derived data (paragraphs 0011, 0073, and 0124-0125 teach “learning the state transition probability model (obtain derived data) defined by state transition probability for each action of a state making a state transition (derived data) due to an action performed by the agent and observation probability of a predetermined observed value being observed from the state (applying the one or more properties to the empirical data)”; in other words, determining all state transition probabilities for each agent action based on the observed information from the state (applying the one or more properties to the empirical data to obtain derived data)); 
learning an efficient policy for the device based on both empirical and derived data, wherein the efficient policy indicates one or more operations of the device that can reach a target state from an initial state of the device (paragraphs 0011, 0073 teach an agent (device) “observing externally observable information (empirical data)” when the agent performs “an action”, in order to determine (learning) “an action plan (efficient policy for the device) as an action series” to reach a “goal state (target state)” by calculating “an action to be performed next by the agent (one or more operations of the device) according to (indicated by) the action plan (efficient policy)” to “maximize likelihood of state transition from the present state to the goal state (based on derived data)” and based on observation data (empirical data) of a “present state (from an initial state of the device)”. Further, paragraph 0072 teaches an “agent is a device for example as a robot”); and
determining an operation for the device based on the efficient policy (paragraphs 0011, 0073 teach determining “an action plan (efficient policy) as an action series” to reach a “goal state” by calculating (determining) “an action to be performed next by the agent (operation for the device) according to (based on) the action plan (efficient policy)”).
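
For illustration only, the kind of state transition probability model and next-action calculation discussed in the mapping above can be sketched as follows. This is a simplified sketch under stated assumptions and is neither Noda's implementation nor the claimed method: the function names, the state and action labels, and the observation triples are all hypothetical, the probabilities are plain frequency estimates, and only a single step toward the goal state is considered rather than a full action series.

```python
from collections import Counter, defaultdict


def learn_transition_model(observations):
    """Estimate P(next_state | state, action) by counting observed triples.

    `observations` is an iterable of (state, action, next_state) tuples.
    All names here are hypothetical and used for illustration only.
    """
    counts = defaultdict(Counter)
    for state, action, next_state in observations:
        counts[(state, action)][next_state] += 1

    model = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        model[key] = {s: n / total for s, n in next_counts.items()}
    return model


def next_action(model, present_state, goal_state):
    """Return the action with the highest estimated one-step probability
    of moving from present_state to goal_state, with that probability."""
    best_action, best_prob = None, 0.0
    for (state, action), next_probs in model.items():
        if state != present_state:
            continue
        prob = next_probs.get(goal_state, 0.0)
        if prob > best_prob:
            best_action, best_prob = action, prob
    return best_action, best_prob


# Hypothetical observations gathered while an agent performs actions.
observations = [
    ("S1", "left", "S2"),
    ("S1", "left", "S2"),
    ("S1", "left", "S1"),
    ("S1", "right", "S3"),
    ("S1", "right", "S1"),
]

model = learn_transition_model(observations)
action, prob = next_action(model, "S1", "S2")
print(action, round(prob, 3))
```

In this sketch the observed triples play the role of the observation data, the estimated probabilities play the role of the learned model, and the selected action is the one most likely to move the agent from the present state toward the goal state.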

Regarding claims 2, 10, and 18, Noda teaches all the claim limitations of claims 1, 9, and 17 above, and further teaches wherein applying the one or more properties to the empirical data comprises: 
determining a first state and a corresponding first operation from the empirical data (paragraphs 0011, 0073 teach calculating (determining) “an action to be performed next by the agent (operation for the device) according to (based on) the action plan” and based on observation data (from the empirical data) of a “present state (derived first state)”, where paragraphs 0308-0309 and Figs. 10B-10C teach an “action determining section 24 sets an action of moving from the first state S.sub.28 to the next state S.sub.23 of the action plan PL1 as a determined action (corresponding first operation).  The agent performs the determined action.”); and 
deriving a second state and a corresponding second operation by calculating the one or more properties for the first state and the first operation (paragraphs 0011, 0073, and 0124-0125 teach “learning the state transition probability model defined by state transition probability for each action of a state making a state transition due to an action performed by the agent and observation probability of a predetermined observed value being observed from the state”).

Regarding claims 3, 11, and 19, Noda teaches all the claim limitations of claims 1, 9, and 17 above, and further teaches wherein learning the efficient policy for the device comprises: 
determining a first state transition in the derived data that maximizes a corresponding first reward function indicating a benefit of the first state transition for the device (paragraphs 0011, 0073, 0100, and 0124-0129 teach “learning the state transition probability model (derived data) defined by state transition probability for each action of a state making a state transition (first state transition in the derived data) due to an action performed by the agent and observation probability of a predetermined observed value being observed from the state”), wherein the first state transition is determined based on a second state transition in the empirical data that maximizes a corresponding second reward function (paragraphs 0011, 0073, 0100, 0124-0129, 0308-0309 and Figs. 10B-10C teach using a “state transition probability model”, that can be a “Hidden Markov Model” utilizing “reinforcement learning”, for “maximiz[ing the] likelihood of state transition from the present state to the goal state” through actions of an “action plan (efficient policy) as an action series (based on a second state transition… that maximizes a corresponding second reward function)” from observed information (in the empirical data). This is further shown in paragraphs 0225 and 0253 as calculating the “optimum path” including a “state series (first and second transitions)” of reaching a goal state from all calculated paths from the observed information (in the empirical data). Paragraphs 0100, 0308-0309, and 0926-0927 further teach an agent iteratively determining an appropriate action for the current state transition when moving forward in the state series utilizing “a reward function (maximizes a corresponding second reward function) for calculating a reward for the agent” through “reinforcement learning”.).

Regarding claims 4 and 12, Noda teaches all the claim limitations of claims 3 and 11 above, and further teaches wherein learning the efficient policy for the device further comprises updating a learning function for the first and second state transitions (paragraphs 0011, 0073, 0100, and 0124-0129 teach “learning the state transition probability model (updating a learning function) defined by state transition probability for each action of a state making a state transition (first state transition) due to an action performed by the agent and observation probability of a predetermined observed value being observed from the state” and further all states of an action plan/series (first and second state transitions of an efficient policy), where the model can be a “Hidden Markov Model” utilizing “reinforcement learning”, for “maximiz[ing the] likelihood of state transition from the present state to the goal state” for an action plan (efficient policy)).

Regarding claims 5 and 13, Noda teaches all the claim limitations of claims 4 and 12 above, and further teaches wherein updating the learning function for the first state transition comprises computing the learning function based on a relationship between the first and second reward functions (paragraphs 0011, 0073, 0100, 0124-0129, 0308-0309 and Figs. 10B-10C teach “learning the state transition probability model (updating the learning function)”, that can be a “Hidden Markov Model” utilizing “reinforcement learning”, for “maximiz[ing the] likelihood of state transition from the present state to the goal state (based on a relationship between the first and second reward functions)” through actions of an “action plan as an action series”).

Regarding claims 6 and 14, Noda teaches all the claim limitations of claims 1 and 9 above, and further teaches wherein the one or more properties include a symmetry of operations of the device (Examiner note: Applicant’s specification, paragraph 0047 states “the symmetry property…represents a feasible trajectory for device 130”.
Noda, paragraphs 0011, 0073, 0225, 0253, and 0308-0309 teach determining “an action plan as an action series (properties include a symmetry of operations of the device)”, where the “optimum path (symmetry of operations)…which is a state series (symmetry of operations) providing the optimum state probability” to reach a “goal state” from all calculated paths (symmetry of operations); and further calculating “an action to be performed next by the agent (properties of the device) according to the action plan (symmetry of operations)” based on observation data.).

Regarding claims 7, 15, and 20, Noda teaches all the claim limitations of claims 1, 9, and 17 above, and further teaches wherein determining the operation for the device further comprises: 
determining a current environment for the device (paragraphs 0305-0309 and Figs. 10B-10C teach an “action determining section 24 sets an action of moving from the first state S.sub.28 [or “present state”] to the next state S.sub.23 of the action plan PL1 as a determined action” from observation information within an “action environment (determining a current environment)”, and “[t]he agent (device) performs the determined action”); 
identifying a state representing the current environment (paragraphs 0305-0309 and Figs. 10B-10C teach an “action determining section 24 sets an action of moving from the first state S.sub.28 [or “present state”] (identifying a state) to the next state S.sub.23 of the action plan PL1 as a determined action” from observation information within an “action environment (representing the current environment)”, and “[t]he agent performs the determined action”); and 
determining the operation corresponding to the state based on the efficient policy (paragraphs 0011, 0073 teach calculating (determining) “an action to be performed next by the agent (operation) according to (based on) the action plan (efficient policy)” and based on observation data of a “present state (corresponding to the state)”, where paragraphs 0308-0309 and Figs. 10B-10C teach an “action determining section 24 sets an action of moving from the first state S.sub.28 (corresponding to the state) to the next state S.sub.23 of the action plan PL1 (efficient policy) as a determined action”).

Regarding claims 8 and 16, Noda teaches all the claim limitations of claims 1 and 9 above, and further teaches obtaining a set of trajectories for the device, wherein a respective trajectory indicates a sequence of state transitions for the device (paragraphs 0225, 0253, and 0308-0309 teach determining the “optimum path (trajectory)…which is (indicates) a state series (sequence of state transitions) providing the optimum state probability” of reaching a goal state from all calculated paths (obtained set of trajectories) for the agent (for the device)); and 
determining the efficient policy based on the entire set of trajectories (paragraphs 0011, 0073, 0225, 0253, and 0308-0309 teach determining the “optimum path (efficient policy)…which is a state series providing the optimum state probability” of reaching a goal state from all calculated paths (entire set of trajectories) for the agent).
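
For illustration only, selecting an “optimum path” from a set of calculated paths, as discussed in the mapping above, can be sketched as picking the state series with the highest product of transition probabilities. This is a minimal sketch under stated assumptions and is not Noda's algorithm: the state labels, the probability values, and the function names are hypothetical, and the candidate set is scored exhaustively rather than with any dynamic-programming method.

```python
import math


def path_probability(path, transition_prob):
    """Product of one-step transition probabilities along a state series.

    Transitions missing from `transition_prob` are treated as probability 0.
    """
    return math.prod(
        transition_prob.get((src, dst), 0.0) for src, dst in zip(path, path[1:])
    )


def optimum_path(paths, transition_prob):
    """Return the candidate path with the highest overall probability."""
    return max(paths, key=lambda p: path_probability(p, transition_prob))


# Hypothetical transition probabilities between hypothetical states.
transition_prob = {
    ("A", "B"): 0.9,
    ("B", "G"): 0.8,
    ("A", "C"): 0.5,
    ("C", "G"): 0.9,
}

# Hypothetical set of candidate trajectories from state "A" to goal "G".
paths = [["A", "B", "G"], ["A", "C", "G"]]

best = optimum_path(paths, transition_prob)
print(best, round(path_probability(best, transition_prob), 2))
```

Here every candidate trajectory in the set is scored before one is selected, which mirrors the idea of choosing a path based on the entire set of calculated paths.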

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Mullan et al (US Pub 20160067864) teaches utilizing “reinforcement learning” for determining a set of device actions “or planning policies” based on current state information for controlling a device. 
Koga (US Pub 20170090459) teaches utilizing “reinforcement learning algorithms” for controlling robot movements based on state data and observation data.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/C.M./Examiner, Art Unit 2123


/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123