Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Status of Claims
This action is in response to the application filed on July 1, 2020.
Claims 1-6 are currently pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on July 1, 2020 has been considered by the examiner.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 3 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.  Claim 3 recites the limitation "a mixture ratio of a probability distribution for each of the plurality of candidates to an entirety of the mixture model is derived together with the plurality of candidates" in line 3.  Claim 1, from which claim 3 depends, recites “deriving a plurality of candidates for an action sequence of the agent from variational inference using a mixture model as a variational distribution.”  The mixture model appears to be a function; it is used to perform inference in claim 1.  It is unclear how a ratio can be computed between a probability distribution for a candidate and an inference model.  Additionally, it is not clear what the entirety of the mixture model comprises.  If the entirety of the mixture model is the sum of the probability distributions, the total would be 100%, or 1, and dividing by 1 would result in a ratio that is always equal to the numerator.  
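The arithmetic behind the last observation can be illustrated with a minimal sketch.  It assumes one possible reading of claim 3, in which each component of the mixture carries a weight and the "entirety" is the sum of those weights.  The weight values and the function name mixture_ratios are hypothetical and do not come from the application.

```python
from fractions import Fraction

def mixture_ratios(weights):
    # Each mixture component integrates to 1, so the mass of the whole
    # mixture equals the sum of the component weights.
    total = sum(weights)
    # Ratio of each component's mass to the mass of the entire mixture.
    ratios = [w / total for w in weights]
    return total, ratios

# Hypothetical weights for three candidates (assumed values).
weights = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
total, ratios = mixture_ratios(weights)
print(total, ratios)
```

Under this reading the total is 1 and each ratio is identical to the corresponding weight, which is why the claimed ratio appears to add nothing beyond the numerator.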


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 2, and 4-6 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chua et al., “Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models” (Chua).

With respect to independent claim 1 Chua teaches:
A learning method for learning an action of an agent using model-based reinforcement learning (Chua teaches a model-based reinforcement learning algorithm; see abstract.  The preamble states reinforcement learning but the remainder of the claims recite only supervised learning details and the correspondence between the preamble and body of the claim is not clear.), the learning method comprising:
obtaining time series data indicating states and actions of the agent when the agent performs a series of actions (Chua teaches a trajectory sampling propagation technique that re-samples each particle according to its probabilistic prediction at each point in time and at each time step (“each point in time” and “each time step” imply a time series) computes an optimal action sequence; see figure 1 and section 5.1.  Chua also teaches probabilistic dynamics which represent the conditional distribution of the next state given the current state and action; see section 3.  The claim does not specify the agent and Chua provides potential applications of the disclosed method including conversational agents in section 1.);
establishing a dynamics model by performing supervised learning using the time series data obtained (Chua teaches modelling a dynamic function using an ensemble of bootstrapped probabilistic neural networks in section 4.  Chua further teaches a trajectory sampling propagation technique that re-samples each particle according to its probabilistic prediction at each point in time and at each time step (“each point in time” and “each time step” imply a time series) computes an optimal action sequence; see figure 1 and section 5.1.  The claim does not detail the supervised learning and time series data is considered labeled data and, therefore, learning using time series data is supervised learning.);
deriving a plurality of candidates for an action sequence of the agent from variational inference using a mixture model as a variational distribution, based on the dynamics model (Chua teaches deterministic methods including Gaussian mixture models in the third paragraph of section 5.  Chua also teaches an expressive dynamics model that can represent the parameters in a deterministic manner which makes it feasible to incorporate neural networks into the probabilistic dynamics model; see the last paragraph of section 4.); and
outputting, as the action sequence of the agent, one candidate selected from among the plurality of candidates derived (Chua teaches that at each time step the MPC algorithm computes an optimal action sequence, applies the first action in the sequence, and repeats until the task-horizon; see figure 1.  The claim does not detail the output and the action must be output in order to be applied.).
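The control loop relied upon above (compute an action sequence at each time step, apply only its first action, repeat until the task horizon) can be sketched as follows.  This is an illustrative sketch only and is not Chua's implementation: the one-dimensional dynamics function, the quadratic cost, the uniform random sampling of candidates, and all names and parameter values are assumptions chosen for brevity.

```python
import random

def dynamics(state, action):
    # Toy one-dimensional dynamics standing in for a learned model (assumption).
    return state + action

def plan(state, horizon, n_candidates, rng):
    # Sample candidate action sequences and keep the one whose predicted
    # trajectory has the lowest summed squared distance to the goal 0.0.
    best_seq, best_cost = None, float("inf")
    for _ in range(n_candidates):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = state, 0.0
        for a in seq:
            s = dynamics(s, a)
            cost += s * s
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq

def run_mpc(initial_state, task_horizon, seed=0):
    rng = random.Random(seed)
    state, history = initial_state, []
    for _ in range(task_horizon):
        seq = plan(state, horizon=5, n_candidates=200, rng=rng)
        action = seq[0]                  # only the first action is applied
        history.append((state, action))  # time series of states and actions
        state = dynamics(state, action)
    return state, history

final_state, history = run_mpc(initial_state=3.0, task_horizon=20)
```

The history list is one example of time series data indicating states and actions; the first action of the selected sequence must be output from the planner before it can be applied.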

With respect to dependent claim 2 the rejection of claim 1 is incorporated.  Further Chua teaches:
obtaining, as the time series data, new time series data indicating states and actions of the agent when the agent performs a series of actions in accordance with the action sequence outputted (Chua teaches considering the current state s_t and current input a_t, and the next state s_{t+1}; see the first paragraph of section 3.  New data is not detailed and the next state taught by Chua is considered to be new data.).

With respect to dependent claim 4 the rejection of claim 1 is incorporated.  Further Chua teaches:
wherein the mixture model is a Gaussian mixture distribution (Chua teaches Gaussian processes in at least sections 1, 4, and 5.).

With respect to dependent claim 5 the rejection of claim 1 is incorporated.  Further Chua teaches:
wherein the dynamics model is an ensemble of a plurality of neural networks (Chua teaches an ensemble of bootstrapped neural networks in the first paragraph of section 4.).
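A bootstrapped ensemble of the general kind referred to above can be sketched as follows.  The sketch is hedged: it substitutes a least-squares line for each neural network so that it stays self-contained, and the function names, the number of models, and the data are hypothetical rather than taken from Chua or the application.

```python
import random

def fit_line(points):
    # Ordinary least squares for y = a*x + b (stand-in for one network).
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

def bootstrap_ensemble(data, n_models, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Each member is trained on a resample drawn with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(fit_line(sample))
    return models

def predict(models, x):
    # Mean prediction and disagreement (variance) across ensemble members.
    preds = [a * x + b for a, b in models]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, var

# Hypothetical noiseless data from y = 2x + 1.
data = [(i / 10, 2 * (i / 10) + 1) for i in range(20)]
models = bootstrap_ensemble(data, n_models=5)
mean, var = predict(models, 3.0)
```

With noisy training data the members would disagree, and the variance returned by predict would give a rough indication of model uncertainty.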

With respect to dependent claim 6 the rejection of claim 1 is incorporated.  Further Chua teaches:
A non-transitory computer-readable recording medium for use in a computer, the recording medium having a computer program recorded thereon for causing the computer to execute the learning method according to claim 1.  (Chua teaches a computer implemented system of deep reinforcement learning comprising ensembles of artificial neural networks.  Such a system operates in a computing environment requiring memory.)


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under pre-AIA  35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Chua et al., “Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models” (Chua), in view of Stern, “Probability Models on Rankings and the Electoral Process” (Stern).

With respect to claim 3 the rejection of claim 1 is incorporated.  Further Chua does not explicitly disclose:
wherein in the deriving of the plurality of candidates,
a mixture ratio of a probability distribution for each of the plurality of candidates to an entirety of the mixture model is derived together with the plurality of candidates, and
in the selecting of the one candidate, 
a candidate corresponding to the probability distribution in which the mixture ratio is maximum among the plurality of candidates derived is selected as the one candidate.

However, Stern teaches these features:
wherein in the deriving of the plurality of candidates,
a mixture ratio of a probability distribution for each of the plurality of candidates to an entirety of the mixture model is derived together with the plurality of candidates (Stern teaches using the Bradley-Terry-Luce (BTL) model, which computes a ratio of candidates over candidates remaining; see Stern, beginning at the last paragraph of page 177.), and
in the selecting of the one candidate, 
a candidate corresponding to the probability distribution in which the mixture ratio is maximum among the plurality of candidates derived is selected as the one candidate (Stern teaches ranking candidates by probability on the top of page 178.).
However, Horn teaches this limitation:
Horn teaches hierarchical clustering methods that include iteratively adjusting the number of clusters by merging small clusters or splitting large clusters of data points; see [0011].
Chua and Stern are analogous art directed towards modelling.  Chua teaches a reinforcement learning system that implements model choice using a probabilistic ensemble, and Stern teaches various models selected by voting and probabilistic determinations.
It would have been obvious for one of ordinary skill in the art to incorporate Stern’s voting into the system described by Chua before the effective filing date of the claimed invention.  It would have been obvious because one of ordinary skill would be motivated to use maximum likelihood estimates for analyzing the effectiveness as disclosed on page 180 of Stern.
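The "ratio of candidates over candidates remaining" attributed to the BTL model can be sketched as follows.  The sketch assumes the standard Luce-type form, in which each next-ranked candidate is chosen with probability equal to its weight divided by the total weight of the candidates still remaining.  The weights and function names are hypothetical and are not taken from Stern.

```python
from fractions import Fraction

def ranking_probability(weights, ranking):
    # Luce-type model: at each stage the next-ranked candidate is chosen
    # with probability weight / (sum of weights of candidates remaining).
    remaining = sum(weights[c] for c in ranking)
    prob = Fraction(1)
    for candidate in ranking:
        prob *= Fraction(weights[candidate]) / remaining
        remaining -= weights[candidate]
    return prob

def most_probable_first(weights):
    # The candidate with the largest weight has the largest
    # first-place probability.
    return max(weights, key=lambda c: weights[c])

# Hypothetical candidate weights (assumed values).
weights = {"A": 3, "B": 2, "C": 1}
p_abc = ranking_probability(weights, ["A", "B", "C"])
p_cba = ranking_probability(weights, ["C", "B", "A"])
top = most_probable_first(weights)
```

Here the ranking A, B, C has probability 3/6 × 2/3 × 1/1 = 1/3, and selecting the candidate with the maximum ratio corresponds to selecting candidate A.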

Prior Art of Record
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Haarnoja et al., “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor” – teaches a deep reinforcement learning algorithm based on a maximum entropy framework.
Aoki, U.S. Patent Application Publication 2013/0339278 – teaches a data discrimination device for estimating the structure of learning data.



Conclusion
Claims 1-6 are rejected.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL T PELLETT whose telephone number is (571)270-7156.  The examiner can normally be reached on Monday - Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at 571-272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/DANIEL T PELLETT/Primary Examiner, Art Unit 2121