DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.


Drawings
The applicant’s submitted drawings appear to be acceptable for examination purposes.


Information Disclosure Statement
As required by M.P.E.P. 609(c), the applicant's submission of the Information Disclosure Statement, dated 1 March 2018, is acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending. As required by M.P.E.P. 609 C(2), a copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant Office action.


Claim Rejections - 35 USC § 101
Examiner’s Note: the examiner is interpreting the computer readable storage medium of claims 17-20 to explicitly exclude transitory signals, as described in para. 0148 of the specification as filed.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3-8, 15, 16, 19, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 3 recites the limitation "the actual state defining the similarity" in line 2.  There is insufficient antecedent basis for this limitation in the claim.


Claim 5 recites the limitation "the actual state defining the similarity" in line 3.  There is insufficient antecedent basis for this limitation in the claim.
Claims 6-8 depend upon claim 5, and thus include the aforementioned limitation(s).

Claim 15 recites the limitation "the actual state defining the similarity" in line 2.  There is insufficient antecedent basis for this limitation in the claim.

Claim 16 recites the limitation "the actual state defining the similarity" in line 3.  There is insufficient antecedent basis for this limitation in the claim.

Claim 19 recites the limitation "the actual state defining the similarity" in line 2.  There is insufficient antecedent basis for this limitation in the claim.

Claim 20 recites the limitation "the actual state defining the similarity" in line 3.  There is insufficient antecedent basis for this limitation in the claim.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3 and 5-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Li et al. (Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs, March 2017, pgs. 1-11).

As per claim 1, Li teaches a computer-implemented method for estimating a reward in reinforcement learning, the method comprising: preparing a state prediction model trained to predict a state for an input using visited states in expert demonstrations performed by an expert [an apprenticeship learning system that learns a policy by estimating a reward based upon observations of an expert (pgs. 2-3 and 5, abstract and sections 2.1-2.3, 4.1; etc.)]; inputting an actual state observed by an agent in reinforcement learning into the state prediction model to calculate a predicted state [the GAN is used to produce a trajectory based upon the input state to attempt to imitate the expert trajectory (pg. 3, sections 2.2-2.3, etc.)]; and estimating a reward in the reinforcement learning based, at least in part, on similarity between the predicted state and an actual state observed by the agent [the GAN is used to produce a trajectory based upon the input state to attempt to imitate the expert trajectory by determining the distance between the trajectory (similarity) (pg. 3, sections 2.2-2.3; pg. 5, sections 4.1-4.2; etc.) where for the policy network, input visual features are passed through two convolutional layers, and then combined with the auxiliary information vector and (in the case of Info-GAIL) the latent code to produce the expected accumulated future reward (pg. 7, section 5.2, etc.)].
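For illustration only, and not as a characterization of either the applicant's implementation or the system of Li, the three steps recited in claim 1 can be sketched as follows. The nearest-neighbor "state prediction model", the exponential similarity measure, and all function names are assumptions chosen for brevity; neither the claims nor the reference specify them.

```python
import math

def nearest_expert_state(state, expert_states):
    """Toy state prediction model (assumed): return the expert-visited state closest to the input."""
    return min(expert_states, key=lambda s: math.dist(s, state))

def estimate_reward(actual_state, expert_states):
    """Estimate a reward from the similarity between the predicted and the actual state."""
    predicted = nearest_expert_state(actual_state, expert_states)
    # Assumed similarity measure: 1.0 when the states coincide, decaying with distance.
    return math.exp(-math.dist(predicted, actual_state))

# States visited in a hypothetical expert demonstration (no actions recorded).
expert_states = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

on_path = estimate_reward((1.0, 0.0), expert_states)   # agent state matches an expert state
off_path = estimate_reward((1.0, 3.0), expert_states)  # agent state far from every expert state
```

In this sketch a state the expert actually visited receives the maximum reward, and the reward falls off as the agent's observed state departs from the states seen in the demonstration.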

As per claim 2, Li teaches training the state prediction model using the visited states in the expert demonstrations without actions executed by the expert in relation to the visited states [the network may be trained with visual inputs (images) of the expert demonstration (i.e., only state information and not the action) (pg. 4, section 3.2, etc.)].

As per claim 3, Li teaches wherein the state prediction model is a generative model, and both of the actual state defining the similarity and the actual state inputted into the generative model are observed at the same time step [the system utilizes a generative adversarial imitation learning model (pg. 2, abstract, etc.)], the method further comprising: training the generative model so as to minimize an error between a visited state in the expert demonstrations and a reconstructed state from the visited state [the GAN is used to produce a trajectory based upon the input state to attempt to imitate the expert trajectory by minimizing the distance between the trajectories (pg. 3, sections 2.2-2.3; pg. 5, sections 4.1-4.2; etc.)].

As per claim 5, Li teaches wherein the state prediction model is a temporal sequence prediction model, and the actual state inputted into the temporal sequence prediction model precedes the actual state defining the similarity, the method further comprising: training the temporal sequence prediction model so as to minimize an error between a visited state in the expert demonstrations and an inferred state from one or more preceding visited states in the expert demonstrations [the GAN is used to produce a trajectory based upon the input state to attempt to imitate the expert trajectory by determining the distance between the trajectory (similarity) (pg. 3, sections 2.2-2.3; pg. 5, sections 4.1-4.2; etc.) where for the policy network, input visual features are passed through multiple convolutional layers (a temporal sequence prediction model), and then combined with the auxiliary information vector and (in the case of Info-GAIL) the latent code to produce the expected accumulated future reward (pg. 7, section 5.2, etc.)].

As per claim 6, Li teaches wherein the temporal sequence prediction model is a next state model that infers a next state as the predicted state from an actual current state, the similarity being defined between the next state inferred by the next state model and an actual next state [the GAN is used to produce a trajectory based upon the input state to attempt to imitate the expert trajectory by determining the distance between the trajectory (similarity) (pg. 3, sections 2.2-2.3; pg. 5, sections 4.1-4.2; etc.) where for the policy network, input visual features are passed through multiple convolutional layers (a temporal sequence prediction model), and then combined with the auxiliary information vector and (in the case of Info-GAIL) the latent code to produce the expected accumulated future reward (pg. 7, section 5.2, etc.)].
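For illustration only, the next state model recited in claim 6 can be sketched with a one-dimensional linear predictor fitted to consecutive expert states. The linear form, the least-squares fit, the negative absolute error used as the similarity, and all names below are assumptions; they are not taken from the claims or from Li.

```python
def fit_next_state_model(trajectory):
    """Least-squares fit of s_next = a * s + b from consecutive expert states (no actions used)."""
    xs, ys = trajectory[:-1], trajectory[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

def next_state_reward(model, current_state, actual_next_state):
    """Reward is the negative error between the inferred next state and the actual next state."""
    a, b = model
    predicted_next = a * current_state + b
    return -abs(predicted_next - actual_next_state)

# Hypothetical expert demonstration: the state advances by one unit per time step.
expert_trajectory = [0.0, 1.0, 2.0, 3.0, 4.0]
model = fit_next_state_model(expert_trajectory)

imitating = next_state_reward(model, 2.0, 3.0)  # agent moves as the expert would
stalling = next_state_reward(model, 2.0, 2.0)   # agent stays in place
```

Here the input to the model (the current state) precedes the state against which the prediction is compared (the next state), which is the temporal relationship the claim language describes.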

As per claim 7, Li teaches wherein the temporal sequence prediction model is a long short term memory (LSTM) based model that infers a next state as the predicted state from an actual state history or an actual current state, the similarity being defined between the next state inferred by the LSTM based model and an actual next state [the model includes certain auxiliary information as internal input to serve as a short-term memory (pg. 6, section 5.2, etc.) where the GAN is used to produce a trajectory based upon the input state to attempt to imitate the expert trajectory by determining the distance between the trajectory (similarity) (pg. 3, sections 2.2-2.3; pg. 5, sections 4.1-4.2; etc.)].

As per claim 8, Li teaches wherein the temporal sequence prediction model is a 3-dimensional convolutional neural network (3D-CNN) model that infers a next state as the predicted state from an actual state history or an actual current state, the similarity being defined between the next state inferred by the 3D-CNN based model and an actual next state [the GAN is used to produce a trajectory based upon the input state to attempt to imitate the expert trajectory by determining the distance between the trajectory (similarity) (pg. 3, sections 2.2-2.3; pg. 5, sections 4.1-4.2; etc.) where for the policy network, input visual features are passed through multiple convolutional layers (a temporal sequence prediction model), and then combined with the auxiliary information vector and (in the case of Info-GAIL) the latent code to produce the expected accumulated future reward; utilizing three dimensional image information for the CNN (pg. 7, section 5.2, etc.)].

As per claim 9, Li teaches wherein the expert demonstration represents optimal behavior and the reward is estimated as a higher value as the similarity becomes high [the GAN is used to produce a trajectory based upon the input state to attempt to imitate the expert trajectory by determining the distance between the trajectory (similarity) (pg. 3, sections 2.2-2.3; pg. 5, sections 4.1-4.2; etc.) where for the policy network, input visual features are passed through multiple convolutional layers (a temporal sequence prediction model), and then combined with the auxiliary information vector and (in the case of Info-GAIL) the latent code to produce the expected accumulated future reward (pg. 7, section 5.2, etc.)].

As per claim 10, Li teaches wherein the reward is based further on a cost for an action executed by the agent in the reinforcement learning in addition to the similarity [an additional penalty (cost) term may be added to the agent in the reinforcement learning (pg. 5, section 4.1, etc.)].
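For illustration only, a reward that combines the similarity with a cost for the executed action, as recited in claim 10, might take the following form. The squared-magnitude action cost, the weighting factor, and the function name are assumptions and are not drawn from the claims or from Li.

```python
import math

def reward_with_action_cost(predicted_state, actual_state, action, cost_weight=0.1):
    """Similarity-based reward reduced by an assumed quadratic cost on the executed action."""
    similarity = math.exp(-math.dist(predicted_state, actual_state))
    action_cost = sum(a * a for a in action)
    return similarity - cost_weight * action_cost

no_effort = reward_with_action_cost((0.0, 0.0), (0.0, 0.0), (0.0, 0.0))
high_effort = reward_with_action_cost((0.0, 0.0), (0.0, 0.0), (1.0, 2.0))
```

With identical predicted and actual states, the two calls differ only in the action term, so the larger action yields the lower reward.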

[the posterior approximation network adopts the same architecture as the discriminator except that the output is a softmax over the discrete latent variables, or factored Gaussian over continuous latent variables (pg. 7, section 5.2, etc.)].

As per claim 12, Li teaches updating parameters in the reinforcement learning by using the reward estimated [the networks are updated and optimized (pg. 4, section 3.1-3.2, etc.)].

As per claim 13, see the rejection of claim 1, above, wherein Li also teaches a computer system comprising: a memory storing program instructions; a processing circuitry in communications with the memory for executing the program instructions, wherein the processing circuitry is configured to perform the steps [the system may be implemented in a client-server framework utilizing several available APIs (pg. 6, section 5.1), which inherently requires a memory storing instructions to be executed by a processor of some kind].

As per claim 14, see the rejection of claim 2, above.

As per claim 15, see the rejection of claim 3, above.



As per claim 17, see the rejection of claim 1, above, wherein Li also teaches a computer program product for estimating a reward in reinforcement learning, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform the method [the system may be implemented in a client-server framework utilizing several available APIs (pg. 6, section 5.1), which inherently requires a memory storing instructions to be executed by a computer of some kind].

As per claim 18, see the rejection of claim 2, above.

As per claim 19, see the rejection of claim 3, above.

As per claim 20, see the rejection of claim 5, above.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs, March 2017, pgs. 1-11) in view of Gupta (US 2018/0293721).

As per claim 4, Li teaches the computer-implemented method of claim 3, as described above.
Li does not teach wherein the generative model is an autoencoder that reconstructs a state as the predicted state from an actual state, the similarity being defined between the state reconstructed by the autoencoder and the actual state.
Gupta teaches wherein the generative model is an autoencoder that reconstructs a state as the predicted state from an actual state, the similarity being defined between the state reconstructed by the autoencoder and the actual state [an autoencoder may be used as the generative model in a generative adversarial network (para. 0081, etc.); for the GAN used in the model of Li for determining similarity between the generated state and the actual state (see above)].
Li and Gupta are analogous art, as they are within the same field of endeavor, namely machine learning.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to utilize an autoencoder as the generator in a GAN, as taught by Gupta, in the GAN in the system of Li.
Gupta provides motivation as [a variational auto-encoder is a component that takes the merits of deep learning and variational inference and leads to significant advances in generative modeling (para. 0081, etc.)].


Conclusion
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i): claims 1-20 are rejected.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Mukadam (US 10,739,776) – discloses a system training a network only with state information identified by masking.
Hausman et al. (Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets, Nov 2017, pgs. 1-11) – discloses imitation learning using GANs.
Abbeel et al. (Apprenticeship Learning via Inverse Reinforcement Learning, July 2004, pgs. 1-8) – discloses an apprentice learning system for estimating reward based upon expert demonstrations.

The examiner requests, in response to this Office action, that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections.  See 37 CFR 1.111(c).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE GIROUX whose telephone number is (571)272-9769.  The examiner can normally be reached on M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/GEORGE GIROUX/Primary Examiner, Art Unit 2125