Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
The instant application having Application No. 16688934 has a total of 25 claims pending in the application, of which claims 2-3 and 17-19 have been cancelled. 


Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 4-16, and 20-25 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
As per claims 1, 20, and 25, these claims make use of the phrase “reference trajectory” throughout the claims. However, the specification only mentions a “reference trajectory” in a single paragraph, 0033, which states: “The policy generator may be used to select actions to be performed by an agent interacting with an environment to imitate a state-action trajectory, using the discriminator to discriminate between the imitated state-action trajectory and a reference trajectory, and updating parameters of the policy generator using the reward values conditioned on the target embedding vector.” This is the only discussion of a “reference trajectory” in the entire specification. This means that every discussion of a “reference trajectory” in the claims is new matter, as there is no support in the specification for any of the actions taking place in the claims, such as:
“receiving a set of reference trajectories, each reference trajectory comprising a set of observations characterizing a set of states of the environment and corresponding actions”,
“For each reference trajectory…”,
“Determining an embedding for the reference trajectory”
“applying the encoder to the reference trajectory”, 
“The embedding for the reference trajectory” 
“processing the observation action pair and the embedding of the reference trajectory”
“configured to process the observation action pair and the embedding for the reference trajectory”
“from an imitation trajectory generated by the policy instead of from the reference trajectory”
Essentially, every discussion of “reference trajectory” in the claims is unsupported by the specification, and the claims are therefore rejected under 35 U.S.C. 112(a) for new matter.
Similar issues are found in claims 4 and 21.
As per claims 4-16 and 21-24, these claims are rejected as being dependent on claims rejected under 35 U.S.C. 112(a) for new matter.
As per claims 1, 20, and 25, these claims make use of the phrase “observation action pair” as part of an imitation trajectory. However, an observation-action pair is described in only one paragraph of the specification. Paragraph 0066 describes using observation-action pairs with the neural network, generating a Q-value for each observation-action pair, and using those Q-values to determine an action for the agent to perform given the observation. However, this paragraph does not discuss an “imitation trajectory.” In fact, it does not discuss trajectories at all. The closest discussion of trajectories is in paragraph 0072, which states that they comprise data identifying a first observation characterizing a first state of the environment and a first action performed by the agent, but this paragraph does not mention imitation trajectories. Imitation trajectories are discussed only in paragraphs 0023, 0105, 0108, and 0116, none of which mentions observation-action pairs. The closest support the Examiner can find is one sentence in the specification, which states: “Data characterizing a state of the environment will be referred to in this specification as an observation.” (See instant specification, paragraph 0009). The specification then goes on to use “observation” and “state” separately, referring to state-action pairs numerous times throughout the specification and to “observation-action pairs” in only one location. As far as the Examiner can tell, these terms are not intended to be exactly the same, as they are used at separate times and in separate situations. If they are intended to be interchangeable, why are both phrases used, but not in the same way? In the next response, please make clear where the Applicant is finding support for the use of the “observation-action pair” with imitation trajectories.
There are similar issues with the phrases:
 “for each observation action pair in the imitation trajectory”,
“represents a likelihood that the observation action pair is from an imitation trajectory….” 
“Determining a respective reward for each observation action pair in the imitation trajectory from the discriminator output for the observation action pair” 
“Determining the return from the respective rewards for the observation action pairs in the imitation trajectory” 
As per claims 4-16 and 21-24, these claims are dependent on a claim rejected under 35 U.S.C. 112(a) for new matter.
As per claims 1, 20, and 25, these claims recite “Receiving an observation characterizing an imitation state at the time step.” However, the specification at no time discloses an observation characterizing an imitation state. Imitation states are disclosed only in paragraph 0023 (“from a total of Tj imitation state action pairs”), paragraph 0026 (“The variational auto encoder may further comprise a state decoder for decoding the embeddings to produce imitation states and an action decoder for decoding the embeddings to produce imitation actions. The imitation states and imitation actions combine as state action pairs to form imitation trajectories”), paragraph 0082 (“During training the encoding is input into a state decoder and an action decoder to determine imitation states and imitation actions”), and paragraph 0110 (“wherein Xn is the nth imitation state”). At no time do any of these paragraphs disclose receiving an observation, let alone receiving an observation that characterizes an imitation state at a particular time step. Even if one were to rely on the single sentence of the specification stating that “Data characterizing a state of the environment will be referred to in this specification as an observation” (see instant specification, paragraph 0009), this sentence does not disclose that an observation can be used to describe an imitation state. Even if it were found that “observation” and “state” are interchangeable, how does the specification support “receiving an observation characterizing an imitation state at the time step”? The imitation state is not something received as input by the system; it is generated by the decoder (“The variational auto encoder may further comprise a state decoder for decoding the embeddings to produce imitation states”) (see paragraph 0026). How can something generated by the decoder satisfy “receiving an observation characterizing an imitation state”? This limitation is therefore new matter, and the claims are rejected under 35 U.S.C. 112(a) for new matter.
As per claims 4-16 and 21-24, these claims are dependent on a claim rejected under 35 U.S.C. 112(a) for new matter.
As per claims 1, 20, and 25, these claims recite “Processing an input comprising (i) the observation and (ii) the embedding for the reference trajectory using the neural network.” However, at no time does the specification disclose receiving an input comprising an observation, let alone an observation and an embedding of a reference trajectory. At best, the specification discloses a neural network receiving an embedding OF an observation and using that to determine an action for the agent in response to the observation. However, this does not disclose processing an input of both an observation and the embedding for the reference trajectory. The limitation is therefore new matter, and the claims are rejected under 35 U.S.C. 112(a).
As per claims 1, 20, and 25, these claims call for “to generate a discriminator output that represents a likelihood that the observation action pair is from an imitation trajectory generated by the policy instead of from the reference trajectory.” First, once again, there is no discussion of performing any likelihood analysis with an observation-action pair. As discussed above, the observation-action pair is described in only a single paragraph, and none of that paragraph relates to likelihoods. Second, the specification fails to disclose determining a likelihood that any type of pair is from an imitation trajectory rather than a reference trajectory. As stated above, a reference trajectory is mentioned only once in the specification, so there is no support for this limitation in that regard. Third, the only discussion of likelihood is found in paragraph 0054, which discusses behavioral cloning and demonstration trajectories, and which applies a maximum likelihood to imitate the actions. There is no discussion of an imitation trajectory, no discussion of an observation action pair, and no discussion of any policy. Therefore there is no support for this limitation in the specification, and the claims are rejected under 35 U.S.C. 112(a) for new matter.
As per claims 4-16 and 21-24, these claims are dependent on a claim rejected under 35 U.S.C. 112(a) for new matter.
As per claims 4 and 21, these claims recite similar uses of observations from observation action pairs, but there is no support in the specification for these aspects. In the specification, all of these operations are performed for state-action pairs, not observation-action pairs. These claims are therefore rejected for new matter for reasons similar to those given above for claims 1, 20, and 25.
As per claims 4 and 21, these claims make use of a reference trajectory, and there is no support in the specification for any reference trajectory with regard to these equations. They are rejected for new matter for reasons similar to those given above for claims 1, 20, and 25.

	



Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 4-16, and 20-25 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
As per claims 1, 20, and 25, these claims make use of the phrase “observation action pair.” An observation-action pair is described in only one paragraph of the specification. Paragraph 0066 describes using observation-action pairs with the neural network, generating a Q-value for each observation-action pair, and using those Q-values to determine an action for the agent to perform given the observation. The claims make use of an “observation-action pair” to perform many of the steps described in the specification for state-action pairs. While the specification does state that “Data characterizing a state of the environment will be referred to in this specification as an observation”, the specification then goes on to use “observation” and “state” separately, referring to state-action pairs numerous times throughout the specification and to “observation-action pairs” in only the one location. As far as the Examiner can tell, these terms are not intended to be exactly the same, as they are used at separate times and in separate situations. If they are intended to be interchangeable, why are both phrases used, but not in the same way? In the next response, please make clear where the Applicant is finding support for the use of the “observation-action pair” for all the different steps attributed to state-action pairs in the specification, and why these phrases are both used in the specification but are now seemingly intended in the claims to mean exactly the same thing.
As per claims 4-16 and 21-24, these claims are dependent on a claim rejected under 35 U.S.C. 112(b) for failing to particularly point out and distinctly claim the intended invention.
As per claims 1, 20, and 25, these claims make use of the phrase “A set of reference trajectories.” However, the specification mentions reference trajectories in only a single paragraph, and does not describe how they are made, what they are, or how they could be used. This renders the claims confusing, as there is no definition or explanation of what a reference trajectory is or how it can perform any of the actions attributed to it in the claims. The claims are therefore unclear and are rejected under 35 U.S.C. 112(b) for failing to particularly point out and distinctly claim the intended invention.
As per claims 4-16 and 21-24, these claims are dependent on a claim rejected under 35 U.S.C. 112(b) for failing to particularly point out and distinctly claim the intended invention.

As per claims 6 and 23, these claims recite the phrase “the set of trajectories.” There is insufficient antecedent basis for this limitation.
As per claim 8, this claim calls for “the set of trajectories.” There is insufficient antecedent basis for this limitation. Further, the parent claim 1 has both “a set of imitation trajectories” and “a set of reference trajectories.” It is unclear which one this would refer to.
As per claim 9, this claim is rejected as being dependent on a claim rejected under 35 U.S.C. 112(b) for lack of antecedent basis.
As per claim 10, this claim calls for “the set of trajectories.” There is insufficient antecedent basis for this limitation. Further, the parent claim 1 has both “a set of imitation trajectories” and “a set of reference trajectories.” It is unclear which one this would refer to.
As per claims 11-16, these claims are rejected as being dependent on a claim rejected under 35 U.S.C. 112(b) for lack of antecedent basis.
As per claim 14, this claim calls for “an observation from the trajectory.” There is insufficient antecedent basis for “the trajectory.” Further, the parent claim 1 has both “a set of imitation trajectories” and “a set of reference trajectories.” It is unclear which one this would refer to.
As per claims 15-16, these claims are rejected as being dependent on a claim rejected under 35 U.S.C. 112(b) for lack of antecedent basis.


Response to Arguments

Applicant's arguments with respect to claims 1, 4-16 and 20-25 have been considered but are moot in view of the new ground(s) of rejection.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEN M RIFKIN whose telephone number is (571)272-9768. The examiner can normally be reached Monday-Friday 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo, can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BEN M RIFKIN/Primary Examiner, Art Unit 2198