DETAILED ACTION

Status of Case
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the claims filed on 5/22/2020.
Claims 1-20 are pending. 

Information Disclosure Statement
The information disclosure statements (IDS) filed on 6/5/2020, 11/9/2020, and 12/13/2021 have been considered by Examiner. 
	
Claim Rejections - 35 USC § 102 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 8, 14-16, and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Wayne (WO 2018/211140).
 	Consider claims 1, 16, and 19, Wayne discloses a method performed by one or more data processing apparatus for training an action selection neural network that is used to select actions to be performed by an agent interacting with an environment (see abstract, paragraph 129, and figures 1 and 4, wherein disclosed is said method), a system comprising: one or more computers; and one or more storage devices communicatively coupled to the one or more computers, wherein the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for training an action selection neural network that is used to select actions to be performed by an agent interacting with an environment (see figure 1 (reproduced below for convenience), wherein disclosed is said system), and one or more non-transitory computer storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations for training an action selection neural network that is used to select actions to be performed by an agent interacting with an environment (see figure 1 and paragraphs 128-130, wherein disclosed is said non-transitory computer storage media), comprising: 
 	receiving an observation characterizing a current state of the environment (see paragraphs 3 and 4, wherein disclosed is said observation; also, see figure 1); 
 	selecting an exploration importance factor from a set of possible exploration importance factors (see paragraph 100: this feature is so broadly defined that the disclosed “…the noise injected into the policy for exploration (assuming that a stochastic policy gradient is used to train the policy) …” can be legitimately interpreted as corresponding to said feature); 
 	processing the observation and the exploration importance factor using the action selection neural network to generate an action selection output (see paragraphs 3, 4, 66, and 100, wherein disclosed is said action selection output; also, see figure 1); 
 	selecting an action to be performed by the agent using the action selection output (see paragraphs 3, 4, and 66, wherein disclosed is said selecting); 
 	determining an exploration reward based on: (i) a subsequent observation characterizing a state of the environment after the agent performs the selected action and (ii) one or more prior observations characterizing states of the environment prior to the agent performing the selected action (see paragraphs 18, 108, and 118, wherein disclosed is said exploration reward); 
 	determining an overall reward based on: (i) the exploration importance factor, and (ii) the exploration reward (see paragraphs 18, 100, 108, and 118, wherein disclosed is said overall reward); and 
 	training the action selection neural network using a reinforcement learning technique based on the overall reward (see paragraphs 18, 100, 108, 118, and 120-121, wherein disclosed is said training).


    [Figure 1 of Wayne, reproduced for convenience: media_image1.png]

 	
 	Consider claim 8, Wayne discloses that the set of possible exploration importance factors is a discrete set (see paragraphs 100-101: discrete set).

 	Consider claim 14, Wayne discloses that the agent is a robotic agent interacting with a real-world environment (see paragraph 11: the environment is a real-world environment and the agent may be a robot interacting with the environment to accomplish a specific task).

 	Consider claim 15, Wayne discloses that the observation characterizing the current state of the environment comprises an image (see paragraph 50: the observation can be data captured by one or more sensors, e.g., a camera, as the agent interacts with the environment).

Allowable Subject Matter
Claims 2-7, 9-13, 17-18, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jamal Javaid whose telephone number is 571-270-5137 and email address is Jamal.Javaid@uspto.gov.
 	Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
 	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Jiang, can be reached at 571-270-7191. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/JAMAL JAVAID/

Primary Examiner, Art Unit 2412