DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This application, filed on 11/20/2018, is a 371 of PCT/US2017/033218 (filed on 05/18/2017), which claims benefit of Provisional Application No. 62/339,778 (filed on 05/20/2016). 
This action is in response to preliminary amendments submitted on 11/21/2018. In the amendments, claims 2-11 are amended and claims 12-20 are added. Claims 1-20 are pending and have been examined.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 12/04/2018.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Interpretation
Claim 10 recites “One or more computer storage media”. Specification [0052] provides the following “The computer storage medium is not, however, a propagated signal.” For examination purposes, “One or more computer storage media” in claim 10 has been interpreted as “One or more non-transitory computer storage media” in view of the Specification. Claim 20 is dependent on claim 10 and follows the claim interpretation of claim 10.
The preamble in each of independent claims 1, 10, and 11 recites “for training a neural network used to select actions to be performed by an agent interacting with an environment”. This recitation in the preamble is not given patentable weight because the recitation is a mere statement of purpose or intended use of the invention and does not reflect or provide definition of the claimed invention’s limitations. See MPEP 2111.02 (II) (“If the body of a claim fully and intrinsically sets forth all of the limitations of the claimed invention, and the preamble merely states, for example, the purpose or intended use of the invention, rather than any distinct definition of any of the claimed invention’s limitations, then the preamble is not considered a limitation and is of no significance to claim construction”). Therefore, for examination purposes, the recitation of “for training a neural network used to select actions to be performed by an agent interacting with an environment” is not considered a limitation of claims 1, 10, and 11. The same claim interpretation is applicable for dependent claims 2-8, 12-18, and 20.

Claim Objections
Claims 1-20 are objected to because of the following informalities: 
“the method” in claim 1 line 2 should be “the computer-implemented method”
“the parameters” in claim 1 line 16 should be “
“The method of” in claims 2-9 should be “The computer-implemented method of”
“the probability” in claim 3 line 3 should be “a probability”
“the one that maximizes the chances” in claim 3 line 4 should be “
“the form” in claim 7 line 2 should be “a form”
“the value” in claim 7 line 4 should be “a value”
“the method further” in claim 9 line 2 should be “the computer-implemented method further”
“the parameters” in claim 10 line 18 should be “
“the parameters” in claim 11 line 19 should be “
“the probability” in claim 13 line 3 should be “a probability”
“the one that maximizes the chances” in claim 13 lines 3-4 should be “
“the form” in claim 17 line 2 should be “a form”
“the value” in claim 17 line 4 should be “a value”
Dependent claims 2-9 are objected to based on the rationale for claim 1; dependent claim 20 is objected to based on the rationale for claim 10; dependent claims 12-19 are objected to based on the rationale for claim 11.
Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-2, 5-8, 10-12, 15-18, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 1 is directed to a computer-implemented method for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
determining a pseudo-count for the first observation using a sequential density model which represents a likelihood that the first observation occurs given a sequence of previous observations, wherein the pseudo-count depends upon a number of previous occurrences of the first observation during the training of the neural network;
determining an exploration reward bonus that incentivizes the agent to explore the environment from the pseudo-count for the first observation, wherein the exploration reward bonus is lower when the pseudo-count is higher and vice-versa;
generating a combined reward from the actual reward and the exploration reward bonus; and
adjusting current values of the parameters of the neural network using the combined reward.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass determining a pseudo-count for the first observation using a sequential density model which represents a likelihood that the first observation occurs given a sequence of previous observations, wherein the pseudo-count depends upon a number of previous occurrences of the first observation during the training of the neural network (Specification paragraphs [0014] and [0044] provide that determining a pseudo-count using a sequential density model amounts to mathematical calculations and equations; the description of the pseudo-count depending on previous occurrences of the first observation “during the training” further describes the elements involved in the mathematical calculation of determining the pseudo-count); determining an exploration reward bonus that incentivizes the agent to explore the environment from the pseudo-count for the first observation, wherein the exploration reward bonus is lower when the pseudo-count is higher and vice-versa (Specification paragraph [0049] provides that determining an exploration reward bonus from the pseudo-count wherein the exploration reward bonus is lower when the pseudo-count is higher and vice-versa amounts to mathematical calculations and equation); generating a combined reward from the actual reward and the exploration reward bonus (generating a combined reward corresponds to evaluation and judgment with assistance of pen and paper); and adjusting current values of the parameters of the neural network using the combined reward (adjusting current values of the parameters corresponds to evaluation and judgment with assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “computer-implemented” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “computer-implemented” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
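For illustration only, the relationship recited in claim 1 between the pseudo-count, the exploration reward bonus, and the combined reward may be sketched as follows. The sketch is not taken from the application: the function names, the bonus form beta / sqrt(pseudo_count + a), the default constants, and the use of summation to combine the rewards are all assumptions chosen solely to show a bonus that is lower when the pseudo-count is higher.

```python
import math


def exploration_bonus(pseudo_count: float, beta: float = 0.05, a: float = 0.01) -> float:
    """Hypothetical bonus form: decreases monotonically as the pseudo-count grows.

    beta and a are illustrative constants, not values from the application.
    """
    return beta / math.sqrt(pseudo_count + a)


def combined_reward(actual_reward: float, pseudo_count: float,
                    beta: float = 0.05, a: float = 0.01) -> float:
    """Combine the actual reward with the bonus (summation is one assumed choice)."""
    return actual_reward + exploration_bonus(pseudo_count, beta, a)


# A rarely seen observation earns a larger combined reward than a familiar one.
rare = combined_reward(1.0, pseudo_count=0.0)
familiar = combined_reward(1.0, pseudo_count=100.0)
print(rare > familiar)
```

Under these assumptions, the only property the sketch relies on is that the bonus is a decreasing function of the pseudo-count; any other decreasing function would serve equally well for the illustration.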
Regarding Claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 2 is directed to a computer-implemented method for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein adjusting the current values of the parameters comprises: using the combined reward in place of the actual reward in performing an iteration of a reinforcement learning technique.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein adjusting the current values of the parameters comprises: using the combined reward in place of the actual reward in performing an iteration of a reinforcement learning technique (adjusting the current values of the parameters using the combined reward in performing an iteration of a reinforcement learning technique corresponds to evaluating the parameters (evaluation) and making changes to the parameters (judgment) with assistance of pen and paper wherein performing an iteration of a reinforcement learning technique corresponds to evaluation and judgment of data representing agent actions in an environment).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “computer-implemented” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “computer-implemented” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
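For illustration only, the limitation of claim 2 (using the combined reward in place of the actual reward in an iteration of a reinforcement learning technique) may be sketched with a single tabular Q-learning update. This is an assumption made for brevity: the claim recites adjusting parameters of a neural network, and the table, the function name q_learning_step, the learning rate, the discount factor, and the numeric rewards below are hypothetical stand-ins that do not appear in the application.

```python
from collections import defaultdict


def q_learning_step(q, obs, action, reward, next_obs, actions,
                    alpha=0.1, gamma=0.99):
    """One Q-learning iteration; `reward` is whatever reward signal the caller supplies."""
    best_next = max(q[(next_obs, a)] for a in actions)
    target = reward + gamma * best_next
    q[(obs, action)] += alpha * (target - q[(obs, action)])
    return q[(obs, action)]


q = defaultdict(float)          # all action values start at 0.0
actions = ["left", "right"]

actual_reward = 0.0             # hypothetical reward from the environment
bonus = 0.5                     # hypothetical exploration reward bonus
combined = actual_reward + bonus

# The combined reward is passed where the actual reward would normally go.
updated = q_learning_step(q, "s0", "right", combined, "s1", actions)
print(updated)
```

The update rule itself is unchanged; only the reward argument differs, which is the substitution the claim language describes.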
Regarding Claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 5 is directed to a computer-implemented method for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein generating the combined reward comprises summing the actual reward and the exploration reward bonus.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein generating the combined reward comprises summing the actual reward and the exploration reward bonus (summing corresponds to evaluation and judgment with assistance of pen and paper and mathematical calculations).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “computer-implemented” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “computer-implemented” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 6 is directed to a computer-implemented method for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the exploration reward bonus RB satisfies: [equation image: media_image1.png] wherein x is the first observation, [symbol image: media_image2.png] is the pseudo-count for the first observation, a and b are constants, and β is a parameter selected by a parameter sweep.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein the exploration reward bonus RB satisfies: 
[equation image: media_image1.png] wherein x is the first observation, [symbol image: media_image2.png] is the pseudo-count for the first observation, a and b are constants, and β is a parameter selected by a parameter sweep (corresponds to mathematical calculation and equation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “computer-implemented” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “computer-implemented” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 7 is directed to a computer-implemented method for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the pseudo-count [symbol image: media_image2.png] for the first observation is of the form: [equation image: media_image3.png] wherein [symbol image: media_image4.png] is the value of a sequential density model for the first observation and [symbol image: media_image5.png] is a recoding probability for the first observation, wherein the recoding probability is a value of the sequential density model after observing a new occurrence of the first observation.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein the pseudo-count 
[symbol image: media_image2.png] for the first observation is of the form: [equation image: media_image3.png] wherein [symbol image: media_image4.png] is the value of a sequential density model for the first observation and [symbol image: media_image5.png] is a recoding probability for the first observation, wherein the recoding probability is a value of the sequential density model after observing a new occurrence of the first observation (corresponds to mathematical calculation and equation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “computer-implemented” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “computer-implemented” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
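For illustration only, a pseudo-count derived from a density model value and a recoding probability, as those terms are defined in claim 7, may be sketched as follows. The claimed expression itself is not reproduced in the text of this record, so the formula used here, N = rho * (1 - rho') / (rho' - rho), is an assumption drawn from the general count-based exploration literature and may differ from the claimed form. The class EmpiricalDensityModel, a simple frequency-count model, is likewise hypothetical and stands in for whatever sequential density model the application contemplates.

```python
from collections import Counter


class EmpiricalDensityModel:
    """Hypothetical density model: probability = observed frequency so far."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def prob(self, x):
        """Model value for x given the observations seen so far."""
        return self.counts[x] / self.total if self.total else 0.0

    def recoding_prob(self, x):
        """Model value for x after one further (hypothetical) occurrence of x."""
        return (self.counts[x] + 1) / (self.total + 1)

    def update(self, x):
        self.counts[x] += 1
        self.total += 1


def pseudo_count(rho, rho_prime):
    """Assumed pseudo-count form; requires the recoding probability to exceed rho."""
    if rho_prime <= rho:
        raise ValueError("recoding probability must exceed the model value")
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


model = EmpiricalDensityModel()
for observation in ["a", "b", "a", "c"]:
    model.update(observation)

# For this frequency-count model the pseudo-count matches the true count of "a".
print(pseudo_count(model.prob("a"), model.recoding_prob("a")))
```

With a frequency-count model the assumed formula recovers the number of previous occurrences, which is consistent with the claim 1 statement that the pseudo-count depends upon that number; other density models would yield a generalized, non-integer count.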
Regarding Claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 8 is directed to a computer-implemented method for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the sequential density model is a pixel-level density model.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein the sequential density model is a pixel-level density model (Specification paragraphs [0014] and [0044] provide that determining a pseudo-count using a sequential density model amounts to mathematical calculations and equations; the explanation that the sequential density model is a pixel-level density model further describes that the density model is a mathematical model modeling specific sets of data).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “computer-implemented” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “computer-implemented” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 10 is directed to one or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
the operations comprising...determining a pseudo-count for the first observation using a sequential density model which represents a likelihood that the first observation occurs given a sequence of previous observations, wherein the pseudo-count depends upon a number of previous occurrences of the first observation during the training of the neural network;
determining an exploration reward bonus that incentivizes the agent to explore the environment from the pseudo-count for the first observation, wherein the exploration reward bonus is lower when the pseudo-count is higher and vice-versa;
generating a combined reward from the actual reward and the exploration reward bonus;
adjusting current values of the parameters of the neural network using the combined reward.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass determining a pseudo-count for the first observation using a sequential density model which represents a likelihood that the first observation occurs given a sequence of previous observations, wherein the pseudo-count depends upon a number of previous occurrences of the first observation during the training of the neural network (Specification paragraphs [0014] and [0044] provide that determining a pseudo-count using a sequential density model amounts to mathematical calculations and equations; the description of the pseudo-count depending on previous occurrences of the first observation “during the training” further describes the elements involved in the mathematical calculation of determining the pseudo-count); determining an exploration reward bonus that incentivizes the agent to explore the environment from the pseudo-count for the first observation, wherein the exploration reward bonus is lower when the pseudo-count is higher and vice-versa (Specification paragraph [0049] provides that determining an exploration reward bonus from the pseudo-count wherein the exploration reward bonus is lower when the pseudo-count is higher and vice-versa amounts to mathematical calculations and equations); generating a combined reward from the actual reward and the exploration reward bonus (generating a combined reward corresponds to evaluation and judgment with assistance of pen and paper); and adjusting current values of the parameters of the neural network using the combined reward (adjusting current values of the parameters corresponds to evaluation and judgment with assistance of pen and paper).
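For illustration only, the first three recited operations can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: the class EmpiricalDensityModel, the function names, the pseudo-count expression (the form commonly used in the count-based exploration literature), the inverse-square-root bonus, and the values beta=0.05 and eps=0.01 are all hypothetical choices made for the example. The parameter-adjustment step is omitted.

```python
import math
from collections import Counter


class EmpiricalDensityModel:
    """Toy sequential density model (hypothetical stand-in).

    Assigns rho_n(x) = N_n(x) / n, the empirical frequency of x
    among the n observations seen so far.
    """

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def prob(self, x):
        # Likelihood of x given the sequence of previous observations.
        return self.counts[x] / self.total if self.total else 0.0

    def recoding_prob(self, x):
        # Value the model would assign to x after one more occurrence of x.
        return (self.counts[x] + 1) / (self.total + 1)

    def update(self, x):
        self.counts[x] += 1
        self.total += 1


def pseudo_count(model, x):
    """Pseudo-count derived from the density before and after observing x.

    Assumed form: rho * (1 - rho') / (rho' - rho).
    """
    rho = model.prob(x)
    rho_prime = model.recoding_prob(x)
    if rho_prime <= rho:
        # Degenerate case (model did not become more confident): no novelty.
        return float("inf")
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def exploration_bonus(n_hat, beta=0.05, eps=0.01):
    # Assumed bonus shape: decreases as the pseudo-count increases.
    return beta / math.sqrt(n_hat + eps)


def combined_reward(actual_reward, bonus):
    # Summation is one simple way to combine the two signals.
    return actual_reward + bonus


if __name__ == "__main__":
    model = EmpiricalDensityModel()
    for obs, reward in [("a", 0.0), ("b", 0.0), ("a", 1.0), ("a", 0.0)]:
        n_hat = pseudo_count(model, obs)
        total = combined_reward(reward, exploration_bonus(n_hat))
        print(obs, n_hat, total)
        model.update(obs)
```

With this toy density model the pseudo-count coincides with the number of earlier occurrences of the observation, so every step reduces to arithmetic on counts and probabilities.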
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “One or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “One or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 11,
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 11 is directed to a system comprising one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
the operations comprising:...determining a pseudo-count for the first observation using a sequential density model which represents a likelihood that the first observation occurs given a sequence of previous observations, wherein the pseudo-count depends upon a number of previous occurrences of the first observation during the training of the neural network;
determining an exploration reward bonus that incentivizes the agent to explore the environment from the pseudo-count for the first observation, wherein the exploration reward bonus is lower when the pseudo-count is higher and vice-versa;
generating a combined reward from the actual reward and the exploration reward bonus;
adjusting current values of the parameters of the neural network using the combined reward.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass determining a pseudo-count for the first observation using a sequential density model which represents a likelihood that the first observation occurs given a sequence of previous observations, wherein the pseudo-count depends upon a number of previous occurrences of the first observation during the training of the neural network (Specification paragraphs [0014] and [0044] provide that determining a pseudo-count using a sequential density model amounts to mathematical calculations and equations; the description of the pseudo-count depending on previous occurrences of the first observation “during the training” further describes the elements involved in the mathematical calculation of determining the pseudo-count); determining an exploration reward bonus that incentivizes the agent to explore the environment from the pseudo-count for the first observation, wherein the exploration reward bonus is lower when the pseudo-count is higher and vice-versa (Specification paragraph [0049] provides that determining an exploration reward bonus from the pseudo-count wherein the exploration reward bonus is lower when the pseudo-count is higher and vice-versa amounts to mathematical calculations and equations); generating a combined reward from the actual reward and the exploration reward bonus (generating a combined reward corresponds to evaluation and judgment with assistance of pen and paper); and adjusting current values of the parameters of the neural network using the combined reward (adjusting current values of the parameters corresponds to evaluation and judgment with assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 12,
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 12 is directed to a system comprising one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein adjusting the current values of the parameters comprises: using the combined reward in place of the actual reward in performing an iteration of a reinforcement learning technique.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein adjusting the current values of the parameters comprises: using the combined reward in place of the actual reward in performing an iteration of a reinforcement learning technique (adjusting the current values of the parameters using the combined reward in performing an iteration of a reinforcement learning technique corresponds to evaluating the parameters (evaluation) and making changes to the parameters (judgment) with assistance of pen and paper wherein performing an iteration of a reinforcement learning technique corresponds to evaluation and judgment of data representing agent actions in an environment).
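To make the recited substitution concrete, the following minimal sketch shows one iteration of a reinforcement learning technique in which the combined reward is supplied where the actual reward would normally go. It is an assumption-laden illustration: tabular Q-learning stands in for the claimed neural network update, and the function q_learning_step, the state and action labels, the step size alpha, the discount gamma, and the bonus value 0.5 are all hypothetical.

```python
from collections import defaultdict


def q_learning_step(q, obs, action, reward, next_obs, actions,
                    alpha=0.1, gamma=0.99):
    """One tabular Q-learning iteration.

    `reward` is whatever signal the caller supplies; passing the combined
    reward here uses it in place of the actual reward.
    """
    best_next = max(q[(next_obs, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(obs, action)] += alpha * (td_target - q[(obs, action)])
    return q[(obs, action)]


q = defaultdict(float)          # all action values start at 0.0
actual_reward = 0.0             # reward received from the environment
bonus = 0.5                     # hypothetical exploration reward bonus
combined = actual_reward + bonus

# The update rule is unchanged; only the reward argument differs.
new_value = q_learning_step(q, "s0", "left", combined, "s1", ["left", "right"])
print(new_value)
```

The update itself is a single arithmetic expression; replacing the actual reward with the combined reward changes only the number fed into it.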
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 15,
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 15 is directed to a system comprising one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein generating the combined reward comprises summing the actual reward and the exploration reward bonus.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein generating the combined reward comprises summing the actual reward and the exploration reward bonus (summing corresponds to evaluation and judgment with assistance of pen and paper and mathematical calculations).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 16,
Claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 16 is directed to a system comprising one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the exploration reward bonus RB satisfies: [equation image: media_image1.png] wherein x is the first observation, [symbol image: media_image2.png] is the pseudo-count for the first observation, a and b are constants, and β is a parameter selected by a parameter sweep.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein the exploration reward bonus RB satisfies: [equation image: media_image1.png] wherein x is the first observation, [symbol image: media_image2.png] is the pseudo-count for the first observation, a and b are constants, and β is a parameter selected by a parameter sweep (corresponds to mathematical calculation and equation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.

Regarding Claim 17,
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 17 is directed to a system comprising one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the pseudo-count [symbol image: media_image2.png] for the first observation is of the form: [equation image: media_image3.png] wherein [symbol image: media_image4.png] is the value of a sequential density model for the first observation and [symbol image: media_image5.png] is a recoding probability for the first observation, wherein the recoding probability is a value of the sequential density model after observing a new occurrence of the first observation.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein the pseudo-count [symbol image: media_image2.png] for the first observation is of the form: [equation image: media_image3.png] wherein [symbol image: media_image4.png] is the value of a sequential density model for the first observation and [symbol image: media_image5.png] is a recoding probability for the first observation, wherein the recoding probability is a value of the sequential density model after observing a new occurrence of the first observation (corresponds to mathematical calculation and equation).
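The equation images recited in claim 17 do not survive in this text, so the exact claimed expression cannot be quoted here. As a hedged illustration only, assume the pseudo-count takes the form commonly written in the count-based exploration literature, with \rho_n(x) denoting the value of the sequential density model and \rho'_n(x) the recoding probability; these symbols are assumptions, not a transcription of the claim. Under that assumption, a short worked check with an empirical density model (x seen N times in n observations, 0 \le N < n) shows the expression reducing to an ordinary count:

```latex
% Assumed form of the pseudo-count (not transcribed from the claim images):
\hat{N}_n(x) = \frac{\rho_n(x)\,\bigl(1 - \rho'_n(x)\bigr)}{\rho'_n(x) - \rho_n(x)}

% Empirical density model before and after one more occurrence of x:
\rho_n(x) = \frac{N}{n}, \qquad \rho'_n(x) = \frac{N+1}{n+1}

% Denominator and numerator of the assumed form:
\rho'_n(x) - \rho_n(x) = \frac{n(N+1) - N(n+1)}{n(n+1)} = \frac{n - N}{n(n+1)}, \qquad
\rho_n(x)\,\bigl(1 - \rho'_n(x)\bigr) = \frac{N}{n}\cdot\frac{n - N}{n+1} = \frac{N(n - N)}{n(n+1)}

% Ratio:
\hat{N}_n(x) = \frac{N(n - N)}{n(n+1)} \cdot \frac{n(n+1)}{n - N} = N
```

In this special case the computation consists of two divisions, two subtractions, and one ratio of the resulting quantities.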
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 18,
Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 18 is directed to a system comprising one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the sequential density model is a pixel-level density model.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein the sequential density model is a pixel-level density model (Specification paragraphs [0014] and [0044] provide that determining a pseudo-count using a sequential density model amounts to mathematical calculations and equations; the explanation that the sequential density model is a pixel-level density model further describes that the density model is a mathematical model modeling specific sets of data).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “one or more computers and one or more computer storage media encoded with instructions that, when executed by the one or more computers, cause the one or more computers to perform operations” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 20,
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 20 is directed to one or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations for training a neural network used to select actions to be performed by an agent interacting with an environment, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the exploration reward bonus RB satisfies: [equation image: media_image1.png], wherein x is the first observation, [image: media_image2.png] is the pseudo-count for the first observation, a and b are constants, and β is a parameter selected by a parameter sweep.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitations in the context of this claim encompass wherein the exploration reward bonus RB satisfies: [equation image: media_image1.png], wherein x is the first observation, [image: media_image2.png] is the pseudo-count for the first observation, a and b are constants, and β is a parameter selected by a parameter sweep (corresponds to mathematical calculation and equation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that amount to a recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not integrate a judicial exception into a practical application. See MPEP 2106.05(f). The recitation of “One or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations” amounts to mere instruction to implement an abstract idea on a computer. Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to mere data gathering, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “One or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations” amounts to recitation of the words "apply it" (or an equivalent) or mere instructions to implement an abstract idea or other exception on a computer, which do not amount to significantly more. See MPEP 2106.05(f). Moreover, the additional element of “obtaining data identifying (i) a first observation characterizing a first state of the environment, (ii) an action, selected using the neural network, performed by the agent in response to the first observation, and (iii) an actual reward received resulting from the agent performing the action in response to the first observation” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Mnih et al. (US 2015/0100530 A1) teaches reinforcement learning for a subject system having multiple states and actions to move from one state to the next, which is relevant to Fig. 1 of the present application. 

Allowable Subject Matter
Claims 3, 4, 9, 13, 14, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484. The examiner can normally be reached Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YING YU CHEN/               Primary Examiner, Art Unit 2125