DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to amendments and remarks filed on 02/28/2022. In the current amendments, the Specification is amended and claims 1, 9, 10, 11, 13, 14, 17, 19, 20, and 21 are amended. Claims 1-21 are pending and have been examined.
In response to the amendments to the Specification, the incorporation by reference in Specification paragraph [0171] is considered effective.
In response to the amendments and remarks filed on 02/28/2022, the 35 U.S.C. 112(b) rejection of claims 1-8, 11-13, 15-18, and 21 made in the previous Office Action has been withdrawn.

Claim Objections
Claims 4, 5, 9, 10, and 11-20 are objected to because of the following informalities: 
The recitation of “and error value” in claim 11 line 7 should be “an error value” (emphasis added).  Dependent claims 12-20 are objected to based on the same rationale as claim 11. 
The recitation of “∂” in claims 4, 5, 14, and 15 should have an explanation in the claim that it represents the mathematical operation of taking the partial derivative.
The recitation of “λ” in claims 9, 10, 19, and 20 should have an explanation in the claim that it represents an input numerical value, in accordance with Specification paragraph [0022]. 
Appropriate correction is required.



Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 9, 10, 14, 19, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 14 recites the limitation "The system of claim 11" in line 1.  There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "The system of claim 11" has been interpreted as "The method of claim 11".
Amendments to claims 9, 10, 19, and 20 provide explanations for some of the claimed symbols, but the following symbol remains unexplained: “H_i”. Therefore, amended claims 9, 10, 19, and 20 lack clarity because they recite algorithms without explaining what the symbol “H_i” in the algorithms represents, and one of ordinary skill in the art would not be able to ascertain the metes and bounds of the claims.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a system for a machine learning architecture, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
provide the machine learning architecture for estimating a value function for at least a portion of a given state, the value function defined at least in part on a plurality of weights
receiving one or more observation data sets representing one or more observations associated with at least a portion of a state of an environment in which an agent operates; and
training the machine learning architecture with the one or more observation data sets, where the training includes updating the plurality of weights based on: an error value and at least one time-varying step-size value; wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass providing the machine learning architecture for estimating a value function for at least a portion of a given state, the value function defined at least in part on a plurality of weights (corresponds to observation, evaluation, and judgment with the assistance of pen and paper; the broadest reasonable interpretation of “machine learning architecture” would include machine learning models, since the Specification points to “architectures involving reinforcement learning (RL)” as an example of a “machine learning architecture” (Specification [0003]); a human mind, with the assistance of pen and paper, is able to implement a reinforcement learning model (a type of machine learning architecture) to estimate a value function, since doing so entails observing and evaluating an agent taking actions in an environment, interpreting (evaluating) the action and the resulting state, evaluating/judging the reward corresponding to the agent’s action, and evaluating a value function based on the actions, states, and rewards); receiving one or more observation data sets representing one or more observations associated with at least a portion of a state of an environment in which an agent operates (corresponds to observation); and training the machine learning architecture with the one or more observation data sets, where the training includes updating the plurality of weights based on: an error value and at least one time-varying step-size value; wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights (corresponds to mathematical calculations, as the claim specifically identifies that “training the machine learning architecture” involves updating weights based on error values (mathematical calculations) and determining a step-size value based on meta-weights that vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights (mathematical calculations)).
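The training limitation of claim 1 (weights updated from an error value and time-varying step sizes, where the step sizes come from meta-weights adapted by stochastic meta-descent using a trace of past weight updates) can be illustrated with a short Python sketch. It is offered only as one possible reading, assuming a linear value function, a TD(0) error value, and an IDBD/TIDBD-style rule with per-weight step sizes alpha_i = exp(beta_i); the helper `tidbd_step` and the parameters `gamma`, `theta`, and `full_gradient` are hypothetical names, not terms from the claims or the Specification.

```python
import numpy as np


def tidbd_step(w, beta, h, x, x_next, reward, gamma=0.9, theta=0.01,
               full_gradient=False):
    """One TD(0) step with per-weight step sizes adapted by an IDBD-style
    stochastic meta-descent (hypothetical helper, TD(0) case only).

    w    : weights of a linear value function v(s) = w @ x(s)
    beta : meta-weights; the step sizes are alpha_i = exp(beta_i)
    h    : trace of past updates to each weight
    """
    # TD error: the error value driving both the weight and meta updates.
    delta = reward + gamma * (w @ x_next) - w @ x
    # Full gradient keeps the bootstrapped next-state term; the
    # semi-gradient treats the TD target as a constant.
    g = x - gamma * x_next if full_gradient else x
    # Meta-descent: raise beta_i when the current update agrees with the trace.
    beta = beta + theta * delta * g * h
    alpha = np.exp(beta)  # time-varying, per-weight step sizes
    # Weight update using the freshly adapted step sizes.
    w = w + alpha * delta * x
    # Decay the trace (clipped at zero) and add the latest update.
    h = h * np.maximum(0.0, 1.0 - alpha * x * g) + alpha * delta * x
    return w, beta, h, delta


# Usage: a deterministic 4-state chain with one-hot features and a
# reward of 1 on leaving the last state (the episode then terminates).
n = 4
w, h = np.zeros(n), np.zeros(n)
beta = np.full(n, np.log(0.1))  # initial step size 0.1 for every weight
for _ in range(500):
    for s in range(n):
        x = np.eye(n)[s]
        x_next = np.eye(n)[s + 1] if s + 1 < n else np.zeros(n)
        r = 1.0 if s == n - 1 else 0.0
        w, beta, h, _ = tidbd_step(w, beta, h, x, x_next, r)

print(np.round(w, 3))  # approaches gamma ** (n - 1 - s) for each state s
```

In this sketch the `full_gradient` flag switches the meta-update between a full-gradient and a semi-gradient form (compare the language of claims 2 and 3); the weight update itself remains the ordinary TD(0) update in both cases.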
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for”, as drafted, recite generic computer components. These generic computer components are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Moreover, the additional element of “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to a system for a machine learning architecture, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the stochastic meta-descent is defined using a full gradient
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass performing stochastic meta-descent with a full gradient (mathematical calculations).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for”, as drafted, recite generic computer components. These generic computer components are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Moreover, the additional element of “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 3 is directed to a system for a machine learning architecture, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the stochastic meta-descent is defined using a semi-gradient
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass performing stochastic meta-descent with a semi-gradient (mathematical calculations).
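To make the distinction between the full-gradient and semi-gradient language of claims 2 and 3 concrete, one common formulation from the temporal-difference step-size adaptation literature (TIDBD-style updates) is sketched below. It assumes a linear value function v(s) = w^T x(s), a TD error δ_t, a meta step size θ, per-weight step sizes α_i = exp(β_i), and a TD(0) setting in which the eligibility trace reduces to the feature vector x_t. These symbols are assumptions for illustration only and are not the equations recited in the claims; the two variants differ only in the factor g_{t,i} that multiplies the trace h_i.

```latex
\begin{aligned}
\delta_t &= r_{t+1} + \gamma\, w_t^{\top} x_{t+1} - w_t^{\top} x_t \\
g_{t,i} &= \begin{cases}
  x_{t,i} - \gamma\, x_{t+1,i} & \text{full gradient} \\
  x_{t,i} & \text{semi-gradient}
\end{cases} \\
\beta_i &\leftarrow \beta_i + \theta\, \delta_t\, g_{t,i}\, h_i, \qquad \alpha_i = e^{\beta_i} \\
w_i &\leftarrow w_i + \alpha_i\, \delta_t\, x_{t,i} \\
h_i &\leftarrow h_i \left[\, 1 - \alpha_i\, x_{t,i}\, g_{t,i} \,\right]^{+} + \alpha_i\, \delta_t\, x_{t,i}
\end{aligned}
```

Under this reading, the semi-gradient form simply drops the dependence of the bootstrapped target \gamma w_t^{\top} x_{t+1} on the weights, which is the usual TD simplification; [\cdot]^{+} denotes clipping at zero.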
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for”, as drafted, recite generic computer components. These generic computer components are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Moreover, the additional element of “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 4 is directed to a system for a machine learning architecture, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:

[Claim 4 limitation: equation for updating the meta-weights (image not reproduced)]

as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass updating the meta-weights based on an equation (mathematical equation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for”, as drafted, recite generic computer components. These generic computer components are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Moreover, the additional element of “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 5 is directed to a system for a machine learning architecture, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:

[Claim 5 limitations: equations for updating the meta-weights (images not reproduced)]

as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass updating the meta-weights based on an equation (mathematical equation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for”, as drafted, recite generic computer components. These generic computer components are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Moreover, the additional element of “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 6 is directed to a system for a machine learning architecture, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
applying new inputs to the machine learning architecture with the updated weights.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass applying new inputs to the machine learning architecture with the updated weights (corresponds to evaluation of input; also see analysis above for claim 1 regarding “machine learning architecture”).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for”, as drafted, recite generic computer components. These generic computer components are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Moreover, the additional element of “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 7 is directed to a system for a machine learning architecture, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The claim, as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”) and language generally linking the use of a judicial exception to a particular technological environment or field of use (“wherein the machine learning architecture is at least a portion of a reinforcement learning system”). Also see analysis of claim 1 above.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for”, as drafted, recite generic computer components. These generic computer components are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Moreover, the additional element of “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Further, the recitation of “wherein the machine learning architecture is at least a portion of a reinforcement learning system” amounts to language generally linking the use of a judicial exception to a particular technological environment or field of use, namely the environment of a reinforcement learning system. MPEP 2106.05(h) provides that “limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.” Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the claim contains additional elements that amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “wherein the machine learning architecture is at least a portion of a reinforcement learning system” amounts to language generally linking the use of a judicial exception to a particular technological environment or field of use, namely the environment of a reinforcement learning system. MPEP 2106.05(h) provides that “limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.”  Therefore, the claim is not patent eligible.
Regarding Claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a system for a machine learning architecture, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the value function is at least a component of a representation of a position of a robot component
wherein the value function is an input to a...system
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”) and language generally linking the use of a judicial exception to a particular technological environment or field of use (“robot control system”). The above limitations in the context of this claim encompass a value function that represents a position of a robot component (corresponds to evaluation with the assistance of pen and paper) and a value function that is an input to a...system (corresponds to evaluation with the assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for”, as drafted, recite generic computer components. These generic computer components are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Moreover, the additional element of “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Further, the recitation of “robot control system” amounts to language generally linking the use of a judicial exception to a particular technological environment or field of use, namely the environment of a robot control system. MPEP 2106.05(h) provides that “limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.” Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the claim contains additional elements that amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “robot control system” amounts to language generally linking the use of a judicial exception to a particular technological environment or field of use, namely the environment of a robot control system. MPEP 2106.05(h) provides that “limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.” Therefore, the claim is not patent eligible.
Regarding Claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to a system for a machine learning architecture, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
[Claim 9 limitations, reproduced as an image in the original action]
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass initializing vectors (corresponds to evaluation of input); determining values for variables based on mathematical calculations (corresponds to evaluation based on mathematical calculations); and updating a state based on observation data (corresponds to observation, evaluation, and judgment).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for”, as drafted, recite generic computer components. These generic computer components are recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using generic computer components. Moreover, the additional element “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 10 is directed to a system for a machine learning architecture, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
[Claim 10 limitations, reproduced as two images in the original action]
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass initializing vectors (corresponds to evaluation of input); determining values for variables based on mathematical calculations (corresponds to evaluation based on mathematical calculations); and updating a state based on observation data (corresponds to observation, evaluation, and judgment).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “the system comprising: at least one memory and at least one processor configured” and “the at least one processor configured for”, as drafted, recite generic computer components. These generic computer components are recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using generic computer components. Moreover, the additional element “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 11,
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 11 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
a machine learning architecture for estimating a value function for at least a portion of a given state, the value function defined at least in part on a plurality of weights
receiving one or more observation data sets representing one or more observations associated with at least a portion of a state of an environment in which an agent operates
training the machine learning architecture with the one or more observation data sets, where the training includes updating the plurality of weights based on: and error value and at least one time-varying step-size value; wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass a machine learning architecture for estimating a value function for at least a portion of a given state, the value function defined at least in part on a plurality of weights (corresponds to observation, evaluation, and judgment with the assistance of pen and paper; the broadest reasonable interpretation of “machine learning architecture” would include machine learning models, since the Specification points to “architectures involving reinforcement learning (RL)” as an example of a “machine learning architecture” (Specification [0003]); a human mind, with the assistance of pen and paper, is able to implement a reinforcement learning model (a type of machine learning architecture) to estimate a value function, since doing so entails observing and evaluating an agent taking actions in an environment, interpreting (evaluating) the action and the resulting state, evaluating/judging the reward corresponding to the agent’s action, and evaluating a value function based on the actions, states, and rewards); receiving one or more observation data sets representing one or more observations associated with at least a portion of a state of an environment in which an agent operates (corresponds to observation); and training the machine learning architecture with the one or more observation data sets, where the training includes updating the plurality of weights based on: and error value and at least one time-varying step-size value; wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights (corresponds to mathematical calculations, as the claim specifically identifies that “training the machine learning architecture” involves updating weights based on error values (mathematical calculations) and determining the step-size value based on meta-weights using stochastic meta-descent (mathematical calculations)).
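For illustration only, the following Python sketch shows one way an update of the general kind recited in claim 11 could be computed: a TD(0) error value, per-weight time-varying step sizes α = exp(β) derived from meta-weights β, a stochastic meta-descent step on β, and a trace h of past weight updates. It assumes a linear value function and follows the general form of Sutton's Incremental Delta-Bar-Delta (IDBD) algorithm; the function name `idbd_td0_step` and all of its parameters are hypothetical, and the sketch does not reproduce the equations of the present claims or Specification (which appear in the claims only as images). The `semi_gradient` flag is likewise an assumption, included to illustrate the full-gradient/semi-gradient distinction drawn in claims 12 and 13.

```python
import numpy as np


def idbd_td0_step(w, beta, h, phi, phi_next, reward, gamma, theta, semi_gradient=True):
    """Hypothetical IDBD-style TD(0) update for a linear value function (illustrative only).

    w        -- value-function weights
    beta     -- meta-weights; per-weight step sizes are alpha = exp(beta)
    h        -- trace of past updates to the weights
    phi      -- feature vector of the current state
    phi_next -- feature vector of the next state
    theta    -- meta step size for the stochastic meta-descent
    """
    # TD error: the "error value" used to update the weights.
    delta = reward + gamma * (w @ phi_next) - w @ phi
    # Direction of the meta-gradient: the semi-gradient treats the bootstrapped
    # target as a constant; the full gradient also differentiates through it.
    grad_dir = phi if semi_gradient else phi - gamma * phi_next
    # Stochastic meta-descent step on the meta-weights, using the trace h.
    beta = beta + theta * delta * grad_dir * h
    # Time-varying, per-weight step sizes.
    alpha = np.exp(beta)
    # Weight update with the per-weight step sizes.
    w = w + alpha * delta * phi
    # Decay the trace (clipped at zero, as in IDBD) and add the latest update.
    h = h * np.clip(1.0 - alpha * phi * phi, 0.0, None) + alpha * delta * phi
    return w, beta, h, delta


# Usage: two updates on the same transition, starting from step sizes of 0.1.
w, beta, h = np.zeros(3), np.full(3, np.log(0.1)), np.zeros(3)
phi, phi_next = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
for _ in range(2):
    w, beta, h, delta = idbd_td0_step(w, beta, h, phi, phi_next, reward=1.0, gamma=0.9, theta=0.01)
```

Each call performs a single update from one observed transition. Because the trace h starts at zero, the first call leaves the meta-weights unchanged; the step size for an active feature begins to adapt only from the second call onward.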
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 12,
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 12 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the stochastic meta-descent is defined using a full gradient.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass performing stochastic meta-descent with a full gradient (corresponds to mathematical calculations).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 13,
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 13 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the stochastic meta-descent is defined using a semi-gradient.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass performing stochastic meta-descent with a semi-gradient (corresponds to mathematical calculations).
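For illustration only, and assuming a linear value estimate $\hat{v}(s) = \mathbf{w}^\top \boldsymbol{\phi}(s)$ with a one-step TD error (an assumption; neither claim 12 nor claim 13 recites this form), the distinction between a full gradient and a semi-gradient can be seen in the gradient of the squared TD error with respect to the weights:

$$
\begin{aligned}
\delta_t &= R_{t+1} + \gamma\,\mathbf{w}^\top\boldsymbol{\phi}_{t+1} - \mathbf{w}^\top\boldsymbol{\phi}_t \\
\text{full gradient:}\quad \nabla_{\mathbf{w}}\,\tfrac{1}{2}\delta_t^2 &= \delta_t\,\bigl(\gamma\,\boldsymbol{\phi}_{t+1} - \boldsymbol{\phi}_t\bigr) \\
\text{semi-gradient:}\quad \nabla_{\mathbf{w}}\,\tfrac{1}{2}\delta_t^2 &\approx -\,\delta_t\,\boldsymbol{\phi}_t
\end{aligned}
$$

The semi-gradient form drops the $\gamma\,\boldsymbol{\phi}_{t+1}$ term because the bootstrapped target is treated as a constant; in a meta-descent scheme, whichever form is chosen would then determine the direction in which the meta-weights are adjusted.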
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 14,
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 14 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
[Claim 14 limitations, reproduced as an image in the original action]
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass updating the meta-weights based on an equation (corresponds to a mathematical equation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 15,
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 15 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
[Claim 15 limitations, reproduced as two images in the original action]
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass updating the meta-weights based on an equation (corresponds to a mathematical equation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 16,
Claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 16 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
comprising: applying new inputs to the machine learning architecture with the updated weights
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass applying new inputs to the machine learning architecture with the updated weights (corresponds to evaluation of input; also see analysis above for claim 11 regarding “machine learning architecture”).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.


Regarding Claim 17,
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 17 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim, as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”) and the recitation of language generally linking the use of a judicial exception to a particular technological environment or field of use (“wherein the machine learning architecture is at least a portion of a reinforcement learning system”). Also see analysis of claim 11 above.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. The recitation of “wherein the machine learning architecture is at least a portion of a reinforcement learning system” amounts to language generally linking the use of a judicial exception to a particular technological environment or field of use, namely the environment of a reinforcement learning system. MPEP 2106.05(h) provides that “limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.” Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the claim contains additional elements that amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Likewise, as discussed above, the recitation of “wherein the machine learning architecture is at least a portion of a reinforcement learning system” amounts to language generally linking the use of a judicial exception to a particular technological environment or field of use, namely the environment of a reinforcement learning system. MPEP 2106.05(h) provides that “limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.” Therefore, the claim is not patent eligible.
Regarding Claim 18,
Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 18 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the value function is at least a component of a representation of a position of a robot component, and 
the value function is an input to a...system.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”) and the recitation of language generally linking the use of a judicial exception to a particular technological environment or field of use (“robot control system”). The above limitations in the context of this claim encompass the value function representing a position of a robot component (corresponds to evaluation with the assistance of pen and paper) and the value function serving as an input to a...system (corresponds to evaluation with the assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. The recitation of “robot control system” amounts to language generally linking the use of a judicial exception to a particular technological environment or field of use, namely the environment of a robot control system. MPEP 2106.05(h) provides that “limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.” Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the claim contains additional elements that amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Likewise, as discussed above, the recitation of “robot control system” amounts to language generally linking the use of a judicial exception to a particular technological environment or field of use, namely the environment of a robot control system. MPEP 2106.05(h) provides that “limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application.” Therefore, the claim is not patent eligible.
Regarding Claim 19,
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 19 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
[Image: claim 19 limitations]
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass initializing vectors (corresponds to evaluation of input); determining values for variables based on mathematical calculations (corresponds to evaluation based on mathematical calculations); and updating a state based on observation data (corresponds to observation, evaluation, and judgment).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 20,
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 20 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
[Images: claim 20 limitations]
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment”). The above limitations in the context of this claim encompass initializing vectors (corresponds to evaluation of input); determining values for variables based on mathematical calculations (corresponds to evaluation based on mathematical calculations); and updating a state based on observation data (corresponds to observation, evaluation, and judgment).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the abstract idea by instructing an agent to perform an action based on the abstract idea. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.
Regarding Claim 21,
Claim 21 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 21 is directed to a device or system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
determining a potential action in an observed environment 
provide a machine learning architecture for estimating a value function for at least a portion of a given state of the observed environment in which an agent operates
generate one or more signals for communicating or causing the potential action to be taken based on the value function
wherein the machine learning architecture was trained based on updating the plurality of weights in the machine learning architecture based on: an error value and at least one time-varying step-size value; wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights 
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply the exception language (“at least one memory and at least one processor configured to”). The above limitations in the context of this claim encompass determining a potential action in an observed environment (corresponds to observation and evaluation); provide a machine learning architecture for estimating a value function for at least a portion of a given state of the observed environment in which an agent operates (corresponds to observation, evaluation, and judgment with the assistance of pen and paper; the broadest reasonable interpretation of “machine learning architecture” includes machine learning models, since the Specification points to “architectures involving reinforcement learning (RL)” as an example of a “machine learning architecture” (Specification [0003]); a human mind, with the assistance of pen and paper, is able to implement a reinforcement learning model (a type of machine learning architecture) to estimate a value function, since doing so entails observing and evaluating an agent taking actions in an environment, interpreting (evaluating) each action and the resulting state, evaluating/judging the reward corresponding to the agent’s action, and evaluating a value function based on the actions, states, and rewards); generate one or more signals for communicating or causing the potential action to be taken based on the value function (corresponds to evaluation using the value function and outputting an indication of a judgment regarding which action to take, which can be performed with the assistance of pen and paper); and wherein the machine learning architecture was trained based on updating the plurality of weights in the machine learning architecture based on: an error value and at least one time-varying step-size value; wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights (corresponds to mathematical calculations, as the claim specifies that the machine learning architecture was trained by updating weights based on error values (mathematical calculations) and on a step-size value determined from meta-weights using stochastic meta-descent (mathematical calculations)).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “at least one memory and at least one processor configured to”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7, 11, 17, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over SHIBATA (US 2018/0099408 A1) in view of Mahmood (“AUTOMATIC STEP-SIZE ADAPTATION IN INCREMENTAL SUPERVISED LEARNING”).
Regarding Claim 1,
SHIBATA teaches A system for a machine learning architecture, the system comprising: at least one memory and at least one processor configured to provide the machine learning architecture for estimating a value function for at least a portion of a given state (pg. 4 [0067]-[0068] teach using a machine learning system such as a reinforcement learning system; pg. 5 [0087] teaches “a function to approximate Q(s, a)”, which corresponds to estimating a value function for the Q-value based on a given state; pg. 4 [0070] teaches a processor; pg. 5 [0087]-[0088] teach that the neural network (machine learning architecture), which implements the approximate function for reinforcement learning, is implemented by a system with a memory), 
the value function defined at least in part on a plurality of weights (pg. 5 [0087]: “Examples of the method for expressing Q(s, a) on a computer include a method for preserving the values of all state action pairs (s, a) as a table, and a method for preparing a function to approximate Q(s, a). In the latter method, the above update expression may be achieved by adjusting a parameter of the approximate function using a method, such as stochastic gradient descent. A neural network may be used as the approximate function” teaches using a neural network to implement (define) the approximate function for Q(s, a), which corresponds to estimating the value function; pg. 5 [0089] teaches that using the neural network involves determining a plurality of weights);
the at least one processor configured for (pg. 4 [0070] teaches processor):
receiving one or more observation data sets representing one or more observations associated with at least a portion of a state of an environment in which an agent operates (pg. 5 [0074]-[0076]: “The reinforcement learning will now be described. Problems are set in reinforcement learning as follows...A processing machine observes the state of environment, and decides an action...The environment varies in accordance with some rules, and your action may vary the environment” teach that, in reinforcement learning, the processing machine receives observation data associated with the state of an environment; pg. 5 [0081]-[0082]: “The explanation of reinforcement learning will be continued below using, for example, Q-learning...Q-learning is a method for learning a value Q(s, a) at which an action a is selected under an environmental state s. In other words, it is only required that the action a having the highest value Q(s, a) is selected as an optimal action a, under a given state s. However, initially, the correct value of the value Q(s, a) for a combination of the state s and the action a is completely unknown. Then, the agent (the subject of an action) selects various actions a under a given state s, and gives rewards to the actions a at that time. Thus, the agent learns selection of a more beneficial action, i.e., the correct value Q(s, a)” teaches that, in Q-learning (a type of reinforcement learning), an agent operates in a state of the environment and performs actions; Fig. 5A provides an example of receiving input data associated with a state variable from the environment); 
training the machine learning architecture with the one or more observation data sets, where the training includes updating the plurality of weights based on: an error value (pg. 6 [0096]: “The weights W1 to W3 may be learned by an error backpropagation method. The information on errors is introduced from the right side to the left side. The error backpropagation method is a method for adjusting (learning) each weight so as to reduce a difference between the output y when the input x is input and the true output y (teacher) in each neuron” teaches that training the neural network (machine learning architecture) includes updating weights based on an error value through an error backpropagation method; Fig. 5B teaches that training of the neural network is based on input observations)...
generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment (pg. 5 [0081]-[0082]: “The explanation of reinforcement learning will be continued below using, for example, Q-learning...Q-learning is a method for learning a value Q(s, a) at which an action a is selected under an environmental state s. In other words, it is only required that the action a having the highest value Q(s, a) is selected as an optimal action a, under a given state s. However, initially, the correct value of the value Q(s, a) for a combination of the state s and the action a is completely unknown. Then, the agent (the subject of an action) selects various actions a under a given state s, and gives rewards to the actions a at that time. Thus, the agent learns selection of a more beneficial action, i.e., the correct value Q(s, a)” and pg. 5 [0087]: “Examples of the method for expressing Q(s, a) on a computer include a method for preserving the values of all state action pairs (s, a) as a table, and a method for preparing a function to approximate Q(s, a). In the latter method, the above update expression may be achieved by adjusting a parameter of the approximate function using a method, such as stochastic gradient descent. A neural network may be used as the approximate function” teach generating reward values for respective actions to instruct an agent to perform an action based on a neural network (corresponding to the machine learning architecture) and an observed state of the environment; pg. 5 [0077]: “A reward signal is returned at each action” teaches that rewards are represented by signals).
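For illustration only, the Q-learning behavior relied on in SHIBATA [0081]-[0082] and [0087] (selecting the action with the highest Q(s, a) under a given state and adjusting Q(s, a) from the reward returned for an action) can be sketched as a tabular update. This is a minimal sketch of the textbook one-step Q-learning rule, not SHIBATA's code; the step size `alpha`, the discount `gamma`, and the toy states and actions are assumptions introduced for the example.

```python
from collections import defaultdict


def greedy_action(q, state, actions):
    """Return the action with the highest Q(state, action); ties go to the first listed action."""
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Textbook one-step Q-learning update (assumed form):
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Returns the temporal-difference error used for the update."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical toy example: two states, two actions, table initialised to 0.
q = defaultdict(float)
actions = ["left", "right"]
err = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
print(err, q[("s0", "right")])          # 1.0 0.1
print(greedy_action(q, "s0", actions))  # right
```

Here the table entry plays the role of Q(s, a); replacing the table with a parameterized function trained by gradient descent corresponds to the approximate-function variant described at [0087].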
 SHIBATA does not appear to explicitly teach where the training includes updating the plurality of weights based on:...and at least one time-varying step-size value; wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights.
However, Mahmood teaches where the training includes updating the plurality of weights based on:...and at least one time-varying step-size value (pg. 38 third full paragraph: “In this chapter we have established that step-size adaptation algorithms can effectively adapt the step-size parameter on nonstationary problems. Meta-descent algorithms—IDBD, SMD, K1 and ALAP—achieved at least two times better performance than LMS” teaches meta-descent algorithms include the K1 algorithm; pg. 39 second full paragraph:
[Image: Mahmood pg. 39, second full paragraph]
teaches that training includes updating weights based on a step-size value ([image: step-size symbol]) that depends on variations in the time parameter (t)); 
wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights (pg. 13 second full paragraph: “Meta-descent algorithms iteratively update the step-size parameter based on a gradient-descent rule. A gradient-descent rule is typically used for updating the weights of the learning system. Therefore, when a gradient-descent rule is also applied to learn the step-size parameter, we refer to it as a meta-descent algorithm” teaches using stochastic meta-descent to learn step-size value and weights; pg. 15 Equation (2.2) and pg. 15 last paragraph to pg. 16:
[Images: Mahmood pg. 15, Equation (2.2), and pg. 15 last paragraph to pg. 16]
teaches that the step-size value ([image: step-size symbol]) in a meta-descent algorithm is based on a set of meta-weights and [image: gradient expression] (the gradients of the sample squared error with respect to the weight vector at two successive time steps correspond to a trace of past updates to the plurality of weights, wherein times t and t-1 are past time steps relative to time step t+1)).
SHIBATA and Mahmood are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Mahmood to the disclosed invention of SHIBATA.
One of ordinary skill in the art would have been motivated to make this modification because “Meta-descent algorithms—IDBD, SMD, K1 and ALAP—achieved at least two times better performance than LMS” (Mahmood pg. 38 third full paragraph).
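As an illustrative aid to the mapping of the meta-descent limitations above, the sketch below implements IDBD, one of the meta-descent algorithms named in Mahmood pg. 38, in its commonly published form for a linear learner. It is not a reproduction of the equations shown as images in Mahmood, and the meta step size `theta`, the initial step size of 0.05, and the toy regression target are assumptions. Each weight w_i has its own time-varying step size α_i = exp(β_i); the meta-weights β_i are adjusted by a gradient-descent (meta-descent) rule that uses h_i, a decaying trace of recent updates to w_i.

```python
import math
import random


def idbd_step(w, beta, h, x, y, theta=0.01):
    """One IDBD update for a linear predictor y_hat = w . x (sketch, assumed form).

    w    : weights
    beta : meta-weights; the per-weight step size is alpha_i = exp(beta_i)
    h    : trace of recent updates to each weight
    The lists are updated in place; the prediction error is returned.
    """
    delta = y - sum(wi * xi for wi, xi in zip(w, x))  # error value
    for i, xi in enumerate(x):
        beta[i] += theta * delta * xi * h[i]          # meta-descent step on log step size
        alpha = math.exp(beta[i])                     # time-varying step size
        w[i] += alpha * delta * xi                    # weight update
        # Decay the trace and add the latest weight update to it.
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta


# Hypothetical toy problem: learn y = 2*x0 - x1 from random inputs.
random.seed(0)
w, beta, h = [0.0, 0.0], [math.log(0.05)] * 2, [0.0, 0.0]
for _ in range(2000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    idbd_step(w, beta, h, x, 2.0 * x[0] - 1.0 * x[1])
print([round(v, 3) for v in w])  # expected to approach [2.0, -1.0]
```

Replacing `math.exp(beta[i])` with a fixed constant reduces the weight update to LMS, the fixed-step-size baseline against which Mahmood reports the meta-descent results.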
Regarding Claim 7,
SHIBATA in view of Mahmood teaches the system of claim 1.
SHIBATA further teaches wherein the machine learning architecture is at least a portion of a reinforcement learning system (pg. 5 [0087] teaches applying a neural network (machine learning architecture) as the approximate function in a reinforcement learning system, thus rendering the neural network (machine learning architecture) at least a portion of the reinforcement learning system).
Regarding Claim 11,
SHIBATA teaches A method for a machine learning architecture for estimating a value function for at least a portion of a given state (pg. 4 [0067]-[0068] teach using a machine learning system such as a reinforcement learning system; pg. 5 [0087] teaches “a function to approximate Q(s, a)”, which corresponds to estimating a value function for the Q-value based on a given state; pg. 4 [0070] teaches a processor; pg. 5 [0087]-[0088] teach that the neural network (machine learning architecture), which implements the approximate function for reinforcement learning, is implemented by a system with a memory), 
the value function defined at least in part on a plurality of weights, the method comprising (pg. 5 [0087]: “Examples of the method for expressing Q(s, a) on a computer include a method for preserving the values of all state action pairs (s, a) as a table, and a method for preparing a function to approximate Q(s, a). In the latter method, the above update expression may be achieved by adjusting a parameter of the approximate function using a method, such as stochastic gradient descent. A neural network may be used as the approximate function” teaches using a neural network to implement (define) the approximate function for Q(s, a), which corresponds to estimating the value function; pg. 5 [0089] teaches that using the neural network involves determining a plurality of weights):
receiving one or more observation data sets representing one or more observations associated with at least a portion of a state of an environment in which an agent operates (pg. 5 [0074]-[0076]: “The reinforcement learning will now be described. Problems are set in reinforcement learning as follows...A processing machine observes the state of environment, and decides an action...The environment varies in accordance with some rules, and your action may vary the environment” teach that, in reinforcement learning, the processing machine receives observation data associated with the state of an environment; pg. 5 [0081]-[0082]: “The explanation of reinforcement learning will be continued below using, for example, Q-learning...Q-learning is a method for learning a value Q(s, a) at which an action a is selected under an environmental state s. In other words, it is only required that the action a having the highest value Q(s, a) is selected as an optimal action a, under a given state s. However, initially, the correct value of the value Q(s, a) for a combination of the state s and the action a is completely unknown. Then, the agent (the subject of an action) selects various actions a under a given state s, and gives rewards to the actions a at that time. Thus, the agent learns selection of a more beneficial action, i.e., the correct value Q(s, a)” teaches that, in Q-learning (a type of reinforcement learning), an agent operates in a state of the environment and performs actions; Fig. 5A provides an example of receiving input data associated with a state variable from the environment); and
training the machine learning architecture with the one or more observation data sets, where the training includes updating the plurality of weights based on: and error value (pg. 5 [0087]-[0088] teach that the neural network implements the approximate function for reinforcement learning, thereby rendering training of the neural network part of the training of the reinforcement learning model; pg. 6 [0096]: “The weights W1 to W3 may be learned by an error backpropagation method. The information on errors is introduced from the right side to the left side. The error backpropagation method is a method for adjusting (learning) each weight so as to reduce a difference between the output y when the input x is input and the true output y (teacher) in each neuron” teaches that training the neural network (machine learning architecture) includes updating weights based on an error value through an error backpropagation method; Fig. 5B teaches that training of the neural network is based on input observations)...
and generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment (pg. 5 [0081]-[0082]: “The explanation of reinforcement learning will be continued below using, for example, Q-learning...Q-learning is a method for learning a value Q(s, a) at which an action a is selected under an environmental state s. In other words, it is only required that the action a having the highest value Q(s, a) is selected as an optimal action a, under a given state s. However, initially, the correct value of the value Q(s, a) for a combination of the state s and the action a is completely unknown. Then, the agent (the subject of an action) selects various actions a under a given state s, and gives rewards to the actions a at that time. Thus, the agent learns selection of a more beneficial action, i.e., the correct value Q(s, a)” and pg. 5 [0087]: “Examples of the method for expressing Q(s, a) on a computer include a method for preserving the values of all state action pairs (s, a) as a table, and a method for preparing a function to approximate Q(s, a). In the latter method, the above update expression may be achieved by adjusting a parameter of the approximate function using a method, such as stochastic gradient descent. A neural network may be used as the approximate function” teach generating reward values for respective actions to instruct an agent to perform an action based on a neural network (corresponding to the machine learning architecture) and an observed state of the environment; pg. 5 [0077]: “A reward signal is returned at each action” teaches that rewards are represented by signals).
SHIBATA does not appear to explicitly teach where the training includes updating the plurality of weights based on:...and at least one time-varying step-size value; wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights.
However, Mahmood teaches where the training includes updating the plurality of weights based on:...and at least one time-varying step-size value (pg. 38 third full paragraph: “In this chapter we have established that step-size adaptation algorithms can effectively adapt the step-size parameter on nonstationary problems. Meta-descent algorithms—IDBD, SMD, K1 and ALAP—achieved at least two times better performance than LMS” teaches meta-descent algorithms include the K1 algorithm; pg. 39 second full paragraph:
[Image: Mahmood pg. 39, second full paragraph]
teaches that training includes updating weights based on a step-size value ([image: step-size symbol]) that depends on variations in the time parameter (t)); 
wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights (pg. 13 second full paragraph: “Meta-descent algorithms iteratively update the step-size parameter based on a gradient-descent rule. A gradient-descent rule is typically used for updating the weights of the learning system. Therefore, when a gradient-descent rule is also applied to learn the step-size parameter, we refer to it as a meta-descent algorithm” teaches using stochastic meta-descent to learn step-size value and weights; pg. 15 Equation (2.2) and pg. 15 last paragraph to pg. 16:
[Images: Mahmood pg. 15, Equation (2.2), and pg. 15 last paragraph to pg. 16]
teaches that the step-size value ([image: step-size symbol]) in a meta-descent algorithm is based on a set of meta-weights and [image: gradient expression] (the gradients of the sample squared error with respect to the weight vector at two successive time steps correspond to a trace of past updates to the plurality of weights, wherein times t and t-1 are past time steps relative to time step t+1)).
SHIBATA and Mahmood are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Mahmood to the disclosed invention of SHIBATA.
One of ordinary skill in the art would have been motivated to make this modification because “Meta-descent algorithms—IDBD, SMD, K1 and ALAP—achieved at least two times better performance than LMS” (Mahmood pg. 38 third full paragraph).
Regarding Claim 17,
SHIBATA in view of Mahmood teaches the method of claim 11.
SHIBATA further teaches wherein the machine learning architecture is at least a portion of a reinforcement learning system (pg. 5 [0087] teaches applying a neural network (machine learning architecture) as the approximate function in a reinforcement learning system, thus rendering the neural network (machine learning architecture) at least a portion of the reinforcement learning system).
Regarding Claim 21,
SHIBATA teaches A device or system for determining a potential action in an observed environment, the device or system comprising: at least one memory and at least one processor configured to provide a machine learning architecture for estimating a value function for at least a portion of a given state of the observed environment in which an agent operates (pg. 4 [0067]-[0068] teach using a machine learning system such as a reinforcement learning system; pg. 5 [0087] teaches “a function to approximate Q(s, a)”, which corresponds to estimating a value function for the Q-value based on a given state; pg. 4 [0070] teaches a processor; pg. 5 [0087]-[0088] teach that the neural network (machine learning architecture), which implements the approximate function for reinforcement learning, is implemented by a system with a memory; pg. 5 [0081]-[0082]: “The explanation of reinforcement learning will be continued below using, for example, Q-learning...Q-learning is a method for learning a value Q(s, a) at which an action a is selected under an environmental state s. In other words, it is only required that the action a having the highest value Q(s, a) is selected as an optimal action a, under a given state s. However, initially, the correct value of the value Q(s, a) for a combination of the state s and the action a is completely unknown. Then, the agent (the subject of an action) selects various actions a under a given state s, and gives rewards to the actions a at that time. Thus, the agent learns selection of a more beneficial action, i.e., the correct value Q(s, a)” teaches that, in Q-learning (a type of reinforcement learning), an agent operates in a state of the environment and performs actions), 
and generate one or more signals for communicating or causing the potential action to be taken by the agent based on the value function (pg. 5 [0082]: “Then, the agent (the subject of an action) selects various actions a under a given state s, and gives rewards to the actions a at that time. Thus, the agent learns selection of a more beneficial action, i.e., the correct value Q(s, a)” and pg. 5 [0087]: “a method for preparing a function to approximate Q(s, a)” teach determining a potential action to be taken based on Q(s, a), which is approximated by a function (corresponding to the value function); Figs. 5A-B and pg. 6 [0098] teach a robot system generating an output signal to cause the robot (agent) to take an action based on the analysis of the machine learning device);
wherein the machine learning architecture was trained based on updating the plurality of weights in the machine learning architecture based on: an error value (pg. 5 [0087]-[0088] teach that the neural network implements the approximate function for reinforcement learning, such that training the neural network is part of training the reinforcement learning model; pg. 6 [0096]: “The weights W1 to W3 may be learned by an error backpropagation method. The information on errors is introduced from the right side to the left side. The error backpropagation method is a method for adjusting (learning) each weight so as to reduce a difference between the output y when the input x is input and the true output y (teacher) in each neuron” teaches training the neural network (machine learning architecture), including updating the weights based on an error value through an error backpropagation method; Fig. 5B teaches that training of the neural network is based on input observations).
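For illustration only, the Q-learning behavior quoted from SHIBATA pg. 5 [0081]-[0082] (learning Q(s, a) from rewards and selecting the action having the highest Q(s, a) under a given state s) can be sketched with the textbook one-step tabular Q-learning rule. The table representation, the learning rate alpha, the discount factor gamma, and the two-action toy environment are assumptions and are not taken from SHIBATA, which approximates Q(s, a) with a neural network.

```python
from collections import defaultdict

def greedy_action(q, state, actions):
    """Return the action with the highest Q(s, a) under the given state s."""
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, s, a, reward, s_next, actions, alpha=0.5, gamma=0.9):
    """One-step Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]

# Hypothetical toy problem: in state 0, "right" earns reward 1 and "left" earns 0;
# state 1 is never updated, so its Q-values stay at 0.
actions = ["left", "right"]
q = defaultdict(float)  # Q(s, a) is initially unknown, so every entry starts at 0
for _ in range(20):
    for a in actions:
        r = 1.0 if a == "right" else 0.0
        q_update(q, 0, a, r, 1, actions)
```

After the repeated rewards, Q(0, "right") approaches 1 while Q(0, "left") stays at 0, so the greedy selection in state 0 returns "right".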
 SHIBATA does not appear to explicitly teach wherein the machine learning architecture was trained based on:...and at least one time-varying step-size value; wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights.
However, Mahmood teaches wherein the machine learning architecture was trained based on:...and at least one time-varying step-size value (pg. 38 third full paragraph: “In this chapter we have established that step-size adaptation algorithms can effectively adapt the step-size parameter on nonstationary problems. Meta-descent algorithms—IDBD, SMD, K1 and ALAP—achieved at least two times better performance than LMS” teaches that meta-descent algorithms include the K1 algorithm; pg. 39 second full paragraph:
[Image: excerpt from Mahmood pg. 39, second full paragraph]
teaches that training includes updating weights based on a step-size value ([equation image]) that depends on variations in the time parameter (t));
wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights (pg. 13 second full paragraph: “Meta-descent algorithms iteratively update the step-size parameter based on a gradient-descent rule. A gradient-descent rule is typically used for updating the weights of the learning system. Therefore, when a gradient-descent rule is also applied to learn the step-size parameter, we refer to it as a meta-descent algorithm” teaches using stochastic meta-descent to learn the step-size value and the weights; pg. 15 Equation (2.2) and pg. 15 last paragraph to pg. 16:
[Image: Mahmood pg. 15, Equation (2.2) and accompanying text]
[Image: Mahmood pg. 16 excerpt]
teaches that the step-size value ([equation image]) in a meta-descent algorithm is based on a set of meta-weights and [equation image] (the gradients of the sample squared error with respect to the weight vector at two successive time steps correspond to a trace of past updates to the plurality of weights, wherein time t and time t-1 are time steps in the past relative to time step t+1)).
SHIBATA and Mahmood are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Mahmood to the disclosed invention of SHIBATA.
One of ordinary skill in the arts would have been motivated to make this modification because “Meta-descent algorithms—IDBD, SMD, K1 and ALAP—achieved at least two times better performance than LMS” (Mahmood pg. 38 third full paragraph).
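As a hedged illustration of how meta-weights and a trace of past weight updates can jointly produce a time-varying step size, the sketch below implements IDBD (Sutton 1992), one of the meta-descent algorithms Mahmood names on pg. 38, for a linear predictor. It is not a reproduction of Mahmood Equation (2.2); the linear model, the meta step-size theta, the initial step size, and the synthetic target y = 2*x0 - x1 are assumptions.

```python
import math
import random

def idbd_step(w, beta, h, x, y_target, theta=0.005):
    """One IDBD update (Sutton, 1992) for a linear predictor y = sum_i w[i] * x[i].

    beta:  per-weight meta-weights; the step size is alpha_i = exp(beta_i)
    h:     per-weight trace of recent weight updates
    theta: meta step-size (assumed value)
    """
    delta = y_target - sum(wi * xi for wi, xi in zip(w, x))  # sample error
    for i, xi in enumerate(x):
        beta[i] += theta * delta * xi * h[i]   # meta-descent step on beta_i
        alpha = math.exp(beta[i])              # time-varying step size
        w[i] += alpha * delta * xi             # weight update with that step size
        # Decay the trace, then add the current weight update to it.
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta

# Hypothetical usage: learn y = 2*x0 - x1 from noiseless random samples.
random.seed(0)
w, beta, h = [0.0, 0.0], [math.log(0.05)] * 2, [0.0, 0.0]
for _ in range(3000):
    x = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    idbd_step(w, beta, h, x, 2.0 * x[0] - x[1])
```

The meta-weight beta_i grows when the current update and the trace h_i agree in sign and shrinks when they disagree, which is what makes each step size vary over time.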

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over SHIBATA (US 2018/0099408 A1) in view of Mahmood (“AUTOMATIC STEP-SIZE ADAPTATION IN INCREMENTAL SUPERVISED LEARNING”) and further in view of Schraudolph et al. (“Online Independent Component Analysis With Local Learning Rate Adaptation”).
Regarding Claim 2,
SHIBATA in view of Mahmood teaches the system of claim 1.
SHIBATA in view of Mahmood does not appear to explicitly teach wherein the stochastic meta-descent is defined using a full gradient.
However, Schraudolph et al. teaches wherein the stochastic meta-descent is defined using a full gradient (pg. 789 last full paragraph: “We apply stochastic meta-descent (SMD), a new online adaptation method for local learning rates [3, 4], to an extended Bell-Sejnowski ICA algorithm [5] with natural gradient [6] and kurtosis estimation [7] modifications. The resulting algorithm is capable of separating and tracking a time-varying mixture of 10 sources whose unknown mixing coefficients change at different rates” teaches that the stochastic meta-descent method in the disclosure is defined using the natural gradient (corresponding to a full, or ordinary, gradient)).
SHIBATA, Mahmood, and Schraudolph et al. are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the stochastic meta-descent is defined using a full gradient as taught by Schraudolph et al. to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage an algorithm that applies stochastic meta-descent to an extended Bell-Sejnowski ICA algorithm with natural gradient because “the resulting algorithm is capable of separating and tracking a time-varying mixture of 10 sources whose unknown mixing coefficients change at different rates” (Schraudolph et al. pg. 789 last full paragraph).
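A hedged sketch of stochastic meta-descent in the sense of Schraudolph's SMD (the method cited as [3, 4] in Schraudolph et al.) follows, written for a linear least-squares predictor rather than the ICA setting of the reference. For a linear model the Hessian is H = x x^T, so the trace update can use the full Hessian-vector product H v = x (x · v); IDBD, by comparison, retains only the diagonal term x_i^2. The constants mu, lam, and rho, the initial gains, and the synthetic data are assumptions, and this example does not implement the natural-gradient algorithm of Schraudolph et al.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def smd_step(w, p, v, x, y_target, mu=0.01, lam=0.9, rho=0.5):
    """One SMD step for a linear predictor with loss 0.5 * delta**2.

    p:   per-weight gains (local learning rates)
    v:   trace approximating d w / d ln(p)
    lam: trace decay; lam = 1 would track the exact derivative (0.9 is assumed)
    """
    delta = y_target - dot(w, x)
    g = [-delta * xi for xi in x]        # gradient of the loss w.r.t. w
    xv = dot(x, v)
    hv = [xi * xv for xi in x]           # full Hessian-vector product, H = x x^T
    for i in range(len(w)):
        p[i] *= max(rho, 1.0 - mu * g[i] * v[i])          # multiplicative gain update
        w[i] -= p[i] * g[i]                                # gradient step with new gain
        v[i] = lam * v[i] - p[i] * (g[i] + lam * hv[i])   # trace update using full H v
    return delta

# Hypothetical usage: learn y = 2*x0 - x1 from noiseless random samples.
random.seed(1)
w, p, v = [0.0, 0.0], [0.05, 0.05], [0.0, 0.0]
for _ in range(3000):
    x = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    smd_step(w, p, v, x, 2.0 * x[0] - x[1])
```

The max(rho, ...) guard keeps any single step from shrinking a gain by more than the factor rho.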
Regarding Claim 12,
SHIBATA in view of Mahmood teaches the method of claim 11.
SHIBATA in view of Mahmood does not appear to explicitly teach wherein the stochastic meta-descent is defined using a full gradient.
However, Schraudolph et al. teaches wherein the stochastic meta-descent is defined using a full gradient (pg. 789 last full paragraph: “We apply stochastic meta-descent (SMD), a new online adaptation method for local learning rates [3, 4], to an extended Bell-Sejnowski ICA algorithm [5] with natural gradient [6] and kurtosis estimation [7] modifications. The resulting algorithm is capable of separating and tracking a time-varying mixture of 10 sources whose unknown mixing coefficients change at different rates” teaches that the stochastic meta-descent method in the disclosure is defined using the natural gradient (corresponding to a full, or ordinary, gradient)).
SHIBATA, Mahmood, and Schraudolph et al. are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the stochastic meta-descent is defined using a full gradient as taught by Schraudolph et al. to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage an algorithm that applies stochastic meta-descent to an extended Bell-Sejnowski ICA algorithm with natural gradient because “the resulting algorithm is capable of separating and tracking a time-varying mixture of 10 sources whose unknown mixing coefficients change at different rates” (Schraudolph et al. pg. 789 last full paragraph).

Claims 3-5 and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over SHIBATA (US 2018/0099408 A1) in view of Mahmood (“AUTOMATIC STEP-SIZE ADAPTATION IN INCREMENTAL SUPERVISED LEARNING”) and further in view of Sutton (“Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta”).
Regarding Claim 3,
SHIBATA in view of Mahmood teaches the system of claim 1.
SHIBATA in view of Mahmood does not appear to explicitly teach wherein the stochastic meta-descent is defined using a semi-gradient.
However, Sutton teaches wherein the stochastic meta-descent is defined using a semi-gradient (pg. 174 last full paragraph and Equation (9) to pg. 175 and Equation (10):
[Image: Sutton pg. 174, Equation (9) and accompanying text]
[Image: Sutton pg. 175, Equation (10) and accompanying text]
teaches deriving the IDBD algorithm as gradient descent in which a meta step-size is involved (thus rendering the process a stochastic meta-descent method); Equation (10) teaches an approximated (semi) gradient).
SHIBATA, Mahmood, and Sutton are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the stochastic meta-descent is defined using a semi-gradient as taught by Sutton to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage the analysis of deriving the Incremental Delta-Bar-Delta (IDBD) algorithm as gradient descent because such an “analysis refines previous analyses by improving certain approximations and by being applicable to incremental training” (Sutton pg. 175 Conclusion).
Regarding Claim 4,
SHIBATA in view of Mahmood teaches the system of claim 1.
SHIBATA in view of Mahmood does not appear to explicitly teach
[Image: claim 4 limitation]
However, Sutton teaches
[Image: claim 4 limitation]
(pg. 174 last full paragraph and Equation (9):
[Image: Sutton pg. 174, Equation (9) and accompanying text]
teaches updating the meta-weights based on Equation (9); pg. 175 first paragraph teaches that β_i(t) represents a meta-weight and [equation image] represents the meta step-size; pg. 174 third full paragraph teaches that [equation image] represents the sample error (difference) with respect to temporal element t).
SHIBATA, Mahmood, and Sutton are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate above limitation(s) as taught by Sutton to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage the analysis of deriving the Incremental Delta-Bar-Delta (IDBD) algorithm as gradient descent because such an “analysis refines previous analyses by improving certain approximations and by being applicable to incremental training” (Sutton pg. 175 Conclusion).
Regarding Claim 5,
SHIBATA in view of Mahmood teaches the system of claim 1.
SHIBATA in view of Mahmood does not appear to explicitly teach
[Image: claim 5 limitations]
However, Sutton teaches
[Image: claim 5 limitations]
(pg. 175 first paragraph and Equation (10):
[Image: Sutton pg. 175, Equation (10) and accompanying text]
teaches updating the meta-weights based on Equation (10); pg. 175 first paragraph teaches that β_i(t) represents a meta-weight and [equation image] represents the meta step-size; pg. 174 third full paragraph teaches that [equation image] represents the sample error (difference) with respect to temporal element t; pg. 174 fourth full paragraph teaches w_i(t) as weight i at time t).
SHIBATA, Mahmood, and Sutton are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate above limitation(s) as taught by Sutton to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage the analysis of deriving the Incremental Delta-Bar-Delta (IDBD) algorithm as gradient descent because such an “analysis refines previous analyses by improving certain approximations and by being applicable to incremental training” (Sutton pg. 175 Conclusion).
Regarding Claim 13,
SHIBATA in view of Mahmood teaches the method of claim 11.
SHIBATA in view of Mahmood does not appear to explicitly teach wherein the stochastic meta-descent is defined using a semi-gradient.
However, Sutton teaches wherein the stochastic meta-descent is defined using a semi-gradient (pg. 174 last full paragraph and Equation (9) to pg. 175 and Equation (10):
[Image: Sutton pg. 174, Equation (9) and accompanying text]
[Image: Sutton pg. 175, Equation (10) and accompanying text]
teaches deriving the IDBD algorithm as gradient descent in which a meta step-size is involved (thus rendering the process a stochastic meta-descent method); Equation (10) teaches an approximated (semi) gradient).
SHIBATA, Mahmood, and Sutton are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the stochastic meta-descent is defined using a semi-gradient as taught by Sutton to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage the analysis of deriving the Incremental Delta-Bar-Delta (IDBD) algorithm as gradient descent because such an “analysis refines previous analyses by improving certain approximations and by being applicable to incremental training” (Sutton pg. 175 Conclusion).


Regarding Claim 14,
SHIBATA in view of Mahmood teaches the system of claim 11. [Interpreted as “method of claim 11”, see 35 U.S.C. 112(b) rejection]
SHIBATA in view of Mahmood does not appear to explicitly teach
[Image: claim 14 limitation]
However, Sutton teaches
[Image: claim 14 limitation]
(pg. 174 last full paragraph and Equation (9):
[Image: Sutton pg. 174, Equation (9) and accompanying text]
teaches updating the meta-weights based on Equation (9); pg. 175 first paragraph teaches that β_i(t) represents a meta-weight and [equation image] represents the meta step-size; pg. 174 third full paragraph teaches that [equation image] represents the sample error (difference) with respect to temporal element t).
SHIBATA, Mahmood, and Sutton are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate above limitation(s) as taught by Sutton to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage the analysis of deriving the Incremental Delta-Bar-Delta (IDBD) algorithm as gradient descent because such an “analysis refines previous analyses by improving certain approximations and by being applicable to incremental training” (Sutton pg. 175 Conclusion).
Regarding Claim 15,
SHIBATA in view of Mahmood teaches the method of claim 11.
SHIBATA in view of Mahmood does not appear to explicitly teach
[Image: claim 15 limitations]
However, Sutton teaches
[Image: claim 15 limitations]
(pg. 175 first paragraph and Equation (10):
[Image: Sutton pg. 175, Equation (10) and accompanying text]
teaches updating the meta-weights based on Equation (10); pg. 175 first paragraph teaches that β_i(t) represents a meta-weight and [equation image] represents the meta step-size; pg. 174 third full paragraph teaches that [equation image] represents the sample error (difference) with respect to temporal element t; pg. 174 fourth full paragraph teaches w_i(t) as weight i at time t).
SHIBATA, Mahmood, and Sutton are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate above limitation(s) as taught by Sutton to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage the analysis of deriving the Incremental Delta-Bar-Delta (IDBD) algorithm as gradient descent because such an “analysis refines previous analyses by improving certain approximations and by being applicable to incremental training” (Sutton pg. 175 Conclusion).

Claims 6, 8, 16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over SHIBATA (US 2018/0099408 A1) in view of Mahmood (“AUTOMATIC STEP-SIZE ADAPTATION IN INCREMENTAL SUPERVISED LEARNING”) and further in view of Miljković et al. (“Neural network Reinforcement Learning for visual control of robot manipulators”).
Regarding Claim 6,
SHIBATA in view of Mahmood teaches the system of claim 1.
SHIBATA further teaches wherein the at least one processor is configured for (pg. 4 [0070] teaches a processor).
SHIBATA in view of Mahmood does not appear to explicitly teach applying new inputs to the machine learning architecture with the updated weights.
However, Miljković et al. teaches applying new inputs to the machine learning architecture with the updated weights (pg. 1724 Section 4.3: “That error, along with the learning rate, is used to update the weights of the neural network in the backward phase with a signal propagated from the output to the input layer” teaches that the neural network (machine learning architecture) has updated weights; pg. 1725 Algorithm 1 teaches that the neural network is trained iteratively with new learning samples (new inputs), such that the updated neural network continues to be trained with new learning samples).
SHIBATA, Mahmood, and Miljković et al. are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate applying new inputs to the machine learning architecture with the updated weights as taught by Miljković et al. to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification to leverage the teaching that “a database of representative learning samples is employed in the hybrid control scheme. Because the training of the neural network includes all the samples in the database, the Q-value is updated several times in each interaction with the environment” in order to “speed up the convergence of the algorithms” (Miljković et al. pg. 1735 Section 8).
Regarding Claim 8,
SHIBATA in view of Mahmood teaches the system of claim 1.
SHIBATA in view of Mahmood does not appear to explicitly teach wherein the value function is at least a component of a representation of a position of a robot component, and wherein the value function is an input to a robot control system.
However, Miljković et al. teaches wherein the value function is at least a component of a representation of a position of a robot component, and wherein the value function is an input to a robot control system (pg. 1725 last full paragraph: “Starting from an arbitrarily pose in the structured environment, the robot manipulator is to move to the desired pose as indicated by the prerecorded target image, taken at its desired position and orientation. Firstly, image in the current pose is acquired and the feature points are extracted. Then, the current world state is obtained regarding the position of the feature in the image. If the feature is in the first area of interest, the approaching (second) step is conducted. In the other case (feature is not in the first area of interest) the correction in the robot pose is carried out by choosing an optimal action with the neural network Reinforcement Learning controller” teaches that the goal of the neural network based reinforcement learning system is to move the robot to a desired position by analyzing its current position, in which the state/action information describes the position/pose of the robot; pg. 1724 Section 4.3 Equation 13 provides the value function for producing the Q-value Q(s,a), including state and action information, for reaching the goal of the system (desired position) in the neural network reinforcement learning system, which is input to the robot manipulator (robot control system), see Figs. 2 and 3).
SHIBATA, Mahmood, and Miljković et al. are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the value function is at least a component of a representation of a position of a robot component, and wherein the value function is an input to a robot control system as taught by Miljković et al. to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification to leverage the teaching that “a database of representative learning samples is employed in the hybrid control scheme. Because the training of the neural network includes all the samples in the database, the Q-value is updated several times in each interaction with the environment” in order to “speed up the convergence of the algorithms” (Miljković et al. pg. 1735 Section 8).
Regarding Claim 16,
SHIBATA in view of Mahmood teaches the method of claim 11.
SHIBATA in view of Mahmood does not appear to explicitly teach comprising: applying new inputs to the machine learning architecture with the updated weights.
However, Miljković et al. teaches comprising: applying new inputs to the machine learning architecture with the updated weights (pg. 1724 Section 4.3: “That error, along with the learning rate, is used to update the weights of the neural network in the backward phase with a signal propagated from the output to the input layer” teaches that the neural network (machine learning architecture) has updated weights; pg. 1725 Algorithm 1 teaches that the neural network is trained iteratively with new learning samples (new inputs), such that the updated neural network continues to be trained with new learning samples).
SHIBATA, Mahmood, and Miljković et al. are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate comprising: applying new inputs to the machine learning architecture with the updated weights as taught by Miljković et al. to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification to leverage the teaching that “a database of representative learning samples is employed in the hybrid control scheme. Because the training of the neural network includes all the samples in the database, the Q-value is updated several times in each interaction with the environment” in order to “speed up the convergence of the algorithms” (Miljković et al. pg. 1735 Section 8).
Regarding Claim 18,
SHIBATA in view of Mahmood teaches the method of claim 11.
SHIBATA in view of Mahmood does not appear to explicitly teach wherein the value function is at least a component of a representation of a position of a robot component, and the value function is an input to a robot control system.
However, Miljković et al. teaches wherein the value function is at least a component of a representation of a position of a robot component, and the value function is an input to a robot control system (pg. 1725 last full paragraph: “Starting from an arbitrarily pose in the structured environment, the robot manipulator is to move to the desired pose as indicated by the prerecorded target image, taken at its desired position and orientation. Firstly, image in the current pose is acquired and the feature points are extracted. Then, the current world state is obtained regarding the position of the feature in the image. If the feature is in the first area of interest, the approaching (second) step is conducted. In the other case (feature is not in the first area of interest) the correction in the robot pose is carried out by choosing an optimal action with the neural network Reinforcement Learning controller” teaches that the goal of the neural network based reinforcement learning system is to move the robot to a desired position by analyzing its current position, in which the state/action information describes the position/pose of the robot; pg. 1724 Section 4.3 Equation 13 provides the value function for producing the Q-value Q(s,a), including state and action information, for reaching the goal of the system (desired position) in the neural network reinforcement learning system, which is input to the robot manipulator (robot control system), see Figs. 2 and 3).
SHIBATA, Mahmood, and Miljković et al. are analogous art to the claimed invention because they are directed to analyzing weights in machine learning based systems.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the value function is at least a component of a representation of a position of a robot component, and the value function is an input to a robot control system as taught by Miljković et al. to the disclosed invention of SHIBATA in view of Mahmood.
One of ordinary skill in the arts would have been motivated to make this modification to leverage the teaching that “a database of representative learning samples is employed in the hybrid control scheme. Because the training of the neural network includes all the samples in the database, the Q-value is updated several times in each interaction with the environment” in order to “speed up the convergence of the algorithms” (Miljković et al. pg. 1735 Section 8).



Response to Arguments
Applicant's arguments filed on 02/28/2022 with respect to the 35 U.S.C. 103 rejection to independent claims 1, 11, and 21 have been fully considered but they are not persuasive. Applicant asserts “the independent claims have been amended to incorporate some of the allowable features of claims 9, 10, 19 or 20. For example, representative claim 1 recites that the set of meta-weights "vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights". These features are not taught or suggested by the cited references” (Remarks, pg. 10).
Examiner’s Response:
The Examiner respectfully disagrees. First, Examiner notes the previous Office Action indicated the prior arts of record, either alone or in combination, do not teach or suggest the limitations in claims 9, 10, 19, and 20. However, amended independent claims 1, 11, and 21 do not respectively incorporate all of the limitations of claims 9, 10, 19, or 20. Second, Applicant asserts that “claim 1 recites that the set of meta-weights "vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights"” and that the prior arts do not teach this feature. Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. As discussed above in the current 35 U.S.C. 103 rejection, Mahmood teaches wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights (pg. 13 second full paragraph: “Meta-descent algorithms iteratively update the step-size parameter based on a gradient-descent rule. A gradient-descent rule is typically used for updating the weights of the learning system. Therefore, when a gradient-descent rule is also applied to learn the step-size parameter, we refer to it as a meta-descent algorithm” teaches using stochastic meta-descent to learn step-size value and weights; pg. 15 Equation (2.2) and pg. 15 last paragraph to pg. 16:

[Mahmood, pg. 15, Equation (2.2) and pg. 15 last paragraph to pg. 16; excerpt reproduced as images in the original]

teaches that the step-size value ([step-size symbol; image in the original]) in a meta-descent algorithm is based on a set of meta-weights and [expression; image in the original] (the gradients of the sample squared error with respect to the weight vector at two successive time steps correspond to a trace of past updates to the plurality of weights, wherein times t and t-1 are time steps in the past relative to time step t+1)). Therefore, the 35 U.S.C. 103 rejection of claims 1, 11, and 21 is maintained.

Applicant's arguments filed on 02/28/2022 with respect to the 35 U.S.C. 101 rejection of claims 1-21 have been fully considered but they are not persuasive. 
Applicant asserts “aspects of the present claims provide methods and systems which receive observation data which can be used to define the state of an environment in which a machine learning agent operates, and based on a robust machine learning architecture, generates signals for instructing the agent to perform an action based on an observed state of the environment. These features make no sense outside of the context of computers of computer-instructable agents” (Remarks, pg. 10).
Examiner’s Response:
The Examiner respectfully disagrees. First, Applicant has not identified which limitations in the claimed invention would render the claims eligible, nor has Applicant set forth specific arguments regarding the 35 U.S.C. 101 rejection. Second, the independent claims do not recite “computer-instructable agents.” Representative claim 1 recites “agent” in two limitations: “receiving one or more observation data sets representing one or more observations associated with at least a portion of a state of an environment in which an agent operates” and “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” (emphasis added). Here, it is clear that the claim requires only an “agent,” not a “computer-instructable agent.” Further, the Specification does not provide a special definition requiring that an “agent” be a “computer-instructable agent”; therefore, the broadest reasonable interpretation of the term “agent” is its ordinary dictionary meaning. 
Moreover, as noted in the rejection, the limitation “receiving one or more observation data sets representing one or more observations associated with at least a portion of a state of an environment in which an agent operates” is directed to making observations of data associated with at least a portion of a state of an environment in which an agent operates, which corresponds to a human observing the operations of an agent (observation is a mental process). See MPEP 2106.04(a). Further, as noted in the rejection, the limitation “generating signals for instructing the agent to perform an action based on the machine learning architecture and an observed state of the environment” amounts to mere instructions to apply the judicial exception (i.e., “apply it” language), which does not integrate the judicial exception into a practical application or amount to significantly more. See MPEP 2106.05(f).

Applicant asserts “As described for example at pages 9-11 of the present application, the claimed training mechanism provides a robust machine learning architecture which can in some instances have a performance which is less sensitive to a user defined input. In some situations, this may provide an improvement over previous techniques similar to the Method of Training a Neural Network in Example 39 of the USPTO Subject Matter Eligibility Examples. Accordingly, the current claims are not directed to a judicial exception, and would furthermore provide integration of any alleged judicial exception into a practical application. For example, the present application describes examples of instructing agents such as the control of bionic limbs” (Remarks, pg. 10).
Examiner’s Response:
The Examiner respectfully disagrees. First, Applicant has not clearly identified which limitation(s) in the claimed invention are relied upon in the arguments. Second, assuming the asserted “claimed training mechanism” is directed to the amended limitation of “training the machine learning architecture with the one or more observation data sets, where the training includes updating the plurality of weights based on: an error value and at least one time-varying step-size value; wherein the at least one time-varying step-size value is based on a set of meta-weights which vary based on a stochastic meta-descent and a trace of past updates to the plurality of weights” in claim 1, this limitation does not amount to an improvement in the functioning of a computer or to any other technology. The “training...” limitation specifically identifies that “training the machine learning architecture” includes updating weights based on error values (mathematical calculations) and determining step-size value based on meta-weights based on a stochastic meta-descent and a trace of past updates to the plurality of weights (mathematical calculations). The Specification in paragraphs [0056]-[0058] further supports that the training by updating weights in accordance with the claimed features amounts to mathematical calculations. As the “training...” limitation is directed to mathematical calculations, which is an abstract idea, this limitation is not an additional element directed to any alleged improvement. See MPEP 2106.05(a) (“It is important to note, the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements”). 
Moreover, the presently claimed invention is not analogous to Example 39 because the present claims do not recite a “neural network” or the training of a “neural network.” Amended claim 1 recites a “machine learning architecture,” for which the Specification points to “architectures involving reinforcement learning (RL)” as an example (Specification [0003]). A reinforcement learning model involves observing and evaluating an agent taking actions in an environment, interpreting (evaluating) the action and the resulting state, evaluating or judging the reward corresponding to the agent’s action, and evaluating a value function based on the actions, states, and rewards, which can be analogized to the process of observation, evaluation, and judgment. As discussed above, the claimed training of a “machine learning architecture” (such as a reinforcement learning model) amounts to steps of updating weights in the form of mathematical calculations. Although the Specification contains examples of the machine learning architecture being a neural network (see Specification [00127]), the claims do not require this element. Therefore, Applicant’s argument is not persuasive because the features upon which Applicant relies (i.e., training a neural network using the claimed mechanism) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
In addition, Applicant asserts that “instructing agents such as the control of bionic limbs” would “provide integration of any alleged judicial exception into a practical application”, but does not point out which claim limitation is directed to the alleged features. The claims only recite “agent” and not “agents such as the control of bionic limbs”. Therefore, Applicant’s argument is not persuasive because the features upon which Applicant relies (i.e., “instructing agents such as the control of bionic limbs”) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

Applicant's arguments filed on 02/28/2022 with respect to the 35 U.S.C. 112(b) rejection of claims 9, 10, 14, 19, and 20 have been fully considered but they are not persuasive. 
Regarding claim 14, although the previous ground of 35 U.S.C. 112(b) rejection has been addressed, amendment to claim 14 necessitated a new ground of 35 U.S.C. 112(b) rejection. Please see the current rejection for more information. 
Regarding claims 9, 10, 19, and 20, Applicant asserts that “Claims 9, 10, 19, and 20 have been amended to clarify symbols in the equations” (Remarks, pg. 9).
Examiner’s Response:
Amendments to claims 9, 10, 19, and 20 provide explanations for some of the claimed symbols, but the following symbol remains unexplained: “H_i”. Therefore, amended claims 9, 10, 19, and 20 lack clarity because they recite algorithms without explaining what the symbol “H_i” in the algorithms represents, and one of ordinary skill in the art would not be able to ascertain the metes and bounds of the claims.






Prior Art
The prior art of record, either alone or in combination, does not teach or suggest the limitations in claims 9, 10, 19, and 20. These claims remain rejected under 35 U.S.C. 101 and 35 U.S.C. 112(b), as noted above.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484. The examiner can normally be reached Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YING YU CHEN/Examiner, Art Unit 2125