DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on November 08, 2019. 
Claims 1-20 are pending. 
Drawings
The drawings filed on November 08, 2019 are accepted. 

Information Disclosure Statement
The information disclosure statement (IDS) submitted on November 8, 2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Interpretation
“A computer program product including one or more computer readable storage mediums…” as recited in independent claim 16 and dependent claims 17-20, is interpreted to be non-transitory, as mentioned by Paragraph [0080] of the Specification below: 
“A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.”

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding Claim 1, 
Claim 1 recites “the vector”. This recitation lacks clarity because it is unclear whether “the vector” refers to “a vector” that is updated using a reward of a subsequent time step and the eligibility trace of the current time step, or if “the vector” refers to “a feature vector” that is an encoded representation of a state. For purposes of examination, “the vector” will be interpreted as referring to “a vector” that is updated using a reward of a subsequent time step and the eligibility trace of the current time step. 

Regarding Claim 2, 
Claim 2 recites “the reward”. Claim 2 depends on Claim 1, which recites “a cumulative reward” and “a reward”. It is unclear whether “the reward” recited in claim 2 refers to a reward or cumulative reward of claim 1. Therefore, this limitation lacks clarity because it is unclear which reward “the reward” refers to. For purposes of examination “the reward” will be interpreted as referring to “a reward” of claim 1 (not the cumulative reward). 

Regarding Claim 3, 
Claim 3 recites “estimating the cumulative reward by using a feature vector of a target time step of a target system of the target system type, the matrix, and the vector”. This limitation lacks clarity because it is unclear whether “a target system of the target system type” refers to the target time step or the feature vector. Additionally, it is unclear whether the cumulative reward is estimated by using: 
a feature vector,
a target time step,
a target system of the target system type,
the matrix, and
the vector
or if the cumulative reward is estimated by using: 
a feature vector (of a target time step of a target system of the target system type)
the matrix
the vector
For purposes of examination, the cumulative reward will be estimated as the latter, by using a feature vector, the matrix, and the vector. 

Claim 3 recites “the vector”. Claim 3 depends on Claim 1, which recites “a vector” and “a feature vector”. It is unclear whether “the vector” recited in claim 3 refers to a vector or a feature vector of claim 1. Therefore, this limitation lacks clarity because it is unclear which vector “the vector” refers to. For purposes of examination “the vector” will be interpreted as referring to “a vector” of claim 1 (not the feature vector).

Regarding Claim 5, 
Claim 5 recites “the reward”. Claim 5 depends on Claim 1, which recites “a cumulative reward” and “a reward”. It is unclear whether “the reward” recited in claim 5 refers to the reward or cumulative reward of claim 1. Therefore, this limitation lacks clarity because it is unclear which reward “the reward” refers to. For purposes of examination “the reward” will be interpreted as referring to “a reward” of claim 1 (not the cumulative reward).

Claim 5 recites “the vector”. Claim 5 depends on Claim 1, which recites “a vector” and “a feature vector”. It is unclear whether “the vector” recited in claim 5 refers to a vector or a feature vector of claim 1. Therefore, this limitation lacks clarity because it is unclear which vector “the vector” refers to. For purposes of examination “the vector” will be interpreted as referring to “a vector” of claim 1 (not the feature vector).

Regarding Claim 6, 
Claim 6 recites “wherein the recursively updating the eligibility trace at the subsequent time step includes: adding the feature vector of the subsequent time step to a product of a lambda value, a discount factor of the weighted difference, and the eligibility trace of the current time step”. This limitation lacks clarity because it is unclear whether the feature vector is added separately to each of a product of the lambda value, the discount factor of the weighted difference, and the eligibility trace of the current time step, or whether the feature vector is added to the single product of the lambda value, the discount factor, and the eligibility trace taken as a whole. For purposes of examination, this limitation will be interpreted as adding the feature vector to the single product of the lambda value, the discount factor, and the eligibility trace taken as a whole.
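For illustration only, the interpretation adopted above for claim 6 can be sketched as follows. This is a minimal sketch and not the applicant's implementation; the function name `update_trace` and the parameter names `lam` and `gamma` are hypothetical and do not appear in the claims.

```python
def update_trace(trace, next_features, lam, gamma):
    """Adopted reading of claim 6 (names are hypothetical).

    The lambda value, the discount factor, and the eligibility trace of
    the current time step are multiplied together first; the feature
    vector of the subsequent time step is then added to that one product.
    """
    decay = lam * gamma  # scalar: lambda value times discount factor
    return [f + decay * z for f, z in zip(next_features, trace)]


# Example: trace of the current step [1.0, 2.0], next feature vector [0.5, 0.0]
z_next = update_trace([1.0, 2.0], [0.5, 0.0], lam=0.5, gamma=0.5)
print(z_next)
```

Under the reading that was not adopted, the feature vector would be combined with each of the three quantities separately, which would give a different result.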

Regarding Claim 9, 
Claim 9 recites “the basis”. There is insufficient antecedent basis for this limitation in the claim. For purposes of examination, this will be interpreted as “a basis”. 

Regarding Claim 10, 
Claim 10 recites “the predicted cumulative reward”. There is insufficient antecedent basis for this limitation in the claim. For purposes of examination, this will be interpreted as “a predicted cumulative reward”.

Regarding Claim 11, 
Claim 11 recites “the vector”. This recitation lacks clarity because it is unclear whether “the vector” refers to “a vector” that is updated using a reward of a subsequent time step and the eligibility trace of the current time step, or if “the vector” refers to “a feature vector” that is an encoded representation of a state. For purposes of examination, “the vector” will be interpreted as referring to “a vector” that is updated using a reward of a subsequent time step and the eligibility trace of the current time step.

Regarding Claim 12, 
Claim 12 recites “the reward”. Claim 12 depends on Claim 11, which recites “a cumulative reward” and “a reward”. It is unclear whether “the reward” recited in claim 12 refers to the reward or cumulative reward of claim 11. Therefore, this limitation lacks clarity because it is unclear which reward “the reward” refers to. For purposes of examination “the reward” will be interpreted as referring to “a reward” of claim 11 (not the cumulative reward).

Regarding Claim 13, 
Claim 13 recites “estimating the cumulative reward by using a feature vector of a target time step of a target system of the target system type, the matrix, and the vector”. This limitation lacks clarity because it is unclear whether “a target system of the target system type” refers to the target time step or the feature vector. Additionally, it is unclear whether the cumulative reward is estimated by using: 
a feature vector,
a target time step,
a target system of the target system type,
the matrix, and
the vector
or if the cumulative reward is estimated by using: 
a feature vector (of a target time step of a target system of the target system type)
the matrix
the vector
For purposes of examination, the cumulative reward will be estimated as the latter, by using a feature vector, the matrix, and the vector. 

Claim 13 recites “the vector”. Claim 13 depends on Claim 11, which recites “a vector” and “a feature vector”. It is unclear whether “the vector” recited in claim 13 refers to a vector or a feature vector of claim 11. Therefore, this limitation lacks clarity because it is unclear which vector “the vector” refers to. For purposes of examination “the vector” will be interpreted as referring to “a vector” of claim 11 (not the feature vector).

Regarding Claim 15, 
Claim 15 recites “the reward”. Claim 15 depends on Claim 11, which recites “a cumulative reward” and “a reward”. It is unclear whether “the reward” recited in claim 15 refers to the reward or cumulative reward of claim 11. Therefore, this limitation lacks clarity because it is unclear which reward “the reward” refers to. For purposes of examination “the reward” will be interpreted as referring to “a reward” of claim 11 (not the cumulative reward).

Claim 15 recites “the vector”. Claim 15 depends on Claim 11, which recites “a vector” and “a feature vector”. It is unclear whether “the vector” recited in claim 15 refers to a vector or a feature vector of claim 11. Therefore, this limitation lacks clarity because it is unclear which vector “the vector” refers to. For purposes of examination “the vector” will be interpreted as referring to “a vector” of claim 11 (not the feature vector).

Regarding Claim 16, 
Claim 16 recites “the vector”. This recitation lacks clarity because it is unclear whether “the vector” refers to “a vector” that is updated using a reward of a subsequent time step and the eligibility trace of the current time step, or if “the vector” refers to “a feature vector” that is an encoded representation of a state. For purposes of examination, “the vector” will be interpreted as referring to “a vector” that is updated using a reward of a subsequent time step and the eligibility trace of the current time step.

Regarding Claim 17, 
Claim 17 recites “the reward”. Claim 17 depends on Claim 16, which recites “a cumulative reward” and “a reward”. It is unclear whether “the reward” recited in claim 17 refers to the reward or cumulative reward of claim 16. Therefore, this limitation lacks clarity because it is unclear which reward “the reward” refers to. For purposes of examination “the reward” will be interpreted as referring to “a reward” of claim 16 (not the cumulative reward).

Regarding Claim 18, 
Claim 18 recites “estimating the cumulative reward by using a feature vector of a target time step of a target system of the target system type, the matrix, and the vector”. This limitation lacks clarity because it is unclear whether “a target system of the target system type” refers to the target time step or the feature vector. Additionally, it is unclear whether the cumulative reward is estimated by using: 
a feature vector,
a target time step,
a target system of the target system type,
the matrix, and
the vector
or if the cumulative reward is estimated by using: 
a feature vector (of a target time step of a target system of the target system type)
the matrix
the vector
For purposes of examination, the cumulative reward will be estimated as the latter, by using a feature vector, the matrix, and the vector. 

Claim 18 recites “the vector”. Claim 18 depends on Claim 16, which recites “a vector” and “a feature vector”. It is unclear whether “the vector” recited in claim 18 refers to a vector or a feature vector of claim 16. Therefore, this limitation lacks clarity because it is unclear which vector “the vector” refers to. For purposes of examination “the vector” will be interpreted as referring to “a vector” of claim 16 (not the feature vector).

Regarding Claim 20, 
Claim 20 recites “the reward”. Claim 20 depends on Claim 16, which recites “a cumulative reward” and “a reward”. It is unclear whether “the reward” recited in claim 20 refers to the reward or cumulative reward of claim 16. Therefore, this limitation lacks clarity because it is unclear which reward “the reward” refers to. For purposes of examination “the reward” will be interpreted as referring to “a reward” of claim 16 (not the cumulative reward).

Claim 20 recites “the vector”. Claim 20 depends on Claim 16, which recites “a vector” and “a feature vector”. It is unclear whether “the vector” recited in claim 20 refers to a vector or a feature vector of claim 16. Therefore, this limitation lacks clarity because it is unclear which vector “the vector” refers to. For purposes of examination “the vector” will be interpreted as referring to “a vector” of claim 16 (not the feature vector).

Dependent claims 2-10, 12-15, and 17-20 are also rejected because they depend, directly or indirectly, on rejected claims.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

Regarding Claim 1, 
Step 1: 
Claim 1 is directed to a computer-implemented method, which is directed to a process, one of the statutory categories. 
Step 2A Prong One:
Claim 1 recites the following limitations: 
recursively updating a matrix by using a weighted difference between an eligibility trace of a current time step and an eligibility trace of a previous time step
recursively updating a vector by using a reward of a subsequent time step and the eligibility trace of the current time step; and
recursively updating an eligibility trace of a subsequent time step by using a feature vector of the subsequent time step, each feature vector being an encoded representation of at least a state of a training system of the target system type at a corresponding time step, and

These limitations require recursively updating a matrix using a weighted difference between eligibility traces of a current and previous time step, recursively updating a vector using a reward of a subsequent time step and eligibility trace of a current time step, and recursively updating an eligibility trace of a subsequent time step using a feature vector of a subsequent time step. These steps fall within the mathematical concept grouping of abstract ideas. Thus, claim 1 recites an abstract idea. 

Step 2A Prong Two: 
The abstract idea of claim 1 is not integrated into a practical application because the additional elements recited in claim 1 are: 
training a prediction model by performing an iteration for each time step, the iteration including: 
outputting the matrix and the vector as the prediction model for estimating the cumulative reward of a target time step of a target system of the target system type.

The recitation of:
outputting the matrix and the vector as the prediction model for estimating the cumulative reward of a target time step of a target system of the target system type 
amounts to recitation of insignificant extra-solution activity of data gathering. See MPEP 2106.05(g). 
Additionally, generally linking the abstract idea to a particular technological environment or field of use (training a prediction model by performing an iteration for each time step) cannot integrate the abstract idea into a practical application (see MPEP 2106.05(h)). This additional element merely specifies that the above mathematical concept steps are performed in the course of training a prediction model. Therefore, Claim 1 is directed to an abstract idea.

Step 2B: 
Finally, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. Generally linking the abstract idea to a field of use or technological environment (training a prediction model by performing an iteration for each time step) does not provide an inventive concept (see MPEP 2106.05(h)). 
Further, the additional element of outputting the matrix and the vector as the prediction model for estimating the cumulative reward of a target time step of a target system of the target system type amounts to insignificant extra-solution activity of data gathering, see MPEP 2106.05(g). Further, MPEP 2106.05(d)(II) notes the following, "The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network);". Accordingly, the additional element does not integrate the abstract idea into a practical application because the recitation of insignificant extra solution activity is well-understood, routine, and conventional.

Regarding Claim 2, 
Claim 2 is dependent on claim 1 and only includes additional limitations drawn to mathematical concepts (wherein the reward in each time step represents a difference in value of a physical system between the current time step and the previous time step, and… the cumulative reward as a cumulative difference in value of a future time step). These limitations require that the reward of each time step is calculated by the difference in value between the current time step and previous time step and that the cumulative reward is the cumulative difference in value of a future time step. 
This claim includes an additional element (“the prediction model predicts”) that amounts to generally linking the abstract idea to a particular technological environment or field of use, which cannot integrate the abstract idea into a practical application or provide significantly more (see MPEP 2106.05(h)). The claim thus remains subject matter ineligible. 

Regarding Claim 3, 
Claim 3 is dependent on claim 1 and only includes additional limitations drawn to mathematical concepts (estimating the cumulative reward by using a feature vector of a target time step of a target system of the target system type, the matrix, and the vector, wherein the estimating the cumulative reward includes using a product of an inverse matrix of the matrix and the vector). These limitations require that the cumulative reward is estimated by using a feature vector, the matrix, and the vector and that estimating the cumulative reward includes calculating a product of an inverse matrix and the vector. This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.
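For illustration only, the calculation characterized above for claim 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names `solve` and `estimate_cumulative_reward` are hypothetical, the matrix is assumed to be stored uninverted, and the product of the inverse matrix and the vector is assumed to be combined with the feature vector by an inner product.

```python
def solve(matrix, vector):
    """Return inverse(matrix) applied to vector via Gauss-Jordan elimination."""
    n = len(vector)
    aug = [list(row) + [rhs] for row, rhs in zip(matrix, vector)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def estimate_cumulative_reward(features, matrix, vector):
    """Assumed reading: inner product of features with inverse(matrix) * vector."""
    weights = solve(matrix, vector)
    return sum(f * w for f, w in zip(features, weights))


estimate = estimate_cumulative_reward(
    [3.0, 1.0], [[2.0, 0.0], [0.0, 4.0]], [2.0, 4.0]
)
print(estimate)
```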

Regarding Claim 4, 
Claim 4 is dependent on claim 1 and only includes additional limitations drawn to mathematical concepts (wherein the recursively updating a matrix further comprises: subtracting, from the matrix, a fraction of which numerator is a product of (i) the matrix, (ii) the weighted difference between an eligibility trace, at a current time step and an eligibility trace at a previous time step, (iii) the feature vector of the time step, and (iv) the matrix, and denominator is a sum of (I) a constant and (II) a product of (i) the feature vector of the time step, (ii) the matrix, and the weighted difference between the eligibility trace at the current time step and the eligibility trace at the previous time step.). These limitations require a mathematical calculation to recursively update a matrix. This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.
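For illustration only, the calculation characterized above for claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation. The claim does not fix the order or transposition of the factors, so the sketch assumes the numerator is (matrix x difference) times (features x matrix) as an outer product, and the denominator is the constant plus the scalar features x matrix x difference. The weighted difference between the eligibility traces is assumed to be computed elsewhere and passed in as `diff`. All names, and the default constant of 1.0, are hypothetical. With these assumptions the update has the same shape as a rank-one (Sherman-Morrison style) inverse update.

```python
def mat_vec(m, v):
    """Matrix times column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def update_matrix(m, diff, features, constant=1.0):
    """Subtract (m*diff)(features*m) / (constant + features*m*diff) from m."""
    left = mat_vec(m, diff)       # assumed column factor: matrix x difference
    right = vec_mat(features, m)  # assumed row factor: features x matrix
    denom = constant + sum(f * l for f, l in zip(features, left))
    return [
        [m[i][j] - left[i] * right[j] / denom for j in range(len(right))]
        for i in range(len(left))
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
updated = update_matrix(identity, diff=[1.0, 0.0], features=[1.0, 0.0])
print(updated)
```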

Regarding Claim 5, 
Claim 5 is dependent on claim 1 and only includes additional limitations drawn to mathematical concepts (wherein the recursively updating the vector includes: adding a product of the reward of the subsequent time step and the eligibility trace of the current time step to the vector). These limitations require a mathematical calculation to recursively update the vector by adding a product of the reward of the subsequent time step and eligibility trace of the current time step. This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.
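For illustration only, the calculation characterized above for claim 5 can be sketched as follows. This is a minimal sketch, not the applicant's implementation; the function name `update_vector` and its parameter names are hypothetical, and the reward is assumed to be a scalar.

```python
def update_vector(vector, reward_next, trace):
    """Add (reward of the subsequent step) x (current trace) to the vector."""
    return [b + reward_next * z for b, z in zip(vector, trace)]


# Example: vector [1.0, 0.0], reward 2.0, eligibility trace [0.5, 0.25]
b_next = update_vector([1.0, 0.0], 2.0, [0.5, 0.25])
print(b_next)
```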

Regarding Claim 6, 
Claim 6 is dependent on claim 1 and only includes additional limitations drawn to mathematical concepts (wherein the recursively updating the eligibility trace at the subsequent time step includes: adding the feature vector of the subsequent time step to a product of a lambda value, a discount factor of the weighted difference, and the eligibility trace of the current time step). These limitations require a mathematical calculation to recursively update the eligibility trace by adding a feature vector to the product of the lambda value, discount factor, and eligibility trace as a whole. This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 7, 
Claim 7 is dependent on claim 1 and only includes additional limitations drawn to mathematical concepts (wherein the feature vector of the target time step of the target system represents a state or a state-action pair of the target system). These limitations require that the feature vector of the target time step represents a state or state-action pair. This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 8, 
Claim 8 is dependent on claim 1 and only includes additional limitations drawn to mathematical concepts (wherein at least part of the training of the prediction model is performed using Boyan's Least Square Temporal Difference (LSTD)). These limitations require using Boyan’s Least Square Temporal Difference. This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 9, 
Claim 9 is dependent on claim 1 and only includes additional limitations drawn to mathematical concepts (wherein the iteration further includes: recursively updating the matrix on the basis of the weighted difference between the feature vector of the time step and a feature vector of the subsequent time step). These limitations require a mathematical calculation to recursively update the matrix on the basis of the weighted difference between the feature vector of the time step and a feature vector of the subsequent time step. This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 10, 
Claim 10 is dependent on claim 8 and includes additional limitations drawn to mental processes (wherein the iteration further includes: updating a policy to choose an action in a given state by using the predicted cumulative reward at the current time step, wherein the target system transits from one state to another state by the action chosen by the policy.). These limitations require evaluating actions in a given state and choosing the action by using the predicted cumulative reward. This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 11, 
Claim 11 is directed to an apparatus comprising a processor or a programmable circuitry; and one or more computer readable mediums…, which is directed to a machine, one of the statutory categories. Claim 11 recites: “An apparatus comprising a processor or a programmable circuitry; and one or more computer readable mediums collectively including instructions that, when executed by the processor or the programmable circuitry, cause the processor or the programmable circuitry to perform operations including:” which performs a process similar to the method of claim 1 and has limitations that are similar to the method of claim 1. As performing an abstract idea on a generic computer component cannot integrate the abstract idea into a practical application and cannot provide an inventive concept, claim 11 remains subject matter ineligible and claim 11 is therefore rejected with the same rationale applied against claim 1.

Regarding Claim 12, 
Claim 12 is dependent on claim 11 and recites limitations similar to those recited in claim 2; it is therefore rejected with the same rationale applied against claim 2. This claim does not recite any additional elements beyond those recited in claim 2, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject matter ineligible.

Regarding Claim 13, 
Claim 13 is dependent on claim 11 and recites limitations similar to those recited in claim 3; it is therefore rejected with the same rationale applied against claim 3. This claim does not recite any additional elements beyond those recited in claim 3, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject matter ineligible.

Regarding Claim 14, 
Claim 14 is dependent on claim 11 and recites limitations similar to those recited in claim 4; it is therefore rejected with the same rationale applied against claim 4. This claim does not recite any additional elements beyond those recited in claim 4, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject matter ineligible.

Regarding Claim 15, 
Claim 15 is dependent on claim 11 and recites limitations similar to those recited in claim 5; it is therefore rejected with the same rationale applied against claim 5. This claim does not recite any additional elements beyond those recited in claim 5, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject matter ineligible.

Regarding Claim 16, 
Claim 16 is directed to a computer program product including one or more computer readable storage mediums…, which is directed to an article of manufacture, one of the statutory categories. Claim 16 recites: “A computer program product including one or more computer readable storage mediums collectively storing program instructions that are executable by a processor or programmable circuitry to cause the processor or programmable circuitry to perform operations comprising:” which performs a process similar to the method of claim 1 and has limitations that are similar to the method of claim 1. As performing an abstract idea on a generic computer component cannot integrate the abstract idea into a practical application and cannot provide an inventive concept, claim 16 remains subject matter ineligible and claim 16 is therefore rejected with the same rationale applied against claim 1.

Regarding Claim 17, 
Claim 17 is dependent on claim 16 and recites limitations similar to those recited in claim 2; it is therefore rejected with the same rationale applied against claim 2. This claim does not recite any additional elements beyond those recited in claim 2, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject matter ineligible.

Regarding Claim 18, 
Claim 18 is dependent on claim 16 and recites limitations similar to those recited in claim 3; it is therefore rejected with the same rationale applied against claim 3. This claim does not recite any additional elements beyond those recited in claim 3, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject matter ineligible.

Regarding Claim 19, 
Claim 19 is dependent on claim 16 and recites limitations similar to those recited in claim 4; it is therefore rejected with the same rationale applied against claim 4. This claim does not recite any additional elements beyond those recited in claim 4, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject matter ineligible.

Regarding Claim 20, 
Claim 20 is dependent on claim 16 and recites limitations similar to those recited in claim 5; it is therefore rejected with the same rationale applied against claim 5. This claim does not recite any additional elements beyond those recited in claim 5, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject matter ineligible.

Conclusion
Claims 1-20 have been searched, and no prior art that teaches these claims has been uncovered.
The prior art made of record but not relied upon is considered pertinent to the applicant’s disclosure: 
Boyan et al. (“Technical Update: Least-Squares Temporal Difference Learning”) teaches performing reinforcement learning by using least squares temporal difference. The LSTD algorithm converges to the same coefficients that temporal difference learning does and performs a similar update to the eligibility traces. 
Morimura et al. (“Utilizing the Natural Gradient in Temporal Difference Reinforcement Learning with Eligibility Traces”) teaches a method to perform reinforcement learning by approximating temporal differences between state values, reducing bias by using eligibility traces, and performing decision making with updates to a policy.
Xu et al. (“Efficient Reinforcement Learning Using Recursive Least-Squares Methods”) teaches performing reinforcement learning by using a recursive least squares temporal difference algorithm. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOUN ABRAHAM whose telephone number is (571)272-8144. The examiner can normally be reached Mon - Fri 08:00-16:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.J.A./Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125