DETAILED ACTION
Claims 1-6 are pending.  
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The abstract of the disclosure is objected to because it recites the abbreviation TD without defining it.  Correction is required.  See MPEP § 608.01(b).
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words. The form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.

The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.
Claim Objections
The claims are objected to because of the following informalities: 
‘state-value’ appears inconsistently as ‘state – value’ and ‘state -value’ throughout the claims.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention. 

Claims 1-6 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
With regard to claims 1 and 5-6, each of these claims recites ‘a TD error’ without defining what the abbreviation TD means.
In addition, claims 1 and 5-6 recite ‘when state variation of…’, which is unclear.  It appears that the intended meaning is ‘wherein state variation of…’.
With regard to claim 2, this claim recites ‘the calculating’ and it is unclear which of the calculating steps in claim 1 this refers to.
In addition, claim 2 recites ‘calculating estimated components acquired by estimating components of the gradient function matrix, by correlating a result acquired by dividing the TD error calculated for each of the plurality of components of the feedback coefficient matrix by the perturbation, and a result acquired by differentiating the state-value function with respect to each of the plurality of components of the feedback coefficient matrix’, which is unclear.
The respective dependent claims are also rejected under 35 U.S.C. § 112 because they inherit all of the characteristics of the claim from which they depend, and none of the dependent claims cures the indefiniteness of the parent claims.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-6 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because the claimed invention is directed to a judicial exception, namely an abstract idea (mathematical process).
Claim 1 recites a non-transitory computer-readable recording medium storing therein a policy improvement program of reinforcement learning by a state-value function, i.e. an article of manufacture, which is a statutory category of invention.  Claim 1 recites ‘calculating a TD error based on an estimated state-value function that is acquired by estimating the state-value function, the TD error being calculated by giving a perturbation to each of a plurality of components of a feedback coefficient matrix that provides a policy; 
calculating based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and 
updating the feedback coefficient matrix using the estimated gradient function matrix’, i.e., under the broadest reasonable interpretation, these limitations comprise mathematical steps of performing calculations and updating an abstract value.  Thus the claim recites an abstract idea (a mathematical process), see MPEP 2106.04(a).
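For illustration only, the mathematical character of the recited steps can be sketched as follows. This is a minimal sketch under stated assumptions and is not the applicant's disclosed estimator: the function names, the step size, and the use of the quotient of the TD error and the perturbation as a one-step stand-in for the gradient are all hypothetical. It assumes linear dynamics x' = A x + B u, a quadratic immediate cost, a policy u = F x, and an estimated state-value function of the form x^T P x.

```python
import numpy as np

def evaluate_policy(A, B, Q, R, F, gamma, iters=500):
    """Assumed helper: fixed-point iteration for P in V(x) = x^T P x under u = F x."""
    A_cl = A + B @ F                      # closed-loop dynamics
    stage = Q + F.T @ R @ F               # quadratic cost under the policy
    P = np.zeros_like(Q, dtype=float)
    for _ in range(iters):
        P = stage + gamma * A_cl.T @ P @ A_cl
    return P

def td_error(A, B, Q, R, P_hat, F, x, gamma):
    """One-step TD error for state x using the estimated value x^T P_hat x."""
    u = F @ x
    cost = x @ Q @ x + u @ R @ u          # immediate cost, quadratic in x and u
    x_next = A @ x + B @ u                # linear difference equation
    return cost + gamma * (x_next @ P_hat @ x_next) - x @ P_hat @ x

def estimate_gradient(A, B, Q, R, P_hat, F, x, gamma, eps=1e-4):
    """Perturb each component of F and divide the resulting TD error by eps.

    Hypothetical stand-in for the recited gradient estimate.
    """
    G = np.zeros_like(F, dtype=float)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            F_pert = F.astype(float).copy()
            F_pert[i, j] += eps           # perturbation on one component
            G[i, j] = td_error(A, B, Q, R, P_hat, F_pert, x, gamma) / eps
    return G

def update_policy(F, G, step=0.1):
    """Move F against the estimated gradient (cost is being minimized)."""
    return F - step * G
```

Each function performs only arithmetic on matrices and vectors, which is consistent with the characterization of the limitations as mathematical calculations.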
This judicial exception is not integrated into a practical application because the additional elements, i.e., a non-transitory computer-readable recording medium with program code (merely applying the exception with generic technology; see MPEP 2106.04(a)(2) III D or CyberSource, 654 F.3d at 1368 n. 1, 99 USPQ2d), do not impose any meaningful limits on practicing the abstract idea.  The claim is therefore directed to an abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, applying the exception using a non-transitory computer-readable recording medium with program code (merely applying the exception with generic technology; see MPEP 2106.04(d) and MPEP 2106.04(a)(2) III D) is not considered significantly more.  Considering the additional elements individually and in combination and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Thus the claim is not patent eligible.  Also note that feedback systems and reinforcement learning are well-understood, routine, and conventional; see, for example, Munakata et al. U.S. Patent Publication No. 20170227936 [0029] and Hans et al. U.S. Patent Publication No. 20110059427 [0005-0009] or the other references cited below.
Claim 2 recites ‘calculating estimated components…’ i.e. another mathematical step. Thus this claim recites an abstract idea.
Claim 3 recites further details of the abstract mathematical method embodied in the non-transitory medium. Thus this claim recites an abstract idea.
Claim 4 recites the mathematical form of the state-value function.  Thus this claim recites an abstract idea.
Claim 5 recites a policy improvement method, i.e. a process, which is a statutory category of invention.  The recited method is, however, similar to that performed by the computer-readable medium of claim 1, is considered to involve an abstract idea (mathematical process), and is rejected under the same rationale as claim 1.
Claim 6 recites a policy improvement apparatus, i.e. a machine, which is a statutory category of invention.  The recited machine, however, performs a method that is similar to that performed by the computer-readable medium of claim 1, is considered to involve an abstract idea (mathematical process), and is rejected under the same rationale as claim 1.  Note that applying the abstract idea with a processor and memory (generic computer technology) is not sufficient to integrate the abstract idea into a practical application or amount to significantly more; see MPEP 2106.04(a)(2) III C.
Invitation to Participate in DSMER Pilot Program
The present application satisfies the criteria for participation set forth in the Federal Register Notice entitled “Deferred Subject Matter Eligibility Response (DSMER) Pilot Program.” Therefore, the examiner invites applicant to participate in the DSMER pilot program. 

An applicant who accepts the invitation to participate in this pilot program must still file a reply to every Office action mailed in this application, but may defer presenting arguments or amendments in response to subject matter eligibility (SME) rejection(s) until the earlier of final disposition of the application, or the withdrawal or obviation of all other outstanding non-SME rejections. A final disposition for purposes of this pilot program occurs upon the earliest of: mailing of a notice of allowance; mailing of a final Office action; filing of a notice of appeal; filing of a request for continued examination; or abandonment of the application. Other than applicant’s ability to defer responding to SME rejections, participation in the DSMER pilot program does not alter the normal examination process (e.g., as outlined in MPEP 700), and applicant must still respond to all non-SME rejections when replying to Office actions. 

Further information about the pilot program, including an explanation of the criteria for receiving an invitation, and the conditions of participation, is provided in the Federal Register Notice announcing the program, which is available on the pilot program website https://www.uspto.gov/patents/initiatives/patent-application-initiatives/deferred-subject-matter-eligibility-response.

Applicant has two choices with respect to this invitation:
(1) Applicant may elect to participate in the DSMER pilot program. To effect this choice, applicant MUST accept this invitation by filing a completed request form PTO/SB/456 with a timely response to this Office action. The DSMER Pilot request form must be signed in accordance with 37 CFR § 1.33(b) by a person having authority to prosecute the application, and must be submitted via the USPTO’s patent electronic filing systems (EFS-Web or Patent Center). The form is available on the pilot program website https://www.uspto.gov/patents/initiatives/patent-application-initiatives/deferred-subject-matter-eligibility-response. If the form is properly completed and timely received, the application will be entered into the pilot program.

(2) Applicant may decline to participate in the pilot program. No action is required from applicant to effect this choice, because if applicant does not timely file a properly completed form PTO/SB/456, the application will not be entered into the pilot program.

Citation of Pertinent Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hosogi et al. U.S. Patent No. 6000827, which discloses a system with adaptive learning feedback control.
Saxena et al. U.S. Patent No. 10739733, which discloses a system with adaptive learning feedback control.
Al-Hamouz et al. U.S. Patent Publication No. 20110257799, which discloses a system that uses a control law with linear state feedback.
Vau U.S. Patent Publication No. 20170322523, which discloses a system with an adaptive control law.
Hintea et al. U.S. Patent Publication No. 20180134118, which discloses a feedback system utilizing a temporal difference.
Jereminov et al. U.S. Patent Publication No. 20180158152, which discloses a control system that utilizes a quadratic cost function.
Zehetleitner et al. U.S. Patent Publication No. 20200096013, which discloses a control system that utilizes a quadratic cost function.
Lewis et al. ‘Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control’ IEEE CIRCUITS AND SYSTEMS MAGAZINE, IEEE 2009, which discloses mathematical formulations of reinforcement learning for control systems.
Ruvolo et al. ‘Control by Gradient Collocation: Applications to Optimal Obstacle Avoidance and Minimum Torque Control’ 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 7-12, 2012, which discloses a machine learning algorithm for feedback control that utilizes the gradient of a value function.

Note that any citations to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way.  A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art.  See MPEP 2123.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BERNARD G. LINDSAY whose telephone number is (571)270-0665.  The examiner can normally be reached on IFP.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mohammad Ali, can be reached on (571)272-4105.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/BERNARD G LINDSAY/
Primary Examiner, Art Unit 2119