Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant’s arguments with respect to the claim(s) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claims 1-8 are rejected under 35 U.S.C. 103 as being unpatentable over Fujimaki (US 2018/0218380 A1) in view of Baird, III (US 5,608,843).
Re claim 1: Fujimaki discloses a non-transitory computer-readable recording medium (para. 0202, Fig. 11, and claim 1) storing a reinforcement learning program (para. 0202 and claim 1). Fujimaki discloses a predictive model which is learned based on past historical data (para. 0002) and also discloses the situation in which the input data is unobservable (para. 0028), which together are a disclosure of a reinforcement learning program using a value function that causes a computer to execute a process, as Fujimaki discloses optimization using a function which is optimized under a constraint (para. 0010), where the constraint may be, for example, a cost (para. 0006), which is a disclosure of a penalty, the process including:
Estimating first coefficients of the value function represented in a quadratic form of inputs at times in the past and outputs at the present time and the times in the past, as Fujimaki discloses that quadratic programming may be used for the model (para. 0003) and discloses input data which is unobservable and is historical data from the past (para. 0029); the first coefficients being estimated based on the inputs at the times in the past, as Fujimaki discloses that this data is used for the calculation of the function which has the constraint condition (para. 0010, 0034, and 0060), the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past, as Fujimaki discloses that the output is a function which takes historical data from past times as input, is the result of the optimization, and is expressed as a function of the input and the constraint (para. 0030 and 0033); and
Determining second coefficients that define a control law based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients, as Fujimaki discloses, as stated above, that the function representation in the model may be quadratic (para. 0003), which is a predictive model obtained by the method as stated above and is represented as a matrix of coefficients (para. 0036 and Formula 1 after para. 0036).
Baird, in the same field of endeavor of reinforcement learning (Abstract), discloses an algorithm for a learning controller for reinforcement learning which uses quadratic equations in a learning quadratic regulator (col. 2, lines 65-67 and col. 3, lines 1-6).
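It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have chosen quadratic equations to express the equations in the method disclosed by Fujimaki because Baird discloses that a quadratic learning controller is known to have desirable performance in reinforcement learning (Baird, col. 3, lines 1-16).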
Re claim 2: The combination of Fujimaki and Baird satisfies the limitations of claim 1 as stated above. Fujimaki also discloses that the value function is represented by a quadratic form of the inputs at the present time and the times in the past, and the outputs at the present time and the times in the past, as Fujimaki discloses that the function is represented by coefficients obtained from unobservable historical past data and the present time, that the constraint was also used to obtain the coefficients, and that the outputs at the present time include present information, such as other information that is observable in the function (para. 0034), which is in the form of a matrix of coefficients (para. 0036), as stated above in the rejection of claim 1.
Re claim 3: The combination of Fujimaki and Baird discloses the limitations of claim 1 as stated above. Fujimaki also discloses that the determining includes using outputs at times after the estimation and determining the input values at times after the estimation based on the value function, as Fujimaki discloses that the model is updated with time t, and updating the predictive model, which uses historical data from the past, would include using input values at times after the estimation in the updating of the predictive formula (para. 0040).
Re claim 4: The combination of Fujimaki and Baird discloses the limitations of claim 1 as stated above. Fujimaki also discloses updating the predictive model (para. 0040), including predicting the influence of other factors on the function model (para. 0077-0080), which is a disclosure of estimating the first coefficients of the value function with respect to a control problem that uses the value function and for which the first coefficients indicating a degree of an influence of an input on a cost or a reward is unknown, the control problem being fully observed regarding the input at the times in the past and the outputs at the present time and the times in the past, as Fujimaki discloses updating the predictive model and predicting changes which take into account other influences such as sale prices (para. 0077).
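For illustration only, the claimed sequence of estimating first coefficients of a quadratic value function and then determining second coefficients that define a control law can be sketched as follows. This sketch is not taken from Fujimaki, Baird, or the application. It assumes a hypothetical scalar, fully observed plant whose parameters a, b, q, and r are hidden from the learner, and it uses a least-squares fit of the Bellman equation, which is one known way of performing such an estimation. All function and variable names are hypothetical.

```python
import random


def solve_linear(m, v):
    """Solve the square system m @ z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def features(y, u):
    # Q(y, u) = h_yy*y^2 + 2*h_yu*y*u + h_uu*u^2 is linear in (h_yy, h_yu, h_uu).
    return [y * y, 2.0 * y * u, u * u]


def estimate_first_coefficients(samples, gain):
    """Least-squares fit of the quadratic value-function coefficients using the
    Bellman equation Q(y, u) - Q(y_next, -gain * y_next) = cost."""
    rows, costs = [], []
    for y, u, cost, y_next in samples:
        now = features(y, u)
        nxt = features(y_next, -gain * y_next)
        rows.append([p - q for p, q in zip(now, nxt)])
        costs.append(cost)
    # Normal equations (rows^T rows) theta = rows^T costs.
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * c for r, c in zip(rows, costs)) for i in range(3)]
    return solve_linear(normal, rhs)


def determine_gain(theta):
    """Second coefficient: minimizing Q over u gives u = -(h_yu / h_uu) * y."""
    _, h_yu, h_uu = theta
    return h_yu / h_uu


def collect(gain, rng, a=0.9, b=0.5, q=1.0, r=1.0, steps=60):
    """Simulate the hypothetical plant; a, b, q, r are never shown to the learner."""
    samples, y = [], 1.0
    for _ in range(steps):
        u = -gain * y + rng.uniform(-1.0, 1.0)  # control law plus exploration noise
        y_next = a * y + b * u
        samples.append((y, u, q * y * y + r * u * u, y_next))
        y = y_next
    return samples


def learn_gain(iterations=8, seed=0):
    rng = random.Random(seed)
    gain = 0.0  # assumed initial control law; the assumed plant is open-loop stable
    for _ in range(iterations):
        theta = estimate_first_coefficients(collect(gain, rng), gain)
        gain = determine_gain(theta)
    return gain
```

Calling learn_gain() alternates the two steps: past inputs, outputs, and costs yield the value-function coefficients, and those coefficients yield the feedback gain used to choose later inputs.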

Re claim 6: Fujimaki discloses a non-transitory computer-readable recording medium (para. 0202, Fig. 11, and claim 1) storing a reinforcement learning program (para. 0202 and claim 1). Fujimaki discloses a predictive model which is learned based on past historical data (para. 0002) and also discloses the situation in which the input data is unobservable (para. 0028), which together are a disclosure of a reinforcement learning program using a value function that causes a computer to execute a process, as Fujimaki discloses optimization using a function which is optimized under a constraint (para. 0010), where the constraint may be, for example, a cost (para. 0006), which is a disclosure of a penalty, the process including:
Estimating first coefficients of the value function represented in a quadratic form of inputs at times in the past and outputs at the present time and the times in the past, as Fujimaki discloses a quadratic form for the model (para. 0003) and discloses input data which is unobservable and is historical data from the past (para. 0029); the first coefficients being estimated based on the inputs at the times in the past, as Fujimaki discloses that this data is used for the calculation of the function which has the constraint condition (para. 0010, 0034, and 0060), the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past, as Fujimaki discloses that the output is a function which takes historical data from past times as input, is the result of the optimization, and is expressed as a function of the input and the constraint (para. 0030 and 0033); and
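Determining second coefficients that define a control law based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients, as Fujimaki discloses, as stated above, that the function representation in the model may be quadratic (para. 0003), which is a predictive model obtained by the method as stated above and is represented as a matrix of coefficients (para. 0036 and Formula 1 after para. 0036).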
Baird, in the same field of endeavor of reinforcement learning (Abstract), discloses an algorithm for a learning controller for reinforcement learning which uses quadratic equations in a learning quadratic regulator (col. 2, lines 65-67 and col. 3, lines 1-6).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have chosen quadratic equations to express the equations in the method disclosed by Fujimaki because Baird discloses that a quadratic learning controller is known to have desirable performance in reinforcement learning (Baird, col. 3, lines 1-16).
Fujimaki also discloses that the determining includes using outputs at times after the estimation and determining the input values at times after the estimation based on the value function, as Fujimaki discloses that the model is updated with time t, and updating the predictive model, which uses historical data from the past, would include using input values at times after the estimation in the updating of the predictive formula (para. 0040).
Re claim 7: Fujimaki discloses a reinforcement learning apparatus using a value function,
including a memory, as Fujimaki discloses a storage unit 1003 (Fig. 11 and para. 0201), and
a processor 1001 (Fig. 11 and para. 0201).
Fujimaki discloses a non-transitory computer-readable recording medium (para. 0202, Fig. 11, and claim 1) storing a reinforcement learning program (para. 0202 and claim 1). Fujimaki discloses a predictive model which is learned based on past historical data (para. 0002) and also discloses the situation in which the input data is unobservable (para. 0028), which together are a disclosure of a reinforcement learning program using a value function that causes a computer to execute a process, as Fujimaki discloses optimization using a function which is optimized under a constraint (para. 0010), where the constraint may be, for example, a cost (para. 0006), which is a disclosure of a penalty, the process including:
Estimating first coefficients of the value function represented in a quadratic form of inputs at times in the past and outputs at the present time and the times in the past, as Fujimaki discloses a quadratic form for the model (para. 0003) and discloses input data which is unobservable and is historical data from the past (para. 0029); the first coefficients being estimated based on the inputs at the times in the past, as Fujimaki discloses that this data is used for the calculation of the function which has the constraint condition (para. 0010, 0034, and 0060), the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past, as Fujimaki discloses that the output is a function which takes historical data from past times as input, is the result of the optimization, and is expressed as a function of the input and the constraint (para. 0030 and 0033); and
Determining second coefficients that define a control law based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients, as Fujimaki discloses, as stated above, that the function representation in the model may be quadratic (para. 0003), which is a predictive model obtained by the method as stated above and is represented as a matrix of coefficients (para. 0036 and Formula 1 after para. 0036). Fujimaki also discloses that the determining includes using outputs at times after the estimation and determining the input values at times after the estimation based on the value function, as Fujimaki discloses that the model is updated with time t, and updating the predictive model, which uses historical data from the past, would include using input values at times after the estimation in the updating of the predictive formula (para. 0040).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have chosen quadratic equations to express the equations in the method disclosed by Fujimaki because Baird discloses that a quadratic learning controller is known to have desirable performance in reinforcement learning (Baird, col. 3, lines 1-16).
Re claim 8: The combination of Fujimaki and Baird discloses the limitations of claim 1 as stated above. Fujimaki also discloses a recording medium according to claim 1, as stated in the rejection of claim 1 above, in which the reinforcement learning program determines input values for a controlled object whose coefficient matrices of a state equation, an output equation, and an immediate cost equation are unknown and whose state is not directly observed, as Fujimaki discloses estimating first coefficients of the value function represented in a quadratic form of inputs at times in the past and outputs at the present time and the times in the past, as Fujimaki discloses a quadratic form for the model (para. 0003) and discloses input data which is unobservable and is historical data from the past (para. 0029); the first coefficients being estimated based on the inputs at the times in the past, as Fujimaki discloses that this data is used for the calculation of the function which has the constraint condition (para. 0010, 0034, and 0060), the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past, as Fujimaki discloses that the output is a function which takes historical data from past times as input, is the result of the optimization, and is expressed as a function of the input and the constraint (para. 0030 and 0033);
and the estimating includes estimating the first coefficients based on the inputs at the times in the past, the outputs at the present time and the times in the past, and immediate costs or rewards that 
 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARIDAD EVERHART, whose telephone number is (571) 272-1892. The examiner can normally be reached M-F, 6:00 AM-4:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Eliseo Ramos-Feliciano, can be reached at 571-272-7925. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available 

/CARIDAD EVERHART/Primary Examiner, Art Unit 2895