Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 4/20/2022 has been entered.
 
Response to Arguments
Applicant's arguments filed 4/20/2022 have been fully considered. Applicant has argued that Fujimaki and Baird cannot be combined because Baird requires that the controlled subject can be directly observed and because the model disclosed by Baird differs from the model disclosed by Fujimaki. This argument is respectfully found not to be persuasive. Baird was relied upon in the rejection of the last Office Action only for its teaching with respect to the use of quadratic equations in a learning quadratic regulator (bottom of page 4 of the last Office Action). That teaching was cited to support the finding that Fujimaki, which discloses unobservable input data (page 3 of the last Office Action) and which discloses unobservable quadratic programming used in the model, suggests quadratic equations in the model. Applicant has also argued that Fujimaki does not disclose “the value function representing an accumulated cost or an accumulated reward given to the controlled object” as recited in amended independent claim 1. New grounds of rejection are made.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-8 are rejected under 35 U.S.C. 103 as being unpatentable over Fujimaki (US 2018/0218380 A1) in view of Baird, III (US 5,608,843), Zhang et al. (US 2018/0165554 A1), and Kotti et al. (US 2018/0226076 A1) (“Kotti”).
Fujimaki discloses a non-transitory computer-readable recording medium (para. 0202, Fig. 11, and claim 1) storing a reinforcement learning program (para. 0202 and claim 1), as Fujimaki discloses a predictive model which is learned based on past historical data (para. 0002) and also discloses the situation in which the input data is unobservable (para. 0028), which together are a disclosure of a reinforcement learning program, using a value function that causes a computer to execute a process, as Fujimaki discloses optimization using a function which is optimized under a constraint (para. 0010), where the constraint may be, for example, a cost (para. 0006), which is a disclosure of a penalty, the process including:
Estimating first coefficients of the value function represented in a quadratic form, as Fujimaki discloses that quadratic programming may be used for the model (para. 0003), of inputs at times in the past and the outputs at the present time and the times in the past, as Fujimaki discloses input data which is unobservable and is historical data from the past (para. 0029), the first coefficients being estimated based on the inputs at the times in the past, as Fujimaki discloses that this data is used for the calculation of the function which has the constraint condition (para. 0010, 0034, and 0060), the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past, as Fujimaki discloses that the output is a function which takes as input historical data from past times and which, as the result of the optimization, is expressed as a function of the input and the constraint (para. 0030 and 0033); and
Determining second coefficients that define a control law based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients, as Fujimaki discloses, as stated above, that the function representation in the model may be quadratic (para. 0003), which is a predictive model obtained by the method as stated above and is represented as a matrix of coefficients (para. 0036 and Formula 1 after para. 0036).
Fujimaki does not explicitly state reinforcement learning. Fujimaki is also silent with respect to quadratic equations and with respect to accumulated rewards.
Baird, in the same field of endeavor of reinforcement learning (Abstract), discloses an algorithm for a learning controller for reinforcement learning which uses quadratic equations in a learning quadratic regulator (col. 2, lines 65-67 and col. 3, lines 1-6).
Zhang, in the same field of endeavor of machine training (Abstract), discloses a non-transitory, computer-readable recording medium storing a reinforcement learning program, as Zhang discloses instructions on a non-transitory medium (para. 0052) for machine learning (para. 0003) which is unsupervised learning (para. 0008), and unsupervised machine learning is well known in the art to be reinforcement learning. Zhang also discloses a value function, as Zhang discloses cost and reward functions (para. 0015 and 0018). Zhang also discloses unobserved samples or data (para. 0021).
Kotti, in the same field of endeavor of reinforcement learning (para. 0211), discloses unobservable environmental factors (para. 0211) and also discloses accumulated rewards (para. 0221).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have chosen quadratic equations to express the equations in the method disclosed by Fujimaki because Baird discloses that a quadratic learning controller is known to have desirable performance in reinforcement learning (Baird, col. 3, lines 1-16).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that Fujimaki discloses reinforcement learning in view of the disclosure made by Zhang, as Fujimaki discloses unobservable data samples, a cost function, and other features as stated above, which together disclose reinforcement learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the disclosure made by Kotti with the device disclosed by Fujimaki in order to obtain the benefit, disclosed by Kotti, of a measure of the success of the process (para. 0221).
Re claim 2:  The combination of Fujimaki and Baird and Zhang and Kotti satisfies the limitations of claim 1 as stated above.  Fujimaki also discloses that the value function is represented by a quadratic form of the inputs at the present time and the times in the past, and the outputs at the present time and the times in the past, as Fujimaki discloses that the function is represented by coefficients obtained from unobservable historical past data and the present time, the constraint also being used to obtain the coefficients, and the outputs of the present time include present information such as other information that is observable in the function (para. 0034), which is in the form of a matrix of coefficients (para. 0036), as stated above in the rejection of claim 1.
Re claim 3:  The combination of Fujimaki and Baird and Zhang and Kotti discloses the limitations of claim 1 as stated above.  Fujimaki also discloses that the determining includes using outputs at times after the estimation and determining the input values at times after the estimation based on the value function, as Fujimaki discloses that the model is updated with time t, and updating the predictive model, which uses historical data from the past, would include using input values at times after the estimation in the updating of the predictive formula (para. 0040).
Re claim 4:  The combination of Fujimaki and Baird and Zhang and Kotti discloses the limitations of claim 1 as stated above.  Fujimaki also discloses the updating of the predictive model (para. 0040), including predicting the influence of other factors on the function model (para. 0077-0080), which is a disclosure of estimating the first coefficients of the value function with respect to a control problem that uses the value function and for which the first coefficients indicating a degree of an influence of an input on a cost or a reward are unknown, the control problem being fully observed regarding the inputs at the times in the past and the outputs at the present time and the times in the past, as Fujimaki discloses updating the predictive model and predicting changes which take into account other influences such as sale prices (para. 0077).
Re claim 5:  The combination of Fujimaki and Baird and Zhang and Kotti discloses the limitations of claim 1 as stated above.  Fujimaki also discloses that the influence of sales amount and price can be explained and it is possible to model the influence of these factors (para. 0034), which is a disclosure of an observability condition.
Re claim 6:  Fujimaki discloses a non-transitory computer-readable recording medium (para. 0202, Fig. 11, and claim 1) storing a reinforcement learning program (para. 0202 and claim 1), as Fujimaki discloses a predictive model which is learned based on past historical data (para. 0002) and also discloses the situation in which the input data is unobservable (para. 0028), which together are a disclosure of a reinforcement learning program, using a value function that causes a computer to execute a process, as Fujimaki discloses optimization using a function which is optimized under a constraint (para. 0010), where the constraint may be, for example, a cost (para. 0006), which is a disclosure of a penalty, the process including:
Estimating first coefficients of the value function represented in a quadratic form, as Fujimaki discloses a quadratic form for the model (para. 0003), of inputs at times in the past and the outputs at the present time and the times in the past, as Fujimaki discloses input data which is unobservable and is historical data from the past (para. 0029), the first coefficients being estimated based on the inputs at the times in the past, as Fujimaki discloses that this data is used for the calculation of the function which has the constraint condition (para. 0010, 0034, and 0060), the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past, as Fujimaki discloses that the output is a function which takes as input historical data from past times and which, as the result of the optimization, is expressed as a function of the input and the constraint (para. 0030 and 0033); and
Determining second coefficients that define a control law based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients, as Fujimaki discloses, as stated above, that the function representation in the model may be quadratic (para. 0003), which is a predictive model obtained by the method as stated above and is represented as a matrix of coefficients (para. 0036 and Formula 1 after para. 0036).
Baird, in the same field of endeavor of reinforcement learning (Abstract), discloses an algorithm for a learning controller for reinforcement learning which uses quadratic equations in a learning quadratic regulator (col. 2, lines 65-67 and col. 3, lines 1-6).
Zhang, in the same field of endeavor of machine training (Abstract), discloses a non-transitory, computer-readable recording medium storing a reinforcement learning program, as Zhang discloses instructions on a non-transitory medium (para. 0052) for machine learning (para. 0003) which is unsupervised learning (para. 0008), and unsupervised machine learning is well known in the art to be reinforcement learning. Zhang also discloses a value function, as Zhang discloses cost and reward functions (para. 0015 and 0018). Zhang also discloses unobserved samples or data (para. 0021).
Kotti, in the same field of endeavor of reinforcement learning (para. 0211), discloses unobservable environmental factors (para. 0211) and also discloses accumulated rewards (para. 0221).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have chosen quadratic equations to express the equations in the method disclosed by Fujimaki because Baird discloses that a quadratic learning controller is known to have desirable performance in reinforcement learning (Baird, col. 3, lines 1-16).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that Fujimaki discloses reinforcement learning in view of the disclosure made by Zhang, as Fujimaki discloses unobservable data samples, a cost function, and other features as stated above, which together disclose reinforcement learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the disclosure made by Kotti with the device disclosed by Fujimaki in order to obtain the benefit, disclosed by Kotti, of a measure of the success of the process (para. 0221).
Fujimaki also discloses that the determining includes using outputs at times after the estimation and determining the input values at times after the estimation based on the value function, as Fujimaki discloses that the model is updated with time t, and updating the predictive model, which uses historical data from the past, would include using input values at times after the estimation in the updating of the predictive formula (para. 0040).
Re claim 7:  Fujimaki discloses a reinforcement learning apparatus using a value function,
including a memory, as Fujimaki discloses a storage unit 1003 (Fig. 11 and para. 0201), and
a processor, as Fujimaki discloses a processor 1001 (Fig. 11 and para. 0201).
Fujimaki discloses a non-transitory computer-readable recording medium (para. 0202, Fig. 11, and claim 1) storing a reinforcement learning program (para. 0202 and claim 1), as Fujimaki discloses a predictive model which is learned based on past historical data (para. 0002) and also discloses the situation in which the input data is unobservable (para. 0028), which together are a disclosure of a reinforcement learning program, using a value function that causes a computer to execute a process, as Fujimaki discloses optimization using a function which is optimized under a constraint (para. 0010), where the constraint may be, for example, a cost (para. 0006), which is a disclosure of a penalty, the process including:
Estimating first coefficients of the value function represented in a quadratic form, as Fujimaki discloses a quadratic form for the model (para. 0003), of inputs at times in the past and the outputs at the present time and the times in the past, as Fujimaki discloses input data which is unobservable and is historical data from the past (para. 0029), the first coefficients being estimated based on the inputs at the times in the past, as Fujimaki discloses that this data is used for the calculation of the function which has the constraint condition (para. 0010, 0034, and 0060), the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past, as Fujimaki discloses that the output is a function which takes as input historical data from past times and which, as the result of the optimization, is expressed as a function of the input and the constraint (para. 0030 and 0033); and
Determining second coefficients that define a control law based on the value function that uses the estimated first coefficients, and determining input values at times after estimation of the first coefficients, as Fujimaki discloses, as stated above, that the function representation in the model may be quadratic (para. 0003), which is a predictive model obtained by the method as stated above and is represented as a matrix of coefficients (para. 0036 and Formula 1 after para. 0036). Fujimaki also discloses that the determining includes using outputs at times after the estimation and determining the input values at times after the estimation based on the value function, as Fujimaki discloses that the model is updated with time t, and updating the predictive model, which uses historical data from the past, would include using input values at times after the estimation in the updating of the predictive formula (para. 0040).
Baird, in the same field of endeavor of reinforcement learning (Abstract), discloses an algorithm for a learning controller for reinforcement learning which uses quadratic equations in a learning quadratic regulator (col. 2, lines 65-67 and col. 3, lines 1-6).
Zhang, in the same field of endeavor of machine training (Abstract), discloses a non-transitory, computer-readable recording medium storing a reinforcement learning program, as Zhang discloses instructions on a non-transitory medium (para. 0052) for machine learning (para. 0003) which is unsupervised learning (para. 0008), and unsupervised machine learning is well known in the art to be reinforcement learning. Zhang also discloses a value function, as Zhang discloses cost and reward functions (para. 0015 and 0018). Zhang also discloses unobserved samples or data (para. 0021).
Kotti, in the same field of endeavor of reinforcement learning (para. 0211), discloses unobservable environmental factors (para. 0211) and also discloses accumulated rewards (para. 0221).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have chosen quadratic equations to express the equations in the method disclosed by Fujimaki because Baird discloses that a quadratic learning controller is known to have desirable performance in reinforcement learning (Baird, col. 3, lines 1-16).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that Fujimaki discloses reinforcement learning in view of the disclosure made by Zhang, as Fujimaki discloses unobservable data samples, a cost function, and other features as stated above, which together disclose reinforcement learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the disclosure made by Kotti with the device disclosed by Fujimaki in order to obtain the benefit, disclosed by Kotti, of a measure of the success of the process (para. 0221).
Re claim 8:  The combination of Fujimaki and Baird and Zhang and Kotti discloses the limitations of claim 1 as stated above.  Fujimaki also discloses a recording medium according to claim 1, as stated in the rejection of claim 1 above, in which the reinforcement learning program determines input values for a controlled object whose coefficient matrices of a state equation, an output equation, and an immediate cost equation are unknown and whose state is not directly observed, as Fujimaki discloses estimating first coefficients of the value function represented in a quadratic form, as Fujimaki discloses a quadratic form for the model (para. 0003), of inputs at times in the past and the outputs at the present time and the times in the past, as Fujimaki discloses input data which is unobservable and is historical data from the past (para. 0029), the first coefficients being estimated based on the inputs at the times in the past, as Fujimaki discloses that this data is used for the calculation of the function which has the constraint condition (para. 0010, 0034, and 0060), the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past, as Fujimaki discloses that the output is a function which takes as input historical data from past times and which, as the result of the optimization, is expressed as a function of the input and the constraint (para. 0030 and 0033); and
the estimating includes estimating the first coefficients based on the inputs at the times in the past, the outputs at the present time and the times in the past, and immediate costs or rewards that correspond to the inputs at the times in the past, as Fujimaki discloses estimating first coefficients of the value function represented in a quadratic form, as Fujimaki discloses a quadratic form for the model (para. 0003), of inputs at times in the past and the outputs at the present time and the times in the past, as Fujimaki discloses input data which is unobservable and is historical data from the past (para. 0029), the first coefficients being estimated based on the inputs at the times in the past, as Fujimaki discloses that this data is used for the calculation of the function which has the constraint condition (para. 0010, 0034, and 0060), the outputs at the present time and the times in the past, and costs or rewards that correspond to the inputs at the times in the past, as Fujimaki discloses that the output is a function which takes as input historical data from past times and which, as the result of the optimization, is expressed as a function of the input and the constraint (para. 0030 and 0033).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARIDAD EVERHART whose telephone number is (571)272-1892. The examiner can normally be reached M-F 6:00 AM-4:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Eliseo Ramos-Feliciano, can be reached at 571-272-7925. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CARIDAD EVERHART/               Primary Examiner, Art Unit 2895