Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION

This action is in response to the claim listing filed on 02/18/2019.
Claims 1-22 are pending. 


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 15-18 and 21-22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
- Claims 15-16: Claim 15, from which claim 16 depends, recites “wherein the limitation of the total count of licenses to be accessed in a performance a succeeding plurality of cycles of execution of the software-encoded reinforcement learning program includes at least one computational system license”. There is insufficient antecedent basis for this recitation in the claim because claim 1, from which claim 15 depends, does not recite any limitation of the total count of licenses. Claim 15 should depend on claim 13. However, if claim 15 depended on claim 13, it would repeat the limitation of claim 14. Therefore, to correct this issue, claim 15 should be deleted, along with claim 16.
- Claim 17 recites “wherein the simulation program is at least partially executed on a computational system accessed via an electronic communications network”. There is insufficient antecedent basis for this recitation because claim 1 does not recite any simulation program. If corrected, claim 17 would depend on claim 16; alternatively, it would be deleted if claim 16 is deleted to overcome the rejection under 35 USC 112(b).
- Claim 18 recites “wherein the limitation of the total count of licenses to be accessed in a performance a succeeding plurality of cycles of execution of the software-encoded reinforcement learning program includes at least one computational system license”. There is insufficient antecedent basis for this limitation in the claim because claim 1 does not recite any limitation of the total count of licenses. The recitation of claim 18 is nearly identical to that of claim 14; claim 18 should be corrected to depend on claim 13 or claim 14, or it should be deleted.
- Claims 21 and 22 are recited as depending on “the method of claim 20”. These claims are indefinite because claim 20 is a system claim.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
- Independent Claim 1: It recites a series of acts, “(a.) performing a plurality of cycles of execution of a software-encoded reinforcement learning program; (b.) evaluating at least one financial cost value related to the execution of the software-encoded reinforcement learning program; and (c.) adjusting at least one resource limitation of the software-encoded reinforcement learning program at least partly in consideration of the at least one financial cost value”. Thus, the claim is directed to a process.
In light of the specification, the process performs and evaluates financial costs using the policy gradient in reinforcement learning, a mathematical concept. The method performs steps such as: (b.) evaluating at least one financial cost value related [to the execution of the software-encoded reinforcement learning program]; and (c.) adjusting at least one resource limitation of the software-encoded reinforcement learning program at least partly in consideration of the at least one financial cost value, which clearly recite a fundamental economic concept utilizing the policy gradient.
Thus, the claim recites a judicial exception, and no additional element recited in the claim amounts to significantly more than this judicial exception; accordingly, the claim is directed to an abstract idea.
The recited elements [performing a plurality of cycles of execution of a software-encoded reinforcement learning program] and [to the execution of the software-encoded reinforcement learning program] are merely instructions of a computer program, named a “reinforcement learning program” as recited. A program with instructions for executing a mathematical concept, modeled in machine learning, and merely for calculating economic principles, does not contribute an inventive concept. It would not add significantly more to the judicial exception. Therefore, the claim is directed to an abstract idea, and it is not eligible under 35 USC 101.
- Dependent Claims 2-18:
-- Claims 2, 3, 4, and 5 do not add claim elements that contribute significantly more; they merely encompass elements such as cost values. The elements recited in claims 2, 3, and 4, such as a license fee, the use of a software program, and the program being a simulation program, belong to the economic concept; they are merely values used in the mathematical calculation of the costs, the gradients of reinforcement learning. These claims add nothing new but apply the economic concept with the gradient equation.
-- Claims 6-7 recite the additional element that the reinforcement learning program “bi-directionally communicates” with a model; this serves only to carry out the execution of a program using the values of the policy gradient. This additional element, as part of the program recited in claim 1, adds nothing new.
-- Claim 8 recites that the financial cost value is related to a usage fee; it remains directed to an economic principle and is a judicial exception.
-- Claims 9-18 recite the use of economic concepts and the execution of programs. They remain directed to economic principles and mathematical concepts. The added programs and execution cycles add nothing new. The recitations in these claims do not impose any meaningful limits on practicing the judicial exception of the mathematical values set forth in the policy gradients.
Thus, claims 2-18, whose recitations encompass mathematical concepts applied toward economic principles and employ the policy gradient for reinforcement learning, do not cure the abstract idea of claim 1. They are ineligible under 35 USC 101.
- Claims 19-22: The claims are directed to a system that adds one or more processors and a memory as generic computer components and has the limitations and functionality of the method recited in claims 1-18 above. The claims are ineligible under 35 USC 101 for the reasons set forth for method claims 1-18, because the generic processor and memory components add nothing new to the abstract idea.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-22 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al., “Reinforcement Learning for Uplift Modeling”, 2-2019, downloaded from https://arxiv.org/, pages 1-22, in view of Dasgupta et al., US Pub. No. US 2019/0019082 A1.
As per Claim 1: Li discloses,
1. A reinforcement learning method comprising:
(a.) performing a plurality of cycles of execution of a software-encoded reinforcement learning program;
(See Algorithm 1, p. 13, a software program executing on training data; the software program of Algorithm 1 illustrates the reinforcement learning in Figure 1, p. 11.)
(b.) evaluating at least one financial cost value related to the execution of the software-encoded reinforcement learning program; and
(See Figure 1; the colored circle represents values evaluated from the Policy Function.)
(c.) adjusting at least one resource limitation of the software-encoded reinforcement learning program at least partly in consideration of the at least one financial cost value.
(See Figure 1(a): the “Calculate Gradients” value obtained after Evaluation is adjusted into the Policy Function.)
Li evaluates at least one of the values from the Policy Function, but Li does not mention a “financial cost value”. However, the policies are actions modeled in reinforcement learning to uplift a response, such as a customer's response to an attraction like receiving coupons, and they also extend to various applications that benefit from machine learning (see Li’s Introduction).
Dasgupta discloses a “financial cost value” (see p. 10, [0143]: ‘Metering and Pricing 1082 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses’). These values are part of the policy gradient or the Action-value Function shown in Fig. 1.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to combine the teaching of Li regarding the Policy Function, i.e., mathematical learning actions of reinforcement learning for any expectation, with the teaching of Dasgupta to include a financial cost value in the learning model. The combination would yield predictable results because the Policy Function would be applicable to any expectation algorithm in machine learning, including in financial areas and toward economic concepts.

As per Claim 2: Regarding,
2. The reinforcement learning method of claim 1, wherein the at least one financial cost value is derived from a license fee.
(See further in Dasgupta [0143], ‘Metering and Pricing 1082 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses’.)

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to combine the teaching of Li regarding the Policy Function, i.e., mathematical learning actions of reinforcement learning for any expectation, with the further teaching of Dasgupta regarding a license fee (i.e., resources may include software licenses), because the Policy Function would be applicable to any expectation algorithm in machine learning, including a license cost as part of the financial cost in the policy function toward an economic concept.

As per Claim 3: Regarding,
3. The reinforcement learning method of claim 2, wherein the license fee is incurred in the use of a software program.
(See Dasgupta [0143], Pricing 1082; ‘these resources may include application software licenses’ corresponds to the use of a software program.)

As per Claim 4: Regarding,
4. The reinforcement learning method of claim 3, wherein the software program is a simulation program.
(The software resource in [0143] of Dasgupta reads on any type of software; thus, it includes a simulation program.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to combine the teaching of Li regarding the Policy Function with the further teaching of Dasgupta regarding the pricing of software and resources; the Policy Function would be applicable to any expectation algorithm in machine learning, including any software program, such as a simulation program, that requires pricing or a license.

As per Claim 5: Regarding,
5. The reinforcement learning method of claim 4, wherein the software program is at least partially executed on a prespecified type of computational system.
The software resource in [0143] of Dasgupta reads on any type of software; see also Fig. 8, communication interface. Thus, the software resource can be executed on any platform, including the reinforcement learning model.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to combine the teaching of Li regarding the Policy Function with the further teaching of Dasgupta regarding software and resources; the Policy Function would be applicable to any expectation algorithm in machine learning, including any software program, such as one using a prespecified type of communication.

As per Claim 6: Regarding,
6. The reinforcement learning method of claim 4, wherein the software-encoded reinforcement learning program bi-directionally communicates with a software-encoded model in performing at least one cycle of execution of the software-encoded reinforcement learning program.
(See Li, Figure 1(a), Dataset as input to the Policy Function, and the entire Figure 1 as the software-encoded reinforcement learning program. The execution is bi-directional: from the Policy Function to Evaluation, and from Evaluation back to the Policy Function.)

As per Claim 7: Regarding,
7. The reinforcement learning method of claim 1, wherein the software-encoded reinforcement learning program bi-directionally communicates with a software-encoded model in performing at least one cycle of execution of the software-encoded reinforcement learning program.
(Applied similarly to Claim 6 above)

As per Claim 8: Regarding,
8. The reinforcement learning method of claim 1, wherein the at least one financial cost value is related to a usage fee of a bundled software and computational hardware system.
(As incorporated with claim 1, Dasgupta discloses the financial cost value; see further Dasgupta [0143], ‘Metering and Pricing 1082 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses’, which reads on “related to a usage fee of a bundled software and computational hardware system”.)

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to combine the teaching of Li regarding the Policy Function, i.e., mathematical learning actions of reinforcement learning for any expectation, with the further teaching of Dasgupta regarding a financial cost value related to a usage fee (i.e., resources may include software licenses), because the Policy Function would be applicable to any expectation algorithm in machine learning, including a usage cost as part of the financial cost in the policy function toward an economic concept.

As per Claim 9: Regarding,
9. The reinforcement learning method of claim 1, wherein the reinforcement learning method adapts to one or more granularity settings available in a domain-specific simulator.
(See Li, p. 13, sec. 4.1, Experiment Setup; the experiment uses a simulation dataset in a real business.)
As per Claim 10: Regarding,
10. The reinforcement learning method of claim 1, wherein the reinforcement learning method adapts to the licensing model associated with the granularity settings of the simulator.
(As incorporated with claim 1, Dasgupta discloses the financial cost value; see further Dasgupta [0143], ‘these resources may include application software licenses’, which reads on “adapts to the licensing model associated with the granularity settings of the simulator”.)

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to combine the teaching of Li regarding the Policy Function, i.e., mathematical learning actions of reinforcement learning for any expectation, with the further teaching of Dasgupta regarding a financial cost value for adapting to the licensing model associated with the granularity settings of the simulator (i.e., resources may include software licenses), because the Policy Function would be applicable to any expectation algorithm in machine learning, including a licensing model as part of the financial cost in the policy function toward an economic concept.


As per Claim 11: Regarding,
11. The reinforcement learning method of claim 1, wherein the software-encoded reinforcement learning program includes a time limitation of duration of the plurality of cycles of execution of the software-encoded reinforcement learning program.
(See Li, p. 10, sec. 3.2; the use of a time t and a time t+1 represents a mathematical definition of a time limitation of a duration.)

As per Claim 12: Regarding,
12. The reinforcement learning method of claim 1, wherein the at least one resource limitation limits a financial expenditure.
(As incorporated with claim 1, Dasgupta discloses the financial cost value; see further Dasgupta [0143], ‘Metering and Pricing 1082 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses’, which reads on “resource limitation limits a financial expenditure”.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to combine the teaching of Li regarding the Policy Function, i.e., mathematical learning actions of reinforcement learning for any expectation, with the further teaching of Dasgupta regarding resources limited by finances (i.e., Metering and Pricing), because the Policy Function would be applicable to any expectation algorithm in machine learning, including a resource and its financial expenditure in the policy function toward an economic concept.

As per Claim 13: Regarding,
13. The reinforcement learning method of claim 1, wherein the at least one resource limitation limits a total count of licenses to be accessed in a performance of a succeeding plurality of cycles of execution of the software-encoded reinforcement learning program.
(As incorporated with claim 1, Dasgupta discloses the financial cost value; see further Dasgupta [0143], ‘Metering and Pricing 1082 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses’, which reads on “a total count of licenses to be accessed in a performance of a succeeding plurality of cycles of execution of the software-encoded reinforcement learning program”.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to combine the teaching of Li regarding the Policy Function, i.e., mathematical learning actions of reinforcement learning for any expectation in the model of Figure 1(a), with the further teaching of Dasgupta regarding resources limited by finances and software licenses (i.e., the software licenses in Dasgupta, applied to the cycles of the evaluations of the policy function in the model of Li, Figure 1(a)), because the Policy Function would be applicable to any expectation algorithm in machine learning, including resources and finances as limitations in the policy function toward an economic concept.

As per Claim 14: Regarding,
14. The reinforcement learning method of claim 13 wherein the limitation of the total count of licenses to be accessed in a performance of a succeeding plurality of cycles of execution of the software-encoded reinforcement learning program includes at least one software program license.
(As incorporated with the total count of licenses of claim 13, ‘includes at least one software program license’ still reads on text section [0143] and Fig. 1 of Dasgupta, as ‘resources may include application software licenses’.)

As per Claim 15: Regarding,
15. The reinforcement learning method of claim 1, wherein the limitation of the total count of licenses to be accessed in a performance of a succeeding plurality of cycles of execution of the software-encoded reinforcement learning program includes at least one software license.
(As incorporated with claim 1, Dasgupta discloses the financial cost value; see further Dasgupta [0143], ‘Metering and Pricing 1082 provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses’, which reads on “total count of licenses to be accessed in a performance of a succeeding plurality of cycles of execution of the software-encoded reinforcement learning program”; and, with Fig. 1, the action-value function 117 communicates bi-directionally with dataset 119 as part of the description in [0143], which corresponds to “includes at least one software license”.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the application to combine the teaching of Li regarding the Policy Function, i.e., mathematical learning actions of reinforcement learning for any expectation in the model of Figure 1(a), with the further teaching of Dasgupta regarding resources limited by finances and software licenses (i.e., Metering and Pricing, and the software licenses in Dasgupta, applied to the cycles of the evaluations of the policy function in the model of Li, Figure 1(a)), because the Policy Function would be applicable to any expectation algorithm in machine learning, including resource and financial software as limitations in the policy function toward an economic concept.

As per Claim 16: Regarding,
16. The reinforcement learning method of claim 15, wherein the software license permits access to a simulation program.
(As incorporated with the total count of licenses of claim 15, ‘wherein the software license permits access to a simulation program’ still reads on text section [0143] and Fig. 1 of Dasgupta, where reference # 119 reads on the dataset that the action-value function accesses.)

As per Claim 17: Regarding,
17. The reinforcement learning method of claim 1, wherein the simulation program is at least partially executed on a computational system accessed via an electronic communications network.
(Claim 17 is indefinite for the unclear recitation addressed in the 35 USC 112(b) section above. Under the best interpretation, Li discloses claim 17 in Figure 1(a).)

As per Claim 18: Regarding,
18. The reinforcement learning method of claim 1, wherein the limitation of the total count of licenses to be accessed in a performance a succeeding plurality of cycles of execution of the software-encoded reinforcement learning program includes at least one computational system license.
(Claim 18 is indefinite for the unclear recitation addressed in the 35 USC 112(b) section above. Claim 18 has a limitation similar to that of claim 15 and is rejected similarly to claim 15.)

As per Claims 19-22: The claims are directed to a system having recitations corresponding to the limitations recited in method claims 1-4. Claims 19-22 are rejected under the rationales provided for claims 1-4 above.


Conclusion
 	 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ted T Vo whose telephone number is (571)272-3706.  The examiner can normally be reached on 8am-4:30pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wei Y Zhen can be reached on (571) 272-3708.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

TTV
August 12, 2022
/Ted T. Vo/
Primary Examiner, Art Unit 2191