Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 5/27/2022 has been entered.

Status of Claims
Claims 1-3, 5-10, 12-17, and 19-20 were amended in the response filed 5/27/2022.
Claims 4, 11, and 18 are as previously presented.
Claims 1-20 are currently pending and considered below. 

Notice to Applicant
Claims 3-4, 10-11, and 17-18 are not presently rejected under 35 U.S.C. 102 or 103 and hence would be in condition for allowance if amended to overcome the rejections presented under 35 U.S.C. 101. The following represents the Examiner's characterization of the most relevant prior art references and the Examiner's reasons for allowance if the claims were able to overcome the aforementioned grounds of rejection under 35 U.S.C. 101:
Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity by Liao, et al. 
U.S. Patent Publication No. 2018/0089373 to Matsuguchi, et al. 
U.S. Patent Publication No. 2020/0193323 to Alesiani, et al.
The aforementioned references are understood to be the closest prior art. Various aspects of the present invention are known individually, but the particular manner in which the elements are claimed, when considered as an ordered combination, distinguishes the claimed invention from the aforementioned references; hence the present invention is not considered anticipated by, or an obvious variant of, the inventions taught by the closest prior art references.

Claim Objections
Claim 7 is objected to because of the following informalities: claim 7 recites “responsive to the tow or more of the determined…”, which should read “the two or more…”. Appropriate correction is required.

Claim Rejections – 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.

Step 1 of the Alice/Mayo Test
Claims 1-20 are within the four statutory categories. 

Step 2A of the Alice/Mayo Test - Prong One
Claim 1, which is representative of claims 1-20, recites: A method, in a data processing system, comprising:
computing, by an interpretable strategy generation engine executing within the data processing system, a discounted health variable with a penalty for deviating from one or more clinical guidelines specified in clinical guidelines data based on a distance function representing an allowed deviation from the clinical guidelines;
applying, by a model builder executing within the data processing system, computer executed reinforcement learning operations on the discounted health variable to optimize the discounted health variable based on a distance function that evaluates a distance between a set of possible actions and an optimal action, and thereby generate a reinforcement learning (RL) machine learning (ML) computer model for generating dynamic treatment regimes;
executing on patient data, the RL ML computer model, configured with a first distance impact hyperparameter setting corresponding to no constraint on selection of a next action in a treatment regime thereby permitting selection of next actions that deviate from the one or more clinical guidelines with no constraint, to thereby determine a first next action;
executing, on the patient data, the RL ML computer model, configured with a second distance impact hyperparameter setting corresponding to a partially guideline compliant constraint on selection of a next action in the treatment regime with allowed limited deviation from the one or more clinical guidelines, to thereby determine a second next action;
executing, on the patient data. the RL ML computer model, configured with a third distance impact hyperparameter setting corresponding to a guideline compliant selection of a next action in the treatment regime that adheres to the one or more clinical guidelines with no deviation, to thereby determine a third next action;
generating, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined first, second, and third next action in a treatment regime, wherein the outcome output display connects the first, second and third next action to a current action in a tree representation that presents, in a visual display, a representation of adherence and deviation of a treatment regime from the clinical guidelines; and
presenting, by the presentation layer, the outcome output display to a user.

The Examiner submits that the foregoing underlined limitations constitute: (a) “certain methods of organizing human activity.” For example, a medical professional may follow rules or instructions to determine a treatment regime and a next action in a treatment regime. 
Dependent Claims 2-7, 9-14, and 16-20 include other limitations, but these only further limit the abstract idea and hence are directed toward fundamentally the same abstract idea as claim 1. For example, Claims 2-4, 9-11, and 16-18 merely recite a mathematical formula, and Claims 5-7, 12-14, and 19-20 merely define a type of data processed by the system.

Step 2A of the Alice/Mayo Test - Prong Two
A method, in a data processing system, comprising:
computing, by an interpretable strategy generation engine executing within the data processing system, a discounted health variable with a penalty for deviating from one or more clinical guidelines specified in clinical guidelines data based on a distance function representing an allowed deviation from the clinical guidelines;
applying, by a model builder executing within the data processing system, computer executed reinforcement learning operations on the discounted health variable to optimize the discounted health variable based on a distance function that evaluates a distance between a set of possible actions and an optimal action, and thereby generate a reinforcement learning (RL) machine learning (ML) computer model for generating dynamic treatment regimes;
executing on patient data, the RL ML computer model, configured with a first distance impact hyperparameter setting corresponding to no constraint on selection of a next action in a treatment regime thereby permitting selection of next actions that deviate from the one or more clinical guidelines with no constraint, to thereby determine a first next action;
executing, on the patient data, the RL ML computer model, configured with a second distance impact hyperparameter setting corresponding to a partially guideline compliant constraint on selection of a next action in the treatment regime with allowed limited deviation from the one or more clinical guidelines, to thereby determine a second next action;
executing, on the patient data. the RL ML computer model, configured with a third distance impact hyperparameter setting corresponding to a guideline compliant selection of a next action in the treatment regime that adheres to the one or more clinical guidelines with no deviation, to thereby determine a third next action;
generating, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined first, second, and third next action in a treatment regime, wherein the outcome output display connects the first, second and third next action to a current action in a tree representation that presents, in a visual display, a representation of adherence and deviation of a treatment regime from the clinical guidelines; and
presenting, by the presentation layer, the outcome output display to a user.

The bolded text shown above is not part of the aforementioned abstract idea. Rather, it corresponds to the additional elements, which do not amount to a practical application or significantly more, as discussed in further detail below.

Furthermore, Claims 1-20 are not integrated into a practical application because the additional elements (i.e. the limitations not identified as part of the abstract idea) amount to no more than limitations which:
amount to mere instructions to apply an exception – for example, the recitation of “a data processing system,” “an interpretable strategy generation engine,” “a model builder,” “a reinforcement learning (RL) machine learning (ML) computer model,” “the dynamic treatment regime generation engine,” and “a presentation layer,” which amounts to merely invoking a computer as a tool to perform the abstract idea. See MPEP 2106.04(d).

Step 2B of the Alice/Mayo Test for Claims
Furthermore, the claims do not include additional elements that are sufficient to amount to “significantly more” than the judicial exception because the additional elements (i.e., the elements other than the abstract idea) amount to elements that have been recognized as well-understood, routine, and conventional activity in particular fields, as shown below.
The data processing system, the interpretable strategy generation engine, and the presentation layer amount to elements that have been recognized as well-understood, routine, and conventional activity in particular fields, as demonstrated by:
U.S. Patent Publication No. 2018/0325385 to Deterding, et al., disclosing the computing device (para. 0040), software (para. 0041), and display (para. 0061);
U.S. Patent Publication No. 2014/0006055 to Seraly, et al., disclosing the computing device (para. 0093), software (para. 0099), and display (para. 0075).
The models/model builder amount to elements that have been recognized as well-understood, routine, and conventional activity in particular fields, as demonstrated by:
U.S. Patent Publication No. 2014/0006055 to Seraly, at para. 0014; 
U.S. Patent Publication No. 2018/0043182 to Wu, et al., at para. 0025.
Independent claims 8 and 15 include nearly identical limitations and are similarly rejected. Dependent claims 2-7, 9-14, and 16-20 include other limitations, but none of these functions is deemed significantly more than the abstract idea, because the additional elements recited in those dependent claims similarly represent no more than performing repetitive calculations (e.g., the “computing” feature of dependent Claims 2-3, 9-10, and 16-17 and the “estimating” feature of dependent Claims 4, 11, and 18); receiving or transmitting data over a network (e.g., the “aggregating” and “storing” features of dependent Claims 5, 12, and 19); and gathering and analyzing information using conventional techniques and displaying the result (e.g., the “display” feature of dependent Claims 6, 13, and 20 and the “generating” feature of dependent Claims 7 and 14).
Thus, taken alone, the additional elements do not amount to “significantly more” than the above-identified abstract idea. Furthermore, looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually, and there is no indication that the combination of elements improves the functioning of a computer or improves any other technology, and their collective functions merely provide conventional computer implementation. Therefore, whether taken individually or as an ordered combination, Claims 1-20 are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term “limited” in claims 1, 8, and 15, is a relative term which renders the claims indefinite. The term “limited” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. Said another way, the claim is indefinite because it is unclear what would constitute a “limited deviation” and what would not constitute a “limited deviation.”
The term “allowed” in claims 1, 8, and 15, is a relative term which renders the claims indefinite. The term “allowed” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. Said another way, the claim is indefinite because it is unclear what would constitute an “allowed deviation” and what would not constitute an “allowed deviation.”

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 5-9, 12-16, and 19-20 are rejected under 35 U.S.C. § 103 as being unpatentable over Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity by Liao, et al. (“Liao”) in view of U.S. Patent Publication No. 2018/0089373 to Matsuguchi, et al. (“Matsuguchi”), in further view of U.S. Patent Publication No. 2020/0193323 to Alesiani, et al. (“Alesiani”).
Regarding Claim 1, Liao discloses:
A method, comprising: (Liao, page 4, second paragraph: a dynamic system for promoting physical activity and dietary health)
computing, by an interpretable strategy generation engine executing within the data processing system, (Liao, page 11, second full paragraph: simulation-based procedure) a discounted health variable (Liao, page 11, second full paragraph: calculating the discount rate, γ, based on a simulation-based procedure) with a penalty for deviating from one or more clinical guidelines (Liao, page 2: the “reward” is effectively a penalty¹ for completing or deviating from certain guidelines/practices, such as an unsatisfactory step count. The clinical guidelines are explicitly taught by Matsuguchi, below)
applying, by a model builder executing within the data processing system, computer executed reinforcement learning operations on the discounted health variable (Liao, page 11, second full paragraph: applying a RL algorithm to the tuning parameters, which includes the discount rate, γ) to optimize the discounted health variable (Liao, page 11, second full paragraph: the modeling is used to find optimal tuning parameters), and thereby generate a reinforcement learning (RL) machine learning (ML) model (Liao, page 11, second full paragraph: a simulation environment, and the Reinforcement Learning is an area of machine learning) for generating dynamic treatment regimes; (Liao, page 13, 7 Pilot Data From HeartSteps V2: the RL algorithm is used to trigger a “context-tailored activity suggestion,” construed as a dynamic treatment regime)
executing on patient data, the RL ML computer model (Liao, page 10, first paragraph: the HeartSteps V1) corresponding to no constraint on selection of a next action in a treatment regime (Liao, page 10, first paragraph: using a constraint of zero) thereby permitting selection of next actions that deviate from the one or more clinical guidelines with no constraint, to thereby determine a first next action; (Liao, page 10, first paragraph: the discounted future rewards sends nothing (i.e. with a constraint of 0) or sends an activity suggestion, e.g. if a constraint is greater than a certain value)
executing, on the patient data, the RL ML computer model, (Liao, page 10, first paragraph: the HeartSteps V1) corresponding to a partially guideline compliant constraint on selection of a next action in the treatment regime with allowed limited deviation from the one or more clinical guidelines, (Liao, page 10, first paragraph: using a constraint between zero and one) to thereby determine a second next action; (Liao, page 10, first paragraph: the discounted future rewards sends an activity suggestion, for example, when a constraint is greater than a certain value)
executing, on the patient data, the RL ML computer model, (Liao, page 10, first paragraph: the HeartSteps V1) corresponding to a guideline compliant selection of a next action in the treatment regime that adheres to the one or more clinical guidelines with no deviation, (Liao, page 10, first paragraph: using a constraint of one) to thereby determine a third next action; (Liao, page 10, first paragraph: the action is selected to maximize the sum of discounted rewards)
Liao discloses a dynamic system for promoting physical activity and dietary health using reinforcement learning (page 4, second paragraph). However, Liao does not explicitly recite, but Matsuguchi teaches that it is old and well known in the art of healthcare to include calculating a variable specified in clinical guidelines data (Matsuguchi, [0194]: National Comprehensive Cancer Network (NCCN) guidelines) based on a distance function representing an allowed deviation from the clinical guidelines; (Matsuguchi, [0183]: grouping therapies using a “similarity score,” which is a type of distance function, and the similarity can be determined using an empirical significance threshold, which would be a type of allowed deviation)
generating, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined first, second, and third action in a treatment regime (Matsuguchi, [0184]: ten or more therapies can be presented, which includes a first, second, and third) wherein the outcome output display connects the first, second and third next action to a current action in a tree representation (Matsuguchi, Fig. 8, [0184]: the matched therapies are displayed on a user interface and Fig. 8 shows an action tree representation) that presents, in a visual display, a representation of adherence and deviation of a treatment regime from the clinical guidelines; and (Matsuguchi, [0184]: deviations from clinical guidelines, such as “can new chemotherapies cause your cancer to shrink?”)
presenting, by the presentation layer, the outcome output display to a user. (Matsuguchi, [0184]: the matched therapies are displayed on a user interface).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include the outcome output display, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing and user experience.
The combination of Liao and Matsuguchi does not explicitly recite, but Alesiani teaches that it is old and well known in the art of healthcare to include calculating a variable based on a distance function that evaluates a distance between a set of possible actions and an optimal action (Alesiani, [0052]: finding optimal values for mixed-integer linear programming (MILP) problems (see Fig. 1) using a distance function, see [0058])
configured with a first distance impact hyperparameter setting (Alesiani, [0006]: the hyperparameters are iteratively tested and would include multiple settings)
configured with a second distance impact hyperparameter setting (Alesiani, [0006]: the hyperparameters are iteratively tested and would include multiple settings)
configured with a third distance impact hyperparameter setting (Alesiani, [0006]: the hyperparameters are iteratively tested and would include multiple settings)
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify the combination to include the distance function and hyperparameters, as taught by Alesiani, for efficient processing because Alesiani teaches that the above improves technical systems by providing for a selection of a best solver under given constraints. 
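For illustration only, and not as a characterization of Applicant's specification or of Liao, the claim 1 “discounted health variable with a penalty for deviating from one or more clinical guidelines” may be pictured as a discounted sum of per-step rewards, each reduced by a weighted distance between the action taken and the guideline-recommended action. The function name discounted_health_variable, the penalty_weight parameter, and the absolute-difference distance used below are hypothetical assumptions chosen for this sketch.

```python
from typing import Callable, Sequence


def discounted_health_variable(
    rewards: Sequence[float],
    actions: Sequence[float],
    guideline_actions: Sequence[float],
    gamma: float,
    penalty_weight: float,
    distance: Callable[[float, float], float] = lambda a, g: abs(a - g),
) -> float:
    """Hypothetical penalized discounted return:
    sum over t of gamma**t * (r_t - penalty_weight * distance(a_t, g_t)).
    """
    if not (len(rewards) == len(actions) == len(guideline_actions)):
        raise ValueError("rewards, actions, and guideline_actions must align")
    total = 0.0
    for t, (r, a, g) in enumerate(zip(rewards, actions, guideline_actions)):
        # Per-step reward reduced by a penalty proportional to the
        # (assumed) distance between the action and the guideline action.
        total += gamma**t * (r - penalty_weight * distance(a, g))
    return total


# Usage: only step 1 deviates from the guideline (by 1 unit), so only
# that step's discounted reward is penalized.
print(discounted_health_variable([1, 1, 1], [0, 2, 1], [0, 1, 1],
                                 gamma=0.5, penalty_weight=1.0))  # prints 1.25
```

Setting penalty_weight to 0.0 removes the guideline penalty and recovers an ordinary discounted return (1.75 for the same inputs).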

Regarding claim 2, the combination discloses each of the limitations of claim 1 as discussed above, and further discloses:
wherein applying computer executed reinforcement learning operations on the discounted health variable comprises computing an average outcome value as follows: 
[Equation reproduced only as an image (media_image1.png) in the original; not recoverable from the text.]
(Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a), and Q^π(s, a) = E_π[R_t + γR_{t+1} + γ²R_{t+2} + … | S_t = s, A_t = a]) where:
Y* denote the optimal outcome variable that the RL is optimizing; and (Liao, page 9, fourth paragraph: maximizing the future value)
Qk is a Q function as defined in a Q-learning algorithm in RL that estimates the average treatment effect (Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a)).
Matsuguchi teaches that it is old and well known in the art of healthcare to include Lk represents a history of all observed data for a patient up to time k; (Matsuguchi, [0180]: a list of therapies is generated from medical history data of the subject) Ak represents the history of all past actions that were taken on a given patient up to time k; (Matsuguchi, [0180]: a list of therapies is generated from a first list of therapies of the subject)
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include that Lk represents a history of all observed data for a patient up to time k; Ak represents the history of all past actions that were taken on a given patient up to time k, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.
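For reference, the Liao expressions cited for claim 2 (page 9) can be restated in summation notation as follows. This is a restatement of Liao's definitions only, not a reproduction of the claim 2 formula; the final line, a greedy choice of the action maximizing the estimated discounted reward, is an assumption reflecting the Examiner's reading of Liao, page 10, and is not a formula quoted from Liao.

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{j=0}^{\infty} \gamma^{j} R_{t+j} \;\middle|\; S_t = s,\ A_t = a\right],
\qquad
r(s, a) = \mathbb{E}\left[R_t \mid S_t = s,\ A_t = a\right],
\qquad
a_t^{*} = \arg\max_{a} Q^{\pi}(S_t, a).
```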

Regarding claim 5, the combination discloses each of the limitations of claim 1 as discussed above, and further discloses:
Matsuguchi teaches that it is old and well known in the art of healthcare to include wherein applying computer executed reinforcement learning operations on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage. (Matsuguchi, [0005]: the medical history data).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include wherein applying reinforcement learning techniques on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.

Regarding claim 6, the combination discloses each of the limitations of claim 1 as discussed above, and further discloses:
wherein generating the outcome output display comprises generating an action tree based on the determined first, second, and third next action in the treatment regime using the RL model, (Liao, page 2: the treatment policies are construed as the action tree) wherein the first distance impact hyperparameter setting specifies a relatively large value that does not present a limit on a deviation of from the clinical guidelines, (Liao, page 10, first paragraph: the discounted future rewards sends nothing (i.e. with a constraint of 0) or sends an activity suggestion, e.g. if a constraint is greater than a certain value) the second distance impact hyperparameter setting specifies a non-zero value indicating a non-zero limit on deviation from the clinical guidelines of an optimal next action in the treatment regime, (Liao, page 10, first paragraph: the discounted future rewards sends an activity suggestion, for example, when a constraint is greater than a certain value) and the third distance impact hyperparameter setting specifies a zero value indicating no permissible deviation of the next action in the treatment regime from the clinical guidelines. (Liao, page 10, first paragraph: the action is selected to maximize the sum of discounted rewards). The hyperparameters are taught by Alesiani, above.
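As a non-authoritative sketch of how the three claim 6 settings (a relatively large value, a non-zero value, and zero) could constrain the selected next action, the following assumes a hypothetical bound max_deviation on the distance between a candidate action and the guideline action, integer-coded actions (e.g., dose levels), and hypothetical estimated values per action. None of these names or values comes from the claims, Liao, or Alesiani.

```python
import math
from typing import Callable, Dict


def select_next_action(
    q_values: Dict[int, float],
    guideline_action: int,
    max_deviation: float,
    distance: Callable[[int, int], float] = lambda a, g: abs(a - g),
) -> int:
    """Return the highest-valued action whose distance from the guideline
    action is at most max_deviation (a hypothetical 'distance impact' bound)."""
    allowed = {a: q for a, q in q_values.items()
               if distance(a, guideline_action) <= max_deviation}
    if not allowed:
        # No candidate is close enough; fall back to the guideline action.
        return guideline_action
    return max(allowed, key=allowed.__getitem__)


# Hypothetical estimated values for dose levels 0, 1, 2; the guideline dose is 0.
q = {0: 0.2, 1: 0.5, 2: 0.9}
for name, bound in [("unconstrained", math.inf), ("partial", 1), ("compliant", 0)]:
    print(name, select_next_action(q, guideline_action=0, max_deviation=bound))
# prints: unconstrained 2 / partial 1 / compliant 0 (one per line)
```

In this sketch, an unbounded setting lets the highest-valued action win, a non-zero bound admits only nearby actions, and a zero bound admits only the guideline action itself.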

Regarding claim 7, the combination discloses each of the limitations of claim 1 as discussed above, and further discloses:
wherein generating the outcome output display further comprises:
repeating the execution of the RL ML computer model on the patient data configured with the first, second, and third distance impact hyperparameter settings, for a plurality of time points, and generating, at each time point a set of the first, second, and third next actions for the corresponding time point, and (Liao, page 11: using cross validation and multiple iterations to construct distributions, estimate noise, and tune parameters; see also page 11: the maximum discounted reward is selected based on decision time T)
for each time point, extending the outcome output display at least by generating a corresponding action tree at least by: connecting current actions corresponding to a current time point to the first, second, and third next actions for a next time point in the plurality of time points; (Liao, page 5: the dosage variable connects past treatments to future rewards)
responsive to two or more of the determined first, second, and third next actions in the treatment regime being the same, reducing the action tree; and (Liao, page 4: the treatment policies are based on past rewards and applied to future rewards)
responsive to the tow or more of the determined first, second, and third next actions in the treatment regime not being the same, expanding the action tree. (Liao, page 4: the treatment policy is updated as the system learns from additional data).
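The claim 7 “reducing” and “expanding” of the action tree may be illustrated, under stated assumptions, by a simple tree-extension step: the next actions produced under the three settings are attached to the current node, identical actions are merged into a single branch, and distinct actions each receive their own branch. The Node class and extend_tree function are hypothetical and are not drawn from the claims or the cited references; repeating the step for each new leaf at each time point would build the multi-step tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Node:
    """Hypothetical action-tree node holding one treatment action."""
    action: str
    children: List["Node"] = field(default_factory=list)


def extend_tree(current: Node, next_actions: Sequence[str]) -> List[Node]:
    """Attach the next actions (e.g., from the three settings) to `current`.
    Duplicate actions share one branch (tree reduced); distinct actions get
    separate branches (tree expanded). Returns the new child nodes."""
    branches: Dict[str, Node] = {}
    for action in next_actions:
        if action not in branches:
            child = Node(action)
            current.children.append(child)
            branches[action] = child
    return list(branches.values())


# Usage: two of the three settings agree, so only two branches are created.
root = Node("current")
extend_tree(root, ["a", "a", "b"])
print([c.action for c in root.children])  # prints ['a', 'b']
```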

Regarding Claim 8, Liao discloses:
A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a data processing system, causes the data processing system to: (Liao, page 4, second paragraph: a dynamic system for promoting physical activity and dietary health)
compute, by an interpretable strategy generation engine executing within the data processing system, (Liao, page 11, second full paragraph: simulation-based procedure) a discounted health variable (Liao, page 11, second full paragraph: calculating the discount rate, γ, based on a simulation-based procedure) with a penalty for deviating from one or more clinical guidelines (Liao, page 2: the “reward” is effectively a penalty² for completing or deviating from certain guidelines/practices, such as an unsatisfactory step count. The clinical guidelines are explicitly taught by Matsuguchi, below)
apply, by a model builder executing within the data processing system, computer executed reinforcement learning operations on the discounted health variable (Liao, page 11, second full paragraph: applying a RL algorithm to the tuning parameters, which includes the discount rate, γ) to optimize the discounted health variable (Liao, page 11, second full paragraph: the modeling is used to find optimal tuning parameters), and thereby generate a reinforcement learning (RL) machine learning (ML) model (Liao, page 11, second full paragraph: a simulation environment, and the Reinforcement Learning is an area of machine learning) for generating dynamic treatment regimes; (Liao, page 13, 7 Pilot Data From HeartSteps V2: the RL algorithm is used to trigger a “context-tailored activity suggestion,” construed as a dynamic treatment regime)
execute on patient data, the RL ML computer model (Liao, page 10, first paragraph: the HeartSteps V1) corresponding to no constraint on selection of a next action in a treatment regime (Liao, page 10, first paragraph: using a constraint of zero) thereby permitting selection of next actions that deviate from the one or more clinical guidelines with no constraint, to thereby determine a first next action; (Liao, page 10, first paragraph: the discounted future rewards sends nothing (i.e. with a constraint of 0) or sends an activity suggestion, e.g. if a constraint is greater than a certain value)
execute, on the patient data, the RL ML computer model, (Liao, page 10, first paragraph: the HeartSteps V1) corresponding to a partially guideline compliant constraint on selection of a next action in the treatment regime with allowed limited deviation from the one or more clinical guidelines, (Liao, page 10, first paragraph: using a constraint between zero and one) to thereby determine a second next action; (Liao, page 10, first paragraph: the discounted future rewards sends an activity suggestion, for example, when a constraint is greater than a certain value)
executing, on the patient data, the RL ML computer model, (Liao, page 10, first paragraph: the HeartSteps V1) corresponding to a guideline compliant selection of a next action in the treatment regime that adheres to the one or more clinical guidelines with no deviation, (Liao, page 10, first paragraph: using a constraint of one) to thereby determine a third next action; (Liao, page 10, first paragraph: the action is selected to maximize the sum of discounted rewards)
Liao discloses a dynamic system for promoting physical activity and dietary health using reinforcement learning (page 4, second paragraph). However, Liao does not explicitly recite, but Matsuguchi teaches that it is old and well known in the art of healthcare to include calculating a variable specified in clinical guidelines data (Matsuguchi, [0194]: National Comprehensive Cancer Network (NCCN) guidelines) based on a distance function representing an allowed deviation from the clinical guidelines; (Matsuguchi, [0183]: grouping therapies using a “similarity score,” which is a type of distance function, and the similarity can be determined using an empirical significance threshold, which would be a type of allowed deviation)
generating, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined first, second, and third action in a treatment regime (Matsuguchi, [0184]: ten or more therapies can be presented, which would include a first, second, and third therapy) wherein the outcome output display connects the first, second and third next action to a current action in a tree representation (Matsuguchi, Fig. 8, [0184]: the matched therapies are displayed on a user interface and Fig. 8 shows an action tree representation) that presents, in a visual display, a representation of adherence and deviation of a treatment regime from the clinical guidelines; and (Matsuguchi, [0184]: deviations from clinical guidelines, such as “can new chemotherapies cause your cancer to shrink?”)
presenting, by the presentation layer, the outcome output display to a user. (Matsuguchi, [0184]: the matched therapies are displayed on a user interface).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include the outcome output display, as taught by Matsuguchi, in order to improve processing efficiency, because Matsuguchi teaches that the above improves the efficiency of data processing and the user experience.
The combination of Liao and Matsuguchi does not explicitly recite, but Alesiani teaches that it is old and well known in the art of healthcare to include calculating a variable based on a distance function that evaluates a distance between a set of possible actions and an optimal action (Alesiani, [0052]: finding optimal values for MILP problems (see Fig. 1); [0058]: using a distance function)
configured with a first distance impact hyperparameter setting (Alesiani, [0006]: the hyperparameters are iteratively tested and would include multiple settings)
configured with a second distance impact hyperparameter setting (Alesiani, [0006]: the hyperparameters are iteratively tested and would include multiple settings)
configured with a third distance impact hyperparameter setting (Alesiani, [0006]: the hyperparameters are iteratively tested and would include multiple settings)
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify the combination to include the distance function and hyperparameters, as taught by Alesiani, in order to improve processing efficiency, because Alesiani teaches that doing so improves technical systems by enabling selection of the best solver under given constraints.
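For illustration only, the three executions mapped above (no constraint, limited deviation, and no deviation from the clinical guidelines) can be sketched as a constrained selection of the highest-valued action. This is a minimal sketch under stated assumptions, not the applicant's implementation or the method of Liao or Alesiani: actions are assumed to be numeric dose levels, the distance function is assumed to be the absolute difference from a guideline-recommended action, and `select_next_action`, `q_values`, and `max_deviation` are hypothetical names, with `max_deviation` standing in for a distance impact hyperparameter setting.

```python
import math


def select_next_action(q_values, guideline_action, max_deviation):
    """Return the action with the highest estimated value among actions whose
    distance from the guideline action is at most max_deviation.

    Assumptions (hypothetical): actions are numeric dose levels and the
    distance function is |action - guideline_action|.
    """
    if guideline_action not in q_values:
        raise ValueError("guideline action must be one of the candidate actions")
    if max_deviation < 0:
        raise ValueError("max_deviation must be non-negative")
    allowed = {
        action: value
        for action, value in q_values.items()
        if abs(action - guideline_action) <= max_deviation
    }
    # The guideline action (distance 0) is always allowed, so `allowed` is non-empty.
    return max(allowed, key=allowed.get)


# Illustrative (made-up) value estimates for four dose levels.
q_values = {0: 0.2, 1: 0.5, 2: 0.7, 3: 0.9}
guideline_action = 1

first_next = select_next_action(q_values, guideline_action, math.inf)  # no constraint
second_next = select_next_action(q_values, guideline_action, 1)        # limited deviation
third_next = select_next_action(q_values, guideline_action, 0)         # guideline compliant
```

With the illustrative values above, the three settings select dose levels 3, 2, and 1, respectively: relaxing the allowed deviation lets the selection move further from the guideline action toward the highest estimated value.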

Regarding claim 9, the combination teaches each of the limitations of claim 8 as discussed above, and further discloses:
wherein applying computer executed reinforcement learning operations on the discounted health variable comprises computing an average outcome value as follows: 
    [Equation image: media_image1.png]
 (Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a), and Q^π(s, a) = E_π[R_t + γR_{t+1} + γ²R_{t+2} + ... | S_t = s, A_t = a]) where:
Y* denotes the optimal outcome variable that the RL is optimizing; and (Liao, page 9, fourth paragraph: maximizing the future value)
Q_k is a Q function as defined in a Q-learning algorithm in RL that estimates the average treatment effect (Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a)).
Matsuguchi teaches that it is old and well known in the art of healthcare to include that L_k represents a history of all observed data for a patient up to time k (Matsuguchi, [0180]: a list of therapies is generated from medical history data of the subject) and that A_k represents the history of all past actions that were taken on a given patient up to time k (Matsuguchi, [0180]: a list of therapies is generated from a first list of therapies of the subject).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include that L_k represents a history of all observed data for a patient up to time k and that A_k represents the history of all past actions that were taken on a given patient up to time k, as taught by Matsuguchi, in order to improve processing efficiency, because Matsuguchi teaches that the above improves the efficiency of data processing.
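For context, the Liao expression relied upon above, Q^π(s, a) = E_π[R_t + γR_{t+1} + γ²R_{t+2} + ... | S_t = s, A_t = a], can be illustrated with a simple Monte Carlo estimate in which the expectation is approximated by averaging the discounted returns of observed trajectories that begin with state s and action a. This is a minimal sketch assuming rewards are available as plain lists of numbers; `discounted_return` and `monte_carlo_q` are hypothetical names, and the sketch is not asserted to reproduce the average outcome equation recited in claim 9 or Liao's estimation procedure.

```python
def discounted_return(rewards, gamma):
    """R_t + gamma*R_{t+1} + gamma**2*R_{t+2} + ... for one observed trajectory."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total


def monte_carlo_q(trajectories, gamma):
    """Estimate Q(s, a) as the sample mean of discounted returns over
    trajectories that all start from the same state s with action a."""
    if not trajectories:
        raise ValueError("at least one trajectory is required")
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)


# Two illustrative (made-up) reward sequences observed after taking action a in state s.
q_estimate = monte_carlo_q([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], gamma=0.5)
```

Here the two discounted returns are 1.25 and 0.5, so the estimate is their mean, 0.875.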


Regarding claim 12, the combination teaches each of the limitations of claim 8 as discussed above, and further discloses:
Matsuguchi teaches that it is old and well known in the art of healthcare to include wherein applying computer executed reinforcement learning operations on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage. (Matsuguchi, [0005]: the medical history data).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include wherein applying computer executed reinforcement learning operations on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage, as taught by Matsuguchi, in order to improve processing efficiency, because Matsuguchi teaches that the above improves the efficiency of data processing.

Regarding claim 13, the combination teaches each of the limitations of claim 8 as discussed above, and further discloses:
wherein generating the outcome output display comprises generating an action tree based on the determined first, second, and third next action in the treatment regime using the RL model, (Liao, page 2: the treatment policies are construed as the action tree) wherein the first distance impact hyperparameter setting specifies a relatively large value that does not present a limit on a deviation from the clinical guidelines, (Liao, page 10, first paragraph: based on the discounted future rewards, the system sends nothing (i.e., with a constraint of 0) or sends an activity suggestion, e.g., if a constraint is greater than a certain value) the second distance impact hyperparameter setting specifies a non-zero value indicating a non-zero limit on deviation from the clinical guidelines of an optimal next action in the treatment regime, (Liao, page 10, first paragraph: based on the discounted future rewards, the system sends an activity suggestion, for example, when a constraint is greater than a certain value) and the third distance impact hyperparameter setting specifies a zero value indicating no permissible deviation of the next action in the treatment regime from the clinical guidelines. (Liao, page 10, first paragraph: the action is selected to maximize the sum of discounted rewards). The distance impact hyperparameters themselves are taught by Alesiani, as discussed above.

Regarding claim 14, the combination teaches each of the limitations of claim 8 as discussed above, and further discloses:
wherein generating the outcome output display further comprises:
repeating the execution of the RL ML computer model on the patient data configured with the first, second, and third distance impact hyperparameter settings, for a plurality of time points, and generating, at each time point a set of the first, second, and third next actions for the corresponding time point, and (Liao, page 11: using cross validation and multiple iterations to construct distributions, estimate noise, and tune parameters; see also page 11: the selection that maximizes the discounted reward is based on decision time T)
for each time point, extending the outcome output display at least by generating a corresponding action tree at least by: connecting current actions corresponding to a current time point to the first, second, and third next actions for a next time point in the plurality of time points; (Liao, page 5: the dosage variable connects past treatments to future rewards)
responsive to two or more of the determined first, second, and third next actions in the treatment regime being the same, reducing the action tree; and (Liao, page 4: the treatment policies are based on past rewards and applied to future rewards)
responsive to the two or more of the determined first, second, and third next actions in the treatment regime not being the same, expanding the action tree. (Liao, page 4: the treatment policy is updated as the system learns from additional data).
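For illustration only, the claim 14 steps mapped above (connecting each current action to the first, second, and third next actions at each time point, reducing the tree when next actions coincide and expanding it when they differ) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: `Node`, `build_action_tree`, and the `next_actions_for` callback are hypothetical, and the callback stands in for executing the RL ML model under the three distance impact hyperparameter settings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """One action in the tree; children are the distinct next actions."""
    action: str
    children: List["Node"] = field(default_factory=list)


def build_action_tree(
    root_action: str,
    next_actions_for: Callable[[str, int], Tuple[str, str, str]],
    num_steps: int,
) -> Node:
    root = Node(root_action)
    frontier = [root]
    for t in range(num_steps):
        new_frontier = []
        for node in frontier:
            # First, second, and third next actions for this node at time point t.
            for action in next_actions_for(node.action, t):
                # Identical next actions share one child (the tree is reduced);
                # distinct next actions each add a child (the tree is expanded).
                if all(child.action != action for child in node.children):
                    node.children.append(Node(action))
            new_frontier.extend(node.children)
        frontier = new_frontier
    return root


# Hypothetical model outputs: the second and third settings agree, the first differs.
tree = build_action_tree("current", lambda action, t: ("increase", "hold", "hold"), 2)
# Hypothetical model outputs: all three settings agree at every time point.
collapsed = build_action_tree("current", lambda action, t: ("hold", "hold", "hold"), 2)
```

In the first example each node gains two children per time point, while in the second the tree reduces to a single chain because all three next actions coincide.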

Regarding claim 15, Liao discloses:
A data processing system comprising: (Liao, page 4, second paragraph: a dynamic system for promoting physical activity and dietary health)
at least one processor; and at least one memory coupled to the at least one processor, wherein the at least one memory comprises instructions which, when executed by the at least one processor, cause the at least one processor to: (Liao, page 1: a mobile application)
compute, by an interpretable strategy generation engine executing within the data processing system, (Liao, page 11, second full paragraph: simulation-based procedure) a discounted health variable (Liao, page 11, second full paragraph: calculating the discount rate, γ, based on a simulation-based procedure) with a penalty for deviating from one or more clinical guidelines (Liao, page 2: the “reward” is effectively a penalty (see footnote 3) for completing or deviating from certain guidelines/practices, such as an unsatisfactory step count. The clinical guidelines are explicitly taught by Matsuguchi, below) 
apply, by a model builder executing within the data processing system, computer executed reinforcement learning operations on the discounted health variable (Liao, page 11, second full paragraph: applying a RL algorithm to the tuning parameters, which includes the discount rate, γ) to optimize the discounted health variable (Liao, page 11, second full paragraph: the modeling is used to find optimal tuning parameters), and thereby generate a reinforcement learning (RL) machine learning (ML) model (Liao, page 11, second full paragraph: a simulation environment, and reinforcement learning is an area of machine learning) for generating dynamic treatment regimes; (Liao, page 13, Section 7, Pilot Data From HeartSteps V2: the RL algorithm is used to trigger a “context-tailored activity suggestion,” construed as a dynamic treatment regime)
execute on patient data, the RL ML computer model (Liao, page 10, first paragraph: the HeartSteps V1) corresponding to no constraint on selection of a next action in a treatment regime (Liao, page 10, first paragraph: using a constraint of zero) thereby permitting selection of next actions that deviate from the one or more clinical guidelines with no constraint, to thereby determine a first next action; (Liao, page 10, first paragraph: based on the discounted future rewards, the system sends nothing (i.e., with a constraint of 0) or sends an activity suggestion, e.g., if a constraint is greater than a certain value)
execute, on the patient data, the RL ML computer model, (Liao, page 10, first paragraph: the HeartSteps V1) corresponding to a partially guideline compliant constraint on selection of a next action in the treatment regime with allowed limited deviation from the one or more clinical guidelines, (Liao, page 10, first paragraph: using a constraint between zero and one) to thereby determine a second next action; (Liao, page 10, first paragraph: based on the discounted future rewards, the system sends an activity suggestion, for example, when a constraint is greater than a certain value)
execute, on the patient data, the RL ML computer model, (Liao, page 10, first paragraph: the HeartSteps V1) corresponding to a guideline compliant selection of a next action in the treatment regime that adheres to the one or more clinical guidelines with no deviation, (Liao, page 10, first paragraph: using a constraint of one) to thereby determine a third next action; (Liao, page 10, first paragraph: the action is selected to maximize the sum of discounted rewards)
Liao discloses a dynamic system for promoting physical activity and dietary health using reinforcement learning (RL) (page 4, second paragraph). However, Liao does not explicitly recite, but Matsuguchi teaches that it is old and well known in the art of healthcare to include calculating a variable specified in clinical guidelines data (Matsuguchi, [0194]: National Comprehensive Cancer Network (NCCN) guidelines) based on a distance function representing an allowed deviation from the clinical guidelines; (Matsuguchi, [0183]: grouping therapies using a “similarity score,” which is a type of distance function, and the similarity can be determined using an empirical significance threshold, which would be a type of allowed deviation)
generating, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined first, second, and third action in a treatment regime (Matsuguchi, [0184]: ten or more therapies can be presented, which would include a first, second, and third therapy) wherein the outcome output display connects the first, second and third next action to a current action in a tree representation (Matsuguchi, Fig. 8, [0184]: the matched therapies are displayed on a user interface and Fig. 8 shows an action tree representation) that presents, in a visual display, a representation of adherence and deviation of a treatment regime from the clinical guidelines; and (Matsuguchi, [0184]: deviations from clinical guidelines, such as “can new chemotherapies cause your cancer to shrink?”)
presenting, by the presentation layer, the outcome output display to a user. (Matsuguchi, [0184]: the matched therapies are displayed on a user interface).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include the outcome output display, as taught by Matsuguchi, in order to improve processing efficiency, because Matsuguchi teaches that the above improves the efficiency of data processing and the user experience.
The combination of Liao and Matsuguchi does not explicitly recite, but Alesiani teaches that it is old and well known in the art of healthcare to include calculating a variable based on a distance function that evaluates a distance between a set of possible actions and an optimal action (Alesiani, [0052]: finding optimal values for MILP problems (see Fig. 1); [0058]: using a distance function).
configured with a first distance impact hyperparameter setting (Alesiani, [0006]: the hyperparameters are iteratively tested and would include multiple settings)
configured with a second distance impact hyperparameter setting (Alesiani, [0006]: the hyperparameters are iteratively tested and would include multiple settings)
configured with a third distance impact hyperparameter setting (Alesiani, [0006]: the hyperparameters are iteratively tested and would include multiple settings)
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify the combination to include the distance function and hyperparameters, as taught by Alesiani, in order to improve processing efficiency, because Alesiani teaches that doing so improves technical systems by enabling selection of the best solver under given constraints.
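For illustration only, a “discounted health variable with a penalty for deviating from one or more clinical guidelines” (claim 15), as mapped above to Liao's discount rate γ and the living-penalty concept of footnote 3, can be sketched as a discounted sum of per-step health outcomes minus a weighted guideline-deviation penalty. This is a minimal sketch under stated assumptions: the linear penalty, the absolute-difference distance, and the names `penalized_discounted_health` and `penalty_weight` are hypothetical and are not asserted to be the applicant's formula or Liao's.

```python
def penalized_discounted_health(health, actions, guideline_actions, gamma, penalty_weight):
    """Sum over t of gamma**t * (health[t] - penalty_weight * |actions[t] - guideline_actions[t]|).

    Assumptions (hypothetical): actions are numeric, the distance from the
    guideline is an absolute difference, and the penalty is linear in it.
    """
    if not (len(health) == len(actions) == len(guideline_actions)):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for t, (h, a, g) in enumerate(zip(health, actions, guideline_actions)):
        deviation = abs(a - g)  # distance of the action taken from the guideline action
        total += (gamma ** t) * (h - penalty_weight * deviation)
    return total


# Same illustrative health outcomes; only the second trajectory deviates (at t = 0).
compliant = penalized_discounted_health([1.0, 1.0], [1, 1], [1, 1], gamma=0.5, penalty_weight=0.5)
deviating = penalized_discounted_health([1.0, 1.0], [2, 1], [1, 1], gamma=0.5, penalty_weight=0.5)
```

With equal health outcomes, the trajectory that deviates from the guideline at the first time point scores lower (1.0 versus 1.5), which is the effect such a penalty is intended to capture.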

Regarding claim 16, the combination teaches each of the limitations of claim 15 as discussed above, and further discloses:
wherein applying computer executed reinforcement learning operations on the discounted health variable comprises: computing an average outcome value as follows: 
    [Equation image: media_image1.png]
 (Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a), and Q^π(s, a) = E_π[R_t + γR_{t+1} + γ²R_{t+2} + ... | S_t = s, A_t = a]) where:
Y* denotes the optimal outcome variable that the RL is optimizing; and (Liao, page 9, fourth paragraph: maximizing the future value)
Q_k is a Q function as defined in a Q-learning algorithm in RL that estimates the average treatment effect (Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a)).
Matsuguchi teaches that it is old and well known in the art of healthcare to include that L_k represents a history of all observed data for a patient up to time k (Matsuguchi, [0180]: a list of therapies is generated from medical history data of the subject) and that A_k represents the history of all past actions that were taken on a given patient up to time k (Matsuguchi, [0180]: a list of therapies is generated from a first list of therapies of the subject).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include that L_k represents a history of all observed data for a patient up to time k and that A_k represents the history of all past actions that were taken on a given patient up to time k, as taught by Matsuguchi, in order to improve processing efficiency, because Matsuguchi teaches that the above improves the efficiency of data processing.

Regarding claim 19, the combination teaches each of the limitations of claim 15 as discussed above, and further discloses:
Matsuguchi teaches that it is old and well known in the art of healthcare to include wherein applying computer executed reinforcement learning operations on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage. (Matsuguchi, [0005]: the medical history data).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include wherein applying computer executed reinforcement learning operations on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage, as taught by Matsuguchi, in order to improve processing efficiency, because Matsuguchi teaches that the above improves the efficiency of data processing.

Regarding claim 20, the combination teaches each of the limitations of claim 15 as discussed above, and further discloses:
wherein generating the outcome output display comprises generating an action tree based on the determined first, second, and third next action in the treatment regime using the RL model, (Liao, page 2: the treatment policies are construed as the action tree) wherein the first distance impact hyperparameter setting specifies a relatively large value that does not present a limit on a deviation from the clinical guidelines, (Liao, page 10, first paragraph: based on the discounted future rewards, the system sends nothing (i.e., with a constraint of 0) or sends an activity suggestion, e.g., if a constraint is greater than a certain value) the second distance impact hyperparameter setting specifies a non-zero value indicating a non-zero limit on deviation from the clinical guidelines of an optimal next action in the treatment regime, (Liao, page 10, first paragraph: based on the discounted future rewards, the system sends an activity suggestion, for example, when a constraint is greater than a certain value) and the third distance impact hyperparameter setting specifies a zero value indicating no permissible deviation of the next action in the treatment regime from the clinical guidelines. (Liao, page 10, first paragraph: the action is selected to maximize the sum of discounted rewards). The distance impact hyperparameters themselves are taught by Alesiani, as discussed above.

Response to Applicant’s Arguments
Applicant’s arguments and amendments, filed on 5/27/2022, with respect to the 35 USC § 101 rejection of claims 1-20 have been considered but are not persuasive. 
The Applicant argues that the claims represent a technological improvement and are therefore patent eligible under 35 U.S.C. 101. Examiner disagrees. Applicant’s argument that the claims amount to “significantly more” under Step 2B of the 2019 PEG analysis is not persuasive because an improvement to conventional technologies is not a technical solution to a technical problem. Instead, the argued improvements are improvements to the abstract idea of certain methods of organizing human activity, as discussed above. In contrast, the 2019 PEG cites “a modification of Internet hyperlink protocol to dynamically produce a dual-source hybrid web page” (i.e., the invention of DDR Holdings) as an example of an “improvement in the function of a computer or an improvement to other technology or technical field.” 
The Applicant argues that the claims are not directed to an abstract idea because only a computer tool can apply reinforcement learning to generate an RL model. The argument is not persuasive because the claims are merely invoking a computer as a tool to perform the abstract idea.
The Applicant argues that the claims improve the functioning of a computer. The argument is not persuasive because the claimed features merely recite generic computer components, and the claimed improvement is to the abstract idea, not to the data processing system.

Applicant’s amendments, filed on 5/27/2022, with respect to the 35 USC § 112(a) rejection of claims 1-20 have been considered and are persuasive. 

Applicant’s arguments and amendments, filed on 5/27/2022, with respect to the 35 USC § 103 rejection of claims 1-2, 5-9, 12-16, and 19-20 have been considered but are not persuasive. 
With regard to claim 1, Applicant argues that Liao does not teach the penalty, as claimed. The argument is not persuasive for the reasons stated above on page 10 with respect to the footnote citing Paul. 
Applicant’s arguments directed to the newly added claim limitations have been addressed above; those limitations are rejected over Liao in view of Matsuguchi and further in view of Alesiani.
Claims 3-4, 7-8, and 17-18 are objected to as being dependent upon a rejected base claim, but would overcome the art of record if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
For the reasons set forth in the 35 USC § 103 rejection of claims 1-20 above, the references cited in the rejection render amended claims 1-2, 5-9, 12-16, and 19-20 obvious under 35 USC § 103. Applicant’s argument is not persuasive.

CONCLUSION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Shyam Goswami whose telephone number is (303)297-4283.  The examiner can normally be reached on Monday-Thursday, 8:30AM-6:30PM MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Victoria Augustine can be reached on (313)446-4858.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


SHYAM M GOSWAMI
Examiner
Art Unit 3686



/JOHN P GO/Primary Examiner, Art Unit 3686


1 See An introduction to Q-Learning: Reinforcement Learning by Sayak Paul, (available at https://blog.floydhub.com/an-introduction-to-q-learning-reinforcement-learning/), stating that “The rewards need not be always the same. But it is much better than having some amount reward for the actions than having no rewards at all. This idea is known as the living penalty.”
2 See An introduction to Q-Learning: Reinforcement Learning by Sayak Paul, (available at https://blog.floydhub.com/an-introduction-to-q-learning-reinforcement-learning/), stating that “The rewards need not be always the same. But it is much better than having some amount reward for the actions than having no rewards at all. This idea is known as the living penalty.”
3 See An introduction to Q-Learning: Reinforcement Learning by Sayak Paul, (available at https://blog.floydhub.com/an-introduction-to-q-learning-reinforcement-learning/), stating that “The rewards need not be always the same. But it is much better than having some amount reward for the actions than having no rewards at all. This idea is known as the living penalty.”