Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-3, 8-10, and 15-17 were amended in the response filed 11/30/2021.
Claims 4-7, 11-14, and 18-20 are as previously presented.
Claims 1-20 are currently pending and considered below. 

Claim Rejections – 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.

Step 1 of the Alice/Mayo Test
Claims 1-20 are within the four statutory categories. 

Step 2A of the Alice/Mayo Test - Prong One
Following Prong One of Step 2A of the Alice/Mayo Test, Claim 1, which is representative of claims 1-20 and is addressed below for purposes of the § 101 analysis, recites: A method, in a data processing system comprising a processor and a memory, the memory comprising instructions that are executed by the processor to specifically configure the processor to implement a dynamic treatment regime generation engine for learning interpretable strategies in the presence of existing domain knowledge, the method comprising:
transforming clinical guidelines from a clinical guidelines store into logical assertions to form a clinical guidelines representation data structure;
computing a discounted health variable with a penalty for deviating from clinical guidelines in the clinical guidelines representation data structure based on a distance function representing an allowed deviation from the clinical guidelines;
applying, by a model builder executing within the data processing system, reinforcement learning techniques on the discounted health variable to generate a reinforcement learning (RL) model for generating dynamic treatment regimes;
determining, by the dynamic treatment regime generation engine using the RL model, for a patient for a plurality of times, an unconstrained next action in a treatment regime with no constraint, a partially guideline compliant next action in the treatment regime with allowed deviation from the guidelines, and a guideline compliant next action in the treatment regime that adheres to the guidelines;
generating, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined next action in a treatment regime using the RL model with no constraint, optimal next action in the treatment regime with allowed deviation from the guidelines, and next action in the treatment regime that adheres to the guidelines; and
presenting, by the presentation layer, the outcome output display to a user.

The Examiner submits that the foregoing underlined limitations constitute: (a) “certain methods of organizing human activity.” For example, a medical professional may follow rules or instructions to determine a treatment regime and a next action in a treatment regime. 


Step 2A of the Alice/Mayo Test - Prong Two
A method, in a data processing system comprising a processor and a memory, the memory comprising instructions that are executed by the processor to specifically configure the processor to implement a dynamic treatment regime generation engine for learning interpretable strategies in the presence of existing domain knowledge, the method comprising:
transforming clinical guidelines from a clinical guidelines store into logical assertions to form a clinical guidelines representation data structure;
computing a discounted health variable with a penalty for deviating from clinical guidelines in the clinical guidelines representation data structure based on a distance function representing an allowed deviation from the clinical guidelines;
applying, by a model builder executing within the data processing system, reinforcement learning techniques on the discounted health variable to generate a reinforcement learning (RL) model for generating dynamic treatment regimes;
determining, by the dynamic treatment regime generation engine using the RL model, for a patient for a plurality of times, an unconstrained next action in a treatment regime with no constraint, a partially guideline compliant next action in the treatment regime with allowed deviation from the guidelines, and a guideline compliant next action in the treatment regime that adheres to the guidelines;
generating, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined next action in a treatment regime using the RL model with no constraint, optimal next action in the treatment regime with allowed deviation from the guidelines, and next action in the treatment regime that adheres to the guidelines; and
presenting, by the presentation layer, the outcome output display to a user.

The bolded text shown above is not part of the aforementioned abstract idea. Rather, the bolded limitations are the additional elements, which do not amount to a practical application or significantly more, as discussed in further detail below.

Furthermore, the abstract idea recited in Claims 1-20 is not integrated into a practical application because the additional elements (i.e., the limitations not identified as part of the abstract idea) amount to no more than limitations that:
amount to mere instructions to apply an exception – for example, the recitation of “in a data processing system comprising a processor and a memory, the memory comprising instructions that are executed by the processor to specifically configure the processor to implement a dynamic treatment regime generation engine for,” “from a clinical guidelines store,” “data structure,” “by a model builder executing within the data processing system,” “by the dynamic treatment regime generation engine,” and “by a presentation layer within the dynamic treatment regime generation engine,” which amount to merely invoking a computer as a tool to perform the abstract idea. See MPEP 2106.05(f); see also, e.g., paragraphs [0051]–[0057] of the Present Specification, discussing the data processing system, and paragraph [0027] of the Present Specification, discussing the engine.

Step 2B of the Alice/Mayo Test
Furthermore, the claims do not include additional elements that are sufficient to amount to “significantly more” than the judicial exception because the additional elements (i.e., the elements other than the abstract idea) have been recognized as well-understood, routine, and conventional activity in particular fields, as follows:
The “data processing system comprising a processor and a memory” amounts to elements that have been recognized as well-understood, routine, and conventional activity in particular fields, as demonstrated by:
U.S. Patent Publication No. 2018/0325385 to Deterding et al., disclosing a processor (para. 0048), computer-readable memory (para. 0048), and a computing device (para. 0040);
U.S. Patent Publication No. 2014/0006055 to Seraly et al., disclosing a processor (para. 0078), computer-readable memory (para. 0092), and a computing device (para. 0093).
The “models” amount to elements that have been recognized as well-understood, routine, and conventional activity in particular fields, as demonstrated by:
U.S. Patent Publication No. 2014/0006055 to Seraly et al., at para. 0014;
U.S. Patent Publication No. 2018/0043182 to Wu et al., at para. 0025.
The “clinical guidelines store” and “data structure” amount to elements that have been recognized as well-understood, routine, and conventional activity in particular fields, as demonstrated by:
U.S. Patent Publication No. 2017/0116373 to Ginsburg, at para. 0142;
U.S. Patent Publication No. 2018/0043182 to Wu et al., at para. 0097.
Independent claims 8 and 15 include nearly identical limitations and are rejected for the same reasons. Dependent claims 2-7, 9-14, and 16-20 include additional limitations, but none of them is deemed significantly more than the abstract idea because the additional elements [...] Claims 2-3, 9-10, and 16-17; the “estimating” feature of dependent Claims 4, 11, and 18); receiving or transmitting data over a network (e.g., the “aggregating” and “storing” features of dependent Claims 5, 12, and 19); and gathering and analyzing information using conventional techniques and displaying the result (e.g., the “display” feature of dependent Claims 6, 13, and 20, and the “generating” feature of dependent Claims 7 and 14).
Thus, taken alone, the additional elements do not amount to “significantly more” than the above-identified abstract idea.  Furthermore, looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually, and there is no indication that the combination of elements improves the functioning of a computer or improves any other technology, and their collective functions merely provide conventional computer implementation.
Therefore, whether taken individually or as an ordered combination, Claims 1-20 are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Claim Rejections – 35 USC § 112(a)
The following is a quotation of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter that was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
	Claims 1, 8, and 15 have been amended to recite “a clinical guidelines representation data structure,” which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.
	Claims 2-7, 9-14, and 16-20 are rejected at least due to their dependency on Claims 1, 8, and 15, respectively.
In addition, independent claims 1, 8, and 15 recite the step of “transforming clinical guidelines from a clinical guidelines store into logical assertions to form a clinical guidelines representation data structure.” However, the specification provides no description of how this step is actually performed.
For computer-implemented functional claims, the specification must disclose the computer and the algorithm (e.g., the necessary steps and/or flowcharts) that performs the claimed function in sufficient detail that one of ordinary skill in the art can reasonably conclude that the inventor invented the claimed subject matter. Specifically, if one skilled in the art would know how to program the disclosed computer to perform the necessary steps described in the specification to achieve the claimed function, and the inventor was in possession of that knowledge, the written description requirement would be satisfied (see MPEP 2161.01(I)). In the instant application, no guidance is provided on how to program a computer to take “clinical guidelines” as input and output “a clinical guidelines representation data structure.” While a medical professional may normally perform these steps
Dependent claims 2-7, 9-14, and 16-20 incorporate the deficiencies of independent claims 1, 8, and 15, and are rejected for the same reasons.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-20 are rejected under 35 U.S.C. § 103 as being unpatentable over “Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity” by Liao et al. (“Liao”) in view of U.S. Patent Publication No. 2018/0089373 to Matsuguchi et al. (“Matsuguchi”), and further in view of U.S. Patent Publication No. 2014/0365210 to Riskin et al. (“Riskin”).
	
Regarding Claim 1, Liao discloses:
A method, in a data processing system comprising a processor and a memory, the memory comprising instructions that are executed by the processor to specifically configure the processor to implement a dynamic treatment regime generation engine for learning interpretable strategies in the presence of existing domain knowledge, the method comprising: (Liao, page 4, second paragraph: a dynamic system for promoting physical activity and dietary health)
computing a discounted health variable (Liao, page 11, second full paragraph: calculating the discount rate, γ, based on a simulation-based procedure) with a penalty for deviating from clinical guidelines (Liao, page 2: the “reward” effectively operates as a penalty¹ for not completing, or deviating from, certain guidelines/practices, such as an unsatisfactory step count; the clinical guidelines representation data structure is taught by Riskin, below);
applying, by a model builder executing within the data processing system, reinforcement learning techniques on the discounted health variable (Liao, page 11, second full paragraph: applying an RL algorithm to the tuning parameters, which include the discount rate, γ) to generate a reinforcement learning (RL) model (Liao, page 11, second full paragraph: a simulation environment) for generating dynamic treatment regimes; (Liao, page the RL algorithm is used to trigger a “context-tailored activity suggestion,” construed as a dynamic treatment regime)
determining, by the dynamic treatment regime generation engine using the RL model, for a patient for a plurality of times, an unconstrained next action in a treatment regime with no constraint (Liao, page 10, first paragraph: using a constraint of zero), a partially guideline compliant next action in the treatment regime with allowed deviation from the guidelines (Liao, page 10, first paragraph: using a constraint between zero and one), and a guideline compliant next action in the treatment regime that adheres to the guidelines; (Liao, page 10, first paragraph: using a constraint of one)
Liao discloses a dynamic system for promoting physical activity and dietary health using RL (page 4, second paragraph). However, Liao does not explicitly recite, but Matsuguchi teaches, that it is old and well known in the art of healthcare to include calculating a variable based on a distance function representing an allowed deviation from the clinical guidelines; (Matsuguchi, [0183]: grouping therapies using a “similarity score,” which is a type of distance function, where the similarity can be determined using an empirical significance threshold, which would be a type of allowed deviation)
generating, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined next action in a treatment regime using the RL model with no constraint, optimal next action in the treatment regime with allowed deviation from the guidelines, and next action in the treatment regime that adheres to the guidelines; and (Matsuguchi, [0184]: the matched therapies are displayed on a user interface)
presenting, by the presentation layer, the outcome output display to a user. (Matsuguchi, [0184]: the matched therapies are displayed on a user interface).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include calculating a variable based on a 
The combination of Liao and Matsuguchi does not explicitly recite, but Riskin teaches that it is old and well known in the art of healthcare to include transforming clinical data from a clinical data store into logical assertions (Riskin, [0013]: processing data to transform narrative content into structured output) to form a clinical data representation data structure (Riskin, [0044]; Fig. 5: the structured output defines where individual information resides within the output). Clinical data in the form of clinical guidelines is taught by Matsuguchi above. 
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include transforming clinical data from a clinical data store into logical assertions to form a clinical data representation data structure, as taught by Riskin, for improved data structuring systems (Riskin, [0010]), because Riskin teaches that the above improves care and reduces costs (Riskin, [0007]).
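For illustration only, the three next actions recited in claim 1 can be pictured with the minimal Python sketch below. It is not taken from the specification or from Liao, Matsuguchi, or Riskin, and it forms no part of the mapping above; the function `next_actions`, the Q-value dictionary, the admissible-action set, and `penalty_weight` are hypothetical. It assumes that deviation is scored 0 for an admissible (guideline) action and 1 otherwise, that the unconstrained action ignores that score, that the partially guideline compliant action maximizes the Q-value minus a weighted deviation score, and that the guideline compliant action is chosen only among admissible actions.

```python
from typing import Dict, Set, Tuple


def next_actions(q_values: Dict[str, float],
                 admissible: Set[str],
                 penalty_weight: float) -> Tuple[str, str, str]:
    """Return (unconstrained, partially compliant, compliant) next actions.

    Hypothetical illustration only. Deviation is scored 0 if an action is in
    the admissible (guideline) set and 1 otherwise. Raises ValueError if no
    candidate action is admissible.
    """
    def deviation(action: str) -> int:
        return 0 if action in admissible else 1

    # No constraint: plain argmax over the estimated Q-values.
    unconstrained = max(q_values, key=q_values.get)
    # Allowed deviation: Q-value minus a weighted deviation penalty.
    partial = max(q_values,
                  key=lambda a: q_values[a] - penalty_weight * deviation(a))
    # Adheres to guidelines: argmax restricted to admissible actions.
    compliant = max((a for a in q_values if a in admissible),
                    key=q_values.get)
    return unconstrained, partial, compliant
```

In this sketch, a small `penalty_weight` lets the partially compliant choice coincide with the unconstrained one, and a large enough weight pushes it to the guideline compliant choice.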

Regarding claim 2, the combination discloses each of the limitations of claim 1 as discussed above, and further discloses:
wherein applying reinforcement learning techniques on the discounted health variable comprises computing an average outcome value as follows: 
    [equation image]
(Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a), and Q^π(s, a) = E_π[R_t + γR_{t+1} + γ^2 R_{t+2} + ... | S_t = s, A_t = a]) where:
Y* denote the optimal outcome variable that the RL is optimizing; and (Liao, page 9, fourth paragraph: maximizing the future value)
Qk is a Q function as defined in a Q-learning algorithm in RL that estimates the average treatment effect (Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a)).
Matsuguchi teaches that it is old and well known in the art of healthcare to include Lk represents a history of all observed data for a patient up to time k; (Matsuguchi, [0180]: a list of therapies is generated from medical history data of the subject) Ak represents the history of all past actions that were taken on a given patient up to time k; (Matsuguchi, [0180]: a list of therapies is generated from a first list of therapies of the subject)
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include that Lk represents a history of all observed data for a patient up to time k and Ak represents the history of all past actions that were taken on a given patient up to time k, as taught by Matsuguchi, for efficient processing, because Matsuguchi teaches that the above improves efficient data processing.
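For illustration only, the average-outcome role that claim 2 assigns to Qk can be pictured as a tabular estimate: for each observed pair of history and action, average the outcome Y* recorded for patients with that history who received that action. The sketch below assumes Qk(Lk, Ak) = E[Y* | Lk, Ak], a standard Q-learning-style definition used here only because the claimed equation is not reproduced in this text; `estimate_q` and the record layout are hypothetical, and histories are assumed to be hashable summaries such as strings or tuples.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Key = Tuple[Hashable, Hashable]


def estimate_q(records: Iterable[Tuple[Hashable, Hashable, float]]
               ) -> Dict[Key, float]:
    """Tabular estimate of Qk(l, a) as the mean observed Y* for each (l, a).

    Each record is a hypothetical (history, action, y_star) triple.
    """
    totals: Dict[Key, float] = defaultdict(float)
    counts: Dict[Key, int] = defaultdict(int)
    for history, action, y_star in records:
        totals[(history, action)] += y_star
        counts[(history, action)] += 1
    # Average outcome per (history, action) pair that was actually observed.
    return {key: totals[key] / counts[key] for key in totals}
```

A practical estimator would more likely fit a regression model than a lookup table, since exact history matches are rare.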

Regarding claim 3, the combination teaches each of the limitations of claim 2 as discussed above, and further discloses:
wherein applying reinforcement learning techniques on the discounted health variable further comprises: using the following equation to find an action that maximizes QK; 
    [equation image]
(Liao, page 9: V(x, i) = max_{a∈A(i)} {r_1(x, a) + γ Σ_{x', i'} τ(x' | x, a) p_avail^{i'} (1 − p_avail)^{1−i'} V(x', i')}) where:
Vk is the value function optimizing Qk under the action ak taken at time k; (Liao, page 9, fourth paragraph: maximizing the future value)
computing for a health outcome variable Y a discounted health outcome variable Y* as follows:
    [equation image]
(Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a), and Q^π(s, a) = E_π[R_t + γR_{t+1} + γ^2 R_{t+2} + ... | S_t = s, A_t = a])
Matsuguchi teaches that it is old and well known in the art of healthcare to include where δm represents a distance function that measures deviation from existing domain knowledge; (Matsuguchi, [0183]: grouping therapies using a “similarity score,” which is a type of distance function, and the similarity can be determined using an empirical significance threshold, which would be a type of allowed deviation)
Lm represents a history of covariates up to time m; (Matsuguchi, [0180]: a list of therapies is generated from medical history data of the subject)
Am-1 represents actions in the treatment regimen up to time m-1; (Matsuguchi, [0180]: a list of therapies is generated from a first list of therapies of the subject)
α*m represents all admissible actions according to the prior domain knowledge; and (Matsuguchi, [0174]: a first list of therapies of the subject)
γ is a real number representing a hyperparameter controlling how much impact the distance function δm has on optimization. (Matsuguchi, [0183]: the similarity metric is a measure of the empirical significance of the similarity score)
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include where δm represents a distance function that measures deviation from existing domain knowledge; Lm represents a history of covariates up to time m; Am-1 represents actions in the treatment regimen up to time m-1; α*m represents all admissible actions according to the prior domain knowledge; and γ is a real number representing a hyperparameter controlling how much impact the distance function δm has on optimization, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.
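For illustration only, the two computations recited in claim 3 can be sketched as follows. The sketch assumes that Vk is the maximum of Qk over the candidate actions and that the discounted outcome takes the form Y* = Y − γ · Σm δm, with each δm comparing the action Am against the admissible set α*m; the exact claimed equations are not reproduced in this text, so that form is an assumption. The names `value` and `discounted_outcome` are hypothetical, and the admissible sets are passed in precomputed rather than derived from Lm and Am-1.

```python
from typing import Callable, Dict, Hashable, Sequence, Set


def value(q_k: Dict[Hashable, float]) -> float:
    """Vk: the largest Qk value over the candidate actions ak at time k."""
    return max(q_k.values())


def discounted_outcome(y: float,
                       actions: Sequence[Hashable],
                       admissible_sets: Sequence[Set[Hashable]],
                       delta: Callable[[Hashable, Set[Hashable]], float],
                       gamma: float) -> float:
    """Assumed form: Y* = Y - gamma * sum over m of delta(Am, alpha*m)."""
    if len(actions) != len(admissible_sets):
        raise ValueError("one admissible set is needed per time step")
    # Accumulate the per-step deviation from the admissible actions.
    penalty = sum(delta(a, allowed)
                  for a, allowed in zip(actions, admissible_sets))
    # gamma scales how strongly deviations reduce the outcome.
    return y - gamma * penalty
```

Setting `gamma` to 0 recovers the raw outcome Y, while larger values weight guideline deviations more heavily.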

Regarding claim 4, the combination discloses each of the limitations of claim 3 as discussed above, and further discloses:
Matsuguchi teaches that it is old and well known in the art of healthcare to include wherein δm is a Hamming distance, estimated to be 0 if [equation image] contains Am and 1 otherwise. (Matsuguchi, [0151]: the matched therapies include a Hamming distance).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include wherein δm is a Hamming distance, estimated to be 0 if [equation image] contains Am and 1 otherwise, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.
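For illustration only, the 0/1 rule recited in claim 4 can be written directly, as in the hypothetical `hamming_delta` below. Summing it over time steps (the hypothetical `total_deviation`) counts the steps at which the treatment sequence leaves the admissible sets; when those sets are held fixed, that count equals the Hamming distance between the observed sequence and the nearest sequence drawn entirely from the admissible sets.

```python
from typing import Hashable, Sequence, Set


def hamming_delta(action: Hashable, admissible: Set[Hashable]) -> int:
    """Per-step distance: 0 if the admissible set contains the action, else 1."""
    return 0 if action in admissible else 1


def total_deviation(actions: Sequence[Hashable],
                    admissible_sets: Sequence[Set[Hashable]]) -> int:
    """Number of time steps whose action falls outside that step's admissible set."""
    if len(actions) != len(admissible_sets):
        raise ValueError("one admissible set is needed per time step")
    return sum(hamming_delta(a, s) for a, s in zip(actions, admissible_sets))
```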

Regarding claim 5, the combination discloses each of the limitations of claim 1 as discussed above, and further discloses:
Matsuguchi teaches that it is old and well known in the art of healthcare to include wherein applying reinforcement learning techniques on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage. (Matsuguchi, [0005]: the medical history data).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include wherein applying reinforcement learning techniques on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.

Regarding claim 6, the combination discloses each of the limitations of claim 1 as discussed above, and further discloses:
generating the outcome output display comprises generating an action tree based on the determined next action in a treatment regime using the RL model with no distance function, optimal next action in the treatment regime with allowed deviation from the guidelines, and next action in the treatment regime that adheres to the guidelines. (Liao, page 2: the treatment policies are construed as the action tree)

Regarding claim 7, the combination discloses each of the limitations of claim 6 as discussed above, and further discloses:
wherein generating the action tree comprises: connecting current actions to next actions; (Liao, page 5: the dosage variable connects past treatments to future rewards)
responsive to two or more of the determined next action in a treatment regime using the RL model with no distance function, the optimal next action in the treatment regime with allowed deviation from the guidelines, and the next action in the treatment regime that adheres to the guidelines being the same, reducing the action tree; and (Liao, page 4: the treatment policies are based on past rewards and applied to future rewards)
responsive to the determined next action in a treatment regime using the RL model with no distance function, the optimal next action in the treatment regime with allowed deviation from the guidelines, and the next action in the treatment regime that adheres to the guidelines not being the same, expanding the action tree. (Liao, page 4: the treatment policy is updated as the system learns from additional data).
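For illustration only, the reduce/expand behavior recited in claim 7 can be sketched as a single tree-building step: the current action is connected to its next actions, identical recommendations from the three regimes are merged into one child (reducing the tree), and differing recommendations become separate children (expanding it). `ActionNode`, `expand_node`, and the regime labels are hypothetical and are not drawn from the specification or from Liao.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ActionNode:
    action: str
    # Regimes (e.g. "unconstrained", "partial", "compliant") recommending this action.
    sources: List[str] = field(default_factory=list)
    children: List["ActionNode"] = field(default_factory=list)


def expand_node(current: ActionNode,
                recommendations: Dict[str, str]) -> ActionNode:
    """Connect `current` to its next actions, merging identical recommendations.

    `recommendations` maps a regime name to the next action it proposes.
    Identical proposals collapse into one child (reduced tree); differing
    proposals become separate children (expanded tree).
    """
    by_action: Dict[str, ActionNode] = {}
    for regime, action in recommendations.items():
        if action not in by_action:
            by_action[action] = ActionNode(action)
            current.children.append(by_action[action])
        by_action[action].sources.append(regime)
    return current
```

Keeping the list of recommending regimes on each child preserves which regimes agreed, so a reduced branch can still be traced back to all three recommendations.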

Regarding Claim 8, Liao discloses:
A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a data processing system, causes the data processing system to implement a dynamic treatment regime generation engine for learning interpretable strategies in the presence of existing domain knowledge, wherein the computer readable program causes the data processing system to: (Liao, page 4, second paragraph: a dynamic system for promoting physical activity and dietary health)
compute a discounted health variable (Liao, page 11, second full paragraph: calculating the discount rate, γ, based on a simulation-based procedure) with a penalty for deviating from clinical guidelines (Liao, page 2: the “reward” effectively operates as a penalty² for not completing, or deviating from, certain guidelines/practices, such as an unsatisfactory step count; the clinical guidelines representation data structure is taught by Riskin, below);
apply, by a model builder executing within the data processing system, reinforcement learning techniques on the discounted health variable (Liao, page 11, second full paragraph: applying an RL algorithm to the tuning parameters, which include the discount rate, γ) to generate a reinforcement learning (RL) model (Liao, page 11, second full paragraph: a simulation environment) for generating dynamic treatment regimes; (Liao, page 13, 7 Pilot Data From HeartSteps V2: the RL algorithm is used to trigger a “context-tailored activity suggestion,” construed as a dynamic treatment regime)
determine, by the dynamic treatment regime generation engine using the RL model, for a patient for a plurality of times, an unconstrained next action in a treatment regime with no distance function (Liao, page 10, first paragraph: using a constraint of zero), a partially guideline compliant next action in the treatment regime with allowed deviation from the guidelines (Liao, page 10, first paragraph: using a constraint between zero and one), and a guideline compliant next action in the treatment regime that adheres to the guidelines; (Liao, page 10, first paragraph: using a constraint of one)
Matsuguchi teaches that it is old and well known in the art of healthcare to include calculating a variable based on a distance function representing an allowed deviation from the clinical guidelines and/or best practices; (Matsuguchi, [0183]: grouping therapies using a “similarity score,” which is a type of distance function, and the similarity can be determined using an empirical significance threshold, which would be a type of allowed deviation)
generate, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined next action in a treatment regime using the RL model with no distance function, optimal next action in the treatment regime with allowed deviation from the guidelines, and next action in the treatment regime that adheres to the guidelines; and (Matsuguchi, [0184]: the matched therapies are displayed on a user interface)
present, by the presentation layer, the outcome output display to a user. (Matsuguchi, [0184]: the matched therapies are displayed on a user interface).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include calculating a variable based on a distance function representing an allowed deviation from the clinical guidelines and/or best practices; generating, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined next action in a treatment regime using the RL model with no constraint, optimal next action in the treatment regime with allowed deviation from the guidelines, and next action in the treatment regime that adheres to the guidelines; and presenting, by the presentation layer, the outcome output display to a user, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing and user experience.
The combination of Liao and Matsuguchi does not explicitly recite, but Riskin teaches, that it is old and well known in the art of healthcare to include transforming clinical data from a clinical data store into logical assertions (Riskin, [0013]: processing data to transform narrative content into structured output) to form a clinical data representation data structure (Riskin, [0044]; Fig. 5: the structured output defines where individual information resides within the output). Clinical data in the form of clinical guidelines is taught by Matsuguchi above.
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include transforming clinical data from a clinical data store into logical assertions to form a clinical data representation data structure, as taught by Riskin, for improved data structuring systems (Riskin, [0010]), because Riskin teaches that the above improves care and reduces costs (Riskin, [0007]).

Regarding claim 9, the combination teaches each of the limitations of claim 8 as discussed above, and further discloses:
wherein applying reinforcement learning techniques on the discounted health variable comprises: computing an average outcome value as follows: 
[Equation image: media_image1.png]
(Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a), and Q^π(s, a) = E^π[R_t + γR_{t+1} + γ^2 R_{t+2} + … | S_t = s, A_t = a]) where:
Y* denote the optimal outcome variable that the RL is optimizing; and (Liao, page 9, fourth paragraph: maximizing the future value)
Qk is a Q function as defined in a Q-learning algorithm in RL that estimates the average treatment effect (Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a)).
Matsuguchi teaches that it is old and well known in the art of healthcare to include Lk represents a history of all observed data for a patient up to time k; (Matsuguchi, [0180]: a list of therapies is generated from medical history data of the subject) Ak represents the history of all past actions that were taken on a given patient up to time k; (Matsuguchi, [0180]: a list of therapies is generated from a first list of therapies of the subject)
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include that Lk represents a history of all observed data for a patient up to time k; Ak represents the history of all past actions that were taken on a given patient up to time k, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.
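The Liao passage relied on above defines Q^π(s, a) as the expected discounted sum of rewards R_t + γR_{t+1} + γ^2 R_{t+2} + … given S_t = s and A_t = a. The minimal Python sketch below illustrates that definition under stated assumptions: episodes are supplied as hypothetical (state, action, rewards) records, each record's discounted return is computed, and the returns observed for the same starting state-action pair are averaged (a simple Monte Carlo estimate of the expectation). The function names and data layout are illustrative and are not Liao's implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """R_t + gamma*R_{t+1} + gamma^2*R_{t+2} + ... for a finite reward sequence."""
    total = 0.0
    for r in reversed(rewards):  # accumulate from the last reward backwards
        total = r + gamma * total
    return total


def monte_carlo_q(episodes: List[Tuple[Hashable, Hashable, Sequence[float]]],
                  gamma: float) -> Dict[Tuple[Hashable, Hashable], float]:
    """Average discounted return for each (state, action) pair that starts an episode."""
    sums: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    counts: Dict[Tuple[Hashable, Hashable], int] = defaultdict(int)
    for state, action, rewards in episodes:
        sums[(state, action)] += discounted_return(rewards, gamma)
        counts[(state, action)] += 1
    return {key: sums[key] / counts[key] for key in sums}


episodes = [("s0", "a1", [1.0, 0.0, 2.0]), ("s0", "a1", [0.0, 1.0, 0.0])]
print(monte_carlo_q(episodes, gamma=0.5))  # {('s0', 'a1'): 1.0}
```

Here the two returns are 1 + 0.5·0 + 0.25·2 = 1.5 and 0 + 0.5·1 + 0 = 0.5, so their average is 1.0.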

Regarding claim 10, the combination teaches each of the limitations of claim 9 as discussed above, and further discloses:
wherein applying reinforcement learning techniques on the discounted health variable further comprises: using the following equation to find an action that maximizes QK; 
[Equation image: media_image2.png]
(Liao, page 9: V(x, i) = max_{a∈A(i)} {r_1(x, a) + γ Σ_{x′, i′} τ(x′ | x, a) p_avail^{i′} (1 − p_avail)^{1−i′} V(x′, i′)}) where:
Vk is the value function optimizing Qk under the action ak taken at time k; (Liao, page 9, fourth paragraph: maximizing the future value)
computing for a health outcome variable Y a discounted health outcome variable Y* as follows:
[Equation image: media_image3.png]
(Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a), and Q^π(s, a) = E^π[R_t + γR_{t+1} + γ^2 R_{t+2} + … | S_t = s, A_t = a])
Matsuguchi teaches that it is old and well known in the art of healthcare to include where δm represents a distance function that measures deviation from existing domain knowledge; (Matsuguchi, [0183]: grouping therapies using a “similarity score,” which is a type of distance function, and the similarity can be determined using an empirical significance threshold, which would be a type of allowed deviation)
Lm represents a history of covariates up to time m; (Matsuguchi, [0180]: a list of therapies is generated from medical history data of the subject)
Am-1 represents actions in the treatment regimen up to time m-1; (Matsuguchi, [0180]: a list of therapies is generated from a first list of therapies of the subject)
α*m represents all admissible actions according to the prior domain knowledge; and (Matsuguchi, [0174]: a first list of therapies of the subject)
γ is a real number representing a hyperparameter controlling how much impact the distance function δm has on optimization. (Matsuguchi, [0183]: the similarity metric is a measure of the empirical significance of the similarity score)
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include where δm represents a distance function that measures deviation from existing domain knowledge; Lm represents a history of covariates up to time m; Am-1 represents actions in the treatment regimen up to time m-1; α*m represents all admissible actions according to the prior domain knowledge; and γ is a real number representing a hyperparameter controlling how much impact the distance function δm has on optimization, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.
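The Liao recursion quoted above for claim 10, V(x, i) = max_{a∈A(i)} {r_1(x, a) + γ Σ_{x′, i′} τ(x′ | x, a) p_avail^{i′} (1 − p_avail)^{1−i′} V(x′, i′)}, selects the action that maximizes the bracketed one-step value, much as the claimed Vk selects the action that maximizes Qk. A minimal value-iteration sketch of that recursion in Python follows. It assumes finite states and actions, that i = 1 means the user is available, and that only action 0 (no treatment) is allowed when i = 0; the A(i) convention and the toy numbers are assumptions for illustration and are not taken from Liao.

```python
from typing import List


def value_iteration(r1: List[List[float]],
                    tau: List[List[List[float]]],
                    p_avail: float,
                    gamma: float,
                    tol: float = 1e-10,
                    max_iter: int = 10_000) -> List[List[float]]:
    """Iterate V(x, i) = max_{a in A(i)} {r1(x, a) + gamma * E[V(x', i')]}.

    r1[x][a]     : one-step reward r_1(x, a)
    tau[x][a][y] : transition probability tau(y | x, a)
    p_avail      : probability that the next availability indicator i' is 1
    Assumption: A(0) = {0} (no treatment) and A(1) = every action.
    Returns V[x][i] for i in {0, 1}.
    """
    n_states, n_actions = len(r1), len(r1[0])
    V = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(max_iter):
        # For each next state y: sum over i' of p^{i'} (1 - p)^{1 - i'} V(y, i').
        ev = [p_avail * V[y][1] + (1 - p_avail) * V[y][0] for y in range(n_states)]
        q = [[r1[x][a] + gamma * sum(tau[x][a][y] * ev[y] for y in range(n_states))
              for a in range(n_actions)]
             for x in range(n_states)]
        new_V = [[q[x][0], max(q[x])] for x in range(n_states)]
        change = max(abs(new_V[x][i] - V[x][i])
                     for x in range(n_states) for i in range(2))
        V = new_V
        if change < tol:
            break
    return V


# Toy example: one state; action 1 earns reward 1, action 0 earns 0.
V = value_iteration([[0.0, 1.0]], [[[1.0], [1.0]]], p_avail=0.5, gamma=0.5)
# Fixed point: V(x, 0) = 0.5 and V(x, 1) = 1.5; V matches it to within about 1e-10.
```

Because the update is a γ-contraction, stopping when successive iterates differ by less than tol leaves the result within γ/(1 − γ)·tol of the fixed point.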

Regarding claim 11, the combination teaches each of the limitations of claim 10 as discussed above, and further discloses:
Matsuguchi teaches that it is old and well known in the art of healthcare to include wherein δm is a Hamming distance, estimated to be 0 if [Equation image: media_image4.png] contains Am and 1 otherwise. (Matsuguchi, [0151]: the matched therapies include a Hamming distance).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include wherein δm is a Hamming distance, estimated to be 0 if [Equation image: media_image4.png] contains Am and 1 otherwise, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.
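Claim 11 recites a 0/1 deviation measure: δm is 0 when the set of admissible actions contains the action Am actually taken and 1 otherwise. The short Python sketch below places that indicator next to a conventional Hamming distance (the number of positions at which two equal-length sequences differ), which is the sense in which Matsuguchi [0151] is cited. The function names and action labels are illustrative assumptions. When the admissible set has exactly one element, the indicator equals the Hamming distance between the one-element sequences holding Am and that admissible action.

```python
from typing import Hashable, Iterable, Sequence


def delta_m(admissible_actions: Iterable[Hashable], action: Hashable) -> int:
    """0 if the admissible set contains the action taken, 1 otherwise."""
    return 0 if action in set(admissible_actions) else 1


def hamming_distance(u: Sequence[Hashable], v: Sequence[Hashable]) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(1 for a, b in zip(u, v) if a != b)


print(delta_m({"hold", "increase"}, "increase"))  # 0
print(delta_m({"hold"}, "increase"))              # 1
print(hamming_distance("karolin", "kathrin"))     # 3
```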

Regarding claim 12, the combination teaches each of the limitations of claim 8 as discussed above, and further discloses:
Matsuguchi teaches that it is old and well known in the art of healthcare to include wherein applying reinforcement learning techniques on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage. (Matsuguchi, [0005]: the medical history data).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include wherein applying reinforcement learning techniques on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.

Regarding claim 13, the combination teaches each of the limitations of claim 8 as discussed above, and further discloses:
generating the outcome output display comprises generating an action tree based on the determined next action in a treatment regime using the RL model with no distance function, optimal next action in the treatment regime with allowed deviation from the guidelines, and next action in the treatment regime that adheres to the guidelines. (Liao, page 2: the treatment policies are construed as the action tree).

Regarding claim 14, the combination teaches each of the limitations of claim 13 as discussed above, and further discloses:
wherein generating the action tree comprises: connecting current actions to next actions; (Liao, page 5: the dosage variable connects past treatments to future rewards)
responsive to two or more of the determined next action in a treatment regime using the RL model with no distance function, the optimal next action in the treatment regime with allowed deviation from the guidelines, and the next action in the treatment regime that adheres to the guidelines being the same, reducing the action tree; and (Liao, page 4: the treatment policies are based on past rewards and applied to future rewards)
responsive to the determined next action in a treatment regime using the RL model with no distance function, the optimal next action in the treatment regime with allowed deviation from the guidelines, and the next action in the treatment regime that adheres to the guidelines not being the same, expanding the action tree. (Liao, page 4: the treatment policies are updated as the system learns from additional data).
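Claim 14 recites connecting current actions to next actions, reducing the action tree when two or more of the three determined next actions (unconstrained, partially guideline compliant, and guideline compliant) are the same, and expanding it when they are not. One minimal way to picture that bookkeeping is the Python sketch below. The Node structure and the rule of merging identical actions into one child are illustrative assumptions, not the applicant's or Liao's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    action: str
    strategies: List[str]  # which strategies recommend this action
    children: List["Node"] = field(default_factory=list)


def grow(parent: Node, next_actions: Dict[str, str]) -> List[Node]:
    """Connect `parent` to one child per distinct next action.

    next_actions maps a strategy name (e.g. "unconstrained") to its next action.
    Identical actions share a single child (the tree is reduced); distinct
    actions each get their own child (the tree is expanded).
    """
    merged: Dict[str, List[str]] = {}
    for strategy, action in next_actions.items():
        merged.setdefault(action, []).append(strategy)
    parent.children = [Node(action, names) for action, names in merged.items()]
    return parent.children


root = Node("current", [])
kids = grow(root, {"unconstrained": "B", "partial": "B", "compliant": "A"})
print([(k.action, k.strategies) for k in kids])
# [('B', ['unconstrained', 'partial']), ('A', ['compliant'])]
```

Calling grow again on each child for the following time step would extend the tree one level at a time.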

Regarding Claim 15, Liao discloses:
A data processing system comprising: (Liao, page 4, second paragraph: a dynamic system for promoting physical activity and dietary health)
at least one processor; and at least one memory coupled to the at least one processor, wherein the at least one memory comprises instructions which, when executed by the at least one processor, cause the at least one processor to implement a dynamic treatment regime generation engine for learning interpretable strategies in the presence of existing domain knowledge, wherein the instructions cause the processor to: (Liao, page 1: a mobile application)
compute a discounted health variable (Liao, page 11, second full paragraph: calculating the discount rate, γ, based on a simulation-based procedure) with a penalty for deviating from clinical guidelines based on a distance function representing an allowed deviation from the clinical guidelines; (Liao, page 2: the “reward” is effectively a penalty (see footnote 3) for completing or deviating from certain guidelines/practices, such as an unsatisfactory step count. The clinical guidelines representation data structure is taught by Riskin, below) 
apply, by a model builder executing within the data processing system, reinforcement learning techniques on the discounted health variable (Liao, page 11, second full paragraph: applying a RL algorithm to the tuning parameters, which includes the discount rate, γ) to generate a reinforcement learning (RL) model (Liao, page 11, second full paragraph: a simulation environment) for generating dynamic treatment regimes; (Liao, page 13, 7 Pilot Data From HeartSteps V2: the RL algorithm is used to trigger a “context-tailored activity suggestion,” construed as a dynamic treatment regime)
determine, by the dynamic treatment regime generation engine using the RL model, for a patient for a plurality of times, an unconstrained next action in a treatment regime with no distance function (Liao, page 10, first paragraph: using a constraint of zero), a partially guideline compliant next action in the treatment regime with allowed deviation from the guidelines (Liao, page 10, first paragraph: using a constraint between zero and one), and a guideline compliant next action in the treatment regime that adheres to the guidelines; (Liao, page 10, first paragraph: using a constraint of one)
Matsuguchi teaches that it is old and well known in the art of healthcare to include calculating a variable based on a distance function representing an allowed deviation from the clinical guidelines and/or best practices; (Matsuguchi, [0183]: grouping therapies using a “similarity score,” which is a type of distance function, and the similarity can be determined using an empirical significance threshold, which would be a type of allowed deviation)
generate, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined next action in a treatment regime using the RL model with no distance function, optimal next action in the treatment regime with allowed deviation from the guidelines, and next action in the treatment regime that adheres to the guidelines; and (Matsuguchi, [0184]: the matched therapies are displayed on a user interface)
present, by the presentation layer, the outcome output display to a user. (Matsuguchi, [0184]: the matched therapies are displayed on a user interface).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include calculating a variable based on a distance function representing an allowed deviation from the clinical guidelines and/or best practices; generating, by a presentation layer within the dynamic treatment regime generation engine, an outcome output display based on the determined next action in a treatment regime using the RL model with no constraint, optimal next action in the treatment regime with allowed deviation from the guidelines, and next action in the treatment regime that adheres to the guidelines; and presenting, by the presentation layer, the outcome output display to a user, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing and user experience.
The combination of Liao and Matsuguchi does not explicitly recite, but Riskin teaches that it is old and well known in the art of healthcare to include transforming clinical data from a clinical data store into logical assertions (Riskin, [0013]: processing data to transform narrative content into structured output) to form a clinical data representation data structure (Riskin, [0044]; Fig. 5: the structured output defines where individual information resides within the output). Clinical data in the form of clinical guidelines is taught by Matsuguchi above. 
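Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include transforming clinical data from a clinical data store into logical assertions to form a clinical data representation data structure, as taught by Riskin, for improved data structuring systems (Riskin, [0010]) because Riskin teaches that the above improves care and reduces costs (Riskin, [0007]).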

Regarding claim 16, the combination teaches each of the limitations of claim 15 as discussed above, and further discloses:
wherein applying reinforcement learning techniques on the discounted health variable comprises: computing an average outcome value as follows: 
[Equation image: media_image1.png]
(Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a), and Q^π(s, a) = E^π[R_t + γR_{t+1} + γ^2 R_{t+2} + … | S_t = s, A_t = a]) where:
Y* denote the optimal outcome variable that the RL is optimizing; and (Liao, page 9, fourth paragraph: maximizing the future value)
Qk is a Q function as defined in a Q-learning algorithm in RL that estimates the average treatment effect (Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a)).
Matsuguchi teaches that it is old and well known in the art of healthcare to include Lk represents a history of all observed data for a patient up to time k; (Matsuguchi, [0180]: a list of therapies is generated from medical history data of the subject) Ak represents the history of all past actions that were taken on a given patient up to time k; (Matsuguchi, [0180]: a list of therapies is generated from a first list of therapies of the subject)
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include that Lk represents a history of all observed data for a patient up to time k; Ak represents the history of all past actions that were taken on a given patient up to time k, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.

Regarding claim 17, the combination teaches each of the limitations of claim 16 as discussed above, and further discloses:
wherein applying reinforcement learning techniques on the discounted health variable further comprises: using the following equation to find an action that maximizes QK; 
[Equation image: media_image2.png]
(Liao, page 9: V(x, i) = max_{a∈A(i)} {r_1(x, a) + γ Σ_{x′, i′} τ(x′ | x, a) p_avail^{i′} (1 − p_avail)^{1−i′} V(x′, i′)}) where:
Vk is the value function optimizing Qk under the action ak taken at time k; (Liao, page 9, fourth paragraph: maximizing the future value)
computing for a health outcome variable Y a discounted health outcome variable Y* as follows:
[Equation image: media_image3.png]
(Liao, page 9: (S4) the mean reward given S_t = s and A_t = a is r(s, a), and Q^π(s, a) = E^π[R_t + γR_{t+1} + γ^2 R_{t+2} + … | S_t = s, A_t = a])
Matsuguchi teaches that it is old and well known in the art of healthcare to include where δm represents a distance function that measures deviation from existing domain knowledge; (Matsuguchi, [0183]: grouping therapies using a “similarity score,” which is a type of distance function, and the similarity can be determined using an empirical significance threshold, which would be a type of allowed deviation)
Lm represents a history of covariates up to time m; (Matsuguchi, [0180]: a list of therapies is generated from medical history data of the subject)
Am-1 represents actions in the treatment regimen up to time m-1; (Matsuguchi, [0180]: a list of therapies is generated from a first list of therapies of the subject)
α*m represents all admissible actions according to the prior domain knowledge; and (Matsuguchi, [0174]: a first list of therapies of the subject)
γ is a real number representing a hyperparameter controlling how much impact the distance function δm has on optimization. (Matsuguchi, [0183]: the similarity metric is a measure of the empirical significance of the similarity score)
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include where δm represents a distance function that measures deviation from existing domain knowledge; Lm represents a history of covariates up to time m; Am-1 represents actions in the treatment regimen up to time m-1; α*m represents all admissible actions according to the prior domain knowledge; and γ is a real number representing a hyperparameter controlling how much impact the distance function δm has on optimization, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.

Regarding claim 18, the combination teaches each of the limitations of claim 17 as discussed above, and further discloses:
Matsuguchi teaches that it is old and well known in the art of healthcare to include wherein δm is a Hamming distance, estimated to be 0 if [Equation image: media_image4.png] contains Am and 1 otherwise. (Matsuguchi, [0151]: the matched therapies include a Hamming distance).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include wherein δm is a Hamming distance, estimated to be 0 if [Equation image: media_image4.png] contains Am and 1 otherwise, as taught by Matsuguchi, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.

Regarding claim 19, the combination teaches each of the limitations of claim 15 as discussed above, and further discloses:
wherein applying reinforcement learning techniques on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage. (Matsuguchi, [0005]: the medical history data).
Therefore, it would have been obvious to one having ordinary skill in the art of healthcare at the time of filing to modify Liao to include wherein applying reinforcement learning techniques on the discounted health variable comprises aggregating data from patient monitors and storing aggregated data in a historical patient data storage, for efficient processing because Matsuguchi teaches that the above improves efficient data processing.

Regarding claim 20, the combination teaches each of the limitations of claim 15 as discussed above, and further discloses:
generating the outcome output display comprises generating an action tree based on the determined next action in a treatment regime using the RL model with no distance function, optimal next action in the treatment regime with allowed deviation from the guidelines, and next action in the treatment regime that adheres to the guidelines. (Liao, page 2: the treatment policies are construed as the action tree).

Response to Applicant’s Arguments
Applicant’s arguments and amendments, filed on 11/30/2021, with respect to the 35 USC § 101 rejection of claims 1-20 have been considered but are not persuasive. 
The Applicant argues that the claims are not directed to an abstract idea because only a computer tool can apply reinforcement learning to generate an RL model. The argument is not persuasive because the claims are merely invoking a computer as a tool to perform the abstract idea.


Applicant’s arguments and amendments, filed on 11/30/2021, with respect to the 35 USC § 103 rejection of claims 1-20 have been considered but are not persuasive. 
With regard to claim 1, Applicant argues that Liao does not teach the penalty, as claimed. The argument is not persuasive for the reasons stated above on page 10, including the footnote citing Paul. 
Applicant argues that Liao does not teach the constrained/unconstrained actions, as claimed. The argument is not persuasive because the constraints of Liao at page 10 teach the deviations as claimed. 
Applicant argues that the new limitation of transforming is not taught by the cited references. The limitation has been addressed with Riskin, above. Examiner notes that this limitation is subject to a 112(a) rejection. 
Examiner notes that the discounted health variable is taught by Liao. (see page 11, second full paragraph). 
Applicant argues that the office action provides no explanation of claim 3. Applicant is directed to the rejection of claim 3 above. 
Applicant argues that the office action proffers no analysis of claim 7. Applicant is directed to the rejection of claim 7 above. 
For the reasons set forth in the 35 USC § 103 rejection of claims 1-20 above, the references cited in the rejection render amended claims 1-20 obvious under 35 USC § 103. Applicant’s argument is not persuasive.

CONCLUSION
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Shyam Goswami whose telephone number is (303)297-4283.  The examiner can normally be reached on Monday-Thursday, 8:30AM-6:30PM MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Victoria Augustine can be reached on (313)446-4858.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


SHYAM M GOSWAMI
Examiner
Art Unit 3686



/Victoria P Augustine/
Supervisory Patent Examiner, Art Unit 3686

        1 See An introduction to Q-Learning: Reinforcement Learning by Sayak Paul, (available at https://blog.floydhub.com/an-introduction-to-q-learning-reinforcement-learning/), stating that “The rewards need not be always the same. But it is much better than having some amount reward for the actions than having no rewards at all. This idea is known as the living penalty.”
        2 See An introduction to Q-Learning: Reinforcement Learning by Sayak Paul, (available at https://blog.floydhub.com/an-introduction-to-q-learning-reinforcement-learning/), stating that “The rewards need not be always the same. But it is much better than having some amount reward for the actions than having no rewards at all. This idea is known as the living penalty.”
        3 See An introduction to Q-Learning: Reinforcement Learning by Sayak Paul, (available at https://blog.floydhub.com/an-introduction-to-q-learning-reinforcement-learning/), stating that “The rewards need not be always the same. But it is much better than having some amount reward for the actions than having no rewards at all. This idea is known as the living penalty.”