Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-19 were previously pending and subject to a non-final Office Action having a notification date of January 26, 2022 (“non-final Office Action”).  Following the non-final Office Action, Applicant filed an amendment on April 26, 2022 (the “Amendment”), amending claims 18 and 19.  The present Notice of Allowance addresses pending claims 1-19 in the Amendment.

EXAMINER’S AMENDMENT
During a discussion with Brad Bertoglio, Reg. No. 47,422, on May 31, 2022, it was agreed that certain amendments to the application are to be made, which are set forth below. 

1. (Currently Amended) A hierarchical reinforcement learning system for automatic disease prediction for a plurality of diseases, comprising:
one or more processors; and
a computer readable medium having encoded thereon computer-executable instructions to cause the one or more processors to:
model the acts of doctors with an agent simulator module, wherein the agent simulator module includes a master module which is in a high level, and a plurality of worker modules which are in a low level,
model the acts of patients with a user simulator module, wherein
the plurality of worker modules each acts as a doctor from a specific department, while the master module appoints the plurality of worker modules to interact with the user simulator module for collecting information, and
modeling the acts of doctors includes iteratively training each of the plurality of worker modules during a training process of the master module[[;]], wherein the iteratively training includes
storing, for each worker module, information collected by the worker module from interacting with the user simulator module in a respective experience buffer for the worker module,
performing experience replay to sample a subset of the collected information stored in the respective experience buffer for each worker module,
training a current neural network for each worker module based on its respective subset of the collected information,
updating a target neural network for each worker module based on the training, and
removing the subsets of collected information from the experience buffers after the updating;
produce a prediction result of the diseases of the plurality of diseases with a disease classifier module, wherein
the master module activates the disease classifier module to output the prediction result from the information collected from the plurality of worker modules, and
the prediction result is a probability distribution over all diseases of the plurality of diseases, and
produce an intrinsic reward with an internal critic module, wherein the internal critic module is configured to generate an intrinsic reward to the plurality of worker modules judging a termination condition for the plurality of worker modules, and
wherein the user simulator module is configured to return an extrinsic reward to the master module.

2. (Original) The hierarchical reinforcement learning system according to claim 1, wherein the diseases are divided into a plurality of disease subsets, with each disease subset being associated with a plurality of symptoms subsets, the symptoms of each plurality of symptoms subsets are related to diseases, and the plurality of worker modules collect information from the user simulator module about symptoms of the symptoms subsets.

3. (Original) The hierarchical reinforcement learning system according to claim 1, wherein the plurality of worker modules iteratively collect information from the user simulator module for a plurality of turns.

4. (Currently Amended) The hierarchical reinforcement learning system according to claim 3, wherein at each turn of the plurality of turns, the master module decides whether to collect symptom information from the user simulator module or inform the user simulator module with the prediction result.

5. (Original) The hierarchical reinforcement learning system according to claim 4, wherein when the master module decides to collect symptom information from the user simulator module, it picks one worker module of the plurality of worker modules to interact with the user simulator module.

6. (Original) The hierarchical reinforcement learning system according to claim 4, wherein when the master module decides to inform the user simulator module with prediction result, it activates the disease classifier module to output the predicted disease.

7. (Original) The hierarchical reinforcement learning system according to claim 5, wherein the picked worker module interacts with the user simulator module until a subtask of the picked worker module is terminated.

8. (Original) The hierarchical reinforcement learning system according to claim 5, wherein the picked worker module takes an inquiry action and then receives the intrinsic reward from the internal critic module.

9. (Currently Amended) The hierarchical reinforcement learning system according to claim 1, wherein when the disease classifier module is activated by the master module, it takes the state of the master module as input and outputs a vector which represents the probability distribution over all diseases.

10. (Currently Amended) The hierarchical reinforcement learning system according to claim 9, wherein the disease of the vector with a highest probability is returned to the user simulator module as the prediction result.
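
For illustration only, the classifier output described in claims 9 and 10 can be sketched as follows. The claims require only a vector representing a probability distribution over all diseases, with the highest-probability disease returned as the prediction result; the softmax normalization and the names `softmax` and `predict_disease` are assumptions of this sketch and do not come from the application.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution (assumed normalization)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_disease(scores, disease_names):
    """Return the highest-probability disease together with the full distribution."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return disease_names[best], probs
```

Here `scores` stands in for whatever the disease classifier module computes from the state of the master module; how those scores are produced is not shown.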

11. (Original) The hierarchical reinforcement learning system according to claim 8, wherein the intrinsic reward equals +1, if the picked worker module requests a symptom that the simulated user suffers; the intrinsic reward equals -1, if there are repeated actions generated by the picked worker module, or the number of the subtask turns reaches a threshold; and otherwise the intrinsic reward equals 0.
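
For illustration only, the reward rule of claim 11 can be written as a small function. The function name, the parameter names, and the choice to check the -1 conditions before the +1 condition are assumptions of this sketch; the claim does not state which value applies when both conditions hold.

```python
def intrinsic_reward(requested_symptom, patient_symptoms, asked_before,
                     subtask_turns, max_subtask_turns):
    """Sketch of the claim 11 reward rule; all names are hypothetical."""
    # -1 for a repeated action or for reaching the subtask turn threshold
    # (checking this first is an assumption of the sketch)
    if requested_symptom in asked_before or subtask_turns >= max_subtask_turns:
        return -1
    # +1 if the requested symptom is one the simulated user suffers
    if patient_symptoms.get(requested_symptom) is True:
        return 1
    # 0 otherwise
    return 0
```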

12. (Original) The hierarchical reinforcement learning system according to claim 7, wherein the subtask of the picked worker module is terminated as successful when the user simulator module responds “true” to the symptom requested by the picked worker, which means the current worker finishes the subtask by collecting enough symptom information.

13. (Original) The hierarchical reinforcement learning system according to claim 7, wherein the subtask of the picked worker module is terminated as failed when there are repeated actions generated by the picked worker module, or the number of the subtask turns reaches a threshold.

14. (Currently Amended) The hierarchical reinforcement learning system according to claim 1, wherein at the beginning of each of a plurality of dialogue sessionssimulator module samples a user goal from a training set randomly, and all said each dialogue session.

15. (Currently Amended) The hierarchical reinforcement learning system according to claim 1, wherein during a course of a dialogue session, the user simulator module interacts with the agent simulator module based on a user goal.

16. (Currently Amended) The hierarchical reinforcement learning system according to claim 14, wherein the dialogue session will be terminated as successful if the agent simulator module makes a correct diagnosis.

17. (Currently Amended) The hierarchical reinforcement learning system according to claim 14, wherein said each dialogue session will be terminated as failed if an informed disease is incorrect or a dialogue turn reaches a maximal turn, or if there are repeated actions taken by the system.

18. (Previously Presented) The hierarchical reinforcement learning system according to claim 1, wherein a reward shaping is used to add an auxiliary reward to the extrinsic reward.

19. (Currently Amended) A  of a plurality of diseases with a hierarchical reinforcement learning system, the method comprising:
modeling the acts of doctors with an agent simulator module, wherein
the agent simulator module includes
a master module which is in a high level, and
a plurality of worker modules which are in a low level, and
modeling the acts of the doctors includes iteratively training each of the plurality of worker modules during a [[the]] training process of the master module[[;]], wherein the iteratively training includes
storing, for each worker module, information collected by the worker module from interacting with the user simulator module in a respective experience buffer for the worker module,
performing experience replay to sample a subset of the collected information stored in the respective experience buffer for each worker module,
training a current neural network for each worker module based on its respective subset of the collected information,
updating a target neural network for each worker module based on the training, and
removing the subsets of collected information from the experience buffers after the updating;
modeling the acts of patients with a user simulator module, wherein the plurality of worker modules each acts as a doctor from a specific department, while the master module appoints the plurality of worker modules to interact with the user simulator module for collecting information;
producing 
the master module activates the disease classifier module to output the prediction result when information collected from the plurality of worker modules is sufficient, and
the prediction result is a probability distribution over all diseases of the plurality of diseases; and
producing 
wherein the internal critic module is configured to generate the intrinsic reward to the plurality of worker modules judging a termination condition for the plurality of worker modules, and
wherein the user simulator module is configured to return an extrinsic reward to the master module.

20. (New) The method of claim 19, wherein the iteratively training includes fixing the target neural network while performing the experience replay.

Allowable Subject Matter
Claims 1-20 are allowed.
The following is the Examiner’s statement of reasons for allowance:
Independent claims 1 and 19 include many limitations that are disclosed by NPL “End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis” to Xu et al. (“Xu”) in view of NPL “Context-Aware Symptom Checking for Disease Diagnosis Using Hierarchical Reinforcement Learning” to Kao et al. (“Kao”) as generally discussed at pages 19-22 of the non-final Office Action; namely, the limitations directed to a hierarchical reinforcement learning system for automatic disease prediction for a plurality of diseases that includes modeling the acts of doctors with an agent simulator module including a plurality of worker modules in a low level that act as doctors from different departments and a master module in a high level that appoints the worker modules to interact with a user simulator module for collecting information, producing a prediction result of the plurality of diseases with a disease classifier module activated by the master module to output the prediction result from the information collected from the plurality of worker modules as a probability distribution over all diseases of the plurality of diseases, using an internal critic module to produce an intrinsic reward to the plurality of worker modules judging a termination condition for the plurality of worker modules, and using the user simulator module to return an extrinsic reward to the master module.
However, each of independent claims 1 and 19 now includes the following limitations that, in combination with the other limitations of the claims, are not disclosed or suggested by the prior art of record:
modeling the acts of doctors includes iteratively training each of the plurality of worker modules during a training process of the master module, wherein the iteratively training includes
storing, for each worker module, information collected by the worker module from interacting with the user simulator module in a respective experience buffer for the worker module,
performing experience replay to sample a subset of the collected information stored in the respective experience buffer for each worker module,
training a current neural network for each worker module based on its respective subset of the collected information,
updating a target neural network for each worker module based on the training, and
removing the subsets of collected information from the experience buffers after the updating.
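
For illustration only, the five recited training steps can be sketched for a single worker module as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: a small linear Q-function stands in for the current and target neural networks, and the class name `Worker`, its methods, the transition layout, and the `lr` and `gamma` parameters are all hypothetical.

```python
import random

class Worker:
    """One worker module with its own experience buffer (hypothetical names)."""

    def __init__(self, n_features, n_actions, seed=0):
        self.rng = random.Random(seed)
        # Linear weights stand in for the "current" and "target" neural networks.
        self.current = [[0.0] * n_features for _ in range(n_actions)]
        self.target = [row[:] for row in self.current]
        self.buffer = []  # respective experience buffer for this worker

    @staticmethod
    def q_values(weights, state):
        return [sum(w * x for w, x in zip(row, state)) for row in weights]

    def store(self, transition):
        # Step 1: store information collected from the user simulator.
        self.buffer.append(transition)

    def replay_step(self, batch_size, lr=0.1, gamma=0.9):
        # Step 2: experience replay samples a subset of the buffer.
        k = min(batch_size, len(self.buffer))
        picked = self.rng.sample(range(len(self.buffer)), k)
        # Step 3: train the current network on the sampled subset;
        # the target weights are left unchanged during this loop.
        for i in picked:
            state, action, reward, next_state, done = self.buffer[i]
            future = 0.0 if done else gamma * max(self.q_values(self.target, next_state))
            td_error = reward + future - self.q_values(self.current, state)[action]
            for j, x in enumerate(state):
                self.current[action][j] += lr * td_error * x
        # Step 4: update the target network based on the training.
        self.target = [row[:] for row in self.current]
        # Step 5: remove the sampled subset from the buffer after the update.
        chosen = set(picked)
        self.buffer = [t for i, t in enumerate(self.buffer) if i not in chosen]
        return k
```

In a hierarchical setup of the kind claimed, a loop over all workers would call `store` during each simulated dialogue and `replay_step` at intervals while the master module is itself being trained; that outer loop is omitted here.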

For reference, U.S. Patent App. Pub. No. 2018/0046773 to Tang et al. discloses a medical system and method for providing medical prediction that includes an agent making symptom inquiries of a patient, making disease predictions based on the symptoms, and receiving positive or negative reward signals based on the disease predictions.  However, this document does not disclose iteratively training a number of low level worker modules of the agent during a training process of a high level master module of the agent, the iteratively training including storing, for each worker module, information collected by the worker module from interacting with the user simulator module in a respective experience buffer for the worker module; performing experience replay to sample a subset of the collected information stored in the respective experience buffer for each worker module; training a current neural network for each worker module based on its respective subset of the collected information; updating a target neural network for each worker module based on the training; and removing the subsets of collected information from the experience buffers after the updating as recited in independent claims 1 and 19.
Also for reference, Int’l Pub No. WO 2021/179625 to Li et al. discloses a hierarchical reinforcement learning system that includes an agent having different levels for making patient symptom inquiries, making medical department recommendations based on the symptoms, and receiving patient feedback regarding the recommended medical departments.  However, this document does not disclose iteratively training a number of low level worker modules of the agent during a training process of a high level master module of the agent, the iteratively training including storing, for each worker module, information collected by the worker module from interacting with the user simulator module in a respective experience buffer for the worker module; performing experience replay to sample a subset of the collected information stored in the respective experience buffer for each worker module; training a current neural network for each worker module based on its respective subset of the collected information; updating a target neural network for each worker module based on the training; and removing the subsets of collected information from the experience buffers after the updating as recited in independent claims 1 and 19.

In relation to the claim rejections under 35 USC 101 set forth in the non-final Office Action, these rejections are now withdrawn when currently pending claims 1-20 are considered in view of the 2019 Revised Patent Subject Matter Eligibility Guidance (which collectively includes the guidance in the January 7, 2019 Federal Register notice and the October 2019 update issued by the USPTO as incorporated into the MPEP) and Applicant’s remarks in the Amendment.  

Specifically, the “additional limitations” of the claims (including, inter alia, iteratively training each of the plurality of worker modules during a training process of the master module, wherein the iteratively training includes storing, for each worker module, information collected by the worker module from interacting with the user simulator module in a respective experience buffer for the worker module; performing experience replay to sample a subset of the collected information stored in the respective experience buffer for each worker module; training a current neural network for each worker module based on its respective subset of the collected information; updating a target neural network for each worker module based on the training; and removing the subsets of collected information from the experience buffers after the updating) together with the limitations directed to the at least one abstract idea (performing disease prediction via modeling the acts of doctors and users, interacting with the users to collect information, generating disease predictions based on the collected information, and producing rewards) when viewed as a whole, integrate the at least one abstract idea into a practical application of the at least one abstract idea or provide significantly more than the at least one abstract idea by improving the functioning of a computer and/or other technology.  
For instance, as discussed at paragraphs [0008] and [0078] of the present specification as well as in Applicant’s remarks in the Amendment, the recited manner of iteratively training each of the worker modules during a training process of the master module is in contrast to prior approaches that pre-train the low-level policy and reach a sub-optimal solution.  Furthermore, the recited limitations of iteratively training via storing, for each worker module, information collected by the worker module from interacting with the user simulator module in a respective experience buffer for the worker module; performing experience replay to sample a subset of the collected information stored in the respective experience buffer for each worker module; training a current neural network for each worker module based on its respective subset of the collected information; updating a target neural network for each worker module based on the training; and removing the subsets of collected information from the experience buffers after the updating advantageously speed up the training process and lead to more accurate results.  See [0067] of the present specification.
Furthermore, the above-discussed additional limitations amount to other meaningful limitations beyond generally linking the use of a judicial exception to a particular technological environment.  MPEP 2106.05(e).  Specifically, such limitations are meaningful “because they integrate the results of the analysis into a specific and tangible method that results in the method moving from abstract scientific principles to specific application.”  Id.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: The references cited on the attached PTO-892 disclose reinforcement learning systems for making predictions.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHON A. SZUMNY whose telephone number is (303) 297-4376.  The examiner can normally be reached on Monday-Friday 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason Dunham, can be reached at 571-272-8109.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JONATHON A. SZUMNY/Patent Examiner
Art Unit 3686