DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2/4/2021 has been entered.
Claims 1, 3-6, 10-11, 13, 17-19, 21, and 25 have been amended. Claims 1, 3-11, 13-19, and 21-25 are pending and have been examined.
Response to Arguments/Amendments
Applicant's arguments filed 2/4/2021 have been fully considered but they are not persuasive. 
On p. 13 of the 2/4/2021 response, Applicant essentially argues that cited art of record Tesauro is directed to performance of the computing system, not the response of users, and therefore fails to teach reward data describing “each human user’s interaction” as claimed. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). It is noted that cited art of record .
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3, 7, 10, 13-14, 17, 19, 21-22, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2008/0313116 by Groble (“Groble”) in .

	In regard to claim 1, Groble discloses:
1. A method of displaying performance data for computer implemented decision policies, comprising: See Groble, at least Fig. 5, depicting a method.
… a first reward statistic comprising an actual performance result for a policy implemented by an application; See Groble, ¶ 0020, “The selected and activated interaction policy may be more suitable for a user, according to the changed parameters, than a previously used interaction policy.” Also see ¶ 0049, e.g. “policy evaluator 208 may score each of the generated interaction policies with respect to the user models.” Groble does not expressly disclose displaying. However, Hirai teaches the display of data. See Hirai, ¶ 0062, e.g. “display and compare the classification results.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble’s reward score with Hirai’s display in order to provide the ability to select the best results as suggested by Hirai.
obtaining experimental data corresponding to previously implemented policies, See Groble, Fig. 4, depicting data 402 and policies 404.
wherein the experimental data is generated by receiving reward data from the application for each one of a plurality of human users, See Groble, ¶ 0026 and 0042, e.g. “Parameter values may specify transition probabilities, observation probabilities, and expected rewards … Parameter updater 410 may receive an indication of a change with
wherein the reward data describes one or more details about each human user's interaction with the application in response to a customization applied pursuant to the previously implemented policies; However, Tesauro teaches this. See Tesauro, Fig. 4, element 408 and ¶ 0022-0023, e.g. “the method 400 obtains an initial decision-making entity and a reward mechanism (e.g., from a user of the computing system to which the method 400 is applied … In block 408, the method records at least one instance of observable data … an observation in accordance with step 408 is defined by a tuple that, … denotes … a reward, r, generated by the reward mechanism responsive to action a in state s. … the observed action, a, may comprise an exploratory "off-policy" action differing from the preferred action of the initial decision-making entity, taken in order to facilitate more effective reward-based learning. The observations are logged by the method 400 as training data for use in deriving a new policy.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble’s data with Tesauro’s observations in order to automatically derive a policy for making management decisions in a computer system as suggested by Tesauro (see ¶ 0022).
generating a hypothetical policy by a policy training system that enables a client to specify policy training parameters…; See Groble, Fig. 2, parameters 202 and policy trainer/generator 206. Also see Groble, ¶ 0019, e.g. “the selected interaction policies maximize an expected reward over a population of the expected users and the selected interaction policies have a minimum size. The expected reward may be determined based on a particular set of values for a parameter set.” Also see ¶ 0026, e.g. “Parameter values may specify transition probabilities, observation probabilities, and expected rewards.” Also see ¶ 0042, e.g. “In response to receiving the user preference information, or other information, parameter updater 410 may update one or more values of parameters of parameter set 402.” Groble does not expressly disclose … to be used by the policy training system to train the hypothetical policy on the experimental data. However, this is taught by Tesauro. See Tesauro, Fig. 4, elements 408-414 along with ¶ 0022-0023, e.g. “the policy may be based on a set of hand-crafted behavior rules or on an explicit system performance model … The observations are logged by the method 400 as training data for use in deriving a new policy, as described in greater detail below.” Also see It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble’s policies with Tesauro’s policy training in order to automatically derive a policy as suggested by Tesauro (see ¶ 0022).
computing a second reward statistic for a hypothetical policy using a reward function applied to the experimental data; and See Groble, ¶ 0049, e.g. “policy evaluator 
… [data that indicates] which of the corresponding policies is likely to provide a better user experience when using the application. See Groble, ¶ 0050, e.g. “calculate an average score of each of the generated interaction policies with respect to each of the user models … processing device 100 may select up to the predetermined number of interaction policies with highest scores within the specific tolerances of the optimal interaction policy scores for each of the user models that achieve a goal of the user within a minimum number of dialog turns, based on the simulating, where a dialog turn may be defined as one user action or one agent action, such as, for example, an action of statistical interaction manager 406.” Groble does not expressly disclose displaying the second reward statistic together with the first reward statistic to enable the client to compare the first reward statistic and the second reward statistic, wherein the displaying of the first reward statistic and the second reward statistic indicates … [data]. However, this is taught by Hirai. See Hirai, ¶ 0062, e.g. “display and compare the classification results … The user selects the best result from the classification results.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble’s policy score data with Hirai’s data display in order to provide the ability to select the best results as suggested by Hirai.


3. The method of claim 1, wherein the hypothetical policy is generated by a machine learning algorithm based on the policy training parameters specified by the client. See Groble, ¶ 0005 and 0019, e.g. “policies may be learned.” Also see Tesauro, ¶ 0022-0023 as cited above.

	In regard to claim 7, Groble discloses:
7. The method of claim 1, comprising: computing a third reward statistic for a baseline policy from the experimental data, wherein the baseline policy is generated by a machine learning algorithm; and See Groble, ¶ 0029, e.g. “optimal interaction policy.”
Groble does not expressly disclose displaying the third reward statistic together with the first reward statistic and the second reward statistic. However, this is taught by Hirai. See Hirai, ¶ 0062, e.g. “display and compare the classification results … The user selects the best result from the classification results.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble’s reward score with Hirai’s display in order to provide the ability to select the best results as suggested by Hirai.

	In regard to claim 10, Groble and Hirai also teach:
10. The method of claim 1, comprising generating a selection tool that is selectable by the client to deploy the hypothetical policy for the application. See Groble, 

	In regard to claim 11, Groble discloses:
11. A system for evaluating computer implemented decision policies, comprising: a display device; a processor; and a system memory comprising code to direct the processor to: See Groble, Fig. 1, depicting a system. 
All further limitations of claim 11 have been addressed in the above rejection of claim 1.

	In regard to claims 13-14 and 17, parent claim 11 is addressed above. All further limitations have been addressed in the above rejections of claims 3, 7 and 10, respectively. 

	In regard to claim 19, Groble discloses:
19. One or more computer-readable memory storage devices for storing computer-readable instructions that, when executed, instruct one or more processing devices to: See Groble, ¶ 0024, e.g. “machine-readable medium.” 
…
generate a policy training tool that enables a client to specify policy training parameters See Groble, Fig. 2, parameters 202 and policy trainer/generator 206. Also see Groble, ¶ 0019, 0026, and 0042 as cited in the rejection of claim 1 above.
to be used to train a hypothetical policy; generate the hypothetical policy based on the policy training parameters specified by the client; However, this is taught by Tesauro. See Tesauro, ¶ 0022-0023 as cited in the rejection of claim 1 above. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble’s policies with Tesauro’s policy training in order to automatically derive a policy as suggested by Tesauro (see ¶ 0022).
All further limitations of claim 19 have been addressed in the above rejection of claim 1.

	In regard to claim 20, Groble discloses:
20. The computer-readable memory storage devices of claim 19, comprising computer-readable instructions that, when executed, instruct one or more processing devices to: generate a policy training tool that enables the user to specify policy training parameters to be used to train the hypothetical policy; and generate the hypothetical policy offline based on the policy training parameters specified by the user. See Groble, Fig. 2, parameters 202 and policy trainer/generator 206. Also see Fig. 4, element 404 “Predefined Policies.”

	In regard to claims 21-22, parent claims 19-20 are addressed above. All further limitations have been addressed in the above rejections of claims 3 and 7, respectively. 

Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Groble in view of Hirai and Tesauro as applied above, and further in view of U.S. Patent 7,653,893 to Neumann (“Neumann”).

	In regard to claim 4, Groble does not expressly disclose the claimed limitations. However, Neumann teaches the following: 
4. The method of claim 1, wherein the hypothetical policy is selected by the client from a list of predefined hypothetical policies. See Neumann, col. 4, lines 49-54, e.g. “In act 210, a list of policies is displayed, which allows a user to select one or more policies in the list of policies.”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble and Hirai’s policy display with Neumann’s list in order to allow for user selection as suggested by Neumann.

	In regard to claim 5, Groble does not expressly disclose the claimed limitations. However, Neumann teaches them:
5. The method of claim 1, wherein the reward function is selected by the client from a list of predefined reward functions. See Neumann, col. 4, lines 49-54, e.g. “In act 210, a list of policies is displayed, which allows a user to select one or more policies in .

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Groble in view of Hirai and Tesauro as applied above and further in view of U.S. Patent Application Publication 2006/0206337 by Paek et al. (“Paek”).

	In regard to claim 6, Groble does not expressly disclose the claimed limitations. However, Paek teaches the following:
6. The method of claim 1, wherein the reward function is manually generated by the client. See Paek, ¶ 0003, e.g. “dialog designers have had to either explicitly specify a reward function …” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble’s function with Paek’s manual generation in order to allow for configurability.

Claims 8, 15, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Groble in view of Hirai and Tesauro as applied above, and further in view of U.S. Patent Application Publication 2013/0318023 by Morimura (“Morimura”).

In regard to claim 8, Groble does not expressly disclose the claimed limitations. However, Morimura teaches the following:
8. The method of claim 1, wherein displaying the first reward statistic and second reward statistic comprises displaying line graphs representing the reward statistics plotted over a specified time window. See Morimura, Fig. 5, depicting a reward line graph. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble and Hirai’s reward display with Morimura’s line graph in order to provide a format to analyze data.

	In regard to claim 15, parent claim 11 is addressed above. All further limitations have been addressed in the above rejection of claim 8. 

	In regard to claim 23, parent claim 19 is addressed above. All further limitations have been addressed in the above rejection of claim 8. 

Claims 9, 16, 18, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Groble in view of Hirai and Tesauro as applied above, and further in view of U.S. Patent Application Publication 2006/0123389 by Kolawa et al. (“Kolawa”).

In regard to claim 9, Groble does not expressly disclose the claimed limitations. However, Kolawa teaches the following:
9. The method of claim 1, wherein displaying the first reward statistic and second reward statistic comprises displaying a bar graph representing the reward statistics averaged over a specified time window. See Kolawa, ¶ 0096, e.g. “bar graph.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble and Hirai’s reward display with Kolawa’s bar graph in order to provide a format to analyze data.

	In regard to claim 16, parent claim 11 is addressed above. All further limitations have been addressed in the above rejection of claim 9. 

	In regard to claim 18, Groble does not expressly disclose the claimed limitations. However, Kolawa teaches the following:
18. The system of claim 11, comprising code to direct the processor to generate an interface tool for receiving a selection of a time window from the client, wherein the time window determines a length of time over which the first reward statistic and second reward statistic are computed and displayed. See Kolawa, ¶ 0096, e.g. “clicking on a date within the Average Confidence Factor graph displays an Average Confidence Factor by Category bar graph.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Groble and Hirai’s 

	In regard to claim 24, parent claim 19 is addressed above. All further limitations have been addressed in the above rejection of claim 9. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. “An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email” by Walker teaches selection of a dialogue strategy/policy using human interaction. See Abstract.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to James D Rutten whose telephone number is (571)272-3703.  The examiner can normally be reached on M-F 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571)272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.





/James D. Rutten/Primary Examiner, Art Unit 2121