DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 01/11/2021 has been entered.

Response to Amendment
The amendment filed 01/11/2021 has been entered. Claims 13-20 remain pending in the application.

Response to Arguments
Applicant’s arguments, filed 01/11/2021, with respect to the rejections of claims 13 and 16 under 35 U.S.C. 103 have been fully considered and are persuasive in view of the amendments. Therefore, the rejections have been withdrawn. However, upon further consideration, a new ground of rejection is made over Williams (Partially Observable Markov Decision Processes for Spoken Dialogue Management) in view of Yasuhiro et al. (JP2012-190062) and further in view of Podgorny et al. (US Patent 10,083,213).
Applicant argues
It appears as if the Examiner is taking Official Notice of this feature, as it is again respectfully asserted that there is no discussion regarding selecting a second ranked action if a first best action ID is unable to be used anywhere in Yasuhiro.
In response
This argument was addressed in the Final Office action mailed 11/12/2020.

Applicant argues
Williams fails to teach or suggest at least the above reproduced claim feature at least because Williams is silent regarding a number of alternative alpha vectors being user selected.
In response
This limitation is not recited in the claims.

Applicant argues
Williams fails to discuss any maximization of expected long-term cumulative rewards in each time step anywhere in the Williams specification. Rather, the discussion in Williams is regarding overall maximizing of a vector.
In response
Williams, at pages 6-7 and 11, discloses selecting actions that maximize the reward, stating: at each time-step t, the machine receives reward rt, which depends on the current state and action; the cumulative, discounted reward accumulated by time-step t is written Vt; and the cumulative, discounted, infinite horizon reward is called the return, denoted by V1, or simply V for short. The goal of the machine is to choose actions in such a way as to maximize the expected return E[V].
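For illustration only, the cumulative discounted reward described in the passage above can be sketched in Python. The function name discounted_return, the discount factor gamma, the zero-based time index and the sample reward values are assumptions made for this sketch and are not taken from Williams; the expected return E[V] would additionally average this quantity over the possible state and observation trajectories, which the sketch does not do.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a single reward sequence.

    rewards: per-time-step rewards r_0, r_1, ... (hypothetical values).
    gamma:   assumed discount factor with 0 <= gamma <= 1.
    """
    total = 0.0
    for t, reward in enumerate(rewards):
        # Later rewards are weighted less when gamma < 1.
        total += (gamma ** t) * reward
    return total


# Hypothetical sequence: two small costs followed by a positive reward.
sample_rewards = [-1.0, -1.0, 5.0]
sample_value = discounted_return(sample_rewards, 0.5)
print(sample_value)
```

Under these assumptions, a machine that maximizes the expected return prefers action sequences whose discounted sum is larger on average, rather than actions that maximize only the immediate reward.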
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 13-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 13 and 16 are rejected for reciting the limitation “top-k2”, which renders the claims indefinite because it is unclear what “top-k2” means. For the purpose of examination, the claims are interpreted without the limitation “top-k2”.
Claims 14-15 and 17-20 are rejected as being dependent on rejected base claims 13 and 16.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Williams (Partially Observable Markov Decision Processes for Spoken Dialogue Management) in view of Yasuhiro et al. (JP2012-190062) and further in view of Podgorny et al. (US Patent 10,083,213).
As per claim 13, Williams teaches a system, comprising: 
a processor [page iii, 1st paragraph, computer/machine]; and 
performs a method for selecting an action [page 7, 1st paragraph, the goal of the machine is to choose actions in such a way as to maximize the expected return], the method comprising: 
reading, into a memory [page 52, last paragraph, the memory], a Partially Observed Markov Decision Process (POMDP) model [page 7, second paragraph, voicemail POMDP], the POMDP model having top-k action IDs for each of one or more belief state [Fig. 2.7 shows all 27 conditional plans for belief state b, Fig. 2.8, page 16, last paragraph, disclose top 5 conditional plans after 22 plans that do not contribute to the optimal policy are pruned, condition plans are labelled with their initial actions: doSave, ask, doDelete], the top-k action IDs maximizing expected long-term cumulative rewards in each time-step [page 6, last paragraph, “at each time-step t , the machine receives reward rt which depends on the current state and action, r(s, a), the cumulative, discounted reward accumulated by time-step t is written Vt”; page 11, last paragraph, “the machine’s task is to choose between a number of conditional plans to find the one which maximizes Vt “; page 7, first paragraph, “The cumulative, discounted, infinite horizon reward is called the return, denoted by V1, or simply V for short. The goal of the machine is to choose actions in such a way as to maximize the expected return E[V]”], and k being an integer of two or more determined by a number of alternative alpha vectors utilized in an execution time process of the POMDP model [Fig. 2.8 shows top 5 conditional plans after 22 plans that do not contribute to the optimal policy are pruned, condition plans are labelled with their initial actions: doSave, ask, doDelete; page 16 last paragraph-page 17 first paragraph, “plans which don’t contribute to the optimal policy are pruned, and in this example 22 of the 27 conditional plans are pruned, leaving 5 plans which contribute to the optimal policy. 
Figure 2.8 shows the values of those 2-step conditional plans which contribute to the optimal policy”], based on a point-based value iteration algorithm [page 15 last paragraph-page 16 first paragraph, “Each iteration of Algorithm 1 contains two steps. First, in the “generation” step, all potentially useful t-step conditional plans are created by enumerating all actions followed by all possible useful combinations of (t-1)-step plans. Then, in the “pruning” step, conditional plans which do not contribute to the optimal t-step policy are removed, leaving the set of useful t-step plans. The algorithm is repeated for T steps”; page 20, section 2.4, 1st paragraph, “Point-based value iteration (PBVI) … finds optimal conditional plans only at a finite set of N discrete belief points in belief space”; page 21, last paragraph, “PBVI generates only N . |A||O| possibly useful conditional plans in each iteration”], the alternative alpha vectors being calculated for each of the one or more belief states [page 9, section 2.2 Finding POMDP policies, “POMDP policies can take on various forms, including a collection of conditional plans … of belief space”; page 14 last paragraph-page 15 first paragraph, “finding the subset of possible t-step conditional plans which contribute to the optimal t-step policy. These conditional plans are called useful, and only useful t-step plans are considered when finding the (t + 1)-step optimal policy”; page 16 last paragraph-page 17 first paragraph, “plans which don’t contribute to the optimal policy are pruned, and in this example 22 of the 27 conditional plans are pruned, leaving 5 plans which contribute to the optimal policy”]; 
determining a set of top second (top-k2) alpha vectors [vectors with red marks (from the examiner) on the figure below] including a plurality of second best actions for each of the belief states [ask, doSave, doDelete, ask], each of the second best actions from the set of top-k2 alpha vectors being identified by a second best action ID based on a displayed graph including the first best action ID and the second best action ID [Figs. 2.6 or 2.8, the heavy line shows V1*(b): the optimal value (best action ID); page 16, second paragraph, “Figure 2.6 shows the first step of value iteration applied to the VOICEMAIL spoken dialog POMDP example problem … there are 3 possible conditional plans, one for each action. This figure shows the three 1-step conditional plan values {V11(b), V12(b), V13(b)} and the heavy line shows V1*, the value of the optimal 1-step policy”; it can be seen in Fig. 2.6 that the heavy line is the best action ID, where each action (doSave, ask and doDelete) has the highest value. Fig. 2.6 also displays the lighter line (marked with check marks by the examiner for ease of viewing) directly below the heavy line, which is the next best action ID (second best action ID), where each action (ask, doSave, doDelete, ask) has the next highest value], and being generated by iteratively calculating the top-k2 alpha vectors until convergence, and pruning alpha vectors other than the top-k2 alpha vectors for each of the belief states [page 15 last paragraph-page 16 first paragraph, “Each iteration of Algorithm 1 contains two steps. First, in the “generation” step, all potentially useful t-step conditional plans are created by enumerating all actions followed by all possible useful combinations of (t-1)-step plans. Then, in the “pruning” step, conditional plans which do not contribute to the optimal t-step policy are removed, leaving the set of useful t-step plans. The algorithm is repeated for T steps”; Fig. 2.8 shows the top 5 conditional plans after the 22 plans that do not contribute to the optimal policy are pruned; the conditional plans are labelled with their initial actions: doSave, ask, doDelete; page 16 last paragraph-page 17 first paragraph, “plans which don’t contribute to the optimal policy are pruned, and in this example 22 of the 27 conditional plans are pruned, leaving 5 plans which contribute to the optimal policy”];

[Image media_image1.png: figure annotated by the examiner]
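As a hedged illustration of how a best action and a next best action could be read off a set of alpha vectors at a single belief point, the following Python sketch ranks actions by the value of their alpha vectors at that point. The helper name rank_actions, the two-state belief and all numeric values are hypothetical and are not taken from Williams or from the claims.

```python
def rank_actions(belief, alpha_vectors):
    """Rank action IDs by the value of their best alpha vector at `belief`.

    belief:        probabilities over hidden states (hypothetical values).
    alpha_vectors: list of (action_id, vector) pairs; the value of a vector
                   at a belief point is the dot product belief . vector.
    Returns a list of (action_id, value) pairs, highest value first.
    """
    best_per_action = {}
    for action_id, vector in alpha_vectors:
        value = sum(b * v for b, v in zip(belief, vector))
        # Keep only the highest-valued vector for each action ID.
        if action_id not in best_per_action or value > best_per_action[action_id]:
            best_per_action[action_id] = value
    return sorted(best_per_action.items(), key=lambda item: item[1], reverse=True)


# Hypothetical alpha vectors over two hidden states (save, delete).
example_alphas = [
    ("doSave", (5.0, -10.0)),
    ("ask", (-1.0, -1.0)),
    ("doDelete", (-20.0, 5.0)),
]
example_ranking = rank_actions((0.75, 0.25), example_alphas)
first_best_id = example_ranking[0][0]
second_best_id = example_ranking[1][0]
print(first_best_id, second_best_id)
```

In this sketch the upper surface of the vectors corresponds to the first entry of the ranking, and the line directly below it corresponds to the second entry.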

Williams does not teach
a memory storing a program, which, when executed on the processor, performs 
in the execution-time process of the POMDP model, detecting a situation where an action identified by a first best action ID among the top-k action IDs for a current belief state is unable to be selected due to any of a plurality of constraints, the constraints including execution- time process constraints of the POMDP model; and 
selecting and executing the second best action identified by the second best action ID among the top-k action IDs and the set of top-k2 alpha vectors for the current belief state in response to a detection of the situation.  
Yasuhiro teaches
a memory storing a program, which, when executed on the processor, performs [paragraph 0118, “the program may be downloaded from a recording medium … into the computer via a communication line, and the program may be executed”].
in the execution-time process of the POMDP model [paragraph 0008, “action control by POMDP which can automatically determine the action of the system according to the statistics of data”], detecting a situation where an action identified by a first best action ID [action (rank 1)] among the top-k action IDs [paragraph 0073, “an action at' (rank 1) corresponding to the best score and an action at' (rank 2) corresponding to the second best score are obtained”] for a current belief state is unable to be selected due to any of a plurality of constraints [paragraph 0090, “determining the action of the system to avoid action control in which only the same action is repeatedly performed many times (constraint)”; paragraph 0076, receiving an action at′ (rank 1) corresponding to the best score from the score calculation unit, determining whether or not the immediately preceding action at-1 is the same as the action at' (rank 1); paragraph 0078, “when the determination result indicates that “at−1 and at ′ (rank 1) are the same””; It can be understood that when actions at−1 and at ′ (rank 1) are the same, the system may not select action at' (rank 1) to prevent the same action from being repeated many times]; 
selecting and executing the second best action identified by the second best action ID [action rank 2] among the top-k action IDs and the set of top-k2 alpha vectors for the current belief state in response to a detection of the situation [paragraph 0078, when the determination result indicates that “at−1 and at ′ (rank 1) are the same”, the selection unit determines the action duration corresponding to the real-time action duration m of the action at ′ (rank 1) and the probability Pat' (rank 1) (m) is received; paragraph 0082, if the probability is smaller than the uniform random number, the action at' (rank 1) corresponding to the best score of the real-time action continuation length m is determined as the action at' to be taken by the system; if the probability is equal to or more than uniform random number, the action at'(rank 2) is determined as the action at' to be taken by the system; paragraph 0086, avoiding the action control that executes only the same action over and over again; since Yasuhiro teaches selecting the second best action ID, and Williams teaches each action doSave, ask, doDelete is associated with a vector from the top k alpha vectors (fig. 2.6), thus the combination of Williams and Yasuhiro read on the above limitation].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Yasuhiro's selecting and executing of a second best action among the top-k action IDs, in response to detecting that a best action ID is unable to be selected due to a constraint, into Williams's method of selecting actions in such a way as to maximize the expected return. Doing so would help avoid action control in which only the same action is repeatedly performed many times (Yasuhiro, 0090).
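The fallback behavior discussed above, in which a lower ranked action is taken when the best ranked action cannot be selected, can be sketched as follows. This is a simplified, deterministic illustration only: the function name select_action, the is_blocked callback and the no-repeat rule are assumptions of this sketch, and the sketch does not reproduce the probabilistic comparison against a uniform random number described in Yasuhiro, paragraph 0082.

```python
def select_action(ranked_action_ids, is_blocked):
    """Return the highest-ranked action ID that no constraint blocks.

    ranked_action_ids: action IDs ordered from best to worst.
    is_blocked:        callable returning True when an action ID cannot be
                       selected under the current constraints (hypothetical).
    Returns None when every ranked action is blocked.
    """
    for action_id in ranked_action_ids:
        if not is_blocked(action_id):
            return action_id
    return None


# Hypothetical constraint: do not repeat the immediately preceding action.
previous_action = "ask"


def repeats_previous(action_id):
    return action_id == previous_action


chosen_action = select_action(["ask", "doSave", "doDelete"], repeats_previous)
print(chosen_action)
```

Here the rank 1 action is skipped because it equals the preceding action, so the rank 2 action is returned instead.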
Williams and Yasuhiro do not teach
the constraints including execution- time process constraints of the POMDP model (emphasis added); 
Podgorny teaches
detecting a situation where an action identified by a first best action ID among the top-k action IDs for a current belief state is unable to be selected due to any of a plurality of constraints, the constraints including execution-time process constraints of the POMDP model [abstract, “A question and answer based customer support system is provided through which users submit question data representing questions to be answered … and questions having a low quality question format are labeled improperly formatted questions”; Col. 4, line 67 – Col. 5, line 10, “professional agent support personnel time are wasted trying to answer a low quality/low value question … Worse yet, the longer time devoted to trying to answer the low quality/low value questions is often completely wasted because, by definition, neither the asking or searching users are likely to be satisfied with the answer data provided”; Col. 19, lines 37-44, “it was determined that it may be possible to transform the question type/format from a low quality format question … to a high quality format question … For example, asking the user to re-phrase/transform a "Why" type/format question into a closed-ended type/format question”; Fig. 2C recites “avoid asking “why” … We suggest re-phrasing your question. You’ll get better, faster response”; Col. 21, Original Question: "I don't understand why I can't efile" (first action) … Re-Phrased Question: "What steps do I need to take to efile?" (second action); since Williams (as modified) teaches selecting the second best action when the first best action is unable to be selected due to a constraint, and Podgorny teaches selecting the second action instead of the first action because of the time constraint, the combination of Williams (as modified) and Podgorny teaches the claim limitation].


As per claim 14, Williams, Yasuhiro and Podgorny teach the system according to claim 13.
Williams further teaches
the top-k action IDs are top-k alpha vectors [Fig. 2.9, page 18 shows 34 vectors] and each of the top-k alpha vectors have an associated action [page 17, 2nd paragraph, 34 vectors is shown in fig. 2.9, the upper surface of these 34 vectors represents V(b), the value of the optimal infinite-horizon policy, the leftmost vector gives the value of a conditional plan which starts with the doSave action, the rightmost vector gives the value of a conditional plan which starts with the doDelete action, and all of the other vectors give the value of conditional plans which start with the ask action; Fig. 2.8 shows top 5 conditional plans/vectors, condition plans are labelled with their initial actions: doSave, ask, doDelete].  

As per claim 15, Williams, Yasuhiro and Podgorny teach the system according to claim 13.
Williams further teaches
the top-k action IDs are identifiers of top-k actions associated with alpha vectors [Fig. 2.8, page 16, last paragraph, discloses the top 5 conditional plans; the conditional plans are labelled with their initial actions: doSave, ask, doDelete; Fig. 2.6, page 16, shows three conditional plans 1, 2 and 3: the top plan (higher value) is labelled with the doSave action, the middle plan is labelled with the ask action and the bottom plan is labelled with the doDelete action; page 17, 2nd paragraph, “34 vectors is shown in fig. 2.9 … the upper surface of these 34 vectors represents V(b), the value of the optimal infinite-horizon policy, the leftmost vector gives the value of a conditional plan which starts with the doSave action, the rightmost vector gives the value of a conditional plan which starts with the doDelete action, and all of the other vectors give the value of conditional plans which start with the ask action”].

As per claim 16, Williams teaches 
a computer [page iii, 1st paragraph, a computer] to perform a method [page 7, 1st paragraph, the goal of the machine is to choose actions in such a way as to maximize the expected return] comprising: 
reading, into a memory [page 52, last paragraph, the memory], a Partially Observed Markov Decision Process (POMDP) model [page 7, second paragraph, voicemail POMDP], the POMDP model having top-k action IDs for each of one or more belief state [Fig. 2.7 shows all 27 conditional plans for belief state b, Fig. 2.8, page 16, last paragraph, disclose top 5 conditional plans after 22 plans that do not contribute to the optimal policy are pruned, condition plans are labelled with their initial actions: doSave, ask, doDelete], the top-k action IDs maximizing expected long-term cumulative rewards in each time-step [page 6, last paragraph, “at each time-step t , the machine receives reward rt which depends on the current state and action, r(s, a), the cumulative, discounted reward accumulated by time-step t is written Vt”; page 11, last paragraph, “the machine’s task is to choose between a number of conditional plans to find the one which maximizes Vt “; page 7, first paragraph, “The cumulative, discounted, infinite horizon reward is called the return, denoted by V1, or simply V for short. The goal of the machine is to choose actions in such a way as to maximize the expected return E[V]”], and k being an integer of two or more determined by a number of alternative alpha vectors utilized in an execution time process of the POMDP model [Fig. 2.8 shows top 5 conditional plans after 22 plans that do not contribute to the optimal policy are pruned, condition plans are labelled with their initial actions: doSave, ask, doDelete; page 16 last paragraph-page 17 first paragraph, “plans which don’t contribute to the optimal policy are pruned, and in this example 22 of the 27 conditional plans are pruned, leaving 5 plans which contribute to the optimal policy. 
Figure 2.8 shows the values of those 2-step conditional plans which contribute to the optimal policy”], based on a point-based value iteration algorithm [page 15 last paragraph-page 16 first paragraph, “Each iteration of Algorithm 1 contains two steps. First, in the “generation” step, all potentially useful t-step conditional plans are created by enumerating all actions followed by all possible useful combinations of (t-1)-step plans. Then, in the “pruning” step, conditional plans which do not contribute to the optimal t-step policy are removed, leaving the set of useful t-step plans. The algorithm is repeated for T steps”; page 20, section 2.4, 1st paragraph, “Point-based value iteration (PBVI) … finds optimal conditional plans only at a finite set of N discrete belief points in belief space”; page 21, last paragraph, “PBVI generates only N . |A||O| possibly useful conditional plans in each iteration”], the alternative alpha vectors being calculated for each of the one or more belief states [page 9, section 2.2 Finding POMDP policies, “POMDP policies can take on various forms, including a collection of conditional plans … of belief space”; page 14 last paragraph-page 15 first paragraph, “finding the subset of possible t-step conditional plans which contribute to the optimal t-step policy. These conditional plans are called useful, and only useful t-step plans are considered when finding the (t + 1)-step optimal policy”; page 16 last paragraph-page 17 first paragraph, “plans which don’t contribute to the optimal policy are pruned, and in this example 22 of the 27 conditional plans are pruned, leaving 5 plans which contribute to the optimal policy”]; 
determining a set of top second (top-k2) alpha vectors [vectors with red marks (from the examiner) on the figure below] including a plurality of second best actions for each of the belief states [ask, doSave, doDelete, ask], each of the second best actions from the set of top-k2 alpha vectors being identified by a second best action ID based on a displayed graph including the first best action ID and the second best action ID [Figs. 2.6 or 2.8, the heavy line shows V1*(b): the optimal value (best action ID); page 16, second paragraph, “Figure 2.6 shows the first step of value iteration applied to the VOICEMAIL spoken dialog POMDP example problem … there are 3 possible conditional plans, one for each action. This figure shows the three 1-step conditional plan values {V11(b), V12(b), V13(b)} and the heavy line shows V1*, the value of the optimal 1-step policy”; it can be seen in Fig. 2.6 that the heavy line is the best action ID, where each action (doSave, ask and doDelete) has the highest value. Fig. 2.6 also displays the lighter line (marked with check marks by the examiner for ease of viewing) directly below the heavy line, which is the next best action ID (second best action ID), where each action (ask, doSave, doDelete, ask) has the next highest value], and being generated by iteratively calculating the top-k2 alpha vectors until convergence, and pruning alpha vectors other than the top-k2 alpha vectors for each of the belief states [page 15 last paragraph-page 16 first paragraph, “Each iteration of Algorithm 1 contains two steps. First, in the “generation” step, all potentially useful t-step conditional plans are created by enumerating all actions followed by all possible useful combinations of (t-1)-step plans. Then, in the “pruning” step, conditional plans which do not contribute to the optimal t-step policy are removed, leaving the set of useful t-step plans. The algorithm is repeated for T steps”; Fig. 2.8 shows the top 5 conditional plans after the 22 plans that do not contribute to the optimal policy are pruned; the conditional plans are labelled with their initial actions: doSave, ask, doDelete; page 16 last paragraph-page 17 first paragraph, “plans which don’t contribute to the optimal policy are pruned, and in this example 22 of the 27 conditional plans are pruned, leaving 5 plans which contribute to the optimal policy”];

[Image media_image1.png: figure annotated by the examiner]


Williams does not teach
a computer program product for selecting an action, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by  a computer;
in the execution-time process of the POMDP model, detecting a situation where an action identified by a first best action ID among the top-k action IDs for a current belief state is unable to be selected due to any of a plurality of constraints, the constraints including execution- time process constraints of the POMDP model; and 
selecting and executing the second best action identified by the second best action ID among the top-k action IDs and the set of top-k2 alpha vectors for the current belief state in response to a detection of the situation.  
Yasuhiro teaches
a computer program product for selecting an action, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a computer [paragraph 0118, the program may be downloaded from a recording medium into the computer via a communication line, and the program may be executed];
in the execution-time process of the POMDP model [paragraph 0008, “action control by POMDP which can automatically determine the action of the system according to the statistics of data”], detecting a situation where an action identified by a first best action ID [action (rank 1)] among the top-k action IDs [paragraph 0073, “an action at' (rank 1) corresponding to the best score and an action at' (rank 2) corresponding to the second best score are obtained”] for a current belief state is unable to be selected due to any of a plurality of constraints [paragraph 0090, “determining the action of the system to avoid action control in which only the same action is repeatedly performed many times (constraint)”; paragraph 0076, receiving an action at′ (rank 1) corresponding to the best score from the score calculation unit, determining whether or not the immediately preceding action at-1 is the same as the action at' (rank 1); paragraph 0078, “when the determination result indicates that “at−1 and at ′ (rank 1) are the same””; It can be understood that when actions at−1 and at ′ (rank 1) are the same, the system may not select action at' (rank 1) to prevent the same action from being repeated many times]; 
selecting and executing the second best action identified by the second best action ID [action rank 2] among the top-k action IDs and the set of top-k2 alpha vectors for the current belief state in response to a detection of the situation [paragraph 0078, when the determination result indicates that “at−1 and at ′ (rank 1) are the same”, the selection unit determines the action duration corresponding to the real-time action duration m of the action at ′ (rank 1) and the probability Pat' (rank 1) (m) is received; paragraph 0082, if the probability is smaller than the uniform random number, the action at' (rank 1) corresponding to the best score of the real-time action continuation length m is determined as the action at' to be taken by the system; if the probability is equal to or more than uniform random number, the action at'(rank 2) is determined as the action at' to be taken by the system; paragraph 0086, avoiding the action control that executes only the same action over and over again; since Yasuhiro teaches selecting the second best action ID, and Williams teaches each action doSave, ask, doDelete is associated with a vector from the top k alpha vectors (fig. 2.6), thus the combination of Williams and Yasuhiro read on the above limitation].
The rationale for combining Williams and Yasuhiro is the same as set forth for claim 13.
Williams and Yasuhiro do not teach
detecting a situation where an action identified by a first best action ID among the top-k action IDs for a current belief state is unable to be selected due to any of a plurality of constraints, the constraints including execution- time process constraints of the POMDP model (emphasis added); 
Podgorny teaches
detecting a situation where an action identified by a first best action ID among the top-k action IDs for a current belief state is unable to be selected due to any of a plurality of constraints, the constraints including execution-time process constraints of the POMDP model [abstract, “A question and answer based customer support system is provided through which users submit question data representing questions to be answered … and questions having a low quality question format are labeled improperly formatted questions”; Col. 4, line 67 – Col. 5, line 10, “professional agent support personnel time are wasted trying to answer a low quality/low value question … Worse yet, the longer time devoted to trying to answer the low quality/low value questions is often completely wasted because, by definition, neither the asking or searching users are likely to be satisfied with the answer data provided”; Col. 19, lines 37-44, “it was determined that it may be possible to transform the question type/format from a low quality format question … to a high quality format question … For example, asking the user to re-phrase/transform a "Why" type/format question into a closed-ended type/format question”; Fig. 2C recites “avoid asking “why” … We suggest re-phrasing your question. You’ll get better, faster response”; Col. 21, Original Question: "I don't understand why I can't efile" (first action) … Re-Phrased Question: "What steps do I need to take to efile?" (second action); since Williams (as modified) teaches selecting the second best action when the first best action is unable to be selected due to a constraint, and Podgorny teaches selecting the second action instead of the first action because of the time constraint, the combination of Williams (as modified) and Podgorny teaches the claim limitation].
Claim 16 is rejected using the same rationale as claim 13.

As per claim 17, Williams, Yasuhiro and Podgorny teach the computer program product according to claim 16.
Williams further teaches
the top-k action IDs are top-k alpha vectors [Fig. 2.9, page 18 shows 34 vectors] and each of the top-k alpha vectors have an associated action [page 17, 2nd paragraph, 34 vectors is shown in fig. 2.9, the upper surface of these 34 vectors represents V(b), the value of the optimal infinite-horizon policy, the leftmost vector gives the value of a conditional plan which starts with the doSave action, the rightmost vector gives the value of a conditional plan which starts with the doDelete action, and all of the other vectors give the value of conditional plans which start with the ask action; Fig. 2.8 shows top 5 conditional plans/vectors, condition plans are labelled with their initial actions: doSave, ask, doDelete].  

As per claim 18, Williams, Yasuhiro and Podgorny teach the computer program product according to claim 16.
Williams further teaches
the top-k action IDs are identifiers of top-k actions associated with alpha vectors [Fig. 2.8, page 16, last paragraph, discloses the top 5 conditional plans, conditional plans are labelled with their initial actions: doSave, ask, doDelete; Fig. 2.6, page 16 shows three conditional plans 1, 2 and 3, the plan from the top (higher value) is labelled with the doSave action, the middle plan is labelled with the ask action and the bottom plan is labelled with the doDelete action; page 17, 2nd paragraph, “34 vectors is shown in fig. 2.9 … the upper surface of these 34 vectors represents V(b), the value of the optimal infinite-horizon policy, the leftmost vector gives the value of a conditional plan which starts with the doSave action, the rightmost vector gives the value of a conditional plan which starts with the doDelete action, and all of the other vectors give the value of conditional plans which start with the ask action”].

As per claim 19, Williams, Yasuhiro and Podgorny teach the computer program product according to claim 17.
Williams further teaches
alpha vectors other than the top-k alpha vectors are pruned when the top-k alpha vectors are selected [page 17, 2nd paragraph, 34 vectors are shown in Fig. 2.9, the upper surface of these 34 vectors represents V(b), the value of the optimal infinite-horizon policy, the leftmost vector gives the value of a conditional plan which starts with the doSave action, the rightmost vector gives the value of a conditional plan which starts with the doDelete action, and all of the other vectors give the value of conditional plans which start with the ask action; Fig. 2.7 shows all 27 conditional plans/vectors for belief state b; Fig. 2.8, page 16, last paragraph, discloses the top 5 conditional plans after the 22 plans that do not contribute to the optimal policy are pruned, conditional plans are labelled with their initial actions: doSave, ask, doDelete].

As per claim 20, Williams, Yasuhiro and Podgorny teach the computer program product according to claim 18.
Williams further teaches
alpha vectors other than the alpha vectors associated with the top-k action IDs are pruned when the top-k actions are selected [page 17, 2nd paragraph, 34 vectors are shown in Fig. 2.9, the upper surface of these 34 vectors represents V(b), the value of the optimal infinite-horizon policy, the leftmost vector gives the value of a conditional plan which starts with the doSave action, the rightmost vector gives the value of a conditional plan which starts with the doDelete action, and all of the other vectors give the value of conditional plans which start with the ask action; Fig. 2.7 shows all 27 conditional plans/vectors for belief state b; Fig. 2.8, page 16, last paragraph, discloses the top 5 conditional plans after the 22 plans that do not contribute to the optimal policy are pruned, conditional plans are labelled with their initial actions: doSave, ask, doDelete].
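
For illustration only, the relationship among the claim terms mapped above (alpha vectors with associated actions, top-k selection with the remaining vectors dropped, and falling back to the next-ranked action when the first best action is unable to be selected) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Williams, Yasuhiro, or Podgorny: the numeric values, the two-state belief, the function names, and the constraint check are all hypothetical. Ranking at a single belief state is a simplification and is not the pruning procedure that produces Fig. 2.8 of Williams.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical representation: (action name, value of the plan in each hidden state).
AlphaVector = Tuple[str, Sequence[float]]

def value(alpha: AlphaVector, belief: Sequence[float]) -> float:
    """Expected value of an alpha vector at a belief state (a dot product)."""
    return sum(v * b for v, b in zip(alpha[1], belief))

def top_k(alphas: List[AlphaVector], belief: Sequence[float], k: int) -> List[AlphaVector]:
    """Rank alpha vectors by value at this belief and keep only the best k.
    Vectors outside the top k are dropped (a simplified stand-in for pruning)."""
    return sorted(alphas, key=lambda a: value(a, belief), reverse=True)[:k]

def select_action(ranked: List[AlphaVector],
                  is_allowed: Callable[[str], bool]) -> Optional[str]:
    """Return the highest-ranked action that satisfies the constraint check,
    falling back to the next-ranked action when the best one is unavailable."""
    for action, _ in ranked:
        if is_allowed(action):
            return action
    return None

# Made-up values over two hidden states (user wants save, user wants delete).
alphas = [
    ("doSave", [5.0, -10.0]),
    ("ask", [1.0, 1.0]),
    ("doDelete", [-20.0, 5.0]),
]
belief = [0.9, 0.1]

ranked = top_k(alphas, belief, k=2)
# Hypothetical execution-time constraint: doSave cannot be executed right now.
chosen = select_action(ranked, lambda action: action != "doSave")
```

With these assumed values, the two retained vectors are those labelled doSave and ask, and the constraint causes the second-ranked action (ask) to be chosen.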

Prior Art

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Osogami (US Patent 9,747,616) describes a method of calculating an optimum policy in a transition model having completely observable visible states and unobservable hidden states.
Williams (US Pub. 2014/0330554) describes a method of combining manual design of spoken dialog systems with an automatic learning approach.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRI T NGUYEN whose telephone number is 571-272-0103.  The examiner can normally be reached on M-F, 8 AM-5 PM (CT).

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/T. N./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123