Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-6, 8-13, 15-16 and 18-20 are pending.  Claims 1, 8 and 15 are independent and have been amended.  
This Application was published as U.S. 2019/0206402.
Priority used for search is 12/29/2017.

This Application is related to the following U.S. applications:
16/233,539, U.S. 20190202061.
16/233,566, issued as U.S. 10567570.
16/233,640,
16/233,678, issued as U.S. 11222632.
16/233,716, U.S. 20190206407.
16/233,786, issued as U.S. 11003860.
16/233,829, issued as U.S. 11024294.
16/233,939, issued as U.S. 10967508.
16/233,986, issued as U.S. 10994421.
16/234,041, NOA mailed 1/19/2022.


This action is Final.

Response to Arguments
Applicant’s arguments are moot in view of the new grounds of rejection.

Suggestion:
Considering that both the instant Application and Akolkar have the same goal of “reduce the iterations for the questioning and answering” (Akolkar [0118]), the distinction comes down to whether and how their particular methods of making the dialog more efficient differ.  See the equation in [0107]-[0113] of the published Application, which maximizes an expectation function that includes a PG factor.
The difference may be in the “Joint -PG” feature that is mentioned in Claim 4 (11 and 18) but is not defined with particularity.  The aspect of “Space” and “Time,” taken together, needs to be defined in a manner that distinguishes it from Moturu.  Another aspect is the concept of “scene” that is mentioned broadly and loosely in the Claim but has a specific definition in the Specification of the instant Application:  “[0086] In representing a user's ongoing mindset, different world models 730-5 and a dialogue context 730-6 established from the perspective of the user are used to represent the mindset of the user. The world models 730-5 for a user may be set up based on spatial, temporal, and causal representations. Such representations may be derived based on what is observed in the dialogue and characterizes, e.g., what is observed in the scene (e.g., objects such as desk, chair, computer on the desk, a toy on the floor, etc.) and how they are related (e.g., how objects spatially related). In this exemplary implementation, such representations may be developed using AND-OR graphs or AOG. Spatial representation of objects may be represented by S-AOG. The temporal representation of the objects over time may be represented by T-AOG (e.g., what is done on which object over time). Any causal relationship between temporal actions and spatial representation may be represented by C-AOG, e.g., when an action of moving a chair is performed (the spatial location of the chair is changed).”

Summary of Reasons why Akolkar and Moturu teach the Claim
Amended Claim 1 provides:
1. A method implemented on at least one machine including at least one processor, storage, and communication platform capable of connecting to a network for an automated dialogue companion, 
the method comprising: 
receiving multimodal input data associated with a user engaged in a dialogue with a predetermined goal on a certain topic in a dialogue scene, wherein the dialogue is managed based on a dialogue tree having a plurality of nodes, each of which is associated with a utility and some of which have branches representing alternative conversations of the dialogue, the multimodal input data capture a communication from the user and information surrounding the dialogue scene;
analyzing the multimodal input data to generate a current state of the dialogue and a context of the dialogue, wherein the current state of the dialogue corresponds to a node in the dialogue tree; 
accessing first utilities associated with first one or more branches of the node with respect to the current state of the dialogue, wherein the first utilities characterize effectiveness of different dialogue strategies represented by the first one or more branches with respect to the user; and
determining a response communication to be conveyed to the user in response to the communication in accordance with the first utilities and second utilities associated with respectively a plurality of branches of the first one or more branches, wherein the response communication maximizes look-ahead expected utilities given the current state, 
wherein both the first and second utilities are learned based on historic dialogue data with respect to the goal of the dialogue on the certain topic. 

In short, Akolkar teaches all of the limitations and aspects and policy/goal of the Claim but for two details: 
(1) Akolkar does not teach a multi-modal input or taking context (environmental and sensor data) into account.  Moturu teaches both, and combining Moturu with Akolkar is logical because, for example, in the health scenario of Moturu, an Interactive Voice Response system in dialog with a patient would use sensor data such as blood pressure, in addition to answers from the user/patient, to decide what question to ask next.
(2) Akolkar does not teach the “Dialog Tree” of the Claim.  Unlike the “Multi-modal input” of (1), this would be a key and fundamental feature.  However, Akolkar teaches the use of a directed graph G(V,E) for modeling the logic flow, and the information of the conversation logic flow is stored as a Finite State Machine (FSM) where the “topology of the logic flow can be freely changed by adding more context states and/or updating transitions.”  (Akolkar [0109].)  A Decision Tree/Dialog Tree is a special case and a simplified version of an FSM.  Note the number of times the instant Application refers to a “node” as the “current state of the dialogue” (see, e.g., Claim 6).  Thus, this key feature is effectively taught by Akolkar and buttressed by the combination with Moturu, which expressly includes a “Decision Tree” for dialog.
Further, the key concept of the Claim is reflected in “accessing first utilities associated with first one or more branches of the node with respect to the current state of the dialogue, wherein the first utilities characterize effectiveness of different dialogue strategies represented by the first one or more branches with respect to the user.”  This is very nicely taught by eff(Q) of Akolkar and the teachings of [0118]-[0127] on p. 9 of Akolkar.
With respect to the “Dialogue Tree” that is emphasized in the Arguments see the following parts of the instant Application:
[0060] FIG. 4B illustrates a part of a dialogue tree of an on-going dialogue with paths taken based on interactions between the automated companion and a user, according to an embodiment of the present teaching. In this illustrated example, the dialogue management at layer 3 (of the automated companion) may predict multiple paths with which a dialogue, or more generally an interaction, with a user may proceed. In this example, each node may represent a point of the current state of the dialogue and each branch from a node may represent possible responses from a user. As shown in this example, at node 1, the automated companion may have three separate paths which may be taken depending on a response detected from a user. If the user responds with an affirmative response, dialogue tree 400 may proceed from node 1 to node 2. At node 2, a response may be generated for the automated companion in response to the affirmative response from the user and may then be rendered to the user, which may include audio, visual, textual, haptic, or any combination thereof [0061] If, at node 1, the user responds negatively, the path is for this stage is from node 1 to node 10. If the user responds, at node 1, with a "so-so" response (e.g., not negative but also not positive), dialogue tree 400 may proceed to node 3, at which a response from the automated companion may be rendered and there may be three separate possible responses from the user, "No response," "Positive Response," and "Negative response," corresponding to nodes 5, 6, and 7, respectively. Depending on the user's actual response with respect to the automated companion's response rendered at node 3, the dialogue management at layer 3 may then follow the dialogue accordingly. For instance, if the user responds at node 3 with a positive response, the automated companion moves to respond to the user at node 6. 
Similarly, depending on the user's reaction to the automated companion's response at node 6, the user may further respond with an answer that is correct. In this case, the dialogue state moves from node 6 to node 8, etc. In this illustrated example, the dialogue state during this period moved from node 1, to node 3, to node 6, and to node 8. The traversal through nodes 1, 3, 6, and 8 forms a path consistent with the underlying conversation between the automated companion and a user. As shown in FIG. 4B, the path representing the dialogue is represented by the solid lines connecting nodes 1, 3, 6, and 8, whereas the paths skipped during a dialogue is represented by the dashed lines.


[Figure 4B of the instant Application, reproduced as an image in the original action: media_image1.png]

Note the similarity of Figure 4B, with its Yes and No paths, to the formulation of eff(Q) in Akolkar, which depends on the probabilities of Yes and No at each state/node.
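For illustration only, the dialogue tree described in [0060]-[0061] of the published Application can be sketched as a mapping from each node to its outgoing branches.  The node numbers and response labels follow the quoted description of FIG. 4B; the dictionary layout and the function name follow_path are assumptions made for this sketch and are not taken from the record.

```python
# Illustrative sketch only. Node numbers and response labels follow the quoted
# description of FIG. 4B ([0060]-[0061]); the dict layout is an assumption.
DIALOGUE_TREE = {
    1: {"affirmative": 2, "negative": 10, "so-so": 3},
    3: {"no response": 5, "positive": 6, "negative": 7},
    6: {"correct answer": 8},
}


def follow_path(tree, start, responses):
    """Return the nodes visited when the user gives `responses` in order."""
    path = [start]
    node = start
    for response in responses:
        node = tree[node][response]  # take the branch matching the response
        path.append(node)
    return path


# The path drawn with solid lines in FIG. 4B: node 1, then 3, then 6, then 8.
path = follow_path(DIALOGUE_TREE, 1, ["so-so", "positive", "correct answer"])
print(path)
```

Each key of the outer mapping plays the role of a "state" of the conversation and each inner entry plays the role of a branch, which is the correspondence relied on in the comparison with Akolkar.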

Response to Arguments in Detail
Akolkar
Applicant argues:

[Excerpts from Applicant's Response, reproduced as images in the original action: media_image2.png through media_image6.png]

Response 14-15.

Applicant describes the function and goal of the instant Application and therefore its Claims as presenting an optimized set of questions to the user of an IVR that minimizes the path (number of questions) to the goal of the user.  In other words, instead of asking a large number of questions to get the necessary information, it tries to ask the minimum number of questions from the user.
Akolkar, as characterized by the Response (pp. 15-16), has the same goal of getting the required information out of the user with the minimum number of questions and minimum annoyance to the user.
Thus, so far, the Application and the Reference have a common goal and parallel features.
Then, the Applicant argues that Akolkar achieves its goal by a method that is different from that of the Claims because Akolkar does not teach a “dialogue tree with multiple nodes, some of such nodes have branches, and each node associated with personalized utilities learned based on historic dialogue data of a user.”  Response 15.2
According to the Applicant’s argument, Akolkar has two shortcomings:
1- Akolkar does not teach the use of a “dialogue tree with multiple nodes.”
2- Akolkar does not teach that the personal history of a single user is taken into consideration.

In Reply:
Regarding “dialogue tree”:
Akolkar teaches a “Dialog Engine 420” in Figure 4 which identifies “a subset of candidate information technology services … provided by a plurality of vendors” and a finite state diagram with an ontology of services in Figure 8, each of which services, as shown in Figure 8, may include several functions/actions.  Akolkar optimizes the effectiveness of a sequence of questions, and the most effective sequence is the one selected.  Each question by the machine is an action.  See Akolkar [0119] to [0125].  This is consistent with the definition of “action” in the instant Application ([0079] “… delivering some audio response …”).
As provided in the rejection of Claim 1, Akolkar optimizes the sequence of questions, which teaches the “response” of the machine to the human user and is also an “action” by the machine (consistent with the definition of “action” in the instant Application ([0079] “… delivering some audio response …”)): “[0125] From the perspective of the traditional optimization problem, the goal of picking the question sequence is to find the one that maximizes the effectiveness ….”  Note the way the Claim defines “Utility”: “wherein the utilities characterize effectiveness of different dialogue strategies with respect to the user under different circumstances and are learned based on historic dialogue data and corresponding contexts of the historic dialogue data;”  “[0126] In at least some instances, the best question sequence cannot be obtained via pre-computing, because the variables … used for computing the effectiveness of a single question would change according to the answer of the previous question, making eff(Qi+1) depend on the answer of Qi. Therefore, one or more embodiments of CSM dynamically compute the next best question based on the previous answer on-the-fly….”  The Argmax in [0125] and [0126] works on iteration and is recursive, as also supported by Akolkar “[0119] In one or more embodiments, to reduce the number of iterations …”   Akolkar also teaches finding the “next best question,” which teaches the “look ahead action” of the Claim.  Finally, the “utilities” are based on user interaction history and reflect the preferences of the user.  See Akolkar [0123].
The selection of the actions/questions in order to maximize a “utility”/effectiveness is already present in Akolkar.  Akolkar uses an FSM and not a dialog tree.  FSMs have states, and state transitions cause some output/action.  An FSM is a more sophisticated decision tree and a decision tree is a simplified FSM.  Note paragraphs [0060] and [0061] of the published Application that describe the Dialogue Tree of Figure 4B of the instant Application and equate “state” of the conversation with “node” of the Dialogue Tree.
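For illustration only, the on-the-fly selection of the “next best question” discussed above might be sketched as a greedy loop that recomputes the expected number of pruned candidates after every answer.  This is a minimal sketch under stated assumptions, not Akolkar's implementation: the service records, question names, probabilities, and the functions next_best_question and run_dialogue are all hypothetical.

```python
# Illustrative sketch only; all names and data below are hypothetical.
def next_best_question(candidates, questions, answer_probs):
    """Pick the question with the largest expected number of pruned candidates."""
    def expected_pruned(q):
        total = 0.0
        for answer, p in answer_probs[q].items():
            kept = [c for c in candidates if questions[q](c) == answer]
            total += p * (len(candidates) - len(kept))
        return total
    return max(questions, key=expected_pruned)


def run_dialogue(candidates, questions, answer_probs, ask):
    """Ask questions one at a time, re-ranking after each answer."""
    remaining = list(candidates)
    pending = dict(questions)
    asked = []
    while len(remaining) > 1 and pending:
        q = next_best_question(remaining, pending, answer_probs)
        answer = ask(q)
        # Keep only the candidates consistent with the answer just given.
        remaining = [c for c in remaining if pending[q](c) == answer]
        asked.append(q)
        del pending[q]
    return remaining, asked


SERVICES = [
    {"name": "A", "free": True, "cloud": True},
    {"name": "B", "free": False, "cloud": True},
    {"name": "C", "free": False, "cloud": False},
    {"name": "D", "free": False, "cloud": True},
]
QUESTIONS = {
    "free?": lambda s: "yes" if s["free"] else "no",
    "cloud?": lambda s: "yes" if s["cloud"] else "no",
}
ANSWER_PROBS = {
    "free?": {"yes": 0.5, "no": 0.5},
    "cloud?": {"yes": 0.9, "no": 0.1},
}
SCRIPTED_ANSWERS = {"free?": "no", "cloud?": "no"}

remaining, asked = run_dialogue(SERVICES, QUESTIONS, ANSWER_PROBS, SCRIPTED_ANSWERS.get)
```

Because the ranking is recomputed over the remaining candidates at each step, the score of a later question depends on the answer to the earlier one, which is the dependency quoted from [0126].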
Further, as provided by a reference that is cited as evidence: 
“[0059] In general, a FSM may be described as a tree where the nodes of the tree represent the state of the FSM and the branches or directed edges are used to describe possible state transitions. Associated with each branch is an input-output label wherein the input represents the k-bit input and the output, the n-bit output. A path through the tree may be described as a concatenated sequence of branches that follow the tree. For example, in FIG. 1C a portion 115 of the tree for the FSM of the BCC 100 is shown.”  (See Heegard (U.S. 20020037059).)
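For illustration only, the relationship described in the Heegard passage, in which the paths of an FSM are laid out as a tree whose nodes are states and whose branches are transitions, can be sketched as follows.  The state names, transition labels, and the function unroll_fsm are hypothetical and are not taken from Akolkar, Heegard, or the instant Application.

```python
# Illustrative sketch only; state names and transition labels are hypothetical.
def unroll_fsm(transitions, state, depth):
    """Unroll an FSM from `state` into a tree of nested dicts, `depth` transitions deep."""
    if depth == 0:
        return {"state": state, "branches": {}}
    branches = {
        label: unroll_fsm(transitions, next_state, depth - 1)
        for label, next_state in transitions.get(state, {}).items()
    }
    return {"state": state, "branches": branches}


# A two-state machine with a self-loop; unrolling turns the loop into repeated nodes.
TRANSITIONS = {
    "identify_category": {"matched": "filter_services", "unclear": "identify_category"},
    "filter_services": {"done": "identify_category"},
}

tree = unroll_fsm(TRANSITIONS, "identify_category", 2)
```

Every node of the resulting tree carries a state of the machine, and a cycle in the machine simply appears as the same state at more than one node.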

	Regarding the use of “personal history of a single user”:
In Akolkar, the probability associated with each potential answer by the user in the formula eff(Q) which measures the effectiveness of a question Q (see [0121]) is obtained from the “customer history” which teaches the “historic dialogue data” of the Claim.  “[0123] In the above formulas, the probabilities (p(yes), p(no), p(oi)) can be estimated, for example, via an empirical distribution obtained from customer history. ….”  In the above, oi is the ith option.
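For illustration only, the following sketch shows how answer probabilities estimated from an empirical distribution over past answers could feed an effectiveness score of the kind described in Akolkar [0119]-[0123].  The expectation form used here (probability of each answer times the number of candidates that answer would prune) is an assumption drawn from the phrase “expected number of candidates it can prune”; Akolkar's equations are not reproduced in this action, and the function names and sample data are hypothetical.

```python
# Illustrative sketch only; function names and sample data are hypothetical.
from collections import Counter


def empirical_probabilities(past_answers):
    """Estimate p(answer) as the relative frequency of each answer in the history."""
    counts = Counter(past_answers)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


def question_effectiveness(probabilities, pruned_by_answer):
    """Expected number of pruned candidates (assumed form of eff(Qi))."""
    return sum(p * pruned_by_answer.get(answer, 0) for answer, p in probabilities.items())


def sequence_effectiveness(per_question_scores):
    """eff(Q) as the sum of eff(Qi) over the sequence, per the quoted [0119]."""
    return sum(per_question_scores)


history = ["yes", "yes", "no", "yes"]  # hypothetical customer history
probs = empirical_probabilities(history)
eff_q1 = question_effectiveness(probs, {"yes": 4, "no": 8})
eff_sequence = sequence_effectiveness([eff_q1, 1.5])
```

With this hypothetical history, three of four past answers are "yes", so the score weights the "yes" outcome three times as heavily as the "no" outcome.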

Finally, both of the above features were mapped also to the secondary reference Moturu.

Moturu
Regarding Moturu, the Applicant argues:

[Excerpts from Applicant's Response, reproduced as images in the original action: media_image7.png and media_image8.png]

Response 15-16.

Regarding Moturu, Applicant presents two arguments:
1- Moturu does not teach the use of a dialogue tree with multiple nodes ….
2- Moturu (or Akolkar) does not teach or suggest the “look-ahead based optimization” that is claimed in “accessing first utilities associated with first one or more branches of the node with respect to the current state of the dialogue, wherein the first utilities characterize effectiveness of different dialogue strategies represented by the first one or more branches with respect to the user; and determining a response communication to be conveyed to the user in response to the communication in accordance with the first utilities and second utilities associated with respectively a plurality of branches of the first one or more branches, wherein the response communication maximizes look-ahead expected utilities given the current state, wherein both the first and second utilities are learned based on historic dialogue data with respect to the goal of the dialogue on the certain topic.”

In Reply, Moturu teaches the use of a decision tree with nodes and branches for the “communication model” that is used to determine a communication plan in Figure 3, S150.  “[0050] Determining a communication plan in Block S150 can additionally or alternatively include generating and/or applying a communication model. Communication models preferably output one or more components of a communication plan based on communication-related features and/or datasets ….  In another variation, applying a communication model can include applying a communication decision tree model, such as a decision tree model including internal nodes and branches selected based on correlations between automated communications and user outcomes (e.g., in relation to user conditions)….”
Moturu teaches that the communication model applied at step S150 of Figures 1 and 2, which determines a “tailored communication plan” to converse with the patient/user maximizes a reward which optimizes user outcome based on all the inputs shown in steps S110, S120, S130, S140, and S145 of Figures 1 and 2.  “[0051] … In specific examples, Block S150 can include training and applying a reinforcement learning model (e.g., deep reinforcement learning model), such as a reinforcement learning model for maximizing a reward (e.g., determining components of a communication plan for optimizing user outcomes, improving user conditions, user openness, and/or other suitable user parameters, etc.); a reinforcement learning model (e.g., inverse reinforcement learning model) for mimicking an observed behavior (e.g., care provider communication behavior in user-provider communications where the user was receptive, etc.); and/or any other suitable type of reinforcement learning models. ….”  Accordingly, Moturu also maximizes a reward associated with a communication plan and a communication plan inherently includes at least two consecutive actions (question – answer, for example) which teach the “combination of the action and at least one look-ahead action” of the Claim.  Further see:  “[0039] Blocks S110-S145 can thus provide passive data (e.g., unobtrusively collected data) and/or active data (e.g., survey data) that can be taken as inputs in Block S150 and/or other portions of the method 100 in order to generate communication plans tailored to past, present, and/or future user behaviors, user conditions, care provider behaviors, and/or any other suitable aspects associated with a user-provider relationship….”  A “plan” by definition is a “look-ahead” model.  See Figure 11: “Query Plan” / “Communication Plan.”
Moturu also uses a “learning model” which teaches the “learned utilities” of the Claim.  See Figure 8 which shows the “update” of the communication plan according to “user engagement.”  See [0018] for “tailored, dynamically modifiable communication plan.”
Accordingly, both the primary and secondary references teach the “maximizes look-ahead expected utilities” by considering at least two consecutive “actions”: question-response sets.
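For illustration only, one possible reading of “maximizes look-ahead expected utilities” over two consecutive actions is sketched below: each candidate response is scored by its own (first) utility plus the probability-weighted (second) utilities of the branches that may follow it.  The scoring rule, the branch names, and all numeric values are assumptions made for this sketch; they are not taken from the Claim, the Specification, Akolkar, or Moturu.

```python
# Illustrative sketch only; the scoring rule and all values are assumptions.
def look_ahead_value(first_utility, second_branches):
    """First-level utility plus the expected utility of the follow-on branches.

    `second_branches` is a list of (probability, utility) pairs.
    """
    return first_utility + sum(p * u for p, u in second_branches)


def choose_response(branches):
    """Return the branch name with the highest look-ahead value."""
    return max(branches, key=lambda name: look_ahead_value(*branches[name]))


BRANCHES = {
    # name: (first utility, [(probability, second utility), ...])
    "ask_follow_up": (0.4, [(0.5, 1.0), (0.5, 0.2)]),
    "give_hint": (0.6, [(0.7, 0.3), (0.3, 0.1)]),
}

best = choose_response(BRANCHES)
```

In this hypothetical data the branch with the lower first utility is chosen because its follow-on branches are worth more in expectation, which is the difference between a one-step choice and a look-ahead choice.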

35 U.S.C. 112(f) Claim Interpretation
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. 
Such claim limitation(s) is/are: “device” and “user interaction engine” and “dialogue manager” in Claim 15 and “utility learning engine” in Claim 19.  These limitations are generic in the context of the art, do not refer to any specific structure, and only serve as placeholders for the structure that performs the associated function(s) without providing any information about what that structure is.  MPEP 2181 I A.
Applicant has acknowledged the interpretation.  (Applicant’s Response, p. 14.)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-6, 8, 12-13, 15, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Akolkar (U.S. 2014/0337010) in view of Moturu (U.S. 2017/0213007).
Regarding Claim 1, Akolkar teaches:
1. A method implemented on at least one machine including at least one processor, storage, and communication platform capable of connecting to a network for an automated dialogue companion, [Akolkar, see Figure 1 provided above for the hardware components including “processing unit 16,” “memory 28,” and “I/O interfaces 22.”  Figures 4 and 11 showing the “conversational interface 416” and Figure 4 showing the “dialog engine 420.”]

[Figures from Akolkar, reproduced as images in the original action: media_image9.png, media_image10.png, and media_image11.png]

the method comprising: 
receiving multimodal input data associated with a user engaged in a dialogue with a predetermined goal on a certain topic in a dialogue scene, [Input in Akolkar is in “natural language” form and Figure 17 shows examples of “initial input” such as “I want to pay my employees” or “I want to arrange a company event.”  The examples of Figure 17 show that the user is “engaged in a dialogue with a predetermined goal on a certain topic.” The “candidate services” under the column “#Category Candidates” of Figure 17 teach the “topics” of the Claim such as “Events, Music, Travel, Government, Calendar,” for example.  Akolkar does not specify the spoken or written form of the input only that it is in natural language and subject to semantic analysis.]  [Akolkar does not collect sensor or multimodal data.]

[Figure from Akolkar, reproduced as an image in the original action: media_image12.png]

wherein the dialogue is managed based on a dialogue tree having a plurality of nodes, each of which is associated with a utility and some of which have branches representing alternative conversations of the dialogue, the multimodal input data capture a communication from the user and information surrounding the dialogue scene; [ Akolkar does not mention a “dialog tree” but a FSM (finite state machine) is very much like a dialog tree; a FSM includes states/nodes/Vertex and transitions between states/branches/Edge.  “[0109] G(V, E), where vertex V denotes the context state and edge E denotes the transition between contexts. FIG. 12 shows exemplary topology of the logic flow. Based on the discussion with domain experts, initially define eight main context states 1202, 1204, 1230, 1232, 1234, 1236, 1238, 1240 (along with several exception handling states, omitted for clarity and brevity) and fourteen transitions (labeled arrows between the states). One or more embodiments of CSM {cloud services marketplace} use a finite state machine to store the information of the conversation logic flow. The topology of the logic flow can be freely changed by adding more context states and/or updating transitions.”] 

[Figure from Akolkar, reproduced as an image in the original action: media_image13.png]

analyzing the multimodal input data to generate a current state of the dialogue and a context of the dialogue, wherein the current state of the dialogue corresponds to a node in the dialogue tree; [Akolkar teaches that the current state of the dialog is represented by a Vertex of FSM which is like the node of a tree.  “[0105] … At the very beginning, the customer 1102 tells CSM about his or her requirement via the conversational interface 416…. Receiving the retrieved data, the Dialog {Engine} updates the meta-data accordingly, including the user's input and the index of candidate services, and, at the same time, updates the conversation context 1104 according to the logic flow in FIG. 12 to continue the conversation….”  In Akolkar “conversation context” and “state” are the same.  “[0110] At the beginning of each conversation (recognized as the creation of a new session), Dialog Engine 420 creates the profile for the current session, and sets the current state as Service Category Identification 1202. In this state, CSM assumes that the customer's input is related to the scenario of looking for proper service categories. ….”  “[0111] More particularly, the customer inputs a requirement at 1202 and the system attempts to identify the corresponding pertinent category. …”  The State/Node/Vertex is updated as more input is provided by the user. ]
accessing first utilities associated with first one or more branches of the node with respect to the current state of the dialogue, [Akolkar, Figure 12, shows going from a node/state/vertex along a branch/transition/edge to another state and each of these transitions/branches have a certain “effectiveness” / “utility” in getting the user to his final goal.  Akolkar maximizes the effectiveness/utility of a line of questions and that is how it decides which question to ask next/ which branch to take.  “First Utilities” are defined as the “effectiveness of a dialog strategy” and Akolkar teaches calculating the “effectiveness of a question sequence” or “eff(Q)” which teaches the “First Utilities” of the Claim.  See below for quotes from Akolkar.  Figure 12 shows the state diagram (FSM) used by Akolkar where the states/Vertices teach the “nodes” of the Claim.  From each state/node/vertex going to each of the possible transitions/branches/edges effectuates a different degree of effectiveness (eff(Q)) /utility.  See the description of “Service Filtering” [0113]-[0130] and particularly [0125]-[0126] where eff*(Q) is discussed as the “best question sequence” which maximizes the effectiveness of the overall strategy and permits the user to reach his goal faster.  The “Conversation Flow Control” uses a “cloud services marketplace (CSM).”]
wherein the first utilities characterize effectiveness of different dialogue strategies represented by the first one or more branches with respect to the user; and [Akolkar teaches that to best assist the user with the most appropriate service, the system has to ask more questions. But on the other hand, too many questions irritate the user and therefore the system tries to come up with an optimal sequence of questions to ask to get to the goal of the user faster and with the least irritation.  The “sequence of questions Q” in Akolkar teaches the “dialogue strategy” of the Claim.  The “eff(Q)” in Akolkar teaches the “first utilities” of the Claim because it characterizes the “effectiveness of different dialogue strategies.”  See [0119] below.  “[0118] The more questions the customer answers, the more unsatisfactory services can be pruned. However, too many questions may degrade the quality of the user experience. To make the service filtering effective, one or more embodiments provide a novel method referred to as Iteration-Min to reduce the iterations for the questioning and answering.”  “[0119] In one or more embodiments, to reduce the number of iterations, find a sequence of questions Q={Q1, Q2 . . . Qn} with the least length to rule out all unsatisfied candidate services via capability or configuration. Quantitatively, use eff(Q) to evaluate how effectively the sequence can filter the candidates. The effectiveness of a question sequence can be considered as the sum of the effectiveness of all its questions, i.e., eff(Q)=Ʃi eff(Qi). Concretely, the effectiveness of a question is qualified as the expected number of candidates it can prune, i.e. nprune, based on the customer's potential answer.”  “[0123] In the above formulas, the probabilities (p(yes), p(no), p(oi)) can be estimated, for example, via an empirical distribution obtained from customer history. ….”  “[0109] … FIG. 12 shows exemplary topology of the logic flow. Based on the discussion with domain experts, initially define eight main context states 1202, 1204, 1230, 1232, 1234, 1236, 1238, 1240 (along with several exception handling states, omitted for clarity and brevity) and fourteen transitions (labeled arrows between the states)….”]
determining a response communication to be conveyed to the user in response to the communication in accordance with the first utilities and second utilities associated with respectively a plurality of branches of the first one or more branches, [Akolkar, Figure 11, the “dialog Engine 420” decides the “response” of the machine which may be another “question” based on the “exemplary logic flow” shown in Figure 12 which is in the form of a “finite state machine.”  (See [0109].)  The finite state machine of Figure 12 is based on the current state of the dialog and includes “service filtering 1232 and 1204” steps which “prunes” the unsatisfactory services ([0118]).  The pruning is related to the effectiveness of the “sequence of questions Q” which is eff(Q)/utility.  ([0119]-[0121]).  So the response is determined using the finite state machine of Figure 12, shown as G(V,E) as a function of a current state of the conversation and the eff(Q)/utility of a “sequence of questions.”  The “utility” of the Claim is taught by effectiveness function eff(Q) of Akolkar which changes dynamically as the conversation proceeds and teaches first, second, etc utility of the Claim.  “[0126] In at least some instances, the best question sequence cannot be obtained via pre-computing, because the variables … used for computing the effectiveness of a single question would change according to the answer of the previous question, making eff(Qi+1) depend on the answer of Qi. Therefore, one or more embodiments of CSM dynamically compute the next best question based on the previous answer on-the-fly. …”   “Utility” is defined as “first utilities characterize effectiveness of different dialogue strategies represented by the first one or more branches” which determines whether it is better to go down the first branch or the second branch, for example.  
The equation for eff(Q) in Akolkar ([0120], equation 1) includes p(yes) and p(no), and equation 2 in [0121] includes p(oi), which is the probability that the user selects option oi.  Each of Yes, No, or option oi sends the dialog, and hence the FSM (or decision tree), down a different branch and teaches “first utilities and second utilities associated with respectively a plurality of branches” of the Claim.]
wherein the response communication maximizes look-ahead expected utilities given the current state, [Akolkar, for “Look-ahead expected utilities given the current state” see [0126] and equation (5) where “Therefore, one or more embodiments of CSM dynamically compute the next best question based on the previous answer on-the-fly.” ]
wherein both the first and second utilities are learned based on historic dialogue data with respect to the goal of the dialogue on the certain topic. [Akolkar, The probability associated with each potential answer by the user in the formula eff(Q) which measures the effectiveness of a question Q (see [0121]) is obtained from the “customer history” which teaches the “historic dialogue data” of the Claim.  “[0123] In the above formulas, the probabilities (p(yes), p(no), p(oi)) {oi is the ith option} can be estimated, for example, via an empirical distribution obtained from customer history. ….”]

Akolkar does not teach a multimodal input.
Akolkar does not teach the use of a dialog tree, although an FSM serves the same function.
Moturu teaches:
1. A method implemented on at least one machine including at least one processor, storage, and communication platform capable of connecting to a network for an automated dialogue companion, [Moturu, the user device in Figure 2 or Figure 6 is shown as a “mobile device” which would inherently include all of the hardware components recited.  See also [0068].]

[Figure: media_image14.png]

[Figure: media_image15.png]

the method comprising: 
receiving multimodal input data associated with a user engaged in a dialogue with a predetermined goal on a certain topic in a dialogue scene, [Moturu receives input as speech or text of the user (see, e.g., Figure 6) and also receives sensor data regarding the mobility of the user and the user's physical condition.  The dialog as shown in Figure 6 is about the topic of the health of the user.  “[0029] In some variations, Block S120 can include receiving one or more of: location information, movement information (e.g., related to physical isolation, related to lethargy), device usage information (e.g., screen usage information, physical movement of the mobile device, etc.), device authentication information (e.g., information associated with authenticated unlocking of the mobile device), and/or any other suitable information….”  “[0030] In some variations, Block S120 can include collecting biometric data associated with user conditions, such as from electronic health records, sensors of mobile devices and/or supplemental medical devices, user inputs (e.g., entries by the user at the mobile device), and/or other suitable sources. Biometric data can include one or more of: electroencephalogram (EEG) data, electrooculogram (EOG) data, electromyogram (EMG) data, electrocardiogram (ECG) data, airflow data (e.g., nasal airflow, oral airflow, measured by pressure transducers, thermocouples, etc.), pulse oximetry data, sound probes, polysomnography data, family conditions, genetic data, microbiome data, and/or any other biometric data.”  “[0033] … Additionally or alternatively, the device event data can include data from sensors (e.g., accelerometer, gyroscope, other motion sensors, other biometric sensors, etc.) implemented with the mobile device and/or other suitable devices,….” “[0051] … In a specific example, the method 100 can include: applying a machine learning communication model to tag a communication with a topic (e.g., where the communication model is trained on a training dataset including text messages and associated topic labels); mapping the topic to a subset of potential automated communications associated with the topic; and selecting an automated communication to transmit to the user from the subset of potential automated communications….”]


[Figure: media_image16.png]

wherein the dialogue is managed based on a dialogue tree having a plurality of nodes, each of which is associated with a utility and some of which have branches representing alternative conversations of the dialogue, the multimodal input data capture a communication from the user and information surrounding the dialogue scene; [Moturu teaches that its “communication model” can include a “communication decision tree model” including “nodes and branches.”  A decision tree by definition includes branches that represent alternatives.  Moturu takes in multimodal sensor data including for example “mobility” of the user which teaches the “information surrounding the dialogue scene” of the Claim.  Figure 3, S120: mobility supplemental dataset.  [0028]-[0030].   See “[0050] … applying a communication model can include applying a communication decision tree model, such as a decision tree model including internal nodes and branches selected based on correlations between automated communications and user outcomes (e.g., in relation to user conditions)…..”]
analyzing the multimodal input data to generate a current state of the dialogue and a context of the dialogue, wherein the current state of the dialogue corresponds to a node in the dialogue tree; [Moturu, Figure 1, step S150 determines a tailored communication plan for the user based on the data provided to the mobile device.  Moturu teaches the use of a decision tree which has nodes/states corresponding to the selected communications and responses:  “[0050] Determining a communication plan in Block S150 can additionally or alternatively include generating and/or applying a communication model. Communication models preferably output one or more components of a communication plan based on communication-related features and/or datasets, but any suitable inputs can be leveraged by communication models for generating any suitable outputs. The communication model can include any one or more of: probabilistic properties, heuristic properties, deterministic properties, and/or any other suitable properties. In a variation, the communication model can include weights assigned to different communication-related features and/or datasets. For example, features extracted from user-provider communications can be weighted more heavily than features extracted from communications between a user and a non-care provider. In another example, mobility behaviors associated with promoted therapeutic interventions (e.g., user locations where a therapeutic intervention is provided) can be weighted more heavily than mobility behaviors associated with user daily activities. In another variation, applying a communication model can include applying a communication decision tree model, such as a decision tree model including internal nodes and branches selected based on correlations between automated communications and user outcomes (e.g., in relation to user conditions). 
In a specific example, a communication decision tree model can start with an initial automated communication (e.g., to be transmitted to the user), and subsequent automated communications can be selected and transmitted based on user responses (e.g., associated user meaning, user sentiment, etc.) to communications. However, applying communication decision tree models can be performed in any suitable manner.”]
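
For illustration only, a communication decision tree of the kind quoted from Moturu [0050] can be pictured with the sketch below: the dialog starts with an initial automated communication, and each later communication is selected by the branch matching the user's response. The `Node` class, the response labels, and the messages are hypothetical and are not taken from Moturu.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One automated communication plus the branches reachable from it."""
    message: str
    branches: dict = field(default_factory=dict)  # response label -> next Node

def run_dialogue(root, response_labels):
    """Walk the tree from the root and return the messages sent, in order.

    The walk stops early if the current node has no branch for a response.
    """
    sent = [root.message]
    node = root
    for label in response_labels:
        next_node = node.branches.get(label)
        if next_node is None:
            break
        node = next_node
        sent.append(node.message)
    return sent

# Hypothetical tree: the initial communication is the root.
root = Node(
    "How are you feeling today?",
    {
        "positive": Node("Glad to hear it. Keep up your routine."),
        "negative": Node(
            "Sorry to hear that. Would you like to talk to your care provider?",
            {
                "yes": Node("Connecting you now."),
                "no": Node("Okay. I will check in tomorrow."),
            },
        ),
    },
)

messages = run_dialogue(root, ["negative", "yes"])
```

In this picture the current state of the dialog is simply the node reached so far, and the branches of that node are the alternative continuations.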

[Figure: media_image17.png]

accessing first utilities associated with first one or more branches of the node with respect to the current state of the dialogue, wherein the first utilities characterize effectiveness of different dialogue strategies represented by the first one or more branches with respect to the user; and [Moturu teaches that its “communication plan” is designed to “optimize user outcomes” ([0051]) which conveys the same idea but is not specific with respect to branches and nodes of the decision tree.]
determining a response communication to be conveyed to the user in response to the communication in accordance with the first utilities and second utilities associated with respectively a plurality of branches of the first one or more branches, wherein the response communication maximizes look-ahead expected utilities given the current state,  [Moturu teaches that the communication model applied at step S150 of Figures 1 and 2, which determines a “tailored communication plan” to converse with the patient/user maximizes a reward which optimizes user outcome based on all the inputs shown in steps S110, S120, S130, S140, and S145 of Figures 1 and 2.  “[0051] … In specific examples, Block S150 can include training and applying a reinforcement learning model (e.g., deep reinforcement learning model), such as a reinforcement learning model for maximizing a reward (e.g., determining components of a communication plan for optimizing user outcomes, improving user conditions, user openness, and/or other suitable user parameters, etc.); a reinforcement learning model (e.g., inverse reinforcement learning model) for mimicking an observed behavior (e.g., care provider communication behavior in user-provider communications where the user was receptive, etc.); and/or any other suitable type of reinforcement learning models. ….”]
wherein both the first and second utilities are learned based on historic dialogue data with respect to the goal of the dialogue on the certain topic. [Moturu teaches that the communication model applied at step S150 of Figures 1 and 2, which determines a “tailored communication plan” to converse with the patient/user, is trained on “historic communications” and their “context” based on all the inputs shown in steps S110, S120, S130, S140, and S145 of Figures 1 and 2.   “[0045] In a variation of Block S150, determining a communication plan (and/or associated components) can be based on historic communications (e.g., historic automated communications, user-provider communications, associated user responses, communications associated with other users such as users sharing a subgroup, etc.). Determining a communication plan based on historic communications can include one or more of, in relation to historic communications: determining contextual parameters (e.g., based on data from Blocks S110-S145, etc.), extracting meaning (e.g., user meaning associated with user inputs), determining sentiment (e.g., emotional sentiment associated with a communication, with a therapeutic intervention, with an application feature, etc.), topic tagging (e.g., detecting, categorizing, and/or otherwise tagging communications with topics, which can be used for determining and/or promoting therapeutic interventions, identifying transition events for transitioning between care providers and an automated communication determination system for transmitting communications, summarizing communications for subsequent analysis, updating communication plans, searching communications, determining content components and/or format components, etc.), summarizing communication content (e.g., for documentation such as in relation to the Health Insure Portability and Accountability Act and/or other regulations; for supporting care providers by providing summaries of historic communications with the user; for topic tagging; etc.), and/or any other suitable processes.  ….”]

Akolkar and Moturu both pertain to natural language conversational dialog systems in which a machine is trained to conduct a dialog with a user, and both teach optimizing a conversational model to provide optimized responses or an optimized course of dialog to the user.  It would have been obvious to combine the system of Akolkar, which relies on natural language input alone, with the system of Moturu, which includes multimodal sensor data pertaining to the user, in order to arrive at a more comprehensive set of inputs for determining and optimizing the next step of the dialog (when the dialog depends on what is happening to the user in addition to what the user is saying), as combining prior art elements according to known methods to yield predictable results.  It would also have been obvious to replace the FSM of Akolkar, which is used to obtain states of the dialog, with the decision tree of Moturu, which is used to obtain a next state of the dialog, as an equivalent or simpler system and as a simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 5, Akolkar does not discuss machine learning.
Moturu teaches and suggests:
5. The method of claim 1, further comprising machine learning the utilities which comprises: [Moturu, “[0051] In another variation of Block S150, applying a communication model can include applying one or more machine learning communication models employing one or more machine learning approaches ….” ]
accessing the historic dialogue data related to past dialogues; [Moturu determines its “tailored communication plan” based on historic/past dialog data. “[0045] In a variation of Block S150, determining a communication plan (and/or associated components) can be based on historic communications (e.g., historic automated communications, user-provider communications, associated user responses, communications associated with other users such as users sharing a subgroup, etc.)….”]
obtaining, via machine learning, the utilities based on the historic dialogue data, wherein the utilities are formulated as the expected utilities with respect to actions specified by the dialogue tree given the current state of the dialogue; [Moturu.  “Utility” is a measure of effectiveness of a “tailored communication plan” which in Moturu is taught by maximizing the “reward” and Moturu performs machine learning to maximize “reward” / utility:  “[0051] … In a specific example, the method 100 can include: applying a machine learning communication model to tag a communication with a topic (e.g., where the communication model is trained on a training dataset including text messages and associated topic labels); mapping the topic to a subset of potential automated communications associated with the topic; and selecting an automated communication to transmit to the user from the subset of potential automated communications. In another specific example, Block S150 can include training a neural network model (e.g., a generative neural network model) with an input neural layer using features derived datasets described in Blocks S110-S145 to dynamically output content components for an automated communication, and/or any other suitable components of a communication plan. In specific examples, Block S150 can include training and applying a reinforcement learning model (e.g., deep reinforcement learning model), such as a reinforcement learning model for maximizing a reward (e.g., determining components of a communication plan for optimizing user outcomes, improving user conditions, user openness, and/or other suitable user parameters, etc.); ….”]
receiving, continuously, updated dialogue data of additional dialogues involving the user; and [Moturu, “[0020] … The technology can continuously collect and utilize specialized datasets unique to internet-enabled, non-generalized mobile devices in order to personalize and automate communications between a user and care provider for facilitating treatment….”  “[0026] Preferably, Block S110 is implemented using a module of a processing subsystem configured to interface with a native data collection application executing on a mobile device (e.g., smartphone, tablet, personal data assistant, personal music player, vehicle, head-mounted wearable computing device, wrist-mounted wearable computing device, etc.) of the user. As such, in one variation, a native data collection application can be installed on the mobile device of the user, can execute substantially continuously while the mobile device is in an active state (e.g., in use, in an on-state, in a sleep state, etc.), and can record communication parameters (e.g., communication times, durations, contact entities) of each inbound and/or outbound communication from the mobile device….”]
updating dynamically the utilities based on the updated dialogue data of the additional dialogues. [Moturu suggests this limitation because the act of adapting or learning can be a one-time training on a predetermined set of data but is more normally a continuous process as more data becomes available.  Moturu teaches continuous data collection.  Moturu teaches machine learning on history.  These two teachings together suggest that the machine learning is continuous as new data becomes available.]
Akolkar and Moturu pertain to conversational systems, and it would have been obvious to modify Akolkar, which uses previously stored (historic) user behavior, with Moturu, which teaches conducting machine learning based on historic dialog data, as combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
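
For illustration only, the suggestion relied on above, that utilities learned from history would be updated as new dialogue data arrives, can be pictured as a running-average update per branch. This is a hypothetical sketch and not a method disclosed by Moturu or Akolkar; the class name, branch labels, and reward values are assumptions made for the example.

```python
class BranchUtilities:
    """Keeps a running mean of observed rewards for each dialogue branch."""

    def __init__(self):
        self.mean = {}   # branch label -> current utility estimate
        self.count = {}  # branch label -> number of observations so far

    def update(self, branch, reward):
        """Fold one new observation into the estimate without storing the history."""
        n = self.count.get(branch, 0) + 1
        old = self.mean.get(branch, 0.0)
        self.mean[branch] = old + (reward - old) / n
        self.count[branch] = n

    def best(self, branches):
        """Return the branch with the highest current estimate (unseen branches count as 0)."""
        return max(branches, key=lambda b: self.mean.get(b, 0.0))

utilities = BranchUtilities()

# Initial estimates from hypothetical historic dialogues.
for reward in [1.0, 0.0, 1.0]:
    utilities.update("ask_followup", reward)
utilities.update("offer_help", 0.5)
choice_before = utilities.best(["ask_followup", "offer_help"])

# Additional dialogues arrive later and shift the estimates.
for reward in [1.0, 1.0]:
    utilities.update("offer_help", reward)
choice_after = utilities.best(["ask_followup", "offer_help"])
```

The selected branch changes from `ask_followup` to `offer_help` once the later observations are folded in, which is the sense in which the utilities are updated dynamically.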

Regarding Claim 6, Akolkar teaches:
6. The method of claim 5, wherein the step of determining the response communication comprises: 
identifying a plurality of actions associated with a node in the dialogue tree corresponding to the current state of the dialogue; [Akolkar, the “actions” are potentially appropriate responses in each state of the dialog which depend on which service is selected.  Akolkar calls them “candidates” which teach the “actions” of the Claim.  “[0084] FIGS. 5-7 demonstrate how a customer interacts with CSM via the Conversational Interface 416. In some embodiments, the UI is divided into two parts: (1) the conversation area 502, which includes the conversation display area and the text input area 514, and (2) the candidate service list area 504, which displays the qualified candidate services selected based on the conversation.… Accordingly, CSM retrieves all the relevant services (e.g., service 1 through n) and displays them on the right side at 504 under the "Matched Service List" heading. To further filter the matching services list, CSM provides the customer with the next level of details while simultaneously ruling out unqualified candidates and conducting service configuration, through a series of iterative question and answer procedures which guide the customer through the requirements.”]
determining a reward associated with each of the plurality of actions based on the learned utilities associated with the user; and [Akolkar, the “reward” is taught by the “number of candidates a sequence of questions Q can prune” which determines the  “Effectiveness”/ “utility” value (“eff(Q)”) of the sequence Q of questions based on the goal of the user.  “[0113] Service Filtering.”  “[0119] In one or more embodiments, to reduce the number of iterations, find a sequence of questions Q={Q.sub.1, Q.sub.2 . . . Q.sub.n} with the least length to rule out all unsatisfied candidate services via capability or configuration. Quantitatively, use eff(Q) to evaluate how effectively the sequence can filter the candidates. The effectiveness of a question sequence can be considered as the sum of the effectiveness of all its questions, i.e., eff(Q)=.SIGMA..sub.i eff(Q.sub.i). Concretely, the effectiveness of a question is qualified as the expected number of candidates it can prune, i.e. n.sub.prune, based on the customer's potential answer. There are three types of questions and their effectiveness is evaluated differently:”]
selecting an action as the response communication from the plurality of actions, [Akolkar, “[0084] … Accordingly, CSM retrieves all the relevant services (e.g., service 1 through n) and displays them on the right side at 504 under the "Matched Service List" heading. To further filter the matching services list, CSM provides the customer with the next level of details while simultaneously ruling out unqualified candidates and conducting service configuration, through a series of iterative question and answer procedures which guide the customer through the requirements.”]
that corresponds to a maximum utility represented as a function of the reward. [Akolkar, maximum utility is maximum effectiveness eff(Q) which is a function of the “number of questions it can prune” / “reward.” “[0125] From the perspective of the traditional optimization problem, the goal of picking the question sequence is to find the one that maximizes the effectiveness ….”  The “effectiveness” is the utility of the Claim.]
Akolkar teaches a finite state diagram with an ontology of services in Figure 8, each of which, as shown in Figure 8, may include several functions/actions.  Akolkar optimizes the effectiveness of a sequence of questions, and the most effective sequence is the one selected.  Each question is an action.  See [0119] to [0125].   Akolkar uses an FSM and not a dialog tree, which is taught by Moturu.  The dialog tree of Moturu can substitute for the FSM of Akolkar as an equivalent/similar method in the context of conversation/dialog.  (In conversation, states and actions are the same: going from state to state, the action that occurs is the dialog portion that is uttered, whether a question or a response.  See Heegard in the Conclusion section below.)

Claim 8 is a computer program product claim with limitations corresponding to the limitations of method Claim 1 and is rejected under similar rationale.
8. Machine readable and non-transitory medium having information recorded thereon for an automated dialogue companion, wherein the information, when read by the machine, causes the machine to perform: 
….
Claim 12 is a computer program product claim with limitations corresponding to the limitations of method Claim 5 and is rejected under similar rationale.
Claim 13 is a computer program product claim with limitations corresponding to the limitations of method Claim 6 and is rejected under similar rationale.
Claim 15 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
15. A system for an automated dialogue companion, comprising: 
a device configured for receiving multimodal input data associated with a user engaged in a dialogue of a certain topic in a dialogue scene, wherein the multimodal input data capture a communication from the user and information surrounding the dialogue scene; 
a user interaction engine configured for 
…. and 
a dialogue manager configured for …. 

Claim 19 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claim 20 is a system claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Claims 2-4, 9-11, 16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Akolkar and Moturu in view of Breazeal (U.S. 2017/0206064).
Regarding Claim 2, Akolkar teaches:
2. The method of claim 1, wherein the multimodal input data include at least audio data, visual data, text data, and haptic data. [Akolkar, the input is expressly said to be natural language and not specified as voice or text which are easily interchangeable.  However, in Figure 5, the input is shown to be text input into the interface. ]
This limitation is interpreted as including ALL of the listed modes because the Claim recites “at least” rather than “at least one of.”
Moturu teaches:
2. The method of claim 1, wherein the multimodal input data include at least audio data, visual data, text data, and haptic data. [Moturu teaches multimodal input including voice or text by the patient/user:  “[0014] As such, variations of the method 100 and/or system 200 can be implemented in characterizing and/or improving user conditions including any one or more of: …  communication-related conditions (e.g., expressive language disorder; stuttering; phonological disorder; autism disorder; voice conditions ….”  “[0027] As such, Block S110 preferably enables collection of one or more of: phone call-related data … media such as images, charts and graphs, audio, video, file, links, emojis, clipart, etc.) … vocal and textual content (e.g., text and/or voice data that can be used to derive features indicative of negative or positive sentiments; textual and/or audio inputs collected from a user in response to automated textual and/or voice communications; etc.) ….”  “[0017] …. For example, the technology can improve tailoring of communication plans (e.g., live and automated communication with users) and associated promotion of therapeutic interventions through leveraging passively collected digital communication data (e.g., text messaging features, phone calling features, user-provider relationship features, etc.) and/or supplementary data (e.g., mobility behavior data extracted from GPS sensors of mobile devices) that would not exist but for advances in mobile devices (e.g., smartphones) and associated digital communication protocols (e.g., WiFi-based phone calling; video conferencing for digital telemedicine; etc.)….”]
Rationale for combination as provided for Claim 1.

Moturu does not teach haptic data as input; it mentions haptics only as a format component of the output: “[0043] Relating to Block S150, automated communications preferably include one or more format components defining format-related aspects associated with presentation of the automated communication. The format components can include any one or more of: … touch parameters (e.g., braille parameters; haptic feedback parameters; etc.);….”  Note that the smartphone shown in Moturu would include a touchscreen, but this is not expressly stated.
Breazeal teaches:
2. The method of claim 1, wherein the multimodal input data include at least audio data, visual data, text data, and haptic data. [Breazeal, Figure 5 showing Microphone array 506, Cameras 504, Touch Sensors 508, and  “[0314] … In such instances, PCD 100 may operate to acquire text based/GUI/speech entered information such as during a "getting acquainted" interaction….”  “[0369] … For example, a person may text a message to a PCD 100 associated with a user within which is embedded an emoticon representing an emotion or social action that the sender of the message wishes to convey via PCD 100….”  A keyboard appears as a part of touch sensors 508 but is not expressly mentioned.  ASR 206 of Figure 2 is repeatedly mentioned as generating text.  The Claim is interpreted as including all of the modes recited.]
Akolkar/Moturu and Breazeal pertain to natural language conversational dialog systems where a machine is trained to conduct a dialog with a user.  It would have been obvious to modify the Akolkar/Moturu combination, which relies on multimodal communication but does not expressly indicate a haptic input, with the system of Breazeal, which includes a comprehensive multimodal input system that expressly enumerates the various modes of input, in order to be more comprehensive with respect to input and as combining prior art elements according to known methods to yield predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 3, Akolkar does not mention mood or sentiment.  Akolkar finds the movements of the user, which teaches the “act performed by the user,” but not by analysis of visual data.
Moturu teaches:
3. The method of claim 2, wherein the step of analyzing the multimodal input data comprises at least one of: 
analyzing the audio data to recognize content of the communication from the user, characteristics of the communication indicative of an emotion conveyed in the communication, and acoustic sound in the dialogue scene; and [Moturu, “[0027] As such, Block S110 preferably enables collection of one or more of: … vocal and textual content (e.g., text and/or voice data that can be used to derive features indicative of negative or positive sentiments ….”  “[0030] In some variations, Block S120 can include collecting biometric data associated with user conditions, such as from electronic health records, sensors of mobile devices and/or supplemental medical devices, user inputs (e.g., entries by the user at the mobile device), and/or other suitable sources. Biometric data can include one or more of: electroencephalogram (EEG) data, electrooculogram (EOG) data, electromyogram (EMG) data, electrocardiogram (ECG) data, airflow data (e.g., nasal airflow, oral airflow, measured by pressure transducers, thermocouples, etc.), pulse oximetry data, sound probes, polysomnography data ….”  “[0030] In some variations, Block S120 can include collecting biometric data associated with user conditions, …. Biometric data can include one or more of: … sound probes, …”]
analyzing the visual data to information surrounding the dialogue scene, including at least one of: a facial expression of the user, an emotion associated with the facial expression, an act performed by the user, and one or more objects in the dialogue scene and the spatial relationships thereof. 
Rationale as provided for Claim 1.
Moturu teaches determining sentiment associated with communication ([0045]).  It also teaches “automatically initiating a visual telemedicine communication” (claim 17), but not expressly determining facial expressions, or sentiment from facial expressions, based on image analysis.  Moturu also teaches collection of eye or leg movement, which may not be by visual data: “[0030] In some variations, Block S120 can include collecting biometric data associated with user conditions, …. Biometric data can include one or more of: … sound probes, polysomnography data …”  Polysomnography collects eye and leg movements during sleep.
Breazeal teaches:
3. The method of claim 2, wherein the step of analyzing the multimodal input data comprises at least one of: 
analyzing the audio data to recognize content of the communication from the user, characteristics of the communication indicative of an emotion conveyed in the communication, and acoustic sound in the dialogue scene; and [Breazeal, Figure 2, “ASR 206” leading to “Perceptual Cues /Belief states” which include “emotion” of the speaker.  Figure 9 starting with:  “Interpret User Body/Facial/Speech details to determine his emotional state 902.”   “[0025] FIG. 9 illustrates a flowchart for a method to indicate and/or influence emotional state of a user by use of the PCD.”]
analyzing the visual data to information surrounding the dialogue scene, including at least one of: a facial expression of the user, an emotion associated with the facial expression, an act performed by the user, and one or more objects in the dialogue scene and the spatial relationships thereof. [Breazeal, Figure 2, “Cameras 212, 214.” Figure 9 starting with:  “Interpret User Body/Facial/Speech details to determine his emotional state 902.”]
Rationale as provided for Claim 2.  Breazeal was added to teach the additional modalities of multimodal input and their functions would come from Breazeal.

Regarding Claim 4, Akolkar teaches:
4. The method of claim 3, wherein the current state of the dialogue is generated by: 
obtaining a language parsed graph (Lan-PG) of the dialogue based on the content of the communication from user based on and the dialogue tree; [Akolkar, Figure 4, “Conversation Parser.” Conversation parser determines the services that are being requested and leads to “Service Configurator 412” which can lead to the finite state diagram of Figure 8.]
obtaining a spatial-temporal-causal parsed graph (STC-PG) based on the act performed by the user and the dialogue tree; and [Akolkar, the FSM of Figure 8 includes a state diagram which shows what action causes what transition and outputs what result.  A conversation is temporal.  Thus, the FSM of Figure 8 teaches this limitation.]
generating a joint parsed graph (joint-PG) based on the Lan-PG, the STC-PG, and the information surrounding the dialogue scene. [Akolkar, Figure 8, FSM.  Time is a factor in a conversation and in a sequence of events, and therefore the FSM of Figure 8 includes time and causation.]
Akolkar does not include a factor of space.
Moturu teaches:
obtaining a spatial-temporal-causal parsed graph (STC-PG) based on the act performed by the user and the dialogue tree; and [Moturu.  Because mobility of the patient is an issue in Moturu and the location of the patient is considered, space/location is a factor:  “8. The method of claim 6, wherein extracting the set of mobility-communication features comprises generating a text messaging location feature based on associating a text messaging parameter from the log of use dataset with a location parameter from the mobility supplementary dataset, ….”  “[0050] … In another variation, applying a communication model can include applying a communication decision tree model, such as a decision tree model including internal nodes and branches selected based on correlations between automated communications and user outcomes (e.g., in relation to user conditions). In a specific example, a communication decision tree model can start with an initial automated communication (e.g., to be transmitted to the user), and subsequent automated communications can be selected and transmitted based on user responses (e.g., associated user meaning, user sentiment, etc.) to communications. However, applying communication decision tree models can be performed in any suitable manner.”]
Akolkar and Moturu both pertain to conversational systems. It would have been obvious to modify Akolkar, which uses a finite state machine graph to show the relationship between the state of the conversation and the next time step, with Moturu, which teaches the use of a decision tree for the same purpose and additionally includes location/space as another parameter to be taken into consideration for the decision. The modification amounts to combining prior art elements according to known methods to yield predictable results, or to simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
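For illustration only, and not as a characterization of the actual implementation of either reference, the proposed combination can be sketched as a finite state machine whose transitions depend on a location parameter in addition to the user's response. All state names, response labels, and location values below are hypothetical and do not appear in Akolkar or Moturu.

```python
# Illustrative sketch only; states, responses, and locations are hypothetical
# and are not taken from Akolkar or Moturu.
from typing import Dict, Optional, Tuple

# Transition table keyed on (current state, user response, location).
# A location of None marks a location-independent transition (plain FSM step).
Transition = Tuple[str, str, Optional[str]]

TRANSITIONS: Dict[Transition, str] = {
    ("greet", "yes", None): "ask_symptoms",
    ("greet", "no", None): "end",
    ("ask_symptoms", "pain", "home"): "suggest_rest",
    ("ask_symptoms", "pain", "clinic"): "notify_staff",
}


def next_state(state: str, response: str, location: str) -> str:
    """Return the next dialogue state, preferring a location-specific rule."""
    specific = TRANSITIONS.get((state, response, location))
    if specific is not None:
        return specific
    # Fall back to the location-independent rule; stay in place if none exists.
    return TRANSITIONS.get((state, response, None), state)
```

In this sketch the same user response ("pain") leads to different next states depending on the location value, which is the sense in which location/space serves as an additional parameter for the decision.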

Claim 9 is a computer program product system claim with limitations corresponding to the limitations of method Claim 2 and is rejected under similar rationale.
Claim 10 is a computer program product system claim with limitations corresponding to the limitations of method Claim 3 and is rejected under similar rationale.
Claim 11 is a computer program product system claim with limitations corresponding to the limitations of method Claim 4 and is rejected under similar rationale.
Claim 16 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 18 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Perez (U.S. 20170316777):
[0020] FIG. 4 illustrates a dialog tree for a given troubleshooting scenario;
[0027] Dialog data production is achieved in the exemplary method through a reactive learning formulation using a Wizard of Oz (WOZ) approach in which a human simulates a virtual agent for conducting a dialog with client. An efficient method is described herein for producing valuable dialog data in order to initialize a policy learning mechanism, such as reinforcement learning or imitative learning for dialog management. Using an active tree expansion model based on a reactive learning sampling strategy, more useful data can be produced in less time than in conventional WOZ approaches to data collection. The method reduces the amount of data which needs to be collected by avoiding low information redundancy and maximizing information gain in region of a dialog tree associated with a given dialog scenario. The approach can produce directly usable data for any dialog policy learning approach, for use in troubleshooting, conducting transactions, pro-active upselling, and the like.


Lev-Tov (U.S. 20170116173), Figure 12A, “DialogueTreeMiner.”

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499.  The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659