DETAILED ACTION
This action is in response to the amendments filed 18 March 2021 for application 16/252846, filed on 21 January 2019. Claims 1-20 are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Choi et al. (US20180181922), hereinafter referred to as Choi, in view of Papangelis (US20180232435), hereinafter referred to as Papangelis, in view of Hwang et al. (US20180165581), hereinafter referred to as Hwang, and in further view of Fehrle et al. (“Generating Pictorial Presentations for Advice-Giving Dialog Systems”, In: Gorny P., Tauber M.J. (Eds) Visualization in Human-Computer Interaction. IPsy 1988. Lecture Notes in Computer Science, Vol 329, 1990, pp. 27-36), hereinafter referred to as Fehrle.

In regards to claim 1, Choi teaches a system, comprising: a processor circuit; and a memory storing instructions which when executed by the processor circuit cause the processor circuit to: determine a first task, of a plurality of tasks, ([0113, 0222, 0261, 0355, Figure 3] The device … may output notification related to a first task at a time when the first task in the to-do list has to be performed., The data learner … may collect data used to determine tasks in a to-do list, Disclosed embodiments may be implemented as a software program including instructions stored in a computer - readable storage medium ..; wherein a to-do list consisting of tasks is assembled which includes a first task among a plurality of tasks contained in the list and wherein the operation of the system may be performed by a software module in a non-transitory computer-readable recording medium) for an environment, the environment comprising one or more of a computing environment and a real- world environment ([Figure 25, 0068, 0198, 0152, 0156] The device … may obtain data used to generate a to-do list of a user … about a message transmitted/received by a chatting application …, The sensing unit … may include at least one from among … a position sensor a barometric pressure sensor … an RGB sensor…, wherein the task may be associated with a computing environment since it may be generated from message data and may entail sending photos in a message application and may be associated with a real world environment since it may be associated with tasks such as “buy milk” with specificity of the task based on GPS location information.), the first task relative to one or more of the computing environment and the real-world environment; ([0128] For example, it is assumed that the first task is “Today, Go Shopping at Market A near Home”…, wherein any single task may be associated with either a computing environment or a real-world environment, including the first task and wherein an example of a 
first task relative to the real-world environment is the task of shopping at market A.), receive data from a plurality of data sources, the received data comprising audio data, image data, application data, and text data ([0068, 0205, 0206] The device may obtain data … from an application(s) executed on a display of the device. For example … a message transmitted/received by a chatting application …, An image captured by the camera … may be used as context information …, The microphone … may receive a voice input of the user., wherein various sources of information may be used in the to-do list management/synthesis including chatting/text application data, audio input from the user, and image data which may provide context information.); extract a plurality of features from the received data, the plurality of features comprising: objects identified in the image data, natural language concepts extracted from the audio data, and natural language concepts extracted from the text data; ([0008, 0072, 0084, 0205, 0239, 0240] Various fields to which AI technology is applied are as follows. Linguistic understanding is a technology for recognizing and applying/processing human languages/characters and includes natural language processing, machine translation, dialog systems, question answering, and voice recognition/synthesis.
Visual understanding is a technology for recognizing and processing objects like a human visual system and includes object recognition, object tracking, image search, person recognition, scene understanding, spatial understanding, and image enhancement., Alternatively, when the application executed on the display is a call application through the device 1000, the device 1000 may convert call voice into text by using speech-to-text (STT) conversion technology in real time, …, The device 1000 may convert the recorded call voice into text, and may analyze based on the text that the user has to book a flight to Japan on vacation and has to buy a present for Susan. Accordingly, the device 1000 may determine “Japan”, “Present”, “Vacation”, “Flight”, and “Book” as the keywords 820, based on the analysis., The camera 1610 may obtain image frames such as a still image or a moving image by using an image sensor in a video mode or an imaging mode. An image captured by the image sensor may be processed by the processor 1300 or an additional image processor (not shown). An image captured by the camera 1610 may be used as context information of the user., For example, the data obtainer 1310_1 may obtain voice data, image data, text data, or bio-signal data. For example, the data obtainer 1310_1 may receive data through an input device (e.g., a microphone, a camera, or a sensor) of the electronic apparatus.
Alternatively, the data obtainer 1310_1 may obtain data through an external device that communicates with the electronic apparatus., The pre-processor 1310_2 may pre-process the obtained data so that the obtained data is used for learning for a standard for determining a keyword used to determine tasks in a to-do list of the user, …; wherein video, audio, and/or text data are processed to extract/recognize keywords pertinent for a to-do list such that the features are keywords associated with information/objects (contextual information or keywords) thereby found in images or with information/natural language concepts/semantics thereby found in text data or in audio data (after audio-to-text conversion).) determine a first time step associated with the received data ([0251, 0066, 0113, 0099, Figures 18, 22, and 23] The basic learning data may be previously classified according to various standards such as … a time for which the learning data is generated…., When it is determined that the first task in the to-do list has not been performed by a predetermined time, the device … may output information indicating that the first task has not been performed., The device … may output notification information related to a first task at a time when the first task has to be performed., wherein various time steps may be associated with the received data, including a time from which learning is generated, a time associated with the completion of a task, a time associated with a schedule, and a to-do list updating time based upon the monitoring of the tasks to determine if any has been completed.); determine a plurality of candidate actions for the determined first time step; ([0096, 0099, Figures 22, 23, 31, 32, 33, 34] The device … may determine one or more candidate tasks for the first item based on the selected first keyword., The device … may group tasks belonging to the same schedule, and may generate a candidate task list of each group…, wherein a plurality of
candidate tasks are generated according to keywords which may include a set of tasks corresponding to a single item/keyword as well as a single set of tasks associated with a common schedule such that the time step may be completion time associated with that set of tasks)  compute a respective … value of each candidate action achieving the first task at the first time step based on a first machine learning (ML) model applied to the received data and the plurality of features, the first ML model trained based on training data labeled to specify that the training data achieves each of the plurality of tasks or the training data does not achieve each of the plurality of tasks ([0223, 0253] The data learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation by itself without supervision. 
Also, the model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute task, and judging monitoring of a task according to learning is right., …, wherein reinforcement learning (a first ML model) is used to determine the intentions of the user based upon a learned recognition of the meaning to the user of the received data (images, audio, text), such as in the form of keywords (including any NL concepts/semantics/contextual information derived from text, audio, and images), such that it continually evaluates through feedback the correctness of its judgement of intention (that is whether the response/feedback training data sample is associated/achieves or is not associated with/does not achieve a given action/task)  and the determination of the candidate tasks and wherein this reinforcement model implicitly includes a probabilistic representation of the tasks given the interpreted user’s intentions through a Markov decision process.) determine that a first candidate action of the plurality of candidate actions has a greater  … value for achieving the first task at the first time step relative to the … values of the remaining plurality of candidate actions ([0242, 0253] The data learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation by itself without supervision. 
Also, the model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute task, and judging monitoring of a task according to learning is right., …, wherein the set of candidate tasks is chosen based upon a model that has been trained to learn the intentions of the user based on keywords such that the set of candidate actions/tasks has been chosen because the candidate actions/tasks best represent the user’s intention and so are most likely to achieve a first task, such as determined by a first item or keyword, and wherein these candidate tasks are inherently based on a probabilistic representation of the user’s intention relative to a keyword through the reinforcement learning process.) determine that the first candidate action has not been implemented in the environment at the first time step ([0066] When it is determined that the first task in the to-do list has not been performed by a predetermined time, the device … may output information indicating that the first task has not been performed…, wherein, once a monitoring operation determines that the first task has not been completed by a predetermined time, an output is produced to indicate this fact); and generate, …, an indication specifying to implement the first candidate action at the first time step as part of a policy to achieve the first task ([Figures 7 and 9, 0102, 0103, 0346] The device … may determine an item as “Japan Trip” by using “Japan”, “Present”, “Vacation”, … “Vacation Request” that are keywords … and may determine “Japan Trip”, “Book Flight”, “Book Hotel A” … as the candidate tasks…, The device … may display the candidate task list … of the candidate tasks…, wherein the set of candidate tasks associated with the item/first task “Japan Trip” are determined and displayed to the user such that each of the candidate tasks is a candidate action that is part of a policy
that must be implemented for the “Japan Trip” item/task.) and generate, …  based on the first ML model, the first candidate action, the … value, and the received data from the plurality of data sources, and the plurality of features, a natural language narrative describing the first candidate action and … value for the first candidate action, …. the plurality of natural language concepts of the natural language narrative comprising at least one of the natural language concepts extracted from the audio data and at least one of the natural language concepts extracted from the text data. ([Figures 5, 7, 9, 10, 12, 13, 16, and 17, 0116, 0134, 0253] The device … may form the tasks selected by the user as a to-do list … and may display the to-do list … on the execution screen of the display., Regarding “Book Hotel”, the device … may provide a hotel list including hotel information arranged in a descending order of ratings compared to price, by reflecting the intention that the user (I) and the other user … want a clean hotel … The device may generate the hotel list by applying a weight to price and ratings…,  wherein a natural language narrative describing each of the actions associated with a task is displayed on a screen such that the selection of these tasks is based upon the output of the model for discerning user’s intentions and a corresponding association of tasks/actions that may be most likely to meet the requirements with that model being derived, for example, from reinforcement learning (applied/trained using keywords or responsive to keywords extracted from image, audio, or text data), and wherein the output narrative includes NL concepts extracted from either text and/or audio data (e.g., “vacation”, “Japan” etc.) 
and it is noted that the output narrative may include additional information (e.g., ratings) which is used for prioritizing the list of potential actions but which is also a metric indicative of how likely the candidate action will satisfy the user requirements.)
Although Choi may use a number of different machine learning techniques for discerning the intentions of the user and the candidate actions best suited for satisfying the user’s intentions, including reinforcement learning, which would implicitly map the discernment operation relative to the actions to a probabilistic framework, Choi does not explicitly describe a probabilistic scoring of the candidate actions. Also, although Choi does include graphical symbols in a text narrative (and uses terms in the narrative that are NL concepts), the symbols are not in themselves associated with natural language concepts of the natural language narrative. Furthermore, Choi does not explicitly teach a second ML model that may perform additional processing or evaluation of the candidate actions that is used to generate a narrative. Thus, in particular, Choi does not teach: compute, based on the first ML model, a second probability value for the first candidate action, the second probability value reflecting a likelihood that the first candidate action will occur in the real-world environment at the first time step; … based on the first and second probability values of the first candidate action… by a second ML model and … second probability … second probability … the natural language narrative comprising a plurality of graphical symbols associated with a plurality of natural language concepts of the natural language narrative.
However, Papangelis, in the analogous environment of a dialogue system for providing candidate actions, teaches determine a plurality of candidate actions for the determined first time step; compute a respective first probability value of each candidate action achieving the first task at the first time step based on a first machine learning (ML) model applied to the received data, the first ML model trained based on training data labeled to specify that the training data achieves each of the plurality of tasks or the training data does not achieve each of the plurality of tasks ([Figures 6 and 7, 0049, 0050, 0094, 0152, 0169, 0247,260] The system states may be obtained from a corpus of data and the training performed using labels associated with the data …, The belief with respect to a slot s may be the set of probabilities that the slot has each possible value… The belief tracker model is a stored trained model that maps the input utterance to slot values , and updates the probabilities accordingly., The policy model effectively solves generic information-seeking problems. 
This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement).,  The labels indicating the current domain are used to determine the success of this identification, allowing the model to learn., Dialogue policy optimization can be solved via Reinforcement Learning.,  wherein a set of candidate actions is generated based upon a probabilistic belief measure which corresponds to the probability of a candidate action being associated with/relevant to the expressed intent of the user which is a probability that the action response will achieve the satisfaction of the user’s intentions, wherein this evaluation is performed during each time slot between user utterances, and wherein the policy model may be trained using labeled data with a predefined domain correspondence to each utterance but may also be trained using reinforcement learning with labels associated with success or non-success of the identification of pertinent response categories.) determine that a first candidate action of the plurality of candidate actions has a greater first probability value for achieving the first task at the first time step relative to the first probability values of the remaining plurality of candidate actions ([0094, 0095, 0110, 0152] The belief with respect to a slot s may be the set of probabilities that the slot has each possible value. For example, for the slot “price”, the values and probabilities may be: [empty:0.15, cheap: 0.35, moderate: 0.1, expensive: 0.4]. 
These probabilities are updated by the tracker model at each time slot t based on the new input utterance., It may comprise joint beliefs, which are probability distributions over the values of more than one slot (e.g. price and location (the probability that the user said both “cheap restaurant” and “centre of town”)., The full actions each comprise an action function … and may also comprise one or more categories and one or more values (e.g., price=high)., The belief with respect to a slot s may be the set of probabilities that the slot has each possible value., The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement).,  wherein a set of candidate actions in the form of action slots, for example, of proposed restaurants, is generated based upon the probabilistic belief measure (joint or single) being maximum (i.e., a belief measure that exceeds the second highest belief measure).) compute, based on the first ML model, a second probability value for the first candidate action, the second probability value reflecting a likelihood that the first candidate action will occur in the real-world environment at the first time step, ([0152, 0196, 0165, 0231] The policy model effectively solves generic information-seeking problems. 
This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement)., Importance, e.g. two parameters describing respectively how likely a slot will and will not occur in a dialogue.,  wherein the importance measure, as an indicator of the likelihood (between 0 and 1) that a slot will actually occur, is being interpreted as a second probability value that reflects the likelihood of an action taking place.) generate, based on the first and second probability values of the first candidate action, an indication specifying to implement the first candidate action at the first time step as part of a policy to achieve the first task ([0124, 0152, Figure 6] The action with maximum expected reward is then selected at each dialogue turn during implementation., The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement)., Importance, e.g. two parameters describing respectively how likely a slot will and will not occur in a dialogue..,  wherein the system selects an action/response based upon a set of probabilities/likelihoods including single belief states for a slot/action, joint belief states across slots/actions, importance likelihood, and slot/action likelihood of being requested/being reflecting relevance/likelihood of action occurrence.)  
and generate, … based on the …the second probability value, … natural language narrative describing the first candidate action and the second probability value for the first candidate action, ….  ([0094, 0095, 0110, 0152, Figure 4] The belief with respect to a slot s may be the set of probabilities that the slot has each possible value. For example, for the slot “price”, the values and probabilities may be: [empty:0.15, cheap: 0.35, moderate: 0.1, expensive: 0.4]. These probabilities are updated by the tracker model at each time slot t based on the new input utterance., It may comprise joint beliefs, which are probability distributions over the values of more than one slot (e.g. price and location (the probability that the user said both “cheap restaurant” and “centre of town”)., The full actions each comprise an action function … and may also comprise one or more categories and one or more values (e.g., price=high)., The belief with respect to a slot s may be the set of probabilities that the slot has each possible value., The policy model effectively solves generic information-seeking problems. 
This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement).,  wherein a set of candidate actions in the form of action slots, for example, of proposed restaurants, is generated based upon two probabilities: (1) the probabilistic belief measure being maximum (i.e., a belief measure that exceeds the second highest belief measure) and (2)  the importance likelihood exceeding “X” which measures the likelihood that the action slot will satisfy the user’s requirements in the sense of being likely to occur and wherein a natural language narrative is generated based on these probabilities in the form of a dialogue turn (natural language narrative).) 
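For illustration only, the two-measure selection described in the cited passages (a slot with maximum belief whose importance is greater than X) can be sketched as follows. This is a minimal sketch under stated assumptions and is not a characterization of Papangelis's actual implementation: the "price" belief values are those quoted above, while the "area" slot, the importance values, the threshold, and all function names are hypothetical.

```python
from typing import Dict, Optional

# Belief for the "price" slot uses the values quoted from Papangelis above.
# The "area" slot and both importance values are hypothetical.
BELIEFS: Dict[str, Dict[str, float]] = {
    "price": {"empty": 0.15, "cheap": 0.35, "moderate": 0.1, "expensive": 0.4},
    "area": {"empty": 0.2, "centre": 0.7, "north": 0.1},
}
IMPORTANCE: Dict[str, float] = {"price": 0.9, "area": 0.3}


def slot_belief(belief: Dict[str, float]) -> float:
    """Assumption: a slot's belief is its most probable value's probability."""
    return max(belief.values())


def select_inform_slot(
    beliefs: Dict[str, Dict[str, float]],
    importance: Dict[str, float],
    threshold: float,
) -> Optional[str]:
    """Return the slot with maximum belief among slots whose importance
    exceeds the threshold (the "X" of the quoted passage), or None."""
    eligible = [s for s in beliefs if importance.get(s, 0.0) > threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda s: slot_belief(beliefs[s]))
```

With a hypothetical threshold of 0.5, only the "price" slot is eligible, so it is selected even though the "area" slot carries the higher belief. This reflects the reading above that both measures jointly determine the selected action.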
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Papangelis to have determined a probability that a candidate action would satisfy the most probable discerned intentions of the user, to have proposed the candidate actions based upon this probabilistic assessment, and to have included this assessment in a displayed natural language list of potential actions. The modification would have been obvious because one of ordinary skill would have been motivated to use a probabilistic measure of the likelihood that a candidate task may satisfy a user requirement in order to facilitate the adaptive optimization of the domain-specific and user-dependent policy model and to use a probabilistic metric to represent the prioritization of actions ([0064, 0068]).
However, Choi and Papangelis do not teach … by a second ML model and … the natural language narrative comprising a plurality of graphical symbols associated with a plurality of natural language concepts of the natural language narrative. In other words, Choi and Papangelis do not explicitly teach a second ML model that may perform additional processing or evaluation of the candidate actions that is used to generate a narrative that includes NL-based graphical symbols.
However, Hwang, in the analogous environment of using machine learning to provide guiding tasks to a user, teaches and generate, by a second ML model and based on the first ML model, the first candidate action, the second probability value, the received data from the plurality of data sources and the plurality of features, a natural language narrative describing the first candidate action and the … value for the first candidate action, … ([Figures 7, 8, and 10, 0142, 0150, 0151, 0077] In the input processing, the electronic apparatus … may process various types of user input received … voice … text … image., The plan recognition management module is a module for determining a candidate task to provide a guide., The filter management module may determine a priority between candidate tasks or remove some candidate tasks … The electronic apparatus … may then generate a guide … according to the priority…, The memory may store a model for Natural Language Generation …,   wherein the automatic generation of user guides, as shown in Figure 7, receives input from text, audio, and image data sources (from which task-related NL concepts may be derived) along with intent parameters and other results generated through the task processing module  which is used to construct a candidate set of actions which corresponds to a first ML model but is also used in the filter manager which is a second ML model which evaluates the candidate list, deleting those which are not found to be relevant and prioritizing the list and which is connected to the natural language generation output processing which may be considered also part of the second ML model or a distinct alternative second ML model, such that the result is a natural language expression/narrative of the task to be performed (i.e., relative to the extracted task-related NL concepts), which may, for instance, be a user utterance).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi and Papangelis to incorporate the teachings of Hwang to have applied a second ML model for refining a set of candidate actions to form a NL-based narrative to interactively guide the user through the actions. The modification would have been obvious because one of ordinary skill would have been motivated to improve the efficiency and user-friendliness of task completion by using an interactive intelligent assistant to guide the user through prioritized tasks using a natural language generation model to facilitate the interaction ([0002, 0009]).
However, Choi, Papangelis, and Hwang do not teach the natural language narrative comprising a plurality of graphical symbols associated with a plurality of natural language concepts of the natural language narrative. In other words, although Choi, Papangelis, and Hwang each teach the generation of a natural language narrative, none of them discloses the inclusion within that narrative of NL-concept-based graphical symbols.
However, Fehrle, in the analogous art of designing a dialog system for communicating advice, teaches the natural language narrative comprising a plurality of graphical symbols associated with a plurality of natural language concepts of the natural language narrative, ([p. 28, Section 2.2, p. 28, Section 2.3, p. 32, Section 5.1, Figure 3, Figure 4] In practice, several generators will be necessary, one to generate text, one for pictures and perhaps even one for speech., When the domain of discourse deals with real-world objects, their inter-relationships, and how to put them together, graphics showing the parts are usually a better way of communicating information to the user than, say, texts. Thus, a language generator for encoding information for presentation to the user should have a strong graphical (picture-like) component…. The picture generator accepts as input a semantic representation of an intended utterance within the domain of discourse of the application. The generator looks up in its picture lexicon the semantic symbols used in this representation. Typical entries for nouns are icons and for verbs are descriptions of movements of icons on the screen. The generator assembles the icons retrieved and designs an appropriate graphical presentation., This representation is then sent to the language generator. To a novice, the natural language message "Please open up the PC now" will be of little help; he may not realize that he should first turn off the power, unplug it and disconnect the peripheral devices. Further, he may not know how to open it up. However, such information can be encoded in a pictorial presentation in a clear manner. The lexicon contains an entry for the action open as follows: action open (IBM-PC) 1. turn off machine 2. unplug power and peripheral devices 3. unscrew casing 4. lift off casing. 
The lexicon contains an entry for each of these primitive actions., wherein a natural language narrative is generated that includes graphical symbols for the communication of advised actions such that the graphical symbols directly correspond to the meaning and content (natural language concepts) of a corresponding natural language narrative and wherein it is noted that even though the focus of this system is the generation of the graphical symbols it may be used with other language generators such as text and speech such that, in any case, the automatic language generator of selected symbols, text, or audio to represent the advised actions is a second ML model.) 
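For illustration only, the lexicon lookup described in the cited Fehrle passages (a composite action expanded into primitive actions, each presented with the icon of the object concerned) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Fehrle's implementation: the entry for "open" reproduces the four steps quoted above, while the data structures, the icon placeholder string, and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Verb entry: the composite action "open" and its primitive actions,
# as quoted from Fehrle above.
ACTION_LEXICON: Dict[str, List[str]] = {
    "open": [
        "turn off machine",
        "unplug power and peripheral devices",
        "unscrew casing",
        "lift off casing",
    ],
}

# Noun entry: a hypothetical placeholder standing in for a stored icon.
ICON_LEXICON: Dict[str, str] = {"IBM-PC": "<icon: IBM-PC>"}


def plan_presentation(action: str, obj: str) -> List[Tuple[int, str, str]]:
    """Expand an action on an object into numbered (step, primitive, icon)
    frames, pairing each natural language concept with a graphical symbol."""
    icon = ICON_LEXICON[obj]
    return [
        (number, primitive, icon)
        for number, primitive in enumerate(ACTION_LEXICON[action], start=1)
    ]
```

Calling `plan_presentation("open", "IBM-PC")` yields four frames in which each primitive action is paired with the icon for the object, consistent with the reading above that the graphical symbols correspond to the natural language concepts of the narrative.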
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi, Papangelis, and Hwang to incorporate the teachings of Fehrle to have generated a natural language narrative that includes NL-based graphical symbols. The modification would have been obvious because one of ordinary skill would have been motivated to improve the effectiveness of the communication of the content automatically generated for a user by including pictorial representations of the conceptual meaning of a natural language narrative (Fehrle, [Abstract, p. 28, Section 2.2, p. 29, Section 3.2]).

In regards to claim 2, rejection of claim 1 is incorporated and Choi further teaches … 
wherein the natural language narrative is further based on the remaining plurality of candidate actions and the remaining … values …and output the natural language narrative for display ([Figures 12, 13, 16, and 17, 0116, 0134, 0253] The device … may form the tasks selected by the user as a to-do list … and may display the to-do list … on the execution screen of the display., Regarding “Book Hotel”, the device … may provide a hotel list including hotel information arranged in a descending order of ratings compared to price, by reflecting the intention that the user (I) and the other user … want a clean hotel … The device may generate the hotel list by applying a weight to price and ratings…,  a natural language narrative describing each of the actions associated with a task is displayed on a screen such that the selection of these tasks is based upon the output of the model for discerning user’s intentions and a corresponding association of tasks/actions that may be most likely to meet the requirements with that model being derived, for example, from reinforcement learning, and wherein the output narrative also may include additional information (e.g., ratings) which is used for prioritizing the list of potential actions but which is also a metric indicative of how likely the candidate action will satisfy the user requirements.) 
However, Choi does not explicitly teach … and the remaining probability values … receive, by the second ML model, the first ML model, the plurality of candidate actions, the computed probability values, and the received data from the plurality of data sources, the plurality of data sources comprising microphones, cameras, and computing devices…. In other words, Choi does not explicitly teach a second ML model that may perform additional processing or evaluation of the candidate actions that is used to generate a narrative. In addition, Choi does not explicitly teach a “probability value” that is associated with the expected efficacy of any given action/task. 
However, Papangelis, in the analogous environment of a dialogue system for providing candidate actions, teaches … and the remaining probability values …([0094, 0095, 0110, 0152] The belief with respect to a slot s may be the set of probabilities that the slot has each possible value. For example, for the slot “price”, the values and probabilities may be: [empty:0.15, cheap: 0.35, moderate: 0.1, expensive: 0.4]. These probabilities are updated by the tracker model at each time slot t based on the new input utterance., It may comprise joint beliefs, which are probability distributions over the values of more than one slot (e.g. price and location (the probability that the user said both “cheap restaurant” and “centre of town”)., The full actions each comprise an action function … and may also comprise one or more categories and one or more values (e.g., price=high)., The belief with respect to a slot s may be the set of probabilities that the slot has each possible value., The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement).,  wherein a set of candidate actions in the form of action slots, for example, of proposed restaurants, is generated based upon two probabilities: (1) the probabilistic belief measure being maximum (i.e., a belief measure that exceeds the second highest belief measure) and (2)  the importance likelihood exceeding “X” which measures the likelihood that the action slot will satisfy the user’s requirements in the sense of being likely to occur.) 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Papangelis to have determined a probability that a candidate action would satisfy the most probable discerned intentions of the user, to have proposed the candidate actions based upon this probabilistic assessment, and to have included this assessment in a displayed list of potential actions. The modification would have been obvious because one of ordinary skill would have been motivated to use a probabilistic measure for the likelihood that a candidate task may satisfy a user requirement to facilitate the adaptive optimization of the domain-specific and user-dependent policy model and to use a probabilistic metric to represent the prioritization of actions ([0064, 0068]).
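For illustration only, the selection rule quoted from Papangelis above (“inform(slot), for slot with maximum belief and importance greater than X”) may be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of Papangelis: the function names, the “area” slot, the importance values, the threshold, and the choice to score a slot by its most probable non-empty value are all hypothetical. Only the “price” belief values are taken from the quoted passage.

```python
# Illustrative sketch only. Everything except the "price" belief values is a
# hypothetical assumption and does not appear in Papangelis.
from typing import Dict, Optional

Belief = Dict[str, Dict[str, float]]


def slot_belief(distribution: Dict[str, float]) -> float:
    """Score a slot by its most probable non-empty value (assumed scoring rule)."""
    return max(p for value, p in distribution.items() if value != "empty")


def select_inform_slot(
    beliefs: Belief, importance: Dict[str, float], x: float
) -> Optional[str]:
    """Return the slot with maximum belief among slots whose importance exceeds x."""
    eligible = [slot for slot in beliefs if importance.get(slot, 0.0) > x]
    if not eligible:
        return None
    return max(eligible, key=lambda slot: slot_belief(beliefs[slot]))


beliefs: Belief = {
    # Values quoted from Papangelis for the slot "price".
    "price": {"empty": 0.15, "cheap": 0.35, "moderate": 0.1, "expensive": 0.4},
    # Hypothetical second slot, added only so that there is something to compare.
    "area": {"empty": 0.5, "centre": 0.3, "north": 0.2},
}
importance = {"price": 0.9, "area": 0.6}  # hypothetical importance values

chosen = select_inform_slot(beliefs, importance, x=0.5)
print(f"inform({chosen})")
```

Under these assumed values, both slots pass the importance threshold and the slot with the larger belief is selected; raising the threshold above every importance value leaves no eligible slot.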
However, Choi and Papangelis do not teach receive, by the second ML model, the first ML model, the plurality of candidate actions, the computed probability values, and the received data from the plurality of data sources, the plurality of data sources comprising microphones, cameras, and computing devices…. In other words, Choi and Papangelis do not explicitly teach a second ML model that may perform additional processing or evaluation of the candidate actions that is used to generate a narrative.
However, Hwang, in the analogous environment of using machine learning to provide guiding tasks to a user, teaches receive, by the second ML model, the first ML model, the plurality of candidate actions, the computed … values, and the received data from the plurality of data sources, the plurality of data sources comprising microphones, cameras, and computing devices; ([Figures 7, 8, and 10, 0142, 0150, 0151, 0077] In the input processing, the electronic apparatus … may process various types of user input received … voice … text … image., The plan recognition management module is a module for determining a candidate task to provide a guide., The filter management module may determine a priority between candidate tasks or remove some candidate tasks … The electronic apparatus … may then generate a guide … according to the priority…, The memory may store a model for Natural Language Generation …,   wherein the automatic generation of user guides, as shown in Figure 7, receives input from text, audio, and image data sources along with intent parameters and other results generated through the task processing module  which is used to construct a candidate set of actions which corresponds to a first ML model but is also fed into the filter manager which is a second ML model which is also connected to the natural language generation output processing which may be considered also part of the second ML model or a distinct alternative second ML model .) 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi and Papangelis to incorporate the teachings of Hwang to have applied a second ML model for refining a set of candidate actions to form a narrative to interactively guide the user through the actions. The modification would have been obvious because one of ordinary skill would have been motivated to improve the efficiency and user-friendliness of task completion by using an interactive intelligent assistant to guide the user through prioritized tasks using a natural language generation model to facilitate the interaction ([0002, 0009]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi, Papangelis, and Hwang to incorporate the teachings of Fehrle for the same reasons as pointed out for claim 1.

In regards to claim 3, rejection of claim 2 is incorporated and Choi further teaches the memory storing instructions which when executed by the processor circuit cause the processor circuit to: generate a text transcription of speech in the audio data; extract the natural language concepts from the text transcription to extract the natural language concepts from the audio data; ([0072, 0084] Alternatively, when the application executed on the display is a call application through the device 1000, the device 1000 may convert call voice into text by using speech-to-text (STT) conversion technology in real time, may analyze the text, and may determine keywords used to determine the tasks …. The device 1000 may convert the recorded call voice into text, may analyze the text, and may determine a keyword., The device 1000 may convert the recorded call voice into text, and may analyze based on the text that the user has to book a flight to Japan on vacation and has to buy a present for Susan. Accordingly, the device 1000 may determine “Japan”, “Present”, “Vacation”, “Flight”, and “Book” as the keywords 820, based on the analysis.; wherein a textual transcript of audio/speech/voice data is created through STT conversion and wherein the resultant text is analyzed to extract/recognize keywords pertinent for a to-do list (i.e., information/natural language concepts/semantics relevant for the to-do list are found in text transcription).)
However, Choi, Papangelis, and Hwang do not explicitly disclose determine, by the second ML model for each of the plurality of natural language concepts of the natural language narrative, an associated graphical symbol of the plurality of graphical symbols; select, by the second ML model, a first graphical symbol of the plurality of graphical symbols; and assign the first graphical symbol to the first candidate action. Although each of Choi, Papangelis, and Hwang generates a dialogue/natural language narrative, with Papangelis and Hwang in particular making use of natural language generation methods to perform that function, none of them explicitly discloses the determination of natural language concepts using a model or, as previously noted, the use of NL concept-based graphical symbols in the narrative.
However, Fehrle, in the analogous art of designing a dialog system for communicating advice, teaches determine, by the second ML model for each natural language concept, an associated graphical symbol of the plurality of graphical symbols; ([p. 28, Section 2.3, p. 30, Section 4.1, p. 32, Section 5.2.1, Figure 2, Figure 3] The picture generator accepts as input a semantic representation of an intended utterance within the domain of discourse of the application. The generator looks up in its picture lexicon the semantic symbols used in this representation., Given a semantic representation of a message, the following steps must be carried out: 1. Determine, by looking up the necessary information in the lexicon, the sequence of CGS's and primitive actions which must be portrayed on the screen., The lexicon contains, among primitive graphical symbols, various views of the IBM-PC, a screwdriver and a hand (see Figure 5.2.1). The IBM-PC's back view has the annotation that the places where the casing may be unscrewed are in the four corners and the upper middle. Furthermore, the screwdriver has an annotation that if it is being used, a hand must hold it on its handle., wherein the language generator (second ML model) determines an association between parsed content and meaning of an intended semantic utterance and particular corresponding graphical symbols such that NL concepts like verb or noun have distinct symbolic structure and concepts associated with the semantic meanings also are associated with particular symbolic structures (or sequence or movement of symbols).) select, by the second ML model, a first graphical symbol of the plurality of graphical symbols; ([p. 30, Section 4.1, p. 33, Section 5.3, Figure 2, Figure 3] Given a semantic representation of a message, the following steps must be carried out: 1. 
Determine, by looking up the necessary information in the lexicon, the sequence of CGS's and primitive actions which must be portrayed on the screen., The generator assembles a presentation by merging the instructions for the four simple operations and co-ordinating the transitions between the low-level primitives. The IBM-PC is considered the most important object and is thus placed in the center of the screen. The sequence of pictures which is generated is too long to show in its entirety here; instead we show two snapshots as examples., wherein symbols are selected according to their significance in the communication of the information signal in both time and space.) and assign the first graphical symbol to the first candidate action ([p. 33, Section 5.3,  Figure 4] Initially the PC is shown from the front. The user must turn around the PC to switch it off; Figure 5.3a shows the front view of the PC with the arrow (flashing in the actual presentation) indicating that it is to be turned., wherein particular graphical symbols are assigned to specific actions (first candidate action) in order to guide the recipient to perform those actions.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi, Papangelis, and Hwang to incorporate the teachings of Fehrle to have generated a natural language narrative using a model that identifies NL concepts, determines graphical symbols according to those concepts, and assigns each symbol to a candidate action. The modification would have been obvious because one of ordinary skill would have been motivated to improve the effectiveness of the communication of the content automatically generated for a user by including pictorial representations of the conceptual meaning of a natural language narrative that communicates a guide for actions by the recipient of that narrative  (Fehrle, [Abstract, p. 28, Section 2.2, p. 29, Section 3.2]).
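For illustration only, the lexicon lookup quoted from Fehrle above (the entry for the action open on the IBM-PC, expanded into four primitive actions, with icons retrieved for nouns) may be sketched as follows. This is a minimal sketch under stated assumptions and not Fehrle's implementation: the dictionary layout, the function name, and the icon identifiers are hypothetical placeholders. Only the four primitive action names and the listed graphical symbols (IBM-PC, screwdriver, hand) come from the quoted passages.

```python
# Illustrative sketch only. The data layout, function name, and icon
# identifiers are hypothetical; the primitive action names are quoted from Fehrle.
from typing import Dict, List, Tuple

# Action lexicon: a high-level action on an object expands to primitive actions.
ACTION_LEXICON: Dict[Tuple[str, str], List[str]] = {
    ("open", "IBM-PC"): [
        "turn off machine",
        "unplug power and peripheral devices",
        "unscrew casing",
        "lift off casing",
    ],
}

# Picture lexicon: nouns map to icons (placeholder identifiers).
ICON_LEXICON: Dict[str, str] = {
    "IBM-PC": "icon:pc",
    "screwdriver": "icon:screwdriver",
    "hand": "icon:hand",
}


def plan_presentation(action: str, obj: str) -> List[Tuple[int, str, str]]:
    """Expand an action into ordered steps, each paired with the object's icon."""
    steps = ACTION_LEXICON[(action, obj)]
    icon = ICON_LEXICON[obj]
    return [(index, step, icon) for index, step in enumerate(steps, start=1)]


frames = plan_presentation("open", "IBM-PC")
for index, step, icon in frames:
    print(index, step, icon)
```

The sketch pairs every step with a single icon for simplicity; the quoted passages describe richer behavior, such as alternative views of the IBM-PC and annotations on the screwdriver, which are not modeled here.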

In regards to claim 4, rejection of claim 1 is incorporated and Choi further teaches wherein the first ML model comprises an artificial neural network, the memory storing instructions which when executed by the processor circuit cause the processor circuit to: receive the training data comprising a plurality of training actions, at least a subset of the plurality of training actions comprising sequential actions; ([0118, 0250, 0251, 0253] When “Japan Trip”, “Book Flight”, “Book Hotel B”, “Take Passport Photo”, “submit Vacation Request”, “Buy Present for Susan”, and “Go to Festival XX” are determined as tasks … the device … may classify the tasks … into a task … before departure and a task … after arrival., The data recognition model may be a model based on , for example , a neural network., The basic learning data may be previously classified according to types of data, and the data recognition models may be previously established according to the types of data.,  The model learner may train the data recognition model through supervised learning by using, for example, learning data as an input value. 
Also the model learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation… The model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute tasks … is right., wherein various sources of training data are used to train the model learner/data recognition model including the training of various functional aspects of the to-do list generation process such as the determination of the candidate tasks such that this process may include any data collected during previous learning model construction events and wherein it is noted that this may also include sequences of actions because of the functionality for splitting apart a candidate task into sequential sub-tasks or actions such as corresponding to “before departure” and “after arrival” and wherein the model framework includes a neural network.) receive the labels for the training data, the labels comprising values indicating whether the associated training action achieves the plurality of tasks ([0242, 0243, 0251, 0253, 0256] The data learner … may train a data recognition model to have a standard about how to judge the user’s intention and how to determine candidate tasks…, The basic learning data may be previously classified according to types of data, and the data recognition models may be previously established according to the types of data., The model learner may train the data recognition model through supervised learning by using, for example, learning data as an input value. 
Also the model learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation… The model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute tasks … is right.; wherein the learning data is used for the training of various functional aspects of the to-do list generation process so that they conform to a standard of correctness, such that one such functionality is the determination of the candidate tasks with the training data used to learn this determination based upon a reference of correctness that may be based either upon a feedback mechanism such as in reinforcement learning or based upon the training data in a supervised learning context and wherein the feedback about whether a result is correct is a label that is used in the training process.) and train the first ML model based on the training data, the labels, and a ML algorithm ([0252, 0253, 0256] The model learner … may train the data recognition model by using a learning algorithm including… error back-propagation …., The model learner may train the data recognition model through supervised learning by using, for example, learning data as an input value. 
Also the model learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation… The model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute tasks … is right.; wherein, using the learning data, various functional aspects of the to-do list generation process are trained so that they conform to a standard of correctness, such that one such functionality is the determination of the candidate tasks with the training data used to learn this determination based upon a reference of correctness that may be based either upon a feedback mechanism such as in reinforcement learning or based upon the training data in a supervised learning context, wherein the ML  algorithm used to perform this function may be any of back propagation, reinforcement learning, supervised learning, or unsupervised learning, and wherein the feedback about whether a result is correct is a label that is used in the training process.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Papangelis, Hwang, and Fehrle for the same reasons as pointed out in claim 1.

In regards to claim 5, rejection of claim 1 is incorporated and Choi further teaches the memory storing instructions which when executed by the processor circuit cause the processor circuit to: receive data from the plurality of data sources at a second time step, the second time step subsequent to the first time step; ([Figures 4, 6, and 15, 0084, 0088, 0122, 0128] The device … may record call voice …. Accordingly, the device may determine “Japan”, “Present”, “Vacation”, “Flight”, and “Book” as the keywords …, The user may additionally insert “Hotel” and “Passport Photo” into the keyword list…., FIG. 15 is a flowchart of a method by which the device 1000 provides a task that may replace a task that is previously determined., The device 1000 may obtain information saying “Today the market A is closed” from the service providing server; wherein the various sources of image, textual, and audio data are monitored for the purpose of analyzing the data to extract keywords that may be associated with a candidate task and wherein this may be performed over different time periods, leading potentially to a different set of tasks/actions for inclusion in the to-do list but wherein each of the sets of information is processed using the same system functional components such that the time associated with a second set of tasks takes place after the time associated with a first set of tasks with the tasks being organized and updated over time.) 
determine a second task of the plurality of tasks ([0099, 0118] The device may classify the determined plurality of candidate tasks according to schedules of the user …, The device … may display the task … before departure and the task … after arrival ….; wherein, second tasks may be determined based upon different times in which they are required to be executing, leading, for instance, to a first main task and a second main task, each of which may consist of sub-tasks or actions but wherein any of these sub-tasks or main tasks is evaluated using the same system functional components.); determine a plurality of candidate actions for the second time step; ([0096, 0099, Figures 22, 23, 31, 32, 33, 34] The device … may determine one or more candidate tasks for the first item based on the selected first keyword., The device … may group tasks belonging to the same schedule, and may generate a candidate task list of each group…, wherein a plurality of candidate tasks are generated according to keywords which may include a set of tasks corresponding to a single item/keyword as well as a single set of tasks associated with a common schedule such that the time step may be completion time associated with that set of tasks)  compute, by the first ML model based on the data received at the second time step, a respective … value for each candidate action for the second time step, the probability values reflecting the probability that the associated candidate action for the second time step achieves the second task at the second time step; ([0223, 0253] The data learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation by itself without supervision. 
Also, the model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute task, and judging monitoring of a task according to learning is right., …, wherein reinforcement learning (a first ML model) is used to determine the intentions of the user based upon a learned recognition of the meaning to the user of the received data, such as in the form of keywords, such that it continually evaluates through feedback the correctness of its judgement of intention and the determination of the candidate tasks and wherein this reinforcement model implicitly includes a probabilistic representation of the tasks given the interpreted user’s intentions through a Markov decision process.) determine that a second candidate action of the plurality of candidate actions for the second time step has a greater … value for achieving the second task at the second time step relative to the … values of the remaining plurality of candidate actions for the second time step; ([0242, 0253] The data learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation by itself without supervision. 
Also, the model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute task, and judging monitoring of a task according to learning is right., …, wherein the set of candidate actions/tasks are chosen based upon a model that has been trained to learn the intentions of the user based on keywords such that the set of candidate actions/tasks have been chosen because they best represent the user’s intention and so are most likely to achieve a first task, such as determined by a first item or keyword, and wherein these candidate tasks are inherently based on a probabilistic representation of the user’s intention relative to a keyword through the reinforcement learning process.) determine that the second candidate action has not been implemented in the environment at the second time step; ([0149] When it is determined as a result of the monitoring that a task in the to-do list has not been performed, the device … may output information indicating that the first task has not been performed…, wherein, once a monitoring operation determines that any given task (first or second) has not been completed by a predetermined time, an output is produced to indicate this fact.) generate the policy comprising the first and second candidate actions; and generate an indication specifying to implement the second candidate action at the second time step as part of the policy to achieve the second task. 
([Figures 12, 13, 16, and 17, 0116, 0118] The device … may form the tasks selected by the user as a to-do list … and may display the to-do list … on the execution screen of the display., When “Japan Trip”, “Book Flight”, “Book Hotel B”, “Take Passport Photo”, “submit Vacation Request”, “Buy Present for Susan”, and “Go to Festival XX” are determined as tasks … the device … may classify the tasks … into a task … before departure and a task … after arrival.; wherein the set of candidate tasks associated with the second main task (after arrival) in “Japan Trip”, are determined and displayed to the user such that each of the candidate tasks is a candidate action that is part of a policy that must be implemented.) 
Although Choi may use a number of different machine learning techniques for discerning the intentions of the user and the candidate actions best suited for satisfying the user’s intentions, including reinforcement learning, which would implicitly map the discernment operation relative to the actions to a probabilistic framework, Choi does not explicitly describe a probabilistic scoring of the candidate actions.
However, Papangelis, in the analogous environment of a dialogue system for providing candidate actions, teaches compute, by the first ML model based on the data received at the second time step, a respective third probability value for each candidate action for the second time step, the third probability values reflecting the probability that the associated candidate action for the second time step achieves the second task at the second time step ([Figures 6 and 7, 0049, 0050, 0094, 0152, 0169, 0247, 0260] The system states may be obtained from a corpus of data and the training performed using labels associated with the data …, The belief with respect to a slot s may be the set of probabilities that the slot has each possible value… The belief tracker model is a stored trained model that maps the input utterance to slot values, and updates the probabilities accordingly., The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement)., The labels indicating the current domain are used to determine the success of this identification, allowing the model to learn., Dialogue policy optimization can be solved via Reinforcement Learning., wherein a set of candidate actions is generated based upon a probabilistic belief measure which corresponds to the probability of a candidate action being associated with/relevant to the expressed intent of the user which is a probability that the action response will achieve the satisfaction of the user’s intentions, and wherein this evaluation is performed during each time slot between user utterances.) 
determine that a second candidate action of the plurality of candidate actions for the second time step has a greater third probability value for achieving the second task at the second time step relative to the third probability values of the remaining plurality of candidate actions for the second time step ([0094, 0095, 0110, 0152] The belief with respect to a slot s may be the set of probabilities that the slot has each possible value. For example, for the slot “price”, the values and probabilities may be: [empty:0.15, cheap: 0.35, moderate: 0.1, expensive: 0.4]. These probabilities are updated by the tracker model at each time slot t based on the new input utterance., It may comprise joint beliefs, which are probability distributions over the values of more than one slot (e.g. price and location (the probability that the user said both “cheap restaurant” and “centre of town”)., The full actions each comprise an action function … and may also comprise one or more categories and one or more values (e.g., price=high)., The belief with respect to a slot s may be the set of probabilities that the slot has each possible value., The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement).,  wherein a set of candidate actions in the form of action slots, for example, of proposed restaurants, is generated based upon the probabilistic belief measure (joint or single) being maximum (i.e., a belief measure that exceeds the second highest belief measure).) 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Papangelis to have determined a probability that a candidate action would satisfy the most probable relevant discerned intentions of the user and to have proposed the candidate actions based upon this probabilistic assessment. The modification would have been obvious because one of ordinary skill would have been motivated to use a probabilistic measure for the likelihood that a candidate task may satisfy a user requirement to facilitate the adaptive optimization of the domain-specific and user-dependent policy model ([0064, 0068]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi and Papangelis to incorporate the teachings of  Hwang and Fehrle for the same reasons as pointed out in claim 1.
 
In regards to claim 6, rejection of claim 1 is incorporated and Choi further teaches the memory storing instructions which when executed by the processor circuit cause the processor circuit to: receive, by the first ML model, a reward value for the first time step, …, wherein the first ML model further determines the … values based on the reward value for the first time step.  ([0253, 0258] Also, the model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute task, and judging monitoring of a task according to learning is right., …, wherein the set of candidate tasks are chosen based upon an ML model that has been trained using reinforcement learning to learn the best candidate action selection process based on a feedback about the efficacy of any choice of candidate actions as an indication of the correctness of the candidate task determination process and wherein the reinforcement learning feedback that is generated may be interpreted as a reward such that this reward then determines the probabilistic states of the reinforcement learning model and wherein the effect of implementing the reinforcement learning is to improve the efficacy of the candidate action selection process.) 
However, Choi does not explicitly teach the reward value lesser than a reward value for a time step prior to the first time step,… first and second probability values based on the reward value … compute, based on the first ML model, a respective second probability value for each remaining candidate action of the plurality of candidate actions; and determine that the second probability value for the first candidate action is greater than the second probability values of the remaining plurality of candidate actions. Although the reinforcement learning algorithm inherently computes a reward as a feedback, Choi does not explicitly relate that reward or feedback to a probability value characterizing the efficacy of the candidate action selection or that the reward may be less at a given time relative to a previous time.
However, Papangelis, in the analogous environment of a dialogue system for providing candidate actions, teaches receive, by the first ML model, a reward value for the first time step, the reward value lesser than a reward value for a time step prior to the first time step, wherein the first ML model further determines the first and second probability values based on the reward value for the first time step ([0094, 0152, 0229, 0231, 0255, 0259, 0260] The belief with respect to a slot s may be the set of probabilities that the slot has each possible value., The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement)., For each slot … the cumulative moving average of the ratio of the reward that was achieved to the maximum possible reward is determined as an estimate of slot importance., Dialogue policy optimisation may be aimed at estimating the expected long-term reward for a system action being executed at a system state or belief state, such that the action with maximum expected reward can be selected at each dialogue turn., When learning or updating the values during the implementation, the system is able to adapt to potential changes as time progresses., Dialogue policy optimization can be solved via Reinforcement Learning … A reward function assigns a reward r given the current state and action taken …., wherein reinforcement learning is used to optimize an ML policy model such that the slot with an action is selected according to the highest long-term reward (first probability) with this selection also being predicated upon the importance likelihood (second probability) which 
is computed from the rewards observed in response to an action, wherein this ML learning process adapts itself by adjusting the measure of importance, and wherein this adaptation would occur as the rewards associated with a particular task diminish over time (i.e., a reward for the first time step is less than the reward for a previous time step) relative to some other task, thereby changing the likelihood of its importance.) compute, based on the first ML model, a respective second probability value for each remaining candidate action of the plurality of candidate actions; ([0152, 0165] The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement)., Importance, e.g. two parameters describing respectively how likely a slot will and will not occur in a dialogue., wherein the importance measure, as an indicator of the likelihood that a slot will actually occur, may also be interpreted as a second probability value that reflects the likelihood of an action taking place, and wherein this computation is performed for all candidate actions/slots including those for which the belief may be the maximum.) and determine that the second probability value for the first candidate action is greater than the second probability values of the remaining plurality of candidate actions. ([0152, 0196, 0165, 0183, 0234] The policy model effectively solves generic information-seeking problems. 
This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement)., Importance, e.g. two parameters describing respectively how likely a slot will and will not occur in a dialogue., In addition, there may be a set of (one-dimensional) values, each indicating the probability of a slot being requested by a user … b_r_s denotes the belief probabilities for slot s being requested., The output from the policy model is an action comprising a communication function (e.g. “select”) and one or more parameter values which describe a slot (e.g. the slot with 5 values and high importance)., wherein the importance measure, as an indicator of the likelihood that a slot will actually occur, is being interpreted as a second probability value such that the selection of an action is based upon a slot not only with maximum belief/relevance (first probability) but also sufficiently high importance and wherein this threshold of importance is being used to exclude a maximum belief candidate that has less than a requisite level of importance but which also will instantiate the maximum belief candidate if its importance is high in a relative sense (that is higher than other candidates which is interpreted to mean those that populate the set of remaining candidates)). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Papangelis to have used reinforcement learning to determine a probability that a candidate action would satisfy the most probable discerned intentions based upon the reward associated with that candidate action and to shift the relative importance of different candidate actions adaptively as the relative normalized rewards change over time. The modification would have been obvious because one of ordinary skill would have been motivated to use a probabilistic measure for the likelihood that a candidate task may satisfy a user requirement to facilitate the adaptive optimization of the domain-specific and user-dependent policy model ([0064, 0068, 0229]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi and Papangelis to incorporate the teachings of  Hwang and Fehrle for the same reasons as pointed out in claim 1.

In regards to claim 7, rejection of claim 1 is incorporated and Choi further teaches wherein the first task is determined based on one or more of user input and the first ML model, wherein the first candidate action comprises one or more of: (i) modifying the real-world environment ([Figure 20] wherein the task of “buying milk at the market” entails a modification of the real-world environment, namely the acquisition of a physical object.) and (ii) performing an operation in the real-world environment ([Figure 6] wherein the task of “book a flight” entails an operation in the real-world environment.)
 It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Papangelis, Hwang, and Fehrle for the same reasons as pointed out in claim 1.

Claim 8 is also rejected because it is just a computer readable medium implementation of the same subject matter of claim 1 which can be found in Choi and Papangelis, Hwang, and Fehrle.

In regards to claim 9, rejection of claim 8 is incorporated and Choi further teaches further storing instructions that when executed by the processor cause the processor to: receive data from the plurality of data sources at a second time step, the second time step subsequent to the first time step; ([Figures 4 and 6, 0084, 0088, 0156] The device … may record call voice …. Accordingly, the device may determine “Japan”, “Present”, “Vacation”, “Flight”, and “Book” as the keywords …, The user may additionally insert “Hotel” and “Passport Photo” into the keyword list…, The device … may monitor the data indicating the location information of the user from the GPS application … in real time.; wherein the various sources of image, textual, and audio data are monitored for the purpose of analyzing them to extract keywords that may be associated with a candidate task and wherein this may be performed over different time periods or in real time, leading potentially to a different set of tasks/actions for inclusion in the to-do list or monitoring of user actions relative to a previously established task but wherein each of the sets of information is processed using the same system functional components.) 
determine a plurality of candidate actions for the second time step; ([0099, 0118, 0156, 0128, Figure 20] The device may classify the determined plurality of candidate tasks according to schedules of the user …, The device … may display the task … before departure and the task … after arrival …, When the device determines that information indicating the location information indicates a location of a market, the device … may output notification information about the task “Buy Milk”…, The device may determine a substitute task … The device … may obtain information about the market B near the user’s home from the service providing server or the application … and may determine “Go Shopping at Market B near Home” as a substitute task.; wherein, second tasks may be determined based upon different times in which they are required to be executing, leading, for instance, to a first main task and a second main task, each of which may consist of sub-tasks or actions, to the determination of a task such as “Buy Milk” which corresponds to an earlier determined task, or to the determination of a substitute task/action such as to shop at an alternative market but wherein any of these sub-tasks or main tasks is evaluated using the same system functional components.); compute, by the first ML model based on the data received at the second time step, a respective … value for each candidate action for the second time step, the probability values reflecting the probability that the associated candidate action achieves a second task at the second time step; ([0223, 0253] The data learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation by itself without supervision. 
Also, the model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute task, and judging monitoring of a task according to learning is right., …, wherein reinforcement learning (a first ML model) is used to determine the intentions of the user based upon a learned recognition of the meaning to the user of the received data, such as in the form of keywords, such that it continually evaluates through feedback the correctness of its judgement of intention and the determination of the candidate tasks, wherein this reinforcement model implicitly includes a probabilistic representation of the tasks given the interpreted user’s intentions through a Markov decision process, and wherein this functionality applies to any action/task that is identified as being relevant in a current time frame, such as “Buy Milk” even if it may correspond to an earlier identified task/action, or to a substitute/second task if another task is not possible, for instance to shop at market B instead of market A if market A is closed at a given time.) determine that a second candidate action of the plurality of candidate actions for the second time step has a greater … value for achieving the first task at the second time step relative to the … values of the remaining plurality of candidate actions for the second time step; ([0242, 0253] The data learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation by itself without supervision. 
Also, the model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute task, and judging monitoring of a task according to learning is right., …, wherein the set of candidate actions/tasks, in any time step, is chosen based upon a model that has been trained to learn the intentions of the user based on keywords such that the set of candidate actions/tasks has been chosen because they best represent the user’s intention and so are most likely to achieve a first task, such as determined by an item or keyword, and wherein these candidate tasks are inherently based on a probabilistic representation of the user’s intention relative to a keyword through the reinforcement learning process.) determine that the second candidate action has not been implemented in the environment at the second time step; ([0149] When it is determined as a result of the monitoring that a task in the to-do list has not been performed, the device … may output information indicating that the first task has not been performed…, wherein, once a monitoring operation determines that any given task/action (first or second) has not been completed by a predetermined time, an output is produced to indicate this fact.) generate the policy comprising the first and second candidate actions; and generate an indication specifying to implement the second candidate action at the second time step as part of the policy to achieve the first task.  ([0242, 0253, Figure 20] The data learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation by itself without supervision. 
Also, the model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute task, and judging monitoring of a task according to learning is right., …, wherein the set of candidate action/tasks are chosen based upon a model that has been trained to learn the intentions of the user based on keywords such that the set of candidate actions/tasks have been chosen because they best represent the user’s intention and so are most likely to achieve a first task, such as determined by a first item or keyword, and wherein these candidate tasks are inherently based on a probabilistic representation of the user’s intention relative to a keyword through the reinforcement learning process and wherein the first task that is achieved may be the act of going to a supermarket with the different candidate actions corresponding to a primary action (supermarket A) and a substitute action (supermarket B).)
Although Choi may use a number of different machine learning techniques for discerning the intentions of the user and the candidate actions best suited for satisfying the user’s intentions, including reinforcement learning which would implicitly map the discernment operation relative to the actions to a probabilistic framework, Choi does not explicitly describe a probabilistic scoring of the candidate actions.
However, Papangelis, in the analogous environment of a dialogue system for providing candidate actions, teaches compute, by the first ML model based on the data received at the second time step, a respective third probability value for each candidate action for the second time step, the third probability values reflecting the probability that the associated candidate action achieves the first task at the second time step ([Figures 6 and 7, 0049, 0050, 0094, 0152, 0169, 0247, 0260] The system states may be obtained from a corpus of data and the training performed using labels associated with the data …, The belief with respect to a slot s may be the set of probabilities that the slot has each possible value… The belief tracker model is a stored trained model that maps the input utterance to slot values, and updates the probabilities accordingly., The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement)., The labels indicating the current domain are used to determine the success of this identification, allowing the model to learn., Dialogue policy optimization can be solved via Reinforcement Learning., wherein a set of candidate actions is generated based upon a probabilistic belief measure which corresponds to the probability of a candidate action being associated with/relevant to the expressed intent of the user which is a probability that the action response will achieve the satisfaction of the user’s intentions, and wherein this evaluation is performed during each time slot between user utterances.) 
determine that a second candidate action of the plurality of candidate actions for the second time step has a greater probability value for achieving the first task at the second time step relative to the probability values of the remaining plurality of candidate actions for the second time step ([0094, 0095, 0110, 0152] The belief with respect to a slot s may be the set of probabilities that the slot has each possible value. For example, for the slot “price”, the values and probabilities may be: [empty:0.15, cheap: 0.35, moderate: 0.1, expensive: 0.4]. These probabilities are updated by the tracker model at each time slot t based on the new input utterance., It may comprise joint beliefs, which are probability distributions over the values of more than one slot (e.g. price and location (the probability that the user said both “cheap restaurant” and “centre of town”)., The full actions each comprise an action function … and may also comprise one or more categories and one or more values (e.g., price=high)., The belief with respect to a slot s may be the set of probabilities that the slot has each possible value., The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement).,  wherein a set of candidate actions in the form of action slots, for example, of proposed restaurants, is generated based upon the probabilistic belief measure (joint or single) being maximum (i.e., a belief measure that exceeds the second highest belief measure).) 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Papangelis to have determined a probability that a candidate action would satisfy the most probable discerned intentions of the user and to have proposed the candidate actions based upon this probabilistic assessment. The modification would have been obvious because one of ordinary skill would have been motivated to use a probabilistic measure for the likelihood that a candidate task may satisfy a user requirement to facilitate the adaptive optimization of the domain-specific and user-dependent policy model ([0064, 0068]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi and Papangelis to incorporate the teachings of  Hwang and Fehrle for the same reasons as pointed out in claim 1.

In regards to claim 10, rejection of claim 9 is incorporated and Choi further teaches … 
wherein the natural language narrative is further based on the remaining plurality of candidate actions and the remaining … values …and output the natural language narrative for display ([Figures 12, 13, 16, and 17, 0116, 0134, 0253] The device … may form the tasks selected by the user as a to-do list … and may display the to-do list … on the execution screen of the display., Regarding “Book Hotel”, the device … may provide a hotel list including hotel information arranged in a descending order of ratings compared to price, by reflecting the intention that the user (I) and the other user … want a clean hotel … The device may generate the hotel list by applying a weight to price and ratings…,  a natural language narrative describing each of the actions associated with a task is displayed on a screen such that the selection of these tasks is based upon the output of the model for discerning user’s intentions and a corresponding association of tasks/actions that may be most likely to meet the requirements with that model being derived, for example, from reinforcement learning, and wherein the output narrative also may include additional information (e.g., ratings) which is used for prioritizing the list of potential actions but which is also a metric indicative of how likely the candidate action will satisfy the user requirements.) 
However, Choi does not explicitly teach … and the remaining probability values … receive, by the second ML model, the first ML model, the plurality of candidate actions, the computed probability values, and the received data from the plurality of data sources, the plurality of data sources comprising microphones, cameras, and computing devices…. In other words, Choi does not explicitly teach a second ML model that may perform additional processing or evaluation of the candidate actions that is used to generate a narrative. In addition, Choi does not explicitly teach a “probability value” that is associated with the expected efficacy of any given action/task. 
However, Papangelis, in the analogous environment of a dialogue system for providing candidate actions, teaches … and the remaining probability values …([0094, 0095, 0110, 0152] The belief with respect to a slot s may be the set of probabilities that the slot has each possible value. For example, for the slot “price”, the values and probabilities may be: [empty:0.15, cheap: 0.35, moderate: 0.1, expensive: 0.4]. These probabilities are updated by the tracker model at each time slot t based on the new input utterance., It may comprise joint beliefs, which are probability distributions over the values of more than one slot (e.g. price and location (the probability that the user said both “cheap restaurant” and “centre of town”)., The full actions each comprise an action function … and may also comprise one or more categories and one or more values (e.g., price=high)., The belief with respect to a slot s may be the set of probabilities that the slot has each possible value., The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement).,  wherein a set of candidate actions in the form of action slots, for example, of proposed restaurants, is generated based upon two probabilities: (1) the probabilistic belief measure being maximum (i.e., a belief measure that exceeds the second highest belief measure) and (2)  the importance likelihood exceeding “X” which measures the likelihood that the action slot will satisfy the user’s requirements in the sense of being likely to occur.) 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Papangelis to have determined a probability that a candidate action would satisfy the most probable discerned intentions of the user, to have proposed the candidate actions based upon this probabilistic assessment, and to have included this assessment in a displayed list of potential actions. The modification would have been obvious because one of ordinary skill would have been motivated to use a probabilistic measure for the likelihood that a candidate task may satisfy a user requirement to facilitate the adaptive optimization of the domain-specific and user-dependent policy model and to use a probabilistic metric to represent the prioritization of actions ([0064, 0068]).
However, Choi and Papangelis  do not teach receive, by the second ML model, the first ML model, the plurality of candidate actions, the computed probability values, and the received data from the plurality of data sources, the plurality of data sources comprising microphones, cameras, and computing device…. In other words, Choi and Papangelis do not explicitly teach a second ML model that may perform additional processing or evaluation of the candidate actions that is used to generate a narrative.
However, Hwang, in the analogous environment of using machine learning to provide guiding tasks to a user, teaches receive, by the second ML model, the first ML model, the plurality of candidate actions, the computed … values, and the received data from the plurality of data sources, the plurality of data sources comprising microphones, cameras, and computing devices; ([Figures 7, 8, and 10, 0142, 0150, 0151, 0077] In the input processing, the electronic apparatus … may process various types of user input received … voice … text … image., The plan recognition management module is a module for determining a candidate task to provide a guide., The filter management module may determine a priority between candidate tasks or remove some candidate tasks … The electronic apparatus … may then generate a guide … according to the priority…, The memory may store a model for Natural Language Generation …, wherein the automatic generation of user guides, as shown in Figure 7, receives input from text, audio, and image data sources along with intent parameters and other results generated through the task processing module, which are used to construct a candidate set of actions, which corresponds to a first ML model, but are also fed into the filter manager, which is a second ML model that is also connected to the natural language generation output processing, which may be considered either part of the second ML model or a distinct alternative second ML model.) 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi and Papangelis to incorporate the teachings of Hwang to have applied a second ML model for refining a set of candidate actions to form a narrative to interactively guide the user through the actions. The modification would have been obvious because one of ordinary skill would have been motivated to improve the efficiency and user-friendliness of task completion by using an interactive intelligent assistant to guide the user through prioritized tasks using a natural language generation model to facilitate the interaction ([0002, 0009]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi, Papangelis, and Hwang to incorporate the teachings of Fehrle for the same reasons as pointed out for claim 8.

In regards to claim 11, rejection of claim 10 is incorporated and Choi further teaches further storing instructions that when executed by the processor cause the processor to: generate a text transcription of speech in the audio data; extract the natural language concepts from the text transcription to extract the natural language concepts from the audio data; ([0072, 0084] Alternatively, when the application executed on the display is a call application through the device 1000, the device 1000 may convert call voice into text by using speech-to-text (STT) conversion technology in real time, may analyze the text, and may determine keywords used to determine the tasks …. The device 1000 may convert the recorded call voice into text, may analyze the text, and may determine a keyword., The device 1000 may convert the recorded call voice into text, and may analyze based on the text that the user has to book a flight to Japan on vacation and has to buy a present for Susan. Accordingly, the device 1000 may determine “Japan”, “Present”, “Vacation”, “Flight”, and “Book” as the keywords 820, based on the analysis.; wherein a textual transcript of audio/speech/voice data is created through STT conversion and wherein the resultant text is analyzed to extract/recognize keywords pertinent for a to-do list (i.e., information/natural language concepts/semantics relevant for the to-do list are found in text transcription).)
However, Choi, Papangelis, and Hwang do not explicitly disclose determine, by the second ML model, the plurality of natural language concepts; determine, by the second ML model for each natural language concept, an associated graphical symbol of the plurality of graphical symbols; select, by the second ML model, a first graphical symbol of the plurality of graphical symbols; and assign the first graphical symbol to the first candidate action.  Although each of Choi, Papangelis, and Hwang generate a dialogue/natural language narrative with Papangelis and Hwang in particular making use of natural language generation methods to perform that function, none of them explicitly discloses the determination of natural language concepts using a model as well as, as previously noted, the use of NL concept-based graphical symbols in the narrative.
However, Fehrle, in the analogous art of designing a dialog system for communicating advice, teaches determine, by the second ML model for each natural language concept, an associated graphical symbol of the plurality of graphical symbols; ([p. 28, Section 2.3, p. 30, Section 4.1, p. 32, Section 5.2.1, Figure 2, Figure 3] The picture generator accepts as input a semantic representation of an intended utterance within the domain of discourse of the application. The generator looks up in its picture lexicon the semantic symbols used in this representation.., Given a semantic representation of a message, the following steps must be carried out: 1. Determine, by looking up the necessary information in the lexicon, the sequence of CGS's and primitive actions which must be portrayed on the screen., The lexicon contains, among primitive graphical symbols, various views of the IBM-PC, a screwdriver and a hand (see Figure 5.2.1). The IBM-PC's back view has the annotation that the places where the casing may be unscrewed are in the four corners and the upper middle. Furthermore, the screwdriver has an annotation that if it is being used, a hand must hold it on its handle., wherein the language generator (second ML model) determines an association between parsed content and meaning of an intended semantic utterance and particular corresponding graphical symbols such that NL concepts like verb or noun have distinct symbolic structure and concepts associated with the semantic meanings also are associated with particular symbolic structures (or sequence or movement of symbols).) select, by the second ML model, a first graphical symbol of the plurality of graphical symbols; ([p. 30, Section 4.1, p. 33, Section 5.3,  Figure 2, Figure 3] Given a semantic representation of a message, the following steps must be carried out: 1. Determine, by looking up the necessary information in the lexicon, the sequence of CGS's and primitive actions which must be portrayed on the screen., The generator assembles a presentation by merging the instructions for the four simple operations and co-ordinating the transitions between the low-level primitives. The IBM-PC is considered the most important object and is thus placed in the center of the screen. The sequence of pictures which is generated is too long to show in its entirety here; instead we show two snapshots as examples., wherein symbols are selected according to their significance in the communication of the information signal in both time and space.) and assign the first graphical symbol to the first candidate action ([p. 33, Section 5.3,  Figure 4] Initially the PC is shown from the front. The user must turn around the PC to switch it off; Figure 5.3a shows the front view of the PC with the arrow (flashing in the actual presentation) indicating that it is to be turned., wherein particular graphical symbols are assigned to specific actions (first candidate action) in order to guide the recipient to perform those actions.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi, Papangelis, and Hwang to incorporate the teachings of Fehrle to have generated a natural language narrative using a model that identifies NL concepts, determines graphical symbols according to those concepts, and assigns each symbol to a candidate action. The modification would have been obvious because one of ordinary skill would have been motivated to improve the effectiveness of the communication of the content automatically generated for a user by including pictorial representations of the conceptual meaning of a natural language narrative that communicates a guide for actions by the recipient of that narrative  (Fehrle, [Abstract, p. 28, Section 2.2, p. 29, Section 3.2]).
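For illustration only, the lexicon lookup relied upon from Fehrle (each semantic symbol of an intended utterance is looked up in a picture lexicon to obtain its graphical symbol) may be sketched as follows. This is a minimal sketch and is not taken from Fehrle or from the application; the names `PICTURE_LEXICON` and `symbols_for`, and the symbol identifiers, are hypothetical and are introduced here only to show the concept-to-symbol association.

```python
# Hypothetical picture lexicon: semantic symbol -> graphical symbol identifier.
# The entries echo the objects named in the cited passage (IBM-PC, screwdriver, hand),
# but the identifiers themselves are assumptions made for this sketch.
PICTURE_LEXICON = {
    "ibm_pc": "pc_front_view",
    "screwdriver": "screwdriver_icon",
    "hand": "hand_icon",
}


def symbols_for(semantic_representation):
    """Pair each semantic symbol with its graphical symbol.

    Symbols that have no entry in the lexicon are skipped.
    """
    return [
        (symbol, PICTURE_LEXICON[symbol])
        for symbol in semantic_representation
        if symbol in PICTURE_LEXICON
    ]


# Example: "casing" has no lexicon entry in this sketch, so it is skipped.
message = ["hand", "screwdriver", "ibm_pc", "casing"]
pairs = symbols_for(message)
```

In this sketch the association is a plain dictionary lookup, and the order of the input symbols is preserved in the output.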

Claim 12/8 is also rejected because it is just a computer readable medium implementation of the same subject matter of claim 4/1 which can be found in Choi, Papangelis, Hwang, and Fehrle.

Claim 13/8 is also rejected because it is just a computer readable medium implementation of the same subject matter of claim 6/1 which can be found in Choi, Papangelis, Hwang, and Fehrle.

Claim 14/8 is also rejected because it is just a computer readable medium implementation of the same subject matter of claim 7/1 which can be found in Choi, Papangelis, Hwang, and Fehrle.

Claim 15 is also rejected because it is just a method implementation of the same subject matter of claim 1 which can be found in Choi, Papangelis, Hwang, and Fehrle.

Claim 16/15 is also rejected because it is just a method implementation of the same subject matter of claim 2/1 which can be found in Choi, Papangelis, Hwang, and Fehrle.

Claim 17/16 is also rejected because it is just a method implementation of the same subject matter of claim 3/1 which can be found in Choi, Papangelis, Hwang, and Fehrle.

In regards to claim 18, rejection of claim 15 is incorporated and Choi further teaches wherein the first ML model comprises an artificial neural network, the method further comprising: receiving the training data comprising a plurality of training actions, at least a subset of the plurality of training actions comprising sequential actions; ([0118, 0250, 0251, 0253] When “Japan Trip”, “Book Flight”, “Book Hotel B”, “Take Passport Photo”, “submit Vacation Request”, “Buy Present for Susan”, and “Go to Festival XX” are determined as tasks … the device … may classify the tasks … into a task … before departure and a task … after arrival., The data recognition model may be a model based on , for example , a neural network., The basic learning data may be previously classified according to types of data, and the data recognition models may be previously established according to the types of data.,  The model learner may train the data recognition model through supervised learning by using, for example, learning data as an input value. 
Also the model learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation… The model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute tasks … is right., wherein various sources of training data are used to train the model learner/data recognition model, including the training of various functional aspects of the to-do list generation process such as the determination of the candidate tasks, such that this process may include any data collected during previous learning model construction events, wherein it is noted that this may also include sequences of actions because of the functionality for splitting apart a candidate task into sequential sub-tasks or actions such as those corresponding to “before departure” and “after arrival”, and wherein the model framework includes a neural network.) receiving the labels for the training data, the labels comprising values indicating ; ([0242, 0243, 0251, 0253, 0256] The data learner … may train a data recognition model to have a standard about how to judge the user’s intention and how to determine candidate tasks…, The basic learning data may be previously classified according to types of data, and the data recognition models may be previously established according to the types of data.,  The model learner may train the data recognition model through supervised learning by using, for example, learning data as an input value. 
Also the model learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation… The model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute tasks … is right.; wherein the learning data is used for the training of various functional aspects of the to-do list generation process so that they conform to a standard of correctness, such that one such functionality is the determination of the candidate tasks with the training data used to learn this determination based upon a reference of correctness that may be based either upon a feedback mechanism such as in reinforcement learning or upon the training data in a supervised learning context, and wherein the feedback about whether a result is correct is a label that is used in the training process under either learning strategy, in which the response/feedback training data sample is associated with/achieves or is not associated with/does not achieve a given action/task.) and training the first ML model based on the training data, the labels, and a ML algorithm ([0252, 0253, 0256] The model learner … may train the data recognition model by using a learning algorithm including… error back-propagation …., The model learner may train the data recognition model through supervised learning by using, for example, learning data as an input value. 
Also the model learner … may train the data recognition model through unsupervised learning to find a standard for judging the user’s intention, determining candidate tasks, determining a substitute task, and judging monitoring of a task by learning a type of data needed to judge a situation… The model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute tasks … is right.; wherein, using the learning data, various functional aspects of the to-do list generation process are trained so that they conform to a standard of correctness, such that one such functionality is the determination of the candidate tasks with the training data used to learn this determination based upon a reference of correctness that may be based either upon a feedback mechanism such as in reinforcement learning or based upon the training data in a supervised learning context, wherein the ML  algorithm used to perform this function may be any of back propagation, reinforcement learning, supervised learning, or unsupervised learning, and wherein the feedback about whether a result is correct is a label that is used in the training process.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of  Papangelis, Hwang, and Fehrle for the same reasons as pointed out in claim 15.

Claim 19/15 is also rejected because it is just a method implementation of the same subject matter of claim 5/1 which can be found in Choi, Papangelis, Hwang, and Fehrle.

In regards to claim 20, rejection of claim 15 is incorporated and Choi further teaches wherein the first task is determined based on one or more of user input and the first ML model, wherein the first candidate action comprises one or more of: (i) modifying the real-world environment ([Figure 20] wherein the task of “buying milk at the market” entails a modification of the real-world environment, namely the acquisition of a physical object.)  and (ii) performing an operation in the real-world environment  ([Figure 6] wherein the task of “book a flight” entails an operation in the real-world environment.)   the method further comprising: receiving, by the first ML model, a reward value for the first time step, the reward value lesser than a reward value for a time step prior to the first time step, wherein the first ML model further determines the … values based on the reward value for the first time step.  ([0253, 0258] Also, the model learner … may train the data recognition model through reinforcement learning using a feedback about whether a result of judging the user’s intention, determining candidate tasks, determining substitute task, and judging monitoring of a task according to learning is right., …, wherein the set of candidate tasks are chosen based upon an ML model that has been trained using reinforcement learning to learn the best candidate action selection process based on a feedback about the efficacy of any choice of candidate actions as an indication of the correctness of the candidate task determination process and wherein the reinforcement learning feedback that is generated may be interpreted as a reward such that this reward then determines the probabilistic states of the reinforcement learning model and wherein the effect of implementing the reinforcement learning is to improve the efficacy of the candidate action selection process.) 
However, Choi does not explicitly teach the reward value lesser than a reward value for a time step prior to the first time step based on the reward value,… first and second probability values … compute, based on the first ML model, a respective second probability value for each remaining candidate action of the plurality of candidate actions; and determine that the second probability value for the first candidate action is greater than the second probability values of the remaining plurality of candidate actions. Although the reinforcement learning algorithm inherently computes a reward as a feedback, Choi does not explicitly relate that reward or feedback to a probability value characterizing the efficacy of the candidate action selection or that the reward may be less at a given time relative to a previous time.
However, Papangelis, in the analogous environment of a dialogue system for providing candidate actions, teaches receiving, by the first ML model, a reward value for the first time step, the reward value lesser than a reward value for a time step prior to the first time step,
wherein the first ML model further determines the first and second probability values based on the reward value for the first time step ([0094, 0152, 0229, 0231, 0255, 0259, 0260] The belief with respect to a slot s may be the set of probabilities that the slot has each possible value., The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement)., For each slot … the cumulative moving average of the ratio of the reward that was achieved to the maximum possible reward is determined as an estimate of slot importance., Dialogue policy optimisation may be aimed at estimating the expected long - term reward for a system action being executed at a system state or belief state , such that the action with maximum expected reward can be selected at each dialogue turn., When learning or updating the values during the implementation, the system is able to adapt to potential changes as time progresses.., Dialogue policy optimization can be solved via Reinforcement Learning … A reward function assigns a reward r given the current state and action taken …., wherein reinforcement learning is used to optimize an ML policy model such that the slot with an action is selected according to the highest long term reward (first probability) with this selection also being predicated upon the importance likelihood (second probability) which is computed from the rewards observed in response to an action, wherein this ML learning process adapts itself by adjusting the measure of importance, and wherein this adaptation would occur as the rewards associated with a particular task diminish over time (i.e., a reward 
for the first time step is less than the reward for a previous time step) relative to some other task, thereby changing the likelihood of its importance.) computing, based on the first ML model, a respective second probability value for each remaining candidate action of the plurality of candidate actions; ([0152, 0196, 0165] The policy model effectively solves generic information-seeking problems. This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement)., Importance, e.g. two parameters describing respectively how likely a slot will and will not occur in a dialogue., In addition, there may be a set of (one-dimensional) values, each indicating the probability of a slot being requested by a user … b_r_s denotes the belief probabilities for slot s being requested.,  wherein the importance measure, as an indicator of the likelihood that a slot will actually occur, may also be interpreted as a second probability value that reflects the likelihood of an action taking place, and wherein this computation is computed for all candidate actions/slot including those for which the belief may be the maximum.)  and determining that the second probability value for the first candidate action is greater than the second probability values of the remaining plurality of candidate actions. ([0152, 0196, 0165, 0183, 0234] The policy model effectively solves generic information-seeking problems. 
This means, for example, that instead of taking an action “inform(food)” the policy model takes and outputs actions of the type “inform(slot), for slot with maximum belief and importance greater than X” (where the importance … is a measure of how likely it is that a slot of the end user ontology must be filled for the parameterized policy to meet a user’s requirement)., Importance, e.g. two parameters describing respectively how likely a slot will and will not occur in a dialogue., In addition, there may be a set of (one-dimensional) values, each indicating the probability of a slot being requested by a user … b_r_s denotes the belief probabilities for slot s being requested., The output from the policy model is an action comprising a communication function ( e . g . “ select ” ) and one or more parameter values which describe a slot ( e . g . the slot with 5 values and high importance).,  wherein the importance measure, as an indicator of the likelihood that a slot will actually occur, is being interpreted as a second probability value such that the selection of an action is based upon a slot not only with maximum belief/relevance (first probability) but also sufficiently high importance and wherein this threshold of importance is being used to exclude a maximum belief candidate that has less than a requisite level of importance but which also will instantiate the maximum belief candidate if its importance is high in a relative sense (that is higher than other candidates which is interpreted to mean those that populate the set of remaining candidates)).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi to incorporate the teachings of Papangelis to have used reinforcement learning to determine a probability that a candidate action would satisfy the most probable discerned intentions based upon the reward associated with that candidate action and to shift the relative importance of different candidate actions adaptively as the relative normalized rewards change over time. The modification would have been obvious because one of ordinary skill would have been motivated to use a probabilistic measure for the likelihood that a candidate task may satisfy a user requirement to facilitate the adaptive optimization of the domain-specific and user-dependent policy model ([0064, 0068, 0229]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Choi and Papangelis  to incorporate the teachings of  Hwang and Fehrle for the same reasons as pointed out in claim 15.
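For illustration only, the selection rule quoted from Papangelis ("slot with maximum belief and importance greater than X", with importance estimated as the cumulative moving average of the ratio of achieved reward to maximum possible reward) may be sketched as follows. This is a minimal sketch under stated assumptions, not Papangelis's implementation; the function names `update_importance` and `select_slot`, the slot names, and the numeric values are hypothetical.

```python
def update_importance(prev_avg, n, reward, max_reward):
    """Cumulative moving average of reward / max_reward.

    prev_avg is the average over the first n observations; the return value
    is the average after one more observation is included.
    """
    ratio = reward / max_reward
    return prev_avg + (ratio - prev_avg) / (n + 1)


def select_slot(belief, importance, threshold):
    """Return the slot with maximum belief among slots whose importance
    exceeds the threshold, or None if no slot qualifies."""
    eligible = [slot for slot in belief if importance[slot] > threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda slot: belief[slot])


# Hypothetical values: "food" has the highest belief but too little importance.
belief = {"food": 0.9, "area": 0.6, "price": 0.7}
importance = {"food": 0.2, "area": 0.8, "price": 0.5}
chosen = select_slot(belief, importance, threshold=0.4)
```

Under these assumed values the slot with the highest belief is excluded by the importance threshold, so the choice falls to the eligible slot with the next-highest belief. A lower reward at a later time step lowers the running average for that slot, which is how the relative importance can shift over time in this sketch.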

Response to Arguments
Applicant's arguments filed 18 March 2021 have been fully considered but they are not persuasive. 
Specifically, the Applicant argues:
The Office concedes that Choi, Papangelis, and Hwang do not teach the limitations of "generate... a natural language narrative comprising a plurality of graphical symbols associated with a plurality of natural language concepts." Fehrle fails to cure the deficiencies of Choi, Papangelis, and Hwang, as Fehrle fails to teach the claimed "generate...a natural language narrative comprising a plurality of graphical symbols associated with a plurality of natural language concepts," the "natural language concepts comprising at least one of the natural language concepts extracted from the audio data and at least one of the natural language concepts extracted from the text data." Fehrle generally teaches a system that encodes concepts based on static mappings. For example, Figure 2 of Ferhle depicts a table that maps "pictures" to "text." Furthermore, Section 4.2 of Ferhle describes a "picture-lexicon that contains all the graphical symbols ... needed to produce a graphical representation." The "picture- lexicon" includes "descriptions of objects, stored in a graphical form," and "description of object movements, stored in a procedural form." Applicant respectfully submits that because Fehrle is limited to the static "picture-lexicon," Ferhle does not teach the claimed "generate...a natural language narrative comprising a plurality of graphical symbols associated with a plurality of natural language concepts," the "natural language concepts comprising at least one of the natural language concepts extracted from the audio data and at least one of the natural language concepts extracted from the text data" as recited by amended independent claims 1, 8, and 15. Instead, Ferhle merely references the static mappings of the "picture-lexicon." As such, the cited references fail to teach or suggest Appl. No. 16/252,846 Docket No.: 1988.0034Reply to Office Action of December 18, 2020 TC/A.U. 2122each limitation of claims 1, 8, and 15. 

Examiner’s Response:
The Examiner respectfully disagrees. Choi, Papangelis, Hwang, and Fehrle do teach these limitations. Specifically, as set forth in the current office action, Choi, Papangelis, Hwang, and Fehrle collectively teach “extract a plurality of features from the received data, the plurality of features comprising: objects identified in the image data, natural language concepts extracted from the audio data, and natural language concepts extracted from the text data; … generate … the natural language narrative comprising a plurality of graphical symbols associated with a plurality of natural language concepts of the natural language narrative, the plurality of natural language concepts of the natural language narrative comprising at least one of the natural language concepts extracted from the audio data and at least one of the natural language concepts extracted from the text data” with Choi teaching “extract a plurality of features from the received data, the plurality of features comprising: objects identified in the image data, natural language concepts extracted from the audio data, and natural language concepts extracted from the text data; … generate … the natural language narrative …, the plurality of natural language concepts of the natural language narrative comprising at least one of the natural language concepts extracted from the audio data and at least one of the natural language concepts extracted from the text data” because Choi teaches the processing of video, audio, or text data to extract keywords pertinent for a to-do list ([0239, 0240]). Choi teaches the conversion of audio/voice into text using speech-to-text conversion technology and analyzing it to determine keywords ([0072, 0084]), and a data recognition model is trained to analyze the collected data to identify a keyword that may be used in the to-do list ([0222]). 
Choi points out the use of image object recognition as related art ([0008]) while also teaching that the image may be analyzed to provide “context information” ([0205]) and is analyzed (along with all other data sources) to extract/recognize keywords which may be considered in a BRI sense an object recognition process to determine the semantic meaning (NL concepts) of that content relative to a to-do list (e.g., “Japan” and “Vacation”, [0084]) in order to generate a NL narrative (including those related to NL concepts such as “Japan” and “Vacation” for example) displayed on a screen describing each of the actions associated with a task (viz., [Figures 5, 7, 9, 10, 12, 13, 16, and 17, 0116, 0134, 0253] “The device … may form the tasks selected by the user as a to-do list … and may display the to-do list … on the execution screen of the display., Regarding “Book Hotel”, the device … may provide a hotel list including hotel information arranged in a descending order of ratings compared to price, by reflecting the intention that the user (I) and the other user … want a clean hotel … The device may generate the hotel list by applying a weight to price and ratings…”.). Therefore, Choi teaches the extraction of features from image data, audio data, and text data with the last two based on NLP methods and with the extracted features corresponding to the meaning of individual elements of the speech ([0084]). These features/keywords are directly used to train and apply the first ML model and generate the narrative as indicated in the Office Action. 
It is noted that Hwang also teaches “extract a plurality of features from the received data, the plurality of features comprising: objects identified in the image data, natural language concepts extracted from the audio data, and natural language concepts extracted from the text data; … generate … the natural language narrative …, the plurality of natural language concepts of the natural language narrative comprising at least one of the natural language concepts extracted from the audio data and at least one of the natural language concepts extracted from the text data”. Specifically, Hwang was relied upon to generate by a second ML model a natural language narrative (a guide) that describes actions that the user may perform based on the recognition of the user’s intentions determined through the analysis of text, image, or audio input information.  Hwang also indicates (related art – [0007]) the use of NLP, speech recognition, object recognition, etc., including specifically the implementation in “an intelligent assistant” of speech recognition technology “to understand a user’s language and perform the instruction desired by the user.” The NL guide/narrative is clearly based on the extraction/recognition of NL features input into the system from images (Figure 8), words input by the user (viz., [0013] “The processor may be further configured to determine whether the task is performable according to whether a word corresponding to a task execution command is included by analyzing the received user input.” [0014] “The processor may be further configured to obtain at least one from among an intention and a parameter by analyzing the received user input”), and based on audio features (viz., [0056] “For example, when a user input is a voice input, the processor 130 may perform preprocessing on the input user voice before performing a voice recognition function. For example, the preprocessing may include the operations of removing noise and obtaining features.”). 
However, Choi, Papangelis, and Hwang do not teach “the natural language narrative comprising a plurality of graphical symbols associated with a plurality of natural language concepts of the natural language narrative”.  In other words, although Choi, Papangelis, and Hwang teach the generation of a narrative using NL concepts that are derived from processing media content, they do not generate this narrative so as to include graphical symbols that are associated with the NL concepts. However, Fehrle teaches that association in the generation of a narrative as pointed out in the current office action and the previous NFOA.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT LEWIS KULP whose telephone number is (571)272-7983.  The examiner can normally be reached on M, Th, F 8-5:30; Tu 8-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on 571-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ROBERT LEWIS KULP/Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122