Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments 
Applicant’s amendment filed on 26 September 2022 has been entered. 
The amendment of claim(s) 1 and 5-7 and the cancellation of claim(s) 2 and 3 have been acknowledged and entered.
In view of the amendment to claim(s) 1 and 5-7 and the cancellation of claim(s) 2 and 3, the rejection of claims 1-7 under 35 U.S.C. §103 is withdrawn.
In light of the amended claims, new grounds for rejection under 35 U.S.C. §103 are provided in the response below. 

Response to Arguments
Applicant’s arguments regarding the subject matter rejections under 35 U.S.C. §103, see pages 8-11 of the Response to Non-Final Office Action dated 13 July 2022, which was received on 26 September 2022, have been fully considered.
With respect to the rejection(s) of claim(s) 1 and 5-7 under 35 U.S.C. §103 as being obvious over Chen (U.S. Pat. App. Pub. No. 2020/0007380, hereinafter Chen) in view of Cech (U.S. Pat. App. Pub. No. 2018/0286404, hereinafter Cech), applicant asserts that Cech fails to teach or suggest “determines that the determined word in the another inquiry is the predetermined keyword indicating the user's intention in response to determining that a difference between the length of the user's another voice and the length of the predetermined response is within a predetermined range and in response to the intention determination means failing to determine the positive response, the negative response, or the predetermined keyword indicating the user's intention based on the user's voice response to the initial inquiry made by the inquiry means, the user's another voice being generated in response to the another inquiry, the predetermined keyword being one of the plurality of predetermined keywords.” Specifically, applicant asserts (1) that Cech fails to teach “determining that the word in the another inquiry is the predetermined keyword indicating the user's intention based on the comparison of the length of the user's another voice with the length of the predetermined response” and (2) that “Because Cech is silent with respect to an inquiry provided to the user, Cech also does not teach or suggest determining that the determined word in the another inquiry is the predetermined keyword.” These arguments are not persuasive.
Regarding the first argument, Cech discloses the above-described limitations of amended claim 1. The system of Cech discloses “identify[ing] a voice token as either a keyword phrase or command or a portion of a keyword phrase or command,” which is determining that a word is a keyword. (Cech, ¶ [0060]). Further, the system is “Tracking the interim periods during known command audio signal transmission... to identify a voice token as either a keyword phrase or command or a portion of a keyword phrase or command,” where a keyword phrase or command can “control vehicle operation {a keyword which indicates the user’s intention}” and where the comparison is based on “expected interim time values” in conjunction with “respective start triggers and stop triggers associated with segments of the audio data stream.” (Cech, ¶ [0060]; see excerpt from FIG. 8 below).

[Image: excerpt from FIG. 8 of Cech (media_image1.png, greyscale)]

Though applicant characterizes this as simply measuring the time in between words, this is an unfair characterization of the disclosure in Cech. The system in Cech is measuring, as illustrated in FIG. 8, both periods in between speech (interim periods), as well as the speech itself (audio sample period) and the overall command time frame as correlated to mouth movement, to determine the “keyword phrase or command”. Interim periods are punctuated by the “respective start triggers and stop triggers,” which indicate the time frame of the speech elements to which they are related in Cech. The system in Cech then compares these interim periods with “a command spacing time value constant corresponding to an expected interim time value between commands.” A grouping of expected interim time values, including both start and end times of the audio sample periods, is the length of the user’s another voice as disclosed in the present application. Therefore, the rejection is maintained as amended below. 
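For illustration only, the timing quantities discussed above (audio sample periods, interim periods, and the overall command time frame, each delimited by start and stop triggers) can be sketched as follows. This is a hypothetical Python sketch and is not code from Cech: the `Segment` type, the function names, and the `tolerance` parameter are assumptions introduced here, and no numeric values are taken from the reference.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One speech segment (hypothetical type): times in seconds."""
    start: float  # time of the start trigger
    stop: float   # time of the stop trigger


def sample_periods(segments: List[Segment]) -> List[float]:
    """Duration of each audio sample period (the speech itself)."""
    return [s.stop - s.start for s in segments]


def interim_periods(segments: List[Segment]) -> List[float]:
    """Gaps between the stop trigger of one segment and the start of the next."""
    return [b.start - a.stop for a, b in zip(segments, segments[1:])]


def command_time_frame(segments: List[Segment]) -> float:
    """Overall span from the first start trigger to the last stop trigger."""
    return segments[-1].stop - segments[0].start if segments else 0.0


def interims_match(segments: List[Segment], expected_interim: float,
                   tolerance: float) -> bool:
    """Compare every interim period with an expected interim time value.

    The tolerance is an assumption of this sketch, not a value from Cech.
    Returns False when there are no interim periods to compare.
    """
    gaps = interim_periods(segments)
    if not gaps:
        return False
    return all(abs(g - expected_interim) <= tolerance for g in gaps)
```

Under these assumptions, the grouping of sample periods and interim periods together spans the full duration of the spoken input, which is the sense in which the discussion above treats the measured timing as a length of the user's voice.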
Regarding the second argument, and in response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Applicant asserts that Cech is deficient with regard to an inquiry. However, Cech discloses the same elements as related to a “keyword phrase or command,” where the inquiry analysis of Chen is modified by the command analysis of Cech. Examiner notes that the phrase “keyword phrase or command” is broad enough to encompass an inquiry, which is expressly recited in Chen. As the “keyword phrase or command” analysis of Cech modifies the inquiry analysis described in Chen, one skilled in the art would readily be able to apply said teaching to any form of user voice input, such as queries or commands. Therefore, this argument is not persuasive and the rejection is maintained.
Further, Examiner notes that in the speech recognition arts, the words “command” and “inquiry” are among many which are used interchangeably as generic descriptors to describe user input to a speech responsive system, such as an autonomous device or an intelligent virtual assistant. The use of one generic descriptor, such as “voice input” or “command,” without the other does not prevent the disclosure from being applied more broadly. Without more, the mere lack of the word “inquiry” is not dispositive regarding the breadth of the teaching, where “voice input” and “commands” are described throughout.
The Applicant has not provided any further statement; therefore, the Examiner directs the Applicant to the rationale below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1 and 5-7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chen (U.S. Pat. App. Pub. No. 2020/0007380, hereinafter Chen) in view of Cech (U.S. Pat. App. Pub. No. 2018/0286404, hereinafter Cech), Wolverton (U.S. Pat. App. Pub. No. 2014/0136013, hereinafter Wolverton), and Divakaran (U.S. Pat. App. Pub. No. 2017/0160813, hereinafter Divakaran).

Regarding claim 1, Chen discloses An interaction system comprising (the virtual agent implementing the method 300; Chen, ¶ [0037]): storage means for storing a user preference database (“Computer-readable instructions stored on a computer-readable storage device are executable by the processing unit 1402 of the machine 1400. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device.”; Chen, ¶ [0119]); inquiry means for outputting an electronic voice signal of an initial inquiry to a user (“the virtual agent” implementing the method 300 {using an inquiry means} “provides a question {making an inquiry} and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice; Chen, ¶¶ [0032], [0037], [0103]); a microphone configured to capture a user’s voice response (“the virtual agent 102 may detect the user 104 has accessed the virtual agent webpage at operation 106” such as by “speaking … into a microphone,”; Chen, ¶ [0026]); and intention determination means for determining a user’s intention (“the method 300,” where the method 300 as performed by the “processor operating on a computer system” is the intention determination means, either determines that the user response is an exact match to an answer provided at operation 204 or “at operation 320, determin[es] whether the answer provided by the user, at operation 206, corresponds to an answer provided (e.g., is not an exact match but the virtual agent may conclude with some degree of certainty that the user intended to select the answer)” thus determining the user’s intention; Chen, ¶ [0037]) based on the user’s voice response in response to the initial inquiry made by the inquiry means (The determination of intention to select one or more answers is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0032]-[0033], [0037], [0103]), a plurality of predetermined keywords being set in the intention determination means, (“Embodiments herein may provide a virtual agent that is capable of understanding and selecting a choice to which an unexpected user response corresponds.” where “The virtual agent 102 may select the choice... in response to receiving such a word, phrase, or symbol from the user 104.” where the selection of a choice in the form of “other word[s], phrase[s], or symbol[s]” is a plurality of keywords being set corresponding to the user’s intent {intention determination means}; Chen, ¶ [0030]) wherein, in response to the intention determination means failing to determine a positive response, a negative response, or a predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry made by the inquiry means (The virtual agent {implementing the method 300, thus including the intention determination means} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “‘YES’ {positive response} and ‘NO {negative response}’” and verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... that the response provided by the user does not correspond to an answer provided {failing to determine a response indicating the user’s intention}” and “In response to determining, at operation 320, that the response provided by the user does not correspond to an answer provided, the virtual agent may determine that the user is off-track and perform remediation operation 324. The remediation operation 324 may include... ask[ing] the user a (new) question and provide answers”; Chen, ¶¶ [0029], [0038]), the inquiry means determines a word to be included in another inquiry by looking up a preference of the user in the user preference database (The system looks up possible response equivalents in “a model configured to determine a semantic similarity {determines a word by looking up a preference}” where the “model is configured to detect semantic similarity between a previous response and a current response.” {the previous response and the semantic similarity being stored in a database, thus a user preference database}; Chen, ¶¶ [0053], [0077]) and outputs an electronic voice signal of the another inquiry including the determined word to the user (The system may ask “Follow up questions … to resolve an ambiguity,” where “for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow the dialog script {determined word} corresponding to that intent.”; Chen, ¶¶ [0072], [0097]), the intention determination means: determines the positive response, the negative response, or the predetermined keyword based on a user’s image or a user’s another voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “‘YES’ {positive response} and ‘NO {negative response}’” and verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive a user response {user’s reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user’s reaction in response to the another inquiry made by the inquiry means (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user’s reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶ [0038]; FIG. 3), wherein the inquiry means makes the initial inquiry again so as to encourage the user to react… (The inquiry means, as incorporated in the virtual agent performing the method 300, “may ask the user a (new) question {makes the inquiry again} and provide answers or provide a non-question message to the user, at operation 326. After operation 326, the method 300 may continue at operation 206.” As operation 206 is receiving a user response to the new question, the new question {the inquiry again} encourages the user to provide the user response {to react}; Chen, ¶ [0038]; FIG. 3).
However, Chen fails to expressly recite compares a length of the user’s another voice with a length of a predetermined response; and determines that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range, the predetermined keyword being one of the plurality of predetermined keywords, in response to the intention determination means failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry made by the inquiry means, the user’s another voice being generated in response to the another inquiry, wherein the inquiry means makes the initial inquiry again so as to encourage the user to react by a predetermined action, a facial expression, or a line of sight, the intention determination means determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user’s image, which is the user’s reaction in response to the another inquiry made by the inquiry means, the storage means stores user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry means makes the initial inquiry again so as to encourage reaction by the corresponding predetermined action, the facial expression, or the line of sight for each of the users based on the user profile information stored in the storage means.
Cech discloses systems and methods for automated speech recognition including the measurement of audio sample times for keyword detection (Cech, ¶ [0007]). Regarding claim 1, Cech teaches compares a length of the user’s another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}”; Cech, ¶ [0060]); and determines that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user’s another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…}.”; Cech, ¶ [0060]) in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range (Further, the system is “Tracking the interim periods during known command audio signal transmission... to identify a voice token as either a keyword phrase or command or a portion of a keyword phrase or command,” where a keyword phrase or command can “control vehicle operation {a keyword which indicates the user’s intention}” and where the comparison is based on an “expected interim time value {...is within a predetermined range}”; Cech, ¶¶ [0060], [0062]), the predetermined keyword being one of the plurality of predetermined keywords (The system compares to “commands {predetermined keywords} in a valid command data set {...being one of the plurality of predetermined keywords}”; Cech, ¶ [0060]).
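For illustration only, the recited comparison (a difference between the length of the user's another voice and the length of a predetermined response falling within a predetermined range) can be sketched as below. This is a hypothetical Python sketch under stated assumptions: the function names, the use of seconds as the unit of length, and the example keyword set are introduced here and are not taken from the application, Chen, or Cech.

```python
from typing import Optional, Set


def within_range(voice_len: float, response_len: float, max_diff: float) -> bool:
    """True when the two lengths differ by no more than the predetermined range."""
    return abs(voice_len - response_len) <= max_diff


def confirm_keyword(determined_word: str,
                    voice_len: float,
                    response_len: float,
                    max_diff: float,
                    keywords: Set[str]) -> Optional[str]:
    """Treat the word placed in the follow-up inquiry as the user's keyword.

    The word is accepted only if it is one of the predetermined keywords and
    the length of the user's reply (assumed here to be in seconds) is close
    enough to the length of the predetermined response. Otherwise no keyword
    is determined and None is returned.
    """
    if determined_word not in keywords:
        return None
    if within_range(voice_len, response_len, max_diff):
        return determined_word
    return None


# Hypothetical usage: a reply lasting 0.5 s against a 0.75 s predetermined
# response, with a 0.25 s range, confirms the word used in the inquiry.
example = confirm_keyword("curry", 0.5, 0.75, 0.25, {"curry", "ramen"})
```

The sketch only checks duration; it does not recognize the content of the reply, which is the point of using length as a fallback when recognition of the spoken words has failed.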
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include compares a length of the user’s another voice with a length of a predetermined response; and determines that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range, the predetermined keyword being one of the plurality of predetermined keywords. The speech recognition system uses the length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech. (Cech, ¶ [0006]).
However, Chen and Cech fail to expressly recite in response to the intention determination means failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry made by the inquiry means, the user’s another voice being generated in response to the another inquiry, wherein the inquiry means makes the initial inquiry again so as to encourage the user to react by a predetermined action, a facial expression, or a line of sight, the intention determination means determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user’s image, which is the user’s reaction in response to the another inquiry made by the inquiry means, the storage means stores user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry means makes the initial inquiry again so as to encourage reaction by the corresponding predetermined action, the facial expression, or the line of sight for each of the users based on the user profile information stored in the storage means.
Wolverton discloses systems and methods for multi-modal interaction with a personal assistant. (Wolverton, ¶¶ [0002]-[0003]). Regarding claim 1, Wolverton discloses determines that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention (The system distinguishes situation-aware dialogs where “in situation-aware dialogs, adjectives (e.g., “wavy lines” or “orange”) may be extracted and assigned a higher importance.”; Wolverton, ¶ [0103]) and in response to the intention determination means failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry made by the inquiry means (the system “evaluates whether the input 102 is clear as to the user’s intent and objectives” where “if the user’s goal or intent is not clear from the input 102, the method 500 solicits additional input from the user at block 512 and returns to block 502” where the evaluation “may be based on, for example, a comparison of the current input 102 to previously-received inputs 102 and corresponding responses issued by the vehicle personal assistant 112.” resulting in an information request, where the “information request may further be...a ‘situation-aware query’,”; Wolverton, ¶ [0105]), the user’s another voice being generated in response to the another inquiry (“upon receiving a suggestion from the vehicle personal assistant 112, the user may respond with a follow-up question or statement”; Wolverton, ¶ [0089]), wherein the inquiry means makes the initial inquiry again so as to encourage the user to react by a predetermined action, a facial expression, or a line of sight (The system “solicits additional input from the user” where the solicited input can include “the user’s gesture 154, gaze 156, and facial features or expressions 160”; Wolverton, ¶¶ [0105], [0102]), the intention determination means determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user’s image (The system “analyzes the input 102 to determine the user’s intended meaning thereof” where the inputs can include “the user’s voice 152... the user’s gesture 154, gaze 156, and facial features or expressions 160; … [and] the user’s touch 158.”; Wolverton, ¶¶ [0103], [0102]), which is the user’s reaction in response to the another inquiry made by the inquiry means (The input 102 is in response to the solicited additional input; Wolverton, ¶ [0105]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, to incorporate the teachings of Wolverton to include determines that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention and in response to the intention determination means failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry made by the inquiry means, the user’s another voice being generated in response to the another inquiry, wherein the inquiry means makes the initial inquiry again so as to encourage the user to react by a predetermined action, a facial expression, or a line of sight, the intention determination means determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user’s image, which is the user’s reaction in response to the another inquiry made by the inquiry means. By using contextual input, the “vehicle personal assistant 112 can determine or ‘infer’ a likely current context of a dialog in which it is engaged with a person in order to improve its understanding of the person's goal or intent with respect to the dialog,” thereby improving the user experience, as recognized by Wolverton. (Wolverton, ¶ [0024]).
However, Chen, Cech, and Wolverton fail to expressly recite the storage means stores user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry means makes the initial inquiry again so as to encourage reaction by the corresponding predetermined action, the facial expression, or the line of sight for each of the users based on the user profile information stored in the storage means.
Divakaran teaches “a multi-modal, conversational virtual personal assistant.” (Divakaran, ¶ [0039]). Regarding claim 1, Divakaran teaches the storage means stores user profile information (“a multi-modal virtual personal assistant can also include a preference model, which can be tailored for a particular population and/or for one or more individual people” stored as part of a database, such as database 820; Divakaran, ¶¶ [0041], [0107]) in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user (“The preference model can also store characteristics and traits about a person, such as a propensity for speaking very quickly when anxious. The various audible, visual, and tactile information that can be input into the virtual personal assistant can be modified by the preference model to adjust for, for example, accents, cultural differences in the meaning of gestures, regional peculiarities, personal characteristics, and so on,” where adjusting for said differences and distinctions is encouragement to react to the inquiry, and where the preference model is specific to the person {set for each user}; Divakaran, ¶ [0041]), and the inquiry means makes the initial inquiry again so as to encourage reaction by the corresponding predetermined action, the facial expression, or the line of sight for each of the users based on the user profile information stored in the storage means (The preference model, as used throughout, is discussed in further detail with reference to FIG. 15. The system discloses that “the programmed preferences 1542 and/or learned preferences 1544 of [the preference model] can be applied to the inputs 1510, 1520, 1530, to filter and/or adjust the inputs according to the preferences 1542, 1544.” Thus, the inputs {inquiry} are based on the preference model {user profile information} stored in the database {storage means}; Divakaran, ¶¶ [0173], [0107]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, and as modified by the in-vehicle personal assistant of Wolverton, to incorporate the teachings of Divakaran to include the storage means stores user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry means makes the initial inquiry again so as to encourage reaction by the corresponding predetermined action, the facial expression, or the line of sight for each of the users based on the user profile information stored in the storage means. The multi-modal virtual assistant can “comprehend non-verbal conversational cues” which can allow the assistant “to interact with a person in a natural way,” as recognized by Divakaran. (Divakaran, ¶ [0037]).

Regarding claim 5, Chen discloses An interaction method comprising the steps of (the virtual agent implementing the method 300; Chen, ¶¶ [0037]): outputting an electronic voice signal of an initial inquiry to a user by a voice (“the virtual agent” implementing the method 300 “provides a question {making an inquiry} and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice.; Chen, ¶¶ [0032], [0037], [0103]) capturing, by a microphone, a user's voice response (“the virtual agent 102 may detect the user 104 has accessed the virtual agent webpage at operation 106” such as by “speaking … into a microphone,”; Chen, ¶¶ [0026]); and determining a user's intention (“the method 300” either determines that the user response is an exact match to an answer provided at operation 204 or “at operation 320, determin[es] whether the answer provided by the user, at operation 206, corresponds to an answer provided (e.g., is not an exact match but the virtual agent may conclude with some degree of certainty that the user intended to select the answer)” thus determining the user's intention.; Chen, ¶¶ [0037]) based on the user's voice response in response to the initial inquiry, (The determination of intention to select one or more answers, is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0032]-[0033], [0037], [0103]) a plurality of predetermined keywords being preset, (“Embodiments herein may provide a virtual agent that is capable of understanding and selecting a choice to which an unexpected user response corresponds.” where “The 
virtual agent 102 may select the choice... in response to receiving such a word, phrase, or symbol from the user 104.” where the selection of a choice in the form of “other word[s], phrase[s], or symbol[s]” is a plurality of keywords being set corresponding to the user’s intent.; Chen, ¶¶ [0030]), the method comprising: determining a word to be included in another inquiry by looking up a preference of the user in a user preference database (The system looks up possible response equivalents in “a model configured to determine a semantic similarity {determines a word by looking up a preference}” where the “model is configured to detect semantic similarity between a previous response and a current response. {the previous response and the semantic similarity being stored in a database, thus a user preference database}”; Chen, ¶¶ [0053], [0077]) and outputting an electronic voice signal of the another inquiry including the determined word to the user (The system may ask “Follow up questions … to resolve an ambiguity,” where “for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow {outputting} the dialog script {the another inquiry including the determined word} corresponding to that intent.” ; Chen, ¶¶ [0072], [0097]), in response to failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the initial inquiry (The virtual agent {implementing the method 300} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). 
In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... that the response provided by the user does not correspond to an answer provided {cannot determine a response indicating the user's intention}”; Chen, ¶¶ [0029], [0038]; FIG. 3); determining the positive response, the negative response, or the predetermined keyword based on a user's image or a user's voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive a user response {user's reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user's reaction in response to the another inquiry (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶ [0038]; FIG. 3); and making the initial inquiry again so as to encourage reaction… (The inquiry means, as incorporated in the virtual agent performing the method 300, “may ask the user a (new) question {makes the inquiry again} and provide answers or provide a non-question message to the user, at operation 326. 
After operation 326, the method 300 may continue at operation 206,” As operation 206 is receiving a user response to the new question, the new question {the inquiry again} encourages the user to provide the user response {to react}”; Chen, ¶¶ [0038], FIG. 3). However, Chen fails to expressly recite comparing a length of the user’s another voice with a length of a predetermined response; and determining that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range, the predetermined keyword being one of the plurality of predetermined keywords, in response to failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry, the user’s another voice being generated in response to the another inquiry, making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight, and determining the positive response, the negative response, or the predetermined keyword by recognizing the predetermined action, the facial expression, or the line of sight of the user based on the user's image, which is the user’s reaction in response to the another inquiry made by the inquiry means, wherein the method further comprises: storing user profile information in which information indicating by which one of a predetermined action, a facial expression, and a line of sight the user should be encouraged to react to the another inquiry is set for each user making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each of the users based on the stored user profile information.
The relevance of Cech is discussed above with respect to claim 1. Regarding claim 5, Cech teaches comparing a length of the user’s another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}”; Cech, ¶ [0060]); and determining that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user’s another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…}.”; Cech, ¶ [0060]) in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range (Further, the system is “Tracking the interim periods during known command audio signal transmission... 
to identify a voice token as either a keyword phrase or command or a portion of a keyword phrase or command,” where a keyword phrase or command can “control vehicle operation {a keyword which indicates the user’s intention}” and where the comparison is based on an “expected interim time value {...is within a predetermined range}”; Cech, ¶¶ [0060], [0062]), the predetermined keyword being one of the plurality of predetermined keywords (The system compares to “commands {predetermined keywords} in a valid command data set {...being one of the plurality of predetermined keywords}”; Cech, ¶¶ [0060]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include comparing a length of the user’s another voice with a length of a predetermined response; and determining that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range, the predetermined keyword being one of the plurality of predetermined keywords. The speech recognition system uses the length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech (Cech, ¶ [0006]). However, Chen and Cech fail to expressly recite in response to failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry, the user’s another voice being generated in response to the another inquiry, making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight, and determining the positive response, the negative response, or the predetermined keyword by recognizing the predetermined action, the facial expression, or the line of sight of the user based on the user's image, which is the user’s reaction in response to the another inquiry made by the inquiry means, wherein the method further comprises: storing user profile information in which information indicating by which one of a predetermined action, a facial expression, and a line of sight the user should be encouraged to react to the another inquiry is set for each user making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each of the users based on the stored user profile information.
The relevance of Wolverton is discussed above with respect to claim 1. Regarding claim 5, Wolverton discloses determining that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention (The system distinguishes situation-aware dialogs where “in situation-aware dialogs, adjectives (e.g., ‘wavy lines’ or ‘orange’) may be extracted and assigned a higher importance.”; Wolverton, ¶ [0103]) in response to failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry (the system “evaluates whether the input 102 is clear as to the user’s intent and objectives” where “if the user’s goal or intent is not clear from the input 102, the method 500 solicits additional input from the user at block 512 and returns to block 502” where the evaluation “may be based on, for example, a comparison of the current input 102 to previously-received inputs 102 and corresponding responses issued by the vehicle personal assistant 112.” resulting in an information request, where the “information request may further be...a ‘situation-aware query’,”; Wolverton, ¶ [0105]), the user’s another voice being generated in response to the another inquiry (“upon receiving a suggestion from the vehicle personal assistant 112, the user may respond with a follow-up question or statement”; Wolverton, ¶ [0089]), making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight (The system “solicits additional input from the user” where the solicited input can include “the user’s gesture 154, gaze 156, and facial features or expressions 160”; Wolverton, ¶¶ [0102], [0105]), and determining the positive response, the negative response, or the predetermined keyword by recognizing the predetermined action, the facial expression, or the line of sight of the user based on 
the user's image (The system “analyzes the input 102 to determine the user’s intended meaning thereof “ where the inputs can include “the user’s voice 152... the user’s gesture 154, gaze 156, and facial features or expressions 160; … [and] the user’s touch 158.”; Wolverton, ¶¶ [0103], [0102]), which is the user’s reaction in response to the another inquiry made by the inquiry means (The input 102 is in response to the solicited additional input.; Wolverton, ¶¶ [0105]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen, as modified by the sample time measurement for speech recognition taught in Cech, to incorporate the teachings of Wolverton to include in response to failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry, the user’s another voice being generated in response to the another inquiry, making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight, and determining the positive response, the negative response, or the predetermined keyword by recognizing the predetermined action, the facial expression, or the line of sight of the user based on the user's image, which is the user’s reaction in response to the another inquiry made by the inquiry means. By using contextual input, the “vehicle personal assistant 112 can determine or ‘infer’ a likely current context of a dialog in which it is engaged with a person in order to improve its understanding of the person's goal or intent with respect to the dialog,” thereby improving the user experience, as recognized by Wolverton (Wolverton, ¶ [0024]). However, Chen, Cech, and Wolverton fail to expressly recite wherein the method further comprises: storing user profile information in which information indicating by which one of a predetermined action, a facial expression, and a line of sight the user should be encouraged to react to the another inquiry is set for each user making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each of the users based on the stored user profile information.
The relevance of Divakaran is disclosed above with relation to claim 1. Regarding claim 5, Divakaran teaches wherein the method further comprises: storing user profile information (“a multi-modal virtual personal assistant can also include a preference model, which can be tailored for a particular population and/or for one or more individual people” stored as part of a database, such as database 820; Divakaran, ¶¶ [0041], [0107]) in which information indicating by which one of a predetermined action, a facial expression, and a line of sight the user should be encouraged to react to the another inquiry is set for each user (“The preference model can also store characteristics and traits about a person, such as a propensity for speaking very quickly when anxious. The various audible, visual, and tactile information that can be input into the virtual personal assistant can be modified by the preference model to adjust for, for example, accents, cultural differences in the meaning of gestures, regional peculiarities, personal characteristics, and so on,” where adjusting for said differences and distinctions is encouragement to react to the inquiry, and where the preference model is specific to the person {set for each user}; Divakaran, ¶¶ [0041]), making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each of the users based on the stored user profile information (The preference model, as used throughout, is discussed in further detail in FIG. 15. The system discloses that “the programmed preferences 1542 and/or learned preferences 1544 of [the preference model] can be applied to the inputs 1510, 1520, 1530, to filter and/or adjust the inputs according to the preferences 1542, 1544.” Thus, the inputs {inquiry} are based on the preference model {user profile information} stored in the database {storage means}; Divakaran, ¶¶ [0173], [0107]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, and as modified by the in-vehicle personal assistant of Wolverton, to incorporate the teachings of Divakaran to include wherein the method further comprises: storing user profile information in which information indicating by which one of a predetermined action, a facial expression, and a line of sight the user should be encouraged to react to the another inquiry is set for each user making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each of the users based on the stored user profile information. The multi-modal virtual assistant can “comprehend non-verbal conversational cues” which can allow the assistant “to interact with a person in a natural way,” as recognized by Divakaran. (Divakaran, ¶ [0037]).

Regarding claim 6, Chen discloses a non-transitory computer readable medium storing a program for causing a computer to execute the following processing of (the virtual agent implementing the method 300 through “computer-readable instructions stored on a computer-readable storage device are executable by the processing unit 1402 of the machine 1400… including a non-transitory computer-readable medium such as a storage device”; Chen, ¶¶ [0037], [0119]): outputting an electronic voice signal of an initial inquiry to a user (“the virtual agent” implementing the method 300 “provides a question {making an inquiry}and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice.; Chen, ¶¶ [0032], [0037], [0103]); determining a word to be included in another inquiry by looking up a preference of the user in a user preference database (The system looks up possible response equivalents in “a model configured to determine a semantic similarity {determines a word by looking up a preference}” where the “model is configured to detect semantic similarity between a previous response and a current response. 
{the previous response and the semantic similarity being stored in a database, thus a user preference database}”; Chen, ¶¶ [0053], [0077]) and outputting an electronic voice signal of the another inquiry to the user (The system may ask “Follow up questions … to resolve an ambiguity,” where “for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow {outputting} the dialog script {the another inquiry including the determined word} corresponding to that intent.” ; Chen, ¶¶ [0072], [0097]), in response to failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the initial inquiry (The virtual agent {implementing the method 300} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “'YES' {positive response} and 'NO {negative response}'“ and verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... that the response provided by the user does not correspond to an answer provided {cannot determine a response indicating the user's intention}” where the determination of intention to select one or more answers, is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0029], [0032]-[0033], [0037]-[0038], [0103]; FIG. 
3); a plurality of predetermined keywords being set in the program, (“Embodiments herein may provide a virtual agent that is capable of understanding and selecting a choice to which an unexpected user response corresponds.” where “The virtual agent 102 may select the choice... in response to receiving such a word, phrase, or symbol from the user 104.” where the selection of a choice in the form of “other word[s], phrase[s], or symbol[s]” is a plurality of keywords being set corresponding to the user’s intent {intention determination means}.; Chen, ¶¶ [0030]), and determining the positive response, the negative response, or the predetermined keyword based on a user's image or a user's another voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “'YES' {positive response} and 'NO {negative response}'“ and verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive user response {user's reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user's reaction in response to the another inquiry (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶¶ [0038]; FIG. 
3), [and] making the initial inquiry again so as to encourage reaction… (The inquiry means, as incorporated in the virtual agent performing the method 300, “may ask the user a (new) question {makes the inquiry again} and provide answers or provide a non-question message to the user, at operation 326. After operation 326, the method 300 may continue at operation 206,” As operation 206 is receiving a user response to the new question, the new question {the inquiry again} encourages the user to provide the user response {to react}”; Chen, ¶¶ [0038], FIG. 3). However, Chen fails to expressly recite comparing a length of the user’s another voice with a length of a predetermined response; and determining that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range, the predetermined keyword being one of the plurality of predetermined keywords, in response to failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry, the user’s another voice being generated in response to the another inquiry, making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight, and determining the positive response, the negative response, or the predetermined keyword by recognizing the predetermined action, the facial expression, or the line of sight of the user based on the user's image, which is the user’s reaction in response to the another inquiry made by the inquiry means, wherein the method further comprises: storing user profile information in which information indicating by which one of a predetermined action, a facial expression, and a line of sight the user should be 
encouraged to react to the another inquiry is set for each user making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each of the users based on the stored user profile information.
The relevance of Cech is discussed above with respect to claim 1. Regarding claim 6, Cech teaches comparing a length of the user’s another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}”; Cech, ¶ [0060]); and determining that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user’s another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…}.”; Cech, ¶ [0060]) in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range (Further, the system is “Tracking the interim periods during known command audio signal transmission... 
to identify a voice token as either a keyword phrase or command or a portion of a keyword phrase or command,” where a keyword phrase or command can “control vehicle operation {a keyword which indicates the user’s intention}” and where the comparison is based on an “expected interim time value {...is within a predetermined range}”; Cech, ¶¶ [0060], [0062]), the predetermined keyword being one of the plurality of predetermined keywords (The system compares to “commands {predetermined keywords} in a valid command data set {...being one of the plurality of predetermined keywords}”; Cech, ¶¶ [0060]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include comparing a length of the user’s another voice with a length of a predetermined response; and determining that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range, the predetermined keyword being one of the plurality of predetermined keywords. The speech recognition system uses the length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech (Cech, ¶ [0006]). However, Chen and Cech fail to expressly recite in response to failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry, the user’s another voice being generated in response to the another inquiry, making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight, and determining the positive response, the negative response, or the predetermined keyword by recognizing the predetermined action, the facial expression, or the line of sight of the user based on the user's image, which is the user’s reaction in response to the another inquiry made by the inquiry means, wherein the method further comprises: storing user profile information in which information indicating by which one of a predetermined action, a facial expression, and a line of sight the user should be encouraged to react to the another inquiry is set for each user making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each of the users based on the stored user profile information.
The relevance of Wolverton is discussed above with respect to claim 1. Regarding claim 6, Wolverton discloses determining that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention (The system distinguishes situation-aware dialogs where “in situation-aware dialogs, adjectives (e.g., ‘wavy lines’ or ‘orange’) may be extracted and assigned a higher importance.”; Wolverton, ¶ [0103]) in response to failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry (the system “evaluates whether the input 102 is clear as to the user’s intent and objectives” where “if the user’s goal or intent is not clear from the input 102, the method 500 solicits additional input from the user at block 512 and returns to block 502” where the evaluation “may be based on, for example, a comparison of the current input 102 to previously-received inputs 102 and corresponding responses issued by the vehicle personal assistant 112.” resulting in an information request, where the “information request may further be...a ‘situation-aware query’,”; Wolverton, ¶ [0105]), the user’s another voice being generated in response to the another inquiry (“upon receiving a suggestion from the vehicle personal assistant 112, the user may respond with a follow-up question or statement”; Wolverton, ¶ [0089]), making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight (The system “solicits additional input from the user” where the solicited input can include “the user’s gesture 154, gaze 156, and facial features or expressions 160”; Wolverton, ¶¶ [0102], [0105]), and determining the positive response, the negative response, or the predetermined keyword by recognizing the predetermined action, the facial expression, or the line of sight of the user based on 
the user's image (The system “analyzes the input 102 to determine the user’s intended meaning thereof “ where the inputs can include “the user’s voice 152... the user’s gesture 154, gaze 156, and facial features or expressions 160; … [and] the user’s touch 158.”; Wolverton, ¶¶ [0103], [0102]), which is the user’s reaction in response to the another inquiry made by the inquiry means (The input 102 is in response to the solicited additional input.; Wolverton, ¶¶ [0105]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen, as modified by the sample time measurement for speech recognition taught in Cech, to incorporate the teachings of Wolverton to include in response to failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry, the user’s another voice being generated in response to the another inquiry, making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight, and determining the positive response, the negative response, or the predetermined keyword by recognizing the predetermined action, the facial expression, or the line of sight of the user based on the user's image, which is the user’s reaction in response to the another inquiry made by the inquiry means. By using contextual input, the “vehicle personal assistant 112 can determine or ‘infer’ a likely current context of a dialog in which it is engaged with a person in order to improve its understanding of the person's goal or intent with respect to the dialog,” thereby improving the user experience, as recognized by Wolverton (Wolverton, ¶ [0024]). However, Chen, Cech, and Wolverton fail to expressly recite wherein the method further comprises: storing user profile information in which information indicating by which one of a predetermined action, a facial expression, and a line of sight the user should be encouraged to react to the another inquiry is set for each user making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each of the users based on the stored user profile information.
The relevance of Divakaran is disclosed above with relation to claim 1. Regarding claim 6, Divakaran teaches wherein the method further comprises: storing user profile information (“a multi-modal virtual personal assistant can also include a preference model, which can be tailored for a particular population and/or for one or more individual people” stored as part of a database, such as database 820; Divakaran, ¶¶ [0041], [0107]) in which information indicating by which one of a predetermined action, a facial expression, and a line of sight the user should be encouraged to react to the another inquiry is set for each user (“The preference model can also store characteristics and traits about a person, such as a propensity for speaking very quickly when anxious. The various audible, visual, and tactile information that can be input into the virtual personal assistant can be modified by the preference model to adjust for, for example, accents, cultural differences in the meaning of gestures, regional peculiarities, personal characteristics, and so on,” where adjusting for said differences and distinctions is encouragement to react to the inquiry, and where the preference model is specific to the person {set for each user}; Divakaran, ¶¶ [0041]), making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each of the users based on the stored user profile information (The preference model, as used throughout, is discussed in further detail in FIG. 15. The system discloses that “the programmed preferences 1542 and/or learned preferences 1544 of [the preference model] can be applied to the inputs 1510, 1520, 1530, to filter and/or adjust the inputs according to the preferences 1542, 1544.” Thus, the inputs {inquiry} are based on the preference model {user profile information} stored in the database {storage means}; Divakaran, ¶¶ [0173], [0107]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, and as modified by the in-vehicle personal assistant of Wolverton, to incorporate the teachings of Divakaran to include wherein the method further comprises: storing user profile information in which information indicating by which one of a predetermined action, a facial expression, and a line of sight the user should be encouraged to react to the another inquiry is set for each user making the initial inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each of the users based on the stored user profile information. The multi-modal virtual assistant can “comprehend non-verbal conversational cues” which can allow the assistant “to interact with a person in a natural way,” as recognized by Divakaran. (Divakaran, ¶ [0037]).

Regarding claim 7, Chen discloses An interaction system comprising (the virtual agent implementing the method 300; Chen, ¶¶ [0037]): storage unit for storing a user preference database (“Computer-readable instructions stored on a computer-readable storage device are executable by the processing unit 1402 of the machine 1400. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device.”; Chen, ¶¶ [0119]); inquiry unit for outputting an electronic voice signal of an initial inquiry to a user (“the virtual agent” implementing the method 300 {using an inquiry unit} “provides a question {making an inquiry} and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice.; Chen, ¶¶ [0032], [0037], [0103]); a microphone configured to capture a user’s voice response (“the virtual agent 102 may detect the user 104 has accessed the virtual agent webpage at operation 106” such as by “speaking … into a microphone,”; Chen, ¶¶ [0026]); and intention determination unit for determining a user’s intention (“the method 300,” where the method 300 as performed by the “processor operating on a computer system” is the intention determination unit, either determines that the user response is an exact match to an answer provided at operation 204 or “at operation 320, determin[es] whether the answer provided by the user, at operation 206, corresponds to an answer provided (e.g., is not an exact match but the virtual agent may conclude with some degree of certainty that the user intended to select the answer)” thus determining the user’s intention.; Chen, ¶¶ [0037]) based on the user’s voice response in response to the initial inquiry made by the inquiry unit (The determination of intention to 
select one or more answers, is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0032]-[0033], [0037], [0103]), a plurality of predetermined keywords being set in the intention determination unit, (“Embodiments herein may provide a virtual agent that is capable of understanding and selecting a choice to which an unexpected user response corresponds.” where “The virtual agent 102 may select the choice... in response to receiving such a word, phrase, or symbol from the user 104.” where the selection of a choice in the form of “other word[s], phrase[s], or symbol[s]” is a plurality of keywords being set corresponding to the user’s intent {intention determination unit}.; Chen, ¶¶ [0030]) wherein, in response to the intention determination unit failing to determine a positive response, a negative response, or a predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry made by the inquiry unit (The virtual agent {implementing the method 300, thus including the intention determination unit} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “‘YES’ {positive response} and ‘NO {negative response}’” and verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... 
that the response provided by the user does not correspond to an answer provided {failing to determine a response indicating the user’s intention}” and “In response to determining, at operation 320, that the response provided by the user does not correspond to an answer provided, the virtual agent may determine that the user is off-track and perform remediation operation 324. The remediation operation 324 may include... ask[ing] the user a (new) question and provide answers”; Chen, ¶¶ [0029], [0038]), the inquiry unit determines a word to be included in another inquiry by looking up a preference of the user in the user preference database (The system looks up possible response equivalents in “a model configured to determine a semantic similarity {determines a word by looking up a preference}” where the “model is configured to detect semantic similarity between a previous response and a current response. {the previous response and the semantic similarity being stored in a database, thus a user preference database}”; Chen, ¶¶ [0053], [0077]) and outputs an electronic voice signal of the another inquiry including the determined word to the user (The system may ask “Follow up questions … to resolve an ambiguity,” where “for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow the dialog script {determined word} corresponding to that intent.”; Chen, ¶¶ [0072], [0097]), the intention determination unit: determines the positive response, the negative response, or the predetermined keyword based on a user’s image or a user’s another voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 
3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “‘YES’ {positive response} and ‘NO {negative response}’” and verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive user response {user’s reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user’s reaction in response to the another inquiry made by the inquiry unit (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user’s reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry unit}; Chen, ¶¶ [0038]; FIG. 3) wherein the inquiry unit makes the initial inquiry again so as to encourage the user to react… (The inquiry unit, as incorporated in the virtual agent performing the method 300, “may ask the user a (new) question {makes the inquiry again} and provide answers or provide a non-question message to the user, at operation 326. After operation 326, the method 300 may continue at operation 206,” As operation 206 is receiving a user response to the new question, the new question {the inquiry again} encourages the user to provide the user response {to react}”; Chen, ¶¶ [0038], FIG. 3). 
However, Chen fails to expressly recite compares a length of the user’s another voice with a length of a predetermined response; and determines that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range, the predetermined keyword being one of the plurality of predetermined keywords, in response to the intention determination unit failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry made by the inquiry unit, the user’s another voice being generated in response to the another inquiry, wherein the inquiry unit makes the initial inquiry again so as to encourage the user to react by a predetermined action, a facial expression, or a line of sight, the intention determination unit determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user’s image, which is the user’s reaction in response to the another inquiry made by the inquiry unit, the storage unit stores user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry unit makes the initial inquiry again so as to encourage reaction by the corresponding predetermined action, the facial expression, or the line of sight for each of the users based on the user profile information stored in the storage unit.
The relevance of Cech is disclosed above with relation to claim 1. Cech discloses systems and methods for automated speech recognition including the measurement of audio sample times for keyword detection. (Cech, ¶ [0007]). Regarding claim 7, Cech teaches compares a length of the user’s another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}; Cech, ¶¶ [0060]); and determines that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user’s another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…}.”; Cech, ¶¶ [0060]) in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range (Further, the system is “Tracking the interim periods during known command audio signal transmission... 
to identify a voice token as either a keyword phrase or command or a portion of a keyword phrase or command,” where a keyword phrase or command can “control vehicle operation {a keyword which indicates the user’s intention}” and where the comparison is based on an “expected interim time value {...is within a predetermined range}”; Cech, ¶¶ [0060], [0062]), the predetermined keyword being one of the plurality of predetermined keywords (The system compares to “commands {predetermined keywords} in a valid command data set {...being one of the plurality of predetermined keywords}”; Cech, ¶¶ [0060]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include compares a length of the user’s another voice with a length of a predetermined response; and determines that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention in response to determining that a difference between the length of the user’s another voice and the length of the predetermined response is within a predetermined range, the predetermined keyword being one of the plurality of predetermined keywords. The speech recognition system uses length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech. (Cech, ¶ [0006]). However, Chen and Cech fail to expressly recite in response to the intention determination unit failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry made by the inquiry unit, the user’s another voice being generated in response to the another inquiry, wherein the inquiry unit makes the initial inquiry again so as to encourage the user to react by a predetermined action, a facial expression, or a line of sight, the intention determination unit determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user’s image, which is the user’s reaction in response to the another inquiry made by the inquiry unit, the storage unit stores user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry unit makes the initial inquiry again so as to encourage reaction by the corresponding predetermined action, the facial expression, or the line of sight for each of the users based on the user profile information stored in the storage unit.
The relevance of Wolverton is disclosed above with relation to claim 1. Regarding claim 7, Wolverton discloses determines that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention (The system distinguishes situation aware dialogues where “in situation-aware dialogs, adjectives (e.g., “wavy lines” or “orange”) may be extracted and assigned a higher importance.”; Wolverton, ¶¶ [0103]) and in response to the intention determination unit failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry made by the inquiry unit (the system “evaluates whether the input 102 is clear as to the user’s intent and objectives” where “if the user’s goal or intent is not clear from the input 102, the method 500 solicits additional input from the user at block 512 and returns to block 502” where the evaluation “may be based on, for example, a comparison of the current input 102 to previously-received inputs 102 and corresponding responses issued by the vehicle personal assistant 112.” resulting in an information request, where the “information request may further be...a ‘situation-aware query’,”; Wolverton, ¶¶ [0105]), the user’s another voice being generated in response to the another inquiry (“upon receiving a suggestion from the vehicle personal assistant 112, the user may respond with a follow-up question or statement”; Wolverton, ¶¶ [0089]), wherein the inquiry unit makes the initial inquiry again so as to encourage the user to react by a predetermined action, a facial expression, or a line of sight (The system “solicits additional input from the user” where the solicited input can include “the user’s gesture 154, gaze 156, and facial features or expressions 160”; Wolverton, ¶¶ [0105], [0102]), the intention determination unit determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user’s image (The system “analyzes the input 102 to determine the user’s intended meaning thereof” where the inputs can include “the user’s voice 152... the user’s gesture 154, gaze 156, and facial features or expressions 160; … [and] the user’s touch 158.”; Wolverton, ¶¶ [0103], [0102]), which is the user’s reaction in response to the another inquiry made by the inquiry unit (The input 102 is in response to the solicited additional input.; Wolverton, ¶¶ [0105]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, to incorporate the teachings of Wolverton to include determines that the determined word in the another inquiry is the predetermined keyword indicating the user’s intention  and in response to the intention determination unit failing to determine the positive response, the negative response, or the predetermined keyword indicating the user’s intention based on the user’s voice response to the initial inquiry made by the inquiry unit, the user’s another voice being generated in response to the another inquiry, wherein the inquiry unit makes the initial inquiry again so as to encourage the user to react by a predetermined action, a facial expression, or a line of sight, the intention determination unit determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user’s image, which is the user’s reaction in response to the another inquiry made by the inquiry unit. By using contextual input, the “vehicle personal assistant 112 can determine or “infer” a likely current context of a dialog in which it is engaged with a person in order to improve its understanding of the person's goal or intent with respect to the dialog,” thereby improving the user experience, as recognized by Wolverton. (Wolverton, ¶ [0024]). 
However, Chen, Cech, and Wolverton fail to expressly recite the storage unit stores user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry unit makes the initial inquiry again so as to encourage reaction by the corresponding predetermined action, the facial expression, or the line of sight for each of the users based on the user profile information stored in the storage unit.
The relevance of Divakaran is disclosed above with relation to claim 1. Regarding claim 7, Divakaran teaches the storage unit stores user profile information (“a multi-modal virtual personal assistant can also include a preference model, which can be tailored for a particular population and/or for one or more individual people” stored as part of a database, such as database 820; Divakaran, ¶¶ [0041], [0107]) in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user (“The preference model can also store characteristics and traits about a person, such as a propensity for speaking very quickly when anxious. The various audible, visual, and tactile information that can be input into the virtual personal assistant can be modified by the preference model to adjust for, for example, accents, cultural differences in the meaning of gestures, regional peculiarities, personal characteristics, and so on,” where adjusting for said differences and distinctions is encouragement to react to the inquiry, and where the preference model is specific to the person {set for each user}; Divakaran, ¶¶ [0041]), and the inquiry unit makes the initial inquiry again so as to encourage reaction by the corresponding predetermined action, the facial expression, or the line of sight for each of the users based on the user profile information stored in the storage unit (The preference model, as used throughout, is discussed in further detail in FIG. 15. The system discloses that “the programmed preferences 1542 and/or learned preferences 1544 of [the preference model] can be applied to the inputs 1510, 1520, 1530, to filter and/or adjust the inputs according to the preferences 1542, 1544.” Thus, the inputs {inquiry} are based on the preference model {user profile information} stored in the database {storage unit}; Divakaran, ¶¶ [0173], [0107]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, and as modified by the in-vehicle personal assistant of Wolverton, to incorporate the teachings of Divakaran to include the storage unit stores user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry unit makes the initial inquiry again so as to encourage reaction by the corresponding predetermined action, the facial expression, or the line of sight for each of the users based on the user profile information stored in the storage unit. The multi-modal virtual assistant can “comprehend non-verbal conversational cues” which can allow the assistant “to interact with a person in a natural way,” as recognized by Divakaran. (Divakaran, ¶ [0037]).

Claim 4 is rejected under 35 U.S.C. §103 as being unpatentable over Chen, Cech, Wolverton, and Divakaran as applied to claim 1 above, and further in view of Yamada (U.S. Pat. App. Pub. No. 2007/0276659, hereinafter Yamada).

Regarding claim 4, the rejection of claim 1 is incorporated. Chen, Cech, Wolverton, and Divakaran disclose all of the elements of the current invention as stated above. Chen further discloses wherein the inquiry means makes the initial inquiry again so as to encourage the user to make a predetermined response by a voice (“In response to determining, at operation 320, that the response provided by the user does not correspond to an answer provided, the virtual agent may determine that the user is off-track and perform remediation operation 324. The remediation operation 324 may include... ask[ing] the user a (new) question and provide answers..., at operation 326 {makes the initial inquiry again}” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice. Further, a voice conversation encourages the user to provide a response, including by voice, and where the “provided answers” are a predetermined response.; Chen, ¶¶ [0038], [0103]; FIG. 3), and the intention determination means determines the positive response, the negative response, or the predetermined keyword by [speech recognition] of the user's another voice based on the user's another voice (the virtual agent will determine the user's “given choice of the choices” {determines the positive response, the negative response, or the predetermined keyword} where the choices can be “'YES' {positive response} and 'NO {negative response}'” and verbatim response of a “given choice of the choices” is a predetermined keyword, based on the received user response at operation 206. The virtual agent can further “determine... whether an unexpected user response (a response that is not included in a list of expected responses) corresponds to an answer provided at operation...26.” Further, as “the human-to-agent interaction may take the form of... a voice conversation,” the user response is received by speech recognition, based on the voice-based user response {the user's voice}.; Chen, ¶¶ [0029], [0038]-[0039], [0103]; FIG. 3), which is a user's response to the another inquiry (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {another inquiry} made by the virtual assistant {inquiry means}; Chen, ¶¶ [0038]; FIG. 3). However, Chen, Cech, Wolverton, and Divakaran fail to expressly recite wherein speech recognition includes recognizing prosody of the user's voice.
Yamada teaches “an apparatus and a method for identifying prosody on the basis of features of input speech and an apparatus and a method for recognizing speech using the prosody identification.” (Yamada, ¶ [0003]). Regarding claim 4, Yamada discloses wherein speech recognition includes recognizing prosody of the user's voice (Yamada explains that, in some cases, “human utterance speech cannot be identified by using phonetic information. For example, in Japanese language, “UN” that indicates an affirmative answer {positive response} is phonetically similar to “UUN” that indicates a negative answer {negative response}.” Therefore, the system and method can include “identifying prosody on the basis of an amount of change in movement of a feature distribution obtained from an autocorrelation matrix of a frequency characteristic of the input speech...performing speech recognition on the basis of features acquired by sound-analyzing the speech input... [and] integrat[ing] the output from the prosody identifying means with an output of the speech recognizing means”; Yamada, ¶¶ [0007], [0013]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, as modified by the in-vehicle personal assistant of Wolverton, and as modified by the multi-modal virtual assistant systems of Divakaran, to incorporate the teachings of Yamada to include wherein speech recognition includes recognizing prosody of the user's voice. The “prosody information” can be used to identify “human utterance speech [which] cannot be identified by using phonetic information” which creates a more robust speech recognition system, as recognized by Yamada. (Yamada, ¶¶ [0007], [0014]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Anders et al. (U.S. Pat. App. Pub. No. 2019/0325864) discloses systems and methods for automated assistants which can accommodate different age groups and speech capabilities.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard, whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657
/LAMONT M SPOONER/Primary Examiner, Art Unit 2657
12/12/2022