Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments 
Applicant’s amendment filed on October 20, 2021 has been entered. 
The amendments to claims 1 and 3-7 have been acknowledged and entered.
In view of the amendment to claims 5 and 6, the rejection of claims 5 and 6 under 35 U.S.C. §112 is withdrawn.
In view of the amendment to claims 1 and 3-7, the rejections of claims 1-7 under 35 U.S.C. §102 and 35 U.S.C. §103 are withdrawn.
In light of the amended claims, new grounds for rejection under 35 U.S.C. §103 are provided in the response below. 

Response to Arguments
Applicant’s arguments regarding the subject matter rejections under 35 U.S.C. §101, see pages 7-11 of the Response to Non-Final Office Action dated September 28, 2021, which was received on October 20, 2021, have been fully considered but they are not persuasive.
Regarding the 35 U.S.C. §101 rejections, applicant first asserts that the claimed subject matter is not directed to an abstract idea, specifically asserting that the claims are not drawn to methods of organizing human activity.  Applicant references the two-prong test described in the USPTO 2019 Revised Patent Subject Matter Eligibility Guidance issued on January 4, 2019 in 
As explained in the Office Action, in light of the broadest reasonable interpretation (BRI) of the claim, the claim describes certain methods of organizing human activity. MPEP 2106.04(a) provides specific examples of methods of organizing human activity which fall under the umbrella of abstract ideas, including “managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions).” As previously provided, the BRI of the claim relates to a dialogue between two people, where one person attempts to determine the emotional state of another person and asks a follow-up question. (See Office Action, pg. 6). A method for dialogue including the determination of a person’s mental state, recognizing context, and asking a follow-up question is “managing personal behavior or relationships or interactions between people,” and, as such, is a method of organizing human activity under grouping (2). (MPEP 2106.04(a)). Therefore, this argument is not persuasive.
Applicant then contends that “Even if, arguendo, the instant claims recite a judicial exception… the claims nevertheless are integrated into a practical application.” (Response, pg. 9). Applicant specifically points to the use of microphones and “inquiry means for outputting an electronic voice signal,” “processes of identifying the intention of a user,” the determination of other words using user preferences, and the limitation “determines that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response” as evidence that the claims are integrated into a practical application. (Response, pg. 9). This argument, as applied to the amended claim language, is persuasive. Therefore, the rejection of the claims under 35 U.S.C. §101 is withdrawn.
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §102/103, see pages 12-13 of the Response
With respect to the rejection of claims 1 and 5-7 under 35 U.S.C. §102 as being anticipated by Chen (U.S. Pat. App. Pub. No. 2020/0007380, hereinafter Chen), applicant provides two arguments. First, applicant asserts that Chen fails to teach or suggest “compares a length of the user's another voice with a length of a predetermined response; and determines that the word in the another inquiry is the predetermined keyword.” Second, applicant asserts that Chen fails to teach or suggest “in response to the intention determination means failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the inquiry made by the inquiry means, the inquiry means determines a word to be included in another inquiry by looking up a user preference in the user preference database and outputs an electronic voice signal of the another inquiry including the determined word to the user.”
Regarding the first argument, this argument is persuasive. Therefore, the rejection of claims 1 and 5-7 under 35 U.S.C. §102 is withdrawn.
Applicant further argues that dependent claims 2-4 are allowable for at least the same reasons as independent claim 1. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 2-4 under 35 U.S.C. §103 are withdrawn.
However, upon further consideration, new ground(s) of rejection under 35 U.S.C. §103 are made in light of combinations of Chen, Divakaran (U.S. Pat. App. Pub. No. 2017/0160813, hereinafter Divakaran), Yamada (U.S. Pat. App. Pub. No. 2007/0276659, hereinafter Yamada), and newly cited reference Cech (U.S. Pat. App. Pub. No. 2018/0286404, hereinafter Cech).
Regarding the second argument and as discussed in the Interview dated September 28, 2021, Chen discloses the above described amended elements. The mapping of specific elements to the cited reference is presented in detail in the rejection below.
The Applicant has not provided any further statement; therefore, the Examiner directs the Applicant to the rationale below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Chen (U.S. Pat. App. Pub. No. 2020/0007380, hereinafter Chen) in view of Cech (U.S. Pat. App. Pub. No. 2018/0286404, hereinafter Cech).

Regarding claim 1, Chen discloses an interaction system comprising (the virtual agent implementing the method 300; Chen, ¶ [0037]): storage means for storing a user preference (“Computer-readable instructions stored on a computer-readable storage device are executable by the processing unit 1402 of the machine 1400. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device.”; Chen, ¶ [0119]); inquiry means for outputting an electronic voice signal of an inquiry to a user (“the virtual agent” implementing the method 300 {using an inquiry means} “provides a question {making an inquiry} and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice; Chen, ¶¶ [0032], [0037], [0103]); a microphone configured to capture a user’s voice response (“the virtual agent 102 may detect the user 104 has accessed the virtual agent webpage at operation 106” such as by “speaking … into a microphone”; Chen, ¶ [0026]); and intention determination means for determining a user's intention (“the method 300,” where the method 300 as performed by the “processor operating on a computer system” is the intention determination means, either determines that the user response is an exact match to an answer provided at operation 204 or “at operation 320, determin[es] whether the answer provided by the user, at operation 206, corresponds to an answer provided (e.g., is not an exact match but the virtual agent may conclude with some degree of certainty that the user intended to select the answer)” thus determining the user's intention; Chen, ¶ [0037]) based on the user's voice response in response to the inquiry made by the inquiry means (The determination of intention to select one or more answers is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0032]-[0033], [0037], [0103]), wherein, in response to the intention determination means failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the inquiry made by the inquiry means (The virtual agent {implementing the method 300, thus including the intention determination means} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... that the response provided by the user does not correspond to an answer provided {failing to determine a response indicating the user's intention}” and “In response to determining, at operation 320, that the response provided by the user does not correspond to an answer provided, the virtual agent may Chen, ¶¶ [0029], [0038]; FIG. 3), the inquiry means determines a word to be included in another inquiry by looking up a preference of the user in the user preference database (The system looks up possible response equivalents in "a model configured to determine a semantic similarity {determines a word by looking up a preference}" where the "model is configured to detect semantic similarity between a previous response and a current response. {the previous response and the semantic similarity being stored in a database, thus a user preference database}"; Chen, ¶¶ [0053], [0077]) and outputs an electronic voice signal of the another inquiry including the determined word to the user (The system may ask "Follow up questions … to resolve an ambiguity," where "for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow the dialog script {determined word} corresponding to that intent."; Chen, ¶¶ [0072], [0097]), the intention determination means determines the positive response, the negative response, or the predetermined keyword based on a user's image or a user's another voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive user response {user's reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user's reaction in response to the another inquiry made by the inquiry means (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶ [0038]; FIG. 3). However, Chen fails to expressly recite compares a length of the user's another voice with a length of a predetermined response; and determines that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response.
Cech discloses systems and methods for automated speech recognition including the measurement of audio sample times for keyword detection (Cech, ¶ [0007]). Regarding claim 1, Cech teaches compares a length of the user's another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}”; Cech, ¶ [0060]); and determines that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user's another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…}”; Cech, ¶ [0060]).
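It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include compares a length of the user's another voice with a length of a predetermined response; and determines that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response. The speech recognition system uses length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech. (Cech, ¶ [0006]).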

Regarding claim 5, Chen discloses an interaction method comprising the steps of (the virtual agent implementing the method 300; Chen, ¶ [0037]): outputting an electronic voice signal of an inquiry to a user by a voice (“the virtual agent” implementing the method 300 “provides a question {making an inquiry} and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice; Chen, ¶¶ [0032], [0037], [0103]); capturing, by a microphone, a user's voice response (“the virtual agent 102 may detect the user 104 has accessed the virtual agent webpage at operation 106” such as by “speaking … into a microphone”; Chen, ¶ [0026]); and determining a user's intention (“the method 300” either determines that the user response is an exact match to an answer provided at operation 204 or “at operation 320, determin[es] whether the answer provided by the user, at operation 206, corresponds to an answer provided (e.g., is not an exact match but the virtual agent may conclude with some degree of certainty that the user intended to select the answer)” thus determining the user's intention; Chen, ¶ [0037]) based on the user's voice response in response to the inquiry (The determination of intention to select one or more answers is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0032]-[0033], [0037], [0103]), the method comprising: determining a word to be included in another inquiry by looking up a preference of the user in a user preference database (The system looks up possible response equivalents in "a model configured to determine a semantic similarity {determines a word by looking up a preference}" where the "model is configured to detect semantic similarity between a previous response and a current response. {the previous response and the semantic similarity being stored in a database, thus a user preference database}"; Chen, ¶¶ [0053], [0077]) and outputting an electronic voice signal of the another inquiry including the determined word to the user (The system may ask "Follow up questions … to resolve an ambiguity," where "for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow {outputting} the dialog script {the another inquiry including the determined word} corresponding to that intent."; Chen, ¶¶ [0072], [0097]), in response to failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the inquiry (The virtual agent {implementing the method 300} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... that the response provided by the user does not correspond to an answer provided {cannot determine a response indicating the user's intention}”; Chen, ¶¶ [0029], [0038]; FIG. 3); determining the positive response, the negative response, or the predetermined keyword based on a user's image or a user's voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive user response {user's reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user's reaction in response to the another inquiry (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶ [0038]; FIG. 3). However, Chen fails to expressly recite comparing a length of the user's another voice with a length of a predetermined response; and determining that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response.
The relevance of Cech is discussed above with relation to claim 1. Regarding claim 5, Cech teaches comparing a length of the user's another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}”; Cech, ¶ [0060]); and determining that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user's another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…}”; Cech, ¶ [0060]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include comparing a length of the user's another voice with a length of a predetermined response; and determining that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response. The speech recognition system uses length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech. (Cech, ¶ [0006]).

Regarding claim 6, Chen discloses a non-transitory computer readable medium storing a program for causing a computer to execute the following processing of (the virtual agent implementing the method 300 through “computer-readable instructions stored on a computer-readable storage device are executable by the processing unit 1402 of the machine 1400… including a non-transitory computer-readable medium such as a storage device”; Chen, ¶¶ [0037], [0119]): outputting an electronic voice signal of an inquiry to a user (“the virtual agent” implementing the method 300 “provides a question {making an inquiry} and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice; Chen, ¶¶ [0032], [0037], [0103]); determining a word to be included in another inquiry by looking up a preference of the user in a user preference database (The system looks up possible response equivalents in "a model configured to determine a semantic similarity {determines a word by looking up a preference}" where the "model is configured to detect semantic similarity between a previous response and a current response. {the previous response and the semantic similarity being stored in a database, thus a user preference database}"; Chen, ¶¶ [0053], [0077]) and outputting an electronic voice signal of the another inquiry to the user (The system may ask "Follow up questions … to resolve an ambiguity," where "for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow the dialog script corresponding to that intent."; Chen, ¶¶ [0072], [0097]), in response to failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the inquiry (The virtual agent {implementing the method 300} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... that the response provided by the user does not correspond to an answer provided {cannot determine a response indicating the user's intention}” where the determination of intention to select one or more answers is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0029], [0032]-[0033], [0037]-[0038], [0103]; FIG. 3); and determining the positive response, the negative response, or the predetermined keyword based on a user's image or a user's another voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive user response {user's reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user's reaction in response to the another inquiry (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶ [0038]; FIG. 3). However, Chen fails to expressly recite comparing a length of the user's another voice with a length of a predetermined response; and determining that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response.
The relevance of Cech is discussed above with relation to claim 1. Regarding claim 6, Cech teaches comparing a length of the user's another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}”; Cech, ¶ [0060]); and determining that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user's another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…}”; Cech, ¶ [0060]).
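It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include comparing a length of the user's another voice with a length of a predetermined response; and determining that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response. The speech recognition system uses length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech. (Cech, ¶ [0006]).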
 
Regarding claim 7, Chen discloses an interaction system comprising (the virtual agent implementing the method 300; Chen, ¶ [0037]): an inquiry unit configured to output an electronic voice signal of an inquiry to a user by a voice (“the virtual agent” implementing the method 300 {using an inquiry means} “provides a question {making an inquiry} and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice; Chen, ¶¶ [0032], [0037], [0103]); a microphone configured to capture a user’s voice response (“the virtual agent 102 may detect the user 104 has accessed the virtual agent webpage at operation 106” such as by “speaking … into a microphone”; Chen, ¶ [0026]); and an intention determination unit configured to determine a user's intention (“the method 300,” where the method 300 as performed by the “processor operating on a computer system” is the intention determination unit, either determines that the user response is an exact match to an answer provided at operation 204 or “at operation 320, determin[es] whether the answer provided by the user, at operation 206, corresponds to an answer provided (e.g., is not an exact match but the virtual agent may conclude with some degree of certainty that the user intended to select the answer)” thus determining the user's intention; Chen, ¶ [0037]) based on the user's voice response in response to the inquiry made by the inquiry unit (The determination of intention to select one or more answers is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0032]-[0033], [0037], [0103]), wherein, in response to the intention determination unit failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the inquiry made by the inquiry unit (The virtual agent {implementing the method 300, thus including the intention determination unit} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... that the response provided by the user does not correspond to an answer provided {cannot determine a response indicating the user's intention}”; Chen, ¶¶ [0029], [0038]; FIG. 3), the inquiry unit determines a word to be included in another inquiry by looking up a preference of the user in the user preference database (The system looks up possible response equivalents in "a model configured to determine a semantic similarity {determines a word by looking up a preference}" where the "model is configured to detect semantic similarity between a previous response and a current response. {the previous response and the semantic similarity being stored in a database, thus a user preference database}"; Chen, ¶¶ [0053], [0077]) and outputs an electronic voice signal of the another inquiry including the determined word to the user (The system may ask "Follow up questions … to resolve an ambiguity," where "for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow the dialog script {determined word} corresponding to that intent."; Chen, ¶¶ [0072], [0097]), the intention determination unit determines the positive response, the negative response, or the predetermined keyword based on a user's image or a user's another voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive user response {user's reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user's reaction in response to the another inquiry made by the inquiry unit (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶ [0038]; FIG. 3). However, Chen fails to expressly recite compares a length of the user's another voice with a length of a predetermined response; and determines that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response.
The relevance of Cech is disclosed above with relation to claim 1. Regarding claim 7, Cech teaches compares a length of the user's another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}.”; Cech, ¶ [0060]); and determines that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user's another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice Cech, ¶ [0060]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include compares a length of the user's another voice with a length of a predetermined response; and determines that the word in the another inquiry is the predetermined keyword based on the comparison of the length of the user's another voice with the length of the predetermined response. The speech recognition system uses length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech. (Cech, ¶ [0006]).

Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Chen and Cech as applied to claim 1 above, and further in view of Divakaran.

Regarding claim 2, the rejection of claim 1 is incorporated. Chen and Cech disclose all of the elements of the current invention as stated above. Chen further discloses wherein the inquiry means makes the inquiry again so as to encourage the user to react by a predetermined action, facial expression, or line of sight (The inquiry means, as incorporated in the virtual agent performing the method 300, “may ask the user a (new) question {makes the inquiry again} and provide answers or provide a non-question message to the user, at operation 326. After operation 326, the method 300 may continue at operation 206.” As operation 206 is receiving a user response to the new question, the new question {the inquiry again} encourages the user to provide the user response {to react}; Chen, ¶ [0038], FIG. 3). However, Chen and Cech fail to expressly recite …to react by a predetermined action, facial expression, or line of sight, and the intention determination means determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user's image.
Divakaran teaches “a multi-modal, conversational virtual personal assistant.” (Divakaran, ¶ [0039]). Regarding claim 2, Divakaran teaches …to react [to the inquiry] by a predetermined action, facial expression, or line of sight (“A virtual personal assistant according to these implementations is able to receive various sensory inputs, including audible, visual, and/or tactile input,” where sensory inputs are reactions to inquiries {described, in part, as “asking follow-up questions” and depicted, for example, by the system audio responses shown in FIGS. 2 and 3} in the sensed environment, and where visual inputs can include “video or still images... [to] determine information such as facial expressions, gestures {predetermined actions}, and iris biometrics (e.g., characteristics of a person's eyes) {line of sight}.”; Divakaran, ¶¶ [0039], [0040]; FIGS. 2 and 3), and the intention determination means determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user's image (“A multi-modal virtual personal assistant can... accept visual input, including video or still images, and determine information such as facial expressions, gestures, and iris biometrics (e.g., characteristics of a person's eyes),” where “the virtual personal assistant 150 typically includes an understanding system 152… to understand the person's 100 intent and/or emotional state,” where the emotional state of the person {user} includes both positive and negative emotional states {positive response and negative response}. As described in the specific example of FIG. 3, “The system may further detect, from image data, a visible grimace... [and] conclude that the person is probably frustrated, and that perhaps a different approach is needed,” thus determining a frustration {a negative response} based on a visible grimace {facial expression} from image data {of the user based on the user's image}; Divakaran, ¶¶ [0040], [0051], [0068]; FIG. 3), which is the user's reaction in response to the another inquiry made by the inquiry means (The visible grimace is produced Divakaran, ¶ [0067]; FIG. 3).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, to incorporate the teachings of Divakaran to include ...to react [in the sensed environment] by a predetermined action, facial expression, or line of sight, and the intention determination means determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user's image. The multi-modal virtual assistant can “comprehend non-verbal conversational cues” which can allow the assistant “to interact with a person in a natural way,” as recognized by Divakaran. (Divakaran, ¶ [0037]).

Regarding claim 3, the rejection of claim 2 is incorporated. Chen and Cech disclose all of the elements of the current invention as stated above. However, Chen and Cech fail to expressly recite further comprising storage means for storing user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry means makes the inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each user based on the user profile information stored in the storage means.
The relevance of Divakaran is disclosed above with relation to claim 2. Regarding claim 3, Divakaran teaches wherein the storage means stores user profile information (“a multi-modal virtual personal assistant can also include a preference model, which can be tailored for a particular population and/or for one or more individual people” stored as part of a database, such as database 820; Divakaran, ¶¶ [0041], [0107]) in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user (“The preference model can also store characteristics and traits about a person, such as a propensity for speaking very quickly when anxious. The various audible, visual, and tactile information that can be input into the virtual personal assistant can be modified by the preference model to adjust for, for example, accents, cultural differences in the meaning of gestures, regional peculiarities, personal characteristics, and so on,” where adjusting for said differences and distinctions is encouragement to react to the inquiry, and where the preference model is specific to the person {set for each user}; Divakaran, ¶¶ [0041]), and the inquiry means makes the inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each user based on the user profile information stored in the storage means (The preference model, as used throughout, is discussed in further detail in FIG. 15. The system discloses that “the programmed preferences 1542 and/or learned preferences 1544 of [the preference model] can be applied to the inputs 1510, 1520, 1530, to filter and/or adjust the inputs according to the preferences 1542, 1544.” Thus, the inputs {inquiry} are based on the preference model {user profile information} stored in the database {storage means}; Divakaran, ¶¶ [0173], [0107]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, to incorporate the teachings of Divakaran to include further comprising storage means for storing user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry means makes the inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each user based on the user profile information stored in the storage means. The multi-modal virtual assistant can “comprehend non-verbal conversational cues” which can allow the assistant “to interact with a person in a natural way,” as recognized by Divakaran. (Divakaran, ¶ [0037]).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Chen and Cech as applied to claim 1 above, and further in view of Yamada.

Regarding claim 4, the rejection of claim 1 is incorporated. Chen and Cech disclose all of the elements of the current invention as stated above. Chen further discloses wherein the inquiry means makes the inquiry again so as to encourage the user to make a predetermined response by a voice (“In response to determining, at operation 320, that the response provided by the user does not correspond to an answer provided, the virtual agent may determine that the user is off-track and perform remediation operation 324. The remediation operation 324 may include... ask[ing] the user a (new) question and provide answers..., at operation 326” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice. Further, a voice conversation encourages the user to provide a response by voice, and the “provided answers” are a predetermined response; Chen, ¶¶ [0038], [0103]; FIG. 3), and the intention determination means determines the positive response, the negative response, or the predetermined keyword by [speech recognition] of the user's another voice based on the user's another voice (the virtual agent will determine the user's “given choice of the choices” {determines the positive response, the negative response, or the predetermined keyword} where the choices can be “'YES' {positive response} and 'NO' {negative response}” and verbatim response of a “given choice of the choices” is a predetermined keyword, based on the received user response at operation 206. The virtual agent can further “determine... whether an unexpected user response (a response that is not included in a list of expected responses) corresponds to an answer provided at Chen, ¶¶ [0029], [0038]-[0039], [0103]; FIG. 3), which is a user's response to the another inquiry (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {another inquiry} made by the virtual assistant {inquiry means}; Chen, ¶ [0038]; FIG. 3). However, Chen and Cech fail to expressly recite wherein speech recognition includes recognizing prosody of the user's voice.
Yamada teaches “an apparatus and a method for identifying prosody on the basis of features of input speech and an apparatus and a method for recognizing speech using the prosody identification.” (Yamada, ¶ [0003]). Regarding claim 4, Yamada discloses wherein speech recognition includes recognizing prosody of the user's voice (Yamada discloses that, in some cases, “human utterance speech cannot be identified by using phonetic information. For example, in Japanese language, “UN” that indicates an affirmative answer {positive response} is phonetically similar to “UUN” that indicates a negative answer {negative response}.” Therefore, the system and method can include “identifying prosody on the basis of an amount of change in movement of a feature distribution obtained from an autocorrelation matrix of a frequency characteristic of the input speech...performing speech recognition on the basis of features acquired by sound-analyzing the speech input... [and] integrat[ing] the output from the prosody identifying means with an output of the speech recognizing means”; Yamada, ¶¶ [0007], [0013]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, to incorporate the teachings of Yamada to include wherein speech recognition includes recognizing prosody of the user's voice. The “prosody information” can be used to identify “human Yamada. (Yamada, ¶¶ [0007], [0014]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hirayama et al. (U.S. Pat. No. 6708150) discloses a speech recognition system including length comparison between a voice recognition and a recognition word to detect the recognition word using a predetermined length.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657