Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments 
Applicant’s amendment filed on June 30, 2022 has been entered. 
In view of the amendment to the claim(s), the amendment of claim(s) 1 and 5-7 has been acknowledged and entered.
In view of the amendment to claim(s) 1 and 5-7, the rejection of claims 1-7 under 35 U.S.C. §103 is maintained, as provided in the response below.
In light of the amended claims, new grounds for rejection under 35 U.S.C. §103 are provided in the response below. 

Response to Arguments
Applicant’s arguments regarding the subject matter rejections under 35 U.S.C. §101, see pages 6-9 of the Response to Final Office Action dated January 28, 2022 and Accompanying Request for Continued Examination, which was received on June 30, 2022, have been fully considered.
With respect to the rejection(s) of claim(s) 1 and 5-7 under 35 U.S.C. §103 as being obvious over Chen (U.S. Pat. App. Pub. No. 2020/0007380, hereinafter Chen) in view of Cech (U.S. Pat. App. Pub. No. 2018/0286404, hereinafter Cech), applicant provides two arguments. First, applicant asserts that Cech fails to teach or suggest “an intention determination means that determines that the word in the another inquiry is the predetermined keyword indicating the user's intention based on the comparison of the length of the user's another voice with the length of the predetermined response, the predetermined keyword being one of the plurality of predetermined keywords.” Second, applicant asserts that Cech fails to teach or suggest “an intention determination means that compares a length of the user's another voice with a length of a predetermined response.”
Regarding the first argument, applicant argues against the references individually. However, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Applicant asserts that Cech is deficient with regard to an inquiry. However, Cech discloses the same elements as related to a “keyword phrase or command,” where the inquiry analysis of Chen is modified by the command analysis of Cech. Examiner notes that the phrase “keyword phrase or command” is broad enough to encompass an inquiry, which is expressly recited in Chen. As the “keyword phrase or command” analysis of Cech modifies the inquiry analysis described in Chen, one skilled in the art would readily be able to apply said teaching to any form of user voice input, such as queries or commands. Therefore, this argument is not persuasive and the rejection is maintained.
Further, examiner notes that in the speech recognition arts, the words “command” and “inquiry” are among many which are used interchangeably as generic descriptors to describe user input to a speech responsive system, such as an autonomous device or an intelligent virtual assistant. The use of one generic descriptor, such as “voice input” or “command,” without the other does not prevent the disclosure from being applied more broadly. Without more, the mere lack of the word inquiry is not dispositive regarding the breadth of the teaching, where “voice input” and “commands” are described throughout.
The second argument is also not persuasive. Applicant asserts that Cech “describes comparing interim period times 715 with a command spacing time data set” and argues that “the interim periods of Cech do not correspond a length of the user's another voice of claim 1.” (Response, pg. 8). However, as discussed during the interview on April 19, 2022 and maintained here, Cech discloses “compar[ing] interim period times 715 with a command spacing time value constant corresponding to an expected interim time value between commands in a valid command data set.” (Cech, ¶ [0060]). Interim period times 715 are based on the length of voice components, including start trigger and end trigger points for each portion. The interim period length, including the specific start time and end time of the interim period, is compared to the “command spacing time value constants,” which are time value constants for spacing in a specific command. Each of these comparisons necessarily requires comparison of the audio sample period, if for nothing more than the determination of correlating start and end trigger points as points in a continuum of time, to the same points in the command spacing time value constants, such that an “expected interim time value between commands” can be determined.
Further, Cech discloses comparison of interim periods as determined in light of start triggers and stop triggers from audio sample periods 700A, 700B, and 700C and “identifying the length of expected interim periods 715 and sample lengths 700 likely for command data signals 765 from a given user,” where “the system, method, and computer program product described in this disclosure are configured to utilize the audio sample lengths and the interim period lengths as additional data points in an overall verification and speech translation of a voice command.” (Cech, ¶¶ [0058]-[0059]). The described invention would not function without comparison of the interim period length, including the specific start time and end time of the interim period, to the “command spacing time value constants,” which are time value constants for spacing in a specific command. Therefore, the argument is not persuasive and the rejection is maintained.
The Applicant has not provided any further statement; therefore, the Examiner directs the Applicant to the rationale below.
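For illustration only, the timing comparison discussed above can be sketched as follows. This sketch is not taken from Cech or from the application: the names `AudioSample`, `interim_periods`, and `spacing_matches`, the use of seconds as the time unit, and the `tolerance` value are all hypothetical assumptions chosen to show why a comparison of interim periods depends on the start and end trigger points, and therefore on the lengths, of the audio samples themselves.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class AudioSample:
    """One audio sample period bounded by a start trigger and an end trigger (seconds)."""
    start: float
    end: float

    @property
    def length(self) -> float:
        # Length of the spoken portion, derived from the two trigger points.
        return self.end - self.start


def interim_periods(samples: Sequence[AudioSample]) -> List[float]:
    """Gap between the end trigger of one sample and the start trigger of the next."""
    return [nxt.start - cur.end for cur, nxt in zip(samples, samples[1:])]


def spacing_matches(
    samples: Sequence[AudioSample],
    expected_spacing: Sequence[float],
    tolerance: float = 0.15,  # hypothetical tolerance, not a value from Cech
) -> bool:
    """Compare measured interim periods with expected spacing constants."""
    gaps = interim_periods(samples)
    if len(gaps) != len(expected_spacing):
        return False
    return all(abs(g - e) <= tolerance for g, e in zip(gaps, expected_spacing))


# Hypothetical three-part utterance with roughly half-second pauses.
samples = [AudioSample(0.0, 0.4), AudioSample(0.9, 1.3), AudioSample(1.8, 2.5)]
is_valid_spacing = spacing_matches(samples, expected_spacing=[0.5, 0.5])
```

In this sketch the interim periods cannot be computed without the end trigger of one sample and the start trigger of the next, which are the same points that define each sample's length.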

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1 and 5-7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chen (U.S. Pat. App. Pub. No. 2020/0007380, hereinafter Chen) in view of Cech (U.S. Pat. App. Pub. No. 2018/0286404, hereinafter Cech).

Regarding claim 1, Chen discloses an interaction system comprising (the virtual agent implementing the method 300; Chen, ¶¶ [0037]): storage means for storing a user preference (“Computer-readable instructions stored on a computer-readable storage device are executable by the processing unit 1402 of the machine 1400. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device.”; Chen, ¶ [0119]); inquiry means for outputting an electronic voice signal of an inquiry to a user (“the virtual agent” implementing the method 300 {using an inquiry means} “provides a question {making an inquiry} and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice.; Chen, ¶¶ [0032], [0037], [0103]); a microphone configured to capture a user’s voice response (“the virtual agent 102 may detect the user 104 has accessed the virtual agent webpage at operation 106” such as by “speaking … into a microphone,”; Chen, ¶¶ [0026]); and intention determination means for determining a user's intention (“the method 300,” where the method 300 as performed by the “processor operating on a computer system” is the intention determination means, either determines that the user response is an exact match to an answer provided at operation 204 or “at operation 320, determin[es] whether the answer provided by the user, at operation 206, corresponds to an answer provided (e.g., is not an exact match but the virtual agent may conclude with some degree of certainty that the user intended to select the answer)” thus determining the user's intention.; Chen, ¶¶ [0037]) based on the user's voice response in response to the inquiry made by the inquiry means, (The determination of intention to select one or more 
answers, is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0032]-[0033], [0037], [0103]), a plurality of predetermined keywords being set in the intention determination means, ("Embodiments herein may provide a virtual agent that is capable of understanding and selecting a choice to which an unexpected user response corresponds." where "The virtual agent 102 may select the choice... in response to receiving such a word, phrase, or symbol from the user 104." where the selection of a choice in the form of "other word[s], phrase[s], or symbol[s]" is a plurality of keywords being set corresponding to the user’s intent {intention determination means}.; Chen, ¶¶ [0030]) wherein, in response to the intention determination means failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the inquiry made by the inquiry means (The virtual agent {implementing the method 300, thus including the intention determination means} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “'YES' {positive response} and 'NO {negative response}'“ and verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... 
that the response provided by the user does not correspond to an answer provided {failing to determine a response indicating the user's intention}” and “In response to determining, at operation 320, that the response provided by the user does not correspond to an answer provided, the virtual agent may determine that the user is off-track and perform remediation operation 324. The remediation operation 324 may include... ask[ing] the user a (new) question and provide answers”; Chen, ¶¶ [0029], [0038]; FIG. 3), the inquiry means determines a word to be included in another inquiry by looking up a preference of the user in the user preference database (The system looks up possible response equivalents in "a model configured to determine a semantic similarity {determines a word by looking up a preference}" where the "model is configured to detect semantic similarity between a previous response and a current response. {the previous response and the semantic similarity being stored in a database, thus a user preference database}"; Chen, ¶¶ [0053], [0077]) and outputs an electronic voice signal of the another inquiry including the determined word to the user (The system may ask "Follow up questions … to resolve an ambiguity," where "for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow the dialog script {determined word} corresponding to that intent." ; Chen, ¶¶ [0072], [0097]), the intention determination means determines the positive response, the negative response, or the predetermined keyword based on a user's image or a user's another voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 
3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “'YES' {positive response} and 'NO {negative response}'“ and verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive user response {user's reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user's reaction in response to the another inquiry made by the inquiry means (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶¶ [0038]; FIG. 3). However, Chen fails to expressly recite compares a length of the user's another voice with a length of a predetermined response; and determines that the word in the another inquiry is the predetermined keyword indicating the user's intention based on the comparison of the length of the user's another voice with the length of the predetermined response, the predetermined keyword being one of the plurality of predetermined keywords.
Cech discloses systems and methods for automated speech recognition including the measurement of audio sample times for keyword detection. (Cech, ¶ [0007]). Regarding claim 1, Cech teaches a plurality of predetermined keywords being set in the intention determination means (“The dictionary can include one or more “keyword” phrases and one or more “command” phrases.”; Cech, ¶ [0039]); compares a length of the user’s another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}”; Cech, ¶ [0060]); and determines that the word in the another inquiry is the predetermined keyword indicating the user’s intention based on the comparison of the length of the user’s another voice with the length of the predetermined response (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user’s another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…},” where a keyword phrase or command can “control vehicle operation {a keyword which indicates the user’s intention}.”; Cech, ¶¶ [0060], [0062]), the predetermined keyword being one of the plurality of predetermined keywords (The system compares to “commands {predetermined keywords} in a valid command data set {...being one of the plurality of predetermined keywords}”; Cech, ¶ [0060]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include an intention determination means that compares a length of the user's another voice with a length of a predetermined response, and determines that the word in the another inquiry is the predetermined keyword indicating the user's intention based on the comparison of the length of the user's another voice with the length of the predetermined response, the predetermined keyword being one of the plurality of predetermined keywords. One of ordinary skill would have been motivated to make this modification because the speech recognition system uses the length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech. (Cech, ¶ [0006]).
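For illustration only, the claim 1 limitations as mapped above can be sketched as a two-stage determination. This is a hypothetical sketch and not an implementation disclosed by Chen, Cech, or the application: the function names, the sample keywords, the use of seconds for voice length, and the `tolerance` value are assumptions. The first stage tries to match the first response directly; the second stage, used after another inquiry containing a chosen word, accepts that word as the keyword when the length of the second response is close to the length of a predetermined response.

```python
from typing import Optional, Sequence

POSITIVE = {"yes"}  # hypothetical positive responses
NEGATIVE = {"no"}   # hypothetical negative responses


def classify(response_text: str, keywords: Sequence[str]) -> Optional[str]:
    """Stage 1: return 'positive', 'negative', a matched keyword, or None on failure."""
    text = response_text.strip().lower()
    if text in POSITIVE:
        return "positive"
    if text in NEGATIVE:
        return "negative"
    for keyword in keywords:
        if text == keyword.lower():
            return keyword
    return None


def length_matches(response_seconds: float, expected_seconds: float,
                   tolerance: float = 0.2) -> bool:
    """Compare the length of the second response with a predetermined response length."""
    return abs(response_seconds - expected_seconds) <= tolerance


def resolve_follow_up(candidate_word: str, keywords: Sequence[str],
                      response_seconds: float, expected_seconds: float) -> Optional[str]:
    """Stage 2: treat the word used in the follow-up inquiry as the keyword
    only if it is a predetermined keyword and the response length matches."""
    if candidate_word in keywords and length_matches(response_seconds, expected_seconds):
        return candidate_word
    return None


keywords = ["jazz", "rock"]               # hypothetical predetermined keywords
first = classify("hmm, maybe", keywords)  # stage 1 fails, so another inquiry is made
second = resolve_follow_up("jazz", keywords, response_seconds=0.5, expected_seconds=0.6)
```

The sketch separates the text-based determination from the length-based determination so that the length comparison acts only as a secondary check after the first determination fails.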

Regarding claim 5, Chen discloses An interaction method comprising the steps of (the virtual agent implementing the method 300; Chen, ¶¶ [0037]): outputting an electronic voice signal of an inquiry to a user by a voice (“the virtual agent” implementing the method 300 “provides a question {making an inquiry} and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice.; Chen, ¶¶ [0032], [0037], [0103]) capturing, by a microphone, a user's voice response (“the virtual agent 102 may detect the user 104 has accessed the virtual agent webpage at operation 106” such as by “speaking … into a microphone,”; Chen, ¶¶ [0026]); and determining a user's intention (“the method 300” either determines that the user response is an exact match to an answer provided at operation 204 or “at operation 320, determin[es] whether the answer provided by the user, at operation 206, corresponds to an answer provided (e.g., is not an exact match but the virtual agent may conclude with some degree of certainty that the user intended to select the answer)” thus determining the user's intention.; Chen, ¶¶ [0037]) based on the user's voice response in response to the inquiry, (The determination of intention to select one or more answers, is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0032]-[0033], [0037], [0103]) a plurality of predetermined keywords being preset, ("Embodiments herein may provide a virtual agent that is capable of understanding and selecting a choice to which an unexpected user response corresponds." 
where "The virtual agent 102 may select the choice... in response to receiving such a word, phrase, or symbol from the user 104." where the selection of a choice in the form of "other word[s], phrase[s], or symbol[s]" is a plurality of keywords being set corresponding to the user’s intent.; Chen, ¶¶ [0030]), the method comprising: determining a word to be included in another inquiry by looking up a preference of the user in a user preference database (The system looks up possible response equivalents in "a model configured to determine a semantic similarity {determines a word by looking up a preference}" where the "model is configured to detect semantic similarity between a previous response and a current response. {the previous response and the semantic similarity being stored in a database, thus a user preference database}"; Chen, ¶¶ [0053], [0077]) and outputting an electronic voice signal of the another inquiry including the determined word to the user (The system may ask "Follow up questions … to resolve an ambiguity," where "for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow {outputting} the dialog script {the another inquiry including the determined word} corresponding to that intent." ; Chen, ¶¶ [0072], [0097]), in response to failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the inquiry (The virtual agent {implementing the method 300} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). 
In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “'YES' {positive response} and 'NO {negative response}'“ and verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... that the response provided by the user does not correspond to an answer provided {cannot determine a response indicating the user's intention}”; Chen, ¶¶ [0029], [0038]; FIG. 3); determining the positive response, the negative response, or the predetermined keyword based on a user's image or a user's voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “'YES' {positive response} and 'NO {negative response}'“ and verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive user response {user's reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user's reaction in response to the another inquiry (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶¶ [0038]; FIG. 3). 
However, Chen fails to expressly recite comparing a length of the user's another voice with a length of a predetermined response; and determining that the word in the another inquiry is the predetermined keyword indicating the user's intention based on the comparison of the length of the user's another voice with the length of the predetermined response, the predetermined keyword being one of the plurality of predetermined keywords.
The relevance of Cech is discussed above in relation to claim 1. Regarding claim 5, Cech teaches a plurality of predetermined keywords being preset (“The dictionary can include one or more “keyword” phrases and one or more “command” phrases.”; Cech, ¶ [0039]); comparing a length of the user’s another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}”; Cech, ¶ [0060]); and determining that the word in the another inquiry is the predetermined keyword indicating the user’s intention based on the comparison of the length of the user’s another voice with the length of the predetermined response (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user’s another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…},” where a keyword phrase or command can “control vehicle operation {a keyword which indicates the user’s intention}.”; Cech, ¶¶ [0060], [0062]), the predetermined keyword being one of the plurality of predetermined keywords (The system compares to “commands {predetermined keywords} in a valid command data set {...being one of the plurality of predetermined keywords}”; Cech, ¶ [0060]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include comparing a length of the user's another voice with a length of a predetermined response; and determining that the word in the another inquiry is the predetermined keyword indicating the user's intention based on the comparison of the length of the user's another voice with the length of the predetermined response, the predetermined keyword being one of the plurality of predetermined keywords. One of ordinary skill would have been motivated to make this modification because the speech recognition system uses the length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech. (Cech, ¶ [0006]).

Regarding claim 6, Chen discloses a non-transitory computer readable medium storing a program for causing a computer to execute the following processing of (the virtual agent implementing the method 300 through “computer-readable instructions stored on a computer-readable storage device are executable by the processing unit 1402 of the machine 1400… including a non-transitory computer-readable medium such as a storage device”; Chen, ¶¶ [0037], [0119]): outputting an electronic voice signal of an inquiry to a user (“the virtual agent” implementing the method 300 “provides a question {making an inquiry}and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice.; Chen, ¶¶ [0032], [0037], [0103]); determining a word to be included in another inquiry by looking up a preference of the user in a user preference database (The system looks up possible response equivalents in "a model configured to determine a semantic similarity {determines a word by looking up a preference}" where the "model is configured to detect semantic similarity between a previous response and a current response. {the previous response and the semantic similarity being stored in a database, thus a user preference database}"; Chen, ¶¶ [0053], [0077]) and outputting an electronic voice signal of the another inquiry to the user (The system may ask "Follow up questions … to resolve an ambiguity," where "for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow {outputting} the dialog script {the another inquiry including the determined word} corresponding to that intent." 
; Chen, ¶¶ [0072], [0097]), in response to failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the inquiry (The virtual agent {implementing the method 300} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “'YES' {positive response} and 'NO {negative response}'“ and verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... that the response provided by the user does not correspond to an answer provided {cannot determine a response indicating the user's intention}” where the determination of intention to select one or more answers, is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0029], [0032]-[0033], [0037]-[0038], [0103]; FIG. 3); a plurality of predetermined keywords being set in the intention determination means, ("Embodiments herein may provide a virtual agent that is capable of understanding and selecting a choice to which an unexpected user response corresponds." where "The virtual agent 102 may select the choice... in response to receiving such a word, phrase, or symbol from the user 104." 
where the selection of a choice in the form of "other word[s], phrase[s], or symbol[s]" is a plurality of keywords being set corresponding to the user’s intent {intention determination means}.; Chen, ¶¶ [0030]), and determining the positive response, the negative response, or the predetermined keyword based on a user's image or a user's another voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “'YES' {positive response} and 'NO {negative response}'“ and verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive user response {user's reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user's reaction in response to the another inquiry (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶¶ [0038]; FIG. 3). However, Chen fails to expressly recite comparing a length of the user's another voice with a length of a predetermined response; and determining that the word in the another inquiry is the predetermined keyword indicating the user's intention based on the comparison of the length of the user's another voice with the length of the predetermined response, the predetermined keyword being one of the plurality of predetermined keywords.
The relevance of Cech is discussed above in relation to claim 1. Regarding claim 6, Cech teaches a plurality of predetermined keywords being preset (“The dictionary can include one or more “keyword” phrases and one or more “command” phrases.”; Cech, ¶ [0039]); comparing a length of the user’s another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}”; Cech, ¶ [0060]); and determining that the word in the another inquiry is the predetermined keyword indicating the user’s intention based on the comparison of the length of the user’s another voice with the length of the predetermined response (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user’s another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…},” where a keyword phrase or command can “control vehicle operation {a keyword which indicates the user’s intention}.”; Cech, ¶¶ [0060], [0062]), the predetermined keyword being one of the plurality of predetermined keywords (The system compares to “commands {predetermined keywords} in a valid command data set {...being one of the plurality of predetermined keywords}”; Cech, ¶ [0060]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include comparing a length of the user's another voice with a length of a predetermined response; and determining that the word in the another inquiry is the predetermined keyword indicating the user's intention based on the comparison of the length of the user's another voice with the length of the predetermined response, the predetermined keyword being one of the plurality of predetermined keywords. The speech recognition system uses the length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech (Cech, ¶ [0006]).
 
Regarding claim 7, Chen discloses an interaction system comprising (the virtual agent implementing the method 300; Chen, ¶ [0037]): an inquiry unit configured to output an electronic voice signal of an inquiry to a user by a voice (“the virtual agent” implementing the method 300 {using an inquiry means} “provides a question {making an inquiry} and a set of acceptable answers (choices) to the user” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice; Chen, ¶¶ [0032], [0037], [0103]); a microphone configured to capture a user’s voice response (“the virtual agent 102 may detect the user 104 has accessed the virtual agent webpage at operation 106” such as by “speaking … into a microphone”; Chen, ¶ [0026]); and an intention determination unit configured to determine a user's intention (“the method 300,” where the method 300 as performed by the “processor operating on a computer system” is the intention determination unit, either determines that the user response is an exact match to an answer provided at operation 204 or “at operation 320, determin[es] whether the answer provided by the user, at operation 206, corresponds to an answer provided (e.g., is not an exact match but the virtual agent may conclude with some degree of certainty that the user intended to select the answer)” thus determining the user's intention; Chen, ¶ [0037]) based on the user's voice response in response to the inquiry made by the inquiry unit (The determination of intention to select one or more answers is based on the “answer provided by the user” where the answer is provided in response to the question and set of answers from the virtual agent, and where the answer provided by the user is spoken {e.g., “speaking the choice verbatim” as part of “a voice conversation”}; Chen, ¶¶ [0032]-[0033], [0037], [0103]), 
a plurality of predetermined keywords being set in the intention determination unit ("Embodiments herein may provide a virtual agent that is capable of understanding and selecting a choice to which an unexpected user response corresponds." where "The virtual agent 102 may select the choice... in response to receiving such a word, phrase, or symbol from the user 104." where the selection of a choice in the form of "other word[s], phrase[s], or symbol[s]" is a plurality of keywords being set corresponding to the user’s intent {intention determination means}; Chen, ¶ [0030]), wherein, in response to the intention determination unit failing to determine a positive response, a negative response, or a predetermined keyword indicating the user's intention based on the user's voice response in response to the inquiry made by the inquiry unit (The virtual agent {implementing the method 300, thus including the intention determination unit} can provide a “prompt (e.g., question) and choices (options the user may select to respond to the prompt). In response, the virtual agent expects, verbatim, the user to respond with a given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, and where the virtual agent can “determin[e]... that the response provided by the user does not correspond to an answer provided {cannot determine a response indicating the user's intention}”; Chen, ¶¶ [0029], [0038]; FIG. 3), the inquiry unit determines a word to be included in another inquiry by looking up a preference of the user in the user preference database (The system looks up possible response equivalents in "a model configured to determine a semantic similarity {determines a word by looking up a preference}" where the "model is configured to detect semantic similarity between a previous response and a current response. {the previous response and the semantic similarity being stored in a database, thus a user preference database}"; Chen, ¶¶ [0053], [0077]) and outputs an electronic voice signal of the another inquiry including the determined word to the user (The system may ask "Follow up questions … to resolve an ambiguity," where "for the ‘user repeat’ taxonomy, the conversation controller 910 may choose the next best intent, excluding intents that were tried previously in the conversation, and follow the dialog script {determined word} corresponding to that intent."; Chen, ¶¶ [0072], [0097]), the intention determination unit determines the positive response, the negative response, or the predetermined keyword based on a user's image or a user's another voice (“After operation 326, the method 300 may continue at operation 206.” Thus, as depicted in FIG. 3 and described in the accompanying paragraphs, operation 206 is performed after asking a new question at 326, where the user’s response is the “given choice of the choices” where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, where the virtual agent will receive user response {user's reaction} in response to the new question {the another inquiry}, and where the user response is a voice response {based on the user’s voice}; Chen, ¶¶ [0029], [0038]; FIG. 3), which is a user's reaction in response to the another inquiry made by the inquiry unit (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {the another inquiry} made by the virtual assistant {the inquiry means}; Chen, ¶ [0038]; FIG. 3). 
However, Chen fails to expressly recite compares a length of the user's another voice with a length of a predetermined response; and determines that the word in the another inquiry is the predetermined keyword indicating the user's intention based on the comparison of the length of the user's another voice with the length of the predetermined response, the predetermined keyword being one of the plurality of predetermined keywords.
The relevance of Cech is disclosed above with relation to claim 1. Regarding claim 7, Cech teaches a plurality of predetermined keywords being set in the intention determination means ("The dictionary can include one or more “keyword” phrases and one or more “command” phrases."; Cech, ¶ [0039]); compares a length of the user’s another voice with a length of a predetermined response (“In some embodiments, the audio samples are the above described voice tokens 45 that have been parsed from at least one speech input 42” where the speech and keyword phrase and command recognition includes “compar[ing] interim period times 715 with a command spacing time value constant {a length}” of the audio samples {of the user’s another voice} “corresponding to an expected interim time value {with a length…} between commands in a valid command data set {…of a predetermined response}”; Cech, ¶ [0060]); and determines that the word in the another inquiry is the predetermined keyword indicating the user’s intention based on the comparison of the length of the user’s another voice with the length of the predetermined response (“Tracking the interim periods during known command audio signal transmission {…based on the comparison of the length of the user’s another voice with the length of the predetermined response} is one aspect of training a speech recognition system to identify a voice token {determines that the word in the another inquiry…} as either a keyword phrase or command or a portion of a keyword phrase or command {…is the predetermined keyword…}.” Further, the system is "Tracking the interim periods during known command audio signal transmission... to identify a voice token as either a keyword phrase or command or a portion of a keyword phrase or command," where a keyword phrase or command can "control vehicle operation {a keyword which indicates the user’s intention}."; Cech, ¶¶ [0060], [0062]), the predetermined keyword being one of the plurality of predetermined keywords (The system compares to "commands {predetermined keywords} in a valid command data set {...being one of the plurality of predetermined keywords}"; Cech, ¶ [0060]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen to incorporate the teachings of Cech to include compares a length of the user's another voice with a length of a predetermined response; and determines that the word in the another inquiry is the predetermined keyword indicating the user's intention based on the comparison of the length of the user's another voice with the length of the predetermined response, the predetermined keyword being one of the plurality of predetermined keywords. The speech recognition system uses the length of the audio signal, among other “out-of-band information,” which provides a “credible way … to double check a perceived speech input,” as recognized by Cech (Cech, ¶ [0006]).

Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Chen and Cech as applied to claim 1 above, and further in view of Divakaran.

Regarding claim 2, the rejection of claim 1 is incorporated. Chen and Cech disclose all of the elements of the current invention as stated above. Chen further discloses wherein the inquiry means makes the inquiry again so as to encourage the user to react by a predetermined action, facial expression, or line of sight (The inquiry means, as incorporated in the virtual agent performing the method 300, “may ask the user a (new) question {makes the inquiry again} and provide answers or provide a non-question message to the user, at operation 326. After operation 326, the method 300 may continue at operation 206.” As operation 206 is receiving a user response to the new question, the new question {the inquiry again} encourages the user to provide the user response {to react}; Chen, ¶ [0038], FIG. 3). However, Chen and Cech fail to expressly recite …to react by a predetermined action, facial expression, or line of sight, and the intention determination means determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user's image. 
Divakaran teaches “a multi-modal, conversational virtual personal assistant.” (Divakaran, ¶ [0039]). Regarding claim 2, Divakaran teaches …to react [to the inquiry] by a predetermined action, facial expression, or line of sight (“A virtual personal assistant according to these implementations is able to receive various sensory inputs, including audible, visual, and/or tactile input,” where sensory inputs are reactions to inquiries {described, in part, as “asking follow-up questions” and depicted, for example, by the system audio responses shown in FIGS. 2 and 3} in the sensed environment, and where visual inputs can include “video or still images... [to] determine information such as facial expressions, gestures {predetermined actions}, and iris biometrics (e.g., characteristics of a person's eyes) {line of sight}.”; Divakaran, ¶¶ [0039], [0040]; FIGS. 2 and 3), and the intention determination means determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user's image (“A multi-modal virtual personal assistant can... accept visual input, including video or still images, and determine information such as facial expressions, gestures, and iris biometrics (e.g., characteristics of a person's eyes),” where, “the virtual personal assistant 150 typically includes an understanding system 152… to understand the person's 100 intent and/or emotional state,” where emotional state of the person {user} includes both positive and negative emotional states {positive response and negative response}. As described in the specific example of FIG. 3, “The system may further detect, from image data, a visible grimace... [and] conclude that the person is probably frustrated, and that perhaps a different approach is needed,” thus determining a frustration {a negative response} based on a visible grimace {facial expression} from image data {of the user based on the user's image}; Divakaran, ¶¶ [0040], [0051], [0068]; FIG. 3), which is the user's reaction in response to the another inquiry made by the inquiry means (The visible grimace is produced in response to the system responses, shown in this example with reference to the interaction 300 as system response 328; Divakaran, ¶ [0067]; FIG. 3). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, to incorporate the teachings of Divakaran to include ...to react [in the sensed environment] by a predetermined action, facial expression, or line of sight, and the intention determination means determines the positive response, the negative response, or the predetermined keyword by recognizing the action, the facial expression, or the line of sight of the user based on the user's image. The multi-modal virtual assistant can “comprehend non-verbal conversational cues” which can allow the assistant “to interact with a person in a natural way,” as recognized by Divakaran (Divakaran, ¶ [0037]).

Regarding claim 3, the rejection of claim 2 is incorporated. Chen and Cech disclose all of the elements of the current invention as stated above. However, Chen and Cech fail to expressly recite further comprising storage means for storing user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry means makes the inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each user based on the user profile information stored in the storage means.
The relevance of Divakaran is disclosed above with relation to claim 2. Regarding claim 3, Divakaran teaches wherein the storage means stores user profile information (“a multi-modal virtual personal assistant can also include a preference model, which can be tailored for a particular population and/or for one or more individual people” stored as part of a database, such as database 820; Divakaran, ¶¶ [0041], [0107]) in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user (“The preference model can also store characteristics and traits about a person, such as a propensity for speaking very quickly when anxious. The various audible, visual, and tactile information that can be input into the virtual personal assistant can be modified by the preference model to adjust for, for example, accents, cultural differences in the meaning of gestures, regional peculiarities, personal characteristics, and so on,” where adjusting for said differences and distinctions is encouragement to react to the inquiry, and where the preference model is specific to the person {set for each user}; Divakaran, ¶ [0041]), and the inquiry means makes the inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each user based on the user profile information stored in the storage means (The preference model, as used throughout, is discussed in further detail in FIG. 15. The system discloses that “the programmed preferences 1542 and/or learned preferences 1544 of [the preference model] can be applied to the inputs 1510, 1520, 1530, to filter and/or adjust the inputs according to the preferences 1542, 1544.” Thus, the inputs {inquiry} are based on the preference model {user profile information} stored in the database {storage means}; Divakaran, ¶¶ [0107], [0173]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, to incorporate the teachings of Divakaran to include further comprising storage means for storing user profile information in which information indicating by which one of the action, the facial expression, and the line of sight the user should be encouraged to react to the another inquiry is set for each user, and the inquiry means makes the inquiry again so as to encourage reaction by the corresponding predetermined action, facial expression, or line of sight for each user based on the user profile information stored in the storage means. The multi-modal virtual assistant can “comprehend non-verbal conversational cues” which can allow the assistant “to interact with a person in a natural way,” as recognized by Divakaran (Divakaran, ¶ [0037]).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Chen and Cech as applied to claim 1 above, and further in view of Yamada.

Regarding claim 4, the rejection of claim 1 is incorporated. Chen and Cech disclose all of the elements of the current invention as stated above. Chen further discloses wherein the inquiry means makes the inquiry again so as to encourage the user to make a predetermined response by a voice (“In response to determining, at operation 320, that the response provided by the user does not correspond to an answer provided, the virtual agent may determine that the user is off-track and perform remediation operation 324. The remediation operation 324 may include... ask[ing] the user a (new) question and provide answers..., at operation 326” where the “human-to-agent interaction may take the form of one or more of text (e.g., a chat session), graphics (e.g., a video conference), or audio (e.g., a voice conversation)” thus the inquiry to the user can be made by a voice. Further, a voice conversation encourages the user to provide a response by voice and where the “provided answers” are a predetermined response; Chen, ¶¶ [0038], [0103]; FIG. 3), and the intention determination means determines the positive response, the negative response, or the predetermined keyword by [speech recognition] of the user's another voice based on the user's another voice (the virtual agent will determine the user's “given choice of the choices” {determines the positive response, the negative response, or the predetermined keyword} where the choices can be “'YES' {positive response} and 'NO' {negative response}” and a verbatim response of a “given choice of the choices” is a predetermined keyword, based on the received user response at operation 206. The virtual agent can further “determine... whether an unexpected user response (a response that is not included in a list of expected responses) corresponds to an answer provided at operation...26.” Further, as “the human-to-agent interaction may take the form of... a voice conversation,” the user response is received by speech recognition, based on the voice-based user response {the user's voice}; Chen, ¶¶ [0029], [0038]-[0039], [0103]; FIG. 3), which is a user's response to the another inquiry (operation 206, in response to a new question provided by the virtual agent at operation 326, is a new user response {user's reaction} in response to a new question {another inquiry} made by the virtual assistant {inquiry means}; Chen, ¶ [0038]; FIG. 3). However, Chen and Cech fail to expressly recite wherein speech recognition includes recognizing prosody of the user's voice.
Yamada teaches “an apparatus and a method for identifying prosody on the basis of features of input speech and an apparatus and a method for recognizing speech using the prosody identification.” (Yamada, ¶ [0003]). Regarding claim 4, Yamada discloses wherein speech recognition includes recognizing prosody of the user's voice (Yamada discloses that, in some cases, “human utterance speech cannot be identified by using phonetic information. For example, in Japanese language, “UN” that indicates an affirmative answer {positive response} is phonetically similar to “UUN” that indicates a negative answer {negative response}.” Therefore, the system and method can include “identifying prosody on the basis of an amount of change in movement of a feature distribution obtained from an autocorrelation matrix of a frequency characteristic of the input speech...performing speech recognition on the basis of features acquired by sound-analyzing the speech input... [and] integrat[ing] the output from the prosody identifying means with an output of the speech recognizing means”; Yamada, ¶¶ [0007], [0013]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for a context-aware virtual agent of Chen as modified by the sample time measurement for speech recognition taught in Cech, to incorporate the teachings of Yamada to include wherein speech recognition includes recognizing prosody of the user's voice. The “prosody information” can be used to identify “human utterance speech [which] cannot be identified by using phonetic information,” which creates a more robust speech recognition system, as recognized by Yamada (Yamada, ¶¶ [0007], [0014]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Ha et al. (KR Pat. Pub. No. 101122591B1) discloses systems and methods for keyword recognition based on word candidate reference lengths.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard, whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657