DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on 08/08/2022 has been entered.

Response to Arguments/Amendments
3.	With respect to the claim rejections under 35 U.S.C. §§ 102/103, Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
4.	Claim 20 is objected to because of the following informalities: typographical error. Claim 20 recites “determine an intent” in line 7; “an intent” in this limitation should be changed to “a first intent”. Appropriate correction is required.

Claim Rejections - 35 USC § 101
5.	35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
6.	Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more: the claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than an abstract idea. Claims 1-20 are directed to the abstract idea of certain methods of organizing human activity. 
	Independent claim 1 recites the following limitations:
 	“1. (Currently Amended) A computer-implemented method for performing incremental natural language understanding, the method comprising: 
 	acquiring a first audio speech segment associated with a user utterance; converting the first audio speech segment into a first text segment; 
 	determining a first intent based on a text string associated with the first text segment, wherein the text string represents a portion of the user utterance; 
 	determining that a confidence score associated with the first intent is less than a threshold value; and 
 	in response, generating a first response that is unrelated to the first intent prior to when the user utterance completes, wherein the first response indicates that audio associated with the user utterance is being processed and is based on a non-intent specific response library.”
	Independent claims 1, 11, and 20 recite substantially the same concept, but in the context of a method, a non-transitory computer-readable storage medium, and a system, respectively. Claim 11 recites “One or more non-transitory computer-readable media storing program instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of...” and claim 20 recites “A system, comprising: one or more memories that include instructions, and one or more processors that are coupled to the one or more memories and, when executing the instructions, are configured to...” 
 	More specifically, the underlying abstract idea mirrors what happens when a human listens to another person, writes down on paper what he or she hears, determines the other person’s intention based on what was heard and what can be predicted, and, if unsure of that intention, tells the other person “Could you say that again?”, “I am thinking,” or “Could you speak louder?” 
This judicial exception is not integrated into a practical application. In particular, claims 11 and 20 recite the additional elements of a “memory” and a “processor” (claim 1 recites no additional elements). For example, paragraphs [0030], [0033], and [0034] of the specification as filed describe a general-purpose computing environment. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea. 
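For illustration only, the steps of claim 1 (with the high-confidence branch of claim 2) can be modeled as the following minimal Python sketch. The ASR step is stubbed out, and the names `transcribe`, `classify_intent`, `process_segment`, `NON_INTENT_RESPONSES`, `INTENT_RESPONSES`, and the threshold value 0.7 are hypothetical assumptions that do not appear in the claims or the specification. The sketch adopts one reading of “in response” (a low score triggers the non-intent specific response); the ambiguity of that phrase is addressed in the 35 U.S.C. 112(b) rejection below.

```python
from dataclasses import dataclass

# Hypothetical threshold value; the claims do not recite one.
THRESHOLD = 0.7

# Hypothetical non-intent specific response library: fillers that only
# signal that audio of the utterance is still being processed.
NON_INTENT_RESPONSES = ["Hmm...", "One moment...", "Let me think..."]

# Hypothetical intent specific response library (claim 2).
INTENT_RESPONSES = {"weather": "Here is the weather forecast."}


@dataclass
class IntentResult:
    intent: str
    confidence: float


def transcribe(audio_segment: bytes) -> str:
    """Stub ASR (hypothetical): treats the audio bytes as UTF-8 text."""
    return audio_segment.decode("utf-8")


def classify_intent(text: str) -> IntentResult:
    """Toy keyword classifier standing in for an NLU model.

    Returns a low confidence score when the keyword is absent.
    """
    if "weather" in text:
        return IntentResult("weather", 0.9)
    return IntentResult("weather", 0.3)


def process_segment(text_string: str, audio_segment: bytes) -> tuple[str, str]:
    """Process one audio segment before the user utterance completes.

    Returns the updated partial text string and the response to emit.
    """
    # Convert the segment to text and append it to the partial utterance.
    text_string = (text_string + " " + transcribe(audio_segment)).strip()
    result = classify_intent(text_string)
    if result.confidence < THRESHOLD:
        # Low confidence: a response unrelated to the intent, drawn from the
        # non-intent specific library (always the first entry, for determinism).
        return text_string, NON_INTENT_RESPONSES[0]
    # High confidence (claim 2): a response from the intent specific library.
    return text_string, INTENT_RESPONSES[result.intent]


text, reply = process_segment("", b"What's the")
print(reply)  # Hmm...
text, reply = process_segment(text, b"weather today")
print(reply)  # Here is the weather forecast.
```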
 	The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than a generic computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The additional elements, individually or in combination, such as a memory, a processor, and a non-transitory computer-readable storage medium, amount to no more than (i) mere instructions to implement the idea on a computer and/or (ii) recitation of generic computer structure that serves to perform generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry. Viewed as a whole, these additional claim elements do not provide meaningful limitations to transform the abstract idea into a patent-eligible application of the abstract idea such that the claims amount to significantly more than the abstract idea itself. The mere recitation of “a memory, a processor, and a computer-readable non-transitory storage medium” and/or the like is akin to adding the words “apply it” with a computer in conjunction with the abstract idea. Further, the claims recite no improvement to the functioning of the computing device; the computer merely does what a human would do, namely tell the other person something unrelated to that person’s intention (e.g., “Could you say that again?”, “I am thinking,” or “Could you speak louder?”) when the intention cannot be determined. The claims are not patent eligible. 
With respect to claims 2 and 12, the claims relate to making a response related to the other person’s intention when that intention can be determined. This reads on a human responding to the intention once he or she is able to determine it. No additional limitations are present. With respect to claims 4 and 14, the claims relate to continuing to listen and determining the other person’s intention. This reads on a human listening to, and determining the intention of, the other person while the other person is talking; the human could detect that the intention now indicated differs from the intention he or she had predicted and respond based on the newly detected intention. No additional limitations are present. With respect to claims 5 and 17, the claims relate to predicting what the other person will say and combining the prediction with what the human has heard. This reads on a human making a prediction based on the current context of the conversation. No additional limitations are present. With respect to claims 6-9, 18, and 19, the claims relate to modifying the response. This reads on a human changing the response based on the determined emotion. No additional limitations are present. With respect to claim 10, the claim relates to modifying the response based on video. This reads on a human changing the response based on the user’s motion. With respect to claims 15 and 16, the claims relate to overlap between two audio speech segments. This reads on combining, or not combining, two consecutive audio speech segments. No additional limitations are present. These dependent claims do not integrate the judicial exception into a practical application and do not include additional elements sufficient to amount to significantly more than the judicial exception.
For at least the reasons provided above, claims 1-20 are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter. 

Claim Rejections - 35 USC § 112
7.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

8.	Claims 1, 2, 4-12, and 14-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention. 
	Claims 1 and 11 recite “in response, generating a first response that is unrelated to the first intent prior to when the user utterance completes, wherein the first response indicates that audio associated with the user utterance is being processed and is based on a non-intent specific response library.” 
 	Claim 20 recites “in response, generate a first response that is unrelated to the first intent prior to when the user utterance completes, wherein the first response indicates that audio associated with the user utterance is being processed and is based on a non-intent specific response library.” 
It is not clear, as claimed, what the generation of the first response is “in response” to. Even assuming that “in response” refers to the immediately preceding step (“determining that a confidence score associated with the first intent is less than a threshold value”), it is, at a minimum, unclear whether the first response is generated in response to determining that the confidence score is less than the threshold value or in response to determining that the confidence score is NOT less than the threshold value. 
Dependent claims 2, 4-10, 12, and 14-19 do not remedy the noted indefiniteness and are rejected on the same ground by virtue of their dependency. 
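To make the two readings concrete, they can be written as two hypothetical Python predicates that decide whether the non-intent specific response is generated. The function names and the threshold value 0.7 are illustrative assumptions only; nothing in the claims supplies them. For the same confidence score, the two readings trigger the response in opposite cases, and the claim language does not indicate which one is claimed.

```python
THRESHOLD = 0.7  # hypothetical value; not recited in the claims


def reading_a(confidence: float) -> bool:
    """Reading A: the first response issues because the score is below the threshold."""
    return confidence < THRESHOLD


def reading_b(confidence: float) -> bool:
    """Reading B: the first response issues because the score is NOT below the threshold."""
    return not confidence < THRESHOLD


# For the same score, the two readings generate the first response in opposite cases.
for score in (0.3, 0.9):
    print(score, reading_a(score), reading_b(score))
# 0.3 True False
# 0.9 False True
```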

Claim Rejections - 35 USC § 103
9.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

10.	Claims 1-2, 4-5, 11-12, 14, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Koukoumidis et al. (US 11,086,858 B1) in view of Kwon et al. (US 2019/0244619 A1).

	With respect to Claim 1, Koukoumidis et al. disclose 
 	 A computer-implemented method for performing incremental natural language understanding, the method comprising: 
 	acquiring a first audio speech segment associated with a user utterance (Koukoumidis et al. col. 10 lines 53-57 If the user input is based on an audio modality (e.g., the user may speak to the assistant application 136 or send a video including speech to the assistant application 136), the assistant system 140 may process it using an audio speech recognition (ASR) module 210 to convert the user input into text);  
 	converting the first audio speech segment into a first text segment (Koukoumidis et al. col. 10 lines 53-57 If the user input is based on an audio modality (e.g., the user may speak to the assistant application 136 or send a video including speech to the assistant application 136), the assistant system 140 may process it using an audio speech recognition (ASR) module 210 to convert the user input into text); 
 	determining a first intent based on a text string associated with the first text segment, wherein the text string represents a portion of the user utterance (Koukoumidis et al. col. 27 lines 45-62 if the user inputs “What's the . . . ”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What's the way . . . ”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determine now that the user is likely intending to ask “What's the way to [location in calendar appointment]?” Similarly, if the user input continues as “What's the way to make . . . ”, the assistant system 140 may re-calculate its confidence scores again and determine that the user is likely intending to ask “What's the way to make corned beef?” At each instance, the speculative query may be executed in advance of the user completing his input, and the response may be cached and then discarded as the user input gets longer and the assistant system 140 determines that a particular speculative query is no longer correct (e.g., the confidence score for a particular speculative query drops below a threshold));
 	determining that a confidence score associated with the first intent is less than a threshold value (Koukoumidis et al. col. 27 lines 45-62 if the user inputs “What's the . . . ”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What's the way . . . ”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determine now that the user is likely intending to ask “What's the way to [location in calendar appointment]?” Similarly, if the user input continues as “What's the way to make . . . ”, the assistant system 140 may re-calculate its confidence scores again and determine that the user is likely intending to ask “What's the way to make corned beef?” At each instance, the speculative query may be executed in advance of the user completing his input, and the response may be cached and then discarded as the user input gets longer and the assistant system 140 determines that a particular speculative query is no longer correct (e.g., the confidence score for a particular speculative query drops below a threshold)); and 
	Koukoumidis et al. determine the user’s intent while the user is speaking, calculate a confidence score for the determined intent, and generate a response to the user before the user completes the utterance. Koukoumidis et al. also teach a conversation filler for interacting with the user. Koukoumidis et al. fail to explicitly teach that the response indicates that audio associated with the user utterance is being processed and is based on a non-intent specific response library.
	However, Kwon et al. teach 
 	in response, generating a first response that is unrelated to the first intent prior to when the user utterance completes, wherein the first response indicates that audio associated with the user utterance is being processed and is based on a non-intent specific response library (Kwon et al. [0015] displaying a user interface (UI) indicating activation of the voice recognition mode, claim 5: wherein based on a subsequent user voice being input to the microphone of the electronic device, control the display to display a GUI indicating that the subsequent user voice is being processed in the first area of the display while the content is displayed on the second area of the display.)
 	Koukoumidis et al. and Kwon et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of generating the response based on the predicted intent before the user completes the utterance, as taught by Koukoumidis et al., using the teaching of displaying the voice recognition mode, as taught by Kwon et al., for the benefit of indicating the voice recognition mode to the user (Kwon et al. [0015] displaying a user interface (UI) indicating activation of the voice recognition mode, claim 5: wherein based on a subsequent user voice being input to the microphone of the electronic device, control the display to display a GUI indicating that the subsequent user voice is being processed in the first area of the display while the content is displayed on the second area of the display.)

	With respect to Claim 2, Koukoumidis et al. in view of Kwon et al. teach 
 	further comprising determining that a second confidence score associated with a second intent is greater than the threshold value, and generating a second response by performing one or more operations based on an intent specific response library to generate a response that is related to the second intent (Koukoumidis et al. col. 3 lines 12-18 the assistant system may calculate a confidence score for each speculative query based on the predictive model. The confidence score represents a likelihood that the predicted complete request corresponding to the respective speculative query will match an intended complete request associated with the user input after the further input is provided, col. 25 lines 40-44 the assistant system 140 may rank the one or more speculative queries. In particular embodiments, the assistant system 140 may rank the one or more speculative queries based on their respective confidence scores, col. 26 lines 37-41 the assistant system 140 may adjust a threshold rank for executing speculative queries based on the current processing load and execute speculative queries that have a rank greater than or equal to the adjusted threshold rank.)
 
 	With respect to Claim 4, Koukoumidis et al. in view of Kwon et al. teach
 	further comprising:
 	acquiring a second audio speech segment associated with the user utterance (Koukoumidis et al. col. 27 lines 41-52 the process of determining and executing speculative queries may happen in an iterative manner, such that as the user’s input gets longer, new speculative queries may be determined and executed. As an example and not by way of limitation, if the user inputs “What’s the...”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What’s the way...”); 
 	converting the second audio speech segment into a second text segment (Koukoumidis et al. col. 10 lines 53-57 If the user input is based on an audio modality (e.g., the user may speak to the assistant application 136 or send a video including speech to the assistant application 136), the assistant system 140 may process it using an audio speech recognition (ASR) module 210 to convert the user input into text); 
 	concatenating the second text segment to the text string to generate a concatenated text string (Koukoumidis et al. col. 27 lines 41-56 the process of determining and executing speculative queries may happen in an iterative manner, such that as the user’s input gets longer, new speculative queries may be determined and executed. As an example and not by way of limitation, if the user inputs “What’s the...”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What’s the way...”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determine now that the user is likely intending to ask “What’s the way to [location in calendar appointment]?” Similarly, if the user input continues as “What’s the way to make...”, the assistant system 140 may re-calculate its confidence scores again and determine that the user is likely intending to ask “What’s the way to make corned beef?”);
 	determining a second intent based on the concatenated text string that is different than the first intent (Koukoumidis et al. col. 27 lines 41-56 the process of determining and executing speculative queries may happen in an iterative manner, such that as the user’s input gets longer, new speculative queries may be determined and executed. As an example and not by way of limitation, if the user inputs “What’s the...”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What’s the way...”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determine now that the user is likely intending to ask “What’s the way to [location in calendar appointment]?” Similarly, if the user input continues as “What’s the way to make...”, the assistant system 140 may re-calculate its confidence scores again and determine that the user is likely intending to ask “What’s the way to make corned beef?”); and 
 	generating the second response based on the second intent prior to the end of the user utterance (Koukoumidis et al. col. 27 lines 56-62 At each instance, the speculative query may be executed in advance of the user completing his input, and the response may be cached and then discarded as the user input gets longer and the assistant system 140 determines that a particular speculative query is no longer correct (e.g., the confidence score for a particular speculative query drops below a threshold)).
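The iterative process described in the cited passage of Koukoumidis et al. (re-scoring speculative queries each time a new text segment is concatenated, and discarding cached responses whose confidence score drops below a threshold) can be sketched as follows. This is a toy model under stated assumptions, not the reference’s implementation: the prefix-to-score table `SCORES`, the threshold value, and the function name `speculate` are hypothetical, and the table stands in for the reference’s predictive model.

```python
# Hypothetical confidence scores for candidate complete queries, keyed by the
# partial input seen so far (a stand-in for a predictive model).
SCORES = {
    "What's the": {"What's the weather?": 0.6},
    "What's the way": {
        "What's the way to the office?": 0.7,
        "What's the weather?": 0.1,
    },
    "What's the way to make": {
        "What's the way to make corned beef?": 0.8,
        "What's the way to the office?": 0.2,
    },
}
THRESHOLD = 0.5  # hypothetical value


def speculate(segments: list[str]) -> dict[str, str]:
    """Concatenate text segments one at a time, caching speculative responses."""
    text = ""
    cache: dict[str, str] = {}
    for segment in segments:
        # Concatenate the new segment to the running text string.
        text = f"{text} {segment}".strip()
        scores = SCORES.get(text, {})
        # Discard cached responses whose query no longer clears the threshold.
        for query in list(cache):
            if scores.get(query, 0.0) < THRESHOLD:
                del cache[query]
        # Speculatively "execute" queries that clear the threshold.
        for query, score in scores.items():
            if score >= THRESHOLD and query not in cache:
                cache[query] = f"result for {query!r}"
    return cache


print(list(speculate(["What's the", "way", "to make"])))
# ["What's the way to make corned beef?"]
```

After the first segment only the weather query is cached; each later segment both discards the stale query and caches the newly likely one, mirroring the “What’s the... / What’s the way... / What’s the way to make...” progression quoted above.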

 	With respect to Claim 5, Koukoumidis et al. in view of Kwon et al. teach
 	further comprising: 
 	applying text prediction to the text string to determine a second text segment that is likely to follow the first text segment (Koukoumidis et al. col. 24 lines 37-42 From the initial portion of the user input, the predictive model may generate one or more speculative queries and assign a confidence score related to the likelihood that the assistant system 140 determines the user input is associated with the speculative query (e.g., user input matches the speculative query)); and 
 	prior to determining the first intent, concatenating the second text segment to the text string (Koukoumidis et al. col. 24 lines 49-67 if a user input starts with “What’s the...” the assistant system 140 may generate the speculative queries “What’s the weather [in the user’s location]?” and “What’s the traffic like today [in the user’s location]?” However, given spatial signals, if the assistant system 140 determines that the user is already at work when the initial portion of the user is received, then the assistant system 140 may assign a higher confidence score to the speculative query, “What’s the weather [in the user’s location]?” because of the likelihood the user may want to know the weather as opposed to the traffic conditions. In particular embodiments, the predictive model may be trained to analyze the broader context of the user inputting the user input, such as the weather at the user’s location. Therefore, as an example and not by way of limitation, if the assistant system has detected the user is in a location subject to an oncoming hurricane, for the initial portion of the user input “What’s the ...” the assistant system may generate the speculative queries, “What’s the deadline to evacuate?” “What’s the best way to prepare for a hurricane?” and the like and assign a higher confidence score to these speculative queries.)
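The arrangement recited in claim 5 (predicting a text segment likely to follow the partial text and concatenating it before the intent is determined) can be illustrated with a toy lookup-based predictor. The prediction table `NEXT_PHRASE` and the function names are hypothetical; unlike the context-aware model described by Koukoumidis et al., this sketch ignores spatial and situational context and is not asserted to reflect either the application or the reference.

```python
# Hypothetical next-phrase predictions keyed by the last word of the partial
# text (a stand-in for a context-aware predictive model).
NEXT_PHRASE = {"the": "weather", "weather": "today"}


def predict_next(text_string: str) -> str:
    """Return a text segment likely to follow the text string, or '' if none."""
    words = text_string.split()
    return NEXT_PHRASE.get(words[-1], "") if words else ""


def extend_before_intent(text_string: str) -> str:
    """Concatenate the predicted segment to the text string before intent detection."""
    predicted = predict_next(text_string)
    return f"{text_string} {predicted}".strip()


print(extend_before_intent("What's the"))  # What's the weather
```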

 	With respect to Claim 11, Koukoumidis et al. disclose 
 	One or more non-transitory computer-readable media storing program instructions that, when executed by one or more processors, cause the one or more processors (Koukoumidis et al. col. 49 lines 9-22, col. 47 lines 47-66)  to perform the steps of: 
 	acquiring a first audio speech segment associated with a user utterance (Koukoumidis et al. col. 10 lines 53-57 If the user input is based on an audio modality (e.g., the user may speak to the assistant application 136 or send a video including speech to the assistant application 136), the assistant system 140 may process it using an audio speech recognition (ASR) module 210 to convert the user input into text);  
 	converting the first audio speech segment into a first text segment (Koukoumidis et al. col. 10 lines 53-57 If the user input is based on an audio modality (e.g., the user may speak to the assistant application 136 or send a video including speech to the assistant application 136), the assistant system 140 may process it using an audio speech recognition (ASR) module 210 to convert the user input into text); 
 	concatenating the first text segment to a text string that represents a portion of the user utterance (Koukoumidis et al. col. 27 lines 45-56 if the user inputs “What's the . . . ”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What's the way . . . ”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determine now that the user is likely intending to ask “What's the way to [location in calendar appointment]?” Similarly, if the user input continues as “What's the way to make . . . ”, the assistant system 140 may re-calculate its confidence scores again and determine that the user is likely intending to ask “What's the way to make corned beef?”);
 	determining a first intent based on the text string (Koukoumidis et al. col. 27 lines 45-56 if the user inputs “What's the . . . ”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What's the way . . . ”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determine now that the user is likely intending to ask “What's the way to [location in calendar appointment]?” Similarly, if the user input continues as “What's the way to make . . . ”, the assistant system 140 may re-calculate its confidence scores again and determine that the user is likely intending to ask “What's the way to make corned beef?”); 
 	determining that a confidence score associated with the first intent is less than a threshold value (Koukoumidis et al. col. 27 lines 45-62 if the user inputs “What's the . . . ”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What's the way . . . ”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determine now that the user is likely intending to ask “What's the way to [location in calendar appointment]?” Similarly, if the user input continues as “What's the way to make . . . ”, the assistant system 140 may re-calculate its confidence scores again and determine that the user is likely intending to ask “What's the way to make corned beef?” At each instance, the speculative query may be executed in advance of the user completing his input, and the response may be cached and then discarded as the user input gets longer and the assistant system 140 determines that a particular speculative query is no longer correct (e.g., the confidence score for a particular speculative query drops below a threshold)); and 
 	Koukoumidis et al. determine the user’s intent while the user is speaking, calculate a confidence score for the determined intent, and generate a response to the user before the user completes the utterance. Koukoumidis et al. also teach a conversation filler for interacting with the user. Koukoumidis et al. fail to explicitly teach that the response indicates that audio associated with the user utterance is being processed and is based on a non-intent specific response library.
	However, Kwon et al. teach 
	in response, generating a first response that is unrelated to the first intent prior to when the user utterance completes, wherein the first response indicates that audio associated with the user utterance is being processed and is based on a non-intent specific response library (Kwon et al. [0015] displaying a user interface (UI) indicating activation of the voice recognition mode, claim 5: wherein based on a subsequent user voice being input to the microphone of the electronic device, control the display to display a GUI indicating that the subsequent user voice is being processed in the first area of the display while the content is displayed on the second area of the display.)
 	Koukoumidis et al. and Kwon et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of generating the response based on the predicted intent before the user completes the utterance, as taught by Koukoumidis et al., using the teaching of displaying the voice recognition mode, as taught by Kwon et al., for the benefit of indicating the voice recognition mode to the user (Kwon et al. [0015] displaying a user interface (UI) indicating activation of the voice recognition mode, claim 5: wherein based on a subsequent user voice being input to the microphone of the electronic device, control the display to display a GUI indicating that the subsequent user voice is being processed in the first area of the display while the content is displayed on the second area of the display.)

 	With respect to Claim 12, Koukoumidis et al. in view of Kwon et al. teach 
 	further comprising determining that a second confidence score associated with a second intent is greater than the threshold value, and generating a second response by performing one or more operations based on an intent specific response library to generate a response that is related to the second intent (Koukoumidis et al. col. 3 lines 12-18 the assistant system may calculate a confidence score for each speculative query based on the predictive model. The confidence score represents a likelihood that the predicted complete request corresponding to the respective speculative query will match an intended complete request associated with the user input after the further input is provided, col. 25 lines 40-44 the assistant system 140 may rank the one or more speculative queries. In particular embodiments, the assistant system 140 may rank the one or more speculative queries based on their respective confidence scores, col. 26 lines 37-41 the assistant system 140 may adjust a threshold rank for executing speculative queries based on the current processing load and execute speculative queries that have a rank greater than or equal to the adjusted threshold rank.)

 	With respect to Claim 14, Koukoumidis et al. in view of Kwon et al. teach 
 	further comprising: 
 	acquiring a second audio speech segment associated with the user utterance (Koukoumidis et al. col. 27 lines 41-52 the process of determining and executing speculative queries may happen in an iterative manner, such that as the user’s input gets longer, new speculative queries may be determined and executed. As an example and not by way of limitation, if the user inputs “What’s the...”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What’s the way...”); 
 	converting the second audio speech segment into a second text segment (Koukoumidis et al. col. 10 lines 53-57 If the user input is based on an audio modality (e.g., the user may speak to the assistant application 136 or send a video including speech to the assistant application 136), the assistant system 140 may process it using an audio speech recognition (ASR) module 210 to convert the user input into text); 
 	concatenating the second text segment to the text string to generate a concatenated text string (Koukoumidis et al. col. 27 lines 41-56 the process of determining and executing speculative queries  may happen in an iterative manner, such that as the user’s input gets longer, new speculative queries may be determined and executed. As an example and not by way of limitation, if the user input “What’s the...” the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What’s the way...”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determines now that the user is likely intending to ask “What’s the way to [location in calendar appointment]? Similarity, if the user input continues as “What’s the way to make...” the assistant system 140 may recalculate its confidence scores again and determine that the user is likely intending to ask “What’s the way to make corned beef?”;
 	determining a second intent based on the concatenated text string that is different than the first intent (Koukoumidis et al. col. 27 lines 41-56 the process of determining and executing speculative queries  may happen in an iterative manner, such that as the user’s input gets longer, new speculative queries may be determined and executed. As an example and not by way of limitation, if the user input “What’s the...” the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What’s the way...”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determines now that the user is likely intending to ask “What’s the way to [location in calendar appointment]? Similarity, if the user input continues as “What’s the way to make...” the assistant system 140 may recalculate its confidence scores again and determine that the user is likely intending to ask “What’s the way to make corned beef?”); and  
 	generating a second response based on the second intent prior to the end of [[when]] the user utterance (Koukoumidis et al. col. 27 lines 56-62 At each instance, the speculative query may be executed in advance of the user completing his input, and the response may be cached and then discarded as the user input gets longer and the assistant system 140 determines that a particular speculative query is no longer correct (e.g., the confidence score for a particular speculative query drops below a threshold).

 	With respect to Claim 17, Koukoumidis et al. in view of Kwon et al. teach 
 	further comprising: 
 	applying text prediction to the text string to determine a second text segment that is likely to follow the first text segment (Koukoumidis et al. col. 24 lines 37-42 From the initial portion of the user input, the predictive model may generate one or more speculative queries and assign a confidence score related to the likelihood that the assistant system 140 determines the user input is associated with the speculative query (e.g., user input matches the speculative query)); and 
 	prior to determining the first intent, concatenating the second text segment to the text string (Koukoumidis et al. col. 24 lines 49-67 if a user input starts with “What’s the...” the assistant system 140 may generate the speculative queries “What’s the weather [in the user’s location]?” and “What’s the traffic like today [in the user’s location]?” However, given spatial signals, if the assistant system 140 determines that the user is already at work when the initial portion of the user input is received, then the assistant system 140 may assign a higher confidence score to the speculative query, “What’s the weather [in the user’s location]?” because of the likelihood the user may want to know the weather as opposed to the traffic conditions. In particular embodiments, the predictive model may be trained to analyze the broader context of the user inputting the user input, such as the weather at the user’s location. Therefore, as an example and not by way of limitation, if the assistant system has detected the user is in a location subject to an oncoming hurricane, for the initial portion of the user input “What’s the ...” the assistant system may generate the speculative queries, “What’s the deadline to evacuate?” “What’s the best way to prepare for a hurricane?” and the like and assign a higher confidence score to these speculative queries.)
 	
 	With respect to Claim 20, Koukoumidis et al. disclose
 	A system, comprising: 
 	one or more memories that include instructions, and one or more processors that are coupled to the one or more memories and, when executing the instructions (Koukoumidis et al. col. 49 lines 9-22, col. 47 lines 47-66), are configured to:
 	acquire an audio speech segment associated with a user utterance (Koukoumidis et al. col. 10 lines 53-57 If the user input is based on an audio modality (e.g., the user may speak to the assistant application 136 or send a video including speech to the assistant application 136), the assistant system 140 may process it using an audio speech recognition (ASR) module 210 to convert the user input into text);   
 	convert the audio speech segment into a text segment (Koukoumidis et al. col. 10 lines 53-57 If the user input is based on an audio modality (e.g., the user may speak to the assistant application 136 or send a video including speech to the assistant application 136), the assistant system 140 may process it using an audio speech recognition (ASR) module 210 to convert the user input into text); 
 	determine an intent based on a text string associated with the text segment, wherein the text string represents a portion of the user utterance (Koukoumidis et al. col. 27 lines 45-62 if the user inputs “What's the . . . ”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What's the way . . . ”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determine now that the user is likely intending to ask “What's the way to [location in calendar appointment]?” Similarly, if the user input continues as “What's the way to make . . . ”, the assistant system 140 may re-calculate its confidence scores again and determine that the user is likely intending to ask “What's the way to make corned beef?” At each instance, the speculative query may be executed in advance of the user completing his input, and the response may be cached and then discarded as the user input gets longer and the assistant system 140 determines that a particular speculative query is no longer correct (e.g., the confidence score for a particular speculative query drops below a threshold));
 	determine that a confidence score associated with the first intent is less than a threshold value (Koukoumidis et al. col. 27 lines 45-62 if the user inputs “What's the . . . ”, the assistant system 140 may speculatively execute a weather query as described above. But if the user input continues as “What's the way . . . ”, the assistant system 140 may re-calculate its confidence scores for the possible speculative queries and determine now that the user is likely intending to ask “What's the way to [location in calendar appointment]?” Similarly, if the user input continues as “What's the way to make . . . ”, the assistant system 140 may re-calculate its confidence scores again and determine that the user is likely intending to ask “What's the way to make corned beef?” At each instance, the speculative query may be executed in advance of the user completing his input, and the response may be cached and then discarded as the user input gets longer and the assistant system 140 determines that a particular speculative query is no longer correct (e.g., the confidence score for a particular speculative query drops below a threshold)); and 
 	Koukoumidis et al. teach determining the user’s intent while the user is speaking, calculating a confidence score for the determined intent, and generating a response to the user before the user completes the utterance. Koukoumidis et al. also teach a conversation filler to interact with the user. Koukoumidis et al. fail to explicitly teach that the response indicates that audio associated with the user utterance is being processed and is based on a non-intent specific response library.
	However, Kwon et al. teach 
 	in response, generate a first response that is unrelated to the first intent prior to when the user utterance completes, wherein the first response indicates that audio associated with the user utterance is being processed and is based on a non-intent specific response library (Kwon et al. [0015] displaying a user interface (UI) indicating activation of the voice recognition mode, claim 5: wherein based on a subsequent user voice being input to the microphone of the electronic device, control the display to display a GUI indicating that the subsequent user voice is being processed in the first area of the display while the content is displayed on the second area of the display.)
 	Koukoumidis et al. and Kwon et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of generating a response based on the predicted intent before the user completes the utterance, as taught by Koukoumidis et al., using the teaching of displaying the voice recognition mode, as taught by Kwon et al., for the benefit of indicating the voice recognition mode to the user (Kwon et al. [0015] displaying a user interface (UI) indicating activation of the voice recognition mode, claim 5: wherein based on a subsequent user voice being input to the microphone of the electronic device, control the display to display a GUI indicating that the subsequent user voice is being processed in the first area of the display while the content is displayed on the second area of the display.)

11.  	 Claims 6-7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Koukoumidis et al. (US 11,086,858 B1) in view of Kwon et al. (US 2019/0244619 A1) and Penilla et al. (US 2016/0104486 A1).

 	With respect to Claim 6, Koukoumidis et al. in view of Kwon et al. teach all the limitations of Claim 1 upon which Claim 6 depends. Koukoumidis et al. in view of Kwon et al. fail to explicitly teach 
 	further comprising: 
 	determining a personality attribute weighting of an artificial intelligence avatar associated with the first response; and 
 	modifying the first response based on the personality attribute weighting. 
	However, Penilla et al. teach 
 	further comprising: 
 	determining a personality attribute weighting of an artificial intelligence avatar associated with the first response (Penilla et al. [0025] the voice profile identifies a type of vehicle response that is customized for the user, based on the identified tone in the voice input by the user); and 
 	modifying the first response based on the personality attribute weighting (Penilla et al. [0298] the vehicle response can be adjusted to cater to the tone of the user, e.g., so as to provide, augment, modify, moderate, and/or change the vehicle response to detected tone in the user’s voice.)
 	Koukoumidis et al., Kwon et al. and Penilla et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of generating a response based on the predicted intent before the user completes the utterance, as taught by Koukoumidis et al., using the teaching of displaying the voice recognition mode, as taught by Kwon et al., for the benefit of indicating the voice recognition mode to the user, and further using the teaching of modifying the response, as taught by Penilla et al., for the benefit of modifying the response based on the tone detected in the user’s voice (Penilla et al. [0298] the vehicle response can be adjusted to cater to the tone of the user, e.g., so as to provide, augment, modify, moderate, and/or change the vehicle response to detected tone in the user’s voice.)
	
 	With respect to Claim 7, Koukoumidis et al. in view of Kwon et al. and Penilla et al. teach 
 	wherein the personality attribute weighting includes at least one of an excitability weighting, a curiosity weighting, and an interruptability weighting (Penilla et al. [0013] the mood of the user includes one or more of a normal mood, a frustrated mood, an agitated mood, an upset mood, a hurried mood, an urgency mood, a rushed mood, a stressed mood, a calm mood, a passive mood, a sleepy mood, a happy mood, an excited mood, or combinations of two or more thereof.)

 	With respect to Claim 18, Koukoumidis et al. in view of Kwon et al. teach all the limitations of Claim 11 upon which Claim 18 depends. Koukoumidis et al. in view of Kwon et al. fail to explicitly teach 
 	further comprising: 
 	determining a personality attribute weighting of an artificial intelligence avatar associated with the first response; and 
 	modifying the first response based on the personality attribute weighting. 
	However, Penilla et al. teach 
 	further comprising: 
 	determining a personality attribute weighting of an artificial intelligence avatar associated with the first response (Penilla et al. [0025] the voice profile identifies a type of vehicle response that is customized for the user, based on the identified tone in the voice input by the user); and 
 	modifying the first response based on the personality attribute weighting (Penilla et al. [0298] the vehicle response can be adjusted to cater to the tone of the user, e.g., so as to provide, augment, modify, moderate, and/or change the vehicle response to detected tone in the user’s voice.)
 	Koukoumidis et al., Kwon et al. and Penilla et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of generating a response based on the predicted intent before the user completes the utterance, as taught by Koukoumidis et al., using the teaching of displaying the voice recognition mode, as taught by Kwon et al., for the benefit of indicating the voice recognition mode to the user, and further using the teaching of modifying the response, as taught by Penilla et al., for the benefit of modifying the response based on the tone detected in the user’s voice (Penilla et al. [0298] the vehicle response can be adjusted to cater to the tone of the user, e.g., so as to provide, augment, modify, moderate, and/or change the vehicle response to detected tone in the user’s voice.)

12.  	 Claims 8-9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Koukoumidis et al. (US 11,086,858 B1) in view of Kwon et al. (US 2019/0244619 A1) and Arora et al. (US 2019/0318219 A1).

 	With respect to Claim 8, Koukoumidis et al. in view of Kwon et al. teach all the limitations of Claim 1 upon which Claim 8 depends. Koukoumidis et al. in view of Kwon et al. fail to explicitly teach 
 	further comprising: 
 	determining an intonation cue associated with the first audio speech segment; and
 	modifying the first response based on the intonation cue.  
	However, Arora et al. teach 
 	determining an intonation cue associated with the first audio speech segment (Arora et al. [0026] the customized response program may detect changes in the speaking pattern and tone of voice of the user and, based on an analysis of the user’s persona, the customized response program may appropriately match the user’s tone of voice, emotion, speaking pattern and language); and
 	modifying the first response based on the intonation cue (Arora et al. [0026] the customized response program may detect changes in the speaking pattern and tone of voice of the user and, based on an analysis of the user’s persona, the customized response program may appropriately match the user’s tone of voice, emotion, speaking pattern and language) 
Koukoumidis et al., Kwon et al. and Arora et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of generating a response based on the predicted intent before the user completes the utterance, as taught by Koukoumidis et al., using the teaching of displaying the voice recognition mode, as taught by Kwon et al., for the benefit of indicating the voice recognition mode to the user, and further using the teaching of customizing the response, as taught by Arora et al., for the benefit of customizing the response based on changes in the user’s tone of voice (Arora et al. [0026] the customized response program may detect changes in the speaking pattern and tone of voice of the user and, based on an analysis of the user’s persona, the customized response program may appropriately match the user’s tone of voice, emotion, speaking pattern and language.)

 	With respect to Claim 9, Koukoumidis et al. in view of Kwon et al. and Arora et al. teach 
 	wherein the intonation cue includes at least one of a rising intonation, a trailing intonation, and a declarative intonation (Arora et al. [0026] the customized response program may detect changes in the speaking pattern and tone of voice of the user and, based on an analysis of the user’s persona, the customized response program may appropriately match the user’s tone of voice, emotion, speaking pattern and language.)

 	With respect to Claim 19, Koukoumidis et al. in view of Kwon et al. teach all the limitations of Claim 11 upon which Claim 19 depends. Koukoumidis et al. in view of Kwon et al. fail to explicitly teach 
 	further comprising: 
 	determining an intonation cue associated with the first audio speech segment; and
 	modifying the first response based on the intonation cue.  
	However, Arora et al. teach 
 	determining an intonation cue associated with the first audio speech segment (Arora et al. [0026] the customized response program may detect changes in the speaking pattern and tone of voice of the user and, based on an analysis of the user’s persona, the customized response program may appropriately match the user’s tone of voice, emotion, speaking pattern and language); and
 	modifying the first response based on the intonation cue (Arora et al. [0026] the customized response program may detect changes in the speaking pattern and tone of voice of the user and, based on an analysis of the user’s persona, the customized response program may appropriately match the user’s tone of voice, emotion, speaking pattern and language) 
Koukoumidis et al., Kwon et al. and Arora et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of generating a response based on the predicted intent before the user completes the utterance, as taught by Koukoumidis et al., using the teaching of displaying the voice recognition mode, as taught by Kwon et al., for the benefit of indicating the voice recognition mode to the user, and further using the teaching of customizing the response, as taught by Arora et al., for the benefit of customizing the response based on changes in the user’s tone of voice (Arora et al. [0026] the customized response program may detect changes in the speaking pattern and tone of voice of the user and, based on an analysis of the user’s persona, the customized response program may appropriately match the user’s tone of voice, emotion, speaking pattern and language.)

13.  	 Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Koukoumidis et al. (US 11,086,858 B1) in view of Kwon et al. (US 2019/0244619 A1) and Wang et al. (US 2018/0357286 A1).

 	With respect to Claim 10, Koukoumidis et al. in view of Kwon et al. teach all the limitations of Claim 1 upon which Claim 10 depends. Koukoumidis et al. in view of Kwon et al. fail to explicitly teach 
 	further comprising: 
 	analyzing a video feed associated with the user utterance; 
 	determining a second intent based on the video feed; and 
 	modifying the first response based on the second intent.  
	However, Wang et al. teach 
 	further comprising: 
 	analyzing a video feed associated with the user utterance (Wang et al. [0030] the labeled emotion database 118 includes labeled user data where the user data is associated with one or more potential emotions. Potential emotions include, but are not limited to, "happy," "sad," "agitated," "angry," "upset," "joyful," "tearful," "depressed," "despair," and other such emotions or combinations of emotions. As known to one of ordinary skill in the art, the labeled user data, such as the historical GPS locations, the current GPS location, the prosody of a given query and/or command, the words and/or phrases used in a particular query and/or command, an image and/or video, are each associated with one or more of the potential emotions);
 	determining a second intent based on the video feed (Wang et al. [0013] the disclosed emotional chatbot continuously tracks a given user's emotions using a multimodal emotion detection approach that includes determining the user's emotions from a variety of signals including, but not limited to, biometric data, voice data (e.g., the prosody of the user's voice), content/text, one or more camera images, one or more facial expressions, and other such sources of contextual data); and 
 	modifying the first response based on the second intent (Wang et al. [0017] The conversational chatbot may then further modify response to a user’s query and/or command based on assigned particular emotional state.)
 	Koukoumidis et al., Kwon et al. and Wang et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of generating a response based on the predicted intent before the user completes the utterance, as taught by Koukoumidis et al., using the teaching of displaying the voice recognition mode, as taught by Kwon et al., for the benefit of indicating the voice recognition mode to the user, and further using the teaching of modifying the response, as taught by Wang et al., for the benefit of modifying the response based on the user’s assigned emotional state (Wang et al. [0017] The conversational chatbot may then further modify response to a user’s query and/or command based on assigned particular emotional state.)

14.  	 Claims 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Koukoumidis et al. (US 11,086,858 B1) in view of Kwon et al. (US 2019/0244619 A1) and Costa (US 9,361,084 B1).

 	With respect to Claim 15, Koukoumidis et al. in view of Kwon et al. teach all the limitations of Claim 14 upon which Claim 15 depends. Koukoumidis et al. in view of Kwon et al. fail to explicitly teach 
 	wherein a first duration of time represented by the first audio speech segment overlaps with a second duration of time represented by the second audio speech segment.  
	However, Costa teaches 
 	wherein a first duration of time represented by the first audio speech segment overlaps with a second duration of time represented by the second audio speech segment (Costa col. 6 lines 60-67 The speech recognition module 104 may also be configured to sample and quantize the received input, divide the received input into overlapping or non-overlapping frames of time (e.g., 15 milliseconds), and/or perform spectral analysis on the frames to derive the spectral components of each frame. In addition, the speech recognition module 104 or a similar component may be configured to perform processes relating to noise removal.)
 	Koukoumidis et al., Kwon et al. and Costa are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of generating a response based on the predicted intent before the user completes the utterance, as taught by Koukoumidis et al., using the teaching of displaying the voice recognition mode, as taught by Kwon et al., for the benefit of indicating the voice recognition mode to the user, and further using the teaching of dividing the input into overlapping time frames, as taught by Costa, for the benefit of performing spectral analysis in speech recognition (Costa col. 6 lines 60-67 The speech recognition module 104 may also be configured to sample and quantize the received input, divide the received input into overlapping or non-overlapping frames of time (e.g., 15 milliseconds), and/or perform spectral analysis on the frames to derive the spectral components of each frame. In addition, the speech recognition module 104 or a similar component may be configured to perform processes relating to noise removal.)

 	With respect to Claim 16, Koukoumidis et al. in view of Kwon et al. teach all the limitations of Claim 14 upon which Claim 16 depends. Koukoumidis et al. in view of Kwon et al. fail to explicitly teach 
 	wherein a first duration of time represented by the first audio speech segment is non-overlapping with a second duration of time represented by the second audio speech segment.  
 	However, Costa teaches 
 	wherein a first duration of time represented by the first audio speech segment is non-overlapping with a second duration of time represented by the second audio speech segment (Costa col. 6 lines 60-67 The speech recognition module 104 may also be configured to sample and quantize the received input, divide the received input into overlapping or non-overlapping frames of time (e.g., 15 milliseconds), and/or perform spectral analysis on the frames to derive the spectral components of each frame. In addition, the speech recognition module 104 or a similar component may be configured to perform processes relating to noise removal.)
 	Koukoumidis et al., Kwon et al. and Costa are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of generating a response based on the predicted intent before the user completes the utterance, as taught by Koukoumidis et al., using the teaching of displaying the voice recognition mode, as taught by Kwon et al., for the benefit of indicating the voice recognition mode to the user, and further using the teaching of dividing the input into non-overlapping time frames, as taught by Costa, for the benefit of performing spectral analysis in speech recognition (Costa col. 6 lines 60-67 The speech recognition module 104 may also be configured to sample and quantize the received input, divide the received input into overlapping or non-overlapping frames of time (e.g., 15 milliseconds), and/or perform spectral analysis on the frames to derive the spectral components of each frame. In addition, the speech recognition module 104 or a similar component may be configured to perform processes relating to noise removal.)

Conclusion
15.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a.	Nelson et al. (US 2015/0053781 A1). In this reference, Nelson et al. teach displaying the state of the speech recognizer to the user. 
b.	Lee et al. (US 2020/0074993 A1). In this reference, Lee et al. teach updating the response information while providing the response information, on the basis of an additional word uttered after the at least one word is input.
c. 	Lemons et al. (US 2019/0394547 A1). In this reference, Lemons et al. teach visual indicators that may indicate to a user that the voice activated device is active, listening, not listening, processing, speaking, and/or other actions or states. 

16.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders can be reached on (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/THUYKHANH LE/Primary Examiner, Art Unit 2655