DETAILED ACTION
Response to Amendment
The amendment filed on 03/15/22 has been entered. Claims 1-5, 9-15, 19-20 remain pending in the application. It is acknowledged that claims 7 and 17 have been cancelled. This action includes new grounds of rejection and therefore has not been made final.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-5, 9-11, 13-15, 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tadpatrikar (US 2017/0110116) in view of Gibbs (US 2006/0106769) and further in view of Buchholz (US 2002/0111812).

Regarding claim 1, Tadpatrikar discloses:
A method, comprising: engaging, at a digital assistant of an information handling device, in a conversational session with a user; receiving, during the conversational session, query input at least by ([0017] discloses the user speaking voice queries to a computing device, such as “text mom I love you”, which is transcribed and shown in a natural-language dialogue 124 (conversational session) in Fig. 1; [0040] discloses that the computing device can be a personal digital assistant);
determining, at the information handling device, that the query input corresponds to a partial query input, wherein the determining comprises detecting one or more pause indicators in the query input at least by ([0019] “the computing device 121 may generate, without factoring in any characteristics of the user 127, the general endpoint signal 103 and the complete query signal 106. The complete query signal 106 represents an estimate performed by the computing device 121 that the generated transcription of the utterance 130 represents a complete utterance. The computing device 121 compares the generated transcription to one or more complete utterances that the user 127 and other users have previously spoken. The computing device 121 may compare the generated transcription to the complete utterances after a speech recognizer of computing device 121 has identified a new word. For example, after the user 127 speaks word 133, a speech recognizer of the computing device 121 generates the transcription “text.” The computing device 121 compares “text” to other complete utterances and determines that “text” is not a complete utterance. After the user 127 speaks word 139, the speech recognizer generates the transcription “text mom” that the computing device 121 identifies as complete. A similar determination is made after word 151. After the user 127 speaks word 145, the speech recognizer generates the transcription “text mom love” that the computing device 121 identifies as incomplete” [0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. 
On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user”, Fig. 1 shows the determining of partial query input and pause detection (pause indicators));
wherein the one or more pause indicators in the query input are identified via reference to crowd-sourced data at least by ([0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user” [0026] “FIG. 2 is diagram of an example system 200 that classifies a particular user based on the particular user's experience with speech input. In some implementations, the system 200 may be included in a computing device that the particular user uses for speech input, such as computing device 121. In some implementations, the system may be included in a server that processes transcriptions of speech input” [0028] “Query log 210 illustrates the voice queries provided by the user Bob. The voice queries in query log 210 include three voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].” Each voice query includes a timestamp that notes the date and time that Bob spoke the voice query. Each voice query includes data indicating the pause intervals between the spoken words. For example, “cat videos” may include data to indicate that Bob paused two hundred milliseconds between “cat” and “video.” “Call . . . mom” may include data to indicate that Bob paused one second between “call” and “mom.”” [0029] “Query log 215 illustrates the voice queries provided by the user Alice. 
The voice queries in query log 215 include five voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].”) and the crowd-sourced data are the query logs which include completed queries of the users;
dynamically extending, responsive to the determining, a wait time for processing the query input, wherein the dynamically extending the wait time for processing the query input comprises: identifying an input-provision context associated with the user; and extending the wait time for processing the query input by a predetermined amount of time, wherein the predetermined amount of time is dictated by the input-provision context; … query input… at least by ([0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user” [0022] “The novice pause detector signal 109 illustrates the computing device 121 detecting a pause in audio data corresponding to utterance 124, where the detected pause length is longer than the pause length the corresponds to the general endpointer. For example, the computing device 121 may detect pauses with a length of one second when the user 127 is classified as a novice user. Applying this pause threshold to utterance 124, the computing device 121 will not detect novice length pauses during pauses 136 and 124 because those pauses are of length three hundred milliseconds and eight hundred milliseconds, respectively. The computing device 121 does detect novice length pauses during pauses 148 and 154. 
As shown in novice pause detector signal 109, the computing device 121 detects a pause of one second during pause 148 after the user 127 spoke word 145. The computing device 121 also detects a pause of one second during pause 154 after the user spoke word 151.” [0023] “The computing device 121 determines, based on the novice pause detector signal 109 and the complete query signal 106, a speech endpoint for the utterance 124 when the computing device classifies the user as a novice. When the computing device 121 detects a pause, such as the pause of the novice pause detector signal 109 during pause 148, the computing device 121 determines whether the utterance 124 is complete. During pause 148, the complete query signal 106 indicates that the utterance 124 is not complete. Even though the computing device 121 detected a novice length pause, the utterance 124 is not complete, so the computing device 121 continues processing the audio data of the utterance 124. During pause 154, the computing device 121 detects a novice length pause and the complete query signal 106 indicates that the utterance is complete and, therefore, generates an endpoint of the utterance 124 as indicated by the novice endpoint signal 112. When the user 127 is classified as a novice, the endpoint of the utterance 124 is after word 151, and the transcription 160 of the utterance 124 is “Text Mom love you.”” [0024] “The expert pause detector signal 115 illustrates the computing device 121 detecting a pause in audio data corresponding to utterance 124, where the detected pause length is shorter than the pause length the corresponds to the general endpointer. For example, the computing device 121 may detect pauses with a length of three hundred milliseconds when the user 127 is classified as an expert user. Applying this pause threshold to utterance 124, the computing device 121 detects expert length pauses during pauses 136, 142, 148, and 154. 
Because none of the pauses are less than three hundred milliseconds, all of the pauses in utterance 124 include an expert length pause detection.” [0025] “The computing device 121 combines the expert pause detector signal 115 and the complete query signal 106 to determine a speech endpoint for the utterance 124 when the computing device classifies the user as an expert. When the computing device 121 detects a pause, such as the pause of the expert pause detector signal 115 during pause 136, the computing device 121 determines whether the utterance 124 is complete. During pause 136, the complete query signal 106 indicates that the utterance 124 is not complete. Even though the computing device 121 detected an expert length pause, the utterance 124 is not complete, so the computing device 121 continues processing the audio data of the utterance 124. During pause 142, the computing device 121 detects an expert length pause and the complete query signal 106 indicates that the utterance is complete and, therefore, generates an endpoint of the utterance 124 as indicated by the expert endpoint signal 118. When the user 127 is classified as an expert, the endpoint of the utterance 124 is after word 139, and the transcription 163 of the utterance 124 is “Text Mom.””, Fig. 1 graphically shows the determining of query completion, speech endpointing based on user type, and pause detection between utterances) and an input-provision context is the classification of the user as an expert or novice which affects how the system endpoints the user’s utterances/queries;
wherein the wait time is based upon the query input and wherein the wait time is different for different query inputs at least by ([0028] “Query log 210 illustrates the voice queries provided by the user Bob. The voice queries in query log 210 include three voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].” Each voice query includes a timestamp that notes the date and time that Bob spoke the voice query. Each voice query includes data indicating the pause intervals between the spoken words. For example, “cat videos” may include data to indicate that Bob paused two hundred milliseconds between “cat” and “video.” “Call . . . mom” may include data to indicate that Bob paused one second between “call” and “mom.”” [0029] “Query log 215 illustrates the voice queries provided by the user Alice. The voice queries in query log 215 include five voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].” Each voice query includes a timestamp that notes the date and time that Alice spoke the voice query. Each voice query includes data indicating the pause intervals between the spoken words. For example, “Text Sally that I'll be ten minutes late” may include data to indicate that Alice paused one millisecond between “text” and “Sally,” paused three hundred milliseconds between “Sally” and “that,” and paused 1.5 seconds between “that” and “I'll,” as well as pause intervals between the other words. “Call mom” may include data to indicate that Alice paused three milliseconds between “call” and “mom.””, Fig. 1 also shows different pause detector times for different types of users for each separate utterance) and the pause time between words differs by query; for example, the pause between “Sally” and “that” is three hundred milliseconds, while the pause between “call” and “mom” is three milliseconds (wait time is based upon the query input and is different for different query inputs);
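For illustration, the endpointing behavior that Tadpatrikar describes in [0021]-[0025] can be sketched as follows. This is a minimal sketch and not Tadpatrikar's implementation. The thresholds (one second for a novice, three hundred milliseconds for an expert) and the first two pause lengths come from the quoted passages; the function name find_endpoint, the set COMPLETE_UTTERANCES, and the 1200 ms durations assigned to pauses 148 and 154 are assumptions made only for this example.

```python
# Pause thresholds in milliseconds, from the examples in [0022] and [0024].
PAUSE_THRESHOLD_MS = {"novice": 1000, "expert": 300}

# Hypothetical stand-in for the previously spoken complete utterances that
# the complete query signal 106 is compared against ([0019]).
COMPLETE_UTTERANCES = {"text mom", "text mom love you"}


def find_endpoint(words, pauses_ms, user_class):
    """Return the transcription at the first pause that is both long enough
    for the user class and preceded by a complete utterance, else None."""
    threshold = PAUSE_THRESHOLD_MS[user_class]
    spoken = []
    for word, pause in zip(words, pauses_ms):
        spoken.append(word)
        transcription = " ".join(spoken)
        pause_detected = pause >= threshold
        is_complete = transcription in COMPLETE_UTTERANCES
        if pause_detected and is_complete:
            return transcription
        # Otherwise keep processing audio: the wait is effectively extended.
    return None


WORDS = ["text", "mom", "love", "you"]
# Pauses 136 and 142 are 300 ms and 800 ms per [0022]; the last two values
# are assumed to exceed the one-second novice threshold.
PAUSES_MS = [300, 800, 1200, 1200]

novice_result = find_endpoint(WORDS, PAUSES_MS, "novice")
expert_result = find_endpoint(WORDS, PAUSES_MS, "expert")
```

Under these assumptions the novice classification endpoints after “you” and the expert classification endpoints after “mom”, which matches transcriptions 160 and 163 as quoted above.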
Tadpatrikar fails to disclose “wherein the determining comprises accessing crowd-sourced query input structures comprising the crowd-sourced data associated with completed queries related to a topic of the query input and comparing the query input to the crowd-sourced query input structures, wherein the comparing the query input to the crowd-sourced query input structures comprises identifying a feature of the user and comparing the query input to crowd-sourced input structures corresponding to other users having the feature; wherein the dynamically extending the wait time for processing the query input comprises extending the wait time for processing the query input by a predetermined amount depending on a topic of the query input; providing, using at least one output device associated with the information handling device, a notification that informs the user of the extending and that requests the user to provide additional query input; and executing, responsive to determining that the additional query input is received, a function corresponding to a completed query, wherein the completed query comprises the partial query input and the additional query input”
However, Gibbs teaches the following limitations, wherein the determining comprises accessing crowd-sourced query input structures comprising the crowd-sourced data associated with completed queries related to a topic of the query input and comparing the query input to the crowd-sourced query input structures, wherein the comparing the query input to the crowd-sourced query input structures comprises identifying a feature of the user and comparing the query input to crowd-sourced input structures corresponding to other users having the feature at least by ([0047] “FIG. 5 shows a set of data structures associated with historical queries (i.e., queries previously submitted) used for predicting queries corresponding to partially entered queries” [0048] “Referring to FIG. 5, a historical query log 502 is filtered by one or more filters 504 to create an authorized historical queries list 506. An ordered set builder 508 creates one or more fingerprint-to-table maps 510 from the authorized historical queries list 506 based on certain criteria. When the partial query is transmitted (FIG. 3, 308), it is received at the search engine 208 as partial query 513” [0049] “partial search queries received from a particular website might be mapped to predicted results using a set of fingerprint-to-table maps that were generated from historical queries received from the same website, or from a group of websites deemed to be similar to the particular website. Similarly, an individual user may, with his/her permission, have a user profile that specifies information about the user or about a group associated with the user, and that “personalization information” may be used to identify a respective set of fingerprint-to-table maps for use when predicting results for that user.” [0051] “The historical query log 502 contains a log of previously submitted queries received by the search engine 208 over a period of time. In some embodiments, the queries are from a particular user. 
In some embodiments, the queries are from a community of users sharing at least one similar characteristic such as belonging to the same workgroup, using the same language, having an internet address associated with the same country or geographic region, or the like. The selection of the community determines the pool of previously submitted queries from which the predictions are drawn. Different communities would tend to produce different sets of predictions” [0053] “other types of meta-data are associated and stored with the query such as the query language or other information which might be provided by the user or search assistant in accordance with user preferences (e.g., identification or profile information indicating certain preferences of the user). In some embodiments, the meta-information includes category or concept information gleaned from analyzing the terms in the query.” [0058] “Similarly, different fingerprint-to-table maps 510 could be created for geographical regions. As another example, different fingerprint-to-table maps 510 could be created from queries from particular IP addresses or groups of addresses, such as those from a particular network or a particular group of individuals (e.g., a corporation). Using the meta-information to create different fingerprint-to-table maps 510, allows the predictions to be based on users having characteristics similar to that of the searcher and which should increase the likelihood of a correct prediction”);
providing, using at least one output device associated with the information handling device, a notification that informs the user of the extending and that requests the user to provide additional query input at least by ([0039] “Regardless of the way in which the partial input is identified, it is transmitted to the search engine 208 (308) for processing. In response to the partial search query, the search engine 208 returns a set of ordered predicted search queries and/or URLs (310) which is presented to the user (312) ordered in accordance with a ranking criteria. The predictions may be displayed to the user in a number of ways. For example, the predictions could be displayed in a drop-down window, a persistent, or non-persistent window or other ways. In some embodiments, queries which the user had previously submitted could be visually indicated to the user (e.g., by highlighting the user's own previously entered queries” [0041] “For example, the predicted search queries and/or URLs might be presented in a drop down menu. Regardless of the manner in which the predicted queries and/or URLs are presented to the user, the user may select one of the queries and/or URLs if the user determines that one of the predictions matches the intended entry. In some instances, the predictions may provide the user with additional information which had not been considered. For example, a user may have one query in mind as part of a search strategy, but seeing the predicted results causes the user to alter the input strategy. Once the set is presented (312), the user's input is again monitored. 
If the user selects one of the predictions (302-final), the request is transmitted either to the search engine 208 as a search request or to a resource host as a URL request (304), as applicable”) and the system predicts a set of completed search queries based on entered partial search queries and displays them to the user for additional input which are selected by a user or the user alters the original search query;
and executing, responsive to determining that the additional query input is received, a function corresponding to a completed query, wherein the completed query comprises the partial query input and the additional query input at least by ([0033] “When a final input or selection (302—final input) is identified as a search query, the input is transmitted to the search engine 208 (304) for processing. The search engine 208 returns a set of search results, which is received by the search assistant 204 (306) or by a client application, such as a browser application. The list of search results is presented to the user such that the user may select one of the documents for further examination (e.g., visually or aurally)” [0041] “For example, the predicted search queries and/or URLs might be presented in a drop down menu. Regardless of the manner in which the predicted queries and/or URLs are presented to the user, the user may select one of the queries and/or URLs if the user determines that one of the predictions matches the intended entry. In some instances, the predictions may provide the user with additional information which had not been considered. For example, a user may have one query in mind as part of a search strategy, but seeing the predicted results causes the user to alter the input strategy. Once the set is presented (312), the user's input is again monitored. If the user selects one of the predictions (302-final), the request is transmitted either to the search engine 208 as a search request or to a resource host as a URL request (304), as applicable”) and once the system determines that a query input is final such as after user selection of the suggested queries, it is sent to the server and search results are returned to the user for selection and viewing (executing a function corresponding to the completed query).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Gibbs into the teaching of Tadpatrikar because the references similarly disclose speech and/or query endpointing. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Tadpatrikar to further include the accessing of crowd-sourced query structures as in Gibbs to more accurately characterize the user's input in order to reduce endpointing errors, which would otherwise result in a bad user experience.
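For illustration, the community-based prediction that Gibbs describes in [0049]-[0058] can be sketched as follows. This is a minimal sketch under stated assumptions: Gibbs builds fingerprint-to-table maps from the authorized historical queries list, whereas the sketch substitutes a plain prefix scan over a small in-memory log. The log contents, the community labels, and the function name predict_queries are hypothetical.

```python
from collections import Counter

# Hypothetical historical query log: (community feature, completed query).
# In Gibbs the community could be a workgroup, language, or region ([0051]).
HISTORICAL_LOG = [
    ("en", "weather today"),
    ("en", "weather today"),
    ("en", "weather tomorrow"),
    ("de", "wetter heute"),
    ("en", "web mail"),
]


def predict_queries(partial, community, log=HISTORICAL_LOG, limit=3):
    """Return up to `limit` completed queries that start with `partial`,
    drawn only from users sharing `community`, most frequent first."""
    counts = Counter(
        query
        for feature, query in log
        if feature == community and query.startswith(partial)
    )
    return [query for query, _ in counts.most_common(limit)]


english_predictions = predict_queries("wea", "en")
german_predictions = predict_queries("we", "de")
```

Restricting the pool to one community before ranking reflects the statement in [0051] that different communities tend to produce different sets of predictions.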
Tadpatrikar and Gibbs fail to disclose “wherein the dynamically extending the wait time for processing the … input comprises extending the wait time for processing the … input by a predetermined amount depending on a topic of the … input”.
However, Buchholz teaches the above limitation at least by ([0021] “The set of pause types includes, but is not limited to speaker pauses, topic pauses, heading pauses, paragraph pauses, sentence pauses, phrase pauses, word pauses, end pauses and live audio pauses…Topic and speakers pauses are more indicative of spoken dialogues and represent context information. During the course of a conversation, speakers and topics change. These are natural places to introduce pauses in the playback. In the case of TTS, several types of indicia may be used to detect such places. For example, in a Q&A scenario, the question is often differentiated by Q: or speaker initials, e.g., DB, CM, or by typeface change. Thus, a pause would be introduced after a question (speaker change) or after the answer (topic change assuming each question introduces a new topic). In the case of recorded audio, an edit function introduces speaker and topic markers at the discretion of the editor. Those having ordinary skill in the art will appreciate that other methods for identifying topic and speaker pauses may be used” [0024] “an application 502 a provides text data 520 as input to a parser 504.” [0033] “The controller 602, upon receiving a message 650 that a condition adversely affecting the delivery of the transmitted packets has arisen, can decide to insert one or more pauses in the reconstructed audio 640 based on the pause information 652 included in the transmitted packets 630…the controller 602 attempts to insert pauses at the speaker or topic level first, then at the heading, paragraph, or sentence level, and finally at the phrase or word level” [0036] “As note above, the length of pauses inserted may be set to a predetermined length…. In a second embodiment, the length of a pause could be dependent upon the type of pause being inserted. 
For example, word and phrase pauses could be of a relatively short duration; heading, paragraph and sentence pauses could be longer; and speaker and topic pauses could be longer still… Further still, combinations of these three approaches may be mixed.”) and pauses of a predetermined length are inserted into the parsed input data based on the type of pause to be inserted, such as topic pauses, which are of longer duration than those inserted for word, phrase, heading, paragraph, or sentence pauses.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Buchholz into the combined teaching of Tadpatrikar and Gibbs because the references similarly disclose speech and/or query endpointing. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the insertion of pauses depending on topics of the input data as in Buchholz to “accommodate the occurrence of delays in delivering the packets” (Buchholz, [0006]).
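For illustration, the type-dependent pause selection that Buchholz describes in [0033] and [0036] can be sketched as follows. This is a minimal sketch and not Buchholz's implementation. Buchholz gives only the relative ordering of pause lengths and the order in which pause levels are tried; the millisecond values and the function name choose_pause are assumptions made for this example.

```python
# Hypothetical durations in milliseconds. Buchholz [0036] states only that
# word/phrase pauses are short, heading/paragraph/sentence pauses are longer,
# and speaker/topic pauses are longer still.
PAUSE_LENGTH_MS = {
    "word": 100,
    "phrase": 100,
    "heading": 400,
    "paragraph": 400,
    "sentence": 400,
    "speaker": 800,
    "topic": 800,
}

# Insertion order from [0033]: speaker or topic level first, then heading,
# paragraph, or sentence level, and finally phrase or word level.
PRIORITY_TIERS = [
    ("speaker", "topic"),
    ("heading", "paragraph", "sentence"),
    ("phrase", "word"),
]


def choose_pause(available_types):
    """Return (pause_type, length_ms) for the highest-priority pause type
    present in `available_types`, or None if none is available."""
    for tier in PRIORITY_TIERS:
        for pause_type in tier:
            if pause_type in available_types:
                return pause_type, PAUSE_LENGTH_MS[pause_type]
    return None


topic_choice = choose_pause({"word", "topic"})
sentence_choice = choose_pause({"word", "sentence"})
```

With these assumed values, a topic pause is both preferred over and longer than a word or sentence pause, which is the relationship relied upon in the mapping above.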
As per claim 3, claim 1 is incorporated, Tadpatrikar further discloses:
wherein the detecting the one or more pause indicators in the query input comprises detecting a pause contained with the query input at least by ([0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user”, Fig. 1 shows the determining of partial query input and pause detection (pause indicators)).
As per claim 4, claim 1 is incorporated, Tadpatrikar further discloses:
wherein the determining whether the query input has been completed comprises accessing a query history associated with the user at least by ([0025] “The computing device 121 combines the expert pause detector signal 115 and the complete query signal 106 to determine a speech endpoint for the utterance 124 when the computing device classifies the user as an expert. When the computing device 121 detects a pause, such as the pause of the expert pause detector signal 115 during pause 136, the computing device 121 determines whether the utterance 124 is complete.” [0026] “FIG. 2 is diagram of an example system 200 that classifies a particular user based on the particular user's experience with speech input. In some implementations, the system 200 may be included in a computing device that the particular user uses for speech input, such as computing device 121. In some implementations, the system may be included in a server that processes transcriptions of speech input” [0027] “The system 200 includes voice queries 205. The voice query log 205 stores the previous voice queries that users provide to the system 200.”).
As per claim 5, claim 4 is incorporated, Tadpatrikar fails to disclose “wherein the accessed query history associated with the user comprises a query history identifying structures of complete queries related to a topic of the query input”
However, Gibbs teaches the above limitation at least by ([0047] “FIG. 5 shows a set of data structures associated with historical queries (i.e., queries previously submitted) used for predicting queries corresponding to partially entered queries” [0049] “different sets of fingerprint-to-table maps 510 may be used for respective categories of users, thereby providing predicted results that are biased in accordance with one or more categories or topics associated with the user” [0051] “The historical query log 502 contains a log of previously submitted queries received by the search engine 208 over a period of time. In some embodiments, the queries are from a particular user. In some embodiments, the queries are from a community of users sharing at least one similar characteristic such as belonging to the same workgroup, using the same language, having an internet address associated with the same country or geographic region, or the like.” [0053] “other types of meta-data are associated and stored with the query such as the query language or other information which might be provided by the user or search assistant in accordance with user preferences (e.g., identification or profile information indicating certain preferences of the user). In some embodiments, the meta-information includes category or concept information gleaned from analyzing the terms in the query.”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Gibbs into the teaching of Tadpatrikar because the references similarly disclose speech and/or query endpointing. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Tadpatrikar to further include the accessing of crowd-sourced query structures related to a topic as in Gibbs to more generally characterize the users and the user input in order to reduce endpointing errors, which would otherwise result in a bad user experience.
As per claim 9, claim 1 is incorporated, Gibbs further discloses:
further comprising performing a function associated with the query input upon indication of completion of the query input at least by ([0033] “When a final input or selection (302—final input) is identified as a search query, the input is transmitted to the search engine 208 (304) for processing. The search engine 208 returns a set of search results, which is received by the search assistant 204 (306) or by a client application, such as a browser application. The list of search results is presented to the user such that the user may select one of the documents for further examination (e.g., visually or aurally)” [0041] “For example, the predicted search queries and/or URLs might be presented in a drop down menu. Regardless of the manner in which the predicted queries and/or URLs are presented to the user, the user may select one of the queries and/or URLs if the user determines that one of the predictions matches the intended entry. In some instances, the predictions may provide the user with additional information which had not been considered. For example, a user may have one query in mind as part of a search strategy, but seeing the predicted results causes the user to alter the input strategy. Once the set is presented (312), the user's input is again monitored. If the user selects one of the predictions (302-final), the request is transmitted either to the search engine 208 as a search request or to a resource host as a URL request (304), as applicable”) and once the system determines that a query input is final such as after user selection of the suggested queries, it is sent to the server and search results are returned to the user for selection and viewing (executing a function corresponding to the completed query).
As per claim 10, claim 1 is incorporated, Tadpatrikar fails to disclose “wherein the indication of completion comprises a user input indicating the query input is complete”
However, Gibbs teaches the above limitations at least by ([0032] “the search assistant 204 receives or identifies a final input (302—final input) when the user has indicated completion of the input string or selected a presented prediction.” [0034] “A final input may be identified by the search assistant 204 in a number of ways such as when the user enters a carriage return, or equivalent character, selects a search button in a graphical user interface (GUI) presented to the user during entry of the search query, or by possibly selecting one of a set of possible queries presented to the user during entry of the search query”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Gibbs into the teaching of Tadpatrikar because the references similarly disclose speech and/or query endpointing. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Tadpatrikar to further include the recognizing of query completion based on user input as in Gibbs in order to give the user full control of the query endpointing to avoid any potential confusion.
Regarding claim 11, Tadpatrikar discloses:
An information handling device, comprising: at least one output device; a processor; a memory device that stores instructions executable by the processor to: engage, at a digital assistant, in a conversational session with a user; receive, during the conversational session, query input at least by ([0017] discloses the user speaking voice queries to a computing device, such as “text mom I love you” which is transcribed and shown in a natural-language dialogue 124 (conversational session) in Fig. 1 [0040] discloses that the computing device can be a personal digital assistant);
determine that the query input corresponds to a partial query input, wherein the instructions executable by the processor to determine comprise instructions executable by the processor to detect one or more pause indicators in the query input at least by ([0019] “the computing device 121 may generate, without factoring in any characteristics of the user 127, the general endpoint signal 103 and the complete query signal 106. The complete query signal 106 represents an estimate performed by the computing device 121 that the generated transcription of the utterance 130 represents a complete utterance. The computing device 121 compares the generated transcription to one or more complete utterances that the user 127 and other users have previously spoken. The computing device 121 may compare the generated transcription to the complete utterances after a speech recognizer of computing device 121 has identified a new word. For example, after the user 127 speaks word 133, a speech recognizer of the computing device 121 generates the transcription “text.” The computing device 121 compares “text” to other complete utterances and determines that “text” is not a complete utterance. After the user 127 speaks word 139, the speech recognizer generates the transcription “text mom” that the computing device 121 identifies as complete. A similar determination is made after word 151. After the user 127 speaks word 145, the speech recognizer generates the transcription “text mom love” that the computing device 121 identifies as incomplete” [0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. 
On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user”, Fig. 1 shows the determining of partial query input and pause detection (pause indicators));
wherein the one or more pause indicators in the query input are identified via reference to crowd-sourced data at least by ([0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user” [0026] “FIG. 2 is diagram of an example system 200 that classifies a particular user based on the particular user's experience with speech input. In some implementations, the system 200 may be included in a computing device that the particular user uses for speech input, such as computing device 121. In some implementations, the system may be included in a server that processes transcriptions of speech input” [0028] “Query log 210 illustrates the voice queries provided by the user Bob. The voice queries in query log 210 include three voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].” Each voice query includes a timestamp that notes the date and time that Bob spoke the voice query. Each voice query includes data indicating the pause intervals between the spoken words. For example, “cat videos” may include data to indicate that Bob paused two hundred milliseconds between “cat” and “video.” “Call . . . mom” may include data to indicate that Bob paused one second between “call” and “mom.”” [0029] “Query log 215 illustrates the voice queries provided by the user Alice. 
The voice queries in query log 215 include five voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].”) and the crowd-sourced data are the query logs which include completed queries of the users;
dynamically extend, responsive to the determining, a wait time for processing the query input, wherein the instructions executable by the processor to dynamically extend the wait time for processing the query input comprise instructions executable by the processor to: identify an input-provision context associated with the user; and extend the wait time for processing the query input by a predetermined amount of time, wherein the predetermined amount of time is dictated by the input-provision context at least by ([0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user” [0022] “The novice pause detector signal 109 illustrates the computing device 121 detecting a pause in audio data corresponding to utterance 124, where the detected pause length is longer than the pause length the corresponds to the general endpointer. For example, the computing device 121 may detect pauses with a length of one second when the user 127 is classified as a novice user. Applying this pause threshold to utterance 124, the computing device 121 will not detect novice length pauses during pauses 136 and 124 because those pauses are of length three hundred milliseconds and eight hundred milliseconds, respectively. The computing device 121 does detect novice length pauses during pauses 148 and 154. 
As shown in novice pause detector signal 109, the computing device 121 detects a pause of one second during pause 148 after the user 127 spoke word 145. The computing device 121 also detects a pause of one second during pause 154 after the user spoke word 151.” [0023] “The computing device 121 determines, based on the novice pause detector signal 109 and the complete query signal 106, a speech endpoint for the utterance 124 when the computing device classifies the user as a novice. When the computing device 121 detects a pause, such as the pause of the novice pause detector signal 109 during pause 148, the computing device 121 determines whether the utterance 124 is complete. During pause 148, the complete query signal 106 indicates that the utterance 124 is not complete. Even though the computing device 121 detected a novice length pause, the utterance 124 is not complete, so the computing device 121 continues processing the audio data of the utterance 124. During pause 154, the computing device 121 detects a novice length pause and the complete query signal 106 indicates that the utterance is complete and, therefore, generates an endpoint of the utterance 124 as indicated by the novice endpoint signal 112. When the user 127 is classified as a novice, the endpoint of the utterance 124 is after word 151, and the transcription 160 of the utterance 124 is “Text Mom love you.”” [0024] “The expert pause detector signal 115 illustrates the computing device 121 detecting a pause in audio data corresponding to utterance 124, where the detected pause length is shorter than the pause length the corresponds to the general endpointer. For example, the computing device 121 may detect pauses with a length of three hundred milliseconds when the user 127 is classified as an expert user. Applying this pause threshold to utterance 124, the computing device 121 detects expert length pauses during pauses 136, 142, 148, and 154. 
Because none of the pauses are less than three hundred milliseconds, all of the pauses in utterance 124 include an expert length pause detection.” [0025] “The computing device 121 combines the expert pause detector signal 115 and the complete query signal 106 to determine a speech endpoint for the utterance 124 when the computing device classifies the user as an expert. When the computing device 121 detects a pause, such as the pause of the expert pause detector signal 115 during pause 136, the computing device 121 determines whether the utterance 124 is complete. During pause 136, the complete query signal 106 indicates that the utterance 124 is not complete. Even though the computing device 121 detected an expert length pause, the utterance 124 is not complete, so the computing device 121 continues processing the audio data of the utterance 124. During pause 142, the computing device 121 detects an expert length pause and the complete query signal 106 indicates that the utterance is complete and, therefore, generates an endpoint of the utterance 124 as indicated by the expert endpoint signal 118. When the user 127 is classified as an expert, the endpoint of the utterance 124 is after word 139, and the transcription 163 of the utterance 124 is “Text Mom.””, Fig. 1 graphically shows the determining of query completion, speech endpointing based on user type, and pause detection between utterances) and an input-provision context is the classification of the user as an expert or novice which affects how the system endpoints the user’s utterances/queries;
wherein the wait time is based upon the query input and wherein the wait time is different for different query inputs at least by ([0028] “Query log 210 illustrates the voice queries provided by the user Bob. The voice queries in query log 210 include three voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].” Each voice query includes a timestamp that notes the date and time that Bob spoke the voice query. Each voice query includes data indicating the pause intervals between the spoken words. For example, “cat videos” may include data to indicate that Bob paused two hundred milliseconds between “cat” and “video.” “Call . . . mom” may include data to indicate that Bob paused one second between “call” and “mom.”” [0029] “Query log 215 illustrates the voice queries provided by the user Alice. The voice queries in query log 215 include five voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].” Each voice query includes a timestamp that notes the date and time that Alice spoke the voice query. Each voice query includes data indicating the pause intervals between the spoken words. For example, “Text Sally that I'll be ten minutes late” may include data to indicate that Alice paused one millisecond between “text” and “Sally,” paused three hundred milliseconds between “Sally” and “that,” and paused 1.5 seconds between “that” and “I'll,” as well as pause intervals between the other words. “Call mom” may include data to indicate that Alice paused three milliseconds between “call” and “mom.””, Fig. 1 also shows different pause detector times for different types of users for each separate utterance) and the pause time between words such as “Sally” and “that” is three hundred milliseconds, while the pause time between “call” and “mom” is three milliseconds (wait time is based upon the query input and is different for different query inputs);
Tadpatrikar fails to disclose “wherein the determining comprises accessing crowd-sourced query input structures comprising the crowd-sourced data associated with completed queries related to a topic of the query input and comparing the query input to the crowd-sourced query input structures, wherein the comparing the query input to the crowd-sourced query input structures comprises identifying a feature of the user and comparing the query input to crowd-sourced input structures corresponding to other users having the feature; wherein to dynamically extend the wait time for processing the query input comprises extending the wait time for processing the query input by a predetermined amount depending on a topic of the query input; provide, using the at least one output device, a notification that informs the user of the extending and that requests the user to provide additional query input; and execute, responsive to determining that the additional query input is received, a function corresponding to a completed query, wherein the completed query comprises the partial query input and the additional query input”
However, Gibbs teaches the following limitations, wherein the determining comprises accessing crowd-sourced query input structures comprising the crowd-sourced data associated with completed queries related to a topic of the query input and comparing the query input to the crowd-sourced query input structures, wherein the comparing the query input to the crowd-sourced query input structures comprises identifying a feature of the user and comparing the query input to crowd-sourced input structures corresponding to other users having the feature at least by ([0047] “FIG. 5 shows a set of data structures associated with historical queries (i.e., queries previously submitted) used for predicting queries corresponding to partially entered queries” [0048] “Referring to FIG. 5, a historical query log 502 is filtered by one or more filters 504 to create an authorized historical queries list 506. An ordered set builder 508 creates one or more fingerprint-to-table maps 510 from the authorized historical queries list 506 based on certain criteria. When the partial query is transmitted (FIG. 3, 308), it is received at the search engine 208 as partial query 513” [0049] “partial search queries received from a particular website might be mapped to predicted results using a set of fingerprint-to-table maps that were generated from historical queries received from the same website, or from a group of websites deemed to be similar to the particular website. Similarly, an individual user may, with his/her permission, have a user profile that specifies information about the user or about a group associated with the user, and that “personalization information” may be used to identify a respective set of fingerprint-to-table maps for use when predicting results for that user.” [0051] “The historical query log 502 contains a log of previously submitted queries received by the search engine 208 over a period of time. In some embodiments, the queries are from a particular user. 
In some embodiments, the queries are from a community of users sharing at least one similar characteristic such as belonging to the same workgroup, using the same language, having an internet address associated with the same country or geographic region, or the like. The selection of the community determines the pool of previously submitted queries from which the predictions are drawn. Different communities would tend to produce different sets of predictions” [0053] “other types of meta-data are associated and stored with the query such as the query language or other information which might be provided by the user or search assistant in accordance with user preferences (e.g., identification or profile information indicating certain preferences of the user). In some embodiments, the meta-information includes category or concept information gleaned from analyzing the terms in the query.” [0058] “Similarly, different fingerprint-to-table maps 510 could be created for geographical regions. As another example, different fingerprint-to-table maps 510 could be created from queries from particular IP addresses or groups of addresses, such as those from a particular network or a particular group of individuals (e.g., a corporation). Using the meta-information to create different fingerprint-to-table maps 510, allows the predictions to be based on users having characteristics similar to that of the searcher and which should increase the likelihood of a correct prediction”);
provide, using the at least one output device, a notification that informs the user of the extending and that requests the user to provide additional query input at least by ([0039] “Regardless of the way in which the partial input is identified, it is transmitted to the search engine 208 (308) for processing. In response to the partial search query, the search engine 208 returns a set of ordered predicted search queries and/or URLs (310) which is presented to the user (312) ordered in accordance with a ranking criteria. The predictions may be displayed to the user in a number of ways. For example, the predictions could be displayed in a drop-down window, a persistent, or non-persistent window or other ways. In some embodiments, queries which the user had previously submitted could be visually indicated to the user (e.g., by highlighting the user's own previously entered queries” [0041] “For example, the predicted search queries and/or URLs might be presented in a drop down menu. Regardless of the manner in which the predicted queries and/or URLs are presented to the user, the user may select one of the queries and/or URLs if the user determines that one of the predictions matches the intended entry. In some instances, the predictions may provide the user with additional information which had not been considered. For example, a user may have one query in mind as part of a search strategy, but seeing the predicted results causes the user to alter the input strategy. Once the set is presented (312), the user's input is again monitored. If the user selects one of the predictions (302-final), the request is transmitted either to the search engine 208 as a search request or to a resource host as a URL request (304), as applicable”) and the system predicts a set of completed search queries based on entered partial search queries and displays them to the user for additional input which are selected by a user or the user alters the original search query;
and execute, responsive to determining that the additional query input is received, a function corresponding to a completed query, wherein the completed query comprises the partial query input and the additional query input at least by ([0033] “When a final input or selection (302—final input) is identified as a search query, the input is transmitted to the search engine 208 (304) for processing. The search engine 208 returns a set of search results, which is received by the search assistant 204 (306) or by a client application, such as a browser application. The list of search results is presented to the user such that the user may select one of the documents for further examination (e.g., visually or aurally)” [0041] “For example, the predicted search queries and/or URLs might be presented in a drop down menu. Regardless of the manner in which the predicted queries and/or URLs are presented to the user, the user may select one of the queries and/or URLs if the user determines that one of the predictions matches the intended entry. In some instances, the predictions may provide the user with additional information which had not been considered. For example, a user may have one query in mind as part of a search strategy, but seeing the predicted results causes the user to alter the input strategy. Once the set is presented (312), the user's input is again monitored. If the user selects one of the predictions (302-final), the request is transmitted either to the search engine 208 as a search request or to a resource host as a URL request (304), as applicable”) and once the system determines that a query input is final such as after user selection of the suggested queries, it is sent to the server and search results are returned to the user for selection and viewing (executing a function corresponding to the completed query).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Gibbs into the teaching of Tadpatrikar because the references similarly disclose speech and/or query endpointing. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Tadpatrikar to further include the accessing of crowd-sourced query structures as in Gibbs to more accurately characterize the user's input in order to reduce endpointing errors, which would result in a bad user experience.
Tadpatrikar and Gibbs fail to disclose “wherein to dynamically extend the wait time for processing the … input comprises extending the wait time for processing the … input by a predetermined amount depending on a topic of the … input”
However, Buchholz teaches the above limitation at least by ([0021] “The set of pause types includes, but is not limited to speaker pauses, topic pauses, heading pauses, paragraph pauses, sentence pauses, phrase pauses, word pauses, end pauses and live audio pauses…Topic and speakers pauses are more indicative of spoken dialogues and represent context information. During the course of a conversation, speakers and topics change. These are natural places to introduce pauses in the playback. In the case of TTS, several types of indicia may be used to detect such places. For example, in a Q&A scenario, the question is often differentiated by Q: or speaker initials, e.g., DB, CM, or by typeface change. Thus, a pause would be introduced after a question (speaker change) or after the answer (topic change assuming each question introduces a new topic). In the case of recorded audio, an edit function introduces speaker and topic markers at the discretion of the editor. Those having ordinary skill in the art will appreciate that other methods for identifying topic and speaker pauses may be used” [0024] “an application 502 a provides text data 520 as input to a parser 504.” [0033] “The controller 602, upon receiving a message 650 that a condition adversely affecting the delivery of the transmitted packets has arisen, can decide to insert one or more pauses in the reconstructed audio 640 based on the pause information 652 included in the transmitted packets 630…the controller 602 attempts to insert pauses at the speaker or topic level first, then at the heading, paragraph, or sentence level, and finally at the phrase or word level” [0036] “As note above, the length of pauses inserted may be set to a predetermined length…. In a second embodiment, the length of a pause could be dependent upon the type of pause being inserted. 
For example, word and phrase pauses could be of a relatively short duration; heading, paragraph and sentence pauses could be longer; and speaker and topic pauses could be longer still… Further still, combinations of these three approaches may be mixed.”) and pauses of a predetermined length are inserted into the parsed input data based on the type of pauses to be inserted, such as a topic pauses which are of longer duration than those inserted for word, heading, paragraph, or sentence pauses.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Buchholz into the teaching of Tadpatrikar and Gibbs because the references similarly disclose speech and/or query endpointing. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the insertion of pauses depending on topics of the input data as in Buchholz to “accommodate the occurrence of delays in delivering the packets” (Buchholz, [0006]).
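For illustration only, the endpointing logic of Tadpatrikar [0022]-[0025] relied upon above may be sketched as follows. The sketch is not taken from the reference: the function name find_endpoint, the (word, following pause) representation, and the assumption that pauses 148 and 154 last exactly one second are hypothetical and chosen only to mirror the example of Fig. 1.

```python
from typing import Callable, Optional, Sequence, Tuple

# Illustrative pause thresholds in milliseconds, taken from the examples in
# Tadpatrikar [0022] (novice: one second) and [0024] (expert: 300 ms).
PAUSE_THRESHOLD_MS = {"novice": 1000, "expert": 300}


def find_endpoint(
    words: Sequence[Tuple[str, int]],
    user_class: str,
    is_complete: Callable[[str], bool],
) -> Optional[str]:
    """Return the transcription at the first endpoint, or None if none is found.

    `words` pairs each recognized word with the pause (ms) that follows it.
    An endpoint is declared only when the pause meets the threshold for the
    user's class AND the transcription so far is judged complete.
    """
    threshold = PAUSE_THRESHOLD_MS[user_class]
    transcript = []
    for word, pause_ms in words:
        transcript.append(word)
        text = " ".join(transcript)
        if pause_ms >= threshold and is_complete(text):
            return text
    return None


# Hypothetical encoding of utterance 124: pauses 136, 142, 148, 154.
utterance = [("text", 300), ("mom", 800), ("love", 1000), ("you", 1000)]
complete_utterances = {"text mom", "text mom love you"}

expert_result = find_endpoint(utterance, "expert", complete_utterances.__contains__)
novice_result = find_endpoint(utterance, "novice", complete_utterances.__contains__)
```

Under these assumptions the expert classification endpoints after “mom” and the novice classification endpoints after “you”, consistent with transcriptions 163 and 160 as described in the cited paragraphs.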
As per claim 13, claim 11 is incorporated, Tadpatrikar further discloses:
wherein the detecting the one or more pause indicators in the query input comprises detecting a pause contained with the query input at least by ([0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user”, Fig. 1 shows the determining of partial query input and pause detection (pause indicators)).
As per claim 14, claim 11 is incorporated, Tadpatrikar further discloses:
wherein the instructions executable by the processor to determine whether the query input has been completed comprise instructions executable by the processor to access a query history associated with the user at least by ([0025] “The computing device 121 combines the expert pause detector signal 115 and the complete query signal 106 to determine a speech endpoint for the utterance 124 when the computing device classifies the user as an expert. When the computing device 121 detects a pause, such as the pause of the expert pause detector signal 115 during pause 136, the computing device 121 determines whether the utterance 124 is complete.” [0026] “FIG. 2 is diagram of an example system 200 that classifies a particular user based on the particular user's experience with speech input. In some implementations, the system 200 may be included in a computing device that the particular user uses for speech input, such as computing device 121. In some implementations, the system may be included in a server that processes transcriptions of speech input” [0027] “The system 200 includes voice queries 205. The voice query log 205 stores the previous voice queries that users provide to the system 200.”).
As per claim 15, claim 14 is incorporated, Tadpatrikar fails to disclose “wherein the accessed query history associated with the user comprises a query history identifying structures of complete queries related to a topic of the query input”
However, Gibbs teaches the above limitation at least by ([0047] “FIG. 5 shows a set of data structures associated with historical queries (i.e., queries previously submitted) used for predicting queries corresponding to partially entered queries” [0049] “different sets of fingerprint-to-table maps 510 may be used for respective categories of users, thereby providing predicted results that are biased in accordance with one or more categories or topics associated with the user” [0051] “The historical query log 502 contains a log of previously submitted queries received by the search engine 208 over a period of time. In some embodiments, the queries are from a particular user. In some embodiments, the queries are from a community of users sharing at least one similar characteristic such as belonging to the same workgroup, using the same language, having an internet address associated with the same country or geographic region, or the like.” [0053] “other types of meta-data are associated and stored with the query such as the query language or other information which might be provided by the user or search assistant in accordance with user preferences (e.g., identification or profile information indicating certain preferences of the user). In some embodiments, the meta-information includes category or concept information gleaned from analyzing the terms in the query.”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Gibbs into the teaching of Tadpatrikar because the references similarly disclose speech and/or query endpointing. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Tadpatrikar to further include the accessing of crowd-sourced query structures related to a topic as in Gibbs to more generally characterize the users and the user input in order to reduce endpointing errors, which would result in a bad user experience.
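For illustration only, the community-based prediction of Gibbs [0049]-[0053] relied upon above may be sketched as follows. The sketch is not taken from the reference: Gibbs describes fingerprint-to-table maps built from a filtered historical query log, whereas the sketch substitutes a simple prefix match ranked by frequency. The function name predict_queries, the (community, query) log layout, and the sample data are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def predict_queries(
    log: Iterable[Tuple[str, str]],
    community: str,
    partial: str,
    limit: int = 3,
) -> List[str]:
    """Predict completed queries for a partial query.

    Only historical queries submitted by users in the same community as the
    searcher are considered, and candidates are ranked by how often they
    were previously submitted.
    """
    counts = Counter(
        query
        for member_community, query in log
        if member_community == community and query.startswith(partial)
    )
    return [query for query, _ in counts.most_common(limit)]


# Hypothetical historical query log: (community, completed query).
history = [
    ("en", "cat videos"),
    ("en", "cat videos"),
    ("en", "cat food"),
    ("de", "cat tree"),
]
predictions = predict_queries(history, "en", "cat")
```

In this sketch the query from the "de" community is excluded, so the predictions are drawn only from users sharing the searcher's characteristic, in the manner the cited paragraphs describe for communities of users.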
As per claim 19, claim 11 is incorporated, Gibbs further discloses:
wherein the instructions are executable by the processor to perform a function associated with the query input upon indication of completion of the query input at least by ([0033] “When a final input or selection (302—final input) is identified as a search query, the input is transmitted to the search engine 208 (304) for processing. The search engine 208 returns a set of search results, which is received by the search assistant 204 (306) or by a client application, such as a browser application. The list of search results is presented to the user such that the user may select one of the documents for further examination (e.g., visually or aurally)” [0041] “For example, the predicted search queries and/or URLs might be presented in a drop down menu. Regardless of the manner in which the predicted queries and/or URLs are presented to the user, the user may select one of the queries and/or URLs if the user determines that one of the predictions matches the intended entry. In some instances, the predictions may provide the user with additional information which had not been considered. For example, a user may have one query in mind as part of a search strategy, but seeing the predicted results causes the user to alter the input strategy. Once the set is presented (312), the user's input is again monitored. If the user selects one of the predictions (302-final), the request is transmitted either to the search engine 208 as a search request or to a resource host as a URL request (304), as applicable”) and once the system determines that a query input is final such as after user selection of the suggested queries, it is sent to the server and search results are returned to the user for selection and viewing (executing a function corresponding to the completed query).
Regarding claim 20, Tadpatrikar discloses:
A product, comprising: a storage device that stores code, the code being executable by a processor and comprising: code that engages, at a digital assistant, in a conversational session with a user; code that receives, during the conversational session with the user, query input at least by ([0017] discloses the user speaking voice queries to a computing device, such as “text mom I love you” which is transcribed and shown in a natural-language dialogue 124 (conversational session) in Fig. 1 [0040] discloses that the computing device can be a personal digital assistant);
code that determines that the query input corresponds to a partial query input, wherein the code that determines comprises code that detects one or more pause indicators in the query input at least by ([0019] “the computing device 121 may generate, without factoring in any characteristics of the user 127, the general endpoint signal 103 and the complete query signal 106. The complete query signal 106 represents an estimate performed by the computing device 121 that the generated transcription of the utterance 130 represents a complete utterance. The computing device 121 compares the generated transcription to one or more complete utterances that the user 127 and other users have previously spoken. The computing device 121 may compare the generated transcription to the complete utterances after a speech recognizer of computing device 121 has identified a new word. For example, after the user 127 speaks word 133, a speech recognizer of the computing device 121 generates the transcription “text.” The computing device 121 compares “text” to other complete utterances and determines that “text” is not a complete utterance. After the user 127 speaks word 139, the speech recognizer generates the transcription “text mom” that the computing device 121 identifies as complete. A similar determination is made after word 151. After the user 127 speaks word 145, the speech recognizer generates the transcription “text mom love” that the computing device 121 identifies as incomplete” [0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. 
On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user”, Fig. 1 shows the determining of partial query input and pause detection (pause indicators));
wherein the one or more pause indicators in the query input are identified via reference to crowd-sourced data at least by ([0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user” [0026] “FIG. 2 is diagram of an example system 200 that classifies a particular user based on the particular user's experience with speech input. In some implementations, the system 200 may be included in a computing device that the particular user uses for speech input, such as computing device 121. In some implementations, the system may be included in a server that processes transcriptions of speech input” [0028] “Query log 210 illustrates the voice queries provided by the user Bob. The voice queries in query log 210 include three voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].” Each voice query includes a timestamp that notes the date and time that Bob spoke the voice query. Each voice query includes data indicating the pause intervals between the spoken words. For example, “cat videos” may include data to indicate that Bob paused two hundred milliseconds between “cat” and “video.” “Call . . . mom” may include data to indicate that Bob paused one second between “call” and “mom.”” [0029] “Query log 215 illustrates the voice queries provided by the user Alice. 
The voice queries in query log 215 include five voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].””) and the crowd-sourced data are the query logs which include completed queries of the users;
code that dynamically extends, responsive to the code that determines, a wait time for processing the query input, wherein the code that dynamically extends the wait time for processing the query input comprises code that: identifies an input-provision context associated with the user; and extends the wait time for processing the query input by a predetermined amount of time, wherein the predetermined amount of time is dictated by the input-provision context; … query input… at least by ([0021] “the computing device 121 may factor in the characteristics of the user 127 when identifying an endpoint of the utterance 124. On one hand, a novice user may speak with longer pauses between words possibly because the novice user may be unfamiliar with what terms may be best to speak to the computing device 121. On the other hand, an expert user may speak with shorter pauses between words because the expert user may be more comfortable and familiar with the speech input technology of computing device 121. Accordingly, the computing device 121 may lengthen or shorten the amount of time before it identifies a pause depending on how the computing device 121 categorizes the user” [0022] “The novice pause detector signal 109 illustrates the computing device 121 detecting a pause in audio data corresponding to utterance 124, where the detected pause length is longer than the pause length the corresponds to the general endpointer. For example, the computing device 121 may detect pauses with a length of one second when the user 127 is classified as a novice user. Applying this pause threshold to utterance 124, the computing device 121 will not detect novice length pauses during pauses 136 and 124 because those pauses are of length three hundred milliseconds and eight hundred milliseconds, respectively. The computing device 121 does detect novice length pauses during pauses 148 and 154. 
As shown in novice pause detector signal 109, the computing device 121 detects a pause of one second during pause 148 after the user 127 spoke word 145. The computing device 121 also detects a pause of one second during pause 154 after the user spoke word 151.” [0023] “The computing device 121 determines, based on the novice pause detector signal 109 and the complete query signal 106, a speech endpoint for the utterance 124 when the computing device classifies the user as a novice. When the computing device 121 detects a pause, such as the pause of the novice pause detector signal 109 during pause 148, the computing device 121 determines whether the utterance 124 is complete. During pause 148, the complete query signal 106 indicates that the utterance 124 is not complete. Even though the computing device 121 detected a novice length pause, the utterance 124 is not complete, so the computing device 121 continues processing the audio data of the utterance 124. During pause 154, the computing device 121 detects a novice length pause and the complete query signal 106 indicates that the utterance is complete and, therefore, generates an endpoint of the utterance 124 as indicated by the novice endpoint signal 112. When the user 127 is classified as a novice, the endpoint of the utterance 124 is after word 151, and the transcription 160 of the utterance 124 is “Text Mom love you.”” [0024] “The expert pause detector signal 115 illustrates the computing device 121 detecting a pause in audio data corresponding to utterance 124, where the detected pause length is shorter than the pause length the corresponds to the general endpointer. For example, the computing device 121 may detect pauses with a length of three hundred milliseconds when the user 127 is classified as an expert user. Applying this pause threshold to utterance 124, the computing device 121 detects expert length pauses during pauses 136, 142, 148, and 154. 
Because none of the pauses are less than three hundred milliseconds, all of the pauses in utterance 124 include an expert length pause detection.” [0025] “The computing device 121 combines the expert pause detector signal 115 and the complete query signal 106 to determine a speech endpoint for the utterance 124 when the computing device classifies the user as an expert. When the computing device 121 detects a pause, such as the pause of the expert pause detector signal 115 during pause 136, the computing device 121 determines whether the utterance 124 is complete. During pause 136, the complete query signal 106 indicates that the utterance 124 is not complete. Even though the computing device 121 detected an expert length pause, the utterance 124 is not complete, so the computing device 121 continues processing the audio data of the utterance 124. During pause 142, the computing device 121 detects an expert length pause and the complete query signal 106 indicates that the utterance is complete and, therefore, generates an endpoint of the utterance 124 as indicated by the expert endpoint signal 118. When the user 127 is classified as an expert, the endpoint of the utterance 124 is after word 139, and the transcription 163 of the utterance 124 is “Text Mom.””, Fig. 1 graphically shows the determining of query completion, speech endpointing based on user type, and pause detection between utterances) and an input-provision context is the classification of the user as an expert or novice which affects how the system endpoints the user’s utterances/queries;
wherein the wait time is based upon the query input and wherein the wait time is different for different query inputs at least by ([0028] “Query log 210 illustrates the voice queries provided by the user Bob. The voice queries in query log 210 include three voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].” Each voice query includes a timestamp that notes the date and time that Bob spoke the voice query. Each voice query includes data indicating the pause intervals between the spoken words. For example, “cat videos” may include data to indicate that Bob paused two hundred milliseconds between “cat” and “video.” “Call . . . mom” may include data to indicate that Bob paused one second between “call” and “mom.”” [0029] “Query log 215 illustrates the voice queries provided by the user Alice. The voice queries in query log 215 include five voice queries and each includes either a complete indicator “[C]” or an incomplete indicator “[I].” Each voice query includes a timestamp that notes the date and time that Alice spoke the voice query. Each voice query includes data indicating the pause intervals between the spoken words. For example, “Text Sally that I'll be ten minutes late” may include data to indicate that Alice paused one millisecond between “text” and “Sally,” paused three hundred milliseconds between “Sally” and “that,” and paused 1.5 seconds between “that” and “I'll,” as well as pause intervals between the other words. “Call mom” may include data to indicate that Alice paused three milliseconds between “call” and “mom.””, Fig. 1 also shows different pause detector times for different types of users for each separate utterance) and the pause time between words such as “Sally” and “that” is three hundred milliseconds, while the pause time between “call” and “mom” is three milliseconds (wait time is based upon the query input and is different for different query inputs);
Tadpatrikar fails to disclose “wherein the determining comprises accessing crowd-sourced query input structures comprising the crowd-sourced data associated with completed queries related to a topic of the query input and comparing the query input to the crowd-sourced query input structures, wherein the comparing the query input to the crowd-sourced query input structures comprises identifying a feature of the user and comparing the query input to crowd-sourced input structures corresponding to other users having the feature; wherein the code that dynamically extends the wait time for processing the query input comprises extending the wait time for processing the query input by a predetermined amount depending on a topic of the query input; code that provides a notification that informs the user of the extending and that requests the user to provide additional query input; and code that executes, responsive to determining that the additional query input is received, a function corresponding to a completed query, wherein the completed query comprises the partial query input and the additional query input”
However, Gibbs teaches the following limitations, wherein the determining comprises accessing crowd-sourced query input structures comprising the crowd-sourced data associated with completed queries related to a topic of the query input and comparing the query input to the crowd-sourced query input structures, wherein the comparing the query input to the crowd-sourced query input structures comprises identifying a feature of the user and comparing the query input to crowd-sourced input structures corresponding to other users having the feature at least by ([0047] “FIG. 5 shows a set of data structures associated with historical queries (i.e., queries previously submitted) used for predicting queries corresponding to partially entered queries” [0048] “Referring to FIG. 5, a historical query log 502 is filtered by one or more filters 504 to create an authorized historical queries list 506. An ordered set builder 508 creates one or more fingerprint-to-table maps 510 from the authorized historical queries list 506 based on certain criteria. When the partial query is transmitted (FIG. 3, 308), it is received at the search engine 208 as partial query 513” [0049] “partial search queries received from a particular website might be mapped to predicted results using a set of fingerprint-to-table maps that were generated from historical queries received from the same website, or from a group of websites deemed to be similar to the particular website. Similarly, an individual user may, with his/her permission, have a user profile that specifies information about the user or about a group associated with the user, and that “personalization information” may be used to identify a respective set of fingerprint-to-table maps for use when predicting results for that user.” [0051] “The historical query log 502 contains a log of previously submitted queries received by the search engine 208 over a period of time. In some embodiments, the queries are from a particular user. 
In some embodiments, the queries are from a community of users sharing at least one similar characteristic such as belonging to the same workgroup, using the same language, having an internet address associated with the same country or geographic region, or the like. The selection of the community determines the pool of previously submitted queries from which the predictions are drawn. Different communities would tend to produce different sets of predictions” [0053] “other types of meta-data are associated and stored with the query such as the query language or other information which might be provided by the user or search assistant in accordance with user preferences (e.g., identification or profile information indicating certain preferences of the user). In some embodiments, the meta-information includes category or concept information gleaned from analyzing the terms in the query.” [0058] “Similarly, different fingerprint-to-table maps 510 could be created for geographical regions. As another example, different fingerprint-to-table maps 510 could be created from queries from particular IP addresses or groups of addresses, such as those from a particular network or a particular group of individuals (e.g., a corporation). Using the meta-information to create different fingerprint-to-table maps 510, allows the predictions to be based on users having characteristics similar to that of the searcher and which should increase the likelihood of a correct prediction”);
code that provides a notification that informs the user of the extending and that requests the user to provide additional query input at least by ([0039] “Regardless of the way in which the partial input is identified, it is transmitted to the search engine 208 (308) for processing. In response to the partial search query, the search engine 208 returns a set of ordered predicted search queries and/or URLs (310) which is presented to the user (312) ordered in accordance with a ranking criteria. The predictions may be displayed to the user in a number of ways. For example, the predictions could be displayed in a drop-down window, a persistent, or non-persistent window or other ways. In some embodiments, queries which the user had previously submitted could be visually indicated to the user (e.g., by highlighting the user's own previously entered queries” [0041] “For example, the predicted search queries and/or URLs might be presented in a drop down menu. Regardless of the manner in which the predicted queries and/or URLs are presented to the user, the user may select one of the queries and/or URLs if the user determines that one of the predictions matches the intended entry. In some instances, the predictions may provide the user with additional information which had not been considered. For example, a user may have one query in mind as part of a search strategy, but seeing the predicted results causes the user to alter the input strategy. Once the set is presented (312), the user's input is again monitored. If the user selects one of the predictions (302-final), the request is transmitted either to the search engine 208 as a search request or to a resource host as a URL request (304), as applicable”) and the system predicts a set of completed search queries based on the entered partial search query and displays them to the user to prompt additional input, whereupon the user either selects one of the predictions or alters the original search query;
and code that executes, responsive to determining that the additional query input is received, a function corresponding to a completed query, wherein the completed query comprises the partial query input and the additional query input at least by ([0033] “When a final input or selection (302—final input) is identified as a search query, the input is transmitted to the search engine 208 (304) for processing. The search engine 208 returns a set of search results, which is received by the search assistant 204 (306) or by a client application, such as a browser application. The list of search results is presented to the user such that the user may select one of the documents for further examination (e.g., visually or aurally)” [0041] “For example, the predicted search queries and/or URLs might be presented in a drop down menu. Regardless of the manner in which the predicted queries and/or URLs are presented to the user, the user may select one of the queries and/or URLs if the user determines that one of the predictions matches the intended entry. In some instances, the predictions may provide the user with additional information which had not been considered. For example, a user may have one query in mind as part of a search strategy, but seeing the predicted results causes the user to alter the input strategy. Once the set is presented (312), the user's input is again monitored. If the user selects one of the predictions (302-final), the request is transmitted either to the search engine 208 as a search request or to a resource host as a URL request (304), as applicable”) and once the system determines that a query input is final such as after user selection of the suggested queries, it is sent to the server and search results are returned to the user for selection and viewing (executing a function corresponding to the completed query).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Gibbs into the teaching of Tadpatrikar because the references similarly disclose speech and/or query endpointing. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Tadpatrikar to further include the accessing of crowd-sourced query structures as in Gibbs to more accurately characterize the user’s input in order to reduce endpointing errors, which would otherwise result in a bad user experience.
Tadpatrikar, Gibbs fail to disclose “wherein the code that dynamically extends the wait time for processing the … input comprises extending the wait time for processing the … input by a predetermined amount depending on a topic of the … input”
However, Buchholz teaches the above limitation at least by ([0021] “The set of pause types includes, but is not limited to speaker pauses, topic pauses, heading pauses, paragraph pauses, sentence pauses, phrase pauses, word pauses, end pauses and live audio pauses…Topic and speakers pauses are more indicative of spoken dialogues and represent context information. During the course of a conversation, speakers and topics change. These are natural places to introduce pauses in the playback. In the case of TTS, several types of indicia may be used to detect such places. For example, in a Q&A scenario, the question is often differentiated by Q: or speaker initials, e.g., DB, CM, or by typeface change. Thus, a pause would be introduced after a question (speaker change) or after the answer (topic change assuming each question introduces a new topic). In the case of recorded audio, an edit function introduces speaker and topic markers at the discretion of the editor. Those having ordinary skill in the art will appreciate that other methods for identifying topic and speaker pauses may be used” [0024] “an application 502 a provides text data 520 as input to a parser 504.” [0033] “The controller 602, upon receiving a message 650 that a condition adversely affecting the delivery of the transmitted packets has arisen, can decide to insert one or more pauses in the reconstructed audio 640 based on the pause information 652 included in the transmitted packets 630…the controller 602 attempts to insert pauses at the speaker or topic level first, then at the heading, paragraph, or sentence level, and finally at the phrase or word level” [0036] “As note above, the length of pauses inserted may be set to a predetermined length…. In a second embodiment, the length of a pause could be dependent upon the type of pause being inserted. 
For example, word and phrase pauses could be of a relatively short duration; heading, paragraph and sentence pauses could be longer; and speaker and topic pauses could be longer still… Further still, combinations of these three approaches may be mixed.”) and pauses of a predetermined length are inserted into the parsed input data based on the type of pause to be inserted, such as topic pauses, which are of longer duration than those inserted for word, heading, paragraph, or sentence pauses.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Buchholz into the teaching of Tadpatrikar, Gibbs because the references similarly disclose speech and/or query endpointing. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the insertion of pauses depending on topics of the input data as in Buchholz to “accommodate the occurrence of delays in delivering the packets” (Buchholz, [0006]).
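For illustration only, the endpointing behavior quoted above from Tadpatrikar [0022]-[0025] can be sketched as follows. This is a minimal sketch and not code from any cited reference: the function name `find_endpoint`, the data layout, and the lookup-based completeness check are hypothetical. The 300 ms and 800 ms pause lengths and the 1000 ms and 300 ms thresholds come from the quoted paragraphs; the two later pauses are assumed to be exactly 1000 ms, since the reference states only that one-second pauses are detected there.

```python
from typing import Callable, List, Optional, Tuple

# Pause thresholds in milliseconds, per the examples quoted from [0022] and [0024].
PAUSE_THRESHOLD_MS = {"novice": 1000, "expert": 300}

# (word, pause after the word in ms). The 300 and 800 values are quoted;
# the two 1000 values are an assumption (the reference says "one second").
UTTERANCE: List[Tuple[str, int]] = [
    ("text", 300), ("mom", 800), ("love", 1000), ("you", 1000),
]

# Hypothetical stand-in for the "complete query signal": a lookup against
# previously spoken complete utterances.
KNOWN_COMPLETE = {"text mom", "text mom love you"}


def is_complete(transcript: str) -> bool:
    return transcript in KNOWN_COMPLETE


def find_endpoint(
    utterance: List[Tuple[str, int]],
    user_class: str,
    complete: Callable[[str], bool],
) -> Optional[str]:
    """Return the transcript at the first pause that is both long enough
    for the user's class and follows a complete transcription."""
    threshold = PAUSE_THRESHOLD_MS[user_class]
    words: List[str] = []
    for word, pause_ms in utterance:
        words.append(word)
        transcript = " ".join(words)
        # Endpoint only when a class-length pause coincides with a complete query.
        if pause_ms >= threshold and complete(transcript):
            return transcript
    return None  # no endpoint found; keep listening


expert_result = find_endpoint(UTTERANCE, "expert", is_complete)
novice_result = find_endpoint(UTTERANCE, "novice", is_complete)
```

Under these assumptions the expert classification endpoints after “mom” and the novice classification endpoints after “you”, matching transcriptions 163 and 160 as described in the quoted paragraphs.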

Claims 2, 12 are rejected under 35 U.S.C. 103 as being unpatentable over Tadpatrikar (US 2017/0110116) in view of Gibbs (US 2006/0106769) and Buchholz (US 2002/0111812) and further in view of Simko (US 2018/0350395).
As per claim 2, claim 1 is incorporated, Tadpatrikar, Gibbs, Buchholz fail to disclose “wherein the detecting the one or more pause indicators in the query input comprises detecting a filler word contained within the query input”
However, Simko teaches the above limitation at least by ([0095] “This change in training data may incentivize the system to detect any acoustic cues which help indicate whether the user intends to utter more speech. For example, if a user says “um” during a longish pause, the end-of-query classifier has the power (due to the LSTM) and inclination (due to the modified loss function) to remember that acoustic event and decrease the probability of query-complete in subsequent silence frames.”) and the filler word is the word “um”, from the example user speech.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Simko into the teaching of Tadpatrikar, Gibbs, Buchholz because the references similarly disclose user-querying and query completion. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to extend the wait time for completion of the query input when a filler word is detected, as in Simko, in order to “endpoint[ing] at the intended end of the utterance, leading to more accurate or desirable natural language processing outputs, and to faster processing by the natural language processing system” (Simko, [0016]).
Claim 12 recites limitations equivalent to those of the method of claim 2, except that it sets forth the claimed invention as an information handling device; as such, it is rejected for the same reasons as applied hereinabove.

Response to Arguments
The following is in response to the amendment filed on 03/15/22.
Applicant’s arguments with respect to allowability of claims 7, 17 are moot because the examiner discovered a new reference upon updating the search responsive to the amendments made to the instant claims which changed their scope, as explained by the examiner in the response to amendment section herein. Accordingly, the examiner has included new grounds of rejection herein.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM P BARTLETT whose telephone number is (469)295-9085.  The examiner can normally be reached on M-Th 11:30-8:30, F 11-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at (571)272-4046.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/WILLIAM P BARTLETT/
Examiner, Art Unit 2169
/USMAAN SAEED/
Supervisory Patent Examiner, Art Unit 2169