DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on June 5, 2020 is being considered by the examiner.

Claim Objections
Claims 1 and 18 are objected to because of the following informalities: 
In the preamble of claim 1, the phrase “An method” should read “A method”. 
In claim 18, line 3, the punctuation at the end of the clause should be either a comma or a semicolon, not both.
Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-5, 7-11, and 13-17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The independent claims 1, 7 and 13 recite “recognizing voice data inputted by a user; obtaining a voice text corresponding to the voice data; obtaining, based on the voice text, a text to-be-input corresponding to the voice data, wherein the text to-be-input includes a plurality of words constituting a phrase or a sentence; and displaying the text to-be-input in an input textbox of an input interface.”

This judicial exception is not integrated into a practical application. In particular, claims 1, 7, and 13 recite the additional elements of a “terminal”, a “processor”, and a “memory”. For example, paragraph [0067] of the as-filed specification describes using a general purpose computing environment or computing device, as recited in [00117]-[00127]. However, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are therefore directed to an abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than a generic computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the additional limitations in the claims noted above are directed towards insignificant extra-solution activity. As such, the claims are not patent eligible.
With respect to claims 2, 8, and 14, the claims relate to choosing between presenting the voice text or the target text based on the corresponding relationships between other voice texts and target texts. This is equivalent to a decision by a listener of whether two phrases are related. No additional limitations are present. With respect to claims 3, 9, and 15, the claims relate to determining whether there is a corresponding relationship by processing through a matching algorithm and then using the output to determine whether there is a corresponding relationship. This is little more than a mental step using calculations to determine based on known relations

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1, 7, and 13 are rejected under 35 U.S.C. 102 as being anticipated by Joshi (U.S. Pat. App. Pub. No. 2014/0163954, hereinafter Joshi).

Regarding claim 1, Joshi discloses A method, comprising (“method 500”; Joshi, ¶ [0061]): recognizing voice data inputted by a user (“voice input can be received at a device to generate a communication such as a message. In some implementations, input can be entered by a user through a user interface such as a keyboard, virtual keyboard, voice recognition, or the like.”; Joshi, ¶ [0062]); obtaining a voice text corresponding to the voice data (“voice input received at the device can be converted to text input at the device.”; Joshi, ¶ [0062]); wherein the text to-be-input includes a plurality of words constituting a phrase or a sentence (“A predicted-text suggestion (text to-be-input) can include words, phrases, or sentences.”; Joshi, ¶ [0065]); and displaying the text to-be-input in an input textbox of an input interface (FIG 6 displays the predicted text suggestion 660 (text to-be-input) in an input textbox (showing input text 610 positioned above) as part of the user interface on display 670; Joshi, ¶¶ [0069]-[0070], FIG. 6).

Regarding claim 7, Joshi discloses A terminal, comprising (“computing environment 1200” performing the method 500; Joshi, ¶¶ [0061], [0104]); a processor (“computing environment 1200 includes one or more processing units 1210, 1215 and memory 1220, 1225”; Joshi, ¶ [0105]); and a memory configured to store computer instructions executable by the processor, wherein the processor is configured to (“Any of the disclosed methods can be implemented as computer-executable instructions stored on one or more computer-readable storage media”; Joshi, ¶ [0111]): recognize voice data inputted by a user (“voice input can be received at a device to generate a communication such as a message. In some implementations, input can be entered by a user through a user interface such as a keyboard, virtual keyboard, voice recognition, or the like.”; Joshi, ¶ [0062]); obtain a voice text corresponding to the voice data (“voice input received at the device can be converted to text input at the device.”; Joshi, ¶ [0062]); wherein the text to-be-input includes a plurality of words constituting a phrase or a sentence (“A predicted-text suggestion (text to-be-input) can include words, phrases, or sentences.”; Joshi, ¶ [0065]); and display the text to-be-input in an input textbox of an input interface (FIG 6 displays the predicted text suggestion 660 (text to-be-input) in an input textbox (showing input text 610 positioned above) as part of the user interface on display 670; Joshi, ¶¶ [0069]-[0070], FIG. 6).

Regarding claim 13, Joshi discloses A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors of a terminal, cause the terminal to (“Any of the disclosed methods can be implemented as computer-executable instructions stored on one or more computer-readable storage media”; Joshi, ¶ [0111]): recognize voice data inputted by a user (“voice input can be received at a device to generate a communication such as a message. In some implementations, input can be entered by a user through a user interface such as a keyboard, virtual keyboard, voice recognition, or the like.”; Joshi, ¶ [0062]); obtain a voice text corresponding to the voice data (“voice input received at the device can be converted to text input at the device.”; Joshi, ¶ [0062]); wherein the text to-be-input includes a plurality of words constituting a phrase or a sentence (“A predicted-text suggestion (text to-be-input) can include words, phrases, or sentences.”; Joshi, ¶ [0065]); and display the text to-be-input in an input textbox of an input interface (FIG 6 displays the predicted text suggestion 660 (text to-be-input) in an input textbox (showing input text 610 positioned above) as part of the user interface on display 670; Joshi, ¶¶ [0069]-[0070], FIG. 6).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2-4, 6, 8-10, 14-16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Joshi in view of Tomar (U.S. Pat. App. Pub. No. 2018/0358005, hereinafter Tomar).

Regarding claim 2, the rejection of claim 1 is incorporated. Joshi discloses all of the elements of the current invention as stated above. However, Joshi fails to expressly recite wherein obtaining the text to-be-input comprises: determining, based on the voice text and corresponding relationships between different voice texts and target texts, whether the corresponding relationships include a target text that matches the voice text; when the corresponding relationships include the target text that matches the voice text, designating the target text as the text to-be-input; and when the corresponding relationships do not include the target text that matches the voice text, designating the voice text as the text to-be-input.

Tomar teaches a vocal user interface which incorporates predicted intent into speech recognition. (Tomar, ¶¶ [0009], [0010]). Regarding claim 2, Tomar teaches wherein obtaining the text to-be-input comprises: determining, based on the voice text and corresponding relationships between different voice texts and target texts, whether the corresponding relationships include a target text that matches the voice text (“the STI module 107... can correlate feature vectors from an utterance (voice data for the voice text) with a semantic representation corresponding to phrases representing possible actions or intents of the user (referred to throughout as “predicted intent,” the target texts),” where the correlation of the feature vectors is the corresponding relationship between the utterance (voice data for the voice text) and the phrases representing possible actions, and “During usage, the STI module 107 processes feature vectors 103 and maps an utterance (voice text) to one of the pre-defined 'intents' that may correspond to phrases representing possible actions (target text) that the user might want to be performed for a given acoustic input 101,” and producing the “predicted intent 108” which includes the corresponding relationships, as input for the decision fusion module 111. The decision fusion module 111 “makes a final decision in the form of the desired user intent or action 112” using the predicted intent 108 (including confidence) thus determines whether predicted intent 108 (target text) matches the predicted text (voice text).; Tomar, ¶¶ [0045], [0048]); when the corresponding relationships include the target text that matches the voice text, designating the target text as the text to-be-input (“In the decision fusion module 111, the predicted intent 108 from the STI module 107 (target text), and predicted text 110 from the ASR module 109 (voice text) are fused to make a final decision in the form of the desired user intent or action 112... 
[where] the decision fusion module 111 can take into account a confidence in the predicted intent 108 and predicted text 110 to choose the outcome of the more confident system as the final output,” where the confidence in the predicted intent is the determination of whether the predicted intent (target text) matches the intent of the predicted text (voice text). As described in the example of FIG. 4, “If the predicted confidence in the outputs 404 is above the threshold, the decision fusion module 400 outputs the predicted intent or action 406 for the acoustic input 101.” In this case, the predicted intent matches the voice text and is output by the decision fusion module 111 (designated as the Tomar, ¶¶ [0048], [0060]); and when the corresponding relationships do not include the target text that matches the voice text, designating the voice text as the text to-be-input (“Alternatively, if the confidence score of the prediction in the outputs 404 is below the threshold, the decision fusion module 400 can use the ASR system outputs 402 to make a prediction about the user's intended action” or if no match to the voice text “outputs text 413 as a transcription of the acoustic input 101,” designating the predicted text (voice text) as the output (text to-be-input).; Tomar, ¶¶ [0061]-[0062]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for predictive text suggestion of Joshi to incorporate the teachings of Tomar to include wherein obtaining the text to-be-input comprises: determining, based on the voice text and corresponding relationships between different voice texts and target texts, whether the corresponding relationships include a target text that matches the voice text; when the corresponding relationships include the target text that matches the voice text, designating the target text as the text to-be-input; and when the corresponding relationships do not include the target text that matches the voice text, designating the voice text as the text to-be-input. The combination of “a text-independent STI and a speech to text based ASR system” can “produce improved recognition accuracy for a VUI system that can be used to control one or more devices or equipment,” as recognized by Tomar. (Tomar, ¶ [0040]).
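
For illustration only, the branching recited in claim 2 can be sketched in Python as follows. This is a minimal sketch and is not the applicant's or Tomar's implementation: the function name select_text_to_be_input and the use of a dictionary to hold the corresponding relationships are assumptions made for this example, and an exact-match lookup stands in for whatever matching the claim or the reference contemplates.

```python
from typing import Mapping


def select_text_to_be_input(voice_text: str, relationships: Mapping[str, str]) -> str:
    """Return the target text when the relationships contain a match, else the voice text.

    `relationships` is a hypothetical table mapping known voice texts to target texts.
    """
    target_text = relationships.get(voice_text)
    if target_text is not None:
        # The corresponding relationships include a matching target text.
        return target_text
    # No matching target text: the recognized voice text itself is used.
    return voice_text
```

Under these assumptions, a voice text found in the table is replaced by its target text, and any other voice text is passed through unchanged.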

Regarding claim 3, the rejection of claim 2 is incorporated. Joshi and Tomar disclose all of the elements of the current invention as stated above. However, Joshi fails to expressly recite wherein determining whether the corresponding relationships include the target text that matches the voice text comprises: using the voice text and the corresponding relationships between the different voice texts and the target texts as an input of a matching algorithm to obtain an output 

The relevance of Tomar is described above with relation to claim 2. Regarding claim 3, Tomar teaches wherein determining whether the corresponding relationships include the target text that matches the voice text comprises: using the voice text and the corresponding relationships between the different voice texts and the target texts as an input of a matching algorithm to obtain an output of the matching algorithm (“In the decision fusion module 111, the predicted intent 108 from the STI module 107 (target text), and predicted text 110 from the ASR module 109 (voice text) are fused to make a final decision in the form of the desired user intent or action 112... [where] the decision fusion module 111 can take into account a confidence in the predicted intent 108 (the result from the corresponding relationships between the different voice texts and the target texts in the STI module 107)... to choose the outcome of the more confident system as the final output.” As used here, confidence is determined based on the corresponding relationships as incorporated by the “the STI module 107” which is trained to “correlate feature vectors from an utterance with a semantic representation corresponding to phrases representing possible actions or intents of the user (corresponding relationships)” where the predicted intent 108 is a selected possibility from the corresponding relationships and the confidence level for that selected possibility represents the corresponding relationships. 
Thus, the method uses the “phrases representing possible actions or intents of the user” (thus the corresponding relationships between the different voice texts and the target texts) as an input of the STI module 107 to produce the predicted intent 108 and the “confidence in the predicted intent 108” and the predicted text (voice text) are received as input to the decision fusion module 111 to obtain an output of the matching algorithm, where the decision fusion module Tomar, ¶¶ [0045], [0048]); and determining, based on the output of the matching algorithm, whether the corresponding relationships include the target text that matches the voice text (“The decision fusion module 200 outputs the final decision of the system 100 in the form of the desired intent or action 206, a semantic representation 207 of the decoded output, and optionally a text output 208,” thus the method determines, based on the output of the decision fusion module, whether the predicted intent 108 (the target text) selected based on the feature vectors and semantic representations (the corresponding relationships between the different voice texts and the target texts) matches the predicted text (voice text).; Tomar, ¶ [0052]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for predictive text suggestion of Joshi to incorporate the teachings of Tomar to include wherein determining whether the corresponding relationships include the target text that matches the voice text comprises: using the voice text and the corresponding relationships between the different voice texts and the target texts as an input of a matching algorithm to obtain an output of the matching algorithm; and determining, based on the output of the matching algorithm, whether the corresponding relationships include the target text that matches the voice text. The combination of “a text-independent STI and a speech to text based ASR system” can “produce improved recognition accuracy for a VUI system that can be used to control one or more devices or equipment,” as recognized by Tomar. (Tomar, ¶ [0040]).
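
For illustration only, the two steps recited in claim 3 (obtaining an output of a matching algorithm, then determining from that output whether a matching target text exists) can be sketched as below. The sketch is hypothetical: the string-similarity score from Python's standard difflib module is a stand-in chosen for this example and is not the confidence measure of Tomar's STI module or decision fusion module, and the names matching_algorithm, includes_matching_target, and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import Mapping, Optional, Tuple


def matching_algorithm(
    voice_text: str, relationships: Mapping[str, str]
) -> Tuple[Optional[str], float]:
    """Return the best candidate target text and its similarity score (0.0 to 1.0)."""
    best_target: Optional[str] = None
    best_score = 0.0
    for known_voice_text, target_text in relationships.items():
        # Similarity between the recognized voice text and a known voice text.
        score = SequenceMatcher(None, voice_text, known_voice_text).ratio()
        if score > best_score:
            best_target, best_score = target_text, score
    return best_target, best_score


def includes_matching_target(
    output: Tuple[Optional[str], float], threshold: float = 0.8
) -> bool:
    """Decide, from the algorithm output alone, whether a matching target text exists."""
    target_text, score = output
    return target_text is not None and score >= threshold
```

The threshold comparison loosely parallels the above-threshold and below-threshold branches quoted from Tomar with reference to FIG. 4, without purporting to reproduce them.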

Regarding claim 4, the rejection of claim 3 is incorporated. Joshi and Tomar disclose all of the elements of the current invention as stated above. However, Joshi fails to expressly recite further comprising: determining, based on the voice data, non-voice text information of the voice data; and selecting the matching algorithm corresponding to the voice data from a set of matching 

The relevance of Tomar is described above with relation to claim 2. Regarding claim 4, Tomar teaches further comprising: determining, based on the voice data, non-voice text information of the voice data (The method can further include “contextual learning component 203 [to] help the decision fusion module 200... by incorporating contextual information for an acoustic input 101.” where “contextual information [can] include time of day, background acoustics, etc.” where background acoustics are based on the utterance (voice data), and where background acoustics are non-voice which are converted to contextual information, such as “status of radio, status of the music player,” where status is a description, thus including non-voice text information; Tomar, ¶ [0053]); and selecting the matching algorithm corresponding to the voice data from a set of matching algorithms based on the non-voice text information (The system starts using an implementation, such as those shown in FIGS. 1-5. “The decision fusion module 600, using a contextual learning component 603, determines that the remaining acoustic part requires a text transcription. Having determined this, the decision fusion module 600 uses only the ASR system's text output 602 to transcribe the remaining acoustic part ‘I will be ten minutes late’, into a text message 605.” thus the system selects a different “decision fusion module implementation” (matching algorithm) from a set of decision fusion module implementations (see FIGS. 2-6 for example implementations) based in part on the contextual information (non-voice text information). Though the selection is described with reference to FIG. 6, Tomar further teaches that “choose either one or a combination of the outputs of these systems to produce an Tomar, ¶¶ [0065], [0066], FIGS. 
1-6); wherein using the voice text and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm to obtain the output of the matching algorithm comprises using the voice text and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm corresponding to the voice data, to obtain the output of the matching algorithm (As described previously, “In the decision fusion module 111, the predicted intent 108 from the STI module 107 (target text), and predicted text 110 from the ASR module 109 (voice text) are fused to make a final decision in the form of the desired user intent or action 112... [where] the decision fusion module 111 can take into account a confidence in the predicted intent 108 (corresponding relationships between the different voice texts and the target texts)... to choose the outcome of the more confident system as the final output,” thus the method uses the “confidence in the predicted intent 108” (incorporating the feature vectors and semantic representations, thus the corresponding relationships between the different voice texts and the target texts) and the predicted text (voice text corresponding to voice data) as an input of the decision fusion module 111 (matching algorithm) to obtain an output of the decision fusion module 111 (matching algorithm) which may be chosen based on the contextual information (non-voice text information); Tomar, ¶ [0045], [0048]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for predictive text suggestion of Joshi to incorporate the teachings of Tomar to include further comprising: determining, based on the voice data, non-voice text information of the voice data; and selecting the matching algorithm corresponding to the voice data from a set of matching algorithms based on the non-voice text information, wherein using the voice text and the corresponding relationships between the different voice texts and the target texts as the input of the matching Tomar. (Tomar, ¶ [0040]).

Regarding claim 6, the rejection of claim 3 is incorporated. Joshi and Tomar disclose all of the elements of the current invention as stated above. However, Joshi fails to expressly recite the method further comprising: obtaining state information of a terminal to which the user inputs the voice data, wherein using the voice text and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm, to obtain the output of the matching algorithm comprises using the state information of the terminal, the voice text, and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm, to obtain the output of the matching algorithm.

The relevance of Tomar is described above with relation to claim 2. Regarding claim 6, Tomar teaches the method further comprising: obtaining state information of a terminal to which the user inputs the voice data (The method can further include “contextual learning component 203 [to] help the decision fusion module 200... by incorporating contextual information for an acoustic input 101.” where “contextual information [can] include status of a connected device (mobile phone etc.)” thus the state information of a terminal; Tomar, ¶ [0053]); wherein using the voice text and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm (The system displays the transition from a first implementation, such as “one or more of the embodiments described in FIG. 1 to FIG. 5”, to a second implementation based on the contextual information (state information). “The Tomar further teaches that “choose either one or a combination of the outputs of these systems to produce an output” and that FIG. 6 is merely exemplary.; Tomar, ¶¶ [0065], [0066], FIGS. 1-6) to obtain the output of the matching algorithm comprises using the state information of the terminal, the voice text, and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm, to obtain the output of the matching algorithm (“In the decision fusion module 111, the predicted intent 108 from the STI module 107 (target text), and predicted text 110 from the ASR module 109 (voice text) are fused to make a final decision in the form of the desired user intent or action 112... 
[where] the decision fusion module 111 can take into account a confidence in the predicted intent 108 and predicted text 110 to choose the outcome of the more confident system as the final output,” where the confidence in the predicted intent is the determination of whether the predicted intent (target text) matches the intent of the predicted text (voice text). As described in the example of FIG. 4, “the STI system outputs 401,” which correspond to both the predicted intent 108 (target texts) and the confidence (corresponding relationships) “are processed using a contextual learning component 403 to improve the predictions, by taking into account any available contextual information (state information)” and “If the predicted confidence in the outputs 404 is above the threshold, the decision fusion module 400 outputs the predicted intent or action 406 for the acoustic input 101.” thus when the predicted intent confidence is above a threshold (matches the Tomar, ¶¶ [0048], [0060]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for predictive text suggestion of Joshi to incorporate the teachings of Tomar to include wherein the method further comprising: obtaining state information of a terminal to which the user inputs the voice data, wherein using the voice text and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm, to obtain the output of the matching algorithm comprises using the state information of the terminal, the voice text, and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm, to obtain the output of the matching algorithm. The combination of “a text-independent STI and a speech to text based ASR system” can “produce improved recognition accuracy for a VUI system that can be used to control one or more devices or equipment,” as recognized by Tomar. (Tomar, ¶ [0040]).
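
For illustration only, the use of terminal state information as an additional input to the matching algorithm, as recited in claim 6, can be sketched as below. This is a hypothetical sketch: keying the corresponding relationships by a (state, voice text) pair, the state labels, and the function name match_with_state are assumptions for this example and are not drawn from the application or from Tomar's contextual learning component.

```python
from typing import Mapping, Tuple


def match_with_state(
    state: str,
    voice_text: str,
    relationships: Mapping[Tuple[str, str], str],
) -> str:
    """Look up a target text using both the terminal state and the voice text.

    Falls back to the voice text when no entry exists for this state.
    """
    return relationships.get((state, voice_text), voice_text)


# Hypothetical relationships: the same voice text maps differently per terminal state.
example_relationships = {
    ("chat_app", "on my way"): "On my way!",
    ("email_app", "on my way"): "I am on my way.",
}
```

Under these assumptions, the same voice text yields a different text to-be-input depending on the terminal state, and an unknown state yields the voice text itself.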

Regarding claim 8, the rejection of claim 7 is incorporated. Claim 8 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Regarding claim 9, the rejection of claim 8 is incorporated. Claim 9 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Regarding claim 10, the rejection of claim 9 is incorporated. Claim 10 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Regarding claim 14, the rejection of claim 13 is incorporated. Claim 14 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Regarding claim 15, the rejection of claim 14 is incorporated. Claim 15 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Regarding claim 16, the rejection of claim 15 is incorporated. Claim 16 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Regarding claim 18, the rejection of claim 15 is incorporated. Claim 18 is substantially the same as claim 6 and is therefore rejected under the same rationale as above.

Claims 5, 11-12, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Joshi and Tomar as applied to claims 4, 10, and 16 above, and further in view of Fujimoto (U.S. Pat. App. Pub. No. 2005/0144013, hereinafter Fujimoto).

Regarding claim 5, the rejection of claim 4 is incorporated. Joshi and Tomar disclose all of the elements of the current invention as stated above. However, Joshi and Tomar fail to expressly recite wherein the non-voice text information comprises at least one of emotion information, gender information, or age information.

Fujimoto teaches a conversational control apparatus and method. (Fujimoto, ¶ [0003]). Regarding claim 5, Fujimoto teaches wherein the non-voice text information comprises at least one of emotion information, gender information, or age information (The method discloses that “event information flag 840 is … generated based on Fujimoto, ¶¶ [0014], [0104]-[0107]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for predictive text suggestion of Joshi as modified by the predicted intent of the vocal user interface of Tomar to incorporate the teachings of Fujimoto to include wherein the non-voice text information Fujimoto. (Fujimoto, ¶ [0010]).

Regarding claim 11, the rejection of claim 10 is incorporated. Claim 11 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Regarding claim 12, the rejection of claim 11 is incorporated. Joshi, Tomar, and Fujimoto disclose all of the elements of the current invention as stated above. However, Joshi and Fujimoto fail to expressly recite the method further comprising: obtaining state information of a terminal to which the user inputs the voice data, wherein using the voice text and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm, to obtain the output of the matching algorithm comprises using the state information of the terminal, the voice text, and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm, to obtain the output of the matching algorithm.

The relevance of Tomar is described above with relation to claim 2. Regarding claim 12, Tomar teaches the method further comprising: obtaining state information of a terminal to which the user inputs the voice data (The method can further include “contextual learning component 203 [to] help the decision fusion module 200... by incorporating contextual information for an acoustic input 101,” where “contextual information [can] include status of a connected device (mobile phone etc.)” (state information of a terminal); Tomar, ¶ [0053]); wherein using the voice text and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm (The system displays the transition from a first implementation, such as “one or more of the embodiments described in FIG. 1 to FIG. 5”, to a second implementation based on the contextual information (state information). “The decision fusion module 600, using a contextual learning component 603, determines that the remaining acoustic part requires a text transcription. Having determined this, the decision fusion module 600 uses only the ASR system's text output 602 to transcribe the remaining acoustic part ‘I will be ten minutes late’, into a text message 605.” Thus, the system selects a different “decision fusion module implementation” (matching algorithm) from a set of decision fusion module implementations (see FIGS. 2-6 for example implementations) based in part on the contextual information (non-voice text information). Though the selection is described with reference to FIG. 6, Tomar further teaches the ability to “choose either one or a combination of the outputs of these systems to produce an output” and that FIG. 6 is merely exemplary of possible choices; Tomar, ¶¶ [0065], [0066], FIGS. 1-6) to obtain the output of the matching algorithm comprises using the state information of the terminal, the voice text, and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm, to obtain the output of the matching algorithm (“In the decision fusion module 111, the predicted intent 108 from the STI module 107 (target text), and predicted text 110 from the ASR module 109 (voice text) are fused to make a final decision in the form of the desired user intent or action 112... [where] the decision fusion module 111 can take into account a confidence in the predicted intent 108 and predicted text 110 to choose the outcome of the more confident system as the final output,” where the confidence in the predicted intent is the determination of whether the predicted intent (target text) matches the intent of the predicted text (voice text). As described in the example of FIG. 4, “the STI system outputs 401,” which correspond to both the predicted intent 108 (target texts) and the confidence (corresponding relationships), “are processed using a contextual learning component 403 to improve the predictions, by taking into account any available contextual information (state information)” and “If the predicted confidence in the outputs 404 is above the ...”; Tomar, ¶¶ [0048], [0060]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the systems and methods for predictive text suggestion of Joshi, as modified by the predicted intent of the vocal user interface of Tomar and the event information of Fujimoto, to further incorporate the teachings of Tomar such that the method further comprises: obtaining state information of a terminal to which the user inputs the voice data, wherein using the voice text and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm, to obtain the output of the matching algorithm comprises using the state information of the terminal, the voice text, and the corresponding relationships between the different voice texts and the target texts as the input of the matching algorithm, to obtain the output of the matching algorithm. One would have been motivated to do so because the combination of “a text-independent STI and a speech to text based ASR system” can “produce improved recognition accuracy for a VUI system that can be used to control one or more devices or equipment,” as recognized by Tomar (Tomar, ¶ [0040]).

Regarding claim 17, the rejection of claim 16 is incorporated. Claim 17 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Purho (U.S. Pat. App. Pub. No. 2007/0100619) discloses a combined predictive speech and text recognition system.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627.  The examiner can normally be reached on 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SES/Patent Examiner, Art Unit 2657

/Paras D Shah/Primary Examiner, Art Unit 2659
04/29/2021