DETAILED ACTION
This office action is in response to Applicant’s submission filed on 7/23/2020. Claims 1-20 are pending in the application. As such, claims 1-20 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. CN201910672486.6, filed on 7/24/2019.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/19/2020 has been considered by the examiner.

Drawings
The drawings filed on 7/23/2020 have been accepted and considered by the examiner.

Claim Rejections - 35 USC § 101
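35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.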


Claims 1, 6, 12, 14, and 16-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than an abstract idea.
Independent claim 1 recites: “An apparatus comprising: one or more processors; and one or more memories storing thereon computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising: performing speech recognition on an inputted speech to obtain a first text; correcting the first text according to a mapping relationship between words in different languages to obtain at least one second text; and outputting the first text in response to determining that the at least one second text corresponds to a same language.” Also, claim 6 recites: “A method comprising: performing speech recognition on an inputted speech to obtain a first text; correcting the first text according to a mapping relationship between words in different languages to obtain at least one second text; and in response to determining that the at least one second text corresponds to different languages, determining an outputted text according to first probability values corresponding to each of the at least one second text, a respective first probability value representing a probability that the first text is corrected to a respective second text in the at least one second text.” Lastly, claim 18 recites: “One or more memories storing thereon computer-readable instructions that, when executed by one or more processors, cause the one or more 
The limitations of “performing”, “obtaining”, “correcting”, “outputting”, and “determining”, as drafted, cover human activity, and as such they all point to an abstract idea. A human can perform speech recognition by listening attentively to the speech and transcribing it with pen and paper, thereby obtaining the text corresponding to the speech that he or she heard. A human can therefore convert speech to text, correct any mistake that occurred during transcription, and then output the corrected text. Furthermore, a human who understands other languages can do the same across languages, or translate the speech from one language to another just as he or she would transcribe it in the same language. Lastly, a human can calculate the probability of an event when the parameters involved are known. This is like tossing a fair coin 100 times and calculating that the probability of heads or of tails on each toss is 50%, or, in the case of the limitation recited here, “determining an outputted text according to first probability values”. The joint probability of independent events is obtained by multiplying the individual probabilities, and a human can perform that with pen and paper.
This judicial exception is not integrated into a practical application. Even though claims 1 and 18 recite processors, a storage device, or programs, the applicant’s specification as filed relies on execution by a general-purpose computer (or processor); see Par. 0010 of the Applicant’s Specification: “An example embodiment of the present disclosure provides an electronic device comprising a processor and a memory, wherein the memory stores thereon computer-readable instructions which, when executed by the processor, enables the processor to at least execute the speech recognition method described herein.” Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Moreover, the limitations in the claims noted above, taken individually or as an ordered set, do not amount to significantly more than the judicial exception. As such, they are directed to an abstract idea (mental process) as discussed. Thus neither the additional elements nor the limitations, taken individually or as an ordered set, amount to significantly more than the judicial exception. The claims are not patent eligible.

Claim 12 recites: “wherein the determining the outputted text according to the first probability values corresponding to each of the at least one second text comprises: determining that the outputted text is a second text having the largest first probability value.” Calculating the outcome based on a series of first and second probability values is a mental process that a human can carry out with pen and paper. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim 14 is directed toward human activity. It recites: “wherein the determining the outputted text according to the second probability values corresponding to each of the at least one second text comprises: in response to determining that the first text is consistent with a second text having the largest second probability value, outputting the first text; and in response to determining that the first text is inconsistent with the second text having the largest second probability value, outputting the second text having the largest second probability value.” Calculating the probability of an event is a mental process, and, based on predetermined values, a human can accomplish it with pen and paper. Using probability information to decide whether the outputted text is appropriate does not by itself constitute an additional element that is sufficient to integrate the judicial exception into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is directed toward an abstract idea. The claim is not patent eligible.
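For illustration only, the decision recited in claim 14 can be sketched as below. This is a minimal sketch and not the applicant's implementation: the function name `select_output`, the dictionary layout, and the treatment of "consistent" as exact string equality are all assumptions made for the example.

```python
def select_output(first_text, second_probs):
    """Pick the text to output under the rule recited in claim 14.

    first_text   -- the text produced by speech recognition
    second_probs -- hypothetical mapping from each candidate second text
                    to its second probability value
    """
    if not second_probs:
        raise ValueError("at least one second text is required")

    # Candidate second text with the largest second probability value.
    best = max(second_probs, key=second_probs.get)

    # Assumption: "consistent" means the two strings are identical.
    if first_text == best:
        return first_text
    return best
```

Under these assumptions the rule reduces to a single comparison after locating the largest value, which is the kind of step the rejection characterizes as performable with pen and paper.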

Claim 16 is directed toward human activity. It recites: “wherein the determining the outputted text according to the first probability values and the second probability values corresponding to each of the at least one second text comprises: in response to determining that the first text is consistent with a second text having the largest summed probability value, outputting the first text; and in response to determining that the first text is 

Claim 17 is directed toward human activity. It recites: “wherein; the mapping relationship between words in different languages comprises a mapping relationship between words in different dialects of the same language; and the at least one second text corresponds to the same language refers to that the at least one second text corresponds to a same dialect of the same language”. Mapping words of similar meaning in a vector space based on their semantic relationship can be carried out by a human with pen and paper. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is directed toward an abstract idea. The claim is not patent eligible.

Claim 19 is directed toward human activity. It recites: “wherein the acts further comprise: in response to determining that the at least one second text includes a word not corresponding to the first language, determining a target second text according to probability values corresponding to each of the at least one second text; and translating the target second text into the second language”. Translating words into different languages and dealing with an out-of-vocabulary word is something that a human can do based on assumption, probability, and semantic relationship. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is directed toward an abstract idea. The claim is not patent eligible.

Claim 20 is directed toward human activity. It recites: “wherein a respective probability value representing a probability that the first text is corrected to a respective second text in the at least one second text.” Deciding on the correctness of a translation based on the probability value can be done by a human. As mentioned, the probability calculation can be carried out with pen and paper. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is directed toward an abstract idea. The claim is not patent eligible.
	
Therefore, claims 1, 6, 12, 14, and 16-20 are not patent eligible under 35 U.S.C. 101.



Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Arar et al. (US20200098370A1)(hereinafter "Arar").
Note: The functional elements of claims 6 and 18 are identical to those of claim 1; therefore, the claims are mapped together.

Regarding claims 1, 6, and 18, Arar teaches an apparatus, a method, and a memory device comprising one or more processors; and one or more memories storing thereon computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising: (Arar, Par. 0058: “A computer readable signal medium can be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.”, and Par. 0061: “These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus,…”, and Par. 0044: “The workstation shown in FIG. 4 Random Access Memory [RAM] 414, Read Only Memory [ROM] 416, an I/O adapter 418 for connecting peripheral devices ….”).
performing speech recognition on an inputted speech to obtain a first text; (Arar, Par. 0050: “In one embodiment, system 500, provides for phonetic and semantic utterance matches, and multiple languages are processed in a prioritized manner. That is, STT transcription is performed by system 500 with primary, secondary, etc. languages, where the multilingual STT operation is based on a combination of both acoustic and grammar modeling. Based on a user profile, the STT processing transcribes in multiple languages, and NLP is used to match speech to languages referenced and stored in the user's profile. In one embodiment, a score is computed for each word so that system 500 can determine its relevance. Once determined to be above the threshold, processing is carried out to find a match. The context is also analyzed to check for grammatical correctness of the spoken utterance. The processing is refined through the use of an algorithm [e.g., machine learning] to keep refining it and adding more words to the corpus of transcription data.”).
correcting the first text according to a mapping relationship between words in different languages to obtain at least one second text; and (Arar, Par. 0002: “The acoustical relevance includes a second probability score. Upon determining that the second probability score is below a second threshold, it is determined whether a match for the problem word exists in the secondary language corpus. Upon determining that the match exists in the secondary language corpus, a second transcription for the utterance is provided.”).
outputting the first text in response to determining that the at least one second text corresponds to a same language. (Arar, Par. 0052:”FIG. 7 illustrates a block diagram of a process bilingual [or multilingual beyond bilingual] STT transcription, according to one embodiment. In block 710, process 700 obtains a default language corpus, e.g., an English language transcription corpus, etc., stored in a system [e.g., computing node 10, FIG. 1, processing system 300, FIG. 3, system 400, FIG. 4, system 500, FIG. 5, etc.]. In block 720, process 700 determines the default language [e.g., based on the corpus, based on geographic location, based on other local users, etc.] and a second language preference [e.g., based on a profile, based on probabilities of one language used in conjunction with another language based on a set of users and their associated spoken languages, etc.]. In block 730 process 700 obtains a second language corpus [e.g., stored in a system [e.g., computing node 10, FIG. 1, processing system 300, FIG. 3, system 400, FIG. 4, system 500, FIG. 5, etc.] based on the second language preference. In block 740 process 700 receives a first transcription of an utterance [e.g., speech, text, etc.] using the default language corpus and NLP. In block 750 process 700 determines at least one problem word [e.g., a word that does not fit within the context of neighboring words, etc.] in the first transcription based on an associated grammatical relevance to neighboring words in the first transcription [where the grammatical relevance comprises a first probability score]. In block 760 upon determining that the first probability score is below a first threshold, process 700 performs an acoustic lookup for an audible match for the problem word in the first transcription based on an associated acoustical relevance, wherein the acoustical relevance comprises a second probability score. 
In block 770 upon determining that the second probability score is below a second threshold, process 700 determines whether a match for the problem word exists in the secondary language corpus. Upon determining that the match exists in the secondary language corpus, process 700 provides a second transcription for the utterance.”).
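For illustration only, the two-threshold flow quoted from blocks 750 through 770 of Arar can be sketched as below. This is a minimal sketch under stated assumptions, not Arar's implementation: the function name `resolve_word`, the default threshold values, and the representation of the secondary language corpus as a dictionary are hypothetical. The quoted passage does not state what happens when a score meets its threshold or when no match is found, so the sketch assumes the original word is kept in those cases.

```python
def resolve_word(word, grammar_score, acoustic_score, secondary_corpus,
                 first_threshold=0.5, second_threshold=0.5):
    """Return the transcription to use for a single word.

    grammar_score    -- first probability score (grammatical relevance)
    acoustic_score   -- second probability score (acoustical relevance)
    secondary_corpus -- hypothetical dict mapping a problem word to its
                        match in the secondary language corpus
    """
    # Blocks 750/760: only a word scoring below the first threshold is
    # treated as a problem word and sent to the acoustic lookup.
    if grammar_score >= first_threshold:
        return word

    # Block 770: the secondary corpus is consulted only when the second
    # score is also below its threshold.
    if acoustic_score >= second_threshold:
        return word

    # A match in the secondary corpus yields the second transcription;
    # keeping the word when there is no match is an assumption.
    return secondary_corpus.get(word, word)
```

For example, with a hypothetical corpus entry mapping "grassy as" to "gracias", `resolve_word("grassy as", 0.1, 0.2, {"grassy as": "gracias"})` returns "gracias".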



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
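3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.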


Claims 2, 5, 7, 10-15, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Arar, and in further view of Waibel et al. (US20090281789A1) (hereinafter "Waibel").

Regarding claims 2 and 7, Arar fails to explicitly disclose, but Waibel teaches, acquiring a speech sample containing a plurality of languages; (Waibel, Par. 0033: “In this example the system operates between two languages La and Lb. This is the typical implementation of a speech-to-speech dialog system involving speech-to-speech translation in both directions, from La to Lb and from Lb to La.”, and Par. 0037: “… In this example, the device 13 displays the text of audio input of a La and corresponding text in window 15. Machine translation of text La in the second language Lb is displayed in window 16.”).
performing speech recognition on the speech sample to obtain a plurality of text candidates; (Waibel, Par. 0041:” To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] as determined by the ASR model, 2 or 9, and the MT module, 3 or 8, spoken output [item 26] is generated via loud speakers 5 or 6, via TTS modules 4 or 7 [step 33].” and Par. 0046:” The user can also correct the speech recognition or machine translation utterance, by re-speaking it or entering the sentence via a keyboard or handwriting interface. Alternatively a user can highlight an erroneous segment in the output hypothesis … The user can also select an erroneous segment in the output hypothesis ….Here they are applied to the speech recognition and translation modules of interactive speech translation systems.”, and Par. 0047:” If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”).
forming a training sample from annotated texts corresponding to the plurality of text candidates and the speech sample; and training a machine translation model using the training sample. (Waibel, Par. 0009:” … it also involves manual tagging and editing of entries, collection of extensive databases involving the required word, retraining of language model and translation model probabilities and re-optimization of the entire system, so as to re-establish the consistency between all the components and components' dictionaries and to restore the statistical balance between the words, phrases and concepts in the system [probabilities have to add up to 1, and thus all words would be affected by a single word addition].”, and Par. 0076:” If the user enters a word that does not match any of the pre-defined classes within the unknown’ class. For ASR, the ‘unknown’ class is defined by words that occurred in the training data but not in the recognition lexicon. For SMT bilingual entries that do not occur in the translation lexicon are set to the unknown tag in the target language model.”, and Par. 0102:” Class-based models for a statistical machine translation framework can be trained using the procedure shown in FIG. 10. First, the training corpora of sentence pairs are normalized [step 100] and tagging models [FIG. 3, model 22] are used to tag the corpora [step 101]. One approach to do this is described in Lafferty01. In this step, sentences that combine to form a training-pair can be tagged independently, tagged jointly, or tags from one language can be projected to the other. After the entire training corpus is tagged, words within sentence-pairs are aligned [step 102]…. In this step, multi-word phrases within a tagged entity [i.e. “New York”] are treated as a single token. Next, phrases are extracted [step 103] using methods such as Koehn07 to generate class-based translation models [FIG. 3, model 23]. 
The tagged corpus is also used to train class-based target language models [FIG. 3, model 24]. Training may be accomplished …”, and Par. 0116:” In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Waibel to acquire a speech sample containing a plurality of languages; performing speech recognition on the speech sample to obtain a plurality of text candidates; forming a training sample from annotated texts corresponding to the plurality of text candidates and the speech sample; and 



Regarding claim 5, and 10 Arar fails to explicitly disclose, however, Waibel teaches acquiring corpus samples corresponding to each of the plurality of languages; training a language model using the corpus samples corresponding to each of the plurality of languages; and initializing parameters of the machine translation model according to parameters of the language model. (Waibel, Par. 0102:” Class-based models for a statistical machine translation framework can be trained using the procedure shown in FIG. 10. First, the training corpora of sentence pairs are normalized [step 100] and tagging models [FIG. 3, model 22] are used to tag the corpora [step 101]… In this step, sentences that combine to form a training-pair can be tagged independently, tagged jointly, or tags from one language can be projected to the other. After the entire training corpus is tagged, words within sentence-pairs are aligned [step 102]… In this step, multi-word phrases within a tagged entity [i.e. “New York”] are treated as a single token. Next, phrases are extracted [step 103] using methods such as Koehn07 to generate class-based translation models [FIG. 3, model 23]. The tagged corpus is also used to train class-based target language models [FIG. 3, model 24]. Training may be accomplished…”, and Par. 0103:” To translate an input sentence the method illustrated in FIG. 11 is applied. First, the input sentence is normalized [step 105] and tagged [step 106] using a similar procedure as that applied to the training corpora. The input sentence is tagged using a monolingual tagger [FIG. machine translation decoding is performed using the same procedure used in standard statistical machine translation, However, phrase-pairs are matched at the class-level…”, and Par. 
0116:” In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.”, and Par. 0119:” To realize effective class-based SMT, accurate and consistent tagging across sentence-pairs is vital. We investigated two approaches to improve tagging quality; first, the introduction of bilingual features from word-alignment; and second, bilingual tagging, where both sides of a sentences-pair are jointly tagged. From the parallel training corpora 14,000 sentence-pairs were manually tagged using the 16 class labels indicated in Table 2.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Waibel to acquire corpus samples corresponding to each of the plurality of languages; train a language model using the corpus samples corresponding to each of the plurality of languages; and initialize parameters of the machine translation model according to parameters of the language model, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Regarding claim 11, Arar fails to explicitly disclose, but Waibel teaches, predicting the first probability values using the machine translation model. (Waibel, Par. 0041: “To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] as determined by the ASR model, 2 or 9 ….”, and Par. 0106: “Search is performed to find the translation hypothesis with maximum likelihood P[f_1^J|e_1^I]·P[e_1^I] given the translation model probability P[f_1^J|e_1^I] [FIG. 3, model 23] and the MT class-based language model probability P[e_1^I] [FIG. 3, model 24].”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Waibel to predict the first probability values using the machine translation model, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Regarding claim 12, Arar fails to explicitly disclose, but Waibel teaches, wherein the determining the outputted text according to the first probability values corresponding to each of the at least one second text comprises: determining that the outputted text is a second text having the largest first probability value. (Waibel, Par. 0041: “To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] as determined by the ASR model, 2 or 9…”, and Par. 0047: “If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0106: “Search is performed to find the translation hypothesis with maximum likelihood P[f_1^J|e_1^I]·P[e_1^I] given the translation model probability P[f_1^J|e_1^I] [FIG. 3, model 23] and the MT class-based language model probability P[e_1^I] [FIG. 3, model 24].”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Waibel to determine that the outputted text is a second text having the largest first probability value, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).
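For illustration only, the maximum-likelihood search quoted from Waibel Par. 0106 can be sketched as a selection of the hypothesis with the largest product of translation model probability and language model probability. This is a minimal sketch and not Waibel's decoder: the function name `best_hypothesis`, the tuple layout, and the example probabilities are hypothetical, and an exhaustive scan over a short candidate list stands in for the search.

```python
def best_hypothesis(hypotheses):
    """Return the hypothesis text with the largest tm_prob * lm_prob.

    hypotheses -- list of (text, tm_prob, lm_prob) tuples, where tm_prob
                  stands in for the translation model probability and
                  lm_prob for the language model probability.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    # Score each candidate by the product of the two probabilities and
    # keep the text of the highest-scoring one.
    text, _, _ = max(hypotheses, key=lambda h: h[1] * h[2])
    return text


# Made-up values: the products are 0.10 and 0.20 respectively.
candidates = [("candidate A", 0.5, 0.2), ("candidate B", 0.4, 0.5)]
selected = best_hypothesis(candidates)
```

With these made-up values, `selected` is "candidate B", because 0.4 × 0.5 exceeds 0.5 × 0.2.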

Regarding claim 13 Arar fails to explicitly disclose, however, Waibel teaches wherein the determining the outputted text according to the first probability values corresponding to each of the at least one second text comprises: inputting the at least one second text into the language model to determine second probability values corresponding to each of the at least one second text using the language model, (Waibel, Par. 0017:”In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”… Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word.”, Par. 
0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa,Sb] we search for the label-sequence-pair [Ta,Th] that maximizes the joint maximum conditional probability …”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language. “).
add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:” Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa,Sb] we search for the label-sequence-pair [Ta,Th] that maximizes the joint maximum conditional probability…”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.”)
a respective second probability value representing a reasonableness of grammar and semantics of the respective second text; and (Waibel, Par. 0016:”In an embodiment, four primary features equip the system to provide a field maintainable class-based speech-to-speech translation system. The first includes a speech translation framework that enables the addition of new words to the active system vocabulary, or the switching between location or task specific vocabularies. This provides for dynamic addition of words to a speech recognition module without requiring the module to be re-started. The system uses multilingual system-dictionary and language independent word-classes across all system components in the speech-to-speech translation device, class-based machine-translation [phrase-based statistical MT, syntactic, example-based, etc.], multilingual word-class tagging during model training, based on combination of monolingual taggers, and word-class tagging in new language by way of alignment via parallel corpus from known tagged language. Second, a multimodal interactive interface enables non-experts to add new words to the system. Third, the system is designed to accommodate ASR and SMT model adaptation using multimodal feedback provided by the user. And fourth, the system has networking capability to enable sharing of corrections or words.”, and Par. 0051:”class [i.e. semantic or syntactic class of the new entry]”, and Par. 0099:”… Alignment is performed word-to-word; translation examples, or phrase-pairs are matched at the word level; and word-based language models are applied. Hierarchical translation modules such as those in Chiang05, and syntax-based translation models such as in Yamada02, extend on this by introducing intermediate structure. However, these approaches still require exact word matches. As each word is treated as a separate entity, these models do not generalize to unseen words.”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. 
Furthermore, as the word-class is known [in this example “@PLACE.city”] the system is able to select better translations for surrounding words and will order the words in the translation output correctly.”).
determining the outputted text according to the second probability values corresponding to each of the at least one second text. (Waibel, Par. 0017:”In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. “, and Par. 0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa,Sb] we search for the label-sequence-pair [Ta,Th] that maximizes the joint maximum conditional probability…”, and Par. 
0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.“).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Waibel to input the at least one second text into the language model to determine second probability values corresponding to each of the at least one second text using the language model, wherein the language model is obtained through training by using the corpus samples corresponding to each of the plurality of languages, a respective second probability value representing a reasonableness of grammar and semantics of the respective second text; and to determine the outputted text according to the second probability values corresponding to each of the at least one second text, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Regarding claim 14, Arar fails to explicitly disclose, however, Waibel teaches wherein the determining the outputted text according to the second probability values corresponding to each of the at least one second text comprises: in response to determining that the first text is consistent with a second text having the largest second probability value, outputting the first text; and in response to determining that the first text is inconsistent with the second text having the largest second probability value, outputting the second text having the largest second probability value. (Waibel, Par. 0041:”To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] [consistent] as determined by the ASR model, 2 or 9, and the MT module, 3 or 8, spoken output [item 26] is generated via loud speakers 5 or 6, via TTS modules 4 or 7 [step 33]. Otherwise [inconsistent], the system indicates that the translation may be wrong via the GUI, audio and/or tactical feedback. The specific TTS module used in step 33 is selected based on the output language.”, and Par.0042:”Thereafter, if the user is dissatisfied with the generated translation, the user may intervene during the speech-to-speech translation process in any of steps from 27 to 33 or after process has completed. This invokes the Correction and Repair Module 11 at [step 35]. The correction and repair module 11 records and logs any corrections the user may make, which can be later used to update ASR modules 2 and 9 and MT modules 3 and 8 as described in detail further below in this document.”, and Par. 0047:” If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. 
Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0106:” Search is performed to find the translation hypothesis with maximum likelihood P[fJ1|eI1]·P[eI1] given the translation model probability P[fJ1|eI1] [FIG. 3, model 23] and the MT class-based language model probability P[eI1] [FIG. 3, model 24].”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. “).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Waibel to output the first text in response to determining that the first text is consistent with a second text having the largest second probability value, and to output the second text having the largest second probability value in response to determining that the first text is inconsistent with the second text having the largest second probability value, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Regarding claim 15, Arar fails to explicitly disclose, however, Waibel teaches wherein the determining the outputted text according to the first probability values corresponding to each of the at least one second text comprises: inputting the at least one second text into the language model to determine second probability values corresponding to each of the at least one second text using the language model, (Waibel, Par. 0017:”In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”… Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word.”, Par. 0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. 
Specifically, for the sentence-pair [Sa,Sb] we search for the label-sequence-pair [Ta,Th] that maximizes the joint maximum conditional probability …”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language. “).
wherein the language model is obtained through training by using the corpus samples corresponding to each of the plurality of languages, (Waibel, Par. 0017:”In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:” Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa,Sb] we search for the label-sequence-pair [Ta,Th] that maximizes the joint maximum conditional probability…”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.”)
a respective second probability value representing a reasonableness of grammar and semantics of the respective second text; and (Waibel, Par. 0016:”In an embodiment, four primary features equip the system to provide a field maintainable class-based speech-to-speech translation system. The first includes a speech translation framework that enables the addition of new words to the active system vocabulary, or the switching between location or task specific vocabularies. This provides for dynamic addition of words to a speech recognition module without requiring the module to be re-started. The system uses multilingual system-dictionary and language independent word-classes across all system components in the speech-to-speech translation device, class-based machine-translation [phrase-based statistical MT, syntactic, example-based, etc.], multilingual word-class tagging during model training, based on combination of monolingual taggers, and word-class tagging in new language by way of alignment via parallel corpus from known tagged language. Second, a multimodal interactive interface enables non-experts to add new words to the system. Third, the system is designed to accommodate ASR and SMT model adaptation using multimodal feedback provided by the user. And fourth, the system has networking capability to enable sharing of corrections or words.”, and Par. 0051:”class [i.e. semantic or syntactic class of the new entry]”, and Par. 0099:”… Alignment is performed word-to-word; translation examples, or phrase-pairs are matched at the word level; and word-based language models are applied. Hierarchical translation modules such as those in Chiang05, and syntax-based translation models such as in Yamada02, extend on this by introducing intermediate structure. However, these approaches require exact word matches. As each word is treated as a separate entity, these models do not generalize to unseen words.”, and Par. 
0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. Furthermore, as the word-class is known [in this example “@PLACE.city”] the system is able to select better translations for surrounding words and will order the words in the translation output correctly.”).
determining the outputted text according to the first probability values and the second probability values corresponding to each of the at least one second text. (Waibel, Par. 0017:”In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0047:”If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. 
This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. “, and Par. 0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa,Sb] we search for the label-sequence-pair [Ta,Th] that maximizes the joint maximum conditional probability…”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.“).
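Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Waibel to input the at least one second text into the language model to determine second probability values corresponding to each of the at least one second text using the language model, wherein the language model is obtained through training by using the corpus samples corresponding to each of the plurality of languages, a respective second probability value representing a reasonableness of grammar and semantics of the respective second text; and to determine the outputted text according to the first probability values and the second probability values corresponding to each of the at least one second text, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).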

Regarding claim 19 Arar fails to explicitly disclose, however, Waibel teaches in response to determining that the at least one second text includes a word not corresponding to the first language, (Waibel, Par. 0067:”In the interface, the user may select a ‘new-word’ mode from the menu, or the new word learning mode could be invoked after a user correction has yielded a new/unknown word. In the window pane that appears he/she can now type the desired new word, name, special term, concept, expression. Based on the orthographic input in the user's language [this can be character sets different from English, e.g., Chinese, Japanese, Russian, etc.]. The system then generates a transliteration in Roman alphabet and the words predicted pronunciation. This is done by conversion rules that are either hand written or extracted from preexisting phonetic dictionaries or learned from transliterated speech data. The user then views the automatic conversion and can play the sound of the generated pronunciation via TTS. The user may iterate and modify either of these representations [script, Romanized transliteration, phonetic transcription, and its sound in either language] and the other corresponding entries will be regenerated similarly [thus a modified transcription in one language may modify the transcription in the other].”, and Par. 0076:”If the user enters a word not match any of the pre-defined classes within the system, the user can assign it to the ‘unknown’ class. For ASR, the ‘unknown’ class is defined by words that occurred in the training data but not in the recognition lexicon. For SMT bilingual entries that do not occur in the translation lexicon are set to the unknown tag in the target language model.”).
determining a target second text according to probability values corresponding to each of the at least one second text; and translating the target second text into the second language. (Waibel, Par. 0100:”One embodiment of class-based machine translation is class-based statistical machine translation, in which a foreign language sentence fJ1=f1, f2, . . . , fJ is translated into another language eI1=e1, e2, . . . , eI by searching for the hypothesis  ̂eI1 with maximum likelihood, given:

 
[Equation image: media_image1.png]


Classes can be semantic classes, such as named-entities, syntactic classes or classes consisting of equivalent words or word phrases. As an example we describe the case when named-entity classes are incorporated into the system.”, and Par. 0101:”The two most informative models applied during translation are the target language model P[eI1] and the translation model P[fJ1|eI1]. In a class-based statistical machine translation framework P[fJ1|eI1] is a class-based translation model [FIG. 3, model 23], and P[eI1] is a class-based language model [FIG. 3, model 24].”, and Par. 0102:”Class-based models for a statistical machine translation framework can be trained using the procedure shown in FIG. 10. First, the training corpora of sentence pairs are normalized [step 100] and tagging models [FIG. 3, model 22] are used to tag the corpora training-pair can be tagged independently, tagged jointly, or tags from one language can be projected to the other. After the entire training corpus is tagged, words within sentence-pairs are aligned [step 102]….In this step, multi-word phrases within a tagged entity [i.e. “New York”] are treated as a single token. Next, phrases are extracted [step 103] using methods such as Koehn07 to generate class-based translation models [FIG. 3, model 23]. The tagged corpus is also used to train class-based target language models [FIG. 3, model 24]. Training may be accomplished”, and Par. 0110:”In an embodiment, a labeled parallel corpora is obtained by independently tagging each side of the training corpora with monolingual taggers and then removing inconsistent labels from each sentence-pair. In this approach, for each sentence-pair [Sa,Sb] the label-sequence-pair [Ta,Tb] is selected which has maximum conditional probabilities P[Ta,Sa] and P[Tb,Sb]. If the occurrence count of any class-tag differs between P[Ta,Sa] and P[Tb,Sb], that class-tag is removed from the label-sequence-pair [Ta,Th].”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Waibel to, in response to determining that the at least one second text includes a word not corresponding to the first language, determine a target second text according to probability values corresponding to each of the at least one second text and translate the target second text into the second language, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0046:”The user can also select an erroneous segment in the output hypothesis via the touch screen and correct it by selecting a competing hypothesis in an automatically generated drop-down list, or …. Here they are applied to the speech recognition and translation modules of interactive speech translation systems.”, and Par. 0047:”If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0068:”The system further automatically selects the most likely word class that the new word belongs to based on co-occurrence statistics of other words [with known class] in similar sentence contexts.”, and Par. 0078:”In addition to the above five entries, an intra-class probability P[w|C] is also defined. In this fashion it is possible for the system to differentiate between words belonging to the same class. 
Thus words that are closer to the user's tasks, preferences and habits will be preferred and a higher intra-class probability assigned. This boosting of higher intra-class probability is determined based on relevance to the user, where relevance is assessed by observing…”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Waibel such that a respective probability value represents a probability that the first text is corrected to a respective second text in the at least one second text, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).


Claims 3 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Arar and Waibel as applied to claims 2 and 7, respectively, and in further view of Freitag et al. (US20210019373A1) (hereinafter "Freitag").

Regarding claims 3 and 8, Arar and Waibel fail to explicitly disclose, however, Freitag teaches inputting the first text into the machine translation model; and correcting the first text using the machine translation model. (Freitag, Par. 0092:” In some implementations, a method implemented by one or more processors is provided that includes processing a first instance of text in a target language using a multilingual automatic post-editing model to generate first edited text, wherein the first instance of text in the target language is generated using a neural machine translation model translating a first source language to the target language, wherein the multilingual automatic post-editing model is used in correcting one or more translation errors introduced by the neural machine translation model, and wherein the multilingual post-editing model is trained for use in correcting translation errors in the target language translated from any one of a plurality of source languages.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar and Waibel in view of Freitag to input the first text into the machine translation model and correct the first text using the machine translation model, in order to improve the accuracy and/or robustness of edited translated text generated using an APE (automatic post-editing) model trained on such training instances, as evidenced by Freitag (see Par. 0010).

Claims 4 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Arar and Waibel as applied to claims 2 and 7, respectively, and in further view of Scaria et al. (US20200126544A1) (hereinafter "Scaria").

Regarding claims 4 and 9, Arar and Waibel fail to explicitly disclose, however, Scaria teaches wherein the machine translation model is composed of an encoder and a decoder; and (Scaria, Par. 0033:” Deep neural network [DNN] Decoder 312, can include a deep neural network architecture including a plurality of fully connected computational nodes, or include Encoder 306, including 1-D convolutional layers and fully connected layers in a LSTM configuration.”).
the encoder or the decoder include any one of the following neural network models: a recurrent neural network model, a long short-term memory network model, and a bidirectional long short-term memory network model. (Scaria, Par. 0032:” ... The fully connected layers are connected in a long short-term memory [LSTM] configuration to permit DNN 306 to determine Interlingua language words and phrases by processing spoken language input 302 including includes multiple text data words. Text data natural language input 302 can include multiple text data words corresponding to multiple time samples within spoken language input 302. DNN 306 determines an Interlingua language output 308 based on context formed by the relative position of words in text data text data natural language input 302 based on language token 304. An example of a deep neural network including 1-D convolution and LSTM layers is the Multilingual Neural Machine Translation System, a system developed by Google, Inc., Mountain View, Calif.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar and Waibel in view of Scaria such that the machine translation model is composed of an encoder and a decoder, and the encoder or the decoder includes any one of the following neural network models: a recurrent neural network model, a long short-term memory network model, and a bidirectional long short-term memory network model, in order to improve an NLU system which can determine an Interlingua language response to translate the Interlingua language response into one of a plurality of natural languages, as evidenced by Scaria (see Par. 0030).


Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Arar and Waibel as applied to claim 15, and in further view of Gonzalez-Dominguez et al. (US20150364129A1) (hereinafter "Gonzalez-Dominguez").

Regarding claim 16, Arar fails to explicitly disclose, however, Waibel teaches in response to determining that the first text is consistent with a second text having the largest summed probability value, outputting the first text; and (Waibel, Par. 0017:”… [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”…Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0047:”If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. 
Specifically, for the sentence-pair [Sa, Sb] we search for the label-sequence-pair [Ta, Th] that maximizes the joint maximum conditional probability…”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language. “). Note, since each probability are the highest the sum will also be highest.
in response to determining that the first text is inconsistent with the second text having the largest summed probability value (Waibel, Par. 0041:”To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] [consistent] as determined by the ASR model, 2 or 9, and the MT module, 3 or 8, spoken output [item 26] is generated via loud speakers 5 or 6, via TTS modules 4 or 7 [step 33]. Otherwise [inconsistent], the system indicates that the translation may be wrong via the GUI, audio and/or tactical feedback. The specific TTS module used in step 33 is selected based on the output language.”, and Par. 0042:”Thereafter, if the user is dissatisfied with the generated translation, the user may intervene during the speech-to-speech translation process in any of steps from 27 to 33 or after process has completed. This invokes the Correction and Repair Module 11 at [step 35]. The correction and repair module 11 records and logs any corrections the user may make, which can be later used to update ASR modules 2 and 9 and MT modules 3 and 8 as described in detail further below in this document.”, and Par. 0047:”If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0106:”Search is performed to find the translation hypothesis with maximum likelihood P[fJ1|eI1]·P[eI1] given the translation model probability P[fJ1|eI1] [FIG. 3, model 23] and the MT class-based language model probability P[eI1] [FIG. 3, model 24].”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word.”).
outputting the second text having the largest summed probability value (Waibel, Par. 0017:”… [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”… [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0047:”If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa, Sb] we search for the label-sequence-pair [Ta, Th] that maximizes the joint maximum conditional probability…”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.”). Note: since each probability is the highest, the sum will also be the highest.
wherein [[a respective summed probability value of the respective second text represents a weighted sum]] of the respective first probability value and the respective second probability value corresponding to the respective second text. (Waibel, Par. 0041:”To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] [consistent] as determined by the ASR model, 2 or 9, and the MT module, 3 or 8, spoken output [item 26] is generated via loud speakers 5 or 6, via TTS modules 4 or 7 [step 33]. Otherwise [inconsistent], the system indicates that the translation may be wrong….”, and Par. 0042:”Thereafter, if the user is dissatisfied with the generated translation, the user may intervene during the speech-to-speech translation process in any of steps from 27 to 33 or after process has completed. This invokes the Correction and Repair Module 11 at [step 35].”, and Par. 0047:”If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0106:”Search is performed to find the translation hypothesis with maximum likelihood P[fJ1|eI1]·P[eI1] given the translation model probability P[fJ1|eI1] [FIG. 3, model 23] and the MT class-based language model probability P[eI1] [FIG. 3, model 24].”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word.”).

Arar and Waibel fail to explicitly disclose, however, Gonzalez-Dominguez teaches a respective summed probability value of the respective second text represents a weighted sum (Gonzalez-Dominguez, Par. 0009:”For example, the method further includes receiving, from the particular one of the multiple speech recognizers, a preliminary language model confidence score that indicates a preliminary level of confidence that a language model has in the preliminary transcription of the utterance in a language corresponding to the language model; and determining that the preliminary language model confidence score is less than a language model confidence score received from the particular one of the multiple speech recognizers. Providing the speech data to a language identification module includes providing the speech data to a neural network that has been trained to provide likelihood scores for multiple languages. Selecting the language based on the language identification scores and the language model confidence scores includes determining a combined score for each of multiple languages, wherein the combined score for each language is based on at least the language identification score for the language and the language model confidence score for the language; and selecting the language based on the combined scores. Determining a combined score for each of multiple languages includes weighting the likelihood scores or the language model confidence scores using one or more weighting values. 
Receiving the speech data includes receiving speech data that includes an utterance of a user; further including before receiving the speech data, receiving data indicating multiple languages that the user speaks; storing data indicating the multiple languages that the user speaks; wherein providing the speech data to multiple speech recognizers that are each configured to recognize speech in a different language includes based on the stored data indicating the multiple languages that the user speaks, providing the speech data to a set of speech recognizers configured to recognize speech in a different one of the languages that the user speaks.”, and Par. 0043:”In some configurations, a weighting may be applied to each component confidence score 126 and 120. This may be desirable, for example, if empirical testing [e.g., for a single user 102, for a class of users, for all users] shows that a particular weighting gives more favorable results. For example, language model scores 120 may be slightly more or slightly less predictive of the language spoken than output of the language identification model, and may accordingly be given a slightly higher or lower weight.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar and Waibel in view of Gonzalez-Dominguez such that a respective summed probability value of the respective second text represents a weighted sum.
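
For illustration only, the comparison recited in claim 16 can be sketched as follows. This sketch is not taken from the application, Arar, Waibel, or Gonzalez-Dominguez. The function names, the weights `w1` and `w2`, the example probability values, and the treatment of "consistent" as exact string equality are all assumptions made for this example.

```python
from typing import Dict, Tuple


def summed_probability(p1: float, p2: float, w1: float = 0.5, w2: float = 0.5) -> float:
    """Weighted sum of a first and a second probability value.

    The weights w1 and w2 are hypothetical; the claim does not fix their values.
    """
    return w1 * p1 + w2 * p2


def select_output(
    first_text: str,
    candidates: Dict[str, Tuple[float, float]],
    w1: float = 0.5,
    w2: float = 0.5,
) -> Tuple[str, bool]:
    """Pick the text to output and report whether the first text was consistent.

    candidates maps each second text to its (first probability, second probability).
    Consistency is assumed here to mean exact string equality.
    """
    # Second text with the largest weighted (summed) probability value.
    best = max(candidates, key=lambda t: summed_probability(*candidates[t], w1, w2))
    consistent = first_text == best
    # Consistent: output the first text. Inconsistent: output the best second text.
    return (first_text if consistent else best), consistent


# Hypothetical example values.
example = {
    "turn on the light": (0.6, 0.9),
    "turn on the lite": (0.7, 0.2),
}
output_text, was_consistent = select_output("turn on the lite", example)
```

With equal weights, the first candidate has the larger weighted sum, so the recognized text "turn on the lite" is treated as inconsistent and the first candidate is output instead.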


Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Arar as applied to claim 6, and in further view of Miodrag Potkonjak (US20100324894A1) (hereinafter "Potkonjak").

Regarding claim 17, Arar fails to explicitly disclose; however, Potkonjak teaches wherein: the mapping relationship between words in different languages comprises a mapping relationship between words in different dialects of the same language; and the at least one second text corresponds to the same language refers to that the at least one second text corresponds to a same dialect of the same language. (Potkonjak, Par. 0009:”This disclosure is drawn to methods, apparatus, systems and computer program products related to voice to text to voice processing. An audio signal comprising spoken voice can be processed such that the signal conforms to a set of specified constraints and objectives. The audio signal can be preprocessed and translated into text prior to being analyzed and reorganized in the textual domain. The resulting text can then be converted into a new voice format where additional processing may be conducted. The voice to text to voice processing may translate the voice content from a specific language to the same language with improved clarity, corrected grammar, adjusted vocabulary level, corrected slang, altered dialect, altered accent, or other…”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Potkonjak such that the mapping relationship between words in different languages comprises a mapping relationship between words in different dialects of the same language, and the at least one second text corresponds to the same language refers to that the at least one second text corresponds to a same dialect of the same language, in order to improve human communications over both wired and wireless devices and infrastructure, as evidenced by Potkonjak (See Par. 0010).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Zou et al. (US-20180336900A1) teaches, at Par. 0004:”As speech technologies develop, speech transcription from speech to a corresponding text gradually prevails in daily life. However, the current speech transcription technique can only recognize and transcribe speech in the current language, for example, a corresponding transcription result of one mandarin speech is a text of Chinese characters corresponding to the speech. The current speech transcription technique cannot satisfy the demand of cross-language speech transcription, for example, it is impossible to directly translingually transcribe one input mandarin speech into a corresponding English translation text. To implement cross-…”.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic 





/DARIOUSH AGAHI/ Examiner, Art Unit 2656
/HUYEN X VO/ Primary Examiner, Art Unit 2656