DETAILED ACTION
This office action is in response to Applicant’s submission filed on 6/22/2022. Claims 1, 2, 5, 6 – 7, 10, and 18 were amended. Claim 21 is new. Claims 1-21 are pending in the application. As such, claims 1-21 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Argument
Applicant’s arguments filed in the Amendment filed 6/22/2022 (herein “Amendment”) with respect to the 35 U.S.C. §101 rejection raised in the previous office action have been fully considered, and they are persuasive. Therefore, the rejection of claims 1, 6, 12, 14, 16 – 20 under 35 U.S.C. §101 is withdrawn.
Applicant’s arguments filed in the Amendment with respect to the 35 U.S.C. §102 rejection of claims 1, 6, and 18 have been fully considered, but are persuasive only to the extent that the amendments have changed the broadest reasonable interpretation, thus necessitating a new ground of rejection in view of the newly cited NPL reference Prajit et al., “Unsupervised Pretraining for Sequence to Sequence Learning,” and Tu et al. (US20200081982A1).
Specifically, on page 11 of the Amendment, Applicant sets forth: Waibel does not teach or fairly suggest initializing parameters of the class-based translation model (model 23) according to the parameters of the class-based language model. Although Examiner does not agree, the new references applied in this Office Action should remedy any concerns raised by Applicant in the Amendment.
Therefore, while all of Applicant’s arguments and amendments filed in the Amendment have been fully considered, the arguments directed to the prior art rejection are not persuasive. Please see the prior art section below for more detail, including updated citations and obviousness rationale.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 1, 6, 18, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Arar (US20200098370A1) in view of Tu et al. (US20200081982A1) (herein “Tu”) and Prajit et al., “Unsupervised Pretraining for Sequence to Sequence Learning,” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 383–391, Copenhagen, Denmark, September 7–11, 2017 (herein “Prajit”).

Note: The functional elements of claims 6 and 18 are similar to those of claim 1; therefore, these claims are mapped together.
Arar was applied in the previous Office Action.
Regarding claims 1, 6, and 18, Arar teaches an apparatus, a method, and a memory device comprising one or more processors; and one or more memories storing thereon computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising: (Arar, Par. 0058: “A computer readable signal medium can be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.”, and Par. 0061: “These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, …”, and Par. 0044: “The workstation shown in FIG. 4 can include a Random Access Memory [RAM] 414, Read Only Memory [ROM] 416, an I/O adapter 418 for connecting peripheral devices ….”).
performing speech recognition on an inputted speech to obtain a first text; (Arar, Par. 0050:” In one embodiment, system 500, provides for phonetic and semantic utterance matches, and multiple languages are processed in a prioritized manner. That is, STT transcription is performed by system 500 with primary, secondary, etc. languages, where the multilingual STT operation is based on a combination of both acoustic and grammar modeling. Based on a user profile, the STT processing transcribes in multiple languages, and NLP is used to match speech to languages referenced and stored in the user's profile. In one embodiment, a score is computed for each word so that system 500 can determine its relevance. Once determined to be above the threshold, processing is carried out to find a match. The context is also analyzed to check for grammatical correctness of the spoken utterance. The processing is refined through the use of an algorithm [e.g., machine learning] to keep refining it and adding more words to the corpus of transcription data.”).
outputting the first text in response to determining that the at least one second text corresponds to a same language, (Arar, Par. 0052:” FIG. 7 illustrates a block diagram of a process 700 for bilingual [or multilingual beyond bilingual] STT transcription, according to one embodiment. In block 710, process 700 obtains a default language corpus, e.g., an English language transcription corpus, etc., stored in a system [e.g., computing node 10, FIG. 1, processing system 300, FIG. 3, system 400, FIG. 4, system 500, FIG. 5, etc.]. In block 720, process 700 determines the default language [e.g., based on the corpus, based on geographic location, based on other local users, etc.] and a second language preference [e.g., based on a profile, based on probabilities of one language used in conjunction with another language based on a set of users and their associated spoken languages, etc.]. In block 730 process 700 obtains a second language corpus [e.g., stored in a system [e.g., computing node 10, FIG. 1, processing system 300, FIG. 3, system 400, FIG. 4, system 500, FIG. 5, etc.] based on the second language preference. In block 740 process 700 receives a first transcription of an utterance [e.g., speech, text, etc.] using the default language corpus and NLP. In block 750 process 700 determines at least one problem word [e.g., a word that does not fit within the context of neighboring words, etc.] in the first transcription based on an associated grammatical relevance to neighboring words in the first transcription [where the grammatical relevance comprises a first probability score]. In block 760 upon determining that the first probability score is below a first threshold, process 700 performs an acoustic lookup for an audible match for the problem word in the first transcription based on an associated acoustical relevance, wherein the acoustical relevance comprises a second probability score. 
In block 770 upon determining that the second probability score is below a second threshold, process 700 determines whether a match for the problem word exists in the secondary language corpus. In block 780 upon determining that the match exists in the secondary language corpus, process 700 provides a second transcription for the utterance.”).
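For illustration only, the two-threshold fallback described in blocks 750 through 780 of the quoted passage can be sketched as below. This is a minimal sketch under stated assumptions, not Arar's implementation: the function name `resolve_word`, the threshold values, and the representation of the secondary language corpus as a dictionary are hypothetical, and the behavior when the acoustic score clears its threshold is assumed because the quoted text does not state it.

```python
def resolve_word(word, grammar_score, acoustic_match, acoustic_score,
                 secondary_corpus, grammar_threshold=0.6, acoustic_threshold=0.6):
    """Return the word to use in the transcription (hypothetical helper).

    grammar_score  -- first probability score (grammatical relevance to neighbors)
    acoustic_score -- second probability score (acoustical relevance of acoustic_match)
    secondary_corpus -- assumed dict: heard token -> entry in the secondary language
    """
    # Block 750: a word that fits its neighbors is not a problem word; keep it.
    if grammar_score >= grammar_threshold:
        return word
    # Block 760: first score below threshold, so an acoustic lookup is consulted.
    # Assumption: an acoustic match that clears the second threshold is accepted.
    if acoustic_score >= acoustic_threshold:
        return acoustic_match
    # Blocks 770-780: second score also below threshold, so check whether a
    # match exists in the secondary language corpus and use it if so.
    if word in secondary_corpus:
        return secondary_corpus[word]
    # Assumption: with no match anywhere, the first transcription is left as is.
    return word


if __name__ == "__main__":
    corpus = {"grasias": "gracias"}  # hypothetical secondary-language entries
    print(resolve_word("grasias", 0.1, "grass", 0.2, corpus))
```

The ordering of the checks mirrors the order of the blocks in the quoted paragraph: the secondary language corpus is consulted only after both probability scores fall below their thresholds.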
correcting, [[by inputting the first text into a trained machine translation model,]] the first text according to a mapping relationship between words in different languages to obtain at least one second text; and (Arar, Par. 0002:” The acoustical relevance includes a second probability score. Upon determining that the second probability score is below a second threshold, it is determined whether a match for the problem word exists in the secondary language corpus. Upon determining that the match exists in the secondary language corpus, a second transcription for the utterance is provided.”).
Arar fails to explicitly disclose, however, Tu teaches by inputting the first text into a trained machine translation model, (Tu, Par. 0006:” An aspect of the present disclosure provides a translation model based training method for a computer device. The method includes inputting a source sentence [first text] to a translation model, to obtain a target sentence outputted by the translation model;”)
training the machine translation model using training samples to obtain the trained machine translation model. (Tu, Par. 0077:” In an embodiment, the translation model may be trained in the following manner: initializing an input layer, an intermediate layer, and an output layer that are included by the translation model; constructing a training sample set, the training sample set including to-be-translated source sentences and target sentences as translation results corresponding to the source sentences; initializing a loss function established based on the input of the translation model, the output of the translation model, and a translation model parameter[s]; using source sentences of selected training samples as input, using target sentences of the selected training samples as output, and calculating, by using a maximum likelihood estimate algorithm, an updated value of the translation model parameter in a dimension that corresponds to the selected training sample when the loss function obtains a minimum value relative to the selected training samples; fusing updated values corresponding to selected training samples based on a similarity between the corresponding samples; and updating the translation model parameter based on an updated value that is obtained through fusion and that is of the translation model parameters corresponding to the training sample set. During actual implementation, the translation model parameter may be updated in a manner of superposing the updated value that is obtained through fusion and that is of the translation model parameter corresponding to the training sample set and the translation model parameter before the update.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Tu to input the first text into a trained machine translation model and to train the machine translation model using training samples to obtain the trained machine translation model, in order to improve the accuracy of the translation of the translation model, as evidenced by Tu (see Par. 0048).
Arar, and Tu fail to explicitly disclose, however, Prajit teaches wherein the trained machine translation model is obtained by performing actions including: initializing parameters of a machine translation model according to parameters of a trained language model, and (Prajit, Abstract:” This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence [seq2seq] models. In our method, the weights of the encoder and decoder of a seq2seq model are initialized with the pretrained weights of two language models and then fine-tuned with labeled data. We apply this method to challenging benchmarks in machine translation and abstractive summarization and find that it significantly improves the subsequent supervised models.”, and Introduction, page 383:” we propose a simple and effective technique for using unsupervised pretraining to improve seq2seq models. Our proposal is to initialize both encoder and decoder networks with pretrained weights of two language models.”)
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar and Tu in view of Prajit to initialize parameters of a machine translation model according to parameters of a trained language model, in order to improve sequence to sequence learning, as evidenced by Prajit (see Conclusion, page 390).
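For illustration only, the initialization quoted from the Prajit Abstract (encoder and decoder weights of a seq2seq model initialized with the pretrained weights of two language models) can be sketched as below. This is a minimal sketch under stated assumptions: parameters are modeled as plain lists of floats, and the names `make_params` and `init_from_language_model` as well as the parameter names (`embedding`, `rnn_layer_1`, `softmax`, `attention`) are hypothetical and are not taken from any cited reference.

```python
import random


def make_params(sizes, seed):
    """Create randomly initialized parameters: name -> list of floats."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-0.1, 0.1) for _ in range(n)]
            for name, n in sizes.items()}


def init_from_language_model(seq2seq_params, lm_params, prefix):
    """Copy each language-model parameter into the seq2seq model when a
    parameter named '<prefix>.<name>' with the same size exists there.
    Returns the list of seq2seq parameter names that were overwritten."""
    copied = []
    for name, values in lm_params.items():
        target = f"{prefix}.{name}"
        if target in seq2seq_params and len(seq2seq_params[target]) == len(values):
            seq2seq_params[target] = list(values)  # copy, do not alias
            copied.append(target)
    return copied


if __name__ == "__main__":
    lm_sizes = {"embedding": 8, "rnn_layer_1": 16, "softmax": 8}
    source_lm = make_params(lm_sizes, seed=1)  # stands in for a trained source-side LM
    target_lm = make_params(lm_sizes, seed=2)  # stands in for a trained target-side LM
    seq2seq = make_params({"encoder.embedding": 8, "encoder.rnn_layer_1": 16,
                           "decoder.embedding": 8, "decoder.rnn_layer_1": 16,
                           "decoder.softmax": 8, "decoder.attention": 4}, seed=3)
    print(init_from_language_model(seq2seq, source_lm, "encoder"))
    print(init_from_language_model(seq2seq, target_lm, "decoder"))
```

In this sketch, parameters with no counterpart in the language model (here the hypothetical `decoder.attention`) keep their random initialization, and the model would then be fine-tuned with labeled data as the quoted Abstract states.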

Regarding claim 21, Tu further teaches initializing the parameters of an input layer and a first RNN layer of the machine translation model using the parameters of the trained language model. (Tu, Par. 0077: “In an embodiment, the translation model may be trained in the following manner: initializing an input layer, an intermediate layer, and an output layer that are included by the translation model; constructing a training sample set, the training sample set including to-be-translated source sentences and target sentences as translation results corresponding to the source sentences; initializing a loss function established based on the input of the translation model, the output of the translation model, and a translation model parameter[s]; using source sentences of selected training samples as input, using target sentences of the selected training samples as output, and calculating, by using a maximum likelihood estimate algorithm, an updated value of the translation model parameter in a dimension that corresponds to the selected training sample when the loss function obtains a minimum value relative to the selected training samples; fusing updated values corresponding to selected training samples based on a similarity between the corresponding samples; and updating the translation model parameter based on an updated value that is obtained through fusion and that is of the translation model parameters corresponding to the training sample set. During actual implementation, the translation model parameter may be updated in a manner of superposing the updated value that is obtained through fusion and that is of the translation model parameter corresponding to the training sample set and the translation model parameter before the update.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Tu to initialize the parameters of an input layer and a first RNN layer of the machine translation model using the parameters of the trained language model, in order to improve the accuracy of the translation of the translation model, as evidenced by Tu (see Par. 0048).


Claims 2, 5, 7, 10 - 15, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Arar, Tu, and Prajit, and in further view of Waibel (US20090281789A1).

Waibel was applied in the previous Office Action.
Regarding claims 2 and 7, Arar, Tu, and Prajit fail to explicitly disclose, however, Waibel teaches wherein the acts further comprise acquiring a speech sample containing a plurality of languages; (Waibel, Par. 0033: “In this example the system operates between two languages La and Lb. This is the typical implementation of a speech-to-speech dialog system involving speech-to-speech translation in both directions, from La to Lb and from Lb to La.”, and Par. 0037: “… In this example, the device 13 displays the text of audio input of a La and corresponding text in window 15. Machine translation of text La in the second language Lb is displayed in window 16.”).
performing speech recognition on the speech sample to obtain a plurality of text candidates; (Waibel, Par. 0041:” To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] as determined by the ASR model, 2 or 9, and the MT module, 3 or 8, spoken output [item 26] is generated via loud speakers 5 or 6, via TTS modules 4 or 7 [step 33].” and Par. 0046:” The user can also correct the speech recognition or machine translation output via a number of modalities. The user can correct the entire utterance, by re-speaking it or entering the sentence via a keyboard or handwriting interface. Alternatively a user can highlight an erroneous segment in the output hypothesis … The user can also select an erroneous segment in the output hypothesis …. Here they are applied to the speech recognition and translation modules of interactive speech translation systems.”, and Par. 0047:” If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”).
forming a training sample from annotated texts corresponding to the plurality of text candidates and the speech sample; and training a machine translation model using the training sample to obtain the trained machine translation model. (Waibel, Par. 0009:” … it also involves manual tagging and editing of entries, collection of extensive databases involving the required word, retraining of language model and translation model probabilities and re-optimization of the entire system, so as to re-establish the consistency between all the components and components' dictionaries and to restore the statistical balance between the words, phrases and concepts in the system [probabilities have to add up to 1, and thus all words would be affected by a single word addition].”, and Par. 0076:” If the user enters a word that does not match any of the pre-defined classes within the system, the user can assign it to the ‘unknown’ class. For ASR, the ‘unknown’ class is defined by words that occurred in the training data but not in the recognition lexicon. For SMT bilingual entries that do not occur in the translation lexicon are set to the unknown tag in the target language model.”, and Par. 0102:” Class-based models for a statistical machine translation framework can be trained using the procedure shown in FIG. 10. First, the training corpora of sentence pairs are normalized [step 100] and tagging models [FIG. 3, model 22] are used to tag the corpora [step 101]. One approach to do this is described in Lafferty01. In this step, sentences that combine to form a training-pair can be tagged independently, tagged jointly, or tags from one language can be projected to the other. After the entire training corpus is tagged, words within sentence-pairs are aligned [step 102]…. In this step, multi-word phrases within a tagged entity [i.e. “New York”] are treated as a single token. 
Next, phrases are extracted [step 103] using methods such as Koehn07 to generate class-based translation models [FIG. 3, model 23]. The tagged corpus is also used to train class-based target language models [FIG. 3, model 24]. Training may be accomplished …”, and Par. 0116:” In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.”, and Par. 0102:” Class-based models for a statistical machine translation framework can be trained using the procedure shown in FIG. 10. First, the training corpora of sentence pairs are normalized [step 100] and tagging models [FIG. 3, model 22] are used to tag the corpora [step 101]. One approach to do this is described in Lafferty01. In this step, sentences that combine to form a training-pair can be tagged independently, tagged jointly, or tags from one language can be projected to the other.”, and Par. 0029: “FIG. 9 is a flow chart illustrating the steps required to train class-based MT models;”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar in view of Waibel to acquire a speech sample containing a plurality of languages; perform speech recognition on the speech sample to obtain a plurality of text candidates; form a training sample from annotated texts corresponding to the plurality of text candidates and the speech sample; and train a machine translation model using the training sample to obtain the trained machine translation model, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Regarding claims 5 and 10, Arar, Tu, and Prajit fail to explicitly disclose, however, Waibel teaches wherein the acts further comprise acquiring corpus samples corresponding to each of the plurality of languages; and training a language model using the corpus samples corresponding to each of the plurality of languages. (Waibel, Par. 0102: “Class-based models for a statistical machine translation framework can be trained using the procedure shown in FIG. 10. First, the training corpora of sentence pairs are normalized [step 100] and tagging models [FIG. 3, model 22] are used to tag the corpora [step 101] … In this step, sentences that combine to form a training-pair can be tagged independently, tagged jointly, or tags from one language can be projected to the other. After the entire training corpus is tagged, words within sentence-pairs are aligned [step 102] … In this step, multi-word phrases within a tagged entity [i.e. “New York”] are treated as a single token. Next, phrases are extracted [step 103] using methods such as Koehn07 to generate class-based translation models [FIG. 3, model 23]. The tagged corpus is also used to train class-based target language models [FIG. 3, model 24]. Training may be accomplished…”, and Par. 0103: “To translate an input sentence the method illustrated in FIG. 11 is applied. First, the input sentence is normalized [step 105] and tagged [step 106] using a similar procedure as that applied to the training corpora. The input sentence is tagged using a monolingual tagger [FIG. 3, model 22]. Next, the input sentence is decoded using class-based MT models [FIG. 3, models 23 and 24]. For class-based statistical machine translation decoding is performed using the same procedure used in standard statistical machine translation, However, phrase-pairs are matched at the class-level…”, and Par. 0116: “In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.”, and Par. 0119: “To realize effective class-based SMT, accurate and consistent tagging across sentence-pairs is vital. We investigated two approaches to improve tagging quality; first, the introduction of bilingual features from word-alignment; and second, bilingual tagging, where both sides of a sentences-pair are jointly tagged. From the parallel training corpora 14,000 sentence-pairs were manually tagged using the 16 class labels indicated in Table 2.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, and Prajit in view of Waibel to acquire corpus samples corresponding to each of the plurality of languages, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Regarding claim 11, Arar, Tu, and Prajit fail to explicitly disclose, however, Waibel teaches predicting the first probability values using the machine translation model. (Waibel, Par. 0041:” To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] as determined by the ASR model, 2 or 9 ….”, and Par. 0106:” Search is performed to find the translation hypothesis with maximum likelihood P[fJ1|eI1]·P[eI1] given the translation model probability P[fJ1|eI1] [FIG. 3, model 23] and the MT class-based language model probability P[eI1] [FIG. 3, model 24].”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, and Prajit in view of Waibel to predict the first probability values using the machine translation model, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Regarding claim 12, Arar, Tu, and Prajit fail to explicitly disclose, however, Waibel teaches wherein the determining the outputted text according to the first probability values corresponding to each of the at least one second text comprises: determining that the outputted text is a second text having the largest first probability value. (Waibel, Par. 0041:” To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] as determined by the ASR model, 2 or 9…”, and Par. 0047:” If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0106:”Search is performed to find the translation hypothesis with maximum likelihood P[fJ1|eI1]·P[eI1] given the translation model probability P[fJ1|eI1] [FIG. 3, model 23] and the MT class-based language model probability P[eI1] [FIG. 3, model 24].”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, and Prajit in view of Waibel to determine that the outputted text is a second text having the largest first probability value, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).
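For illustration only, the maximum-likelihood search quoted from Waibel Par. 0106 (selecting the hypothesis that maximizes the product of the translation model probability and the language model probability) can be sketched as below. This is a minimal sketch under stated assumptions: the function name `select_best_hypothesis`, the tuple layout, and the example probabilities are hypothetical, and the sketch ranks a short, already enumerated list of candidates rather than performing the decoder search that Waibel describes.

```python
import math


def select_best_hypothesis(hypotheses):
    """Return the text with the largest P(f|e) * P(e).

    hypotheses -- list of (text, translation_model_prob, language_model_prob)
    Scores are compared in log space so that products of small
    probabilities do not underflow.
    """
    def log_score(hypothesis):
        _, tm_prob, lm_prob = hypothesis
        if tm_prob <= 0.0 or lm_prob <= 0.0:
            return float("-inf")  # a zero-probability hypothesis can never win
        return math.log(tm_prob) + math.log(lm_prob)

    return max(hypotheses, key=log_score)[0]


if __name__ == "__main__":
    candidates = [
        ("candidate a", 0.5, 0.1),  # product 0.05
        ("candidate b", 0.2, 0.4),  # product 0.08
        ("candidate c", 0.3, 0.2),  # product 0.06
    ]
    print(select_best_hypothesis(candidates))  # prints "candidate b"
```

Note that the candidate with the highest translation model probability alone ("candidate a") is not selected here, because the language model probability is weighed jointly.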

Regarding claim 13, Arar, Tu, and Prajit fail to explicitly disclose, however, Waibel teaches wherein the determining the outputted text according to the first probability values corresponding to each of the at least one second text comprises: inputting the at least one second text into the language model to determine second probability values corresponding to each of the at least one second text using the language model, (Waibel, Par. 0017: “In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018: “… Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0109: “In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word.”, Par. 0112: “In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa,Sb] we search for the label-sequence-pair [Ta,Th] that maximizes the joint maximum conditional probability …”, and Par. 0116: “In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.”).
wherein the language model is obtained through training by using the corpus samples corresponding to each of the plurality of languages, (Waibel, Par. 0017:” In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:” Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa, Sb] we search for the label-sequence-pair [Ta, Th] that maximizes the joint maximum conditional probability…”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.”)
a respective second probability value representing a reasonableness of grammar and semantics of the respective second text; and (Waibel, Par. 0016:” In an embodiment, four primary features equip the system to provide a field maintainable class-based speech-to-speech translation system. The first includes a speech translation framework that enables the addition of new words to the active system vocabulary, or the switching between location or task specific vocabularies. This provides for dynamic addition of words to a speech recognition module without requiring the module to be re-started. The system uses multilingual system-dictionary and language independent word-classes across all system components in the speech-to-speech translation device, class-based machine-translation [phrase-based statistical MT, syntactic, example-based, etc.], multilingual word-class tagging during model training, based on combination of monolingual taggers, and word-class tagging in new language by way of alignment via parallel corpus from known tagged language. Second, a multimodal interactive interface enables non-experts to add new words to the system. Third, the system is designed to accommodate ASR and SMT model adaptation using multimodal feedback provided by the user. And fourth, the system has networking capability to enable sharing of corrections or words.”, and Par. 0051:”class [i.e. semantic or syntactic class of the new entry]”, and Par. 0099:” … Alignment is performed word-to-word; translation examples, or phrase-pairs are matched at the word level; and word-based language models are applied. Hierarchical translation modules such as those in Chiang05, and syntax-based translation models such as in Yamada02, extend on this by introducing intermediate structure. However, these approaches still require exact word matches. As each word is treated as a separate entity, these models do not generalize to unseen words.”, and Par. 
0109:” In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. Furthermore, as the word-class is known [in this example “@PLACE.city”] the system is able to select better translations for surrounding words and will order the words in the translation output correctly.”).
determining the outputted text according to the second probability values corresponding to each of the at least one second text. (Waibel, Par. 0017:” In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. “, and Par. 0112:” In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa, Sb] we search for the label-sequence-pair [Ta, Th] that maximizes the joint maximum conditional probability…”, and Par. 
0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.“).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, and Prajit in view of Waibel to input the at least one second text into the language model to determine second probability values corresponding to each of the at least one second text using the language model, wherein the language model is obtained through training by using the corpus samples corresponding to each of the plurality of languages, a respective second probability value representing a reasonableness of grammar and semantics of the respective second text; and to determine the outputted text according to the second probability values corresponding to each of the at least one second text, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Regarding claim 14, Arar, Tu, and Prajit fail to explicitly disclose, however, Waibel teaches wherein the determining the outputted text according to the second probability values corresponding to each of the at least one second text comprises: in response to determining that the first text is consistent with a second text having the largest second probability value, outputting the first text; and in response to determining that the first text is inconsistent with the second text having the largest second probability value, outputting the second text having the largest second probability value. (Waibel, Par. 0041:” To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] [consistent] as determined by the ASR model, 2 or 9, and the MT module, 3 or 8, spoken output [item 26] is generated via loud speakers 5 or 6, via TTS modules 4 or 7 [step 33]. Otherwise [inconsistent], the system indicates that the translation may be wrong via the GUI, audio and/or tactical feedback. The specific TTS module used in step 33 is selected based on the output language.”, and Par. 0042:” Thereafter, if the user is dissatisfied with the generated translation, the user may intervene during the speech-to-speech translation process in any of steps from 27 to 33 or after process has completed. This invokes the Correction and Repair Module 11 at [step 35]. The correction and repair module 11 records and logs any corrections the user may make, which can be later used to update ASR modules 2 and 9 and MT modules 3 and 8 as described in detail further below in this document.”, and Par. 
0047:” If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0106:” Search is performed to find the translation hypothesis with maximum likelihood P[fJ1|eI1]·P[eI1] given the translation model probability P[fJ1|eI1] [FIG. 3, model 23] and the MT class-based language model probability P[eI1] [FIG. 3, model 24].”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. “).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, and Prajit in view of Waibel to, in response to determining that the first text is consistent with a second text having the largest second probability value, output the first text; and, in response to determining that the first text is inconsistent with the second text having the largest second probability value, output the second text having the largest second probability value, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).
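For illustration only, the selection step recited in claim 14 can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from the application or from any cited reference: the function name `select_output`, the `lm_score` callable standing in for the language model's second probability value, and the use of string equality as the "consistent" test are all hypothetical.

```python
from typing import Callable, Sequence


def select_output(
    first_text: str,
    second_texts: Sequence[str],
    lm_score: Callable[[str], float],
) -> str:
    """Pick the outputted text from candidate second texts.

    lm_score is a hypothetical stand-in for the language model; it maps a
    candidate text to its second probability value.
    """
    if not second_texts:
        # Assumption: with no candidates, the first text is output unchanged.
        return first_text

    # Candidate with the largest second probability value.
    best = max(second_texts, key=lm_score)

    # Assumption: "consistent" is modeled as exact string equality.
    if best == first_text:
        return first_text
    return best
```

Under these assumptions, a first text that already matches the highest-scoring candidate is returned unchanged, and otherwise the highest-scoring candidate replaces it.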

Regarding claim 15, Arar, Tu, and Prajit fail to explicitly disclose, however, Waibel teaches wherein the determining the outputted text according to the first probability values corresponding to each of the at least one second text comprises: inputting the at least one second text into the language model to determine second probability values corresponding to each of the at least one second text using the language model, (Waibel, Par. 0017:”In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”… Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word.”, Par. 
0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa,Sb] we search for the label-sequence-pair [Ta,Th] that maximizes the joint maximum conditional probability …”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language. “).
wherein the language model is obtained through training by using the corpus samples corresponding to each of the plurality of languages, (Waibel, Par. 0017:” In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:” Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0112:”In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa,Sb] we search for the label-sequence-pair [Ta,Th] that maximizes the joint maximum conditional probability…”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.”)
a respective second probability value representing a reasonableness of grammar and semantics of the respective second text; and (Waibel, Par. 0016:” In an embodiment, four primary features equip the system to provide a field maintainable class-based speech-to-speech translation system. The first includes a speech translation framework that enables the addition of new words to the active system vocabulary, or the switching between location or task specific vocabularies. This provides for dynamic addition of words to a speech recognition module without requiring the module to be re-started. The system uses multilingual system-dictionary and language independent word-classes across all system components in the speech-to-speech translation device, class-based machine-translation [phrase-based statistical MT, syntactic, example-based, etc.], multilingual word-class tagging during model training, based on combination of monolingual taggers, and word-class tagging in new language by way of alignment via parallel corpus from known tagged language. Second, a multimodal interactive interface enables non-experts to add new words to the system. Third, the system is designed to accommodate ASR and SMT model adaptation using multimodal feedback provided by the user. And fourth, the system has networking capability to enable sharing of corrections or words.”, and Par. 0051:”class [i.e. semantic or syntactic class of the new entry]”, and Par. 0099:”… Alignment is performed word-to-word; translation examples, or phrase-pairs are matched at the word level; and word-based language models are applied. Hierarchical translation modules such as those in Chiang05, and syntax-based translation models such as in Yamada02, extend on this by introducing intermediate structure. However, these approaches still require exact word matches. As each word is treated as a separate entity, these models do not generalize to unseen words.”, and Par. 
0109:” In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. Furthermore, as the word-class is known [in this example “@PLACE.city”] the system is able to select better translations for surrounding words and will order the words in the translation output correctly.”).
determining the outputted text according to the first probability values and the second probability values corresponding to each of the at least one second text. (Waibel, Par. 0017:” In another embodiment, a multimodal interactive interface enabling a user to add new words to a speech-to-speech translation device in the field and without technical expertise is disclosed. Examples include: [1] Methods to automatically classify class of word or word-phrase to be added to the system, and automatically generate of pronunciations, and translation of the word; … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0047:”If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. 
This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. “, and Par. 0112:” In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa,Sb] we search for the label-sequence-pair [Ta,Th] that maximizes the joint maximum conditional probability…”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language.“).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, and Prajit in view of Waibel to input the at least one second text into the language model to determine second probability values corresponding to each of the at least one second text using the language model, wherein the language model is obtained through training by using the corpus samples corresponding to each of the plurality of languages, a respective second probability value representing a reasonableness of grammar and semantics of the respective second text; and to determine the outputted text according to the first probability values and the second probability values corresponding to each of the at least one second text, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Regarding claim 19, Arar, Tu, and Prajit fail to explicitly disclose, however, Waibel teaches in response to determining that the at least one second text includes a word not corresponding to the first language, (Waibel, Par. 0067:” In the interface, the user may select a ‘new-word’ mode from the menu, or the new word learning mode could be invoked after a user correction has yielded a new/unknown word. In the window pane that appears he/she can now type the desired new word, name, special term, concept, expression. Based on the orthographic input in the user's language [this can be character sets different from English, e.g., Chinese, Japanese, Russian, etc.]. The system then generates a transliteration in Roman alphabet and the words predicted pronunciation. This is done by conversion rules that are either hand written or extracted from preexisting phonetic dictionaries or learned from transliterated speech data. The user then views the automatic conversion and can play the sound of the generated pronunciation via TTS. The user may iterate and modify either of these representations [script, Romanized transliteration, phonetic transcription, and its sound in either language] and the other corresponding entries will be regenerated similarly [thus a modified transcription in one language may modify the transcription in the other].”, and Par. 0076:”If the user enters a word that does not match any of the pre-defined classes within the system, the user can assign it to the ‘unknown’ class. For ASR, the ‘unknown’ class is defined by words that occurred in the training data but not in the recognition lexicon. For SMT bilingual entries that do not occur in the translation lexicon are set to the unknown tag in the target language model.”).
determining a target second text according to probability values corresponding to each of the at least one second text; and translating the target second text into the second language. (Waibel, Par. 0100:” One embodiment of class-based machine translation is class-based statistical machine translation, in which a foreign language sentence fJ1=f1, f2, . . ., fJ is translated into another language eI1=e1, e2, . . ., e1 by searching for the hypothesis  ̂eI1 with maximum likelihood, given:

 
[Equation image: media_image1.png]


Classes can be semantic classes, such as named-entities, syntactic classes or classes consisting of equivalent words or word phrases. As an example we describe the case when named-entity classes are incorporated into the system.”, and Par. 0101:” The two most informative models applied during translation are the target language model P[eI1] and the translation model P[fJ1|eI1]. In a class-based statistical machine translation framework P[fJ1|eI1] is a class-based translation model [FIG. 3, model 23], and P[eI1] is a class-based language model [FIG. 3, model 24].”, and Par. 0102:” Class-based models for a statistical machine translation framework can be trained using the procedure shown in FIG. 10. First, the training corpora of sentence pairs are normalized [step 100] and tagging models [FIG. 3, model 22] are used to tag the corpora [step 101]. One approach to do this is described in Lafferty01. In this step, sentences that combine to form a training-pair can be tagged independently, tagged jointly, or tags from one language can be projected to the other. After the entire training corpus is tagged, words within sentence-pairs are aligned [step 102] …. In this step, multi-word phrases within a tagged entity [i.e. “New York”] are treated as a single token. Next, phrases are extracted [step 103] using methods such as Koehn07 to generate class-based translation models [FIG. 3, model 23]. The tagged corpus is also used to train class-based target language models [FIG. 3, model 24]. Training may be accomplished”, and Par. 0110:” In an embodiment, a labeled parallel corpora is obtained by independently tagging each side of the training corpora with monolingual taggers and then removing inconsistent labels from each sentence-pair. In this approach, for each sentence-pair [Sa,Sb] the label-sequence-pair [Ta,Tb] is selected which has maximum conditional probabilities P[Ta,Sa] and P[Tb,Sb]. 
If the occurrence count of any class-tag differs between P[Ta,Sa] and P[Tb,Sb], that class-tag is removed from the label-sequence-pair [Ta,Th].”).
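For reference, the maximum-likelihood search described in the cited passages of Waibel (Par. 0100, Par. 0101, and Par. 0106) can be written in conventional noisy-channel notation. This is a sketch only: the hat and arg-max notation are assumptions adopted here for readability, and the expression reflects only the product P[fJ1|eI1]·P[eI1] quoted from Par. 0106, not a reproduction of the equation image in Par. 0100.

```latex
\hat{e}_1^I \;=\; \operatorname*{arg\,max}_{e_1^I} \; P\!\left(f_1^J \mid e_1^I\right) \cdot P\!\left(e_1^I\right)
```

Here P(f_1^J | e_1^I) corresponds to the class-based translation model [FIG. 3, model 23] and P(e_1^I) to the class-based language model [FIG. 3, model 24], as stated in Par. 0101.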
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, and Prajit in view of Waibel to, in response to determining that the at least one second text includes a word not corresponding to the first language, determine a target second text according to probability values corresponding to each of the at least one second text; and translate the target second text into the second language, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Regarding claim 20, Arar, Tu, and Prajit fail to explicitly disclose, however, Waibel teaches wherein a respective probability value representing a probability that the first text is corrected to a respective second text in the at least one second text. (Waibel, Par. 0018:” Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0046:”The user can also select an erroneous segment in the output hypothesis via the touch screen and correct it by selecting a competing hypothesis in an automatically generated drop-down list, or …. Here they are applied to the speech recognition and translation modules of interactive speech translation systems.”, and Par. 0047:” If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 
0068:”The system further automatically selects the most likely word class that the new word belongs to based on co-occurrence statistics of other words [with known class] in similar sentence contexts.”, and Par. 0078:”In addition to the above five entries, an intra-class probability P[w|C] is also defined. In this fashion it is possible for the system to differentiate between words belonging to the same class. Thus words that are closer to the user's tasks, preferences and habits will be preferred and a higher intra-class probability assigned. This boosting of higher intra-class probability is determined based on relevance to the user, where relevance is assessed by observing…”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, and Prajit in view of Waibel such that a respective probability value represents a probability that the first text is corrected to a respective second text in the at least one second text, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).

Claims 3 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Arar, Tu, Prajit, and Waibel as applied to claims 2 and 7, respectively, and in further view of Freitag (US20210019373A1).

Freitag was applied in the previous Office Action.
Regarding claims 3 and 8, Arar, Tu, Prajit, and Waibel fail to explicitly disclose, however, Freitag teaches inputting the first text into the machine translation model; and correcting the first text using the machine translation model. (Freitag, Par. 0092:” In some implementations, a method implemented by one or more processors is provided that includes processing a first instance of text in a target language using a multilingual automatic post-editing model to generate first edited text, wherein the first instance of text in the target language is generated using a neural machine translation model translating a first source language to the target language, wherein the multilingual automatic post-editing model is used in correcting one or more translation errors introduced by the neural machine translation model, and wherein the multilingual post-editing model is trained for use in correcting translation errors in the target language translated from any one of a plurality of source languages.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, Prajit, and Waibel in view of Freitag to input the first text into the machine translation model and correct the first text using the machine translation model, in order to improve the accuracy and/or robustness of edited translated text generated using an APE (automatic post-editing) model trained on such training instances, as evidenced by Freitag (see Par. 0010).

Claims 4 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Arar, Tu, Prajit, and Waibel as applied to claims 2 and 7, respectively, and in further view of Scaria (US20200126544A1).

Scaria was applied in the previous Office Action.
Regarding claims 4 and 9, Arar, Tu, Prajit, and Waibel fail to explicitly disclose, however, Scaria teaches wherein the machine translation model is composed of an encoder and a decoder; and (Scaria, Par. 0033:” Deep neural network [DNN] Decoder 312, can include a deep neural network architecture including a plurality of fully connected computational nodes, or include the same architecture as DNN Encoder 306, including 1-D convolutional layers and fully connected layers in a LSTM configuration.”).
the encoder or the decoder include any one of the following neural network models: a recurrent neural network model, a long short-term memory network model, and a bidirectional long short-term memory network model. (Scaria, Par. 0032:” ... The fully connected layers are connected in a long short-term memory [LSTM] configuration to permit DNN 306 to determine Interlingua language words and phrases by processing spoken language input 302 including includes multiple text data words. Text data natural language input 302 can include multiple text data words corresponding to multiple time samples within spoken language input 302. DNN 306 determines an Interlingua language output 308 based on context formed by the relative position of words in text data text data natural language input 302 based on language token 304. An example of a deep neural network including 1-D convolution and LSTM layers is the Multilingual Neural Machine Translation System, a system developed by Google, Inc., Mountain View, Calif.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, Prajit, and Waibel in view of Scaria such that the machine translation model is composed of an encoder and a decoder, and the encoder or the decoder includes any one of the following neural network models: a recurrent neural network model, a long short-term memory network model, and a bidirectional long short-term memory network model, in order to improve an NLU system that can determine an Interlingua language response and translate the Interlingua language response into one of a plurality of natural languages, as evidenced by Scaria (see Par. 0030).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Arar, Tu, Prajit, and Waibel as applied to claim 15, and in further view of Gonzalez-Dominguez  (US20150364129A1).

Gonzalez-Dominguez was applied in the previous Office Action.
Regarding claim 16, Arar, Tu, and Prajit fail to explicitly disclose, however, Waibel teaches in response to determining that the first text is consistent with a second text having the largest summed probability value, outputting the first text; and (Waibel, Par. 0017:”… [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”…Examples include: [1] Interface and methods to enable users to correct automatic speech recognition results, and use of this feedback information to adapt speech recognition components; [2] Interface and methods to enable users to correct machine translation hypotheses, and use of this feedback information to improve machine translation components; and [3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0047:”If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0112:” In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa, Sb] we search for the label-sequence-pair [Ta, Th] that maximizes the joint maximum conditional probability…”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language. “). Note: since each probability is the highest, the sum will also be the highest.
in response to determining that the first text is inconsistent with the second text having the largest summed probability value (Waibel, Par. 0041:” To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input as illustrated for example in FIG. 2, item 15a. If the confidence of both speech recognition and translation are high [step 31] [consistent] as determined by the ASR model, 2 or 9, and the MT module, 3 or 8, spoken output [item 26] is generated via loud speakers 5 or 6, via TTS modules 4 or 7 [step 33]. Otherwise [inconsistent], the system indicates that the translation may be wrong via the GUI, audio and/or tactical feedback. The specific TTS module used in step 33 is selected based on the output language.”, and Par.0042:”Thereafter, if the user is dissatisfied with the generated translation, the user may intervene during the speech-to-speech translation process in any of steps from 27 to 33 or after process has completed. This invokes the Correction and Repair Module 11 at [step 35]. The correction and repair module 11 records and logs any corrections the user may make, which can be later used to update ASR modules 2 and 9 and MT modules 3 and 8 as described in detail further below in this document.”, and Par. 0047:” If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0106:” Search is performed to find the translation hypothesis with maximum likelihood P[fJ1|eI1]·P[eI1] given the translation model probability P[fJ1|eI1] [FIG. 3, model 23] and the MT class-based language model probability P[eI1] [FIG. 3, model 24].”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. “).
outputting the second text having the largest summed probability value (Waibel, Par. 0017:” … [4] Method for setting language model and translation probabilities for new word; and [5] Boosting or discounting language model and translation probabilities for new learned word based on relevance to user activities, interests and history of use.”, and Par. 0018:”…[3] Method for automatically adjusting [enhancing or decreasing] language model, dictionary and translation model probability for correct or corrected word based on user correction.”, and Par. 0047:”If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0112:” In another embodiment, both sentences in the translation-pair are jointly labeled while applying the constraint that the class-tag sets must be equivalent. Specifically, for the sentence-pair [Sa, Sb] we search for the label-sequence-pair [Ta, Th] that maximizes the joint maximum conditional probability…”, and Par. 0116:”In an embodiment, in the case where no manually annotated corpora is available for a specific language, labels can be generated by projecting labels from a first language where labels are known, across the sentence-pairs in the training corpora to the non-annotated language. “). Note: since each probability is the highest, the sum will also be the highest.
wherein [[a respective summed probability value of the respective second text represents a weighted sum]] of the respective first probability value and the respective second probability value corresponding to the respective second text. (Waibel, Par. 0041:” To help the user determine if the translation output is adequate, the automatically generated translation [FIG. 2, item 16] is translated back into the input language via MT module 3 or 8 … If the confidence of both speech recognition and translation are high [step 31] [consistent] as determined by the ASR model, 2 or 9, and the MT module, 3 or 8, spoken output [item 26] is generated via loud speakers 5 or 6, via TTS modules 4 or 7 [step 33]. Otherwise [inconsistent], the system indicates that the translation may be wrong….”, and Par. 0042:” Thereafter, if the user is dissatisfied with the generated translation, the user may intervene during the speech-to-speech translation process in any of steps from 27 to 33 or after process has completed. This invokes the Correction and Repair Module 11 at [step 35].”, and Par. 0047:” If the user corrects the speech recognition output [step 43] the system first determines if the correction contains a new word [step 44]. This determination is made by checking for the word in the recognition lexicon model 20 associated with each language, La and Lb. If the word is not found the system prompts the user to add the new word to the active system vocabulary if desired [FIG. 5, step 50]. Otherwise, the probabilities in the ASR models [FIG. 3, item 17] are updated to reduce the likelihood of the same error occurring again. This can be performed in a discriminative manner where probabilities of the corrected word sequence are increased, and those of close-competing hypotheses are reduced.”, and Par. 0106:” Search is performed to find the translation hypothesis with maximum likelihood P[fJ1|eI1]·P[eI1] given the translation model probability P[fJ1|eI1] [FIG. 3, model 23] and the MT class-based language model probability P[eI1] [FIG. 3, model 24].”, and Par. 0109:”In this example, even though the word “Wheeling” did not appear in the training corpora, after the user has entered the word via the “User Field Customization Module” [FIG. 1, module 12] the system is able to correctly translate the word. “).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, and Prajit in view of Waibel to output the first text in response to determining that the first text is consistent with a second text having the largest summed probability value, and to output the second text having the largest summed probability value in response to determining that the first text is inconsistent with the second text having the largest summed probability value, wherein [[a respective summed probability value of the respective second text represents a weighted sum]] of the respective first probability value and the respective second probability value corresponding to the respective second text, in order to improve translation quality compared to the baseline system, as evidenced by Waibel (see Par. 0131).
Arar, Tu, Prajit and Waibel fail to explicitly disclose, however, Gonzalez-Dominguez teaches a respective summed probability value of the respective second text represents a weighted sum (Gonzalez-Dominguez, Par. 0009:”For example, the method further includes receiving, from the particular one of the multiple speech recognizers, a preliminary language model confidence score that indicates a preliminary level of confidence that a language model has in the preliminary transcription of the utterance in a language corresponding to the language model; and determining that the preliminary language model confidence score is less than a language model confidence score received from the particular one of the multiple speech recognizers. Providing the speech data to a language identification module includes providing the speech data to a neural network that has been trained to provide likelihood scores for multiple languages. Selecting the language based on the language identification scores and the language model confidence scores includes determining a combined score for each of multiple languages, wherein the combined score for each language is based on at least the language identification score for the language and the language model confidence score for the language; and selecting the language based on the combined scores. Determining a combined score for each of multiple languages includes weighting the likelihood scores or the language model confidence scores using one or more weighting values. Receiving the speech data includes receiving speech data that includes an utterance of a user; further including before receiving the speech data, receiving data indicating multiple languages that the user speaks; storing data indicating the multiple languages that the user speaks; wherein providing the speech data to multiple speech recognizers that are each configured to recognize speech in a different language includes based on the stored data indicating the multiple languages that the user speaks, providing the speech data to a set of speech recognizers configured to recognize speech in a different one of the languages that the user speaks.”, and Par. 0043:”In some configurations, a weighting may be applied to each component confidence score 126 and 120. This may be desirable, for example, if empirical testing [e.g., for a single user 102, for a class of users, for all users] shows that a particular weighting gives more favorable results. For example, language model scores 120 may be slightly more or slightly less predictive of the language spoken than output of the language identification model, and may accordingly be given a slightly higher or lower weight.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, Prajit and Waibel in view of Gonzalez-Dominguez such that a respective summed probability value of the respective second text represents a weighted sum, in order to improve the overall accuracy of the speech recognition system, as evidenced by Gonzalez-Dominguez (See Par. 0063).
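For illustration only, the selection logic recited in claim 16 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the function names, the weights w1 and w2, the candidate data, and the reading of "consistent" as exact string equality are all hypothetical.

```python
from typing import Dict, Tuple


def weighted_sum(p1: float, p2: float, w1: float = 0.5, w2: float = 0.5) -> float:
    """Combine a first and a second probability value into one summed value.

    The weights are hypothetical; the claim only requires a weighted sum.
    """
    return w1 * p1 + w2 * p2


def select_output(
    first_text: str,
    candidates: Dict[str, Tuple[float, float]],
    w1: float = 0.5,
    w2: float = 0.5,
) -> str:
    """Return the text to output for a given first text.

    `candidates` maps each second text to its (first probability value,
    second probability value) pair. "Consistent" is assumed here to mean
    exact string equality, which is an illustrative simplification.
    """
    if not candidates:
        raise ValueError("at least one second text is required")

    # Second text having the largest summed probability value.
    best = max(candidates, key=lambda t: weighted_sum(*candidates[t], w1, w2))

    # Consistent: output the first text. Inconsistent: output the second text.
    return first_text if first_text == best else best


if __name__ == "__main__":
    # Hypothetical candidate second texts with made-up probability values.
    demo = {"hello world": (0.6, 0.9), "hollow world": (0.7, 0.2)}
    print(select_output("hollow world", demo))
```

With equal weights, the candidate with the larger combined value is selected even though its first probability value alone is lower; changing the weights can change which second text is output.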

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Arar, Tu, and Prajit as applied to claim 6, and in further view of Potkonjak (US20100324894A1).

Potkonjak was applied in the previous Office Action.
Regarding claim 17, Arar, Tu, and Prajit fail to explicitly disclose, however, Potkonjak teaches wherein; the mapping relationship between words in different languages comprises a mapping relationship between words in different dialects of the same language; and the at least one second text corresponds to the same language refers to that the at least one second text corresponds to a same dialect of the same language. (Potkonjak, Par. 009:” This disclosure is drawn to methods, apparatus, systems and computer program products related to voice to text to voice processing. An audio signal comprising spoken voice can be processed such that the signal conforms to a set of specified constraints and objectives. The audio signal can be preprocessed and translated into text prior to being analyzed and reorganized in the textual domain. The resulting text can then be converted into a new voice format where additional processing may be conducted. The voice to text to voice processing may translate the voice content from a specific language to the same language with improved clarity, corrected grammar, adjusted vocabulary level, corrected slang, altered dialect, altered accent, or other modifications of oral communication characteristics. The processing may also include translation into one or more other languages from the original language.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Arar, Tu, and Prajit in view of Potkonjak such that the mapping relationship between words in different languages comprises a mapping relationship between words in different dialects of the same language; and the at least one second text corresponds to the same language refers to that the at least one second text corresponds to a same dialect of the same language, in order to improve human communications over both wired and wireless devices and infrastructure, as evidenced by Potkonjak (See Par. 0010).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Zou et al. (US-20180336900A1) teaches Par. 0004:” As speech technologies develop, speech transcription from speech to a corresponding text gradually prevails in daily life. However, the current speech transcription technique can only recognize and transcribe speech in the current language, for example, a corresponding transcription result of one mandarin speech is a text of Chinese characters corresponding to the speech. The current speech transcription technique cannot satisfy the demand of cross-language speech transcription, for example, it is impossible to directly translingually transcribe one input mandarin speech into a corresponding English translation text. To implement cross-language speech transcription, a two-step scheme is mostly employed in the prior art: first, using a speech recognition tool to transcribe the input speech and generate a text; then, translating the generated text via machine translation and finally obtaining a cross-language speech transcription text result.”.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DARIOUSH AGAHI/Examiner, Art Unit 2656                                                                                                                                                                                                        

/MICHELLE M KOETH/Primary Examiner, Art Unit 2656