Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114 
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 09/15/2022 has been entered.
Status of the Claims
Claims 1-2, 7-9, 14-16, and 21-22 are pending. 
Response to Applicant’s Argument
In response to Applicant’s arguments that “As seen above, Cuthbert separately discloses sending the transcription and translation results to the user device after performing the respective speech recognition and translation processes. However, Cuthbert does not disclose that the first recognition information (i.e., the transcription of the voice input) is sent to the terminal while performing the translation process, as required by amended claim 1. The fact that Cuthbert may perform partial translation while the user is speaking does not remedy this issue because it is performed by the user device, not the server” and that “However, simultaneous display would occur if the speech recognition and translation processes were both performed by the user device (i.e., no transmission from the server is required) or if the server sends both the transcription and translation results back to the user device at the same time. Thus, it should be clear that the claimed limitation is not inherently disclosed by the process of Cuthbert either. Therefore, Cuthbert also fails to teach the features of amended claim 1”, the examiner responds as follows.
Cuthbert teaches that speech recognition may be performed by user device 10, by a server, or by a combination of both (¶30). Take the alternative where the user device 10 sends audio data, along with identifiers corresponding to the first and second languages, to a speech recognition program at the server via a network. The speech recognition program at the server may then perform speech recognition on the primary user’s utterance based on the language identifier associated with the audio data (¶30).
The server may then (1) transmit a transcription of the primary user’s utterance back to the user device 10 (¶30). 
Thereafter, a language translation application on user device 10 (per ¶17) may (2) perform partial translation of the voice input while the primary user is speaking (¶31). Here, the translation of the primary user’s speech may be performed by the user device 10, by a translation program at a server, or by a combination of both (¶31). Therefore, the language translation application on user device 10 performs the partial translation of the voice input while the primary user is speaking by way of either (A) the user device 10 itself, (B) the translation program at the server, or (C) a combination of (A) and (B).
Take implementation (B), where user device 10 may access the translation service / translation program at the server (¶¶31-32) so that the language translation application may translate the primary user’s utterance into the second language. In that implementation, the translation program at the server translates a transcription of the primary user’s utterance into text representing the primary user’s utterance in the second language (¶32).
The server may then (3) transmit the text translation back to the user device 10 for display (¶32). 

[Figure: media_image1.png]

Fig. 4 shows a user interface 400 being displayed while the language translation application determines that a microphone on the user device 10 is currently receiving an audio signal where the top portion 410 shows a partial transcription of the primary user’s speech and bottom portion 420 shows a partial translation of the transcription (¶34). Specifically, note that microphone icon 30 on user interface 400 is animated with a pulse effect and highlighted to create a visual indication that the language translation application is receiving voice input with a microphone of the user device 10 (¶35).
Fig. 5 shows user interface 500 being displayed while the language translation application obtains and generates an audio signal corresponding to a translation of the primary user’s speech, with top portion 510 including a full transcription of the primary user’s speech while bottom portion 520 shows a translation of the transcription (¶37).
Applying the aforementioned disclosures to a timeline yields the following:

[Figure: media_image2.png, timeline of the disclosures discussed above]

Given the disclosure that the language translation application (using the translation program at the server) may perform partial translations of voice input while the primary user is speaking (¶31), Cuthbert teaches using the translation program at the server to perform partial translations while the user is speaking “where is the bathroom?”. 
Further, the primary user’s utterance “where is the bathroom?” spans both the time of Fig. 4, showing user interface 400 (when the server had transmitted transcription / recognition information “where is” to user device 10), and the time of Fig. 5, showing user interface 500 (when the server had transmitted transcription / recognition information “where is the bathroom?” to user device 10). Cuthbert therefore teaches “sending first recognition information to the terminal while performing the translation process on the first recognition result” because the language translation application / server translation program performs partial translations of the voice input while the primary user is speaking “where is the bathroom?”, a period that covers both the time when the server sent recognition information “where is” for display on user device 10 (user interface 400) and the time when the server sent recognition information “where is the bathroom?” for display on user device 10 (user interface 500).
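For illustration only, the ordering of events relied upon above (transcription sent to the terminal while the server-side translation is still in progress, translation sent afterward) can be sketched as follows. This is a hypothetical sketch and is not code from Cuthbert or from the application; the function names, the simulated latency, and the lookup table standing in for a translation program are all assumptions made for the example.

```python
import asyncio

# Hypothetical sketch: every name and delay here is an illustrative assumption,
# not something disclosed by Cuthbert or recited in the claims.

async def recognize(audio: str) -> str:
    """Stand-in for server-side speech recognition (audio is already text here)."""
    await asyncio.sleep(0)
    return audio


async def translate(text: str, events: list) -> str:
    """Stand-in for the server-side translation program."""
    events.append("translation started")
    await asyncio.sleep(0.05)  # simulated translation latency
    events.append("translation finished")
    return {"where is the bathroom?": "Donde esta el bano?"}.get(text, text)


async def send(kind: str, payload: str, events: list) -> None:
    """Stand-in for transmitting a result back to the terminal."""
    await asyncio.sleep(0)
    events.append(f"sent {kind}: {payload}")


async def handle_utterance(audio: str) -> list:
    events: list = []
    transcription = await recognize(audio)
    # Start the translation as a background task ...
    translation_task = asyncio.create_task(translate(transcription, events))
    await asyncio.sleep(0)  # yield once so the translation task begins running
    # ... and send the transcription while the translation is still in progress.
    await send("transcription", transcription, events)
    translation = await translation_task
    # The translation result is sent only after it has been acquired.
    await send("translation", translation, events)
    return events
```

Calling `asyncio.run(handle_utterance("where is the bathroom?"))` returns the event log in order. In this sketch the transcription is logged as sent after the translation has started and before it has finished.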
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7-8, 14-15, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Franz et al. (US 2002/0198713 A1) in view of Boesen (US 2018/0121623 A1), Kent et al. (US 2010/0057436 A1), and Cuthbert et al. (US 2015/0134322 A1).
Regarding Claims 1 and 8, Franz discloses a server (¶66, remote server), comprising: 
a memory, a processor and computer programs stored in the memory and executable by the processor, wherein when the computer programs are executed by the processor, a voice translation device is realized (¶61 and ¶277, speech translation system “STS” comprising a processor, memory, and program instructions), wherein the voice translation device is configured to perform the steps of: 
acquiring voice data input from a terminal, the voice data input comprising voice data to be translated (¶66, access server function remotely from a PDA or cell phone; ¶67, STS accepts spoken language in a source language and performs speech recognition in the source language); 
determining a language type of the voice data input (¶67 and ¶69, STS performs speech recognition in the source language to produce at least one speech recognition hypothesis from coded multiple hypotheses and to output the best hypothesis; per ¶67, optionally allowing the user to confirm the recognized expression or allow user to choose from a sequence of candidate recognitions); 
performing a recognition process on the voice data input using a language model corresponding to the determined language type to acquire first recognition information corresponding to the voice data input (¶104, upon receipt of a speech input 1201, acoustic speech recognition component 1202 uses at least one word pronunciation dictionary 1222 and at least one acoustic model 1224 to generate at least one data structure 1204 encoding hypothesized words where data structure information 1204 is used for utterance hypothesis construction 1206; ¶119, utterance hypothesis construction component uses language model (i.e., data structure information 1204) to construct utterance hypothesis); 
acquiring the voice data to be translated from the first recognition information (¶70 and ¶102, perform matching and transfer recursively on parts of the shallow syntactic representation of the input to construct one or more hypotheses for speech recognition in a speech translation system);
performing a translation process on the voice data to be translated according to a target language type to acquire a translation result corresponding to the voice data to be translated (¶70 and ¶¶102-103, perform source to target language transfer to produce target language syntactic representation).
Franz does not disclose that the voice data input is a single voice data input comprising the voice data to be translated and the target language type of the voice data to be translated, or acquiring the target language type from the first recognition information.
Boesen teaches a language translation engine (¶71, the medical engine 218 may also perform language translation in real-time eliminating the need for a translator for simple or routine conversations) acquiring a single voice input comprising voice data to be translated and a target language type of the voice data to be translated (¶84, receiving a request: “please translate ‘we need to deliver your baby now’ from English into Spanish”), determining a language type of the single voice data input (¶71, the language or speech detected by one or more microphones of the wireless earpieces 202 may be converted into the natural language of the user of the wireless earpieces 202; e.g., for the request “please translate ‘we need to deliver your baby now’ from English into Spanish”, detect the language type as English) to perform a recognition process on the single voice data input corresponding to the determined language type to acquire first recognition information (¶114, the applicable medical engine may utilize automatic speech recognition to transcribe human speech (e.g., commands, questions, dictation, etc.) into text or other formats for subsequent analysis), acquiring the voice data to be translated and the target language type from the first recognition information, and translating the voice data to be translated according to the target language type to acquire a translation result corresponding to the voice data to be translated (¶85, the medical engine may be activated as requested by the user. For example, the request may be converted into a command succeeded by the logic or processor of the wireless earpieces to activate the medical engine; in view of ¶41, ¶59, and ¶84, transcribe the request “please translate ‘we need to deliver your baby now’ from English into Spanish” into a command to translate “we need to deliver your baby now” into Spanish and perform an action to translate “we need to deliver your baby now” from English language type to target language type Spanish).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Franz to acquire a single voice data input comprising voice data to be translated and a target language type of the voice data to be translated, recognize the single voice data input, and acquire the voice data to be translated and the target language type from the recognition in order to detect language or speech for conversion into the natural language of the user (Boesen, ¶71).
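For illustration only, acquiring both the voice data to be translated and the target language type from a single recognized request can be sketched as follows. This is a hypothetical sketch and does not reproduce any code from Franz, Boesen, or the application; the request pattern, the names `TranslationRequest` and `parse_request`, and the assumption that the recognized text carries quotation marks around the phrase to be translated are all assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class TranslationRequest(NamedTuple):
    """Hypothetical container for the parts extracted from one recognized request."""
    text: str             # the phrase to be translated
    source_language: str  # language named after "from"
    target_language: str  # language named after "into" / "to"


# Assumed request shape: translate '<text>' from <source> into <target>
_REQUEST = re.compile(
    r"translate\s+['\"‘“](?P<text>.+?)['\"’”]\s+"
    r"from\s+(?P<src>\w+)\s+(?:in)?to\s+(?P<tgt>\w+)",
    re.IGNORECASE,
)


def parse_request(recognized: str) -> Optional[TranslationRequest]:
    """Return the extracted parts, or None if the text does not match the assumed shape."""
    match = _REQUEST.search(recognized)
    if match is None:
        return None
    return TranslationRequest(match["text"], match["src"], match["tgt"])
```

Applied to the request quoted from Boesen ¶84, `parse_request` yields the phrase “we need to deliver your baby now”, the source language “English”, and the target language “Spanish”. A practical system would more likely use a trained intent parser than a fixed pattern.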
Franz does not disclose performing the translation process on the voice data to be translated based on a translation model corresponding to the target language type after the target language type corresponding to the voice data to be translated is acquired, so as to acquire a translation result corresponding to the voice data to be translated. 
Kent teaches a translation system (Abstract) performing a translation process on an input speech sample in a first language / voice data to be translated based on a translation model corresponding to a target language type after the target language type corresponding to the voice data to be translated is acquired, so as to acquire a translation result corresponding to the voice data to be translated (¶43, input / output language select 144 allows the user to select a source language and a target language; ¶45, SR module 130 uses statistical models 206 to convert input speech 202 into text; ¶46, machine translation module 132 uses statistical models 208 to compute the best possible translation of that text into the target language).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Franz to perform the translation process on the voice data to be translated based on a translation model corresponding to the target language type after the target language type corresponding to the voice data to be translated is acquired in order to allow the user to select the target language (Kent, ¶43).
Franz does not disclose sending the first recognition information to the terminal while performing the translation process on the first recognition information, and after the translation result is acquired, sending the translation result to the terminal.
Cuthbert teaches a client terminal-server transcription and translation configuration (¶30, device 10 requires a server speech recognition program and ¶31, device 10 requires a server translation program) where the server acquires voice data input from the client terminal (¶30, user device 10 sends audio data to a speech recognition program at a server via a network) and performs a recognition process to acquire first recognition information (¶30, the server speech recognition program performs speech recognition on the primary user’s utterance based on the language identifier associated with the audio data) in order to perform a translation process on voice data to be translated acquired from the first recognition information to generate a translation result (¶32, the language translation application on user device 10 per ¶17 uses a translation program at a server to translate a transcription of the primary user’s utterance into text from a first language into a second language while the primary user is speaking); and 
the server sends the first recognition information to the client terminal (¶30, the server speech recognition program performs speech recognition on the primary user’s utterance and the server may then transmit a transcription of the primary user’s utterance back to user device 10; see Figs. 4 and 5) while performing the translation process on the first recognition result (¶33, the language translation program reads the text file output by the speech recognizer and uses this text file to generate a text file for a pre-specified target language; i.e., once the English transcription file is generated, the server transmits the corresponding English transcription file back to user device 10, translates the English transcription file into a Spanish translation file, and then transmits the text translation back to user device 10 for display per ¶30 and ¶32; Fig. 4 shows a time when the server sent recognition information “where is…” and translation result “Donde esta…” to user device 10, while Fig. 5 shows a time when the server sent recognition information “where is the bathroom?” and translation result “Donde esta el bano?” to user device 10; Figs. 4 and 5 collectively show that, across the time of Fig. 4 and the time of Fig. 5, translation from “where is the bathroom” into “Donde esta el bano” was performed while the primary user was speaking “where is the bathroom?”, a period that covers the time when the server sent recognition information “where is…” to user device 10 at the time of Fig. 4 and recognition information “where is the bathroom” at the time of Fig. 5), and after the translation result is acquired, sending the translation result to the terminal (¶32, the server may then transmit the text translation back to the user device 10 for display; see Fig. 5, “Donde esta el bano”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Franz to send the first recognition information to the terminal while performing the translation process on the first recognition information in order to properly sequence the input of speaker utterance and output of recognition results and translation results through the user interface in situations involving users speaking different languages (Cuthbert, ¶3).
Regarding Claims 7 and 14, Franz discloses wherein after the acquiring the translation result corresponding to the voice data to be translated, the method further comprises: sending the first recognition information and the translation result to the terminal (¶66, a remote server hosts the STS system and the user may dial the STS translation service from a PDA or cell phone; ¶67, after the STS performs speech recognition in the source language, optionally allowing the user to confirm the recognized expression would require transmitting the recognition information to the PDA or cell phone).
Regarding Claim 15, Franz discloses a non-transitory computer readable storage medium, having computer programs stored thereon, wherein when the computer programs are executed by a processor, a voice translation method of claims 1 and 8 is realized (¶61 and ¶277, speech translation system “STS” comprising a processor, memory, and program instructions).
Regarding Claim 22, Franz discloses recognizing an intention of the first recognition information to determine a translation intention corresponding to the first recognition information (¶71, the STS analyzes the input, determines the meaning of the input, and renders that meaning in the appropriate way in a target language), wherein different translation intentions correspond to different translation models (¶73-74, combining syntactic analysis with analogical or statistical transfer to produce high quality translation in different domains; see for example, ¶82, parse the input “I want to make a reservation for three people for tomorrow evening at seven o’clock” to identify syntactic constituents / parse tree; ¶85, the domain independent syntactic analysis is combined with the domain dependent translation example database described in ¶73), and wherein translation results corresponding to the same recognition information are different depending on different translation intentions (¶96 and ¶100-101, perform an initial fast match to quickly check the compatibility of the input parse tree with a domain specific example database to rule out unlikely examples where the fast match is performed based on syntactic head of the constituents to be matched while constrained to equality or to a thesaurus based measure of close semantic similarity); 
determining a translation model corresponding to the determined translation intention according to the determined translation intention corresponding to the first identification information (¶97-99, after initial fast match, perform best match to find the best match from the example database given an input); and 
performing the translation process on the voice data to be translated according to the determined translation model to acquire the translation result corresponding to the voice data to be translated (in view of ¶19, match the input to source expressions of example pairs in the example database, find the most appropriate examples, take the target expression from best matching examples and construct an expression in the target language).
Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Franz et al. (US 2002/0198713 A1) in view of Boesen (US 2018/0121623 A1), Kent et al. (US 2010/0057436 A1), and Cuthbert et al. (US 2015/0134322 A1) as applied to claims 1, 8, and 15, and further in view of Chun (US 2011/0218804 A1).
Regarding Claims 2, 9, and 16, Franz does not disclose wherein determining the language type of the single voice data input comprises: determining a feature vector of the single voice data input; and determining the language type of the voice data based on a match degree between the feature vector and a preset language type model.  
Chun discloses a server (¶66 and Fig. 1, device 1 receiving audio data from a remote location over a network) determining a language type of voice data input acquired from a terminal (¶66, receiving audio data from a remote location; ¶78, ¶81 and ¶141, determining a likelihood of a sequence of observations / vectors representing audio occurs in a given language) wherein determining the language type of the voice data input comprises: 
determining a feature vector of the voice data input (¶78 and ¶141, speech signals are converted into an input vector in n-dimensional acoustic space); and 
determining the language type of the acquired voice data based on a match degree between the feature vector and a preset language type model (¶81 and ¶141, the likelihood of the sequence of observations occurring in a given language is evaluated using the language model). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Franz to determine the language type by matching a feature vector of the single voice data input with a preset language type model in order to output the sequence of words into a translation system where it is translated into a second language (Chun, ¶141).
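For illustration only, determining a language type from a match degree between a feature vector and preset language type models can be sketched as follows. This is a hypothetical sketch and is not the method of Chun, which evaluates a likelihood using a language model; cosine similarity is substituted here as a simple match degree. The reference vectors, the threshold, and the function names are assumptions made for the example.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Match degree between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_language(
    feature_vector: Sequence[float],
    language_models: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # assumed minimum match degree
) -> Optional[str]:
    """Return the language whose preset model best matches, or None if none exceeds the threshold."""
    best_language: Optional[str] = None
    best_score = threshold
    for language, reference in language_models.items():
        score = cosine_similarity(feature_vector, reference)
        if score > best_score:
            best_language, best_score = language, score
    return best_language
```

With hypothetical preset models `{"en": [1.0, 0.0, 0.2], "es": [0.0, 1.0, 0.3]}`, the feature vector `[0.9, 0.1, 0.2]` matches the "en" model most closely, and an all-zero vector matches no model.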
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Franz et al. (US 2002/0198713 A1) in view of Boesen (US 2018/0121623 A1), Kent et al. (US 2010/0057436 A1), and Cuthbert et al. (US 2015/0134322 A1) as applied to Claim 1, and further in view of Choi (US 2005/0182628 A1).
Regarding Claim 21, Franz discloses wherein before the performing the translation process on the voice data to be translated, the method further comprises: 
performing a post-process on the first recognition information to generate second recognition information (¶83 and ¶96, process incomplete or imperfectly grammatical natural human speech by performing morphological analysis to re-arrange syntactic constituents to generate a final feature structure like Fig. 7, “I want to make a reservation for three people for tomorrow morning”, by inserting, deleting, or joining parts of the syntactic representation) and performing the translation process on the voice data to be translated comprises performing the translation process on the second recognition information (¶94-97, since natural human speech is not perfectly complete and grammatical, perform an optimization procedure to insert, delete, or join parts of the syntactic representation and perform matching with the appropriate domain specific example database). 
Franz does not disclose wherein the post-process comprises correction based on hot words.
Choi discloses a domain based speech recognition apparatus performing a first speech recognition on speech input / voice data to be translated to generate first recognition information (Abstract and ¶46-48, using a first acoustic model and a first language model to recognize Korean language based speech input to generate a first recognition result in the Korean equivalent of “what time is the temperature now?”) and performing a post-process on the first recognition information to generate second recognition information by correcting the first recognition information via a correction based on hot words (¶49-51, determine a domain keyword “temperature” to select a proper candidate domain; ¶53, apply a second acoustic model and a second language model to generate the second recognition sentence “what is the temperature now?”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Franz to perform a post-process on the first recognition information comprising a correction based on hot words as taught by Choi in order to minimize misrecognition of a word in a final recognition result (Choi, Abstract).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu, whose telephone number is 571-270-1587, or to the examiner’s supervisor, King Poon, whose telephone number is 571-272-7440. Examiner Richard Zhu can normally be reached M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHARD Z ZHU/
Primary Examiner, Art Unit 2675
09/22/2022