Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Acknowledgement  
Acknowledgement is made of applicant’s amendment made on 09/20/2021. Applicant’s submission has been entered and made of record.
Status of the Claims
Claims 1-2, 7-9, 14-16, and 21-22 are pending. 
Response to Applicant’s Argument
In response to “The translation voice command of Daigle is a discrete voice input received by the system separate from the voice data to be translated (i.e., the message). For example, the system may prompt the user whether to translate the message to the recipient's preferred language” and “In contrast, Daigle describes that a message is first created by the user in text (i.e., text to be translated), and a subsequent voice command contains a target language type for the translation. Accordingly, the message and the target language type of Daigle are not included in the same voice data acquired by the terminal, as required by claim 1”.
Claim 1 recites “acquiring voice data from a terminal, the acquired voice data comprising voice data to be translated and a target language type of the voice data to be translated”. Here, claim 1 requires a terminal to send voice data comprising two components: (1) voice data to be translated and (2) a target language type of the voice data to be translated.
Daigle teaches “Additionally, while the following description and accompanying drawing specifically describe translation of instant messaging text, it will be clear to one of ordinary skill in the art that the systems and methods presented herein may be extended to translating other messaging protocols such as voice-over Internet protocol (VoIP)” (Col 3, Rows 10-15). In other words, any one of the client devices with respective translation logic (Col 4, Rows 50-60) that is configured to transmit a voice message over VoIP to the other client device meets limitation (1) “acquiring voice data from a terminal, the acquired voice data comprising voice data to be translated”.
Daigle further teaches “The system may also respond to voice commands. If the user sender creates a message for a recipient and this recipient does not have a language profile established, the user sender can speak a command "please translate message to Spanish before sending". The system responds by taking the message and translating it so that it appears in Spanish on the recipient side” (Col 13, Rows 49-55). In other words, the client device / terminal sending the voice message over VoIP further sends a voice command comprising (2) a target language type of the voice data to be translated.
In response to “Second, claim 1 recites that first recognition information is acquired after performing a recognition process on the acquired voice data, and the target language type is obtained from the first recognition information corresponding to the acquired voice data. In contrast, Daigle discloses that the target language type is obtained from the user's subsequent translation voice command. The user's voice command (and any recognition thereof) does not contain the voice data to be translated. Accordingly, the target language type of Daigle is not obtained from the first recognition information, as required by claim 1”.
The limitation “acquiring voice data from a terminal, the acquired voice data comprising voice data to be translated and a target language type of the voice data to be translated” does not impose any temporal limitation on “voice data”. In other words, even if the voice command “please translate message to Spanish before sending” is a voice command subsequent to the voice message sent over VoIP, this teaching still meets the limitation “acquiring voice data from a terminal, the acquired voice data comprising… a target language type of the voice data to be translated”, and the teaching “The system responds by taking the message and translating it so that it appears in Spanish on the recipient side” (Col 13, Rows 49-55) meets the limitation “the target language type is obtained from the first recognition information corresponding to the acquired voice data”.
In response to “Third, claim 1 recites that voice data to be translated is acquired from the first recognition information, and a translation process is performed on the voice data to be translated to acquire a translation. In contrast, Daigle discloses that the message to be translated is acquired from a text message in an instant messaging system. Accordingly, Daigle does not disclose that a recognition process is used to acquire the voice data to be translated and the target language type, as required by claim 1”. 
As previously noted, “it will be clear to one of ordinary skill in the art that the systems and methods presented herein may be extended to translating other messaging protocols such as voice-over Internet protocol (VoIP)” (Col 3, Rows 10-15). Therefore, Daigle teaches a voice message consistent with VoIP, which the established function of Franz would recognize and translate to obtain a predictable result.
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7-8, 14-15, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Franz et al. (US 2002/0198713 A1) in view of Daigle et al. (US 8027438 B1).
Regarding Claims 1 and 8, Franz discloses a server (¶66, remote server), comprising: 
a memory, a processor and computer programs stored in the memory and executable by the processor, wherein when the computer programs are executed by the processor, a voice translation device is realized (¶61 and ¶277, speech translation system “STS” comprising a processor, memory, and program instruction), wherein the voice translation device is configured to perform the steps of: 
acquiring voice data from a terminal, the acquired voice data comprising voice data to be translated (¶66, access server function remotely from a PDA or cell phone; ¶67, STS accepts spoken language in a source language and performs speech recognition in the source language); 
determining a language type of the acquired voice data (¶67 and ¶69, STS performs speech recognition in the source language to produce at least one speech recognition hypothesis from coded multiple hypotheses and to output the best hypothesis; per ¶67, optionally allowing the user to confirm the recognized expression or allowing the user to choose from a sequence of candidate recognitions); 
performing a recognition process on the acquired voice data using a language model corresponding to the determined language type to acquire first recognition information corresponding to the acquired voice data (¶104, upon receipt of a speech input 1201, acoustic speech recognition component 1202 uses at least one word pronunciation dictionary 1222 and at least one acoustic model 1224 to generate at least one data structure 1204 encoding hypothesized words where data structure information 1204 is used for utterance hypothesis construction 1206; ¶119, utterance hypothesis construction component uses language model (i.e., data structure information 1204) to construct utterance hypothesis); 
acquiring the voice data to be translated from the first recognition information (¶70 and ¶102, perform matching and transfer recursively on parts of the shallow syntactic representation of the input to construct one or more hypotheses for speech recognition in a speech translation system); and
performing a translation process on the voice data to be translated according to the target language type to acquire a translation result corresponding to the voice data to be translated (¶70 and ¶102-103, perform source to target language transfer to produce target language syntactic representation).
Franz does not disclose that the acquired voice data further comprises a target language type of the voice data to be translated, or acquiring the target language type from the first recognition information. 
Daigle teaches a translation system for translating messages (Abstract) that responds to a voice command comprising a target language type of the messages to be translated and acquires the target language type from speech recognition information resulting from speech recognition processing of the voice command (Col 13, Rows 49-55, system responds to voice command “please translate message to Spanish before sending”).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to modify Franz to respond to voice commands by performing speech recognition (Franz, ¶67) on voice data comprising a target language type in order to allow the user to select a preferred source language – target language pair (Franz, ¶110; Daigle, Col 13, Rows 20-24 and Rows 49-53, responding to voice commands for selecting a target language type).
Regarding Claims 7 and 14, Franz discloses wherein after the acquiring the translation result corresponding to the voice data to be translated, the method further comprises: sending the first recognition information and the translation result to the terminal (¶66, a remote server hosts the STS system and the user may dial the STS translation service from a PDA or cell phone; ¶67, after STS performs speech recognition in the source language, optionally allowing the user to confirm the recognized expression, which would require transmitting the recognition information to the PDA or cell phone).
Regarding Claim 15, Franz discloses a non-transitory computer readable storage medium, having computer programs stored thereon, wherein when the computer programs are executed by a processor, a voice translation method of claims 1 and 8 is realized (¶61 and ¶277, speech translation system “STS” comprising a processor, memory, and program instruction).
Regarding Claim 22, Franz discloses recognizing an intention of the first recognition information to determine a translation intention corresponding to the first recognition information (¶71, the STS analyzes the input, determines the meaning of the input, and renders that meaning in the appropriate way in a target language), wherein different translation intentions correspond to different translation models (¶73-74, combining syntactic analysis with analogical or statistical transfer to produce high quality translation in different domains; see for example, ¶82, parse the input “I want to make a reservation for three people for tomorrow evening at seven o’clock” to identify syntactic constituents / parse tree; ¶85, the domain independent syntactic analysis is combined with domain dependent translation example database described in ¶73), and wherein translation results corresponding to the same recognition information are different depending on different translation intentions (¶96 and ¶100-101, perform an initial fast match to quickly check the compatibility of the input parse tree with a domain specific example database to rule out unlikely examples where the fast match is performed based on syntactic head of the constituents to be matched while constrained to equality or to a thesaurus based measure of close semantic similarity); 
¶97-99, after initial fast match, perform best match to find the best match from the example database given an input); and 
performing the translation process on the voice data to be translated according to the determined translation model to acquire the translation result corresponding to the voice data to be translated (in view of ¶19, match the input to source expressions of example pairs in the example database, find the most appropriate examples, take the target expression from best matching examples and construct an expression in the target language).
Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Franz et al. (US 2002/0198713 A1) and Daigle et al. (US 8027438 B1) as applied to claims 1, 8, and 15, in view of Chun (US 2011/0218804 A1).
Regarding Claims 2, 9, and 16, Franz does not disclose wherein determining the language type of the acquired voice data comprises: determining a feature vector of the acquired voice data; and determining the language type of the voice data based on a match degree between the feature vector and a preset language type model.  
Chun discloses a server (¶66 and Fig. 1, device 1 receiving audio data from a remote location over a network) determining a language type of voice data acquired from a terminal (¶66, receiving audio data from a remote location; ¶78, ¶81 and ¶141, determining a likelihood that a sequence of observations / vectors representing audio occurs in a given language) wherein determining the language type of the acquired voice data comprises: 
determining a feature vector of the acquired voice data (¶78 and ¶141, speech signals are converted into an input vector in n-dimensional acoustic space); and 
determining the language type of the acquired voice data based on a match degree between the feature vector and a preset language type model (¶81 and ¶141, the likelihood of the sequence of observations occurring in a given language is evaluated using the language model). 
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to modify Franz to determine the target language type by matching a feature vector of the acquired voice data with a preset language type model in order to output the sequence of words into a translation system where it is translated into a second language (Chun, ¶141).
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Franz et al. (US 2002/0198713 A1) in view of Daigle et al. (US 8027438 B1) as applied to Claim 1, and further in view of Choi (US 2005/0182628 A1).
Regarding Claim 21, Franz discloses wherein before the performing the translation process on the voice data to be translated, the method further comprises: 
performing a post-process on the first recognition information to generate second recognition information (¶83 and ¶96, process incomplete or imperfectly grammatical natural human speech by performing morphological analysis to re-arrange syntactic constituents to generate a final feature structure like Fig. 7, “I want to make a reservation for three people for tomorrow morning” by rearranging syntactic features through insert, delete, or join parts of syntactic representation) and performing the ¶94-97, since natural human speech is not perfectly complete and grammatical, perform optimization procedure to insert, delete or join parts of the syntactic representation and perform matching with the appropriate domain specific example database). 
Franz does not disclose wherein before performing the translation process on the voice data to be translated, performing a post-process on the first recognition information to generate second recognition information, wherein the post-process comprises correction based on hot words.
Choi discloses a domain based speech recognition apparatus performing a first speech recognition on speech input / voice data to be translated to generate first recognition information (Abstract and ¶46-48, using first acoustic model and first language model to recognize Korean language based speech input to generate first recognition result in Korean equivalent of “what time is the temperature now?”) and performing a post-process on the first recognition information to generate second recognition information by correcting the first recognition information via a correction based on hot words (¶49-51, determine a domain keyword “temperature” to select a proper candidate domain; ¶53, apply second acoustic model and second language model to generate second recognition sentence “what is the temperature now?”).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to modify Franz to perform a post-process on the first recognition information as taught by Choi in order to minimize misrecognition of a word in a final recognition result (Choi, Abstract).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu whose telephone number is 571-270-1587 or examiner’s supervisor King Y. Poon whose telephone number is 571-272-7440. Examiner Richard Zhu can normally be reached on M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access 
/RICHARD Z ZHU/
Primary Examiner, Art Unit 2675
09/30/2021