DETAILED ACTION

Introduction
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments/Arguments
2.	With respect to Claim 1, Applicant argued on pages 2-3 of the Remarks that “the Office Action alleges a skilled person would modify Abkairov with Chao '724 to display an alternative ASR result in a second language to provide multiple ASR results in different language. But this amounts to the conclusory (and circular) allegation that a skilled person would modify a reference to include a feature for the reason of including the feature. In particular, the alleged reasoning for modifying Abkairov with Chao to include “multiple speech recognition models for multiple different languages” merely states the result of the proposed combination, e.g., “generating the multiple textual representations, wherein each of the multiple different textual representation is corresponding to a specific language.” That is, modifying any reference to include “multiple speech recognition models for multiple different languages” results in “generating the multiple textual representations, wherein each of the multiple different textual representation is corresponding to a specific language.” Accordingly, adopting the Examiner's alleged reasoning for combination leads to the nonsensical conclusion that a skilled person would combine Chao '724 with any other reference. Applicant thus respectfully submits that the Office Action provides insufficient reasoning to support the alleged conclusion of obviousness. According to In re Kahn, 441 F. 3d 977, 988, 78 USPQ2d 1329, 1336 (Fed. Cir. 2006), “rejections on obviousness cannot be sustained by mere conclusory statements; instead, there must be some articulated reasoning with some rational underpinning to support the legal conclusion of obviousness.” Here, the Office has failed to meet the burden of providing “articulated reasoning with some rational underpinning” to support the combination of Abkairov and Chao '724.”
 	In response, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007).  In this case, Abkairov discloses a method for converting a received natural language speech input into multiple candidate transcriptions, displaying a first transcription to the user, and displaying a list of two or more alternative transcriptions in response to receiving a rejection of the first transcription from the user. Chao utilizes multiple speech recognition models for multiple different languages in order to convert the received natural language speech into multiple different transcriptions in multiple different languages ([0046]). The suggestion to combine the references is to allow the system to be used by multilingual users and/or households while avoiding excess usage of computational and/or network resources when the user provides input in an alternate language, as taught by Chao (para. [0015]).

 	Applicant argued on page 3 of the Remarks that “not only does the Office Action fail to provide adequate reasoning to combine the references to allegedly satisfy claim 1, a skilled person would find no reason for the combination. In particular, modifying Abkairov to add the feature of using multiple ASR models in different languages and presenting the transcription in multiple languages would require extra processing power and clutter the UI with unnecessary audio transcriptions. In other words, the proposed modification of Abkairov “amount[s] to extra work and greater expense for no apparent reason.” (MPEP 2143 and see In re Omeprazole Patent Litigation, 536 F. 3d 1361, 87 USPQ2d 1865 (Fed. Cir. 2008).) Thus, a person of ordinary skill in the art would not be motivated to combine Abkairov with Chao '724 for the sole purpose of satisfying “wherein... the second recognition result being in a second language,” as recited in claim 1.”
	In response, Examiner respectfully notes that Abkairov discloses providing first and second recognition results in a first language. Chao discloses providing first and second recognition results in multiple languages: to account for multilingual users and households, multiple recognition results are provided in multiple languages to avoid unsatisfactory situations where an alternate-language result is required but only a single speech recognition result is provided (Chao, para. [0015]). As provided in Chao (para. [0015]) and contrary to Applicant’s 
Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

4.	Claims 1-4, 6-9, 12, 13, 15, 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Abkairov (US 2017/0085696 A1) in view of Chao (US 2019/0318724 A1).

 	With respect to Claim 1, Abkairov discloses
	A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device (Abkairov [0067] In the case of a software implementation, the module, functionality, or logic represents program code that performs specified tasks when executed on a processor (e.g. CPU or CPUs). The program code can be stored in one or more computer readable memory devices. The features of the techniques described below are platform-independent, meaning that the techniques may be implemented on a variety of commercial computing platforms having a variety of processors), cause the electronic device to: 
 	cause a first recognition result for a received natural language speech input to be displayed (Abkairov [0031] to provide a transcription of each of one or more phrases of speech, [0033] a second area 304 showing a suggested transcription of a phrase in a video message that the sending (near-end) user is about to send, Here the second area 304 is shown as a plain text box, but it will be appreciated this is just schematic and various other design options are possible. For example the suggested transcription may be overlaid over a preview of the video about to be sent, and/or may take other shapes. E.g. in embodiments the user interface element 304 showing the transcribed phrase may take the form of a speech bubble displayed wholly or partially overlaid over the video image 302, or otherwise in association with it (e.g. below), Fig. 3 element 304), wherein: 
 	 	the first recognition result is in a first language (Abkairov Fig. 3 element 304 Can you give me an egg sample); and 
 	 	a second recognition result for the received natural language speech input is available for display responsive to receiving input indicative of user selection of the first recognition result (Abkairov [0034] The user interface 204, 304 is thus configured, under control of the communication client application 206, to give the sending user of the near-end user terminal 102a the chance to review the transcription before sending to the far-end user(s) of the one or more far-end terminals 102b-d as part of the video messaging conversation. Further, it is also configured, again under control of the communication client application 206, to enable the sending user of the near-end user terminal 102a to either accept or reject the transcription using a quick gesture performed on the surface of the touch screen 204, [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions), 
	receive the input indicative of user selection of the first recognition result (Abkairov [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions); and 
 	in response to receiving the input indicative of user selection of the first recognition result, cause the second recognition result to be displayed (Abkairov [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions, optionally listed together with their respective estimated probabilities as estimated by the transcription module (again see above). The sending user can then select an alternative option from the list by touching that option in the presented list on the touchscreen 204 (or perhaps also the option of rejecting all suggestions and abandoning sending, or re-speaking) E.g. in the example shown, the sending user actually said "Can you give me an example?", but the transcription module 205 output "Can you give me an egg sample". In this case tapping the incorrect transcription 304 may bring up a list of alternatives on the touch screen 204 such as: [0043] ["Can you give me an egg sample?" 33%], [0044] "Can you give me an egg sandwich?" 29%, [0045] "Can you give me an example?" 27%, [0046] "Canyon grove means are ample" 5%, [0047] "Can you grieve mayonnaise ex maple?" 2 %.)
	Abkairov fails to explicitly teach 
 	the second recognition result being in a second language;
	However, Chao et al. teach
 	the second recognition result being in a second language (Chao et al. [0046] Multiple speech recognition models 136 for multiple different languages can be utilized in processing of audio data to generate multiple candidate semantic and/or textual representations (e.g., each corresponding to a different language));
	Abkairov and Chao et al. are analogous art because they are from a similar field of endeavor, namely signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of displaying one or more alternative transcriptions to the user in response to the user’s rejection, using the teaching of multiple speech recognition models for multiple different languages as taught by Chao et al., for the benefit of generating multiple different textual representations, each corresponding to a specific language (Chao et al. [0046] Multiple speech recognition models 136 for multiple different languages can be utilized in processing of audio data to generate multiple candidate semantic and/or textual representations (e.g., each corresponding to a different language)), and also to allow the system (Abkairov’s system) to be used by multilingual users and/or households while avoiding excess usage of computational and/or network resources when the user provides input in an alternate language (Chao et al. [0015]).
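
	For illustration only, the behavior recited in Claim 1 as mapped above can be sketched as follows. This sketch is not taken from Abkairov or Chao et al.; the names RecognitionResult, TranscriptionView, and on_first_result_selected are hypothetical, and the candidate strings are placeholders rather than output of any actual recognizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecognitionResult:
    """One candidate transcription and the language of the model that produced it."""
    text: str
    language: str


class TranscriptionView:
    """Hypothetical display state: shows the first result, reveals a second on selection."""

    def __init__(self, candidates: List[RecognitionResult]) -> None:
        if not candidates:
            raise ValueError("at least one recognition result is required")
        self._candidates = list(candidates)
        # Initially only the first recognition result is displayed.
        self.displayed: List[RecognitionResult] = [self._candidates[0]]

    def on_first_result_selected(self) -> List[RecognitionResult]:
        """On user selection (e.g. a tap), also display the first candidate
        whose language differs from that of the first result, if any."""
        first = self._candidates[0]
        for candidate in self._candidates[1:]:
            if candidate.language != first.language:
                if candidate not in self.displayed:
                    self.displayed.append(candidate)
                break
        return self.displayed
```

	In this sketch the second-language candidate is computed before the selection and merely held back from display, which corresponds to the second recognition result being "available for display" prior to the user input.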

	With respect to Claim 2, Abkairov in view of Chao et al. teach 
 	wherein: the electronic device includes a display; 
 	causing the first recognition result to be displayed includes displaying the first recognition result at a first location on the display (Abkairov Fig. 3 element 304, [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions, optionally listed together with their respective estimated probabilities as estimated by the transcription module); and 
 	the input indicative of user selection of the first recognition result includes a user gesture at the first location (Abkairov [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions, optionally listed together with their respective estimated probabilities as estimated by the transcription module.)

 	With respect to Claim 3, Abkairov in view of Chao et al. teach 
 	wherein the second recognition result is available for display without receiving natural language speech input other than the received natural language speech input (Abkairov Fig. 3 element 304, [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions, optionally listed together with their respective estimated probabilities as estimated by the transcription module. The Examiner notes that the user taps the suggested transcription to reject it. The rejection summons one or more alternative transcriptions. The user uses a touch screen to tap, the user does not speak any command to reject the suggested transcription.)

 	With respect to Claim 4, Abkairov in view of Chao et al. teach
 	wherein causing the second recognition result to be displayed includes causing the second recognition result to be displayed adjacent to the displayed first recognition result (Abkairov [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions, optionally listed together with their respective estimated probabilities as estimated by the transcription module (again see above). The sending user can then select an alternative option from the list by touching that option in the presented list on the touchscreen 204 (or perhaps also the option of rejecting all suggestions and abandoning sending, or re-speaking) E.g. in the example shown, the sending user actually said "Can you give me an example?", but the transcription module 205 output "Can you give me an egg sample". In this case tapping the incorrect transcription 304 may bring up a list of alternatives on the touch screen 204 such as: [0043] ["Can you give me an egg sample?" 33%], [0044] "Can you give me an egg sandwich?" 29%, [0045] "Can you give me an example?" 27%, [0046] "Canyon grove means are ample" 5%, [0047] "Can you grieve mayonnaise ex maple?" 2 %.)
 
With respect to Claim 6, Abkairov in view of Chao et al. teach 
 	wherein the one or more programs further comprise instructions, which when executed by the one or more processors, cause the electronic device to: 
 	receive the natural language speech input (Chao et al. [0014] audio data can be generated based on detection of a spoken utterance of a user via one or more microphones of a client device, [0045] process audio data received at an assistant interface 126 to determine text and/or other semantic representation(s) of a spoken utterance embodied in the audio data); 
 		process the natural language speech input using a language recognizer for the first language, including determining, based on the natural language speech input, the first recognition result (Chao et al. [0027] When the interaction characteristic occurs, the audio data received relative to the interaction characteristic can be processed through multiple different speech recognition models corresponding to multiple different languages. Text or phonemes resulting from the processing can be analyzed to determine a language that the text or phonemes most likely corresponds to. For instance, textual data or phoneme data can be generated from each of the models, and percentage similarities for the languages can be provided. A speech recognition model corresponding to a language that has the highest percentage similarity for the text or phonemes generated can be activated, [0045] The speech recognition engine 134 can utilize one or more speech recognition model 136 in determining text and/or other semantic representation of a spoken utterance embodied in audio data. As described herein, multiple speech recognition models 136 can be provided, and each speech recognition model can be for a corresponding language. For example, a first speech recognition model can be for English, a second speech recognition model can be for French, etc. Further, as described herein, which of multiple speech recognition models 136 is utilized in processing of audio data can be based on, for example, information contained in a user profile determined to correspond to the audio data being processed. For example, a given user profile can be determined to correspond to audio data being processed based on matching voice features of the audio data to voice feature associated with the profile);  and 
When the interaction characteristic occurs, the audio data received relative to the interaction characteristic can be processed through multiple different speech recognition models corresponding to multiple different languages, [0069] In response, the automated assistant can cause multiple models (e.g., an "English" speech recognition model and a "Chinese" speech recognition model) associated with the user profile to process any subsequent spoken utterance from the user, in order to determine whether the user 202 has switched a language that are speaking in. For instance, the subsequent user dialog 204 of "Sh ok[hacek over (a)]o" can be converted to audio data and processed through an "English" speech recognition model and a "Chinese" speech recognition model. The output from each model can include text and/or phonemes, which can be processed to determine a likelihood that the user is speaking English or Chinese): 
 	 	 	process the natural language speech input using a language recognizer for the second language, including determining, based on the natural language speech input, the second recognition result (Chao et al. [0027] When the interaction characteristic occurs, the audio data received relative to the interaction characteristic can be processed through multiple different speech recognition models corresponding to multiple different languages. Text or phonemes resulting from the processing can be analyzed to determine a language that the text or phonemes most likely corresponds to. For instance, textual data or phoneme data can be generated from each of the models, and percentage similarities for the languages can be provided. A speech recognition model corresponding to a language that has the highest percentage similarity for the text or phonemes generated can be activated.)
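
	For illustration only, processing the same speech input through one recognizer per language and comparing the per-language scores, in the manner described in the cited passage of Chao et al. [0027], can be sketched as follows. The function recognize_all and the Recognizer type are hypothetical and do not appear in either reference; each recognizer is assumed to return a text together with a similarity score between 0 and 1.

```python
from typing import Callable, Dict, Tuple

# Hypothetical recognizer signature: audio bytes in, (text, similarity score) out.
Recognizer = Callable[[bytes], Tuple[str, float]]


def recognize_all(
    audio: bytes, recognizers: Dict[str, Recognizer]
) -> Tuple[Dict[str, Tuple[str, float]], str]:
    """Run every per-language recognizer on the same audio and report
    all outputs plus the language with the highest similarity score."""
    if not recognizers:
        raise ValueError("at least one recognizer is required")
    # The same audio is processed once per language-specific recognizer.
    outputs = {language: recognize(audio) for language, recognize in recognizers.items()}
    # The language whose output scored highest is treated as the most likely one.
    best_language = max(outputs, key=lambda language: outputs[language][1])
    return outputs, best_language
```

	Because every recognizer's output is retained, a lower-scoring result in another language remains available for later display rather than being discarded.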
 
	With respect to Claim 7, Abkairov in view of Chao et al. teach 
 	wherein the one or more programs further comprise instructions, which when executed by the one or more processors, cause the electronic device to: 
 	determine a first likelihood that the natural language speech input is the first language (Chao et al. [0017] The probability metrics can be based on past usage of the multiple candidate languages, and each probability metric can correspond to one or more interaction characteristics (e.g., each based on an instant interaction between the user and the automated assistant(s)), [0018] a single particular language, of multiple languages assigned to the user profile, can have an assigned probability metric, for one or more interaction characteristics or parameters (e.g., a duration of a response from the user, a length of a delay in responding to the automated assistant, an anticipated type of input or type of speech to be provided to the automated assistant), where the probability metric indicates a very high likelihood of the single particular language being spoken by the given user);
 	determine a second likelihood that the natural language speech input is the second language (Chao et al. [0017] The probability metrics can be based on past usage of the multiple candidate languages, and each probability metric can correspond to one or more interaction characteristics (e.g., each based on an instant interaction between the user and the automated assistant(s)), [0018] a single particular language, of multiple languages assigned to the user profile, can have an assigned probability metric, for one or more interaction characteristics or parameters (e.g., a duration of a response from the user, a length of a delay in responding to the automated assistant, an anticipated type of input or type of speech to be provided to the automated assistant), where the probability metric indicates a very high likelihood of the single particular language being spoken by the given user); and
 	determine whether the first likelihood or the second likelihood exceeds a threshold (Chao et al., [0019] As another particular example, two particular languages, of three or more candidate languages assigned to the user profile, can have corresponding assigned probability metrics, for one or more interaction characteristics, where the probability metrics each indicate at least a likelihood of a corresponding one of the two particular languages being spoken by the given user. Based on the assigned probability metrics, the two particular languages can be selected, and speech recognition of the given spoken utterance performed using only speech recognition models for the two particular languages. The other candidate language(s) may not be selected for speech recognition based on their corresponding assigned probability metrics, for the one or more current contextual parameters, failing to satisfy a threshold.)
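
	For illustration only, the threshold comparison mapped above to Chao et al. [0019] can be sketched as follows. The function name languages_exceeding_threshold and the numeric values in the usage note are hypothetical; the strict "greater than" comparison is an assumption, since the cited passage refers only to probability metrics "failing to satisfy a threshold."

```python
from typing import Dict, List


def languages_exceeding_threshold(
    likelihoods: Dict[str, float], threshold: float
) -> List[str]:
    """Return the candidate languages whose likelihood exceeds the threshold,
    in the order in which they were supplied."""
    # Assumption: "exceeds" is modeled as a strict comparison.
    return [language for language, p in likelihoods.items() if p > threshold]
```

	For example, with hypothetical likelihoods of 0.70 for English, 0.25 for Chinese, and 0.05 for French and a threshold of 0.20, only English and Chinese would be selected for speech recognition.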

 	With respect to Claim 8, Abkairov in view of Chao et al. teach
 	wherein: 
 	determining the first likelihood includes determining the first likelihood while processing the natural language speech input using the language recognizer for the first language (Chao et al. [0027] When the interaction characteristic occurs, the audio data received relative to the interaction characteristic can be processed through multiple different speech recognition models corresponding to multiple different languages. Text or phonemes resulting from the processing can be analyzed to determine a language that the text or phonemes most likely corresponds to. For instance, textual data or phoneme data can be generated from each of the models, and percentage similarities for the languages can be provided); and
 	determining the second likelihood includes determining the second likelihood while processing the natural language speech input using the language recognizer for the first language (Chao et al. [0027] When the interaction characteristic occurs, the audio data received relative to the interaction characteristic can be processed through multiple different speech recognition models corresponding to multiple different languages. Text or phonemes resulting from the processing can be analyzed to determine a language that the text or phonemes most likely corresponds to. For instance, textual data or phoneme data can be generated from each of the models, and percentage similarities for the languages can be provided. The Examiner notes that Chao et al. process the audio data through multiple speech recognition models, each of which can be for a corresponding language. The percentage similarities for the languages can be determined based on the textual data or phoneme data generated from each of the models. It is construed that the speech recognition model for the first language is used to determine the first likelihood of the audio data for the first language, and the speech recognition model for the second language is used to determine the second likelihood of the audio data for the second language. Because the audio data is processed through multiple speech models, it is implied that the first speech model is processing the audio data while the second speech model is processing the audio data.)
 
 	With respect to Claim 9, Abkairov in view of Chao et al. teach
 	wherein causing the first recognition result to be displayed is performed in accordance with a determination that the first likelihood exceeds the threshold and in accordance with a determination that the second likelihood does not exceed the threshold (Chao et al. [0025] The selected probabilistic metric can be compared and/or processed with other probabilistic metrics in order to determine a suitable speech recognition model to use for processing the audio data corresponding to the response for the user. The speech recognition model for a first language can be selected over a speech recognition model for a second language, strictly based on whether the interaction characteristic did or did not satisfy a particular threshold; Abkairov Fig. 3 element 304, displaying the text transcription of the user’s speech.)

 	With respect to Claim 12, Abkairov in view of Chao et al. teach 
 	wherein the one or more programs further comprise instructions, which when executed by the one or more processors, cause the electronic device to:  
 	select, from a plurality of language classifiers, a language classifier for the first language and a language classifier for the second language based on first context information (Chao et al. [0017] selecting only the subset of languages can be based on, for example, probability metrics assigned to the multiple candidate languages for a particular user. The probability metrics can be based on past usage of the multiple candidate languages, and each probability metric can correspond to one or more interaction characteristics (e.g., each based on an instant interaction between the user and the automated assistant(s))); and
            wherein: 
 		determining the first likelihood includes determining the first likelihood using the language classifier for the first language (Chao et al. [0019] As another particular example, two particular languages, of three or more candidate languages assigned to the user profile, can have corresponding assigned probability metrics, for one or more interaction characteristics, where the probability metrics each indicate at least a likelihood of a corresponding one of the two particular languages being spoken by the given user. Based on the assigned probability metrics, the two particular languages can be selected, and speech recognition of the given spoken utterance performed using only speech recognition models for the two particular languages, [0027] Text or phonemes resulting from the processing can be analyzed to determine a language that the text or phonemes most likely correspond to); and 
 		determining the second likelihood includes determining the second likelihood using the language classifier for the second language (Chao et al. [0019] As another particular example, two particular languages, of three or more candidate languages assigned to the user profile, can have corresponding assigned probability metrics, for one or more interaction characteristics, where the probability metrics each indicate at least a likelihood of a corresponding one of the two particular languages being spoken by the given user. Based on the assigned probability metrics, the two particular languages can be selected, and speech recognition of the given spoken utterance performed using only speech recognition models for the two particular languages, [0027] Text or phonemes resulting from the processing can be analyzed to determine a language that the text or phonemes most likely correspond to.)

 	With respect to Claim 13, Abkairov in view of Chao et al. teach 
 	wherein the one or more programs further comprise instructions, which when executed by the one or more processors, cause the electronic device to: 
 	select, from a plurality of language recognizers, the language recognizer for the first language and the language recognizer for the second language based on second context information associated with a user (Chao et al. [0017] selecting only the subset of languages can be based on, for example, probability metrics assigned to the multiple candidate languages for a particular user. The probability metrics can be based on past usage of the multiple candidate languages, and each probability metric can correspond to one or more interaction characteristics (e.g., each based on an instant interaction between the user and the automated assistant(s)), [0027] Text or phonemes resulting from the processing can be analyzed to determine a language that the text or phonemes most likely correspond to.)

With respect to Claim 15, Abkairov in view of Chao et al. teach 
 	wherein the second context information includes one or more respective languages of one or more communications associated with the user (Chao et al. [0017] selecting only the subset of languages can be based on, for example, probability metrics assigned to the multiple candidate languages for a particular user. The probability metrics can be based on past usage of the multiple candidate languages, and each probability metric can correspond to one or more interaction characteristics (e.g., each based on an instant interaction between the user and the automated assistant(s)).)

With respect to Claim 17, Abkairov in view of Chao et al. teach 
 	wherein causing the first recognition result to be displayed is based on a user setting (Chao et al. [0023] The user profile can be manually created or modified by the user in order that the user can manually designate preferred languages with which the user can engage with the automated assistant, [0026] The user profile can indicate a default language that the user more commonly prefers to speak in, [0051] when the user 130 is interacting with the automated assistant 126, a first language can be selected from the user profile for the user 130 as a default language for the user 130, Abkairov Fig. 3 element 304. The result of the speech recognition in Abkairov is displayed.)

	With respect to Claim 18, Abkairov discloses 
 	An electronic device, comprising: 
 	one or more processors (Abkairov [0067] In the case of a software implementation, the module, functionality, or logic represents program code that performs specified tasks when executed on a processor (e.g., CPU or CPUs). The program code can be stored in one or more computer readable memory devices);  
 	memory (Abkairov [0067] In the case of a software implementation, the module, functionality, or logic represents program code that performs specified tasks when executed on a processor (e.g., CPU or CPUs). The program code can be stored in one or more computer readable memory devices); and 
 	one or more programs stored in the memory, the one or more programs including instructions for (Abkairov [0067] In the case of a software implementation, the module, functionality, or logic represents program code that performs specified tasks when executed on a processor (e.g., CPU or CPUs). The program code can be stored in one or more computer readable memory devices): 
 		causing a first recognition result for a received natural language speech input to be displayed (Abkairov [0031] to provide a transcription of each of one or more phrases of speech, [0033] a second area 304 showing a suggested transcription of a phrase in a video message that the sending (near-end) user is about to send, Here the second area 304 is shown as a plain text box, but it will be appreciated this is just schematic and various other design options are possible. For example the suggested transcription may be overlaid over a preview of the video about to be sent, and/or may take other shapes. E.g. in embodiments the user interface element 304 showing the transcribed phrase may take the form of a speech bubble displayed wholly or partially overlaid over the video image 302, or otherwise in association with it (e.g. below), Fig. 3 element 304), wherein: 
 		the first recognition result is in a first language (Abkairov Fig. 3 element 304 Can you give me an egg sample); and 
 		a second recognition result for the received natural language speech input is available for display responsive to receiving input indicative of user selection of the first recognition result (Abkairov [0034] The user interface 204, 304 is thus configured, under control of the communication client application 206, to give the sending user of the near-end user terminal 102a the chance to review the transcription before sending to the far-end user(s) of the one or more far-end terminals 102b-d as part of the video messaging conversation. Further, it is also configured, again under control of the communication client application 206, to enable the sending user of the near-end user terminal 102a to either accept or reject the transcription using a quick gesture performed on the surface of the touch screen 204, [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions), 
receiving the input indicative of user selection of the first recognition result (Abkairov [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions); and 
in response to receiving the input indicative of user selection of the first recognition result, causing the second recognition result to be displayed (Abkairov [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions, optionally listed together with their respective estimated probabilities as estimated by the transcription module (again see above). The sending user can then select an alternative option from the list by touching that option in the presented list on the touchscreen 204 (or perhaps also the option of rejecting all suggestions and abandoning sending, or re-speaking) E.g. in the example shown, the sending user actually said "Can you give me an example?", but the transcription module 205 output "Can you give me an egg sample". In this case tapping the incorrect transcription 304 may bring up a list of alternatives on the touch screen 204 such as: [0043] ["Can you give me an egg sample?" 33%], [0044] "Can you give me an egg sandwich?" 29%, [0045] "Can you give me an example?" 27%, [0046] "Canyon grove means are ample" 5%, [0047] "Can you grieve mayonnaise ex maple?" 2 %.)
	Abkairov fails to explicitly teach 
 	the second recognition result being in a second language;
	However, Chao et al. teach
 	the second recognition result being in a second language (Chao et al. [0046] Multiple speech recognition models 136 for multiple different languages can be utilized in processing of audio data to generate multiple candidate semantic and/or textual representations (e.g., each corresponding to a different language));
	Abkairov and Chao et al. are analogous art because they are from a similar field of endeavor in signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Abkairov’s steps of displaying one or more alternative transcriptions to the user in response to the user’s rejection, using the teaching of multiple speech recognition models for multiple different languages as taught by Chao et al., for the benefit of generating multiple different textual representations, each corresponding to a specific language (Chao et al. [0046] Multiple speech recognition models 136 for multiple different languages can be utilized in processing of audio data to generate multiple candidate semantic and/or textual representations (e.g., each corresponding to a different language)), and also to allow Abkairov’s system to be used by multilingual users and/or households while avoiding excess usage of computational and/or network resources when the user provides input in an alternate language (Chao et al. [0015]).

  	With respect to Claim 19, Abkairov discloses
 	A method for processing natural language speech inputs, the method comprising: 
 	at an electronic device with one or more processors and memory (Abkairov [0067] In the case of a software implementation, the module, functionality, or logic represents program code that performs specified tasks when executed on a processor (e.g., CPU or CPUs). The program code can be stored in one or more computer readable memory devices): 
 	causing a first recognition result for a received natural language speech input to be displayed (Abkairov [0031] to provide a transcription of each of one or more phrases of speech, [0033] a second area 304 showing a suggested transcription of a phrase in a video message that the sending (near-end) user is about to send, Here the second area 304 is shown as a plain text box, but it will be appreciated this is just schematic and various other design options are possible. For example the suggested transcription may be overlaid over a preview of the video about to be sent, and/or may take other shapes. E.g. in embodiments the user interface element 304 showing the transcribed phrase may take the form of a speech bubble displayed wholly or partially overlaid over the video image 302, or otherwise in association with it (e.g. below), Fig. 3 element 304), wherein: 
 		the first recognition result is in a first language (Abkairov Fig. 3 element 304 Can you give me an egg sample); and 
 		a second recognition result for the received natural language speech input is available for display responsive to receiving input indicative of user selection of the first recognition result (Abkairov [0034] The user interface 204, 304 is thus configured, under control of the communication client application 206, to give the sending user of the near-end user terminal 102a the chance to review the transcription before sending to the far-end user(s) of the one or more far-end terminals 102b-d as part of the video messaging conversation. Further, it is also configured, again under control of the communication client application 206, to enable the sending user of the near-end user terminal 102a to either accept or reject the transcription using a quick gesture performed on the surface of the touch screen 204, [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions), 
	receiving the input indicative of user selection of the first recognition result (Abkairov [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions); and 
 	in response to receiving the input indicative of user selection of the first recognition result, causing the second recognition result to be displayed (Abkairov [0042] one of the rejection gestures such as tapping the suggested transcription 304 may summon a list of two or more alternative transcriptions, optionally listed together with their respective estimated probabilities as estimated by the transcription module (again see above). The sending user can then select an alternative option from the list by touching that option in the presented list on the touchscreen 204 (or perhaps also the option of rejecting all suggestions and abandoning sending, or re-speaking) E.g. in the example shown, the sending user actually said "Can you give me an example?", but the transcription module 205 output "Can you give me an egg sample". In this case tapping the incorrect transcription 304 may bring up a list of alternatives on the touch screen 204 such as: [0043] ["Can you give me an egg sample?" 33%], [0044] "Can you give me an egg sandwich?" 29%, [0045] "Can you give me an example?" 27%, [0046] "Canyon grove means are ample" 5%, [0047] "Can you grieve mayonnaise ex maple?" 2%.)
	Abkairov fails to explicitly teach 
 	the second recognition result being in a second language;
	However, Chao et al. teach
 	the second recognition result being in a second language (Chao et al. [0046] Multiple speech recognition models 136 for multiple different languages can be utilized in processing of audio data to generate multiple candidate semantic and/or textual representations (e.g., each corresponding to a different language));
	Abkairov and Chao et al. are analogous art because they are from a similar field of endeavor in signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Abkairov’s steps of displaying one or more alternative transcriptions to the user in response to the user’s rejection, using the teaching of multiple speech recognition models for multiple different languages as taught by Chao et al., for the benefit of generating multiple different textual representations, each corresponding to a specific language (Chao et al. [0046] Multiple speech recognition models 136 for multiple different languages can be utilized in processing of audio data to generate multiple candidate semantic and/or textual representations (e.g., each corresponding to a different language)), and also to allow Abkairov’s system to be used by multilingual users and/or households while avoiding excess usage of computational and/or network resources when the user provides input in an alternate language (Chao et al. [0015]).

5.	Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Abkairov (US 2017/0085696 A1) in view of Chao (US 2019/0318724 A1) and Yun et al. (US 2013/0290001 A1).

With respect to Claim 5, Abkairov in view of Chao et al. disclose all the limitations of Claim 1 upon which Claim 5 depends. Abkairov in view of Chao et al. fail to teach 
 	wherein the one or more programs further comprise instructions, which when executed by the one or more processors, cause the electronic device to: 

 	receive an input indicative of user selection of the displayed second recognition result; 
 	in response to receiving the input indicative of user selection of the displayed second recognition result, initiate a task based on the second recognition result; and 
 	provide a result based on the task.  
	However, Yun et al. teach 
 	wherein the one or more programs further comprise instructions, which when executed by the one or more processors, cause the electronic device to: 
 	in accordance with causing the second recognition result to be displayed (Yun et al. [0135] The image processing apparatus 100 may display the information of the voice recognized at operation S506 on the display unit 130 at operation S508. When there is a plurality of voice recognition results, the first controller 160 may display a plurality of information on the display unit 130 to enable a user to select one of the voice recognition result): 
 	receive an input indicative of user selection of the displayed second recognition result (Yun et al. [0135] The image processing apparatus 100 may display the information of the voice recognized at operation S506 on the display unit 130 at operation S508. When there is a plurality of voice recognition results, the first controller 160 may display a plurality of information on the display unit 130 to enable a user to select one of the voice recognition result); 
 	in response to receiving the input indicative of user selection of the displayed second recognition result, initiate a task based on the second recognition result (Yun et al. [0135] The image processing apparatus 100 may display the information of the voice recognized at operation S506 on the display unit 130 at operation S508. When there is a plurality of voice recognition results, the first controller 160 may display a plurality of information on the display unit 130 to enable a user to select one of the voice recognition result, Fig. 8 S508 Display information of recognized voice, S510 Transmit command to electronic apparatus corresponding to recognized voice, S512 Perform operation corresponding to received command); and 
 	provide a result based on the task (Yun et al. [0077] The first controller 160 performs an operation corresponding to the recognition result of the voice recognition engine 161. For example, when the image processing apparatus 100 is implemented as a TV, upon recognition of a voice command such as, for example, "volume up", "volume down", "increase volume", or "decrease volume" by the voice recognition engine 161 while a user watches a program, such as a movie or news, the first controller 160 may accordingly adjust the volume of the movie or news.)
 	Abkairov, Chao et al. and Yun et al. are analogous art because they are from a similar field of endeavor in signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Abkairov’s steps of displaying one or more alternative transcriptions to the user in response to the user’s rejection, using the teaching of multiple speech recognition models for multiple different languages as taught by Chao et al., for the benefit of generating multiple different textual representations, each corresponding to a specific language, and of allowing Abkairov’s system to be used by multilingual users and/or households while avoiding excess usage of computational and/or network resources when the user provides input in an alternate language, and further using the teaching of displaying a plurality of voice recognition results as taught by Yun et al., for the benefit of enabling the user to select one of the voice recognition results (Yun et al. [0135] The image processing apparatus 100 may display the information of the voice recognized at operation S506 on the display unit 130 at operation S508. When there is a plurality of voice recognition results, the first controller 160 may display a plurality of information on the display unit 130 to enable a user to select one of the voice recognition result, Fig. 8 S508 Display information of recognized voice, S510 Transmit command to electronic apparatus corresponding to recognized voice, S512 Perform operation corresponding to received command).

6.	Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Abkairov (US 2017/0085696 A1) in view of Chao (US 2019/0318724 A1) and Khoe et al. (US 2014/0035823 A1).

 	With respect to Claim 14, Abkairov in view of Chao et al. teach all the limitations of Claim 13 upon which Claim 14 depends. Abkairov in view of Chao et al. fail to explicitly teach 
 	wherein the second context information includes one or more respective languages of one or more keyboards of the user.  
	However, Khoe et al. teach
 	wherein the second context information includes one or more respective languages of one or more keyboards of the user (Khoe et al. [0040] keyboard switch determiner 220 can determine one or more languages or candidates languages based on the set of contextual attributes. Keyboard switch determiner 220 in some embodiments can perform a heuristics calculation when determining the language(s) most likely to be the desired language to use in the composition-at-hand. Keyboard switch determiner 220 can use the set of contextual attributes in the calculation and assign a likelihood score to each candidate language.)
 	Abkairov, Chao et al. and Khoe et al. are analogous art because they are from a similar field of endeavor in signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Abkairov’s steps of displaying one or more alternative transcriptions to the user in response to the user’s rejection, using the teaching of multiple speech recognition models for multiple different languages as taught by Chao et al., for the benefit of generating multiple different textual representations, each corresponding to a specific language, and of allowing Abkairov’s system to be used by multilingual users and/or households while avoiding excess usage of computational and/or network resources when the user provides input in an alternate language, and further using the teaching of the keyboard switch determiner as taught by Khoe et al., for the benefit of determining one or more candidate languages (Khoe et al. [0040] keyboard switch determiner 220 can determine one or more languages or candidates languages based on the set of contextual attributes. Keyboard switch determiner 220 in some embodiments can perform a heuristics calculation when determining the language(s) most likely to be the desired language to use in the composition-at-hand. Keyboard switch determiner 220 can use the set of contextual attributes in the calculation and assign a likelihood score to each candidate language.)

7.	Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Abkairov (US 2017/0085696 A1) in view of Chao (US 2019/0318724 A1) and Chao et al. (US 2019/0318735 A1, hereinafter Chao ‘735’).

 	With respect to Claim 16, Abkairov in view of Chao et al. teach all the limitations of Claim 13 upon which Claim 16 depends. Abkairov in view of Chao et al. fail to explicitly teach 
 	wherein the second context information includes a location of the user.
	However, Chao et al. ‘735’ teach
 	wherein the second context information includes a location of the user (Chao et al. ‘735’ [0017] One or more candidate languages assigned to the user profile for a user can be based on information that is associated with the user and accessible to the automated assistant, such as, for example, emails, contact names, images that include text, location data, etc.) 
 	Abkairov, Chao et al. and Chao et al. ‘735’ are analogous art because they are from a similar field of endeavor in signal processing algorithms and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Abkairov’s steps of displaying one or more alternative transcriptions to the user in response to the user’s rejection, using the teaching of multiple speech recognition models for multiple different languages as taught by Chao et al., for the benefit of generating multiple different textual representations, each corresponding to a specific language, and of allowing Abkairov’s system to be used by multilingual users and/or households while avoiding excess usage of computational and/or network resources when the user provides input in an alternate language, and further using the teaching of the location data associated with the user as taught by Chao ‘735’, for the benefit of determining candidate languages of the user (Chao et al. ‘735’ [0043] One or more candidate languages assigned to the user profile for a user can be based on information that is associated with the user and accessible to the automated assistant, such as, for example, emails, contact names, images that include text, location data, etc.) 

Allowable Subject Matter
8.	Claim 10 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
	Claim 11 is objected to by virtue of its dependency upon an objected-to claim. 
The following is an examiner’s statement of reasons for allowance: The prior art(s) taken alone or in combination fail(s) to teach the following element(s) in combination with the other recited elements in the claim(s).

 		“causing the first recognition result to be displayed includes causing an underlined representation of the first recognition result to be displayed in accordance with determining that the first likelihood is less than the second threshold; and 
 		receiving the input indicative of user selection of the first recognition result includes receiving an input indicative of user selection of the underlined representation of the first recognition result.” as recited in Claim 10. 
	 
Conclusion
9.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. 
a. 	Raichelgauz et al. (US 2017/0243583 A1.) In this reference, Raichelgauz et al. disclose a method/a system for translating the audio inputs to multiple languages.
b.	Mitchell et al. (US 2002/0099542 A1.) In this reference, Mitchell et al. teach highlighting the word if it is determined that the score for the word is less than the threshold. 
c.	Yamazaki et al. (US 8,868,431 B2.) In this reference, Yamazaki et al. teach identifying language of the utterance, and outputting a score showing a confidence level for each language. 

10.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

11.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429.  The examiner can normally be reached on Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/THUYKHANH LE/Primary Examiner, Art Unit 2655