DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments and amendments in the Amendment filed March 28, 2022 (herein “Amendment”), with respect to the objection to the Specification have been fully considered and are persuasive. The objection to the Specification has been withdrawn.
Applicant’s arguments and amendments in the Amendment with respect to the objections to claims 2-6 and 16-20 have been fully considered and are persuasive.  The objections to claims 2-6 and 16-20 have been withdrawn. 
Applicant’s arguments and amendments in the Amendment with respect to the rejection of claims 2 and 4 under 35 U.S.C. 112(b) have been fully considered and are persuasive. The rejection of claims 2 and 4 under 35 U.S.C. 112(b) has been withdrawn.
Applicant’s arguments and amendments in the Amendment with respect to the rejection of claims 15-20 under 35 U.S.C. 101 have been fully considered and are persuasive. The rejection of claims 15-20 under 35 U.S.C. 101 has been withdrawn.
Applicant’s arguments and amendments in the Amendment with respect to the rejection of claims 1, 7-8 and 15 under 35 U.S.C. 102 and claims 2-6, 9-14 and 16-20 under 35 U.S.C. 103 have been fully considered and are persuasive to the extent that Applicant argues that primary reference Sung does not teach “identifying a keyword in the first recognition result.” However, as indicated on page 15 of the Office Action, “identifying a keyword in the first recognition result” is taught by the Jagatheesan reference. Regarding the teachings of Jagatheesan, Applicant merely states that “the other reference Jagatheesan fails to cure the deficiencies of Sung” but does not rebut the citations to Jagatheesan given on page 15 of the Office Action. Accordingly, the rejection in reliance upon Jagatheesan is maintained, while claims 1, 9 and 15 are now newly rejected under 35 U.S.C. 103 in view of Sung and Jagatheesan.

Claim Objections
Claims 3, 11 and 17 are objected to because of the following informalities: all three claims recite “key word” (with a space, and thus as two words), yet the limitation in the independent claims that provides the antecedent basis for “key word” is “keyword”. Therefore, for consistency, claims 3, 11 and 17 should be amended to replace “key word” with “keyword”. Appropriate correction is required.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-9, 11-15, 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sung et al., (US 2013/0238336 A1, herein “Sung”), in view of Jagatheesan et al., (US 2015/0025890 A1, herein “Jagatheesan”).
Regarding claim 1, Sung teaches a data processing method, comprising (Sung para. [0043], fig. 3, process for performing speech recognition): 
obtaining media data (Sung para. [0044], audio is received, where the audio can be from various media types, and as designated in the figure 3, is input speech); 
outputting first media data to a first recognition module, and obtaining a first recognition result of the first media data, wherein the first media data is a part of the media data (Sung paras. [0046], and [0061] input audio is recognized by various language models respective to language recognizer components 212a-212n (where one of the components would be a first recognition module), where the input audio (media data) includes a part in a first language and a part in a second language, and where a first language model is used to recognize a first part (first media data), and where a first language candidate (first recognition result) is produced for the first part using a recognition score produced by the first language model); 
determining whether the first recognition result satisfied a preset condition (Sung para. [0061], when a recognition score is produced in a first language model that is disproportionate to recognition scores produced by the first language model bordering the first part, then this condition (preset into the system, as it is disclosed as a rule governing the method) indicates that the first portion is in a different language than the second portion, and thus the preset condition of disproportionate recognition scores for the first language model is satisfied);
outputting the second media data to a second recognition module, and obtaining a second recognition result of the second media data, wherein the second media data is a part of the media data (Sung paras. [0046], and [0061] input audio is recognized by various language models respective to language recognizer components 212a-212n (where one of the components would be a second recognition module), where the input audio (media data) includes a part in a first language and a part in a second language, and where a second language model is used to recognize a second part (second media data), and where a second language candidate (second recognition result) is produced for the second part using a recognition score produced by the second language model); and 
obtaining a final recognition result of the media data based on the first recognition result and the second recognition result (Sung paras. [0061] and [0053], data output (final recognition result) corresponds to (thus based on) the first recognition candidate and second recognition candidate).
Sung does not explicitly teach the preset condition including identifying a keyword in the first recognition result;
in response to the first recognition result satisfying the preset condition, determining second media data.
Jagatheesan teaches the preset condition including identifying a keyword in the first recognition result (Jagatheesan paras. [0061]-[0064], the threshold determining whether the speech needs to be forwarded to another speech recognizer is a function of the presence of certain keywords in the voice command);  
in response to the first recognition result satisfying the preset condition, determining second media data (Jagatheesan paras. [0049], [0060]-[0064], and [0068], in an HSR (hierarchical automatic speech recognition system) a local small automatic speech recognition determines, using a decision function, that it is necessary to forward the input voice command to another automatic speech recognition at a higher level in the hierarchy until an HSR that is capable of processing the speech for recognition is found, and the speech is processed there (thus the forwarded speech to the appropriate level HSR being the second media data, which may be a portion of the original input speech)).
Therefore, taking the teachings of Sung and Jagatheesan together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the process for speech recognition as disclosed in Sung with a hierarchical forwarding of the speech data based on the result of a first local ASR as it pertains to a preset condition, as disclosed in Jagatheesan, at least because doing so would reduce latency and use the consumer’s edge resources for speech-to-text-to-action translation (Jagatheesan para. [0052]).
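For illustration only, the combined flow mapped above (a first recognition, a keyword-based preset condition, and forwarding to a second recognizer) can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Sung, Jagatheesan, or the application: the names `Recognizer`, `find_keyword` and `recognize` are hypothetical, plain strings stand in for audio, and the whole input is assumed to be forwarded as the second media data.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical type: a recognizer maps media data (a string stand-in for
# audio) to recognized text.
Recognizer = Callable[[str], str]


def find_keyword(text: str, keywords: Iterable[str]) -> Optional[str]:
    """Return the first listed keyword present in the text, or None."""
    words = text.lower().split()
    for keyword in keywords:
        if keyword.lower() in words:
            return keyword
    return None


def recognize(
    media: str,
    first: Recognizer,
    second: Recognizer,
    keywords: Iterable[str],
) -> Tuple[str, Optional[str]]:
    """Run the first recognizer; forward to the second only on a keyword hit."""
    first_result = first(media)
    # Preset condition: a keyword is identified in the first recognition result.
    if find_keyword(first_result, keywords) is None:
        return first_result, None
    # Assumption: the entire input is treated as the second media data.
    second_media = media
    second_result = second(second_media)
    return first_result, second_result
```

With toy recognizers, an input containing “search” is forwarded to the second recognizer, while an input without any listed keyword is handled by the first recognizer alone.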
Regarding claims 3, 11 and 17, Sung teaches wherein the [keyword] comprises: data in the first recognition result that is unrecognized by the first recognition module (Sung paras. [0061]-[0062], the English recognizer provides high confidence scores (on recognition) for the first English part of an input phrase, but low confidence scores for the second part (which is in French), thus identifying the second part as being unrecognized by the first English language recognizer); and
when the [key word is the] data in the first recognition result that is unrecognized by the first recognition module (Sung paras. [0061]-[0062], the English recognizer provides high confidence scores (on recognition) for the first English part of an input phrase, but low confidence scores for the second part (which is in French), thus identifying the second part as being unrecognized by the first English language recognizer, where this second part would have French words in it (data unrecognized)), [the processor is further configured for: - claim 11/ the program instructions further cause the processor to perform: - claim 17]
determining the second media data includes determining the data unrecognized by the first recognition module as the second media data (Sung paras. [0061]-[0062], the English recognizer provides high confidence scores (on recognition) for the first English part of an input phrase, but low confidence scores for the second part (which is in French), thus identifying the second part as being unrecognized by the first English language recognizer, where the second part then becomes the part (the second media data) for which the French language recognizer is used), and
obtaining the final recognition result of the media data based on the first recognition result and the second recognition result includes (Sung paras. [0061] and [0053], data output (final recognition result) corresponds to (thus based on) the first recognition candidate and second recognition candidate) determining a location of data unrecognizable by the first recognition module in the first recognition result (Sung paras. [0061]-[0062], because the latter part of the spoken phrase includes the French phrase “Cathedrale Saint-Maclou de Pontoise” where the first word of the phrase begins a second part where the first English language recognizer has a low confidence score (and thus is unrecognizable by the English (first) recognition module), where that low confidence score occurs is the designated location in the first recognition result), and placing the second recognition result in the location of the data unrecognizable by the first recognition module in the first recognition result, thereby obtaining the final recognition result of the media data (Sung paras. [0061]-[0062], the data output corresponds to the first recognition candidate (from the English recognizer) for the first part, and corresponds also to the second recognition candidate (from the French recognizer) for the second part, i.e., the part with the low confidence score in the first (English) recognizer, thus unrecognizable).
While Sung teaches that foreign words can be identified via the contrast in confidence scores returned from a particular language recognizer, Sung does not call these portions of the input speech a “keyword,” and therefore, Sung does not explicitly teach that the “keyword comprises” the data.
Sung further does not explicitly teach when the key word is the data in the first recognition result.
Jagatheesan teaches the keyword comprises (Jagatheesan paras. [0061], [0064] and [0068], the presence of certain keywords/keyword spotting is an element of the function that determines which ASR to use).
Jagatheesan further teaches when the key word is the data in the first recognition result (Jagatheesan paras. [0061]-[0064], the threshold determining whether the speech needs to be forwarded to another speech recognizer is a function of the presence of certain keywords in the voice command).
Therefore, taking the teachings of Sung and Jagatheesan together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the process for speech recognition as disclosed in Sung based on a keyword spotted as disclosed in Jagatheesan at least because doing so would reduce latency and use the consumer’s edge resources for speech-to-text-to-action translation (Jagatheesan para. [0052]).
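For illustration only, the placement step discussed for claims 3, 11 and 17 (locating the data the first recognizer could not recognize and placing the second recognition result at that location) can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation from Sung: the function name `splice_results`, the per-word confidence representation, the 0.5 threshold, and the treatment of everything between the first and last low-confidence words as one unrecognized span are all hypothetical.

```python
from typing import List, Tuple


def splice_results(
    first_result: List[Tuple[str, float]],
    second_text: str,
    threshold: float = 0.5,
) -> str:
    """Replace the low-confidence span of the first result with second_text.

    first_result holds (word, confidence) pairs from the first recognizer.
    """
    # Locate the words the first recognizer scored below the threshold.
    low = [i for i, (_, conf) in enumerate(first_result) if conf < threshold]
    if not low:
        # Nothing was unrecognized; the first result stands as the final result.
        return " ".join(word for word, _ in first_result)
    # Assumption: a single span from the first to the last low-confidence word.
    start, end = low[0], low[-1] + 1
    head = [word for word, _ in first_result[:start]]
    tail = [word for word, _ in first_result[end:]]
    return " ".join(head + [second_text] + tail)
```

Using the phrase cited from Sung, high-confidence words such as “directions to the” would be kept, and the low-confidence remainder would be replaced by the second recognizer's output “Cathedrale Saint-Maclou de Pontoise”.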
Regarding claims 4, 12 and 18, while Sung discloses determining a second recognition module to which [the identified word in the input speech] corresponds from a plurality of candidate recognition modules (Sung paras. [0061]-[0062], because the French recognizer has a high confidence score for the latter part (the French part) of the input speech, the French recognizer is identified as the appropriate recognizer for that latter part of the input, from among all other language recognizers present in the system), and at least regarding claims 12 and 18, Sung teaches the processor is further configured for (Sung paras. [0036], [0043] and [0076]-[0078], the process for speech recognition performed by the speech recognizer 210 as a processor), Sung does not disclose the remainder of claims 4, 12 and 18.
Jagatheesan teaches before outputting the second media data to the second recognition module, [the method further comprises: - claim 4 only / the processor is further configured for: - claims 12 and 18] determining the keyword in the first recognition result from a plurality of candidate keywords (Jagatheesan paras. [0061]-[0064], [0068], and [0076]-[0078], the threshold determining whether the speech needs to be forwarded to another speech recognizer is a function of the presence of certain keywords in the voice command, where if such a keyword is determined to be present, then the speech is forwarded to another HSR, where examples of such keywords are those words in a command associated with particular actionable commands, such as “Fireplace” to send the command to a smart fireplace or “search” to be sent to a search engine); and
a second recognition module to which the keyword corresponds (Jagatheesan paras. [0061], [0064] and [0068], the HSR (speech recognition) level that is the most capable of processing the speech, and thus, to which the keyword corresponds, since the presence of a keyword in the speech is a factor in determining whether a particular HSR level is the right one for processing).
Therefore, taking the teachings of Sung and Jagatheesan together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the process for speech recognition as disclosed in Sung with a hierarchical forwarding of the speech data based on the result of a first local ASR as it pertains to a preset condition, as disclosed in Jagatheesan, at least because doing so would reduce latency and use the consumer’s edge resources for speech-to-text-to-action translation (Jagatheesan para. [0052]).
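For illustration only, selecting a second recognition module from a plurality of candidates according to a keyword, as discussed for claims 4, 12 and 18, can be sketched as a lookup from candidate keywords to recognizers. This is a minimal sketch under stated assumptions: the table `ROUTES`, the recognizer labels, and the function `select_recognizer` are hypothetical, and only the example keywords “Fireplace” and “search” come from the cited passages.

```python
from typing import Dict, Optional

# Hypothetical routing table: candidate keyword -> label of the recognizer
# assumed to correspond to that keyword.
ROUTES: Dict[str, str] = {
    "fireplace": "smart_fireplace_asr",
    "search": "search_engine_asr",
}


def select_recognizer(
    first_result: str, routes: Optional[Dict[str, str]] = None
) -> Optional[str]:
    """Return the recognizer label for the first candidate keyword found."""
    table = ROUTES if routes is None else routes
    for raw_word in first_result.lower().split():
        word = raw_word.strip(".,!?")  # ignore trailing punctuation
        if word in table:
            return table[word]
    # No candidate keyword found: no second recognizer is selected.
    return None
```

A first recognition result such as “Fireplace, on” would select the fireplace recognizer, while a result with no candidate keyword selects none.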
Regarding claims 5, 13 and 19, Sung teaches wherein: in response to the first recognition result satisfying the preset condition, [the processor is further configured for: - claims 13 and 19] [the determining the second media data includes: - claim 5] determining data at a preset location with respect to the [keyword] in the first media data as the second media data (Sung para. [0061], where in the audio recognized for a first language, the recognition score is disproportionate, that portion is identified as the location, a second part (the second media data), at which the input audio should be processed by another language recognizer instead).
While Sung teaches that foreign words can be identified via the contrast in confidence scores returned from a particular language recognizer, Sung does not call these portions of the input speech a “keyword.”
Jagatheesan teaches keyword (Jagatheesan paras. [0061], [0064] and [0068], the presence of certain keywords/keyword spotting is an element of the function that determines which ASR to use).
Therefore, taking the teachings of Sung and Jagatheesan together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the process for speech recognition as disclosed in Sung based on a keyword spotted as disclosed in Jagatheesan at least because doing so would reduce latency and use the consumer’s edge resources for speech-to-text-to-action translation (Jagatheesan para. [0052]).
Regarding claims 6, 14 and 20, Sung teaches [wherein: the obtaining the final recognition result of the media data based on the first recognition result and the second recognition result includes (Sung paras. [0061] and [0053], data output (final recognition result) corresponds to (thus based on) the first recognition candidate and second recognition candidate) – claim 6 / the processor is further configured for – claims 14 and 20]: 
determining a preset location with respect to the [keyword] in the first recognition result (Sung paras. [0061]-[0062], because the latter part of the spoken phrase includes the French phrase “Cathedrale Saint-Maclou de Pontoise” where the first word of the phrase begins a second part where the first English language recognizer has a low confidence score, where that low confidence score occurs is the designated location in the first recognition result), and placing the second recognition result in the preset location with respect to the [keyword] in the first recognition result, thereby obtaining the final recognition result of the media data (Sung paras. [0061]-[0062], the data output corresponds to the first recognition candidate (from the English recognizer) for the first part, and corresponds also to the second recognition candidate (from the French recognizer) for the second part).
While Sung teaches that foreign words can be identified via the contrast in confidence scores returned from a particular language recognizer, Sung does not call these portions of the input speech a “keyword.” 
Jagatheesan teaches keyword (Jagatheesan paras. [0061], [0064] and [0068], the presence of certain keywords/keyword spotting is an element of the function that determines which ASR to use).
Therefore, taking the teachings of Sung and Jagatheesan together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the process for speech recognition as disclosed in Sung based on a keyword spotted as disclosed in Jagatheesan at least because doing so would reduce latency and use the consumer’s edge resources for speech-to-text-to-action translation (Jagatheesan para. [0052]).
Regarding claim 7, Sung teaches wherein: the media data, the first media data, and the second media data are same (Sung para. [0046], the input audio (the media data) is recognized using multiple language recognizer components 212a-212n employing different language models, therefore, the same input audio is input and processed by a first language recognizer component 212a (thus a first media data processed by a first language recognizer component 212a) as it is also input into additional language recognizer components (thus the same media data being input as second media data to an additional language recognizer)).
Regarding claim 8, Sung teaches wherein the obtaining the final recognition result of the media data based on the first recognition result and the second recognition result includes: obtaining the first recognition result by using the first recognition module to recognize a first portion of the media data, obtaining the second recognition result by using the second recognition module to recognize a second portion of the media data, and combining the first recognition result and the second recognition result to obtain the final recognition result of the media data; or obtaining the first recognition result by using the first recognition module to recognize the media data, obtaining the second recognition result by using the second recognition module to recognize the media data, matching the first recognition result and the second recognition result to obtain a multi-language matching degree order, and determining the final recognition result of the media data based on the multi-language matching degree order (Sung paras. [0061]-[0062], given the limitations in this claim are recited in the alternative or, and considering the first presented limitation, Sung teaches that both an English and French language recognition component respectively process the phrase “directions to the Cathedrale Saint-Maclou de Pontoise” and where from the disproportionate scores of the English language recognizer between the first English part “directions to the” and the French part “Cathedrale Saint-Maclou de Pontoise” the system determines which portion from the multi-lingual input should be selected from which language recognizer to arrive at the data output, where the English language recognizer is used for the first part “directions to the” and the French language recognizer is used for the second part “Cathedrale Saint-Maclou de Pontoise”).
Regarding claim 9, Sung teaches an electronic apparatus, comprising: a processor, the processor being configured for (Sung paras. [0036], [0043] and [0076]-[0078], speech recognition system being a computer system such as a computer having a processor that the speech recognizer 210 is implemented on, where the process for speech recognition performed by the speech recognizer 210 is detailed in fig. 3): 
obtaining media data (Sung para. [0044], audio is received, where the audio can be from various media types, and as designated in the figure 3, is input speech); 
outputting first media data to a first recognition module, and obtaining a first recognition result of the first media data, wherein the first media data is a part of the media data (Sung paras. [0046], and [0061] input audio is recognized by various language models respective to language recognizer components 212a-212n (where one of the components would be a first recognition module), where the input audio (media data) includes a part in a first language and a part in a second language, and where a first language model is used to recognize a first part (first media data), and where a first language candidate (first recognition result) is produced for the first part using a recognition score produced by the first language model); 
determining whether the first recognition result satisfied a preset condition (Sung para. [0061], when a recognition score is produced in a first language model that is disproportionate to recognition scores produced by the first language model bordering the first part, then this condition (preset into the system, as it is disclosed as a rule governing the method) indicates that the first portion is in a different language than the second portion, and thus the preset condition of disproportionate recognition scores for the first language model is satisfied);
outputting second media data to a second recognition module, and obtaining a second recognition result of the second media data, wherein the second media data is at least a part of the media data (Sung paras. [0046], and [0061] input audio is recognized by various language models respective to language recognizer components 212a-212n (where one of the components would be a second recognition module), where the input audio (media data) includes a part in a first language and a part in a second language, and where a second language model is used to recognize a second part (second media data), and where a second language candidate (second recognition result) is produced for the second part using a recognition score produced by the second language model); and 
obtaining a final recognition result of the media data based on the first recognition result and the second recognition result (Sung paras. [0061] and [0053], data output (final recognition result) corresponds to (thus based on) the first recognition candidate and second recognition candidate); and
a memory, configured to store the first recognition result, the second recognition result and the final recognition result (Sung paras. [0076]-[0079], memory which is part of the computing device implementing the speech recognizer, having instructions for execution stored therein, and processed by the processor).
While Sung teaches that the speech recognizer is implemented on computing device 500, which includes a processor and memory, that the memory stores information within the computing device, and that the processor processes instructions for execution within the computing device, including instructions stored in the memory, thus suggesting that the results of the speech recognizer are also stored in the memory, even if temporarily in the process of calculation, Sung does not explicitly or necessarily teach these limitations. However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the memory of Sung to be configured to store the recognition results that are output from the respective language recognition components and a final data output at least because such a modification for a computer system to store results of signal processing calculations would be combining prior art elements according to known methods to yield predictable results. See MPEP 2143(I)(A).
Further, Sung does not explicitly teach the preset condition including identifying a keyword in the first recognition result;
in response to the first recognition result satisfying the preset condition, determining second media data.
Jagatheesan teaches the preset condition including identifying a keyword in the first recognition result (Jagatheesan paras. [0061]-[0064], the threshold determining whether the speech needs to be forwarded to another speech recognizer is a function of the presence of certain keywords in the voice command);  
in response to the first recognition result satisfying the preset condition, determining second media data (Jagatheesan paras. [0049], [0060]-[0064], and [0068], in an HSR (hierarchical automatic speech recognition system) a local small automatic speech recognition determines, using a decision function, that it is necessary to forward the input voice command to another automatic speech recognition at a higher level in the hierarchy until an HSR that is capable of processing the speech for recognition is found, and the speech is processed there (thus the forwarded speech to the appropriate level HSR being the second media data, which may be a portion of the original input speech)).
Therefore, taking the teachings of Sung and Jagatheesan together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the process for speech recognition as disclosed in Sung with a hierarchical forwarding of the speech data based on the result of a first local ASR as it pertains to a preset condition, as disclosed in Jagatheesan, at least because doing so would reduce latency and use the consumer’s edge resources for speech-to-text-to-action translation (Jagatheesan para. [0052]).
Regarding claim 15, Sung teaches a non-transitory computer-readable storage medium, comprising: a processor and a memory, the memory (Sung paras. [0043], and [0091], fig. 3, techniques described, including a process for performing speech recognition, implemented by one or more computer programs (instructions) executable on a programmable system with a programmable processor, including a computer readable medium (memory) used to provide the machine instructions to the programmable processor) containing program instructions for causing a computer to perform the method of (Sung paras. [0043], and [0091], fig. 3, techniques described, including a process for performing speech recognition, implemented by one or more computer programs (instructions) executable on a programmable system with a programmable processor, including a computer readable medium used to provide the machine instructions to the programmable processor): 
receiving media data (Sung para. [0044], audio is received, where the audio can be from various media types, and as designated in the figure 3, is input speech); 
outputting first media data to a first recognition module, and obtaining a first recognition result of the first media data, wherein the first media data is a part of the media data (Sung paras. [0046], and [0061] input audio is recognized by various language models respective to language recognizer components 212a-212n (where one of the components would be a first recognition module), where the input audio (media data) includes a part in a first language and a part in a second language, and where a first language model is used to recognize a first part (first media data), and where a first language candidate (first recognition result) is produced for the first part using a recognition score produced by the first language model); 
determining whether the first recognition result satisfied a preset condition (Sung para. [0061], when a recognition score is produced in a first language model that is disproportionate to recognition scores produced by the first language model bordering the first part, then this condition (preset into the system, as it is disclosed as a rule governing the method) indicates that the first portion is in a different language than the second portion, and thus the preset condition of disproportionate recognition scores for the first language model is satisfied);
outputting second media data to a second recognition module, and obtaining a second recognition result of the second media data, wherein the second media data is a part of the media data (Sung paras. [0046], and [0061] input audio is recognized by various language models respective to language recognizer components 212a-212n (where one of the components would be a second recognition module), where the input audio (media data) includes a part in a first language and a part in a second language, and where a second language model is used to recognize a second part (second media data), and where a second language candidate (second recognition result) is produced for the second part using a recognition score produced by the second language model); and 
obtaining a final recognition result of the media data based on the first recognition result and the second recognition result (Sung paras. [0061] and [0053], data output (final recognition result) corresponds to (thus based on) the first recognition candidate and second recognition candidate).
Sung does not explicitly teach the preset condition including identifying a keyword in the first recognition result;
in response to the first recognition result satisfying the preset condition, determining second media data.
Jagatheesan teaches the preset condition including identifying a keyword in the first recognition result (Jagatheesan paras. [0061]-[0064], the threshold determining whether the speech needs to be forwarded to another speech recognizer is a function of the presence of certain keywords in the voice command);  
in response to the first recognition result satisfying the preset condition, determining second media data (Jagatheesan paras. [0049], [0060]-[0064], and [0068], in an HSR (hierarchical automatic speech recognition system), a local small automatic speech recognizer determines, using a decision function, that it is necessary to forward the input voice command to another automatic speech recognizer at a higher level in the hierarchy, until an HSR that is capable of processing the speech for recognition is found, and the speech is processed there (thus the speech forwarded to the appropriate-level HSR is the second media data, which may be a portion of the original input speech)).
Therefore, taking the teachings of Sung and Jagatheesan together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the process for speech recognition as disclosed in Sung with a hierarchical forwarding of the speech data based on the result of a first local ASR as it pertains to a preset condition, as disclosed in Jagatheesan, at least because doing so would reduce latency and use consumer’s edge resources for speech-to-text-to-action translation (Jagatheesan para. [0052]).
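For illustration only, the claimed sequence of steps as mapped above can be sketched in Python. This is a minimal sketch under stated assumptions and is not drawn from Sung, Jagatheesan, or the application: the names `make_recognizer` and `recognize_media`, the fixed `split` index, the fallback when no keyword is found, and the toy token-to-word lexicons standing in for audio and recognition models are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Set

# Hypothetical stand-in: "media data" is a sequence of opaque tokens and a
# recognition module is any function that maps such a sequence to text.
Recognizer = Callable[[Sequence[str]], str]


def make_recognizer(lexicon: Dict[str, str]) -> Recognizer:
    """Build a toy recognition module from a token-to-word lexicon (assumption)."""
    def recognize(media: Sequence[str]) -> str:
        # Tokens outside the lexicon are reported as unrecognized.
        return " ".join(lexicon.get(token, "<unk>") for token in media)
    return recognize


def recognize_media(media: Sequence[str], split: int, first: Recognizer,
                    second: Recognizer, keywords: Set[str]) -> str:
    # Output the first media data (a part of the media data) to the first module.
    first_media = media[:split]
    first_result = first(first_media)

    # Preset condition: a keyword is identified in the first recognition result.
    keyword_found = any(word in keywords for word in first_result.split())
    if not keyword_found:
        # Assumed fallback: the first module handles the whole input.
        return first(media)

    # Condition satisfied: determine the second media data and output it
    # to the second recognition module.
    second_media = media[split:]
    second_result = second(second_media)

    # Final recognition result based on both recognition results.
    return f"{first_result} {second_result}".strip()


local = make_recognizer({"a1": "play", "a2": "song"})
remote = make_recognizer({"b1": "bonjour", "b2": "monde"})
print(recognize_media(["a1", "a2", "b1", "b2"], 2, local, remote, {"play"}))
```

In this sketch the second recognition module is invoked only when the keyword test on the first recognition result succeeds, which corresponds to the "in response to" limitation discussed above.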

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Choi, US 2005/0182628 A1, directed towards domain based speech recognition that uses recognition models specific to domains. Choi teaches that a first recognition is performed to detect domain keywords in a speech input, and then further recognition processing is performed using a domain specific recognition model.
Wang, US 2019/0294674 A1, directed towards a sentence meaning recognition method usable for recognizing Chinese phonetics. Wang performs its voice recognition using multiple recognition devices and uses their outputs to determine a final output.
Sharma, US 2003/0236664 A1, directed towards multi-pass recognition of spoken dialogue. Sharma teaches that an input utterance (speech) is analyzed by multiple speech recognizers and in the process of reaching a final speech recognition result, keywords present in the input utterance are analyzed. 
Dai et al., US 9,959,865 B2, directed towards obtaining and recognizing voice information using two different voice recognition models. Dai also is directed towards recognizing first information by a second recognition model subject to a preset condition, but does not disclose that a final recognition result is determined by considering portions recognized by one recognition model but not another.
VanBlon, US 9,620,122 B2, directed towards recognizing speech in a hybrid way using multiple speech recognizers. VanBlon teaches recognizing which portions of input speech have not been confidently detected by a local speech detector, sending those portions to a remote recognizer, and then merging the local and remote results to arrive at an output result. VanBlon, however, is directed towards misrecognition generally and does not appear to teach or suggest multi-lingual misrecognition aspects.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656