Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 12, 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Adams (US 20150255069 A1) in view of Deshmukh (US 20130018649 A1) and in further view of Hoffmeister (US 10332508 B1).
With respect to claim 1, Adams teaches a multi-lingual speech recognition and theme-semanteme analysis method, comprising:
by a speech recognizer, obtaining an alphabet string corresponding to a voice input signal ([0061] These users may pronounce foreign word using a combination of pronunciations from multiple languages including the user's language of origin. The user may pronounce a portion of the foreign word in a first language and other portions in one or more different languages. For example, the user may pronounce a first portion of the band name, Kraftwerk, in English (e.g., K R AE F T) and a second portion in German (e.g. V EH R K.));
by the speech recognizer, determining that the alphabet string corresponds to a plurality of original words according to a multi-lingual vocabulary ([0061] These users may pronounce foreign word using a combination of pronunciations from multiple languages including the user's language of origin. The user may pronounce a portion of the foreign word in a first language and other portions in one or more different languages. For example, the user may pronounce a first portion of the band name, Kraftwerk, in English (e.g., K R AE F T) and a second portion in German (e.g. V EH R K.)); 
by the speech recognizer, forming a sentence ([0045] The speech recognition engine 318 may also compute scores of branches of the paths based on language models or grammars. Language modeling involves determining scores for what words are likely to be used together to form coherent words and sentences) according to the multi-lingual vocabulary ([0061] These users may pronounce foreign word using a combination of pronunciations from multiple languages including the user's language of origin. The user may pronounce a portion of the foreign word in a first language and other portions in one or more different languages. For example, the user may pronounce a first portion of the band name, Kraftwerk, in English (e.g., K R AE F T) and a second portion in German (e.g. V EH R K.)) and the plurality of original words ([0061] Some users may have a first language of origin but reside in a country (or operate an ASR device) where the user communicates in a different language. These users may pronounce foreign word using a combination of pronunciations from multiple languages including the user's language of origin. The user may pronounce a portion of the foreign word in a first language and other portions in one or more different languages. For example, the user may pronounce a first portion of the band name, Kraftwerk, in English (e.g., K R AE F T) and a second portion in German (e.g. V EH R K.));
[[by a sematic analyzer, according to the sentence and a theme vocabulary-semantic relationship data set, selectively executing a correction procedure to generate a corrected sentence, an analysis state determining procedure, or a procedure of outputting the sentence; 
by the sematic analyzer, when determining that the correction procedure successes, outputting the corrected sentence; and
by the sematic analyzer, when determining that the correction procedure fails, executing the analysis state determining procedure to selectively output a determined result]].
Adams does not teach 
by a sematic analyzer, according to the sentence and a theme vocabulary-semantic relationship data set, selectively executing a correction procedure to generate a corrected sentence, an analysis state determining procedure, or a procedure of outputting the sentence; 
by the sematic analyzer, when determining that the correction procedure successes, outputting the corrected sentence; and
by the sematic analyzer, when determining that the correction procedure fails, executing the analysis state determining procedure to selectively output a determined result
Deshmukh teaches 
by a sematic analyzer, according to the sentence and a theme vocabulary-semantic relationship data set, selectively executing a correction procedure to generate a corrected sentence, an analysis state determining procedure, or a procedure of outputting the sentence ([0017] The grammatically correct sentences 206 from the sentence verifier 205 can be added to an existing statistical language model, for example, in a natural language processing application such as a user query interface or an automatic speech recognition application. Incorrect variations 302 that fail the testing of the sentence verifier 205 can be discarded or saved for later study.); 
by the sematic analyzer, when determining that the correction procedure successes ([0008] The sentence verifier may grammatically test each candidate sentence using an existing language model, a syntactic parser, and/or a grammar checker), outputting the corrected sentence ([0017] The grammatically correct sentences 206 from the sentence verifier 205 can be added to an existing statistical language model, for example, in a natural language processing application such as a user query interface or an automatic speech recognition application. Incorrect variations 302 that fail the testing of the sentence verifier 205 can be discarded or saved for later study); and
[[by the sematic analyzer, when determining that the correction procedure fails, executing the analysis state determining procedure to selectively output a determined result]].
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Adams in view of Deshmukh to incorporate a sematic analyzer that, according to the sentence and a theme vocabulary-semantic relationship data set, selectively executes a correction procedure to generate a corrected sentence or an analysis state determining procedure, so that a sentence generator computes candidate sentences that are not restricted to existing sentence databases (Deshmukh, [0006]).
Neither Adams nor Deshmukh teaches by the sematic analyzer, when determining that the correction procedure fails, executing the analysis state determining procedure to selectively output a determined result.
Hoffmeister teaches by the sematic analyzer, when determining that the correction procedure fails (Col 20 ll47-49 The DNN may output a yes/no indication (illustrated in equation (2) as [0,1] thus classifying the sentence as correct or incorrect.), executing the analysis state determining procedure to selectively output a determined result (Col 20 ll50-56 The DNN may also output a probability, which may be used as a confidence of the sentence being correct. Thus, as illustrated in FIG. 12, the feature vector y.sub.sentence 1134 may be input into trained classifier G 1202, which will then output a confirmation 1206 that that the sentence/ASR result is correct or incorrect and/or may output a confidence score 1204 indicating the classifier's confidence of the correctness of the sentence.) 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Adams and Deshmukh to include the teachings of Hoffmeister, the motivation being that feature vectors from RNN networks are used to confirm the accuracy of ASR results (Hoffmeister, Col 16 ll 14-34).
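For illustration only, the branching recited in claim 1 (output the corrected sentence when the correction procedure succeeds, otherwise fall back to the analysis state determining procedure) can be sketched as below. The names analyze, correct and determine_state are hypothetical and appear in neither the claims nor the cited references; the two procedures are passed in as placeholders because their internals are not specified here.

```python
from typing import Callable, Optional


def analyze(
    sentence: str,
    correct: Callable[[str], Optional[str]],
    determine_state: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Hypothetical sketch of the claimed control flow, not an implementation of any reference."""
    # Assumption: the correction procedure signals failure by returning None.
    corrected = correct(sentence)
    if corrected is not None:
        # Correction succeeded: output the corrected sentence.
        return corrected
    # Correction failed: run the analysis state determining procedure,
    # which may "selectively" output a determined result (None = no output).
    return determine_state(sentence)
```

Passing the procedures as callables keeps the sketch neutral about how correction or state determination is actually performed.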

With respect to claim 12, Adams teaches a multi-lingual speech recognition and theme-semanteme analysis device, comprising:
a voice input interface configured to receive a voice input signal ([0025] As illustrated in FIG. 3, the ASR device 302 may include an audio capture device 304 for capturing spoken utterances for processing. The audio capture device 304 may include a microphone or other suitable component for capturing sound.);  
an output interface configured to output a sentence, a corrected sentence or a determined result ([0027] The ASR device 302 includes input device(s) 306 and output device(s) 307. A variety of input/output device(s) may be included in the device. Example input devices 306 include an audio capture device 304, such as a microphone (pictured as a separate component), a touch input device, keyboard, mouse, stylus or other input device. Example output devices 307 include a visual display, tactile display, audio speakers, headphones, printer or other output device and [0038] The language information is used to adjust the acoustic score by considering what sounds and/or words are used in context with each other, thereby improving the likelihood that the ASR module outputs speech results that make sense grammatically.); and
a processor connected with the voice input interface and the output interface ([0044] Following ASR processing, the ASR results may be sent by the ASR module 314 to another component of the ASR device 302, such as the controller/processor 308 for further processing (such as execution of a command included in the interpreted text) or to the output device 307 for sending to an external device.), and comprising: 
a speech recognizer configured to obtain an alphabet string corresponding to a voice input signal according to a pronunciation-alphabet table, to determine that the alphabet string corresponds to a plurality of original words according to a multi-lingual vocabulary ([0061] These users may pronounce foreign word using a combination of pronunciations from multiple languages including the user's language of origin. The user may pronounce a portion of the foreign word in a first language and other portions in one or more different languages. For example, the user may pronounce a first portion of the band name, Kraftwerk, in English (e.g., K R AE F T) and a second portion in German (e.g. V EH R K.)), and to form a sentence according to the multi-lingual vocabulary and the plurality of original words ([0061] Some users may have a first language of origin but reside in a country (or operate an ASR device) where the user communicates in a different language. These users may pronounce foreign word using a combination of pronunciations from multiple languages including the user's language of origin. The user may pronounce a portion of the foreign word in a first language and other portions in one or more different languages. For example, the user may pronounce a first portion of the band name, Kraftwerk, in English (e.g., K R AE F T) and a second portion in German (e.g. V EH R K.)); and a
[[sematic analyzer connected with the speech recognizer, and configured to selectively execute a correction procedure to generate a corrected sentence, an analysis state determining procedure or a procedure of outputting the sentence according to the sentence and a theme vocabulary-semantic relationship data set, to output the corrected sentence when the correction procedure successes, and to execute the analysis state determining procedure to selectively output a determined result when the correction procedure fails]].
Adams does not teach sematic analyzer connected with the speech recognizer, and configured to selectively execute a correction procedure to generate a corrected sentence, an analysis state determining procedure or a procedure of outputting the sentence according to the sentence and a theme vocabulary-semantic relationship data set, to output the corrected sentence when the correction procedure successes, and to execute the analysis state determining procedure to selectively output a determined result when the correction procedure fails.
Deshmukh teaches sematic analyzer connected with the speech recognizer, and configured to selectively execute a correction procedure to generate a corrected sentence, an analysis state determining procedure or a procedure of outputting the sentence according to the sentence and a theme vocabulary-semantic relationship data set, to output the corrected sentence ([0017] The grammatically correct sentences 206 from the sentence verifier 205 can be added to an existing statistical language model, for example, in a natural language processing application such as a user query interface or an automatic speech recognition application. Incorrect variations 302 that fail the testing of the sentence verifier 205 can be discarded or saved for later study) when the correction procedure successes ([0008] The sentence verifier may grammatically test each candidate sentence using an existing language model, a syntactic parser, and/or a grammar checker), and to [[execute the analysis state determining procedure to selectively output a determined result when the correction procedure fails]].
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Adams in view of Deshmukh to incorporate a sematic analyzer that, according to the sentence and a theme vocabulary-semantic relationship data set, selectively executes a correction procedure to generate a corrected sentence or an analysis state determining procedure, so that a sentence generator computes candidate sentences that are not restricted to existing sentence databases (Deshmukh, [0006]).

Neither Adams nor Deshmukh teaches by the sematic analyzer, when determining that the correction procedure fails, executing the analysis state determining procedure to selectively output a determined result.
Hoffmeister teaches by the sematic analyzer, when determining that the correction procedure fails (Col 20 ll47-49 The DNN may output a yes/no indication (illustrated in equation (2) as [0,1] thus classifying the sentence as correct or incorrect.), executing the analysis state determining procedure to selectively output a determined result (Col 20 ll50-56 The DNN may also output a probability, which may be used as a confidence of the sentence being correct. Thus, as illustrated in FIG. 12, the feature vector y.sub.sentence 1134 may be input into trained classifier G 1202, which will then output a confirmation 1206 that that the sentence/ASR result is correct or incorrect and/or may output a confidence score 1204 indicating the classifier's confidence of the correctness of the sentence.) 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Adams and Deshmukh to include the teachings of Hoffmeister, the motivation being that feature vectors from RNN networks are used to confirm the accuracy of ASR results (Hoffmeister, Col 16 ll 14-34).

With respect to claim 13, Adams teaches further comprising a memory, wherein the memory is electrically connected with the processor ([0026] Computer instructions for processing by the controller/processor 308 for operating the ASR device 302 and its various components may be executed by the controller/processor 308 and stored in the memory 310, storage 312, external device, or in memory/storage included in the ASR module 314 discussed below), and stores the pronunciation-alphabet table ([0028] The recognition score may be based on a number of factors including, for example, the similarity of the sound in the utterance to models for language sounds (e.g., an acoustic model), and the likelihood that a particular word which matches the sounds would be included in the sentence at the specific location (e.g., using a language model or grammar).), the multi-lingual vocabulary ([0061] These users may pronounce foreign word using a combination of pronunciations from multiple languages including the user's language of origin. The user may pronounce a portion of the foreign word in a first language and other portions in one or more different languages. For example, the user may pronounce a first portion of the band name, Kraftwerk, in English (e.g., K R AE F T) and a second portion in German (e.g. V EH R K.)) and [[the theme vocabulary-semantic relationship data set]].
Adams does not teach the theme vocabulary-semantic relationship data set.
Deshmukh teaches the theme vocabulary-semantic relationship data set ([0015] FIG. 2 shows the basic architecture of one specific embodiment for generating semantically similar sentences for a statistical language model, and FIG. 3 shows an example of the text flows through such an embodiment. Initially, a given input sentence 201--in this case, "I want to change my home address."--is input to a semantic class generator 202 that provides a set of corresponding semantically similar words 203 for each word in the input sentence 201.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Adams in view of Deshmukh to incorporate a sematic analyzer that, according to the sentence and a theme vocabulary-semantic relationship data set, selectively executes a correction procedure to generate a corrected sentence or an analysis state determining procedure, so that a sentence generator computes candidate sentences that are not restricted to existing sentence databases (Deshmukh, [0006]).
With respect to claim 14, Adams teaches wherein the processor is configured to have a communication connection with a memory ([0026] Computer instructions for processing by the controller/processor 308 for operating the ASR device 302 and its various components may be executed by the controller/processor 308 and stored in the memory 310, storage 312, external device, or in memory/storage included in the ASR module 314 discussed below), and to obtain the pronunciation-alphabet table ([0028] The recognition score may be based on a number of factors including, for example, the similarity of the sound in the utterance to models for language sounds (e.g., an acoustic model), and the likelihood that a particular word which matches the sounds would be included in the sentence at the specific location (e.g., using a language model or grammar).), the multi-lingual vocabulary ([0061] These users may pronounce foreign word using a combination of pronunciations from multiple languages including the user's language of origin. The user may pronounce a portion of the foreign word in a first language and other portions in one or more different languages. For example, the user may pronounce a first portion of the band name, Kraftwerk, in English (e.g., K R AE F T) and a second portion in German (e.g. V EH R K.)) and [[the theme vocabulary-semantic relationship data set from the memory]].
Adams does not teach the theme vocabulary-semantic relationship data set from the memory.
Deshmukh teaches the theme vocabulary-semantic relationship data set from the memory ([0015] FIG. 2 shows the basic architecture of one specific embodiment for generating semantically similar sentences for a statistical language model, and FIG. 3 shows an example of the text flows through such an embodiment. Initially, a given input sentence 201--in this case, "I want to change my home address."--is input to a semantic class generator 202 that provides a set of corresponding semantically similar words 203 for each word in the input sentence 201 and [0006] Also note that the generated semantically similar sentences are not restricted to be selected from an existing sentence database [memory]).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Adams in view of Deshmukh to incorporate a sematic analyzer that, according to the sentence and a theme vocabulary-semantic relationship data set, selectively executes a correction procedure to generate a corrected sentence or an analysis state determining procedure, so that a sentence generator computes candidate sentences that are not restricted to existing sentence databases (Deshmukh, [0006]).

With respect to claims 2 and 15 Adams and Deshmukh do not teach determining an error rate of the plurality of converted words according to the sentence and the theme vocabulary-semantic relationship data set; when the error rate is in a first error rate range, outputting the sentence; when the error rate is in a first error rate range, outputting the sentence; and when the error rate is in a third error rate range, executing the correction procedure.
Hoffmeister teaches
determining an error rate of the plurality of converted words according to the sentence and the theme vocabulary-semantic relationship data set (Col 20 ll 39-52 To confirm whether a sentence is correct, the final set of hierarchical features may be input into and classified by a DNN following the encoder/classifier approach …The DNN may also output a probability, which may be used as a confidence of the sentence being correct. Col 9 ll 4-16 The NLU knowledge exchange 272 includes a databases of devices (274a-274n) identifying domains associated with specific devices. For example, the device 110 may be associated with domains for music, telephony, calendaring, contact lists, and device-specific communications, but not video);
when the error rate is in a first error rate range, outputting the sentence (Col 20 ll 39-52 To confirm whether a sentence is correct, the final set of hierarchical features may be input into and classified by a DNN following the encoder/classifier approach… The DNN may output a yes/no indication (illustrated in equation (2) as [0,1] thus classifying the sentence as correct or incorrect...If the confidence score 1204 exceeds a threshold, the system may determine that the sentence is correct.); 
when the error rate is in a first error rate range, outputting the sentence (Col 20 ll 39-52 To confirm whether a sentence is correct, the final set of hierarchical features may be input into and classified by a DNN following the encoder/classifier approach… The DNN may output a yes/no indication (illustrated in equation (2) as [0,1] thus classifying the sentence as correct or incorrect...If the confidence score 1204 exceeds a threshold, the system may determine that the sentence is correct.); 
and when the error rate is in a third error rate range, executing the correction procedure (Col 16 ll 44-48 If the result is not correct (or has a confidence score below the threshold) the system may request a user to restate an utterance or present the ASR results to the user for confirmation (e.g., outputting “please restate your request” or “you said ‘play music by Queen.’ Is that correct?”).)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Adams and Deshmukh to include the teachings of Hoffmeister to execute error correction using feature vectors from RNN networks to confirm the accuracy of ASR results (Hoffmeister, Col 16 ll 14-34).
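As an illustrative aside only, the threshold behavior quoted from Hoffmeister above (a confidence score exceeding a threshold is treated as a correct sentence; otherwise the user is asked to confirm or restate) can be sketched as follows. The function name, the default threshold of 0.5 and the exact prompt wording are assumptions made for illustration and are not taken from Hoffmeister or from the claims.

```python
def handle_asr_result(sentence: str, confidence: float, threshold: float = 0.5) -> str:
    """Hypothetical sketch: pass the sentence through, or ask the user to confirm it."""
    if confidence > threshold:
        # Confidence exceeds the threshold: treat the sentence as correct and output it.
        return sentence
    # Otherwise present the ASR result back to the user for confirmation.
    return f"You said '{sentence}'. Is that correct?"
```

This sketch models a single threshold only; it does not attempt to model the three error rate ranges recited in claims 2 and 15.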

Claims 4 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Adams, Deshmukh and Hoffmeister, as applied to claims 2 and 15 respectively, and in further view of Yoo (US-20200126534-A1).
With respect to claim 4, Adams, Deshmukh and Hoffmeister do not teach selecting one of a plurality of prestored theme vocabulary-semantic relationship data sets to serve as the theme vocabulary-semantic relationship data set; wherein the plurality of prestored theme vocabulary-semantic relationship data sets respectively correspond to different languages, and the theme vocabulary-semantic relationship data set corresponds to the unified language.
Yoo teaches selecting one of a plurality of prestored theme vocabulary-semantic relationship data sets to serve as the theme vocabulary-semantic relationship data set  ([0142] In an example, the speech recognition apparatus 900 may identify a language of a user and select the corresponding language speech recognition model 1031. For example, the speech recognition apparatus 900 may store multiple speech recognition models respectively for different spoken languages, with a speech recognition model corresponding to each of the languages being stored, and additionally store a corresponding parameter generation model and a corresponding dialect classification model corresponding to each of the multiple speech recognition models. The speech recognition apparatus 900 may thus apply generated dialect parameters to the selected speech recognition model 1031. The speech recognition apparatus 900 may generate the speech recognition result for the speech signal using the speech recognition model 1031 to which the generated dialect parameters are applied).
wherein the plurality of prestored theme vocabulary-semantic relationship data sets respectively correspond to different languages, and the theme vocabulary-semantic relationship data set corresponds to the unified language ([0142] In an example, the speech recognition apparatus 900 may identify a language of a user and select the corresponding language speech recognition model 1031. For example, the speech recognition apparatus 900 may store multiple speech recognition models respectively for different spoken languages, with a speech recognition model corresponding to each of the languages being stored, and additionally store a corresponding parameter generation model and a corresponding dialect classification model corresponding to each of the multiple speech recognition models. The speech recognition apparatus 900 may thus apply generated dialect parameters to the selected speech recognition model 1031. The speech recognition apparatus 900 may generate the speech recognition result for the speech signal using the speech recognition model 1031 to which the generated dialect parameters are applied)
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Adams, Deshmukh and Hoffmeister, in view of Yoo, to select one of a plurality of prestored theme vocabulary-semantic relationship data sets to serve as the theme vocabulary-semantic relationship data set, wherein the plurality of prestored theme vocabulary-semantic relationship data sets respectively correspond to different languages, and the theme vocabulary-semantic relationship data set corresponds to the unified language, in order to improve an overall performance of speech recognition via optimization for a pronunciation of a user (Yoo [0143]).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Adams, Deshmukh and Hoffmeister as applied to claim 1, and in further view of Ganapathy (US-20200107078-A1) and AHN (US-20170032779-A1).

With respect to claim 8, Adams, Deshmukh and Hoffmeister do not teach selecting a unified language according to a language family distribution of the plurality of original words; and according to the multi-lingual vocabulary, obtaining a plurality of converted words respectively corresponding to the plurality of original words, and forming the sentence by the plurality of converted words; wherein the converted words belong to the unified language.
Ganapathy teaches
selecting a unified language according to a language family distribution of the plurality of original words ([0039] According to one embodiment, determining conversation language at block 205 may include determining more than one language. In response to detecting more than one language a control device can select a conversation language. Selection of the conversation language may be based on the word count of each conversation language.); and  
[[according to the multi-lingual vocabulary, obtaining a plurality of converted words respectively corresponding to the plurality of original words, and forming the sentence by the plurality of converted words]];
wherein the converted words belong to the unified language ([0040] Determining conversation language at block 205 may include performing one or more operations to characterize speech detected in a space. In one embodiment, one or more of sound and keyword recognition are used to identify possible languages. Phrases and sentences may be determined in addition to determining words. Process 200 may include parameters for natural language processing. In addition, process 200 may load a plurality of language and sound data sets as a reference. Languages and sound parameters may be assigned identifiers to allow for a control device to request subtitle data based on a determined language.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Adams, Deshmukh and Hoffmeister in view of Ganapathy to select a unified language according to a language family distribution of the plurality of original words, wherein the converted words belong to the unified language, in order to provide functions to allow for determination of a conversation language relative to the display device and presentation (Ganapathy, [0026]).
Adams, Deshmukh, Hoffmeister and Ganapathy do not teach according to the multi-lingual vocabulary, obtaining a plurality of converted words respectively corresponding to the plurality of original words, and forming the sentence by the plurality of converted words.
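For illustration only, the selection described in Ganapathy [0039] (when more than one language is detected, choosing the conversation language based on the word count of each language) can be sketched as below. The function name and the input format (one language label per detected word) are assumptions made for this sketch; Ganapathy does not specify them, and tie-breaking here simply follows first occurrence.

```python
from collections import Counter
from typing import Iterable, Optional


def select_conversation_language(word_languages: Iterable[str]) -> Optional[str]:
    """Hypothetical sketch: return the language label attached to the most words."""
    counts = Counter(word_languages)
    if not counts:
        # No words detected, so no language can be selected.
        return None
    # most_common(1) yields the (label, count) pair with the highest count.
    return counts.most_common(1)[0][0]
```

For example, labels of "en", "de", "en" for three detected words would select "en".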
AHN teaches according to the multi-lingual vocabulary, obtaining a plurality of converted words respectively corresponding to the plurality of original words, and forming the sentence by the plurality of converted words ([0060] The language selector 140 may obtain appearance probability information of each of the candidate pronunciation variants detected by the candidate pronunciation variant detector 120, using, for example, a pronunciation dictionary 150 and a grammar model 160. The language selector 140 selects a final language that is recognized, based on the appearance probability information of each of the candidate pronunciation variants, and [0061] Furthermore, the language selector 140 may obtain words corresponding to respective candidate pronunciation variants using the pronunciation dictionary 150 and may obtain an appearance probability value for each word corresponding to the respective candidate pronunciation variants using the grammar model 160. The language selector 140 may finally select a candidate pronunciation variant corresponding to a word having the largest appearance probability value. A word corresponding to the finally selected candidate pronunciation variant may be output as a word that is recognized.)
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Adams, Deshmukh, Hoffmeister and Ganapathy in view of AHN to obtain, according to the multi-lingual vocabulary, a plurality of converted words respectively corresponding to the plurality of original words, and to form the sentence by the plurality of converted words, in order to improve speech recognition performance as the grammar model is updated (AHN [0083]).
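For illustration only, the final selection step quoted from AHN [0061] (selecting the candidate whose corresponding word has the largest appearance probability value) can be sketched as below. The function name and the dictionary input are assumptions; how the pronunciation dictionary 150 and grammar model 160 produce these probabilities is not modeled.

```python
from typing import Dict


def select_recognized_word(appearance_probability: Dict[str, float]) -> str:
    """Hypothetical sketch: pick the candidate word with the largest probability."""
    # max() over the keys, ranked by their probability values;
    # raises ValueError if no candidates are supplied.
    return max(appearance_probability, key=appearance_probability.get)
```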

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Adams, Deshmukh, Hoffmeister, Yeh and Marringo as applied to claim 9, and in further view of Rechner (US 20200092607 A).

With respect to claim 11, Adams, Deshmukh, Hoffmeister, Yeh and Marringo do not teach according to a language distribution of the determined original words in the alphabet string, selecting one of the prestored alphabet groups in the pending word set to be another one of the plurality of original words.
Rechner teaches 
according to a language distribution of the determined original words in the alphabet string, selecting one of the prestored alphabet groups in the pending word set to be another one of the plurality of original words ([0008] Embodiments can further provide a method further comprising wherein the step of evaluating the text to determine candidate words to filter comprises, determining the language of the text, selecting a dictionary of objectionable words based on the language of the text, and comparing each word in the text to the words in the selected dictionary).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Adams, Deshmukh, Hoffmeister, Yeh and Marringo, in view of Rechner to incorporate according to a language distribution of the determined original words in the alphabet string, selecting one of the prestored alphabet groups in the pending word set to be another one of the plurality of original words, in order to include the ability to teach the cognitive system to improve the deep learning semantic analysis. (Rechner [0082])
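For illustration only, one possible reading of the claim 11 limitation can be sketched in Python as follows. The (alphabet group, language tag) entry layout, the function name, and the rule of preferring the pending candidate whose language occurs most often among the already determined original words are assumptions made for this sketch; neither the claim language nor Rechner specifies them.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical entry layout: (alphabet group, language tag).
Entry = Tuple[str, str]


def select_by_language_distribution(determined: List[Entry],
                                    pending: List[Entry]) -> Entry:
    """Pick the pending candidate whose language is most frequent
    among the original words that have already been determined."""
    # Language distribution of the determined original words.
    counts = Counter(lang for _, lang in determined)
    # max() keeps the first candidate when several are tied.
    return max(pending, key=lambda entry: counts[entry[1]])


determined = [("kraftwerk", "de"), ("ist", "de"), ("cool", "en")]
pending = [("die", "en"), ("die", "de")]
print(select_by_language_distribution(determined, pending))
```

Here the ambiguous group "die" would resolve to the German entry because two of the three determined words are tagged German.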

Claims 5 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Adams, Deshmukh and Hoffmeister as applied to claims 1 and 12 respectively, and in further view of Gould (US-6839669-B1), AHN and Beattie (US-20070055514-A1).

With respect to claims 5 and 18, Adams, Deshmukh and Hoffmeister do not teach generating a confused sentence set by a language recognition acoustic model, with the confused sentence set comprising a plurality of sentence candidates; supplementing the confused sentence set according to the plurality of original words and the multi-lingual vocabulary; according to the theme vocabulary-semantic relationship data set, determining whether a suitable one exists in the plurality of sentence candidates; when the suitable one exists, replacing the sentence with the suitable one, and determining that the correction procedure successes; and when no suitable one exists, determining that the correction procedure fails.
Gould teaches
generating a confused sentence set by a language recognition acoustic model, with the confused sentence set comprising a plurality of sentence candidates (Col 1 ll 43-56: In a typical speech recognition system, a user dictates into a microphone connected to a computer. The computer then performs speech recognition to find acoustic models that best match the user's speech. The words or phrases corresponding to the best matching acoustic models are referred to as recognition candidates. The computer may produce a single recognition candidate (i.e., a single sequence of words or phrases) for an utterance, or may produce a list of recognition candidates. Typically, the best recognition candidate is immediately displayed to the user or an action corresponding to the best recognition candidate is performed. The user generally is permitted to correct errors in the recognition. Other recognition candidates may also be displayed.); 
[[supplementing the confused sentence set according to the plurality of original words and the multi-lingual vocabulary]]; 
according to the theme vocabulary-semantic relationship data set, determining whether a suitable one exists in the plurality of sentence candidates (Col 66 ll 42-51: After processing the utterance, the recognizer provides the best-scoring hypotheses to the control/interface module 2320 as a list of recognition candidates, where each recognition candidate corresponds to a hypothesis and has an associated score. Some recognition candidates may correspond to text while other recognition candidates correspond to commands. Commands may include words, phrases, or sentences. When the software 360 is called by the interface software 380, the control/interface module 720 returns the best-scoring candidate to the interface software 380. Typically, the best recognition candidate is immediately displayed to the user or an action corresponding to the best recognition candidate is performed. The user generally is permitted to correct errors in the recognition, and (1044) Other functions provided by the control/interface module 1820 include a vocabulary customizer and a vocabulary manager. The vocabulary customizer optimizes the language model of a specific topic by scanning user supplied text.); 
when the suitable one exists, replacing the sentence with the suitable one, and determining that the correction procedure successes (Col 79 ll 17-29 When the system makes a recognition error, the user may invoke an appropriate correction command to remedy the error. FIGS. 34A-34N illustrate a user interface provided by the control/interface module 1820 in response to a sequence of interspersed text and commands. As shown in FIG. 34A, the recognizer 1815 correctly recognizes a first utterance 3400 ("When a justice needs a friend New-Paragraph") and the control/interface module 1820 displays the results 3405 ("When a justice needs a friend") of recognizing the utterance in a dictation window 3410. The module 1820 displays text 3405 ("When a justice needs a friend") corresponding to a text portion of the utterance and implements the formatting command ("New-Paragraph") included in the utterance); and  
[[when no suitable one exists, determining that the correction procedure fails.]]
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Adams, Deshmukh and Hoffmeister in view of Gould to generate a confused sentence set by a language recognition acoustic model, with the confused sentence set comprising a plurality of sentence candidates, in order to provide considerable reductions in the processing associated with parsing an utterance, particularly when an early pattern scores well (Gould, Col. 4 ll 40-42).
Adams, Deshmukh, Hoffmeister and Gould do not teach supplementing the confused sentence set according to the plurality of original words and the multi-lingual vocabulary.
AHN teaches supplementing the confused sentence set according to the plurality of original words and the multi-lingual vocabulary ([0060] The language selector 140 may obtain appearance probability information of each of the candidate pronunciation variants detected by the candidate pronunciation variant detector 120, using, for example, a pronunciation dictionary 150 and a grammar model 160. The language selector 140 selects a final language that is recognized, based on the appearance probability information of each of the candidate pronunciation variants, and [0061] Furthermore, the language selector 140 may obtain words corresponding to respective candidate pronunciation variants using the pronunciation dictionary 150 and may obtain an appearance probability value for each word corresponding to the respective candidate pronunciation variants using the grammar model 160. The language selector 140 may finally select a candidate pronunciation variant corresponding to a word having the largest appearance probability value. A word corresponding to the finally selected candidate pronunciation variant may be output as a word that is recognized.)
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Adams, Deshmukh, Hoffmeister and Gould in view of AHN to supplement the confused sentence set according to the plurality of original words and the multi-lingual vocabulary, in order to improve speech recognition performance as the grammar model is updated (AHN [0083]).
Adams, Deshmukh, Gould and AHN do not teach when no suitable one exists, determining that the correction procedure fails.
Beattie teaches when no suitable one exists, determining that the correction procedure fails ([0057] If input has not been received, process 130 returns to determining 160 the amount of time that has passed. This loop is repeated until either a valid recognition is received or the time exceeds the threshold. If a valid recognition is received (in response to determination 160), process 130 proceeds 154 to a subsequent word in the passage and re-initializes 132 the timer. If the time exceeds the threshold, process 130 proceeds 162 to a subsequent word in the passage, but the word is indicated as not receiving a correct response within the allowable time period.)
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Adams, Deshmukh, Hoffmeister and Gould in view of Beattie to determine, when no suitable one exists, that the correction procedure fails, in order to allow the level of accuracy in pronunciation to be adjusted according to the skill level of the user (Beattie [0061]).
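For illustration only, the sequence of steps recited in claims 5 and 18 can be sketched in Python as follows. The function name, the representation of sentences as plain strings, and the `is_suitable` predicate (standing in for the check against the theme vocabulary-semantic relationship data set) are hypothetical; the claims do not define how suitability is evaluated.

```python
from typing import Callable, List, Tuple


def correct_sentence(sentence: str,
                     candidates: List[str],
                     supplements: List[str],
                     is_suitable: Callable[[str], bool]) -> Tuple[str, bool]:
    """Return (sentence to use, whether the correction procedure succeeded)."""
    # Confused sentence set: recognizer candidates plus supplemental
    # candidates derived from the original words and the vocabulary.
    confused_set = list(candidates)
    confused_set += [s for s in supplements if s not in confused_set]

    for candidate in confused_set:
        if is_suitable(candidate):
            # A suitable candidate exists: it replaces the sentence.
            return candidate, True
    # No suitable candidate exists: the correction procedure fails.
    return sentence, False


result = correct_sentence(
    "i like kraft work",
    candidates=["i like craft work"],
    supplements=["i like kraftwerk"],
    is_suitable=lambda s: "kraftwerk" in s,  # toy stand-in for the theme check
)
print(result)
```

In this toy run the supplemented candidate is the only one the stand-in predicate accepts, so it replaces the original sentence and the procedure reports success.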

Claims 6 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Adams, Deshmukh and Hoffmeister as applied to claims 1 and 12 respectively, and in further view of Marringo (US 20070225983 A1).

With respect to claims 6 and 19, Adams and Deshmukh do not teach determining whether a number of executions of the step of determining that the alphabet string corresponds to the plurality of original words according to the multi-lingual vocabulary by the speech recognizer exceeds a preset number; when the number of executions does not exceed the preset number, instructing the speech recognizer to re-determine that the alphabet string corresponds to another plurality of original words according to the multi-lingual vocabulary, and adding 1 to the number of executions; and when the number of executions exceeds the preset number, outputting a failure indicator or a voice input request.
Marringo teaches 
determining whether a number of executions of the step of determining that the alphabet string corresponds to the plurality of original words according to the multi-lingual vocabulary by the speech recognizer exceeds a preset number ([0057] If the current failure count for the first iterative sequence exceeds a predetermined appropriate limit, microcontroller 14 sets the current failure count to zero, returns SRP 32 to a power down mode, terminates the Worldwide Current Time Function, and the timekeeping device returns to normal clock function.); 
when the number of executions does not exceed the preset number, instructing the speech recognizer to re-determine that the alphabet string corresponds to another plurality of original words according to the multi-lingual vocabulary, and adding 1 to the number of executions ([0058] If the current failure count for the first iterative sequence does not exceed a predetermined appropriate limit, the microcontroller retrieves a "TRY AGAIN" prompt from memory, and updates display 18 with the "TRY AGAIN" prompt, and waits for an appropriate predetermined period of seconds while displaying the "TRY AGAIN" prompt, after which the microcontroller returns to Step 1 of the first iterative sequence, and display 18 is updated to again display "WHAT REGION?"); and 
when the number of executions exceeds the preset number, outputting a failure indicator or a voice input request ([0057] If the current failure count for the first iterative sequence exceeds a predetermined appropriate limit, microcontroller 14 sets the current failure count to zero, returns SRP 32 to a power down mode, terminates the Worldwide Current Time Function, and the timekeeping device returns to normal clock function.).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to substitute the teachings of Adams and Deshmukh with Marringo’s teachings of iterating through a data tree directed to a user selection, which is a known technique in the field of speech recognition processing, in order for a recognizer to cycle through multiple alphabet strings (see KSR v. Teleflex).
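For illustration only, the bounded re-determination loop recited in claims 6 and 19 can be sketched in Python as follows. The names `determine`, `accept`, and `FAILURE` are hypothetical; in particular, `accept` stands in for whatever check decides that the determined original words are usable, which the claim language quoted above does not spell out.

```python
from typing import Callable, List, Union

FAILURE = "FAILURE"  # hypothetical failure indicator


def determine_with_limit(determine: Callable[[int], List[str]],
                         accept: Callable[[List[str]], bool],
                         preset_number: int) -> Union[List[str], str]:
    """Re-run the word determination until it is accepted or the
    number of executions exceeds the preset number."""
    executions = 1
    words = determine(executions)  # first determination
    while not accept(words):
        if executions > preset_number:
            # Limit exceeded: output the failure indicator.
            return FAILURE
        # Limit not exceeded: add 1 and re-determine.
        executions += 1
        words = determine(executions)
    return words
```

Passing the execution count to `determine` is a design choice of this sketch; it leaves room for a different word segmentation method on each attempt, as claims 7 and 20 recite.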

Claims 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Adams, Deshmukh and Marringo as applied to claims 6 and 19, and in further view of Lu (US-20140222417-A1).

With respect to claims 7 and 20 Adams, Deshmukh and Marringo do not teach wherein the step of determining that the alphabet string corresponds to the plurality of original words according to the multi-lingual vocabulary is executed by a first word segmentation method, and the step of re-determining that the alphabet string corresponds to the another plurality of original words according to the multi-lingual vocabulary is executed by a second word segmentation method which is different from the first word segmentation method.  
Lu teaches wherein the step of determining that the alphabet string corresponds to the plurality of original words according to the multi-lingual vocabulary is executed by a first word segmentation method, and the step of re-determining that the alphabet string corresponds to the another plurality of original words according to the multi-lingual vocabulary is executed by a second word segmentation method which is different from the first word segmentation method ([0023] According to the above characteristics, embodiments of the present application propose a strategy for training an acoustic language model based on word segmentation according to word classes and [0025] As described herein, word segmentation refers to the process of dividing a continuous language sample (e.g., a text string) into a sequence of unambiguous semantic units (e.g., words) [first segmentation]. For example, in the Chinese language, a textual string containing Chinese characters or Pinyin does not include natural delimiters between words, and the divisions between semantic units within the textual string are not apparent. Therefore, in order to interpret the meaning of the textual string, the string is segmented into a sequence of chunks [second word segmentation] each representing a respective word.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Adams and Deshmukh to include the teachings of Lu, the motivation being that training an acoustic language model for recognition improves the recognition accuracy of the system (Lu, [0007]).
 
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Adams, Deshmukh and Hoffmeister as applied to claims 1 and 12 respectively, and in further view of Yeh (US 20170024465 A1) and Marringo.
With respect to claim 9, Adams and Deshmukh do not teach setting an alphabet group to be recognized in the alphabet string, with the alphabet group to be recognized having a head position and an end position; setting a value of the head position to be 1, and setting a value of the end position to be M; determining a number of prestored alphabet groups, which match the alphabet group, in the multi-lingual vocabulary; when the number is zero, subtracting 1 from the value of the end position, and re-executing the step of determining the number of prestored alphabet groups, which match the alphabet group, in the multi-lingual vocabulary; when the number is one, regarding the prestored alphabet group as one of the plurality of original words; and when the number is more than one, storing the prestored alphabet groups into a pending word set.
Yeh teaches 
setting an alphabet group to be recognized in the alphabet string, with the alphabet group to be recognized having a head position and an end position ([0058] The NER 254 also may use list match features to flag phrases in the utterance that match those in an externally provided dictionary. The dictionary is constructed by extracting all relevant entries (e.g., movie and TV show titles, actor names, and role names) along with their type (e.g., movie, actor, etc.) from an EPG database 264. Each word in a phrase is assigned a feature if the phrase has an exact match in the dictionary. The features are of the form bY, iY, eY, and represent the beginning [head position], middle, and end [end position] of a phrase of type Y, respectively. A word can receive multiple list match features if it participates in multiple matches.); 
setting a value of the head position to be 1, and setting a value of the end position to be M; 
determining a number of prestored alphabet groups, which match the alphabet group, in the multi-lingual vocabulary ([0058] Each word in a phrase is assigned a feature if the phrase has an exact match in the dictionary. The features are of the form bY, iY, eY, and represent the beginning [head position 1], middle, and end of a phrase [end position M] of type Y, respectively.); 
[[when the number is zero, subtracting 1 from the value of the end position, and re-executing the step of determining the number of prestored alphabet groups, which match the alphabet group, in the multi-lingual vocabulary]]; 
when the number is one, regarding the prestored alphabet group as one of the plurality of original words ([0058] The NER 254 also may use list match features to flag phrases in the utterance that match those in an externally provided dictionary. The dictionary is constructed by extracting all relevant entries (e.g., movie and TV show titles, actor names, and role names) along with their type (e.g., movie, actor, etc.) from an EPG database 264. Each word in a phrase is assigned a feature if the phrase has an exact match in the dictionary [number is one]); and 
when the number is more than one, storing the prestored alphabet groups into a pending word set ([0058] The NER 254 also may use list match features to flag phrases in the utterance that match those in an externally provided dictionary. The dictionary is constructed by extracting all relevant entries (e.g., movie and TV show titles, actor names, and role names) along with their type (e.g., movie, actor, etc.) from an EPG database 264. Each word in a phrase is assigned a feature if the phrase has an exact match in the dictionary. The features are of the form bY, iY, eY, and represent the beginning, middle, and end of a phrase of type Y, respectively. A word can receive multiple list match features if it participates in multiple matches [more than one]).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to substitute the teachings of Adams and Deshmukh with Yeh’s teachings of using list match features to flag phrases in the utterance, which is a known technique in the field of speech recognition processing, in order to mark head and end positions in strings (see KSR v. Teleflex).
Adams, Deshmukh and Yeh do not teach when the number is zero, subtracting 1 from the value of the end position, and re-executing the step of determining the number of prestored alphabet groups, which match the alphabet group, in the multi-lingual vocabulary.
Marringo teaches when the number is zero, subtracting 1 from the value of the end position, and re-executing the step of determining the number of prestored alphabet groups, which match the alphabet group, in the multi-lingual vocabulary ([0058] If the current failure count for the first iterative sequence does not exceed a predetermined appropriate limit, the microcontroller retrieves a "TRY AGAIN" prompt from memory, and updates display 18 with the "TRY AGAIN" prompt, and waits for an appropriate predetermined period of seconds while displaying the "TRY AGAIN" prompt, after which the microcontroller returns to Step 1 of the first iterative sequence, and display 18 is updated to again display "WHAT REGION?")
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to substitute the teachings of Adams and Deshmukh with Marringo’s teachings of iterating through a data tree directed to a user selection, which is a known technique in the field of speech recognition processing, in order for a recognizer to cycle through multiple alphabet strings (see KSR v. Teleflex).
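For illustration only, the matching steps recited in claim 9 can be sketched in Python as follows. The sketch assumes that M is the length of the alphabet string, that positions are 1-based and inclusive, and that the multi-lingual vocabulary is a list of (prestored alphabet group, language tag) pairs, so that the same spelling may match more than one entry; none of these details is stated in the claim language quoted above, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical entry layout: (prestored alphabet group, language tag).
Entry = Tuple[str, str]


def match_group(alphabet_string: str, head: int, vocabulary: List[Entry]
                ) -> Tuple[Optional[int], Optional[Entry], List[Entry]]:
    """Return (end position, single matched entry, pending word set)."""
    end = len(alphabet_string)  # assumed meaning of M
    while end >= head:
        group = alphabet_string[head - 1:end]  # 1-based, inclusive span
        matches = [entry for entry in vocabulary if entry[0] == group]
        if len(matches) == 1:
            # Exactly one match: treat it as one of the original words.
            return end, matches[0], []
        if len(matches) > 1:
            # Several matches: store them in the pending word set.
            return end, None, matches
        # Zero matches: subtract 1 from the end position and retry.
        end -= 1
    return None, None, []  # nothing in the vocabulary matches


vocabulary = [("kraft", "de"), ("kraftwerk", "de"), ("a", "en"), ("a", "es")]
print(match_group("kraftwerk", 1, vocabulary))
print(match_group("abc", 1, vocabulary))
```

Because the end position starts at M and only shrinks, the sketch prefers the longest group that has any match, which is why "kraftwerk" is returned rather than "kraft".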

Allowable Subject Matter
Claims 3 and 10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim 3 recites wherein the step of determining the error rate of the plurality of converted words according to the sentence and the theme vocabulary-semantic relationship data set comprises: determining that the sentence has one or more sub-sentences; for each of the one or more sub-sentences, determining a uniform theme proportion of the sub-sentence according to the theme vocabulary-semantic relationship data set; and obtaining the error rate according to the uniform theme proportion of each of the one or more sub-sentence. The teaching closest to the indicated allowable subject matter is in the reference Hoffmeister, which recites “Col 20 ll 39-52 To confirm whether a sentence is correct, the final set of hierarchical features may be input into and classified by a DNN following the encoder/classifier approach.” Neither this reference nor the other references cited in the prior art teach a uniform theme proportion.
Claim 10 recites when the number is one or more, further determining whether the value of the end position is equal to M; and when the value of the end position is not equal to M, setting a sum of the value of the end position and the value of the head position to be a new value of the head position, setting the value of the end position to be M, and re-executing the step of determining the number of prestored alphabet groups, which match the alphabet group, in the multi-lingual vocabulary. The teaching closest to the indicated allowable subject matter is in the reference Yeh, which recites “[0058] Each word in a phrase is assigned a feature if the phrase has an exact match in the dictionary. The features are of the form bY, iY, eY, and represent the beginning, middle, and end of a phrase of type Y, respectively.” Neither this reference nor the other references cited in the prior art teach, when the value of the end position is not equal to M, setting a sum of the value of the end position and the value of the head position to be a new value of the head position.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675. The examiner can normally be reached on Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.   Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/A.N.P./Examiner, Art Unit 2657

/Paras D Shah/Primary Examiner, Art Unit 2659