DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
2.	The information disclosure statements (IDS) submitted on 08/13/2020, 02/02/2021, and 04/29/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Priority Acknowledgment
3.	Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). Certified copies of Application 10-2020-0018574, filed on 02/14/2020 in the Korean Intellectual Property Office, and of Application 10-2019-013325, filed on 10/24/2019 in the Korean Intellectual Property Office, have been filed.

Claim Objections
4.	Claims 1-20 are objected to because of the following informalities: the text is blurry. Appropriate correction is required.

Specification Objections
5.	The specification is objected to because of the following informalities: the text is blurry. Appropriate correction is required.

Claim Rejections - 35 USC § 103
6.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
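A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.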

7.	Claims 1, 10, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Du et al. (US 2019/0103109 A1) in view of Willett et al. (US 2018/0197545 A1).

	With respect to Claim 1, Du et al. disclose
 	A server comprising: 
 	a memory storing one or more computer-readable instructions (Du et al. [0013] the embodiments of the present application provide a server including: one or more processors, and a storage apparatus for storing one or more programs, and the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method described in any implementation in the first aspect); 
 	a processor configured to execute the one or more computer-readable instructions stored in the memory (Du et al. [0013] the embodiments of the present application provide a server including: one or more processors, and a storage apparatus for storing one or more programs, and the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method described in any implementation in the first aspect); and
 	wherein the processor when executing the one or more computer-readable instructions (Du et al. [0013] the embodiments of the present application provide a server including: one or more processors, and a storage apparatus for storing one or more programs, and the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method described in any implementation in the first aspect) is configured to: 
 	identify an estimated character string to replace a portion of the first character string, based on the first character string (Du et al. [0046] the server may determine a word information set stored in association with the user identifier of the user based on the user identifier of the user, the word information set stores a historical character string weijing, an input result Wei Jing corresponding to weijing, and a candidate result monosodium glutamate corresponding to weijing. Finally, the server replaces the “monosodium glutamate” in the first recognized text with “Wei Jing” based on the word information in the determined word information set to obtain a second recognized text “call Wei Jing”); and 
control the communication interface to transmit a second character string to the device, the second character string comprising the portion of the first character string replaced with the estimated character string (Du et al. [0046] the server may determine a word information set stored in association with the user identifier of the user based on the user identifier of the user, the word information set stores a historical character string weijing, an input result Wei Jing corresponding to weijing, and a candidate result monosodium glutamate corresponding to weijing. Finally, the server replaces the “monosodium glutamate” in the first recognized text with “Wei Jing” based on the word information in the determined word information set to obtain a second recognized text “call Wei Jing” and send the second recognized text to the smart phone used by the user.)
Du et al. fail to explicitly teach 
 	a communication interface configured to receive from a device a first character string of speech recognition by the device of a speech signal input to the device, 
	However, Willett et al. teach 
	a communication interface configured to receive from a device a first character string of speech recognition by the device of a speech signal input to the device, (Willett et al. [0046] In act 410, audio comprising speech is received by a mobile electronic device in a hybrid speech processing system. The process then proceeds to act 412, where at least a portion of the input speech is processed by an embedded ASR engine on the mobile device to generate recognized text. The process then proceeds to act 414, where at least a portion of the recognized text output from the embedded ASR engine is transmitted to a remote server in the hybrid speech processing system. The recognized text transmitted to the remote server may include partial or full ASR results, and embodiments are not limited in this respect. Transmitting recognized text to a remote server is shown in the process of FIG. 4 as an act that is always performed.)
Du et al. and Willett et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the step of replacing words in the transcription at the server, as taught by Du et al., using the teaching of generating the recognized text at the embedded ASR engine and transmitting the recognized text from the embedded ASR engine to the server, as taught by Willett et al., for the benefit of determining whether to request speech for processing from the mobile device and determining a semantic category associated with the recognized text at the server (Willett et al. [0047] The process of Fig. 4 then proceeds to act 416, where a controller implemented on the server determines, based on the recognized text received from the mobile device, whether to request speech for processing from the mobile device. In some embodiments, at least a portion of the recognized text may be processed by a server NLU engine configured to determine a semantic category associated with the recognized text.)
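For illustration only, the combination relied upon above can be sketched in Python: recognized text arrives from the device (as in Willett et al. [0046]) and the server substitutes the stored input result for the stored candidate result (as in Du et al. [0046]). The class WordInfo, the store WORD_INFO_SETS, the identifier "user-001", and the function name are hypothetical names chosen for this sketch and do not appear in either reference; plain substring replacement is likewise an assumption, not the method of either reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordInfo:
    """One entry of a user's word information set (hypothetical structure)."""
    historical_string: str  # string the user typed, e.g. "weijing"
    input_result: str       # result the user actually chose, e.g. "Wei Jing"
    candidate_result: str   # competing candidate, e.g. "monosodium glutamate"


# Hypothetical per-user store, keyed by user identifier.
WORD_INFO_SETS = {
    "user-001": [WordInfo("weijing", "Wei Jing", "monosodium glutamate")],
}


def correct_recognized_text(user_id: str, first_text: str) -> str:
    """Replace each stored candidate result found in the first recognized
    text with the corresponding input result for this user."""
    second_text = first_text
    for info in WORD_INFO_SETS.get(user_id, []):
        second_text = second_text.replace(info.candidate_result, info.input_result)
    return second_text


# First recognized text as it might be received from the device.
second_text = correct_recognized_text("user-001", "call monosodium glutamate")
print(second_text)
```

In this sketch the returned string plays the role of the second recognized text that is sent back to the device; a user identifier with no stored entries leaves the text unchanged.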
 	With respect to Claim 10, Du et al. in view of Willett et al. teach 
wherein the processor when executing the one or more computer-readable instructions is further configured to provide a service associated with the speech signal input to the device, based on the second character string (Du et al. [0046] Fig. 3 Recognized result: call Wei Jing. The Examiner notes that the system in Du et al. provides a phone call service associated with the speech signal input to the device based on the second character string “call Wei Jing”.)
 	With respect to Claim 12, Du et al. disclose
 	An operation method of a server, the operation method comprising:  
 	identifying an estimated character string to replace a portion of the first character string, based on the first character string (Du et al. [0046] the server may determine a word information set stored in association with the user identifier of the user based on the user identifier of the user, the word information set stores a historical character string weijing, an input result Wei Jing corresponding to weijing, and a candidate result monosodium glutamate corresponding to weijing. Finally, the server replaces the “monosodium glutamate” in the first recognized text with “Wei Jing” based on the word information in the determined word information set to obtain a second recognized text “call Wei Jing”); 
 transmitting a second character string to the device, the second character string comprising the portion of the first character string replaced with the estimated character string (Du et al. [0046] the server may determine a word information set stored in association with the user identifier of the user based on the user identifier of the user, the word information set stores a historical character string weijing, an input result Wei Jing corresponding to weijing, and a candidate result monosodium glutamate corresponding to weijing. Finally, the server replaces the “monosodium glutamate” in the first recognized text with “Wei Jing” based on the word information in the determined word information set to obtain a second recognized text “call Wei Jing” and send the second recognized text to the smart phone used by the user.)
Du et al. fail to explicitly teach 
 	receiving from a device a first character string of speech recognition by the device of a speech signal input to the device; 
 	However, Willett et al. teach 
 	receiving from a device a first character string of speech recognition by the device of a speech signal input to the device (Willett et al. [0046] In act 410, audio comprising speech is received by a mobile electronic device in a hybrid speech processing system. The process then proceeds to act 412, where at least a portion of the input speech is processed by an embedded ASR engine on the mobile device to generate recognized text. The process then proceeds to act 414, where at least a portion of the recognized text output from the embedded ASR engine is transmitted to a remote server in the hybrid speech processing system. The recognized text transmitted to the remote server may include partial or full ASR results, and embodiments are not limited in this respect. Transmitting recognized text to a remote server is shown in the process of FIG. 4 as an act that is always performed);
Du et al. and Willett et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the step of replacing words in the transcription at the server, as taught by Du et al., using the teaching of generating the recognized text at the embedded ASR engine and transmitting the recognized text from the embedded ASR engine to the server, as taught by Willett et al., for the benefit of determining whether to request speech for processing from the mobile device and determining a semantic category associated with the recognized text at the server (Willett et al. [0047] The process of Fig. 4 then proceeds to act 416, where a controller implemented on the server determines, based on the recognized text received from the mobile device, whether to request speech for processing from the mobile device. In some embodiments, at least a portion of the recognized text may be processed by a server NLU engine configured to determine a semantic category associated with the recognized text.)
 	With respect to Claim 19, Du et al. in view of Willett et al. teach 
 further comprising providing a service associated with the speech signal input to the device, based on the second character string (Du et al. [0046] Fig. 3 Recognized result: call Wei Jing. The Examiner notes that the system in Du et al. provides a phone call service associated with the speech signal input to the device based on the second character string “call Wei Jing”.)
8.	Claims 2-4 and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Du et al. (US 2019/0103109 A1) in view of Willett et al. (US 2018/0197545 A1) and Ittycheriah et al. (US 6,269,335 B1).
	With respect to Claim 2, Du et al. in view of Willett et al. teach all the limitations of Claim 1 upon which Claim 2 depends. Du et al. in view of Willett et al. fail to explicitly teach 
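 	wherein the processor when executing the one or more computer-readable instructions is further configured to: 
 	identify replacement characters corresponding to each character within the portion of the first character string and identify the estimated character string, based on the replacement characters; and 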
 	obtain, the second character string, by replacing the portion of the first character string with the estimated character string based on the replacement characters, 
 	wherein the replacement characters are characters having pronunciations similar to each character within the first character string. 
	However, Ittycheriah et al. teach 
 	wherein the processor when executing the one or more computer-readable instructions is further configured to (Ittycheriah et al. col. 4 lines 27- 32 The speech recognition system 10, itself, includes a speech utterance pre-processor 12, an acoustic front-end 14 operatively coupled to the pre-processor 12, and a speech recognition engine 16 operatively coupled to the acoustic front-end 14): 
 	identify replacement characters corresponding to each character within the portion of the first character string (Ittycheriah et al. Fig. 3A element 108 Identify homophones from the measures) and identify the estimated character string, based on the replacement characters (Ittycheriah et al. col. 3 lines 36-45 the results of the homophone identification process may be added to the N-best list generated by the speech recognizer in response to the uttered word. Then, a second decoding pass (e.g., a detailed match, an acoustic re-scoring, a language model re-scoring) is performed using the augmented N-best list in order to yield the result which is considered to be the top hypotheses for the uttered word. In this manner, there is no feedback to the user, rather, the speech recognizer utilizes the results to make the best selection); and 
 	obtain, the second character string, by replacing the portion of the first character string with the estimated character string based on the replacement characters (Ittycheriah et al. col. 12 lines 10-17 The second pass, e.g., re-scoring or detailed match, preferably includes increasing the beam-width associated with the Viterbi algorithm thereby increasing the likelihood that all homophones are identified for the uttered word. Alternatively, the identified homophones may be used to filter the N-best list produced by the speech recognizer to clean the list up in the order that the list only includes acoustically similar words. Ittycheriah et al. disclose a method of identifying a list of homophones corresponding to the decoded word from the transcription to replace the decoded word by one of the identified homophones.), 
 	wherein the replacement characters are characters having pronunciations similar to each character within the first character string (Ittycheriah et al. col. 12 lines 10-17 The second pass, e.g., re-scoring or detailed match, preferably includes increasing the beam-width associated with the Viterbi algorithm thereby increasing the likelihood that all homophones are identified for the uttered word. Alternatively, the identified homophones may be used to filter the N-best list produced by the speech recognizer to clean the list up in the order that the list only includes acoustically similar words.) 
 	Du et al., Willett et al. and Ittycheriah et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the step of replacing words in the transcription at the server, as taught by Du et al., using the teaching of generating the recognized text at the embedded ASR engine and transmitting the recognized text from the embedded ASR engine to the server, as taught by Willett et al., for the benefit of determining whether to request speech for processing from the mobile device and determining a semantic category associated with the recognized text at the server, and further using the teaching of identifying the homophones corresponding to the decoded word, as taught by Ittycheriah et al., for the benefit of correcting the error in the transcription (Ittycheriah et al. col. 12 lines 10-17 The second pass, e.g., re-scoring or detailed match, preferably includes increasing the beam-width associated with the Viterbi algorithm thereby increasing the likelihood that all homophones are identified for the uttered word. Alternatively, the identified homophones may be used to filter the N-best list produced by the speech recognizer to clean the list up in the order that the list only includes acoustically similar words.)
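For illustration only, the N-best augmentation and second pass quoted from Ittycheriah et al. col. 3 lines 36-45 can be sketched as follows. The function names, the example words, and the toy score table are hypothetical; the reference describes the second pass only as a detailed match, an acoustic re-scoring, or a language model re-scoring, so the scoring function here is a stand-in assumption.

```python
def augment_n_best(n_best, homophones):
    """Append identified homophones to the N-best list, skipping duplicates
    and preserving the original order of the list."""
    augmented = list(n_best)
    for word in homophones:
        if word not in augmented:
            augmented.append(word)
    return augmented


def second_pass(augmented, score):
    """Re-score every hypothesis and return the top one. `score` stands in
    for the re-scoring step, whose details are not specified here."""
    return max(augmented, key=score)


# Toy scores standing in for a language model re-scoring (assumed values).
TOY_SCORES = {"write": 0.2, "rite": 0.1, "right": 0.6}

n_best = ["write", "rite"]
homophones = ["right", "write"]

augmented = augment_n_best(n_best, homophones)
top = second_pass(augmented, lambda word: TOY_SCORES.get(word, 0.0))
print(augmented, top)
```

The selection happens without any feedback to the user, which matches the quoted passage; only the way the scores are produced is assumed.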
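With respect to Claim 3, Du et al. in view of Willett et al. teach all the limitations of Claim 1 upon which Claim 3 depends. Du et al. in view of Willett et al. fail to explicitly teach 
wherein the processor when executing the one or more computer-readable instructions is further configured to calculate likelihood matrices relating to replacement characters of the estimated character string that are to replace each character within the portion of the first character string, and identify the second character string based on likelihood values within the likelihood matrices. 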
However, Ittycheriah et al. teach
wherein the processor when executing the one or more computer-readable instructions is further configured to calculate likelihood matrices relating to replacement characters of the estimated character string that are to replace each character within the portion of the first character string (Ittycheriah et al. col. 11 lines 12-31 This procedure is carried out for all leaves in the system and a symmetric matrix such as is shown below is produced identifying all the pair-wise distance between leaves. The distance matrix is subsequently used to find the total distance between two words. An example of such a symmetric  matrix is below...In this example, the first column and row represent leaf number 1. The second column and row represent leaf number 2, and so on. Therefore, the diagonal zeros (0) represent the fact that the leaf distance to itself is zero and the matrix is symmetric because the distance from leaf 2 to leaf 1 is the same as the distance from leaf 1 to the leaf 2), and identify the second character string based on likelihood values within the likelihood matrices (Ittycheriah et al. col. 8 lines 9-21 The step of comparing the decoded word to all other existing vocabulary words (step 106) to identify homophones may be accomplished in many ways. A preferred manner involves calculating respective distance measures or scores between the decoded word and the other existing words in the vocabulary. The distance measure associated with the decoded word and any other word from the vocabulary is preferably generated by respectively comparing leaves from the lefeme sequence of the decoded word with leaves from the lefeme sequence of the other existing word. A measure or score is generated for each pair-wise leaf comparison and the total distance measure for the words is calculated by adding up the pair-wise leaf scores.)
Du et al., Willett et al. and Ittycheriah et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention [...] (Ittycheriah et al. col. 8 lines 9-21 The step of comparing the decoded word to all other existing vocabulary words (step 106) to identify homophones may be accomplished in many ways. A preferred manner involves calculating respective distance measures or scores between the decoded word and the other existing words in the vocabulary. The distance measure associated with the decoded word and any other word from the vocabulary is preferably generated by respectively comparing leaves from the lefeme sequence of the decoded word with leaves from the lefeme sequence of the other existing word. A measure or score is generated for each pair-wise leaf comparison and the total distance measure for the words is calculated by adding up the pair-wise leaf scores.)
	With respect to Claim 4, Du et al.  in view of Willett et al. and Ittycheriah et al. teach 
 	wherein the processor when executing the one or more computer-readable instructions is further configured to: 
 	calculate a likelihood of the estimated character string, based on the likelihood values within the likelihood matrices (Ittycheriah et al. col. 8 lines 9-21 The step of comparing the decoded word to all other existing vocabulary words (step 106) to identify homophones may be accomplished in many ways. A preferred manner involves calculating respective distance measures or scores between the decoded word and the other existing words in the vocabulary. The distance measure associated with the decoded word and any other word from the vocabulary is preferably generated by respectively comparing leaves from the lefeme sequence of the decoded word with leaves from the lefeme sequence of the other existing word. A measure or score is generated for each pair-wise leaf comparison and the total distance measure for the words is calculated by adding up the pair-wise leaf scores); and 
 	select the estimated character string from among a plurality of estimated character strings, based on the likelihood, dictionary information, and a language model (Ittycheriah et al. col. 3 lines 36-45 the results of the homophone identification process may be added to the N-best list generated by the speech recognizer in response to the uttered word. Then, a second decoding pass (e.g., a detailed match, an acoustic re-scoring, a language model re-scoring) is performed using the augmented N-best list in order to yield the result which is considered to be the top hypotheses for the uttered word. In this manner, there is no feedback to the user, rather, the speech recognizer utilizes the results to make the best selection, col. 8 lines 9-21 The step of comparing the decoded word to all other existing vocabulary words (step 106) to identify homophones may be accomplished in many ways. A preferred manner involves calculating respective distance measures or scores between the decoded word and the other existing words in the vocabulary. The distance measure associated with the decoded word and any other word from the vocabulary is preferably generated by respectively comparing leaves from the lefeme sequence of the decoded word with leaves from the lefeme sequence of the other existing word. A measure or score is generated for each pair-wise leaf comparison and the total distance measure for the words is calculated by adding up the pair-wise leaf scores, col. 2 lines 56-67 a method of identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine comprises the steps of: decoding the uttered word to yield a decoded word; computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the word and the at least a portion of other existing words; identifying, as homophones of the uttered word, the other existing words associated with measures which correspond to a threshold range.)
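For illustration only, the distance computation quoted from Ittycheriah et al. (col. 11 lines 12-31, col. 8 lines 9-21, and col. 2 lines 56-67) can be sketched as follows: a symmetric leaf distance matrix with a zero diagonal, a word distance obtained by adding up pair-wise leaf scores, and a threshold test that flags homophones. The matrix values, the leaf sequences, the example words, and the threshold are invented for this sketch. Comparing leaves position by position and skipping words whose sequences differ in length are simplifying assumptions; the quoted passages do not say how sequences are aligned. Leaves are numbered from 0 here, whereas the quoted text numbers them from 1.

```python
# Symmetric pair-wise leaf distance matrix with a zero diagonal (assumed values).
LEAF_DISTANCE = [
    [0.0, 2.0, 5.0],
    [2.0, 0.0, 3.0],
    [5.0, 3.0, 0.0],
]


def word_distance(leaves_a, leaves_b):
    """Total distance between two words: the sum of pair-wise leaf scores.
    Assumes equal-length leaf sequences compared position by position."""
    if len(leaves_a) != len(leaves_b):
        raise ValueError("this sketch only compares equal-length leaf sequences")
    return sum(LEAF_DISTANCE[a][b] for a, b in zip(leaves_a, leaves_b))


def find_homophones(decoded_leaves, vocabulary, threshold):
    """Return vocabulary words whose distance to the decoded word falls
    within the threshold (smaller distance means acoustically closer)."""
    return [
        word
        for word, leaves in vocabulary.items()
        if len(leaves) == len(decoded_leaves)
        and word_distance(decoded_leaves, leaves) <= threshold
    ]


decoded = [0, 1, 2]                                # leaf sequence of the decoded word
others = {"right": [0, 1, 1], "rat": [2, 2, 0]}    # other vocabulary words (assumed)
print(find_homophones(decoded, others, threshold=4.0))
```

With these assumed values the word "right" lies at distance 3.0 and "rat" at distance 13.0, so only "right" falls inside the threshold of 4.0.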
 	With respect to Claim 13, Du et al. in view of Willett et al. teach all the limitations of Claim 12 upon which Claim 13 depends. Du et al. in view of Willett et al. fail to explicitly teach 
 	wherein the identifying comprises: 
 	identifying replacement characters corresponding to each character within the portion of the first character string; and 
 	identifying the estimated character string, based on the replacement characters, 
 	wherein the obtaining of the second character string, based on the plurality of estimated character strings, comprises obtaining, the second character string, by replacing the portion of the first character string with the estimated character string based on the replacement characters, and
 	 the replacement characters are characters having pronunciations similar to each character within the first character string. 
	However, Ittycheriah et al. teach 
 	wherein the identifying comprises: 
 	identifying replacement characters corresponding to each character within the portion of the first character string (Ittycheriah et al. Fig. 3A element 108 Identify homophones from the measures); and 
 	identifying the estimated character string, based on the replacement characters (Ittycheriah et al. col. 3 lines 36-45 the results of the homophone identification process may be added to the N-best list generated by the speech recognizer in response to the uttered word. Then, a second decoding pass (e.g., a detailed match, an acoustic re-scoring, a language model re-scoring) is performed using the augmented N-best list in order to yield the result which is considered to be the top hypotheses for the uttered word. In this manner, there is no feedback to the user, rather, the speech recognizer utilizes the results to make the best selection), 
 	wherein the obtaining of the second character string, based on the plurality of estimated character strings, comprises obtaining, the second character string, by replacing the portion of the first character string with the estimated character string based on the replacement characters (Ittycheriah et al. col. 12 lines 10-17 The second pass, e.g., re-scoring or detailed match, preferably includes increasing the beam-width associated with the Viterbi algorithm thereby increasing the likelihood that all homophones are identified for the uttered word. Alternatively, the identified homophones may be used to filter the N-best list produced by the speech recognizer to clean the list up in the order that the list only includes acoustically similar words. Ittycheriah et al. disclose a method of identifying a list of homophones corresponding to the decoded word from the transcription to replace the decoded word by one of the identified homophones.), and 
 	 the replacement characters are characters having pronunciations similar to each character within the first character string (Ittycheriah et al. col. 12 lines 10-17 The second pass, e.g., re-scoring or detailed match, preferably includes increasing the beam-width associated with the Viterbi algorithm thereby increasing the likelihood that all homophones are identified for the uttered word. Alternatively, the identified homophones may be used to filter the N-best list produced by the speech recognizer to clean the list up in the order that the list only includes acoustically similar words.) 
 	Du et al., Willett et al. and Ittycheriah et al. are analogous art because they are from a similar field of endeavor, namely signal processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the step of replacing words in the transcription at the server, as taught by Du et al., using the teaching of generating the recognized text at the embedded ASR engine and transmitting the recognized text from the embedded ASR engine to the server, as taught by Willett et al., for the benefit of determining whether to request speech for processing from the mobile device and determining a semantic category associated with the recognized text at the server, and further using the teaching of identifying the homophones corresponding to the decoded word, as taught by Ittycheriah et al., for the benefit of correcting the error in the transcription (Ittycheriah et al. col. 12 lines 10-17 The second pass, e.g., re-scoring or detailed match, preferably includes increasing the beam-width associated with the Viterbi algorithm thereby increasing the likelihood that all homophones are identified for the uttered word. Alternatively, the identified homophones may be used to filter the N-best list produced by the speech recognizer to clean the list up in the order that the list only includes acoustically similar words.)
With respect to Claim 14, Du et al. in view of Willett et al. teach all the limitations of Claim 12 upon which Claim 14 depends. Du et al. in view of Willett et al. fail to explicitly teach 
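 	wherein the identifying comprises: 
 	calculating likelihood matrices relating to replacement characters of the estimated character string that are to replace each character within the portion of the first character string; and 
 	identifying the second character string based on likelihood values within the likelihood matrices. 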
However, Ittycheriah et al. teach
 	wherein the identifying comprises: 
 	calculating likelihood matrices relating to replacement characters of the estimated character string that are to replace each character within the portion of the first character string (Ittycheriah et al. col. 11 lines 12-31 This procedure is carried out for all leaves in the system and a symmetric matrix such as is shown below is produced identifying all the pair-wise distance between leaves. The distance matrix is subsequently used to find the total distance between two words. An example of such a symmetric  matrix is below...In this example, the first column and row represent leaf number 1. The second column and row represent leaf number 2, and so on. Therefore, the diagonal zeros (0) represent the fact that the leaf distance to itself is zero and the matrix is symmetric because the distance from leaf 2 to leaf 1 is the same as the distance from leaf 1 to the leaf 2); and 
 	identifying the second character string based on likelihood values within the likelihood matrices (Ittycheriah et al. col. 8 lines 9-21 The step of comparing the decoded word to all other existing vocabulary words (step 106) to identify homophones may be accomplished in many ways. A preferred manner involves calculating respective distance measures or scores between the decoded word and the other existing words in the vocabulary. The distance measure associated with the decoded word and any other word from the vocabulary is preferably generated by respectively comparing leaves from the lefeme sequence of the decoded word with leaves from the lefeme sequence of the other existing word. A measure or score is generated for each pair-wise leaf comparison and the total distance measure for the words is calculated by adding up the pair-wise leaf scores.)
	With respect to Claim 15, Du et al.  in view of Willett et al. and Ittycheriah et al. teach 
  	wherein the obtaining of the second character string comprises: 
 	calculating a likelihood of the estimated character string, based on the likelihood values within the likelihood matrices (Ittycheriah et al. col. 8 lines 9-21 The step of comparing the decoded word to all other existing vocabulary words (step 106) to identify homophones may be accomplished in many ways. A preferred manner involves calculating respective distance measures or scores between the decoded word and the other existing words in the vocabulary. The distance measure associated with the decoded word and any other word from the vocabulary is preferably generated by respectively comparing leaves from the lefeme sequence of the decoded word with leaves from the lefeme sequence of the other existing word. A measure or score is generated for each pair-wise leaf comparison and the total distance measure for the words is calculated by adding up the pair-wise leaf scores); and 
 	selecting the estimated character string from among a plurality of estimated character strings, based on the likelihood, dictionary information, and a language model (Ittycheriah et al. col. 3 lines 36-45 the results of the homophone identification process may be added to the N-best list generated by the speech recognizer in response to the uttered word. Then, a second decoding pass (e.g., a detailed match, an acoustic re-scoring, a language model re-scoring) is performed using the augmented N-best list in order to yield the result which is considered to be the top hypotheses for the uttered word. In this manner, there is no feedback to the user, rather, the speech recognizer utilizes the results to make the best selection, col. 8 lines 9-21 The step of comparing the decoded word to all other existing vocabulary words (step 106) to identify homophones may be accomplished in many ways. A preferred manner involves calculating respective distance measures or scores between the decoded word and the other existing words in the vocabulary. The distance measure associated with the decoded word and any other word from the vocabulary is preferably generated by respectively comparing leaves from the lefeme sequence of the decoded word with leaves from the lefeme sequence of the other existing word. A measure or score is generated for each pair-wise leaf comparison and the total distance measure for the words is calculated by adding up the pair-wise leaf scores, col. 2 lines 56-67 a method of identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine comprises the steps of: decoding the uttered word to yield a decoded word; computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the word and the at least a portion of other existing words; identifying, as homophones of the uttered word, the other existing words associated with measures which correspond to a threshold range.)
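For illustration only, the distance measure quoted above from Ittycheriah et al. (a score for each pair-wise leaf comparison, summed into a total distance, then compared against a threshold range) might be sketched as follows. The function names, the position-by-position pairing of leaves, the single upper threshold, and the 0/1 leaf score used in the usage note are assumptions made for this sketch and are not taken from the reference.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a leaf score maps two leaf labels to a distance value.
LeafScore = Callable[[str, str], float]


def total_distance(leaves_a: Sequence[str],
                   leaves_b: Sequence[str],
                   leaf_score: LeafScore) -> float:
    """Add up the pair-wise leaf scores of two leaf sequences.

    Assumption: both sequences have the same length and are compared
    position by position; the reference may align them differently.
    """
    if len(leaves_a) != len(leaves_b):
        raise ValueError("leaf sequences must have the same length")
    return sum(leaf_score(a, b) for a, b in zip(leaves_a, leaves_b))


def find_homophones(decoded_leaves: Sequence[str],
                    vocabulary: Dict[str, Sequence[str]],
                    leaf_score: LeafScore,
                    threshold: float) -> List[str]:
    """Return vocabulary words whose total distance is within the threshold."""
    matches = []
    for word, leaves in vocabulary.items():
        if len(leaves) != len(decoded_leaves):
            continue  # skipped in this sketch; no alignment is attempted
        if total_distance(decoded_leaves, leaves, leaf_score) <= threshold:
            matches.append(word)
    return matches
```

With a leaf score of 0 for identical leaves and 1 otherwise, words whose leaf sequences match the decoded word exactly fall within any non-negative threshold, while words differing in one leaf are returned only when the threshold is at least 1.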
9.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Du et al. (US 2019/0103109 A1) in view of Willett et al. (US 2018/0197545 A1), Ittycheriah et al. (US 6269335 B1) and Brown et al. (US 6,400,805 B1.)

With respect to Claim 8, Du et al. in view of Willett et al. and Ittycheriah et al. teach all the limitations of Claim 3 upon which Claim 8 depends. Du et al. in view of Willett et al. and Ittycheriah et al. fail to explicitly teach
 	wherein the likelihood matrices obtained for each character of the first character string are calculated based on a pre-determined confusion matrix. 
	However, Brown et al. teach
 	wherein the likelihood matrices obtained for each character of the first character string are calculated based on a pre-determined confusion matrix (Brown et al. col. 9 lines 61-67, col. 10 lines 1-33  Each confusion set is assigned a different character change weighting. Each confusion set may also be assigned a separate character identity weighting or, instead, an overall character weighting applicable to each confusion set may be used. The character change weighting assigned to each confusion set is an average of each of the confusion matrix values that reflect the respective probabilities that one character of the confusion set would be misrecognized as another character of the confusion set.... A character identity weighting is an average of the confusion matrix  probabilities  that each particular character to which this weighting corresponds will be correctly recognized as itself. For instance, in confusion set 1, this character identity weighting would be the average of the probability that an A would be recognized as an A, the probability that a J would be recognized as a J, and the probability that a K would be recognized as a K.)
 	Du et al., Willett et al., Ittycheriah et al. and Brown et al. are analogous art because they are from a similar field of endeavor in the Signal Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of replacing some words in the transcription by the server as taught by Du et al., using teaching of generating the recognized text at the embedded ASR and transmitting the recognized text from the embedded ASR to the server as taught by Willett et al. for the benefit of determining whether to request speech for processing from the mobile device and determining a semantic category associated with the recognized text at the server, using teaching of the matrix as taught by Ittycheriah et al. for the benefit of assigning each confusion set a character change weighting and a character identity weighting (Brown et al. col. 9 lines 61-67, col. 10 lines 1-33 Each confusion set is assigned a different character change weighting. Each confusion set may also be assigned a separate character identity weighting or, instead, an overall character weighting applicable to each confusion set may be used. The character change weighting assigned to each confusion set is an average of each of the confusion matrix values that reflect the respective probabilities that one character of the confusion set would be misrecognized as another character of the confusion set.... A character identity weighting is an average of the confusion matrix probabilities that each particular character to which this weighting corresponds will be correctly recognized as itself. For instance, in confusion set 1, this character identity weighting would be the average of the probability that an A would be recognized as an A, the probability that a J would be recognized as a J, and the probability that a K would be recognized as a K.)
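For illustration only, the two weightings described in the Brown et al. passage quoted above (the average of the within-set misrecognition probabilities, and the average of the probabilities that each character is recognized as itself) might be computed as in the following sketch. The nested-dictionary layout of the confusion matrix and the function names are assumptions made for this sketch; the reference does not specify them.

```python
from itertools import permutations
from typing import Dict, Sequence

# Hypothetical layout: matrix[spoken][recognized] = probability.
ConfusionMatrix = Dict[str, Dict[str, float]]


def change_weighting(matrix: ConfusionMatrix,
                     confusion_set: Sequence[str]) -> float:
    """Average probability that one character of the set is
    misrecognized as another character of the same set."""
    pairs = list(permutations(confusion_set, 2))  # ordered pairs, a != b
    if not pairs:
        raise ValueError("confusion set needs at least two characters")
    return sum(matrix[a][b] for a, b in pairs) / len(pairs)


def identity_weighting(matrix: ConfusionMatrix,
                       confusion_set: Sequence[str]) -> float:
    """Average probability that each character of the set is
    correctly recognized as itself."""
    if not confusion_set:
        raise ValueError("confusion set must not be empty")
    return sum(matrix[c][c] for c in confusion_set) / len(confusion_set)
```

For a confusion set containing A, J and K, the identity weighting averages the three diagonal entries (A as A, J as J, K as K), and the change weighting averages the six off-diagonal entries among those three characters.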
10.	Claims 9, 18 are rejected under 35 U.S.C. 103 as being unpatentable over Du et al. (US 2019/0103109 A1) in view of Willett et al. (US 2018/0197545 A1) and Kristjansson et al. (US 2015/0287406 A1.)
	With respect to Claim 9, Du et al. in view of Willett et al. teach all the limitations of Claim 1 upon which Claim 9 depends. Du et al. in view of Willett et al. fail to explicitly teach
 	wherein the first character string includes characters respectively corresponding to speech signal frames obtained by splitting the speech signal at intervals of a preset time. 
	However, Kristjansson et al. teach 
 	wherein the first character string includes characters respectively corresponding to speech signal frames obtained by splitting the speech signal at intervals of a preset time (Kristjansson et al. Fig. 1 element 120 Speech Recognition Engine, [0046] As described above with reference to FIG. 1, the input speech 115 includes one or more speech signals from a user and usually also includes noise. The methods and systems described herein are used to obtain the clean speech estimate 130 from the input speech 115. In some implementations, obtaining the input speech 115 includes dividing an incoming analog signal into segments of a predetermined time period and sampling the segments. In the current example, the incoming signal is segmented into 20 ms segments and sampled at 8 KHz to produce a discretized version of the incoming signal. A frequency domain transform can then be calculated using the discretized signal. For example, an N point fast Fourier transform (FFT) can be calculated to obtain an observation vector y_obs corresponding to a given segment. In some implementations, the vectors y_obs can be used to represent the input speech 115. Observation vectors for different segments can be denoted as a function of time. For example, in the current example where 20 ms segments are considered, an observation vector can be denoted as y_obs(t), which represents a segment at a (20 ms · t) offset from the beginning of the signal.)
 	Du et al., Willett et al. and Kristjansson et al. are analogous art because they are from a similar field of endeavor in the Signal Processing techniques and applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the steps of replacing some words in the transcription by the server as taught by Du et al., using teaching of generating the recognized text at the embedded ASR and transmitting the recognized text from the embedded ASR to the server as taught by Willett et al. for the benefit of determining whether to request speech for processing from the mobile device and determining a semantic category associated with the recognized text at the server, using teaching of dividing the speech signal into segments of a predetermined time period as taught by Kristjansson et al. for the benefit of producing a discretized version of the speech signal in speech recognition (Kristjansson et al. Fig. 1 element 120 Speech Recognition Engine, [0046] As described above with reference to FIG. 1, the input speech 115 includes one or more speech signals from a user and usually also includes noise. The methods and systems described herein are used to obtain the clean speech estimate 130 from the input speech 115. In some implementations, obtaining the input speech 115 includes dividing an incoming analog signal into segments of a predetermined time period and sampling the segments. In the current example, the incoming signal is segmented into 20 ms segments and sampled at 8 KHz to produce a discretized version of the incoming signal. A frequency domain transform can then be calculated using the discretized signal. For example, an N point fast Fourier transform (FFT) can be calculated to obtain an observation vector y_obs corresponding to a given segment. In some implementations, the vectors y_obs can be used to represent the input speech 115. Observation vectors for different segments can be denoted as a function of time. For example, in the current example where 20 ms segments are considered, an observation vector can be denoted as y_obs(t), which represents a segment at a (20 ms · t) offset from the beginning of the signal.)
	With respect to Claim 18, Du et al. in view of Willett et al. and Ittycheriah et al. teach all the limitations of Claim 12 upon which Claim 18 depends. Du et al. in view of Willett et al. and Ittycheriah et al. fail to explicitly teach
 	wherein the first character string includes characters respectively corresponding to speech signal frames obtained by splitting the speech signal at intervals of a preset time.
 	However, Kristjansson et al. teach 
 	wherein the first character string includes characters respectively corresponding to speech signal frames obtained by splitting the speech signal at intervals of a preset time (Kristjansson et al. Fig. 1 element 120 Speech Recognition Engine, [0046] As described above with reference to FIG. 1, the input speech 115 includes one or more speech signals from a user and usually also includes noise. The methods and systems described herein are used to obtain the clean speech estimate 130 from the input speech 115. In some implementations, obtaining the input speech 115 includes dividing an incoming analog signal into segments of a predetermined time period and sampling the segments. In the current example, the incoming signal is segmented into 20 ms segments and sampled at 8 KHz to produce a discretized version of the incoming signal. A frequency domain transform can then be calculated using the discretized signal. For example, an N point fast Fourier transform (FFT) can be calculated to obtain an observation vector y_obs corresponding to a given segment. In some implementations, the vectors y_obs can be used to represent the input speech 115. Observation vectors for different segments can be denoted as a function of time. For example, in the current example where 20 ms segments are considered, an observation vector can be denoted as y_obs(t), which represents a segment at a (20 ms · t) offset from the beginning of the signal.)
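For illustration only, the segmentation described in the Kristjansson et al. passage cited above (20 ms segments of a signal sampled at 8 kHz, each transformed to the frequency domain to give an observation vector) might be sketched as follows. The function names are assumptions made for this sketch, trailing samples that do not fill a whole segment are simply dropped, and a direct discrete Fourier transform is used in place of an FFT for brevity; it yields the same values but more slowly.

```python
import cmath
import math
from typing import List, Sequence

SAMPLE_RATE_HZ = 8000  # 8 kHz sampling, as in the cited example
FRAME_MS = 20          # 20 ms segments, as in the cited example


def split_into_frames(samples: Sequence[float],
                      sample_rate_hz: int = SAMPLE_RATE_HZ,
                      frame_ms: int = FRAME_MS) -> List[Sequence[float]]:
    """Split a sampled signal into consecutive fixed-length frames.

    At 8 kHz and 20 ms each frame holds 160 samples. Samples left over
    after the last full frame are dropped in this sketch.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len]
            for i in range(n_frames)]


def dft(frame: Sequence[float]) -> List[complex]:
    """Direct N-point discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n)]


def observation_vectors(samples: Sequence[float]) -> List[List[complex]]:
    """One frequency-domain vector per frame; entry t of the result
    corresponds to the segment starting at an offset of 20 ms * t."""
    return [dft(frame) for frame in split_into_frames(samples)]


# Example input: one second of a 400 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 400 * n / SAMPLE_RATE_HZ)
        for n in range(SAMPLE_RATE_HZ)]
```

Calling split_into_frames(tone) on the one-second example yields 50 frames of 160 samples each, so the frame index t in observation_vectors corresponds directly to the time offset described in the cited paragraph.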
Allowable Subject Matter
11.	Claims 5-7, 16, 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. 

Claims 11, 20 are allowed. 
The prior art(s) taken alone or in combination fail(s) to teach the following element(s) in combination with the other recited elements in the claim(s). 
	“determine whether to replace a portion of the first character string with another character string; 
 	control the communication interface to transmit the first character string to the server, based on the determination; and” as recited in Claim 11. 
	“determining whether to replace a portion of the first character string with another character string; 
 	transmitting the first character string to a server, based on the determination; and” as recited in Claim 20. 
	The closest prior art found is as follows:
a.	Willett et al. (US 2018/0197545 A1.) In this reference, Willett et al. disclose a method/system for generating the recognized text at the embedded ASR and transmitting the recognized text from the embedded ASR to the server, for the benefit of determining whether to request speech for processing from the mobile device and determining a semantic category associated with the recognized text at the server (Willett et al. [0047] The process of FIG. 4 then proceeds to act 416, where a controller implemented on the server determines, based on the recognized text received from the mobile device, whether to request speech for processing from the mobile device. In some embodiments, at least a portion of the recognized text may be processed by a server NLU engine configured to determine a semantic category associated with the recognized text.) Willett et al. do not teach determining whether to replace a portion of the original transcription with another word(s)/phrase(s), and do not teach transmitting the original transcription to a server based on the determination. Willett et al. fail to teach and/or suggest the allowable subject matter noted above. 
b.	Du et al. (US 2019/0103109 A1.) In this reference, the server receives the voice information from the smart phone and recognizes the voice information to obtain a first recognized text (Du et al. Further referring to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for recognizing voice based on an embodiment. In the application scenario of FIG. 3, first, the user sends voice information “call Wei Jing” through a smart phone, the smart phone may send the voice information to a server, and the server receives the voice information and acquires a user identifier of the user. Next, the server may recognize the voice information to obtain a first recognized text “call monosodium glutamate”. Then, the server may determine a word information set stored in association with the user identifier of the user based on the user identifier of the user, the word information set stores a historical character string weijing, an input result Wei Jing corresponding to weijing, and a candidate result monosodium glutamate corresponding to weijing. Finally, the server replaces the “monosodium glutamate” in the first recognized text with “Wei Jing” based on the word information in the determined word information set to obtain a second recognized text “call Wei Jing” and sends the second recognized text to the smart phone used by the user for display by the smart phone, as shown in FIG. 3.) Du et al. do not teach determining whether to replace a portion of the original transcription with another word(s)/phrase(s), and do not teach transmitting the original transcription to a server based on the determination. Du et al. fail to teach and/or suggest the allowable subject matter noted above. 
c.	Deisher et al. (US 2016/0379626 A1). In this reference, Deisher et al. receive a word/phrase from the server and substitute the word/phrase into the local text string to generate different candidate text strings. This means that the local phone replaces a portion of the first character string with another character string to generate the second character string at the local phone (Deisher et al. [0044] If the acoustic score for the new text string is low, then the results are rejected. In some embodiments, the process ends as shown. In other embodiments, the local phone recognition lattice is used to resolve pronunciation ambiguities. Different words from the remote text string may be substituted into the local text string to generate different candidate text strings. These possibilities for the local text string are scored and the one with the highest score is selected. If the acoustic confidence is still too low, then the process ends. The local ASR is not changed, [0051] At 512 the text string hypothesis with the highest score is selected. Using the local phone lattice, each hypothesis of the modified local phone lattice will be tested against the actual utterance by scoring and the hypothesis with the highest score will be selected. In one embodiment, the cloud ASRs are only used to substitute low scoring words in the local ASR result with words from the cloud ASRs. In another embodiment, only the cloud ASRs are used to form the hypotheses. A phone lattice allows many different hypotheses to be tested through the single lattice structure.)
Conclusion
12.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See PTO-892.
a.	Bai et al. (US 2020/0357389.) In this reference, Bai et al. disclose a method for correcting the misrecognized text at the server. 
b.	Raghumathan et al. (US 2020/0027445 A1). In this reference, Raghumathan et al. teach correcting the mistranscription at the Transcription Correction system. 
c. 	Bliss et al. (US 2007/0276651 A1.) In this reference, Bliss et al. disclose a method of sending a portion of the dictation containing the unrecognized words to the server. 
13. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to THUYKHANH LE whose telephone number is (571)272-6429. The examiner can normally be reached Mon-Fri: 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C. Flanders, can be reached on 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about 





/THUYKHANH LE/Primary Examiner, Art Unit 2655