DETAILED ACTION
This communication is in response to the RCE with Amendment and Arguments filed on 04/04/2022. Claims 1-19 are pending and have been examined. 
All Objections/Rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Change of Examiner
The Examiner of record for this application has changed from Anup Chandora to Paras Shah. 

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 04/04/2022 has been entered.
 

Response to Amendments and Arguments
The Applicant has amended independent Claims 1, 8, and 15. More specifically, the limitation newly added to Claims 1, 8, and 15, “inputting the user speech to an end-to-end speech recognition model… wherein the end-to-end speech recognition model combines, into a single neural network, an acoustic model for extracting an acoustic feature and predicting a phoneme sequence, a pronunciation model for mapping the phoneme sequence with a word sequence, and a language model for designating probability to the word sequence,” raises new grounds for rejection. Because the Applicant’s arguments are directed to this amendment, they are moot in view of the new grounds for rejection, and a new reference has been applied.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 7-9, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Hyun et al. (European Publication No. EP 3113176 A1), hereinafter Hyun, in view of Lee (US 2018/0190268) in view of Prabhavalkar (US 2020/0027444, provisional 62/701237 filed on 07/20/2018).
Regarding Claim 1, Hyun discloses:
An electronic device (Hyun, Paragraph 12: electronic device) comprising:
a microphone (Hyun, Paragraph 81: speech receiver 510 receives a user's audio signal input via microphone of electronic device 500);
a memory including at least one instruction (Hyun, Paragraph 97: memories storing instructions); and
at least one processor connected to the microphone and the memory to control the electronic device (Hyun, Paragraph 97: processor performs operations of Figures 1-6 and are implemented by hardware components, e.g. controllers, sensors [as in microphones], and memories), wherein the at least one processor is configured to:
based on a user speech being input through the microphone, identify a grapheme sequence corresponding to the input user speech (Hyun, Paragraph 93 and 86: speech recognizer 520 inputs an audio signal to an acoustic model and outputs the final recognition result, the candidate target sequence, in a text format [as grapheme sequence]),
obtain information regarding an edit distance between the identified grapheme sequence and each of a plurality of commands included in a command dictionary, the command dictionary stored in the memory, (Hyun, Paragraph 59: candidate set extractor 230 calculates similarities [as obtain information regarding an edit distance] between phoneme sequences and target sequences using a similarity calculation algorithm including an edit distance algorithm, and based on the similarities [as edit distance], extracts a specific number of phoneme sequences (e.g., the top 20 sequences) as candidate target sequences in order of similarity; Hyun, Figure 6, Paragraph 91: recognition result based on the pre-stored acoustic model and a predefined recognition target list [as command dictionary])
obtain a command sequence (Hyun, Paragraph 50: candidate target sequence [as command sequence] is the recognition result of an input audio signal) corresponding to the identified grapheme sequence based on the information regarding the edit distance between the identified grapheme sequence and each of a plurality of commands, the command sequence including at least one of the plurality of commands, (Hyun, Paragraph 59:  candidate set extractor 230 calculates similarities between phoneme sequences and target sequences using a similarity calculation algorithm including an edit distance algorithm, and based on the similarities, extracts a specific number of phoneme sequences (e.g., the top 20 sequences) as candidate target sequences in order of similarity)
map the obtained command sequence to one of a plurality of control commands to control an operation of the electronic device (Hyun, Paragraph 76: the speech recognition apparatus calculates similarities between one or more acquired phoneme sequences and each candidate target sequence in a candidate set using a similarity calculation algorithm including an edit distance algorithm, and returns a candidate target sequence having the highest similarity as a recognition result [as map the obtained command sequence to one of the plurality of control commands]; Paragraph 49: candidate set extractor 120 extracts target sequences [as control commands] from recognition target list 140 according to devices to be operated by a user to generate a candidate set), and
control an operation of the electronic device based on the mapped control command (Hyun, Paragraph 67: target sequences [as based on the mapped control command] includes commands for controlling a TV e.g. power on/off command [as control an operation of the electronic device]; Hyun, Paragraph 48: recognition target list 140 includes various commands to operate the TV.)
However, Hyun does not specifically teach inputting the user speech to an end-to-end speech recognition model, wherein the end-to-end speech recognition model combines, into a single neural network, an acoustic model for extracting an acoustic feature and predicting a phoneme sequence, a pronunciation model for mapping the phoneme sequence with a word sequence, and a language model for designating probability to the word sequence.
Lee teaches inputting the user speech to an end-to-end speech recognition model (see [0086], where in a neural network the acoustic and language model are both included), wherein the end-to-end speech recognition model combines, into a single neural network, an acoustic model for extracting an acoustic feature and predicting a phoneme sequence,[[ a pronunciation model for mapping the phoneme sequence with a word sequence]], and a language model for designating probability to the word sequence (see [0086], where language model and acoustic model implemented in a same neural network, [0067], where acoustic model used to recognize phoneme unit, and language model used to determine probability between words).
Hyun and Lee are in the same field of endeavor of speech recognition, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have substituted the speech recognition as taught by Hyun with the end-to-end speech recognition as taught by Lee in order to yield a predictable result of identifying the text representative of the spoken input (see KSR v. Teleflex).
However, Hyun in view of Lee does not specifically teach the end-to-end speech recognition model comprising a pronunciation model.
Prabhavalkar does teach an end-to-end speech recognition model wherein the end-to-end speech recognition model combines, into a single neural network, an acoustic model… a pronunciation model for mapping the phoneme sequence with a word sequence, and a language model (see [0005], where end to end speech recognition integrates acoustic, pronunciation and language models into a single neural network and see provisional app support in para [0004]).
Hyun, Lee, and Prabhavalkar are in the same field of endeavor of speech recognition, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have substituted the speech recognition as taught by Hyun and Lee with the end-to-end speech recognition as taught by Prabhavalkar in order to enhance speech recognition accuracy (see Prabhavalkar [0004], and provisional, [0003]).
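For illustration only, the edit-distance-based candidate extraction and highest-similarity selection that this rejection attributes to Hyun (Paragraphs 59 and 76) can be sketched as follows. This is a minimal sketch and is not code from Hyun or from the application: the function names, the example command list, and the use of character-level Levenshtein distance as the similarity measure are all assumptions made for the example.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_candidates(recognized: str, targets: List[str], top_n: int = 20) -> List[str]:
    """Return up to top_n targets, most similar (smallest distance) first."""
    return sorted(targets, key=lambda t: edit_distance(recognized, t))[:top_n]


def best_match(recognized: str, targets: List[str]) -> str:
    """Return the single most similar target as the recognition result."""
    return rank_candidates(recognized, targets, top_n=1)[0]


# Hypothetical recognition target list for a TV (not taken from any reference).
COMMANDS = ["power on", "power off", "volume up", "volume down"]

print(best_match("powr off", COMMANDS))  # -> power off
```

Here the candidate set is ordered by similarity and the closest entry is returned, which mirrors the "top 20 sequences" extraction and the "highest similarity" selection described in the cited paragraphs only at a conceptual level.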

Regarding Claim 2, Hyun in view of Lee in view of Prabhavalkar teaches all of the limitations as in claim 1, above.
Furthermore, Hyun discloses:
The electronic device of claim 1, wherein the memory comprises software (Hyun, Paragraph 97: speech recognition apparatus 200 is implemented by hardware components, e.g. memories storing software) in which an end-to-end speech recognition model is implemented (Hyun, Paragraph 83: acoustic model with network based on a Recurrent Neural Network (RNN) or a Deep Neural Network (DNN) [as end-to-end speech recognition models]), and
wherein the at least one processor is further configured to:
execute software (Hyun, Paragraph 97: processor includes memories storing instructions) in which the end-to-end speech recognition model is implemented (Hyun, Paragraph 83: acoustic model with network based on a Recurrent Neural Network (RNN) or a Deep Neural Network (DNN) [as end-to-end speech recognition models]), and
identify the grapheme sequence by inputting, to the end-to-end speech recognition model, a user speech that is input through the microphone (Hyun, Paragraph 83 and 86: speech recognizer 520 inputs an audio signal to an acoustic model and outputs the final recognition result, the candidate target sequence, in a text format [as grapheme sequence]; Hyun, Paragraph 83: acoustic model with network based on a Recurrent Neural Network (RNN) or a Deep Neural Network (DNN) [as end-to-end speech recognition models]). 

Regarding Claim 7, Hyun in view of Lee in view of Prabhavalkar teaches all of the limitations as in claim 1, above.
Furthermore, Hyun discloses:
The electronic device of claim 1, wherein the plurality of commands is related to a type of the electronic device and a function included in the electronic device (Hyun, Paragraph 84: recognition target list is predefined according to the types and purposes of the electronic device 500; Hyun, Paragraph 67: target sequences includes commands for controlling a TV [as type of electronic device] e.g. power on/off command [as function]). 

Regarding Claim 8, Hyun discloses:
A controlling method of an electronic device (Hyun, Paragraph 98: speech recognition methods in Figures 3, 4, and 6 to control electronic device), the method comprising:
based on a user speech being input through a microphone (Hyun, Paragraph 81: speech receiver 510 receives a user's audio signal input via microphone of electronic device 500), identifying a grapheme sequence corresponding to the input user speech (Hyun, Paragraph 83 and 86: speech recognizer 520 inputs an audio signal to an acoustic model and outputs the final recognition result, the candidate target sequence, in a text format [as grapheme sequence]);
obtaining information regarding an edit distance between the identified grapheme sequence and each of a plurality of commands included in a command dictionary, the command dictionary stored in a memory; (Hyun, Paragraph 59: candidate set extractor 230 calculates similarities [as obtain information regarding an edit distance] between phoneme sequences and target sequences using a similarity calculation algorithm including an edit distance algorithm, and based on the similarities [as edit distance], extracts a specific number of phoneme sequences (e.g., the top 20 sequences) as candidate target sequences in order of similarity; Hyun, Figure 6, Paragraph 91: recognition result based on the pre-stored acoustic model and a predefined recognition target list [as command dictionary])
obtaining a command sequence (Hyun, Paragraph 50: candidate target sequence [as command sequence] is the recognition result of an input audio signal) corresponding to the identified grapheme sequence based on the information regarding the edit distance between the identified grapheme sequence and each of the plurality of commands (Hyun, Paragraph 59: candidate set extractor 230 calculates similarities between phoneme sequences and target sequences using a similarity calculation algorithm including an edit distance algorithm, and based on the similarities, extracts a specific number of phoneme sequences (e.g., the top 20 sequences) as candidate target sequences in order of similarity) that are included in a command dictionary that is stored in the memory and are related to control of the electronic device and the identified grapheme sequence, the command sequence including at least one of the plurality of commands; (Hyun, Figure 6, Paragraph 91: recognition result based on the pre-stored acoustic model and a predefined recognition target list [as command dictionary]);
mapping the obtained command sequence to one of a plurality of control commands to control an operation of the electronic device (Hyun, Paragraph 76: the speech recognition apparatus calculates similarities between one or more acquired phoneme sequences and each candidate target sequence in a candidate set using a similarity calculation algorithm including an edit distance algorithm, and returns a candidate target sequence having the highest similarity as a recognition result [as map the obtained command sequence to one of the plurality of control commands]; Paragraph 49: candidate set extractor 120 extracts target sequences [as control commands] from recognition target list 140 according to devices to be operated by a user to generate a candidate set), and
controlling an operation of the electronic device based on the mapped control command (Hyun, Paragraph 67: target sequences [as based on the mapped control command] includes commands for controlling a TV e.g. power on/off command [as control an operation of the electronic device]; Hyun, Paragraph 48: recognition target list 140 includes various commands to operate the TV.) 
However, Hyun does not specifically teach inputting the user speech to an end-to-end speech recognition model, wherein the end-to-end speech recognition model combines, into a single neural network, an acoustic model for extracting an acoustic feature and predicting a phoneme sequence, a pronunciation model for mapping the phoneme sequence with a word sequence, and a language model for designating probability to the word sequence.
Lee teaches inputting the user speech to an end-to-end speech recognition model (see [0086], where in a neural network the acoustic and language model are both included), wherein the end-to-end speech recognition model combines, into a single neural network, an acoustic model for extracting an acoustic feature and predicting a phoneme sequence,[[ a pronunciation model for mapping the phoneme sequence with a word sequence]], and a language model for designating probability to the word sequence (see [0086], where language model and acoustic model implemented in a same neural network, [0067], where acoustic model used to recognize phoneme unit, and language model used to determine probability between words).
Hyun and Lee are in the same field of endeavor of speech recognition, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have substituted the speech recognition as taught by Hyun with the end-to-end speech recognition as taught by Lee in order to yield a predictable result of identifying the text representative of the spoken input (see KSR v. Teleflex).
However, Hyun in view of Lee does not specifically teach the end-to-end speech recognition model comprising a pronunciation model.
Prabhavalkar does teach an end-to-end speech recognition model wherein the end-to-end speech recognition model combines, into a single neural network, an acoustic model… a pronunciation model for mapping the phoneme sequence with a word sequence, and a language model (see [0005], where end to end speech recognition integrates acoustic, pronunciation and language models into a single neural network and see provisional app support in para [0004]).
Hyun, Lee, and Prabhavalkar are in the same field of endeavor of speech recognition, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have substituted the speech recognition as taught by Hyun and Lee with the end-to-end speech recognition as taught by Prabhavalkar in order to enhance speech recognition accuracy (see Prabhavalkar [0004], and provisional, [0003]).

Regarding Claim 9, Hyun in view of Lee in view of Prabhavalkar teaches all of the limitations as in claim 8, above.
Furthermore, Hyun discloses:
The controlling method of claim 8, wherein the identifying of the grapheme sequence comprises inputting, to an end-to-end speech recognition model, a user speech that is input through the microphone (Hyun, Paragraph 83 and 86: speech recognizer 520 inputs an audio signal to an acoustic model and outputs the final recognition result, the candidate target sequence, in a text format [as grapheme sequence]; Hyun, Paragraph 83: acoustic model with network based on a Recurrent Neural Network (RNN) or a Deep Neural Network (DNN) [as end-to-end speech recognition models]).

Regarding Claim 14, Hyun in view of Lee in view of Prabhavalkar teaches all of the limitations as in claim 8, above.
Furthermore, Hyun discloses:
The controlling method of claim 8, wherein the plurality of commands is related to a type of the electronic device and a function included in the electronic device (Hyun, Paragraph 84: recognition target list is predefined according to the types and purposes of the electronic device 500; Hyun, Paragraph 67: target sequences includes commands for controlling a TV [as type of electronic device] e.g. power on/off command [as function]).

Regarding Claim 15, Hyun discloses:
A non-transitory computer readable recordable medium (see non-transitory computer-readable storage media, as shown in Paragraph 100) including a program for executing a controlling method of an electronic device (Hyun, Paragraph 98: software for performing speech recognition methods in Figures 3, 4, and 6 to control electronic device), wherein the controlling method of the electronic device comprises:
based on a user speech being input through a microphone (Hyun, Paragraph 81: speech receiver 510 receives a user's audio signal input via microphone of electronic device 500), identifying a grapheme sequence corresponding to the input user speech (Hyun, Paragraph 83 and 86: speech recognizer 520 inputs an audio signal to an acoustic model and outputs the final recognition result, the candidate target sequence, in a text format [as grapheme sequence]);
obtaining information regarding an edit distance between the identified grapheme sequence and each of a plurality of commands included in a command dictionary, the command dictionary stored in a memory; (Hyun, Paragraph 59: candidate set extractor 230 calculates similarities [as obtain information regarding an edit distance] between phoneme sequences and target sequences using a similarity calculation algorithm including an edit distance algorithm, and based on the similarities [as edit distance], extracts a specific number of phoneme sequences (e.g., the top 20 sequences) as candidate target sequences in order of similarity; Hyun, Figure 6, Paragraph 91: recognition result based on the pre-stored acoustic model and a predefined recognition target list [as command dictionary])
obtaining a command sequence (Hyun, Paragraph 50: candidate target sequence [as command sequence] is the recognition result of an input audio signal) corresponding to the identified grapheme sequence based on the information regarding the edit distance between the identified grapheme sequence and each of the plurality of commands (Hyun, Paragraph 59: candidate set extractor 230 calculates similarities between phoneme sequences and target sequences using a similarity calculation algorithm including an edit distance algorithm, and based on the similarities, extracts a specific number of phoneme sequences (e.g., the top 20 sequences) as candidate target sequences in order of similarity) that are included in a command dictionary that is stored in the memory and are related to control of the electronic device and the identified grapheme sequence, the command sequence including at least one of the plurality of commands; (Hyun, Figure 6, Paragraph 91: recognition result based on the pre-stored acoustic model and a predefined recognition target list [as command dictionary]); 
mapping the obtained command sequence to one of a plurality of control commands to control an operation of the electronic device (Hyun, Paragraph 76: the speech recognition apparatus calculates similarities between one or more acquired phoneme sequences and each candidate target sequence in a candidate set using a similarity calculation algorithm including an edit distance algorithm, and returns a candidate target sequence having the highest similarity as a recognition result [as map the obtained command sequence to one of the plurality of control commands]; Paragraph 49: candidate set extractor 120 extracts target sequences [as control commands] from recognition target list 140 according to devices to be operated by a user to generate a candidate set), and
controlling an operation of the electronic device based on the mapped control command (Hyun, Paragraph 67: target sequences [as based on the mapped control command] includes commands for controlling a TV e.g. power on/off command [as control an operation of the electronic device]; Hyun, Paragraph 48: recognition target list 140 includes various commands to operate the TV.) 
However, Hyun does not specifically teach inputting the user speech to an end-to-end speech recognition model, wherein the end-to-end speech recognition model combines, into a single neural network, an acoustic model for extracting an acoustic feature and predicting a phoneme sequence, a pronunciation model for mapping the phoneme sequence with a word sequence, and a language model for designating probability to the word sequence.
Lee teaches inputting the user speech to an end-to-end speech recognition model (see [0086], where in a neural network the acoustic and language model are both included), wherein the end-to-end speech recognition model combines, into a single neural network, an acoustic model for extracting an acoustic feature and predicting a phoneme sequence,[[ a pronunciation model for mapping the phoneme sequence with a word sequence]], and a language model for designating probability to the word sequence (see [0086], where language model and acoustic model implemented in a same neural network, [0067], where acoustic model used to recognize phoneme unit, and language model used to determine probability between words).
Hyun and Lee are in the same field of endeavor of speech recognition, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have substituted the speech recognition as taught by Hyun with the end-to-end speech recognition as taught by Lee in order to yield a predictable result of identifying the text representative of the spoken input (see KSR v. Teleflex).
However, Hyun in view of Lee does not specifically teach the end-to-end speech recognition model comprising a pronunciation model.
Prabhavalkar does teach an end-to-end speech recognition model wherein the end-to-end speech recognition model combines, into a single neural network, an acoustic model… a pronunciation model for mapping the phoneme sequence with a word sequence, and a language model (see [0005], where end to end speech recognition integrates acoustic, pronunciation and language models into a single neural network and see provisional app support in para [0004]).
Hyun, Lee, and Prabhavalkar are in the same field of endeavor of speech recognition, and therefore are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have substituted the speech recognition as taught by Hyun and Lee with the end-to-end speech recognition as taught by Prabhavalkar in order to enhance speech recognition accuracy (see Prabhavalkar [0004], and provisional, [0003]).


Claims 3-4 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Hyun in view of Lee in view of Prabhavalkar, as applied in claims 2 and 9 above, and further in view of Georges et al. (US PGPub No. US 20190027133 A1), hereinafter Georges.
Regarding Claims 3-4 and 10-11, Hyun in view of Lee in view of Prabhavalkar discloses all of the limitations noted above in Claims 2-3 and 9-10.
Furthermore, Hyun discloses an electronic device:
wherein the memory further comprises software in which an artificial neural network model is implemented (Hyun, Paragraph 83: speech recognizer 520 inputs an audio signal to an acoustic model with a network based on a Recurrent Neural Network (RNN) or a Deep Neural Network (DNN) [as artificial neural network models] and is trained using a Connectionist Temporal Classification (CTC) learning algorithm), and
wherein the at least one processor is further configured to: execute software (Hyun, Paragraph 97: processor includes memories storing instructions) in which the artificial neural network is implemented (Hyun, Paragraph 83: speech recognizer 520 inputs an audio signal to an acoustic model with a network based on a Recurrent Neural Network (RNN) or a Deep Neural Network (DNN) [as artificial neural network models] and is trained using a Connectionist Temporal Classification (CTC) learning algorithm), and
where at least one of the end-to-end speech recognition model or the artificial neural network model comprises a recurrent neural network (RNN) (Hyun, Paragraph 83: speech recognizer 520 inputs an audio signal to an acoustic model with a network based on a Recurrent Neural Network (RNN) or a Deep Neural Network (DNN) [as artificial neural network models] and is trained using a Connectionist Temporal Classification (CTC) learning algorithm). 
However, Hyun in view of Lee in view of Prabhavalkar does not disclose an electronic device configured to:
input the obtained command sequence to the artificial neural network model and map to at least one of the plurality of control commands.
Georges does teach an electronic device configured to: 
input the obtained command sequence to the artificial neural network model and map to at least one of the plurality of control commands  (Georges, Paragraph 17 and 18: word sequences [as inputted command sequence] generated from the automatic speech recognizer (ASR) 208 are sent to the natural language understander (NLU) 210, which contains a trained neural network; then, the NLU generates normalized commands with an intent e.g. desired course of action [as mapped control command]; Georges, Paragraph 29: NLU includes a classifier 312 that may be a deep neural network (DNN) or a recurrent neural network (RNN)).
Hyun, Lee, Prabhavalkar, and Georges are all considered to be analogous to the claimed invention because they are in the same field of speech recognition using artificial neural networks and involving an execution procedure of a spoken command. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Hyun, directed to an acoustic model with a network based on a Recurrent Neural Network (RNN) or a Deep Neural Network (DNN), and Georges, directed to a natural language understander with a neural network classifier that inputs word sequences and outputs normalized commands, and arrived at an artificial neural network model that inputs a command sequence and outputs a control command.  One of ordinary skill in the art would have been motivated to make such a combination, as taught by Georges, because speech language understanding systems e.g. NLUs can be used to control applications using spoken language (Georges, Paragraph 13-14).

Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Hyun in view of Lee in view of Prabhavalkar, as applied in claims 3 and 10 above, and further in view of Choi et al. (US PGPub No. US 20180144749 A1), hereinafter Choi.
Regarding Claims 5 and 12, Hyun in view of Lee in view of Prabhavalkar discloses all of the limitations noted above in Claims 3 and 10, with the exception that Hyun in view of Lee in view of Prabhavalkar does not disclose an electronic device:
wherein the at least one processor is further configured to jointly train an entire pipeline of the end-to-end speech recognition model and the artificial neural network model; and
jointly training an entire pipeline of the end-to-end speech recognition model and the artificial neural network model.
Choi does teach an electronic device:
wherein the at least one processor is further configured to jointly train an entire pipeline of the end-to-end speech recognition model and the artificial neural network model (Choi, Paragraph 15: neural network configured according to having been trained in a learning process using training data, where the learning process includes simultaneously training the acoustic model, the language model, and the unified model).
Hyun in view of Lee in view of Prabhavalkar and Choi are both considered to be analogous to the claimed invention because they are in the same field of speech recognition using artificial neural networks, feature extraction, and selection of recognition units i.e. phonemes.  Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Hyun (directed to electronic device and controlling method that includes an acoustic model with a network based on a Recurrent Neural Network (RNN) or a Deep Neural Network (DNN)) and Choi (directed to a neural network configured according to having been trained in a learning process using training data, where the learning process includes simultaneously training the acoustic model, the language model, and the unified model) and arrived at an electronic device and controlling method that jointly trains an entire pipeline of the end-to-end speech recognition model and the artificial neural network model.  One of ordinary skill in the art would have been motivated to make such a combination because simultaneous training allows acoustic and language models to recognize speech together and therefore, be updated together, rather than be trained and updated independently based on respective forced alignment information (Choi, Paragraph 80).

Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Hyun in view of Lee in view of Prabhavalkar, as applied in claims 1 and 8 above, and further in view of Georges (previously introduced).
Regarding Claims 6 and 13, Hyun in view of Lee in view of Prabhavalkar discloses all of the limitations noted above in Claims 1 and 8. Furthermore, Hyun discloses an electronic device:
wherein the obtaining of the command sequence comprises obtaining, from the identified grapheme sequence, a command sequence that is within a predetermined edit distance with the identified grapheme sequence among the plurality of commands (Hyun, Paragraph 59: candidate set extractor 230 calculates similarities between phoneme sequences and target sequences using a similarity calculation algorithm including an edit distance algorithm, and based on the similarities [as edit distance], extracts a specific number of phoneme sequences (e.g., the top 20 sequences) as candidate target sequences in order of similarity).
However, Hyun in view of Lee in view of Prabhavalkar does not disclose an electronic device: 
wherein the edit distance is a minimum number of removal, insertion, and substitution of a letter that are required to convert the identified grapheme sequence to each of the plurality of commands, and
obtaining, from the identified grapheme sequence, a command sequence that is within a predetermined edit distance with the identified grapheme sequence among the plurality of commands.
Georges does teach an electronic device: 
wherein the edit distance is a minimum number of removal, insertion, and substitution of a letter that are required to convert the identified grapheme sequence to each of the plurality of commands (Georges, Paragraph 48: Levenshtein distance [as edit distance] is the minimum number of single character edits, e.g., insertions, deletions or substitutions),
a predetermined edit distance (Georges, Paragraph 48: matching algorithm using Levenshtein distance with a previously agreed cut-off value [as predetermined edit distance]), and
obtaining, from the identified grapheme sequence, a command sequence that is within a predetermined edit distance with the identified grapheme sequence among the plurality of commands (Georges, Figure 5, Paragraph 48-49: type casting 520 is performed on tagged word sequence hypotheses [as command sequence] by inputting semantically grouped words and computing a canonical representation and final property [as command]; type casting 520 accomplishes this classifying step by performing a matching algorithm using Levenshtein distance with a previously agreed cut-off value).
Hyun in view of Lee in view of Prabhavalkar and Georges are both considered to be analogous to the claimed invention because they are in the same field of speech recognition using artificial neural networks and involving an execution procedure of a spoken command. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Hyun (directed to a similarity algorithm utilizing edit distance) and Georges (directed to a type casting method of inputting semantically grouped words and computing a canonical representation and final property by performing a matching algorithm utilizing Levenshtein distance with a cut-off value, wherein Levenshtein distance is a minimum number of single character edits, e.g., insertions, deletions or substitutions) and arrived at a method of obtaining a command sequence that is within a predetermined cut-off value, wherein edit distance is a minimum number of single character edits, e.g., insertions, deletions or substitutions.  One of ordinary skill in the art would have been motivated to make such a combination because Levenshtein distance is known to be used in matching (or similarity) algorithms and in sequence replacement to reduce errors (Georges, Paragraph 35).
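For illustration only, the edit-distance matching discussed in the rejection of Claims 6 and 13 can be sketched as follows. This is a minimal sketch of the standard Levenshtein computation with a cut-off and is not taken from Hyun, Georges, or the application; the function names, the example command strings, and the cut-off value are hypothetical.

```python
from typing import List, Optional


def edit_distance(source: str, target: str) -> int:
    """Minimum number of single-letter removals, insertions, and
    substitutions needed to convert source into target (Levenshtein)."""
    # previous[j] holds the distance between the processed prefix of
    # source and the first j letters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # remove a letter from source
                current[j - 1] + 1,      # insert a letter into source
                previous[j - 1] + cost,  # substitute (or keep) a letter
            ))
        previous = current
    return previous[-1]


def closest_command(grapheme_sequence: str,
                    commands: List[str],
                    max_distance: int) -> Optional[str]:
    """Return the command nearest to grapheme_sequence, provided it lies
    within max_distance (the assumed predetermined cut-off); else None."""
    best = None
    best_distance = max_distance + 1
    for command in commands:
        distance = edit_distance(grapheme_sequence, command)
        if distance < best_distance:
            best, best_distance = command, distance
    return best
```

Under these assumptions, a recognized sequence such as "volum up" would be matched to "volume up" with a cut-off of 2, while a sequence far from every command would return no match.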

Claim(s) 16 is rejected under 35 U.S.C. 103 as being unpatentable over Hyun in view of Lee in view of Prabhavalkar, as applied in claim 1 above, and further in view of Bowman et al. (US PGPub No. US 2012/0192096 A1), hereinafter Bowman.
Regarding Claim 16, Hyun in view of Lee in view of Prabhavalkar discloses all of Claim 1 limitations above. Furthermore, Hyun discloses the electronic device, wherein the at least one processor is further configured to:
determine whether the obtained command sequence is included in the command dictionary (Hyun, Paragraph 59: candidate set extractor 230 calculates similarities between phoneme sequences and target sequences using a similarity calculation algorithm including an edit distance algorithm, and based on the similarities, extracts a specific number of phoneme sequences (e.g., the top 20 sequences) as candidate target sequences [as determined that obtained command sequence] in order of similarity);
in response to the obtained command sequence being included in the command dictionary: (Hyun, Figure 6, Paragraph 91: recognition result based on the pre-stored acoustic model and a predefined recognition target list [as obtained command sequence being included in the command dictionary]) 
map the obtained command sequence to one of the plurality of control commands to control an operation of the electronic device, and control an operation of the electronic device based on the mapped control command (Hyun, Paragraph 76: the speech recognition apparatus calculates similarities between one or more acquired phoneme sequences and each candidate target sequence in a candidate set using a similarity calculation algorithm including an edit distance algorithm, and returns a candidate target sequence having the highest similarity as a recognition result [as mapping the obtained command sequence to one of the plurality of control commands]; Hyun, Paragraph 49: candidate set extractor 120 extracts target sequences [as control commands] from recognition target list 140 according to devices to be operated by a user to generate a candidate set; Hyun, Paragraph 48: recognition target list 140 includes various commands to operate the TV; Hyun, Paragraph 67: target sequences [as based on the mapped control command] include commands for controlling a TV, e.g., power on/off command [as to control an operation of the electronic device]).
Hyun in view of Lee in view of Prabhavalkar does not disclose the electronic device, wherein the at least one processor is further configured to:
in response to the obtained command sequence not being included in the command dictionary, 
provide a notification presenting a choice of options to a user of the electronic device. 
However, Bowman does teach the electronic device, wherein the at least one processor is further configured to:
in response to the obtained command sequence not being included in the command dictionary (Bowman, Paragraph 75: For example, "Call V" results in more calls to contact "Vesper" than contact "Victoria", predicting the Vesper as the value of the command parameter, and displaying and selecting this prediction allows a call to Vesper to be initiated just by causing the activation input (such as depression of a "Call" or "ENTER" key). If the prediction is incorrect [as in response to the obtained command sequence not being included in the command dictionary], further input in the command line, such as "Call Vi", will further disambiguate the matching commands),
provide a notification presenting a choice of options to a user of the electronic device. (Bowman, Paragraph 75: If the prediction is incorrect, further input in the command line, such as "Call Vi" [as notification presenting a choice of options], will further disambiguate the matching commands; Bowman, Paragraph 21: This active command line user interface provides a visual drop down list that displays the available commands matching input in the command line [as provides a notification presenting a choice of options] and optionally context-sensitive information. The visual drop down list may include selections, option values, hints for command and command parameter inputs and/or dynamically suggest values of objects to be acted upon by the command; Bowman, Figure 5, Paragraph 114: The active command line driven user interface provides ambiguity resolution between connectors and command parameters; Ambiguity is resolved by processing and displaying all command possibilities and matching the input string in the command line 504 in the command list 512.)
Hyun in view of Lee in view of Prabhavalkar and Bowman are both considered to be analogous to the claimed invention because they are in the same field of electronic device control. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Hyun (directed to the electronic device to, in response to the obtained command sequence being included in the command dictionary, map the obtained command sequence to one of the plurality of control commands to control an operation of the electronic device) and Bowman (directed to the electronic device to, in response to the obtained command sequence not being included in the command dictionary, provide a notification presenting a choice of options to a user of the electronic device) and arrived at the electronic device to, in response to the obtained command sequence being included in the command dictionary, map the obtained command sequence to one of the plurality of control commands to control an operation of the electronic device, and, in response to the obtained command sequence not being included in the command dictionary, provide a notification presenting a choice of options to a user of the electronic device.  One of ordinary skill in the art would have been motivated to make such a combination because the active command line driven user interface provides ambiguity resolution between connectors and command parameters (Bowman, Paragraph 114).
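For illustration only, the two branches recited in Claim 16 (dictionary hit versus dictionary miss) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Hyun, Bowman, or the application; the class name, the option labels, and the control command identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CommandDispatcher:
    # Hypothetical command dictionary: command sequence -> control command.
    command_dictionary: Dict[str, str] = field(default_factory=dict)
    # Hypothetical labels for the choice of options shown on a miss.
    options: List[str] = field(
        default_factory=lambda: ["speak_again", "add_to_dictionary"])

    def handle(self, command_sequence: str) -> dict:
        """Map a dictionary hit to its control command; on a miss,
        return a notification carrying the choice of options."""
        if command_sequence in self.command_dictionary:
            return {"action": "execute",
                    "control_command": self.command_dictionary[command_sequence]}
        return {"action": "notify", "options": list(self.options)}

    def add_command(self, command_sequence: str, control_command: str) -> None:
        """Add a new command sequence to the dictionary."""
        self.command_dictionary[command_sequence] = control_command
```

In this sketch, a sequence found in the dictionary is mapped directly to a control command, and any other sequence yields a notification rather than an operation of the device.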

Claim(s) 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Hyun in view of Lee in view of Prabhavalkar in view of Bowman, as applied in claim 16 above, and further in view of Keely et al. (US PGPub No. US 2003/0216913 A1), hereinafter Keeley.
Regarding Claim 17, Hyun in view of Lee in view of Prabhavalkar in view of Bowman discloses all of Claim 16 limitations above, with the exception that Hyun does not disclose the electronic device:
wherein the choice of options includes an option to input another user speech to the electronic device. 
However, Keeley does teach the electronic device: 
wherein the choice of options includes an option to input another user speech to the electronic device (Keeley, Figure 4, Re-write/speak 415 and Paragraph [0140], where natural input is provided to replace the inaccurate text).
Hyun in view of Lee in view of Prabhavalkar in view of Bowman and Keeley are both considered to be analogous to the claimed invention because they are in the same field of electronic device control. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Hyun (directed to the electronic device to, in response to the obtained command sequence being included in the command dictionary, map the obtained command sequence to one of the plurality of control commands to control an operation of the electronic device) and Keeley (directed to a choice of options that includes an option to input another user speech to the electronic device).  One of ordinary skill in the art would have been motivated to make such a combination in order to correct inaccurately recognized text (Keeley, Paragraph [0009]).

Regarding Claim 18, Hyun in view of Lee in view of Prabhavalkar in view of Bowman discloses all of Claim 16 limitations above, with the exception that Hyun in view of Lee in view of Prabhavalkar in view of Bowman does not disclose the electronic device:
wherein the choice of options includes an option to add the obtained command sequence to the command dictionary.
However, Keeley does teach the electronic device: 
wherein the choice of options includes an option to add the obtained command sequence to the command dictionary (Keeley, Add to Dictionary 419 and Paragraph [0141], where existing text is added to a recognition dictionary).
Hyun in view of Lee in view of Prabhavalkar in view of Bowman and Keeley are both considered to be analogous to the claimed invention because they are in the same field of electronic device control. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Hyun (directed to the electronic device to, in response to the obtained command sequence being included in the command dictionary, map the obtained command sequence to one of the plurality of control commands to control an operation of the electronic device) and Keeley (directed to a choice of options that includes an option to add the obtained command sequence to the command dictionary). One of ordinary skill in the art would have been motivated to make such a combination in order to correct inaccurately recognized text (Keeley, Paragraph [0009]).

Claim(s) 19 is rejected under 35 U.S.C. 103 as being unpatentable over Hyun in view of Lee in view of Prabhavalkar in view of Bowman, as applied in claim 16 above, and further in view of Comerford (US 8,407,057).
Regarding Claim 19, Hyun in view of Lee in view of Prabhavalkar in view of Bowman discloses all of Claim 16 limitations above, with the exception that Hyun does not disclose the electronic device, wherein the at least one processor is further configured to:
in response to a function of the electronic device being updated, add a command corresponding to the updated function to the command dictionary.
However, Comerford does teach the electronic device, wherein the at least one processor is further configured to:
in response to a function of the electronic device being updated, add a command corresponding to the updated function to the command dictionary (see Comerford, col. 8, lines 23-26, where modification of a voice command or action is performed; col. 9, line 50 to col. 10, line 6, where “I want quiet” is learned by the system to be related to turning off the radio, turning the CD player off, closing the windows, setting the climate control to 72, and muting the telephone).
Hyun in view of Lee in view of Prabhavalkar in view of Bowman and Comerford are both considered to be analogous to the claimed invention because they are in the same field of electronic device control. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Hyun (directed to the electronic device to, in response to the obtained command sequence being included in the command dictionary, map the obtained command sequence to one of the plurality of control commands to control an operation of the electronic device) and Comerford (directed to, in response to a function of the electronic device being updated, adding a command corresponding to the updated function to the command dictionary).  One of ordinary skill in the art would have been motivated to make such a combination in order to provide user-guided teaching and learning of new commands and actions to be executed by the conversational learning system (see Comerford, col. 1, lines 13-15).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PARAS D SHAH whose telephone number is (571)270-1650. The examiner can normally be reached Monday-Thursday 7:30AM-2:30PM, 5PM-7PM (EST), Friday 8AM-noon (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Paras D Shah/
Primary Examiner, Art Unit 2659
05/22/2022