DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 09/20/2022. Claims 1-26 are pending in the application and have been examined.
	
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The response filed on 09/20/2022 has been accepted and considered in this Office Action. Claims 1-26 have been examined. Applicant’s amendments to claim 26, reciting a decoder compiler to perform the compiling, determining and recompiling, overcome the 35 U.S.C. 101 rejection previously set forth in the Non-Final Office Action mailed 06/23/2022. Therefore, that rejection under 35 U.S.C. 101 is withdrawn. Applicant’s amendments to claim 1 limiting the adaptive decoder overcome the rejections under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, previously set forth in the Non-Final Office Action mailed 06/23/2022. Therefore, those rejections under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, are withdrawn.

Response to Arguments
Applicant's arguments filed 09/20/2022 have been fully considered as follows:
Applicant’s arguments with respect to claim 1 on pg. 11 state that
“With respect to labeled speech utterances, Lilly discloses a user may speak an utterance.
However, Lilly does not disclose a labeled speech utterance or labeled audio, much less using labeled speech utterances to generate a hypothesis sequence.. Lilly does not disclose the ASR model is compressed and also does not disclose any other compressed acoustic model. Thus, Lilly does not disclose a compressed acoustic model.”
	
The examiner respectfully disagrees. Lilly teaches “the ASR transcribes audio data into text data representing the words of the speech contained in the audio data. The text data may then be used by other components for various purposes, such as executing system commands, inputting data, etc. A spoken utterance in the audio data is input to a processor configured to perform ASR which then interprets the utterance based on the similarity between the utterance and pre-established language models 254 stored in an ASR model knowledge base (ASR Models Storage 252)” in Lilly, col. 7, lines 1-9. The pre-established language models stored in the ASR Models Storage, to which the spoken utterance is compared, are interpreted as the labeled speech utterances. Lilly further teaches “The system may then identify an existing word known to the system (i.e., within the ASR lexicon 190) that is used in a similar manner to the new word. This may be done as follows. The system may determine (134) an existing word in the ASR lexicon 190 where the existing word has second usage characteristics. The system may then determine (136) that the first characteristics are similar to the second characteristics. One example of this is to determine that a first vector associated with the first characteristics is sufficiently similar to a second vector associated with the second characteristics, as described below in reference to FIGS. 4A and 4B. If the first characteristics are sufficiently similar to the second characteristics, the system may determine that the new word is used similarly to the existing word. With this information, the system may determine potential variations of the new word based on the variations of the existing word.” in Lilly, col. 3, lines 27-42. A new ASR model is then trained to decode additional grapheme sequences, i.e., the new word and variations of the new word. Lilly further teaches “If limited speech recognition is included, the ASR module 250 may be configured to identify a limited number of words, such as keywords detected by the device, whereas extended speech recognition may be configured to recognize a much larger range of words.” in Lilly, col. 18, lines 20-25; the limited ASR module is interpreted as the compressed acoustic model. Therefore, Lilly teaches receiving a labeled speech utterance or labeled audio and using the labeled speech utterances to generate a hypothesis sequence by the compressed acoustic model. The rejections of claims 1, 13 and 21 under 35 U.S.C. 103 are therefore sustained and further updated accordingly.
Applicant’s arguments with respect to claims 25 and 26 on pg. 12 state that
“Applicant additionally respectfully submits Lilly and Brocious do not disclose or suggest determining a second set of commands for an adapted lexicon, wherein the second set of commands is associated with a new state or context and recompiling an adapted lexicon or decoder with the second set of commands, as recited in independent claim 25 and similarly recited in independent claim 26.”

The examiner respectfully disagrees. Lilly teaches “If the first characteristics are sufficiently similar to the second characteristics, the system may determine that the new word is used similarly to the existing word. With this information, the system may determine potential variations of the new word based on the variations of the existing word. The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190 and can train (144) a new ASR model(s) using the new word and the variation of the new word” in Lilly, col. 3, lines 38-43 and col. 3, lines 63-67. Lilly thus teaches recognizing a new word, interpreted as a second set of commands, which is used to train the ASR model or adapted lexicon to recognize the second set of commands. Brocious teaches “the audio context changes shown in blocks 108 and 112” in Brocious, col. 3, line 41. Therefore, Brocious teaches recognizing a change of context or state, and Lilly in view of Brocious teaches recompiling the adapted lexicon or decoder with the second set of commands associated with a new context. The rejections of claims 25 and 26 under 35 U.S.C. 103 are therefore sustained and further updated accordingly.
Applicant’s arguments with respect to claim 5 on pg. 12 state that
“Lilly does not disclose anything about a timestamp.”

The examiner respectfully disagrees. Lilly teaches “Following detection of a wakeword, the device sends audio data 111 corresponding to the utterance, to a server 120 that includes an ASR module 250. The audio data 111 may be output from an acoustic front end (AFE) 256 located on the device 110 prior to transmission. Or the audio data 111 may be in a different form for processing by a remote AFE 256, such as the AFE 256 located with the ASR module 250. The wakeword detection module 220 works in conjunction with other components of the device to detect keywords in audio 11. Keyword detection may include analyzing individual directional audio signals, such as those processed post-beamforming if applicable. Other techniques known in the art of keyword detection (also known as keyword spotting) may also be used. Instead, incoming audio (or audio data) is analyzed to determine if specific characteristics of the audio match preconfigured acoustic waveforms, audio signatures, or other data to determine if the incoming audio “matches” stored audio data corresponding to a keyword. Under a wakeword configuration, when a wakeword is detected the system may “wake” and commence further speech processing. The wakeword detection module 220 may access the storage 608 and compare the captured audio to the stored models and audio sequences using audio comparison, pattern recognition, keyword spotting, audio signature, and/or other audio processing techniques” in Lilly, col. 5, lines 35-45; col. 6, lines 16-25; col. 18, lines 45-50; and col. 19, lines 8-13. Lilly discloses wakeword or keyword detection processing that wakes the device or confirms recognition of the keyword at the time the wakeword or keyword is received, and then sends the AFE-processed audio data to the ASR module; the time at which the wakeword is detected is interpreted as the timestamp of the wakeword. The rejection of claim 5 under 35 U.S.C. 103 is therefore sustained and further updated accordingly.
Regarding the rejections of the remaining dependent claims under 35 U.S.C. 103, to the extent those claims are argued for at least the same rationale presented in the Remarks filed 09/20/2022, the examiner respectfully notes the following. Should those claims be traversed for reasons similar to those given for independent claims 1, 13 and 21, the examiner respectfully directs Applicant to the reasons provided above in response to the arguments directed to claims 1, 13 and 21. For at least the same reasons, the examiner respectfully disagrees, and Applicant's arguments have been fully considered but are not persuasive.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-12 are rejected under 35 U.S.C. 103 as being unpatentable over Lilly, US Patent 10,134,388 in view of Parlikar et al., US Patent 9,508,341.
Regarding claim 1, Lilly teaches an apparatus (see Lilly, Fig. 6, Fig. 7), comprising: an adaptive decoder configured to determine a command from a sequence of graphemes, the sequence of graphemes generated using a compressed acoustic model, wherein the adaptive decoder expands to recognize additional grapheme sequences associated with the command (see Lilly, Col 5, lines 23-26, FIG. 2 is a conceptual diagram of how a spoken utterance is traditionally processed, allowing a system to capture and execute commands spoken by a user, such as spoken commands that may follow a wakeword. See Lilly, col 2 lines 62-66, during a training process the system, through server 120, may determine (130) a new word in a text corpus 180 but not in an ASR lexicon 190. The ASR lexicon 190 is data representing all the words recognizable by the system 100 for purposes of ASR processing. See Lilly, col. 3 lines 63-67, The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190 and can train (144) a new ASR model(s) using the new word and the variation of the new word; interpreted as decoding a command from words (interpreted as a sequence of graphemes) and expanding to recognize additional words associated with the command), and wherein the additional grapheme sequences are hypothesis sequences generated by the compressed acoustic model using labeled speech utterances (see Lilly, Col 3, lines 58-68, Thus the system could determine (138) a variation (“unlock”) of the existing word (“lock”), where the variation has a root (“lock”) and an affix (“un”). The system could then create (140) a variation (“unbolt”) of the new word using the same affix (“un”) and the root of the new word (“bolt”). The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190 and can train (144) a new ASR model(s) using the new word and the variation of the new word; each word is interpreted as a sequence of graphemes). However, Lilly does not teach determining a command from a sequence of graphemes. Parlikar teaches determining a command from a sequence of graphemes (see Parlikar, Col 3 lines 3-8, The ASR module 156 may be configured to base its response on a lexicon 152. The lexicon 152 includes words and associated pronunciations. The ASR module 156 may be configured to compare the received audio data with the pronunciations included in the lexicon 152 to recognize the utterance; the words in the lexicon are interpreted as commands, and the words as sequences of graphemes).
Lilly and Parlikar are considered to be analogous to the claimed invention because they relate to speech processing techniques to enable speech-based user control of a computing device to perform tasks based on the user's spoken commands. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lilly's teachings on speech recognition combined with natural language understanding processing techniques with Parlikar's teachings on processing utterances into written transcriptions, in order to better understand different pronunciations of words (see Parlikar, col. 2, lines 1-10).
Regarding claim 2, Lilly in view of Parlikar teaches the apparatus of Claim 1. Lilly further teaches wherein the adaptive decoder is expanded responsive to the hypothesis sequences being different than a label sequence associated with the labeled speech utterances (see Lilly, Col 3, lines 58-68, Thus the system could determine (138) a variation (“unlock”) of the existing word (“lock”), where the variation has a root (“lock”) and an affix (“un”). The system could then create (140) a variation (“unbolt”) of the new word using the same affix (“un”) and the root of the new word (“bolt”). The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190 and can train (144) a new ASR model(s) using the new word and the variation of the new word; each word is interpreted as a label sequence).
Regarding claim 3, Lilly in view of Parlikar teaches the apparatus of Claim 1. Lilly further teaches wherein the adaptive decoder comprises an adaptive lexicon, wherein the additional grapheme sequences are added to the adaptive lexicon (see Lilly, Col 3 lines 26-33, The system may then identify an existing word known to the system (i.e., within the ASR lexicon 190) that is used in a similar manner to the new word. This may be done as follows. The system may determine (134) an existing word in the ASR lexicon 190 where the existing word has second usage characteristics. The system may then determine (136) that the first characteristics are similar to the second characteristics. See Lilly col 3 lines 63-65, The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190).
Regarding claim 4, Lilly in view of Parlikar teaches the apparatus of Claim 3. Lilly further teaches wherein the adaptive decoder comprises an adaptive language model, wherein the additional grapheme sequences are used to generate the adaptive language model (see Lilly col 3 lines 63-67, The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190 and can train (144) a new ASR model(s) using the new word and the variation of the new word; the new ASR model is interpreted as the adaptive language model).
Regarding claim 5, Lilly in view of Parlikar teaches the apparatus of Claim 1. Lilly further teaches a trigger module configured to recognize a spoken keyword in an audio signal and send a control signal and a timestamp to the adaptive decoder responsive to recognizing the spoken keyword (see Lilly, Col 5 lines 42-51, The wakeword detection module 220 works in conjunction with other components of the device, for example a microphone (not pictured) to detect keywords in audio 11. For example, the device 110 may convert audio 11 into audio data, and process the audio data with the wakeword detection module 220 to determine whether speech is detected, and if so, if the audio data comprising speech matches an audio signature and/or model corresponding to a particular keyword; detection of the wakeword is interpreted as recognizing a spoken keyword in an audio signal and sending a control signal and a timestamp responsive to the wakeword/spoken keyword).
Regarding claim 6, Lilly in view of Parlikar teaches the apparatus of Claim 1. Lilly further teaches wherein the decoder is configured to use a dynamic command list (see Lilly, Col 11 lines 52-66, The output from the NLU processing (which may include tagged text, commands, etc.) may then be sent to a command processor 290, which may be located on a same or separate server 120 as part of system 100; the command processor is interpreted as containing the dynamic command list).
Regarding claim 7, Lilly in view of Parlikar teaches the apparatus of Claim 6. Lilly further teaches wherein the dynamic command list is associated with a state or context of the apparatus (see Lilly, Col 10, lines 10-19, An intent classification (IC) module 264 parses the query to determine an intent or intents for each identified domain, where the intent corresponds to the action to be performed that is responsive to the query. Each domain is associated with a database (278a-278n) of words linked to intents. For example, a music intent database may link words and phrases such as “quiet,” “volume off,” and “mute” to a “mute” intent. The IC module 264 identifies potential intents for each identified domain by comparing words in the query to the words and phrases in the intents database 278).
Regarding claim 8, Lilly in view of Parlikar teaches the apparatus of Claim 1. Lilly further teaches an acoustic engine configured to generate the sequence of graphemes (see Lilly, Col 7 lines 19-38, The different ways a spoken utterance may be interpreted (i.e., the different hypotheses) may each be assigned a probability or a confidence score representing the likelihood that a particular set of words matches those spoken in the utterance. Thus, each potential textual interpretation of the spoken utterance (hypothesis) is associated with a confidence score. Based on the considered factors and the assigned confidence score, the ASR process 250 outputs the most likely text recognized in the audio data. The ASR process may also output multiple hypotheses in the form of a lattice or an N-best list with each hypothesis corresponding to a confidence score or other score (such as probability scores, etc.); the interpretation of the utterance as words is interpreted as generating the sequence of graphemes).
  
Regarding claim 9, Lilly in view of Parlikar teaches the apparatus of Claim 8. Lilly further teaches wherein the acoustic engine is implemented by a digital signal processor and the adaptive decoder is implemented by an application processor separate from the digital signal processor (see Lilly, Fig. 8 and col 20 lines 3-30, Networked devices 110 may capture audio using one-or-more built-in or connected microphones 650 or audio capture devices, with processing performed by ASR, NLU, or other components of the same device or another device connected via network 199, such as an ASR 250, NLU 260, etc. of one or more servers 120. The system may also include an ASR lexicon 190, which may be stored local to an ASR model training server 120. The system may also include (or be able to access) text corpus(es) 180 which may be located proximate to or separate from an ASR model training server 120; text corpus 180 is interpreted as the adaptive decoder implemented by an application processor located separately from the ASR model training server 120, which is interpreted as the digital signal processor).
Regarding claim 10, Lilly in view of Parlikar teaches the apparatus of Claim 8. Lilly further teaches wherein the adaptive decoder is implemented by a digital signal processor and the acoustic engine is implemented by an application processor separate from the digital signal processor (see Lilly, Fig. 8 and col 20 lines 3-30, Networked devices 110 may capture audio using one-or-more built-in or connected microphones 650 or audio capture devices, with processing performed by ASR, NLU, or other components of the same device or another device connected via network 199, such as an ASR 250, NLU 260, etc. of one or more servers 120. The system may also include an ASR lexicon 190, which may be stored local to an ASR model training server 120. The system may also include (or be able to access) text corpus(es) 180 which may be located proximate to or separate from an ASR model training server 120; text corpus 180 is interpreted as the adaptive decoder implemented by a digital signal processor located separately from the ASR model training server 120, which is interpreted as the application processor).
Regarding claim 11, Lilly in view of Parlikar teaches the apparatus of Claim 8. Lilly further teaches wherein the acoustic engine and the adaptive decoder are implemented by a digital signal processor (see Lilly, Fig. 8 and col 20 lines 3-30, Networked devices 110 may capture audio using one-or-more built-in or connected microphones 650 or audio capture devices, with processing performed by ASR, NLU, or other components of the same device or another device connected via network 199, such as an ASR 250, NLU 260, etc. of one or more servers 120. The system may also include an ASR lexicon 190, which may be stored local to an ASR model training server 120. The system may also include (or be able to access) text corpus(es) 180 which may be located proximate to or separate from an ASR model training server 120; ASR lexicon 190 is interpreted as the adaptive decoder implemented by a digital signal processor located on the same device as the ASR model training server 120, which is interpreted as the digital signal processor).
Regarding claim 12, Lilly in view of Parlikar teaches the apparatus of Claim 8. Lilly further teaches wherein the acoustic engine and the adaptive decoder are implemented by an application processor (see Lilly, Fig. 8 and col 20 lines 3-30, Networked devices 110 may capture audio using one-or-more built-in or connected microphones 650 or audio capture devices, with processing performed by ASR, NLU, or other components of the same device or another device connected via network 199, such as an ASR 250, NLU 260, etc. of one or more servers 120. The system may also include an ASR lexicon 190, which may be stored local to an ASR model training server 120. The system may also include (or be able to access) text corpus(es) 180 which may be located proximate to or separate from an ASR model training server 120; ASR lexicon 190 is interpreted as the adaptive decoder implemented by an application processor located on the same device as the ASR model training server 120, which is interpreted as the application processor).
Claims 13 and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lilly, US Patent 10,134,388 in view of Li et al., US Patent Application Publication 2009/0150153.
Regarding claim 13, Lilly teaches an audio processing system, comprising: a decoder module configured to determine a command from a sequence of graphemes generated by a compressed acoustic model (see Lilly, Col 5, lines 23-26, FIG. 2 is a conceptual diagram of how a spoken utterance is traditionally processed, allowing a system to capture and execute commands spoken by a user, such as spoken commands that may follow a wakeword. Col 7 lines 19-38, The different ways a spoken utterance may be interpreted (i.e., the different hypotheses) may each be assigned a probability or a confidence score representing the likelihood that a particular set of words matches those spoken in the utterance; the words are interpreted as commands determined from a sequence of graphemes); and a decoder compilation module configured to: receive a speech utterance and a label grapheme sequence corresponding to the speech utterance (see Lilly, Col. 7, lines 1-9, the ASR transcribes audio data into text data representing the words of the speech contained in the audio data. The text data may then be used by other components for various purposes, such as executing system commands, inputting data, etc. A spoken utterance in the audio data is input to a processor configured to perform ASR which then interprets the utterance based on the similarity between the utterance and pre-established language models 254 stored in an ASR model knowledge base (ASR Models Storage 252)); generate a hypothesis grapheme sequence for the speech utterance using the compressed acoustic model (see Lilly, Col 3, lines 58-68, Thus the system could determine (138) a variation (“unlock”) of the existing word (“lock”), where the variation has a root (“lock”) and an affix (“un”). The system could then create (140) a variation (“unbolt”) of the new word using the same affix (“un”) and the root of the new word (“bolt”). The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190). However, Lilly fails to teach determining an error measurement between the hypothesis grapheme sequence and the label grapheme sequence, and expanding the decoder module to recognize the hypothesis grapheme sequence responsive to the error measurement exceeding a threshold.
However, Li teaches a decoder compilation module configured to: receive a speech utterance and a label grapheme sequence corresponding to the speech utterance (see Li, [0030] Turning to FIG. 1, there is shown general conceptual diagram including components that retrain a recognizer 102 using labeled acoustic data as the adaptation data 104); generate a hypothesis grapheme sequence for the speech utterance using the compressed acoustic model (see Li, [0049] More particularly, an example training procedure is described with reference to the flow diagram of FIG. 3, beginning at step 302 which represents starting with an ML-adapted grapheme model θ_ML as described above. Step 304 obtains the n-best recognition results g'_i for x_i, by using a speech recognizer and using the ML-adapted graphoneme; interpreted as generating the hypothesis grapheme sequence, with paired phonemes, for the speech utterance); determine an error measurement between the hypothesis grapheme sequence and the label grapheme sequence (see Li, [0049]-[0050] Step 304 obtains the n-best recognition results. Step 308 applies stochastic gradient descent to Equation (10) with respect to θ. Early stopping to avoid overfitting is applied. Note that in an n-gram model with backoff, if an n-gram does not exist in the model, its probability is computed by backing off to a lower-order distribution; interpreted as applying stochastic gradient descent based on the error measurement); and expand the decoder module to recognize the hypothesis grapheme sequence responsive to the error measurement exceeding a threshold (see Li, [0058] If at step 416 the user confirms "yes" to this second attempt in this example, at step 418 the recognized name at step 414 is used as the grapheme label for each of the recorded acoustics, as recorded at step 402 (labeled "1st" in FIG. 4) and as recorded at step 412 (labeled "2nd" in FIG. 4)).
Lilly and Li are considered to be analogous to the claimed invention because they relate to speech processing techniques to enable speech-based user control of a computing device to perform tasks based on the user's spoken commands. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lilly's teachings on speech recognition combined with natural language understanding processing techniques with Li's teachings on retraining, including optimizing the graphoneme model using acoustic data, in order to better understand different pronunciations of words (see Li, [0006]-[0007]).
Regarding claim 15, Lilly in view of Li teaches the processing system of Claim 13. Lilly further teaches wherein updating the decoder module comprises adding the hypothesis grapheme sequence to an adapted lexicon used by the decoder module (see Lilly, Col 3 lines 26-33, the system may then identify an existing word known to the system (i.e., within the ASR lexicon 190) that is used in a similar manner to the new word. This may be done as follows. The system may determine (134) an existing word in the ASR lexicon 190 where the existing word has second usage characteristics. The system may then determine (136) that the first characteristics are similar to the second characteristics. See Lilly col 3 lines 63-65, The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190).
Regarding claim 16, Lilly in view of Li teaches the processing system of Claim 15. Lilly further teaches wherein expanding the decoder module comprises recompiling a language model used by the decoder module using the hypothesis sequence (see Lilly, col 3 lines 63-67, The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190 and can train (144) a new ASR model(s) using the new word and the variation of the new word; the new ASR model is interpreted as a language model used by the decoder module using the hypothesis sequence).
Regarding claim 17, Lilly in view of Li teaches the processing system of Claim 13. Lilly further teaches wherein the command is included in a first set of commands the decoder module is configured to recognize, wherein the decoder compilation module is further configured to update the decoder module to recognize a second set of commands (see Lilly, Col 3 lines 26-33, the system may then identify an existing word known to the system (i.e., within the ASR lexicon 190) that is used in a similar manner to the new word. This may be done as follows. The system may determine (134) an existing word in the ASR lexicon 190 where the existing word has second usage characteristics. The system may then determine (136) that the first characteristics are similar to the second characteristics; the new word is interpreted as the second set of commands).
Regarding claim 18, Lilly in view of Li teaches the processing system of Claim 17. Lilly further teaches wherein the first set of commands is stored in the decoder module, wherein the second set of commands replaces the first set of commands (see Lilly, Col 15, lines 56-67, The system may then determine a distance in the vector space between the first vector and second vector and may determine (510) that the distance is below a threshold. The threshold may be configured in a number of different ways, including experimentally determined during a training time to configure a threshold that leads to desired system results. Different thresholds may be determined and used for different domains, purposes, etc. The system may then determine (512) a variation of the second word, where the variation of the second word is in the ASR lexicon 190 and includes a root of the second word and a plurality of additional letters).
Regarding claim 19, Lilly in view of Li teaches the processing system of Claim 18. Lilly further teaches wherein the first set of commands and the second set of commands are associated with a state or context of an application system (see Lilly, Col 3 lines 26-33, The system may then identify an existing word known to the system (i.e., within the ASR lexicon 190) that is used in a similar manner to the new word. This may be done as follows. The system may determine (134) an existing word in the ASR lexicon 190 where the existing word has second usage characteristics. The system may then determine (136) that the first characteristics are similar to the second characteristics).
Regarding claim 20, Lilly in view of Li teaches the processing system of Claim 13. Lilly further teaches a trigger module configured to recognize a spoken keyword in an audio signal and send a control signal to the decoder module responsive to recognizing the spoken keyword (see Lilly, col 3 lines 27-34, The device 110, using a wakeword detection module 220, then processes the audio, or audio data corresponding to the audio, to determine if a keyword (such as a wakeword) is detected in the audio. Following detection of a wakeword, the device sends audio data 111 corresponding to the utterance, to a server 120 that includes an ASR module 250; detection of the wakeword is interpreted as recognizing a spoken keyword in an audio signal and sending a control signal responsive to the wakeword/spoken keyword).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Lilly, US Patent 10,134,388 in view of Li et al., US Patent Application Publication 2009/0150153, further in view of Weber, US Patent 9,224,386.
Regarding claim 14, Lilly in view of Li teaches the processing system of Claim 13. Lilly in view of Li fails to teach wherein the decoder compilation module is further configured to generate a confusion matrix for the compressed acoustic model, wherein the error measurement corresponds to a value in the confusion matrix.
However, Weber teaches wherein the decoder compilation module is further configured to generate a confusion matrix for the compressed acoustic model, wherein the error measurement corresponds to a value in the confusion matrix (see Weber, Col 4, lines 34-36, At block 106, the process 100 may use a confusion matrix to generate alternate recognition hypotheses which may be used to discriminatively train the language model. See Weber, Fig. 2C and Col 8, lines 36-30, FIG. 3A illustrates a sample process 300 for generating insertion and deletion probabilities for phonemes. In some embodiments, insertion and deletion probabilities generated with the process 300 may be used, along with the confusion probabilities generated in the process 200 described above; the confusion probabilities are interpreted as error measurements).
Lilly, Li and Weber are considered to be analogous to the claimed invention because they relate to speech processing techniques that enable speech-based user control of a computing device to perform tasks based on the user's spoken commands. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lilly and Li on speech recognition combined with natural language understanding processing techniques with the teachings of Weber on using a confusion matrix to generate errors from known transcriptions, in order to improve the degree of confidence that the proposed transcription is correct (see Weber, col 1, line 54 to col 2, line 10).
Claims 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Lilly, US Patent 10,134,388, in view of Weber, US Patent 9,224,386.
	Regarding claim 21, Lilly teaches a method comprising: receiving a speech utterance and a command label corresponding to the speech utterance (see Lilly, Col. 7, lines 1-9, the ASR transcribes audio data into text data representing the words of the speech contained in the audio data. The text data may then be used by other components for various purposes, such as executing system commands, inputting data, etc. A spoken utterance in the audio data is input to a processor configured to perform ASR which then interprets the utterance based on the similarity between the utterance and pre-established language models 254 stored in an ASR model knowledge base (ASR Models Storage 252)); generating a hypothesis grapheme sequence for the speech utterance using a compressed acoustic model (see Lilly, Col 3, lines 58-68, Thus the system could determine (138) a variation (“unlock”) of the existing word (“lock”), where the variation has a root (“lock”) and an affix (“un”). The system could then create (140) a variation (“unbolt”) of the new word using the same affix (“un”) and the root of the new word (“bolt”). The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190); determining an error measurement between the hypothesis grapheme sequence and the command label (see Lilly, Col 15, lines 57-67, The system may then determine a distance in the vector space between the first vector and second vector and may determine (510) that the distance is below a threshold. The threshold may be configured in a number of different ways, including experimentally determined during a training time to configure a threshold that leads to desired system results. Different thresholds may be determined and used for different domains, purposes, etc. The system may then determine (512) a variation of the second word, where the variation of the second word is in the ASR lexicon 190 and includes a root of the second word and a plurality of additional letters; the distance is interpreted as the error measurement); determining, using the recompiled adaptive decoder, a command from a sequence of graphemes, the sequence of graphemes generated by the compressed acoustic model (see Lilly, col 16, lines 5-9, The system may then determine (516) an expected pronunciation of the variation of the first word using a grapheme-to-phoneme process and may then train (518) a new ASR model(s) to recognize the first word and/or the variation of the first word; the new ASR model is interpreted as the recompiled adaptive decoder). However, Lilly fails to teach recompiling an adaptive decoder to recognize the hypothesis grapheme sequence responsive to the error measurement exceeding a threshold.
	However, Weber teaches recompiling an adaptive decoder to recognize the hypothesis grapheme sequence responsive to the error measurement exceeding a threshold (see Weber, col 4, lines 54-68, At block 106, the process 100 may use a confusion matrix to generate alternate recognition hypotheses which may be used to discriminatively train the language model. In another embodiment, a subset of all possible confusions are added, such as only those substitutions, insertions, and deletions associated with a probability exceeding a threshold, or only the substitutions, insertions, and deletions associated with the top N probabilities, where N can be any predetermined or dynamically determined number. See Weber, col 5, lines 66-67, As shown in block 120, the previously described process may be repeated for every utterance in the LM training data; a probability exceeding a threshold is interpreted as an error measurement exceeding a threshold, which is used to train the LM with alternate hypotheses as indicated in Weber, Fig. 1).
Lilly and Weber are considered to be analogous to the claimed invention because they relate to speech processing techniques that enable speech-based user control of a computing device to perform tasks based on the user's spoken commands. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lilly on speech recognition combined with natural language understanding processing techniques with the teachings of Weber on using a confusion matrix to generate errors from known transcriptions, in order to improve the degree of confidence that the proposed transcription is correct (see Weber, col 1, line 54 to col 2, line 10).
Regarding claim 22, Lilly in view of Weber teaches the method of Claim 21. Lilly further teaches wherein recompiling the adaptive decoder comprises adding the hypothesis grapheme sequence to an adapted lexicon used by the adaptive decoder (see Lilly, Col 3, lines 26-33, the system may then identify an existing word known to the system (i.e., within the ASR lexicon 190) that is used in a similar manner to the new word. This may be done as follows. The system may determine (134) an existing word in the ASR lexicon 190 where the existing word has second usage characteristics. The system may then determine (136) that the first characteristics are similar to the second characteristics. See Lilly, col 3, lines 63-65, The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190).
Regarding claim 23, Lilly in view of Weber teaches the method of Claim 22. Lilly further teaches wherein recompiling the adaptive decoder comprises generating a language model used by the decoder module using the hypothesis sequence (see Lilly, col 3, lines 63-67, The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190 and can train (144) a new ASR model(s) using the new word and the variation of the new word; the new ASR model is interpreted as a language model used by the decoder module using the hypothesis sequence).
Regarding claim 24, Lilly in view of Weber teaches the method of Claim 21. Lilly further teaches wherein the command is included in a first set of commands the decoder module is configured to recognize, the method further comprising recompiling the adaptive decoder to recognize a second set of commands (see Lilly, Col 3, lines 26-33, the system may then identify an existing word known to the system (i.e., within the ASR lexicon 190) that is used in a similar manner to the new word. This may be done as follows. The system may determine (134) an existing word in the ASR lexicon 190 where the existing word has second usage characteristics. The system may then determine (136) that the first characteristics are similar to the second characteristics; the new word is interpreted as the second set of commands).
Claims 25 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Lilly, US Patent 10,134,388, in view of Brocious et al., US Patent 7,240,006.
Regarding claim 25, Lilly teaches a processing system, comprising: an adapted lexicon comprising a first set of commands (see Lilly, col 9, lines 55-65, Each gazetteer (284a-284n) may include domain-indexed lexical information associated with a particular user and/or device. For example, the Gazetteer A (284a) includes domain-index lexical information 286aa to 286an. A user's music-domain lexical information might include album titles, artist names, and song names, for example, whereas a user's contact-list lexical information might include the names of contacts. Since every user's music collection and contact list is presumably different, this personalized information improves entity resolution; each domain lexicon 286aa is interpreted as the first set of commands); a decoder configured to use the adapted lexicon and determine a command from a grapheme sequence, the command included in the adapted lexicon (see Lilly, col 10, lines 30-35, The intents identified by the IC module 264 are linked to domain-specific grammar frameworks (included in 276) with “slots” or “fields” to be filled. For example, if “play music” is an identified intent, a grammar (276) framework or frameworks may correspond to sentence structures such as “Play {Artist Name},” “Play {Album Name},” “Play {Song name},” “Play {Song name} by {Artist Name},” etc.; the intent identified from the spoken word is interpreted as determining the command from the grapheme sequence); and a decoder compiler configured to: determine a second set of commands for the adapted lexicon, wherein the second set of commands is associated with a new state or context (see Lilly, Col 3, lines 26-33, the system may then identify an existing word known to the system (i.e., within the ASR lexicon 190) that is used in a similar manner to the new word. This may be done as follows. The system may determine (134) an existing word in the ASR lexicon 190 where the existing word has second usage characteristics. The system may then determine (136) that the first characteristics are similar to the second characteristics; the new word is interpreted as the second set of commands); and recompile the adapted lexicon with the second set of commands (see Lilly, col 3, lines 63-67, The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190 and can train (144) a new ASR model(s) using the new word and the variation of the new word; training the new ASR model is interpreted as recompiling the adapted lexicon with the second set of commands). However, Lilly fails to teach a decoder compiler configured to determine a change of state or context of an application of the processing system.
	However, Brocious teaches determining a change of state or context of an application of the processing system (see Brocious, col. 3, lines 33-42, As shown in FIG. 1, the current audio context is tracked by the use of an audio queue 100. The audio queue 100 contains entries for numerous events of interest (102, 104, 106, 108, 110 and 112) to the browser as the audio presentation is being read aloud to the user. The events may include audio context changes, buffers to be spoken, and screen control commands. Other events may also be utilized in a particular application. Of particular interest to the present invention are the audio context changes shown in blocks 108 and 112).
Lilly and Brocious are considered to be analogous to the claimed invention because they relate to speech processing techniques for performing tasks based on the user's spoken commands. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lilly on speech recognition combined with natural language understanding processing techniques with the teachings of Brocious on applications that register specific commands, in order to improve the ability to interact verbally with an application (see Brocious, col 1, lines 31-41).
Regarding claim 26, Lilly teaches a method, comprising: using an adapted lexicon, via a decoder, to determine a command from a grapheme sequence (see Lilly, col 10, lines 30-35, The intents identified by the IC module 264 are linked to domain-specific grammar frameworks (included in 276) with “slots” or “fields” to be filled. For example, if “play music” is an identified intent, a grammar (276) framework or frameworks may correspond to sentence structures such as “Play {Artist Name},” “Play {Album Name},” “Play {Song name},” “Play {Song name} by {Artist Name},” etc.; the intent identified from the spoken word is interpreted as determining the command from the grapheme sequence); compiling, via a decoder compiler, the decoder to recognize a first set of commands, the first set of commands associated with a first state (see Lilly, col. 8, lines 22-26, the specific models used may be general models or may be models corresponding to a particular domain. For example, a music processing system may use certain models trained to recognize a set of words of an ASR lexicon 190 whereas a banking system may use other models trained to recognize a different set of words of the ASR lexicon 190; the particular domain is interpreted as the first state); determining, via a decoder compiler, a second set of commands to compile on the decoder, wherein the second set of commands is associated with a second state of the device application (see Lilly, col 9, lines 25-35, to correctly perform NLU processing of speech input, the NLU process 260 may be configured to determine a “domain” of the utterance so as to determine and narrow down which services offered by the endpoint device (e.g., server 120 or device 110) may be relevant. For example, an endpoint device may offer services relating to interactions with a telephone service, a contact list service, a calendar/scheduling service, a music player service, etc. Words in a single text query may implicate more than one service, and some services may be functionally linked (e.g., both a telephone service and a calendar service may utilize data from the contact list)); and recompiling, via a decoder compiler, the decoder with the second set of commands (see Lilly, col 3, lines 63-67, The system can then store (142) the new word (“bolt”) and/or the variation of the new word (“unbolt”) in the ASR lexicon 190 and can train (144) a new ASR model(s) using the new word and the variation of the new word; training the new ASR model is interpreted as recompiling the adapted lexicon with the second set of commands). However, Lilly fails to teach receiving a change of state of a device application.
	However, Brocious teaches receiving a change of state of a device application (see Brocious, col. 3, lines 33-42, As shown in FIG. 1, the current audio context is tracked by the use of an audio queue 100. The audio queue 100 contains entries for numerous events of interest (102, 104, 106, 108, 110 and 112) to the browser as the audio presentation is being read aloud to the user. The events may include audio context changes, buffers to be spoken, and screen control commands. Other events may also be utilized in a particular application. Of particular interest to the present invention are the audio context changes shown in blocks 108 and 112).
Lilly and Brocious are considered to be analogous to the claimed invention because they relate to speech processing techniques for performing tasks based on the user's spoken commands. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lilly on speech recognition combined with natural language understanding processing techniques with the teachings of Brocious on applications that register specific commands, in order to improve the ability to interact verbally with an application (see Brocious, col 1, lines 31-41).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
A. Sokolov and A. V. Savchenko, "Voice command recognition in intelligent systems using deep neural networks," 2019 IEEE 17th World Symposium on Applied Machine Intelligence and Informatics (SAMI), 2019, pp. 113-116, teaches isolated voice command recognition for autonomous man-machine and intelligent robotic systems (see Sokolov, pg. 000113, sect. II).
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday 12:00 pm - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NANDINI SUBRAMANI/            Examiner, Art Unit 2656
/Paras D Shah/            Primary Examiner, Art Unit 2659

12/01/2022