DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 08/14/2020. Claims 1-20 are pending in the application. As such, claims 1-20 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/20/2021 is being considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(4) because reference character “210” in Figure 2 has been used to designate both Acoustic score and n-gram score.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description: “314” from Figure 3 and “400” from Figure 4.  Corrected drawing sheets in compliance with 37 CFR 1.121(d), or amendment to the specification to add the reference character(s) in the description in compliance with 37 CFR 1.121(b) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Objections
Claim 14 is objected to because of the following informalities: it is listed twice in the claims. Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ehsani et al. (US 8442812 B2) (hereinafter referred to as Ehsani) in view of Goyal et al. (US 20180203852 A1) (hereinafter referred to as Goyal).

Regarding Claim 1, Ehsani teaches an automatic speech recognition (ASR) system that determines a textual representation of a word spoken in a natural language, the system comprising: a language model configured to: for each word candidate in word candidates that correspond to a spoken word: determine a base score (Ehsani Column 13 Lines 18-45 - The general idea is to implement an N-gram phrase-based language model (a language model that uses phrases rather than single words as the basis for n-gram modeling), in order to calculate the best parse of a sentence. Note that some words may act as phrases as can be seen in Sentence 3 (e.g. the word "direct" in the above example). Given these log probabilities, we can calculate the best phrase-based parse through a sentence by multiplying the probabilities (or summing the log probabilities) of each of the bigrams for each possible parse. We select the parse with the highest overall likelihood as the best parse (in this case, Parse 1). Parse 1 is interpreted to be the base score.);
Ehsani and Goyal teach combining the base score and the bias score into an n-gram score (Ehsani Column 14 Lines 14-27, Goyal Paragraph 54 - A significant advantage of using a language modeling technique to iteratively refine corpus segmentation is that this technique allows us to identify new phrases and collocations and thereby enlarge our initial phrase dictionary. A language model based corpus segmentation assigns probabilities not only to phrases contained in the dictionary, but to unseen phrases as well (phrases not included in the dictionary). Recurring unseen phrases encountered in the parses with the highest unigram probability score are likely to be significant fixed phrases rather than just random word sequences. By keeping track of unseen phrases and selecting recurring phrases with the highest unigram probabilities, we identify new collocations that can be added to the dictionary. The highest unigram probability score is interpreted as the n-gram score. For the bias score, Goyal teaches that a next character representation 76 from the set is then sampled from a probability distribution over the input characters which is based on the weights (normalized to sum to 1). This biases the current cell to focus on a region of the input dialog act that the current output is most related to, so that the next input character (representation) to a cell of the decoder RNN has a higher probability to be selected from that region, rather than randomly from the bag of character representations. This would be implemented in the phrase dictionary used for the highest unigram probability score in Ehsani.);
Goyal teaches to determine a bias score from a logarithmic probability associated with each word candidate, wherein the logarithmic probability for each word candidate is determined from a class-based language model (Goyal Paragraph 54 - A next character representation 76 from the set is then sampled from a probability distribution over the input characters which is based on the weights (normalized to sum to 1). This biases the current cell to focus on a region of the input dialog act that the current output is most related to, so that the next input character (representation) to a cell of the decoder RNN has a higher probability to be selected from that region, rather than randomly from the bag of character representations.);
Goyal teaches a decoder neural network configured to determine the textual representation of the spoken word using n-gram scores associated with the word candidates (Goyal Paragraph 54 - In the exemplary embodiment, an attention mechanism 114 focuses the attention of the decoder on the character representation(s) 76 most likely to be the next to be input to the decoder RNN 102. In particular, the vector which is the current hidden state h_{t−1} of the decoder cell 104, 106, 108 is compared to the character representations 76 (vectors of the same size as the hidden state) and the character representations 76 are accorded weights as a function of their similarity (affinity).).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).
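
For illustration only, the parse selection described in the Ehsani passage cited for the base score (summing the log probabilities of each bigram in a candidate parse and keeping the parse with the highest total) may be sketched as follows. This is a minimal sketch and not code from either reference: the phrase table, its probability values, the `<s>` start marker, the floor value for unseen bigrams, and the names `parse_score` and `best_parse` are all assumptions introduced here.

```python
import math

# Hypothetical phrase-bigram log probabilities (illustrative values only,
# not taken from Ehsani). "<s>" is an assumed sentence-start marker.
BIGRAM_LOGPROB = {
    ("<s>", "i want"): math.log(0.4),
    ("i want", "a direct flight"): math.log(0.5),
    ("<s>", "i"): math.log(0.3),
    ("i", "want a"): math.log(0.1),
    ("want a", "direct"): math.log(0.2),
    ("direct", "flight"): math.log(0.6),
}

# Assumed floor for bigrams missing from the table.
UNSEEN_LOGPROB = math.log(1e-6)


def parse_score(phrases):
    """Sum the bigram log probabilities along one candidate parse."""
    sequence = ["<s>"] + list(phrases)
    return sum(
        BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
        for pair in zip(sequence, sequence[1:])
    )


def best_parse(candidate_parses):
    """Return the candidate parse with the highest summed log probability."""
    return max(candidate_parses, key=parse_score)


parse_1 = ["i want", "a direct flight"]
parse_2 = ["i", "want a", "direct", "flight"]
selected = best_parse([parse_1, parse_2])
```

Summing log probabilities is equivalent to multiplying the underlying probabilities, which is the alternative the cited passage itself mentions; the sum is generally preferred in practice because it avoids numerical underflow on long sequences.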

Regarding Claim 2, Ehsani and Goyal teach all of the limitations of claim 1. Ehsani and Goyal also teach that the language model is configured to increase the base score with the bias score for each word candidate that is an out-of-vocabulary word, wherein the out-of-vocabulary word is a word that is not included in a vocabulary used to train the language model (Ehsani Column 14 Lines 14-27 - A significant advantage of using a language modeling technique to iteratively refine corpus segmentation is that this technique allows us to identify new phrases and collocations and thereby enlarge our initial phrase dictionary. A language model based corpus segmentation assigns probabilities not only to phrases contained in the dictionary, but to unseen phrases as well (phrases not included in the dictionary). Recurring unseen phrases encountered in the parses with the highest unigram probability score are likely to be significant fixed phrases rather than just random word sequences. By keeping track of unseen phrases and selecting recurring phrases with the highest unigram probabilities, we identify new collocations that can be added to the dictionary. Words not included in the dictionary are the out-of-vocabulary words, and it is shown that the base score, combined with the bias score (which is taught in Goyal), increases when a word not in the vocabulary is present.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 3, Ehsani and Goyal teach all of the limitations of claim 1. Ehsani and Goyal also teach that the language model is configured to decrease the base score with the bias score for each word candidate that is a vocabulary word, wherein the vocabulary word is a word that is included in a vocabulary used to train the language model (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. As the word candidate is learned by the language model, the base score, combined with the bias score (which is taught in Goyal), is shown to decrease over the iterations.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 4, Ehsani and Goyal teach all of the limitations of claim 1. Ehsani also teaches building the class-based language model using a second vocabulary of words, wherein the class-based language model has a plurality of non-overlapping classes of words according to n-gram statistics for the words in the second vocabulary of words (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. C2 is the second vocabulary of words.);
determine a logarithmic probability for each word in the second vocabulary of words from the class-based language model, wherein the logarithmic probability is based on a class in the non-overlapping classes (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. The unigram probability score is found with C2. The unigram probability score includes the logarithmic probability.);
and store the logarithmic probability for each word in a memory accessible to the language model (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. This probability is stored through iterations and the word is eventually saved into the dictionary.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 5, Ehsani and Goyal teach all the limitations of claim 1. Ehsani also teaches that an acoustic model is configured to receive the spoken word and determine the word candidates for the spoken word (Ehsani Column 18 Lines 54-67 - The operation of a voice-interactive application entails processing acoustic, syntactic, semantic, and pragmatic information derived from a user's voice input in such a way as to generate a desired response from the application. This process is controlled by the interaction of at least five separate but interrelated components (see FIG. 6): 1. a speech recognition front-end consisting of: (a) an acoustic signal analyzer, (b) a decoder, (c) phone models, (d) a phonetic dictionary, and (e) a recognition grammar; 2. a Natural Language Understanding (NLU) component; 3. a Dialogue Finite State Machine; 4. an application Interface; and 5. a speech output back-end. There is an acoustic signal analyzer.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 6, Ehsani and Goyal teach all the limitations of claim 5. Ehsani also teaches that the acoustic model is further configured to determine an acoustic score for each word candidate in the word candidates (Ehsani Column 19 Lines 1-25 - The components enumerated above work together in the following manner: 1. When a speech signal is received through a microphone or telephone hand-set, its acoustic features are analyzed by the acoustic signal decoder (a) and a set n of the most probable word hypotheses are computed based on the acoustic information contained in the signal, and the phonetic transcriptions contained in the dictionary (d). The dictionary is a word list that maps the vocabulary specified in the recognition grammar (e) to their phonetic transcriptions. The recognition grammar (e) defines legitimate user responses including their linguistic variants and thus tells the system what commands to expect at each point in a given interaction. Because the grammar specifies only legitimate word sequences, it narrows down the hypotheses generated by the acoustic signal analyzer to a limited number of possible commands that are can be recognized by the system at any given point. The result of the front-end processing is a transcription of the speech input. 2. The Natural Language Understanding component (component 2) extracts the meaning of the transcribed speech input and translates the utterances specified in the recognition grammar into a formalized set of instructions that can be processed by the application.);
Goyal further teaches that the decoder neural network is further configured to determine the textual representation of the spoken word using acoustic scores associated with the word candidates (Goyal Paragraph 54 - In the exemplary embodiment, an attention mechanism 114 focuses the attention of the decoder on the character representation(s) 76 most likely to be the next to be input to the decoder RNN 102. In particular, the vector which is the current hidden state h_{t−1} of the decoder cell 104, 106, 108 is compared to the character representations 76 (vectors of the same size as the hidden state) and the character representations 76 are accorded weights as a function of their similarity (affinity).).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).
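
For illustration only, the attention step described in the cited Goyal Paragraph 54 (comparing the decoder's current hidden state to each character representation, according weights as a function of similarity, normalizing the weights to sum to 1, and sampling the next input from the resulting distribution) may be sketched as follows. This is a minimal sketch and not Goyal's implementation: the use of a dot product as the similarity function, the softmax normalization, the example vectors, and the names `attention_weights` and `sample_next_index` are assumptions introduced here.

```python
import math
import random


def attention_weights(hidden_state, char_representations):
    """Weight each character representation by its similarity to the hidden state.

    Similarity is assumed to be a dot product, and the weights are normalized
    with a softmax so that they sum to 1.
    """
    scores = [
        sum(h * c for h, c in zip(hidden_state, representation))
        for representation in char_representations
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exponentials = [math.exp(score - peak) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def sample_next_index(weights, rng):
    """Sample the index of the next input representation from the weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Hypothetical two-dimensional vectors, chosen only to make the example small.
hidden_state = [1.0, 0.0]
char_representations = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]

weights = attention_weights(hidden_state, char_representations)
next_index = sample_next_index(weights, random.Random(0))
```

Because the next input is sampled rather than chosen deterministically, representations similar to the hidden state are more likely to be selected but are not guaranteed to be, which matches the "higher probability" wording of the cited paragraph.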

Regarding Claim 7, Ehsani and Goyal teach all of the limitations of claim 1. Ehsani and Goyal also teach that the bias score does not depend on previous words received by the ASR system and the base score depends on the previous words received by the ASR system (Ehsani Column 14 Lines 29-44, Goyal Paragraph 35 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. The iterations continue until there is little change in the unigram probability score. This is interpreted to mean that, throughout the iterations, the base score changes based on the change in words perceived by the ASR system. Goyal Paragraph 35 teaches that in generating the hierarchical model 50, a target vocabulary 63 of words is built from the reference sequences 62 in the collection 56 and may include all the words found in the reference sequences 62, or a subset of the most common ones (e.g., excluding named entities). A set of at least 100, such up to 10,000 or up to 1000 common words may be used. The vocabulary 63 is used to bias the model 50 towards outputting known words. This shows that the bias score is based on the existing words in the database, not the new ones perceived by the ASR system.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 8, Ehsani teaches a method for determining a textual representation of a word spoken in a natural language, the method comprising: for each word candidate in word candidates that correspond to a spoken word: determining, using a language model stored in memory and executed on a processor (Ehsani Column 20 Lines 6-17 - The phrase thesaurus described above can be implemented as part of a computer system that can be used to automatically generate complex recognition grammar for speech recognition systems. The recognition grammar can then be used with an interactive user interface that is responsive to spoken input (voice input). The recognition grammar enables interpretation of the spoken input to the user interface. The system combines call-flow design, network expansion, and grammar compilation into a single development tool. The thesaurus forms the key element of this system, but in order to function in the manner desired, it must be integrated and work together with a number of other system components.),
a base score (Ehsani Column 13 Lines 18-45 - The general idea is to implement an N-gram phrase-based language model (a language model that uses phrases rather than single words as the basis for n-gram modeling), in order to calculate the best parse of a sentence. Note that some words may act as phrases as can be seen in Sentence 3 (e.g. the word "direct" in the above example). Given these log probabilities, we can calculate the best phrase-based parse through a sentence by multiplying the probabilities (or summing the log probabilities) of each of the bigrams for each possible parse. We select the parse with the highest overall likelihood as the best parse (in this case, Parse 1). Parse 1 is interpreted to be the base score.);
Ehsani and Goyal teach combining the base score and the bias score into an n-gram score (Ehsani Column 14 Lines 14-27, Goyal Paragraph 54 - A significant advantage of using a language modeling technique to iteratively refine corpus segmentation is that this technique allows us to identify new phrases and collocations and thereby enlarge our initial phrase dictionary. A language model based corpus segmentation assigns probabilities not only to phrases contained in the dictionary, but to unseen phrases as well (phrases not included in the dictionary). Recurring unseen phrases encountered in the parses with the highest unigram probability score are likely to be significant fixed phrases rather than just random word sequences. By keeping track of unseen phrases and selecting recurring phrases with the highest unigram probabilities, we identify new collocations that can be added to the dictionary. The highest unigram probability score is interpreted as the n-gram score. For the bias score, Goyal teaches that a next character representation 76 from the set is then sampled from a probability distribution over the input characters which is based on the weights (normalized to sum to 1). This biases the current cell to focus on a region of the input dialog act that the current output is most related to, so that the next input character (representation) to a cell of the decoder RNN has a higher probability to be selected from that region, rather than randomly from the bag of character representations. This would be implemented in the phrase dictionary used in the highest unigram probability score in Ehsani.);
Goyal teaches to determine a bias score from a logarithmic probability associated with each word candidate, wherein the logarithmic probability for each word candidate is determined from a class-based language model (Goyal Paragraph 54 - A next character representation 76 from the set is then sampled from a probability distribution over the input characters which is based on the weights (normalized to sum to 1). This biases the current cell to focus on a region of the input dialog act that the current output is most related to, so that the next input character (representation) to a cell of the decoder RNN has a higher probability to be selected from that region, rather than randomly from the bag of character representations.);
Goyal also teaches a decoder neural network configured to determine the textual representation of the spoken word using n-gram scores associated with the word candidates (Goyal Paragraph 54 - In the exemplary embodiment, an attention mechanism 114 focuses the attention of the decoder on the character representation(s) 76 most likely to be the next to be input to the decoder RNN 102. In particular, the vector which is the current hidden state h.sub.t−1 of the decoder cell 104, 106, 108 is compared to the character representations 76 (vectors of the same size as the hidden state) and the character representations 76 are accorded weights as a function of their similarity (affinity).).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).
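
For illustration only, the parse selection described in the cited passage of Ehsani (Column 13 Lines 18-45), in which the log probabilities of the phrase bigrams are summed for each possible parse and the parse with the highest overall likelihood is selected, may be sketched as follows. The probability values, the floor value for unseen bigrams, and the names BIGRAM_LOGP, UNSEEN_LOGP, parse_score and best_parse are hypothetical assumptions made for this sketch and do not appear in Ehsani or in the claims.

```python
import math

# Hypothetical phrase-bigram log probabilities (illustrative values only).
BIGRAM_LOGP = {
    ("<s>", "i want"): math.log(0.2),
    ("i want", "to fly"): math.log(0.5),
    ("to fly", "direct"): math.log(0.4),
    ("<s>", "i"): math.log(0.1),
    ("i", "want to"): math.log(0.1),
    ("want to", "fly direct"): math.log(0.05),
}

# Assumed floor for bigrams that were never observed.
UNSEEN_LOGP = math.log(1e-6)


def parse_score(phrases):
    """Sum the bigram log probabilities along one candidate parse."""
    seq = ["<s>"] + list(phrases)
    return sum(BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(seq, seq[1:]))


def best_parse(parses):
    """Return the candidate parse with the highest summed log probability."""
    return max(parses, key=parse_score)
```

Summing log probabilities is equivalent to multiplying the probabilities themselves and avoids numerical underflow on long utterances.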

Regarding Claim 9, Ehsani and Goyal teach all of the limitations of claim 8. Ehsani and Goyal also teach increasing, using the language model, the base score with the bias score for each word candidate that is an out-of-vocabulary word, wherein the out-of-vocabulary word is a word that is not included in a vocabulary used to train the language model (Ehsani Column 14 Lines 14-27 - A significant advantage of using a language modeling technique to iteratively refine corpus segmentation is that this technique allows us to identify new phrases and collocations and thereby enlarge our initial phrase dictionary. A language model based corpus segmentation assigns probabilities not only to phrases contained in the dictionary, but to unseen phrases as well (phrases not included in the dictionary). Recurring unseen phrases encountered in the parses with the highest unigram probability score are likely to be significant fixed phrases rather than just random word sequences. By keeping track of unseen phrases and selecting recurring phrases with the highest unigram probabilities, we identify new collocations that can be added to the dictionary. Words not included in the dictionary are the out-of-vocabulary words, and the base score, combined with the bias score (which is taught in Goyal), is shown to increase when a word not in the vocabulary is present.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 10, Ehsani and Goyal teach all of the limitations of claim 8. Ehsani and Goyal also teach decreasing the base score with the bias score for each word candidate that is a vocabulary word, wherein the vocabulary word is a word that is included in a vocabulary used to train the language model (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. As the word candidate is learned by the language model, the base score, combined with the bias score (which is taught in Goyal), is shown to decrease over the iterations.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 11, Ehsani and Goyal teach all of the limitations of claim 8. Ehsani also teaches, prior to determining the textual representation for the spoken word, building the class-based language model using a second vocabulary of words, wherein the class-based language model has a plurality of non-overlapping classes of words according to n-gram statistics for the words in the second vocabulary of words (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. C2 is interpreted as the second vocabulary of words.);
determining a logarithmic probability for each word in the second vocabulary of words using the non-overlapping classes in the class-based language model (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. The unigram probability score is found with C2. The unigram probability score includes the logarithmic probability.);
storing the logarithmic probability for each word in a memory accessible to the language model (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. This probability is stored through iterations and the word is eventually saved into the dictionary.);
and deleting the class-based language model (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. The class-based language model initially built with C1 and C2 is rewritten over the iterations. Because the model is rewritten, it is interpreted as being deleted.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).
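
For illustration only, determining and storing a logarithmic probability for each word of a vocabulary, as mapped above to the unigram probability scores of Ehsani, may be sketched as follows. The sketch assumes a plain relative-frequency estimate over tokenized utterances; Ehsani does not specify this estimator, and the function name unigram_log_probs and the in-memory dictionary are hypothetical.

```python
import math
from collections import Counter


def unigram_log_probs(utterances):
    """Return a dict mapping each token to its log relative frequency.

    `utterances` is an iterable of token lists. The relative-frequency
    estimate is an assumption made for this sketch only.
    """
    counts = Counter(tok for utt in utterances for tok in utt)
    total = sum(counts.values())
    return {tok: math.log(count / total) for tok, count in counts.items()}
```

The returned dictionary plays the role of the stored logarithmic probabilities; once it has been built, the counts used to produce it are no longer needed.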

Regarding Claim 12, Ehsani and Goyal teach all of the limitations of claim 8. Ehsani also teaches receiving, at an acoustic model, the spoken word and determining the word candidates for the spoken word (Ehsani Column 18 Lines 54-67 - The operation of a voice-interactive application entails processing acoustic, syntactic, semantic, and pragmatic information derived from a user's voice input in such a way as to generate a desired response from the application. This process is controlled by the interaction of at least five separate but interrelated components (see FIG. 6): 1. a speech recognition front-end consisting of: (a) an acoustic signal analyzer, (b) a decoder, (c) phone models, (d) a phonetic dictionary, and (e) a recognition grammar; 2. a Natural Language Understanding (NLU) component; 3. a Dialogue Finite State Machine; 4. an application Interface; and 5. a speech output back-end. The acoustic signal analyzer is interpreted as the acoustic model.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 13, Ehsani and Goyal teach all of the limitations of claim 8. Ehsani also teaches determining, using an acoustic model, an acoustic score for each word candidate in the word candidates (Ehsani Column 19 Lines 1-25 - The components enumerated above work together in the following manner: 1. When a speech signal is received through a microphone or telephone hand-set, its acoustic features are analyzed by the acoustic signal decoder (a) and a set n of the most probable word hypotheses are computed based on the acoustic information contained in the signal, and the phonetic transcriptions contained in the dictionary (d). The dictionary is a word list that maps the vocabulary specified in the recognition grammar (e) to their phonetic transcriptions. The recognition grammar (e) defines legitimate user responses including their linguistic variants and thus tells the system what commands to expect at each point in a given interaction. Because the grammar specifies only legitimate word sequences, it narrows down the hypotheses generated by the acoustic signal analyzer to a limited number of possible commands that are can be recognized by the system at any given point. The result of the front-end processing is a transcription of the speech input. 2. The Natural Language Understanding component (component 2) extracts the meaning of the transcribed speech input and translates the utterances specified in the recognition grammar into a formalized set of instructions that can be processed by the application.);
and Goyal teaches determining, using the neural network decoder, the textual representation of the spoken word using acoustic scores associated with the word candidates (Goyal Paragraph 54 - In the exemplary embodiment, an attention mechanism 114 focuses the attention of the decoder on the character representation(s) 76 most likely to be the next to be input to the decoder RNN 102. In particular, the vector which is the current hidden state h.sub.t−1 of the decoder cell 104, 106, 108 is compared to the character representations 76 (vectors of the same size as the hidden state) and the character representations 76 are accorded weights as a function of their similarity (affinity).).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).
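
For illustration only, the attention weighting described in the cited Goyal Paragraph 54, in which the character representations are accorded weights as a function of their similarity to the current hidden state, the weights are normalized to sum to 1, and the next character representation is sampled from the resulting distribution, may be sketched as follows. Goyal does not specify the similarity function in the quoted passage; the dot product and softmax normalization used here, and the names attention_weights and sample_next, are assumptions made for this sketch.

```python
import math
import random


def attention_weights(hidden, char_reps):
    """Weight each character representation by its similarity to `hidden`.

    Similarity is an assumed dot product; a softmax normalizes the
    weights so that they sum to 1.
    """
    scores = [sum(h * c for h, c in zip(hidden, rep)) for rep in char_reps]
    peak = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(hidden, char_reps, rng=random):
    """Sample the index of the next character representation by weight."""
    weights = attention_weights(hidden, char_reps)
    return rng.choices(range(len(char_reps)), weights=weights, k=1)[0]
```

Representations most similar to the hidden state receive the largest weights and are therefore the most likely to be sampled, rather than being drawn uniformly at random.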

Regarding the first instance of Claim 14, Ehsani and Goyal teach all of the limitations of claim 8. Ehsani and Goyal also teach that the bias score does not depend on previous words received by an ASR system and the base score depends on the previous words received by the ASR system (Ehsani Column 14 Lines 29-44, Goyal Paragraph 35 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. The iterations continue until there is little change in the unigram probability score. This is interpreted to mean that, throughout the iterations, the base score changes based on the change in words perceived by the ASR system. Goyal Paragraph 35 teaches that in generating the hierarchical model 50, a target vocabulary 63 of words is built from the reference sequences 62 in the collection 56 and may include all the words found in the reference sequences 62, or a subset of the most common ones (e.g., excluding named entities). A set of at least 100, such up to 10,000 or up to 1000 common words may be used. The vocabulary 63 is used to bias the model 50 towards outputting known words. This shows that the bias score is based on the existing words in the database, not the new ones perceived by the ASR system.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).
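The distinction drawn in the first instance of Claim 14, between a bias score that ignores previously received words and a base score that depends on them, can be illustrated with a minimal sketch. This is not the method of Ehsani, Goyal, or the application; the toy corpus, the function names `bias_score` and `base_score`, and the add-one smoothing are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy corpus; not taken from any cited reference.
corpus = "book a direct flight book a hotel".split()
unigram = Counter(corpus)
bigram = Counter(zip(corpus, corpus[1:]))
total = len(corpus)
vocab_size = len(unigram)

def bias_score(word):
    """Log unigram probability: independent of any previous word."""
    return math.log(unigram[word] / total)

def base_score(word, previous):
    """Log bigram probability with add-one smoothing (an assumption):
    the value changes with the previous word."""
    return math.log((bigram[(previous, word)] + 1) / (unigram[previous] + vocab_size))

# The bias score for "hotel" is the same whatever came before it,
# while the base score differs between the histories "a" and "book".
print(bias_score("hotel"))
print(base_score("hotel", "a"), base_score("hotel", "book"))
```

In this sketch the base score for "hotel" is higher after "a" than after "book" because only the first pair occurs in the toy corpus, whereas the bias score takes no history argument at all.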

Regarding the second instance of Claim 14, Ehsani teaches an automatic speech recognition (ASR) system that determines a textual representation of a word spoken in a natural language, the system comprising: build a class-based language model using a vocabulary of words, wherein the class-based language model has a plurality of non-overlapping classes of words according to n-gram statistics for the words in the vocabulary of words (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. C2 is the second vocabulary of words.);
determine a logarithmic probability for each word in the vocabulary of words using the non-overlapping classes in the class-based language model (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. The unigram probability score is found with C2. The unigram probability score includes the logarithmic probability.);
a language model configured to determine a base score (Ehsani Column 13 Lines 18-45 - The general idea is to implement an N-gram phrase-based language model (a language model that uses phrases rather than single words as the basis for n-gram modeling), in order to calculate the best parse of a sentence. Note that some words may act as phrases as can be seen in Sentence 3 (e.g. the word "direct" in the above example). Given these log probabilities, we can calculate the best phrase-based parse through a sentence by multiplying the probabilities (or summing the log probabilities) of each of the bigrams for each possible parse. We select the parse with the highest overall likelihood as the best parse (in this case, Parse 1). Parse 1 is interpreted to be the base score.).
Goyal further teaches a bias score for at least one word candidate for a spoken word, wherein the bias score is based on a logarithmic probability for the at least one word candidate (Goyal Paragraph 54 - A next character representation 76 from the set is then sampled from a probability distribution over the input characters which is based on the weights (normalized to sum to 1). This biases the current cell to focus on a region of the input dialog act that the current output is most related to, so that the next input character (representation) to a cell of the decoder RNN has a higher probability to be selected from that region, rather than randomly from the bag of character representations.);
and a decoder configured to convert the at least one word candidate into a textual representation of the spoken word based on a combination of the base score and the bias score (Goyal Paragraph 54 - In the exemplary embodiment, an attention mechanism 114 focuses the attention of the decoder on the character representation(s) 76 most likely to be the next to be input to the decoder RNN 102. In particular, the vector which is the current hidden state h.sub.t−1 of the decoder cell 104, 106, 108 is compared to the character representations 76 (vectors of the same size as the hidden state) and the character representations 76 are accorded weights as a function of their similarity (affinity).).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).
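The parse-selection passage quoted from Ehsani Column 13 in the rejection above (summing the log probabilities of the bigrams for each possible parse and selecting the parse with the highest overall likelihood) can be sketched as follows. The phrase segmentations, the probability values, the sentence-start token, and the fallback value for unseen bigrams are hypothetical and are not taken from Ehsani.

```python
import math

def parse_log_score(phrases, bigram_logprob, unseen_logprob=math.log(1e-6)):
    """Sum the log probabilities of consecutive phrase pairs in one parse.
    unseen_logprob is an assumed fallback for pairs missing from the table."""
    tokens = ["<s>"] + list(phrases)
    return sum(
        bigram_logprob.get((left, right), unseen_logprob)
        for left, right in zip(tokens, tokens[1:])
    )

def best_parse(parses, bigram_logprob):
    """Return the candidate parse with the highest summed log probability."""
    return max(parses, key=lambda p: parse_log_score(p, bigram_logprob))

# Hypothetical phrase-bigram probabilities, for illustration only.
bigram_logprob = {
    ("<s>", "i need"): math.log(0.5),
    ("i need", "a direct flight"): math.log(0.4),
    ("<s>", "i"): math.log(0.3),
    ("i", "need a"): math.log(0.1),
    ("need a", "direct"): math.log(0.2),
    ("direct", "flight"): math.log(0.5),
}

parse_1 = ["i need", "a direct flight"]
parse_2 = ["i", "need a", "direct", "flight"]
chosen = best_parse([parse_1, parse_2], bigram_logprob)
print(chosen)
```

Summing log probabilities is equivalent to multiplying the probabilities, as the quoted passage notes, and avoids numerical underflow on long sentences.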

Regarding Claim 15, Ehsani and Goyal teach all of the limitations of claim 14. Ehsani and Goyal also teach that the language model is configured to increase the base score with the bias score for the at least one word candidate that is an out-of-vocabulary word, wherein the out-of-vocabulary word is a word that is not included in a vocabulary used to train the language model (Ehsani Column 14 Lines 14-27 - A significant advantage of using a language modeling technique to iteratively refine corpus segmentation is that this technique allows us to identify new phrases and collocations and thereby enlarge our initial phrase dictionary. A language model based corpus segmentation assigns probabilities not only to phrases contained in the dictionary, but to unseen phrases as well (phrases not included in the dictionary). Recurring unseen phrases encountered in the parses with the highest unigram probability score are likely to be significant fixed phrases rather than just random word sequences. By keeping track of unseen phrases and selecting recurring phrases with the highest unigram probabilities, we identify new collocations that can be added to the dictionary. Words not included in the dictionary are the out-of-vocabulary words, and it is shown that the base score, with the bias score (which is taught in Goyal), increases when a word not in the vocabulary is present.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 16, Ehsani and Goyal teach all of the limitations of claim 14. Ehsani and Goyal also teach that the language model is configured to decrease the base score with the bias score for the at least one word candidate that is a vocabulary word, wherein the vocabulary word is a word that is included in a vocabulary used to train the language model (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. As the word candidate is learned by the language model, the base score, with the bias score (which is taught in Goyal), is shown to decrease over the iterations.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 17, Ehsani and Goyal teach all of the limitations of claim 14. Ehsani also teaches deleting the class-based language model after the logarithmic probability for each word in the vocabulary of words has been determined (Ehsani Column 14 Lines 29-44 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. The class-based language model built at the beginning with C1 and C2 is rewritten. Because it is rewritten, it is interpreted as being deleted.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 18, Ehsani and Goyal teach all of the limitations of claim 14. Ehsani also teaches an acoustic model configured to receive the spoken word and determine the at least one word candidate for the spoken word and an acoustic score for the at least one word candidate (Ehsani Column 18 Lines 54-67 - The operation of a voice-interactive application entails processing acoustic, syntactic, semantic, and pragmatic information derived from a user's voice input in such a way as to generate a desired response from the application. This process is controlled by the interaction of at least five separate but interrelated components (see FIG. 6): 1. a speech recognition front-end consisting of: (a) an acoustic signal analyzer, (b) a decoder, (c) phone models, (d) a phonetic dictionary, and (e) a recognition grammar; 2. a Natural Language Understanding (NLU) component; 3. a Dialogue Finite State Machine; 4. an application Interface; and 5. a speech output back-end. The speech recognition front-end includes an acoustic signal analyzer.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Regarding Claim 19, Ehsani and Goyal teach all of the limitations of claim 14. Ehsani and Goyal also teach that the decoder neural network is further configured to determine the textual representation of the spoken word using acoustic scores associated with the at least one word candidate and the combination of the base score and the bias score (Ehsani Column 14 Lines 14-27, Goyal Paragraph 54 - A significant advantage of using a language modeling technique to iteratively refine corpus segmentation is that this technique allows us to identify new phrases and collocations and thereby enlarge our initial phrase dictionary. A language model based corpus segmentation assigns probabilities not only to phrases contained in the dictionary, but to unseen phrases as well (phrases not included in the dictionary). Recurring unseen phrases encountered in the parses with the highest unigram probability score are likely to be significant fixed phrases rather than just random word sequences. By keeping track of unseen phrases and selecting recurring phrases with the highest unigram probabilities, we identify new collocations that can be added to the dictionary. The highest unigram probability score is interpreted as the n-gram score. For the bias score, Goyal teaches that a next character representation 76 from the set is then sampled from a probability distribution over the input characters which is based on the weights (normalized to sum to 1). This biases the current cell to focus on a region of the input dialog act that the current output is most related to, so that the next input character (representation) to a cell of the decoder RNN has a higher probability to be selected from that region, rather than randomly from the bag of character representations. This would be implemented in the phrase dictionary used for the highest unigram probability score in Ehsani.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).
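The limitation addressed for Claim 19, selecting a textual representation using acoustic scores together with the combination of the base score and the bias score, can be pictured with a short sketch. The additive log-domain combination, the `bias_weight` parameter, the function names, and the candidate scores are all assumptions for illustration; neither the claim language quoted above nor the cited references specify this formula.

```python
def combined_score(acoustic, base, bias, bias_weight=1.0):
    """Assumed combination: add the log-domain scores, scaling the bias term."""
    return acoustic + base + bias_weight * bias

def pick_candidate(candidates, bias_weight=1.0):
    """Return the word whose combined score is highest.
    candidates maps word -> (acoustic, base, bias) log scores."""
    return max(
        candidates,
        key=lambda word: combined_score(*candidates[word], bias_weight=bias_weight),
    )

# Hypothetical word candidates for one spoken word.
candidates = {
    "flight": (-2.0, -1.5, -0.5),
    "fright": (-1.8, -4.0, -3.0),
}
print(pick_candidate(candidates))
```

Here "fright" has the slightly better acoustic score, but the language-model terms outweigh it, which is the kind of interaction between acoustic and language scores that the limitation describes.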

Regarding Claim 20, Ehsani and Goyal teach all of the limitations of claim 14. Ehsani and Goyal also teach that the bias score does not depend on previous words processed by the ASR system and the base score depends on the previous words processed by the ASR system (Ehsani Column 14 Lines 29-44, Goyal Paragraph 35 - In the first case, we start a unigram language model, and use this model to segment sub-corpus C2. The segmented sub-corpus C2 is subsequently used to build a new, improved unigram language model on the initial sub-corpus C1. We iterate the procedure until we see little change in the unigram probability scores. At this point we switch to a bigram language model (based on phrase pairs) and reiterate the language modeling process until we see very little change. Then we use a tri-gram model (based on sequences of three phrases) and reiterate the procedure again until we see little changes in the segmentation statistics and few new, unseen phrases. At this point, our dictionary contains a large number of plausible phrase candidates and we have obtained a fairly good parse through each utterance. The iterations continue until there is little change in the unigram probability score. This is interpreted to mean that, throughout the iterations, the base score changes based on the change in the words perceived by the ASR system. Goyal Paragraph 35 teaches that in generating the hierarchical model 50, a target vocabulary 63 of words is built from the reference sequences 62 in the collection 56 and may include all the words found in the reference sequences 62, or a subset of the most common ones (e.g., excluding named entities). A set of at least 100, such as up to 10,000 or up to 1000 common words may be used. The vocabulary 63 is used to bias the model 50 towards outputting known words. This shows that the bias score is based on the existing words in the database, not the new ones perceived by the ASR system.).
Ehsani and Goyal are both considered to be analogous to the claimed invention because both relate to natural language processing to better understand the user. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Ehsani on how to more effectively determine the user input with including a bias score and a decoder neural network (Goyal Paragraph 7 and 17 - In accordance with one aspect of the exemplary embodiment, a method is provided for generating a system for generation of a target sequence of characters from an input semantic representation. The method includes providing training data which includes training pairs, each training pair including a semantic representation and a corresponding reference sequence in a natural language. A target background model is built using words occurring in the training data, the target background model being adaptable to accept subsequences of an input semantic representation. The target background model is incorporated into a hierarchical model which includes an encoder and a decoder, the encoder and decoder each operating at the character level. The background model, when adapted to accept subsequences of an input semantic representation, biases the decoder towards outputting a target character sequence including at least one of: words occurring in the training data, and subsequences of the input semantic representation. The hierarchical model is trained on the training pairs to output a target sequence from an input semantic representation. [This method overcomes prior art] drawbacks in considering morphological variance associated with a language, where essential information is lost during the de-lexicalization process. Also, this approach suffers from coordination issues when multiple occurrences of the same type of information exist in the dialog act. 
[The prior art] model would therefore need to include categories within a placeholder to learn the interaction between them. [In prior art models,] unknown words also pose problems.).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Bennet (US 7831426 B2) and NPL - M. Padmanabhan and M. Picheny, "Large-vocabulary speech recognition algorithms," in Computer, vol. 35, no. 4, pp. 42-50, March 2002, doi: 10.1109/MC.2002.993770. (Year: 2002).
Bennet (US 7831426 B2) discloses an invention in which “a network based interactive speech system responds in real-time to speech-based queries addressed to a set of topic entries” (Bennet – Abstract).
M. Padmanabhan and M. Picheny, "Large-vocabulary speech recognition algorithms," in Computer, vol. 35, no. 4, pp. 42-50, March 2002, doi: 10.1109/MC.2002.993770. (Year: 2002) teaches “next-generation speech recognition applications … that match human performance levels” (Padmanabhan – Abstract).
Please see the additional references listed in form PTO-892 for more details.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to UTHEJ KUNAMNENI whose telephone number is (571)272-5428. The examiner can normally be reached M-F 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/UTHEJ KUNAMNENI/               Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/               Primary Examiner, Art Unit 2656