DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Introduction
This office action is in response to Applicant’s submission filed on 08/06/2020. Claims 9-28 are pending in the application. As such, Claims 9-28 have been examined.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/02/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections
Claims 9, 16 and 23 are objected to because of the following informalities:
“the utterance suitable for a predetermined purpose” on Line 18 of Claim 9. This lacks antecedent basis as it has not been previously established in the claim. It is recommended to change “the” to “an”.
“the utterance suitable for a predetermined purpose” on Lines 20-21 of Claim 16. This lacks antecedent basis as it has not been previously established in the claim. It is recommended to change “the” to “an”.

Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 9-28 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Hoffmeister(US 10332508 B1).

Regarding Claim 9:
Hoffmeister teaches a computer-implemented method for generating aspects of utterance(Fig 18, shows computer with processor, memory, storage, IO), 
the method comprising: receiving an input speech(Col 3, Ln 9-10, receiving a spoken user query), 
the input speech comprising a speech uttered by a speaker and a noise(Col 3, Ln 9-10, receiving a spoken user query. Col 5, Ln 36-38, speech from background noise. Therefore, noise is included in input); 

extracting an acoustic feature from the uttered speech(Col 5, Ln 30-35, signal to noise ratio. Also Col 15, Ln 19-23, For ASR processing…acoustic features….LFBE…MFCC or other features, are determined and used); 
generating a set of speech recognition results with recognition scores based on the uttered speech(Col 6, Ln 49-59, utterance assigned…confidence score); 
generating a set of speech-recognition-result word vector expressions and a set of speech-recognition-result part-of-speech vector expressions based on the set of speech recognition results with recognition scores(Col 15, Ln 34-45, input….the one hot is augmented….including but not limited to…word embeddings, POS tagger. Input is shown to be based off of set of results and scores as shown above in Col 6, Ln 49-59 citation); 
generating a target utterance estimation model based on the extracted acoustic feature, the generated set of speech recognition results with recognition scores, the generated set of speech-recognition-result word vector expressions and the generated set of speech-recognition-result part-of-speech vector expressions(Col 15, Ln 34-45, input….the one hot is augmented….including but not limited to…word embeddings, POS tagger. Col 6, Ln 59-64, each interpretation of the utterance is associated with a confidence score, ASR process outputs the most likely, the model is based on the ASR output. Col 5, Ln 29-35, signal to noise ratio is used to detect speech in input, Col 15, Ln 19-23, For ASR processing…acoustic features….LFBE…MFCC or other features, are determined and used); 
and providing, by the generated target utterance estimation model, a probability of the uttered speech detected from the input speech being the utterance suitable for a predetermined purpose(Col 16, Ln 28-29, the above techniques may be used to assign a confidence score to an ASR result. Col 4, Ln 32-33, executes command associated with NLU results).

Regarding Claim 10:
Hoffmeister teaches the method of claim 9, wherein the target utterance estimation model is based at least on a sequence of a combination of a recognition score of a word based at least on the set of speech recognition results with recognition scores(Col 14, Ln 10-15, data calculated by the ASR component during processing, such data may include probabilities associated with certain words, Col 6, Ln 59-64, each interpretation of the utterance is associated with a confidence score, ASR process outputs the most likely, the model is based on the ASR output. Also, Col 14, Ln 10-15, ASR component may be configured to output data calculated by the ASR component during processing…..use such data to confirm results of ASR), 
a word vector of the word based at least on the set of speech-recognition-result word vector expressions(Col 15, Ln 34-45, for NLU processing… one hot vector 
a part-of-speech vector of the word based on the set of speech-recognition-result part-of-speech vector expressions(Col 15, Ln 34-45, for NLU processing…one hot vector augmented….labels from a tagger e.g. part-of-speech), 
and an acoustic feature of the word based on the acoustic feature of the uttered speech(Col 15, Ln 19-33, For ASR processing…acoustic features….LFBE…MFCC or other features, are determined and used…Alignments can be provided at the level of senons, phones, or any other level suitable. Since NLU uses the ASR output which uses the acoustic feature, the model is based on it. Also, Col 14, Ln 10-15, ASR component may be configured to output data calculated by the ASR component during processing…..use such data to confirm results of ASR).

Regarding Claim 11:
Hoffmeister teaches the method of claim 9, the method further comprising: rejecting the input speech as a background noise based on the probability of the uttered speech from the input speech being the utterance suitable for a predetermined purpose(Col 16, Ln 43-45, if result is not correct or has confidence score below the threshold, the system requests the user to restate), 
wherein the predetermined purpose includes a spoken dialogue(Col 4, Ln 32-33, executes command associated with NLU results).

Regarding Claim 12:

the neural network processing time-series data(Col 15, Ln 34-35, for NLU processing the base input is typically text in the form of word sequences).

Regarding Claim 13:
Hoffmeister teaches the method of claim 9, the method further comprising: receiving, by the target utterance estimation model, a correct answer of the input speech for training the target utterance estimation model(Col 20, Ln 64-66, The classifier and encoders may be trained using samples of acoustic data with the annotated correct word sequence), 
the correct answer being the utterance in a spoken dialogue(Col 4, Ln 32-33, executes command associated with NLU results).

Regarding Claim 14:
Hoffmeister teaches the method of claim 9, wherein each of the recognition scores comprises a numerical value based on one or more of a confidence score of speech recognition, an acoustic score indicating a similarity between the acoustic feature of the input speech and a feature based on the acoustic model, and a language score indicating a degree of matching between the speech recognition results and a 

Regarding Claim 15:
Hoffmeister teaches the method of claim 9, wherein the set of speech-recognition-result word vector expressions comprises a vector generated for each word in the set of speech recognition results with a space between adjacent words based on a morphological analysis(Col 15, Ln 34-45, For NLU processing the input…word sequence…represented by series of one hot vectors…..word embeddings that represent how individual words are used in a text corpus. Col 14, Ln 25-28, encoding to project data points into a vector space…to determine how they relate to each other), 
and wherein the set of speech-recognition-result part-of-speech vector expressions comprises a vector generated for each part-of-speech for words in the set of speech recognition results(Col 15, Ln 34-45, For NLU processing the input…word sequence…represented by series of one hot vectors…augmented with….labels from a tagger e.g. part-of-speech tagger).

Regarding Claim 16:
Hoffmeister teaches a system comprising: a processor; and a memory storing computer executable instructions that when executed by the processor cause the 
receive an input speech(Col 3, Ln 9-10, receiving a spoken user query),
the input speech comprising a speech uttered by a speaker and a noise(Col 3, Ln 9-10, receiving a spoken user query. Col 5, Ln 36-38, speech from background noise. Therefore noise is included in input); 
detect, based the input speech, an utterance, the utterance corresponding to the speech uttered by the speaker(Col 5, Ln 27-32, determine whether speech is present in audio input); 
extract an acoustic feature from the uttered speech(Col 5, Ln 30-35, signal to noise ratio. Also Col 15, Ln 19-23, For ASR processing…acoustic features….LFBE…MFCC or other features, are determined and used); 
generate a set of speech recognition results with recognition scores based on the uttered speech(Col 6, Ln 49-59, utterance assigned…confidence score); 
generate a set of speech-recognition-result word vector expressions and a set of speech-recognition-result part-of-speech vector expressions based on the set of speech recognition results with recognition scores(Col 15, Ln 34-45, input….the one hot is augmented….including but not limited to…word embeddings, POS tagger. Input is shown to be based off of set of results and scores as shown above in Col 6, Ln 49-59 citation); 
generate a target utterance estimation model based on the extracted acoustic feature, the generated set of speech recognition results with recognition scores, the generated set of speech-recognition-result word vector expressions and the generated 
and provide, by the generated target utterance estimation model, a probability of the uttered speech detected from the input speech being the utterance suitable for a predetermined purpose(Col 16, Ln 28-29, the above techniques may be used to assign a confidence score to an ASR result. Col 4, Ln 32-33, executes command associated with NLU results).

Regarding Claim 17:
Claim 17 contains similar limitations as claim 10 and is therefore rejected for the same reasons.

Regarding Claim 18:
Claim 18 contains similar limitations as claim 11 and is therefore rejected for the same reasons.

Regarding Claim 19:
Claim 19 contains similar limitations as claim 12 and is therefore rejected for the same reasons.

Regarding Claim 20:
Claim 20 contains similar limitations as claim 13 and is therefore rejected for the same reasons.

Regarding Claim 21:
Claim 21 contains similar limitations as claim 14 and is therefore rejected for the same reasons.

Regarding Claim 22:
Claim 22 contains similar limitations as claim 15 and is therefore rejected for the same reasons.

Regarding Claim 23:
Hoffmeister teaches a computer-readable non-transitory recording medium storing computer-executable instructions that when executed by a processor cause a computer system to(Col 30, Ln 17, computer-readable instructions. Col 30, Ln 24-27, may include RAM, ROM, MRAM. Fig 18, shows computer with processor, memory, storage): 
receive an input speech(Col 3, Ln 9-10, receiving a spoken user query), 

detect, based the input speech, an utterance, the utterance corresponding to the speech uttered by the speaker(Col 5, Ln 27-32, determine whether speech is present in audio input); 
extract an acoustic feature from the uttered speech(Col 5, Ln 30-35, signal to noise ratio. Also Col 15, Ln 19-23, For ASR processing…acoustic features….LFBE…MFCC or other features, are determined and used); 
generate a set of speech recognition results with recognition scores based on the uttered speech(Col 6, Ln 49-59, utterance assigned…confidence score); 
generate a set of speech-recognition-result word vector expressions and a set of speech-recognition-result part-of-speech vector expressions based on the set of speech recognition results with recognition scores(Col 15, Ln 34-45, input….the one hot is augmented….including but not limited to…word embeddings, POS tagger. Input is shown to be based off of set of results and scores as shown above in Col 6, Ln 49-59 citation); 
generate a target utterance estimation model based on the extracted acoustic feature, the generated set of speech recognition results with recognition scores, the generated set of speech-recognition-result word vector expressions and the generated set of speech-recognition-result part-of-speech vector expressions(Col 15, Ln 34-45, input….the one hot is augmented….including but not limited to…word embeddings, POS tagger. Col 6, Ln 59-64, each interpretation of the utterance is associated with a 
and provide, by the generated target utterance estimation model, a probability of the uttered speech detected from the input speech being the utterance suitable for a predetermined purpose(Col 16, Ln 28-29, the above techniques may be used to assign a confidence score to an ASR result. Col 4, Ln 32-33, executes command associated with NLU results).

Regarding Claim 24:
Claim 24 contains similar limitations as claim 10 and is therefore rejected for the same reasons.

Regarding Claim 25:
Claim 25 contains similar limitations as claim 11 and is therefore rejected for the same reasons.

Regarding Claim 26:
Claim 26 contains similar limitations as claim 12 and is therefore rejected for the same reasons.

Regarding Claim 27:
Claim 27 contains similar limitations as claim 13 and is therefore rejected for the same reasons.

Regarding Claim 28:
Claim 28 contains similar limitations as claim 15 and is therefore rejected for the same reasons.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
TONOZUKA EIJI et al.(JP 2009025518 A)
Reference included in search report. Uses probability and acoustic features to determine if input is misrecognized.
EBENEZER; Samuel et al.(US 20180102136 A1)
Uses acoustic features and neural networks to determine if input is speech or a noise.
CHOI H et al. (JP 2016134169 A)
Included in search report.
IWAMOTO T et al. (JP 2014191029 A)
Included in search report.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER G MARLOW whose telephone number is (571)272-4536. The examiner can normally be reached Monday - Thursday 10:00 am - 8:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ALEXANDER G MARLOW/Assistant Examiner, Art Unit 2658

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658