Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Examiner’s reason for Allowance
Claims 1-20 are allowed.
 (Claim 1) A data processing system comprising: 
a processor; 
and a memory in communication with the processor, 
the memory comprising executable instructions that, when executed by the processor, cause the data processing system to perform functions of:
 receiving a training data set, the training data set including speech audio data that is labeled to identify occurrences of one or more speech features at specific locations in the speech audio data,
 the one or more speech features being features that indicate one or more problems in a speaker's speech;
 performing preprocessing on the speech audio data to prepare the speech audio data for use in training one or more models that detect in an audio data only the one or more speech features that indicate the one or more problems;
 and providing the preprocessed speech audio data to a stacked machine learning (ML) model for training the stacked ML model to detect the one or more speech features, wherein:
 the stacked ML model includes a feature aggregation model, a sequence to sequence model, and a decision-making model, 
and the sequence to sequence model is trained to receive feature vectors from the feature aggregation model, identify one or more sequences, the one or more sequences including a previous state and a current state, and 
generate a prediction sequence score for at least one of the one or more identified sequences, 
the prediction sequence score representing a likelihood of the at least one of the one or more identified sequences being indicative of the at least one of the one or more speech features that indicate the one or more problems.
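For orientation only, the three-stage arrangement recited in claim 1 (feature aggregation, sequence scoring over a previous state and a current state, then decision-making) can be sketched as follows. This is a minimal illustration under stated assumptions and is not the applicant's implementation: every function name, the windowed averaging, the hand-set weights, and the 0.5 threshold are hypothetical, and a simple logistic scorer stands in for a trained sequence to sequence model.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def aggregate_features(frames: Sequence[Vector], window: int) -> List[Vector]:
    """Feature aggregation stage (assumed): average frame features per window."""
    aggregated = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]
        dims = len(chunk[0])
        aggregated.append(
            [sum(frame[d] for frame in chunk) / len(chunk) for d in range(dims)]
        )
    return aggregated


def score_sequences(vectors: Sequence[Vector], prev_weights: Vector,
                    cur_weights: Vector, bias: float) -> List[float]:
    """Sequence stage (toy stand-in): score each (previous, current) pair.

    A trained sequence to sequence model would learn these weights; here
    they are passed in by hand purely for illustration.
    """
    scores = []
    for prev, cur in zip(vectors, vectors[1:]):
        z = bias
        z += sum(w * x for w, x in zip(prev_weights, prev))
        z += sum(w * x for w, x in zip(cur_weights, cur))
        scores.append(1.0 / (1.0 + math.exp(-z)))  # logistic score in (0, 1)
    return scores


def decide(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Decision-making stage (assumed): flag sequences at or above threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


def run_stack(frames: Sequence[Vector], window: int, prev_weights: Vector,
              cur_weights: Vector, bias: float,
              threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Chain the three stages and return (scores, flagged sequence indices)."""
    vectors = aggregate_features(frames, window)
    scores = score_sequences(vectors, prev_weights, cur_weights, bias)
    return scores, decide(scores, threshold)
```

In this sketch each score plays the role of the prediction sequence score, and the flagged indices identify which previous-state and current-state pairs are treated as indicating a speech feature.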
The following is an examiner's statement of reasons for allowance: Regarding claim 1, the prior art of record, specifically Brouwer et al. (US Patent Application Publication #Osbourne et al. US20190340235), teaches that a method performed by a device may include identifying a plurality of samples of textual content; performing tokenization of the plurality of samples of textual content to generate a respective plurality of tokenized samples; determining groupings of attributes of the sample matrix using a convolutional neural network; determining context relationships between the groupings of attributes using a bidirectional long short term memory (LSTM) technique; a predicted label of the predicted labels from a plurality of labels based on respective scores of the particular sample with regard to the plurality of labels and providing information identifying the predicted labels (Paragraph 0003).
Galle et al. (US 2018/0011839) teaches that a probability component outputs a prediction for an input symbol being in a respective context in an input sequence, or a prediction for a sequence of symbols, using the gapped sequence model. The exemplary model uses relevant ones of the statistics and includes a back-off operator which applies a smoothing technique for providing symbol prediction for symbols of the input sequence for which the full context has not been observed (or is below a threshold) in combination with that symbol in the training set. An information generator may generate information based on the computed prediction, such as a prediction as to whether the input sequence is in a given language, one of a set of candidate sequences having the highest score (Paragraph 0034).
However, none of the prior art cited, alone or in combination, provides the motivation to teach the training data set including speech audio data that is labeled to identify occurrences of one or more speech features at specific locations in the speech audio data, the one or more speech features being features that indicate one or more problems in a speaker's speech;
performing preprocessing on the speech audio data to prepare the speech audio data for use in training one or more models that detect in an audio data only the one or more speech features that indicate the one or more problems;
 and providing the preprocessed speech audio data to a stacked machine learning (ML) model for training the stacked ML model to detect the one or more speech features.
Application No.: 16/710,442 Attorney Docket No. 407678-US-NP/170101-466US
(Claim 9) A data processing system comprising:
a processor; and a memory in communication with the processor,
the memory comprising executable instructions that, when executed by the processor, cause the data processing system to perform functions of:
receiving speech audio data that includes occurrences of one or more speech features at specific locations in the speech audio data,
the one or more speech features being features that indicate one or more problems in a speaker's speech; 
performing preprocessing on the speech audio data to prepare the speech audio data for use as an input into one or more models that only detect the one or more speech features that indicate the one or more problems; 
providing the preprocessed speech audio data to a stacked ML model; and analyzing the preprocessed speech audio data via the stacked ML model to detect the one or more problems, wherein:
 the stacked ML model includes a feature aggregation model, a sequence to sequence model, and a decision-making model, and 
the sequence to sequence model is trained to receive feature vectors from the feature aggregation model, 
identify one or more sequences, the one or more sequences including a previous state and a current state, and 
generate a prediction sequence score for at least one of the one or more identified sequences, 
the prediction sequence score representing a likelihood of the at least one of the one or more identified sequences being indicative of the at least one of the one or more speech features that indicate the one or more problems.
The following is an examiner's statement of reasons for allowance: Regarding claim 9, the prior art of record, specifically Brouwer et al. (US Patent Application Publication #Osbourne et al. US20190340235), teaches that a method performed by a device may include identifying a plurality of samples of textual content; performing tokenization of the plurality of samples of textual content to generate a respective plurality of tokenized samples; determining groupings of attributes of the sample matrix using a convolutional neural network; determining context relationships between the groupings of attributes using a bidirectional long short term memory (LSTM) technique; a predicted label of the predicted labels from a plurality of labels based on respective scores of the particular sample with regard to the plurality of labels and providing information identifying the predicted labels (Paragraph 0003).
Galle et al. (US 2018/0011839) teaches that a probability component outputs a prediction for an input symbol being in a respective context in an input sequence, or a prediction for a sequence of symbols, using the gapped sequence model. The exemplary model uses relevant ones of the statistics and includes a back-off operator which applies a smoothing technique for providing symbol prediction for symbols of the input sequence for which the full context has not been observed (or is below a threshold) in combination with that symbol in the training set. An information generator may generate information based on the computed prediction, such as a prediction as to whether the input sequence is in a given language, one of a set of candidate sequences having the highest score (Paragraph 0034).
However, none of the prior art cited, alone or in combination, provides the motivation to teach performing preprocessing on the speech audio data to prepare the speech audio data for use as an input into one or more models that only detect the one or more speech features that indicate the one or more problems; providing the preprocessed speech audio data to a stacked ML model; and analyzing the preprocessed speech audio data via the stacked ML model to detect the one or more problems, wherein: the stacked ML model includes a feature aggregation model, a sequence to sequence model, and a decision-making model, and the sequence to sequence model is trained to receive feature vectors from the feature aggregation model, identify one or more sequences, the one or more sequences including a previous state and a current state, and generate a prediction sequence score for at least one of the one or more identified sequences, the prediction sequence score representing a likelihood of the at least one of the one or more identified sequences being indicative of the at least one of the one or more speech features that indicate the one or more problems.
(Claim 17) A method for detecting one or more speech features in speech audio data comprising:
receiving the speech audio data, the speech audio data including occurrences of one or more speech features at specific locations in the speech audio data, the one or more speech features being features that indicate one or more problems in a speaker's speech;
performing preprocessing on the speech audio data to prepare the speech audio data for use as an input into one or more models that only detect the one or more speech features that indicate the one or more problems;
providing the preprocessed speech audio data to a stacked ML model; and
analyzing the preprocessed speech audio data via the stacked ML model to detect the one or more speech features problems, wherein:
the stacked ML model includes a feature aggregation model, a sequence to sequence model, and a decision-making model, and
the sequence to sequence model is trained to receive feature vectors from the feature aggregation model, identify one or more sequences, the one or more sequences including a previous state and a current state, and generate a prediction sequence score for at least one of the one or more identified sequences, the prediction sequence score representing a likelihood of the at least one of the one or more identified sequences being indicative of the at least one of the one or more speech features that indicate the one or more problems.
The following is an examiner's statement of reasons for allowance: Regarding claim 17, the prior art of record, specifically Brouwer et al. (US Patent Application Publication #Osbourne et al. US20190340235), teaches that a method performed by a device may include identifying a plurality of samples of textual content; performing tokenization of the plurality of samples of textual content to generate a respective plurality of tokenized samples; determining groupings of attributes of the sample matrix using a convolutional neural network; determining context relationships between the groupings of attributes using a bidirectional long short term memory (LSTM) technique; a predicted label of the predicted labels from a plurality of labels based on respective scores of the particular sample with regard to the plurality of labels and providing information identifying the predicted labels (Paragraph 0003).
Galle et al. (US 2018/0011839) teaches that a probability component outputs a prediction for an input symbol being in a respective context in an input sequence, or a prediction for a sequence of symbols, using the gapped sequence model. The exemplary model uses relevant ones of the statistics and includes a back-off operator which applies a smoothing technique for providing symbol prediction for symbols of the input sequence for which the full context has not been observed (or is below a threshold) in combination with that symbol in the training set. An information generator may generate information based on the computed prediction, such as a prediction as to whether the input sequence is in a given language, one of a set of candidate sequences having the highest score (Paragraph 0034).
However, none of the prior art cited, alone or in combination, provides the motivation to teach receiving the speech audio data, the speech audio data including occurrences of one or more speech features at specific locations in the speech audio data, the one or more speech features being features that indicate one or more problems in a speaker's speech.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong, whose telephone number is (571) 270-3438. The examiner can normally be reached Mon-Fri, 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING D POON, can be reached at 571-272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AKWASI M SARPONG/ Primary Examiner, Art Unit 2675
        07/14/2022