Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on June 19, 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Terminal Disclaimer
The terminal disclaimer filed on August 05, 2022 disclaiming the terminal portion of any patent granted on this application which would extend beyond the expiration date of U.S. Patent No. 10,783,880 has been reviewed and is accepted.  The terminal disclaimer has been recorded.
Response to Arguments and Amendments
The amendment filed on August 05, 2022 has been entered. Claims 45-56 are pending in this application.
Applicant argues that Hwang fails to disclose the limitations of “the sequence of recognized words output by the speech recognition system” and “outputs an alignment result identifying a time alignment between the received sequence of recognized words and sub-word units and the sequence of acoustic feature vectors representing the input utterance spoken by the user”. The examiner respectfully disagrees. In Fig. 4 and paragraphs 52-59, Hwang discloses phonetic sequence 407, which is provided to alignment module 414. Phonetic sequence 407 can be interpreted as the sequence of recognized words output by the speech recognition system. Also in Fig. 4 and paragraphs 52-59, Hwang discloses alignment module 414, which aligns phonetic sequence 407. In addition, in paragraph 0044, Hwang discloses “a feature vector is computed from the mel-coefficients by taking the first and second derivative of the mel-frequency coefficients plus power with respect to time. Thus, for such feature vectors, each frame is associated with 39 values that form the feature vector”. This can be interpreted as the alignment result identifying a time alignment between the sequence of recognized words and the sequence of acoustic feature vectors representing the input utterance spoken by the user.
Hence, the applicant’s arguments are not persuasive.
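For illustration only, the 39-value frame described in the quoted paragraph 0044 passage can be sketched as follows. The sketch assumes 13 static values per frame (for example, 12 mel-frequency coefficients plus power) and approximates each time derivative by a central difference; neither the 13-value split nor the difference formula is stated in the quoted passage, and the function names are hypothetical.

```python
# Illustrative sketch only. Assumptions (not taken from the quoted passage):
# 13 static values per frame and a central-difference time derivative.

def deltas(frames):
    """Approximate the time derivative of each value; edge frames are clamped."""
    n = len(frames)
    out = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return out


def feature_vectors(static_frames):
    """Concatenate static, first-derivative, and second-derivative values per frame."""
    first = deltas(static_frames)
    second = deltas(first)
    return [s + d1 + d2 for s, d1, d2 in zip(static_frames, first, second)]


# Five synthetic frames of 13 static values each.
static = [[float(t + k) for k in range(13)] for t in range(5)]
vectors = feature_vectors(static)
```

Under these assumptions each frame carries 13 + 13 + 13 = 39 values, which matches the frame size stated in the quoted passage.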


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 45-56 are rejected under 35 U.S.C. 103 as being unpatentable over Yoon (U.S. Publication No. 20140141392) in view of Hwang (U.S. Publication No. 20050203738).
Regarding claim 45, Yoon discloses a speech processing system comprising:
an input for receiving an input utterance spoken by a user (See e.g., “…input device 674, such as a microphone,…” and how “…speech sample 202 is provided…,” YOON paras. 13-15, 31);
a speech recognition system that recognizes the input utterance spoken by the user and that outputs a recognition result comprising a sequence of recognized words and sub-word units corresponding to the input utterance (See e.g., “…A speech sample 202 is accessed and provided to an automatic speech recognizer 204 that generates word hypotheses for the speech sample 202 and time stamp associations for those word hypotheses that are output 206 to a speech sample scoring engine 208…,” YOON paras. 13-15, Figs. 1, 2, 4, 5);
an acoustic model store that stores acoustic speech models (See e.g., “…an acoustic model trained on native English speakers to generate word hypotheses, time stamp associations, and other acoustic measures 406…”; “…automatic speech recognizer 404 may include an acoustic model trained using non-native speakers…,” YOON paras. 13-15, 22-24, Figs. 1, 2, 4, 5).
However, Yoon does not disclose a word alignment unit configured to receive the sequence of recognized words and sub-word units output by the speech recognition system and to align a sequence of said acoustic speech models corresponding to the received sequence of recognized words and sub-word units with a sequence of acoustic feature vectors representing the input utterance spoken by the user and to output an alignment result identifying a time alignment between the received sequence of recognized words and sub-word units and the sequence of acoustic feature vectors representing the input utterance spoken by the user.
Hwang does teach a word alignment unit configured to receive the sequence of recognized words and sub-word units output by the speech recognition system and to align a sequence of said acoustic speech models corresponding to the received sequence of recognized words and sub-word units with a sequence of acoustic feature vectors representing the input utterance spoken by the user and to output an alignment result identifying a time alignment between the received sequence of recognized words and sub-word units and the sequence of acoustic feature vectors representing the input utterance spoken by the user (See e.g., how in Fig. 4 best phonetic sequence 407 and list of possible phonetic sequences 412 are inputted to alignment module 414, and further outputted to rescoring module 416 in combination with acoustic model 318, and see also how “best phonetic sequence 407 from SLU engine 405 and list of possible phonetic sequences 412 from grammar module 404 are provided to alignment module 414… alignment module 414 aligns phonetic sequences 407 and 412 …,” HWANG paras. 52-59, Fig. 4).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of YOON with the alignment architecture taught by HWANG in order to provide alignment modules and/or methods that are advantageous for calculating speech recognition error rates arising, for example, from substitution errors, deletion errors, and/or insertion errors when assessing the pronunciation of a user’s spoken input (HWANG paras. 52-61, Fig. 4).
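For illustration only, the substitution, deletion, and insertion errors named in the rationale above can be counted by a minimum-edit alignment of two phonetic sequences. The sketch below is a generic edit-distance alignment and is not asserted to be the method of HWANG or YOON; the function name and the example phone labels are hypothetical.

```python
# Illustrative sketch only: a generic minimum-edit alignment, not the cited
# references' method. Function name and phone labels are hypothetical.

def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) for a minimum-edit alignment."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # cost[i][j] = (total edits, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[None] * cols for _ in range(rows)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, cols):
        cost[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            total, subs, dels, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (total, subs, dels, ins)
            else:
                diagonal = (total + 1, subs + 1, dels, ins)
            total, subs, dels, ins = cost[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = cost[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            # Tuples compare on total edits first, so the cheapest path wins.
            cost[i][j] = min(diagonal, deletion, insertion)
    return cost[-1][-1][1:]


reference = ["k", "ae", "t"]
subs, dels, ins = align_counts(reference, ["k", "ah", "t"])
error_rate = (subs + dels + ins) / len(reference)
```

An error rate can then be taken as the sum of the three counts divided by the length of the reference sequence, as in the last line of the sketch.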
Regarding claim 46, Yoon in view of Hwang teaches the speech processing system, wherein the word alignment unit is configured to output a sequence of sub-word units corresponding to a dictionary pronunciation of the recognized input utterance (See e.g., “…SLU engine 405 comprises or accesses SLU dictionary 409 and acoustic model 318 to generate the most likely sequence of SLUs, typically based on a highest probability score. SLU engine 403 then converts the most likely sequence of syllable-like units into a sequence of phonetic units, which is provided to alignment module 414…,” HWANG paras. 52-61, Fig. 4).
Regarding claim 47, Yoon in view of Hwang teaches the speech processing system, wherein the word alignment unit is configured to output a sequence of sub-word units corresponding to a dictionary pronunciation of the matching possible utterance (See e.g., “…SLU engine 405 comprises or accesses SLU dictionary 409 and acoustic model 318 to generate the most likely sequence of SLUs, typically based on a highest probability score. SLU engine 403 then converts the most likely sequence of syllable-like units into a sequence of phonetic units, which is provided to alignment module 414…,” HWANG paras. 52-61, Fig. 4).
Regarding claim 48, Yoon in view of Hwang teaches a speech processing system, further comprising a sub-word alignment unit configured to receive the sequence of sub-word units corresponding to the dictionary pronunciation (See e.g., “…SLU engine 405 comprises or accesses SLU dictionary 409 and acoustic model 318 to generate the most likely sequence of SLUs, typically based on a highest probability score. SLU engine 403 then converts the most likely sequence of syllable-like units into a sequence of phonetic units, which is provided to alignment module 414…,” HWANG paras. 52-61, Fig. 4)
and configured to align the sequence of sub-word units corresponding to the dictionary pronunciation received from the word alignment unit with the input utterance spoken by the user whilst allowing for sub-word units to be inserted between words (See e.g., “…SLU engine 405 comprises or accesses SLU dictionary 409 and acoustic model 318 to generate the most likely sequence of SLUs, typically based on a highest probability score. SLU engine 403 then converts the most likely sequence of syllable-like units into a sequence of phonetic units, which is provided to alignment module 414…,” HWANG paras. 52-61, Fig. 4) and for sub-word units of a word to be replaced by other sub-word units to determine where the input utterance spoken by the user differs from the dictionary pronunciation (See e.g., how “…in some cases the user's pronunciation of a new word can be very different than a typical pronunciation. For instance, a speaker might pronounce an English word by substituting a foreign translation of the English word. This feature, for example, would permit a speech recognition lexicon to store the text or spelling of a word in one language and the acoustic description in a second language different from the first language…,” HWANG paras. 52-61, Fig. 4) and to output a sequence of sub-word units corresponding to an actual pronunciation of the input utterance spoken by the user (See e.g., “…SLU engine 405 comprises or accesses SLU dictionary 409 and acoustic model 318 to generate the most likely sequence of SLUs, typically based on a highest probability score. SLU engine 403 then converts the most likely sequence of syllable-like units into a sequence of phonetic units, which is provided to alignment module 414…,” HWANG paras. 52-61, Fig. 4).
Regarding claim 49, Yoon in view of Hwang teaches a speech processing system, wherein the sub-word alignment unit is configured to use the sequence of sub-word units corresponding to the dictionary pronunciation of the recognized input utterance to generate a network having a plurality of paths allowing for sub-word units to be inserted between recognized words and for sub-word units of a recognized word to be replaced by other sub-word units and wherein the sub-word alignment unit is configured to align acoustic speech models for the different paths defined by the network with the input utterance spoken by the user (See e.g., “…alignment module 414 places the aligned phonetic sequences in a single graph. During this process, identical phonetic units that are aligned with each other are combined onto a single path. Differing phonetic units that are aligned with each other are placed on parallel alternative paths in the graph… The single graph is provided to rescoring module 416…to rescore possible combinations of phonetic units represented by paths through the single graph… rescoring module 416 performs a Viterbi search to identify the best path through the graph using acoustic model scores generated by comparing the feature vectors 403 produced by the user's pronunciation of the word with the model parameters stored in acoustic model 318 for each phonetic unit along a path…,” HWANG paras. 52-61, 78, Fig. 4).
Regarding claim 50, Yoon in view of Hwang teaches a speech processing system, wherein the sub-word alignment unit is configured to maintain a score representing the closeness of the match between the acoustic speech models for the different paths defined by the second network and input utterance spoken by the user (See e.g., “…SLU engine 405 comprises or accesses SLU dictionary 409 and acoustic model 318 to generate the most likely sequence of SLUs, typically based on a highest probability score. SLU engine 403 then converts the most likely sequence of syllable-like units into a sequence of phonetic units, which is provided to alignment module 414…,” HWANG paras. 52-61, Fig. 4).
Regarding claim 51, Yoon in view of Hwang teaches a speech processing system, further comprising a speech scoring feature determining unit configured to receive and to determine a measure of similarity between the sequence of sub-word units output by the word alignment unit and the sequence of sub-word units output by the sub-word alignment unit (See e.g., how in Fig. 4 best phonetic sequence 407 and list of possible phonetic sequences 412 are inputted to alignment module 414, and further outputted to rescoring module 416 in combination with acoustic model 318, and see also how “best phonetic sequence 407 from SLU engine 405 and list of possible phonetic sequences 412 from grammar module 404 are provided to alignment module 414… alignment module 414 aligns phonetic sequences 407 and 412 …,” and please see e.g., “…score select and update module 418 selects the highest scoring phonetic sequence or path though the single graph. The selected sequence is provided to update user lexicon 314 at step 514 and language model 316 at step 516…,” HWANG paras. 52-60, Fig. 4).
Regarding claim 52, Yoon in view of Hwang teaches a speech processing system, further comprising a free align unit configured to align acoustic speech models with the input utterance spoken by the user and to output an alignment result including a sequence of sub-word units that matches with the input utterance spoken by the user (See e.g., “…alignment module 414 places the aligned phonetic sequences in a single graph. During this process, identical phonetic units that are aligned with each other are combined onto a single path. Differing phonetic units that are aligned with each other are placed on parallel alternative paths in the graph… The single graph is provided to rescoring module 416…to rescore possible combinations of phonetic units represented by paths through the single graph… rescoring module 416 performs a Viterbi search to identify the best path through the graph using acoustic model scores generated by comparing the feature vectors 403 produced by the user's pronunciation of the word with the model parameters stored in acoustic model 318 for each phonetic unit along a path…,” HWANG paras. 52-61, 78, Fig. 4).
Regarding claim 53, Yoon in view of Hwang teaches a speech processing system, comprising a speech scoring feature determining unit configured to receive and to determine a plurality of speech scoring feature values for the input utterance (See e.g., “…SLU engine 405 comprises or accesses SLU dictionary 409 and acoustic model 318 to generate the most likely sequence of SLUs, typically based on a highest probability score. SLU engine 403 then converts the most likely sequence of syllable-like units into a sequence of phonetic units, which is provided to alignment module 414…,”; See e.g., how in Fig. 4 best phonetic sequence 407 and list of possible phonetic sequences 412 are inputted to alignment module 414, and further outputted to rescoring module 416 in combination with acoustic model 318, and see also how “best phonetic sequence 407 from SLU engine 405 and list of possible phonetic sequences 412 from grammar module 404 are provided to alignment module 414… alignment module 414 aligns phonetic sequences 407 and 412 …,” and please see e.g., “…score select and update module 418 selects the highest scoring phonetic sequence or path though the single graph. The selected sequence is provided to update user lexicon 314 at step 514 and language model 316 at step 516…,”  HWANG paras. 52-61, Fig. 4).
Regarding claim 54, Yoon discloses a speech processing system, further comprising a scoring unit operable to receive the plurality of speech scoring feature values for the input utterance determined by the speech scoring feature determining unit and configured to generate a score representing the language ability of the user (See e.g., “The speech sample scoring engine 208 generates a plurality of difficulty measures 210, 212 that are provided to a scoring model 214 for generation of a difficulty score 216 that is associated with a speech sample 202 under consideration,” YOON para. 13).
Regarding claim 55, Yoon discloses a speech processing system, wherein the score represents the fluency and/or proficiency of the user's spoken utterance (See e.g., “a pure acoustic characteristic is determined by analyzing a number of pauses in the speech sample 202 to determine fluency difficulty measures such as silences per unit time or silences per word. Such a second difficulty measure 212 is provided to the scoring model 214 for generation of a difficulty score 216 representative of the difficulty of the speech sample,” YOON para. 15).
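For illustration only, the "silences per unit time" and "silences per word" measures quoted above can be sketched from time-stamped word hypotheses. The 0.2 second pause threshold, the tuple layout, and the function name are assumptions made for this sketch and are not taken from YOON.

```python
# Illustrative sketch only. The pause threshold, input layout, and function
# name are assumptions, not taken from the cited reference.

def fluency_measures(words, min_pause=0.2):
    """words: list of (word, start_sec, end_sec) tuples sorted by start time."""
    if not words:
        return {"silences_per_second": 0.0, "silences_per_word": 0.0}
    # Count gaps between consecutive words that are at least min_pause long.
    pauses = sum(
        1
        for (_, _, end), (_, start, _) in zip(words, words[1:])
        if start - end >= min_pause
    )
    duration = words[-1][2] - words[0][1]
    return {
        "silences_per_second": pauses / duration if duration > 0 else 0.0,
        "silences_per_word": pauses / len(words),
    }


sample = [("the", 0.0, 0.3), ("cat", 0.8, 1.1), ("sat", 1.15, 1.5), ("down", 2.0, 2.5)]
measures = fluency_measures(sample)
```

In this sample, two of the three gaps between words meet the assumed threshold, so the sketch reports two silences over 2.5 seconds and four words.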
Regarding claim 56, Yoon discloses a speech processing system comprising:
receiving an input utterance spoken by a user (See e.g., “…input device 674, such as a microphone,…” and how “…speech sample 202 is provided…,” YOON paras. 13-15, 31);
using a speech recognition system to recognize the input utterance spoken by the user and to output a recognition result comprising a sequence of recognized words and sub-word units corresponding to the input utterance (See e.g., “…A speech sample 202 is accessed and provided to an automatic speech recognizer 204 that generates word hypotheses for the speech sample 202 and time stamp associations for those word hypotheses that are output 206 to a speech sample scoring engine 208…,” YOON paras. 13-15, Figs. 1, 2, 4, 5);
However, Yoon does not disclose receiving the sequence of recognized words and sub-word units output by the speech recognition system and aligning a sequence of acoustic speech models corresponding to the received sequence of recognized words and sub-word units with a sequence of acoustic feature vectors representing the input utterance spoken by the user;
and outputting an alignment result identifying a time alignment between the received sequence of recognized words and sub-word units and the sequence of acoustic feature vectors representing the input utterance spoken by the user.
Hwang does teach receiving the sequence of recognized words and sub-word units output by the speech recognition system and aligning a sequence of acoustic speech models corresponding to the received sequence of recognized words and sub-word units with a sequence of acoustic feature vectors representing the input utterance spoken by the user (See e.g., how in Fig. 4 best phonetic sequence 407 and list of possible phonetic sequences 412 are inputted to alignment module 414, and further outputted to rescoring module 416 in combination with acoustic model 318, and see also how “best phonetic sequence 407 from SLU engine 405 and list of possible phonetic sequences 412 from grammar module 404 are provided to alignment module 414… alignment module 414 aligns phonetic sequences 407 and 412 …,” HWANG paras. 52-59, Fig. 4);
 and outputting an alignment result identifying a time alignment between the received sequence of recognized words and sub-word units and the sequence of acoustic feature vectors representing the input utterance spoken by the user (See e.g., how in Fig. 4 best phonetic sequence 407 and list of possible phonetic sequences 412 are inputted to alignment module 414, and further outputted to rescoring module 416 in combination with acoustic model 318, and see also how “best phonetic sequence 407 from SLU engine 405 and list of possible phonetic sequences 412 from grammar module 404 are provided to alignment module 414… alignment module 414 aligns phonetic sequences 407 and 412 …,” HWANG paras. 52-59, Fig. 4).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of YOON with the alignment architecture taught by HWANG in order to provide alignment modules and/or methods that are advantageous for calculating speech recognition error rates arising, for example, from substitution errors, deletion errors, and/or insertion errors when assessing the pronunciation of a user’s spoken input (HWANG paras. 52-61, Fig. 4).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Deng (U.S. Publication No. 20120065976) teaches a deep belief network for large vocabulary continuous speech recognition. Waibel (U.S. Publication No. 20110307241) teaches an enhanced speech-to-speech translation system and methods.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ETHAN DANIEL KIM whose telephone number is (571) 272-1405.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ETHAN DANIEL KIM/Examiner, Art Unit 2658

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658