DETAILED ACTION

Introduction

1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. A response was filed in this application on 07/23/2021 following the non-final rejection of 05/13/2021. Claims 1, 3-10, and 12-15 remain pending following entry of this response and are examined below. Claims 1, 3-7, 10, and 12-15 have been amended, claims 2, 11, and 16-20 have been cancelled, and no new claims have been added.

Response to amendments and arguments

2.	Applicant has acknowledged the allowable subject matter indicated in the last Office action and has accordingly amended claims 1 and 10 to recite subject matter previously recited in claims 2 and 11, respectively, and has rewritten claim 7 in independent form. Applicant has further cancelled claims 2, 11, 16-18, and 20 from further consideration in this application. In view of the amendments to the independent claims submitted in this response and an updated search, Applicant's arguments are persuasive. Claims 1, 3-10, and 12-15 are therefore in condition for allowance.



Allowable Subject Matter

3.	Claims 1, 3-10, and 12-15 are allowable over the prior art of record. The following is the examiner’s statement of reasons for allowance. The closest relevant prior art (discussed in further detail below), whether taken individually or in combination, fails to explicitly teach or reasonably suggest the invention as recited in the independent claims. Applicant has described a novel method of identifying and recovering out-of-vocabulary words in transcripts of a voice data recording using word recognition models and word sub-unit recognition models.

Most pertinent prior art:

Nissan (U.S. Patent Application Publication # 2016/0171973 A1), in paragraphs 42-43 and Figure 1, teaches speech input into a hybrid speech recognition system. Paragraph 44 and Figure 1 teach that hybrid speech recognizer 132 generates transcripts of words and sub-words, also denoted as hybrid transcriptions 102. Paragraphs 46-50, 61, and 91, along with Figures 1-2, teach that hybrid transcriptions 102 are provided to a patterns extractor 134, which identifies and extracts patterns or sequences of sub-words in the hybrid transcriptions. Patterns set 106 is provided to a candidate patterns identifier 136, which identifies patterns representing out-of-vocabulary words. Paragraphs 83-84, along with Figure 1, show a sequence of operations denoted by 158, wherein out-of-vocabulary potential words 112, reduced set 164, and a pronunciations repository, also denoted as repository 156, are provided to transcripts updater 152. Transcripts updater 152 checks the terms in reduced set 164 against repository 156, and terms that meet a metric condition with respect to pronunciations in repository 156, such as a sufficiently small phonetic distance, are replaced in hybrid transcriptions 102, thus generating updated transcriptions.
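The replacement step attributed to transcripts updater 152 can be illustrated with a minimal sketch. It is not a characterization of Nissan's actual implementation; it assumes, for illustration only, that a candidate out-of-vocabulary pattern is a tuple of phoneme symbols, that the hypothetical repository maps each term to its phoneme tuple, and that the "phonetic distance" is a plain Levenshtein distance compared against a hypothetical threshold `max_distance`.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def update_transcript(tokens, repository, max_distance=1):
    """Replace candidate sub-word patterns (tuples) with the closest repository
    term when its phonetic distance is small enough; words (str) pass through.
    The token layout and threshold are illustrative assumptions."""
    updated = []
    for tok in tokens:
        if isinstance(tok, tuple):  # candidate pattern for an out-of-vocabulary word
            best = min(repository,
                       key=lambda term: edit_distance(tok, repository[term]),
                       default=None)
            if best is not None and edit_distance(tok, repository[best]) <= max_distance:
                updated.append(best)
                continue
        updated.append(tok)  # unmatched candidates are left unchanged
    return updated
```

For example, with a hypothetical repository entry `{"nissan": ("n", "ih", "s", "aa", "n")}`, the candidate `("n", "iy", "s", "aa", "n")` differs by one phoneme and is replaced by `"nissan"` under the default threshold, but is left as-is when `max_distance=0`.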

Ouyang (U.S. Patent # 9047268 B2) in col 4, line 63 to col 5, line 23 and figures 1-6, teaches determining scores for a first set of candidate strings. A lexicon may include each candidate string in the first set of candidate strings. For each respective candidate string from the first set of candidate strings, the score for the respective candidate string is based at least in part on a probability of the respective character string being entered. In addition, the computing device may determine scores for a second set of candidate strings. The lexicon does not necessarily include each candidate string from the second set of candidate strings. Rather, the second set of candidate strings may include candidate strings that are not in the lexicon. For each respective candidate string from the second set of candidate strings, the score for the respective candidate string is based at 

Minnis (U.S. Patent # 10380242 B2), in col 1, lines 40-50 and Figures 2-3, teaches a method for out-of-vocabulary compound word handling. The method includes storing a plurality of compound word rules and compound word dictionaries in a database. The method also includes evaluating membership criteria associated with a received compound word, wherein the membership criteria include at least one of dictionary-based or part-of-speech (POS)-based criteria. The method may further include applying one or more filtering rules to the received compound word.

Profio (U.S. Patent Application Publication # 2002/0013706 A1), in paragraph 40 and Figure 1, teaches the basic functionality of a speech recognizer wherein an unknown word is input to a first-stage recognition unit 1, which performs automatic speech recognition on the basis of a general vocabulary 7. The recognition result of the first-stage recognition unit 1 is output as a first recognition result. This first recognition result is input to a key-subword detection unit 2 in order to determine the category that applies to the input unknown word. As mentioned above, the category is dependent on one or more recognized key-subwords within the first recognition result. Based on the one or more detected key-subwords, a vocabulary reduction unit 8 determines the vocabulary belonging to the category defined by the set of key-subwords output from the key-subword 

Hence, as evidenced above, the prior art of record, although teaching individual elements, fails to completely describe the invention set forth in the independent claims, namely a method for recovering out-of-vocabulary words in transcriptions of a voice data recording, comprising: receiving a voice data recording for transcription into a textual representation of the voice data recording; transcribing the voice data recording into the textual representation using a word recognition model, wherein the word recognition model comprises a connectionist temporal classification model trained using a training data set of whole words and sub-words; identifying an unknown word in the textual representation; reconstructing the unknown word in the textual representation based on recognition of sub-units of the unknown word generated by a sub-unit recognition model, wherein the sub-unit recognition model comprises a connectionist temporal classification model trained using sub-words; modifying the textual representation of the voice data recording by replacing the unknown word with the reconstruction of the unknown word; and outputting the modified textual representation of the voice data recording.
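The data flow of the claimed recovery steps can be pictured with a minimal sketch. It does not represent Applicant's implementation. It assumes, hypothetically, that each connectionist temporal classification (CTC) model has already produced one best label per audio frame, that the word model emits a placeholder token `<unk>` for an unknown word, that `<b>` is the CTC blank symbol, and that one sub-unit frame sequence is supplied for the audio segment of each `<unk>`. Decoding uses the standard CTC best-path rule (collapse repeated labels, then drop blanks).

```python
from itertools import groupby

BLANK = "<b>"   # hypothetical CTC blank symbol
UNK = "<unk>"   # hypothetical token emitted by the word model for an unknown word


def ctc_greedy_decode(frame_labels):
    """CTC best-path decoding: merge consecutive repeated labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != BLANK]


def recover_oov(word_frames, unk_segment_frames):
    """Transcribe with the word model, then replace each <unk> with the word
    reconstructed from the sub-unit model's output for that segment.
    Assumes exactly one sub-unit frame sequence per <unk>, in order."""
    words = ctc_greedy_decode(word_frames)
    segments = iter(unk_segment_frames)
    output = []
    for word in words:
        if word == UNK:
            sub_units = ctc_greedy_decode(next(segments))
            output.append("".join(sub_units))  # reconstruction of the unknown word
        else:
            output.append(word)
    return " ".join(output)  # modified textual representation
```

For example, word-model frames `["the", "the", "<b>", "<unk>", "<unk>", "<b>", "model", "<b>"]` decode to `the <unk> model`. If the sub-unit frames for the unknown segment are `["con", "con", "<b>", "nec", "<b>", "tion", "ist"]`, the output is `"the connectionist model"`.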



CONCLUSION

4.	The following prior art, made of record but not relied upon, is considered pertinent to Applicant's disclosure: Lilly (U.S. Patent # 9697827 B1), Gupta (U.S. Patent Application Publication # 2015/0228272 A1), Mamou (U.S. Patent Application Publication # 2009/0030894 A1), Sabourin (U.S. Patent # 6912499 B1), Cheng (U.S. Patent Application Publication # 2008/0162128 A1), Seide (U.S. Patent Application Publication # 2003/0110032 A1), Jiang (U.S. Patent # 6539353 B1), and Popvici (U.S. Patent Application Publication # 2017/0076718 A1). These references are also included in the attached PTO-892 form.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance”.



Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. If you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). In case you require assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/NEERAJ SHARMA/
Primary Examiner, Art Unit 2659

571-270-5487 (Direct Phone)
571-270-6487 (Direct Fax)
neeraj.sharma@uspto.gov (Direct Email)