DETAILED ACTION
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/14/2020 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner. However, it is noted that all non-patent literature (NPL) citations need at least a month and year of publication. See MPEP 609.04(a): “The date of publication supplied must include at least the month and year of publication, except that the year of publication (without the month) will be accepted if the applicant points out in the information disclosure statement that the year of publication is sufficiently earlier than the effective U.S. filing date and any foreign priority date so that the particular month of publication is not in issue.” NPL cited without at least the month and year of publication has been labeled with “no date available”.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/forms/. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1 & 6 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2 of US Patent 10,832,654 B2. Although the claims at issue are not identical, they are not patentably distinct from each other.
Claims 1 & 6 of the instant application are anticipated by claims 1-2 of the US Patent in that claims 1-2 of the US Patent contain all the limitations of claims 1 & 6 of the instant application. Claims 1 & 6 of the instant application therefore are not patentably distinct from the US Patent claims and as such are unpatentable for obviousness-type double patenting.




| Claims of Instant Application 17/070,283 | Claims of US Patent 10,832,654 B2 |
|---|---|
| 1. A method comprising: receiving, at data processing hardware, audio data of an utterance captured by a computing device; | 1. A computer-implemented method comprising: receiving, by an automated speech recognition system that includes a speech recognition engine, audio data of an utterance; |
| | selecting, by the automated speech recognition system, a linguistic library that includes the words of a language; |
| identifying, by the data processing hardware, a native language of a speaker of the utterance based on tones or intonation within the audio data of the utterance; based on the identified native language of the speaker of the utterance, selecting, by the data processing hardware, an accent library that includes phonemes for pronunciations of words for the particular language; and | identifying, by the automated speech recognition system, a native language of a speaker of the utterance based on tones or intonation within the audio data of the utterance; based on the identified native language of the speaker of the utterance, selecting, by the automated speech recognition system, two or more accent libraries that each include phonemes for different pronunciations for the words of the language; |
| | generating, by the automated speech recognition system, a combined accent library by combining the two or more accent libraries; |
| generating, by the data processing hardware, using a speech recognition engine altered by the selected accent library, a transcription of the utterance. | obtaining a transcription of the utterance by performing, by the speech recognition engine, speech recognition on the audio data of the utterance using the linguistic library and the combined accent library; and providing, for output, the transcription of the utterance. |
| 6. The method of claim 1, further comprising: determining, by the data processing hardware, demographic data of the speaker of the utterance, wherein identifying the native language of the speaker of the utterance is further based on the demographic data of the speaker of the utterance. | 2. The method of claim 1, comprising: determining demographic data of a speaker of the utterance, wherein selecting the two or more accent libraries that each include phonemes for different pronunciations for the words of the language is further based on the demographic data of the speaker of the utterance. |



Claims 11 & 16 (drawn to a system) of the instant application are anticipated by claims 8-9 of the US Patent (drawn to a system) in the same manner that claims 1 & 6 of the instant application are anticipated by claims 1-2 of the US Patent, as shown in the table above. The comparison is not repeated in the table.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 and 11 recite the limitation “the particular language”. In addition, claims 3 and 13 again recite the limitation “the particular language”, and it is unclear to which particular language this refers. There is insufficient antecedent basis for this limitation in the claims.

Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-2 & 11-12 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over Das et al. (US 20040148161) in view of Basson et al. (US 20120290299).
Regarding claim 1, Das discloses a method comprising: 
receiving, at data processing hardware, audio data of an utterance captured by a computing device (¶20 comprises an audio input 102--a telephone line, for example, or a microphone--connected to a language and accent identifier 106 and an accent normalizer 116; Input 102 conveys signals representing accented speech);
identifying, by the data processing hardware, a native language of a speaker of the utterance based on tones or intonation within the audio data of the utterance (¶13-18 Significant characteristics of accent are: word stop-release time; voice onset time; vowel duration; slope of the intonation contour (i.e., the slope of the fundamental frequency); shift of the second and third formants; ¶20 Identifier 106 recognizes the language in, and the accent with, which the speech is spoken); and
based on the identified native language of the speaker of the utterance, selecting, by the data processing hardware, an accent library that includes phonemes for pronunciations of words for the particular language (¶20 Comparator 108 selects, from database 112 contents that were selected by identifier 106, the contents that correspond to the accented phonemes identified by detector 104. The contents of database 112 illustratively comprise database entries 114 each comprising an accent-affected phoneme in a language and an accent, and the corresponding unaccented phoneme or the rules for forming the unaccented phoneme in that language).
Das fails to specifically teach generating, by the data processing hardware, using a speech recognition engine altered by the selected accent library, a transcription of the utterance.
Basson teaches generating, by the data processing hardware, using a speech recognition engine altered by the selected accent library, a transcription of the utterance (¶43 In the speaker adaptation 808, a user's accent serves as input to an accent dependent model selection component 810, which provides user's voice data to an unsupervised acoustic component 812, which provides an adapted acoustic model to speech transcription engine 814).
Therefore, it would have been obvious to one with ordinary skill in the art at the time the invention was made to have implemented the teaching of generating, by the data processing hardware, using a speech recognition engine altered by the selected accent library, a transcription of the utterance from Basson into the method as disclosed by Das. The motivation for doing this is to improve techniques for converting spoken speech into written speech.
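For orientation only, the data flow of the proposed combination can be sketched as follows. This is a hypothetical illustration and not code from Das or Basson: the function names, the 0.5 slope threshold, and the contents of ACCENT_LIBRARIES are assumptions made for the example. It shows a native language identified from an intonation feature (Das ¶13-18, ¶20), an accent library selected on that basis (Das ¶20), and a recognizer whose output is altered by the selected library (Basson ¶43).

```python
from dataclasses import dataclass

# Hypothetical accent libraries: native language -> {accented phoneme: unaccented phoneme}.
# The entries are invented for illustration, loosely modeled on database entries 114 of Das.
ACCENT_LIBRARIES = {
    "french": {"z": "th", "ee": "i"},
    "german": {"v": "w", "z": "th"},
}


@dataclass
class Utterance:
    phonemes: list            # phonemes detected in the audio data
    intonation_slope: float   # slope of the fundamental-frequency contour


def identify_native_language(utt: Utterance) -> str:
    """Toy classifier; the 0.5 threshold is an assumption, not taken from any reference."""
    return "french" if utt.intonation_slope > 0.5 else "german"


def select_accent_library(native_language: str) -> dict:
    """Select the accent library that corresponds to the identified native language."""
    return ACCENT_LIBRARIES[native_language]


def transcribe(utt: Utterance, accent_library: dict) -> str:
    """Stand-in for a recognition engine altered by the accent library:
    accented phonemes are normalized before being joined into text."""
    normalized = [accent_library.get(p, p) for p in utt.phonemes]
    return "".join(normalized)


utterance = Utterance(phonemes=["z", "ee", "s"], intonation_slope=0.8)
library = select_accent_library(identify_native_language(utterance))
print(transcribe(utterance, library))
```

In this toy setting the selected library changes the output of the recognition step, which is the sense in which the engine is "altered by" the library in the sketch; a real recognizer would adapt its acoustic model rather than substitute strings.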

Regarding claim 2, the combination of Das and Basson discloses the method of claim 1, further comprising providing, by the data processing hardware, the transcription of the utterance for output (Basson ¶34 ASR unit 101 processes the speech input 100 and provides input to spoken to written (S2W) text module 102, which then generates text 103, which is output by the system; ¶43 In the speaker adaptation 808, a user's accent serves as input to an accent dependent model selection component 810, which provides user's voice data to an unsupervised acoustic component 812, which provides an adapted acoustic model to speech transcription engine 814).
Therefore, it would have been obvious to one with ordinary skill in the art at the time the invention was made to have implemented the teaching of providing, by the data processing hardware, the transcription of the utterance for output from Basson into the method as disclosed by Das. The motivation for doing this is to improve techniques for converting spoken speech into written speech.

Regarding claims 11-12 (drawn to a system):
The proposed combination of Das and Basson, explained in the rejection of method claims 1-2, renders obvious the system of claims 11-12 because the claimed steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 1-2 are equally applicable to claims 11-12.

Claims 3-8 & 13-18 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over the combination of Das and Basson as applied to claims 1 and 11 above, and further in view of Mazza (US 20030191639).
Regarding claim 3, the combination of Das and Basson discloses the method of claim 1, but fails to teach wherein the accent library is selected among multiple accent libraries each associated with different accents present within a particular language; and the speaker of the utterance speaks the particular language as a second language.
Mazza teaches wherein the accent library is selected among multiple accent libraries each associated with different accents present within a particular language (¶36-37 using the retrieved language preference 370 (combined with a required type of vocabulary according to application needs), an appropriate vocabulary (e.g., English digit vocabulary) and acoustic models (e.g., acoustic models for English digits in French accent) may be determined; for instance, if the area code 320 corresponds to a geographical area in Texas, it may be inferred that acoustic models corresponding to a Texan accent may be appropriate. As another example, if the exchange number 330 corresponds to a region (e.g., Chinatown in New York City), in which majority people speak English with a particular accent (i.e., Chinese living in Chinatown of New York City speak English with Chinese accent), a particular set of acoustic models corresponding to the inferred accent may be considered as appropriate); and the speaker of the utterance speaks the particular language as a second language (¶23 for example, a French person may speak English with a French accent; this indicates English as a second language).
Therefore, it would have been obvious to one with ordinary skill in the art at the time the invention was made to have implemented the teaching of wherein the accent library is selected among multiple accent libraries each associated with different accents present within a particular language; and the speaker of the utterance speaks the particular language as a second language from Mazza into the method as disclosed by the combination of Das and Basson. The motivation for doing this is to improve recognition performance.

Regarding claim 4, the combination of Das and Basson discloses the method of claim 1, but fails to teach obtaining, by the data processing hardware, device personal data based on user interactions with the computing device, wherein identifying the native language of the speaker of the utterance is further based on the device personal data.
Mazza teaches obtaining, by the data processing hardware, device personal data based on user interactions with the computing device, wherein identifying the native language of the speaker of the utterance is further based on the device personal data (¶36 Geographical information related to a call can be used to obtain more information relevant to the selection of vocabularies and acoustic models. For example, a caller ID forwarded from the voice response system 130 can be used to retrieve a corresponding customer profile that provides further relevant information such as language preference. Using the retrieved language preference 370 (combined with a required type of vocabulary according to application needs), an appropriate vocabulary (e.g., English digit vocabulary) and acoustic models (e.g., acoustic models for English digits in French accent) may be determined.).
Therefore, it would have been obvious to one with ordinary skill in the art at the time the invention was made to have implemented the teaching of obtaining, by the data processing hardware, device personal data based on user interactions with the computing device, wherein identifying the native language of the speaker of the utterance is further based on the device personal data from Mazza into the method as disclosed by the combination of Das and Basson. The motivation for doing this is to improve recognition performance.

Regarding claim 5, the combination of Das, Basson, and Mazza discloses the method of claim 4, wherein the device personal data comprises contextual application information (Mazza ¶35 A customer profile may record each of such individual potential callers and their language preferences (not shown in FIG. 3); ¶36 customer profile that provides further relevant information such as language preference).
Therefore, it would have been obvious to one with ordinary skill in the art at the time the invention was made to have implemented the teaching of wherein the device personal data comprises contextual application information from Mazza into the method as disclosed by the combination of Das and Basson. The motivation for doing this is to improve recognition performance.

Regarding claim 6, the combination of Das and Basson discloses the method of claim 1, but fails to teach determining, by the data processing hardware, demographic data of the speaker of the utterance, wherein identifying the native language of the speaker of the utterance is further based on the demographic data of the speaker of the utterance.
Mazza teaches determining, by the data processing hardware, demographic data of the speaker of the utterance, wherein identifying the native language of the speaker of the utterance is further based on the demographic data of the speaker of the utterance (¶37 in this case, the area code 320 or the exchange number 330 may be used to infer a language preference. For instance, if the area code 320 corresponds to a geographical area in Texas, it may be inferred that acoustic models corresponding to a Texan accent may be appropriate).
Therefore, it would have been obvious to one with ordinary skill in the art at the time the invention was made to have implemented the teaching of determining, by the data processing hardware, demographic data of the speaker of the utterance, wherein identifying the native language of the speaker of the utterance is further based on the demographic data of the speaker of the utterance from Mazza into the method as disclosed by the combination of Das and Basson. The motivation for doing this is to improve recognition performance.

Regarding claim 7, the combination of Das, Basson, and Mazza discloses the method of claim 6, wherein the demographic data of the speaker comprises a geographical location of where the speaker is located (Mazza ¶37 in this case, the area code 320 or the exchange number 330 may be used to infer a language preference. For instance, if the area code 320 corresponds to a geographical area in Texas, it may be inferred that acoustic models corresponding to a Texan accent may be appropriate).
Therefore, it would have been obvious to one with ordinary skill in the art at the time the invention was made to have implemented the teaching of wherein the demographic data of the speaker comprises a geographical location of where the speaker is located from Mazza into the method as disclosed by the combination of Das and Basson. The motivation for doing this is to improve recognition performance.
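The inference that Mazza ¶37 describes, from an area code or exchange number to an accent-specific set of acoustic models, amounts to a lookup with a fallback. A minimal sketch follows; the table contents, the model names, and the function select_acoustic_models are hypothetical and are not taken from Mazza.

```python
# Hypothetical lookup tables. Mazza names Texas and New York City's Chinatown only as
# examples; the specific numbers and model names below are illustrative assumptions.
AREA_CODE_TO_MODELS = {"512": "english_texan"}
EXCHANGE_TO_MODELS = {("212", "226"): "english_chinese_accent"}
DEFAULT_MODELS = "english_general"


def select_acoustic_models(phone_number: str) -> str:
    """Infer accent-specific acoustic models from a 10-digit caller number."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    area_code, exchange = digits[:3], digits[3:6]
    # The exchange identifies the more specific region, so it is checked first.
    if (area_code, exchange) in EXCHANGE_TO_MODELS:
        return EXCHANGE_TO_MODELS[(area_code, exchange)]
    return AREA_CODE_TO_MODELS.get(area_code, DEFAULT_MODELS)


print(select_acoustic_models("(512) 555-0100"))
```

Checking the exchange before the area code is a design choice of the sketch: it lets a neighborhood-level entry override a region-level one when both match.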

Regarding claim 8, the combination of Das, Basson, and Mazza discloses the method of claim 6, wherein the demographic data of the speaker is based on countries of addresses stored in an address book of a computing device that receives the utterance (Mazza ¶36 Geographical information related to a call can be used to obtain more information relevant to the selection of vocabularies and acoustic models; ¶37 in this case, the area code 320 or the exchange number 330 may be used to infer a language preference. For instance, if the area code 320 corresponds to a geographical area in Texas, it may be inferred that acoustic models corresponding to a Texan accent may be appropriate. As another example, if the exchange number 330 corresponds to a region (e.g., Chinatown in New York City), in which majority people speak English with a particular accent (i.e., Chinese living in Chinatown of New York City speak English with Chinese accent), a particular set of acoustic models corresponding to the inferred accent may be considered as appropriate).
Therefore, it would have been obvious to one with ordinary skill in the art at the time the invention was made to have implemented the teaching of wherein the demographic data of the speaker is based on countries of addresses stored in an address book of a computing device that receives the utterance from Mazza into the method as disclosed by the combination of Das and Basson. The motivation for doing this is to improve recognition performance.

Regarding claims 13-18 (drawn to a system):
The proposed combination of Das, Basson, and Mazza, explained in the rejection of method claims 3-8, renders obvious the system of claims 13-18 because the claimed steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 3-8 are equally applicable to claims 13-18.

Claims 9-10 and 19-20 are rejected under pre-AIA 35 U.S.C. 103(a) as being unpatentable over the combination of Das and Basson as applied to claims 1 and 11 above, and further in view of Duan et al. (US 20050228667).
Regarding claim 9, the combination of Das and Basson discloses the method of claim 1, but fails to teach determining, by the data processing hardware, a level of accuracy for the transcription of the utterance; and based on the level of accuracy for the transcription of the utterance, selecting one or more additional accent libraries, wherein, when generating the transcription for the utterance, the speech recognition engine is further altered by the selected one or more additional accent libraries.
Duan teaches determining, by the data processing hardware, a level of accuracy for the transcription of the utterance (¶55 In step 826, a word-error rate corresponding to the current language model 218 is calculated and stored based upon a comparison between a known correct transcription of the pre-defined development data and a top recognition candidate 712 from N-best list 710); and based on the level of accuracy for the transcription of the utterance, selecting one or more additional accent libraries (¶55-56 if the new current lambda is not greater than one, the FIG. 8 process returns to step 818 to iteratively generate a new current language model 218, rescore N-best list 710, and calculate a new current word-error rate corresponding to the new current language model 218), wherein, when generating the transcription for the utterance, the speech recognition engine is further altered by the selected one or more additional accent libraries (¶55-56 in step 826, a word-error rate corresponding to the current language model 218 is calculated and stored based upon a comparison between a known correct transcription of the pre-defined development data and a top recognition candidate 712 from N-best list 710; recognizer 314 may then effectively utilize optimized language model 218 for accurately performing various speech recognition procedures).
Therefore, it would have been obvious to one with ordinary skill in the art at the time the invention was made to have implemented the teaching of determining, by the data processing hardware, a level of accuracy for the transcription of the utterance; and based on the level of accuracy for the transcription of the utterance, selecting one or more additional accent libraries, wherein, when generating the transcription for the utterance, the speech recognition engine is further altered by the selected one or more additional accent libraries from Duan into the method as disclosed by the combination of Das and Basson. The motivation for doing this is to improve systems and methods for effectively implementing an optimized language model for speech recognition.
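For reference, the word-error rate that Duan ¶55 computes against a known correct transcription is conventionally defined as the word-level edit distance divided by the number of reference words. The sketch below uses that standard definition; the function names and the candidate dictionary are assumptions made for the example, and Duan's own procedure (rescoring an N-best list for each new current lambda) is not reproduced.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must not be empty")
    # At the start of iteration i, prev[j] holds the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def best_candidate(reference: str, candidates: dict) -> str:
    """Return the key (e.g. a model setting) whose hypothesis has the lowest WER."""
    return min(candidates, key=lambda k: word_error_rate(reference, candidates[k]))


reference = "call me at nine"
candidates = {  # hypothetical top recognition candidates under two model settings
    "lambda=0.2": "call me a nine",
    "lambda=0.6": "call me at nine",
}
print(best_candidate(reference, candidates))
```

Selecting the setting with the lowest word-error rate mirrors, in simplified form, the iterate-and-compare loop the cited paragraphs describe.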

Regarding claim 10, the combination of Das, Basson, and Duan discloses the method of claim 9, wherein a number of additional accent libraries selected increases as the level of accuracy for the transcription increases (Duan ¶55-56 if the new current lambda is not greater than one, the FIG. 8 process returns to step 818 to iteratively generate a new current language model 218, rescore N-best list 710, and calculate a new current word-error rate corresponding to the new current language model 218; In accordance with the present invention, recognizer 314 may then effectively utilize optimized language model 218 for accurately performing various speech recognition procedures).
Therefore, it would have been obvious to one with ordinary skill in the art at the time the invention was made to have implemented the teaching of wherein a number of additional accent libraries selected increases as the level of accuracy for the transcription increases from Duan into the method as disclosed by the combination of Das and Basson. The motivation for doing this is to improve systems and methods for effectively implementing an optimized language model for speech recognition.

Regarding claims 19-20 (drawn to a system):
The proposed combination of Das, Basson, and Duan, explained in the rejection of method claims 9-10, renders obvious the system of claims 19-20 because the claimed steps occur in the operation of the proposed combination as discussed above. Thus, arguments similar to those presented above for claims 9-10 are equally applicable to claims 19-20.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN KY whose telephone number is (571)272-7648. The examiner can normally be reached Monday-Friday 9-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KEVIN KY/
Primary Examiner, Art Unit 2669