DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The 28-page Information Disclosure Statement filed 6/12/2021 has been accepted in this application and considered to the extent the Office's internal time constraints permit. Applicant is reminded that "Although a concise explanation of the relevance of the information is not required for English language information, applicants are encouraged to provide a concise explanation of why the English language information is being submitted and how it is understood to be relevant. Concise explanations (especially those which point out the relevant pages and lines) are helpful to the Office, particularly where documents are lengthy and complex and applicant is aware of a section that is highly relevant to patentability or where a large number of documents are submitted and applicant is aware that one or more are highly relevant to patentability." See MPEP 609.04(a). In this case, Applicant has submitted references numbering in the thousands, and it appears that many of them are only tangentially relevant to the claimed invention. Although not required to do so, Applicant is requested to identify the most relevant documents and to explain their relevance in order to help make the prosecution record clear.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1 and 11 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 17 of U.S. Patent No. 10,311,860. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1 and 11 of the instant application are similar in scope and content to claims 1 and 17 of the patent, which issued to the same Applicant.
It is clear that all the elements of application claims 1 and 11 are to be found in patented claims 1 and 17 (as application claims 1 and 11 fully encompass patented claims 1 and 17). The difference between the application claims and the patent claims lies in the fact that the patent claims include many more elements and are thus much more specific. Thus the invention of claims 1 and 17 of the patent is in effect a “species” of the “generic” invention of application claims 1 and 11. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1 and 11 are anticipated by claims 1 and 17 of the patent, they are not patentably distinct from the patented claims. Dependent claims 2-10 and 12-20 depend from independent claims 1 and 11 and are also rejected.

Claim comparison: Application No: 17/337,400 and Patent No: 10,311,860

Application claim 1: A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising: receiving context data for a user device associated with a user: identifying an initial set of n-grams from the context data; receiving audio data corresponding to an utterance detected by the user device; processing, using a speech recognizer, the audio data to generate speech recognition candidates for the utterance spoken by the user, each speech recognition candidate associated with a respective speech recognition score; adjusting, using the initial set of n-grams, one or more of the speech recognition scores associated with the speech recognition candidates; and after adjusting the one or more speech recognition scores, determining a transcription of the utterance by selecting the speech recognition candidate that is associated with the highest respective speech recognition score.
Patent claim 1: A computer-implemented method comprising: receiving audio data corresponding to a user utterance and context data for the user utterance; identifying, based on the context data, an initial set of one or more n-grams including one or more n-grams that do not represent speech preceding the user utterance; generating an expanded set of one or more n-grams based at least on the initial set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the initial set of n-grams; based at least on the expanded set of n-grams, adjusting a language model trained to predict a first set of n-grams to be able to predict an additional n-gram in the expanded set of n-grams; determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, wherein each speech recognition candidate comprises one or more words; after determining the one or more speech recognition candidates, adjusting a score for a particular speech recognition candidate based on determining that the particular speech recognition candidate is included in the expanded set of n-grams; after adjusting the score for the particular speech recognition candidate, determining, a transcription for the user utterance that includes at least one of the one or more speech recognition candidates; and providing the transcription of the user utterance for output.

Application claim 2: The computer-implemented method of claim 1, wherein the initial set of n-grams are identified before the user speaks the utterance.

Application claim 3: The computer-implemented method of claim 1, wherein: the context data comprises an application identifier or dialog state identifier; and identifying the initial set of n-grams comprises retrieving data indicating one or more words or phrases corresponding to the application identifier or dialog state identifier.

Application claim 4: The computer-implemented method of claim 1, wherein identifying the initial set of n-grams from the context data comprises: identifying a first set of one or more n-grams from the context data; and generating an expanded set of one or more n-grams based at least on the first set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the first set of n-grams.

Application claim 5: The computer-implemented method of claim 1, wherein: the utterance is detected by the user device providing an interface to the user; and the context data comprises data that indicates a topic corresponding to the interface.

Application claim 6: The computer-implemented method of claim 1, wherein: the utterance is detected by the user device providing an interface to the user; and the context data comprises data indicating a task to be performed using the interface.

Application claim 7: The computer-implemented method of claim 1, wherein the context data indicates one or more phrases included in a graphical user interface of the user device.

Application claim 8: The computer-implemented method of claim 1, wherein the initial set of n-grams comprises one or more words or phrases displayed on a screen of the user device.

Application claim 9: The computer-implemented method of claim 1, wherein the initial set of n-grams are provided by an application running on the user device.

Application claim 10: The computer-implemented method of claim 1, wherein: the data processing hardware resides on the user device; or the data processing hardware resides on a server system in communication with the user device over a communication network.

Application claim 11: A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware causes the data processing hardware to perform operations comprising: receiving context data for a user device associated with a user: identifying an initial set of n-grams from the context data; receiving audio data corresponding to an utterance detected by the user device; processing, using a speech recognizer, the audio data to generate speech recognition candidates for the utterance spoken by the user, each speech recognition candidate associated with a respective speech recognition score; adjusting, using the initial set of n-grams, one or more of the speech recognition scores associated with the speech recognition candidates; and after adjusting the one or more speech recognition scores, determining a transcription of the utterance by selecting the speech recognition candidate that is associated with the highest respective speech recognition score.
Patent claim 17: A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving audio data corresponding to a user utterance and context data for the user utterance; identifying, based on the context data, an initial set of one or more n-grams including one or more n-grams that do not represent speech preceding the user utterance; generating an expanded set of one or more n-grams based at least on the initial set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the initial set of n-grams; based at least on the expanded set of n-grams, adjusting a language model trained to predict a first set of n-grams to be able to predict an additional n-gram in the expanded set of n-grams; determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, wherein each speech recognition candidate comprises one or more words; after determining the one or more speech recognition candidates, adjusting a score for a particular speech recognition candidate based on determining that the particular speech recognition candidate is included in the expanded set of n-grams; after adjusting the score for the particular speech recognition candidate, determining, a transcription for the user utterance that includes at least one of the one or more speech recognition candidates; and providing the transcription of the user utterance for output.

Application claim 12: The system of claim 11, wherein the initial set of n-grams are identified before the user speaks the utterance.

Application claim 13: The system of claim 11, wherein: the context data comprises an application identifier or dialog state identifier; and identifying the initial set of n-grams comprises retrieving data indicating one or more words or phrases corresponding to the application identifier or dialog state identifier.

Application claim 14: The system of claim 11, wherein identifying the initial set of n-grams from the context data comprises: identifying a first set of one or more n-grams from the context data; and generating an expanded set of one or more n-grams based at least on the first set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the first set of n-grams.

Application claim 15: The system of claim 11, wherein: the utterance is detected by the user device providing an interface to the user; and the context data comprises data that indicates a topic corresponding to the interface.

Application claim 16: The system of claim 11, wherein: the utterance is detected by the user device providing an interface to the user; and the context data comprises data indicating a task to be performed using the interface.

Application claim 17: The system of claim 11, wherein the context data indicates one or more phrases included in a graphical user interface of the user device.

Application claim 18: The system of claim 11, wherein the initial set of n-grams comprises one or more words or phrases displayed on a screen of the user device.

Application claim 19: The system of claim 11, wherein the initial set of n-grams are provided by an application running on the user device.

Application claim 20: The system of claim 11, wherein: the data processing hardware resides on the user device; or the data processing hardware resides on a server system in communication with the user device over a communication network.
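
For illustration only, the operations recited in application claim 1 (identify n-grams from context data, adjust candidate scores using those n-grams, then select the highest-scoring candidate) can be sketched as follows. This is a minimal sketch and not the applicant's implementation: the names `Candidate`, `identify_ngrams`, `adjust_scores`, and `transcribe`, the `screen_phrases` context key, and the fixed additive `boost` are all hypothetical assumptions that do not appear in the application or the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Candidate:
    """A speech recognition candidate with its recognition score."""
    text: str
    score: float


def identify_ngrams(context_data: Dict[str, List[str]]) -> Set[str]:
    # Assumption: the context data lists phrases shown on the device screen,
    # and each phrase is used directly as an n-gram.
    return {phrase.lower() for phrase in context_data.get("screen_phrases", [])}


def adjust_scores(candidates: List[Candidate], ngrams: Set[str],
                  boost: float = 0.2) -> List[Candidate]:
    # Assumption: a candidate containing any context n-gram (matched on word
    # boundaries) receives a fixed additive boost.
    adjusted = []
    for cand in candidates:
        padded = f" {cand.text.lower()} "
        hit = any(f" {ng} " in padded for ng in ngrams)
        adjusted.append(Candidate(cand.text, cand.score + (boost if hit else 0.0)))
    return adjusted


def transcribe(candidates: List[Candidate],
               context_data: Dict[str, List[str]]) -> str:
    # Select the candidate with the highest score after adjustment.
    ngrams = identify_ngrams(context_data)
    adjusted = adjust_scores(candidates, ngrams)
    return max(adjusted, key=lambda c: c.score).text


candidates = [Candidate("call mom", 0.55), Candidate("call tom", 0.50)]
with_context = transcribe(candidates, {"screen_phrases": ["Tom"]})
without_context = transcribe(candidates, {})
```

In this sketch the on-screen phrase "Tom" raises the second candidate above the first, so the selected transcription changes only when context data is supplied.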



Claims 1 and 3-11 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 17 of U.S. Patent No. 11,037,551. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1 and 3-11 of the instant application are similar in scope and content to claims 1-9, 11, 13-14 and 17 of the patent, which issued to the same Applicant.
It is clear that all the elements of application claims 1 and 3-11 are to be found in patented claims 1-9, 11, 13-14 and 17 (as application claims 1 and 3-11 fully encompass patented claims 1-9, 11, 13-14 and 17). The difference between the application claims and the patent claims lies in the fact that the patent claims include many more elements and are thus much more specific. Thus the invention of claims 1-9, 11, 13-14 and 17 of the patent is in effect a “species” of the “generic” invention of application claims 1 and 3-11. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1 and 3-11 are anticipated by claims 1-9, 11, 13-14 and 17 of the patent, they are not patentably distinct from the patented claims. Dependent claims 2 and 12-20 depend from independent claims 1 and 11 and are also rejected.

Claim comparison: Application No: 17/337,400 and Patent No: 11,037,551

Application claim 1: A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising: receiving context data for a user device associated with a user: identifying an initial set of n-grams from the context data; receiving audio data corresponding to an utterance detected by the user device; processing, using a speech recognizer, the audio data to generate speech recognition candidates for the utterance spoken by the user, each speech recognition candidate associated with a respective speech recognition score; adjusting, using the initial set of n-grams, one or more of the speech recognition scores associated with the speech recognition candidates; and after adjusting the one or more speech recognition scores, determining a transcription of the utterance by selecting the speech recognition candidate that is associated with the highest respective speech recognition score.
Patent claim 1: A computer-implemented method comprising: obtaining a dialog state of a user device associated with a user; before the user speaks an utterance to the user device: identifying, from an n-gram cache, an initial set of n-grams that represent one or more words or phrases corresponding to the dialog state of the user device; and biasing a language model on the initial set of n-grams to increase a likelihood of output of the one or more words or phrases corresponding to the dialog state of the user device; receiving audio data indicating the utterance of the user; processing the audio data using the biased language model to generate a transcription of the utterance; and providing the transcription of the utterance for output.

Application claim 2: The computer-implemented method of claim 1, wherein the initial set of n-grams are identified before the user speaks the utterance.

Application claim 3: The computer-implemented method of claim 1, wherein: the context data comprises an application identifier or dialog state identifier; and identifying the initial set of n-grams comprises retrieving data indicating one or more words or phrases corresponding to the application identifier or dialog state identifier.
Patent claim 7: The method of claim 2, wherein: the context data comprises an application identifier or dialog state identifier; and identifying the initial set of n-grams that represent the one or more words or phrases comprises retrieving data indicating one or more words or phrases corresponding to the application identifier or dialog state identifier.

Application claim 4: The computer-implemented method of claim 1, wherein identifying the initial set of n-grams from the context data comprises: identifying a first set of one or more n-grams from the context data; and generating an expanded set of one or more n-grams based at least on the first set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the first set of n-grams.
Patent claim 8: The method of claim 2, wherein identifying the initial set of n-grams that represent one or more words or phrases corresponding to the dialog state of the user device comprises: identifying a first set of one or more n-grams from the context data; and generating an expanded set of one or more n-grams based at least on the first set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the first set of n-grams.

Application claim 5: The computer-implemented method of claim 1, wherein: the utterance is detected by the user device providing an interface to the user; and the context data comprises data that indicates a topic corresponding to the interface.
Patent claim 2: The method of claim 1, further comprising: receiving context data for the user device wherein obtaining the dialog state comprises identifying the dialog state based on the context data.
Patent claim 3: The method of claim 2, wherein the utterance is detected by the user device providing an interface to the user, and wherein the context data comprises data that indicates a topic corresponding to the interface.
Patent claim 4: The method of claim 2, wherein the utterance is detected by the user device providing an interface to the user, and wherein the context data comprises data indicating a task to be performed using the interface.
Patent claim 5: The method of claim 2, wherein the utterance is detected by the user device providing an interface to the user, and wherein the context data comprises data indicating a step for completing a portion of task to be performed using the interface.

Application claim 6: The computer-implemented method of claim 1, wherein: the utterance is detected by the user device providing an interface to the user; and the context data comprises data indicating a task to be performed using the interface.
Patent claim 2: The method of claim 1, further comprising: receiving context data for the user device wherein obtaining the dialog state comprises identifying the dialog state based on the context data.
Patent claim 3: The method of claim 2, wherein the utterance is detected by the user device providing an interface to the user, and wherein the context data comprises data that indicates a topic corresponding to the interface.
Patent claim 4: The method of claim 2, wherein the utterance is detected by the user device providing an interface to the user, and wherein the context data comprises data indicating a task to be performed using the interface.
Patent claim 5: The method of claim 2, wherein the utterance is detected by the user device providing an interface to the user, and wherein the context data comprises data indicating a step for completing a portion of task to be performed using the interface.

Application claim 7: The computer-implemented method of claim 1, wherein the context data indicates one or more phrases included in a graphical user interface of the user device.
Patent claim 6: The method of claim 2, wherein the context data indicates one or more words or phrases included in a graphical user interface of the user device at a time that the utterance was spoken.

Application claim 8: The computer-implemented method of claim 1, wherein the initial set of n-grams comprises one or more words or phrases displayed on a screen of the user device.
Patent claim 9: The method of claim 1, wherein identifying the dialog state of the user device comprises identifying one of a plurality of different dialog states of the user device that each correspond to a different interface or view of an application.

Application claim 9: The computer-implemented method of claim 1, wherein the initial set of n-grams are provided by an application running on the user device.
Patent claim 11: The method of claim 10, wherein for at least one of the dialog states of the user device, one or more of the n-grams in the predetermined set of n-grams for the dialog state of the user device are not displayed by the application during the dialog state of the user device.

Application claim 10: The computer-implemented method of claim 1, wherein: the data processing hardware resides on the user device; or the data processing hardware resides on a server system in communication with the user device over a communication network.
Patent claim 13: The method of claim 1, wherein receiving the audio data comprises receiving, by a server system, audio data provided by the user device over a communication network.
Patent claim 14: The method of claim 13, wherein identifying the dialog state of the user device comprises identifying, by the server system, the dialog state of the user device based on additional data provided by the user device over the communication network.

Application claim 11: A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware causes the data processing hardware to perform operations comprising: receiving context data for a user device associated with a user: identifying an initial set of n-grams from the context data; receiving audio data corresponding to an utterance detected by the user device; processing, using a speech recognizer, the audio data to generate speech recognition candidates for the utterance spoken by the user, each speech recognition candidate associated with a respective speech recognition score; adjusting, using the initial set of n-grams, one or more of the speech recognition scores associated with the speech recognition candidates; and after adjusting the one or more speech recognition scores, determining a transcription of the utterance by selecting the speech recognition candidate that is associated with the highest respective speech recognition score.
Patent claim 17: A system comprising: one or more computers; and one or more computer-readable media storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: obtaining a dialog state of a user device associated with a user; before the user speaks an utterance to the user device: identifying, from an n-gram cache, an initial set of n-grams that represent one or more words or phrases corresponding to the dialog state of the user device; and biasing a language model on the initial set of n-grams to increase a likelihood of output of the one or more words or phrases corresponding to the dialog state of the user device; receiving audio data indicating the utterance of the user; processing the audio data using the biased language model to generate a transcription of the utterance; and providing the transcription of the utterance for output.

Application claim 12: The system of claim 11, wherein the initial set of n-grams are identified before the user speaks the utterance.

Application claim 13: The system of claim 11, wherein: the context data comprises an application identifier or dialog state identifier; and identifying the initial set of n-grams comprises retrieving data indicating one or more words or phrases corresponding to the application identifier or dialog state identifier.

Application claim 14: The system of claim 11, wherein identifying the initial set of n-grams from the context data comprises: identifying a first set of one or more n-grams from the context data; and generating an expanded set of one or more n-grams based at least on the first set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the first set of n-grams.

Application claim 15: The system of claim 11, wherein: the utterance is detected by the user device providing an interface to the user; and the context data comprises data that indicates a topic corresponding to the interface.

Application claim 16: The system of claim 11, wherein: the utterance is detected by the user device providing an interface to the user; and the context data comprises data indicating a task to be performed using the interface.

Application claim 17: The system of claim 11, wherein the context data indicates one or more phrases included in a graphical user interface of the user device.

Application claim 18: The system of claim 11, wherein the initial set of n-grams comprises one or more words or phrases displayed on a screen of the user device.

Application claim 19: The system of claim 11, wherein the initial set of n-grams are provided by an application running on the user device.

Application claim 20: The system of claim 11, wherein: the data processing hardware resides on the user device; or the data processing hardware resides on a server system in communication with the user device over a communication network.






Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
Biadsy et al. (US 2015/0228279 A1) teach methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using non-linguistic context. In some implementations, context data indicating non-linguistic context for the utterance is received. Based on the context data, feature scores for one or more non-linguistic features are generated. The feature scores for the non-linguistic features are provided to a language model trained to process scores for non-linguistic features. The output from the language model is received, and a transcription for the utterance is determined using the output of the language model.
Chelba et al. (US 8,175,878 B1) teach systems, methods, and apparatuses, including computer program products, for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, including receiving a collection of n-grams from the corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a tree representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
Zhou (US 2003/0149561 A1) teaches a spoken dialog system using a best-fit language model and a spoken dialog system using best-fit grammar. A spoken dialog system implementing both a best-fit language model and best-fit grammar is further disclosed. Regarding the language model, likelihood scores from a large vocabulary continuous speech recognition ("LVCSR") module are used to select the best-fit language model among a general task language model and dialog-state dependent language models. Based on the chosen language model, a dialog manager can implement different strategies to improve general dialog performance and recognition accuracy. Regarding grammar, the best-fit grammar method improves performance and user experience of dialog systems by choosing the best-fit grammar among a general purpose grammar and dialog-state dependent sub-grammars. Based on the selected grammar pattern, the dialog system can choose from varying dialog strategies, resulting in an increase in user acceptance of spoken dialog systems.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY B CHAWAN, whose telephone number is (571)272-7601. The examiner can normally be reached 7-5, Monday through Thursday.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/VIJAY B CHAWAN/
Primary Examiner, Art Unit 2658