DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application is being examined under the pre-AIA first to invent provisions.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1, 3, 5-6, 11, 13, 15-16, and 19-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4, 14-16, 19, and 28-30 of U.S. Patent No. 10,872,600. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1, 3, 5-6, 11, 13, 15-16, and 19-20 are similar in scope and content to patented claims 1, 4, 14-16, 19, and 28-30 of the patent issued to the same Applicant.
It is clear that all the elements of application claims 1, 3, 5-6, 11, 13, 15-16, and 19-20 are to be found in patented claims 1, 4, 14-16, 19, and 28-30 (as application claims 1, 3, 5-6, 11, 13, 15-16, and 19-20 fully encompass patented claims 1, 4, 14-16, 19, and 28-30). The difference between the application claims and the patent claims lies in the fact that the patent claims include many more elements and are thus much more specific. Thus the invention of claims 1, 4, 14-16, 19, and 28-30 of the patent is in effect a “species” of the “generic” invention of application claims 1, 3, 5-6, 11, 13, 15-16, and 19-20. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1, 3, 5-6, 11, 13, 15-16, and 19-20 are anticipated by claims 1, 4, 14-16, 19, and 28-30 of the patent, they are not patentably distinct from the patented claims.

Application No: 17/101,946
Patent No: 10,872,600
1. A method comprising: receiving, at data processing hardware, from a computing device associated with a user, first audio data and second audio data captured by the computing device; processing, by the data processing hardware, the first audio data to identify a concept associated with the first audio data; influencing, by the data processing hardware, a speech recognition language model based on the identified concept associated with the first audio data; and generating, by the data processing hardware, using the influenced speech recognition language model, a textual representation of the second audio data.
1. A method comprising: receiving, at a voice recognition system, first audio data and second audio data from a computing device associated with a user, the computing device configured to: capture an audio stream comprising the first audio data and the second audio data; separate the first audio data and the second audio data from the captured audio stream; and provide the first audio data and the second audio data to the voice recognition system; processing, by the voice recognition system, the second audio data to generate at least one term associated with the second audio data; influencing, by the voice recognition system, a speech recognition model based on the at least one term associated with the second audio data; and after influencing the speech recognition model, transcribing, by the voice recognition system, the first audio data into a textual representation using the speech recognition model.
2. The method of claim 1, wherein the computing device captures the first audio data before capturing the second audio data.

3. The method of claim 1, further comprising: generating, by the data processing hardware, a set of terms related to the identified concept, wherein influencing the speech recognition model based on the identified concept comprises adjusting a probability or relevance score associated with the speech recognition language model recognizing at least one term in the set of terms related to the identified concept.
4. The method of claim 1, wherein the computing device is configured to separate the first audio data and the second audio data from the captured audio stream by separating the captured audio stream into a first substream and a second substream, the first substream corresponding to the first audio data and the second substream isolated from the first substream and corresponding to the second audio data.
4. The method of claim 3, wherein generating the set of terms related to the identified concept comprises querying a conceptual expansion database for the set of terms using the identified concept associated with the first audio data.

5. The method of claim 3, further comprising: generating, by the data processing hardware, conceptual bias data using set of terms related to the identified concept associated with the first audio data; and adjusting, by the data processing hardware, the probability or relevance score associated with the speech recognition language model recognizing the at least one term in the set of terms related to the identified concept based on the conceptual bias data.
14. The method of claim 13, further comprising: generating, by the voice recognition system, conceptual bias data using the at least one term associated with the second audio data; and adjusting, by the voice recognition system, the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the second audio data based on the conceptual bias data.
6. The method of claim 5, wherein generating the textual representation of the second audio data using the influenced speech recognition language model comprises, selecting, by the influenced speech recognition language model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual bias data to weigh a statistical selection of the textual representation from the set of textual representations.
15. The method of claim 14, wherein transcribing the first audio data into the textual representation using the speech recognition model comprises, selecting, by the speech recognition model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual bias data to weigh a statistical selection of the textual representation from the set of textual representations.
7. The method of claim 1, wherein the second audio data corresponds to an utterance spoken by the user of the computing device.

8. The method of claim 1, wherein: the data processing hardware resides on a server in communication with the computing device; and the computing device is configured to transmit the first audio data and the second audio data over a communication channel to the data processing hardware residing on the server.

9. The method of claim 1, further comprising transmitting, by the data processing hardware, the textual representation of the second audio data to the computing device, the textual representation when received by the computing device causing the computing device to perform a particular task based on the textual representation.

10. The method of claim 1, further comprising transmitting, by the data processing hardware, the textual representation of the second audio data to the computing device, the textual representation when received by the computing device causing the computing device to display the textual representation on a display.

11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving, from a computing device associated with a user, first audio data and second audio data captured by the computing device; processing the first audio data to identify a concept associated with the first audio data; influencing a speech recognition language model based on the identified concept associated with the first audio data; and generating, using the influenced speech recognition language model, a textual representation of the second audio data.
16. A voice recognition system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving first audio data and second audio data from a computing device associated with a user, the computing device configured to: capture an audio stream comprising the first audio data and the second audio data; separate the first audio data and the second audio data from the captured audio stream; and provide the first audio data and the second audio data to the voice recognition system; processing the second audio data to generate at least one term associated with the second audio data; influencing a speech recognition model based on the at least one term associated with the second audio data; and after influencing the speech recognition model, transcribing the first audio data into a textual representation using the speech recognition model.
12. The system of claim 11, wherein the computing device captures the first audio data before capturing the second audio data.

13. The system of claim 11, wherein the operations further comprise: generating a set of terms related to the identified concept, wherein influencing the speech recognition model based on the identified concept comprises adjusting a probability or relevance score associated with the speech recognition language model recognizing at least one term in the set of terms related to the identified concept.
19. The voice recognition system of claim 16, wherein the computing device is configured to separate the first audio data and the second audio data from the captured audio stream by separating the captured audio stream into a first substream and a second substream, the first substream corresponding to the first audio data and the second substream isolated from the first substream and corresponding to the second audio data.
14. The system of claim 13, wherein generating the set of terms related to the identified concept comprises querying a conceptual expansion database for the set of terms using the identified concept associated with the first audio data.

15. The system of claim 13, wherein the operations further comprise: generating conceptual bias data using set of terms related to the identified concept associated with the first audio data; and adjusting the probability or relevance score associated with the speech recognition language model recognizing the at least one term in the set of terms related to the identified concept based on the conceptual bias data.
28. The voice recognition system of claim 16, wherein influencing the speech recognition model based on the at least one term associated with the second audio data comprises adjusting a probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the second audio data.
29. The voice recognition system of claim 28, wherein the operations further comprise: generating conceptual bias data using the at least one term associated with the second audio data; and adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the second audio data based on the conceptual bias data.
16. The system of claim 15, wherein generating the textual representation of the second audio data using the influenced speech recognition language model comprises, selecting, by the influenced speech recognition language model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual bias data to weigh a statistical selection of the textual representation from the set of textual representations.
30. The voice recognition system of claim 29, wherein transcribing the first audio data into the textual representation comprises, selecting, by the speech recognition model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual bias data to weigh a statistical selection of the textual representation from the set of textual representations.

17. The system of claim 11, wherein the second audio data corresponds to an utterance spoken by the user of the computing device.

18. The system of claim 11, wherein: the data processing hardware resides on a server in communication with the computing device; and the computing device is configured to transmit the first audio data and the second audio data over a communication channel to the data processing hardware residing on the server.

19. The system of claim 11, wherein the operations further comprise transmitting the textual representation of the second audio data to the computing device, the textual representation when received by the computing device causing the computing device to perform a particular task based on the textual representation.
30. The voice recognition system of claim 29, wherein transcribing the first audio data into the textual representation comprises, selecting, by the speech recognition model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual bias data to weigh a statistical selection of the textual representation from the set of textual representations.
20. The system of claim 11, wherein the operations further comprise transmitting the textual representation of the second audio data to the computing device, the textual representation when received by the computing device causing the computing device to display the textual representation on a display.
30. The voice recognition system of claim 29, wherein transcribing the first audio data into the textual representation comprises, selecting, by the speech recognition model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual bias data to weigh a statistical selection of the textual representation from the set of textual representations.


Claims 1-6, 11-12, and 15-16 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 5, 7-10, 12, and 14 of U.S. Patent No. 10,224,024. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-6, 11-12, and 15-16 are similar in scope and content to patented claims 1-3, 5, 7-10, 12, and 14 of the patent issued to the same Applicant.
It is clear that all the elements of application claims 1-6, 11-12, and 15-16 are to be found in patented claims 1-3, 5, 7-10, 12, and 14 (as application claims 1-6, 11-12, and 15-16 fully encompass patented claims 1-3, 5, 7-10, 12, and 14). The difference between the application claims and the patent claims lies in the fact that the patent claims include many more elements and are thus much more specific. Thus the invention of claims 1-3, 5, 7-10, 12, and 14 of the patent is in effect a “species” of the “generic” invention of application claims 1-6, 11-12, and 15-16. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1-6, 11-12, and 15-16 are anticipated by claims 1-3, 5, 7-10, 12, and 14 of the patent, they are not patentably distinct from the patented claims.

Application No: 17/101,946
Patent No: 10,224,024
1. A method comprising: receiving, at data processing hardware, from a computing device associated with a user, first audio data and second audio data captured by the computing device; processing, by the data processing hardware, the first audio data to identify a concept associated with the first audio data; influencing, by the data processing hardware, a speech recognition language model based on the identified concept associated with the first audio data; and generating, by the data processing hardware, using the influenced speech recognition language model, a textual representation of the second audio data.
1. A computer-implemented method comprising: receiving an audio stream containing first audio data and second audio data, the first audio data corresponding to an utterance spoken by a user and the second audio data corresponding to background audio associated with playback of an item of media content; processing the second audio data to generate at least one term associated with the background audio; adjusting a probability or relevance score associated with a speech recognition model recognizing the at least one term associated with the background audio in the first audio data; after adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio, transcribing the first audio data into a textual representation of the utterance using the speech recognition model; and transmitting the textual representation of the utterance to a computing device associated with the user, the textual representation of the utterance when received by the computing device causing the computing device to at least one of display the textual representation on a display or perform a particular task based on the textual representation.
2. The method of claim 1, wherein the computing device captures the first audio data before capturing the second audio data.
2. The computer-implemented method of claim 1, wherein processing the second audio data to generate the at least one term comprises: identifying one or more concepts from audio features of the second audio data; and generating a set of terms related to the identified one or more concepts.
3. The method of claim 1, further comprising: generating, by the data processing hardware, a set of terms related to the identified concept, wherein influencing the speech recognition model based on the identified concept comprises adjusting a probability or relevance score associated with the speech recognition language model recognizing at least one term in the set of terms related to the identified concept.

4. The method of claim 3, wherein generating the set of terms related to the identified concept comprises querying a conceptual expansion database for the set of terms using the identified concept associated with the first audio data.
3. The computer-implemented method of claim 2, wherein generating the set of terms related to the identified one or more concepts comprises querying a conceptual expansion database for the set of terms using the one or more concepts identified from the audio features of the second audio data.
5. The method of claim 3, further comprising: generating, by the data processing hardware, conceptual bias data using set of terms related to the identified concept associated with the first audio data; and adjusting, by the data processing hardware, the probability or relevance score associated with the speech recognition language model recognizing the at least one term in the set of terms related to the identified concept based on the conceptual bias data.
5. The computer-implemented method of claim 2, further comprising: generating conceptual bias data using the at least one term associated with the background audio; and adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio in the first audio data based on the conceptual bias data.
6. The method of claim 5, wherein generating the textual representation of the second audio data using the influenced speech recognition language model comprises, selecting, by the influenced speech recognition language model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual bias data to weigh a statistical selection of the textual representation from the set of textual representations.
7. The computer-implemented method of claim 5, wherein transcribing the first audio data into the textual representation of the utterance comprises selecting, by the speech recognition model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual data to weigh a statistical selection of the textual representation from the set of textual representations.
7. The method of claim 1, wherein the second audio data corresponds to an utterance spoken by the user of the computing device.

8. The method of claim 1, wherein: the data processing hardware resides on a server in communication with the computing device; and the computing device is configured to transmit the first audio data and the second audio data over a communication channel to the data processing hardware residing on the server.

9. The method of claim 1, further comprising transmitting, by the data processing hardware, the textual representation of the second audio data to the computing device, the textual representation when received by the computing device causing the computing device to perform a particular task based on the textual representation.

10. The method of claim 1, further comprising transmitting, by the data processing hardware, the textual representation of the second audio data to the computing device, the textual representation when received by the computing device causing the computing device to display the textual representation on a display.

11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions, that when executed by the data processing hardware, cause the data processing hardware to perform operations comprising: receiving, from a computing device associated with a user, first audio data and second audio data captured by the computing device; processing the first audio data to identify a concept associated with the first audio data; influencing a speech recognition language model based on the identified concept associated with the first audio data; and generating, using the influenced speech recognition language model, a textual representation of the second audio data.
8. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an audio stream containing first audio data and second audio data, the first audio data corresponding to an utterance spoken by a user and the second audio data corresponding to background audio associated with playback of an item of media content; processing the second audio data to generate at least one term associated with the background audio; adjusting a probability or relevance score associated with a speech recognition model recognizing the at least one term associated with the background audio in the first audio data; after adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio, transcribing the first audio data into a textual representation of the utterance using the speech recognition model; and transmitting the textual representation of the utterance to a computing device associated with the user, the textual representation of the utterance when received by the computing device causing the computing device to at least one of display the textual representation on a display or perform a particular task based on the textual representation.
12. The system of claim 11, wherein the computing device captures the first audio data before capturing the second audio data.
9. The system of claim 8, wherein processing the second audio data to generate the at least one term comprises: identifying one or more concepts from audio features of the second audio data; and generating a set of terms related to the identified one or more concepts.
13. The system of claim 11, wherein the operations further comprise: generating a set of terms related to the identified concept, wherein influencing the speech recognition model based on the identified concept comprises adjusting a probability or relevance score associated with the speech recognition language model recognizing at least one term in the set of terms related to the identified concept.

14. The system of claim 13, wherein generating the set of terms related to the identified concept comprises querying a conceptual expansion database for the set of terms using the identified concept associated with the first audio data.

15. The system of claim 13, wherein the operations further comprise: generating conceptual bias data using set of terms related to the identified concept associated with the first audio data; and adjusting the probability or relevance score associated with the speech recognition language model recognizing the at least one term in the set of terms related to the identified concept based on the conceptual bias data.
10. The system of claim 9, wherein generating the set of terms related to the identified one or more concepts comprises querying a conceptual expansion database for the set of terms using the one or more concepts identified from the audio features of the second audio data.
12. The system of claim 9, wherein the operations further comprise: generating conceptual bias data using the at least one term associated with the background audio; and adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio in the first audio data based on the conceptual bias data.

16. The system of claim 15, wherein generating the textual representation of the second audio data using the influenced speech recognition language model comprises, selecting, by the influenced speech recognition language model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual bias data to weigh a statistical selection of the textual representation from the set of textual representations.
14. The system of claim 12, wherein transcribing the first audio data into the textual representation of the utterance comprises selecting, by the speech recognition model, the textual representation from a set of textual representations that have substantially similar frequencies of occurrence in a particular language by using the conceptual data to weigh a statistical selection of the textual representation from the set of textual representations.
17. The system of claim 11, wherein the second audio data corresponds to an utterance spoken by the user of the computing device.

18. The system of claim 11, wherein: the data processing hardware resides on a server in communication with the computing device; and the computing device is configured to transmit the first audio data and the second audio data over a communication channel to the data processing hardware residing on the server.

19. The system of claim 11, wherein the operations further comprise transmitting the textual representation of the second audio data to the computing device, the textual representation when received by the computing device causing the computing device to perform a particular task based on the textual representation.

20. The system of claim 11, wherein the operations further comprise transmitting the textual representation of the second audio data to the computing device, the textual representation when received by the computing device causing the computing device to display the textual representation on a display.




Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
VanLund et al. (US 9,552,816 B2) teach a speech-based system which includes an audio device in a user premises and a network-based service that supports use of the audio device by multiple applications. The audio device may be directed to play audio content such as music, audio books, etc. The audio device may also be directed to interact with a user through speech. The network-based service monitors event messages received from the audio device to determine which of the multiple applications currently has speech focus. When receiving speech from a user, the service first offers the corresponding meaning to the application, if any, that currently has primary speech focus. If there is no application that currently has primary speech focus, or if the application having primary speech focus is not able to respond to the meaning, the service then offers the user meaning to the application that currently has secondary speech focus.
Weinstein et al. (US 2015/0039299 A1) teach a processing system that receives an audio signal encoding a portion of an utterance. The processing system receives context information associated with the utterance, wherein the context information is not derived from the audio signal or any other audio signal. The processing system provides, as input to a neural network, data corresponding to the audio signal and the context information, and generates a transcription for the utterance based on at least an output of the neural network.
Gruber et al. (US 2012/0265528 A1) teach a virtual assistant that uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY B CHAWAN, whose telephone number is (571) 272-7601. The examiner can normally be reached 7-5, Monday through Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/VIJAY B CHAWAN/
Primary Examiner, Art Unit 2658