Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending. Claims 1, 11 and 20 are independent and are similarly amended and have similar scope.
This Application was published as U.S. 2020/0134026.
Apparent priority 10/25/2018.
Pending Claims are allowed.

Claim 1, as amended, follows:
1. A computer-implemented method comprising: 
accessing an audio input stream that includes one or more words spoken by a speaking user in a first language and one or more words spoken by a further speaking user;
performing active noise cancellation on the one or more words in the audio input stream including generating a noise cancellation signal configured to substantially cancel out the audio input stream received from the speaking user;
applying the generated noise cancellation signal to the audio input stream to suppress the spoken words of the speaking user;
processing the audio input stream to identify the one or more words spoken by the speaking user;
determining that a first plurality of the words spoken by the speaking user are 
pausing active noise cancellation for the second plurality of the words that are spoken in the language understood by the listening user;
translating the first plurality of the identified words spoken by the speaking user into a second, different language;
generating spoken words in the second, different language using the translated words while performing active noise cancellation on the speaking user and on the further speaking user;
generating spoken words for the words spoken by the further speaking user that were suppressed by the active noise cancellation;
storing the generated spoken words for the further speaking user until the speaking user has stopped speaking for a specified amount of time; and
replaying the generated spoken words in the second language to the listening user, 
wherein the audio input stream provided to the listening user includes a mixture of original speech of the speaking user including the second plurality of words during which active noise cancellation is paused and the generated spoken words replayed in the second language, and 
wherein the stored generated spoken words for the further speaking user are sequentially played back to the listening user after the speaking user has stopped speaking for the specified amount of time.
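
For illustration only, the cancellation, suppression and pausing steps recited in the Claim above can be sketched as follows. This is a minimal sketch, assuming an idealized anti-phase signal computed on already-digitized samples. The function names `make_cancellation_signal` and `apply_cancellation` and the per-sample `paused` mask are hypothetical and are not taken from the Application.

```python
from typing import List


def make_cancellation_signal(samples: List[float]) -> List[float]:
    """Return an anti-phase copy of the input (assumption: perfect inversion)."""
    return [-s for s in samples]


def apply_cancellation(samples: List[float],
                       cancellation: List[float],
                       paused: List[bool]) -> List[float]:
    """Sum the input and the cancellation signal, except where cancellation is paused.

    Where paused is True the original sample passes through unchanged, which
    models pausing active noise cancellation for words the listener understands.
    """
    if not (len(samples) == len(cancellation) == len(paused)):
        raise ValueError("all inputs must have the same length")
    return [s if p else s + c
            for s, c, p in zip(samples, cancellation, paused)]


# Illustrative values only: the last two samples stand for words in the
# language understood by the listening user, so cancellation is paused there.
audio = [0.5, -0.25, 0.75, 0.125]
paused = [False, False, True, True]
residual = apply_cancellation(audio, make_cancellation_signal(audio), paused)
print(residual)
```

In this sketch the first two samples are suppressed and the last two reach the listening user as original speech. A real system would have to estimate the cancellation signal in real time from microphone input, which is not modeled here.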



[Image: media_image1.png]

 [0085] Computing environment 1100 of FIG. 11 illustrates an example in which multiple speaking users are speaking at the same time. Each speaking user (e.g., 1105 or 1107) may provide an audio input stream (e.g., 1104 or 1106, respectively) that includes words spoken by the two different users. A speech differentiation module 1103 (which may be part of the personalization engine 600 of FIG. 6 and/or part of the computer system 401 of FIG. 4) may then differentiate between the two speaking users 1105 and 1107 according to different vocal patterns or other vocal characteristics. The speech differentiation module 1103 may then generate spoken words in speech output 1102 for one speaking user (e.g., 1105), while performing active noise cancellation on both speaking users. In this manner, the listening user 1101 (who does not understand the language spoken by the two users) can still 
[0086] In some embodiments, the audio input stream 1106 from the other speaking user 1107 may be stored in a data store. This stored audio stream may then be parsed and translated. Then, when speaking user 1105 is finished speaking, the speech differentiation module 1103 may cause speaking user 1107's stored and translated words to be converted into spoken words. In some cases, the speech differentiation module 1103 may run according to policy indicating that, if two (or more) speaking users are speaking, one speaker will be chosen (perhaps based on eye tracking information to see which speaking user the listening user is looking at), and words from other speaking users will be stored. Then, once the speech differentiation module 1103 determines that the first speaking user has stopped talking for a specified amount of time, the generated spoken words for the other speaking user(s) will be played back sequentially to the listening user. In some cases, the policy may favor certain speaking users based on their identity. Thus, even if the listening user 1101 is in a crowd of people, the systems herein may focus on a single speaking user (based on that user's vocal characteristics, for example) or set of users and record audio from those users. This audio may then be converted to text, translated, converted back to speech and played back to the listening user 1101.
According to the above Disclosure, when both speakers are talking at the same time, such as in a conference setting, the system has to use a method (such as the described voiceprint) to distinguish the portion of the audio stream spoken by one speaker from the portion spoken by the other speaker (diarization). Some type of priority assignment is then performed (e.g., by gaze detection or according to speaker identity) to determine which speaker's speech is played out first, to be followed by the speech of the other speaker.
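
The store-and-replay policy described in paragraph [0086] above can be sketched as follows. This is a minimal sketch under stated assumptions: the priority speaker is supplied as an input (gaze detection and identity-based selection are not modeled), speech arrives as timestamped events, and the end of the stream is assumed to be followed by a sufficiently long pause. The names `playback_order`, `Event` and `pause_threshold` are hypothetical and do not appear in the Application.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Event = Tuple[float, str, str]   # (time in seconds, speaker id, generated words)
Played = Tuple[str, str]         # (speaker id, generated words)


def playback_order(events: List[Event],
                   priority_speaker: str,
                   pause_threshold: float) -> List[Played]:
    """Play the priority speaker immediately and hold all other speakers until
    the priority speaker has been silent for at least pause_threshold seconds."""
    order: List[Played] = []
    held: Deque[Played] = deque()
    last_priority_time: Optional[float] = None

    for time, speaker, words in sorted(events, key=lambda e: e[0]):
        # The priority speaker counts as silent if they have not spoken yet
        # or if their last words are at least pause_threshold seconds old.
        silent_long_enough = (last_priority_time is None
                              or time - last_priority_time >= pause_threshold)
        if speaker != priority_speaker:
            held.append((speaker, words))
        if silent_long_enough:
            # Release stored speech sequentially, oldest first.
            while held:
                order.append(held.popleft())
        if speaker == priority_speaker:
            order.append((speaker, words))
            last_priority_time = time

    # Assumption: the stream ends with a pause longer than pause_threshold.
    while held:
        order.append(held.popleft())
    return order


# Illustrative events only; the reference numerals are reused as speaker ids.
events = [
    (0.0, "1105", "hello"),
    (0.5, "1107", "hi"),
    (1.0, "1105", "world"),
    (1.2, "1107", "there"),
    (4.0, "1107", "later"),
]
order = playback_order(events, priority_speaker="1105", pause_threshold=2.0)
print(order)
```

Here the words of speaker 1107 are held while speaker 1105 is talking and are released in their original order once speaker 1105 has been silent for two seconds.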
Allowable Subject Matter
Pending Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance: In view of each of the particular limitations of the independent Claims when considered in the order established by the Claim language and in the context of the language of the independent Claims when each Claim is considered as a whole, the independent Claims of this Application were not found in the prior art that was viewed.
In particular, while the building blocks of the independent Claims are in the prior art, the method and systems claimed in the independent Claims as a whole, including all of the limitations, were not found in the prior art. The claimed method and system include receiving an audio stream that includes a mixture of speech of two speakers; generating a noise cancellation signal that cancels all of the input speech from both speakers; identifying the words spoken by each speaker; and determining that one of the speakers (the first speaker) speaks in a language that is foreign to the listener while the other (the second speaker) speaks the native tongue of the listener. The claimed method and system further identify that the first speaker, who speaks in a foreign language, has spoken a mixture of words, some of which are native and some foreign. The speech of the second speaker, which is in the native language of the listener, is recorded as is, whereas the speech of the first speaker, who speaks in the foreign tongue with native words mixed in, is translated and synthesized into speech that is again a mixture of translated language and original words spoken by the first speaker in the native tongue of the listener. Both sets of generated audio are stored: the mixed audio of the first speaker (directly recorded or translated and synthesized) and the purely recorded speech of the second speaker. The translated and speech-synthesized speech corresponding to the input speech of the first speaker is played to the listener, mixed with the words of the first speaker that passed straight through because they did not need translation. The system then waits for a pause of predetermined duration to occur in the speech of the first speaker before providing the directly recorded speech of the second speaker (the lower priority speaker, determined by this Claim to be the speaker who speaks the native tongue of the listener) to the listener in a sequential manner, following the mix of recorded and translated/synthesized speech corresponding to the speech of the first speaker. According to the current language of the Claim, the speech of the first user is mixed foreign and native and the speech of the second user is purely native.
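
The handling of the first speaker's mixed-language speech described above can be sketched as follows. This is a minimal sketch, assuming each identified word already carries a language tag and using a hypothetical two-entry lexicon in place of a real translation and speech synthesis step. The names `route_words` and `TOY_LEXICON` and the tags `"original"` and `"synthesized"` are illustrative only and are not taken from the Application or the art of record.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (Spanish to English) standing in for translation.
TOY_LEXICON: Dict[str, str] = {"hola": "hello", "amigo": "friend"}


def route_words(words: List[Tuple[str, str]],
                listener_language: str) -> List[Tuple[str, str]]:
    """Build the mixed output for one speaker.

    words holds (word, language tag) pairs. Words already in the listener's
    language pass through as original speech; all other words are replaced by
    a translated word and marked as synthesized.
    """
    mixed: List[Tuple[str, str]] = []
    for word, language in words:
        if language == listener_language:
            mixed.append((word, "original"))
        else:
            # Assumption: words missing from the toy lexicon are left unchanged.
            translated = TOY_LEXICON.get(word.lower(), word)
            mixed.append((translated, "synthesized"))
    return mixed


# Illustrative utterance mixing foreign ("es") and native ("en") words.
utterance = [("hola", "es"), ("my", "en"), ("amigo", "es")]
mixed_output = route_words(utterance, listener_language="en")
print(mixed_output)
```

The result interleaves synthesized translations with the speaker's own untranslated words, which corresponds to the mixture of original and generated speech recited in the wherein clause of Claim 1.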
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Close Art of Record
Refer to the extensive discussion of prior art in the previous Office action.  Goyal (U.S. 9,961,435) and Laskey (U.S. 10,791,404) remain the closest pieces of art, and Goyal is close and comprehensive.  For example, of the material added by amendment, Goyal in Figure 4B teaches “ambient conversation” as one of the interfering parts of the audio that enters the headset.  Further, the loop in Figure 3 teaches ordering or sequencing the received sounds provided to the listener according to the semantic relevance of each sound.  This still does not teach looking for a pause of a specific duration in the speech of one speaker and then beginning the output of the speech of the other speaker.
Goyal:

[Image: media_image2.png]
[Image: media_image3.png]



[Image: media_image4.png]

Laskey:

[Image: media_image5.png]

Additionally, note the following references regarding a mixture of words in different languages where some words are translated and others are not and the ones that are not translated are passed through as is:  Almagro (U.S. 20120029912), Gabryjelski (U.S. 20200058289), Gildein (U.S. 20170076713), and Sakamoto (U.S. 20200073889).
Gildein (U.S. 20170076713):

[Image: media_image6.png]

Gabryjelski (U.S. 20200058389):
[Image: media_image7.png]


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499.  The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659