DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wexler et al. (WO 2020/079485 A2).

Regarding Claim 1, Wexler discloses a wearable device (Figs. 1-52) for processing audio signals, comprising:
a microphone (Wexler Fig. 29: microphone in wearable device 2931 worn by user 2901) configured to capture sounds from an environment of a user of the wearable device (Wexler ¶0320 discloses wearable device 2931 includes at least one microphone configured to receive one or more audio signals from the environment of user 2901; claim 1); and
at least one processor (Wexler Figs. 5A-5C; ¶0144-¶0166) programmed to:
receive first audio signals, wherein the first audio signals are representative of sounds captured by the microphone during a first time period during which the user is in a location (Wexler ¶0317 discloses user 2901 [Fig. 29] sits by one side of a table [i.e., location]. ¶0420 discloses the processor can be configured to detect, based on analysis of the audio signals, a first audio signal associated with a first time period, wherein the first audio signal is representative of a voice of a single individual. ¶0543 discloses Fig. 44B illustrates audio signals 4416, which include first audio signals 4410B from first sound-emanating object 4410A [e.g., a computer with speakers], acquired by wearable microphone 4404 [wearable by user 100] during a time period T. ¶0542 discloses Fig. 44A where user 100 is working at his desk [a location]);
obtain an audio segment from the first audio signals, wherein the audio segment includes a portion of the first audio signals in which an individual is speaking (Wexler ¶0320 discloses wearable device 2931 [Fig. 29] can include at least one microphone configured to receive one or more audio signals from the environment of user 2901. For example, the microphone can be configured to receive [or detect] an audio signal associated with the first individual 2911. ¶0485 discloses processor 3803 can transmit the amplified first audio signal as long as the first individual keeps speaking. Processor 3803 can transmit the amplified first audio even if other voices or sounds are captured by microphone 3802, whether recognized or not, in order to let user 100 continuously listen to the first individual);
generate a voice print of the individual using at least the audio segment (Wexler ¶0319 discloses wearable device 2931 can automatically identify an individual based on voice recognition [e.g., the voice print of the individual] associated with an audio signal detected. ¶0442 discloses voiceprint handling module 3666 [Fig. 36B] can be used to generate, store, or retrieve a voiceprint, using, for example, wavelet transform or any other attributes of the voice of one or more persons);
receive second audio signals (Wexler ¶0320 discloses the microphone can be configured to receive [or detect] an audio signal associated with the second individual 2912) representative of additional sounds captured by the microphone, wherein the additional sounds include sounds made by the individual (Wexler ¶0320 discloses the microphone can also be configured to receive [or detect] an audio signal associated with the speakerphone 2921 [e.g., the voice of a third individual participating in the conference through the speakerphone 2921]. The microphone can be configured to receive [or detect] an audio signal associated with audio such as background noise), and
wherein the second audio signals are at least one of (a) audio signals captured by the microphone within a predetermined time period after the first time period (Wexler ¶0420 discloses in addition, the processor can be configured to detect, based on analysis of the audio signals, a second audio signal associated with a second time period, wherein the second time period is different from the first time period, and wherein the second audio signal is representative of overlapping voices of two or more individuals), or (b) audio signals captured by the microphone while the user is in the location (Wexler ¶0317 discloses user 2901 [Fig. 29] sits by one side of a table [i.e., location]. ¶0320 discloses the second individual 2912 sits by another side of the table [i.e., location]); and
process the second audio signals using the generated voice print (Wexler ¶0442 discloses voiceprint handling module 3666 [Fig. 36B] can use any suitable algorithm to determine whether an audio signal comprises speech, as well as to determine whether there is one speaker or multiple speakers participating in a conversation. A voiceprint can then be extracted from single speech audio using a neural network. Information obtained using voiceprint handling module 3666 can be transmitted to speaker identification module 3654, and module 3654 can associate the identified speaker with the voiceprint. ¶0443 discloses speaker separation module 3658 can receive noisy audio captured by the device and voiceprints of one or more speakers, and separate one or more voices for one or more speakers. When no voiceprint is available for the speaker, matching the voice with a specific speaker can be performed in accordance with the captured images, for example by matching identified words with the lip movement of the speaker, by matching speaking and silent periods, or the like).
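
For illustration only, the alternative conditions (a) and (b) recited in Claim 1 for the second audio signals can be sketched as a simple predicate. This is a minimal sketch of the claim language, not of any implementation disclosed in Wexler or in the application. The names `AudioCapture`, `is_second_audio`, `start_s`, and `location_id` are hypothetical, and the 600-second default merely borrows the 10-minute value recited in Claim 10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioCapture:
    """Hypothetical record describing one captured audio signal."""
    start_s: float    # capture start time, in seconds
    location_id: str  # assumed label for where the user was during capture


def is_second_audio(capture: AudioCapture,
                    first_end_s: float,
                    first_location_id: str,
                    window_s: float = 600.0) -> bool:
    """Return True if the capture meets condition (a) or condition (b)."""
    # (a) captured within a predetermined time period after the first time period
    elapsed = capture.start_s - first_end_s
    within_window = 0.0 <= elapsed <= window_s
    # (b) captured while the user is in the same location as the first time period
    same_location = capture.location_id == first_location_id
    return within_window or same_location
```

Because the claim recites "at least one of" (a) or (b), the sketch returns true when either condition holds.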

Regarding Claim 2, Wexler discloses the wearable device of claim 1,
wherein the at least one processor is configured to obtain the audio segment by selecting a portion of the received first audio signals where no other individual other than the individual is speaking (Wexler ¶0190 discloses at least one keyword can be determined based on at least one or more audio segments captured by apparatus 110. ¶0238 discloses a voice print can be extracted from a segment of a conversation in which an individual speaks alone, and then used for separating the individual's voice later in the conversation, whether the individual is recognized or not).
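
For illustration only, the selection recited in Claim 2 can be sketched as a search for the longest stretch of audio in which only the individual is speaking. This is a minimal sketch under stated assumptions: it presumes that some upstream step (not shown, and not taken from Wexler) has already labeled each audio frame with the set of active speakers. The function name `longest_solo_run` and the frame representation are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def longest_solo_run(active_speakers: List[Set[str]],
                     target: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of the longest
    contiguous run in which `target` is the only active speaker.
    Returns None if no such frame exists. Ties go to the earliest run."""
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    # A trailing empty frame acts as a sentinel that closes any open run.
    for i, speakers in enumerate(active_speakers + [set()]):
        if speakers == {target}:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best
```

The returned index range identifies the portion of the first audio signals from which a voice print could then be generated.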

Regarding Claim 3, Wexler discloses the wearable device of claim 1,
wherein the at least one processor is further programmed to retrieve a prior voice print of the individual stored in a database (Wexler ¶0409 discloses at step 3516 [Fig. 35A], an audio signal can be received by a microphone of the hearing aid system [e.g., an audio signal of user 100 communicating with another individual]. The audio signal can be further analyzed by a processor of the hearing aid system. Alternatively, the processor can be configured to determine if the captured audio signal corresponds to a recognized individual. For example, the processor can be configured to retrieve the recognized individual's voiceprint from a storage device based on the person's identity. For example, the voiceprint can be retrieved from the database of server 250).

Regarding Claim 4, Wexler discloses the wearable device of claim 3,
wherein the at least one processor is programmed to generate the voice print of the individual using the obtained audio segment and the retrieved prior voice print (Wexler ¶0527 discloses in step 4306 [Fig. 43A], the processing device can determine an audioprint from the isolated audio stream. The determined audioprint can be a voiceprint associated with an individual).

Regarding Claim 5, Wexler discloses the wearable device of claim 3,
wherein the at least one processor is further programmed to store the generated voice print in the database in association with the prior voice print (Wexler ¶0442 discloses voiceprint handling module 3666 [Fig. 36B] can be used to generate, store, or retrieve a voiceprint, using, for example, wavelet transform or any other attributes of the voice of one or more persons).

Regarding Claim 6, Wexler discloses the wearable device of claim 3,
wherein the at least one processor is further programmed to replace the prior voice print stored in the database with the generated voice print when at least one attribute of the generated voice print is better in quality than at least one attribute of the prior voice print (Wexler ¶0577 discloses processor 4703 can be configured to determine a confidence score associated with the voiceprint match. For example, the confidence score can be based on the degree to which the voiceprint detected in the audio signals matches voiceprint data stored in database 4705 for a given object).
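
For illustration only, the conditional replacement recited in Claim 6 can be sketched as follows. This is a minimal sketch of the claim language, not of Wexler's confidence-score mechanism or of the applicant's implementation. The `VoicePrint` type, the `snr_db` field used as the quality attribute, and the function `store_if_better` are all hypothetical, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VoicePrint:
    """Hypothetical voice print with a single quality attribute."""
    embedding: Tuple[float, ...]  # assumed feature vector
    snr_db: float                 # assumed quality attribute (higher is better)


def store_if_better(db: Dict[str, VoicePrint],
                    person_id: str,
                    candidate: VoicePrint) -> bool:
    """Store `candidate` if no prior voice print exists for `person_id`
    or if the candidate's quality attribute exceeds the prior's.
    Returns True when the database was updated."""
    prior = db.get(person_id)
    if prior is None or candidate.snr_db > prior.snr_db:
        db[person_id] = candidate
        return True
    return False
```

A comparison on a single attribute is used here because the claim requires only that "at least one attribute" of the generated voice print be better in quality.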

Regarding Claim 7, Wexler discloses the wearable device of claim 1,
wherein the at least one processor is further programmed to store the generated voice print in the database in association with an identifier of the location of the user (Wexler ¶0396 discloses a speaker's voiceprint and a high-quality voiceprint can provide for fast and efficient speaker separation. A high-quality voiceprint for a speaker can be collected, for example, when the speaker speaks alone, preferably in a quiet environment. With a voiceprint of one or more speakers, a processor of the hearing aid system can separate an ongoing voice signal almost in real time, e.g., with a minimal delay, using a sliding time window. Different time windows can be selected, depending on the quality of the voiceprint, on the quality of the captured audio, the difference in characteristics between the speaker and other speaker(s), the available processing resources, the required separation quality, or the like. ¶0442 discloses voiceprint handling module 3666 [Fig. 36B] can be used to generate, store, or retrieve a voiceprint, using, for example, wavelet transform or any other attributes of the voice of one or more persons).

Regarding Claim 8, Wexler discloses the wearable device of claim 7, further comprising
an image sensor configured to capture one or more images from the environment of the user (Wexler ¶0123 discloses apparatus 110 can include an image sensor system 220 for capturing real-time image data of the field-of-view of user 100),
wherein the at least one processor is further programmed to receive an image including a representation of the location from the image sensor (Wexler ¶0123 discloses apparatus 110 can also include a processing unit 210 for controlling and performing the functionality of apparatus 110, such as to control the capture of image data, analyze the image data, and perform an action and/or output a feedback based on a hand-related trigger identified in the image data), and
retrieve a prior voice print of the individual stored in the database based on the representation of the location in the received image (Wexler ¶0377 discloses server 250 can receive images and audio information [e.g., voiceprints] for various individuals from a variety of sources. For example, Fig. 32 shows server 250 receiving images 3211 and audio data 3212 from user 100 wearing apparatus 110. ¶0392 discloses a voiceprint of a speaker can be obtained using an audio signal associated with a speech of the speaker and stored in the database of server 250 for further reference. The stored voice data can include one or more voiceprints that can be obtained from one or more speeches of the speaker. At least one audio signal can be determined to be associated with the recognized individual based on one or more predetermined voiceprint characteristics associated with the recognized individual detected in the at least one audio signal. The predetermined voiceprint can be stored in association with a person and one or more images or visual characteristics thereof, and optionally updated over time, enhanced, or the like. When the speaker is recognized in one or more images, one or more voiceprints can be retrieved and used for separating the specific voice from a mixture of voices. ¶0393 discloses if a speaker is not identified, the speaker's voiceprint can be extracted from an earlier part of the conversation when only that speaker was engaged in the conversation. The extraction of the voiceprint can be performed on segments of the audio for which the number of speaker algorithm indicates a single speaker. The extracted voiceprint can then be used later in the conversation for separating the speaker's voice from other voices. ¶0441 discloses speaker identification module 3654 can be used in identifying one or more speakers in an image captured by the apparatus, such that a speaker communicating with user 100 can be identified. Speaker identification module 3654 can identify the speaker by his/her location in the captured images that may display a field of view of user 100), and
wherein the at least one processor is programmed to generate the voice print of the individual using the obtained audio segment and the retrieved prior voice print (Wexler ¶0442 discloses voiceprint handling module 3666 [Fig. 36B] can be used to generate, store, or retrieve a voiceprint, using, for example, wavelet transform or any other attributes of the voice of one or more persons).

Regarding Claim 9, Wexler discloses the wearable device of claim 3,
wherein the at least one processor is further programmed to recognize the individual based on the received first audio signals (Wexler ¶0216 discloses a hearing aid can selectively amplify audio signals associated with a voice of a recognized individual. The hearing aid system can store voice characteristics and/or facial features of a recognized person to aid in recognition and selective amplification. For example, when an individual enters the field of view of apparatus 110, the individual can be recognized as an individual that has been introduced to the device, or that has possibly interacted with user 100 in the past. ¶0226 discloses processor 210 can use various techniques to recognize the voice of individual 2010. The recognized voice pattern and the detected facial features can be used, either alone or in combination, to determine that individual 2010 is recognized by apparatus 110. ¶0227 discloses processor 210 can further be configured to determine whether individual 2010 is recognized by user 100 based on one or more detected audio characteristics of sounds associated with a voice of individual 2010. In Fig. 20A, processor 210 can determine that sound 2020 corresponds to voice 2012 of user 2010. Processor 210 can analyze audio signals representative of sound 2020 captured by microphone 1720 to determine whether individual 2010 is recognized by user 100. This can be performed using voice recognition component 2041 [Fig. 20B] and can include one or more voice recognition algorithms, such as Hidden Markov Models, Dynamic Time Warping, neural networks, or other techniques. Voice recognition component and/or processor 210 can access database 2050, which can further include a voiceprint of one or more individuals. Voice recognition component 2041 can analyze the audio signal representative of sound 2020 to determine whether voice 2012 matches a voiceprint of an individual in database 2050).

Regarding Claim 10, Wexler discloses the wearable device of claim 1,
wherein the predetermined time period is 10 minutes (Wexler ¶0558 discloses in step 4654, the processing device can receive at least one audio signal representative of sounds acquired by wearable microphone 4404 during the time period. A predetermined time period can be a design choice, which can be set to 10 minutes).

Regarding Claim 11, Wexler discloses the wearable device of claim 1,
wherein the at least one processor is programmed to process the second audio signals by at least one of
(i) amplifying the sounds of the individual in the additional sounds (Wexler ¶0205 discloses based on the determined user look direction 1750, processor 210 can selectively condition or amplify sounds from a region associated with user look direction 1750. Fig. 18 illustrates an exemplary environment for use of a camera-based hearing aid. Microphone 1720 can detect one or more sounds 1820, 1821, and 1822 within the environment of user 100. Based on user look direction 1750, determined by processor 210, a region 1830 associated with user look direction 1750 can be determined),
(ii) attenuating sounds other than those of the individual in the additional sounds (Wexler ¶0207 discloses conditioning can also include attenuating or suppressing one or more audio signals received from directions outside of region 1830. For example, processor 210 can attenuate sounds 1821 and 1822 [Fig. 18]),
(iii) adjusting one or more characteristics of the sounds of the individual in the additional sounds (Wexler ¶0208 discloses conditioning can further include changing a tone of audio signals corresponding to sound 1820 to make sound 1820 more perceptible to user 100. For example, user 100 may have lesser sensitivity to tones in a certain range and conditioning of the audio signals can adjust the pitch of sound 1820 to make it more perceptible to user 100. For example, user 100 may experience hearing loss in frequencies above 10 kHz. Accordingly, processor 210 can remap higher frequencies [e.g., at 15 kHz] to 10 kHz. Processor 210 can be configured to change a rate of speech associated with one or more audio signals. Accordingly, processor 210 can be configured to detect speech within one or more audio signals received by microphone 1720, for example using voice activity detection [VAD] algorithms or techniques. If sound 1820 is determined to correspond to voice or speech, for example from individual 1810, processor 210 can be configured to vary the playback rate of sound 1820. For example, the rate of speech of individual 1810 can be decreased to make the detected speech more perceptible to user 100. Various other processing can be performed, such as modifying the tone of sound 1820 to maintain the same pitch as the original audio signal, or to reduce noise within the audio signal), or
(iv) transcribing the sounds of the individual in the additional sounds (Wexler ¶0391 discloses the hearing aid system can be configured to record or transcribe the conversation between multiple speakers. The transcription process can be assisted by images captured by the hearing aid system. For example, the hearing aid system can identify and recognize speaker 3302 and/or speaker 3303. Speaker 3302 can be facing speaker 3303 [not shown in Fig. 33], and, based on the images captured by image capturing device 3322 of the hearing aid system, the hearing aid system can determine that speaker 3302 is addressing speaker 3303. The hearing aid system can be configured to transcribe the conversation between speaker 3302 and speaker 3303 and to identify the first speech as belonging to speaker 3302 and the second speech as belonging to speaker 3303).

Regarding Claim 12, Wexler discloses the wearable device of claim 1,
wherein the at least one processor is further programmed to cause transmission of the processed second audio signals to an electronic device associated with the user (Wexler ¶0209 discloses the conditioned audio signal can then be transmitted to hearing interface device 1710 and produced for user 100 [Fig. 17A]. ¶0517 discloses selective conditioning module 4108 [Fig. 41A] can cause selective conditioning of at least one audio signal associated with the identified sound-emanating object. For example, selective conditioning module 4108 can amplify sounds from the user's smartphone and avoid amplifying sounds from other phones. Transmission module 4110 can cause transmission of the at least one conditioned audio signal to a hearing interface device [e.g., hearing interface device 1710] configured to provide sounds to an ear of user 100).

Regarding Claim 13, Wexler discloses the wearable device of claim 12,
wherein the electronic device is at least one of a hearing aid worn by the user, an earphone worn by the user, a headphone worn by the user, a portable electronic device, or a storage device (Wexler ¶0150 discloses feedback-outputting unit 230 [Fig. 2] can be audio headphones, a hearing aid type device, a speaker, a bone conduction headphone, interfaces that provide tactile cues, vibrotactile stimulators, etc. ¶0199 discloses hearing interface device 1710 can be integrated into other devices, such as a Bluetooth™ headset of the user, glasses, a helmet [e.g., motorcycle helmets, bicycle helmets, etc.], a hat, etc. [Figs. 1A-17B]).

Regarding Claim 14, Wexler discloses the wearable device of claim 1, further comprising
an image sensor configured to capture one or more images from the environment of the user (Wexler ¶0123 discloses apparatus 110 can include an image sensor system 220 for capturing real-time image data of the field-of-view of user 100),
wherein the at least one processor is further programmed to receive an image including a representation of the individual from the image sensor (Wexler ¶0123 discloses apparatus 110 can also include a processing unit 210 for controlling and performing the functionality of apparatus 110, such as to control the capture of image data, analyze the image data, and perform an action and/or output a feedback based on a hand-related trigger identified in the image data), and
retrieve a prior voice print of the individual stored in a database using the image (Wexler ¶0377 discloses server 250 can receive images and audio information [e.g., voiceprints] for various individuals from a variety of sources. For example, Fig. 32 shows server 250 receiving images 3211 and audio data 3212 from user 100 wearing apparatus 110. ¶0392 discloses a voiceprint of a speaker can be obtained using an audio signal associated with a speech of the speaker and stored in the database of server 250 for further reference. The stored voice data can include one or more voiceprints that can be obtained from one or more speeches of the speaker. At least one audio signal can be determined to be associated with the recognized individual based on one or more predetermined voiceprint characteristics associated with the recognized individual detected in the at least one audio signal. The predetermined voiceprint can be stored in association with a person and one or more images or visual characteristics thereof, and optionally updated over time, enhanced, or the like. When the speaker is recognized in one or more images, one or more voiceprints can be retrieved and used for separating the specific voice from a mixture of voices. ¶0393 discloses if a speaker is not identified, the speaker's voiceprint can be extracted from an earlier part of the conversation when only that speaker was engaged in the conversation. The extraction of the voiceprint can be performed on segments of the audio for which the number of speaker algorithm indicates a single speaker. The extracted voiceprint can then be used later in the conversation for separating the speaker's voice from other voices. ¶0441 discloses speaker identification module 3654 can be used in identifying one or more speakers in an image captured by the apparatus, such that a speaker communicating with user 100 can be identified. Speaker identification module 3654 can identify the speaker by his/her location in the captured images that may display a field of view of user 100), and
wherein the at least one processor is programmed to generate the voice print of the individual using the obtained audio segment and the retrieved prior voice print (Wexler ¶0442 discloses voiceprint handling module 3666 [Fig. 36B] can be used to generate, store, or retrieve a voiceprint, using, for example, wavelet transform or any other attributes of the voice of one or more persons).

Regarding Claim 15, Wexler discloses the wearable device of claim 14,
wherein the at least one processor is further programmed to retrieve the prior voice print of the individual from the database (Wexler ¶0275 discloses processor 210 can perform further analysis on the first and second audio signals, for example, by determining the identity of individuals 2310 and 2410 using available voiceprints thereof) by comparing the received image with a plurality of images stored in the database in association with voice prints (Wexler ¶0224 discloses processor 210 [Fig. 20B] can be configured to store data associated with one or more faces recognized in images captured by apparatus 110 in database 2050. Each time a face is detected in the images, the detected facial features or other data can be compared to previously identified faces in database 2050. ¶0227 discloses database 2050 can contain voiceprint data associated with a number of individuals, similar to the stored facial identification data. ¶0276 discloses in step 2560 [Figs. 24-25], process 2500 can include causing selective conditioning of the first audio signal based on a determination that the first audio signal is associated with the identified lip movement associated with the mouth of the individual. Processor 210 can compare the identified lip movement with the first and second audio signals identified in step 2550. For example, processor 210 can compare the timing of the detected lip movements with the timing of the voice patterns in the audio signals. Where speech is detected, processor 210 can further compare specific lip movements to phonemes or other features detected in the audio signal. Accordingly, processor 210 can determine that the first audio signal is associated with the detected lip movements and is thus associated with an individual who is speaking).

Claims 16-19 are rejected for the same reasons as set forth in Claims 1-15 (Wexler ¶0025 discloses a non-transitory computer readable storage medium stores program instructions which are executed by at least one processor to perform the actions).

Claim 20 is rejected for the same reasons as set forth in Claim 1.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOGESHKUMAR G PATEL whose telephone number is (571)272-3957. The examiner can normally be reached 7:30 AM-4 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YOGESHKUMAR PATEL/Primary Examiner, Art Unit 2651