Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending. Claims 1, 13, 19 and 22 are independent.
This Application was published as U.S. 20220084537.
Apparent priority: 17 September 2020.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1, 3, 7-8, 11-12, 19, and 21 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Sabin (U.S. 20210345047).
Regarding Claim 1, Sabin teaches:
1. A system for selectively amplifying audio signals, the system comprising: 
at least one microphone configured to capture sounds from an environment of the user; and [Sabin, Figure 1, “one or more microphones 114.”  See [0033].]
at least one processor programmed to: [Sabin, Figure 1, “Audio processing system 102” and [0063] and “[0065] Actions associated with implementing all or part of the functions can be performed by one or more programmable processors executing one or more computer programs to perform the functions. …”]
receive an audio signal representative of sounds captured by the at least one microphone; [Sabin, Figure 1, “Microphone Inputs 116.”]
determine whether the audio signal comprises speech by a user of the system; [Sabin, Figure 1, “sensor 112” to “Voice Activity Detector 110” detects speech and further it detects that the speech is by the user of the system: “[0034] … In certain implementations, a voice activity detector (VAD) 110 is utilized to detect a voice of the user. VAD 110 may for example include or otherwise use a sensor 112 such as an accelerometer, bone conductive transducers, etc., that detects vibrations indicative of the user talking. Alternatively, VAD 110 may analyze a microphone signal to detect a user's voice.”  “[0036] Accordingly, VAD 110 can be configured to capture the phase difference between two of the microphones 114, analyze the phase difference, and determine if the acoustic signal being captured is the user's voice or an external ambient acoustic signal….”]
subject to the audio signal comprising speech by the user, modify the audio signal by attenuating a first part of the audio signal comprising the speech by the user; [Sabin, Figure 1, “Voice Suppression System 104.”  “[0038] In response to the VAD 110 detecting the voice of the user, the voice suppression system 104 institutes one or more actions to suppress the voice of the user in captured acoustic signals from the ambient environment….”]
subject to the audio signal comprising audio other than speech by the user, modify the audio signal by amplifying a second part of the audio signal comprising audio other than the speech by the user; and [Sabin, Figure 1, “Amplification Adjustment” and “Amplifier System 106.”  Sabin reduces the amplification when voice of the user is in the input audio and turns the amplification back up when user’s voice is no longer detected:  “[0038] …In some implementations, voice suppression system 104 interfaces with the amplifier system 106 to reduce amplification of the audio signals being output to the transducer 118 from a first level to a second level. When the VAD 110 no longer detects the user's voice, amplification is returned back to the first level….”]
transmit the modified audio signal to a hearing interface device. [Sabin, Figure 1, the adjusted/modified audio is output to the speaker 118.  See [0038] and [0042].]
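
For illustration only, the gain behavior quoted from Sabin [0038] above can be sketched as follows. This is a minimal sketch under stated assumptions and is not Sabin's implementation: the function name `apply_own_voice_gain`, the per-frame boolean voice-activity flags, and the 20 dB first level are hypothetical. Only the reduction "on the order of 10 dB" comes from the quoted passage.

```python
# Illustrative sketch only; names and the 20 dB first level are hypothetical.
FIRST_LEVEL_DB = 20.0      # assumed normal amplification (not from Sabin)
GAIN_REDUCTION_DB = 10.0   # "on the order of 10 dB" per Sabin [0038]
SECOND_LEVEL_DB = FIRST_LEVEL_DB - GAIN_REDUCTION_DB


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_own_voice_gain(frames, user_voice_flags):
    """Scale each frame by the second level while the VAD flags the user's
    voice, and by the first level otherwise."""
    output = []
    for frame, user_is_talking in zip(frames, user_voice_flags):
        level_db = SECOND_LEVEL_DB if user_is_talking else FIRST_LEVEL_DB
        factor = db_to_linear(level_db)
        output.append([sample * factor for sample in frame])
    return output
```

In this sketch the output still carries some gain while the user is talking, which mirrors the quoted "first level to a second level" language rather than a full mute.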

Regarding Claim 3, Sabin teaches:
3. The system of claim 1, wherein the processor is further programmed to: 
perform additional processing on the second part of the audio signal; and [Sabin, Figure 1, “Noise Reduction System 108” operates on the environmental audio / “second part of the audio signal” only.  “[0017] In some implementations, active noise reduction (ANR) may be performed on the captured acoustic signals, wherein a first set of ANR filters optimized to reduce an occlusion are implemented in response to voice signals being detected and a second set of ANR filters optimized to reduce environmental noise are implemented in response to no voice signals being detected.”  See also [0041] discussing how the function of the ANR changes depending on whether user’s speech is detected or not.]
avoid performing the additional processing on the first part of the audio signal. [Sabin teaches that the user’s speech / “first part of the audio signal” is attenuated or altogether muted or even not captured by configuring the microphones to create a null at the user’s mouth.  See [0038]-[0041] and the bullet in the “voice suppression system 104.” Therefore, in several of these situations, there is no audio to be subjected to “additional processing.”]

Regarding Claim 7, Sabin teaches: (See Claim 21 which expresses the same idea in more detail)
7. The system of claim 1, wherein the at least one processor is programmed to modify the audio signal based on a user setting. [Sabin, “[0045] For example, upon detection of the user's voice, voice suppression system 204 can forward an amplifier control signal back to the device 200 instructing the amplifier system 206 to reduce amplification from a first level to a second level. When VAD 220 no longer detects the user's voice, a second control signal is sent to the accessory 202, and voice suppression system 204 forwards a second amplifier control signal back to the device 200 instructing the amplifier system 206 to return amplification to the first level. In certain implementations, user controls 222 are utilized to allow the user to set the first level and/or the second level.”]

Regarding Claim 8, Sabin teaches:
8. The system of claim 7, wherein
 the at least one processor is programmed to receive the user setting from a user device, and 
the user device includes one of a mobile phone, a laptop computer, a tablet computer, or a smartwatch. [Sabin teaches “[0031] Additionally, the solutions disclosed herein are intended to be applicable to a wide variety of accessory devices, i.e., devices that can communicate with a wearable audio device and assist in the processing of audio signals. Illustrative accessory devices include smartphones, Internet of Things (IoT) devices, computing devices, specialized electronics, vehicles, computerized agents, carrying cases, charging cases, smart watches, other wearable devices, etc.”]

Regarding Claim 11, Sabin teaches:
11. The system of claim 1, wherein the at least one processor is programmed to 
amplify the second part of the audio signal to a sound level about equal to an overall sound level of the captured audio signal. [Sabin teaches an embodiment where the speech is attenuated by lowering the level of amplification and then for environmental sound, this lowering is no longer performed and the audio goes back to the first level / “level of the captured audio signal”:    “[0038] In response to the VAD 110 detecting the voice of the user, the voice suppression system 104 institutes one or more actions to suppress the voice of the user in captured acoustic signals from the ambient environment. In some implementations, voice suppression system 104 interfaces with the amplifier system 106 to reduce amplification of the audio signals being output to the transducer 118 from a first level to a second level. When the VAD 110 no longer detects the user's voice, amplification is returned back to the first level. Although any amount of gain reduction can be utilized, a reduction of gain on the order of 10 dB, for example, reduces the negative impact of own-voice amplification without distracting attenuation of environmental sounds.”]

Regarding Claim 12, Sabin teaches:
12. The system of claim 1, wherein the at least one processor is programmed to 
amplify the second part of the audio signal to a sound level not exceeding an overall sound level of the captured audio signal. [Sabin in [0038] provided above teaches going back to the same level of the captured audio signal and thus does not exceed this level.]

Claim 19 is a method Claim with limitations similar to the limitations of system Claim 1 and is rejected under similar rationale.
19. A method for selectively amplifying audio signals, the method comprising: 
receiving at least one audio signal representative of the sounds captured by a microphone from an environment of a user; 
determining whether the audio signal comprises speech by the user; 
subject to a first part of the audio signal comprising speech by the user, modifying the audio signal by attenuating the first part of the audio signal; 
subject to a second part of the audio signal comprising audio other than speech by the user, modifying the audio signal by amplifying the second part of the audio signal; and 
transmitting the modified audio signal to a hearing interface device.

Claim 21 is a method Claim with limitations similar to the limitations of system Claim 7 and is rejected under similar rationale.
21. The method of claim 19, further comprising: 
receiving at least an attenuation or an amplification setting from the user; 
performing at least one of attenuating the first part or amplifying the second part based on the received setting.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Sabin.
Regarding Claim 9, Sabin teaches or suggests:
9. The system of claim 1, wherein the at least one processor is programmed to 
attenuate the first part of the audio signal and amplify the second part of the audio signal in accordance with a predetermined ratio. [Sabin teaches a gain reduction with an example of 10 dB which teaches or suggests a “predetermined ratio” of the Claim and further the attenuation and amplification are both done according to the same gain reduction.  Further, the teaching that first and second levels are input or known also teaches or suggests the ratio which can be easily obtained by dividing one level over another:  “[0038] In response to the VAD 110 detecting the voice of the user, the voice suppression system 104 institutes one or more actions to suppress the voice of the user in captured acoustic signals from the ambient environment. In some implementations, voice suppression system 104 interfaces with the amplifier system 106 to reduce amplification of the audio signals being output to the transducer 118 from a first level to a second level. When the VAD 110 no longer detects the user's voice, amplification is returned back to the first level. Although any amount of gain reduction can be utilized, a reduction of gain on the order of 10 dB, for example, reduces the negative impact of own-voice amplification without distracting attenuation of environmental sounds.”]
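
As a worked illustration of the arithmetic relied on above, a gain reduction stated in decibels corresponds to a fixed amplitude ratio between the second level and the first level. The helper name `level_ratio_from_db` is hypothetical and is not taken from Sabin. Only the 10 dB figure comes from the quoted passage.

```python
# Illustrative sketch only; the helper name is hypothetical.
def level_ratio_from_db(reduction_db):
    """Return the second-level / first-level amplitude ratio for a gain
    reduction expressed in decibels."""
    return 10.0 ** (-reduction_db / 20.0)


# A 10 dB reduction gives an amplitude ratio of roughly 0.316.
ratio = level_ratio_from_db(10.0)
```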

Claims 2 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sabin in view of Cho (U.S. 9330673) and Levitt (U.S. 2016/0111111).
Regarding Claim 2, Sabin teaches that other sensors including cameras can be used in addition to the microphones to collect the input signals: “[0066] It is noted that while the implementations described herein utilize microphone systems to collect input signals, it is understood that any type of sensor can be utilized separately or in addition to a microphone system to collect input signals, e.g., accelerometers, thermometers, optical sensors, cameras, etc.”  But Sabin does not discuss the use of the camera in detail.
Cho teaches:
2. The system of claim 1, further comprising: 
a wearable camera configured to capture a plurality of images from the environment of the user, [Cho “A method and apparatus for performing microphone beamforming. The method includes recognizing a speech of a speaker, searching for a previously stored image associated with the speaker, searching for the speaker through a camera based on the image, recognizing a position of the speaker, and performing microphone beamforming according to the position of the speaker.”  Abstract.]
wherein the at least one processor is programmed to determine whether the audio signal comprises the speech by the user by matching at least a part of the audio signal with the plurality of images. [Cho first recognizes the speech and finds the previously stored image of the speaker and then searches the environment by a camera to find the same speaker based on his image.  So, Cho first matches the audio signal to one stored image and then matches the stored image to a plurality of images obtained from the vicinity.  See Figure 1.  Identify speaker by recognizing speech of speaker, S110, search for image about speaker S120, search for speaker through camera based on image at S130.  See also Figure 6.  Col. 3, line 7 to Col. 4, line 45.]
Sabin and Cho pertain to cancellation/attenuation of part of the sound received by a particular user and it would have been obvious to combine the method of detecting a particular sound source by camera from Cho with the system of Sabin as one method of determining which part of the sound is the target sound.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Cho does not teach that the camera is a wearable camera included in the device of Claim 1.
Levitt teaches:
a wearable camera configured to capture a plurality of images from the environment of the user, [Levitt teaches that the “acoustic and optic signals” that are received by the Speech Recognition Aid (SRA) 105 include images captured by “wearable cameras.”  “[0079] Referring now to FIG. 1A, this figure depicts an embodiment of the SRA that may be used, for example, in face-to-face communication. In this embodiment, speech produced by a talker may be transmitted to the user of the SRA by means of acoustic and optic signals which are received by the SRA 105. The acoustic signals reaching the SRA 105 may be received by one or microphones which serve as the acoustic input to the SRA. The optic signals reaching the SRA 105 may be received by one or more wearable cameras which serve as the optic input to the SRA 105. The received acoustic and optic signals may be processed by the SRA 105 to improve the intelligibility and/or sound quality of the speech.”]
Sabin, Cho, and Levitt pertain to cancellation/attenuation of part of the sound received by a particular user and it would have been obvious to combine the wearable camera of Levitt, as a type of camera to be used for detecting a sound source, with the system of the combination of Sabin and Cho.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 20 is a method Claim with limitations similar to the limitations of system Claim 2 and is rejected under similar rationale.
20. The method of claim 19, further comprising: 
receiving a plurality of images captured by a wearable camera from the environment of the user; and 
determining whether the audio signal comprises the speech based on the plurality of images.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Sabin in view of Levitt.
Regarding Claim 4, Sabin teaches that by suppressing the voice of the user, performance of the device is enhanced and therefore arguably does teach “enhancing clarity.”  See [0062].  However, another reference is added.
Levitt teaches (SRA in Levitt is “speech recognition aid”)
4. The system of claim 3, wherein the additional processing comprises at least one of 
enhancing clarity, [Levitt, Figure 5, “process speech signal to improve intelligibility of speech signal (525).”  “According to one embodiment, a method for improving intelligibility of a speech signal may include (1) at least one processor receiving an incoming speech signal comprising a plurality of sound elements; (2) the at least one processor recognizing a sound element in the incoming speech signal to improve the intelligibility thereof; (3) the at least one processor processing the sound element by at least one of modifying and replacing the sound element; and (4) the at least one processor outputting the processed speech signal comprising the processed sound element.”]
enhancing diction, 
modifying a pitch, 
reducing a playback speed, and [Levitt, “[0097] … The SRA, in an embodiment according to method 300, slows down the speech signal and/or elements of the speech signal including pauses in order to compensate for the reduced rate of temporal processing and reduced neural synchrony….”]
changing a gap between words. [Levitt, Figure 3, “[0100] According to another embodiment, the SRA may operate to identify continuants as well as pauses in the speech signal and thereafter increase their duration, in step 315. Accordingly, portions of the speech signal showing slow changes in formant values and pitch periods may be increased in duration to improve intelligibility.”]
Sabin and Levitt pertain to cancellation/attenuation of part of the sound to improve clarity and it would have been obvious to combine the various methods of improving clarity from Levitt with the system of Sabin which is more general.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Sabin in view of Hansen (U.S. 20130094683).
Regarding Claim 5, note that the “first part” is the user’s speech.  Sabin teaches that the latency in a hearing assistance device causes a delay for the user in hearing his own voice.  This causes an undesirable echo.  Therefore, it makes sense that if the sound is being reproduced and delayed, the delay imparted to the speech of the user is less than the delay of the environmental sounds.  But Sabin does not compare the delay between the speech and the environmental sounds.  (See [0253] of the published Specification for support for this Claim.)
Hansen teaches:
5. The system of claim 1, wherein the first part is transmitted at a delay smaller than the second part. [Hansen teaches that there is a delay between the reception of a processed and transmitted signal and acoustic propagation of the same audio signal.  When a person is speaking and hearing his own voice, the propagation delay is small because the distance is small.  Therefore, the propagation delay is smaller than the processing delay.   “[0058] … Typically, for a given system, the processing delays are known (at least within limits) and only the propagation delays vary (according to the distances between the sound sources and the user wearing the binaural listening system, which also typically can vary only within certain limits, e.g. limited by the boundaries of a room).”  “[0021] … In an embodiment, the aligned streamed target audio signal is mixed (e.g. by addition) with an attenuated version of the propagated electric signal. …”  “[0024] Preferably, each of the first and second listening devices are adapted to determine a delay between a time of arrival of the first, second streamed target audio signals and the first, second propagated electric signals, respectively. In an embodiment, the delay differences are determined in the alignment units of the respective listening devices.”  “[0033] In an embodiment, the listening device comprises an element for attenuating an acoustically propagated sound into the ear canal of a user wearing the listening device (e.g. through a vent or other opening between the listening device and the walls of the ear canal)….”]
Sabin and Hansen pertain to attenuation and amplification of portions of sound and it would have been obvious to combine the teaching of Hansen that a propagated sound has a delay different from a processed sound in the “mixed signal” with the teaching of Sabin that is directed to attenuating the speech of the user to arrive at the claimed situation where the portion with less processing arrives with a shorter delay as a consequence of processing steps or lack thereof.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.  (According to the Specification of the instant Application, this is not even a design choice and rather something that occurs naturally as a result of differential processing.)

Regarding Claim 6, Sabin does not mention but Hansen teaches:
6. The system of claim 1, wherein the first part is transmitted with a delay not exceeding 30 ms. [Hansen teaches that, if a Bluetooth system is used, the propagation delay would be below 30 ms: “[0056] … In some applications, e.g. digital systems, e.g. Bluetooth or DECT or ZigBee systems, the wireless propagation and processing delay is relatively long (e.g. more than 10 ms, e.g. more than 15 ms, e.g. more than 25 ms).”]
Rationale as provided for Claim 5.  Differential processing or different transmission paths yield different delays.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Sabin in view of Pederson (U.S. 20180262847).
Regarding Claim 10, Sabin does not teach that the selected levels relate to one another although it is normal that they do.
Pederson teaches:
10. The system of claim 1, wherein the at least one processor is programmed to 
attenuate the first part of the audio signal based on the second part of the audio signal. [Pederson teaches that sounds from different directions are differentially amplified and attenuated, for example to compensate for background noise, which means that the attenuation of one part depends on the level of the other part:  “[0125] The hearing device further comprises a signal processor (PRO) for providing a processed frequency sub-band signal OUT based on the wirelessly received audio signal s and the environment signal Ã. The signal processor (PRO) may e.g. be configured to execute a number of processing algorithms (e.g. for applying a frequency and level dependent gain (or attenuation) to the input signal(s), e.g. to compensate for a hearing impairment of the user, and/or to compensate for a noisy environment) for enhancing the input signals s, Ã….”]
Sabin and Pederson pertain to differential amplification and attenuation of different sounds and it would have been obvious to make the differential levels dependent on one another or relative to each other as taught by Pederson because usually one is modified to compensate for the impact of the other.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claims 13, 15, 17, 22, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Sabin in view of Lovitt (U.S. 20200135163).
Regarding Claim 13, Sabin teaches the limitations of this Claim as shown with respect to Claim 1, except that Claim 13 has two limitations that differ from Claim 1: Claim 1 includes “determine whether the audio signal comprises speech by a user of the system;” and performs the selective attenuation based on this determination, whereas Claim 13 determines the direction of audio from the user and selectively attenuates the audio from that direction.  Sabin does teach creating a null at the mouth of the user/speaker and teaches that acoustic signals from the user’s mouth are attenuated.  See Sabin: “[0013] In some implementations, the beamforming forms a null directed toward a mouth of the user.” and “[0040] … In these cases, the microphone inputs 116 are processed such that acoustic signals from the direction of the user's mouth are attenuated, thereby relatively enhancing acoustic signals detected in the remainder of the ambient environment….”
But Sabin does not teach determining the direction of audio from the user as the direction of receiving sound from the user.  
Lovitt teaches:
13. A system for selectively amplifying audio signals, the system comprising: 
at least one microphone configured to capture sounds from an environment of the user; and 
at least one processor programmed to: 
determine a user-audio direction, the user-audio direction being a direction from which audio from a user is received by the at least one microphone; [Lovitt teaches the use of a DOA analyzer to determine the location of a source, and the DOA analysis operates based on the direction of arrival of sound to determine where, or at least in which direction, the sound is:  “[0056] Localizing an audio source may be performed in a variety of different ways. In some cases, an AR or VR headset may initiate a direction of arrival (DOA) analysis to determine the location of a sound source. The DOA analysis may include analyzing the intensity, spectra, and/or arrival time of each sound at the AR/VR device to determine the direction from which the sounds originated. In some cases, the DOA analysis may include any suitable algorithm for analyzing the surrounding acoustic environment in which the artificial reality device is located.”]
receive an audio signal representative of sounds captured by the at least one microphone; [Lovitt, Figure 8, “microphone 606” receiving “sound 622.”]
determine whether the audio signal is being received from the user-audio direction; [Lovitt, Figure 8, “direction analyzer 620” and “determined direction 621” which operates on the sound (as opposed to gaze detection etc.):  “[0078] FIG. 8 illustrates an embodiment in which the sound reproduction system 604 includes a direction analyzer 620. The direction analyzer 620 may be configured to detect which direction the identified external sound 622 originated from. For instance, the direction analyzer may analyze signal strength of the sound 622 and determine that the signal is strongest in direction 621. Other means of determining the direction of the identified sound 622, including receiving an indication of location from another electronic device, may also be used….”  “[0006] In some examples, the identified external sound may include various words, or a specific word or phrase. In some examples, the method may further include detecting which direction the identified external sound originated from and presenting the identified external sound to the user as coming from the detected direction. In some examples, the active noise cancelling signal may be further modified to present subsequently occurring audio from the detected direction.”]
subject to a first part of the audio signal being received from the user-audio direction, modify the audio signal by attenuating the first part of the audio signal; [This Claim does not say who the user is and what his relationship to the device may be.  Therefore, any of the people shown in Figure 8 may be the user whose speech is attenuated.  However, considering that the intent of the Claim is known, the best reference teaching the attenuation of the user’s own voice is Sabin which was applied to Claim 1.]
subject to a second part of the audio signal being received from directions other than the user-audio direction, modify the audio signal by amplifying the second part of the audio signal; and [Lovitt teaches active noise cancellation which removes or attenuates a part of the sound input and amplifies another part:  “The disclosed computer-implemented method may include applying, via a sound reproduction system, sound cancellation that reduces an amplitude of various sound signals. The method further includes identifying, among the sound signals, an external sound whose amplitude is to be reduced by the sound cancellation. The method then includes analyzing the identified external sound to determine whether the identified external sound is to be made audible to a user and, upon determining that the external sound is to be made audible to the user, the method includes modifying the sound cancellation so that the identified external sound is made audible to the user….”  Abstract.]
transmit the modified audio signal to a hearing interface device. [Lovitt, Figure 8, “modified ANC signal 623” being provided to the user 601.]

Sabin and Lovitt pertain to cancellation/attenuation of part of the sound received by a particular user and it would have been obvious to combine the method of detecting the sound that is to be attenuated from Lovitt with the system of Sabin as one method of determining which part of the sound is the target sound.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396. (Note that Pederson could have been used instead of Lovitt.)
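
For illustration only, the direction-based selection discussed for Claim 13 can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Lovitt or Sabin: the function names, the per-direction strength dictionary, the 15 degree tolerance, and the gain factors are all hypothetical. The first function mirrors the quoted idea of picking the direction in which the signal is strongest, and the second applies attenuation to audio arriving from the user-audio direction and amplification to everything else.

```python
# Illustrative sketch only; all names, angles, and gain values are hypothetical.
def strongest_direction(strength_by_direction):
    """Return the direction (in degrees) whose measured signal strength is largest."""
    return max(strength_by_direction, key=strength_by_direction.get)


def directional_gain(arrival_deg, user_audio_deg, tolerance_deg=15.0,
                     attenuate=0.3, amplify=2.0):
    """Return an attenuating factor for audio arriving from the user-audio
    direction and an amplifying factor for audio from other directions."""
    # Smallest angular difference between the two directions, in [0, 180].
    diff = abs((arrival_deg - user_audio_deg + 180.0) % 360.0 - 180.0)
    return attenuate if diff <= tolerance_deg else amplify
```

The wrap-around arithmetic treats 355 degrees and 5 degrees as 10 degrees apart, so a source just across the 0 degree boundary is still recognized as coming from the same direction.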

Regarding Claim 15, Sabin teaches:
15. The system of claim 13, wherein the at least one processor is programmed to 
determine the user-audio direction based on at least two microphones or a microphone array. [Sabin, Figure 1, and “[0033] … Any number of microphones 114 may be utilized (e.g., a microphone array), ….”]

Regarding Claim 17, Sabin in Figures 2 and 3 teaches a system including “a wearable hearing assist device and an accessory device” and the input can be received at either of those.  Sabin does not teach determining direction from an input to user device.  (See also rejection of Claim 18 over Pederson below.)
Lovitt teaches:
17. The system of claim 13, wherein the at least one processor is programmed to 
determine the user-audio direction based on an input received from a user device. [Lovitt, Figure 4 shows a “computer system 401” / “user device” that is separate from the headphone worn by the user 416.  The computing system 401 has access to user data (see [0086]) and at the least suggests a user device.  Figure 8 teaches the “direction analyzer 620” determining the direction of sound 622 based on input to it and then running the “sound reproduction module 408.”  See [0078].  “[0064] … The sound reproduction module 408 may be its own sound reproduction system, separate from computer system 401, or may be a module within computer system 401. The sound reproduction module 408 may generate speaker signals that drive speakers heard by the user 416. For instance, the sound reproduction module 408 may provide an audio signal to the user's head phones or external speakers….”]
Rationale for combination as provided for Claim 13.  This Claim adds a limitation pertaining to determining the direction of user which was brought in from Lovitt.

Claim 22 is a method Claim with limitations similar to the limitations of system Claim 13 and is rejected under similar rationale.
22. A method for selectively amplifying audio signals, the method comprising: 
determining a user-audio direction, the user-audio direction being a direction from which audio from a user is received by the at least one microphone; 
receiving at least one audio signal representative of the sounds captured by the at least one microphone from an environment of a user; 
determining whether the audio signal is being received from the user-audio direction; 
subject to a first part of the audio signal being received from the user-audio direction, modifying the audio signal by attenuating the first part of the audio signal; 
subject to a second part of the audio signal being received from directions other than the user-audio direction, modifying the audio signal by amplifying the second part of the audio signal; and 
transmitting the modified audio signal to a hearing interface device.
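
For illustration only, the attenuate/amplify steps recited above can be sketched as a per-frame gain selection. This is a minimal sketch under stated assumptions and is not taken from the application or from any cited reference: the `Frame` type, the function names, the 15 degree tolerance, and the gain values are all hypothetical, and the per-frame direction estimate is assumed to be supplied by some separate direction-finding step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """One short block of audio with an estimated direction of arrival."""
    samples: List[float]
    doa_deg: float  # hypothetical per-frame direction estimate, in degrees


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def modify_frames(frames: List[Frame], user_dir_deg: float,
                  tolerance_deg: float = 15.0,
                  attenuate_gain: float = 0.25,
                  amplify_gain: float = 2.0) -> List[Frame]:
    """Attenuate frames arriving from the user-audio direction, amplify the rest.

    All default values are illustrative assumptions, not values from the record.
    """
    out = []
    for frame in frames:
        from_user = angular_distance(frame.doa_deg, user_dir_deg) <= tolerance_deg
        gain = attenuate_gain if from_user else amplify_gain
        out.append(Frame([s * gain for s in frame.samples], frame.doa_deg))
    return out


frames = [Frame([1.0, -1.0], -80.0), Frame([0.5, 0.5], 10.0)]
result = modify_frames(frames, user_dir_deg=-85.0)
```

In this sketch the first frame lies within the tolerance of the assumed user-audio direction and is scaled down, while the second frame arrives from another direction and is scaled up; the modified frames would then be what is passed on to the hearing interface device.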

Claim 24 is a method Claim with limitations similar to the limitations of system Claim 17 and is rejected under similar rationale.
24. The method of claim 22, 
wherein determining the user-audio direction comprises receiving an input indicative of the user-audio direction from a user device.

Claims 14 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Sabin and Lovitt in view of Cho.
Claim 14 is a Claim with limitations similar to the limitations of system Claim 2 and is rejected under similar rationale.
Lovitt teaches the use of a camera for detecting eye movement, user location, or sound events “[0059] … These sensors may include cameras, IR sensors, heat sensors, motion sensors, GPS receivers, or in some cases, sensor that detect a user's eye movements. For example, as noted above, an artificial reality device may include an eye tracker or gaze detector that determines where the user is looking. Often, the user's eyes will look at the source of the sound, if only briefly. Such clues provided by the user's eyes may further aid in determining the location of a sound source. Other sensors such as cameras, heat sensors, and IR sensors may also indicate the location of a user, the location of an electronic device, or the location of another sound source. …”  “[0081] …As with audio inputs, the event analyzer 630 may be configured to analyze camera or other sensor inputs to detect when an event has occurred….”  
Lovitt further teaches that its device is a wearable headset.  See Figure 1 and [0032].
Cho as applied to Claim 2 teaches the comparison of images for user-direction detection.
14. The system of claim 13, further comprising: 
a wearable camera configured to capture a plurality of images from the environment of the user, 
wherein the at least one processor is further programmed to determine the user-audio direction based on the plurality of images.
Sabin, Lovitt, and Cho pertain to cancellation/attenuation of part of the sound received by a particular user, and it would have been obvious to combine the method of Cho for detecting the target speaker from images with the system of the Sabin/Lovitt combination as another equivalent method of locating the source of sound.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 23 is a method Claim with limitations similar to the limitations of system Claim 2 or 14 and is rejected under similar rationale.
23. The method of claim 22, further comprising: 
receiving a plurality of images captured by a wearable camera from the environment of the user; and 
determining whether the audio signal comprises the speech by the user by matching at least a part of the audio signal with the plurality of images.

Claims 16 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Sabin and Lovitt in view of Bisani (U.S. 20150095026).
Regarding Claim 16, Sabin and Lovitt do not teach a command to set the user direction and do not teach voice commands from which the direction of the user may be determined.  Lovitt, however, was cited for the teaching that the direction of the user is determined based on the user’s received speech, and speech generally includes spoken commands in many modern devices.
Bisani teaches:
16. The system of claim 13, wherein the at least one processor is programmed to 
determine the user-audio direction in response to a command received from the user. [Bisani teaches that its device receives spoken commands and also teaches that the direction of the desired audio is determined by conducting speech recognition on the sound from different directions and once the direction of the user is detected, the microphone array is steered toward the user.  See Figure 1.  “[0016] Automatic speech recognition (ASR) techniques enable a user to speak into an audio capture device (e.g., audio input/capture element and/or microphone) and have audio signals including speech translated into a command that is recognized by an ASR device. While audio input to a device may include speech from one or more users, it may also include background noise such as audio from other sources (e.g. other individuals, background audio from appliances, etc.). The audio from the other sources may originate from different directions, which further complicates the reception and processing of a desired audio….”  “[0062] If the direction of the audio from the user is unknown, however, a number of techniques and considerations may be used to determine the direction of the desired speech….”  “[0066] …Speech recognition results may be evaluated for all directions, and then the system may determine which direction is the most likely based on the speech recognition results. ….”]
Sabin/Lovitt and Bisani pertain to amplification/attenuation of part of the sound received from a particular direction (such as the user’s direction), and it would have been obvious to combine the feature of Bisani, which selects the directions of amplification or attenuation according to speech recognition results for the voice of the user and treats spoken commands as one type of desired speech, with the system of the Sabin/Lovitt combination, which finds the user direction based on the user’s speech but does not state that the user speech includes spoken commands.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 25 is a method Claim with limitations similar to the limitations of system Claim 16 and is rejected under similar rationale.
25. The method of claim 22, wherein determining the user-audio direction comprises: 
receiving an audio command from the user; and 
determining a direction from which the audio command is received as the user-audio direction.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Sabin and Lovitt in view of Pederson.
Regarding Claim 18, Sabin and Lovitt do not teach the claimed feature.
Pederson teaches:
18. The system of claim 17, wherein the input comprises at least one of a graphical or numerical representation of the user-audio direction. [Pederson, Figure 5B.  “[0009] … Alternatively, the hearing device (e.g. the beamformer filtering unit) may be arranged to cancel or attenuate said audio signal from said audio signal source (in said beamformed signal) in dependence of said direct representation of the audio signal or on an estimate or an indication (e.g. from a user interface) of a direction to said audio signal source.”  “[0149] … The auxiliary device (e.g. a SmartPhone) of FIG. 5A, 5B comprises a user interface (UI) providing the function of a remote control of the hearing system, e.g. for changing program or operating parameters (e.g. volume) in the hearing device(s), etc. The user interface (UI) of FIG. 5B illustrates an APP (denoted ‘TV Audio APP’) for selecting a mode of operation of the hearing system where audio signals streamed to the left and right hearing devices (HD.sub.L, HD.sub.R) are mixed with signals from the environment. The APP allows a user to select a manual (Manual), and an automatic (Automatic) mode (cf. Select source signals AS, TV). In the screen of FIG. 5B, the manual mode of operation has been selected as indicated by the left solid ‘tick-box’ and the bold face indication Manual. In this mode, the direction of arrival of a target sound source among the acoustic around sources (AS, other than the audio source, e.g. from the TV) and the direction to the audio sound source (TV) can be manually selected, e.g. via the touch sensitive screen….”

    (Image omitted: media_image1.png, greyscale.)
]
Sabin/Lovitt and Pederson pertain to amplification/attenuation of part of the sound received from a particular direction, and it would have been obvious to combine the input of the direction of sound from a user interface, as shown in Pederson, with the system of the Sabin/Lovitt combination to give the user more control.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Di Censo (U.S. 20180302738) “[0032] In another example, the user interface may receive verbal commands for selecting directions and other sound modification settings and parameters….”
Cella (U.S. 11488590) teaches that the direction of speech is determined to be from the user when “According to some embodiments, identifying the speech portion of the audio signal that contains the speech of the user includes analyzing a plurality of composite audio signals to determine a direction of an audio source present in the audio signal with respect to the in-ear device and determining that the audio signal contains the speech of the user when the direction of the audio source indicates that the audio source is inside a head of the user. The plurality of composite audio signals make up the audio signal.”  Col. 15, line 64 to Col. 16, line 5.
Agrawal (U.S. 20190268460) teaches that user can change the configuration settings of how each sound is modified and that the modifications include attenuations and amplifications:  “[0062] …This can include the communication device automatically accessing lookup tables and/or default values to help determine what parameters to change and how. However, sometimes the user desires to have more control over how a communication session gets modified. Accordingly, various implementations provide the user with access to enter user-defined configuration settings to drive what modifications are applied based on a proximity context.”  “[0052] … While illustrated here as an attenuation of two audio level units, any other attenuation can be applied. To determine what attenuation to apply, various implementations use the proximity context information in combination with lookup tables.”  “[0034] … In some implementations, device assistant module 116 manages an audio output level associated with audio output module 212 such that device assistant module 116 can amplify and/or attenuate the sounds generated by audio output module 212.”
Xu (U.S. 11234073) determines the location and hence direction of sound sources including the “user” of the Claim by various methods including DOA information which estimates the location from the direction of arrival of sound/audio.  See Figure 6 for implementation of DOA method of determining the direction of a speaker 604.  Figure 3, the speech of user 302 is marked “undesired speech” and is attenuated whereas speech of user 301 is “desired speech” and is amplified. “The computer system 101 may also include a location determining module 108. When determining how to selectively apply ANC to undesired sound sources, the embodiments herein may first determine the spatial location of those undesired sound sources. As referred to herein, an object's “spatial location” may refer to that object's two-dimensional or three-dimensional location in space. Any information associated with that location may be referred to as “spatial location information.” In some cases, the environment information 119 identified above may include GPS coordinates for the sound source (e.g., the GPS location of a television hung on a wall). In cases where the environment information 119 does not include explicit location information, the location determining module 108 may use various methods to gather spatial location information 109 and determine the spatial location of a given sound source. For example, the location determining module 108 may implement direction of arrival (DoA) information, depth camera information, inertial motion unit (IMU) information, or other sensor input to determine where a given sound source (e.g., a person (e.g., 120-122) or an electronic device) is currently located.”  Col. 3, 25-45.  “… The DOA analysis may include analyzing the intensity, spectra, and/or arrival time of each sound at the artificial reality glasses 602 to determine the direction from which the sounds originated. 
The DOA analysis may include various algorithms for analyzing the surrounding acoustic environment to determine where the sound source is located.”  Col. 8, 5-20   “The DOA analysis may be designed to receive input signals at the microphone array 603 and apply digital signal processing algorithms to the input signals to estimate the direction of arrival (e.g., directions D1, D2, D3, and D4). These algorithms may include, for example, delay and sum algorithms where the input signal is sampled, and the resulting weighted and delayed versions of the sampled signal are averaged together to determine a direction of arrival. A least mean squared (LMS) algorithm may also be implemented to create an adaptive filter. This adaptive filter may then be used to identify differences in signal intensity, for example, or differences in time of arrival. These differences may then be used to estimate the direction of arrival. In another embodiment, the DOA may be determined by converting the input signals into the frequency domain and selecting specific bins within the time-frequency (TF) domain to process. Each selected TF bin may be processed to determine whether that bin includes a portion of the audio spectrum with a direct-path audio signal. Those bins having a portion of the direct-path signal may then be analyzed to identify the angle at which a microphone array received the direct-path audio signal. The determined angle may then be used to identify the direction of arrival for the received input signal. In this manner, a DOA analysis may be used to identify the location of the sound source.”  Col. 8, 21-45.  Xu, “Accordingly, the embodiments of this disclosure are directed to an adaptive ANC system that attempts to preserve sound sources from certain directions or locations relative to a listener and only attenuate (or apply ANC to) the sounds from undesired directions.”  Col. 2, 24-30.
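
For illustration only, the use of differences in time of arrival to estimate a direction of arrival, as mentioned in the passages quoted above, can be sketched for the simplest case of two microphones. This is not Xu's implementation: it is a brute-force cross-correlation sketch in which the function names, the far-field assumption, the 343 m/s speed of sound, and the example spacing and sample rate are all assumptions chosen for the example.

```python
import math
from typing import List

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air (assumption)


def best_lag(left: List[float], right: List[float], max_lag: int) -> int:
    """Return the lag, in samples, at which `right` best matches `left`.

    A positive result means the sound reached the right microphone later.
    """
    best, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def doa_from_lag(lag_samples: int, sample_rate_hz: float,
                 mic_spacing_m: float) -> float:
    """Convert an inter-microphone lag to an angle from broadside, in degrees.

    Assumes a distant source (plane wave); the ratio is clamped so that
    measurement noise cannot push it outside the domain of asin.
    """
    tau = lag_samples / sample_rate_hz
    sin_theta = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing_m))
    return math.degrees(math.asin(sin_theta))


# Synthetic example: the right channel is the left channel delayed by 3 samples.
left = [0.0] * 8 + [1.0, 0.5] + [0.0] * 22
right = [0.0] * 3 + left[:-3]
lag = best_lag(left, right, max_lag=5)
angle = doa_from_lag(lag, sample_rate_hz=48000.0, mic_spacing_m=0.15)
```

With these assumed values the recovered lag is 3 samples, which corresponds to a source a little over 8 degrees off broadside, toward the microphone that received the sound first.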

Brimijoin (U.S. 20220021972) has various methods for determining the location/direction of sound sources which include a DOA (direction of arrival) estimation that estimates the location from the direction of arrival of sound from a source.  Figure 2, “DOA estimation module 240.”  “[0062] The DOA estimation module 240 is configured to localize sound sources in the local area based in part on captured sound from the microphone array 210. Localization is a process of determining where sound sources are located relative to the user of the audio system 200. The DOA estimation module 240 performs a DOA analysis to localize one or more sound sources within the local area and update the model of the local area accordingly. The DOA analysis may include analyzing the intensity, spectra, and/or arrival time of each sound at the microphone array 210 to determine the direction from which the sounds originated. In some cases, the DOA analysis may include any suitable algorithm for analyzing a surrounding acoustic environment in which the audio system 200 is located.”
Harrison (U.S. 20190313054)
Sporer (U.S. 20220159403)

Published Application:
[0220] Systems and Methods for Selectively Attenuating a Voice
[0221] In some embodiments, audio and video may be captured and processed by a hearing interface device. For example, the hearing interface device may capture and selectively amplify sounds emanating from a looking direction of the user over other sounds. By way of another example, speech by a person the user is facing and talking with, may be amplified, while audio of the user and/or audio emanating from other directions may be attenuated or cancelled.
[0222] Audio intensity decreases proportionally to a square of the distance traveled by the audio between the audio source and the receiver, which may be a human ear, a microphone, or the like. Thus, because a microphone of a wearable device is generally closer to the user's mouth than to other audio sources and in particular to the mouth of another speaker, the user's voice is captured at higher intensity than the other voices. For example, if a user wearing a device is standing at a distance of 0.5 m from another user, and both are speaking at a similar volume, the user's speech will be captured at an intensity four times higher than the other speaker's speech because of the nearness of the user's mouth to the microphone of the wearable device.
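
Illustrative note on the arithmetic in [0222]: the paragraph does not state the distance between the user's mouth and the microphone. Under an inverse-square relationship, a fourfold intensity difference would follow if the microphone were half as far from the user's mouth as from the other speaker. The sketch below works through that case, treating the stated 0.5 m as the distance from the other speaker to the microphone and assuming a mouth-to-microphone distance of 0.25 m; the 0.25 m figure and the function names are assumptions made for the example, not values from the application.

```python
import math


def intensity_ratio(near_distance_m: float, far_distance_m: float) -> float:
    """Ratio of captured intensities for two equally loud sources.

    Assumes free-field inverse-square spreading: intensity ~ 1 / distance**2.
    """
    if near_distance_m <= 0 or far_distance_m <= 0:
        raise ValueError("distances must be positive")
    return (far_distance_m / near_distance_m) ** 2


def ratio_to_db(ratio: float) -> float:
    """Express an intensity ratio in decibels."""
    return 10.0 * math.log10(ratio)


ratio = intensity_ratio(near_distance_m=0.25, far_distance_m=0.5)  # 4.0
level_difference_db = ratio_to_db(ratio)  # about 6 dB
```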
[0223] Moreover, the user's speech may be less important for the user to hear, since the user is aware of what he or she said. Therefore, if sounds captured by the hearing interface device are determined to be made by the user, for example speech, those sounds may be attenuated relative to other sounds. The source of an audio signal may be determined in a variety of ways, for example, by using a directional microphone; using two or more microphones and determining the source of the audio signal in accordance with the time difference at which the audio signal is received by each microphone; processing images captured by the wearable device and matching the captured sound with mouth or face movements of a speaker; matching the audio with a voiceprint of the user; detecting the direction of the user's mouth by identifying the user's chin or another fact or body part in captured images, or the like. Then, if it is determined that the source of a sound is the user that sound may be attenuated.
[0224] In addition, disclosed systems and methods may relate to a-priori determining the direction from which sound by the user is received, and attenuating and/or silencing all audio coming from that direction. For example, the direction from which the user's speech is received may be graphically predefined by the user, wherein the user indicates the direction on a user interface, displayed for example on a display device of a paired device such as a smartphone. The user may also indicate the angle in a numerical manner relatively to the device, for example between −90° and +90°, or as a spatial angle.
[0225] Additional disclosed systems and methods may relate to calibrating the wearable device to recognize the direction from which the user's speech is received such that audio coming from that direction may be attenuated or silenced. The calibration may be performed in response to a predefined vocal command that may indicate the user's wish to attenuate his own voice. Thus, the user may say a predetermined word indicating that the following words constitute a command to the device, followed by the predefined command. The predefined command may include, for example, phrases such as “device please, attenuate my voice”, “please device, silence myself”, or the like. After detecting the predefined command, the wearable device may determine the direction the voice is coming from, associate this direction with the user's speech, and attenuate and/or silence further audio coming from this direction. In some embodiments, any other command by the user may serve this purpose as well. The direction from which the user's voice is received by the wearable device may be determined using beam forming.
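
For illustration only, the calibration sequence described in [0225] can be sketched as a small state holder that records the direction of a recognized command and then flags later audio from that direction. This is a sketch under stated assumptions and not the applicant's implementation: the class and method names and the 15 degree tolerance are hypothetical, the transcript and direction are assumed to come from separate speech recognition and beam forming steps, and angles are assumed to lie between -90 and +90 degrees so that a plain difference suffices.

```python
from typing import Optional

# Command phrases taken from the examples quoted in [0225].
PREDEFINED_COMMANDS = ("attenuate my voice", "silence myself")


class UserDirectionCalibrator:
    """Remembers the direction from which a predefined command was heard."""

    def __init__(self, tolerance_deg: float = 15.0) -> None:
        self.user_dir_deg: Optional[float] = None
        self.tolerance_deg = tolerance_deg  # illustrative assumption

    def observe(self, transcript: str, doa_deg: float) -> bool:
        """Store `doa_deg` as the user direction if a command is recognized."""
        text = transcript.lower()
        if any(command in text for command in PREDEFINED_COMMANDS):
            self.user_dir_deg = doa_deg
            return True
        return False

    def is_from_user(self, doa_deg: float) -> bool:
        """True if audio from `doa_deg` should be treated as the user's voice."""
        if self.user_dir_deg is None:
            return False  # not calibrated yet
        return abs(doa_deg - self.user_dir_deg) <= self.tolerance_deg


calibrator = UserDirectionCalibrator()
calibrated = calibrator.observe("Device please, attenuate my voice", -80.0)
```

After the command is observed, audio arriving near -80 degrees would be flagged for attenuation or silencing, and audio from other directions would not.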
[0226] It is also contemplated that additional processing differences may occur depending on whether the audio originates from the user or from some other source of sound. For example, speech by other speakers may be further processed for enhancing its clarity, for example reducing the word rate by slowing down the words and making the gaps between the words shorter, changing pitch, enhancing the diction or the like. Such processing may be omitted regarding speech by the user, since the user knows what he/she said, thereby reducing delays by saving processing time, and reducing the energy consumption of the device.
[0227] In some embodiments, the level of attenuation of the user's speech may be set by the user. For example, some users may want to hear themselves louder, some may want to hear themselves similar to the speech by other speakers, some may want minimal volume of their own voice, or the like. The user may set the level using, for example, a user interface displayed on a device coupled to the user's device such as a mobile phone, a laptop computer or the like.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659