DETAILED ACTION
1.	This communication is in response to the Amendments and Arguments filed on 8/5/2021. Claims 1-25 are pending and have been examined.
Allowable Subject Matter
2.	Claim 11 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. In addition, the examiner’s questions raised in the rejection of Claim 1 (and in section 3, Response to Amendments and Arguments) must be satisfactorily addressed.
Response to Amendments and Arguments
3.	 Applicant's arguments with respect to claim rejections under 35 USC 103 have been fully considered, but they are not persuasive. 
	First, in response to the examiner’s request in the previous Office action, the applicant stated that Specification [0038], [0043], and [0099], rather than [0073], are relied on for specifying the composition of the spectral mask. However, the examiner found no sufficient description in those paragraphs. Paragraph [0073] remains the only one that provides a brief description: “the audio filtering application generates a composite or compound spectral mask by additive superposition of the spectrograms of each participant in the conversation.”
	Furthermore, the applicant argues that the references do not teach the following:
(1) “receiving a plurality of spectral masks, each spectral mask in the plurality corresponds to a respective participant in a selected group of participants included in a conversation.” Note that SAVANT teaches: [0077] “a reference speech characteristics is stored for each of a number of different individual speakers, or categories of speakers. An associated filter selection is also stored according to each of the individual speakers, or categories of speakers <read on a plurality of filters and the corresponding spectral masks>. Thus, once a determination is made associating a sampled audio speech signal with a respective one of the one or more different individual speakers, or categories of speakers, the filter selector 304 selects an appropriate filter <read on indirectly receiving spectral masks> based on the filter response associated with the identified speakers, or category of speakers.” The applicant is advised to further specify where and how the spectral masks are received.
(2) “generating a composite spectral mask by additive superposition of the plurality of spectral masks.” The applicant is requested to clarify this limitation. Specification [0073] states: “the audio filtering application generates a composite or compound spectral mask by additive superposition of the spectrograms of each participant in the conversation.” Note that a “spectrogram” contains various speech features, including formants, pitch contour, voicing, power, and so on. Which of these features are additively superposed?
(3) “applying the composite spectral mask to sound captured by a microphone to filter out sounds that do not match the composite spectral mask and amplifying remaining sounds that match the composite spectral mask.” The applicant must clarify how the composite spectral mask, generated by additively superposing a plurality of spectral masks, can be used to “filter out sounds that do not match the composite spectral mask and amplifying remaining sounds that match the composite spectral mask.” Note that the composite spectral mask thus formed would be very different from each individual component mask. Because the Specification provides no description of this step, it is questionable whether the composite mask would perform the claimed filtering.
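For illustration only, and not as a characterization of the applicant's disclosure, one possible reading of limitations (2) and (3) can be sketched as follows. The sketch assumes that each spectral mask is a vector of per-frequency-bin weights between 0 and 1, that "additive superposition" means a bin-by-bin sum clipped to 1, and that "amplifying" means a fixed gain. None of these assumptions is stated in the Specification, and the function names, mask values, and gain are hypothetical.

```python
# Illustrative sketch only. Mask representation, clipping, and gain are
# assumptions; none of them is taken from the Specification or the references.
from typing import List


def superpose_masks(masks: List[List[float]]) -> List[float]:
    """Add per-speaker masks bin by bin and clip each bin to at most 1.0."""
    if not masks:
        raise ValueError("at least one mask is required")
    n_bins = len(masks[0])
    if any(len(m) != n_bins for m in masks):
        raise ValueError("all masks must cover the same frequency bins")
    return [min(1.0, sum(m[k] for m in masks)) for k in range(n_bins)]


def apply_mask(magnitudes: List[float], mask: List[float],
               gain: float = 2.0) -> List[float]:
    """Scale each bin by its mask weight, then apply a fixed gain."""
    return [gain * x * w for x, w in zip(magnitudes, mask)]


# Two hypothetical speakers over six frequency bins.
mask_a = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # speaker A occupies the low bins
mask_b = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]  # speaker B occupies higher bins

composite = superpose_masks([mask_a, mask_b])

# One hypothetical frame of microphone spectrum magnitudes.
frame = [0.5, 0.4, 0.9, 0.3, 0.2, 0.8]
filtered = apply_mask(frame, composite)
```

Under these assumptions the composite passes every bin that any component mask passes (bins 0, 1, 3, and 4 here) and zeroes the rest, so it differs from each individual component mask. Whether this is what the claim intends is the point on which clarification is requested.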
Claim Rejections - 35 USC § 103
4.	Claims 1-2, 5-10, 12-13, 16-20, 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Savant (US 20090287489; hereinafter SAVANT) in view of Jiang, et al. (US .
As per claim 1, SAVANT (Title: Speech processing for plurality of users) discloses “A method for isolating and amplifying a conversation between selected participants (SAVANT, [0088], if a call <read on conversation between ‘selected’ participants> is placed or received to another remote user previously determined to have a deep voice, the receive audio processor is preset a received audio filter selection <read on isolating and amplifying> that provides suitable quality and intelligibility for the individual associated with the particular number. If a different individual happens to answer and engage in a conversation, the receive audio filter can be reconfigured as described above), the method comprising:
receiving a plurality of spectral masks, each spectral mask in the plurality corresponds to a respective participant in a selected group of participants included in a conversation (SAVANT, [Abstract], an audio processing circuit that is adaptable based on a pattern of the speaker's voice to provide improved audio quality and intelligibility .. to receive a voice signal from an individual speaker, to determine a pattern associated with the speaker's voice, and to adjust a filter based on the determined pattern; [0077], a reference speech characteristics is stored for each of a number of different individual speakers, or categories of speakers. An associated filter selection is also stored according to each of the individual speakers, or categories of speakers <read on a plurality of filters and the corresponding spectral masks>. Thus, once a determination is made associating a sampled audio speech signal with a respective one of the one or more different individual speakers, or categories of speakers, the filter selector 304 selects an appropriate filter <read on indirectly receiving spectral masks> based on the filter response associated with the identified speakers, or category of speakers); 
[ generating a composite spectral mask by additive superposition of the plurality of spectral masks ] (Examiner’s Note: The applicant is requested to clarify this limitation. Specification [0073] states: ‘the audio filtering application generates a composite or compound spectral mask by additive superposition of the spectrograms of each participant in the conversation.’ Note that a ‘spectrogram’ contains various speech features, including formants, pitch contour, voicing, power, and so on. Which of these features are additively superposed?); and
applying the composite spectral mask to sound captured by a microphone to filter out sounds that do not match the composite spectral mask and amplifying remaining sounds that match the composite spectral mask (SAVANT, [0053], The transmit audio amplifier 206 receives the input audio signal from the microphone 202 and amplifies it as may be necessary; [0088], if a call is placed or received to another remote user previously determined to have a deep voice, the receive audio processor is preset a received audio filter selection that provides suitable quality and intelligibility for the individual associated with the particular number. Examiner’s Note: The applicant must clarify how the composite spectral mask generated by additively superposing a plurality of spectral masks, each corresponding to an individual speaker, can be used to ‘filter out sounds that do not match the composite spectral mask and amplifying remaining sounds that match the composite spectral mask.’ The Specification provides no description of this step, and it is doubtful that the composite mask would perform the claimed filtering).”
SAVANT does not expressly disclose “generating a composite spectral mask ..” However, with the Examiner’s Note stated above, this feature is taught by JIANG (Title: Methods and apparatus for processing image data for use in tissue characterization).
In the same field of endeavor, JIANG teaches: [0593] “In FIG. 73, step 130 uses the arbitrated spectra, BB and F, to determine four spectral masks … Step 1422 of FIG. 73 combines these masks to produce a hard “indeterminate” mask, a soft “indeterminate” mask, a mask identifying necrotic regions, and a mask identifying healthy (NED) regions. In the embodiment of FIG. 73, steps 1424 and 1426 apply the necrotic mask and hard “indeterminate” mask, respectively, prior to using the broadband spectral data in the statistical classifiers 134, while steps 1428 and 1430 apply the soft “indeterminate” mask and the NED mask after the statistical classification step 134.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of JIANG in the system taught by SAVANT to generate various combined/composite spectral masks using component masks for various signal processing/filtering applications.
SAVANT in view of JIANG does not expressly disclose “additive superposition of the plurality of spectral masks ..” However, with the Examiner’s Note stated above, this feature is taught by BUES (Title: Display system having circadian effect on humans). 
In the same field of endeavor, BUES teaches: “the illumination unit has a plurality of individual light sources .. each having different light spectrums .. (the predetermined light spectrum can then be generated by additive superposition with the individual light spectra of these light sources)).”

As per Claim 2 (dependent on claim 1), SAVANT in view of JIANG and BUES further discloses “transmitting the amplified remaining sounds that match the composite spectral mask to an audio output device corresponding to a participant of the conversation (SAVANT, [0053], The transmit audio filter 208 may be a low pass, a high pass, a band pass, or a combination of one or more of these filters for filtering the amplified transmit speech signal. The transmit audio amplifier 206 and transmit audio filter 208 function together to precondition the signal by reducing noise and level balancing prior to analog-to-digital conversion <read on a ready mechanism to transmit the audio signal to any destination>. Also see Claim 1 rejections).”  
As per Claim 5 (dependent on claim 2), SAVANT in view of JIANG and BUES further discloses “sending a voice sample (Examiner’s Note: ONE voice sample?) of the participant of the conversation to a deep neural network server of a cloud environment for generating a spectral mask personalized to the participant; and receiving the spectral mask personalized to the participant from the deep neural network server (SAVANT, [Abstract], receive a voice signal from an individual speaker, to determine a pattern associated with the speaker's voice <read on spectral mask>; [0067], Various technologies can be used to process voice patterns, such as frequency estimation .. neural networks; [0028], voice and/or data communications .. a wireless transceiver .. servers <read on a ready mechanism for transmitting and receiving any information. Also read on cloud for server>).”  
As per Claim 6 (dependent on claim 5), SAVANT in view of JIANG and BUES further discloses “combining the spectral mask personalized to the participant with the plurality of spectral masks corresponding to the selected group of participants included in the conversation to form the composite spectral mask; filtering, using the composite spectral mask, incoming audio signals to allow only the conversation between the selected group of participants and the participant to remain in an audio signal; and transmitting the audio signal that includes only the conversation between the selected group of participants and the participant to the audio output device (see Claims 1 and 2 rejections).”
As per Claim 7 (dependent on claim 6), SAVANT in view of JIANG and BUES further discloses “sharing the spectral mask personalized to the participant and the plurality of spectral masks corresponding to the selected group of participants among mobile devices corresponding to the participant and the selected group of participants so that each mobile device generates its own composite spectral mask for filtering incoming audio signals to each mobile device (SAVANT, [Abstract], determine a pattern associated with the speaker's voice <read on spectral mask>; [0028], voice and/or data communications .. a wireless transceiver <with ready mechanisms for determining spectral masks and for transmitting/receiving any information, it is a system design choice to share all spectral masks for generating the composite spectral mask for any mobile device>).” 
As per Claim 8 (dependent on claim 5), SAVANT in view of JIANG and BUES further discloses “wherein the spectral mask personalized to the participant of the conversation is excluded from the composite spectral mask based on preference of the participant (SAVANT, [Abstract], determine a pattern associated with the speaker's voice <with a ready mechanism for .” 
As per Claim 9 (dependent on claim 1), SAVANT in view of JIANG and BUES further discloses “wherein the selected group of participants indicates who is authorized to participate in the conversation (SAVANT, [0088], If a different individual happens to answer and engage in a conversation .. <which persons are authorized to participate in a conversation is a system design choice>. Also see Claim 1 rejections for ‘selected group of participants.’).”  
As per Claim 10 (dependent on claim 1), SAVANT in view of JIANG and BUES further discloses “wherein the method is performed by a mobile device (SAVANT, [Abstract], A mobile communication device).”  
Claims 12-13, 16-18 (similar in scope to claims 1-2, 5-7) are rejected under the same rationale as applied above for claims 1-2, 5-7. SAVANT also teaches: [0029] “The host processor .. may be configured to communicate with each other using interfaces 106 such as one or more universal serial bus (USB) interfaces bus ..”
Claims 19-20, 23-25 (similar in scope to claims 1-2, 5-7) are rejected under the same rationale as applied above for claims 1-2, 5-7. SAVANT also teaches: [0029] “The host processor .. may be configured to communicate with each other using interfaces 106 such as one or more universal serial bus (USB) interfaces bus ..” Specification [0020] “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se ..”
5.	Claims 3-4, 14-15, 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over SAVANT in view of JIANG and BUES, and further in view of Mozer, et al. (US 20090204410; hereinafter MOZER).
As per Claim 3 (dependent on claim 2), SAVANT in view of JIANG and BUES further discloses “[ performing real-time captioning of the conversation between the selected participants; and displaying the real-time captioning of the conversation to the participant of the conversation on one of a mobile device screen or smart glasses with the audio output device attached for reading text of the conversation as well as listening to the conversation ].” 
SAVANT in view of JIANG and BUES does not expressly disclose “performing real-time captioning of the conversation between the selected participants; and displaying the real-time captioning .. the audio output device attached for reading text of the conversation as well as listening to the conversation.” However, this feature is taught by MOZER (Title: Voice interface and search for electronic devices including bluetooth headsets and remote systems). 
In the same field of endeavor, MOZER teaches: [Abstract] “improving the interaction between a user and a small electronic device such as a Bluetooth headset .. utilizing speech recognizers and synthesizers in series to provide simple, reliable, and hands-free interfaces with users” and [0027] “the small electronic device with a voice user interface .. are equipped with audio speakers and speech synthesis software such that one or both of such devices may receive and provide audio information .. which may provide the status of the interaction or which may be a telephone conversation with a remote person.” Note that speech recognition reads on text transcription or captioning. Also note that for “conversation” transcription/captioning, speech recognition must be performed in real-time. MOZER also teaches: [0109] “small electronic device 1 may have .. a small liquid crystal display ..” which reads on a ready mechanism for displaying any information.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of MOZER in the system taught by SAVANT, JIANG and BUES to provide speech captioning via speech recognition and text-to-speech for audio output to the user.
 As per Claim 4 (dependent on claim 2), SAVANT in view of JIANG and BUES further discloses “wherein [ the audio output device corresponding to the participant of the conversation and for which the amplified remaining sounds are transmitted thereto is adjacent to an ear of the participant ] of the conversation (SAVANT, [0088], If a different individual happens to answer and engage in a conversation ..).”
SAVANT in view of JIANG and BUES does not expressly disclose “the audio output device corresponding to the participant of the conversation and for which the amplified remaining sounds are transmitted thereto is adjacent to an ear of the participant ..” However, this feature is taught by MOZER (Title: Voice interface and search for electronic devices including bluetooth headsets and remote systems). 
In the same field of endeavor, MOZER teaches: [Abstract] “improving the interaction between a user and a small electronic device such as a Bluetooth headset <where headset reads on ‘the audio output device .. is adjacent to an ear of the participant’ > .. utilizing speech recognizers and synthesizers in series to provide simple, reliable, and hands-free interfaces with users.” 

Claims 14-15 (similar in scope to claims 3-4) are rejected under the same rationale as applied above for claims 3-4. SAVANT also teaches: [0029] “The host processor .. may be configured to communicate with each other using interfaces 106 such as one or more universal serial bus (USB) interfaces bus ..”
Claims 21-22 (similar in scope to claims 3-4) are rejected under the same rationale as applied above for claims 3-4. SAVANT also teaches: [0029] “The host processor .. may be configured to communicate with each other using interfaces 106 such as one or more universal serial bus (USB) interfaces bus ..” Specification [0020] “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se ..”
Conclusion
6.	THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).   
	A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG whose telephone number is (571)272-4609. The examiner can normally be reached on M-F (8:30-5:00). The fax phone number where this application or proceeding is assigned is 571-273-4609.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir (SPE) can be reached on 571-272-7799. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FENG-TZER TZENG/		9/8/2021
Primary Examiner, Art Unit 2659