DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the Applicant’s amendment filed on July 15, 2022, in which the Applicant amended claims 1, 5, 10, 21, 23, 24, 26, and 29.
By virtue of this communication, claims 1-30 are currently pending in this Office Action.
The Examiner appreciates the explanation of the amendment and the analysis of the prior art. However, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993) and MPEP 2145.

Claim Interpretation
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action with respect to claims 29-30, as approved by the Applicant; see paragraph 3 of page 11 of the Remarks filed on July 15, 2022.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 10, 24, 26, and 29 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Tsingos et al. (WO 2019204214 A1, cited in the IDS; the equivalent US 20210118452 A1 is applied and hereinafter referred to as Tsingos).
Claim 1: Tsingos teaches an analogous field of endeavor by disclosing a device (title and abstract, ln 1-14 and an apparatus in fig. 8) comprising: 
a memory (memory in fig. 8, para 116-118) configured to store directivity data (radiation pattern G(θi, ϕi, ω) in equation 1, to be encoded as directivity data of an audio object and transmitted to a decoder, para 61-63) of one or more audio sources corresponding to one or more input audio signals (multiple objects as inputs such as violin, trumpet, drum, or bell, para 37; spatial representations stored for further processing, para 71, wherein the radiation pattern is a spatial representation, para 64); and 
one or more processors (multi-chip processor, DSP, FPGA, ASIC, etc., included in control system 815 in fig. 8, para 116) configured to: 
obtain the directivity data of a particular audio source of the one or more audio sources based on a type of the particular audio source (object type metadata 330 of one or more sound sources being “trumpet”, and/or “violin”, etc., and the audio object type metadata used to indicate corresponding radiation pattern G(θi, ϕi, ω), para 95, i.e., the radiation pattern is obtained according to the audio object type metadata); 
determine one or more equalizer settings based at least in part on the directivity data (an equalization is determined relative to a direction of the captured direction of the core audio signal or sound source for determining a sound target level, para 52); and 
generate, based on the equalizer settings, one or more output audio signals that correspond to a psychoacoustic enhanced version of the one or more input audio signals (rotated spherical coefficients that have been equalized based on the target level determination, para 52; realization of real-world radiation patterns for ).
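For illustration only, the mapping of claim 1 above (type-based retrieval of directivity data, followed by equalizer settings derived from that data) can be sketched as follows. This is a minimal sketch and is not code from Tsingos or from the application; the table `DIRECTIVITY_BY_TYPE`, its gain values, the three-band layout, and all function names are hypothetical assumptions.

```python
# Hypothetical illustration only: names and values are assumptions,
# not taken from Tsingos or from the application.

# Directivity data keyed by source type: per-band gain (dB) at a few
# azimuth angles (degrees), standing in for a radiation pattern G(theta, phi, omega).
DIRECTIVITY_BY_TYPE = {
    "trumpet": {0: [0.0, 0.0, 0.0], 90: [-2.0, -6.0, -12.0], 180: [-4.0, -10.0, -20.0]},
    "violin": {0: [0.0, 0.0, 0.0], 90: [-1.0, -3.0, -5.0], 180: [-2.0, -5.0, -8.0]},
}


def obtain_directivity(source_type):
    """Obtain directivity data based on the type of the audio source."""
    return DIRECTIVITY_BY_TYPE[source_type]


def equalizer_settings(directivity, captured_deg, target_deg):
    """Per-band gains (dB) mapping the captured orientation to the target orientation."""
    captured = directivity[captured_deg]
    target = directivity[target_deg]
    return [t - c for t, c in zip(target, captured)]


def apply_equalizer(band_levels_db, eq_db):
    """Apply the per-band gains to per-band signal levels (dB)."""
    return [level + gain for level, gain in zip(band_levels_db, eq_db)]
```

In this sketch, a source captured from behind (180 degrees) is equalized toward its frontal (0 degrees) response, so bands that the assumed pattern attenuates more are boosted more.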
Claim 24 has been analyzed and rejected according to claim 1 above.
Claim 26 has been analyzed and rejected according to claims 1 and 24 above, and Tsingos further teaches a non-transitory computer-readable medium (non-transitory media such as RAM, ROM, etc., para 11-12) storing instructions (having software stored in the non-transitory media, para 11-12) that, when executed by one or more processors, cause the one or more processors to implement the method of claim 24 (executed by one or more components of a control system, including ASIC, FPGA, DSP, etc., as multi-chip processor, para 12).
Claim 29 has been analyzed and rejected according to claim 1 above.
Claim 10: Tsingos further teaches, according to claim 1 above, 
wherein the one or more processors are further configured to process a context detector input audio signal (data set in an encoded format to be transmitted to a decoding system in fig. 1C, including metadata and core audio data as data set 345 in fig. 3, para 89) that is based on the one or more input audio signals (audio object signals from different sound sources such as trumpet, violin, drum, bell, etc., at an audio encoder in fig. 1B, para 37) to generate context data of the one or more audio sources (e.g., via the decoder in fig. 1C to extract audio object metadata 151 and audio core data 152, etc., in fig. 1C), wherein the context data of the particular audio source of the one or more audio sources indicates an orientation of the particular audio source (represented by the retrieved radiation pattern represented by G(θi, ϕi, ω), para 95), a distance of the particular audio source (the metadata 151 further includes one or more objects distance and direction relative to a 6DoF space, information regarding source’s relative rotation, distance and direction in a 6DoF space, para 87), the type of the particular audio source (“I am a violin”, or “I am a trumpet”, etc., para 95), or a combination thereof (defined by data set 345 in fig. 3), wherein the type of the particular audio source includes one or more of a person, an animal, a bird, a vehicle, a musical instrument, or a combination thereof (including violin, trumpet, drum, or bell, etc., para 37, 95).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 10-15, 22, 24-26, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Johnson et al. (US 20170099458 A1, hereinafter Johnson) in view of Tsingos (above).
Claim 1: Johnson teaches a device (title and abstract, ln 1-16, a near-end system in fig. 2A and a far-end system in fig. 2B) comprising:
a memory (a table in para 61, used to set the directivity index representing the percentage of a video frame occupied by the user 5A, and generated by the audio stream, para 60-62) configured to store directivity data of one or more audio sources (directivity index stored in the table, e.g., user 5A) corresponding to one or more input audio signals (the user 5A can be one or more users, para 46 and the table used for setting the directivity index, para 61);
one or more processors (including FPGA, ASIC, or a set of hardware logic structures, including ALUs, etc. to conduct the operations and functions of the near-end system 2, para 31) configured to:
determine one or more equalizer settings based at least in part on the directivity data (equalization settings included in the target audio parameters and set upon the percentage of the video frame occupied by the user 5A, and represented by the directivity index through the table of para 61); and
generate, based on the equalizer settings, one or more output audio signals that correspond to a psychoacoustic enhanced version of the one or more input audio signals (applying the target audio parameters to the system to produce sound more closely reflecting the orientation and positioning of the user 5A, para 58, e.g., using one or more linear filters, para 66; the produced sound imitates the intimacy or social characteristics of the recorded user’s speech, para 75, i.e., it is inherently a psychoacoustic enhanced version of the user’s speech).
However, Johnson does not explicitly teach obtaining the directivity data of a particular audio source of the one or more audio sources based on a type of the particular audio source. 
Tsingos teaches an analogous field of endeavor by disclosing a device (title and abstract, ln 1-14 and an apparatus in fig. 8) comprising: 
a memory (memory in fig. 8, para 116-118) configured to store directivity data (radiation pattern G(θi, ϕi, ω) in equation 1, to be encoded as directivity data of an audio object and transmitted to a decoder, para 61-63) of one or more audio sources corresponding to one or more input audio signals (multiple objects as inputs such as violin, trumpet, drum, or bell, para 37; spatial representations stored for further processing, para 71, wherein the radiation pattern is a spatial representation, para 64); and 
one or more processors (multi-chip processor, DSP, FPGA, ASIC, etc., included in control system 815 in fig. 8, para 116) configured to: 
obtain the directivity data of a particular audio source of the one or more audio sources based on a type of the particular audio source (object type metadata 330 of one or more sound sources being “trumpet”, and/or “violin”, etc., and the audio object type metadata used to indicate corresponding radiation pattern G(θi, ϕi, ω), para 95, i.e., the radiation pattern is obtained according to the audio object type metadata); 
determine one or more equalizer settings based at least in part on the directivity data (an equalization is determined relative to a direction of the captured direction of the core audio signal or sound source for determining a sound target level, para 52); and 
generate, based on the equalizer settings, one or more output audio signals that correspond to a psychoacoustic enhanced version of the one or more input audio signals (rotated spherical coefficients that have been equalized based on the target level determination, para 52) for benefits of presenting a better and more accurate sound source for sound field rendering and sound effect improvement (para 3-4).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied obtaining the directivity data of the particular audio source of the one or more audio sources based on a type of the particular audio source, as taught by Tsingos, to the directivity data of the particular audio source in the device, as taught by Johnson, for the benefits discussed above.
Claim 24 has been analyzed and rejected according to claim 1 above.
Claim 26 has been analyzed and rejected according to claims 1 and 24 above, and the combination of Johnson and Tsingos further teaches a non-transitory computer-readable medium (Johnson, memory unit 8A at the near-end system in fig. 2A, and Tsingos, non-transitory media such as RAM, ROM, etc., para 11-12) storing instructions (Johnson, storing processing components of data for conducting the operations and functions, para 31, and Tsingos, having software stored in the non-transitory media) that, when executed by one or more processors, cause the one or more processors to implement the method of claim 24 (Johnson, FPGA, ASIC, or general purpose microprocessor, or set of hardware logic structures, including filters, ALUs, state machines, etc., para 31, and Tsingos, executed by one or more components of a control system, including ASIC, FPGA, DSP, etc., as multi-chip processor, para 12).
Claim 29 has been analyzed and rejected according to claim 1 above.
Claim 2: the combination of Johnson and Tsingos further teaches, according to claim 1 above, wherein the psychoacoustic enhanced version approximates a frequency response of the one or more audio sources at a zoom orientation and a zoom distance associated with an audio zoom operation (Johnson, zoom length and zoom angle are adjusted through the zoom setting of the video camera 11A directed to the speaking individual 5A as the claimed zoom target, para 34-35, and the percentage of the video frame occupied by the speaking individual is determined, para 55-56; e.g., a far-away speaking individual’s speech is low-pass filtered, para 66).
Claim 3 has been analyzed and rejected according to claim 2 above, and the combination of Johnson and Tsingos further teaches receiving user input indicating a zoom target of an audio zoom operation (Johnson, the zooming is performed by manual input from the near-end user 5A, para 35); and determining the one or more equalizer settings based on the zoom target (Johnson, zooming is performed to determine the percentage or ratio of the user 5A in the video frame, para 55-56, thus determining the equalizer settings; see the discussion of claim 1 above).
Claim 4: the combination of Johnson and Tsingos further teaches, according to claim 3 above, wherein the zoom target includes a zoom location, a zoom distance, a zoom orientation, a selection of at least one of the one or more audio sources, or a combination thereof (Johnson, using the zooming to focus lenses on the speaking individual 5A, including apparent angle of view of video captured, and focal length, para 34-35; see the discussion of claim 3 above).
Claim 5: the combination of Johnson and Tsingos further teaches, according to claim 1 above, wherein the directivity data of the particular audio source of the one or more audio sources (Johnson, directivity index DI in fig. 8, e.g., the characteristics of the directivity index of the speaking individual 5A imitate the more intimate speaking style the user 5A is likely intending by sitting close to the video camera or zooming in such that the video frame is largely occupied by the user 5A, para 60-61, and Tsingos, radiation pattern of the sound source, such as trumpet, violin, drum, bell, etc., para 37) indicates orientation and distance frequency response characteristics of the particular audio source (Tsingos, the radiation pattern represented by G(θi, ϕi, ω) depending on frequency ω in figs. 2A-2C; in addition, directivity index DI inherently indicates orientation and distance frequency response characteristics of a sound source, see https://www.princeton.edu/3D3A/Publications/Tylka_3D3A_DICalculation.pdf, page 2, formula 1 in section 2 The Directivity Index; a copy has been listed in the attached PTO-892 and is attached).
Claim 10: the combination of Johnson and Tsingos further teaches, according to claim 1 above, 
wherein the one or more processors are further configured to process a context detector input audio signal (Johnson, via microphone array 16A in fig. 3B, and Tsingos, data set in an encoded format to be transmitted to a decoding system in fig. 1C, including metadata and core audio data as data set 345 in fig. 3, para 89) that is based on the one or more input audio signals (Tsingos, audio object signals from different sound sources such as trumpet, violin, drum, bell, etc., at an audio encoder in fig. 1B, para 37) to generate context data of the one or more audio sources (Tsingos, e.g., via the decoder in fig. 1C to extract audio object metadata 151 and audio core data 152, etc., in fig. 1C), wherein the context data of the particular audio source of the one or more audio sources indicates an orientation of the particular audio source (Tsingos, represented by the retrieved radiation pattern represented by G(θi, ϕi, ω), para 95), a distance of the particular audio source (Tsingos, the metadata 151 further includes one or more objects distance and direction relative to a 6DoF space, information regarding source’s relative rotation, distance and direction in a 6DoF space, para 87), the type of the particular audio source (Tsingos, “I am a violin”, or “I am a trumpet”, etc., para 95), or a combination thereof (Tsingos, defined by data set 345 in fig. 3), wherein the type of the particular audio source includes one or more of a person, an animal, a bird, a vehicle, a musical instrument, or a combination thereof (Tsingos, including violin, trumpet, drum, or bell, etc., para 37, 95).
Claim 11: the combination of Johnson and Tsingos further teaches, according to claim 10 above, wherein the one or more processors are further configured to generate the context data based at least in part on image data that is associated with the one or more input audio signals (Johnson, using the facial recognition algorithms to determine the position of the user or speaker 5A in a video frame by extracting landmarks or features from the face of the user 5A, para 56, the audio signals of the speaker 5A being captured by the microphone array 16A in fig. 3B, and Tsingos, context of interactive environments such as video game, para 3).
Claim 12: the combination of Johnson and Tsingos further teaches, according to claim 11 above, wherein the one or more processors are further configured to retrieve the image data and the one or more input audio signals from memory (Johnson, through the memory unit 8A, including video conference unit 9A in fig. 2A, provides video conferencing functionality to the near-end user 5A, para 31, including conducting a video conference call with the graphical user interface, para 48, and thus, retrieving image data and the audio data from the memory for processing by the processor 7A is inherent).
Claim 13: the combination of Johnson and Tsingos further teaches, according to claim 11 above, the device further comprising one or more microphones coupled to the one or more processors and configured to generate the one or more input audio signals (Johnson, microphone arrays 17A connected to the processor 7A through Audio Codec 21A in fig. 2A, and Tsingos, microphones used to pick up sounds and generate channel signals, para 38).
Claim 14: the combination of Johnson and Tsingos further teaches, according to claim 11 above, the device further comprising a camera coupled to the one or more processors and configured to generate the image data (Johnson, video camera 11A to Video Codec 15A and then the processor 7A).
Claim 15: the combination of Johnson and Tsingos further teaches, according to claim 11 above, wherein the one or more processors are further configured to perform audio source recognition based on the one or more input audio signals, the image data, or both (Johnson, using the facial recognition to identify the user 5A and to analyze relative positions, para 56), to identify the type of the particular audio source of the one or more audio sources (Johnson, using the facial recognition algorithms to determine the position of the user 5A in a video frame by extracting landmarks or features from the face of the user 5A, para 56, inherently identifying the type of the sound source as human).
Claim 22: the combination of Johnson and Tsingos further teaches, according to claim 1 above, wherein the one or more processors are further configured to obtain the directivity data from the memory, another device, or both (Johnson, memory unit 8A in fig. 2A, and table storing the directivity index of para 61, and thus, the directivity data is inherently retrieved from the memory for the further audio stream adjustment at step 27 in fig. 4, and Tsingos, the spatial representation as the radiation pattern is stored for further processing, para 64, para 71; at decoding, rendering the core audio signal based on the audio object metadata and the decoded radiation pattern at step 430 in fig. 4).
Claim 25 has been analyzed and rejected according to claims 24, 3-4 above.

Claims 6-9 and 27-28 are rejected under 35 U.S.C. 103 as being unpatentable over Johnson (above) in view of Tsingos (above) and Christoph (US 20100189275 A1).
Claim 6: the combination of Johnson and Tsingos teaches all the elements of claim 6, according to claim 1 above, including generating beam patterns (by the microphone array, figs. 6A-6C, para 52) and using beamforming for construction of a desired directivity pattern (Tsingos, para 108), except wherein the one or more processors are further configured to: perform beamforming on the one or more input audio signals to generate one or more beamformed audio signals; and process an equalizer input audio signal that is based on the one or more beamformed audio signals to generate the one or more output audio signals.
Christoph teaches an analogous field of endeavor by disclosing a device (title and abstract, ln 1-11 and a communication system in fig. 1) wherein beamforming is performed on one or more input audio signals (BF+NR 9-12 with respect to microphone signals from microphones 1a-1b and 3a-3b in fig. 1) to generate one or more beamformed audio signals (output signals to switch control front and rear 13, 14 in fig. 1); and an equalizer input audio signal (output signals from elements AEC 17/18 in fig. 1) that is based on the one or more beamformed audio signals (the output signals from elements 13/14 to the AEC 17/18 in fig. 1) is processed to generate the one or more output audio signals (the output signals from the elements dynamic volume control DVC and dynamic equalization control DEC 19/20 and added to the speaker signals for the speakers 5-8 in fig. 1) for benefits of achieving a high audio quality and improvement of voice intelligibility in a high noise acoustic environment including multiple echoes picked up from multiple microphones (para 3-7).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the performing of the beamforming on the one or more input audio signals to generate the one or more beamformed audio signals; and process the equalizer input audio signal that is based on the one or more beamformed audio signals to generate the one or more output audio signals, as taught by Christoph, to functions implemented by the one or more processors in the device, as taught by the combination of Johnson and Tsingos, for the benefits discussed above.
Claim 7: the combination of Johnson, Tsingos, and Christoph further teaches, according to claim 1, wherein the one or more processors are further configured to: 
identify speech in a speech detection input audio signal that is based on the one or more input audio signals to generate one or more speech audio signals (Christoph, via VAD so that the current audio signals from the microphones 3a/3b and/or 4a/4b, having the majority of voice signals, and the NR is reduced for use, para 33); and process an equalizer input audio signal that is based on the one or more speech audio signals to generate the one or more output audio signals (Christoph, the further processing including AEC 17/18, and DVC and DEC 19/20 in fig. 1 for the similar benefits discussed in claim 6 above).
Claim 8: the combination of Johnson, Tsingos, and Christoph further teaches, according to claim 1, wherein the one or more processors are further configured to: 
apply, based on a zoom target (Johnson, e.g., zoom target on the speaker 5A, with zoom setting from digital zooming device 13A or the mechanical zoom lens 12A, para 55), one or more gains to a gain adjuster input audio signal that is based on the one or more input audio signals to generate a gain adjusted audio signal (Johnson, a higher loudness level is set if a larger amount of a video frame is occupied, para 63, e.g., according to the table of para 64, and Christoph, via dynamic volume control DVC 19/20 in fig. 1 or VAD for detecting a voice in the input audio signals and reducing the use of NR if voice is detected, i.e., the signal gain is adjusted, para 31); and process an equalizer input audio signal that is based on the gain adjusted audio signal to generate the one or more output audio signals (Johnson, corresponding loudness levels are used at operation 25, para 65-67, and Christoph, the DVC/DEC elements 19-20 in fig. 1, for the similar benefits discussed in claim 6 above).
Claim 9: the combination of Johnson, Tsingos, and Christoph further teaches, according to claim 1, wherein the one or more processors are further configured to: perform noise suppression on a noise suppressor input audio signal that is based on the one or more input audio signals to generate a noise suppressed audio signal (Christoph, via BF+NR 9-12 in fig. 1); and process an equalizer input audio signal that is based on the noise suppressed audio signal to generate the one or more output audio signals (Christoph, via DVC/DEC 19-20 to process the output signals from the elements BF+NR 9-12 in fig. 1).
Claim 27 has been analyzed and rejected according to claims 26, 6 above.
Claim 28 has been analyzed and rejected according to claims 26, 7 above.

Claims 16-20 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Johnson (above) in view of Tsingos (above) and Skoglund et al. (US 20190132685 A1, hereinafter Skoglund).
Claim 16: the combination of Johnson and Tsingos teaches all the elements of claim 16, according to claim 11 above, including wherein the one or more processors are further configured to:
perform audio source recognition based on the one or more input audio signals, the image data, or both, to determine the particular audio source of the one or more audio sources (Johnson, using the facial recognition and shape of the eyes, nose, cheekbones, and jaw of the user 5A, to determine the sound source 5A in the video frame, para 56 and applying the target audio parameters to reflect the orientation and positioning of the user 5A, para 58, i.e., human speaker recognized as the sound source inherently); and
perform image analysis on the image data to determine the orientation of the particular audio source (Johnson, using the facial recognition and shape of the eyes, nose, cheekbones, and jaw of the user 5A, to determine the sound source 5A in the video frame, para 56 and Skoglund, by using camera, performing face tracking algorithms to give a position of the faces by the scene camera, including a direction to or a location of such sound source being estimated, para 38).
However, the combination of Johnson and Tsingos does not explicitly teach performing audio source recognition based on one or more input audio signals.
Skoglund teaches an analogous field of endeavor by disclosing a device (title and abstract, ln 1-20 and a hearing system in fig. 3) wherein a processor is disclosed (microprocessors, DSPs, FPGAs, etc., para 110) and configured to:
process a context detector input audio signal (output signal such as time difference τij between the microphone signals Mi and Mj, para 172) that is based on the one or more input audio signals to generate context data of the one or more audio sources (related to the sound source S, with τij being calculated according to at least two electric input signals, para 172 and fig. 3), wherein the context data of a particular audio source of the one or more audio sources indicates an orientation of the particular audio source, a distance of the particular audio source, a type of the particular audio source, or a combination thereof (τij is caused by the sound source being located at different distances from microphone i and microphone j, and thus is inherently indicative of the angle of the sound source relative to the listener, para 172); and
performing audio source recognition based on one or more input audio signals (via the PRO2, generating data indicative of a location of said localized sound source S relative to the user, including the direction of arrival, i.e., directivity data, para 173, i.e., recognized sound source S via the determined position and the DOA inherently) for benefits of achieving an improvement of localizing a sound source more accurately and realistically by enabling a 3D DOA recognition of the sound source (para 2-5, 7).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the one or more processors configured to perform the audio source recognition based on one or more input audio signals, as taught by Skoglund, to the one or more processors in the device, as taught by the combination of Johnson and Tsingos, for the benefits discussed above.
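For illustration only, the relationship noted above between an inter-microphone time difference τij and the angle of a sound source can be sketched under a far-field, plane-wave assumption. This is a minimal sketch and does not represent Skoglund's implementation; the function name, the microphone spacing parameter, and the assumed speed of sound of 343 m/s are hypothetical.

```python
import math

# Assumed speed of sound in air at about 20 degrees C (not from any cited reference).
SPEED_OF_SOUND_M_S = 343.0


def doa_from_time_difference(tau_s, mic_spacing_m):
    """Far-field arrival angle in degrees from broadside, given the delay tau_s
    (seconds) between two microphones spaced mic_spacing_m (meters) apart."""
    ratio = SPEED_OF_SOUND_M_S * tau_s / mic_spacing_m
    # Clamp to the valid asin domain to absorb measurement or rounding error.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A zero delay corresponds to a source directly in front of the microphone pair, and the largest physically possible delay (spacing divided by the speed of sound) corresponds to a source on the axis through both microphones.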
Claim 17: the combination of Johnson, Tsingos, and Skoglund further teaches, according to claim 11 above, wherein the one or more processors are further configured to:
perform audio source recognition based on the one or more input audio signals, the image data, or both, to determine the particular audio source of the one or more audio sources (Johnson, using the facial recognition and shape of the eyes, nose, cheekbones, and jaw of the user 5A, to determine the sound source 5A in the video frame, para 56 and applying the target audio parameters to reflect the orientation and positioning of the user 5A, para 58, and Skoglund, using the PRO1 and PRO2 to identify the DOA of the sound source by processing the microphone signals, para 172-173, and using face tracking algorithms to give a position of the faces by the scene camera, including a direction to or a location of such sound source being estimated, para 38); and
perform distance analysis on the one or more input audio signals, the image data, or both, to determine the distance of the particular audio source (Johnson, using the face recognition to analyze the relative position, size, and/or shape of the eyes, nose, cheekbones, …, to determine the amount of the frame occupied by the user 5A, para 56, and loudness is higher if the area of the frame occupied by the user 5A is larger, and vice versa, and the louder sound imitates the user 5A being closer and vice versa, para 63, i.e., distance is inherently determined by the loudness settings, and Skoglund, T data indicative of a location of the sound source S including the distance from or to the sound source relative to said user, para 20, e.g., the time difference can be converted to the distance, para 119-121).
Claim 18: the combination of Johnson, Tsingos, and Skoglund further teaches, according to claim 10 above, wherein the one or more processors are further configured to select the one or more equalizer settings based on equalizer setting data that associates the one or more equalizer settings with the context data, the directivity data, a zoom orientation, a zoom distance, or a combination thereof (Johnson, the equalization filter is set by one or more linear filters based on the size of the area of the video frame occupied by the user 5A, para 66, and the size is uniquely related to the directivity index of the audio stream in the table of para 61 and determined by the face recognition technique with the video camera 11A in fig. 2A, para 56, and, using zoom settings, the equalization filter is also based on the face recognition, para 54-56).
Claim 19: the combination of Johnson, Tsingos, and Skoglund further teaches, according to claim 18 above, wherein the one or more processors are further configured to obtain the equalizer setting data from the memory, another device, or both (Johnson, memory 8A including the video conferencing unit 9A and implemented by the processor 7A, and thus, equalization settings used by the processor inherently are retrieved from the memory for the processor to process and Tsingos, determined equalization relative to a direction of the assumed capture direction of the core audio signal for a target level determination, para 52).
Claim 20: the combination of Johnson, Tsingos, and Skoglund further teaches, according to claim 10 above, wherein the one or more processors are further configured to select the one or more equalizer settings to reduce a frequency response corresponding to mid-frequencies (Johnson, generated equalization settings for sound generated by the audio stream being included in the target audio parameters, and the sound is generated by the speaker array 19B at the far-end side, and vice versa for accomplishing the video conference meeting in figs. 2A/2B, and the high frequency and low frequency being filtered out by the equalization filter upon the distance of the speaker or sound source, para 66, and the transducers 19A or 19B including mid-range drivers, subwoofers, woofers, and tweeters, para 40, and thus, the frequency response corresponding to mid-range frequencies is inherently filtered out for the tweeters, subwoofers, and woofers, which provide high frequency sound and low frequency sound).
Claim 30: the combination of Johnson, Tsingos, and Skoglund further teaches, according to claim 29 above, wherein the means for obtaining, the means for determining, and the means for generating are integrated into at least one of a virtual assistant, a home appliance, a smart device, an internet of things (IoT) device, a communication device, a headset, a vehicle, a computer, a display device, a television, a gaming console, a music player, a radio, a video player, an entertainment unit, a personal media player, a digital video player, a camera, or a navigation device (Johnson, communication device in fig. 2A/2B, Tsingos, a personal computer, or a server or client computer, para 115, and Skoglund, hearing assistance device worn by a user, abstract).

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Johnson (above) in view of Tsingos (above) and Shimoyama (JP 2003-337164 A).
Claim 21: the combination of Johnson and Tsingos teaches all the elements of claim 21, according to claim 1 above, except:
generate, at a first time, a first sound spectrum of a first input audio signal corresponding to a particular audio source of the one or more audio sources; 
generate, at a second time, a second sound spectrum of a second input audio signal corresponding to the particular audio source; and 
update the directivity data to indicate that a difference between a first distance and a first orientation at the first time and a second distance and a second orientation at the second time corresponds to a difference between the first sound spectrum and the second sound spectrum.
Shimoyama teaches a device (title and abstract, ln 1-10, and sound spatial monitoring apparatus in fig. 36), wherein Shimoyama teaches:
generating a first sound spectrum of a first input audio signal corresponding to a particular audio source of the one or more audio sources (e.g., sound source 3a and/or 3b in figs. 7-8; applying FFT analysis on two channel signals from the two microphones 1a/1b to generate two analysis results or two acoustic signals, i.e., spectrum in fig. 32, para 32); 
generating a second sound spectrum of a second input audio signal corresponding to the particular audio source (generating the two analysis results or two acoustic signals, i.e., spectrum by FFT analysis, para 32, 36, 93); and 
updating the directivity data to indicate that a difference between a first distance and a first orientation at the first time and a second distance and a second orientation at the second time corresponds to a difference between the first sound spectrum and the second sound spectrum (phase difference indicating the directivity of the sound source in figs. 19-21; updating the phase difference indicating the sound source direction in figs. 20-21 and the distance between the sound source and the microphones in fig. 19a/b) for the benefits of achieving accurate sound source detection with less environmental interference (abstract) and in a simpler manner (without the need to measure transfer functions, para 8).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied generating the first sound spectrum of the first input audio signal corresponding to the particular audio source of the one or more audio sources; generating the second sound spectrum of the second input audio signal corresponding to the particular audio source; and updating the directivity data to indicate that the difference between the first distance and the first orientation at the first time and the second distance and the second orientation at the second time corresponds to the difference between the first sound spectrum and the second sound spectrum, as taught by Shimoyama, to the one or more processors in the device, as taught by the combination of Johnson and Tsingos, for the benefits discussed above.
However, the combination of Johnson and Shimoyama does not explicitly teach a first time for Shimoyama’s generating of the first sound spectrum and a second time for Shimoyama’s generating of the second sound spectrum.
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have recognized that using one microphone to generate a first sound spectrum and a second sound spectrum at different times and different locations, or using a pair of microphones to generate the same at the same time and at the same pair location, would have been a matter of design choice; for example, using one microphone to perform the measurement twice reduces the cost of the measurement device, while using a pair of microphones to perform the same measurement at one time saves time.
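For illustration only, the spectral comparison relied upon in the claim 21 analysis above (FFT analysis of two microphone channels followed by a phase difference at the dominant frequency) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Shimoyama, Johnson, or Tsingos: the function names, the 64-sample frame, the tone at bin 5, and the 2-sample inter-microphone delay are all hypothetical, and a naive DFT stands in for the FFT.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)); stands in for an FFT."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_bin(spectrum):
    """Index of the strongest bin in the positive-frequency half, skipping DC."""
    half = len(spectrum) // 2
    return max(range(1, half), key=lambda k: abs(spectrum[k]))


def phase_difference(spec_a, spec_b, k):
    """Phase of channel b relative to channel a at bin k, in [-pi, pi]."""
    return cmath.phase(spec_b[k] * spec_a[k].conjugate())


# Hypothetical test signal: one tone, arriving 2 samples later at microphone b.
N = 64          # frame length in samples (assumed)
TONE_BIN = 5    # tone frequency expressed as a DFT bin (assumed)
DELAY = 2       # inter-microphone delay in samples (assumed)
mic_a = [math.sin(2 * math.pi * TONE_BIN * t / N) for t in range(N)]
mic_b = [math.sin(2 * math.pi * TONE_BIN * (t - DELAY) / N) for t in range(N)]

spec_a = dft(mic_a)
spec_b = dft(mic_b)
k = dominant_bin(spec_a)
delta_phi = phase_difference(spec_a, spec_b, k)

# A delay of d samples shifts the phase at bin k by -2*pi*k*d/N.
estimated_delay = -delta_phi * N / (2 * math.pi * k)
```

In this sketch the recovered delay stands in for the direction cue: with a known microphone spacing and speed of sound, a delay of this kind could be converted to an arrival angle, which is omitted here because those values are not given in the record.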

Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Johnson (above) in view of Tsingos (above), Christoph (above), and Skoglund (above).
Claim 23 has been analyzed and rejected according to claims 6-8 (Johnson, Christoph above) and claims 9-10 (Johnson, Skoglund above) above.

Response to Arguments
Applicant's arguments filed on July 15, 2022 have been fully considered but are moot in view of the new ground(s) of rejection necessitated by the Applicant's amendment. The Office has thoroughly reviewed Applicant's arguments but firmly believes that the cited references reasonably and properly meet the claimed limitations.
In the response to this office action, the examiner respectfully requests that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589.  The examiner can normally be reached on Monday-Friday 6:30am-4:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached at 571-272-7848.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LESHUI ZHANG/
Primary Examiner, Art Unit 2654