DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments, see pp. 8-18, filed 2/25/2022, with respect to the Claim Rejections under 35 USC 112 have been fully considered and are persuasive.  The 35 USC 112 rejections of 11/26/2021 have been withdrawn. 
Applicant's arguments, see pp. 18-25 and 27, with respect to the Claim Rejections of claims 1-14 under 35 USC 103 have been fully considered but they are not persuasive. 
Regarding claim 1, the examiner respectfully disagrees with the applicant, and the examiner maintains that the combination of Visser and Chu makes obvious the claimed spatial mask.  Specifically, Visser teaches a spatial mask as a masking function applied by a masked signal generator (see Visser, ¶ 0292 and figure 42, unit 302).  The claimed invention does not positively include or exclude the manner of generating a spatial mask.  The prior art reads on a claim directed towards, for example, a device having a generated spatial mask for performing a function, when the prior art teaches the spatial mask with those features, and it is not important how the prior art accomplished the task because the claimed subject matter does not address those features.  Therefore, while Visser teaches a “phase-based” method to generate the spatial mask, the claim language does not explicitly limit the generation, or creation, of a spatial mask to a specific method that would exclude phase-based calculations to determine which directions are to be attenuated or other directions to be left unattenuated.  
The claim recites “a processor … configured to: … generate a spatial mask as a function of direction relative to the direction of interest, the spatial mask emphasizing audio reception in the direction of interest and attenuating audio reception in the directions of speakers not lying in the direction of interest”.  The claim does not include or exclude a specific manner of generating said spatial mask.  Additionally, Applicant points to paragraphs [0044]-[0047] of their own specification to provide their definition of a spatial mask.  The Applicant’s specification describes what the spatial mask is, but does not describe how the spatial mask is generated.  For instance, comparing Visser’s teachings of figures 3B-3D and figures 8A-8D with applicant’s figure 5 shows that the prior art’s spatial mask reads on the claimed spatial mask.  
Moreover, Applicant is arguing that the method of generating, or the steps taken to generate, a spatial mask as taught by Visser (i.e., a phase-based approach) does not teach and/or suggest the claimed features of a spatial mask.  Since the claimed subject matter does not include and/or exclude a specific manner to generate the spatial mask, it is not a persuasive argument that the prior art does not teach and/or suggest the features of a spatial mask by arguing that the prior art method of generating a spatial mask would not read on the claimed features of a generated spatial mask (i.e., regardless of how Visser generates the spatial mask, Visser teaches the features of a generated spatial mask).  Therefore, Visser teaches the claimed features of the spatial mask regardless of whether the spatial mask is phase-based (i.e., based on the sound arrival time (phase) differences).
Regarding the argument where applicant states “[n]owhere does Visser teach the application of its phase mask only when speech is present in the received signal” (see remarks of 2/25/2022, p. 21), the examiner respectfully disagrees.  Visser teaches the application of the phase mask only when speech is present, because Visser teaches that the input signal is gated by a voice activity detection (VAD) control signal.  The act of gating a signal refers to controlling when to let the input signal reach the output signal, such that a gated signal refers to an output signal corresponding to the exact input signal or to a null signal (i.e., silence).  Visser teaches that the input signal is only output through the gate function based on a VAD control signal, such as only outputting the input signal when speech is detected, such that the spatial mask is only applied to the input signal when speech is detected.
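For illustration only, the gating behavior described in the preceding paragraph can be sketched as follows.  This is a minimal sketch under stated assumptions and is not code taken from Visser or from the application: the function name `gate_then_mask`, the per-frame boolean `vad_active`, and the single scalar `mask_gain` are hypothetical simplifications of the VAD control signal and the masking function.

```python
def gate_then_mask(frame, vad_active, mask_gain):
    """Apply a mask gain to a frame only when the (hypothetical) VAD flag is set.

    frame      -- list of samples for one frame of the beamformed channel
    vad_active -- True when speech was detected for this frame (assumed input)
    mask_gain  -- scalar mask value in [0, 1] for this frame (assumed input)
    """
    if not vad_active:
        # Gate closed: the output is a null signal (silence), so the mask
        # is never applied to this frame.
        return [0.0] * len(frame)
    # Gate open: the frame passes through and is scaled by the mask.
    return [mask_gain * sample for sample in frame]


# Usage: the same frame with the gate closed and then open.
silent = gate_then_mask([1.0, -2.0], vad_active=False, mask_gain=0.5)
masked = gate_then_mask([1.0, -2.0], vad_active=True, mask_gain=0.5)
```

In this sketch the mask multiplication is reached only on frames flagged as containing speech, which is the relationship between gating and masking that the paragraph above relies on.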
Regarding the argument where applicant essentially argues that Visser teaches features having nothing to do with speech (see remarks of 2/25/2022, pp. 21-23), the examiner respectfully disagrees.  Applicant argues that Visser teaches more than just speech, so it can’t only be speech.  However, Visser does teach features with respect to detecting speech.  Prior art still anticipates and/or makes obvious a claimed invention when the prior art discloses more features than claimed (i.e., the broader art does not need to have a one to one correspondence to the applicant’s invention while remaining silent on any subject, feature, etc. that the applicant has not claimed and/or disclosed).  In the instant claim, while Visser teaches many variations and other features that are irrelevant to the claimed features, Visser still teaches the features as they are claimed, and the combination of the cited prior art makes obvious the features as claimed, see the 35 USC 103 rejections below.
Regarding claim 8, the examiner respectfully disagrees with the applicant, and the examiner maintains that the combination of Visser and Chu makes obvious the claimed features.  The applicant has clarified the original disclosure’s support for the claimed subject matter (see applicant’s remarks on pp. 13-14 with respect to a previously made 35 USC 112 rejection), such that the cited portions of the prior art make obvious this feature.  For example, Visser teaches that the masking function varies based on the difference between the desired direction and the non-desired directions, such that Visser teaches the spatial mask provides a scaling value between the values of 0 and 1 when the difference is less than a defined passband (e.g., figures 3B and 3C illustrate a masking function that passes signals that have a direction of arrival between about 3*pi/16 and 5*pi/16 radians, and figure 3C illustrates that when the deviation between the position of the desired user relative to the electronic device and another position is less than about 3*pi/16 radians, the spatial mask will attenuate the beam formed reception data) (see Visser, ¶ 0152-0153, 0169 and 0292, figure 3C, and figure 42, unit 302 and signal S20).  
Regarding claims 2-7 and 9-14, see the preceding remarks with respect to the independent claims from which they depend.  The cited prior art makes obvious the features of these claims (also see the 35 USC 103 rejections in the following section).

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-2, 5, 6, and 8-11 are rejected under 35 U.S.C. 103 as being unpatentable over Visser et al., US 2011/0038489 A1 (previously cited and hereafter Visser), further in view of Chu, US 10,755,727 B1 (previously cited).
Regarding claim 1, Visser teaches:
“An electronic device having improved directional noise suppression, the electronic device comprising: 
a microphone array having a plurality of microphones resulting in a reception pattern” because Visser teaches apparatus A10, and variations of A10 as apparatuses A100, A20, and A200 (see Visser, ¶ 0212-0214 and figures 18A-18D), where the variation of apparatus A200 as apparatus A240 which includes A2402 that has multiple input channels (see Visser, ¶ 0220-0221 and figures 20F and 21), teaches the microphone arrays R100 and illustrates implementations R200 and R210 of the microphone array outputting multiple input channels, such as S10, or S10-1 and S10-2 (see Visser, ¶ 0223-0228, figure 22A, and figure 22B, units MC10, MC20, S10-1, and S10-2), and teaches a variation of A100 as implementation A400 that selects one of a plurality of beams to apply to the input channels S10-1 to S10-4, and further teaches a variation of A400 and/or A2402 as implementation A420 (see Visser, ¶ 0291-0292, figure 41, and figure 42, units S10-1 to S10-4), such that Visser teaches an apparatus with a microphone array having 2 or more microphones that has a resulting reception pattern, such as through using a selectable beamformer; and
“a processor linked to the microphone array and being configured to: specify a direction of interest;” because Visser teaches task T210, which is a subtask of task T200, and the task T210 estimates the direction of arrival of each frequency component of the multichannel signal (see Visser, ¶ 0139-0140 and figure 2A), teaches devices that implement apparatus A10 with a processor (see Visser, ¶ 0230-0231, figure 23A, and figure 23B, units CS10, MC10, and MC20), teaches overlapping sectors, where a detected sound with a specific direction of arrival will be detected in at least one of the sectors (see Visser, ¶ 0269-0273, and figures 36A-37), teaches coherency measures to indicate whether the multichannel signal is coherent in any of the sectors and specifically select one of the sectors as the direction of interest (see Visser, ¶ 0275-0278 and 0281, and figure 38A), and teaches the beamforming operation selects the beam based on the selected sector that was determined as the direction of interest (see Visser, ¶ 0290-0292, and figures 41 and 42, units 712).
In the disclosure, Visser teaches that interference noises, such as background conversations, are in the non-desired directions, and further teaches that multiple directions of interest are indicated when multiple talkers are talking or when a talker is moving (see Visser, ¶ 0115, 0140, 0277-0278, and 0281).  However, Visser does not appear to teach that the “directions of speakers” (i.e., the other talkers) do not lie in the direction of interest, such that Visser teaches the other talkers are other directions of interest.
Chu discloses a system performing directional speech separation (see Chu, abstract).  Similar to Visser, Chu teaches an electronic device with a microphone array and processor linked thereto that determines the direction of interest of speech from a first user (i.e., target speech) (see Chu, column 3, lines 20-28, column 5, lines 45-47, and column 29, lines 40-63, and figures 1 and 17, unit 112).  Importantly, Chu further teaches the step to “determine directions corresponding to directions of speakers not lying in the direction of interest” by teaching the determination of the direction of a second user’s speech (i.e., distractor or non-target speech) (see Chu, column 4, lines 27-59 and column 5, lines 45-57).  It would have been obvious to one of ordinary skill in the art at the time of the effective filing date to modify Visser with Chu for the purpose of isolating a first user’s speech in order to allow the first user to use voice commands while other speech is detected (see Chu, column 3, line 64 - column 4, line 26 and column 4, lines 48-55).  
Therefore, the combination makes obvious the additional features to:
“determine directions corresponding to directions of speakers not lying in the direction of interest;” because Visser teaches interfering signals, such as background conversations (see Visser, ¶ 0115), teaches the direction indicators as direction of arrival angles (see Visser, ¶ 0140), and teaches indicating one or more sectors having a detected coherent signal within the sector’s direction of arrival range during each time period, such that each sector indication is used to track more than one desired source (see Visser, ¶ 0277-0278, and 0281), where Chu teaches isolating first and second speech corresponding to a first and second user, respectively, in order to suppress undesired speech signals (see Chu, column 4, lines 27-59), and Chu teaches using two or more microphones to determine the direction-of-arrival (DOA) of different sources, such as determining the first and second speech signals are arriving from different directions (see Chu, column 5, lines 45-57), such that it is obvious to track desired sources such as a first user in a desired direction and a competing second speech signal from a second user in a different direction, not in the direction of interest;
“beam form the reception pattern of the microphone array to focus in the direction of interest and to suppress signals from the directions of speakers not lying in the direction of interest, creating beam formed reception data;” because Visser teaches that the beamformer selects the beam based on the selected sector that was determined as the direction of interest and outputs a first channel that includes the desired sound (see Visser, ¶ 0291-0292, figures 41 and 42, unit 800);
“determine if signals received within the direction of interest contain speech;” because Visser teaches voice activity detection based on speech characteristics and the coherency measures (see Visser, ¶ 0157-0160 and 0196);
“generate a spatial mask as a function of direction relative to the direction of interest, the spatial mask emphasizing audio reception in the direction of interest and attenuating audio reception in the directions of speakers not lying in the direction of interest; and” because Visser teaches method M100 with task T100 for calculating phase differences and task T200 for calculating coherency measures based on the phase differences, and task T202, which is an implementation of T200, includes task T210 for calculating a direction indicator (see Visser, ¶ 0131 and 0139, figure 1A, task T200, and figure 2A), where the direction indicators are rated (e.g., pass/fail), such as using a directional masking function to map the value of each direction indicator on whether the indicated direction falls within directions of arrival that are passed by a masking function (see Visser, ¶ 0147, where Visser teaches a subtask T220 of task T202), Visser illustrates the passband and stopband of a masking function where the masking function allows sounds to pass when the direction of arrival is within a certain direction (i.e., figure 3B illustrates a masking function that passes signals that have a direction of arrival between about 3*pi/16 and 5*pi/16 radians) (see Visser, ¶ 0152 and figure 3B), teaches an implementation T302 of task T300, which includes task T310, where the rating results of signal masking are used to mask frequency components, subbands, or the entire portion of one or more channels, such as passing signal content that is within the direction of interest and blocking signal content that is outside the direction of interest defined by the masking function (see Visser, ¶ 0166-0169 and figures 11A-11C), and teaches the apparatus having the beamformed output, such as a primary channel, and further teaches the primary channel is processed by the masked signal generator to pass or block signal content in the primary channel according to the desired direction of interest (see Visser, ¶ 0292 and figure 42, unit 302); and
“multiply the beam formed reception data by the spatial mask to generate an audio signal with directional noise suppression only when the signals received within the direction of interest contain the speech” because Visser teaches task T312 as an implementation of task T310, where the rating results of signal masking are used to mask frequency components, subbands, or the entire portion of one or more channels, such as passing signal content that is within the direction of interest and blocking signal content that is outside the direction of interest defined by the masking function (see Visser, ¶ 0169), teaches voice activity detection based on speech characteristics and the coherency measures (see Visser, ¶ 0196), teaches an implementation T316 of task T310, where a gating function and a weighting procedure produce the masked signal, such that one or more (possibly all) frequency components are gated according to the relationship of the coherency measure to a threshold and the one or more (possibly all) frequency components are weighted by the coherency measure (see Visser, ¶ 0177), teaches an implementation of task T316, where a gating signal based on the coherency measure is used to pass or block the signal, such as passing all subbands of the primary channel during an active frame (i.e., a frame indicated as being active by the VAD flag based on the coherency) (see Visser, ¶ 0198-0199), and teaches the apparatus having the beamformed output, such as a primary channel, and further teaches the primary channel is processed by the masked signal generator to pass or block signal content in the primary channel by gating and subsequently applying a masking function (see Visser, ¶ 0292 and figure 42, units 302, 712, 800, and signal S20), such that the spatial mask is only applied to signals when voice activity is detected in the direction of interest, because the gated signal is allowed to pass to the masking function when the VAD indicates speech and is blocked when no speech is detected (see Visser, ¶ 0196, 0198-0199 and figure 42, units 302, 712, and 800).
Regarding claim 2, see the preceding rejection with respect to claim 1 above.  The combination makes obvious the “electronic device according to claim 1, wherein the processor is further configured to suppress ambient noise received by the microphone array” because Visser teaches a noise reduction operation to reduce nonstationary, or ambient, noise found in the direction of interest (i.e., the desired speaker or in the direction of the selected sector and/or beam) (see Visser, ¶ 0190, 0292, 0294, and 0299).
Regarding claim 5, see the preceding rejection with respect to claim 1 above.  The combination makes obvious the “electronic device according to claim 1, wherein the spatial mask leaves unattenuated all signals within a predefined radial threshold centered at the direction of interest” because Visser teaches a passband of the masking function, where the passband includes the direction of interest and provides a predefined width to control the spatial selectivity of the directions surrounding the direction of interest (see Visser, ¶ 0147-0154, figures 3B-C and 8A-D).
Regarding claim 6, see the preceding rejection with respect to claim 1 above.  The combination of Visser and Chu makes obvious the “electronic device according to claim 1, wherein the processor is further configured to determine a distance from the electronic device at which the speech originates, wherein the processor multiplies the beam formed reception data by the spatial mask to generate the audio signal with directional noise suppression only when both the signals received within the direction of interest contain the speech and the distance is less than a predetermined estimated distance” because Visser teaches the gating function that only allows the desired signal through based on a coherency value, and the discrimination of near-field versus far-field speech is also determined through the coherency value, such that the gating function allows the beamformed channel to be passed to the masked signal generator when the coherency value is high (e.g., the coherency value indicates that the desired source is speech and that the desired source is a near-field source) (see Visser, ¶ 0196-0197, 0199, 0207, 0291-0292).
Regarding claim 8, see the preceding rejection with respect to claim 1 above.  The combination of Visser and Chu makes obvious the electronic device of claim 1, and for the same reasons makes obvious a method with these features.  
Specifically, the combination makes obvious:
“A method of providing improved directional noise suppression in an electronic device having a microphone array made up of a plurality of microphones resulting in a reception pattern, the method comprising: 
specifying a direction of interest by estimating a position of a desired user relative to the electronic device;” because Visser teaches task T210, which is a subtask of task T200, and the task T210 estimates the direction of arrival of each frequency component of the multichannel signal (see Visser, ¶ 0139-0140 and figure 2A), teaches apparatus A10, and variations of A10 as apparatuses A100, A20, and A200 (see Visser, ¶ 0212-0214 and figures 18A-18D), teaches the variation of apparatus A200 as apparatus A240 which includes A2402 (see Visser, ¶ 0220-0221 and figures 20F and 21), teaches the microphone arrays R100 and illustrates implementations R200 and R210 of the microphone array outputting multiple channels, such as S10, or S10-1 and S10-2 (see Visser, ¶ 0223-0228, figure 22A, and figure 22B, units MC10, MC20, S10-1, and S10-2), teaches devices that implement apparatus A10 with a processor (see Visser, ¶ 0230-0231, figure 23A, and figure 23B, units CS10, MC10, and MC20), teaches overlapping sectors, where a detected sound with a specific direction of arrival will be detected in at least one of the sectors (see Visser, ¶ 0269-0273, figures 36A-37), teaches coherency measures to indicate whether the multichannel signal is coherent in any of the sectors and specifically select one of the sectors as the direction of interest (see Visser, ¶ 0275-0278 and 0281, and figure 38A), and teaches the beamforming operation selects the beam based on the selected sector that was determined as the direction of interest (see Visser, ¶ 0290-0292 and figures 41 and 42, units S10-1 to S10-4 and 712); and also Chu teaches using two or more microphones to determine the direction-of-arrival (DOA) of different sources, such as determining the first and second speech signals are arriving from different directions (see Chu, column 5, lines 45-57);
“determining at least one direction corresponding to at least one direction of at least one speaker not lying in the direction of interest by estimating at least one other position of at least one other interfering user relative to the electronic device” because Visser teaches interfering signals, such as background conversations (see Visser, ¶ 0115), teaches the direction indicators as direction of arrival angles (see Visser, ¶ 0140), and teaches indicating one or more sectors having a detected coherent signal within the sector’s direction of arrival range during each time period, such that each sector indication is used to track more than one desired source (see Visser, ¶ 0277-0278, and 0281), and Chu teaches isolating first and second speech corresponding to a first and second user, respectively, in order to suppress undesired speech signals (see Chu, column 4, lines 27-59), and teaches using two or more microphones to determine the direction-of-arrival (DOA) of different sources, such as determining the first and second speech signals are arriving from different directions (see Chu, column 5, lines 45-57), such that it is obvious to track desired sources such as a first user in a desired direction and a competing second speech signal from a second user in a different direction, not in the direction of interest;
“beam forming the reception pattern of the microphone array to focus in the direction of interest and to suppress signals from the at least one direction of the at least one speaker not lying in the direction of interest, creating beam formed reception data;” because Visser teaches that the beamformer selects the beam based on the selected sector that was determined as the direction of interest and outputs a first channel that includes the desired sound (see Visser, ¶ 0291-0292, figures 41 and 42, unit 800);
“generating a spatial mask as a function of direction relative to the direction of interest, the spatial mask emphasizing audio reception in the direction of interest and attenuating audio reception in the at least one direction of the at least one speaker not lying in the direction of interest; and” because Visser teaches method M100 with task T100 for calculating phase differences and task T200 for calculating coherency measures based on the phase differences, and task T202, which is an implementation of T200, includes task T210 for calculating a direction indicator (see Visser, ¶ 0131 and 0139, figure 1A, task T200, and figure 2A), teaches a subtask T220 of task T202, where the direction indicators are rated (e.g., pass/fail), such as using a directional masking function to map the value of each direction indicator on whether the indicated direction falls within directions of arrival that are passed by a masking function (see Visser, ¶ 0147), illustrates the passband and stopband of a masking function where the masking function allows sounds to pass when the direction of arrival is centered around pi/4 radians (i.e., figure 3B appears to show a masking function that passes signals that have a direction of arrival between 3*pi/16 and 5*pi/16 radians) (see Visser, ¶ 0152 and figure 3B), teaches an implementation T302 of task T300, which includes task T310, where the rating results of signal masking are used to mask frequency components, subbands, or the entire portion of one or more channels, such as passing signal content that is within the direction of interest and blocking signal content that is outside the direction of interest defined by the masking function (see Visser, ¶ 0166-0169 and figures 11A-11C), and teaches the apparatus having the beamformed output, such as a primary channel, and further teaches the primary channel is processed by the masked signal generator to pass or block signal content in the primary channel according to the desired direction of interest (see Visser, ¶ 0292 and figure 42, unit 302);
“multiplying the beam formed reception data by the spatial mask to generate an audio signal with directional noise suppression only when a deviation between the position of the desired user relative to the electronic device and the at least one other position of the at least one other interfering user relative to the electronic device is less than a predefined threshold, otherwise leaving the beam formed reception data unattenuated” because, as understood with applicant’s remarks on pp. 13-14 with respect to a previously made 35 USC 112 rejection, Visser teaches that the masking function varies based on the difference between the desired direction and the non-desired directions, such that Visser teaches the spatial mask provides a scaling value between the values of 0 and 1 when the difference is less than a defined passband (e.g., figures 3B and 3C illustrate a masking function that passes signals that have a direction of arrival between about 3*pi/16 and 5*pi/16 radians, and figure 3C illustrates that when the deviation between the position of the desired user relative to the electronic device and another position is less than about 3*pi/16 radians, the spatial mask will attenuate the beam formed reception data) (see Visser, ¶ 0152-0153, 0169 and 0292, figure 3C, and figure 42, unit 302 and signal S20).
Regarding claim 9, see the preceding rejection with respect to claim 8 above.  The combination makes obvious the “method according to claim 8, further comprising applying a predetermined maximum attenuation to the beam formed reception data using the spatial mask only when the deviation between position of the desired user relative to the electronic device and the at least one other position of the at least one other interfering user relative to the electronic device exceeds another predefined threshold” because Visser teaches a passband and stopband of the masking function, where a width parameter controls the spatial selectivity of the directions surrounding and outside the direction of interest, such that the signals exceeding a threshold deviation from the direction of interest are maximally attenuated in the stopband (see Visser, ¶ 0147-0154, figures 3B-C and 8A-D).
Regarding claim 10, see the preceding rejection with respect to claim 8 above.  The combination of Visser and Chu makes obvious the “method according to claim 8, the spatial mask defining a linear slope between an unattenuated portion and an attenuated portion” because Visser teaches a masking function having a linear rolloff between the passband and the stopband (see Visser, ¶ 0153 and figure 3C).
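For illustration only, a masking function with a passband and a linear rolloff of the kind discussed with respect to claims 5, 9, and 10 can be sketched as follows.  This is a minimal sketch under stated assumptions and is not code taken from Visser or from the application: the passband edges of 3*pi/16 and 5*pi/16 radians follow the approximate reading of figure 3B given in this action, while the rolloff width `ROLLOFF` and the names `PASS_LO`, `PASS_HI`, and `mask_gain` are hypothetical.

```python
import math

# Passband edges as approximately read from figure 3B in this action.
PASS_LO = 3 * math.pi / 16
PASS_HI = 5 * math.pi / 16
# Width of the linear transition band; an assumed value for illustration.
ROLLOFF = math.pi / 16


def mask_gain(doa):
    """Return a gain in [0, 1] for a direction of arrival given in radians."""
    if PASS_LO <= doa <= PASS_HI:
        # Passband: directions near the direction of interest are unattenuated.
        return 1.0
    # Angular distance from the nearest passband edge.
    distance = PASS_LO - doa if doa < PASS_LO else doa - PASS_HI
    # Linear slope from 1 down to 0 across the transition band, then
    # maximum attenuation (gain 0) throughout the stopband.
    return max(0.0, 1.0 - distance / ROLLOFF)


# Usage: gains at the passband center, mid-transition, and deep in the stopband.
center_gain = mask_gain(math.pi / 4)
transition_gain = mask_gain(PASS_HI + ROLLOFF / 2)
stopband_gain = mask_gain(0.0)
```

The three regions of the sketch correspond to an unattenuated portion, a linearly sloped portion, and a maximally attenuated portion; the actual widths in any given masking function are design parameters and are not fixed by this sketch.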
Regarding claim 11, see the preceding rejection with respect to claim 8 above.  The combination makes obvious the “method according to claim 8, further comprising calculating, from the beam forming reception pattern, each of a direct-to-reverberant power ration [sic], a coherence, and a voice activity direction prior to generating the spatial mask” because Visser teaches the calculation of a ratio of power between the unmasked signal (i.e., the beamformed channel) and the masked signal (i.e., the beamformed channel processed with the masking signal generator), where a small calculated ratio indicates a reverberant signal (see Visser, ¶ 0182-0184); Visser teaches the coherence calculation (see Visser, ¶ 0120, 0136, 0139, 0160, figure 1A, T200, and figure 2A, T202 and T230); and Visser teaches calculating a voice activity direction prior to generating the spatial mask (see Visser, ¶ 0196 and 0198-0199).

Claims 3-4, 7, and 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chu and Visser as applied to claim 1 above, and further in view of Kim et al., US 2013/0304476 A1 (previously cited and hereafter Kim).
Regarding claim 3, see the preceding rejection with respect to claim 1 above.  The combination of Visser and Chu makes obvious the electronic device according to claim 1, where Chu teaches that the device processes voice commands from a particular user and other users’ speech is considered distractor or non-target speech (see Chu, column 1, line 66 - column 2, line 8 and column 4, lines 48-55).  However, the combination does not appear to teach speaker/voice identification to select a particular user.
Kim discloses an audio user interaction recognition and context refinement system (see Kim, abstract).  Specifically, Kim teaches a system that provides social interaction analysis of a group of users in a group meeting through the analysis of audio collected by microphone arrays, where the microphone arrays use beamforming and similarity or correlation information is used to determine the user interaction information (see Kim, ¶ 0059-0062 and 0064, and figures 1 and 2).  Importantly, Kim teaches that the fixed microphone array performs DOA estimation, separates the active speakers, and performs speaker recognition to label the generated data, such as identifying the speaker’s name to help understanding of the generated social interaction analysis of a group meeting (see Kim, ¶ 0066-0068 and 0075, figure 3, units 320, 325, 330, 340, and 350, and figure 7, unit 466).  It would have been obvious to one of ordinary skill in the art at the time of the effective filing date to modify the combination of Visser and Chu with Kim for the purpose of identifying different speakers through speaker recognition in order to select the appropriate user and allow the selected user to send voice commands (see Chu, column 2, lines 2-6 in view of Kim, ¶ 0066).  Therefore, the combination of Visser, Chu, and Kim makes obvious the “electronic device according to claim 1, wherein the processor is further configured to apply speaker/voice identification to the audio signal with directional noise suppression” because Kim makes obvious speaker identification to label users and the combination makes obvious the use of speaker identification for selecting a particular user for processing said user’s voice commands (see Chu, column 1, line 66 - column 2, line 8 and column 4, lines 48-55 in view of Kim, ¶ 0066).
Regarding claim 4, see the preceding rejection with respect to claim 3 above.  In the combination, Chu teaches that the separated speech of the first user is sent to a remote server and the server performs automatic speech recognition (ASR) to determine the voice command from the first user’s speech (see Chu, column 2, lines 2-8 and column 3, line 64 - column 4, line 26).  One of ordinary skill in the art at the time of the effective filing date would have found it obvious that the ASR is performed locally via one or more processors in the local device (see Chu, column 2, lines 2-8 in view of Chu, column 29, lines 52-63 and column 30, line 36 - column 31, line 30).  Therefore, the combination makes obvious the “electronic device according to claim 3, wherein the processor is further configured to apply automatic speech recognition to speech of a speaker of interest to identify a command” (see Chu, column 2, lines 2-8, column 3, line 64 - column 4, line 26, column 29, lines 52-63 and column 30, line 36 - column 31, line 30).
Regarding claim 7, see the preceding rejection with respect to claims 1 and 3 above.  The combination of Visser and Chu makes obvious the electronic device of claim 1, and the combination of Visser, Chu, and Kim makes obvious the device of claim 1 with these additional features for the same reasons as stated above with respect to claim 3.
The combination of Visser, Chu, and Kim makes obvious the “electronic device according to claim 1, wherein the processor is further configured to identify a speaker of interest by applying speaker/voice identification to audio signals received at the microphone array” because Kim makes obvious speaker identification, using the audio from the microphone array, to label users and the combination makes obvious the use of speaker identification for selecting a particular user for processing said user’s voice commands (see Chu, column 1, line 66 - column 2, line 8 and column 4, lines 48-55 in view of Kim, ¶ 0066).
Regarding claim 12, see the preceding rejection with respect to claims 3 and 11 above.  The combination of Visser and Chu makes obvious the method of claim 11, and the combination of Visser, Chu, and Kim makes obvious the method of claim 11 with these additional features for the same reasons as stated above with respect to claim 3.
The combination of Visser, Chu, and Kim makes obvious the “method according to claim 11, further comprising applying automatic speech recognition to speech of a speaker of interest to identify a command [and] executing the command” for the following reasons.  First, Chu teaches that the separated speech of the first user is sent to a remote server and the server performs automatic speech recognition (ASR) to determine the voice command from the first user’s speech (see Chu, column 2, lines 2-8 and column 3, line 64 - column 4, line 26).  Second, one of ordinary skill in the art at the time of the effective filing date would have found it obvious that the ASR is performed locally via one or more processors in the local device (see Chu, column 2, lines 2-8, column 3, line 64 - column 4, line 26, column 29, lines 52-63 and column 30, line 36 - column 31, line 30).  Third, Chu teaches that the voice command controls the device to play music, capture audio using the microphones, etc. (see Chu, column 4, lines 19-22).
Regarding claim 13, see the preceding rejection with respect to claims 3 and 8 above.  The combination of Visser and Chu makes obvious the method of claim 8, and the combination of Visser, Chu, and Kim makes obvious the method of claim 8 with these additional features for the same reasons as stated above with respect to claim 3.
The combination makes obvious the “method according to claim 8, further comprising identifying a speaker other than a speaker of interest via speaker/voice identification” because Kim makes obvious speaker identification to label users and the combination makes obvious the use of speaker identification for selecting a particular user among at least two different speakers for processing said user’s voice commands, such that it is obvious to identify other users to determine which user can use voice commands (see Chu, column 1, line 66 - column 2, line 8 and column 4, lines 48-55 in view of Kim, ¶ 0066).
Regarding claim 14, see the preceding rejection with respect to claims 3 and 8 above.  The combination of Visser and Chu makes obvious the method of claim 8, and the combination of Visser, Chu, and Kim makes obvious the method of claim 8 with these additional features for the same reasons as stated above with respect to claim 3.
The combination makes obvious the “method according to claim 8, further comprising identifying a speaker of interest by applying speaker/voice identification to audio signals received at the microphone array” because Kim makes obvious speaker identification, using the audio from the microphone array, to label users and the combination makes obvious the use of speaker identification for selecting a particular user for processing said user’s voice commands (see Chu, column 1, line 66 - column 2, line 8 and column 4, lines 48-55 in view of Kim, ¶ 0066).

Allowable Subject Matter
Claims 15-20 are allowed.
The following is a statement of reasons for the indication of allowable subject matter:  The examiner agrees with applicant’s remarks with respect to claim 15 on pp. 17-18 and 25-27.  The remarks on pp. 17-18 clarify the subject matter of the claims as they are supported by the original disclosure, and therefore the 35 USC 103 rejection of 11/26/2021 is overcome.  The cited prior art of record does not appear to teach the features to “generate a spatial mask reception pattern of the microphone array to focus in the direction of interest and to suppress signals from all other directions when there is no signal activity from the direction of interest while leaving all signals from all directions unsuppressed when signal activity from a desired source is received from the direction of interest” (emphasis added).  Visser teaches the combined features of beamforming, signal gating according to voice activity in the direction of interest, and applying a generated spatial mask based on the direction of interest.  However, Visser does not appear to teach or reasonably suggest the feature of suppressing signals from all other directions when there is no signal activity from the direction of interest, because Visser teaches that, when activity is not detected, the signal is gated and not passed to the spatial masking function, while the spatial masking function also appears to be generated any time the process is running, such that it would be updated only when the desired direction changes.  Nor does Visser appear to teach or reasonably suggest the feature of leaving all signals unsuppressed when signal activity from a desired source is received from the direction of interest, because Visser teaches that, when activity is detected, a small passband corresponding to directions centered on the direction of interest is passed and all other directions are suppressed.
Therefore the cited prior art of record, alone or in combination, does not appear to teach or reasonably suggest these features in claim 15.  Claims 16-20 are allowable because they depend from claim 15.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daniel R Sellers whose telephone number is (571)272-7528. The examiner can normally be reached Mon - Fri 10:00-4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan S Tsang, can be reached on (571)272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Daniel R Sellers/
Examiner, Art Unit 2653