Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. CN201710061599.3, filed on 01/26/2017.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 07/05/2019 and 07/15/2019 are being considered by the examiner.
Drawings
The drawing submitted on 07/05/2019 is being considered by the examiner.
Response to Amendment
Claims 1-10 are currently pending in the application, and claims 1-8 have been amended.
Response to Arguments
Applicant's arguments filed 05/21/2021 have been fully considered but are moot in view of the new ground of rejection and, further, are not persuasive for the following reasons:
Applicant Arguments: Applicant respectfully submits that Johnson does not disclose or suggest each and every limitation recited in claim 1. Specifically, Johnson does not disclose or suggest "processing the multichannel voice signals picked up by the microphone array into one channel enhanced voice, and outputting the one channel enhanced voice as a finally picked up voice," (emphasis added). As shown (cited portion), Johnson is completely silent regarding processing multichannel voice signals into one channel enhanced voice. 
Examiner Response: The examiner respectfully disagrees with applicant's assertion that Johnson does not teach the above limitation. Col 10 line 21 to Col 11 line 19 clearly teaches the limitation at issue, “processing the multichannel voice signals picked up by the microphone array into one channel enhanced voice, and outputting the one channel enhanced voice as a finally picked up voice”.
Below is the examiner's mapping of the limitation to the prior art teaching.
Processing the multichannel voice signals picked up by the microphone array into one channel enhanced voice (an audio signal focused in a particular direction is generated by combining the signals of the different microphones, i.e. signals from different microphones are combined to generate an audio signal that is focused in a direction from which user speech has been detected) (Col 10 line 21 to Col 10 line 42, The audio device may include an audio processing module 1240 (illustrated in FIG. 12) that may include one or more audio beamformers or beamforming components that are configured to generate an audio signal that is focused in a direction from which user speech has been detected. Beamforming uses signal processing techniques to combine signals from the different microphones so that sound signals originating from a particular direction are emphasized while sound signals from other directions are deemphasized. More specifically, signals from the different microphones are combined in such a way that signals from a particular direction experience constructive interference, while signals from other directions experience destructive interference. The parameters used in beamforming may be varied to dynamically select different directions, even when using a fixed-configuration microphone array.) and outputting the one channel enhanced voice as a finally picked up voice (Col 10 line 21 to Col 10 line 42, More specifically, the beamforming components may be responsive to spatially separated microphone elements of the microphone array 302 to produce directional audio signals that emphasize sounds originating from different directions relative to the audio device 110, and to select and output one of the audio signals that is most likely to contain user speech.).
Therefore, the examiner maintains that the prior art teaches the limitation argued by applicant's representative, and the rejection of the pending claims is maintained.
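For illustration only, the combining of multichannel microphone signals into one channel, as discussed in the mapping above, can be sketched as a delay-and-sum beamformer. The sketch is not taken from Johnson or from the application; the function name, the whole-sample delays and the toy pulse signals are assumptions made for the example.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Combine equal-length microphone channels into one channel.

    delays: hypothetical per-microphone arrival delay, in whole samples,
            for the chosen look direction.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for t in range(n):
            src = t + d  # advance each channel by its arrival delay
            if 0 <= src < n:
                out[t] += samples[src]
    # Average so that aligned components keep their original amplitude.
    return [v / len(channels) for v in out]


# Toy input: one pulse reaches microphone 0 at sample 2 and microphone 1 at sample 4.
pulse_at_2 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
pulse_at_4 = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# Steering toward the source aligns the pulses (constructive combination).
steered = delay_and_sum([pulse_at_2, pulse_at_4], delays=[0, 2])
# Without steering the pulses stay misaligned and each is halved by the average.
unsteered = delay_and_sum([pulse_at_2, pulse_at_4], delays=[0, 0])
```

With the delays matched to the source, the single output channel retains the full pulse amplitude, while the mismatched case spreads it over two smaller peaks. This corresponds to the emphasis and de-emphasis by direction that the cited passage describes.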
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 8 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Claim 8, line 2 recites “parallelly performing the voice activation detection…and performing the processing the multichannel voice signals picked up by the microphone array into one channel enhanced voice”, which is not described in the specification as claimed.
Applicant’s specification is silent on parallel processing; however, Fig. 3 of the drawings shows parallel processing of a wakeup data flow and a pickup data flow. The block diagram also shows a feedback signal, “lighting pickup indicator lamp”, output from the “voice wakeup detection unit 14”, which together with the output of the pickup data flow is input to the “second voice enhancement unit 15”, the output of which is the “one channel voice signal”. This is not the parallel processing recited in claim 8.
For examination purposes, the examiner did not consider the parallel processing and treated the limitation as the processing taught by the prior art cited in the rejection of claim 8.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3 and 6 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 3, line 8 recites “if yes, go to Step 2; otherwise, repeat Step 1”. Since the preceding claims 1-2 do not specifically recite “Step 1” and “Step 2”, it is not clear which steps of the method of claims 1-2 are Step 1 and Step 2. Thus the claim is indefinite.
Claim 6 is rejected similarly, for reciting a limitation similar to that of claim 3.
For examination purposes, the examiner did not consider this limitation in the rejection of claims 3 and 6.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 and 3-9 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Johnson Jr. (US 10134425 B1), hereinafter referred to as Johnson.

Regarding Claim 1, Johnson teaches: A microphone array based pickup method, comprising: performing voice activation detection using one channel voice signal among multichannel voice signals picked up and output by a microphone array, (Col 3, lines 7- 39, Further, the system may track the direction from which audio is received and a duration associated with that direction. Thus the system can determine when multiple audio sources are active in an environment by tracking changes in direction from one input audio to the next. The system may weight certain audio data and/or active hypotheses based on the direction and/or duration indications that are associated with audio data. Further, the system may filter audio data using the direction and/or duration information to prevent the system from considering undesired audio, which may lead to latency. A spoken utterance in the audio data is input to a processor configured to perform ASR which then interprets the utterance based on the similarity between the utterance and pre-established language models 254 stored in an ASR model knowledge base (ASR Models Storage 252). For example, the ASR process may compare the input audio data with models for sounds (e.g., subword units or phonemes) and sequences of sounds to identify words that match the sequence of sounds spoken in the utterance of the audio data. The device may include a plurality of microphones and thus be capable of determining an incoming direction of audio. As shown in FIG. 1, the system may receive (152) audio including speech. The system may then determine (154) audio data from the audio. The system may also determine (156) a direction associated with the audio and determine (158) a duration during which audio has been received from the direction. 
The system may perform (160) ASR processing on the audio data and may determine (162) an endpoint of the speech using the audio data, direction, and duration.); locating the voice source by using the multichannel voice signals output by the microphone array to obtain the voice source locating direction based on a determination that voice activation signal occurs (Col 10, lines 21-42, Using the microphone array 302 and the plurality of microphones 308 the audio device 110 may employ beamforming techniques to isolate desired sounds for purposes of converting those sounds into audio signals for speech processing by the system. Beamforming is the process of applying a set of beamformer coefficients to audio signal data to create beampatterns, or effective directions of gain or attenuation. In some implementations, these volumes may be considered to result from constructive and destructive interference between signals from individual microphones in a microphone array. The audio device may include an audio processing module 1240 (illustrated in FIG. 12) that may include one or more audio beamformers or beamforming components that are configured to generate an audio signal that is focused in a direction from which user speech has been detected. More specifically, the beamforming components may be responsive to spatially separated microphone elements of the microphone array 302 to produce directional audio signals that emphasize sounds originating from different directions relative to the audio device 110, and to select and output one of the audio signals that is most likely to contain user speech.); enhancing a voice signal in the voice source locating direction to obtain an enhanced voice signal (Col 10, line 66 to Col 11, line 2, A given beampattern may be used to selectively gather signals from a particular spatial location where a signal source is present. The selected beampattern may be configured to provide gain or attenuation for the signal source. 
Such spatial selectivity by using beamforming allows for the rejection or attenuation of undesired signals outside of the beampattern. Col 11, lines 14-19, The processed data from the beamformer module may then undergo additional filtering or be used directly by other modules. For example, a filter may be applied to processed data which is acquiring speech from a user to remove residual audio noise from a machine running in the environment.); conducting voice wakeup detection on the enhanced voice signal; based on a determination that a voice wakeup is detected, picking up and outputting the multichannel voice signals by the microphone array (Col 9, lines 35-49, Once speech is detected in the audio received by the device 110, the device may perform wakeword detection to determine when a user intends to speak a command to the device 110. As noted above, a wakeword is a special word that the device 110 is configured to recognize among the various audio inputs detected by the device 110. The wakeword is thus typically associated with a command to be executed by the device 110 and/or overall system 100. Following detection of the wakeword the device 110 may send audio data corresponding to the utterance (which may include the wakeword itself) to the server(s) 120. The server(s) 120 may then perform speech processing on the audio data 111 until an endpoint is detected (discussed below) and may also and execute any resulting command included in the utterance.); processing the multichannel voice signals picked up by the microphone array into one channel enhanced voice (an audio signal focused in a particular direction is generated by combining the signals of the different microphones, i.e. 
signals from different microphones are combined to generate an audio signal that is focused in a direction from which user speech has been detected) (Col 10 line 21 to Col 11 line 19, The audio device may include an audio processing module 1240 (illustrated in FIG. 12) that may include one or more audio beamformers or beamforming components that are configured to generate an audio signal that is focused in a direction from which user speech has been detected. Beamforming uses signal processing techniques to combine signals from the different microphones so that sound signals originating from a particular direction are emphasized while sound signals from other directions are deemphasized. More specifically, signals from the different microphones are combined in such a way that signals from a particular direction experience constructive interference, while signals from other directions experience destructive interference. The parameters used in beamforming may be varied to dynamically select different directions, even when using a fixed-configuration microphone array.) and outputting the one channel enhanced voice as a finally picked up voice (Col 10 line 21 to Col 10 line 42, The audio device may include an audio processing module 1240 (illustrated in FIG. 12) that may include one or more audio beamformers or beamforming components that are configured to generate an audio signal that is focused in a direction from which user speech has been detected. More specifically, the beamforming components may be responsive to spatially separated microphone elements of the microphone array 302 to produce directional audio signals that emphasize sounds originating from different directions relative to the audio device 110, and to select and output one of the audio signals that is most likely to contain user speech.).

Regarding Claim 3, Johnson teaches: The microphone array based pickup method of claim 1, wherein the performing voice activation detection  further comprises: selecting one channel voice signal from the multichannel voice signals captured by the microphone array; detecting a voice initial point and a voice ending point of a speaker in the voice signals; determining if a voice activation signal occurs according to signals between the voice initial point and the voice ending point (See rejection of Claim 1 and Col 3, lines 7- 39, Further, the system may track the direction from which audio is received and a duration associated with that direction. Thus the system can determine when multiple audio sources are active in an environment by tracking changes in direction from one input audio to the next. The system may weight certain audio data and/or active hypotheses based on the direction and/or duration indications that are associated with audio data. Further, the system may filter audio data using the direction and/or duration information to prevent the system from considering undesired audio, which may lead to latency. A spoken utterance in the audio data is input to a processor configured to perform ASR which then interprets the utterance based on the similarity between the utterance and pre-established language models 254 stored in an ASR model knowledge base (ASR Models Storage 252). For example, the ASR process may compare the input audio data with models for sounds (e.g., subword units or phonemes) and sequences of sounds to identify words that match the sequence of sounds spoken in the utterance of the audio data. The device may include a plurality of microphones and thus be capable of determining an incoming direction of audio. As shown in FIG. 1, the system may receive (152) audio including speech. The system may then determine (154) audio data from the audio. 
The system may also determine (156) a direction associated with the audio and determine (158) a duration during which audio has been received from the direction. The system may perform (160) ASR processing on the audio data and may determine (162) an endpoint of the speech using the audio data, direction, and duration. Col 4, lines 4-9, An audio capture component, such as a microphone of the audio device 110, captures audio corresponding to a spoken utterance. Details for capturing the spoken utterance, such as determining the beginning and/or end of the utterance and configuring an audio signal corresponding to the utterance, is discussed below. Col 9, lines 35-49, Once speech is detected in the audio received by the device 110, the device may perform wakeword detection to determine when a user intends to speak a command to the device 110. As noted above, a wakeword is a special word that the device 110 is configured to recognize among the various audio inputs detected by the device 110. The wakeword is thus typically associated with a command to be executed by the device 110 and/or overall system 100. Following detection of the wakeword the device 110 may send audio data corresponding to the utterance (which may include the wakeword itself) to the server(s) 120. The server(s) 120 may then perform speech processing on the audio data 111 until an endpoint is detected (discussed below) and may also and execute any resulting command included in the utterance.).
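For illustration only, detecting a voice initial point and a voice ending point on a single channel can be sketched with a generic energy threshold held for a minimum number of frames. This is a minimal sketch and not the method of Johnson or of the application; the function names, the frame length, the threshold and the minimum run length are all assumed values.

```python
from typing import List, Optional, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_endpoints(energies: Sequence[float], threshold: float,
                   min_frames: int) -> Optional[Tuple[int, int]]:
    """Return (first_frame, last_frame) of the first run of at least
    `min_frames` consecutive frames whose energy exceeds `threshold`,
    or None when no such run exists."""
    start = None
    # The sentinel below closes a run that lasts until the final frame.
    for i, energy in enumerate(list(energies) + [float("-inf")]):
        if energy > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                return (start, i - 1)
            start = None  # run too short: treat it as a sudden noise
    return None


# Toy single-channel input: silence, a sustained burst, silence.
signal = [0.0] * 4 + [1.0] * 8 + [0.0] * 4
energies = frame_energies(signal, frame_len=2)
endpoints = find_endpoints(energies, threshold=0.5, min_frames=2)
# A one-frame burst is ignored because it is shorter than min_frames.
short_burst = find_endpoints([0.0, 1.0, 0.0, 0.0], threshold=0.5, min_frames=2)
```

Whether a voice activation signal occurs is decided here only from the frames between the two returned points. The minimum run length is an assumed design choice that discards short, high-energy noises.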

Regarding Claim 4, Johnson teaches: The microphone array based pickup method of claim 1, wherein locating the voice source further comprises: determining the voice source locating direction by obtaining the location of the voice source according to the time difference of the signals received by at least two microphones in the microphone array (See rejection of Claim 1 and  Col 10, line 43 to Col 11, line 13, Audio beamforming, also referred to as audio array processing, uses a microphone array having multiple microphones that are spaced from each other at known distances. Sound originating from a source is received by each of the microphones. However, because each microphone is potentially at a different distance from the sound source, a propagating sound wave arrives at each of the microphones at slightly different times. This difference in arrival time results in phase differences between audio signals produced by the microphones. The phase differences can be exploited to enhance sounds originating from chosen directions relative to the microphone array.     Beamforming uses signal processing techniques to combine signals from the different microphones so that sound signals originating from a particular direction are emphasized while sound signals from other directions are deemphasized. More specifically, signals from the different microphones are combined in such a way that signals from a particular direction experience constructive interference, while signals from other directions experience destructive interference. The parameters used in beamforming may be varied to dynamically select different directions, even when using a fixed-configuration microphone array.   A given beampattern may be used to selectively gather signals from a particular spatial location where a signal source is present. The selected beampattern may be configured to provide gain or attenuation for the signal source. 
For example, the beampattern may be focused on a particular user's head allowing for the recovery of the user's speech while attenuating noise from an operating air conditioner that is across the room and in a different direction than the user relative to a device that captures the audio signals.   Such spatial selectivity by using beamforming allows for the rejection or attenuation of undesired signals outside of the beampattern. The increased selectivity of the beampattern improves signal-to-noise ratio for the audio signal. By improving the signal-to-noise ratio, the accuracy of speech recognition performed on the audio signal is improved. Col 11, lines 14-19, The processed data from the beamformer module may then undergo additional filtering or be used directly by other modules. For example, a filter may be applied to processed data which is acquiring speech from a user to remove residual audio noise from a machine running in the environment. ).
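For illustration only, obtaining a direction from the time difference of the signals received by two microphones can be sketched with a cross-correlation search followed by a far-field angle conversion. The sketch is not taken from Johnson or from the application; the function names, the sampling rate, the microphone spacing and the speed of sound used below are assumptions.

```python
import math
from typing import Sequence


def estimate_delay(ref: Sequence[float], other: Sequence[float],
                   max_lag: int) -> int:
    """Return the lag, in samples, at which `other` best matches `ref`.

    A positive result means the sound reached `other` later than `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(len(ref)):
            u = t + lag
            if 0 <= u < len(other):
                score += ref[t] * other[u]  # cross-correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag: int, sample_rate_hz: float, mic_spacing_m: float,
                      speed_of_sound: float = 343.0) -> float:
    """Far-field angle from broadside for a two-microphone pair,
    using sin(theta) = c * tau / d."""
    s = speed_of_sound * (lag / sample_rate_hz) / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # guard against rounding outside [-1, 1]
    return math.degrees(math.asin(s))


# Toy input: the same short burst arrives two samples later at microphone B.
mic_a = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]

lag = estimate_delay(mic_a, mic_b, max_lag=3)
angle = arrival_angle_deg(lag, sample_rate_hz=16000.0, mic_spacing_m=0.1)
```

The estimated lag is the arrival-time difference that the cited passage attributes to the differing distances between the source and each microphone. The angle conversion assumes a distant source and a single microphone pair.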

Regarding Claim 5, Johnson teaches: The microphone array based pickup method of claim 1, wherein enhancing the voice signal further comprises: suppressing the noise of the voice signal in the voice source locating direction to obtain an enhanced voice signal (See rejection of Claim 4).

Regarding Claim 6, Johnson teaches: The microphone array based pickup method of claim 1, wherein the conducting voice wakeup detection further comprising: sending the enhanced voice signal into a signal wakeup model; detecting if pre- determined wakeup words are included in the enhanced voice signal; if yes, go to Step 1; otherwise, go to Step 5 (Col 3, lines 7- 39, The system may weight certain audio data and/or active hypotheses based on the direction and/or duration indications that are associated with audio data. Further, the system may filter audio data using the direction and/or duration information to prevent the system from considering undesired audio, which may lead to latency. A spoken utterance in the audio data is input to a processor configured to perform ASR which then interprets the utterance based on the similarity between the utterance and pre-established language models 254 stored in an ASR model knowledge base (ASR Models Storage 252). For example, the ASR process may compare the input audio data with models for sounds (e.g., subword units or phonemes) and sequences of sounds to identify words that match the sequence of sounds spoken in the utterance of the audio data. The device may include a plurality of microphones and thus be capable of determining an incoming direction of audio. As shown in FIG. 1, the system may receive (152) audio including speech. The system may then determine (154) audio data from the audio. The system may also determine (156) a direction associated with the audio and determine (158) a duration during which audio has been received from the direction. The system may perform (160) ASR processing on the audio data and may determine (162) an endpoint of the speech using the audio data, direction, and duration.); Col 9, lines 35-49, Once speech is detected in the audio received by the device 110, the device may perform wakeword detection to determine when a user intends to speak a command to the device 110. 
As noted above, a wakeword is a special word that the device 110 is configured to recognize among the various audio inputs detected by the device 110. The wakeword is thus typically associated with a command to be executed by the device 110 and/or overall system 100. Following detection of the wakeword the device 110 may send audio data corresponding to the utterance (which may include the wakeword itself) to the server(s) 120. The server(s) 120 may then perform speech processing on the audio data 111 until an endpoint is detected (discussed below) and may also and execute any resulting command included in the utterance.

Regarding Claim 8, Johnson teaches: The microphone array based pickup method of claim 1, further comprising: parallelly performing the voice activation detection based on the multichannel voice signals captured by the microphone array when the wakeup is detected and performing the processing the multichannel voice signals picked up by the microphone array into one channel enhanced voice (See rejection of Claim 1 and Col 3, lines 7- 39, Further, the system may track the direction from which audio is received and a duration associated with that direction. Thus the system can determine when multiple audio sources are active in an environment by tracking changes in direction from one input audio to the next. Col 9, lines 35-49, Once speech is detected in the audio received by the device 110, the device may perform wakeword detection to determine when a user intends to speak a command to the device 110. As noted above, a wakeword is a special word that the device 110 is configured to recognize among the various audio inputs detected by the device 110. The wakeword is thus typically associated with a command to be executed by the device 110 and/or overall system 100. Following detection of the wakeword the device 110 may send audio data corresponding to the utterance (which may include the wakeword itself) to the server(s) 120. The server(s) 120 may then perform speech processing on the audio data 111 until an endpoint is detected (discussed below) and may also and execute any resulting command included in the utterance. Col 10, line 43 to Col 11, line 13, Audio beamforming, also referred to as audio array processing, uses a microphone array having multiple microphones that are spaced from each other at known distances. Sound originating from a source is received by each of the microphones. However, because each microphone is potentially at a different distance from the sound source, a propagating sound wave arrives at each of the microphones at slightly different times. 
This difference in arrival time results in phase differences between audio signals produced by the microphones. The phase differences can be exploited to enhance sounds originating from chosen directions relative to the microphone array. Beamforming uses signal processing techniques to combine signals from the different microphones so that sound signals originating from a particular direction are emphasized while sound signals from other directions are deemphasized. More specifically, signals from the different microphones are combined in such a way that signals from a particular direction experience constructive interference, while signals from other directions experience destructive interference. The parameters used in beamforming may be varied to dynamically select different directions, even when using a fixed-configuration microphone array.   A given beampattern may be used to selectively gather signals from a particular spatial location where a signal source is present. The selected beampattern may be configured to provide gain or attenuation for the signal source. For example, the beampattern may be focused on a particular user's head allowing for the recovery of the user's speech while attenuating noise from an operating air conditioner that is across the room and in a different direction than the user relative to a device that captures the audio signals.   Such spatial selectivity by using beamforming allows for the rejection or attenuation of undesired signals outside of the beampattern. The increased selectivity of the beampattern improves signal-to-noise ratio for the audio signal. By improving the signal-to-noise ratio, the accuracy of speech recognition performed on the audio signal is improved. Col 11, lines 14-19, The processed data from the beamformer module may then undergo additional filtering or be used directly by other modules. 
For example, a filter may be applied to processed data which is acquiring speech from a user to remove residual audio noise from a machine running in the environment. ).

Regarding Claim 9, Johnson teaches:  A microphone array based pickup system, comprising: a microphone array, including multiple microphone units, wherein the microphone units are configured to pick up and output multichannel voice signals (Col 10, lines 21-42, Using the microphone array 302 and the plurality of microphones 308 the audio device 110 may employ beamforming techniques to isolate desired sounds for purposes of converting those sounds into audio signals for speech processing by the system. Col 10, line 43 to Col 11, line 13, Beamforming uses signal processing techniques to combine signals from the different microphones so that sound signals originating from a particular direction are emphasized while sound signals from other directions are deemphasized.); a voice activation unit (VAD), connecting with the microphone array, conducts voice activation detection based on at least one channel voice signal among the multichannel voice signals to output voice activation signal or voice inactivation signal (Col 12, line 56- Col 13, line 16, The beginning/end of an utterance may also be detected using speech/voice characteristics. Other techniques may also be used to determine the beginning of an utterance (also called beginpointing) or end of an utterance (endpointing). Beginpointing/endpointing may be based, for example, on the number of silence/non-speech audio frames, for instance the number of consecutive silence/non-speech frames. For example, some systems may employ energy based or acoustic model based voice activity detection (VAD) techniques. Such techniques may determine whether speech is present in an audio input based on various quantitative aspects of the audio input, such as the spectral slope between one or more frames of the audio input; the energy levels (such as a volume, intensity, amplitude, etc.) 
of the audio input in one or more spectral bands; zero-crossing rate; the signal-to-noise ratios of the audio input in one or more spectral bands; or other quantitative aspects. These factors may be compared to one or more thresholds to determine if a break in speech has occurred that qualifies as a beginpoint/endpoint. Such thresholds may be set according to user input, or may be set by a device. In some embodiments, the beginpointing/endpointing may be further configured to determine that the audio input has an energy level satisfying a threshold for at least a threshold duration of time. In such embodiments, high-energy audio inputs of relatively short duration, which may correspond to sudden noises that are relatively unlikely to include speech, may be ignored.); a voice source locating unit (audio processing module 1240), connecting with the microphone array under the control of first controlled switch (beamformer module 1242) activated by the voice activation signal, determines voice source locating direction by locating the voice source according to the multichannel voice signals (Col 10, lines 32-36, The audio device may include an audio processing module 1240 (illustrated in FIG. 12) that may include one or more audio beamformers or beamforming components that are configured to generate an audio signal that is focused in a direction from which user speech has been detected.  Col 29, lines 17-39, The audio processing module 1240 may include a beamformer module 1242, a room impulse response (RIR) determination module 1244, a lobe-selection module 1246, and an acoustic echo cancellation (AEC) module 1248.  The beamformer module 1242 functions to create beampatterns, or effective directions of gain or attenuation. As illustrated and described below, the beampatterns include multiple lobes, each altering a gain from a respective region within the environment of the device 110. 
The beamformer module 1242 may be configured to create an indicator of a direction of received audio for consideration by the speech processing system either local to the device 110 or at a remote server 120, to which the device 110 may send the indicator of the direction of the received audio. The indicator of a direction may indicate a direction relative to the device 110, a particular beam and/or lobe determined by the beamformer module 1242, or some other indicator. The beamformer module 1242 and/or other component (either of device 110 or of server 120) may also be configured to track the duration over which a particular audio source was detected, for example an audio source associated with a first direction as detected by the device 110. ); a first voice enhancement unit (The lobe-selection module 1246), connecting with the voice source locating unit, enhances the voice signal from the voice source locating direction to obtain an enhanced voice signal (Col 29, line 65 to Col 30 , line 26, The lobe-selection module 1246, meanwhile, functions to select one or more lobes of a beampattern to enhance based on the RIR of the environment, described above, as well as with reference to a history of lobes that have previously been found to include user speech. For instance, because the RIR may indicate when the device 110 is near a wall or other occluding object, and the direction of that wall or object relative to the device 110, the lobe-selection module may take that into account when determining which lobes of a beampattern to enhance. In addition to referencing the RIR, the lobe selection module 1246 may reference a history of which lobes have previously been found to include user speech. That is, if particular lobe(s) of a beampattern correspond to regions of an environment that have been found to often include user speech, then the lobe selection module 1246 may increase the likelihood that these particular lobes will be enhanced. 
The lobe-selection module 1246 may then use the RIR measurement, the heuristics associated with previous lobe-selections, and an amount of energy associated with each lobe to select one or more lobes to enhance.); a voice wakeup detection unit (wakeword detection module 1252), connecting with the first voice enhancement unit, performs voice wakeup detection on the enhanced voice signal and outputs a voice wakeup signal or a voice un-wakeup signal (Col 9, lines 35-42, Once speech is detected in the audio received by the device 110, the device may perform wakeword detection to determine when a user intends to speak a command to the device 110. As noted above, a wakeword is a special word that the device 110 is configured to recognize among the various audio inputs detected by the device 110. The wakeword is thus typically associated with a command to be executed by the device 110 and/or overall system 100. Col 30 lines 47-52, The speech processing module 1250 may include a wakeword detection module 1252. The wakeword detection module may perform wakeword detection as described above. ); a second voice enhancement unit (speech processing module 1250), connecting with the microphone array under the control of second controlled switch (AEC module 1248) activated by the voice activation signal, processes the multichannel voice signals of the microphone array into an enhanced one channel voice ( a direction from which user speech has been detected) and outputs the enhanced one channel voice as a finally picked up voice (Col 10, lines 21-42, The audio device may include an audio processing module 1240 (illustrated in FIG. 12) that may include one or more audio beamformers or beamforming components that are configured to generate an audio signal that is focused in a direction from which user speech has been detected. 
More specifically, the beamforming components may be responsive to spatially separated microphone elements of the microphone array 302 to produce directional audio signals that emphasize sounds originating from different directions relative to the audio device 110, and to select and output one of the audio signals that is most likely to contain user speech. Col 30, lines 26-46,The AEC module 1248 may perform echo cancellation. The AEC module 1248 compares audio that is output by the speaker(s) 1216 of the device 110 with sound picked up by the microphone array 302 (or some other microphone used to capture spoken utterances), and removes the output sound from the captured sound in real time.  The AEC module 1248 may also work with other components, for example may apply more processing resources to preparing the portion of the audio signal corresponding to the selected lobes as compared to a remainder of the audio signal. Although illustrated as part of the audio processing module 1240, the AEC, and/or it functionality may be located elsewhere, for example in ASR module 250, ASR module 1256, etc. The output of the audio processing module 1240 may be sent to the AFE 256, to the speech processing module 1250, or to other components. Col 30, lines 57-64, The speech processing module 1250 may include an ASR module 250. The storage 1208 may include ASR models 252 used by the ASR module 250. If limited speech recognition is included, the speech recognition engine within ASR module 250 may be configured to identify a limited number of words, such as wake words of the device, whereas extended speech recognition may be configured to recognize a much larger range of words. Note: Col 10 lines 21-42 was recited in the first limitation of the claim and not added as new citation.).
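For illustration only, the energy-based voice activity detection described in the cited passage of Johnson (an energy level satisfying a threshold for at least a threshold duration, so that short high-energy noises are ignored) can be sketched as follows. This is a minimal sketch under stated assumptions, not Johnson's implementation: the function names `frame_energies` and `detect_speech` and the parameters `frame_len`, `energy_threshold`, and `min_frames` are hypothetical and do not appear in the reference.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(energies: Sequence[float],
                  energy_threshold: float,
                  min_frames: int) -> List[bool]:
    """Flag frames belonging to a run of at least `min_frames` consecutive
    frames whose energy meets `energy_threshold` (hypothetical parameters).
    Shorter high-energy bursts are treated as noise and left unflagged."""
    flags = [False] * len(energies)
    run_start = None
    # A sentinel below any threshold closes a run that reaches the last frame.
    for i, energy in enumerate(list(energies) + [float("-inf")]):
        if energy >= energy_threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_frames:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags
```

A one-frame burst is ignored while a three-frame run is kept, which corresponds to the passage's statement that high-energy inputs of relatively short duration may be ignored.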


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 2, 7, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Johnson in view of Abraham et al. (US 2016/0323668 A1).

Regarding Claim 2, Johnson teaches: The microphone array based pickup method of claim 1, wherein the picking up and outputting comprises: pointing to the voice source locating direction, while picking up and outputting the multichannel voice signals by the microphone array pointing to the voice source locating direction (see rejection of Claim 1 and Col 10, lines 21-42, Using the microphone array 302 and the plurality of microphones 308 the audio device 110 may employ beamforming techniques to isolate desired sounds for purposes of converting those sounds into audio signals for speech processing by the system. Beamforming is the process of applying a set of beamformer coefficients to audio signal data to create beampatterns, or effective directions of gain or attenuation. Col 10, line 66 to Col 11, line 2, A given beampattern may be used to selectively gather signals from a particular spatial location where a signal source is present. The selected beampattern may be configured to provide gain or attenuation for the signal source. Such spatial selectivity by using beamforming allows for the rejection or attenuation of undesired signals outside of the beampattern. Col 11, lines 14-19, The processed data from the beamformer module may then undergo additional filtering or be used directly by other modules. For example, a filter may be applied to processed data which is acquiring speech from a user to remove residual audio noise from a machine running in the environment. Col 23, lines 55-62, In particular, such directional based aspects may be helpful in noisy environments where non-desired audio (that is, audio from some source other than the speaker of the desired utterance) may interfere with the system's speech processing. For example, if a first user is speaking from a first direction a device's beamformer may focus on that first direction to capture speech for processing. 
Col 24, lines 25-32, In one embodiment the system may track the direction of incoming audio and may create an indicator of the direction that may be considered during ASR/endpointing. For example, if incoming audio is detected from a first direction as determined by the device 110 (or possibly the server 120), the system may create an indicator of that direction and associate that indicator/direction to the audio data associated with the incoming audio. Col 29, lines 22-31, The beamformer module 1242 functions to create beampatterns, or effective directions of gain or attenuation. As illustrated and described below, the beampatterns include multiple lobes, each altering a gain from a respective region within the environment of the device 110. The beamformer module 1242 may be configured to create an indicator of a direction of received audio for consideration by the speech processing system either local to the device 110 or at a remote server 120, to which the device 110 may send the indicator of the direction of the received audio.).
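For illustration only, the beamforming principle quoted above (combining the microphone signals so that sound from a chosen direction adds constructively while sound from other directions is attenuated) can be sketched with a simple delay-and-sum beamformer. This is a minimal sketch under stated assumptions: Johnson refers generally to "a set of beamformer coefficients" and does not specify delay-and-sum, and the function name `delay_and_sum` and the integer per-channel `delays` are hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance channel k by delays[k] samples, then average the aligned
    channels. Choosing delays that match the arrival-time differences for
    one direction makes that direction add coherently (assumed model)."""
    # Only keep output samples for which every shifted channel has data.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


# A pulse reaching three microphones one sample apart (hypothetical data).
mics = [
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
]
steered = delay_and_sum(mics, [0, 1, 2])    # delays match the source direction
unsteered = delay_and_sum(mics, [0, 0, 0])  # delays do not match
```

With matching delays the pulse is preserved at full amplitude; with mismatched delays the same pulse is spread out and reduced to one third of its amplitude, which is the emphasis and de-emphasis the quoted passage describes.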
Johnson does not teach: lighting a pickup indicator lamp pointing to the voice source locating direction.
Abraham et al. teach: lighting a pickup indicator lamp pointing to the voice source locating direction ([0042] Referring now to FIGS. 1 and 3, the microphone array assembly 100 further includes an indicator 126 that visually indicates an operating mode or status of the microphone array 104 (e.g., power on, power off, mute, audio detected, etc.). As shown in FIG. 1, the indicator 126 can be integrated into the screen 108, so that the indicator 126 is visible on an exterior of the front face of the housing 102, to externally indicate the operating mode of the microphone array 104 to human speakers or others in the conferencing environment. In embodiments, the indicator 126 (also referred to herein as "external indicator") comprises at least one light source (not shown), such as, for example, a light emitting diode (LED), that is turned on or off in accordance with an operating mode (e.g., power on or off) of the array microphone assembly 100. In some embodiments, the light indicator 126 can turn on a first light source to indicate a first operating mode (e.g., power on) of the microphone array assembly 100, turn on a second light source to indicate a second operating mode (e.g., audio detected), such that, in some instances, both light sources may be on at the same time. [0063] In other cases, at least one of the light sources 1046, 1048 may indicate whether or not audio is being received from an outside audio source (e.g., during web conferencing). [0065] The lobes can be steerable so as to provide audio pick-up coverage of human speakers positioned at any point 360 degrees around the array 1034. For example, the audio component 1036 may be configured (e.g., using computer programming instructions) to allow the lobes to be steered or adjusted to any point in a three-dimensional space covering azimuth, elevation, and distance or radius. 
In embodiments, the beam pattern of the microphone array 1034 can be electronically steered without physically moving the array 1034.). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Johnson to include the teaching of Abraham et al. above in order to visually indicate an operating mode or status of the microphone array.

Regarding Claim 7, Johnson teaches: The microphone array based pickup method of claim 2, wherein the processing the multichannel voice signals further comprises: enhancing the voice in the direction pointed by the pickup indicator lamp (See rejection of Claim 2).

Regarding Claim 10, Johnson teaches: The microphone array based pickup system of claim 9, wherein the microphone array is of planar ring structure (circle) comprising multiple microphone units (Col 9, lines 55-64, The microphone array 302 comprises six microphones 308 that are laterally spaced from each other so that they can be used by audio beamforming components to produce directional audio signals. The microphones 308 may, in some instances, be dispersed around a perimeter of the device 110 in order to apply beampatterns to audio signals based on sound captured by the microphones 308. In the illustrated embodiment, the microphones 308 are positioned in a circle or hexagon on a top surface 310 of the cylindrical body 310.); the multiple pickup indicators (lobes) are configured to indicate the voice source locating direction (see rejection of Claim 1 and Col 10, lines 21-42, Using the microphone array 302 and the plurality of microphones 308 the audio device 110 may employ beamforming techniques to isolate desired sounds for purposes of converting those sounds into audio signals for speech processing by the system. Beamforming is the process of applying a set of beamformer coefficients to audio signal data to create beampatterns, or effective directions of gain or attenuation. Col 10, line 66 to Col 11, line 2, A given beampattern may be used to selectively gather signals from a particular spatial location where a signal source is present. The selected beampattern may be configured to provide gain or attenuation for the signal source. Such spatial selectivity by using beamforming allows for the rejection or attenuation of undesired signals outside of the beampattern. Col 11, lines 14-19, The processed data from the beamformer module may then undergo additional filtering or be used directly by other modules. 
For example, a filter may be applied to processed data which is acquiring speech from a user to remove residual audio noise from a machine running in the environment. Col 23, lines 55-62, In particular, such directional based aspects may be helpful in noisy environments where non-desired audio (that is, audio from some source other than the speaker of the desired utterance) may interfere with the system's speech processing. For example, if a first user is speaking from a first direction a device's beamformer may focus on that first direction to capture speech for processing. Col 24, lines 25-32, In one embodiment the system may track the direction of incoming audio and may create an indicator of the direction that may be considered during ASR/endpointing. For example, if incoming audio is detected from a first direction as determined by the device 110 (or possibly the server 120), the system may create an indicator of that direction and associate that indicator/direction to the audio data associated with the incoming audio. Col 29, lines 22-31, The beamformer module 1242 functions to create beampatterns, or effective directions of gain or attenuation. As illustrated and described below, the beampatterns include multiple lobes, each altering a gain from a respective region within the environment of the device 110. The beamformer module 1242 may be configured to create an indicator of a direction of received audio for consideration by the speech processing system either local to the device 110 or at a remote server 120, to which the device 110 may send the indicator of the direction of the received audio.). 
Johnson does not teach: multiple pickup indicator lamps are set along the encircling direction of the planar ring structure; the multiple pickup indicator lamps are configured to indicate the voice source locating direction.
Abraham et al. teach: multiple pickup indicator lamps are set along the exterior of the front face of the housing; the multiple pickup indicator lamps are configured to indicate the voice source locating direction ([0042] Referring now to FIGS. 1 and 3, the microphone array assembly 100 further includes an indicator 126 that visually indicates an operating mode or status of the microphone array 104 (e.g., power on, power off, mute, audio detected, etc.). As shown in FIG. 1, the indicator 126 can be integrated into the screen 108, so that the indicator 126 is visible on an exterior of the front face of the housing 102, to externally indicate the operating mode of the microphone array 104 to human speakers or others in the conferencing environment. In embodiments, the indicator 126 (also referred to herein as "external indicator") comprises at least one light source (not shown), such as, for example, a light emitting diode (LED), that is turned on or off in accordance with an operating mode (e.g., power on or off) of the array microphone assembly 100. In some embodiments, the light indicator 126 can turn on a first light source to indicate a first operating mode (e.g., power on) of the microphone array assembly 100, turn on a second light source to indicate a second operating mode (e.g., audio detected), such that, in some instances, both light sources may be on at the same time. [0047] More specifically, in embodiments, the microphones 106 can be arranged in concentric, circular rings of varying sizes, so as to avoid undesired pickup patterns (e.g., due to grating lobes) and accommodate a wide range of audio frequencies. As used herein, the term "ring" may include any type of circular configuration (e.g., perfect circle, near-perfect circle, less than perfect circle, etc.), as well as any type of oval configuration or other oblong loop. 
[0063] In other cases, at least one of the light sources 1046, 1048 may indicate whether or not audio is being received from an outside audio source (e.g., during web conferencing). [0065] The lobes can be steerable so as to provide audio pick-up coverage of human speakers positioned at any point 360 degrees around the array 1034. For example, the audio component 1036 may be configured (e.g., using computer programming instructions) to allow the lobes to be steered or adjusted to any point in a three-dimensional space covering azimuth, elevation, and distance or radius. In embodiments, the beam pattern of the microphone array 1034 can be electronically steered without physically moving the array 1034.). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Johnson to include the teaching of Abraham et al. above in order to visually indicate an operating mode or status of the microphone array.
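For illustration only, the claimed arrangement of indicator lamps set along a ring and configured to indicate the voice source locating direction can be sketched as a mapping from an estimated azimuth to the nearest lamp. This is a hypothetical sketch: neither Johnson nor Abraham et al., as quoted above, specifies this mapping, and the function name `lamp_index`, the assumption of evenly spaced lamps, and the convention that lamp 0 sits at 0 degrees are assumptions made only for the example.

```python
def lamp_index(azimuth_deg: float, num_lamps: int) -> int:
    """Return the index of the lamp nearest to `azimuth_deg`, assuming
    `num_lamps` lamps evenly spaced on a ring with lamp 0 at 0 degrees
    (hypothetical layout, not taken from the cited references)."""
    spacing = 360.0 / num_lamps
    # Wrap the angle into [0, 360), round to the nearest lamp, wrap the index.
    return int((azimuth_deg % 360.0) / spacing + 0.5) % num_lamps
```

For a ring of six lamps, a source located at 180 degrees would select lamp 3, and a source at 350 degrees would wrap around to lamp 0.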
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record Lee et al. (US 9870775 B2) teach: A method for recognizing a voice by an electronic device may include: detecting a voice input; determining the direction of the voice and a beamforming direction. Voice recognition is based on the voice when the direction of the voice and the beamforming direction correspond to each other.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571) 270-5878. The examiner can normally be reached on Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656