DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: Noise Detection of Unrecognized Speech for Microphone Control.
The disclosure is objected to because of the following informalities:
In ¶[0037], “facies in the detection” should be “faces in the direction”.
In ¶[0049], “does not respond ((b) of Fig. 5)” should be “does not respond ((d) of Fig. 5)”.  Here, Figure 5(b) and Figure 5(c) are already described at ¶[0047] and ¶[0048].
In ¶[0053], “(YES, in S18)” should be “(YES, in S48)”.  See Figure 6.
In ¶[0058], “facies in the detection” should be “faces in the direction”.
Appropriate correction is required.

Claim Objections
Claims 1 to 7 are objected to because of the following informalities:
Independent claims 1 and 7 set forth limitations of “to distinctively obtain” and “distinctively obtaining”, where the limitation of “distinctively” is somewhat indefinite.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, and 5 to 7 are rejected under 35 U.S.C. 103 as being unpatentable over Bastyr et al. (U.S. Patent Publication 2014/0278394) in view of Lindahl (U.S. Patent Publication 2016/0189716).
Concerning independent claims 1 and 6, Bastyr et al. discloses a device and method for beamforming to obtain voice and noise signals, comprising:
“a speech sound obtaining section configured to distinctively obtain detected sounds from respective microphones, the detected sounds being ones that have been detected by the respective microphones” – apparatus 100 may include a group of microphones 110 that provide microphone outputs; four microphones are each oriented in different directions (¶[0027]: Figure 1); beamformer produces output of voice signal 135 (¶[0031]: Figure 1);
i.e., a confidence score, to beamformer controller 190 of the likelihood of recognized human speech; voice recognition engine 180 may also send control signal 165 to system control 101 in response to detected command words or command phrases received on voice signal 157 or on a buffered voice signal 118 (¶[0039]: Figure 1); voice recognition engine sends voice confidence metrics 159 to beamformer controller 190; if confidence scores are too low, then beamformer controller 190 may determine that noise suppression is required (¶[0056]: Figure 4: Step 415); trained speech recognition logic 182 is utilized in an attempt to identify the presence of the user’s voice in a voice signal; trained voice recognition logic 182 may be applied to a noise signal to verify that any voice present in the noise signal is mostly jammer voices and not the user’s voice (¶[0057] - ¶[0058]: Figure 4: Step 415); beamformer
“a detection control section configured to, in a case where the noise determining section determines that any of the detected sounds is a noise, control at least one of the microphones to stop detecting a sound” – microphone configuration logic may include switch logic that is operative to switch any microphone output of the plurality of microphone outputs on or off (¶[0022]); microphone control logic 120 may include switch logic operatively coupled to respond to control signals to individually turn each of the microphones on or off to configure the microphones in various ways (¶[0028]: Figure 1); switch logic of microphone configuration logic 120 may switch some or all of beamformer 130, and may control which microphones are used as needed for given beamformer patterns, and may turn microphones on or off by adjusting gain applied within the beamformers (¶[0029]: Figure 1); for a given orientation, device orientation detector 105 may send control signal 108 to microphone configuration logic 120 to turn off or turn on certain microphones of the group of microphones 110 (¶[0030]: Figure 1); beamformer controller 190 is operative to configure the group of microphones 110 
Concerning independent claims 1 and 6, Bastyr et al. arguably anticipates all of the limitations of these independent claims.  Generally, Bastyr et al. discloses determining that a detected sound is a noise “in a case where a content of a speech is not recognized from a detected sound”.  That is, if a confidence metric is low for voice recognition in Bastyr et al., this implies that “speech is not recognized”, and this produces a control signal to turn off a microphone.  Voice recognition, then, is used equivalently to voice activity detection to detect that a signal is speech or noise.  Additionally, Bastyr et al. discloses that voice recognition engine 180 may send a control signal to system control 101 in response to detected command words or command phrases.  (¶[0039]: Figure 1)  Bastyr et al., then, meets the preambular limitation of “causes an output section to output a response corresponding to the content of the speech” because a control signal causes a command to be executed by system control 101.  Compare Specification, ¶[0012] and ¶[0062], which state that a response could be an action, and is not necessarily an output of a message comprising a character string via a speaker.
Concerning independent claims 1 and 6, even if the limitation of “in a case where a content of a speech is not recognized from a detected sound” is omitted by Bastyr et al., this is taught by Lindahl.  Generally, Lindahl teaches speech recognition wake-up of a handheld portable electronic device.  (Abstract)  Three speech recognition engines 320A, 320B, and 320C may process audio signals from three microphones A, B, and C.  i.e., only ambient acoustic noise.  (¶[0060] - ¶[0061]: Figure 3)  In a first two frames, only non-speech is identified by a speech recognition engine from microphone A, and no speech is recognized.  Subsequently, in third and fourth frames from microphone A, speech is detected.  (¶[0063]: Figure 4)  A third speech recognition engine is associated with microphone C, where microphone C is occluded, and therefore microphone C is unable to pick up speech that can be recognized.  The frames of an audio signal from microphone C are then identified as only non-speech, and this is provided to decision logic 330.  (¶[0066]: Figure 3)  Decision logic may evaluate confidence levels received from speech recognition engines, and if no confidence levels exceed a predetermined threshold, then no generation is initiated.  (¶[0067]: Figure 3)  Decision logic 330 may select the preferred microphone by evaluating the three confidence levels computed for the detected sound or phrase.  (¶[0068]: Figure 3)  Lindahl, then, teaches whatever might be omitted by Bastyr et al. as directed to “in a case where a content of a speech is not recognized from a detected sound” because a low confidence level is described as equivalent to noise when speech is not able to be recognized.  An objective is to select a best microphone to use that does not require too much power consumption and time delay.  (¶[0004] - ¶[0005])  It would have been obvious to one having ordinary skill in the art to select a microphone according to a case where a content of a speech is not recognized from a detected sound as taught by Lindahl when a confidence score for voice recognition engine is low in Bastyr et al. for a purpose of selecting a best microphone to use that does not require too much power consumption and time delay.
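For illustration only, the combined operation discussed above (a low recognition confidence is treated as noise, and the corresponding microphone is switched off) can be sketched as follows.  This is a minimal sketch under stated assumptions: the names Microphone, is_noise, and update_microphones, and the threshold value, are hypothetical and appear in neither Bastyr et al. nor Lindahl.

```python
from dataclasses import dataclass

# Hypothetical threshold; neither reference specifies this value.
CONFIDENCE_THRESHOLD = 0.5


@dataclass
class Microphone:
    """Hypothetical stand-in for one microphone that can be switched on or off."""
    name: str
    enabled: bool = True


def is_noise(confidence: float, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Treat a detected sound as noise when recognition confidence is below the threshold."""
    return confidence < threshold


def update_microphones(mics, confidences, threshold=CONFIDENCE_THRESHOLD):
    """Switch off each enabled microphone whose detected sound is classified as noise.

    Returns the names of the microphones that remain enabled.
    """
    for mic in mics:
        if mic.enabled and is_noise(confidences[mic.name], threshold):
            mic.enabled = False
    return [mic.name for mic in mics if mic.enabled]
```

In this sketch, a per-microphone confidence plays the role of the confidence levels evaluated by the decision logic, and disabling a microphone plays the role of the switch logic.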
Bastyr et al. discloses: “each of the microphones is a microphone having directivity” – apparatus 100 may include a group of microphones, with each oriented in a different direction (¶[0027]: Figure 1);
“said information processing device further comprises a direction specifying section configured to specify, from the detected sounds detected by the respective microphones, directions in which the respective detected sounds have been uttered” – beamformer controller 190 is operative to change the beamform directivity until the voice signal 135 is substantially the user’s voice (¶[0040]: Figure 1); beamformers 131 and 132 are operative in order to steer a beam or a null in a desired direction (¶[0042]: Figure 1); beamformer controller 190 adjusts one or both of the beamforms to locate dominant energy directions (¶[0046]: Figure 2: Step 203); beam forms are adjusted to locate dominant energy source directions (¶[0051]: Figure 3: Step 303);
“in a case where the noise determining section determines that a detected sound detected by any of the microphones is a noise, the detection control section controls at least one of the microphones” – beamformer controller 190 receives voice confidence metrics 159 and determines if further noise suppression is required; if voice confidence metrics 159 are too low, then this indicates that further noise suppression would be required; beamformer controller 190 then sends control signal 194 to microphone configuration logic 120 and selects a different set of physical microphones from group of microphones 110, in an attempt to steer the peak in sensitivity of the beam towards another location where voice may be detected (¶[0058]: Figure 5: Steps 511 to 515).

Bastyr et al. discloses “an electronic apparatus” – apparatus 100 (¶[0027]: Figure 1); 
“the information processing device recited in claim 1” – microphone configuration logic 120 (¶[0027]: Figure 1);
“the microphones” – microphones 110 (¶[0027]: Figure 1);
“the output section” – voice recognition engine 180 may send a control signal 165 to system control 101, which may be a voice controller dialer of a mobile telephone, a video recorder system, or an in-vehicle control system (¶[0038] - ¶[0039]: Figure 1); “the output section”, then, sends this control signal to a system control 101.  

Concerning claim 7, Bastyr et al. discloses a “non-transitory computer-readable storage medium storing therein a control program causes a computer to function as the information processing device recited in claim 1, the control program causing the computer to function as the speech sound obtaining section, the noise determining section, the detection control section” – microphone configuration logic 120, beamformers 130, noise estimators 161 and 162, voice recognition engine 180, and beamformer controller 190 may be implemented as software executing on one or more programmable processors.  (¶[0063] - ¶[0064])

Claims 2 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Bastyr et al. (U.S. Patent Publication 2014/0278394) in view of Lindahl (U.S. Patent Publication 2016/0189716) as applied to claim 1 above, and further in view of Aoyama et al. (U.S. Patent Publication 2012/0232891).
Bastyr et al. discloses “the speech sound obtaining section obtains, a plurality of times, the detected sounds detected by the respective microphones” and “in a case where contents of speech are not recognized . . . the noise determining section determines that the detected sounds are each a noise”.  That is, Bastyr et al. is continually receiving audio that could be speech or noise at a plurality of microphones 110, and if voice confidence metrics are too low, then a sound is determined to be noise and not speech.  However, Bastyr et al. does not clearly disclose determining noise from “a given number of times in succession from respective detected sounds detected by an identified one of the microphones”.  Here, Bastyr et al. does not disclose determining speech as opposed to noise from sound “a given number of times”.  Still, this is an obvious modification because voice activity detectors commonly operate on a plurality of frames to determine speech from noise.  Moreover, Lindahl, at ¶[0069], actually teaches that decision logic may evaluate confidence levels from a group of multiple frames, and may require that the confidence level from each frame be satisfactory, e.g., is satisfactory across two frames, in order to initiate an operation to select a preferred microphone.  Arguably, then, this limitation of determining speech as opposed to noise from sound “a given number of times” is broadly taught by detecting speech from confidence levels across multiple frames in Lindahl. 
Concerning claim 2, Aoyama et al. specifically teaches a confirmation partner confirmation procedure for a robot, where if a robot 1 cannot perform tracking during a conversation with a user in a case, and cannot receive response contents predicted by the robot, then robot 1 repeats urging the user to respond a prescribed number of times.  Aoyama et al., then, teaches “a case where contents of speeches are not recognized, a given number of times in succession” to determine “that the detected sounds are each a noise.”  An objective, then, is to prevent continuation of an unnatural conversation where the robot is speaking to itself or responding to noise from a television or a radio.  It would have been obvious to one having ordinary skill in the art to determine that sounds are noise in Bastyr et al. when speech is not recognized a given number of times in succession as taught by Aoyama et al. for a purpose of preventing continuation of an unnatural conversation where a robot is responding to noise from a television or a radio.
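For illustration only, the limitation of determining noise when speech is not recognized “a given number of times in succession” can be sketched as a counter that resets whenever recognition succeeds.  This is a minimal sketch under stated assumptions: the class name SuccessiveFailureCounter and the default count of three are hypothetical and are not taken from Bastyr et al., Lindahl, or Aoyama et al.

```python
class SuccessiveFailureCounter:
    """Hypothetical per-microphone counter of consecutive recognition failures."""

    def __init__(self, required_failures: int = 3):
        # The count of three is an assumed value, not one stated in any reference.
        self.required_failures = required_failures
        self.failures = 0

    def observe(self, recognized: bool) -> bool:
        """Record one recognition attempt.

        Returns True once recognition has failed `required_failures` times
        in a row, i.e., when the detected sounds would be treated as noise.
        """
        if recognized:
            self.failures = 0  # any success breaks the succession
        else:
            self.failures += 1
        return self.failures >= self.required_failures
```

A single recognized utterance resets the count, so isolated failures separated by successful recognition would not lead to a noise determination in this sketch.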

Concerning claim 4, Bastyr et al. discloses “a case where (i) a content of a speech is recognized from a detected sound” for command words or command phrases.  (¶[0039]: Figure 1)  However, Bastyr et al. does not disclose recognizing speech “but (ii) the content of the speech does not correspond to a content of a response made by the output section, the noise determining section determines that the detected sound is noise.”  Still, Aoyama et al. teaches a confirmation partner confirmation procedure for a robot, where if conversational contents of the user are not the same as the predicted response contents, robot 1 urges the user to respond to robot 1 while continuing the conversation.  When robot 1’s question is “What is Yukiko’s favorite?”, and receives a Aoyama et al., then, teaches “a case where (i) a content of a speech is recognized from a detected sound but (ii) the content of the speech does not correspond to a content of a response made by the output section”.  Aoyama et al. ends the conversation when it determines noise is from a television or a radio because a predicted response is not obtained in a manner similar to turning off a microphone by Bastyr et al. so that “the noise determining section determines the detected sound is a noise.”

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Odell et al., Buck et al., Verdooner et al., and Bhaya et al. disclose related prior art.
Jhawar et al. is related art.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARTIN LERNER/
Primary Examiner
Art Unit 2657
June 24, 2021