DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8-13, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Schevciw et al. (US PGPUB #2011/0288860) in view of Wang et al. (US PGPUB #2020/0286465) further in view of Shin et al. (US PGPUB #2012/0130713).

Regarding Claim 1, Schevciw discloses a method (title; Figs. 1A-9A, 17A-21B) comprising:
receiving, via a first microphone, a first audio signal (Schevciw Fig. 5B: mic MC10, audio signal MS30; ¶0113);
receiving, via a second microphone, a second audio signal (Schevciw Fig. 5B: mics ML10/MR10, audio signal MS10/MS20, ¶0113);
determining whether a first threshold of voice activity is met (Schevciw ¶0114 [see Fig. 5B] discloses for a case in which a gain-based scheme is used, detector VAD20 can be configured to produce VAD signal VS20 to indicate a presence of voice activity when the ratio of the level of third [i.e., first] audio signal AS30 to the level of second audio signal AS20 exceeds [alternatively, is not less than] a threshold value, and a lack of voice activity otherwise. Equivalently, detector VAD20 can be configured to produce VAD signal VS20 to indicate a presence of voice activity when the difference between the logarithm of the level of third [i.e., first] audio signal AS30 to the logarithm of the level of second audio signal AS20 exceeds [alternatively, is not less than] a threshold value, and a lack of voice activity otherwise);
in accordance with a determination that the first threshold of voice activity is met (Schevciw ¶0114 [see Fig. 5B] discloses for a case in which a gain-based scheme is used, detector VAD20 can be configured to produce VAD signal VS20 to indicate a presence of voice activity when the ratio of the level of third [i.e., first] audio signal AS30 to the level of second audio signal AS20 exceeds [alternatively, is not less than] a threshold value, and a lack of voice activity otherwise. Equivalently, detector VAD20 can be configured to produce VAD signal VS20 to indicate a presence of voice activity when the difference between the logarithm of the level of third [i.e., first] audio signal AS30 to the logarithm of the level of second audio signal AS20 exceeds [alternatively, is not less than] a threshold value, and a lack of voice activity otherwise):
determining that a voice onset has occurred (Schevciw ¶0098 discloses one example of a VAD operation whose results can be combined by detector VAD12 with results from more than one of the VAD operations on first audio signal AS10 and second audio signal AS20 includes comparing highband and low band energies of the segment to respective thresholds. Detecting speech onsets, comparing a ratio of frame energy to average energy and/or a ratio of lowband energy to highband energy; Fig. 4A: Voice Activity Detector VAD12);
in accordance with a determination that the first threshold of voice activity is not met, forgoing determining that a voice onset has occurred (Schevciw ¶0098 discloses detecting speech onsets and/or offsets, comparing a ratio of frame energy to average energy and/or a ratio of lowband energy to highband energy); and
transmitting an alert to a processor based on the determination that the voice onset has occurred (Schevciw ¶0129 discloses a gain-based VAD technique can be configured to detect that a segment is from a desired source in an endfire direction of the microphone array [e.g., to indicate detection of voice activity] when a difference between the gains of the channels is greater than a threshold value).
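For illustration only, the gain-based decision described in the cited Schevciw ¶0114 can be sketched as follows. The sketch assumes RMS is used as the "level" of a segment and uses a hypothetical threshold of 2.0; neither choice is taken from the reference, and the function names are placeholders.

```python
import math

EPS = 1e-12  # assumed guard against division by zero and log(0) on silent segments


def rms_level(segment):
    """Root-mean-square level of one segment of samples (assumed level measure)."""
    return math.sqrt(sum(s * s for s in segment) / len(segment))


def gain_based_vad(voice_segment, reference_segment, threshold_ratio=2.0):
    """Indicate voice activity when level(voice) / level(reference) exceeds the threshold."""
    ratio = (rms_level(voice_segment) + EPS) / (rms_level(reference_segment) + EPS)
    return ratio > threshold_ratio


def gain_based_vad_log(voice_segment, reference_segment, threshold_ratio=2.0):
    """Equivalent form: compare the difference of log levels to the log of the threshold."""
    diff = math.log(rms_level(voice_segment) + EPS) - math.log(rms_level(reference_segment) + EPS)
    return diff > math.log(threshold_ratio)
```

Both forms give the same decision because the logarithm is monotonic; the log form only replaces the division with a subtraction.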
Schevciw may not explicitly disclose determining a first probability of voice activity based on the first audio signal; determining a second probability of voice activity based on the first audio signal and the second audio signal; determining whether a first threshold of voice activity is met based on the first probability of voice activity and the second probability of voice activity; transmitting an alert to a processor based on the determination that the voice onset has occurred; and in accordance with a determination that the first threshold of voice activity is not met, forgoing determining that a voice onset has occurred.
determining a first probability of voice activity based on the first audio signal (Wang ¶0029 discloses the server 120 obtains first speech segments based on the to-be-recognized speech signal, and then obtains first probabilities. ¶0075 discloses the first probabilities that are in a one-to-one correspondence with the first speech segments are first obtained);
determining a second probability of voice activity based on the first audio signal and the second audio signal (Wang ¶0029 discloses server 120 obtains second speech segments based on the to-be-recognized speech signal, and respectively generates first prediction characteristics of second speech segments based on first probabilities corresponding to first speech segments that correspond to each second speech segment. ¶0075 further discloses then the second probabilities that are in a one-to-one correspondence with the second speech segments are obtained based on first probabilities corresponding to first speech segments that correspond to each second speech segment).
Schevciw and Wang are analogous art, as both pertain to processing speech signals using a head-mounted microphone pair. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the speech processing (as taught by Schevciw) to determine probabilities for checking whether the pre-determined keyword exists in the to-be-recognized speech signal (as taught by Wang, ¶0029), in order to overcome the conventional method, which is extremely sensitive to manually set decision logic and therefore has low universality (Wang, ¶0004).
And Shin (Figs. 1-4, 7A-11B, 18-26) teaches determining whether a first threshold of voice activity is met based on the first probability of voice activity and the second probability of voice activity (Shin ¶0092 discloses task TA20 obtains the first series of voice activity ;
transmitting an alert to a processor based on the determination that the voice onset has occurred (Shin ¶0102 discloses Fig. 10D shows a block diagram of an apparatus A100 according to a general configuration that includes a first calculator 100, a second calculator 200, a boundary value calculator 300, and a decision module 400. Decision module 400 is configured to produce a series of combined voice activity decisions, based on the series of values of the first voice activity measure, the series of values of the second voice activity measure, and the calculated boundary value of the first voice activity measure); and
in accordance with a determination that the first threshold of voice activity is not met, forgoing determining that a voice onset has occurred (Shin ¶0100 discloses task T400 can also be configured to normalize a voice activity measure based on speech onset and/or offset. ¶0101 discloses for onset/offset detection, it may be desirable to track the maximum and minimum of the square of ΔE(k,n). It may also be desirable to track the maximum as the square of a clipped value of ΔE(k,n) [ e.g., as the square of max[0, ΔE(k,n)] for onset and the square of min[0, ΔE(k,n)] for offset]. While negative values of ΔE(k,n) for onset and positive values of ΔE(k,n) for offset may be useful for tracking noise fluctuation in minimum statistic tracking, they may be less useful in maximum statistic tracking. ¶0115 discloses if there is not enough computational budget, instead of computing the maximum and minimum for each band, the global maximum and minimum of log RMS level difference between two microphone signals can be used with .
Schevciw, Wang, and Shin are analogous art, as they pertain to processing speech signals using a head-mounted microphone pair. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Schevciw in view of Wang in light of the teachings of Shin to use VAD to indicate the presence or absence of human speech in segments of an audio signal (as taught by Shin, ¶0065), in order to overcome the limitations of prior art systems that employ voice-recognition-based data inquiry, whose accuracy can be significantly impeded by interfering noise (Shin, ¶0005).
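For illustration only, the clipped onset and offset measures quoted from Shin ¶0101 above (the square of max[0, ΔE] for onset and the square of min[0, ΔE] for offset) can be sketched as follows. This Office action does not define ΔE(k,n); the sketch assumes it is the frame-to-frame energy difference in a single frequency band, and the function name is a placeholder.

```python
def onset_offset_measures(frame_energies):
    """Return one (onset, offset) pair per frame transition.

    Assumption: delta is the energy of frame n minus the energy of frame n-1
    for one band. Onset keeps only rises, offset keeps only falls.
    """
    measures = []
    for previous, current in zip(frame_energies, frame_energies[1:]):
        delta = current - previous
        onset = max(0.0, delta) ** 2   # negative changes are clipped to zero
        offset = min(0.0, delta) ** 2  # positive changes are clipped to zero
        measures.append((onset, offset))
    return measures
```

Clipping before squaring keeps a falling energy from registering as an onset, which squaring alone would not do.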

Regarding Claim 2, Schevciw in view of Wang and Shin discloses the method of claim 1, wherein determining whether the first threshold of voice activity is met comprises:
determining a baseline noise power (Schevciw ¶0104 discloses a noise power reference signal as computed according to a single-channel VAD signal [e.g., a VAD signal based only on third audio signal AS30] is usually only an approximate stationary noise estimate. Moreover, such computation generally entails a noise power estimation delay, such that corresponding gain adjustment can only be performed after a significant delay); and
determining a ratio of the first audio signal to the baseline noise power (Schevciw ¶0114 [see Fig. 5B] discloses for a case in which a gain-based scheme is used, detector VAD20 can be configured to produce VAD signal VS20 to indicate a presence .

Regarding Claim 3, Schevciw in view of Wang and Shin discloses the method of claim 1, wherein determining the second probability of voice activity based on the first audio signal and the second audio signal comprises:
summing the first audio signal and the second audio signal to produce a summation signal (Schevciw ¶0113 discloses a VAD signal based on spatial information from the microphone array MC10 and ML10 [or MC10 and MR10] is used to enhance voice information from microphone MC10. Fig. 5B shows a block diagram of such an implementation A130 of apparatus A100. Apparatus A130 includes a second voice activity detector VAD20 that is configured to produce a second VAD signal VS20 based on information from second audio signal AS20 and from third audio signal AS30. ¶0116 discloses apparatus A130 also includes an implementation VAD16 of voice activity detector VAD10 that is configured to combine VAD signal VS20 [e.g., using AND and/or OR logic] with results from one or more of the VAD operations on first audio signal AS10 and second audio signal AS20);
subtracting the first audio signal and the second audio signal to produce a difference signal (Schevciw ¶0114 [see Fig. 5B] discloses detector VAD20 can be configured to ; and
calculating a ratio of the difference signal to the summation signal (Schevciw ¶0114 [see Fig. 5B] discloses for a case in which a gain-based scheme is used, detector VAD20 can be configured to produce VAD signal VS20 to indicate a presence of voice activity when the ratio of the level of third [i.e., first] audio signal AS30 to the level of second audio signal AS20 exceeds [alternatively, is not less than] a threshold value, and a lack of voice activity otherwise).
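For illustration only, the three operations recited in claim 3 (summing, subtracting, and taking the ratio of the difference signal to the summation signal) can be sketched as follows. Reducing each signal to a single RMS level before dividing is an assumption made for this sketch; neither the claim language nor the cited passages specify it, and the function names are placeholders.

```python
import math


def rms_level(signal):
    """Root-mean-square level of a signal (assumed way to reduce a signal to one number)."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def sum_difference_ratio(first, second):
    """Ratio of the level of (first - second) to the level of (first + second)."""
    summation = [a + b for a, b in zip(first, second)]
    difference = [a - b for a, b in zip(first, second)]
    eps = 1e-12  # assumed guard against a silent summation signal
    return rms_level(difference) / (rms_level(summation) + eps)
```

Under this sketch, a source that reaches both microphones identically gives a ratio near zero, and the ratio grows as the two signals diverge.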

Regarding Claim 4, Schevciw in view of Wang and Shin discloses the method of claim 1,
wherein the first microphone and the second microphone are configured to be equidistant from a user's mouth (Schevciw ¶0122 discloses the distance between microphones should not exceed half of the minimum wavelength in order to avoid spatial aliasing. The wavelength of a four-kHz signal is about 8.5 centimeters, so in this case, the spacing between adjacent microphones should not exceed about four centimeters).
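For illustration only, the spacing arithmetic in the cited Schevciw ¶0122 can be checked as follows. The speed of sound of 340 m/s is an assumption chosen because it reproduces the quoted 8.5 cm wavelength at four kHz; the function names are placeholders.

```python
SPEED_OF_SOUND_M_PER_S = 340.0  # assumed value, consistent with 8.5 cm at 4 kHz


def wavelength_cm(frequency_hz):
    """Wavelength in centimeters: speed of sound divided by frequency."""
    return SPEED_OF_SOUND_M_PER_S / frequency_hz * 100.0


def max_spacing_cm(max_frequency_hz):
    """Largest microphone spacing that avoids spatial aliasing: half the minimum wavelength."""
    return wavelength_cm(max_frequency_hz) / 2.0
```

At 4 kHz this gives a half-wavelength of 4.25 cm, matching the quoted limit of about four centimeters.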

Regarding Claim 5, Schevciw in view of Wang and Shin discloses the method of claim 1,
wherein the first microphone is configured to be a first distance from a user's mouth, and wherein the second microphone is configured to be a second distance different from the first distance from a user's mouth (Schevciw ¶0075 discloses several different examples of .

Regarding Claim 6, Schevciw in view of Wang and Shin discloses the method of claim 5, the method further comprising
determining a time offset associated with a difference between the first distance and the second distance (Schevciw ¶0086 discloses voice activity detector VAD10 is configured to produce VAD signal VS10 by cross-correlating corresponding segments of first audio signal AS10 and second audio signal AS20 in the time domain. Voice activity detector VAD10 can be configured to calculate the cross-correlation r(d) over a range of delays -d to +d),
wherein determining the second probability of voice activity based on the first audio signal and the second audio signal further comprises compensating for the time offset (Schevciw ¶0088 discloses it may be desirable to configure voice activity detector VAD10 to calculate the cross-correlation over a limited range around zero delay. For an example in which the sampling rate of the microphone signals is eight kilohertz, it may be desirable for the VAD to cross-correlate the signals over a limited range of plus or minus one, two, three, four, or five samples. In such a case, each sample corresponds to a time difference of 125 microseconds [equivalently, a distance of 4.25 centimeters]. For an example in which the sampling rate of the microphone signals is sixteen .
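For illustration only, a cross-correlation over a limited range of delays, as described in the cited Schevciw ¶0086 and ¶0088, can be sketched as follows. Picking the lag with the largest correlation and shifting the second signal by it is one assumed way to determine and compensate a time offset, not a method stated in the reference. The 340 m/s speed of sound is also an assumption, consistent with the quoted 4.25 cm per sample at 8 kHz, and all function names are placeholders.

```python
def cross_correlation(x, y, lag):
    """r(lag) = sum over n of x[n] * y[n + lag], using only indices valid in both signals."""
    total = 0.0
    for n in range(len(x)):
        m = n + lag
        if 0 <= m < len(y):
            total += x[n] * y[m]
    return total


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, +max_lag] with the largest cross-correlation (assumed offset estimate)."""
    return max(range(-max_lag, max_lag + 1), key=lambda d: cross_correlation(x, y, d))


def compensate(y, lag):
    """Shift y by lag samples so it lines up with x, zero-padding the edges."""
    return [y[i + lag] if 0 <= i + lag < len(y) else 0.0 for i in range(len(y))]


def lag_to_time_us(lag, sample_rate_hz):
    """Time difference in microseconds represented by a lag in samples."""
    return lag * 1e6 / sample_rate_hz


def lag_to_distance_cm(lag, sample_rate_hz, speed_m_per_s=340.0):
    """Path-length difference in centimeters for a lag, at an assumed speed of sound."""
    return lag / sample_rate_hz * speed_m_per_s * 100.0
```

At a sampling rate of 8 kHz, one sample of lag corresponds to 125 microseconds, so a search over plus or minus five samples covers offsets up to 625 microseconds.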

Regarding Claim 8, Schevciw in view of Wang and Shin discloses the method of claim 1. But Schevciw in view of Wang may not explicitly disclose wherein determining whether the first threshold of voice activity is met comprises: weighting the first probability of voice activity with a first weight; and weighting the second probability of voice activity with a second weight.
However, Shin (Figs. 7A-11B, 19-26) teaches wherein determining whether the first threshold of voice activity is met comprises: weighting the first probability of voice activity with a first weight; and weighting the second probability of voice activity with a second weight (Shin ¶0105 discloses if we denote the weight for each noise estimate Ni[n] as Wi[n], for example, the combined noise reference can be expressed as a linear combination ΣWi[n]*Ni[n] of weighted noise estimates, where ΣWi[n]=1. The weights can be dependent on the decision between single- and dual-microphone modes, based on DoA estimation and the statistics on the input signal [e.g., normalized phase coherency measure]. For example, it can be desirable to set the weight for a nonstationary noise reference which is based on spatial processing to zero for single-microphone mode. As for another example, it can be desirable for the weight for a VAD-based long-term noise estimate and/or nonstationary noise estimate to be higher for speech-inactive frames .
Schevciw, Wang, and Shin are analogous art, as they pertain to processing speech signals using a head-mounted microphone pair. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Schevciw in view of Wang in light of the teachings of Shin to use VAD to indicate the presence or absence of human speech in segments of an audio signal (as taught by Shin, ¶0065), in order to overcome the limitations of prior art systems that employ voice-recognition-based data inquiry, whose accuracy can be significantly impeded by interfering noise (Shin, ¶0005).
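For illustration only, the linear combination quoted from Shin ¶0105 above (a weighted sum whose weights sum to one) can be sketched as follows. The quoted passage applies it to noise estimates; applying the same form to two voice activity probabilities and comparing the result to a threshold is an assumption made here to mirror the language of claim 8. The function names and the "not less than" comparison are placeholders.

```python
def weighted_combination(values, weights):
    """Linear combination sum(w_i * v_i), requiring the weights to sum to one."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * v for w, v in zip(weights, values))


def threshold_met(first_probability, second_probability, first_weight, second_weight, threshold):
    """Assumed decision rule: the weighted combination is not less than the threshold."""
    combined = weighted_combination(
        [first_probability, second_probability], [first_weight, second_weight]
    )
    return combined >= threshold
```

Setting one weight to zero reduces the combination to the other input alone, which parallels the quoted single-microphone case where a spatially derived term is given zero weight.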

Regarding Claim 9, Schevciw in view of Wang and Shin discloses the method of claim 1, the method further comprising:
in accordance with a determination that the first threshold of voice activity is met (Schevciw ¶0092 discloses voice activity detector VAD10 can be configured, for example, to indicate voice detection when the level of one or both signals is above a threshold value [indicating that the signal is arriving from a source that is close to the microphone] and the levels of the two signals are substantially equal [indicating that the signal is arriving from a location between the two microphones]. ¶0093 discloses voice activity detector VAD10 can be configured to use one or more of the time-domain techniques to compute VAD signal VS10 at relatively little computational expense. In a further implementation, voice activity detector VAD10 is configured to compute such a value of VAD signal VS10 [e.g., based on a cross-correlation or level difference] for each of a plurality of subbands of each segment. In this case, voice activity detector , initiating a subsequent processing step (Schevciw ¶0098 discloses one example of a VAD operation whose results can be combined by detector VAD12 with results from more than one of the VAD operations on first audio signal AS10 and second audio signal AS20 includes comparing highband and low band energies of the segment to respective thresholds. Detecting speech onsets, comparing a ratio of frame energy to average energy and/or a ratio of lowband energy to highband energy; Fig. 4A: Voice Activity Detector VAD12. 
¶0147 discloses the microphone signals [e.g., signals MS10, MS20, MS30] can be routed to a processing chip that is located in a portable audio sensing device for audio recording and/or voice communications applications, such as a telephone handset [e.g., a cellular telephone handset] or smartphone; a wired or wireless headset [e.g., a Bluetooth headset]; a handheld audio and/or video recorder; a personal media player configured to record audio and/or video content; a personal digital assistant [PDA] or other handheld computing device; and a notebook computer, laptop computer, netbook computer, tablet computer, or other portable computing device).

Regarding Claim 10, Schevciw in view of Wang and Shin discloses the method of claim 9,
wherein the subsequent processing step comprises determining a content of speech (Schevciw ¶0101 discloses Apparatus A100 includes a speech estimator SE10 that is configured to produce a speech signal SS10 from third [i.e., first] audio signal AS30 according to VAD signal VS30; Figs. 1A-9A, 10A-21B).

Regarding Claim 11, Schevciw in view of Wang and Shin discloses the method of claim 1,
wherein the first microphone and the second microphone are located on a wearable head device (Schevciw Figs. 16A-16E; ¶0157 discloses glasses; helmet; goggles [e.g., ski goggles]; visor or brim of a cap or hat; lapel, breast pocket, or shoulder).

Regarding Claim 12, Schevciw in view of Wang and Shin discloses the method of claim 1,
wherein the determination of whether the first threshold of voice activity is met is based on sensor data from a wearable head device (Schevciw ¶0157 discloses Figs. 16A-16E show eyeglasses having each microphone of noise reference pair ML10, MR10 mounted on a temple and voice microphone MC10 mounted on a temple or the corresponding end piece. The voice microphone MC10 can be used as a portable audio sensing device within the implementation of apparatus A100).

Regarding Claim 13, Schevciw in view of Wang and Shin discloses the method of claim 12,
wherein the sensor data comprises mouth movement data associated with a user of the wearable head device (Schevciw ¶0075 discloses several different examples of positions for voice microphone MC10 during a use of apparatus A100 are shown by labeled circles in Fig. 2A. In position A, voice microphone MC10 is mounted in a visor of a cap or helmet. In position B, voice microphone MC10 is mounted in the bridge of a pair of eyeglasses, goggles, safety glasses, or other eyewear. In position CL or CR, voice microphone MC10 is mounted in a left or right temple of a pair of eyeglasses, goggles, safety glasses, or other eyewear. In position DL or DR, voice microphone MC10 is voice microphone MC10 is mounted on a boom that extends toward the user's mouth from a hook worn over the user's ear. In position FL, FR, GL, or GR, voice microphone MC10 is mounted on a cord that electrically connects voice microphone MC10, and a corresponding one of noise reference microphones ML10 and MR10, to the communications device. ¶0133 discloses Fig. 9B shows a side view of an earbud EB10 in which microphone MC10 is mounted within a strain-relief portion of cord CD10 at the earbud such that microphone MC10 is directed toward the user's mouth during use.  ¶0157 discloses the voice microphone MC10 can be used as a portable audio sensing device within the implementation of apparatus A100; Figs. 16A-16E).

Claims 16-20 are rejected for the same reasons as set forth in Claims 1-6 and 8-13 (Schevciw ¶0175-¶0186 discloses an apparatus and computer-readable media that perform the method).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Schevciw et al. (US PGPUB #2011/0288860) in view of Wang et al. (US PGPUB #2020/0286465) further in view of Shin et al. (US PGPUB #2012/0130713) and Visser et al. (US PGPUB #2010/0323652).

Regarding Claim 7, Schevciw in view of Wang and Shin discloses the method of claim 6, the method further comprising:
applying a bandpass filter to the first audio signal (Schevciw ¶0089 discloses it can be desirable to configure audio preprocessing stage AP10 to provide first audio signal AS10 and second audio signal AS20 as bandpass signals); and
applying a bandpass filter to the second audio signal (Schevciw ¶0089 discloses it can be desirable to configure audio preprocessing stage AP10 to provide first audio signal AS10 and second audio signal AS20 as bandpass signals).
Schevciw may not explicitly disclose applying a window function to the first audio signal; applying a finite-impulse response (FIR) filter to the second audio signal, the FIR filter associated with the time offset compensation; applying a window function to the second audio signal.
However, Wang (abstract; Figs. 8 and 11-14) teaches applying a window function to the first audio signal (Wang ¶0036 discloses the framing processing can be implemented by moving a window function);
applying a window function to the second audio signal (Wang ¶0036 discloses the framing processing can be implemented by moving a window function).
Schevciw and Wang are analogous art, as both pertain to processing speech signals using a head-mounted microphone pair. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the speech processing (as taught by Schevciw) to determine probabilities for checking whether the pre-determined keyword exists in the to-be-recognized speech signal (as taught by Wang, ¶0029), in order to overcome the conventional method, which is extremely sensitive to manually set decision logic and therefore has low universality (Wang, ¶0004).
And Visser teaches applying a finite-impulse response (FIR) filter to the second audio signal, the FIR filter associated with the time offset compensation (Visser ¶0123 discloses task T300 can be configured to calculate a smoothed value using a temporal smoothing function, such as a finite-impulse-response [FIR] filter).
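For illustration only, framing with a moving window function (as in the cited Wang ¶0036) and filtering with a finite-impulse-response filter (as in the cited Visser ¶0123) can be sketched as follows. The Hann window, the frame and hop lengths, and the tap values are assumptions chosen for this sketch and are not taken from either reference; the function names are placeholders.

```python
import math


def hann_window(length):
    """Hann window coefficients (assumed window shape; length must be at least 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1)) for n in range(length)]


def windowed_frames(signal, frame_length, hop):
    """Slide a window along the signal and return each windowed frame."""
    window = hann_window(frame_length)
    return [
        [s * w for s, w in zip(signal[start:start + frame_length], window)]
        for start in range(0, len(signal) - frame_length + 1, hop)
    ]


def fir_filter(signal, taps):
    """y[n] = sum over k of taps[k] * x[n - k], treating samples before the start as zero."""
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        output.append(acc)
    return output
```

With taps [0.5, 0.5] the filter acts as a two-point temporal smoother, and with taps [0.0, 0.0, 1.0] it acts as a pure two-sample delay, which is one assumed way an FIR filter could realize a time offset.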

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Schevciw et al. (US PGPUB #2011/0288860) in view of Wang et al. (US PGPUB #2020/0286465) further in view of Shin et al. (US PGPUB #2012/0130713) and Vennström et al. (US PGPUB #2018/0129469).

Regarding Claim 14, Schevciw in view of Wang and Shin discloses the method of claim 12, but may not explicitly disclose wherein the sensor data comprises eye movement data associated with a user of the wearable head device.
However, Vennström (title; abstract; Fig. 2) teaches wherein the sensor data comprises eye movement data associated with a user of the wearable head device (Vennström ¶0046-¶0052 discloses determining gaze point of a user on a display and produce audio that is dependent on the gaze point; Fig. 2. ¶0134 discloses suitable for both remote gaze detection devices as well as wearable gaze detection devices. Virtual Reality headsets such as the Oculus Rift, wearable displays such as Google Glass and the like all have the capability of providing interactive events to a user, by providing gaze and/or head tracking capability in a wearable device. ¶0127 discloses a point of interest can be .
Schevciw, Wang, Shin, and Vennström are analogous art, as they pertain to processing speech signals using a head-mounted microphone pair. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Schevciw in view of Wang and Shin in light of the teachings of Vennström to produce audio based on gaze point detection (as taught by Vennström, ¶0042), in order to overcome the limitations of prior art approaches, which rely on a user's ability to accurately direct a body part to a particular point of interest on a computer display and thereby introduce the possibility of error (Vennström, ¶0006).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Schevciw et al. (US PGPUB #2011/0288860) in view of Wang et al. (US PGPUB #2020/0286465) further in view of Shin et al. (US PGPUB #2012/0130713) and Tran (US PGPUB #2014/0194702).

Regarding Claim 15, Schevciw in view of Wang and Shin discloses the method of claim 12, but may not explicitly disclose wherein the sensor data comprises vital sign data associated with a user of the wearable head device.
However, Tran (title; abstract; Figs. 4-5, 6B, 14B) teaches wherein the sensor data comprises vital sign data associated with a user of the wearable head device (Tran ¶0283 discloses Fig. 14B shows a sunglass or eyeglass which contains electronics for communicating with the mesh network and for sensing acceleration and bioimpedance, .
Schevciw, Wang, Shin, and Tran are analogous art, as they pertain to processing speech signals using a head-mounted microphone pair. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Schevciw in view of Wang and Shin in light of the teachings of Tran to detect heart activities near the brain by using the side module that contains piezoelectric transducers or microphones (as taught by Tran, ¶0283), in order to overcome the limitations of fitness monitoring with conventional devices, which lack interactivity, a sophisticated user interface, and networking capabilities (Tran, ¶0003).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOGESHKUMAR G PATEL whose telephone number is (571)272-3957. The examiner can normally be reached 7:30 AM-4 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YOGESHKUMAR PATEL/Primary Examiner, Art Unit 2651