DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, filed 29 November 2021, have been fully considered but are not persuasive.
With respect to claim 1, Examiner respectfully submits prior art of record teaches the limitation “determining that the direct source is present as a result of determining that the spatial cue is greater than the threshold value”. 
Applicant refers to Fig. 1 and paragraph 88 of Shin to argue that “Shin does not disclose determining that the direct source is present as a result of determining that one of the "single-channel VAD," "proximity VAD," "phase VAD," and "onset/offset VAD" (mapped to the claimed "spatial cue") is greater than the threshold value.” Examiner respectfully disagrees. Shin first discloses at [0052]: “It is also expressly noted that any one or more of these tasks shown in FIGS. 1 and 2 may be implemented independently of the rest of the system (e.g., as part of another audio signal processing system).” Shin’s method M100 of Fig. 8B, and MF100 of Fig. 11A, for example, show a method of processing an audio signal based on two different VADs (phase VAD [0084], and proximity VAD [0085]), similar to voice activity detection T24 of Fig. 1. Paragraph [0088] of Shin then teaches that task T400 may be configured to compare the series of first and second voice activity measures to corresponding thresholds and to combine the resulting voice activity decisions to produce the series of combined voice activity decisions. See also, e.g., the VAD thresholds for proximity VAD shown in Fig. 11B, and the Y-axis denoting the proximity VAD. Shin further discloses at [0158]: “An acoustic signal processing apparatus as described herein (e.g., apparatus A100 or MF100) may be incorporated into an electronic device that accepts speech input in order to control certain operations, or may otherwise benefit from separation of desired noises from background noises, such as communications devices. Many applications may benefit from enhancing or separating clear desired sound from background sounds originating from multiple directions.” In other words, sounds in close proximity [“direct”] are distinguished from those in the background [“diffuse”]. For at least these reasons, Shin teaches the limitation “determining that the direct source is present as a result of determining that the spatial cue is greater than the threshold value”.
With respect to claim 49, Applicant argues “Shin does not disclose using one of "single-channel VAD," "proximity VAD," "phase VAD," and "onset/offset VAD" (mapped to the claimed "spatial cue") to determine whether a non-diffuse source is present.” Examiner respectfully disagrees. Shin first discloses at [0052]: “It is also expressly noted that any one or more of these tasks shown in FIGS. 1 and 2 may be implemented independently of the rest of the system (e.g., as part of another audio signal processing system).” Shin’s method M100 of Fig. 8B, and MF100 of Fig. 11A, for example, show a method of processing an audio signal based on two different VADs (phase VAD [0084], and proximity VAD [0085]), similar to voice activity detection T24 of Fig. 1. Paragraph [0088] of Shin then teaches that task T400 may be configured to compare the series of first and second voice activity measures to corresponding thresholds and to combine the resulting voice activity decisions to produce the series of combined voice activity decisions. Shin further discloses at [0158]: “An acoustic signal processing apparatus as described herein (e.g., apparatus A100 or MF100) may be incorporated into an electronic device that accepts speech input in order to control certain operations, or may otherwise benefit from separation of desired noises from background noises, such as communications devices. Many applications may benefit from enhancing or separating clear desired sound from background sounds originating from multiple directions.” In other words, sounds in close proximity [“direct”] are distinguished from those in the background [“diffuse”]. For at least these reasons, Shin teaches the limitation “using the spatial cue to determine whether a non-diffuse source is present”.
With respect to claim 9, given the breadth of the claim limitations, Examiner respectfully maintains that Shin teaches detecting a position of the direct source using said spatial cue (e.g. spatially selective filtering operations including directionally selective filtering operations such as beamforming and based on source proximity; para 53; note also directional coherency measure; para 69; features of the signals including proximity, direction of arrival; para 77, 78; detecting angle; para 79).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1 – 4, 6, 7, 9, 10, 13, 20 – 23, 25, 26, 28, 29, 32, 41, 49, 50, and 52 – 54 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Shin et al. (hereinafter Shin, U.S. Patent Application Publication 2012/0130713).

Regarding Claim 1, Shin discloses:
A method for voice or sound activity detection for spatial audio (e.g. operation of Fig. 1 and corresponding connecting figures, note voice activity detection and/or noise suppression performance; para 50), the method comprising:
receiving input signals (e.g. input mic channels; Fig. 1; note implementation into electronic device that accepts speech input; para 158);
analyzing the received input signals to produce a spatial cue (e.g. for an audio signal… the various voice activities measure calculated from the multi-channel signal, including magnitude and phase; para 69; voice activity measure including proximity, direction of arrival, onset/offset direction of arrival; para 77; DoA calculated as a ratio of phase difference to frequency; para 69; note spatially selective filtering operations include directionally selective filter operations; para 53, 58 and note spatial processing and weighting based on estimated direction of arrival; para 105, 106);
using the spatial cue to determine whether a direct source is present (e.g. use of multiple voice activity measures based on different features of the signal, including proximity VAD; para 
generating a direct source detection decision indicating whether or not a direct source is determined to be present (e.g. proximity based VAD signal input to task T120; Fig. 2, and 21B);
based on the received input signals, obtaining a primary activity decision, wherein the primary activity decision is a primary voice activity decision or a primary sound activity decision (e.g. voice activity measure based on speech onset or offset, detection of speech onsets or offsets; para 68); and
producing a spatial activity decision based on said direct source detection decision and the primary activity decision, wherein the spatial activity decision is a spatial voice activity decision or a spatial sound activity decision (e.g. combining the decisions of the onset and offset VAD with other VAD decisions; para 74; combining voice activity measures that are based on different features, e.g. proximity, direction of arrival, onset/ offset in order to obtain a good frame-by-frame VAD decision; para 77; note the use of measures for other VAD measures as well, and the resultant final VAD decision, final output signal; para 77), wherein using the spatial cue to determine whether the direct source is present comprises:
comparing the spatial cue to a threshold value (e.g. compare the series of first and second voice activity measures to corresponding thresholds; para 88; note that the second measure based on a proximity; para 85); and 


Regarding Claim 2, in addition to the elements stated above regarding claim 1, Shin further discloses:
wherein the spatial activity decision is set active if the direct source detection decision is active and the primary activity decision is active (e.g. note decision from the VAD combines operations using an AND or an OR operation; para 76, 77; as such, using the OR operation indicates only one need be active to provide a positive VAD indication; using the AND operation indicates both need to be active to provide a positive VAD indication).
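
For illustration only, the thresholding and AND/OR combination of decisions discussed for claims 1 and 2 may be sketched as follows. The function names, the `mode` parameter, and the numeric values are hypothetical and are not taken from Shin or from Applicant's disclosure; this is a minimal sketch of the logic, not either party's implementation.

```python
def threshold_decision(measure: float, threshold: float) -> bool:
    """Return True when a voice activity measure exceeds its threshold."""
    return measure > threshold


def combine_decisions(direct_source_active: bool,
                      primary_active: bool,
                      mode: str = "and") -> bool:
    """Combine two per-frame decisions with an AND or an OR operation."""
    if mode == "and":
        # Both decisions must be active for a positive indication.
        return direct_source_active and primary_active
    if mode == "or":
        # Only one decision needs to be active for a positive indication.
        return direct_source_active or primary_active
    raise ValueError("mode must be 'and' or 'or'")


# Hypothetical per-frame values, chosen only for the example.
direct = threshold_decision(measure=0.8, threshold=0.5)   # active
primary = threshold_decision(measure=0.2, threshold=0.5)  # inactive
and_result = combine_decisions(direct, primary, mode="and")
or_result = combine_decisions(direct, primary, mode="or")
```

With these example values the AND combination is inactive and the OR combination is active, matching the distinction drawn above between the two operations.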

Regarding Claim 3, in addition to the elements stated above regarding claim 2, Shin further discloses:
wherein the spatial activity decision remains active as long as the direct source detection decision is active, even if the primary activity decision goes inactive (e.g. using the OR operation… decision from the VAD combines operations using an AND or an OR operation; para 76, 77; note that it is risky to suppress the signal in the combined configuration; para 77).
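
For illustration only, one possible reading of the "remains active" behavior recited in claim 3, taken together with claim 2, is a per-frame latch: the output is set when both decisions are active and is held for as long as the direct source detection decision stays active. The function name and the frame sequences below are hypothetical; this sketch is neither Shin's disclosed combination nor Applicant's implementation.

```python
from typing import List, Sequence


def spatial_activity(direct: Sequence[bool],
                     primary: Sequence[bool]) -> List[bool]:
    """Per-frame spatial activity decision with hold-while-direct behavior."""
    active = False
    decisions = []
    for d, p in zip(direct, primary):
        if d and p:
            # Set active when both decisions are active (claim 2).
            active = True
        elif not d:
            # Release once the direct source decision goes inactive.
            active = False
        # If d is active and p is inactive, the previous state is held.
        decisions.append(active)
    return decisions


# Hypothetical frame-by-frame decisions, chosen only for the example.
direct_frames = [True, True, True, False, True]
primary_frames = [False, True, False, False, False]
result = spatial_activity(direct_frames, primary_frames)
```

In the third frame the primary decision has gone inactive, yet the output stays active because the direct source decision is still active.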

Regarding Claim 4, in addition to the elements stated above regarding claim 1, Shin further discloses:


Regarding Claim 6, in addition to the elements stated above regarding claim 4, Shin further discloses:
determining a relevant position decision comparing a source position to relevant positions stored in a memory (e.g. directional coherency measure; para 69; further, operations based on source proximity; para 53; note comparing the series of voice activity measures to corresponding thresholds; para 88, 91; and test statistics used; and note storage of information for the system; para 149-157), and determining that the position is relevant if there is a match (e.g. average number of frequency bins with the estimated DoA in the range of look direction within 10 degrees; para 78; and look direction range, and its adjustments; para 123, 124, 126; also note the sensing of signal components from far-field signals; para 143).

Regarding Claim 7, in addition to the elements stated above regarding claim 6, Shin further discloses:
wherein the spatial activity decision is set active if the direct source detection decision is active and any one of the primary activity decision and the relevant position decision is active (e.g. using the OR operation… decision from the VAD combines operations using an and or an 

Regarding Claim 9, in addition to the elements stated above regarding claim 1, Shin further discloses:
detecting a position of the direct source using said spatial cue (e.g. spatially selective filtering operations including directionally selective filtering operations such as beamforming and based on source proximity; para 53; note also directional coherency measure; para 69; features of the signals including proximity, direction of arrival; para 77, 78; detecting angle; para 79).

Regarding Claim 10, in addition to the elements stated above regarding claim 9, Shin further discloses:
wherein the position of direct source is represented by at least one of an inter-channel time difference (ICTD) (e.g. time frequency gain based VAD; para 110; calculated from direction of arrival, difference between microphones for the TF cell; para 123), an inter-channel level difference (ICLD) (e.g. magnitude difference between channels; para 69), and an inter-channel phase differences (ICPD) (e.g. phase based difference measure… phase coherency; para 69).
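
For illustration only, the three inter-channel cues named in claim 10 may be computed for a two-channel frame along the following lines. These are common textbook formulations; the function names, sign conventions, and test signal are assumptions and are not taken from Shin or from Applicant's disclosure.

```python
import cmath
import math
from typing import Sequence


def icld_db(left: Sequence[float], right: Sequence[float]) -> float:
    """Inter-channel level difference: energy ratio of the channels in dB."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)


def ictd_samples(left: Sequence[float], right: Sequence[float],
                 max_lag: int) -> int:
    """Inter-channel time difference: lag maximizing the cross-correlation.

    A positive result means the right channel lags the left channel.
    """
    def xcorr(lag: int) -> float:
        return sum(left[n] * right[n + lag]
                   for n in range(len(left)) if 0 <= n + lag < len(right))
    return max(range(-max_lag, max_lag + 1), key=xcorr)


def icpd_rad(left: Sequence[float], right: Sequence[float], k: int) -> float:
    """Inter-channel phase difference at DFT bin k, in radians."""
    n = len(left)

    def dft_bin(x: Sequence[float]) -> complex:
        return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                   for t in range(n))
    # Phase of the cross-spectrum; positive when the right channel lags.
    return cmath.phase(dft_bin(left) * dft_bin(right).conjugate())


# Hypothetical test frame: a sine in bin 4 of a 64-sample frame; the right
# channel is the left channel at half amplitude, delayed by 2 samples.
N, K, DELAY = 64, 4, 2
left_ch = [math.sin(2 * math.pi * K * t / N) for t in range(N)]
right_ch = [0.5 * left_ch[(t - DELAY) % N] for t in range(N)]
```

For this frame the level difference is 10·log10(4) dB, the time difference is 2 samples, and the phase difference at bin 4 is π/4 radians.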

Regarding Claim 13, in addition to the elements stated above regarding claim 8, Shin further discloses:


Claims 20 and 41 are directed to the corresponding apparatus claims of the system presented in claim 1. Accordingly, claims 20 and 41 are rejected under the same grounds as claim 1 above.

Claim 21 is directed to the corresponding apparatus claim of the system presented in claim 2. Accordingly, claim 21 is rejected under the same grounds as claim 2 above. 

Claim 22 is directed to the corresponding apparatus claim of the system presented in claim 3. Accordingly, claim 22 is rejected under the same grounds as claim 3 above.

Claim 23 is directed to the corresponding apparatus claim of the system presented in claim 4. Accordingly, claim 23 is rejected under the same grounds as claim 4 above. 

Claim 25 is directed to the corresponding apparatus claim of the system presented in claim 6. Accordingly, claim 25 is rejected under the same grounds as claim 6 above. 

Claim 26 is directed to the corresponding apparatus claim of the system presented in claim 7. Accordingly, claim 26 is rejected under the same grounds as claim 7 above. 

Claim 28 is directed to the corresponding apparatus claim of the system presented in claim 9. Accordingly, claim 28 is rejected under the same grounds as claim 9 above. 

Claim 29 is directed to the corresponding apparatus claim of the system presented in claim 10. Accordingly, claim 29 is rejected under the same grounds as claim 10 above. 

Claim 32 is directed to the corresponding apparatus claim of the system presented in claim 13. Accordingly, claim 32 is rejected under the same grounds as claim 13 above. 

Regarding Claim 49, Shin discloses:
A method for voice or sound activity detection for spatial audio (e.g. operation of Fig. 1 and corresponding connecting figures, note voice activity detection and/or noise suppression performance; para 50), the method comprising:
receiving input signals (e.g. input mic channels; Fig. 1; note implementation into electronic device that accepts speech input; para 158);
analyzing the received input signals to determine a spatial cue (e.g. for an audio signal… the various voice activities measure calculated from the multi-channel signal, including magnitude and phase; para 69; voice activity measure including proximity, direction of arrival, onset/offset direction of arrival; para 77;  similar to applicant’s disclosure of calculating DOA from phase differences, “ICPD” on page 17 under the Direct source location memory section, 
using the spatial cue to determine whether a non-diffuse source is present (e.g. phase VAD using the estimated DoA; para 78; time-frequency phase based VAD, which is calculated from the direction of arrival estimation for each TF cell; para 123;  note voice activity detecting is used to indicate the “presence” or absence of speech segments in an audio signal; para 65; also note separation of desired sound from background sound as well; para 158; in other words, sounds in close proximity [“direct”] distinguished from those in the background [“diffuse”]);
generating a direct source detection decision indicating whether or not a non-diffuse source is determined to be present (e.g. proximity based VAD signal input to task T120; Fig. 2, and 21B);
based on the received input signals, obtaining a primary activity decision, wherein the primary activity decision is a primary voice activity decision or a primary sound activity decision (e.g. voice activity measure based on speech onset or offset, detection of speech onsets or offsets; para 68); and
producing a spatial activity decision based on said direct source detection decision and the primary activity decision, wherein the spatial activity decision is a spatial voice activity decision or a spatial sound activity decision (e.g. combining the decisions of the onset and offset 

Regarding Claim 50, in addition to the elements stated above regarding claim 1, Shin further discloses:
wherein the spatial cue comprises a degree of an inter-channel cross-correlation (ICC) indicating a diffuseness of a source (e.g. voice activity measure calculated from a multi-channel signal based on a difference between channels, including measures based on phase differences, also called a phase or directional coherency measure; para 69; note phase-based VAD, e.g. a coherency measure; para 75; note voice activity measure based on a relation between channels of the audio signal, for example, measure may be based on a phase-difference based measure; para 84; note this measure is based on phase differences between channels, or a “coherency” measure, similar to the ICC detailed on page 3 line 15 of Applicant’s specification, noting inter-channel coherence or correlation (ICC)).

Regarding Claim 52, in addition to the elements stated above regarding claim 1, Shin further anticipates:
wherein the threshold value is determined based on a standard deviation estimate of a cross correlation function (e.g. threshold value for one measure may be a function of a corresponding value of another measure; para 73; note para 75 as well: for example, the VAD statistic may be multiplied by a factor greater than one or increased by a bias value greater than zero (before thresholding); and voice activity measure calculated from a multi-channel signal based on a difference between channels, including measures based on phase differences, also called a phase or directional coherency measure; para 69; note phase-based VAD, e.g. a coherency measure; para 75).

Regarding Claim 53, in addition to the elements stated above regarding claim 1, Shin further discloses:
wherein the spatial cue includes one or more measures that is determined by using a function of generalized cross correlation with phase transform (GCC PHAT) (e.g. multiplying [“transforming”] the phase based VAD statistic (e.g. coherency measure) by a factor; para 75).
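
For reference only, generalized cross correlation with phase transform, as the term is commonly used in the signal processing literature, whitens the cross-spectrum of two channels to unit magnitude before transforming back to the lag domain, so that the peak location gives the inter-channel delay. The sketch below uses a naive DFT for self-containment; the function names, the `eps` regularizer, and the circular-delay test signal are assumptions and are not drawn from Shin or from Applicant's disclosure.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def gcc_phat(x: Sequence[float], y: Sequence[float],
             eps: float = 1e-12) -> List[float]:
    """Circular GCC-PHAT of y relative to x, indexed by lag modulo N."""
    cross = [b * a.conjugate() for a, b in zip(dft(x), dft(y))]
    # Phase transform: keep only the phase of the cross-spectrum.
    whitened = [c / (abs(c) + eps) for c in cross]
    return [v.real for v in idft(whitened)]


def estimate_delay(x: Sequence[float], y: Sequence[float]) -> int:
    """Delay of y relative to x in samples (positive when y lags x)."""
    r = gcc_phat(x, y)
    n = len(r)
    peak = max(range(n), key=lambda i: r[i])
    # Map lags above N/2 to negative delays.
    return peak if peak <= n // 2 else peak - n
```

A usage example under the same assumptions: for a channel `y` that is a circularly delayed copy of `x`, `estimate_delay(x, y)` returns the delay in samples, and the sign flips when `y` leads `x`.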

Regarding Claim 54, in addition to the elements stated above regarding claim 1, Shin further discloses:
wherein the primary activity is obtained by performing a monophonic activity detection (e.g. single-channel VAD; para 76).

Allowable Subject Matter
Claims 11, 12, 30 and 31 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

1.) Xiang et al. (U.S. 2014/0023196) details a system that incorporates abilities such as voice detection, enhancement and separation; para 229, as well as the ability to distinguish between foreground voices and a background effect; para 139; using spatial audio object coding, including ICC, ILD, ITD, etc.; para 80.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS H MAUNG whose telephone number is (571)270-5690.  The examiner can normally be reached on Monday-Friday, 9am-6pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THOMAS H MAUNG/Primary Examiner, Art Unit 2654