DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 17 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 17 recites “central unit for use in a voice processing system comprising P microphone units and a central unit”. The phrase “central unit for … a central unit” is not clear: it cannot be determined whether the second-recited “a central unit” refers to the same central unit recited at the start of the claim.
Claim 17 recites the limitation “central unit” in “central unit for use in a voice processing system comprising P microphone units and a central unit”. There is insufficient antecedent basis for this limitation in the claim.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that use the word “means” or “step” but are nonetheless not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph because the claim limitation(s) recite(s) sufficient structure, materials, or acts to entirely perform the recited function. Such claim limitation(s) is/are:
"microphone units" [Fig. 1: 102A-102D; ¶0026] in Claims 1, 11, 13-14, and 16-17.
"a central unit" [Fig. 1: 104; ¶0026] in Claims 1, 8-9, 11, and 16-17.
Because this/these claim limitation(s) is/are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.
If applicant intends to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to remove the structure, materials, or acts that perform the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function.


Allowable Subject Matter
Claims 4-6, 8-10, and 13-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 12, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Dijkstra et al. (US #2008/0101635) in view of Kim et al. (US #2013/0304476).

Regarding Claim 1, Dijkstra (Figs. 1-11) discloses a method in a microphone unit of a voice processing system comprising P microphone units (Figs. 1-2: transmission unit 102) and a central unit (Fig. 1: hearing aid receiver unit 103), the method comprising:
retrieving from N input units Mic_i, i=1, 2, . . . N, N≥2, N microphone signals (Dijkstra ¶0054 discloses two omni-directional microphones M1 and M2; Fig. 4) having a sampling frequency SF1, each microphone signal comprising a target signal component and a noise signal component (Dijkstra: implicit for any microphone signal);
deriving from a group of Y consecutive samples of the source localisation signal a beam form control signal (Dijkstra: implicit for a beamformer);
generating under control of the beam form control signal a group of Y consecutive samples of a beam formed audio signal (Dijkstra ¶0058 discloses the microphones M1 and M2 together associated to a beamformer algorithm form a directional microphone arrangement 26 which, according to Fig. 1, is placed at a relatively short distance to the mouth of the speaker 100 in order to insure a good SNR at the audio source and also to allow the use of easy to implement and fast algorithms for voice detection) having a sampling frequency SF2 from the N microphone signals, wherein SF1≥SF2≥SF3 (Dijkstra: implicit; since all three frequencies may be identical);
deriving a set of metadata for the group of Y consecutive samples of the beam formed audio signal from corresponding samples of the N microphone signals from which the group of Y consecutive samples of the beam formed audio signal has been obtained (Dijkstra ¶0057 discloses the data link established thereby can be used for transmitting control commands relating to the gain to be set by the receiver unit 103 from the transmission unit 102 to the receiver unit 103, and it also can be used for transmitting general information or commands to the receiver unit 103);
generating data packages, a data package comprising Q groups of Y consecutive samples of the beam formed audio signal and Q sets of metadata, wherein Q≥1 (Dijkstra ¶0058 discloses the combined audio/data signals are supplied to the FM transmitter 120 which uses the neck-loop 121 as an FM radio antenna);
streaming wirelessly the data packages to the central unit (Dijkstra ¶0002 discloses processing audio signals by a transmission unit and transmitting the audio signals from the transmission unit via wireless audio link to a receiver unit. ¶0079 discloses the transmission unit 102 and the receiver unit 103 are designed such that a wireless, preferably inductive, bidirectional data link is established which serves as a "monitoring channel". A polling signal is sent from the transmission unit 102 to the receiver unit 103, whereupon the receiver unit 103 sends status information data regarding the status of the receiver unit 103 to transmission unit 102).
Whereas in Dijkstra, SF1, SF2, SF3 may be implicit, since all three frequencies may be identical, Dijkstra may not explicitly disclose determining from the N microphone signals a source localisation signal; and deriving from a group of Y consecutive samples of the source localisation signal a beam form control signal.
However, Kim (Fig. 18) teaches determining from the N microphone signals a source localisation signal (Kim ¶0093 discloses the device(s) can audibly focus [i.e., zoom in on] the person by using [e.g., all] available microphones to enhance the look direction of the target [i.e., localize] person. ¶0097 discloses using the correlation or similarity measurement and the speakers' IDs, a target user can be detected, localized, and zoomed into, at 760; Fig. 18); and
deriving from a group of Y consecutive samples of the source localisation signal a beam form control signal (Kim ¶0100 discloses the correlation or similarity between the audio beams of the separated speakers of the fixed array and the outputs of steerable arrays can be used to adaptively form a better beam for a targeted speaker. In this manner, the fixed microphones beamformer can be adaptively refined, such that new look directions can be adaptively generated by a fixed beamformer).
Dijkstra and Kim are analogous art as they pertain to hearing assistance systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the voice controlled system (as taught by Dijkstra) to localize the target speaker (as taught by Kim, ¶0097) in order to enhance audio user interaction on the receiving side by steering audio beams towards a specific person or a specific sound source (Kim, ¶0003).
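
For illustration only, the data package structure recited in Claim 1 (Q groups of Y consecutive samples of the beam formed audio signal together with Q sets of metadata, Q≥1) may be sketched as follows. This sketch is not taken from Dijkstra, Kim, or the specification. The names DataPackage, build_packages, and rms_metadata are hypothetical, and deriving the metadata as an RMS level of the group itself is an assumption made only to keep the example short (the claim derives the metadata from the corresponding samples of the N microphone signals).

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class DataPackage:
    """Hypothetical container: Q groups of Y samples plus Q metadata sets."""
    groups: List[List[float]]
    metadata: List[Dict[str, float]]


def rms_metadata(group: Sequence[float]) -> Dict[str, float]:
    # Assumption: a single illustrative metadata field (RMS level of the group).
    return {"rms": math.sqrt(sum(x * x for x in group) / len(group))}


def build_packages(
    samples: Sequence[float],
    Y: int,
    Q: int,
    derive_metadata: Callable[[Sequence[float]], Dict[str, float]] = rms_metadata,
) -> List[DataPackage]:
    """Split a beam formed signal into packages of Q groups of Y samples each."""
    if Y < 1 or Q < 1:
        raise ValueError("Y and Q must both be at least 1")
    # Cut the signal into complete groups of Y consecutive samples;
    # trailing samples that do not fill a group are dropped in this sketch.
    groups = [list(samples[i:i + Y]) for i in range(0, len(samples) - Y + 1, Y)]
    packages = []
    # Bundle Q groups per package, one metadata set per group;
    # trailing groups that do not fill a package are likewise dropped.
    for start in range(0, len(groups) - Q + 1, Q):
        chunk = groups[start:start + Q]
        packages.append(DataPackage(chunk, [derive_metadata(g) for g in chunk]))
    return packages
```

Under these assumptions, twelve samples with Y=2 and Q=3 yield two packages, each carrying three groups and three metadata sets.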

Claim 16 is rejected for the same reason as Claim 1.

Regarding Claim 2, Dijkstra in view of Kim discloses the method according to claim 1,
wherein a value of a first metadata field is derived from a group of Y consecutive samples of the source localisation signal (Dijkstra ¶0057 discloses the data link established thereby can be used for transmitting control commands relating to the gain to be set by the receiver unit 103 from the transmission unit 102 to the receiver unit 103, and it also can be used for transmitting general information or commands to the receiver unit 103).

Regarding Claim 3, Dijkstra in view of Kim discloses the method according to claim 1,
wherein the sets of metadata and Q groups of Y consecutive samples of the beam formed audio signal that have been derived from a corresponding part in time of the N microphone signals are included in a ith data package and i+Tth data package respectively, wherein T is an integer greater than 0 (Dijkstra: the sets of metadata and beam formed audio signal that have been derived from a corresponding part in time of the N microphone signals are implicitly included in a ith data package and i+Tth data package respectively).
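
For illustration only, the timing relationship recited in Claim 3 (metadata for a given part in time carried in the i-th data package, and the corresponding beam formed audio carried in the (i+T)-th data package, T>0) may be sketched as follows. The function name stagger and the dictionary layout are hypothetical and are not drawn from any cited reference.

```python
from typing import Any, Dict, List, Optional


def stagger(audio_groups: List[Any], metadata_sets: List[Any], T: int) -> List[Dict[str, Optional[Any]]]:
    """Place metadata k in package k and the matching audio k in package k + T."""
    if T <= 0:
        raise ValueError("T must be an integer greater than 0")
    if len(audio_groups) != len(metadata_sets):
        raise ValueError("one metadata set is expected per audio group")
    n = len(audio_groups)
    packages = []
    for i in range(n + T):
        # Metadata leads: package i carries the metadata for time part i.
        meta = metadata_sets[i] if i < n else None
        # Audio lags by T packages: package i carries the audio for time part i - T.
        audio = audio_groups[i - T] if i >= T else None
        packages.append({"index": i, "metadata": meta, "audio": audio})
    return packages
```

In this sketch the first T packages carry metadata only and the last T packages carry audio only, so that a receiver would see the metadata for a part in time T packages before the corresponding audio.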

Claim 12 is rejected for the same reason as Claim 3. 

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Dijkstra et al. (US #2008/0101635) in view of Kim et al. (US #2013/0304476) and further in view of Benattar et al. (US #2018/0146284).

Regarding Claim 7, Dijkstra in view of Kim discloses the method according to claim 1, but may not explicitly disclose the method comprises: determining a speaker voice profile from the N microphone signals; verifying if the speaker voice profile corresponds to a microphone reference speaker voice profile; and start streaming the data packages when the speaker voice profile corresponds to the microphone reference speaker voice profile.
However, Benattar (Figs. 1-7) teaches the method comprises:
determining a speaker voice profile from the N microphone signals (Benattar ¶0053 Fig. 3: 8 channel microphone array; ¶0114 discloses decision 716 [Fig. 7] determines if the speaker profile detection is activated. If activated, the system carries out speaker processing at 717. After the speaker processing, decision 718 determines the speaker has been detected);
verifying if the speaker voice profile corresponds to a microphone reference speaker voice profile (Benattar ¶0069 discloses another mode of operation can rely on speaker recognition. When an algorithm detects the presence of speech along with sufficient acoustical detail to match the audio or speech with a locally stored or available profile, the system can select the beam in which the audio exhibits characteristics sufficiently closer to the profile that was detected. The profile can relate to a speaker of interest. ¶0077 discloses each speaker recognition system can have two phases: Enrollment and verification. During enrollment, the speaker's voice can be recorded and/or modeled on one or more features of the speaker's voice which are extracted to form a voice print, template, or model. In the verification phase, a speech sample or "utterance" can be compared against a previously created voice print. The utterance can be compared against multiple voice prints in order to determine the best match having an acceptable score. ¶0114 discloses this can be done by matching a reference voice profile to a profile generated from a beam); and
start streaming the data packages when the speaker voice profile corresponds to the microphone reference speaker voice profile (Benattar ¶0101 discloses Fig. 5 shows the data output format of the microcontroller 401 [Fig. 4]. The data output frame 501 may include eight (8) 16-bit segments representing audio sampled at 1600 kHz by the sensors 408. The signals representing sampled audio is sequenced in segments 502 of the frame 501. ¶0118 discloses audio processing can include a sound level monitor so that the actual levels of injected sound are determined and the overall volume and/or relative volumes are adjusted in order to maintain a consistent output sound level and/or ratio).
Dijkstra, Kim, and Benattar are analogous art as they pertain to hearing assistance systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Dijkstra in view of Kim in light of the teachings of Benattar to monitor a detection condition, such as voice activity detection, using the analysis process (as taught by Benattar, ¶0105) in order to enhance the listening experience (Benattar, ¶0044).

Claims 11 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Ganong et al. (US #2013/0024196) in view of Kim et al. (US #2013/0304476) and further in view of Dijkstra et al. (US #2008/0101635).

Regarding Claim 11, Ganong (Figs. 1A-5) discloses a method in a central unit of a voice processing system comprising P microphone units (Fig. 1: one of multiple devices 110A-110D) executing the method according to claim 1 (i.e., retrieving from N input units (¶0031); having a sampling frequency SF1 (implicit for any microphone signal); deriving from a group of Y consecutive samples (¶0094); generating under control of the beam form control signal a group of Y consecutive samples (¶0094); deriving a set of metadata (implicit when transmitting data with a mobile phone); streaming wirelessly the data packages to the central unit (¶0036)), the method comprising:
receiving wirelessly P streams of data packages from the P microphone units (Ganong ¶0039 discloses to accommodate the different communication mechanism by the devices 110A-110D, the meeting server 105 can be coupled to multiple interfaces. For instance, a telephone interface receives audio signals from the telephone network 115 and processes the received audio signals [e.g., by converting the received audio signals into a format suitable for processing by the meeting server 105]. The received data packets can be processed by one or more network stack components to extract audio signals to be processed by the meeting server 105);
time synchronizing the data packages of the P streams to obtain P synchronized streams of data packages (Ganong ¶0061 discloses at act 320, the meeting server can attempt to synchronize multiple channels of audio received from different devices. ¶0093 discloses at act 410, the meeting server can receive audio signals from multiple devices and synchronize the received audio signals in any suitable way; Figs. 3A and 4);
detecting in each of the P synchronized streams which parts of the beam formed audio signals comprises a target signal component of an active speaker linked to the microphone unit (Ganong ¶0067 discloses a channel selection algorithm can be applied to two or more channels of audio received from the devices to select a channel having a desired signal quality. For example, a value can be computed for each channel representing the likelihood that the particular channel of audio contain speech, and a channel having a highest likelihood value can be selected) which generated said stream based on the beam formed audio signals and time corresponding sets of metadata (Ganong ¶0070 discloses the meeting server can analyze the received audio signals to identify a leading speaker [e.g., a speaker whose speech is most clearly captured by the device microphones], and then take any suitable action. For example, the system can give the floor of the meeting to the leading speaker in any suitable manner, such as by displaying visual indications as illustrated in Figs. 3B-3D);
forwarding the detected parts of the beam formed audio signals of the P streams for further processing (Ganong ¶0070 discloses alternatively, or additionally, the system can transmit an audio signal that filters out other speakers and focuses on the leading speaker. ¶0094 discloses the meeting server can then apply a multichannel enhancement technique [e.g., beamforming] with different parameters to obtain multiple audio signals, each of which emphasizes speech from a different speaker and therefore can be treated as a focused channel for that speaker. Alternatively, the meeting server can apply a channel selection technique to obtain a focused channel for each speaker).
Ganong may not explicitly disclose determining from the N microphone signals a source localisation signal; deriving from a group of Y consecutive samples of the source localisation signal a beam form control signal; each data package comprises Q groups of Y consecutive samples of a beam formed audio signal and Q sets of metadata corresponding to Q groups of Y consecutive samples of the beam formed audio signal; generating data packages, a data package comprising Q groups of Y consecutive samples of the beam formed audio signal and Q sets of metadata, wherein Q≥1.
However, Kim (Fig. 18) teaches determining from the N microphone signals a source localisation signal (Kim ¶0093 discloses the device(s) can audibly focus [i.e., zoom in on] the person by using [e.g., all] available microphones to enhance the look direction of the target [i.e., localize] person. ¶0097 discloses using the correlation or similarity measurement and the speakers' IDs, a target user can be detected, localized, and zoomed into, at 760; Fig. 18);
deriving from a group of Y consecutive samples of the source localisation signal a beam form control signal (Kim ¶0100 discloses the correlation or similarity between the audio beams of the separated speakers of the fixed array and the outputs of steerable arrays can be used to adaptively form a better beam for a targeted speaker. In this manner, the fixed microphones beamformer can be adaptively refined, such that new look directions can be adaptively generated by a fixed beamformer).
Ganong and Kim are analogous art as they pertain to speaker identifying systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the voice controlled system (as taught by Ganong) to localize the target speaker (as taught by Kim, ¶0097) in order to enhance audio user interaction on the receiving side by steering audio beams towards a specific person or a specific sound source (Kim, ¶0003).
Ganong in view of Kim may not explicitly disclose each data package comprises Q groups of Y consecutive samples of a beam formed audio signal and Q sets of metadata corresponding to Q groups of Y consecutive samples of the beam formed audio signal; and generating data packages, a data package comprising Q groups of Y consecutive samples of the beam formed audio signal and Q sets of metadata, wherein Q≥1.
However, Dijkstra teaches each data package comprises Q groups of Y consecutive samples of a beam formed audio signal and Q sets of metadata corresponding to Q groups of Y consecutive samples of the beam formed audio signal; and generating data packages, a data package comprising Q groups of Y consecutive samples of the beam formed audio signal and Q sets of metadata, wherein Q≥1 (Dijkstra ¶0058 discloses the combined audio/data signals are supplied to the FM transmitter 120 which uses the neck-loop 121 as an FM radio antenna).
Ganong, Kim, and Dijkstra are analogous art as they pertain to speaker identifying systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the voice controlled system (as taught by Ganong) to send a polling signal from a transmission unit to the receiver unit for checking receiver unit status (as taught by Dijkstra, ¶0079) in order to allow for optimized counseling and training of the user (Dijkstra, ¶0025).
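
For illustration only, the central-unit steps addressed above for Claim 11 (time synchronizing P streams of data packages, then detecting which stream carries the active speaker) may be sketched as follows. This sketch is not taken from Ganong, Kim, Dijkstra, or the specification. The package fields timestamp and speech_likelihood and the function names synchronize and select_active are hypothetical, and selecting the stream with the highest likelihood value is an assumption loosely modeled on the channel selection described in Ganong ¶0067.

```python
from typing import Any, Dict, List, Tuple

Package = Dict[str, Any]


def synchronize(streams: List[List[Package]]) -> List[Tuple[int, List[Package]]]:
    """Align P streams on timestamps that are present in every stream."""
    if not streams:
        return []
    # Keep only timestamps for which all P streams delivered a package.
    common = {p["timestamp"] for p in streams[0]}
    for stream in streams[1:]:
        common &= {p["timestamp"] for p in stream}
    indexed = [{p["timestamp"]: p for p in stream} for stream in streams]
    return [(t, [idx[t] for idx in indexed]) for t in sorted(common)]


def select_active(synced: List[Tuple[int, List[Package]]]) -> List[Tuple[int, int]]:
    """For each aligned time slot, return the index of the most likely active stream."""
    result = []
    for t, pkgs in synced:
        # Assumption: the metadata carries a per-package speech likelihood value;
        # on a tie, the lowest stream index is chosen.
        best = max(range(len(pkgs)), key=lambda k: pkgs[k]["speech_likelihood"])
        result.append((t, best))
    return result
```

A time slot missing from any one stream is simply skipped in this sketch; a practical system would more likely buffer or interpolate, which is outside what the cited passages describe.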

Claim 17 is rejected for the same reason as Claim 11. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOGESHKUMAR G PATEL whose telephone number is (571) 272-3957. The examiner can normally be reached 7:30 AM-4 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YOGESHKUMAR PATEL/
Primary Examiner, Art Unit 2651