Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 02/21/2020, 10/14/2020, and 07/28/2021 are being considered by the examiner.
Drawings
The drawing submitted on 02/11/2020 is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-6 and 8-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1 and 14-15 recite the limitation "the first audio stream" in lines 10-11. There is insufficient antecedent basis for this limitation in the claims.
Claims 2-6 and 8-13 are rejected due to their dependency on a rejected base claim.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 8-9, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2018/0336892 A1) in view of Kim et al. (US 2018/0033428 A1), hereinafter referred to as Kim1 et al.

Regarding Claims 1 and 14-15, Kim et al. teach: A method, comprising: at an electronic device associated with a media-providing service, the electronic device having one or more processors, and memory storing instructions for execution by the one or more processors (Fig. 8 and paragraph [0247]): receiving a first set of audio streams from each of a plurality of microphones (Fig. 8, block 802 and paragraphs [0248]-[0249]); generating, from the first set of audio streams, a second set of audio streams corresponding to each of a plurality of independent voices (Fig. 8, block 804 and paragraph [0250]; paragraph [0256]: the audio beam that includes the speech input "Hey Siri, what's the weather?" is a first audio stream of the second set and the one including "Coming up on the History Channel" is a second stream of the second set); detecting, in a first audio stream of the second set of audio streams: a wake word ("Hey Siri") (paragraph [0253]: a spoken trigger (e.g., "Hey Siri")); a beginning of a voice command (paragraph [0263]: "what's the") to play media content from the media-providing service (Fig. 8, block 806 and paragraph [0253]; paragraphs [0256] and [0259]: in the audio beam containing "Hey Siri, what's the weather?" a voice command is detected and in the audio beam containing "Coming up on the History Channel" no voice command is detected); an end of the voice command (paragraph [0263]: "weather"), wherein the end of the voice command overlaps in time with speech in a second audio stream of the second set of audio streams (paragraph [0263]: "weather" overlaps with "History Channel"); and in response to detecting the voice command, playing the media content from the media-providing service (paragraph [0278]: audio output is provided, "the weather outside is...", i.e., media content is played).
Kim et al. do not specifically teach the limitation “wherein the second set of audio streams is generated based on a rolling window having a length of time corresponding to a wake word”.
Kim1 et al. teach: wherein the second set of audio streams is generated based on a rolling window having a length of time (first time period or standby period) corresponding to a wake word ([0031] For example, while operating in a first operational mode 116, the signal processing system 114 may configured to "listen" for (e.g., detect) a keyword spoken in the far-field acoustic environment 110, and in a second operational mode 118, the signal processing system 114 may configured to "listen" for (e.g., detect) a voice input spoken in the far-field acoustic environment 110 after the keyword. [0037] For example, the first time period may correspond to a standby period in which the apparatus 100 is monitoring the far-field acoustic environment 110 to detect a keyword. While operating in the first operational mode 116, the signal processing system 114 is optimized (or otherwise configured) to detect the keyword in the far-field acoustic environment 110. As another example, in the first operational mode 116, the signal processing system 114 may use static versions of certain signal processing parameters, such as beamformer parameters or nullformer parameters, until a direction of arrival of sound corresponding to the keyword is determined. In this example, the static signal processing parameters may facilitate treating the far-field acoustic environment 110 as a set of adjacent or overlapping sound zones (as describe further with reference to FIGS. 9 and 10). Thus, signal processing parameters used by the signal processing system 114 while operating in the first operational mode 116 are selected improve detection of the keyword in the far-field acoustic environment 110. [0038] The signal processing system 114 may operate in the second operational mode 118 during a second time period. The second time period begins when a keyword is detected and ends when an end of a voice input following the keyword is detected. Thus, the second time period may correspond to an active period in which the apparatus 100 is receiving and processing voice input. While operating in the second operational mode 118, the signal processing system 114 is optimized (or otherwise configured) to detect the voice input in the far-field acoustic environment 110. For example, in the second operational mode 118, the signal processing system 114 may cease updating adaptive signal processing parameters and may use parameters that were in use when the keyword was detected. As another example, in the second operational mode 118, the signal processing system 114 may modify certain signal processing parameters, such as beamformer parameters or nullformer parameters, based on a direction of arrival of sound corresponding to the keyword. In this example, the modified signal processing parameters may facilitate focusing the voice recognition process on a location of a source (or sources) of the keyword or on a region or zone from which the keyword was received. Thus, signal processing parameters used by the signal processing system 114 while operating in the second operational mode 118 are selected improve recognition of the voice input in the far-field acoustic environment 110 after detecting the keyword.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kim et al. to include the teaching of Kim1 et al. above in order to improve recognition of the voice input in the far-field acoustic environment 110 after detecting the keyword.
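
For illustration only, the claimed "rolling window having a length of time corresponding to a wake word" can be pictured with the short sketch below. It is not taken from the application, from Kim et al., or from Kim1 et al.; the function name rolling_windows, the 16 kHz sample rate, the 1.0 second wake word duration, and the 0.5 second hop are all assumptions chosen for the example.

```python
from typing import Iterator, List, Sequence


def rolling_windows(
    samples: Sequence[float],
    sample_rate_hz: int,
    wake_word_seconds: float,
    hop_seconds: float,
) -> Iterator[List[float]]:
    """Yield overlapping windows whose length matches the wake word duration."""
    # Window length in samples is tied to the (assumed) wake word duration.
    window_len = int(round(sample_rate_hz * wake_word_seconds))
    hop_len = int(round(sample_rate_hz * hop_seconds))
    if window_len <= 0 or hop_len <= 0:
        raise ValueError("window and hop must each span at least one sample")
    # Slide the window forward by one hop at a time over the buffered audio.
    for start in range(0, len(samples) - window_len + 1, hop_len):
        yield list(samples[start:start + window_len])


# Hypothetical values: 2 s of silent audio at 16 kHz, 1 s wake word, 0.5 s hop.
audio = [0.0] * 32000
windows = list(rolling_windows(audio, 16000, 1.0, 0.5))
```

In such a sketch, each window would be handed to whatever separation or detection step follows; that downstream step is outside what is shown here.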

Regarding Claim 2: The method of claim 1, wherein: the plurality of microphones is dynamically adjusted to produce distinct beamforming arrays (See Kim1 et al. in the rejection of Claim 1).

Regarding Claim 3: The method of claim 2, wherein the first set of audio streams is received from the beamforming arrays of the dynamically adjusted microphones (See rejection of Claim 1 and Kim1 et al. [0059]: As another example, while the signal processing system 114 is in the first operational mode 116, the beamformer 142 may continuously, regularly, or occasionally, update the beamformer parameters 144. To illustrate, the beamformer 142 may update the beamformer parameters 144 dynamically to follow a particular sound source, to avoid a particular noise source, or both. In this example, when the signal processing system 114 changes to operating in the second operational mode 118, the beamformer 142 may cease updating the beamformer parameters 144 (e.g., may use static beamformer parameters) or may change a rate at which the beamformer parameters 144 are updated.).

Regarding Claim 4: The method of claim 1, wherein generating the second set of audio streams comprises performing a blind source separation on the first set of audio streams (See rejection of Claim 1 and Kim et al. [0250]: In some examples, at least one audio beam of the plurality of audio beams is obtained using source separation techniques.).

Regarding Claim 5: The method of claim 4, wherein generating the second set of audio streams comprises identifying statistically independent signals in the first set of audio streams (See rejection of claim 1, especially Kim1 et al. [0037]: As another example, in the first operational mode 116, the signal processing system 114 may use static versions of certain signal processing parameters, such as beamformer parameters or nullformer parameters, until a direction of arrival of sound corresponding to the keyword is determined.).

Regarding Claim 6: The method of claim 5, wherein generating the second set of audio streams further comprises performing independent component analysis (ICA) on the first set of audio streams (See rejection of claim 1 and Kim1 et al. [0106] For example, in the second operational mode 118, the signal processing system 114 may cease processing or outputting the directional audio signals 302, 306, 308, and 312 that are associated with zones 1, 3, 4 and 6 since these zones did not include the sound corresponding to the keyword. In this example, the second directional audio signal 304 and the fifth directional audio signal 310 may be provided (e.g., independently or separately) to the voice recognition system 124 to process a voice input that follows the keyword.).

Regarding Claim 7: The method of claim 1, further including, prior to detecting the voice command, detecting a wake word in the first audio stream of the second set of audio streams (See rejection of claim 1 and Kim et al. [0263]: "Hey Siri", and Kim1 et al. [0037]-[0038] for the first time period and the voice input following the keyword).

Regarding Claim 8: The method of claim 1, further comprising, wherein generating the second set of audio streams comprises performing independent component analysis on each audio stream of the first set of audio streams in real-time using rolling window having a length of time corresponding to a length of the wake word (See rejection of claim 6).

Regarding Claim 9: The method of claim 1, wherein: detecting the end of the voice command includes, while detecting speech content in a second audio stream of the second set of audio streams, detecting a pause in speech in the first audio stream of the second set of audio streams (See rejection of claim 1 and Kim et al., Fig. 10, reference sign 1034 "pause" and paragraph [0265]: the pause is detected because two segments are identified: "Hey Siri what's the" and "weather").

Regarding Claim 13: The method of claim 1, wherein the electronic device is a first electronic device, and wherein the playing the media content from the media-providing service further comprises playing the media content at a second electronic device distinct from the first electronic device (See rejection of claim 1 and Kim et al. [0275] Based on the obtained information, the third electronic device determines whether the first audio signal, the second audio signal, and/or a combination of the first audio signal and the second audio signal corresponds to a spoken trigger. In some examples, the third electronic device makes the determination using any of the techniques described above with respect to FIG. 8. [0276] At block 908, in accordance with a determination that the first audio signal or the second audio signal correspond to a spoken trigger, a fourth electronic device initiates a session of the digital assistant. [0277] In some examples, the fourth electronic device obtains directional information associated with an audio source (e.g., the user 1000) based on the first audio signal and the second audio signal and provides the audio output based on the directional information.).

Claims 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 2018/0336892 A1) in view of Kim et al. (US 2018/0033428 A1), hereinafter referred to as Kim1 et al., and further in view of Lin et al. (US 2020/0098354 A1).

Regarding Claim 10, Kim et al. in view of Kim1 et al. do not teach: The method of claim 1, further comprising: storing, as a training set, the second set of audio streams corresponding to respective independent voices.
Lin et al. teach: storing, as a training set, the second set of audio streams corresponding to respective independent voices ([0018] In some embodiments, the voice control application may be trained to identify the speaking cadence for the user based on prompting the user to recite a sequence of utterances, recording how the user speaks the utterances and determining a cadence based on the recited sequence of utterances. The voice control application may receive a second voice input from the user in response to prompting the user. For example, the voice control application may receive a sequence of utterances corresponding to a training sequence associated with the prompt. The voice control application may determine an average amount of time between each utterance of the sequence of utterances in the second voice input. For example, the voice control application may determine the periods of silence, as discussed above based on a variation between sound received when the user is speaking and when the user is not speaking, between utterances and may determine an amount of time corresponding to the period of silence. The voice control application may sum the times associated with the periods of silence and may divide by the number of periods of silence to determine an average amount of time between utterances. Based on the average amount of time between utterances, the voice control application may store, in a profile of the user, the speaking cadence based on the average amount of time between utterances.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kim et al. in view of Kim1 et al. to include the teaching of Lin et al. above so that the system can be trained to identify a speaking cadence for a user.
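
As an illustration of the cadence computation quoted from Lin et al. [0018] (summing the periods of silence between utterances and dividing by the number of such periods), a minimal sketch follows. The function name average_pause_seconds, the (start, end) timestamp representation, and the sample values are assumptions made for this example; they do not appear in Lin et al.

```python
from typing import List, Tuple


def average_pause_seconds(utterances: List[Tuple[float, float]]) -> float:
    """Average the silent gaps between consecutive (start, end) utterances."""
    if len(utterances) < 2:
        raise ValueError("at least two utterances are needed to measure a pause")
    # Order utterances by start time so each gap is measured between neighbours.
    ordered = sorted(utterances)
    # Each pause runs from the end of one utterance to the start of the next.
    pauses = [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]
    # Sum the periods of silence and divide by the number of periods.
    return sum(pauses) / len(pauses)


# Hypothetical training sequence, in seconds: pauses of 0.5 s and 1.0 s.
training = [(0.0, 0.5), (1.0, 1.5), (2.5, 3.0)]
cadence = average_pause_seconds(training)
```

With these assumed timestamps the two pauses average to 0.75 seconds, which is the value that would be stored in the user profile under the quoted passage.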

Regarding Claim 11:  The method of claim 10, further comprising: generating a voice-specific filter (a word (e.g., utterance) matching the keyword, that the user spoke the keyword) for each of the independent voices of the second set of audio streams (See rejection of claim 10 and Lin et al. teaching: [0009] The voice control application may retrieve a voice input from a user comprising a sequence of a plurality of utterances. For example, the voice capable device may comprise a microphone accessible to the voice control application. The voice control application may detect a voice signal at the microphone and may record, at least temporarily, the voice input. The voice control application may determine that the voice input comprises a plurality of utterances (e.g., words or word phrases) based on analyzing a soundwave recorded by the voice control application and detecting periods of silence between spoken words. [0010] The voice control application may compare a plurality of utterances to a keyword stored in memory, where the keyword is associated with triggering a voice capable user device. For example, the voice control application may retrieve, from a configuration file associated with the user device a keyword associated with triggering the voice capable user device (e.g., the keyword phrase "OK Cellphone"). The voice control application may transcribe the plurality of utterances to text and may compare the text to the keyword phrase. [0011] The voice control application may determine, based on the comparing, whether a first utterance of the plurality of utterances matches the keyword. For example, the voice control application may compare the text of each utterance of the plurality of utterances to the keyword and may determine whether the text of any utterance of the plurality of utterances matches the keyword. When the voice control application determines that text of an utterance of the plurality of utterances matches the keyword. 
For example, the voice control application may determine that the keyword is a name associated with the voice activated device (e.g., "Tom"). The voice control application may compare an utterance of the user ("Tom") to the keyword and may determine whether the utterance matches the keyword. For example, when the voice control application determines that a transcription of the voice input comprises a word (e.g., utterance) matching the keyword, that the user spoke the keyword.).
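
The keyword comparison quoted from Lin et al. [0010]-[0011] (comparing the transcribed text of each utterance to a stored keyword) can be sketched as below. This is only an illustrative sketch; the function name find_keyword, the case-insensitive comparison, and the sample transcription are assumptions introduced here and are not stated in Lin et al.

```python
from typing import List, Optional


def find_keyword(utterances: List[str], keyword: str) -> Optional[int]:
    """Return the index of the first transcribed utterance matching the keyword."""
    # Assumption: matching ignores letter case and surrounding whitespace.
    target = keyword.strip().casefold()
    for index, text in enumerate(utterances):
        if text.strip().casefold() == target:
            return index
    # No utterance matched the keyword.
    return None


# Hypothetical transcription of a voice input split into utterances.
transcribed = ["Tom", "play some music"]
match_index = find_keyword(transcribed, "tom")
```

A return value of None would correspond to the case where no utterance of the voice input matches the keyword.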

Regarding Claim 12:  The method of claim 11, further comprising: applying the generated voice specific filter for a respective audio stream of the second set of audio streams corresponding to the independent voice to determine the voice command (See rejection of claim 11).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Mulkerkar (US 10365887 B1) teaches (Abstract): Systems and methods for generating command indications, via a computing device, based on audio data including a keyword are described. The computing device receives and processes audio data to determine whether the audio data includes a keyword. The keyword may be a device user identifier, such as an individual's name. Once a keyword is detecting, audio data surrounding the keyword is processed to determine a command contained within the surrounding data, and the command is conveyed to the computing device's user either audibly or visually. Alternatively, a location of the device is determined, a command is determined based on the device's location, and the command is conveyed to a user of the device either audibly or visually.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571) 270-5878. The examiner can normally be reached on Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656