DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s amendments to claims 8-11 in the Amendment filed February 24, 2021 (herein “Amendment”), which replace the recited “module” and “unit” terms with “circuit,” are sufficient to preclude interpretation of the limitations of claims 8-11 under 35 U.S.C. 112(f). Accordingly, claims 8-11 are no longer being interpreted under 35 U.S.C. 112(f).
Applicant's arguments and amendments in the Amendment regarding the rejection of claims 1-17 under 35 U.S.C. 103 have been fully considered but are not persuasive. Essentially, Applicant has incorporated limitations from claim 2 into independent claims 1, 8 and 13, and argues that primary reference Rosner does not teach or suggest these limitations. Specifically, Applicant argues on pages 9-12 that Rosner does not teach or suggest the claimed “wherein the voice information includes a plurality of voice information segments collected from different time periods, all the time periods being spliced into a complete and continuous time chain.”
For clarity of the record, first it is noted that Rosner is the primary reference of record and as such, given that the rejection of record reflects Rosner as teaching the claimed wherein the voice information comprises a plurality of voice information 
Applicant next sets forth on pages 10-12 that primary reference Rosner does not teach or suggest the claimed “wherein the voice information includes a plurality of voice information segments collected from different time periods, all the time periods being spliced into a complete and continuous time chain.” From Applicant’s argument, it appears that the rejection rationale was not understood; in particular, the citation of step 908 of Rosner appears to be the source of confusion. Step 908 of Rosner was cited to show that Rosner discloses voice information. That is, the newly amended portions of claims 1, 8 and 13 recite limitations directed to what the voice information comprises. Fig. 9 of Rosner illustrates a voice activation method that performs various audio signal processing steps, and step 908 states that the received audio signal includes speech with energy levels. Step 908 was therefore cited to support that the processing steps shown in fig. 9 are performed upon a received audio signal that is a speech signal with energy levels (voice information). Then, with reference to paras. [0052]-[0053], Rosner teaches that the received audio signal is converted to a digital signal, and that 
Therefore, in view of the above, while all of Applicant’s arguments and amendments have been fully considered, they are not persuasive, and the rejection against claims 1-17 under 35 U.S.C. 103 is herein maintained.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 8, and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over Rosner et al. (US 2016/0086603 A1, herein “Rosner”) in view of Kim et al. (US 2016/0267913 A1, herein “Kim”).
Regarding claim 1, Rosner teaches a wakeup method of a voice control system, comprising (Rosner fig. 9, Abstract, and para. [0047], a voice activation system having an operational method that includes receiving an audio signal, determining whether one or more wake-up words are present in the audio signal, and, if so, transitioning a speech recognition engine into a fully-operational state (thus waking up)):
collecting voice information (Rosner paras. [0051]-[0052], an audio signal received by a microphone is converted into an electrical signal);
determining that the voice information includes a voice information segment of a voice (Rosner paras. [0054]-[0055], energy characteristics of the received signal are compared to thresholds to determine whether the received audio signal is a voice signal);
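separating the voice information segment of the human voice from the voice information when it is determined that the voice information includes the voice information segment of the human voice (Rosner paras. [0057]-[0058], at least a portion of a profile of the audio signal is compared to a predetermined profile (thus some extraction of a segment from the audio signal));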
performing wakeup word recognition on the voice information segment of the human voice (Rosner paras. [0059]-[0060], determining whether one or more wake-up words are present in the received audio signal, where the received audio signal includes the segment, thus the determining being performed upon the segment); and 
when a wakeup word is recognized, waking up a voice recognition processor (Rosner para. [0060], if wakeup words are determined to be present (are recognized), the speech recognition engine is transitioned to a fully-operational state),
wherein the voice information comprises a plurality of voice information segments collected from different time periods, all the time periods being spliced into a complete and continuous time chain (Rosner paras. [0052]-[0053] and [0055], the received audio signal, which would contain voice information as determined in step 908, is received (thus over time) and converted into a digital (note: discrete-time) signal, where the discrete time periods of the digital signal are the different time periods, and the resultant digital signal is a “spliced” digital audio signal representing the continuous timeframe of the analog signal in discrete time periods).
While Rosner teaches that its voice activation system detects speech and would discern human speech according to a provided profile, Rosner does not explicitly teach human voice.
Kim teaches human voice (Kim fig. 1, paras. [0056]-[0058], device generates a wake-up keyword detection signal by detecting the received speech to match a speech signal of a human user).
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the human voice wake-up keyword models taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Regarding claim 2, Rosner teaches wherein the collecting the voice information comprises:
	detecting the voice information in an analog signal format (Rosner paras. [0052] and [0055], a received audio signal (which would have the voice information as determined in step 908), is converted into an electrical signal (analog signal format) by the microphone); and
digitally converting the voice information in the analog signal format into a digital signal format (Rosner para. [0053], the analog electrical signal is converted into a digital signal using an A/D converter).
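For illustration only, the A/D conversion relied upon for claim 2 (and the reading, applied to claim 1, that discrete-time samples occupy consecutive time periods forming a continuous time chain) can be sketched in Python: a stand-in analog signal is sampled and uniformly quantized, split into consecutive periods, and spliced back together. The 16 kHz sample rate, 16-bit resolution, 440 Hz test tone, 20 ms period length, and all function names are assumptions for the example and are not taken from Rosner.

```python
import math


def sample_and_quantize(analog, duration_s, sample_rate=16000, bits=16):
    """Sample a continuous-time function and quantize each sample to a signed integer.

    `analog` is any callable t -> amplitude in [-1.0, 1.0]; it stands in for the
    microphone's electrical signal (an assumption for illustration).
    """
    max_code = 2 ** (bits - 1) - 1
    n_samples = round(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                    # discrete time instant n * T
        x = max(-1.0, min(1.0, analog(t)))     # clip to the converter's full scale
        samples.append(round(x * max_code))    # uniform quantization
    return samples


def split_into_periods(samples, period_len):
    """Split the sample stream into consecutive, non-overlapping time periods."""
    return [samples[i:i + period_len] for i in range(0, len(samples), period_len)]


def splice(periods):
    """Concatenate consecutive periods back into one continuous sample stream."""
    return [s for period in periods for s in period]


# Hypothetical usage: a 440 Hz tone, 100 ms long, cut into 20 ms periods.
tone = lambda t: 0.5 * math.sin(2 * math.pi * 440 * t)
pcm = sample_and_quantize(tone, duration_s=0.1)
periods = split_into_periods(pcm, period_len=320)  # 320 samples = 20 ms at 16 kHz
```

Splicing the periods reproduces the original sample stream exactly, which is the sense in which the discrete periods together cover the continuous timeframe of the analog input.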
Regarding claim 3, Rosner teaches wherein the performing wakeup word recognition includes: matching the voice information segment of the human voice with the wakeup word voice model (Rosner paras. [0042]-[0043], [0057]-[0058], at least a portion of a profile (time/frequency domain characteristics) of the received audio signal (including the voice information segment after it has been digitized) is compared to at least one predetermined profile as a template of coefficients of known wake-up words (wakeup word voice model)); 
determining that the wakeup word is recognized when the matching succeeds (Rosner paras. [0043]-[0045], and [0056]-[0058], if the received audio signal profile matches to one or more predetermined profiles, then the received audio signal is qualified as including one or more wake-up words); and 
determining that the wakeup word is not recognized when the matching fails (Rosner paras. [0043]-[0045], and [0058], if the received audio signal profile does not match to one or more predetermined profiles, then the received audio signal is not qualified as including one or more wake-up words).
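The match/no-match logic mapped to claim 3 above can be sketched as a comparison of a feature profile against stored templates. This is a minimal sketch assuming cosine similarity over fixed-length coefficient vectors and a hypothetical threshold of 0.9; Rosner describes a template of coefficients of known wake-up words but is not relied upon here for this particular similarity measure, and the template values below are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length coefficient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_wakeup_word(profile, templates, threshold=0.9):
    """Return the best-matching wake-up word, or None when no template matches.

    `templates` maps a wake-up word to its stored coefficient vector (the
    "wakeup word voice model"); `threshold` is a hypothetical tuning value.
    """
    best_word, best_score = None, threshold
    for word, template in templates.items():
        score = cosine_similarity(profile, template)
        if score >= best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical templates; the values are invented for illustration.
templates = {"hello device": [1.0, 0.5, 0.2], "stop": [0.1, 0.9, 0.8]}
print(match_wakeup_word([0.98, 0.52, 0.19], templates))  # hello device
print(match_wakeup_word([0.0, 0.0, 1.0], templates))     # None
```

A returned word corresponds to "matching succeeds" (wakeup word recognized); `None` corresponds to "matching fails."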
While Rosner teaches that predetermined profiles exist in its disclosed voice activation system, Rosner does not explicitly teach how those predetermined profiles come into existence on the system. Therefore, Rosner does not explicitly teach the claimed establishing a wakeup word voice model.
Kim teaches establishing a wakeup word voice model (Kim para. [0108], device registers a plurality of wake-up keyword models based on environment information).
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the registration of wake-up keyword models taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Regarding claim 4, Rosner does not explicitly teach the limitations of claim 4.
Kim teaches wherein the establishing the wakeup word voice model comprises: collecting wakeup voice data of a number of people (Kim para. [0142], the disclosed speech recognition system registers a plurality of wake-up keyword models for each user (thus a number of people), where paras. [0124]-[0127] teach the wake-up keyword models are registered based on the speech signal of the user received); and 
processing and training all the wakeup voice data to obtain the wakeup word voice model (Kim paras. [0080]-[0084], speech signal of the user is used (processed) for registering the wake-up keyword model, where the speech signal of the user is recognized against an acoustic model, and a speed matching rate consideration, and when the speech signal of the user is recognized two or more times (training), the received speech signal of the user is determined to be a valid wake-up keyword model and is registered in the device).
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the registration of wake-up keyword models taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Regarding claim 8, Rosner teaches a co-processor, comprising (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system):
a processing circuit (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system) configured to process collected voice information (Rosner paras. [0051]-[0052], an audio signal received by a microphone is converted into an electrical signal) to determine whether the voice information comprises a voice signal that corresponds to a voice (Rosner paras. [0043], [0054]-[0057], energy characteristics of the received signal are compared to thresholds to determine whether the received audio signal is a voice signal, the audio signal being compared to a time or frequency domain profile, where the profiles represent speech signals for wake-up), and to separate a voice information segment of the human voice when the voice information comprises the voice signal that corresponds to the human voice (Rosner paras. [0057]-[0058], at least a portion of a profile of the audio signal is compared to a predetermined profile (thus some separation of a segment from the audio signal) when the voice is detected to be in the audio signal);
a recognition circuit (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system) configured to perform wakeup word recognition on the voice information segment of the human voice (Rosner paras. [0059]-[0060], determining whether one or more wake-up words are present in the received audio signal, where the received audio signal includes the segment, thus the determining being performed upon the segment), and to generate a wakeup instruction when a wakeup word is recognized (Rosner paras. [0043]-[0045], based on a template match of known wake-up words, a second activation signal is output (wakeup instruction)); and 
a wakeup circuit configured to wake up a voice recognition processor according to the wakeup instruction (Rosner paras. [0047] and [0060], if wakeup words are determined to be present (are recognized), the speech recognition engine is transitioned to a fully-operational state from receiving the second activation signal),
wherein the voice information comprises a plurality of voice information segments collected from different time periods, all the time periods being spliced into a complete and continuous time chain (Rosner paras. [0052]-[0053] and [0055], the received audio signal, which would contain voice information as determined in step 908, is received (thus over time) and converted into a digital (note: discrete-time) signal, where the discrete time periods of the digital signal are the different time periods, and the resultant digital signal is a “spliced” digital audio signal representing the continuous timeframe of the analog signal in discrete time periods).

Kim teaches human voice (Kim fig.1, paras. [0056]-[0058], device generates a wake-up keyword detection signal by detecting the received speech to match a speech signal of a human user).
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the human voice wake-up keyword models taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Regarding claim 13, Rosner teaches a voice control system, comprising (Rosner para. [0029], fig. 2, voice activation system):
a voice collecting assembly (Rosner para. [0029], microphone 202); and
a co-processor (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system); wherein,
the voice collecting assembly is configured to collect voice information (Rosner paras. [0052]-[0053], microphone 202 converts a received sound into an electrical signal);
the co-processor is configured to: (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system)
process the voice information collected by the voice collecting assembly (Rosner paras. [0051]-[0052], an audio signal received by a microphone is converted into an electrical signal) to determine whether the voice information includes a voice information segment of a voice (Rosner paras. [0043], [0054]-[0057], energy characteristics of the received signal are compared to thresholds to determine whether the received audio signal is a voice signal, the audio signal being compared to a time or frequency domain profile, where the profiles represent speech signals for wake-up);
separate the voice information segment of the human voice from the voice information when it is determined that the voice information includes the voice information segment of the human voice (Rosner paras. [0057]-[0058], at least a portion of a profile of the audio signal is compared to a predetermined profile (thus some separation of a segment from the audio signal) when the voice is detected to be in the audio signal);
perform wakeup word recognition on the voice information segment of the human voice (Rosner paras. [0059]-[0060], determining whether one or more wake-up words are present in the received audio signal, where the received audio signal includes the segment, thus the determining being performed upon the segment); and 
wake up a voice recognition assembly when a wakeup word is recognized (Rosner paras. [0047] and [0060], if wakeup words are determined to be present (are recognized), the speech recognition engine (voice recognition assembly) is transitioned to a fully-operational state (wake up) from receiving the second activation signal),
wherein the voice information comprises a plurality of voice information segments collected from different time periods, all the time periods being spliced into a complete and continuous time chain (Rosner paras. [0052]-[0053] and [0055], the received audio signal, which would contain voice information as determined in step 908, is received (thus over time) and converted into a digital (note: discrete-time) signal, where the discrete time periods of the digital signal are the different time periods, and the resultant digital signal is a “spliced” digital audio signal representing the continuous timeframe of the analog signal in discrete time periods).
While Rosner teaches that its voice activation system detects speech and would discern human speech according to a provided profile, Rosner does not explicitly teach human voice.
Kim teaches human voice (Kim fig. 1, paras. [0056]-[0058], device generates a wake-up keyword detection signal by detecting the received speech to match a speech signal of a human user).
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the human voice wake-up keyword models taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Regarding claim 14, Rosner teaches wherein the voice collecting assembly comprises a voice collecting module and an A/D conversion module (Rosner paras. [0052]-[0053], microphone and A/D converter 204); wherein
the voice collecting module is configured to collect the voice information in an analog signal format (Rosner paras. [0052] and [0055], a received audio signal (which would have the voice information as determined in step 908), is converted into an electrical signal (analog signal format) by the microphone); and
the A/D conversion module is configured to digitally convert the voice information in the analog signal format into a digital signal format (Rosner para. [0053], the analog electrical signal is converted into a digital signal using an A/D converter).
Regarding claim 15, Rosner teaches wherein the voice recognition assembly is connected to the co-processor (Rosner fig. 2, paras. [0061]-[0062], as shown, the microphone and A/D are connected to the stages 1-3 and control module, where hardware, software or any combination may embody the components in fig. 2, including multiprocessor systems);
the voice recognition assembly is configured to perform a voice recognition in a working-activated state and to enter a non-working dormant state after having performed the voice recognition (Rosner fig. 10, paras. [0018] and [0047], state diagram illustrating an operation complete transition from fully-operational state to stand by state); and 
a transition from the non-working dormant state to the working-activated state of the voice recognition assembly is triggered by the waking up by the co-processor (Rosner fig. 10, para. [0047], stand by state 1002 transitions to wake-up word determination state 1004 once the second activation signal is received, then fully-operational state 1006 once wake-up word is detected).
Regarding claim 16, Rosner teaches wherein the voice recognition assembly enters a waiting state before a transition from the working-activated state to the non-working dormant state (Rosner para. [0047], once the speech recognition engine enters fully-operational state 1006, it remains in this state until a specified function is complete and a predetermined amount of time has passed (waiting state)); and 
during a set time period, the voice recognition assembly enters the non-working dormant state when the voice recognition assembly is not woken up (Rosner paras. [0047] and [0064]-[0065], if one or more wake-up words are not present within the received audio signal, the speech recognition engine is transitioned back to standby state 1002, where this transition will take a set amount of time according to the speed of the processor/computer architecture set up which is processing the states); and 
the voice recognition assembly enters the working-activated state when the voice recognition assembly is woken up (Rosner paras. [0047]-[0049] and [0059], fig. 10, the speech recognition engine enters the fully-operational state when one or more wake-up words are detected in the received audio signal and a control signal is output to transition into this state (woken up)).
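The state transitions of Rosner fig. 10 relied upon for claims 15 and 16 can be modeled as a small state machine. The sketch below is illustrative only: the class and method names and the default 5-second waiting period are hypothetical, and time is passed in explicitly rather than read from a clock so that the behavior is deterministic.

```python
from enum import Enum, auto


class State(Enum):
    STANDBY = auto()               # non-working dormant state (Rosner state 1002)
    WAKEUP_DETERMINATION = auto()  # checking for wake-up words (Rosner state 1004)
    FULLY_OPERATIONAL = auto()     # working-activated state (Rosner state 1006)


class RecognitionEngine:
    """Toy model of the transitions; timeout_s is a hypothetical waiting period."""

    def __init__(self, timeout_s=5.0):
        self.state = State.STANDBY
        self.timeout_s = timeout_s
        self._idle_since = None

    def on_activation_signal(self):
        # An upstream activation signal moves the engine out of standby.
        if self.state is State.STANDBY:
            self.state = State.WAKEUP_DETERMINATION

    def on_wakeup_decision(self, wakeup_word_present):
        # Woken up if a wake-up word is present; otherwise back to standby.
        if self.state is State.WAKEUP_DETERMINATION:
            self.state = (State.FULLY_OPERATIONAL if wakeup_word_present
                          else State.STANDBY)

    def on_function_complete(self, now):
        # Start the waiting period instead of dropping straight to standby.
        if self.state is State.FULLY_OPERATIONAL:
            self._idle_since = now

    def tick(self, now):
        # Return to standby once the waiting period has fully elapsed.
        if (self.state is State.FULLY_OPERATIONAL and self._idle_since is not None
                and now - self._idle_since >= self.timeout_s):
            self.state = State.STANDBY
            self._idle_since = None
```

In this sketch the "waiting state" of claim 16 is the interval between `on_function_complete` and the `tick` at which the timeout expires.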
Regarding claim 17, Rosner does not explicitly teach the limitations of claim 17.
Kim teaches wherein the voice control system is connected to an electrical appliance of an intelligent electrical appliance (Kim paras. [0050]-[0053], [0075], wake-up speech recognition function of the device including executing an application set in the device such as a smart TV, or smart refrigerator).
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the connection to a smart device as taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Rosner in view of Kim, as set forth above regarding claim 4, further in view of Mitchell, (US 2015/0106095 A1, herein “Mitchell”), further in view of Lee et al., "A voice trigger system using keyword and speaker recognition for mobile devices," in IEEE Transactions on Consumer Electronics, vol. 55, no. 4, pp. 2377-2384, November 2009, doi: 10.1109/TCE.2009.5373813 (herein “Lee NPL”).
Regarding claim 5, Rosner teaches wherein the wakeup word recognition comprises (Rosner paras. [0054]-[0060], process to determine whether one or more wake-up words are present in the received audio signal), and determine whether the wakeup word is recognized (Rosner para. [0060], determine whether one or more wake-up words are present in the received audio signal), but does not explicitly teach the remainder of the limitations of claim 5.
Kim teaches wherein the establishing the wakeup word voice model comprises: in an off-line state, collecting wakeup words recorded by a speaker in different environments (Kim fig. 5A, paras. [0088], [0092], [0108], [0125]-[0126], in a wake-up keyword model registration method/mode (thus off-line with respect to actually performing a wake-up processing) the server registers a plurality of wake-up keyword models based on various different environmental information such as office, home, weather of the environment).
Kim also teaches processing on the wakeup words (Kim paras. [0068] and [0070], speech recognition server generates the wake-up keyword model and extracts speech characteristics from received speech signal of the user), parameters from the wakeup words after the wakeup words have been processed (Kim paras. [0068] and [0070], speech recognition server generates the wake-up keyword model and extracts speech characteristics from received speech signal of the user, where para. [0081] teaches an acoustic model is used to recognize the speech for the registering of the wake-up keyword model thus acoustic parameters of the wakeup words), to obtain the wakeup word voice model (Kim para. [0083], device determines the received speech signal is valid as the wake-up keyword model), storing the wakeup word voice model (Kim para. [0084], wake-up keyword model is generated and stored in the device). 
Mitchell teaches performing framing processing (Mitchell paras. [0061], [0064]-[0067], system flow for generating a Markov model for sound classification including spectral decomposition by first processing the sound signal in windowing units from a frame);
extracting characteristic parameters after the processed by the framing processing (Mitchell paras. [0064]-[0067], spectral coefficients are determined for each frame, and a normalized time frequency matrix is determined therefrom);
adjusting a model parameter of the Hidden Markov HMM model by Baum-Welch algorithm to maximize P(σ|λ), wherein λ is the model parameter, σ is the observation state (Mitchell paras. [0067]-[0068], generating the model parameters of HMM by maximizing the probability of an observation sequence using the Baum-Welch algorithm, which iterates upon λ (characterization of the Markov model, thus a model parameter), until convergence of an expectation and a maximization);
adjusting the model parameter λ to obtain a maximal probability of the observation state σ (Mitchell para. [0068], in the maximization calculation, λ is recalculated at each step (adjusting) until convergence (when the maximal probability is reached));
completing model training to obtain the model (Mitchell paras. [0067]-[0070], convergence of the Baum-Welch algorithm produces the Markov model parameters, thus completing the training to obtain the Markov model); and
extracting characteristic parameters for voice frames in the voice information segment of the human voice to obtain a set of new observation values σ’ as a new observation state (Mitchell para. [0075] and claim 1, classification of a new sound input, with the first part of the process operating the same as previously disclosed, where paras. [0064]-[0067] teach spectral coefficients are determined for each frame (characteristic parameters), and a normalized time frequency matrix is determined therefrom for an observation sequence defining an observation state path, and where paras. [0079]-[0080] teach that the classification model can be for an audio feed from a CCTV camera or a baby monitor, both of which are systems that would input human voice sounds);
calculating P(σ’|λ) (Mitchell para. [0075], a forward algorithm is used to determine the most likely state path of an observation sequence (for a new input sound) and produce a probability (against the defined model λ, in view of the observation sequence, thus P(σ’|λ)) in terms of a log likelihood that classifies the incoming signal); and
comparing P(σ’|λ) with a confidence threshold to determine whether is recognized (Mitchell paras. [0025], [0017], [0079], threshold values used to classify the audio inputs (i.e. whether a particular sound for which the classifier has been trained, is recognized), where the output probability is greater than (comparing) the confidence threshold for the system to make the classification).
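For illustration, the Baum-Welch training (maximizing P(σ|λ)) and forward-algorithm scoring (computing P(σ’|λ) and comparing it with a threshold) described for Mitchell can be sketched for a discrete-emission HMM λ = (π, A, B) over integer observation symbols. This is a minimal sketch under stated assumptions: discrete emissions (Mitchell's model may use different observation densities), a single training sequence, Rabiner-style scaling to avoid numerical underflow, and a hypothetical log-likelihood threshold; the function names are illustrative and are not drawn from Mitchell.

```python
import math


def forward_scaled(obs, pi, a, b):
    """Scaled forward pass: normalized alphas, scale factors c_t, log P(obs | lambda)."""
    n = len(pi)
    alphas, scales = [], []
    for t, o in enumerate(obs):
        if t == 0:
            cur = [pi[i] * b[i][o] for i in range(n)]
        else:
            prev = alphas[-1]
            cur = [sum(prev[i] * a[i][j] for i in range(n)) * b[j][o] for j in range(n)]
        c = sum(cur)
        if c == 0.0:  # the sequence is impossible under this model
            return alphas, scales, -math.inf
        alphas.append([x / c for x in cur])
        scales.append(c)
    return alphas, scales, sum(math.log(c) for c in scales)


def backward_scaled(obs, a, b, scales):
    """Scaled backward pass reusing the forward scale factors."""
    n, T = len(a), len(obs)
    betas = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o_next = obs[t + 1]
        betas[t] = [sum(a[i][j] * b[j][o_next] * betas[t + 1][j] for j in range(n))
                    / scales[t + 1] for i in range(n)]
    return betas


def baum_welch_step(obs, pi, a, b):
    """One EM re-estimation of lambda; returns (new lambda, log P(obs | old lambda))."""
    n, m, T = len(pi), len(b[0]), len(obs)
    alphas, scales, log_p = forward_scaled(obs, pi, a, b)
    if log_p == -math.inf:
        raise ValueError("observation sequence has zero probability under lambda")
    betas = backward_scaled(obs, a, b, scales)
    # gamma[t][i]: posterior probability of being in state i at time t
    gamma = [[alphas[t][i] * betas[t][i] for i in range(n)] for t in range(T)]
    new_pi = list(gamma[0])
    new_a = []
    for i in range(n):
        occupancy = sum(gamma[t][i] for t in range(T - 1))
        row = []
        for j in range(n):
            # expected number of i -> j transitions (sum over t of xi[t][i][j])
            expected = sum(alphas[t][i] * a[i][j] * b[j][obs[t + 1]] * betas[t + 1][j]
                           / scales[t + 1] for t in range(T - 1))
            row.append(expected / occupancy if occupancy > 0 else a[i][j])
        new_a.append(row)
    new_b = []
    for j in range(n):
        occupancy = sum(gamma[t][j] for t in range(T))
        row = []
        for k in range(m):
            emitted = sum(gamma[t][j] for t in range(T) if obs[t] == k)
            row.append(emitted / occupancy if occupancy > 0 else b[j][k])
        new_b.append(row)
    return (new_pi, new_a, new_b), log_p


def train(obs, pi, a, b, max_iter=100, tol=1e-6):
    """Iterate Baum-Welch until log P(obs | lambda) improves by less than tol."""
    model, prev_log_p = (pi, a, b), -math.inf
    for _ in range(max_iter):
        new_model, log_p = baum_welch_step(obs, *model)
        if log_p - prev_log_p < tol:
            break
        model, prev_log_p = new_model, log_p
    return model


def is_recognized(obs, model, log_threshold):
    """Compare log P(sigma' | lambda) for new observations with a confidence threshold."""
    return forward_scaled(obs, *model)[2] >= log_threshold
```

Because each Baum-Welch step cannot decrease P(σ|λ), stopping once the log-likelihood gain falls below `tol` corresponds to the "adjust λ until the probability is maximal" loop, here reaching a local rather than guaranteed global maximum.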
Lee NPL teaches clustering the characteristic parameters to obtain clustered characteristic parameters; establishing an observation state of Hidden Markov HMM model on the clustered characteristic parameters (Lee NPL, section III, keyword model section, the top N Gaussians with the largest Sk,t from all the clusters of the table are assigned to a state of the keyword HMM).
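The framing, characteristic-parameter extraction, and clustering steps recited in claim 5 can be illustrated with a short sketch that turns frames of samples into discrete observation symbols suitable for an HMM. Assumptions: 400-sample frames with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz), two toy features per frame (log energy and zero-crossing rate), and plain k-means seeded with the first k vectors. Lee NPL assigns Gaussians drawn from clusters to HMM states; k-means is swapped in here as a simpler clustering technique, and all function names are hypothetical.

```python
import math


def frame_signal(samples, frame_len=400, hop=160):
    """Split samples into overlapping frames (frame and hop sizes are assumed)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_features(frame):
    """Two toy characteristic parameters per frame: log energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for p, q in zip(frame, frame[1:]) if (p >= 0) != (q >= 0))
    return [math.log(energy + 1e-10), crossings / (len(frame) - 1)]


def kmeans(points, k, iters=20):
    """Plain k-means; centroids are seeded with the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def quantize(points, centroids):
    """Map each feature vector to the index of its nearest centroid (observation symbol)."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]
```

A typical chain would be `features = [frame_features(f) for f in frame_signal(samples)]` followed by `symbols = quantize(features, kmeans(features, k=8))`, where `k=8` is an arbitrary illustrative choice.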

Further, taking the teachings of Rosner and Mitchell together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the sound classifying models taught by Mitchell at least because doing so would provide a sound model resilient to changes in audio sampling rate, use of compression and input of relatively poor quality sound data (Mitchell para. [0014]).
Still further, taking the teachings of Rosner and Lee NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the HMM generation using clustering as taught by Lee NPL at least because doing so would produce a speaker-independent keyword model and reduce verification error (Lee NPL, page 2380, and page 2381, last line, through the first two lines of page 2382).
Claims 6-7, 9, 10, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Rosner in view of Kim, as set forth above regarding claim 1, or claim 8 from which claims 6-7, 9, 10 and 11 respectively depend, further in view of Gotanda et al., (US 7,315,816 B2, herein “Gotanda”).
Regarding claim 6, Rosner teaches wherein the determining that the voice information including the voice information segment of the human voice comprises: determining that the voice signal corresponds to the human voice when the voice signal has an energy level that exceeds an energy threshold (Rosner fig. 4, paras. [0038], [0054]-[0055], energy characteristics of the received signal are compared to predetermined thresholds to determine whether the received audio signal is a voice signal).
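Rosner's energy-based determination can be sketched as a frame-level energy comparison that also yields the above-threshold spans as candidate voice segments. This is a minimal sketch assuming mean-square energy per frame and a hypothetical fixed threshold; Rosner's predetermined threshold values are not reproduced here, and the function names are illustrative.

```python
def frame_energy(frame):
    """Mean-square energy of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def voiced_segments(frames, energy_threshold):
    """Return (start, end) frame-index ranges whose energy exceeds the threshold.

    `end` is exclusive. `energy_threshold` is a hypothetical tuning parameter.
    """
    segments, start = [], None
    for idx, frame in enumerate(frames):
        active = frame_energy(frame) > energy_threshold
        if active and start is None:
            start = idx                      # a voiced run begins
        elif not active and start is not None:
            segments.append((start, idx))    # the voiced run ends
            start = None
    if start is not None:                    # run continues to the last frame
        segments.append((start, len(frames)))
    return segments
```

For frames with energies 0.0001, 0.25, 0.36, 0.0 and 0.81 and a threshold of 0.1, the function reports frames 1-2 and frame 4 as voice segments.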
Rosner does not teach the balance of the limitations of claim 6.
Gotanda teaches performing blind-source separation processing on the voice information in a digital signal format so as to separate a voice signal having the largest non-Gaussianity value (Gotanda col. 1, lines 37-44, col. 11, lines 1-3, col. 19, lines 42-56, Independent Component Analysis (ICA) separates target speech from observed mixed signals without information on the transmission paths (thus blind-source), the input signals first being digitized by A/D converters, where a FastICA method sequentially separates signals from the mixed signals in descending order of non-Gaussianity, such that the speaker’s speech is output first because it has the highest non-Gaussianity);
voice signal of the largest non-Gaussianity value (Gotanda col. 19, lines 42-56, the first output from the FastICA having the highest non-Gaussianity value being the speaker’s speech (voice signal)); and
Gotanda col. 19, lines 42-56, the first output from the FastICA which separates the input audio signal, being the speaker’s speech signal in separated signal UA).
Therefore, taking the teachings of Rosner and Gotanda together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the FastICA signal separation as taught by Gotanda at least because doing so would allow for separation of target speech without using a priori information on the locations of the target speech and noise (Gotanda col. 27, lines 35-40), and as well, Gotanda characterizes the ICA process to be “known” and “a useful method,” thus, modifying Rosner by Gotanda would also have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention as being use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2143(I)(C).
Regarding claim 7, Rosner does not explicitly teach the limitations of claim 7.
Gotanda teaches wherein a method used for the blind-source separation is an independent component analysis ICA algorithm based on one or more of negative entropy maximization, 4th-order kurtosis, or time-frequency transformation (Gotanda col. 19, line 42 – col. 20, line 21, and col. 11, line 52 – col. 12, line 15, where the claim only requires “one or more of,” a FastICA method that uses a Fourier transform to convert the input mixed signals into the frequency domain for processing, and then, after processing, back into the time domain by way of an IFFT (thus time-frequency transformation)).
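Therefore, taking the teachings of Rosner and Gotanda together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the FastICA signal separation as taught by Gotanda at least because doing so would allow for separation of target speech without using a priori information on the locations of the target speech and noise (Gotanda col. 27, lines 35-40), and as well, Gotanda characterizes the ICA process to be “known” and “a useful method,” thus, modifying Rosner by Gotanda would also have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention as being use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2143(I)(C).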
Regarding claim 9, Rosner teaches wherein the processing circuit comprises a separating circuit and a determining circuit (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system); the determining circuit is configured to determine whether the voice signal corresponds to the human voice through an energy threshold, when the energy threshold is exceeded (Rosner fig. 4, paras. [0038], [0054]-[0055], energy characteristics of the received signal are compared to predetermined thresholds to determine whether the received audio signal is a voice signal).
Rosner does not teach the balance of the limitations of claim 9.
Gotanda col. 1, lines 37-44, col. 11, lines 1-3, col. 19, lines 42-56, Independent Component Analysis (ICA) to separate target speech from observed mixed signals without information on the transmission paths (thus blind-source), the input signals being first digitized by A/D converters, where a FastICA method sequentially separates signals from the mixed signals in descending order of non-Gaussianity, where the speaker’s speech will be output first for having the highest non-Gaussianity); and
to separate the voice signal that corresponds to the human voice so as to obtain the voice information segment of the human voice (Gotanda col. 19, lines 42-56, the first output from the FastICA which separates the input audio signal, being the speaker’s speech signal in separated signal UA).
Therefore, taking the teachings of Rosner and Gotanda together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the FastICA signal separation as taught by Gotanda at least because doing so would allow for separation of target speech without using a priori information on the locations of the target speech and noise (Gotanda col. 27, lines 35-40), and as well, Gotanda characterizes the ICA process to be “known” and “a useful method,” thus, modifying Rosner by Gotanda would also have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention as being use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2143(I)(C).
Regarding claim 10, Rosner teaches wherein the recognition circuit comprises a recognition circuit and a storage circuit (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system); wherein 
the storage circuit is configured to store a wakeup word voice model (Rosner para. [0043], template matching module matches against one or more profiles (wakeup word voice model) thus at least in the processing of the template matching module is stored a wakeup word voice model); and
the recognition circuit is configured to perform wakeup word matching on the voice information segment of the human voice with the wakeup word voice model (Rosner paras. [0042]-[0043], [0057]-[0058], at least a portion of a profile (time/frequency domain characteristics) of the received audio signal (including the voice information segment after it has been digitized) is compared to at least one predetermined profile as a template of coefficients of known wake-up words (wakeup word voice model)), and to generate a wakeup instruction when the matching succeeds (Rosner para. [0045], second activation signal is output when the received audio signal is qualified as including one or more wakeup words).
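As a non-authoritative illustration of the storage/recognition split mapped for claim 10, the sketch below stores wake-up word templates as coefficient vectors and reports a match when an incoming profile is close enough to a stored template. Cosine similarity, the 0.9 threshold, and the class and method names are assumptions made for illustration; Rosner's actual profile comparison is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two coefficient vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class WakeupWordMatcher:
    """Hypothetical storage + recognition pair: stores templates, reports a wake-up match."""

    def __init__(self, threshold=0.9):
        self.templates = {}          # storage: wake-up word -> template coefficients
        self.threshold = threshold   # hypothetical acceptance threshold

    def store(self, word, coefficients):
        self.templates[word] = list(coefficients)

    def match(self, profile):
        """Return the best-matching wake-up word, or None if no template is close enough."""
        best_word, best_score = None, -1.0
        for word, template in self.templates.items():
            score = cosine_similarity(profile, template)
            if score > best_score:
                best_word, best_score = word, score
        return best_word if best_score >= self.threshold else None
```

In this sketch, a non-None return from `match` plays the role of the wakeup instruction; None means no wake-up is triggered.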
Regarding claim 11, Rosner does not explicitly teach the limitations of claim 11.
Kim para. [0190], processor that performs the operation of the speech recognition server as described): collecting wakeup voice data of a number of people (Kim para. [0142], the disclosed speech recognition system registers a plurality of wake-up keyword models for each user (thus a number of people), where paras. [0124]-[0127] teach the wake-up keyword models are registered based on the speech signal of the user received); and 
processing and training all the wakeup voice data to obtain the wakeup word voice model (Kim paras. [0080]-[0084], speech signal of the user is used (processed) for registering the wake-up keyword model, where the speech signal of the user is recognized against an acoustic model, and a speed matching rate consideration, and when the speech signal of the user is recognized two or more times (training), the received speech signal of the user is determined to be a valid wake-up keyword model and is registered in the device).
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the registration of wake-up keyword models taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Rosner in view of Kim, further in view of Gotanda, as set forth above regarding claim 11 from which claim 12 depends, further in view of Mitchell (US 2015/0106095 A1, herein “Mitchell”), and further in view of Lee et al., "A voice trigger system using keyword and speaker recognition for mobile devices," in IEEE Transactions on Consumer Electronics, vol. 55, no. 4, pp. 2377-2384, November 2009, doi: 10.1109/TCE.2009.5373813 (herein “Lee NPL”).
Regarding claim 12, Rosner teaches wherein the wakeup word recognition comprises (Rosner paras. [0054]-[0060], process to determine whether one or more wake-up words are present in the received audio signal), and determine whether the wakeup word is recognized (Rosner para. [0060], determine whether one or more wake-up words are present in the received audio signal), but does not explicitly teach the remainder of the limitations of claim 12.
Kim teaches wherein the establishing the wakeup word voice model comprises: in an off-line state, collecting wakeup words recorded by a speaker in different environments (Kim fig. 5A, paras. [0088], [0092], [0108], [0125]-[0126], in a wake-up keyword model registration method/mode (thus off-line with respect to actually performing a wake-up processing) the server registers a plurality of wake-up keyword models based on various different environmental information such as office, home, weather of the environment).
Kim also teaches processing on the wakeup words (Kim paras. [0068] and [0070], speech recognition server generates the wake-up keyword model and extracts speech characteristics from received speech signal of the user), parameters from the wakeup words after the wakeup words have been processed (Kim paras. [0068] and [0070], speech recognition server generates the wake-up keyword model and extracts speech characteristics from received speech signal of the user, where para. [0081] teaches an acoustic model is used to recognize the speech for the registering of the wake-up keyword model thus acoustic parameters of the wakeup words), to obtain the wakeup word voice model (Kim para. [0083], device determines the received speech signal is valid as the wake-up keyword model), storing the wakeup word voice model (Kim para. [0084], wake-up keyword model is generated and stored in the device). 
Mitchell teaches performing framing processing (Mitchell paras. [0061], [0064]-[0067], system flow for generating a Markov model for sound classification including spectral decomposition by first processing the sound signal in windowing units from a frame);
extracting characteristic parameters after the processed by the framing processing (Mitchell paras. [0064]-[0067], spectral coefficients are determined for each frame, and a normalized time frequency matrix is determined therefrom);
adjusting a model parameter of the Hidden Markov HMM model by Baum-Welch algorithm to maximize P(σ|λ), wherein λ is the model parameter, σ is the observation state (Mitchell paras. [0067]-[0068], generating the model parameters of HMM by maximizing the probability of an observation sequence using the Baum-Welch algorithm, which iterates upon λ (characterization of the Markov model, thus a model parameter), until convergence of an expectation and a maximization);
adjusting the model parameter λ to obtain a maximal probability of the observation state σ (Mitchell para. [0068], in the maximization calculation, λ is recalculated at each step (adjusting) until convergence (when the maximal probability is reached));
Mitchell paras. [0067]-[0070], convergence of the Baum-Welch algorithm produces the Markov model parameter, thus completing the training to obtain the Markov model); and
extracting characteristic parameters for voice frames in the voice information segment of the human voice to obtain a set of new observation values σ’ as a new observation state (Mitchell paras. [0075], claim 1, classification of new sound input process including the first part operating as the same as previously disclosed, where paras. [0064]-[0067] teach spectral coefficients are determined for each frame (characteristic parameters), and a normalized time frequency matrix is determined therefrom for an observation sequence defining an observation state path, where paras. [0079]-[0080] teach that the classification model can be for an audio feed from a CCTV camera, or a baby monitor, both systems which would be inputting human voice sounds);
calculating P(σ’|λ) (Mitchell para. [0075], a forward algorithm is used to determine the most likely state path of an observation sequence (for a new input sound) and produce a probability (against the defined model λ, in view of the observation sequence, thus P(σ’|λ)) in terms of a log likelihood that classifies the incoming signal); and
comparing P(σ’|λ) with a confidence threshold to determine whether the wakeup word is recognized (Mitchell paras. [0017], [0025], [0079], threshold values used to classify the audio inputs (i.e., whether a particular sound for which the classifier has been trained is recognized), where the output probability is greater than (comparing) the confidence threshold for the system to make the classification).
Lee NPL, section III, keyword model section, the top N Gaussians with the largest Sk,t from all the clusters of the table are assigned to a state of the keyword HMM).
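For illustration only, the training and recognition steps mapped above (Baum-Welch re-estimation of λ to increase P(σ|λ), then forward-algorithm scoring of a new observation sequence σ’ against a confidence threshold) can be sketched as follows. This minimal sketch assumes a discrete-symbol HMM trained on a single sequence, whereas Mitchell operates on spectral feature frames; the small emission floor, the per-frame log-likelihood threshold, and all function names are hypothetical choices for the example, not taken from Mitchell or from the claims.

```python
import math


def _scaled_forward(obs, pi, A, B):
    """Forward pass normalised at each step; returns scaled alphas and scale factors c_t."""
    n = len(pi)
    alphas, scales = [], []
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                     for i in range(n)]
        c = sum(alpha)
        alpha = [a / c for a in alpha]
        alphas.append(alpha)
        scales.append(c)
    return alphas, scales


def forward_log_prob(obs, pi, A, B):
    """log P(obs | lambda), lambda = (pi, A, B), as the sum of log scale factors."""
    return sum(math.log(c) for c in _scaled_forward(obs, pi, A, B)[1])


def baum_welch(obs, pi, A, B, n_iter=50, floor=1e-3):
    """Re-estimate lambda = (pi, A, B) on one discrete sequence to increase P(obs | lambda)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    for _ in range(n_iter):
        alphas, scales = _scaled_forward(obs, pi, A, B)
        # Scaled backward pass.
        betas = [[1.0] * n for _ in range(T)]
        for t in range(T - 2, -1, -1):
            betas[t] = [sum(A[i][j] * B[j][obs[t + 1]] * betas[t + 1][j] for j in range(n))
                        / scales[t + 1] for i in range(n)]
        # State posteriors gamma_t(i) and summed transition posteriors xi(i, j).
        gamma = [[alphas[t][i] * betas[t][i] for i in range(n)] for t in range(T)]
        xi = [[sum(alphas[t][i] * A[i][j] * B[j][obs[t + 1]] * betas[t + 1][j] / scales[t + 1]
                   for t in range(T - 1)) for j in range(n)] for i in range(n)]
        # M-step; the emission floor (an assumption of this sketch) avoids zero probabilities.
        pi = gamma[0][:]
        A = [[xi[i][j] / sum(xi[i]) for j in range(n)] for i in range(n)]
        B = [[(sum(g[i] for g, o in zip(gamma, obs) if o == k) + floor)
              / (sum(g[i] for g in gamma) + m * floor) for k in range(m)] for i in range(n)]
    return pi, A, B


def is_recognized(obs, model, threshold_per_frame):
    """Compare the per-frame log P(obs' | lambda) with a (hypothetical) confidence threshold."""
    return forward_log_prob(obs, *model) / len(obs) > threshold_per_frame
```

Normalizing the log-likelihood by the number of frames, as assumed here, lets a single threshold be applied to observation sequences of different lengths.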
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the registration of wake-up keyword models taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Further, taking the teachings of Rosner and Mitchell together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the sound classifying models taught by Mitchell at least because doing so would provide a sound model resilient to changes in audio sampling rate, use of compression and input of relatively poor quality sound data (Mitchell para. [0014]).
Still further, taking the teachings of Rosner and Lee NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the HMM generation using clustering as taught by Lee NPL at least because doing so would produce a speaker-independent keyword model and reduce a verification error (Lee NPL page 2380 and page 2381, last line, to first two lines of page 2382).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908.  The examiner can normally be reached on M-Th, and every other Friday, 9:30a-7p.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656