DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination (herein “RCE”) under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on January 25, 2022, has been entered.

Response to Arguments
Applicant's amendments and arguments filed in the Amendment with RCE filed January 25, 2022, have been fully considered but they are not persuasive. In the amendments to independent claims 1, 8, and 13, limitations from prior claims 3-5 and 9-12 were included, as well as additional new limitations. Further, the limitation in claims 5 and 12 which recited “clustering the characteristic parameters to obtain clustered characteristic parameters” was amended into claims 1, 8 and 13 with the additional limitation “Gaussian Mixture Model,” so that the amended limitation now recites “clustering the characteristic parameters through Gaussian Mixture Models to obtain clustered characteristic parameters.”
Applicant sets forth on page 12 that Rosner, Kim and Min fail to teach or suggest “clustering the characteristic parameters through Gaussian Mixture Models to obtain clustered characteristic parameters; establishing an observation state of Hidden Markov HMM model on the clustered characteristic parameters.” Applicant contends that in one embodiment of the Application, “the power level of the clustering step performed through GMM is smaller than the power level of the establishing an observation state of HMM model on the clustered characteristic parameters.” Upon updated search and consideration of the amended claims, the Examiner found that, to the extent the present Application discloses differing power levels for the clustering step versus the step of establishing an observation state of the HMM model, such disclosure would distinguish over the cited art of record. However, these limitations are not present in the current claim limitations of at least independent claims 1 and 13.
To expedite prosecution, Examiner Koeth contacted Attorney for Applicants, Mr. Hyun Kyu (Nathan) Lee, on February 14, 2022, to discuss a potential amendment to include the subject matter discussed on page 12 of the Amendment remarks, and also requested the corresponding paragraphs of the present application that support such limitations. The record of this interview is attached to this Action. No agreement was reached regarding allowability. However, Applicants are still invited to provide additional limitations to the effect of the subject matter presented on page 12 of the Amendment, and to identify in their remarks the portions of the Specification that support the additional limitations.
Because claims 1 and 13, and claims depending therefrom, as submitted in the Amendment do not distinguish over the cited art of record, the claims remain rejected as detailed below, with the rejection rationale and citations to prior art having been updated for the newly amended limitations. Accordingly, while all of Applicant’s arguments and amendments have been fully considered, they are not persuasive. 
Regarding claim 8, given that claim 8 does recite limitations directed towards the subject matter disclosed in remarks on page 12 of the Amendment, claim 8 is only rejected under 112(a) written description, and 112(b) indefiniteness as detailed below.

Claim Objections
Claims 1 and 13, and claims 2, 6-7, and 14-18 which depend therefrom, are objected to because of the following informalities:  claims 1 and 13 both repeat twice “wherein the establishing the wakeup word voice model comprises,” however, such recitations are redundant when this limitation need only be recited once. For better readability of the claims, it is recommended to simply recite “wherein the establishing the wakeup word voice model comprises,” once and include all the sub-limitations under the same wherein clause. Appropriate correction is required.
Claims 1 and 13, and claims 2, 6-7, and 14-18 which depend therefrom, are further objected to because of the following informalities: both claims 1 and 13 recite the following limitations: “wherein the performing wakeup word recognition includes,” and “wherein the wakeup word recognition comprises,” which appear to refer to the following antecedent basis limitation: “performing ... wakeup word recognition.” Therefore, the two “wherein” limitations given above are redundant, and also do not recite the reference to . Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 8 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Specifically, claim 8 recites additional power levels distinguishing the clustering step from the establishing step. At least para. 82 of the PgPub of the present Application discloses the co-processor as having low power consumption; however, distinctions between operations of the co-processor and operations of another, presumably higher power level, processor (different processor? same co-processor?) were not found in the Specification upon review by the Examiner.
Claim 8 is also rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention. Because support from the Specification cannot be found for the claimed “clustering the characteristic parameters through Gaussian Mixture Models to obtain clustered characteristic parameters at a third power level of the voice control system; 


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
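3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.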

Claims 1-2 are rejected under 35 U.S.C. 103 as being unpatentable over Rosner et al., (US 2016/0086603 A1, herein “Rosner”), in view of Kim et al., (US 2016/0267913 A1, herein “Kim”), further in view of Min et al., “Implementation of FastICA on DSP for Blind Source Separation,” 2012 International Workshop on Information and Electronics Engineering (IWIEE), Procedia Engineering 29 (2012) pp 4228-4233 (herein “Min NPL”) further in view of Mitchell, (US 2015/0106095 A1, herein “Mitchell”), further in view of Lee et al., "A voice trigger system using keyword and speaker recognition for mobile devices," in IEEE Transactions on Consumer Electronics, vol. 55, no. 4, pp. 2377-2384, November 2009, doi: 10.1109/TCE.2009.5373813 (herein “Lee NPL”).
Regarding claim 1, Rosner teaches a wakeup method of a voice control system, comprising (Rosner fig. 9, Abstract, and para. [0047], voice activation system which has an operational method including receiving an audio signal and determining whether one or more wake-up words are present in the audio signal, and if so, transitioning a speech recognition engine into a fully-operational state (thus waking up)):
collecting voice information (Rosner paras. [0051]-[0052], an audio signal received by a microphone is converted into an electrical signal) by a co-processor (Rosner paras. [0061] and [0065], computer system including a processor to embody the module, procedures and components in figs 2, 5, and 7-10);
determining, by the co-processor (Rosner paras. [0061] and [0065], computer system including a processor to embody the module, procedures and components in figs 2, 5, and 7-10), that the voice information includes a voice information segment of a voice (Rosner paras. [0054]-[0055], energy characteristics of the received signal are compared to thresholds to determine whether the received audio signal is a voice signal);
extracting, by the co-processor (Rosner paras. [0061] and [0065], computer system including a processor to embody the module, procedures and components in figs 2, 5, and 7-10), the voice information segment of the human voice from the voice information (Rosner paras. [0057]-[0058], at least a portion of a profile of the audio signal is compared to a predetermined profile (thus some extraction of a segment from the audio signal)) including the wakeup word and the non-wakeup word (Rosner para. [0026], the received audio signal containing specific words that are identified to transition to a fully-operational state (thus wakeup and thus the specific words being wakeup words) so that in the fully-operational state, the full vocabulary of words is recognized (non-wakeup word));
performing, by the co-processor (Rosner paras. [0061] and [0065], computer system including a processor to embody the module, procedures and components in figs 2, 5, and 7-10), wakeup word recognition on the voice information segment of the human voice to recognize the wakeup word included in the voice information (Rosner paras. [0059]-[0060], determining whether one or more wake-up words are present in the received audio signal, where the received audio signal includes the segment, thus the determining being performed upon the segment, and if the wake-up word is present (to recognize) then step 930 is executed); and 
Rosner paras. [0061] and [0065], computer system including a processor to embody the module, procedures and components in figs 2, 5, and 7-10) wake up a voice recognition processor (Rosner paras. [0057]-[0060], if wakeup words are determined to be present (are recognized), the speech recognition engine is transitioned to a fully-operational state) operatively coupled to the co-processor (Rosner para. [0061], hardware may embody any of the modules, procedures and components in figs. 2, 5, and 7-10, including the second stage and/or third stage, the hardware being a processor, where fig. 2 illustrates that the stages are all connected (operatively coupled)),
wherein the voice recognition processor is configured to only operate when the wakeup word is recognized to avoid analyzing the non-wakeup word included in the voice information (Rosner paras. [0030]-[0036], the first stage, which detects voice activity, runs constantly, but the other two stages, stages 2 and 3, do not: the stage that analyzes an input audio signal for wake-up words is only run in a fully-operational state upon receiving a signal from the first stage that voice activity has been detected, and the third stage, which performs speech recognition, only operates in a fully-operational state upon receiving a second activation signal from the second stage that a wake-up word has been detected),
wherein the co-processor operates at a first power level and the voice recognition processor operates at a second power level higher than the first power level to reduce energy consumption of the voice recognition processor (Rosner paras. [0031]-[0036], [0046]-[0047] and [0061]-[0062], stages implemented on processors, the first and second stages, which together determine that a wake-up word is present, operating at a lower power level than the third stage performing the speech recognition),
wherein the voice information comprises a plurality of voice information segments collected from different time periods, all the time periods being spliced into a complete and continuous time chain (Rosner paras. [0052]-[0053], and [0055], the received audio signal which would contain voice information as determined in step 908, is received (thus over time) and converted into a digital (note discrete time) signal, where each of the discrete time periods of the digital signal are the different time periods, with the resultant digital signal being a “spliced” digital audio signal representation in discrete time periods of the continuous timeframe of the analog signal),
wherein the determining that the voice information including the voice information segment of the human voice includes (Rosner fig. 4, paras. [0038], [0054]-[0055], energy characteristics of the received signal are compared to predetermined thresholds to determine whether the received audio signal is a voice signal),
wherein the performing wakeup word recognition includes: matching the voice information segment of the voice with the wakeup word voice model (Rosner paras. [0042]-[0043], [0057]-[0058], at least a portion of a profile (time/frequency domain characteristics) of the received audio signal (including the voice information segment after it has been digitized) is compared to at least one predetermined profile as a template of coefficients of known wake-up words (wakeup word voice model)); 
determining that the wakeup word is recognized when the matching succeeds (Rosner paras. [0043]-[0045], and [0056]-[0058], if the received audio signal profile matches to one or more predetermined profiles, then the received audio signal is qualified as including one or more wake-up words); and 
determining that the wakeup word is not recognized when the matching fails (Rosner paras. [0043]-[0045], and [0058], if the received audio signal profile does not match to one or more predetermined profiles, then the received audio signal is not qualified as including one or more wake-up words).
While Rosner teaches that its voice activation system detects speech and would discern human speech according to a provided profile, Rosner does not explicitly teach human voice.
Rosner further does not explicitly teach performing blind-source separation processing on the voice information in a digital signal format using an independent component analysis ICA algorithm based on negative entropy maximization.
While Rosner teaches that predetermined profiles exist in its disclosed voice activation system, Rosner does not explicitly teach how those predetermined profiles come into existence on the system. Therefore, Rosner does not explicitly teach the claimed establishing a wakeup word voice model; wherein establishing the wakeup word voice model comprises: collecting wakeup voice data of a number of people; and processing and training all the wakeup voice data to obtain the wakeup word voice model, wherein the wakeup word recognition comprises: in an off-line state, collecting wakeup words recorded by a speaker in different environments; performing framing processing on the wakeup words; extracting characteristic parameters from the wakeup words after the wakeup words have been processed by the framing processing; clustering the characteristic parameters through Gaussian Mixture Models to obtain 
Kim teaches human voice (Kim fig.1, paras. [0056]-[0058], device generates a wake-up keyword detection signal by detecting the received speech to match a speech signal of a human user).
Kim teaches establishing a wakeup word voice model (Kim para. [0108], device registers a plurality of wake-up keyword models based on environment information).
Kim teaches wherein the establishing the wakeup word voice model comprises: collecting wakeup voice data of a number of people (Kim para. [0142], the disclosed speech recognition system registers a plurality of wake-up keyword models for each user (thus a number of people), where paras. [0124]-[0127] teach the wake-up keyword models are registered based on the speech signal of the user received); and 
processing and training all the wakeup voice data to obtain the wakeup word voice model (Kim paras. [0080]-[0084], speech signal of the user is used (processed) for registering the wake-up keyword model, where the speech signal of the user is recognized against an acoustic model, and a speech matching rate consideration, and when the speech signal of the user is recognized two or more times (training), the received speech signal of the user is determined to be a valid wake-up keyword model and is registered in the device).
Kim teaches wherein the establishing the wakeup word voice model comprises: in an off-line state, collecting wakeup words recorded by a speaker in different environments (Kim fig. 5A, paras. [0088], [0092], [0108], [0125]-[0126], in a wake-up keyword model registration method/mode (thus off-line with respect to actually performing a wake-up processing) the server registers a plurality of wake-up keyword models based on various different environmental information such as office, home, weather of the environment).
Kim also teaches processing on the wakeup words (Kim paras. [0068] and [0070], speech recognition server generates the wake-up keyword model and extracts speech characteristics from received speech signal of the user), parameters from the wakeup words after the wakeup words have been processed (Kim paras. [0068] and [0070], speech recognition server generates the wake-up keyword model and extracts speech characteristics from received speech signal of the user, where para. [0081] teaches an acoustic model is used to recognize the speech for the registering of the wake-up keyword model thus acoustic parameters of the wakeup words), to obtain the wakeup word voice model (Kim para. [0083], device determines the received speech signal is valid as the wake-up keyword model), storing the wakeup word voice model (Kim para. [0084], wake-up keyword model is generated and stored in the device). 
Min NPL teaches performing blind-source separation processing on the voice information (Min NPL Abstract, algorithm for blind source separation is applied to voice signals separation (voice information)) in a digital signal format (Min NPL section 3.3, figure 3, implementation of FastICA including sampling data at 48 kHz (thus digital signal) and processing it with a DSP that has an analog to digital converter) using an independent component analysis ICA algorithm based on negative entropy maximization (Min NPL section 2.2, FastICA is based on a principle that separation of signals is complete when negative entropy has reached its maximum).
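For reference, the negative entropy maximization principle noted above is commonly written as follows in the general FastICA literature. This is a sketch of the standard formulation and is not quoted from Min NPL; the symbols y, ν, G, g, w and x are notation assumed here for illustration, with x the whitened observation vector, ν a zero-mean, unit-variance Gaussian variable, and G a non-quadratic contrast function with derivative g.

```latex
% Negentropy of y: entropy gap to a Gaussian variable of the same variance
J(y) = H(y_{\mathrm{gauss}}) - H(y) \ge 0

% Common approximation used in place of the exact negentropy
J(y) \propto \left( \mathbb{E}\{G(y)\} - \mathbb{E}\{G(\nu)\} \right)^{2},
\qquad y = w^{\top} x

% One-unit fixed-point update that drives J(w^T x) toward its maximum
w^{+} = \mathbb{E}\{ x \, g(w^{\top} x) \} - \mathbb{E}\{ g'(w^{\top} x) \} \, w,
\qquad w \leftarrow \frac{w^{+}}{\lVert w^{+} \rVert}
```

Under this formulation, separation of one source is treated as complete when the update no longer changes the direction of w, which corresponds to the negentropy of the projection reaching a maximum.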
Mitchell teaches performing framing processing (Mitchell paras. [0061], [0064]-[0067], system flow for generating a Markov model for sound classification including spectral decomposition by first processing the sound signal in windowing units from a frame);
extracting characteristic parameters after processing by the framing processing (Mitchell paras. [0064]-[0067], spectral coefficients are determined for each frame, and a normalized time frequency matrix is determined therefrom);
adjusting a model parameter of the Hidden Markov HMM model by Baum-Welch algorithm to maximize P(σ|λ), wherein λ is the model parameter, σ is the observation state (Mitchell paras. [0067]-[0068], generating the model parameters of HMM by maximizing the probability of an observation sequence using the Baum-Welch algorithm, which iterates upon λ (characterization of the Markov model, thus a model parameter), until convergence of an expectation and a maximization);
Mitchell para. [0068], in the maximization step, λ is recalculated (adjusting) until convergence (when the maximal probability is reached));
completing model training to obtain the model (Mitchell paras. [0067]-[0070], convergence of the Baum-Welch algorithm produces the Markov model parameter, thus completing the training to obtain the Markov model); and
extracting characteristic parameters for voice frames in the voice information segment of the human voice to obtain a set of new observation values σ’ as a new observation state (Mitchell paras. [0075], claim 1, classification of new sound input process including the first part operating as the same as previously disclosed, where paras. [0064]-[0067] teach spectral coefficients are determined for each frame (characteristic parameters), and a normalized time frequency matrix is determined therefrom for an observation sequence defining an observation state path, where paras. [0079]-[0080] teach that the classification model can be for an audio feed from a CCTV camera, or a baby monitor, both systems which would be inputting human voice sounds);
calculating P(σ’|λ) (Mitchell para. [0075], a forward algorithm is used to determine the most likely state path of an observation sequence (for a new input sound) and produce a probability (against the defined model λ, in view of the observation sequence, thus P(σ’|λ)) in terms of a log likelihood that classifies the incoming signal); and
comparing P(σ’|λ) with a confidence threshold to determine whether is recognized (Mitchell paras. [0025], [0017], [0079], threshold values used to classify the audio inputs (i.e. whether a particular sound for which the classifier has been trained, is recognized), where the output probability is greater than (comparing) the confidence threshold for the system to make the classification).
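The scoring and comparison steps mapped above, calculating P(σ’|λ) with a forward algorithm and comparing it with a confidence threshold, can be illustrated with a minimal sketch. It assumes a discrete-observation HMM for simplicity; the function names, the model layout (start, transition and emission tables) and the threshold value are hypothetical and are not taken from Mitchell or from the present application.

```python
import math

def forward_log_likelihood(observations, start_prob, trans_prob, emit_prob):
    """Return log P(observations | model) using the scaled forward algorithm.

    Assumes a discrete HMM: emit_prob[s][o] is the probability that
    state s emits symbol o. All names here are illustrative only.
    """
    if not observations:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start_prob)
    # Initialise alpha from the start distribution and the first emission.
    alpha = [start_prob[s] * emit_prob[s][observations[0]] for s in range(n_states)]
    log_likelihood = 0.0
    for t in range(len(observations)):
        if t > 0:
            # Propagate one step through the transitions, then apply the emission.
            alpha = [
                sum(alpha[p] * trans_prob[p][s] for p in range(n_states))
                * emit_prob[s][observations[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Accumulate the log of each scaling factor; their sum is log P.
        log_likelihood += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_likelihood

def is_recognized(observations, model, log_threshold):
    """Compare the log likelihood with a (hypothetical) confidence threshold."""
    start_prob, trans_prob, emit_prob = model
    return forward_log_likelihood(observations, start_prob, trans_prob, emit_prob) > log_threshold

# Toy two-state model, values chosen only for illustration.
model = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]])
print(is_recognized([0, 0, 0], model, math.log(0.2)))
```

Working in log likelihoods with per-step scaling avoids numerical underflow on long observation sequences, which is why the threshold in this sketch is also expressed as a log value.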
Lee NPL teaches clustering the characteristic parameters through Gaussian Mixture Model to obtain clustered characteristic parameters; establishing an observation state of Hidden Markov HMM model on the clustered characteristic parameters (Lee NPL, section III, keyword model section, the top N Gaussians (from a Gaussian Mixture Model (GMM)) with the largest Sk,t from all the clusters of the table are assigned to a state of the keyword HMM).
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the human voice wake-up keyword models taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Further, taking the teachings of Rosner and Min NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the blind-source separation processing using FastICA as taught by Min NPL at least because doing so would be a way to perform blind signal separation that is much less time consuming, highly efficient and having less memory requirements (Min NPL section 5).

Still further, taking the teachings of Rosner and Lee NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the HMM generation using clustering as taught by Lee NPL at least because doing so would produce a speaker-independent keyword model and reduce a verification error (Lee NPL page 2380 and page 2381, last line, to first two lines of page 2382).
Regarding claim 2, Rosner teaches wherein the collecting the voice information comprises:
detecting the voice information in an analog signal format (Rosner paras. [0052] and [0055], a received audio signal (which would have the voice information as determined in step 908), is converted into an electrical signal (analog signal format) by the microphone); and
digitally converting the voice information in the analog signal format into a digital signal format (Rosner para. [0053], the analog electrical signal is converted into a digital signal using an A/D converter).
Claims 6-7, and 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Rosner in view of Kim in view of Min NPL in view of Mitchell in view of Lee NPL, as set forth above regarding claim 1 (for claims 6-7 which depend from claim 1), further in view of Gotanda et al., (US 7,315,816 B2, herein “Gotanda”).
Regarding claim 6, Rosner teaches wherein the determining that the voice information including the voice information segment of the human voice comprises: determining that the voice signal corresponds to the human voice when the voice signal has an energy level that exceeds an energy threshold (Rosner fig. 4, paras. [0038], [0054]-[0055], energy characteristics of the received signal are compared to predetermined thresholds to determine whether the received audio signal is a voice signal).
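The energy-threshold determination described above can be sketched as follows. This is only an illustration of comparing frame energy with a threshold; the frame length, the threshold value and the function names are assumptions made here and do not come from Rosner.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def voice_frames(samples, frame_len, energy_threshold):
    """Indices of frames whose energy exceeds the (hypothetical) threshold."""
    energies = frame_energies(samples, frame_len)
    return [i for i, energy in enumerate(energies) if energy > energy_threshold]

# Silence, a loud burst, then low-level noise (toy digitized signal).
signal = [0.0] * 4 + [1.0] * 4 + [0.1] * 4
print(voice_frames(signal, frame_len=4, energy_threshold=0.5))
```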
Rosner does not teach the balance of the limitations of claim 6.
Gotanda teaches separating a voice signal having the largest non-Gaussianity value based on performing blind-source separation processing on the voice information in a digital signal format (Gotanda col. 1, lines 37-44, col. 11, lines 1-3, col. 19, lines 42-56, Independent Component Analysis (ICA) to separate target speech from observed mixed signals without information on the transmission paths (thus blind-source), the input signals being first digitized by A/D converters, where a FastICA method sequentially separates signals from the mixed signals in descending order of non-Gaussianity, where the speaker’s speech will be output first for having the highest non-Gaussianity);
Gotanda col. 19, lines 42-56, the first output from the FastICA having the highest non-Gaussianity value being the speaker’s speech (voice signal)); and
separating the voice signal that corresponds to the human voice to obtain the voice information segment of the human voice (Gotanda col. 19, lines 42-56, the first output from the FastICA which separates the input audio signal, being the speaker’s speech signal in separated signal UA).
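Selecting the separated signal with the largest non-Gaussianity, as discussed above, can be illustrated with a short sketch. It assumes excess kurtosis (a fourth-order statistic) as the non-Gaussianity measure, which is one common choice and is not stated here to be the measure Gotanda uses; the function names are hypothetical.

```python
def excess_kurtosis(signal):
    """Fourth standardized moment minus 3; zero for a Gaussian signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        return 0.0  # a constant signal carries no non-Gaussianity information
    fourth_moment = sum(c ** 4 for c in centered) / n
    return fourth_moment / (variance * variance) - 3.0

def most_non_gaussian(signals):
    """Index of the signal whose excess kurtosis is largest in magnitude."""
    return max(range(len(signals)), key=lambda i: abs(excess_kurtosis(signals[i])))

# A flat alternating signal versus a sparse, spiky one (toy data).
separated = [[-1.0, 1.0] * 5, [0.0] * 9 + [10.0]]
print(most_non_gaussian(separated))
```

Sparse, bursty signals such as speech tend to have large positive kurtosis, which is the intuition behind treating the most non-Gaussian output as the speech component in this sketch.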
Therefore, taking the teachings of Rosner and Gotanda together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the FastICA signal separation as taught by Gotanda at least because doing so would allow for separation of target speech without using a priori information on the locations of the target speech and noise (Gotanda col. 27, lines 35-40). In addition, Gotanda characterizes the ICA process as “known” and “a useful method”; thus, modifying Rosner by Gotanda would also have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention as use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2143(I)(C).
Regarding claim 7, Rosner does not explicitly teach the limitations of claim 7.
Gotanda teaches wherein a method used for the blind-source separation is an independent component analysis ICA algorithm further based on either time-frequency transformation or 4-th order kurtosis (Gotanda col. 19, line 42 – col. 20, line 21, and col. 11, line 52 – col. 12, line 15, where the claim only requires “either or,” FastICA method which uses a Fourier transform to convert the input mixed signals into the frequency domain for processing, and then after processing, back into the time domain by way of an IFFT (thus time-frequency transformation)).
Therefore, taking the teachings of Rosner and Gotanda together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the FastICA signal separation as taught by Gotanda at least because doing so would allow for separation of target speech without using a priori information on the locations of the target speech and noise (Gotanda col. 27, lines 35-40). In addition, Gotanda characterizes the ICA process as “known” and “a useful method”; thus, modifying Rosner by Gotanda would also have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention as use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2143(I)(C).
Regarding claim 13, Rosner teaches a voice control system, comprising (Rosner para. [0029], fig. 2, voice activation system):
a voice collecting assembly (Rosner para. [0029], microphone 202); 
a voice recognition processor (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system); and
a co-processor (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system) wherein,
the voice collecting assembly is configured to collect voice information (Rosner paras. [0052]-[0053], microphone 202 converts a received sound into an electrical signal);
the co-processor is configured to: (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system)
process the voice information collected by the voice collecting assembly (Rosner paras. [0051]-[0052], an audio signal received by a microphone is converted into an electrical signal) to determine whether the voice information includes a voice information segment of a voice (Rosner paras. [0043], [0054]-[0057], energy characteristics of the received signal are compared to thresholds to determine whether the received audio signal is a voice signal, the audio signal being compared to a time or frequency domain profile, where the profiles represent speech signals for wake-up) that includes a wakeup word and a non-wakeup word (Rosner para. [0026], the received audio signal containing specific words that are identified to transition to a fully-operational state (thus wakeup and thus the specific words being wakeup words) so that in the fully-operational state, the full vocabulary of words is recognized (non-wakeup word));
separate the voice information segment of the human voice from the voice information when it is determined that the voice information includes the voice information segment of the human voice (Rosner paras. [0057]-[0058], at least a portion of a profile of the audio signal is compared to a predetermined profile (thus some separation of a segment from the audio signal) when the voice is detected to be in the audio signal);
perform wakeup word recognition on the voice information segment of the human voice to recognize the wakeup word included in the voice information (Rosner paras. [0059]-[0060], determining whether one or more wake-up words are present in the received audio signal, where the received audio signal includes the segment, thus the determining being performed upon the segment); and 
wake up the voice recognition processor when a wakeup word is recognized (Rosner paras. [0047] and [0060], if wakeup words are determined to be present (are recognized), the speech recognition engine (voice recognition assembly) is transitioned to a fully-operational state (wake up) from receiving the second activation signal),
wherein the voice information includes a plurality of voice information segments collected from different time periods, all the time periods being spliced into a complete and continuous time chain (Rosner paras. [0052]-[0053], and [0055], the received audio signal which would contain voice information as determined in step 908, is received (thus over time) and converted into a digital (note discrete time) signal, where each of the discrete time periods of the digital signal are the different time periods, with the resultant digital signal being a “spliced” digital audio signal representation in discrete time periods of the continuous timeframe of the analog signal),
wherein the co-processor includes a separating circuit and a determining circuit (Rosner paras. [0061]-[0063], hardware, software or any combination to embody any of the modules, procedures and components in figs. 2, 5 and 7-10, with computer system configurations including a computing device having at least one processor, where fig. 9 sets forth the operational method for a voice control system); wherein the separating circuit is configured to (Rosner paras. [0061]-[0063], programmable logic executing on a processing platform);
wherein the voice recognition processor is configured to only operate when the wakeup word is recognized to avoid analyzing the non-wakeup word included in the voice information (Rosner paras. 30-36, the first stage, which detects voice activity, runs constantly, but the other two stages run only when activated: the second stage, which analyzes an input audio signal for wake-up words, runs only once it receives a signal from the first stage that voice activity has been detected, and the third stage, which performs speech recognition, operates in a fully-operational state only once it receives a second activation signal from the second stage that a wake-up word has been detected); and
wherein the co-processor operates at a first power level and the voice recognition processor operates at a second power level higher than the first power level to reduce energy consumption of the voice recognition processor (Rosner paras. 31-36, 46-47 and 61-62, stages implemented on processors, the first and second stages which together, determine that a wake up word is present, operating at a lower power level than the third stage performing the speech recognition);
wherein the performing wakeup word recognition includes: matching the voice information segment of the voice with the wakeup word voice model (Rosner paras. [0042]-[0043], [0057]-[0058], at least a portion of a profile (time/frequency domain characteristics) of the received audio signal (including the voice information segment after it has been digitized) is compared to at least one predetermined profile as a template of coefficients of known wake-up words (wakeup word voice model)); 
determining that the wakeup word is recognized when the matching succeeds (Rosner paras. [0043]-[0045], and [0056]-[0058], if the received audio signal profile matches to one or more predetermined profiles, then the received audio signal is qualified as including one or more wake-up words); and 
determining that the wakeup word is not recognized when the matching fails (Rosner paras. [0043]-[0045], and [0058], if the received audio signal profile does not match to one or more predetermined profiles, then the received audio signal is not qualified as including one or more wake-up words).
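For illustration only, the staged gating mapped to Rosner above (a constantly running voice-activity check, a wake-up word profile match, and a recognizer that runs only after both succeed) may be sketched as follows. The function names, the mean-square energy measure, the threshold value, and the profile-matching callback are assumptions made for this sketch and are not taken from Rosner.

```python
from typing import Callable, Optional, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean-square energy of one frame (an assumed energy measure)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def staged_wakeup(
    samples: Sequence[float],
    energy_threshold: float,
    matches_wake_profile: Callable[[Sequence[float]], bool],
    recognize: Callable[[Sequence[float]], str],
) -> Optional[str]:
    """Run the costly recognizer only when both cheaper stages pass."""
    # Stage 1: always-on voice activity check against an energy threshold.
    if frame_energy(samples) <= energy_threshold:
        return None
    # Stage 2: compare the signal to a stored wake-up word profile.
    if not matches_wake_profile(samples):
        return None
    # Stage 3: full recognition, reached only after a wake-up word match.
    return recognize(samples)
```

In this sketch the recognizer callback is never invoked unless the two earlier checks succeed, which is the sense in which the later stage remains idle for audio that contains no wake-up word.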
While Rosner teaches that its voice activation system detects speech and would discern human speech according to a provided profile, Rosner does not explicitly teach human voice.
Further, Rosner does not explicitly teach perform blind-source separation processing on the voice information in a digital signal format so as to separate a voice signal having the largest non-Gaussianity value.

Kim teaches human voice (Kim fig.1, paras. [0056]-[0058], device generates a wake-up keyword detection signal by detecting the received speech to match a speech signal of a human user).
Kim para. [0108], device registers a plurality of wake-up keyword models based on environment information).
Kim teaches wherein the establishing the wakeup word voice model comprises: collecting wakeup voice data of a number of people (Kim para. [0142], the disclosed speech recognition system registers a plurality of wake-up keyword models for each user (thus a number of people), where paras. [0124]-[0127] teach the wake-up keyword models are registered based on the speech signal of the user received); and 
processing and training all the wakeup voice data to obtain the wakeup word voice model (Kim paras. [0080]-[0084], speech signal of the user is used (processed) for registering the wake-up keyword model, where the speech signal of the user is recognized against an acoustic model, and a speed matching rate consideration, and when the speech signal of the user is recognized two or more times (training), the received speech signal of the user is determined to be a valid wake-up keyword model and is registered in the device).
Kim teaches wherein the establishing the wakeup word voice model comprises: in an off-line state, collecting wakeup words recorded by a speaker in different environments (Kim fig. 5A, paras. [0088], [0092], [0108], [0125]-[0126], in a wake-up keyword model registration method/mode (thus off-line with respect to actually performing a wake-up processing) the server registers a plurality of wake-up keyword models based on various different environmental information such as office, home, weather of the environment).
Kim also teaches processing on the wakeup words (Kim paras. [0068] and [0070], speech recognition server generates the wake-up keyword model and extracts speech characteristics from received speech signal of the user), parameters from the wakeup words after the wakeup words have been processed (Kim paras. [0068] and [0070], speech recognition server generates the wake-up keyword model and extracts speech characteristics from received speech signal of the user, where para. [0081] teaches an acoustic model is used to recognize the speech for the registering of the wake-up keyword model thus acoustic parameters of the wakeup words), to obtain the wakeup word voice model (Kim para. [0083], device determines the received speech signal is valid as the wake-up keyword model), storing the wakeup word voice model (Kim para. [0084], wake-up keyword model is generated and stored in the device). 
Min NPL teaches perform blind-source separation processing on the voice information (Min NPL Abstract, algorithm for blind source separation is applied to voice signals separation (voice information)) in a digital signal format (Min NPL section 3.3, figure 3, implementation of FastICA including sampling data at 48 kHz (thus digital signal) and processing it with a DSP that has an analog to digital converter). 
Gotanda teaches so as to separate a voice signal having the largest non-Gaussianity value (Gotanda col. 19, lines 42-56, the first output from the FastICA which separates the input audio signal, being the speaker’s speech signal in separated signal UA).
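As a hedged illustration of what selecting a signal “having the largest non-Gaussianity value” can mean in practice, the sketch below ranks already-separated signals by the absolute value of their excess kurtosis, which is one commonly used non-Gaussianity measure in ICA. It does not reproduce the separation procedure of Min NPL or Gotanda; the function names and the choice of kurtosis as the measure are assumptions for this sketch.

```python
from typing import List, Sequence


def excess_kurtosis(x: Sequence[float]) -> float:
    """Fourth standardized moment minus 3; close to zero for Gaussian data."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        return 0.0  # a constant signal carries no shape information
    fourth = sum((v - mean) ** 4 for v in x) / n
    return fourth / (var * var) - 3.0


def most_non_gaussian(signals: List[Sequence[float]]) -> int:
    """Index of the separated signal with the largest |excess kurtosis|."""
    return max(
        range(len(signals)),
        key=lambda i: abs(excess_kurtosis(signals[i])),
    )
```

A sparse, spiky signal scores higher under this measure than a flat two-level signal, so it would be the one selected.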
Mitchell teaches performing framing processing (Mitchell paras. [0061], [0064]-[0067], system flow for generating a Markov model for sound classification including spectral decomposition by first processing the sound signal in windowing units from a frame);
Mitchell paras. [0064]-[0067], spectral coefficients are determined for each frame, and a normalized time frequency matrix is determined therefrom);
adjusting a model parameter of the Hidden Markov HMM model by Baum-Welch algorithm to maximize P(σ|λ), wherein λ is the model parameter, σ is the observation state (Mitchell paras. [0067]-[0068], generating the model parameters of HMM by maximizing the probability of an observation sequence using the Baum-Welch algorithm, which iterates upon λ (characterization of the Markov model, thus a model parameter), until convergence of an expectation and a maximization);
adjusting the model parameter λ to obtain a maximal probability of the observation state σ (Mitchell para. [0068], in the maximization calculation, the λ is step recalculated (adjusting) until convergence (when the maximal probability is reached));
completing model training to obtain the model (Mitchell paras. [0067]-[0070], convergence of the Baum-Welch algorithm produces the Markov model parameter, thus completing the training to obtain the Markov model); and
extracting characteristic parameters for voice frames in the voice information segment of the human voice to obtain a set of new observation values σ’ as a new observation state (Mitchell paras. [0075], claim 1, classification of new sound input process including the first part operating as the same as previously disclosed, where paras. [0064]-[0067] teach spectral coefficients are determined for each frame (characteristic parameters), and a normalized time frequency matrix is determined therefrom for an observation sequence defining an observation state path, where paras. [0079]-[0080] teach that the classification model can be for an audio feed from a CCTV camera, or a baby monitor, both systems which would be inputting human voice sounds);
calculating P(σ’|λ) (Mitchell para. [0075], a forward algorithm is used to determine the most likely state path of an observation sequence (for a new input sound) and produce a probability (against the defined model λ, in view of the observation sequence, thus P(σ’|λ)) in terms of a log likelihood that classifies the incoming signal); and
comparing P(σ’|λ) with a confidence threshold to determine whether is recognized (Mitchell paras. [0025], [0017], [0079], threshold values used to classify the audio inputs (i.e. whether a particular sound for which the classifier has been trained, is recognized), where the output probability is greater than (comparing) the confidence threshold for the system to make the classification).
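For illustration only, the calculation of P(σ’|λ) by a forward algorithm and its comparison with a confidence threshold may be sketched as follows for a discrete-observation HMM. The model λ is assumed to be already trained (Baum-Welch training is not shown), and the function names, parameter layout, and log-domain threshold are assumptions for this sketch rather than details taken from Mitchell. The observation sequence is assumed to be non-empty.

```python
import math
from typing import List, Sequence


def forward_log_likelihood(
    obs: Sequence[int],
    start: List[float],
    trans: List[List[float]],
    emit: List[List[float]],
) -> float:
    """Return log P(obs | model) using the scaled forward recursion."""
    n_states = len(start)
    # Initialization: alpha_1(i) = start[i] * emit[i][obs[0]]
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: sum over predecessor states, then apply the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def is_recognized(log_prob: float, log_threshold: float) -> bool:
    """Accept the input when its log-likelihood exceeds the threshold."""
    return log_prob > log_threshold
```

Working in the log domain keeps the comparison stable for long observation sequences, where the raw probability would otherwise underflow.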
Lee NPL teaches clustering the characteristic parameters through Gaussian Mixture Model to obtain clustered characteristic parameters; establishing an observation state of Hidden Markov HMM model on the clustered characteristic parameters (Lee NPL, section III, keyword model section, the top N Gaussians (from a Gaussian Mixture Model (GMM)) with the largest Sk,t from all the clusters of the table are assigned to a state of the keyword HMM).
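As a simplified illustration of how Gaussian mixture components can cluster characteristic parameters into symbols that an HMM then treats as observations, the sketch below assigns each one-dimensional feature value to its best-scoring component. This is not the top-N Gaussian selection described in Lee NPL; the one-dimensional features, the component tuple layout, and the function names are assumptions for this sketch, and the mixture is assumed to have been fitted beforehand.

```python
import math
from typing import List, Sequence, Tuple

# Each component is (weight, mean, variance), assumed to come from prior fitting.
Component = Tuple[float, float, float]


def log_component_score(x: float, comp: Component) -> float:
    """Log of the component weight times the 1-D Gaussian density at x."""
    weight, mean, var = comp
    return (
        math.log(weight)
        - 0.5 * math.log(2.0 * math.pi * var)
        - (x - mean) ** 2 / (2.0 * var)
    )


def cluster_features(features: Sequence[float], gmm: List[Component]) -> List[int]:
    """Map each feature value to the index of its best-scoring component."""
    return [
        max(range(len(gmm)), key=lambda k: log_component_score(x, gmm[k]))
        for x in features
    ]
```

The resulting list of component indices is a discrete sequence of the kind an HMM with discrete observations could consume.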
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the human voice wake-up keyword models taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Further, taking the teachings of Rosner and Min NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the blind-source separation processing using FastICA as taught by Min NPL at least because doing so would be a way to perform blind signal separation that is much less time consuming, highly efficient and having less memory requirements (Min NPL section 5).
Still further, taking the teachings of Rosner and Gotanda together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the FastICA signal separation as taught by Gotanda at least because doing so would allow for separation of target speech without using a priori information on the locations of the target speech and noise (Gotanda col. 27, lines 35-40). In addition, Gotanda characterizes the ICA process as “known” and “a useful method”; thus, modifying Rosner by Gotanda would also have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention as the use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2143(I)(C).
Further, taking the teachings of Rosner and Mitchell together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of 
Still further, taking the teachings of Rosner and Lee NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the HMM generation using clustering as taught by Lee NPL at least because doing so would produce a speaker-independent keyword model and reduce a verification error (Lee NPL page 2380 and page 2381, last line, to first two lines of page 2382).
Regarding claim 14, Rosner teaches wherein the voice collecting assembly comprises a voice collecting module and an A/D conversion module (Rosner paras. [0052]-[0053], microphone and A/D converter 204); wherein
the voice collecting module is configured to collect the voice information in an analog signal format (Rosner paras. [0052] and [0055], a received audio signal (which would have the voice information as determined in step 908), is converted into an electrical signal (analog signal format) by the microphone); and
the A/D conversion module is configured to digitally convert the voice information in the analog signal format into a digital signal format (Rosner para. [0053], the analog electrical signal is converted into a digital signal using an A/D converter).
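For illustration only, converting voice information from an analog signal format into a digital signal format can be viewed as uniform sampling followed by quantization, as sketched below. The 16-bit width, the normalized [-1, 1] input range, the sample rate used in the usage note, and all function names are assumptions for this sketch and are not taken from Rosner.

```python
import math
from typing import Callable, List


def sample(signal: Callable[[float], float], rate_hz: int, duration_s: float) -> List[float]:
    """Evaluate a continuous-time signal at uniformly spaced instants."""
    count = int(round(rate_hz * duration_s))
    return [signal(n / rate_hz) for n in range(count)]


def quantize(value: float, bits: int = 16) -> int:
    """Clamp to [-1, 1] and map to a signed integer code of the given width."""
    full_scale = 2 ** (bits - 1) - 1
    clamped = max(-1.0, min(1.0, value))
    return int(round(clamped * full_scale))


def analog_to_digital(
    signal: Callable[[float], float],
    rate_hz: int,
    duration_s: float,
    bits: int = 16,
) -> List[int]:
    """Sample, then quantize: one integer code per discrete time period."""
    return [quantize(v, bits) for v in sample(signal, rate_hz, duration_s)]
```

For example, `analog_to_digital(lambda t: math.sin(2 * math.pi * 440 * t), 8000, 0.01)` yields one integer code per sampling instant over a 10 ms window.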
Regarding claim 15, Rosner teaches wherein the voice recognition assembly is connected to the co-processor (Rosner fig. 2, paras. [0061]-[0062], as shown, the microphone and A/D are connected to the stages 1-3 and control module, where hardware, software or any combination may embody the components in fig. 2, including multiprocessor systems);
the voice recognition assembly is configured to perform a voice recognition in a working-activated state and to enter a non-working dormant state after having performed the voice recognition (Rosner fig. 10, paras. [0018] and [0047], state diagram illustrating an operation complete transition from fully-operational state to stand by state); and 
a transition from the non-working dormant state to the working-activated state of the voice recognition assembly is triggered by the waking up by the co-processor (Rosner fig. 10, para. [0047], stand by state 1002 transitions to wake-up word determination state 1004 once the second activation signal is received, then fully-operational state 1006 once wake-up word is detected).
Regarding claim 16, Rosner teaches wherein the voice recognition assembly enters a waiting state before a transition from the working-activated state to the non-working dormant state (Rosner para. [0047], once speech recognition engine enters fully-operational state 1006, it remains in this state until a specified function is complete and a predetermined amount of time has passed (waiting state)); and 
during a set time period, the voice recognition assembly enters the non-working dormant state when the voice recognition assembly is not woken up (Rosner paras. [0047] and [0064]-[0065], if one or more wake-up words are not present within the received audio signal, the speech recognition engine is transitioned back to standby state 1002, where this transition will take a set amount of time according to the speed of the processor/computer architecture set up which is processing the states); and 
Rosner paras. [0047]-[0049], and [0059], fig. 10, speech recognition engine enters into the fully-operational state when one or more wake-up words are detected in the received audio signal and a control signal is output to transition into this state (woken up)).
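For illustration only, the state transitions cited from Rosner fig. 10 in the rejections of claims 15 and 16 (stand by state 1002, wake-up word determination state 1004, and fully-operational state 1006) may be sketched as a small transition table. The event names are hypothetical labels chosen for this sketch; only the three states and the direction of the transitions are drawn from the citations above.

```python
from enum import Enum


class State(Enum):
    STANDBY = 1002            # reference numerals as cited from Rosner fig. 10
    WAKE_WORD_CHECK = 1004
    FULLY_OPERATIONAL = 1006


# (state, event) -> next state; the event names are hypothetical labels.
TRANSITIONS = {
    (State.STANDBY, "activation_signal"): State.WAKE_WORD_CHECK,
    (State.WAKE_WORD_CHECK, "wake_word_detected"): State.FULLY_OPERATIONAL,
    (State.WAKE_WORD_CHECK, "wake_word_absent"): State.STANDBY,
    (State.FULLY_OPERATIONAL, "operation_complete"): State.STANDBY,
}


def next_state(state: State, event: str) -> State:
    """Return the next state, staying put when the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Under this table the fully-operational state can be reached only through the wake-up word determination state, and it returns to stand by once the operation is complete.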
Regarding claim 17, Rosner does not explicitly teach the limitations of claim 17.
Kim teaches wherein the voice control system is connected to an electrical appliance of an intelligent electrical appliance (Kim paras. [0050]-[0053], [0075], wake-up speech recognition function of the device including executing an application set in the device such as a smart TV, or smart refrigerator).
Therefore, taking the teachings of Rosner and Kim together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the voice activation system and operations thereof of Rosner with the connection to a smart device as taught by Kim at least because doing so would provide more effective speech recognition functions by using a personalized wake-up keyword according to device-based environment information (Kim para. [0012]).
Regarding claim 18, Rosner teaches wherein the co-processor is loaded between the voice collecting assembly and the voice recognition processor as a separate hardware (Rosner fig. 2, paras. 61-65, modules depicted in fig. 2 implemented as hardware, including multiple processors, where the co-processor corresponds to stage 1 and/or stage 2, which is shown as being positioned between the microphone 202 (voice collecting assembly) and the stage 3).
Allowable Subject Matter
Claim 8 would be allowable if rewritten or amended to overcome the rejections under 35 U.S.C. 112(b), and 35 U.S.C. 112(b), set forth in this Office action. Specifically, the closest cited art of record includes the combination as set forth above in the rejection for claim 13, of Rosner, Kim, Min NPL, Mitchell, Lee NPL and Gotanda. Further, while Lee NPL teaches the claimed clustering the characteristic parameters through Gaussian Mixture Models to obtain clustered characteristic parameters; establishing an observation state of Hidden Markov HMM model on the clustered characteristic parameters, none of Lee NPL, Rosner, Kim, Min NPL, Mitchell or Gotanda, whether considered alone, or in a combination obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, teaches or suggests that the clustering and the establishing limitations are performed at different power levels, where the clustering is performed at a smaller power level than the establishing. 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner




/MICHELLE M KOETH/Primary Examiner, Art Unit 2656