DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.

Claim 9 is directed to a computer-readable medium storing computer readable instructions. The broadest reasonable interpretation of a claim drawn to a computer readable medium (also called machine readable medium and other such variations) typically covers forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer readable media, particularly when the specification is silent.  See MPEP 2111.01. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter.  See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter) and Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. § 101, Aug. 24, 2009; p. 2.
The USPTO recognizes that applicants may have claims directed to computer readable media that cover signals per se, which the USPTO must reject under 35 U.S.C. § 101 as covering both non-statutory subject matter and statutory subject matter.  In an effort to assist the patent community in overcoming a rejection or potential rejection under 35 U.S.C. § 101 in this situation, the USPTO suggests the following approach.  A claim drawn to such a computer readable medium that covers both transitory and non-transitory embodiments may be amended to narrow the claim to cover only statutory embodiments to avoid a rejection under 35 U.S.C. § 101 by adding the limitation “non-transitory” to the claim.  Cf.  Animals - Patentability, 1077 Off. Gaz. Pat. Office 24 (April 21, 1987) (suggesting that applicants add the limitation “non-human” to a claim covering a multi-cellular organism to avoid a rejection under 35 U.S.C. § 101).  Such an amendment would typically not raise the issue of new matter, even when the specification is silent because the broadest reasonable interpretation relies on the ordinary and customary meaning that includes signals per se.  The limited situations in which such an amendment could raise issues of new matter occur, for example, when the specification does not support a non-transitory embodiment because a signal per se is the only viable embodiment such that the amended claim is impermissibly broadened beyond the supporting disclosure.  See, e.g., Gentry Gallery, Inc. v. Berkline Corp., 134 F.3d 1473 (Fed. Cir. 1998).

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 9-10 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Howard (US PG Pub 20190074028).
	As per claim 1, Howard discloses:
	A smart device input method based on facial vibration, comprising steps of:
	Step S1: collecting a facial vibration signal generated when a user performs voice input (Howard; p. 0098 - Speech signals are generated by the passage of glottal waveforms of specific frequency through the vocal tract. Due to its particular shape, the vocal tracts act as a filter. A lot of information carried in the glottal source is not essential or relevant for recognition tasks, such as phones detection; Fig. 9, item 902; p. 0099 - audio signal is input);
	Step S2: extracting a Mel-frequency cepstral coefficient from the facial vibration signal (Howard; p. 0111 - The lowest spectral values contain information regarding the vocal tract filter that is well isolated from the glottal source, which is desired for phone detection problems. The higher Cepstral coefficients would be considered if the interest was pitch detection. In embodiments, only the first 12 to 16 Mel-cepstral coefficients are typically chosen, for example, with a value set to 12); and
	Step S3: taking the Mel-frequency cepstral coefficient as an observation sequence to obtain text input corresponding to the facial vibration signal by using a trained hidden Markov model (Howard; p. 0135 - In research fields associated with Automatic Speech Recognition (ASR), the Hidden Markov Model (HMM) is probably the most considered method for dynamic modelling).
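	For illustration only, the Mel-frequency cepstral extraction recited in Step S2 can be sketched as below. This is a minimal sketch under stated assumptions, not code taken from Howard or from the application: the function names, the Hamming window, the 26-filter bank, and the choice to keep coefficients 1 through 12 (skipping coefficient 0) are all hypothetical choices made for the example.

```python
import cmath
import math


def hz_to_mel(f):
    # Common Mel-scale mapping (assumed here; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    # Hamming-window the frame, then take a naive DFT (fine for short frames).
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the Mel scale from 0 Hz to Nyquist.
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (num_filters + 1)
            for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            row[k] = (k - left) / (center - left)
        for k in range(center, right):
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=26, num_coeffs=12):
    # Log filterbank energies followed by a DCT-II; keep coefficients 1..num_coeffs.
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_energy = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                  for row in bank]
    return [sum(log_energy[m] * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m in range(num_filters))
            for c in range(1, num_coeffs + 1)]
```

	Calling mfcc on one frame of an even number of samples yields a 12-element vector; a sequence of such vectors over successive frames would form the observation sequence of Step S3.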

	As per claim 9, Howard discloses:
	A computer readable storage medium, storing a computer program, wherein when being executed by a processor (Howard; p. 0185 - The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention), the computer program implements steps of the smart device input method according to claim 1 (see rejection of claim 1).

	As per claim 10, Howard discloses:
	A computer device comprising a memory and a processor, a computer program wherein the computer program runs in the processor and is stored in the memory (Howard; p. 0008 - a system for intent extraction may comprise a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor to perform acquiring an audio signal relating to a conversation including the person), and the processor executes steps of the smart device input method according to claim 1 (see rejection of claim 1).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-5 are rejected under 35 U.S.C. 103 as being unpatentable over Howard in view of Xie (US PG Pub 20190204907).

	As per claim 2, Howard discloses:
	The smart device input method according to claim 1, upon which claim 2 depends.
	Howard, however, fails to disclose wherein in Step S1, the facial vibration signal is collected by a vibration sensor arranged on glasses.
	Xie does teach wherein in Step S1, the facial vibration signal is collected by a vibration sensor arranged on glasses (Xie; p. 0035 - In some embodiments, the input device 120 may be any device that includes a microphone, such as a mobile computing device (e.g., a mobile phone 120-2, etc.), a computer 120-1, a tablet computer, a smart wearable device (including smart glasses such as Google Glasses, a smart watch, a smart ring, a smart helmet, etc.), a virtual reality device or an augmented reality device such as Oculus Rift, Gear VR, Hololens, or the like, or any combination thereof).
	Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Howard to include wherein in Step S1, the facial vibration signal is collected by a vibration sensor arranged on glasses, as taught by Xie, in order to improve the human-machine interaction experience (Xie; p. 0003).

	As per claim 3, Howard discloses:
	The smart device input method according to claim 1, wherein in Step S2, a vibration signal is processed by: amplifying the facial vibration signal to obtain an amplified facial vibration signal (Howard; Fig. 9, item 904; p. 0099 - an audio signal 902 is input to pre-emphasis stage 904); and intercepting a section from the amplified facial vibration signal as an effective portion and extracting the Mel-frequency cepstral coefficient from the effective portion by the smart device (Howard; p. 0111 - The lowest spectral values contain information regarding the vocal tract filter that is well isolated from the glottal source, which is desired for phone detection problems. The higher Cepstral coefficients would be considered if the interest was pitch detection. In embodiments, only the first 12 to 16 Mel-cepstral coefficients are typically chosen, for example, with a value set to 12).
	Howard, however, fails to disclose transmitting the amplified facial vibration signal to the smart device via a wireless module.
	Xie does teach transmitting the amplified facial vibration signal to the smart device via a wireless module (Xie; p. 0039 - The server 150 may be a single server or a server group. Each server in the server group may be connected through a wired or wireless network. The server group may be centralized, for example, a data center. The server group may be distributed, e.g., a distributed system. The server 150 may be used to collect the information transmitted by the input device 120, analyze and process the inputted information based on the database 160).
	Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Howard to include transmitting the amplified facial vibration signal to the smart device via a wireless module, as taught by Xie, in order to improve the human-machine interaction experience (Xie; p. 0003).
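	For illustration only, a pre-emphasis stage of the kind cited from Howard (Fig. 9, item 904) is conventionally a first-order high-pass filter. The sketch below assumes that conventional form, y[n] = x[n] - alpha * x[n-1]; the function name and the default alpha of 0.97 are assumptions, and Howard's actual filter is not quoted in this action.

```python
def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies with y[n] = x[n] - alpha * x[n-1].

    The first sample is passed through unchanged because it has no predecessor.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out
```

	A constant input is attenuated after the first sample, which is the expected behavior of a high-pass stage.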

	As per claim 4, Howard in view of Xie discloses:
	The smart device input method according to claim 3, wherein the step of intercepting the section from the amplified facial vibration signal as the effective portion comprises steps of: setting a first cut-off threshold and a second cut-off threshold based on a short-term energy standard deviation σ of the amplified facial vibration signal, wherein the first cut-off threshold is TL = υ + σ, the second cut-off threshold is TH = υ + 3σ, and υ represents average energy of background noise; finding out a frame signal having a maximum short-term energy from the amplified facial vibration signal, wherein energy of the frame signal is higher than the second cut-off threshold; and respectively finding out, from a preamble frame before the frame signal and a postamble frame after the frame signal, a frame having energy lower than the first cut-off threshold, wherein the frame is closest to the frame signal in time sequence, taking an obtained preamble frame position as a starting point and taking an obtained postamble frame position as an end point, and intercepting a portion between the starting point and the end point as the effective portion of the amplified facial vibration signal (Howard; p. 0060-0067 - an improved method of computing the coefficient may be through a minimization of either the L-2, Equation (4), or L-1 norm, Equation (5). In both equations, $\alpha_i$ corresponds to the $i$-th element of the vector $\alpha$, with $i \in [1;n]$. $\|\alpha\|_2 = \sqrt{\sum_{i=1}^{n} |\alpha_i|^2}$ (4) $\|\alpha\|_1 = \sum_{i=1}^{n} |\alpha_i|$ (5) The L-2 norm method, also known as Ordinary Least Squares (OLS), aims to minimize the sum of the squared deviation between channel1 and coefficient · channel2, see Equation (6). $f(\beta) = \|Y - \beta X\|_2$ (6) As demonstrated in Equation (6), the desired multiplicative coefficient can be computed by minimizing the L-2 norm of the linear combination of Equation (3)… On the other hand, the Least Absolute Deviation (LAD) deals with the L-1 norm, Equation (5). Instead of minimizing the sum of squares residuals, it uses the absolute difference. $f(\beta) = \|Y - \beta X\|_1$ (8)…).
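	For illustration only, the double-threshold interception recited in claim 4 can be sketched as below. This is a sketch under stated assumptions rather than the applicant's implementation: the function names are hypothetical, the background-noise average υ is estimated from the first few frames (an assumption, since the claim does not say how υ is measured), and σ is computed as a population standard deviation over all frame energies.

```python
import statistics


def short_term_energy(signal, frame_len):
    # Sum of squared samples over consecutive non-overlapping frames.
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def find_effective_portion(energies, noise_frames=3):
    """Return (start_frame, end_frame) of the effective portion, or None.

    TL = upsilon + sigma and TH = upsilon + 3 * sigma, as recited in claim 4.
    """
    upsilon = statistics.mean(energies[:noise_frames])  # assumed noise estimate
    sigma = statistics.pstdev(energies)
    t_low = upsilon + sigma
    t_high = upsilon + 3 * sigma

    # Frame with the maximum short-term energy; it must exceed TH.
    peak = max(range(len(energies)), key=energies.__getitem__)
    if energies[peak] <= t_high:
        return None

    # Nearest frame before the peak whose energy falls below TL
    # (stops at frame 0 if no such frame exists).
    start = peak
    while start > 0 and energies[start] >= t_low:
        start -= 1

    # Nearest frame after the peak whose energy falls below TL
    # (stops at the last frame if no such frame exists).
    end = peak
    while end < len(energies) - 1 and energies[end] >= t_low:
        end += 1
    return start, end
```

	With a short burst surrounded by quiet frames, the function returns the quiet frames immediately bracketing the burst; a flat signal returns None because no frame exceeds TH.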

	As per claim 5, Howard in view of Xie discloses:
	The smart device input method according to claim 4, wherein the step of intercepting the section from the amplified facial vibration signal as the effective portion further comprises steps of: setting, for a vibration signal, a maximum interval threshold and a minimum length threshold between signal peaks; and taking two signal peaks as one signal peak of the vibration signal in response to an interval between the two signal peaks of the vibration signal being less than the maximum interval threshold; and discarding a signal peak in response to a length of the signal peak of the vibration signal being less than the minimum length threshold (Howard; p. 0086-0092 - In order to prevent those artifacts, a weighting function may be applied to each truncated waveform before the computation of the Fourier Transform. Multiplying the signal by a fixed length window may reduce the amplitude of discontinuities at the boundary of each frame. This means that it decreases the spectral leakage problem related to finite intervals. This filter has the effect of slowly and smoothly attenuating the frame edges towards zero. Among a large number of windows possible, the perfect function would not deform the spectrum. However, a trade-off between the time and frequency domain resolution of the window needs to be established. A window of finite length with abrupt boundaries, such as a rectangular window, is the simplest in the time domain, but creates artifacts in the frequency domain. A function such as the Dirac function, with a thin central peak and maxima tending towards zero elsewhere may be better in the frequency domain. But this type of function has infinite duration once transferred to the time domain, which does not correlate to an ideal time domain window function. Regardless of the selected window, completely avoiding spectral deformation is not possible and the window will not be of infinite length).
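	For illustration only, the peak merging and discarding recited in claim 5 can be sketched as below. The representation of each signal peak as a (start, end) pair of sample or frame indices, the function name, and the parameter names are assumptions made for the example.

```python
def merge_and_filter_peaks(peaks, max_interval, min_length):
    """Merge peaks separated by less than max_interval, then drop short ones.

    peaks: iterable of (start, end) index pairs, one per detected signal peak.
    """
    merged = []
    for start, end in sorted(peaks):
        if merged and start - merged[-1][1] < max_interval:
            # Gap to the previous peak is below the maximum interval
            # threshold: treat the two peaks as one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Discard any peak shorter than the minimum length threshold.
    return [(s, e) for s, e in merged if e - s >= min_length]
```

	For example, with max_interval=5 and min_length=5, the peaks (0, 10) and (12, 20) are merged into (0, 20), while an isolated peak (50, 52) is discarded.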

Claims 6-8 are rejected under 35 U.S.C. 103 as being unpatentable over Howard in view of Czyryba (US PG Pub 20190221205).

	As per claim 6, Howard discloses:
	The smart device input method according to claim 1, upon which claim 6 depends.
	Howard, however, fails to disclose wherein the trained hidden Markov model is obtained by: generating a hidden Markov model corresponding to each input button type of the smart device to obtain a plurality of hidden Markov models; constructing a training sample set corresponding to each of the plurality of hidden Markov models, wherein each observation sequence in the training sample set comprises the Mel-frequency cepstral coefficient of the facial vibration signal; and evaluating a most possible hidden Markov model as the trained hidden Markov model, wherein the most possible hidden Markov model generates a pronunciation represented by the observation.
	Czyryba does teach wherein the trained hidden Markov model is obtained by: generating a hidden Markov model corresponding to each input button type of the smart device to obtain a plurality of hidden Markov models; constructing a training sample set corresponding to each of the plurality of hidden Markov models, wherein each observation sequence in the training sample set comprises the Mel-frequency cepstral coefficient of the facial vibration signal; and evaluating a most possible hidden Markov model as the trained hidden Markov model, wherein the most possible hidden Markov model generates a pronunciation represented by the observation (Czyryba; p. 0038 - In one known attempt to reduce the computational load and memory capacity requirements, the acoustic model, the start state based rejection model, and the keyphrase model may be generated by training an acoustic model using a training set of audio such that the acoustic model has multiple outputs including tied triphone (e.g., HMM-) states. For example, each of the tied triphone states may be associated with one of multiple monophones (or phonemes) in a lexicon representing the language being used. Furthermore, the acoustic model may include outputs representative of non-speech such as silence or background noise. In an implementation, an acoustic model (e.g., a DNN) may be trained by inputting audio data. Based on the acoustic model outputs (e.g., DNN-outputs), the triphones where each possible phoneme is a centerphone may be selected to remain as an output of the acoustic model. The acoustic model outputs corresponding to the centerphone that has been observed the most often during training may be selected, or in other words, the output scores of the triphones are selected, and such selected acoustic model outputs may be used as or in the rejection model. For example, one or the N most important center-phone acoustic model outputs for each monophone may be selected for the rejection model. This still required a relatively large number of rejection scores to be handled by the rejection model such as the 100 outputs mentioned above that were also added to the rejection model).
	Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Howard to include wherein the trained hidden Markov model is obtained by: generating a hidden Markov model corresponding to each input button type of the smart device to obtain a plurality of hidden Markov models; constructing a training sample set corresponding to each of the plurality of hidden Markov models, wherein each observation sequence in the training sample set comprises the Mel-frequency cepstral coefficient of the facial vibration signal; and evaluating a most possible hidden Markov model as the trained hidden Markov model, wherein the most possible hidden Markov model generates a pronunciation represented by the observation, as taught by Czyryba, because while some low resource WoV systems already exist, these systems still consume too much power due to inefficient memory usage and heavy computational loads while these systems still can be noticeably inaccurate by waking to spoken words that are close to, but not the same as, the actual keyphrase, often resulting in an annoying and time-wasting experience for the user (Czyryba; p. 0002).

	As per claim 7, Howard discloses:
	The smart device input method according to claim 1, upon which claim 7 depends.
	Howard, however, fails to disclose wherein Step S3 further comprises steps of: calculating an output probability of a test sample for the plurality of hidden Markov models by using a Viterbi algorithm; and displaying a button type corresponding to the test sample and a selectable button type based on the output probability.
	Czyryba does teach wherein Step S3 further comprises steps of: calculating an output probability of a test sample for the plurality of hidden Markov models by using a Viterbi algorithm; and displaying a button type corresponding to the test sample and a selectable button type based on the output probability (Czyryba; p. 0066 - Rejection model 501 having single state 511 may provide a greatly reduced rejection model 501 (e.g., in terms of memory and computational resources usage) as compared to conventional rejection models, which may implement many equally possible words or phrases or the like in parallel and may require Viterbi decoding with backtracking to provide for a most probable sequence to determine a rejection likelihood).
	Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Howard to include wherein Step S3 further comprises steps of: calculating an output probability of a test sample for the plurality of hidden Markov models by using a Viterbi algorithm; and displaying a button type corresponding to the test sample and a selectable button type based on the output probability, as taught by Czyryba, because while some low resource WoV systems already exist, these systems still consume too much power due to inefficient memory usage and heavy computational loads while these systems still can be noticeably inaccurate by waking to spoken words that are close to, but not the same as, the actual keyphrase, often resulting in an annoying and time-wasting experience for the user (Czyryba; p. 0002).
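	For illustration only, scoring one test sample against a plurality of hidden Markov models with the Viterbi algorithm, as recited in claim 7, can be sketched as below. The sketch assumes discrete observation symbols (for example, vector-quantized feature frames), which is a simplification; the function names, the model tuple layout, and the button labels are hypothetical and are not taken from Czyryba, Howard, or the application.

```python
import math


def _log(p):
    # Log-probability that maps zero probability to negative infinity.
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log_prob(obs, start_p, trans_p, emit_p):
    """Log-probability of the single best state path for obs under one HMM.

    start_p[s], trans_p[r][s], emit_p[s][symbol] are plain probabilities.
    """
    n_states = len(start_p)
    delta = [_log(start_p[s]) + _log(emit_p[s][obs[0]])
             for s in range(n_states)]
    for symbol in obs[1:]:
        delta = [max(delta[r] + _log(trans_p[r][s]) for r in range(n_states))
                 + _log(emit_p[s][symbol])
                 for s in range(n_states)]
    return max(delta)


def rank_buttons(obs, models):
    """Return button labels ordered from most to least probable.

    models maps a button label to its (start_p, trans_p, emit_p) tuple.
    The first label would be displayed as the recognized button and the
    remaining labels as selectable alternatives.
    """
    scores = {label: viterbi_log_prob(obs, *hmm) for label, hmm in models.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

	Working in the log domain avoids numerical underflow on long observation sequences, which is why the sketch sums log-probabilities instead of multiplying probabilities.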

	As per claim 8, Howard in view of Czyryba discloses:
	The smart device input method according to claim 7, upon which claim 8 depends.	And further, Czyryba teaches determining whether a classification result is correct according to a button selected by the user; adding a first test sample with a correct classification result into the training sample set, wherein a corresponding classification label is the classification result; and adding a second test sample with a wrong classification result into the training sample set, wherein a corresponding classification label is a category determined according to the user's selection (Czyryba; p. 0080 - Referring to FIG. 6A, the first part of this operation may be to “determine sub-phonetic units that are associated with monophones of a lexicon and based on tied HMM-state triphones in a classification data structure that receives the monophones as the inputs” 608. An example system or data structure 650 associated with generating a keyphrase detection model including a rejection model and keyphrase model is arranged in accordance with at least some implementations of the present disclosure. Specifically, the data structure 650 may be used by the centerphone unit 720 to determine the most occurring centerphone scores, where the centerphones are each associated with a different monophone of a lexicon. This data structure 600 may be, or include, a classification data structure such as a classification and regression tree (CART) that uses a classification tree 652. A lexicon or the like may include multiple monophones or phonemes 654 associated therewith (e.g., labeled MP.sub.1, MP.sub.2, . . . , MP.sub.M) that form the lexicon such /a/, /b/, /k/, and so on, and that is the input to the classification tree 652. For example, the pronunciation of a word or phrase in a language or lexicon may be represented as a series of individual units of sound, which may be characterized as phones, and a monophone (which is a phoneme) may be characterized as a single phone without context. 
A lexicon or language or the like may include any number of monophones 654. This input operation may be considered as iterating over the phoneme inventory. It should be noted that the entire inventory of phoneme or monophones MP may or may not be used here, and some subset of the monophones may be used instead of all of the monophones forming a lexicon. By one form, the monophones are input to the classification tree one monophone at a time, but in other forms, the monophones are input into the classification tree as a set or subset of the lexicon being used. Also, the CART tree may not be regular, and it could potentially also be a degenerated tree, which refers to the fact that the first level does not necessarily contain entries for all monophones).
	Therefore, it would have been obvious to one of ordinary skill in the art to modify the method of Howard to include determining whether a classification result is correct according to a button selected by the user; adding a first test sample with a correct classification result into the training sample set, wherein a corresponding classification label is the classification result; and adding a second test sample with a wrong classification result into the training sample set, wherein a corresponding classification label is a category determined according to the user's selection, as taught by Czyryba, because while some low resource WoV systems already exist, these systems still consume too much power due to inefficient memory usage and heavy computational loads while these systems still can be noticeably inaccurate by waking to spoken words that are close to, but not the same as, the actual keyphrase, often resulting in an annoying and time-wasting experience for the user (Czyryba; p. 0002).
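	For illustration only, the feedback step recited in claim 8 can be sketched as below. The function name, the representation of the training sample set as a list of (sample, label) pairs, and the argument names are assumptions made for the example.

```python
def update_training_set(training_set, sample, predicted_label, selected_label):
    """Append the test sample with a label chosen from the user's feedback.

    If the classifier's prediction matches the button the user selected, the
    sample is stored under the classification result; otherwise it is stored
    under the category the user actually selected. Returns True when the
    classification result was correct.
    """
    correct = predicted_label == selected_label
    label = predicted_label if correct else selected_label
    training_set.append((sample, label))
    return correct
```

	In either branch the sample ends up labeled with the category the user confirmed, so the training sample set grows with each input.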

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art made of record and not relied upon includes:
	Hong (US PG Pub 20030233233) discloses methods and systems for recognizing speech that include receiving information reflecting the speech, determining at least one broad-class of the received information, classifying the received information based on the determined broad-class, selecting a model based on the classification of the received information, and recognizing the speech using the selected model and the received information (Hong; Abstract).
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139. The examiner can normally be reached Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RODRIGO A CHAVEZ/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658