DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the amendment filed September 14, 2021.  Claims 1, 3, 5, 9, and 11-12 have been amended.  Claim 4 has been cancelled.  Claims 1-3 and 5-18 are pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on September 14, 2021, is being considered by the examiner.

Claim Rejections - 35 USC § 101
Claims 1 and 9 have been amended to incorporate features of claims 3 and 4. The rejections under 35 U.S.C. 101 are withdrawn.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3 and 5-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US Patent Application Publication No. 2020/0286465), hereinafter Wang, in view of Abramson et al. (US Patent Application Publication No. 2016/0216130), hereinafter Abramson.
Regarding claims 1 and 9, Wang teaches a method, system, and apparatus [para 0195-0196 – where the elements of the device/apparatus/system provide the claimed circuitry] comprising:
receiving a first audio signal [para 0031-0034 – obtaining first speech segments based on a to-be-recognized speech signal; para 0196 – sound capturing apparatus];
identifying a first portion of the first audio signal [para 0057 – where the identified portion is the second speech segment that corresponds to the first 61 segments];
determining, for the first portion of the first audio signal, a first probability that the first portion comprises speech directed by a user of the speech processing engine as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold – where having the system initiate additional processing once keywords have been determined is an obvious step requiring only routine skill in the art], wherein the first probability is determined based on a speech characteristic of the first portion of the first audio signal and a first sensor data [para 0041-0048 -- the keyword probabilities are processed by determining, using a trained neural network model, whether the speech segments correspond to the pre-determined word segmentation units based on the acoustic characteristics of the speech segments, where the prediction characteristics are input to the model and the acoustic characteristic of the unit frame may include any one of, or a combination of any two or more of, a Mel spectrum, a logarithmic Mel spectrum (obtained by performing a logarithmic operation on the Mel spectrum), a Mel frequency cepstrum coefficient (MFCC), and the like – where implementing a sensor to acquire speech data is an obvious step requiring only routine skill in the art; para 0137-0138 -- the word segmentation and processing can be based on the phonemes of the words, where a phoneme represents a form of a speech characteristic];
in accordance with a determination that the first probability exceeds a first threshold, presenting the first portion of the first audio signal as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0097 – valid state – where having the system initiate additional processing upon a valid keyword determination is an obvious step requiring only routine skill in the art];
and in accordance with a determination that the first probability does not exceed the first threshold, forgoing presenting the first portion of the first audio signal as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0098-0102 – invalid state, in which the system returns to the operation of determining a current to-be-determined speech segment].
Wang fails to specifically teach that the sound capturing apparatus is a microphone.  However, utilizing a microphone to capture speech signals was well known in the art of speech signal processing.  
Wang teaches that the system may be implemented with a wearable device or a mobile device [para 0031]. Wang fails to teach that the wearable device is a wearable head unit. Abramson teaches an enhanced navigation instruction system that processes audio to determine whether the received audio is from the same or a similar source or from different sources [Figs. 2A, 2C, 2D, 3; para 0466, 0690], provides sensors and microphones to detect inputs [para 0179], and teaches that the system can be implemented in a head wearable device [para 1456]. Abramson suggests the system can be effectively employed in practically any scenario where the determination and/or identification of a user or usage of a mobile device is of value, such as in the context of exercising or game playing [para 1466].
Regarding claims 2 and 10, Wang teaches that the first probability [para 0057-0070] is determined based on a comparison of the first portion of the first audio signal to a plurality of audio signals in a database, each audio signal of the plurality of audio signals associated with a probability that its respective audio signal comprises speech directed as input to a speech processing engine [para 0121 – specialized corpus of predetermined keywords].
Regarding claims 3 and 11, Wang teaches receiving first sensor data from a first sensor associated with the user [para 0031 – Wang teaches a wearable device, where providing sensors on a wearable device is an obvious step, and one having ordinary skill in the art before the effective filing date of the claimed invention would have been able to implement a sensor on a wearable device to receive information on/from the user, for the purpose of receiving additional information about the user that can be used to determine the context or intent of the user, so as to ensure the system can adequately perform the user's desired requests]; wherein the first probability [para 0057-0070] is determined based on a comparison of the first sensor data to a plurality of sensor data in a database, each sensor data of the plurality of sensor data in the database associated with an audio signal and further associated with a probability that its respective audio signal comprises speech directed as input to a speech processing engine [para 0121 – specialized corpus of predetermined keywords].
Regarding claims 5 and 12, the combination of Wang and Abramson teaches that the first sensor data is indicative of one or more of a position, orientation, eye movement, eye gaze target, or vital sign of the user [Abramson para 0666 – gaze detection used in authenticating the device, where one having ordinary skill in the art would have recognized the advantages of implementing the wearable-device keyword recognition processing suggested by Wang as a head wearable unit utilizing eye gaze detection, as suggested by Abramson, to allow the authorized user to utilize speech recognition processing in a hands-busy, eyes-busy environment, thereby ensuring the user operates devices with minimal distraction].
Regarding claims 6 and 13, the combination of Wang and Abramson teaches wherein the first portion of the first audio signal is identified based on the first sensor data, where utilizing gaze detection to determine and verify the user's focus, so as to ensure the system correctly operates on the user's input, is an obvious step requiring only routine skill in the art. Therefore, one having ordinary skill in the art before the effective filing date of the claimed invention would have recognized the advantages of utilizing the gaze detection processing to determine the user's intent, as was known, for the purpose of resolving an ambiguity in the user's input and thereby ensuring the system correctly processes the user input.
Regarding claims 7 and 14, Wang teaches identifying a second portion of the first audio signal [para 0085-0094; 0124-0136]; determining, for the second portion of the first audio signal, a second probability that the second portion comprises speech directed by the user as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0085-0094; para 0124-0136 – additional speech segments are determined and processed; para 0098-0102 – invalid state, in which the system returns to the operation of determining a current to-be-determined speech segment]; in accordance with a determination that the second probability exceeds the first threshold, presenting the second portion of the first audio signal as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0085-0094; para 0124-0136 – additional speech segments are determined and processed; para 0097 – valid state – where having the system initiate additional processing upon a valid keyword determination is an obvious step requiring only routine skill in the art]; and in accordance with a determination that the second probability does not exceed the first threshold, forgoing presenting the second portion of the first audio signal as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0085-0094; para 0124-0136 – additional speech segments are determined and processed; para 0098-0102 – invalid state, in which the system returns to the operation of determining a current to-be-determined speech segment].
Regarding claims 8 and 15, Wang teaches the second probability is determined based at least in part on the first probability [para 0159].
Regarding claim 16, Wang teaches that the system may be implemented with a wearable device [para 0031]. Wang fails to teach that the wearable device is a wearable head unit including the microphone and circuitry. Abramson teaches an enhanced navigation instruction system that processes audio to determine whether the received audio is from the same or a similar source or from different sources [Figs. 2A, 2C, 2D, 3; para 0466, 0690], provides sensors and microphones to detect inputs [para 0179], and teaches that the system can be implemented in a head wearable device [para 1456]. Abramson suggests the system can be effectively employed in practically any scenario where the determination and/or identification of a user or usage of a mobile device is of value, such as in the context of exercising or game playing [para 1466]. One having ordinary skill in the art would have recognized the advantages of implementing the wearable-device keyword recognition processing suggested by Wang as a head wearable unit, as suggested by Abramson, to allow the user to utilize speech recognition
Regarding claim 17, Wang teaches that the system may be implemented in a plurality of devices, but fails to specifically teach a vehicle including the microphone and circuitry. However, providing speech processing in a vehicle so as to allow the driver to operate features of the vehicle while driving is an obvious step requiring only routine skill in the art. Therefore, one having ordinary skill in the art before the effective filing date of the claimed invention would have recognized the advantages of implementing the keyword recognition processing suggested by Wang in a vehicle environment, to give the driver the capability to operate features of the vehicle while driving and thereby minimize driver distraction and maintain driver safety.
Regarding claim 18, Wang teaches the system includes an electronic voice assistant [para 0031 – personal digital assistant].

Response to Arguments
Applicant's arguments filed September 14, 2021, have been fully considered but they are not persuasive. 
Applicant argues that, because Wang omits such sensors, Wang fails to teach or suggest how the inputs of such sensors would be used. The Examiner notes that Wang is directed to recognition of speech uttered by a user, and the ability to receive the speech uttered by the user necessarily requires some form of sensor, microphone, or other input if the speech is to be processed and recognized.
Applicant argues that Wang is directed to identifying keywords from speech and that Wang fails to teach or suggest wherein the first probability is determined based on a speech characteristic of the first portion of the first audio signal and the first sensor data, as claimed. The Examiner respectfully disagrees. Wang teaches [para 0041-0048] that the keyword probabilities are processed by determining, using a trained neural network model, whether the speech segments correspond to the pre-determined word segmentation units based on the acoustic characteristics of the speech segments, where the prediction characteristics are input to the model and the acoustic characteristic of the unit frame may include any one of, or a combination of any two or more of, a Mel spectrum, a logarithmic Mel spectrum (obtained by performing a logarithmic operation on the Mel spectrum), a Mel frequency cepstrum coefficient (MFCC), and the like. Wang further teaches [para 0137-0138] that the word segmentation and processing can be based on the phonemes of the words, where a phoneme represents a form of a speech characteristic.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Hayakawa (US Patent Application Publication No. 2017/0148429) teaches keyword probability detection using phoneme Hidden Markov Models.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 



Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598.  The examiner can normally be reached on M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached on 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.



ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659