DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on January 5, 2022 has been entered.
Claims 1, 3, 9, and 10-12 have been amended.  Claims 19-20 have been added.  Claims 1-3 and 5-20 are pending.
 
Information Disclosure Statement
The information disclosure statements (IDS) submitted on October 4, 2021; January 5, 2022; January 26, 2022; February 3, 2022; and February 22, 2022 are being considered by the examiner.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 5-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US Patent Application Publication No. 2020/0286465), hereinafter Wang, in view of Poulos et al. (US Patent Application Publication No. 2018/0011534), hereinafter Poulos.
Regarding claims 1 and 9, Wang teaches methods, systems, and apparatus [para 0195-0196 – where the elements of the device/apparatus/system provide for circuitry];
receiving a first audio signal [para 0031-0034 – obtain first speech segments based on a to-be-recognized speech signal; para 0196 – sound capturing apparatus];
 identifying a first portion of the first audio signal [para 0057 –  where the identified portion is the second speech segment that corresponds to the first 61 segments]; 
determining, for the first portion of the first audio signal, a first probability that the first portion comprises speech directed by a user as input to a speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold – where initiating additional processing once keywords have been determined is an obvious step requiring only routine skill in the art], wherein the first probability is determined based on a speech characteristic of the first portion of the first audio signal and a first sensor data [para 0041-0048 -- the keyword probabilities are processed by determining whether the speech segments correspond to the pre-determined word segmentation units based on the acoustic characteristics of the speech segments by using a trained neural network model, where the prediction characteristics are input to the model and the acoustic characteristic of the unit frame may include any one of or a combination of any two or more of a Mel spectrum, a logarithmic Mel spectrum (obtained by performing a logarithmic operation on the Mel spectrum), a Mel frequency cepstrum coefficient (MFCC) and the like – where implementing a sensor to acquire speech data is an obvious step requiring only routine skill in the art; para 0137-0138 -- the word segmentation and processing can be based on the phonemes of the words, where a phoneme represents a form of a speech characteristic];
in accordance with a determination that the first probability exceeds a first threshold, presenting the first portion of the first audio signal as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0097 – valid state – where initiating additional processing upon a valid determination of keywords is an obvious step requiring only routine skill in the art];
and in accordance with a determination that the first probability does not exceed the first threshold, forgoing presenting the first portion of the first audio signal as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0098-0102 – invalid state and the system returns to the operation of determining a current to-be-determined speech segment].
Wang teaches the system is implemented with a wearable device [para 0031] and mobile device [para 0031].  
Wang fails to specifically teach that the sound capturing apparatus is a microphone or the wearable device is a wearable head unit with a first sensor different from a microphone.  Poulos teaches a system for context-aware augmented reality object commands, and provides for a head mounted device [104; Fig 2]; one or more microphones for the user to enter voice commands [para 0020]; and provides for sensors for user gaze detection [para 0025; 0030-0031] and user motions and movements [para 0027; 0037-0039].  Poulos teaches the system presents commands that are currently available to the user [para 0014] so as to alleviate a user having to learn a relatively large and complex set of interactions to navigate the augmented reality experience [para 0002].
One having ordinary skill in the art would have recognized the advantages of implementing the wearable device keyword recognition processing suggested by Wang as a head wearable unit with additional sensors providing information to the system, as suggested by Poulos, to allow the user to utilize speech recognition processing in a hands-busy, eyes-busy environment, to ensure the user operates devices with minimal distractions, and to provide the user with information regarding currently available commands to enhance the user’s experience with the system.
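As a minimal sketch of the present/forgo threshold gating recited in claims 1 and 9, the logic may be expressed as follows. The names `gate_portion` and `toy_engine`, the sample values, and the threshold of 0.5 are hypothetical and chosen only for illustration; none of them is drawn from Wang, Poulos, or the application.

```python
from typing import Callable, List, Optional

# Hypothetical names throughout; not taken from Wang, Poulos, or the claims.


def gate_portion(portion: List[float],
                 probability: float,
                 threshold: float,
                 engine: Callable[[List[float]], str]) -> Optional[str]:
    """Present the portion to the engine only if its probability exceeds the threshold."""
    if probability > threshold:
        return engine(portion)  # presenting the portion as input to the engine
    return None                 # forgoing presenting the portion


def toy_engine(samples: List[float]) -> str:
    # Stand-in for a speech processing engine; it only reports the input size.
    return f"processed {len(samples)} samples"


if __name__ == "__main__":
    print(gate_portion([0.1, 0.2, 0.3], 0.9, 0.5, toy_engine))
    print(gate_portion([0.1, 0.2, 0.3], 0.2, 0.5, toy_engine))
```

Under this sketch a probability equal to the threshold does not "exceed" it, so the portion is not presented.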
Regarding claims 2 and 10, the combination of Wang and Poulos teaches the first probability [Wang para 0057-0070] is determined based on a comparison of the first portion of the first audio signal to a plurality of audio signals in a database, each audio signal of the plurality of audio signals associated with a probability that its respective audio signal comprises speech directed as input to a speech processing engine [Wang para 0121 – specialized corpus of predetermined keywords].
Regarding claims 3 and 11, the combination of Wang and Poulos teaches receiving first sensor data from a first sensor associated with the user [para 0031 – Wang teaches a wearable device, where providing sensors on wearable device is an obvious step, and one having ordinary skill in the art at the time of the invention would have been able to implement a sensor on a wearable device to receive information on/from the user, for the purpose of receiving additional information based on the user that can be used to determine context or intent of the user, so as to ensure the system can adequately perform the user’s desired requests];  wherein the first probability [Wang para 0057-0070] is determined based on a comparison of the first sensor data [Poulos’ sensors for user gaze detection at para 0025; 0030-0031 and user motions and movements at para 0027; 0037-0039] to a plurality of sensor data in a database, each sensor data of the plurality of sensor data in the database associated with an audio signal and further associated with a probability that its respective audio signal comprises speech directed as input to a speech processing engine [Wang para 0121 – specialized corpus of predetermined keywords].
Regarding claims 5 and 12, the combination of Wang and Poulos teaches the first sensor data is indicative of one or more of a position of the user, orientation of the user, eye movement of the user, eye gaze target of the user, or vital sign of the user [Poulos’ sensors for user gaze detection at para 0025; 0030-0031 and user motions and movements at para 0027; 0037-0039, where one having ordinary skill in the art would have recognized the advantages of implementing a head wearable unit with additional sensors for eye gaze detection and user motions/movement, as suggested by Poulos, to provide the user with information regarding currently available commands to enhance the user’s experience with the system and alleviate a user having to learn a relatively large and complex set of interactions to navigate the area, as suggested by Poulos at para 0002].
Regarding claims 6 and 13, the combination of Wang and Poulos teaches wherein the first portion of the first audio signal is identified based on the first sensor data [Poulos’ eye gaze detection processing at para 0025; 0030-0031, where utilizing the gaze detection to determine and verify the user’s focus to ensure the system can correctly operate on the user’s input is an obvious step requiring only routine skill in the art].  One having ordinary skill in the art at the time of the invention would have recognized the advantages of utilizing the gaze detection processing to determine the user’s intent, as suggested by Poulos, for the purpose of resolving an ambiguity from the user’s input and thereby ensuring the system is correctly processing the user input.
Regarding claims 7 and 14, Wang teaches identifying a second portion of the first audio signal [para 0085-0094; 0124-0136];
determining, for the second portion of the first audio signal, a second probability that the second portion comprises speech directed by the user as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0085-0094; para 0124-0136 – additional speech segments are determined and processed; para 0098-0102 – invalid state and the system returns to the operation of determining a current to-be-determined speech segment];
in accordance with a determination that the second probability exceeds the first threshold, presenting the second portion of the first audio signal as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0085-0094; para 0124-0136 – additional speech segments are determined and processed; para 0097 – valid state – where initiating additional processing upon a valid determination of keywords is an obvious step requiring only routine skill in the art];
and in accordance with a determination that the second probability does not exceed the first threshold, forgoing presenting the second portion of the first audio signal as input to the speech processing engine [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0085-0094; para 0124-0136 – additional speech segments are determined and processed; para 0098-0102 – invalid state and the system returns to the operation of determining a current to-be-determined speech segment].
Regarding claims 8 and 15, Wang teaches the second probability is determined based at least in part on the first probability [para 0159].
Regarding claim 16, Wang teaches the system is implemented with a wearable device [para 0031].  Wang fails to teach the wearable device is a wearable head unit including the microphone and circuitry.  Poulos provides for a head mounted device [104; Fig 2]; one or more microphones for the user to enter voice commands [para 0020]; and provides for sensors for user gaze detection [para 0025; 0030-0031] and user motions and movements [para 0027; 0037-0039].  Poulos teaches the system presents commands that are currently available to the user [para 0014] so as to alleviate a user having to learn a relatively large and complex set of interactions to navigate the augmented reality experience [para 0002].
One having ordinary skill in the art would have recognized the advantages of implementing the wearable device keyword recognition processing suggested by Wang as a head wearable unit with additional sensors providing information to the system, as suggested by Poulos, to allow the user to utilize speech recognition processing in a hands-busy, eyes-busy environment, to ensure the user operates devices with minimal distractions, and to provide the user with information regarding currently available commands to enhance the user’s experience with the system.
Regarding claim 17, Wang teaches the system is implemented in a plurality of devices, but fails to specifically teach a vehicle including the microphone and circuitry.  However, providing speech processing in a vehicle so as to allow the driver to operate features of the vehicle while driving is an obvious step requiring only routine skill in the art.  Therefore, one having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the device keyword recognition processing suggested by Wang in a vehicle environment to allow the driver to operate features of the vehicle while driving, and thereby minimize driver distraction and maintain driver safety.
Regarding claim 18, Wang teaches the system includes an electronic voice assistant [para 0031 – personal digital assistant].
Regarding claim 19, the combination of Wang and Poulos teaches generating a second audio signal, wherein:
generating the second audio signal comprises, in accordance with a determination that the first probability exceeds a first threshold, including the first portion of the first audio signal in the second audio signal [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0097 – valid state – where initiating additional processing upon a valid determination of keywords is an obvious step requiring only routine skill in the art];
and generating the second audio signal further comprises, in accordance with a determination that the first probability does not exceed the first threshold, forgoing including the first portion of the first audio signal in the second audio signal [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0098-0102 – invalid state and the system returns to the operation of determining a current to-be-determined speech segment];
generating the second audio signal comprises, in accordance with a determination that the second probability exceeds the first threshold, including the second portion of the first audio signal in the second audio signal [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0097 – valid state – where initiating additional processing upon a valid determination of keywords is an obvious step requiring only routine skill in the art];
and generating the second audio signal further comprises, in accordance with a determination that the second probability does not exceed the first threshold, forgoing including the second portion of the first audio signal in the second audio signal [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0098-0102 – invalid state and the system returns to the operation of determining a current to-be-determined speech segment].
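As a minimal sketch of the include/forgo logic recited in claim 19, the second audio signal may be modeled as the ordered list of portions whose probability exceeds the threshold. The name `generate_second_signal`, the `Portion` alias, and the representation of a portion as a list of samples are hypothetical and are not drawn from Wang, Poulos, or the application.

```python
from typing import List, Sequence, Tuple

# Hypothetical names; not taken from Wang, Poulos, or the claims.
Portion = List[float]


def generate_second_signal(scored_portions: Sequence[Tuple[Portion, float]],
                           threshold: float) -> List[Portion]:
    """Return, in order, the portions whose probability exceeds the threshold."""
    included: List[Portion] = []
    for samples, probability in scored_portions:
        if probability > threshold:
            included.append(samples)  # including the portion in the second signal
        # otherwise: forgoing including the portion
    return included


if __name__ == "__main__":
    first = ([0.1, 0.2], 0.9)   # first portion with its first probability
    second = ([0.3], 0.4)       # second portion with its second probability
    print(generate_second_signal([first, second], threshold=0.5))
```

Each portion is evaluated independently against the same threshold, which mirrors the four limitations of the claim.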

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Poulos as applied to claim 19 above, and further in view of Rao (US Patent Application Publication No. 2014/0222430).
Regarding claim 20, the combination of Wang and Poulos teaches a determination that the first probability exceeds the first threshold and that the second probability exceeds the first threshold [para 0057-0070 -- comparing the second probabilities with the pre-determined probability threshold may be specifically determining whether the probabilities that the second speech segments correspond to the pre-determined keyword and the probabilities that the second speech segments do not correspond to the pre-determined keyword that are included in the second probabilities are greater than the corresponding pre-determined probability threshold; para 0097 – valid state – where initiating additional processing upon a valid determination of keywords is an obvious step requiring only routine skill in the art].  The combination of Wang and Poulos fails to teach concatenating the first portion of the first audio signal and the second portion of the first audio signal.  Rao teaches a system and method for multimodal utterance detection that detects one or more segments of desired speech utterances from an audio stream and combines candidate speech segments to form a desired speech signal [para 0045].  Rao teaches the system is advantageous in providing accurate, reliable, and robust utterance detection [para 0006-0008].  One having ordinary skill in the art at the time of the invention would have recognized the advantages of implementing the speech segment combination processing suggested by Rao in the Wang/Poulos system, for the purpose of providing desired speech signals, so as to provide accurate, reliable, and robust utterance detection, as suggested by Rao.
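As a minimal sketch of the concatenation condition discussed for claim 20, the two portions are joined only when both probabilities exceed the same threshold. The name `concatenate_if_both_exceed` is hypothetical, and returning `None` when either probability fails the threshold is an assumption made for illustration; the sketch does not reproduce the segment combination processing of Rao.

```python
from typing import List, Optional

# Hypothetical name and return convention; not taken from Rao, Wang, Poulos, or the claims.


def concatenate_if_both_exceed(first_portion: List[float],
                               first_probability: float,
                               second_portion: List[float],
                               second_probability: float,
                               threshold: float) -> Optional[List[float]]:
    """Concatenate the two portions only if both probabilities exceed the threshold."""
    if first_probability > threshold and second_probability > threshold:
        return first_portion + second_portion  # first portion followed by second
    return None  # assumption: no concatenated signal is produced otherwise


if __name__ == "__main__":
    print(concatenate_if_both_exceed([1.0, 2.0], 0.9, [3.0], 0.8, 0.5))
    print(concatenate_if_both_exceed([1.0, 2.0], 0.9, [3.0], 0.3, 0.5))
```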


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELA A ARMSTRONG whose telephone number is (571)272-7598.  The examiner can normally be reached on M,T,TH,F 11:30-8:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


ANGELA A. ARMSTRONG
Primary Examiner
Art Unit 2659



/ANGELA A ARMSTRONG/Primary Examiner, Art Unit 2659