DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on June 22, 2022. Claims 1-21 are pending and have been examined. 
All previous objections/rejections not mentioned in this Office action have been withdrawn by the examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) were submitted on October 28, 2019 and November 11, 2019. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Response to Arguments and Amendments

The amendment filed on June 22, 2022 has been entered. Claims 1-21 remain pending in the application. Applicant’s amendments to the claims have overcome the 35 U.S.C. 101 rejection set forth in the previous office action.
Applicant has amended claims 1, 8, and 16 to add the limitation “wherein the start time is a time before the key-phrase within the microphone signal”. As discussed in the interview, Applicant’s amendments to the claims have overcome the 35 U.S.C. 103 rejection set forth in the previous Office action.
Applicant’s arguments with respect to the 35 U.S.C. 103 rejections of claims 1-21 have been considered but are moot because the arguments are directed to the amended claim language, which is addressed under the new grounds of rejection set forth below.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 8-9, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Dusan (U.S. Publication No. 20180324518) in view of Czyryba (U.S. Publication No. 20190221205).
Regarding claim 1, Dusan discloses a signal processing method performed by a programmed processor of a headphone that includes an accelerometer and a microphone ([0005] - The processor can generate an ASR trigger signal based on a combination of the acoustic signal and the non-acoustic signal. [0028] - In an aspect, an ASR triggering system and a method of generating an ASR trigger signal uses non acoustic data generated by an accelerometer in an earphone or headset), the method comprising: 
receiving 1) an accelerometer signal from the accelerometer (Figure 1 - 108) ([0006] – an ASR triggering system includes an accelerometer to generate a non-acoustic signal corresponding to an input command pattern made by a user) in the headphone (102) ([0025] – The ASR triggering system may also include an accelerometer mounted on headphones) and 2) a microphone signal from the microphone (114) ([0004] - the ASR triggering system may include a microphone (114) to generate an acoustic signal representing an acoustic vibration) in the headphone (102);
detecting a key-phrase within the microphone signal ([0002] – Voice assistants can be triggered by an always-on-processor (AOP) based on voice data generated by a microphone. For example, the AOP may recognize a key-phrase represented by the voice data);
generating a voice activity detection (VAD) signal based on the accelerometer signal ([0004] - a voice activity detector (VAD) may receive the non-acoustic signal from the accelerometer and generate a VAD signal based on energy or a cross-correlation value);
However, Dusan does not disclose determining a start time and an end time for the key-phrase using the microphone signal, wherein the start time is a time before the key-phrase within the microphone signal;
determining whether a portion of the VAD signal that spans between the start time and the end time indicates that the key-phrase is spoken by the wearer of the headphone; 
and responsive to determining that the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone, triggering a virtual personal assistant (VPA).
Czyryba teaches determining a start time and an end time for the key-phrase using the microphone signal, wherein the start time is a time before the key-phrase within the microphone signal ([0158] - updating a start state based rejection model and a keyphrase model associated with a predetermined keyphrase based on at least some of the time series of scores of sub-phonetic units, wherein both the rejection model and keyphrase model have states interconnected by transitions; propagating score-related values from the rejection model and through the keyphrase model via the transitions and comprising propagating the values through a series of consecutive silence states to intentionally add silence before or after or both at least part of a spoken keyphrase);
determining whether a portion of the VAD signal that spans between the start time and the end time indicates that the key-phrase is spoken by the wearer of the headphone ([0158] - and make a keyphrase detection determination depending on a key phrase detection likelihood score computed by using the keyphrase model);
and responsive to determining that the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone, triggering a virtual personal assistant (VPA) ([0001] - activates a particular computer program such as a personal assistant (PA) application).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan to incorporate the teachings of Czyryba in order to implement the method of determining a start time and an end time for the key-phrase using the microphone signal, wherein the start time is a time before the key-phrase within the microphone signal; determining whether a portion of the VAD signal that spans between the start time and the end time indicates that the key-phrase is spoken by the wearer of the headphone; and responsive to determining that the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone, triggering a virtual personal assistant (VPA). Doing so reduces the number of false wakes and thereby increases the accuracy of the speech recognition (Czyryba [0044]).
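For illustration only, the gating step recited in claim 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not code taken from Dusan, Czyryba, or the application: the VAD signal is assumed to be a list of per-frame 0/1 decisions at a hypothetical 100 Hz frame rate, the key-phrase onset and end times are assumed to come from a separate microphone-based key-phrase detector, and the 0.2 s pre-roll, the count-based decision rule, and the names `vad_window` and `gate_vpa_trigger` are all hypothetical.

```python
from typing import Callable, List

# Hypothetical frame rate: one binary VAD decision per 10 ms frame.
FRAME_RATE_HZ = 100


def vad_window(vad: List[int], phrase_onset_s: float, phrase_end_s: float,
               pre_roll_s: float = 0.2) -> List[int]:
    """Return the VAD frames spanning [start time, end time].

    The start time is set pre_roll_s seconds before the key-phrase onset
    reported by a (hypothetical) microphone-based key-phrase detector, so the
    window begins before the key-phrase within the microphone signal.
    """
    start_s = max(0.0, phrase_onset_s - pre_roll_s)
    start_idx = round(start_s * FRAME_RATE_HZ)
    end_idx = round(phrase_end_s * FRAME_RATE_HZ)
    return vad[start_idx:end_idx + 1]


def gate_vpa_trigger(vad: List[int], phrase_onset_s: float,
                     phrase_end_s: float, trigger_vpa: Callable[[], None],
                     min_high_frames: int = 30) -> bool:
    """Trigger the VPA only if the windowed VAD portion indicates that the
    wearer spoke the key-phrase (hypothetical rule: enough high frames)."""
    portion = vad_window(vad, phrase_onset_s, phrase_end_s)
    if sum(portion) >= min_high_frames:
        trigger_vpa()
        return True
    return False


# Usage: 1 s of silence, 0.6 s of wearer speech, then 0.4 s of silence.
vad = [0] * 100 + [1] * 60 + [0] * 40
fired = gate_vpa_trigger(vad, 1.0, 1.6, lambda: None)
```

The pre-roll is what places the start time before the key-phrase within the microphone signal. On the premise underlying Dusan, the wearer's own speech would be expected to raise the accelerometer-based VAD signal inside that window, whereas speech from a nearby talker picked up only by the microphone generally would not.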
Regarding claim 2, Dusan in view of Czyryba teaches all of the limitations as in claim 1, above. 
Dusan discloses the signal processing method, wherein the accelerometer signal and the microphone signal are continuously received while the headphone is worn by the wearer ([0008] - automatic speech recognition (ASR) system having an earphone worn in an ear canal of a user. [0058] - VAD 216 may output VAD signal 222 as a continuous series of high and low digital signals).
Regarding claim 8, Dusan discloses an audio system ([0004] - automatic speech recognition (ASR) triggering system) comprising: 
a headphone (102) ([0025] – The ASR triggering system may also include an accelerometer mounted on headphones) having an accelerometer and at least one microphone integrated therein ([0004] - the ASR triggering system may include a microphone to generate an acoustic signal representing an acoustic vibration. [0006] – an ASR triggering system includes an accelerometer to generate a non-acoustic signal corresponding to an input command pattern made by a user);
at least one processor (Figure 2 – 214);
and memory having stored therein instructions which when executed by the at least one processor cause the system to ([0092] - Memory may include a main memory having computer usable volatile memory, e.g., random access memory (RAM), coupled to bus for storing information and instructions for processor (s)) 
receive 1) an accelerometer signal from the accelerometer ([0040] - ASR triggering system may include a voice activity detector (VAD) to receive non-acoustic signal. In an embodiment, non-acoustic signal includes an accelerometer signal from accelerometer) and 2) a microphone signal from the at least one microphone ([0004] - the ASR triggering system may include a microphone to generate an acoustic signal representing an acoustic vibration);
detect a key-phrase within the microphone signal ([0002] - Voice assistants can be triggered by an always on-processor (AOP) based on voice data generated by a microphone. For example, the AOP may recognize a key phrase represented by the voice data, and generate a trigger signal to activate speech recognition of a payload of the voice data); 
generate a voice activity detection (VAD) signal based on the accelerometer signal ([0004] - Similarly, a voice activity detector (VAD) may receive the non-acoustic signal from the accelerometer and generate a VAD signal based on energy or a cross-correlation value);
However, Dusan does not disclose a system configured to determine a start time and an end time for the key-phrase using the microphone signal, wherein the start time is a time before the key-phrase within the microphone signal;
determine whether a portion of the VAD signal that spans between the start time and the end time indicates that the key-phrase is spoken by the wearer of the headphone; 
and responsive to a determination that the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone, triggering a virtual personal assistant (VPA).
Czyryba teaches a system configured to determine a start time and an end time for the key-phrase using the microphone signal, wherein the start time is a time before the key-phrase within the microphone signal ([0158] - updating a start state based rejection model and a keyphrase model associated with a predetermined keyphrase based on at least some of the time series of scores of sub-phonetic units, wherein both the rejection model and keyphrase model have states interconnected by transitions; propagating score-related values from the rejection model and through the keyphrase model via the transitions and comprising propagating the values through a series of consecutive silence states to intentionally add silence before or after or both at least part of a spoken keyphrase);
determine whether a portion of the VAD signal that spans between the start time and the end time indicates that the key-phrase is spoken by the wearer of the headphone ([0158] - and make a keyphrase detection determination depending on a key phrase detection likelihood score computed by using the keyphrase model);
and responsive to a determination that the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone, triggering a virtual personal assistant (VPA) ([0001] - activates a particular computer program such as a personal assistant (PA) application).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan to incorporate the teachings of Czyryba in order to configure the system to determine a start time and an end time for the key-phrase using the microphone signal, wherein the start time is a time before the key-phrase within the microphone signal; determine whether a portion of the VAD signal that spans between the start time and the end time indicates that the key-phrase is spoken by the wearer of the headphone; and responsive to a determination that the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone, triggering a virtual personal assistant (VPA). Doing so reduces the number of false wakes and thereby increases the accuracy of the speech recognition (Czyryba [0044]).
Regarding claim 9, Dusan in view of Czyryba teaches all of the limitations as in claim 8, above. 
Dusan discloses the audio system, wherein the accelerometer signal and the microphone signal are continuously received while the headphone is worn by the wearer ([0008] - automatic speech recognition (ASR) system having an earphone worn in an ear canal of a user. [0058] - VAD 216 may output VAD signal 222 as a continuous series of high and low digital signals).
Regarding claim 16, Dusan discloses an article of manufacture comprising a non-transitory machine-readable storage medium having instructions stored therein that when executed by a processor of an audio system having a headphone (Figure 2 – 214, Figure 17 – Non-Transitory Machine Readable Storage Medium 1712, [0025] – The ASR triggering system may also include an accelerometer mounted on headphones. [0092] - Memory may include a main memory having computer usable volatile memory, e.g., random access memory (RAM), coupled to bus for storing information and instructions for processor (s)) cause the audio system to
receive 1) an accelerometer signal from the accelerometer ([0040] - ASR triggering system may include a voice activity detector (VAD) to receive non-acoustic signal. In an embodiment, non-acoustic signal includes an accelerometer signal from accelerometer) and 2) a microphone signal from the at least one microphone ([0004] - the ASR triggering system may include a microphone to generate an acoustic signal representing an acoustic vibration);
detect a key-phrase within the microphone signal ([0002] - Voice assistants can be triggered by an always on-processor (AOP) based on voice data generated by a microphone. For example, the AOP may recognize a key phrase represented by the voice data, and generate a trigger signal to activate speech recognition of a payload of the voice data); 
generate a voice activity detection (VAD) signal based on the accelerometer signal ([0004] - Similarly, a voice activity detector (VAD) may receive the non-acoustic signal from the accelerometer and generate a VAD signal based on energy or a cross-correlation value);
However, Dusan does not disclose instructions to determine a start time and an end time for the key-phrase using the microphone signal, wherein the start time is a time before the key-phrase within the microphone signal;
determine whether a portion of the VAD signal that spans between the start time and the end time indicates that the key-phrase is spoken by the wearer of the headphone; 
and responsive to a determination that the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone, triggering a virtual personal assistant (VPA).
Czyryba teaches instructions to determine a start time and an end time for the key-phrase using the microphone signal, wherein the start time is a time before the key-phrase within the microphone signal ([0158] - updating a start state based rejection model and a keyphrase model associated with a predetermined keyphrase based on at least some of the time series of scores of sub-phonetic units, wherein both the rejection model and keyphrase model have states interconnected by transitions; propagating score-related values from the rejection model and through the keyphrase model via the transitions and comprising propagating the values through a series of consecutive silence states to intentionally add silence before or after or both at least part of a spoken keyphrase);
determine whether a portion of the VAD signal that spans between the start time and the end time indicates that the key-phrase is spoken by the wearer of the headphone ([0158] - and make a keyphrase detection determination depending on a key phrase detection likelihood score computed by using the keyphrase model);
and responsive to a determination that the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone, triggering a virtual personal assistant (VPA) ([0001] - activates a particular computer program such as a personal assistant (PA) application).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan to incorporate the teachings of Czyryba in order to implement instructions to determine a start time and an end time for the key-phrase using the microphone signal, wherein the start time is a time before the key-phrase within the microphone signal; determine whether a portion of the VAD signal that spans between the start time and the end time indicates that the key-phrase is spoken by the wearer of the headphone; and responsive to a determination that the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone, triggering a virtual personal assistant (VPA). Doing so reduces the number of false wakes and thereby increases the accuracy of the speech recognition (Czyryba [0044]).
Regarding claim 17, Dusan in view of Czyryba teaches all of the limitations as in claim 16, above. 
Dusan discloses the article of manufacture, wherein the accelerometer signal and the microphone signal are continuously received while the headphone is worn by the wearer ([0008] - automatic speech recognition (ASR) system having an earphone worn in an ear canal of a user. [0058] - VAD 216 may output VAD signal 222 as a continuous series of high and low digital signals).
Claims 3-5, 10-12, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Dusan (U.S. Publication No. 20180324518) in view of Czyryba (U.S. Publication No. 20190221205) and further in view of Gomes (U.S. Publication No. 20200160858).
Regarding claim 3, Dusan in view of Czyryba teaches all of the limitations as in claim 1, above. 
However, Dusan in view of Czyryba does not teach a signal processing method wherein determining whether the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone comprises determining whether a VAD score that is based on the portion of the VAD signal is above a VAD score threshold, wherein the VAD signal indicates that the key-phrase is spoken by the wearer when the VAD score is above the VAD score threshold.
Gomes teaches a signal processing method ([0004] - Aspects provide methods and apparatus for improving the accuracy and speed of identifying and validating a WUW by an audio device. As described herein, an audio device combines WUW identification or detection with inputs received from one or more of on-head detection, voice activity detection (VAD), or other sensors) wherein determining whether the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone comprises determining whether a VAD score that is based on the portion of the VAD signal is above a VAD score threshold, wherein the VAD signal indicates that the key-phrase is spoken by the wearer when the VAD score is above the VAD score threshold ([0018] - According to aspects, the method further comprises, when the user is wearing the audio device, maintaining in a low power mode at least one component in the audio device used to detect at least one of the trigger word or whether the sound was generated by the user speaking, detecting a sound energy of the sound exceeds a configured threshold, and powering up the at least one component in response to detecting that the sound energy of the sound exceeds the threshold).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba to incorporate the teachings of Gomes in order to generate a VAD score threshold as part of the voice activation process involving a key-phrase. Doing so allows the method to determine whether the signal is above or below the threshold, which can then be used to save power (Gomes [0010]).
Regarding claim 4, Dusan in view of Czyryba and further in view of Gomes teaches all of the limitations as in claim 3, above.
However, Dusan in view of Czyryba does not teach a signal processing method wherein generating a VAD signal based on the accelerometer signal comprises 
determining whether an energy level of the accelerometer signal is above an energy threshold;
in response to determining that the energy level is above the energy threshold, the VAD signal is set to a high signal level; 
and in response to determining that the energy level is below the energy threshold, the VAD signal is set to a low signal level.
Gomes teaches a signal processing method ([0004] - Aspects provide methods and apparatus for improving the accuracy and speed of identifying and validating a WUW by an audio device. As described herein, an audio device combines WUW identification or detection with inputs received from one or more of on-head detection, voice activity detection (VAD), or other sensors) wherein generating a VAD signal based on the accelerometer signal comprises
determining whether an energy level of the accelerometer signal is above an energy threshold ([0009] - According to aspects, the method further comprises determining, based on the additional information, that the user is wearing the audio device, maintaining in a low power mode, at least one component in the audio device used to detect at least one of the trigger word or whether the sound was generated by the user speaking, detecting a sound energy of the sound exceeds a configured threshold, and powering up the at least one component in response to detecting that the sound energy of the sound exceeds the threshold. [0051] On-head detection determines if the user is wearing the audio device. Several methods to perform on-head detection are contemplated. [0052] - In an example, on-head detection is determined based on the output from one or more of an accelerometer, gyroscope, and magnetometer);
in response to determining that the energy level is above the energy threshold, the VAD signal is set to a high signal level ([0009] - powering up the at least one component in response to detecting that the sound energy of the sound exceeds the threshold);
and in response to determining that the energy level is below the energy threshold, the VAD signal is set to a low signal level ([0010] - in response, powering down at least one of the microphones to save power).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba to incorporate the teachings of Gomes in order to identify high and low VAD signals. Doing so allows the method to determine whether the signal is above or below the threshold, which can then be used to save power (Gomes [0010]).
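The energy-threshold VAD recited in claim 4 can be illustrated with the minimal Python sketch below. It assumes mean-square frame energy over non-overlapping frames of accelerometer samples and an arbitrary threshold; the names `frame_energy` and `accel_vad` are hypothetical and are not drawn from Gomes or Dusan. A frame whose energy exactly equals the threshold is treated as low here, since the claim language does not address that case.

```python
from typing import List, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean-square energy of one frame of accelerometer samples."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


def accel_vad(accel: Sequence[float], frame_len: int,
              energy_threshold: float) -> List[int]:
    """Binary VAD from accelerometer energy, one decision per
    non-overlapping frame (a trailing partial frame is dropped).

    Emits 1 (high signal level) when the frame energy is above the
    threshold and 0 (low signal level) otherwise.
    """
    vad: List[int] = []
    for start in range(0, len(accel) - frame_len + 1, frame_len):
        energy = frame_energy(accel[start:start + frame_len])
        vad.append(1 if energy > energy_threshold else 0)
    return vad


# Usage: a quiet frame followed by a frame with vibration energy.
print(accel_vad([0.0] * 4 + [1.0, -1.0, 1.0, -1.0],
                frame_len=4, energy_threshold=0.5))  # [0, 1]
```

In a real device the threshold would presumably be tuned to the accelerometer's noise floor; the 0.5 value above is purely illustrative.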
Regarding claim 5, Dusan in view of Czyryba and further in view of Gomes teaches all of the limitations as in claim 4, above.
Dusan discloses a signal processing method ([0028] - In an aspect, an ASR triggering system and a method of generating an ASR trigger signal uses non acoustic data generated by an accelerometer in an earphone or headset), wherein the portion of the VAD signal comprises a plurality of segments, each segment having either the high signal level or the low signal level, wherein generating the VAD score comprises averaging the plurality of segments to produce an average VAD score value as the VAD score ([0043] – Alternatively, VAD may generate the non-acoustic trigger signal based on an average of VAD signal over time. Thus, during a time frame when the cross-correlation value is mostly above the predetermined correlation threshold, e.g., when the user is speaking, VAD signal and non-acoustic trigger signal may be a high digital signal. Similarly, during a time frame when the user is not speaking, VAD signal and non-acoustic trigger signal may be a low digital signal. The binary non-acoustic trigger signal may be sent to processor of ASR triggering system. Processor may store non-acoustic trigger signal to gate acoustic trigger signal as described below).
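The VAD score of claims 3 and 5 can likewise be sketched, under assumptions, as the mean of the high (1) and low (0) segments in the VAD portion, i.e. the fraction of segments at the high level, compared against a score threshold. The 0.5 threshold and the names `vad_score` and `spoken_by_wearer` are hypothetical and serve only to illustrate the averaging described in Dusan [0043].

```python
from typing import Sequence

HIGH, LOW = 1, 0


def vad_score(segments: Sequence[int]) -> float:
    """Average of the high (1) / low (0) segments in the VAD portion,
    i.e. the fraction of segments at the high signal level."""
    if any(s not in (HIGH, LOW) for s in segments):
        raise ValueError("segments must be HIGH (1) or LOW (0)")
    if not segments:
        return 0.0
    return sum(segments) / len(segments)


def spoken_by_wearer(segments: Sequence[int],
                     score_threshold: float = 0.5) -> bool:
    """True when the VAD score is above the (hypothetical) score threshold."""
    return vad_score(segments) > score_threshold


print(vad_score([1, 1, 0, 1]))         # 0.75
print(spoken_by_wearer([1, 1, 0, 1]))  # True
print(spoken_by_wearer([1, 0, 0, 0]))  # False
```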
Regarding claim 10, Dusan in view of Czyryba teaches all of the limitations as in claim 8, above. 
However, Dusan in view of Czyryba does not teach the audio system wherein determining whether the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone comprises determining whether a VAD score that is based on the portion of the VAD signal is above a VAD score threshold, wherein the VAD signal indicates that the key-phrase is spoken by the wearer when the VAD score is above the VAD score threshold.
Gomes teaches the audio system ([0004] - Aspects provide methods and apparatus for improving the accuracy and speed of identifying and validating a WUW by an audio device. As described herein, an audio device combines WUW identification or detection with inputs received from one or more of on-head detection, voice activity detection (VAD), or other sensors) wherein determining whether the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone comprises determining whether a VAD score that is based on the portion of the VAD signal is above a VAD score threshold, wherein the VAD signal indicates that the key-phrase is spoken by the wearer when the VAD score is above the VAD score threshold ([0018] - According to aspects, the method further comprises, when the user is wearing the audio device, maintaining in a low power mode at least one component in the audio device used to detect at least one of the trigger word or whether the sound was generated by the user speaking, detecting a sound energy of the sound exceeds a configured threshold, and powering up the at least one component in response to detecting that the sound energy of the sound exceeds the threshold).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba to incorporate the teachings of Gomes in order to generate a VAD score threshold as part of the voice activation process involving a key-phrase. Doing so allows the method to determine whether the signal is above or below the threshold, which can then be used to save power (Gomes [0010]).
Regarding claim 11, Dusan in view of Czyryba and further in view of Gomes teaches all of the limitations as in claim 10, above.
However, Dusan in view of Czyryba does not teach the audio system, wherein generating a VAD signal based on the accelerometer signal comprises instructions to
determine whether an energy level of the accelerometer signal is above an energy threshold;
in response to a determination that the energy level is above the energy threshold, the VAD signal is set to a high signal level; 
and in response to a determination that the energy level is below the energy threshold, the VAD signal is set to a low signal level.
Gomes teaches the audio system ([0004] - Aspects provide methods and apparatus for improving the accuracy and speed of identifying and validating a WUW by an audio device. As described herein, an audio device combines WUW identification or detection with inputs received from one or more of on-head detection, voice activity detection (VAD), or other sensors) wherein generating a VAD signal based on the accelerometer signal comprises instructions to
determine whether an energy level of the accelerometer signal is above an energy threshold ([0009] - According to aspects, the method further comprises determining, based on the additional information, that the user is wearing the audio device, maintaining in a low power mode, at least one component in the audio device used to detect at least one of the trigger word or whether the sound was generated by the user speaking, detecting a sound energy of the sound exceeds a configured threshold, and powering up the at least one component in response to detecting that the sound energy of the sound exceeds the threshold. [0051] On-head detection determines if the user is wearing the audio device. Several methods to perform on-head detection are contemplated. [0052] - In an example, on-head detection is determined based on the output from one or more of an accelerometer, gyroscope, and magnetometer);
in response to a determination that the energy level is above the energy threshold, the VAD signal is set to a high signal level ([0009] - powering up the at least one component in response to detecting that the sound energy of the sound exceeds the threshold);
and in response to a determination that the energy level is below the energy threshold, the VAD signal is set to a low signal level ([0010] - in response, powering down at least one of the microphones to save power).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba to incorporate the teachings of Gomes in order to identify high and low VAD signals. Doing so allows the method to determine whether the signal is above or below the threshold, which can then be used to save power (Gomes [0010]).
Regarding claim 12, Dusan in view of Czyryba and further in view of Gomes teaches all of the limitations as in claim 11, above.
Dusan discloses the audio system ([0028] - In an aspect, an ASR triggering system and a method of generating an ASR trigger signal uses non acoustic data generated by an accelerometer in an earphone or headset), wherein the portion of the VAD signal comprises a plurality of segments, each segment having either the high signal level or the low signal level, wherein generating the VAD score comprises averaging the plurality of segments to produce an average VAD score value as the VAD score ([0043] – Alternatively, VAD may generate the non-acoustic trigger signal based on an average of VAD signal over time. Thus, during a time frame when the cross-correlation value is mostly above the predetermined correlation threshold, e.g., when the user is speaking, VAD signal and non-acoustic trigger signal may be a high digital signal. Similarly, during a time frame when the user is not speaking, VAD signal and non-acoustic trigger signal may be a low digital signal. The binary non-acoustic trigger signal may be sent to processor of ASR triggering system. Processor may store non-acoustic trigger signal to gate acoustic trigger signal as described below).
Regarding claim 18, Dusan in view of Czyryba teaches all of the limitations as in claim 16, above. 
However, Dusan in view of Czyryba does not teach the article of manufacture wherein determining whether the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone comprises determining whether a VAD score that is based on the portion of the VAD signal is above a VAD score threshold, wherein the VAD signal indicates that the key-phrase is spoken by the wearer when the VAD score is above the VAD score threshold.
Gomes teaches the article of manufacture ([0004] - Aspects provide methods and apparatus for improving the accuracy and speed of identifying and validating a WUW by an audio device. As described herein, an audio device combines WUW identification or detection with inputs received from one or more of on-head detection, voice activity detection (VAD), or other sensors) wherein determining whether the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone comprises determining whether a VAD score that is based on the portion of the VAD signal is above a VAD score threshold, wherein the VAD signal indicates that the key-phrase is spoken by the wearer when the VAD score is above the VAD score threshold ([0018] - According to aspects, the method further comprises, when the user is wearing the audio device, maintaining in a low power mode at least one component in the audio device used to detect at least one of the trigger word or whether the sound was generated by the user speaking, detecting a sound energy of the sound exceeds a configured threshold, and powering up the at least one component in response to detecting that the sound energy of the sound exceeds the threshold).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba to incorporate the teachings of Gomes in order to generate a VAD score threshold as part of the voice activation process involving a key-phrase. Doing so allows the method to determine whether the signal is above or below the threshold which can then be used to save power (Gomes [0010]).
Regarding claim 19, Dusan in view of Czyryba in view of Gomes teaches all of the limitations as in claim 18, above. 
However, Dusan in view of Czyryba does not teach the article of manufacture, wherein generating a VAD signal based on the accelerometer signal comprises instructions to
determine whether an energy level of the accelerometer signal is above an energy threshold;
in response to a determination that the energy level is above the energy threshold, the VAD signal is set to a high signal level; 
and in response to a determination that the energy level is below the energy threshold, the VAD signal is set to a low signal level.
Gomes does teach the article of manufacture ([0004] - Aspects provide methods and apparatus for improving the accuracy and speed of identifying and validating a WUW by an audio device. As described herein, an audio device combines WUW identification or detection with inputs received from one or more of on-head detection, voice activity detection (VAD), or other sensors) wherein generating a VAD signal based on the accelerometer signal comprises instructions to
determine whether an energy level of the accelerometer signal is above an energy threshold ([0009] - According to aspects, the method further comprises determining, based on the additional information, that the user is wearing the audio device, maintaining in a low power mode, at least one component in the audio device used to detect at least one of the trigger word or whether the sound was generated by the user speaking, detecting a sound energy of the sound exceeds a configured threshold, and powering up the at least one component in response to detecting that the sound energy of the sound exceeds the threshold. [0051] - On-head detection determines if the user is wearing the audio device. Several methods to perform on-head detection are contemplated. [0052] - In an example, on-head detection is determined based on the output from one or more of an accelerometer, gyroscope, and magnetometer);
in response to a determination that the energy level is above the energy threshold, the VAD signal is set to a high signal level ([0009] - powering up the at least one component in response to detecting that the sound energy of the sound exceeds the threshold);
and in response to a determination that the energy level is below the energy threshold, the VAD signal is set to a low signal level ([0010] - in response, powering down at least one of the microphones to save power).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba to incorporate the teachings of Gomes in order to identify high and low VAD signals. Doing so allows the method to determine whether the signal is above or below the threshold, which can then be used to save power (Gomes [0010]).
Regarding claim 20, Dusan in view of Czyryba in view of Gomes teaches all of the limitations as in claim 19, above. 
Dusan discloses the article of manufacture ([0028] - In an aspect, an ASR triggering system and a method of generating an ASR trigger signal uses non acoustic data generated by an accelerometer in an earphone or headset), wherein the portion of the VAD signal comprises a plurality of segments, each segment having either the high signal level or the low signal level, wherein generating the VAD score comprises averaging the plurality of segments to produce an average VAD score value as the VAD score ([0043] – Alternatively, VAD may generate the non-acoustic trigger signal based on an average of VAD signal over time. Thus, during a time frame when the cross-correlation value is mostly above the predetermined correlation threshold, e.g., when the user is speaking, VAD signal and non-acoustic trigger signal may be a high digital signal. Similarly, during a time frame when the user is not speaking, VAD signal and non-acoustic trigger signal may be a low digital signal. The binary non-acoustic trigger signal may be sent to processor of ASR triggering system. Processor may store non-acoustic trigger signal to gate acoustic trigger signal as described below).
Claims 6, 13, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Dusan (U.S. Publication No. 20180324518) in view of Czyryba (U.S. Publication No. 20190221205) in view of Gomes (U.S. Publication No. 20200160858), and further in view of Dadu (U.S. Publication No. 20150179189).
	Regarding claim 6, Dusan in view of Czyryba in view of Gomes teaches all of the limitations as in claim 5, above. Dusan in view of Czyryba in view of Gomes teaches an average VAD score value (Dusan [0043] – Alternatively, VAD may generate the non-acoustic trigger signal based on an average of VAD signal over time. Thus, during a time frame when the cross-correlation value is mostly above the predetermined correlation threshold, e.g., when the user is speaking, VAD signal and non-acoustic trigger signal may be a high digital signal. Similarly, during a time frame when the user is not speaking, VAD signal and non-acoustic trigger signal may be a low digital signal. The binary non-acoustic trigger signal may be sent to processor of ASR triggering system. Processor may store non-acoustic trigger signal to gate acoustic trigger signal as described below).
However, Dusan in view of Czyryba in view of Gomes does not teach a signal processing method wherein generating the VAD score further comprises applying a correction factor that accounts for segments of the portion of the VAD signal that are at the low signal level in a normally uttered key-phrase.
Dadu does teach a signal processing method ([0016] - Additionally, a speaker verification (SV) stage of the pipeline may confirm whether an identified key phrase was spoken by a known/enrolled user) wherein generating the VAD score further comprises applying a correction factor that accounts for segments of the portion of the VAD signal that are at the low signal level in a normally uttered key-phrase (Figure 4 – Filters 41 – Low-pass Filter is used to allow low signal levels which are then corrected by noise estimation and error variance techniques, [0021] - various filters (e.g., high pass, low-pass, adaptive) are applied to the signals. In addition, noise estimation and error variance techniques may be used to optimize the filtered results and obtain fused audio that may be provided to speaker verification and/or voice interaction stages of an automated voice operation pipeline). Dadu teaches various filtering and error-estimation techniques that can be assumed to allow low signal levels in order to optimize the signal.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba in view of Gomes to incorporate the teachings of Dadu to provide a correction factor to the average VAD score value in order to account for segments of the portion of the VAD signal in a normally uttered key-phrase that are at the low signal level. Doing so optimizes the filtered results which may be used to provide speaker verification (Dadu [0021]).
Regarding claim 13, Dusan in view of Czyryba in view of Gomes teaches all of the limitations as in claim 12, above. Dusan in view of Czyryba in view of Gomes teaches an average VAD score value (Dusan [0043] – Alternatively, VAD may generate the non-acoustic trigger signal based on an average of VAD signal over time. Thus, during a time frame when the cross-correlation value is mostly above the predetermined correlation threshold, e.g., when the user is speaking, VAD signal and non-acoustic trigger signal may be a high digital signal. Similarly, during a time frame when the user is not speaking, VAD signal and non-acoustic trigger signal may be a low digital signal. The binary non-acoustic trigger signal may be sent to processor of ASR triggering system. Processor may store non-acoustic trigger signal to gate acoustic trigger signal as described below).
However, Dusan in view of Czyryba in view of Gomes does not teach the audio system wherein the instructions to generate the VAD score further comprises instructions to apply a correction factor that accounts for segments of the portion of the VAD signal that are at the low signal level in a normally uttered key-phrase.
Dadu does teach the audio system wherein the instructions to generate the VAD score further comprises instructions to apply a correction factor that accounts for segments of the portion of the VAD signal that are at the low signal level in a normally uttered key-phrase (Figure 4 – Filters 41 – Low-pass Filter is used to allow low signal levels which are then corrected by noise estimation and error variance techniques, [0021] - various filters (e.g., high pass, low-pass, adaptive) are applied to the signals. In addition, noise estimation and error variance techniques may be used to optimize the filtered results and obtain fused audio that may be provided to speaker verification and/or voice interaction stages of an automated voice operation pipeline). Dadu teaches various filtering and error-estimation techniques that can be assumed to allow low signal levels in order to optimize the signal.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba in view of Gomes to incorporate the teachings of Dadu to provide a correction factor to the average VAD score value in order to account for segments of the portion of the VAD signal in a normally uttered key-phrase that are at the low signal level. Doing so optimizes the filtered results which may be used to provide speaker verification (Dadu [0021]).
Regarding claim 21, Dusan in view of Czyryba in view of Gomes teaches all of the limitations as in claim 20, above. Dusan in view of Czyryba in view of Gomes teaches an average VAD score value (Dusan [0043] – Alternatively, VAD may generate the non-acoustic trigger signal based on an average of VAD signal over time. Thus, during a time frame when the cross-correlation value is mostly above the predetermined correlation threshold, e.g., when the user is speaking, VAD signal and non-acoustic trigger signal may be a high digital signal. Similarly, during a time frame when the user is not speaking, VAD signal and non-acoustic trigger signal may be a low digital signal. The binary non-acoustic trigger signal may be sent to processor of ASR triggering system. Processor may store non-acoustic trigger signal to gate acoustic trigger signal as described below).
However, Dusan in view of Czyryba in view of Gomes does not teach the article of manufacture wherein the instructions to generate the VAD score further comprises instructions to apply a correction factor that accounts for segments of the portion of the VAD signal that are at the low signal level in a normally uttered key-phrase.
Dadu does teach the article of manufacture wherein the instructions to generate the VAD score further comprises instructions to apply a correction factor that accounts for segments of the portion of the VAD signal that are at the low signal level in a normally uttered key-phrase (Figure 4 – Filters 41 – Low-pass Filter is used to allow low signal levels which are then corrected by noise estimation and error variance techniques, [0021] - various filters (e.g., high pass, low-pass, adaptive) are applied to the signals. In addition, noise estimation and error variance techniques may be used to optimize the filtered results and obtain fused audio that may be provided to speaker verification and/or voice interaction stages of an automated voice operation pipeline). Dadu teaches various filtering and error-estimation techniques that can be assumed to allow low signal levels in order to optimize the signal.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba in view of Gomes to incorporate the teachings of Dadu to provide a correction factor to the average VAD score value in order to account for segments of the portion of the VAD signal in a normally uttered key-phrase that are at the low signal level. Doing so optimizes the filtered results which may be used to provide speaker verification (Dadu [0021]).
Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Dusan (U.S. Publication No. 20180324518) in view of Czyryba (U.S. Publication No. 20190221205) in view of Gomes (U.S. Publication No. 20200160858), and further in view of Ossowski (U.S. Publication No. 20190051299).
	Regarding claim 7, Dusan in view of Czyryba in view of Gomes teaches all of the limitations as in claim 3, above. 
However, Dusan in view of Czyryba in view of Gomes does not teach a signal processing method wherein the headphone is communicatively coupled with a multimedia device via a communication data link, wherein triggering the VPA comprises 
generating a trigger signal; 
and transmitting from the headphone and over the communication data link, at least one of the trigger signal, the VAD score, and the microphone signal to the multimedia device.
Ossowski does teach a signal processing method ([0001] - Such PAs can receive these wake-up calls in the form of audio signals from microphones where the audio signals include a person's spoken words), wherein the headphone ([0054] - Device may be any suitable audio computing device such as a computer, a smartspeaker, a personal speech assistant, a laptop, an ultrabook, a smartphone, a tablet, a phablet, a wearable device such as a smart watch or wrist band, eye glasses, headphones, a security device whether a separate device, a conferencing device, a cloud based computing device, or the like) is communicatively coupled with a multimedia device via a communication data link ([0157] - Platform and/or content services device(s) may be coupled to a network to communicate (e.g., send and/or receive) media information to and from network. Content delivery device(s) also may be coupled to platform and/or to display), wherein triggering the VPA ([0001] – artificial intelligence (AI) assistant (also referred to herein as virtual assistant (VA) or personal assistant (PA))) comprises 
generating a trigger signal ([0001] - The recognition of the keyphrase also may trigger the activation of other applications to perform other automatic actions);
and transmitting from the headphone ([0054] - Referring to FIG. 3, an audio processing device 300 performs automatic speech recognition activation for a personal assistant (PA) voice activation application or other automatic speech recognition applications for this example. Device may be any suitable audio computing device such as a computer, a smartspeaker, a personal speech assistant, a laptop, an ultrabook, a smartphone, a tablet, a phablet, a wearable device such as a smart watch or wrist band, eye glasses, headphones, a security device whether a separate device, a conferencing device, a cloud based computing device, or the like) and over the communication data link, at least one of the trigger signal, the VAD score, and the microphone signal to the multimedia device ([0158] - It will be appreciated that the content may be communicated uni-directionally and/or bi-directionally to and from any one of the components in system and a content provider via network. Examples of content may include any media information including, for example, video, music, medical and gaming information, and so forth).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba in view of Gomes to incorporate the teachings of Ossowski to provide a signal processing method that couples a headphone with a multimedia device that can transmit a trigger signal, VAD score, or microphone signal. This allows data to be transmitted back and forth from a device to other devices (Ossowski [0157]).
Regarding claim 14, Dusan in view of Czyryba in view of Gomes teaches all of the limitations as in claim 10, above. 
However, Dusan in view of Czyryba in view of Gomes does not teach an audio system wherein the instructions to trigger the VPA comprises instructions to 1) generate a trigger signal and 2) transmit, over a communication data link and to a multimedia device, at least one of the trigger signal, the VAD score and the microphone signal.
Ossowski does teach an audio system ([0001] – keyphrase detection KPD (or wake on voice (WoV)) systems) wherein the instructions to trigger the VPA ([0001] – artificial intelligence (AI) assistant (also referred to herein as virtual assistant (VA) or personal assistant (PA))) comprises instructions to 1) generate a trigger signal ([0001] - The recognition of the keyphrase also may trigger the activation of other applications to perform other automatic actions) and 2) transmit, over a communication data link and to a multimedia device, at least one of the trigger signal, the VAD score and the microphone signal ([0098] - It will be appreciated that as with device 300, the speaker recognition and keyphrase detection units 704, 706, 708, 710, and 722 of device 700, as well as the application providing data for the playback unit 714, may be entirely or partly remote from device 700 and transmit data back and forth from device 700 over a network, such as the internet and to a cloud or server that operates the units, [0157] - Platform and/or content services device(s) may be coupled to a network to communicate (e.g., send and/or receive) media information to and from network. Content delivery device(s) also may be coupled to platform and/or to display, [0158] - It will be appreciated that the content may be communicated uni-directionally and/or bi-directionally to and from any one of the components in system and a content provider via network. Examples of content may include any media information including, for example, video, music, medical and gaming information, and so forth). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba in view of Gomes to incorporate the teachings of Ossowski to provide an audio system that couples a headphone with a multimedia device that can transmit a trigger signal, VAD score, or microphone signal. This allows data to be transmitted back and forth from a device to other devices (Ossowski [0157]).
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Dusan (U.S. Publication No. 20180324518) in view of Czyryba (U.S. Publication No. 20190221205), and further in view of Ossowski (U.S. Publication No. 20190051299).
Regarding claim 15, Dusan in view of Czyryba teaches all of the limitations as in claim 8, above. 
However, Dusan in view of Czyryba does not teach an audio system further comprising a multimedia device that is communicatively coupled with the headphone via a communication data link, wherein the instructions for determining whether the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone is executed by a processor of the multimedia device.
Ossowski does teach an audio system ([0001] – keyphrase detection KPD (or wake on voice (WoV)) systems) further comprising a multimedia device that is communicatively coupled with the headphone via a communication data link, wherein the instructions for determining whether the portion of the VAD signal indicates that the key-phrase is spoken by the wearer of the headphone ([0040] - Specifically, the system and method disclosed herein are used to detect or recognize the voice of the speaker of the keyphrase which triggers an action, such as the weakening of the PA, and omits the action if the speaker is identified as a computer originated voice such as the PA's own voice. This is accomplished by using a voice or speaker recognition (SR) system such as with speaker verification (or identification) in order to learn the characteristics of the its own PA voice, other voice applications, and other popular, known PA voices when desirable, and filter out detections of that voice or voices, [0043] - Referring to FIG. 2, an example process of false keyphrase rejection using speaker recognition is arranged in accordance with at least some implementations of the present disclosure… Process or portions thereof may be performed by a device or system. [0054] - Referring to FIG. 3, an audio processing device 300 performs automatic speech recognition activation for a personal assistant (PA) voice activation application or other automatic speech recognition applications for this example. Device may be any suitable audio computing device such as a computer, a smartspeaker, a personal speech assistant, a laptop, an ultrabook, a smartphone, a tablet, a phablet, a wearable device such as a smart watch or wrist band, eye glasses, headphones, a security device whether a separate device, a conferencing device, a cloud based computing device, or the like) is executed by a processor of the multimedia device ([0157] - Platform and/or content services device(s) may be coupled to a network to communicate (e.g., send and/or receive) media information to and from network. Content delivery device(s) also may be coupled to platform and/or to display, [0158] - It will be appreciated that the content may be communicated uni-directionally and/or bi-directionally to and from any one of the components in system and a content provider via network. Examples of content may include any media information including, for example, video, music, medical and gaming information, and so forth). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Dusan in view of Czyryba to incorporate the teachings of Ossowski to provide an audio system that couples a headphone with a multimedia device that can transmit a trigger signal, VAD score, or microphone signal. This allows data to be transmitted back and forth from a device to other devices (Ossowski [0157]).
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Dorau (U.S. Publication No. 20190043479) teaches wake on voice key phrase segmentation. Kajarekar (U.S. Patent No. 10789959) teaches training speaker recognition models for digital assistants. Liu (U.S. Publication No. 20210120206) teaches an in-call experience enhancement for assistant systems. Meiyappan (U.S. Publication No. 10681453) teaches an automatic active noise reduction (ANR) control to improve user interaction. Pedersen (U.S. Publication No. 20210105565) teaches a hearing device comprising a detector and a trained neural network. Rohde (U.S. Publication No. 20200336846) teaches a hearing device comprising a keyword detector and an own voice detector and/or a transmitter. Zheng (U.S. Publication No. 20190034604) teaches a voice activation method for service provisioning or smart assistant devices. 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ETHAN DANIEL KIM whose telephone number is (571) 272-1405.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ETHAN DANIEL KIM/Examiner, Art Unit 2658
/VIJAY B CHAWAN/Primary Examiner, Art Unit 2658