DETAILED ACTION
This action is in response to the amendment filed on 06/10/2022.

Response to Amendment
Applicant’s amendment filed on 06/10/2022 has been entered. Claims 1, 9, 12 and 14 have been amended. Claim 23 has been canceled. No claims have been added. Claims 1, 3 – 9, 12 – 22 and 24 are still pending in this application, with claims 1, 9 and 12 being independent.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 5, 12, 17 – 19, 21, 22 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817) (“VanBlon”) in view of Xu (US 2021/0295839), and further in view of Doshi et al. (US 2019/0318759) and further in view of Wang et al. (US 2014/0337028).
For claim 1, VanBlon discloses an electronic device (Abstract) comprising: a microphone (Fig.1, 120; [0021]); a display (touch screen, Fig.1, 170; [0021]); at least one processor (Fig.1, 110; [0019]) operatively connected to the microphone and the display (Fig.1, 110, 120 and 170; [0019] [0021]); and at least one memory (flash memory and SDRAM) operatively coupled to the at least one processor (Fig.1, 110, 180 and 190; [0019] [0021]), and wherein the at least one memory stores instructions that, when executed, cause the processor ([0046]) to: receive a wake-up utterance (activation cue or trigger phrase to allow a device to wake up, e.g. “Hey Siri”, [0014]), wherein the wake-up utterance includes the predetermined wake-up word, activate the voice recognition service (personal assistant application) and receive a first user utterance after the wake-up utterance through the microphone (an embodiment may identify one or more commands within the audio data captured at 310. Thus, an embodiment may receive audio input which contains an activation cue as well as a command, e.g., “what is the weather tomorrow?”, [0014] [0017] [0028] [0029]); generate a first response (an action is carried out, Fig.3, 320; [0029]); retain the voice recognition service during a time interval after receiving the wake-up utterance or generating the first response (action may be taken when an additional audio input comprising a command is received during a pre-determined time period, [0031]); in retaining the voice recognition service, receive a second user utterance through the microphone within the time interval after receiving the wake-up utterance or generating the first response (Once an action has been carried out at 320, an additional audio input is received at 330 during a predetermined time period, [0030] [0031]); and generate a second response (Fig.3, 360; [0031]).
Yet, VanBlon fails to teach the following: a speaker operatively connected to the processor; the at least one memory stores an automatic speech recognition (ASR) module and a natural language understanding (NLU) module; display a user interface with a plurality of predetermined selectable trigger words on the display, wherein each of the plurality of predetermined selectable trigger words is for requesting an action among a plurality of actions capable of being performed by the electronic device and the plurality of predetermined selectable trigger words are different from a predetermined wake-up word for activating a voice recognition service; receive a selection of some at least one of the plurality of predetermined selectable trigger words and or input of other at least one trigger words by the user interface, thereby forming a selected one or more trigger words; generate the first response based on processing the first user utterance with the NLU module; extract a text for the second user utterance with the ASR module; when any one of the selected one or more trigger words is recognized in the text for the second user utterance, generate the second response with the NLU module, wherein generating the second response comprises: providing a sentence including a particular one of the selected one or more trigger words included in the second user utterance to the NLU module; and generating the second response based on the sentence including the particular one of the selected one or more trigger words, wherein the second response is based on the meaning of the selected one or more trigger words; and when any one of the selected one or more trigger words is not recognized in the text for the second user utterance and when the time interval is expired after receiving the wake-up utterance or after generating the first response, deactivate the voice recognition service.
However, Xu discloses a voice control command generation method and terminal (Abstract), wherein an electronic device (terminal, Fig.1, 100; [0065] [0066]) which receives and processes voice commands using a voice recognition service (voice assistant application, [0132] [0133]) further comprises a processor (Fig.1, 110) operatively coupled to a speaker (Fig.1, 170A) ([0066]) and a display (Fig.1, 194) to display a user interface (Fig.4a) with a plurality of predetermined selectable trigger words on the display (custom voice control commands 1 and 2 which are selectable by enabling the custom voice control button, Fig.4a, 402; [0137] [0138]), wherein each of the plurality of predetermined selectable trigger words is for requesting an action among a plurality of actions capable of being performed by the electronic device (a custom voice command is bound to an operation to be performed by the terminal, [0003] [0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]) and the plurality of predetermined selectable trigger words are different from a predetermined wake-up word for activating a voice recognition service (a wakeup keyword is different from the custom commands, [0137]). Additionally, a selection of some at least one of the plurality of predetermined selectable trigger words (set the custom voice control button so that the pre-stored custom commands, e.g. Custom command 1, are enabled, [0137]) or input of at least one trigger word is received by the user interface ([0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]), thereby forming a selected one or more trigger words (only after the custom voice control function is enabled can the mobile phone perform a corresponding event in response to the custom commands, [0138]).
Additionally, Doshi discloses an ASR based human machine interface (Abstract), wherein an ASR module (Fig.1, 130) and an NLU module (Fig.1, 140) are stored in the memory of a device (Fig.1, 120; [0030]); the ASR module is used to extract text for user utterances ([0021] [0032 – 0035]) and provide a sentence including the text in a user utterance to the NLU module ([0036] [0082]); and a response is generated based on the sentence including the text of the user utterance by the NLU module by determining the meaning of the sentence ([0037] [0045] [0047] [0048]).
Moreover, Wang discloses a voice command interface (Abstract), wherein a speech recognition routine is deactivated if a predefined command or command string is not received and a timer has expired (Fig.3, 78, 80, 84; [0040] [0041] [0059] [0063] [0064]); and when any one of the predefined commands is recognized in the text for a second user utterance, a second response is generated (Fig.3, 80; [0063] [0065] [0066]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve VanBlon’s invention in the same way that Xu’s, Doshi’s and Wang’s inventions have been improved to achieve the following predictable results for the purpose of decreasing the awkwardness or unnatural feel of using voice command technology by not requiring a user to repeatedly activate a device using a wake-up trigger phrase and enabling a more flexible custom voice control of the device (VanBlon, [0001] [0002]) (Xu, [0002 – 0006]): further operatively connecting a speaker to the processor; further storing an automatic speech recognition (ASR) module and a natural language understanding (NLU) module in the at least one memory; further displaying a user interface with a plurality of predetermined selectable trigger words (custom commands) on the display, wherein each of the plurality of predetermined selectable trigger words is for requesting an action among a plurality of actions capable of being performed by the electronic device and the plurality of predetermined selectable trigger words are different from a predetermined wake-up word for activating a voice recognition service; further receiving a selection of some at least one of the plurality of predetermined selectable trigger words or input of other at least one trigger words by the user interface, thereby forming a selected one or more trigger words; further generating the first response (action) based on processing the first user utterance with the NLU module (the NLU module receives a sentence based on the first user utterance and determines the intent to generate the response/action); extracting a text for the user utterance, including the second user utterance, with the ASR module; when any one of the selected one or more trigger words (custom commands/predefined commands) is recognized in the text for the second user utterance, generating the second response (action) with the NLU module, wherein generating the second response comprises: providing a sentence including a particular one of the selected one or more trigger words included in the second user utterance to the NLU module; and generating the second response based on the sentence including the particular one of the selected one or more trigger words, wherein the second response is based on the meaning of the selected one or more trigger words in the sentence; and when any one of the selected one or more trigger words is not recognized in the text for the second user utterance and when the time interval is expired after receiving the wake-up utterance or after generating the first response, deactivating the voice recognition service (speech recognition routine).

For claim 3, VanBlon and Xu further disclose, wherein the instructions cause the electronic device to determine the selected one or more trigger words by an operation of the at least one processor (VanBlon, Fig.1, 110; [0019]) (Xu, custom voice control function is determined to be enabled, [0068] [0069] [0072] [0138]).
For claim 5, the combination of VanBlon and Xu further discloses, wherein the selected one or more trigger words includes a word belonging to a category specified through the input circuit among a plurality of actions capable of being performed by the electronic device (VanBlon, [0030]) (Xu, custom commands associated with an application as category, [0003] [0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]).

For claim 12, VanBlon discloses an electronic device (Abstract) comprising: a communication circuit (WLAN transceiver, Fig.1, 160; [0021]); an input circuit (input/output controller, [0019]); a microphone (Fig.1, 120; [0021]); a display (touch screen, Fig.1, 170; [0021]); at least one processor (Fig.1, 110; [0019]) operatively connected to the communication circuit, the input circuit, the display and the microphone (Fig.1, 110, 120 and 170; [0019] [0021]); and at least one memory (flash memory and SDRAM) operatively coupled to the at least one processor (Fig.1, 110, 180 and 190; [0019] [0021]), and wherein the at least one memory stores instructions that, when executed, cause the processor ([0046]) to: based on receiving a predetermined wake-up word (activation cue or trigger phrase to allow a device to wake up, e.g. “Hey Siri”, [0014]) for calling a voice recognition service through the microphone, execute an intelligent app capable of providing the voice recognition service (Google or Siri related personal assistant application, [0002] [0014]), wherein a wake-up utterance includes the predetermined wake-up word ([0014] [0017] [0028]); receive a first user utterance through the microphone, after the predetermined wake-up word (an embodiment may identify one or more commands within the audio data captured at 310. Thus, an embodiment may receive audio input which contains an activation cue as well as a command, e.g., “what is the weather tomorrow?”, [0014] [0017] [0028] [0029]); perform a first action determined based on the first user utterance, using the intelligent app (an action is carried out, Fig.3, 320; [0029]); retain the voice recognition service during a time interval after receiving the wake-up utterance or generating the first response (action may be taken when an additional audio input comprising a command is received during a pre-determined time period, [0031]); in retaining the voice recognition service, receive a second user utterance through the microphone within the time interval after receiving the wake-up utterance or generating the first response (Once an action has been carried out at 320, an additional audio input is received at 330 during a predetermined time period, [0030] [0031]); and perform a second action using the intelligent app (Fig.3, 360; [0031]).
Yet, VanBlon fails to teach the following: display a user interface with a plurality of predetermined selectable trigger words on the display, wherein each of the plurality of predetermined selectable trigger words is for requesting an action among a plurality of actions capable of being performed by the electronic device and the plurality of predetermined selectable trigger words are different from a predetermined wake-up word for activating a voice recognition service; receive a selection of some at least one of the plurality of predetermined selectable trigger words and or input of other at least one trigger words by the user interface, thereby forming a selected one or more trigger words; determine whether any one of the selected one or more trigger words for the user interface are recognized in the second user utterance within the time interval, using the intelligent app; when any one of the selected one or more trigger words is recognized in the second user utterance within the time interval, perform the second action based on the second user utterance and a meaning of the selected one or more trigger words, wherein performing the second action comprises: determining a sentence including a particular one of the selected one or more trigger words that is recognized in the second user utterance; transmitting the sentence including the particular one of the selected one or more trigger words to an external electronic device; and when none of the at least one trigger words is not recognized in the second user utterance within the time interval, terminate the intelligent app.
However, Xu discloses a voice control command generation method and terminal (Abstract), wherein an electronic device (terminal, Fig.1, 100; [0065] [0066]) comprises a display (Fig.1, 194) to display a user interface (Fig.4a) with a plurality of predetermined selectable trigger words on the display (custom voice control commands 1 and 2 which are selectable by enabling the custom voice control button, Fig.4a, 402; [0137] [0138]), wherein each of the plurality of predetermined selectable trigger words is for requesting an action among a plurality of actions capable of being performed by the electronic device (a custom voice command is bound to an operation to be performed by the terminal, [0003] [0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]), and the plurality of predetermined selectable trigger words are different from a predetermined wake-up word for activating a voice recognition service (a wakeup keyword is different from the custom commands, [0137]). Additionally, a selection of some at least one of the plurality of predetermined selectable trigger words (set the custom voice control button so that the pre-stored custom commands, e.g. Custom command 1, are enabled, [0137]) or input of at least one trigger word is received by the user interface ([0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]), thereby forming a selected one or more trigger words (only after the custom voice control function is enabled can the mobile phone perform a corresponding event in response to the custom commands, [0138]).
Moreover, Doshi discloses an ASR based human machine interface (Abstract), wherein an ASR module (Fig.1, 130) is local to a client and an NLU module (Fig.1, 140) is remote to the client (Fig.1, 120; [0030]); the ASR module is used to extract text for user utterances ([0021] [0032 – 0035]) and provide a sentence including the text in a user utterance to the NLU module on a remote device ([0030] [0036] [0082]); and an action is performed based on the sentence including the text of the user utterance by the NLU module and a meaning of the sentence ([0037] [0045] [0047] [0048]).
Moreover, Wang discloses a voice command interface (Abstract), wherein a speech recognition routine is deactivated if a predefined command or command string is not received and a timer has expired (Fig.3, 78, 80, 84; [0040] [0041] [0059] [0063] [0064]); and when any one of the predefined commands is recognized in the text for a second user utterance, a second response is generated (Fig.3, 80; [0063] [0065] [0066]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve VanBlon’s invention in the same way that Xu’s, Doshi’s and Wang’s inventions have been improved to achieve the following predictable results for the purpose of decreasing the awkwardness or unnatural feel of using voice command technology by not requiring a user to repeatedly activate a device using a wake-up trigger phrase and enabling a more flexible custom voice control of the device (VanBlon, [0001] [0002]) (Xu, [0002 – 0006]): further display a user interface with a plurality of predetermined selectable trigger words (custom commands/predefined commands) on the display, wherein each of the plurality of predetermined selectable trigger words is for requesting an action among a plurality of actions capable of being performed by the electronic device and the plurality of predetermined selectable trigger words are different from a predetermined wake-up word for activating a voice recognition service; receive a selection of some at least one of the plurality of predetermined selectable trigger words and or input of other at least one trigger words by the user interface, thereby forming a selected one or more trigger words; further determine whether any one of the selected one or more trigger words for the user interface are recognized in the second user utterance within the time interval, using the intelligent app; when any one of the selected one or more trigger words is recognized in the second user utterance within the time interval, perform the second action based on the second user utterance, wherein performing the second action comprises: determining a sentence including a particular one of the selected one or more trigger words that is recognized in the second user utterance (electronic device further comprises an ASR module); transmitting the sentence including the particular one of the selected one or more trigger words to an external electronic device (an NLU module located remotely from the ASR module determines the semantic meaning of the user utterance), wherein the second response is based on the meaning of the selected one or more trigger words in the sentence; and when none of the at least one trigger words is not recognized in the second user utterance within the time interval, terminate the intelligent app.

For claim 17, the combination of VanBlon and Xu further discloses, wherein the selected one or more trigger words includes a word associated with an action request specified through the input circuit among a plurality of actions capable of being performed by the electronic device (VanBlon, [0030]) (Xu, custom commands associated with an application as category, [0003] [0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]).
For claim 18, the combination of VanBlon and Xu further discloses, wherein the selected one or more trigger words includes a word belonging to a category specified through the input circuit among a plurality of actions capable of being performed by the electronic device (VanBlon, [0030]) (Xu, custom commands associated with an application as category, [0003] [0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]).
For claim 19, the combination of VanBlon and Xu further discloses wherein the selected one or more words further includes at least one of a word for requesting a plurality of actions capable of being performed by the electronic device (VanBlon, [0030]) (Xu, custom commands associated with an application as category, [0003] [0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]), a word for changing a topic, and a word indicating the electronic device.

For claim 21, VanBlon, Xu and Doshi further disclose generating the second response when the second user utterance comprises any one or more words, further comprises determining a user intent based on the sentence including the extracted text and the particular one of the selected one or more trigger words (VanBlon, [0029]) (Xu, custom commands, [0003] [0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]) (Doshi, [0021] [0032 – 0035] [0037] [0045] [0047] [0048] [0082]).
For claim 22, VanBlon and Wang further disclose wherein the instructions cause the electronic device to further retain the voice recognition service from a point in time when generating the second response (VanBlon, [0031]) (Wang, [0059] [0061] [0065] [0069]) or when the time interval is expired after receiving the wake-up utterance or after generating the first response (VanBlon, [0031]) (Wang, [0040] [0046] [0049] [0050]).
For claim 24, VanBlon and Wang further disclose wherein the instructions cause the electronic device to further retain the intelligent app from a point in time when generating the second response (VanBlon, [0031]) (Wang, [0059] [0061] [0065] [0069]) or when the time interval is expired after receiving the wake-up utterance or after generating the first response (VanBlon, [0031]) (Wang, [0040] [0046] [0049] [0050]).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817) (“VanBlon”) in view of Xu (US 2021/0295839), and further in view of Doshi et al. (US 2019/0318759), and further in view of Wang et al. (US 2014/0337028), and further in view of Mont-Reynaud et al. (US 2018/0301151) (“Mont-Reynaud”) and further in view of Helbing (US 2005/0038659).
For claim 4, the combination of VanBlon, Xu, Doshi and Wang further discloses a user selecting a time interval (Wang, [0041] [0078]), yet fails to teach that the electronic device provides a user interface configured to receive the time interval.
However, Mont-Reynaud discloses a method for managing agent engagement in a man-machine dialog (Abstract), where an electronic device (Fig.3A) is configured to receive a time interval via speech (lock request comprising time to be spent in the locked state, lock for three minutes) ([0057] [0058] [0067] [0089] [0090] [0094] [0095] [0143] [0152] [0153] [0169] [0174] [0180]).
Moreover, Helbing discloses a dialogue system  (Abstract), wherein a user terminal comprises an acoustic user interface with a microphone for the user to input speech ([0003]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of VanBlon, Xu, Doshi and Wang in the same way that Mont-Reynaud’s and Helbing’s inventions have been improved to achieve the predictable results of the user selected time intervals further being received via a user interface for the purpose of decreasing the awkwardness or unnatural feel of using voice command technology by not requiring a user to repeatedly activate a device using a wake-up trigger phrase and enabling a more flexible custom voice control of the device (VanBlon, [0001] [0002]).

Claims 6 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817) (“VanBlon”) in view of Xu (US 2021/0295839), and further in view of Doshi et al. (US 2019/0318759), and further in view of Wang et al. (US 2014/0337028), and further in view of Rose et al. (US 2018/0061399) (“Rose”).
For claims 6 and 20, the combination of VanBlon, Xu, Doshi and Wang fails to teach, wherein the instructions cause the at least one processor to: determine whether the selected one or more words are included in the second user utterance based at least in part on identification of an utterance speed change, a tone change, and an intonation change.
However, Rose discloses a system and method for recognizing speech (Abstract), wherein a user speaking a voice command is determined by a change in utterance speed (user speaks more slowly), tone and intonation (user speaks more monotonically and monotonously) ([0032]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to apply Rose’s command detecting technique to the selected word/command recognition process disclosed by the combination of VanBlon, Xu, Doshi and Wang to achieve the predictable results of further determining that the selected one or more words are included in the second user utterance based at least in part on identification of an utterance speed change, a tone change, and an intonation change for the purpose of decreasing the awkwardness or unnatural feel of using voice command technology by not requiring a user to repeatedly activate a device using a wake-up trigger phrase and enabling a more flexible custom voice control of the device (VanBlon, [0001] [0002]) (Xu, [0002 – 0006]).

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817) (“VanBlon”) in view of Xu (US 2021/0295839), and further in view of Doshi et al. (US 2019/0318759), and further in view of Wang et al. (US 2014/0337028), and further in view of Bender et al.  (US 2020/0105249) (“Bender”).
For claim 7, the combination of VanBlon, Xu, Doshi and Wang fails to teach wherein the instructions cause the at least one processor to: generate the second response with the NLU module further based at least in part on whether another selected one or more trigger words are not included in the second user utterance.
However, Bender discloses a method for creating a temporal blacklisting of commands from a listening device (Abstract), wherein a response is generated by a device when one or more selected keywords (blacklisted keywords including “order a pizza”, [0027 – 0033]) are not included in an utterance (Fig.3, 302 – 310; [0040 – 0047]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to modify the combined teachings of VanBlon, Xu, Doshi and Wang with Bender’s teachings so that a blacklist of trigger words (keywords) is additionally selected and a response is generated by the device using the NLU module when one or more of the blacklisted trigger words are not included in an utterance, such as the second user utterance, for the purpose of preventing undesirable actions from being performed by a device when a command originates from a source other than a user (Bender, [0002 – 0004] [0013]).

For claim 16, the combination of VanBlon, Xu, Doshi and Wang fails to teach wherein the instructions further cause the electronic device to: when still another one or more trigger words is recognized in the second user utterance, terminate the intelligent app.
However, Bender discloses a method for creating a temporal blacklisting of commands from a listening device (Abstract), wherein a command is ignored when one or more selected keywords (blacklisted keywords including “order a pizza”, [0027 – 0033]) are included in an utterance (Fig.3, 302 – 310; [0040 – 0047]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to modify the combined teachings of VanBlon, Xu, Doshi and Wang with Bender’s teachings so that a blacklist of trigger words (keywords) is additionally selected so that when one or more blacklisted trigger words are recognized in the second user utterance, the command is ignored and a voice recognition service (intelligent app) is terminated (deactivated), i.e. the timer has expired (an ignored command is synonymous with an undetected command since both are not actionable commands) (VanBlon, [0031]) (Wang, [0049] [0050]) for the purpose of preventing undesirable actions from being performed by a device when a command originates from a source other than a user (Bender, [0002 – 0004] [0013]).

Claims 8 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817) (“VanBlon”) in view of Xu (US 2021/0295839), and further in view of Doshi et al. (US 2019/0318759), and further in view of Wang et al. (US 2014/0337028), and further in view of Ogawa et al. (US 2016/0275950) (“Ogawa”).
For claims 8 and 15, the combination of VanBlon, Xu, Doshi and Wang fails to teach, wherein the instructions cause the at least one processor to: output an audible request to reutter a sentence, based on a previously detected location of the selected one or more words within the sentence.
However, Ogawa discloses a voice recognition system and device (Abstract), wherein the voice recognition system provides multiple candidate voice recognition results ([0044] [0045] [0076]); and a voice recognition error which requires user clarification is determined by identifying differences between the texts of the candidate voice recognition results at locations other than a head or tail, wherein the position of commands in an utterance is limited to a head or tail of the utterance ([0071 – 0075] [0094] [0095]). Furthermore, a user is requested to input a voice again to correct the error ([0095]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of VanBlon, Xu, Doshi and Wang in the same way that Ogawa’s invention has been improved to achieve the predictable results of further providing utterance rules for the selected one or more trigger words (command) so that the selected one or more trigger words are limited to a location in an utterance, wherein a user is requested to reutter an input including a sentence when a voice recognition error is detected based on the detected location of the selected one or more trigger words within the sentence when the agent generates multiple voice recognition candidates for an utterance/sentence (differences between the texts of the candidate voice recognition results at locations other than a head or tail are identified) for the purpose of enhancing the efficiency of the virtual assistant by preventing a wrong voice result from being used to perform a task.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817) (“VanBlon”) in view of Helbing (US 2005/0038659), and further in view of  view of Xu (US 2021/0295839), and further and further in view of Doshi et al. (US 2019/0318759) (“Doshi”) and further in view of Wang et al. (US 2014/0337028).
For claim 9, VanBlon discloses an electronic device (Abstract) comprising: a microphone (Fig.1, 120; [0021]); a display (touch screen, Fig.1, 170; [0021]); at least one processor (Fig.1, 110; [0019]) operatively connected to the microphone and the display (Fig.1, 110, 120 and 170; [0019] [0021]); and at least one memory (flash memory and SDRAM) operatively coupled to the at least one processor (Fig.1, 110, 180 and 190; [0019] [0021]), and wherein the at least one memory stores instructions that, when executed, cause the processor ([0046]): receive a wake-up utterance (activation cue or trigger phrase to allow a device to wake up, e.g. “Hey Siri”, [0014]), wherein the wake-up utterance includes the predetermined wake up word, activate the voice-based intelligent assistance service (personal assistant application) and receive a first user utterance after the wake-up utterance through the microphone (an embodiment may identify one or more commands within the audio data captured at 310. Thus, an embodiment may receive audio input which contains an activation cue as well as a command, e.g., “what is the weather tomorrow?”, [0014] [0017] [0028] [0029]); generate a first response (an action is carried out, Fig.3, 320; [0029]); retain the voice recognition service during a time interval after receiving the wake-up utterance or generating the first response (action may be taken when an additional audio input comprising a command is received during a pre-determined time period, [0031]); in retaining the voice-based intelligent assistance service, receive a second user utterance through the microphone within the time interval after receiving the wake-up utterance or generating the first response (Once an action has been carried out at 320, an additional audio input is received at 330 during a predetermined time period, [0030] [0031]); and generate a second response (Fig.3, 360; [0031]).
Yet, VanBlon fails to teach the following: a speaker operatively connected to the processor; the at least one memory stores an automatic speech recognition (ASR) module and a natural language understanding (NLU) module; receive the user input through another user interface; display a user interface with a plurality of predetermined selectable trigger words on the display, wherein each of the plurality of predetermined selectable trigger words is for requesting an action among a plurality of actions capable of being performed by the electronic device and the plurality of predetermined selectable trigger words are different from a predetermined wake-up word for activating a voice recognition service; receive a selection of some at least one of the plurality of predetermined selectable trigger words and or input of other at least one trigger words by the user interface, thereby forming a selected one or more trigger words; receive a user input to call a voice-based intelligent assistance service, through another user interface; generate the first response based on processing the first user utterance with the NLU module; extract a text for the second user utterance with the ASR module; when any one of the selected one or more trigger words is recognized in the text for the second user utterance, generate the second response with the NLU module, wherein generating the second response comprises: providing a sentence including a particular one of the selected one or more trigger words included in the second user utterance to the NLU module; and generating the second response based on the sentence including the particular one of the selected one or more trigger words, wherein the second response is based on the meaning of the selected one or more trigger words; and when any one of the selected one or more trigger words is not recognized in the text for the second user utterance and when the time interval is expired after receiving the wake-up utterance or after generating the first response, deactivate the voice-based intelligent assistant service.
However, Helbing discloses a dialogue system (Abstract), wherein a user terminal comprises an acoustic user interface with a microphone for the user to input speech ([0003]).
Moreover, Xu discloses a voice control command generation method and terminal (Abstract), wherein an electronic device (terminal, Fig.1, 100; [0065] [0066]) which receives and processes voice commands using a voice recognition service (voice assistant application, [0132] [0133]) further comprises a processor (Fig.1, 110) operatively coupled to a speaker (Fig.1, 170A) ([0066]) and a display (Fig.1, 194) to display a user interface (Fig.4a) with a plurality of predetermined selectable trigger words on the display (custom voice control commands 1 and 2 which are selectable by enabling the custom voice control button, Fig.4a, 402; [0137] [0138]), wherein each of the plurality of predetermined selectable trigger words is for requesting an action among a plurality of actions capable of being performed by the electronic device (a custom voice command is bound to an operation to be performed by the terminal, [0003] [0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]) and the plurality of predetermined selectable trigger words are different from a predetermined wake-up word for activating a voice recognition service (a wakeup keyword is different than the custom commands, [0137]). Additionally, a selection of some at least one of the plurality of predetermined selectable trigger words (set the custom voice control button so that the pre-stored custom commands, e.g. Custom command 1, are enabled, [0137]) or input of at least one trigger word is received by the user interface ([0141 – 0149] [0157] [0158] [0160] [0161] [0163] [0167 – 0179] [0181] [0182]), thereby forming a selected one or more trigger words (only after the custom voice control function is enabled, the mobile phone can perform a corresponding event in response to the custom commands, [0138]).
Additionally, Doshi discloses an ASR based human machine interface (Abstract), wherein an ASR module (Fig.1, 130) and NLU module (Fig.1, 140) are stored in the memory of a device (Fig.1, 120; [0030]); an ASR module is used to extract text for user utterances ([0021] [0032 – 0035]) and provide a sentence including the text in a user utterance to the NLU module ([0036] [0082]); and a response is generated based on the sentence including the text of the user utterance by the NLU module ([0037] [0045] [0047] [0048]).
Furthermore, Wang discloses a voice command interface (Abstract), wherein a speech recognition routine is deactivated if a predefined command or command string is not received, and a timer has expired (Fig.3, 78, 80, 84; [0040] [0041] [0059] [0063] [0064]); and when any one of the predefined commands is recognized in the text for a second user utterance, a second response is generated (Fig.3, 80; [0063] [0065] [0066]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve VanBlon’s invention in the same way that Helbing’s, Xu’s, Doshi’s and Wang’s inventions have been improved to achieve the following predictable results for the purpose of decreasing the awkwardness or unnatural feel of using voice command technology by not requiring a user to repeatedly activate a device using a wake-up trigger phrase and enabling a more flexible custom voice control of the device (VanBlon, [0001] [0002]) (Xu, [0002 – 0006]): further operatively connecting a speaker to the processor; further receiving the user input through another interface; further storing an automatic speech recognition (ASR) module and a natural language understanding (NLU) module in the at least one memory; further displaying a user interface with a plurality of predetermined selectable trigger words (custom commands) on the display, wherein each of the plurality of predetermined selectable trigger words is for requesting an action among a plurality of actions capable of being performed by the electronic device and the plurality of predetermined selectable trigger words are different from a predetermined wake-up word for activating a voice recognition service; further receiving a selection of some at least one of the plurality of predetermined selectable trigger words or input of other at least one trigger words by the user interface, thereby forming a selected one or more trigger words; further generating the first response (action) based on processing the first user utterance with the NLU module (the NLU module receives a sentence based on the first user utterance and determines the intent to generate the response/action); extracting a text for the user utterance, including the second user utterance, with the ASR module; when any one of the selected one or more trigger words (custom commands/predefined commands) is recognized in the text for the second user utterance, generating the second response (action) with the NLU module, wherein generating the second response comprises: providing a sentence including a particular one of the selected one or more trigger words included in the second user utterance to the NLU module; and generating the second response based on the sentence including the particular one of the selected one or more trigger words, wherein the second response is based on the meaning of the selected one or more trigger words in the sentence; and when any one of the selected one or more trigger words is not recognized in the text for the second user utterance and when the time interval is expired after receiving the wake-up utterance or after generating the first response, deactivating the voice-based intelligent assistant service (speech recognition routine).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817) (“VanBlon”) in view of Xu (US 2021/0295839), and further in view of Doshi et al. (US 2019/0318759), and further in view of Wang et al. (US 2014/0337028) and further in view of Mok et al. (US 10,847,149) (“Mok”).
For claim 13, the combination of VanBlon, Xu, Doshi and Wang fails to teach wherein performing the second action when any one of the selected one or more trigger words are recognized: receive information associated with an execution of the second action determined based on the sentence, from the external electronic device; and perform the second action based on the information associated with the execution of the second action.
However, Mok discloses a system and method for processing speech (Abstract), wherein a sentence comprising one or more selected words (command received after an instruction to send audio without a wakeword, e.g. “what is the weather”, column 2 lines 35 – 40) is determined (input audio data corresponding to an utterance is generated) and transmitted to an external electronic device (server, Fig.1, 120 and Fig.2, 120 and Fig.12, 120) (Fig.7B, 730 and 732; column 5 lines 7 – 20; column 23 lines 9 – 17) through a communication interface (Fig.14, 1414; column 29 lines 40 – 67); information associated with an execution of an action (content to be output by the device 110) determined based on the sentence (column 5 lines 22 – 47; column 8 lines 17 – 58; column 10 lines 65 – column 11 line 10; column 16 lines 28 – 46; column 23 lines 17 – 25) is received from the external device (column 5 lines 45 – 49; column 22 lines 65 – 67; column 23 lines 17 – 25); and the action based on the information associated with execution of the second action is performed (the device receives the output data for output to user, wherein outputting to user comprises the device outputting audio corresponding to the output data, Fig.7B, 724, column 5 lines 45 – 49; column 23 lines 8 – 10).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to improve the invention disclosed by the combination of VanBlon, Xu, Doshi and Wang so that the following speech recognition and processing occurs without detecting a spoken keyword or wakeword (Mok, column 3 lines 35 – 40) for the purpose of conserving local processing resources while decreasing the awkwardness or unnatural feel of using voice command technology by not requiring a user to repeatedly activate a device using a wake-up trigger phrase and enabling a more flexible custom voice control of the device (VanBlon, [0001] [0002]) (Xu, [0002 – 0006]): wherein performing the second action when any one of the selected one or more trigger words are recognized alternatively comprises receiving information associated with an execution of the second action determined based on the sentence, from the external electronic device; and performing the second action based on the information associated with the execution of the second action.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over VanBlon et al. (US 2017/0169817) (“VanBlon”) in view of Xu (US 2021/0295839), and further in view of Doshi et al. (US 2019/0318759), and further in view of Wang et al. (US 2014/0337028), and further in view of Mok et al. (US 10,847,149) (“Mok”) and further in view of Bender et al. (US 2020/0105249) (“Bender”).
For claim 14, the combination of VanBlon, Xu, Doshi, Wang and Mok fails to teach wherein performing the second action when any one of the selected one or more trigger words are recognized: determine whether another one or more selected trigger words are included in the sentence; when the another one or more selected trigger words are not included in the sentence, transmit the sentence to the external electronic device through the communication circuit; and when the another one or more selected trigger words are included in the sentence, not transmit the sentence to the external electronic device.
However, Bender discloses a method for creating a temporal blacklisting of commands from a listening device (Abstract), wherein a response is generated by a device when one or more selected keywords (blacklisted keywords including “order a pizza”, [0027 – 0033]) are not included in an utterance (Fig.3, 302 – 310; [0040 – 0047]).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of applicant’s filing to modify the combined teachings of VanBlon, Xu, Doshi, Wang and Mok with Bender’s teachings so that a blacklist of trigger words (keywords) is additionally selected so that performing the second action of generating a response when any one of the selected one or more trigger words are recognized further comprises determining whether one or more blacklisted trigger words are included in the sentence; and generating a response including transmitting the sentence to the external electronic device through the communication circuit when a blacklisted word is not included and not transmitting the sentence to generate a response when a blacklisted word is included in the sentence for the purpose of preventing undesirable actions from being performed by a device when a command originates from a source other than a user (Bender, [0002 – 0004] [0013]).

Response to Arguments
Applicant's arguments filed on 06/10/2022 have been fully considered but they are not persuasive. On page 12 of the remarks, applicant argues that custom voice controls in Xu would not be provided to an NLU. However, Doshi discloses that Xu’s voice commands would be provided to an NLU since the NLU performs semantic interpretation of speech ([0037]). Doshi discloses that a user utterance is first processed by an ASR to extract sound features and determine matching keywords ([0032 – 0036]). These keywords are sent to the NLU to determine meaning ([0036] [0037]).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SONIA L GAY whose telephone number is (571)270-1951. The examiner can normally be reached Monday-Friday 9-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SONIA L GAY/
Primary Examiner, Art Unit 2657