Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-19 are pending. Claims 1, 11 and 19 are independent.
This Application was published as U.S. 20210216276.
This Application is a continuation of 15/783,476 issued as U.S. 11,003,417.  A Terminal Disclaimer over the term of the parent is required.
Key concepts need definition inside the Claim language.  Attorney is encouraged to contact the Examiner for suggestions.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-2, 4-12, and 14-19 (not 3 or 13) are rejected on the ground of nonstatutory double patenting as being unpatentable over claims of parent U.S. patent application No. 15/783476 issued as U.S. 11,003,417.  Although the claims at issue are not identical, they are not patentably distinct from each other because of the following mapping.  Claims 3 and 13 are rejected under the combination of claim 1 of the reference and the reference cited below (Pantel) in the statutory rejection.
Claim 1 is mapped as shown below:
Each row below pairs a limitation of the instant Application (Instant) with the corresponding limitation of Reference Application No. 15/783476 (Reference).

Instant: 1. A speech recognition method comprising:
Reference: 1. A speech recognition method comprising:

Instant: receiving an input audio signal;
Reference: (no entry)

Instant: (6. The speech recognition method of claim 1, further comprising: storing the plurality of activation words for activating speech recognition respectively corresponding to a plurality of operating environments, the activation words being terms related to each of the operating environments and being terms that can be included in a sentence to trigger the speech recognition.)
Reference: storing a plurality of activation words for activating speech recognition respectively corresponding to a plurality of operating environments, the activation words being terms related to each of the operating environments and being terms that can be included in a sentence to trigger the speech recognition;

Instant: determining at least one activation word among a plurality of activation words based on information related to an operating environment in which the speech recognition apparatus is operating;
Reference: determining, by a processor of a speech recognition apparatus, at least one activation word among the plurality of activation words based on information related to the operating environment in which the speech recognition apparatus is operating;

Instant: (receiving an input audio signal;)
Reference: receiving, at the speech recognition apparatus, an input audio signal;

Instant: performing speech recognition on the input audio signal, based on whether the input audio signal includes a speech signal of an utterance of an activation word included in the determined at least one activation word; and
Reference: performing, by the processor, speech recognition on the input audio signal, based on whether the input audio signal includes an activation word included in the at least one activation word determined and whether the input audio signal includes a direct command for requesting a response of the speech recognition apparatus; and

Instant: outputting a result of the performing of the speech recognition,
Reference: outputting a result of the performing of the speech recognition,

Instant: wherein the performing of the speech recognition comprises:
Reference: wherein the outputting of the result of performing the speech recognition comprises:

Instant: extracting text of an utterance of a user by performing speech recognition on the input audio signal,
Reference: extracting text uttered by a user by performing speech recognition on the input audio signal,

Instant: determining whether a speech command included in the input audio signal is a direct command or an indirect command,
Reference: determining whether a speech command included in the input audio signal is the direct command,

Instant: when it is determined that the speech command is the direct command, performing an operation of responding to the speech command, and
Reference: when it is determined that the speech command is the direct command, performing an operation of responding to the speech command, and

Instant: when it is determined that the speech command is the indirect command:
Reference: when it is determined that the speech command is an indirect command:

Instant: (2. The speech recognition method of claim 1, wherein the performing of the speech recognition further comprises: displaying that a response to the speech command is possible, when it is determined that the speech command is the indirect command.)
Reference: displaying that a response to the speech command is possible and waiting for a confirmation command, and

Instant: determining whether a confirmation command is detected, and
Reference: (no entry)

Instant: in response to detecting the confirmation command from the user, performing the operation of responding to the speech command,
Reference: performing the operation of responding to the speech command, in response to detecting the confirmation command from the user,

Instant: wherein the at least one activation word corresponds to executable functions of the speech recognition apparatus.
Reference: wherein the at least one activation word corresponds to executable functions of the speech recognition apparatus,

Instant: (4. The speech recognition method of claim 1, wherein the direct command is speech uttered by the user with an intent for the speech recognition apparatus to output the result of the performing of the speech recognition, and)
Reference: wherein the direct command is speech uttered by a user with an intent for the speech recognition apparatus to output the result of the performing of the speech recognition,

Instant: (4. ... wherein the indirect command is speech uttered by the user such that the speech recognition apparatus is unable to determine that the user intends for the speech recognition apparatus to output the result of the performing of the speech recognition.)
Reference: wherein the indirect command is speech uttered by the user such that the speech recognition apparatus is unable to determine that the user intends for the speech recognition apparatus to output the result of the performing of the speech recognition,

Instant: (5. The speech recognition method of claim 1, wherein the information related to the operating environment comprises at least one of a time, a location of the speech recognition apparatus identified according to a type of a network to which the speech recognition apparatus is connected, and whether the speech recognition apparatus is connected to another electronic apparatus, and)
Reference: wherein the information related to the environment comprises at least one of a time, a location of the speech recognition apparatus identified according to a type of a network to which the speech recognition apparatus is connected, and whether the speech recognition apparatus is connected to another electronic apparatus, and

Instant: (5. ... wherein the direct command is determined based on at least one of a sentence ending in the speech signal, an intonation of the speech signal, a direction in which the speech signal is received, or a size of the speech signal.)
Reference: wherein the direct command is determined based on at least one of a sentence ending in the speech signal, an intonation of the speech signal, a direction in which the speech signal is received, or a size of the speech signal.

	
Independent Claims 11 and 19 are rejected under similar rationale.
	ODP of Claims 2 and 4-6 and their counterparts 12 and 14-16 is also shown in the table above.
	Claims 7 and 17 of the instant Application are made obvious by claim 2 of the reference. (2. The speech recognition method of claim 1, wherein the determining of the at least one activation word comprises determining a number of the at least one activation word, based on a degree of sensitivity of an activated speech recognition function of the speech recognition apparatus.)
Claims 8 and 18 of the instant Application are made obvious by claim 3 of the reference.  (3. The speech recognition method of claim 1, wherein the receiving of the input audio signal comprises storing the input audio signal, and wherein the performing of speech recognition comprises: determining whether the input audio signal comprises the speech signal for uttering the activation word included in the at least one activation word, and when it is determined that the input audio signal comprises the speech signal for uttering the activation word included in the at least one activation word, performing speech recognition on the stored input audio signal and a subsequently received input audio signal.)
Claim 9 of the instant Application is made obvious by claim 6 of the reference. (6. The speech recognition method of claim 1, further comprising: receiving information about speech commands received from a user in a plurality of situations, wherein the receiving is performed by the speech recognition apparatus; extracting a plurality of words included in the speech commands; and based on a frequency of the plurality of words included in speech commands received in a specific situation among the plurality of situations, storing at least one word as an activation word corresponding to the specific situation.)
Claim 10 of the instant Application is made obvious by claim 7 of the reference. (7. The speech recognition method of claim 1, wherein the determining of the at least one activation word comprises: obtaining information about at least one electronic apparatus connected to the speech recognition apparatus; and determining a word related to the at least one electronic apparatus as the at least one activation word.)
Claim Objections
Claims 1 and 11 are objected to because of informalities that may be addressed with the following suggested amendments: 
1. A speech recognition method comprising: 
determining at least one activation word among a plurality of activation words based on information related to an operating environment in which [[the]] a speech recognition apparatus is operating; 
…

11. A speech recognition apparatus comprising: 
a receiver configured to receive an input audio signal; and 
at least one processor configured to: 
determine at least one activation word comprising activation words among a plurality of activation words based on information related to an operating environment in which [[a]] the speech recognition apparatus is operating, 
…
Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8, 11-16 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Pantel (U.S. 2014/0214429) in view of Ogawa (U.S. 2018/0137861) and further in view of Cohrs (U.S. 5,930,751).
Regarding Claim 1, Pantel teaches:
1. A speech recognition method comprising: [Pantel, Figures 1 and 4, “smartphone or terminal 1,” “main processor 26” and “DSP 25.”]
determining at least one activation word among a plurality of activation words based on information related to an operating environment in which the speech recognition apparatus is operating; [Pantel, Figure 7, 33: the check “[0096] 33 Keyword or Phrase Found?” is done by referring to a database of keywords.] [Pantel: “Activation words” of the Claim are taught by “a previously defined keyword- and phrase-catalog” of Pantel. “[0075] 12 Activation Signal (Trigger) After Recognizing A Keyword.” “…When a keyword is recognized, a primary voice recognition process (8) is activated from an inactive state, which converts the audio buffer to text and inputs it to a dialog system (9) which analyzes as to whether there is a relevant question made by the user….” Abstract. “1 … which, on recognizing a keyword (18) or a phrase from a previously defined keyword- and phrase-catalog …” “[0049] Basically, the entries in the keyword- and phrase-catalog can be divided into: [0050] Question words and question phrases: e.g. "who has", "what", "how is", "where is", "are there", "is there", "are there", "do you know", "can one". [0051] Requests and commands: By way of example: "Please write an email to Bob". The phrase "write an email" will be recognized. Another example: "I would like to take a picture". The phrase "take a picture" will be recognized. [0052] Nouns referring to topics on which there is information in the database of the dialog system: e.g. "weather", "appointment", "deadline", "football", "soccer". ….”]
receiving an input audio signal; [Pantel, Figure 7, 30, “[0093] 30 Digitize Microphone Signals …”] [Pantel, Figures 1 and 2. “Mic 2” receives the input speech. Pantel buffers audio continuously (Figure 2, 6) and, when a keyword or keyphrase is recognized (Figures 2 and 3, 12, 18), it subjects the audio, both buffered and live (17, 19), to a primary voice recognition process (8), generates text (13), begins a dialog (20) with a dialog system (9), and outputs a response (14) of the dialog system (9) through a loudspeaker (3). Pantel looks in a catalog for “a keyword (18) or a phrase from a previously defined keyword- and phrase-catalog” that have a predefined structure or fit a template.]
performing speech recognition on the input audio signal, based on whether the input audio signal includes a speech signal of an utterance of an activation word included in the determined at least one activation word; and [Pantel, Figure 7, 32 and 33 and 37-40: “[0095] 32 Execute Secondary Voice Recognition Process with Live Audio data” and “[0096] 33 Keyword or Phrase found?” and “[0100] 37 Apply Primary Voice Recognition Process to Audio Buffer …”] [Pantel, Figures 2 and 3. The term “What 18” in Figure 3 is a keyword that generates the “trigger/activation signal 12” and is followed by the “direct command” of “What will the weather be like then?” “[0054] … Example: "Hello, <product name>, please calculate the square root of 49", or "What time is it, <product name>?"”]
outputting a result of the performing of the speech recognition, [Pantel. The “result” could be the “Text 13” that is the output of “Primary voice recognition 8” and “secondary voice recognition 7.” Or it could be the “response 14” by the “Dialog System 9” output at the “loudspeaker 3.” Figure 7, “[0105] 42 Generate Reply or Activate Action/Response (Full Regular Operation).”]
wherein the performing of the speech recognition comprises: 
extracting text of an utterance of a user by performing speech recognition on the input audio signal, [Pantel, Figure 7, 38: “[0101] 38 Apply Primary Voice Recognition Process to New Live Audio Data” and all of the steps of recognition from 31, 32, 33, 34, 35, 36 in [0095] to [0102]. Figure 2, “13 text” generated from the “dialog audio recording 11” and “trigger activation signal 12” by the “primary voice recognition process.” The “text” of both command and non-command speech is shown in Figure 3 as “What,” which is the trigger, and “what will the weather be like then?” which is the entire command.]
determining whether a speech command included in the input audio signal is a direct command or an indirect command, [Pantel, Figure 7, 40, “[0103] 40 Analyze the Text of the sentence in the Dialog system” followed by Figure 7, 41, “[0104] 41: Does the Text Contain relevant Questions Messages or Commands?”]
when it is determined that the speech command is the direct command, performing an operation of responding to the speech command, and [Pantel, Figure 7, 42: “[0105] 42 Generate Reply or Activate Action/Response (Full Regular Operation)” after 41, which determines that a Question or Command is included in the text of the speech. Figure 2, “Dialog system 9” outputting the “Response of the Dialog System 14” to the “loudspeaker 3.” Figure 3, “start of the dialog 3.” Pantel does not distinguish between direct and indirect commands, but the examples used (e.g., Figure 3) indicate that indirect commands can be deciphered by Pantel.]
when it is determined that the speech command is the indirect command: determining whether a confirmation command is detected, and 
in response to detecting the confirmation command from the user, performing the operation of responding to the speech command, 
wherein the at least one activation word corresponds to executable functions of the speech recognition apparatus. [Pantel’s activation words begin commands that are executable. “14. The method of claim 1 wherein said keyword- and phrase-catalog contains question words, questioning phrases, requests and/or commands.” See [0104] and Figure 7: “41: Does text contain relevant questions messages or commands?”]

Pantel does not teach the environment-specific nature of the commands such that the activation word depends on the environment (defined as time and location amongst other things).

Ogawa teaches the environment specific nature of commands:
determining at least one activation word among a plurality of activation words based on information related to an operating environment in which the speech recognition apparatus is operating; [Ogawa, Figure 1 showing an "activation word database 24" and a "sensor monitoring unit 21" which provides the "environmental conditions" to the speech recognizer. Figures 2 and 3 show how each command is considered a command only under certain conditions. The flowchart of Figure 7 shows how moving in and out of an environment causes a command to be recognized or not.]
Pantel and Ogawa pertain to the use of activation/wake-up words and phrases to start speech recognition or to activate a device, and both refer to a plurality of activation words. It would have been obvious to combine the environment-specific keywords of Ogawa with the system of Pantel in order to give the system an added parameter for detecting commands and as combining prior art elements according to known methods to yield predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
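
For illustration only, the combination described above (a keyword catalog whose valid entries depend on the sensed environment) can be sketched in Python. This is a minimal sketch under stated assumptions and is not code from Pantel or Ogawa: the names `Environment`, `ACTIVATION_CATALOG`, `active_words`, and `find_trigger`, the environment labels, and the example conditions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical snapshot of the operating environment."""
    location: str  # assumed labels such as "home" or "car"
    hour: int      # local hour, 0-23


# Hypothetical catalog: each activation word maps to the condition under
# which it is valid (an assumption standing in for sensor-based conditions).
ACTIVATION_CATALOG = {
    "weather": lambda env: True,
    "navigate": lambda env: env.location == "car",
    "lights": lambda env: env.location == "home" and (env.hour >= 18 or env.hour < 6),
}


def active_words(env):
    """Return the activation words that are valid in the given environment."""
    return {word for word, is_valid in ACTIVATION_CATALOG.items() if is_valid(env)}


def find_trigger(text, env):
    """Return the first currently valid activation word found in the text, else None."""
    words = active_words(env)
    for token in text.lower().split():
        token = token.strip(".,!?")
        if token in words:
            return token
    return None
```

In this sketch the same utterance may or may not trigger recognition depending on the environment, which is the added parameter referred to in the rationale above.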

Pantel is elaborate and on-point regarding the Activation Word and most other aspects of the Claim, and many of the examples of Pantel (see Figure 3) conform to “indirect commands” as described in the instant Application. Yet, Pantel is not express regarding distinguishing a direct command from an indirect command and a need for confirmation.
Ogawa filters the activation words according to conditions of the environment as sensed by inputs but does not seek user confirmation.

Cohrs teaches the confirmation of a command when the system is not sure if a command has been issued, i.e. in the case of an indirect command:
determining whether a speech command included in the input audio signal is a direct command or an indirect command, [Cohrs provides an example of commanding a phone to make a call to a Bill King either by a direct and previously stored command such as “Call Bill King” or indirectly by telling someone else in the room: “You should call Bill King,” which leaves the phone uncertain whether the command was directed at it or not. Col. 2, lines 1-15. The top paragraph of Col. 2 characterizes the second situation as a false recognition situation. However, the next paragraph of Col. 2 (lines 16-23) clarifies that if the speaker confirms that he intended the second situation to be a command directed at the phone, then the command is carried out.]
….
when it is determined that the speech command is the indirect command: [Cohrs, In the scenario set forth in Figure 1, the determination is made when a command is heard without the subject of the command being also included like “make a call” but leaving out “Bill King.”  This command needs confirmation.]
determining whether a confirmation command is detected, and [Cohrs, Figure 1, 124 to YES detects a confirmation of the command. There are two embodiments disclosed in Cohrs: one in the Background and one in the Detailed Description and Figure 1. In the Background (Col. 2, lines 1-32) the scenario is similar to the scenario of the instant Specification and the confirmation is provided by a Yes or No. In the Body of Cohrs the scenario is saying “make a call” and then the device indicator goes on and waits for the speaker to say “Bill King” as confirmation of the fact that he intended his speech to be a command “make call to Bill King.” Both scenarios map to the language of the instant Claim. Either Yes/No or Bill King could be considered “a confirmation command” of the Claim.]
in response to detecting the confirmation command from the user, performing the operation of responding to the speech command, [Cohrs teaches that if the user responds to the indicator with a confirmation, then the command, for example, Call Bill King, will be executed/performed.  Figure 1, steps 114 to YES to 122 to 124 to 126 to 128.]
Pantel/Ogawa and Cohrs pertain to recognizing spoken commands to activate a device, and all refer to the situation where the command is not recognized with certainty. It would have been obvious to combine the confirmation feature of Cohrs, which waits for confirmation from the speaker in situations where the command may be considered indirect and incidental to another conversation, with the system of Pantel/Ogawa in order to provide more flexibility to the system and more control to the speaker and as combining prior art elements according to known methods to yield predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
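
For illustration only, the claimed branch (respond at once to a direct command, but await a confirmation command before responding to an indirect command) can be sketched in Python using the “Call Bill King” example discussed above. This is a minimal sketch under stated assumptions and is not the method of Cohrs or of the instant Application: the classification heuristic (a leading command verb is treated as direct, a command verb elsewhere in the sentence as indirect), the names `classify` and `handle`, and the word lists are all hypothetical.

```python
from enum import Enum


class CommandType(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"


# Hypothetical vocabularies chosen only to mirror the "Call Bill King" example.
COMMAND_VERBS = {"call"}
CONFIRMATIONS = {"yes"}


def classify(text):
    """Assumed heuristic: leading command verb is direct, embedded verb is indirect."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    if not tokens:
        return None
    if tokens[0] in COMMAND_VERBS:
        return CommandType.DIRECT
    if any(t in COMMAND_VERBS for t in tokens[1:]):
        return CommandType.INDIRECT
    return None


def handle(text, get_confirmation):
    """Return "executed" or "ignored"; get_confirmation stands in for the user's reply."""
    kind = classify(text)
    if kind is None:
        return "ignored"
    if kind is CommandType.DIRECT:
        return "executed"
    # Indirect command: the apparatus would indicate that a response is
    # possible here, then wait for the confirmation command.
    reply = get_confirmation()
    if reply is not None and reply.strip().lower() in CONFIRMATIONS:
        return "executed"
    return "ignored"
```

For example, `handle("You should call Bill King", lambda: "Yes")` executes the command only because the confirmation is received, whereas `handle("Call Bill King", lambda: None)` executes without any confirmation.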

Regarding Claim 2, Pantel shows a dialog system (9) which can include confirmation but does not discuss this feature.
Ogawa was not cited for confirmation.
Cohrs expressly teaches (and goes further to conform to the Specification of the instant Application as well) that when an indirect command is detected the system provides an indication to the user that a response is possible:
2. The speech recognition method of claim 1, wherein the performing of the speech recognition further comprises: 
displaying that a response to the speech command is possible, when it is determined that the speech command is the indirect command. [Cohrs teaches the use of an indicator that could be a visual indicator such as a light or a sound indicator like a low volume tone when the device is not sure if the input speech was intended as a command which teaches the indirect command or “not the direct command” of the Claim.  Figure 1, “Activate Listen Indicator 112.”  (Refer to the scenarios set forth in Cohrs and in the Specification and Drawings of the instant Application which depict the same situation.)   “When a user issues a verbal command, the speech command system recognizes or identifies the command and retrieves a set of verbal command operators associated with the initial identified verbal command. The speech command system then initiates operation a timer of a pre-determined maximum timing period and activates an indicator to alert the user that the system is awaiting a verbal command operator. The indication is preferably performed without interrupting the user's activity. For example, the indication may be accomplished by a visual indicator (e.g., a light) and/or a low volume tone.”  Col. 2, lines 36-46. “….without limitation, visual indicators such as an LED light or character-based display, audible indicators such as a low volume tone, and a combination of visual and audible indicators.”  Col. 4, lines 47-51.  In the scenario set forth in Figure 1, the determination is made when a command is heard without the subject of the command being also included like “make a call” but leaving out “Bill King.”]
Rationale for combination as provided for Claim 1.  The feature of indirect command was mapped to Cohrs and the details of this feature come from Cohrs under the same rationale.

Regarding Claim 3, Pantel teaches and suggests:
3. The speech recognition method of claim 1, further comprising: 
storing the speech command when it is determined that the speech command is the indirect command; and [Pantel teaches storing/buffering every sound that comes in, whether noise or speech. See Figures 2 and 3. “Audio buffer 6.” “Audio recording in buffer 15, 16, 17.” This includes direct or indirect speech commands.]
transmitting the stored speech command to an embedded speech recognition module of the speech recognition apparatus or an external server in response to detecting the confirmation command from the user. [Pantel, Figures 4 and 5 show the two types of voice recognition process: “8: primary voice recognition process,” and “7: secondary voice recognition process” which can handle a limited vocabulary. Figure 6 shows the “server 28” in communication with “smartphone/terminal 1” with the “primary voice recognition process 8” on the server and the “secondary voice recognition process 7” on the smartphone. Accordingly, once the low power secondary voice recognition process (7) detects a keyword, the incoming and buffered speech is sent to the more capable primary voice recognition process (8), which could be on the same device or on a server (28).]
Pantel teaches having two different speech recognition processes with different power/vocabulary/capability levels, and once the low power recognizer is triggered by a keyword, the speech may be sent to the high-power recognizer. Pantel does not distinguish between direct and indirect commands, and it can determine a command even when the language is indirect. See the example of Figure 3. Considering that Claim 1 does not define the direct and indirect commands (other than by specifying that the indirect command requires confirmation), the sending of the type of command that requires a higher processing power (i.e. indirect) to the more powerful process is first well-known in the art and second suggested by the teachings of Pantel that have a two-tier voice recognition. (The key is to have a definition of “direct” vs. “indirect” commands and how each is detected; once there is a more complex NLU task, it becomes obvious that a more complex system would be employed.)
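
For illustration only, the two-tier arrangement relied upon above (continuous buffering, a limited-vocabulary spotter, and hand-off of the buffered input to a more capable recognizer) can be sketched in Python. This is a simplified sketch under stated assumptions and is not Pantel's implementation: text chunks stand in for audio frames, the forwarding of subsequently received live audio is omitted, and the class `TwoTierRecognizer`, its parameters, and the `primary` callable are hypothetical.

```python
from collections import deque


class TwoTierRecognizer:
    """Hypothetical two-tier recognizer: keyword spotter plus primary recognizer."""

    def __init__(self, keywords, primary, buffer_size=8):
        self.keywords = {k.lower() for k in keywords}
        self.primary = primary                   # callable taking a list of chunks
        self.buffer = deque(maxlen=buffer_size)  # rolling buffer of recent input

    def feed(self, chunk):
        """Buffer every chunk; on a keyword hit, pass the buffer to the primary tier."""
        self.buffer.append(chunk)
        # Secondary (limited-vocabulary) tier: exact match against the keyword set.
        if chunk.lower().strip(".,!?") in self.keywords:
            return self.primary(list(self.buffer))
        return None
```

A usage example: with `primary` set to a function that joins the chunks, feeding "tomorrow" returns nothing, and feeding "What" returns the buffered input including the chunk that preceded the keyword.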

Regarding Claim 4, Pantel teaches:
4. The speech recognition method of claim 1, 
wherein the direct command is speech uttered by the user with an intent for the speech recognition apparatus to output the result of the performing of the speech recognition, and [Pantel, Figure 3, and Figure 7:  “[0104] 41 Does the Text Contain A Relevant Question, Message, or Command? [0105] 42 Generate Reply or Activate Action/Response (Full Regular Operation) [0106] 43 Are there Further Questions/Commands by the User? (Full Regular Operation)”]
wherein the indirect command is speech uttered by the user such that the speech recognition apparatus is unable to determine that the user intends for the speech recognition apparatus to output the result of the performing of the speech recognition.
Pantel and Ogawa do not address the indirect commands.
Cohrs teaches:
wherein the indirect command is speech uttered by the user such that the speech recognition apparatus is unable to determine that the user intends for the speech recognition apparatus to output the result of the performing of the speech recognition. [Cohrs provides an example of commanding a phone to make a call to a Bill King either by a direct and previously stored command such as “Call Bill King” or indirectly by telling someone else in the room: “You should call Bill King,” which leaves the phone uncertain whether the command was directed at it or not. Col. 2, lines 1-15. The top paragraph of Col. 2 characterizes the second situation as a false recognition situation. However, the next paragraph of Col. 2 (lines 16-23) clarifies that if the speaker confirms that he intended the second situation to be a command directed at the phone, then the command is carried out.]
Rationale for combination as provided for Claim 1 as this feature of indirect commands was mapped to Cohrs starting from Claim 1.

Regarding Claim 5, Pantel teaches:
5. The speech recognition method of claim 1, 
wherein the information related to the operating environment comprises at least one of a time, a location of the speech recognition apparatus identified according to a type of a network to which the speech recognition apparatus is connected, and whether the speech recognition apparatus is connected to another electronic apparatus, and 
wherein the direct command is determined based on at least one of a sentence ending in the speech signal, an intonation of the speech signal, a direction in which the speech signal is received, or a size of the speech signal. [Pantel looks for pre-determined templates/sentence structures that are stored in a “previously defined keyword- and phrase-catalog” for commands / “direct command” in order to activate the “dialog system 9.” See Figure 7, “[0103] 40 Analyze the Text of the Sentence in the Dialog System” and “[0104] 41 Does the Text Contain A Relevant Question, Message, or Command?” and “[0105] 42 Generate Reply or Activate Action/Response (Full Regular Operation).” See claims 14-16 and [0049] to [0056] and [0060]. See [0003] and [0006] regarding input by “natural speech/voice,” which indicates that the processing is a natural language processing system. Pantel also finds the “pause” in the speech and interprets it as a “termination end” of the previous utterance and beginning of the next utterance and uses this pause to determine when the command began and ended in order to interpret the intent of the command. The “Pause”/“Silence” is shown as space 16 in Figure 3 of Pantel.]
The changing catalog of commands/activation words is taught by Ogawa.
Ogawa teaches:
wherein the information related to the operating environment comprises at least one of a time, a location of the speech recognition apparatus identified according to a type of a network to which the speech recognition apparatus is connected, and whether the speech recognition apparatus is connected to another electronic apparatus, and [Ogawa, Figure 2 shows location as a type of sensor information and Figure 7, shows the “time range” in combination with “sensor” information (see, e.g., the third command of “Mow down”) as the type of information used to interpret the command.]
The rationale for combination is as provided for Claim 1, because this feature of environment-dependent commands was combined from Ogawa starting from Claim 1.

Regarding Claim 6, Pantel looks in a catalog for “a keyword (18) or a phrase from a previously defined keyword- and phrase-catalog” that have a predefined structure or fit a template.
The keywords are not dependent on the operating environment.
Ogawa teaches:
6. The speech recognition method of claim 1, further comprising: 
storing the plurality of activation words for activating speech recognition respectively corresponding to a plurality of operating environments, the activation words being terms related to each of the operating environments and being terms that can be included in a sentence to trigger the speech recognition. [Ogawa, Figure 1 showing the “activation word database 24” that stores the activation words and “activation word control unit 23” which controls which activation words are valid according to an input from the “sensor monitoring unit 21.”  Figures 5 and 6 that show other embodiments of filtering the activation words according to environment as determined by sensor input.  [0033] … the sensor monitoring unit 21 instructs the activation word control unit 23 to use the word as an activation word….”  “The activation word control unit 23 controls the increase and decrease in the number of words used as activation words by registering a word in the activation word database 24 in response to an instruction from the sensor monitoring unit 21 ….”]
The rationale for combination is as provided for Claim 1, because this feature of environment-dependent commands was combined from Ogawa starting from Claim 1.
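For illustration only, the environment-dependent selection of activation words mapped to Ogawa above can be sketched as follows.  The sketch is not taken from Ogawa; the table `ACTIVATION_WORD_DB`, the environment labels, the example words, and the function name are all hypothetical.

```python
# Hypothetical activation word database: operating environment -> activation words.
ACTIVATION_WORD_DB = {
    "driving": {"navigate", "traffic"},
    "kitchen": {"timer", "recipe"},
}


def valid_activation_words(sensed_environments, db=ACTIVATION_WORD_DB):
    """Return the activation words that are valid for the sensed environments."""
    words = set()
    for env in sensed_environments:
        # Environments with no stored entry contribute no activation words.
        words |= db.get(env, set())
    return words
```

Under these assumptions, a sensor input indicating the kitchen enables only the kitchen-related words, and an environment that is absent from the database enables none.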

Regarding Claim 8, Pantel teaches:
8. The speech recognition method of claim 1, 
wherein the receiving of the input audio signal comprises storing the input audio signal, and [Pantel buffers audio continuously (Figure 2, 6).  “[0010] A software agent or a personal assistant system is in a power-saving standby mode or sleep state, the ambient noise--for example voice--picked up by one or more microphones being digitized and continually buffered in an audio buffer, so that the audio buffer constantly contains the ambient noises or voice from the most recent past, by way of example, those of the last 30 seconds….”]
wherein the performing of speech recognition comprises: 
determining whether the input audio signal comprises the speech signal of the utterance of the activation word included in the at least one activation word, and [Pantel buffers audio continuously (Figure 2, 6) and, when a keyword or keyphrase is recognized (Figures 2 and 3, 12, 18), it subjects the audio (both buffered and live, 17, 19) to a primary voice recognition process (8) and generates text (13).]
when it is determined that the input audio signal comprises the speech signal of the utterance of the activation word included in the at least one activation word, performing speech recognition on the stored input audio signal and a subsequently received input audio signal. [Pantel buffers audio continuously (Figure 2, 6) and, when a keyword or keyphrase is recognized (Figures 2 and 3, 12, 18), it subjects the audio (both buffered and live, 17, 19) to a primary voice recognition process (8) and generates text (13).  The keyword indicates that the primary voice recognition should start processing the incoming speech, and the live portion is the “subsequently received audio.”  “[0011] The more energy-intensive, primary voice recognition process now converts either the entire audio buffer or the most recent part starting at a recognized voice pause (which typically characterizes the beginning of a question phrase) into text, the primary voice recognition process then seamlessly continuing the conversion of the live transmission from the microphone….”  “[0031] As soon as the secondary voice recognition process 7 recognizes a potentially relevant keyword 18 or a phrase, e.g. "do you know", it arranges the temporary wakeup 12 of the primary voice recognition process 8 and a switch to full operation takes place. The content 21 of the audio buffer 6 is now handed over to the primary voice recognition process 8.”]
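For illustration only, the buffering and handoff behavior mapped to Pantel above can be sketched as follows.  The sketch is not taken from Pantel; the class name `BufferedWakeup`, the use of text chunks in place of audio samples, the buffer size, and the substring-based keyword spotting are assumptions made for this example.

```python
from collections import deque


class BufferedWakeup:
    """Hypothetical sketch: rolling buffer plus keyword-triggered handoff."""

    def __init__(self, keywords, max_chunks=30):
        self.keywords = {k.lower() for k in keywords}
        # Rolling buffer that only keeps the most recent chunks.
        self.buffer = deque(maxlen=max_chunks)
        self.awake = False
        # Chunks handed to the (stand-in) primary recognition process.
        self.transcript = []

    def feed(self, chunk):
        if self.awake:
            # After wakeup, live input goes straight to primary recognition.
            self.transcript.append(chunk)
            return
        self.buffer.append(chunk)
        if any(k in chunk.lower() for k in self.keywords):
            # Keyword spotted: hand over the stored input, then continue live.
            self.awake = True
            self.transcript.extend(self.buffer)
            self.buffer.clear()
```

Under these assumptions, the stored chunks correspond to the “stored input audio signal” and the chunks fed after wakeup correspond to the “subsequently received input audio signal.”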

Claim 11 is a system Claim with limitations similar to the limitations of Claim 1 and is rejected under a similar rationale.  Note Figure 4 of Pantel for processor and memory.
Claims 12-16 are system Claims with limitations similar to the limitations of Claims 2-6 and are rejected under similar rationale.

Claim 18 is a system Claim with limitations similar to the limitations of Claim 8 and is rejected under a similar rationale.  

Claim 19 is a computer-readable medium (CRM) Claim with limitations similar to the limitations of Claim 1 and is rejected under a similar rationale.  Note Figure 4 of Pantel for memory.

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Pantel, Ogawa, and Cohrs and further in view of Sharifi (U.S. 20170110130).
Regarding Claim 7, Pantel teaches: “[0042] … That is to say the trigger 12 of the secondary voice recognition process 7 reacts very sensitive ….”  But this is not what the Claim intends.
Sharifi more expressly teaches:
7. The speech recognition method of claim 1, wherein the determining of the at least one activation word comprises determining a number of the at least one activation word, based on a degree of sensitivity of an activated speech recognition function of the speech recognition apparatus. [Sharifi teaches a scheme of detecting keywords/hotwords where the keyword detection is evaluated with a score and if this score is above a threshold then the keyword is considered as detected.  Thus, the higher the score or the lower the threshold, the higher the number of the keywords that are detected in a particular situation.  Sharifi adjusts the score (by weights) and the threshold according to the “level of sensitivity” of the situation and the data that is being accessed by the keyword such that for “lower sensitivity” material the query or command is executed more often even if the match is not that great.  This way the number of keywords that lead to the execution of the particular command is increased.  See paragraph [0055] for adjusting the weight of the similarity score or the threshold value according to the context of the query and see [0056] for the level of sensitivity of the data being a type of context that determines the score or the threshold and thus the number of hotwords that can execute in that particular context:  “[0056] For example, if the context of the command or query indicates a lower level of sensitivity with regards to personal or private data, then the similarity score or value may be weighted or the predetermined threshold score or value may be adjusted to more often allow the query or command to be executed regardless of a close similarity between the generated audio fingerprint of the utterance "Call Mom" and the generated hotword fingerprint of the utterance "OK Computer". 
Conversely, if the context of the command or query indicates a higher level of sensitivity, then the similarity score or value may be weighted or the predetermined threshold score or value may be adjusted to less often allow the query or command to be executed, e.g., requiring a closer similarity between the generated audio fingerprint of the utterance "Call Mom" and the generated hotword fingerprint of the utterance "OK Computer". Thus, the comparison between the hotword utterance and the query or command utterance may inhibit an unauthorized user's ability to replay a recorded hotword and issue a new query or command with their own voice”.]
Pantel/Ogawa/Cohrs and Sharifi pertain to the use of spoken keywords and speech recognition for execution of commands.  It would have been obvious to add the context/sensitivity-adjusted keyword activation of Sharifi, which causes a greater number of uttered keywords to be executed in low-sensitivity contexts, to the system of the combination in order to provide a higher or an adjustable level of security, and as combining prior art elements according to known methods to yield predictable results or use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
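For illustration only, the sensitivity-dependent threshold mapped to Sharifi above can be sketched as follows.  The sketch is not taken from Sharifi; the numeric thresholds, the sensitivity labels, and the function name `is_allowed` are assumptions made for this example.

```python
# Hypothetical thresholds: a lower-sensitivity context tolerates a weaker match.
THRESHOLDS = {"low": 0.3, "high": 0.7}


def is_allowed(similarity, sensitivity):
    """Return True when the similarity score meets the context's threshold."""
    return similarity >= THRESHOLDS[sensitivity]
```

Under these assumptions, the same similarity score of 0.5 allows the command in a low-sensitivity context and blocks it in a high-sensitivity context, so more utterances lead to execution when the sensitivity is low.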

Claim 17 is a system Claim with limitations similar to the limitations of Claim 7 and is rejected under a similar rationale.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Pantel, Ogawa, and Cohrs and further in view of Maghoul (U.S. 20090327263).
Regarding Claim 9, Pantel teaches “[0055] In an advantageous embodiment, the keyword- and phrase-catalog can be modified by the user. If the voice activation is done via the product name or a generic term, the user could, for example, define a nickname for the terminal 1 as a further, alternative keyword.”  And “[0056] The user could also delete some keywords or phrases from the catalog, e.g. if the personal assistant system should report less frequently or only in relation to certain topics.”  These teachings at the least suggest the device specific keyword generation of Claim 10.
Ogawa changes the list of active keywords according to sensor data but does not teach the frequency of input of a keyword as a method of detecting commands.
Cohrs does not teach the use of frequency of occurrence either.
Maghoul teaches:
9. The speech recognition method of claim 1, further comprising: 
receiving information about speech commands received from the user in a plurality of situations, wherein the receiving is performed by the speech recognition apparatus; [Maghoul, Title: “Background Contextual Conversational Search.”  Maghoul finds the context/situation of utterance of a keyword in a conversation while extracting keywords/commands from the conversation and the context of Maghoul includes location and time which are examples given by the instant Application for “situation.”  “2. … wherein the conversations comprise conversations over telephones, the method further comprising: sensing a contextual piece of information within the monitored conversation or as related to a status of one or more of the telephones ….”  “3. The method of claim 2, wherein the contextual piece of information comprises a location sensed by a global positioning system (GPS) device located within the telephone, words uttered in proximity to the identified words within the search string, a time stamp, or a combination thereof.”  The context data is received at the “speech recognition server 134” of Figure 2 at the “search string generator 228”:   “[0031] … The search string generator 228 may also take into account a context of the search strings, to include, but not limited to, words uttered in proximity to the search string but not included therein, a location of the mobile phone 105, 110 determined from a GPS device (see FIG. 3), a time stamp of the mobile phone 105, 110 and a detected gender of the user….”  Additionally, Maghoul teaches a “personalized entity list,” and the identity of the “persons” can also be mapped to a “situation.”  See [0030] for identifying the “personalized” keywords by identifying the telephone of the particular user.  
The keywords of Maghoul teach the “commands” of the Claim because as shown in [0033] a sentence such as “Let’s go see Indiana Jones tonight” is treated as a search command to return the location and times of showing at the nearest movie theatres and the search considers the location and time of the query/command (context/situation) for returning results.  (Also note the example in paragraphs 174-175 of the instant Application where every portion of the input sentence is called a command.)]
extracting a plurality of words included in the speech commands; and [Maghoul, Figure 4, “digitize voice conversation into words and phrases 404” and “speech recognition of words and phrases 408.”]
based on a frequency of the plurality of words included in speech commands received in a specific situation among the plurality of situations, storing at least one word as an activation word corresponding to the specific situation. [Maghoul, Figure 4, “keep track and count 436” to “Frequency > Threshold? 440” to YES to “Add to Entity lists 444.”  “[0038] If there are no matches found at block 412, than at block 436, the search string of the query is tracked by incrementing a counter for the number of times a search for the same has been submitted to the search engine 138. This counting could take place either on the speech recognition server 134 or at the search engine 138. At block 440, the method asks whether the counter, representing the frequency with which the search string has been submitted to the search engine 138, is above a threshold value. If the answer is yes, at block 444, the search string (which could be one or more words) is added to each relevant entity list in database 140, including the global hot-list and/or the personalized entity list. If the answer is no, at block 448, the counter remains updated, but no further action is taken.”]
Pantel/Ogawa/Cohrs and Maghoul pertain to the recognition of spoken keywords/commands to perform a function such as a search.  It would have been obvious to combine the keyword compilation feature of Maghoul, which adds to an entity list the keywords that were uttered with a frequency above a threshold in a certain context/situation, with the system of Pantel/Ogawa/Cohrs, which also permits keyword generation and storage, in order to allow the system to dynamically add to the list of keywords, and as combining prior art elements according to known methods to yield predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
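For illustration only, the count-and-threshold step mapped to Maghoul above (“keep track and count 436,” “Frequency > Threshold? 440,” “Add to Entity lists 444”) can be sketched as follows.  The sketch is not taken from Maghoul; the class name `ActivationWordLearner`, the whitespace tokenization, the per-situation counters, and the default threshold are assumptions made for this example.

```python
from collections import Counter, defaultdict


class ActivationWordLearner:
    """Hypothetical sketch: promote frequent words to per-situation activation words."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.counts = defaultdict(Counter)         # situation -> word counts
        self.activation_words = defaultdict(set)   # situation -> stored words

    def observe(self, situation, command):
        # Extract the words of the command and count them for this situation.
        for word in command.lower().split():
            self.counts[situation][word] += 1
            # A word is stored once its frequency exceeds the threshold.
            if self.counts[situation][word] > self.threshold:
                self.activation_words[situation].add(word)
```

Under these assumptions, a word uttered three times in one situation is stored for that situation only, while the same word uttered once in another situation is merely counted.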

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Pantel, Ogawa, and Cohrs and further in view of Shoemake (U.S. 20150244807).
Regarding Claim 10, Pantel teaches: “[0054] Using a product name as a keyword has the advantage that compared to a catalog with question words, the frequency at which the system unnecessarily changes to full operation can be reduced. When using a product name, it can be assumed that the personal assistant system is in charge. Example: "Hello, <product name>, please calculate the square root of 49", or "What time is it, <product name>?"” and “[0053] Product names, nicknames and generic terms for a direct address of the personal assistant system. Examples of generic terms: "mobile", "mobile phone", "smartphone", "computer", "navigator", "navi".”  This at the least suggests having the capability of being connected to various devices.
Ogawa and Cohrs do not discuss this feature, although their structure is likewise capable of serving as an initial command receiving and recognition device for other devices.
Shoemake very expressly teaches:
10. The speech recognition method of claim 1, wherein the determining of the at least one activation word comprises: 
obtaining information about at least one electronic apparatus connected to the speech recognition apparatus; and  [Shoemake, Figure 6.  A number of user devices from “First User Device 625” to “Nth User Device 630n” are being controlled with the same remote control.  Each of these devices has its own trigger words.  See Figure 3D, showing the “receive user input as voice input using voice recognition 368b,” and [0029] to [0030].]
determining a word related to the at least one electronic apparatus as the at least one activation word. [Shoemake, “[0067] The same may apply to voice recognition, e.g., where a user may say a control phrase to trigger the menu to start or for the system to be activated to receive a directive, e.g., "Computer, Call Bob." or "Computer, Watch TV."”   See [0066] for context where [0066] pertains to gesture recognition and teaches that each gesture is a trigger for a certain device.  “[0012] Merely by way of example, some embodiments allow a consumer electronics device to connect to the Internet or other suitable network without the use of a dedicated remote controller or dedicated remote control device to send instructions to do so. Some embodiments allow for the consumer electronics device to be used by a user(s) without a dedicated remote control device at all. In some embodiments, latency minimization techniques may also be provided for communication of commands to the consumer electronics product under control.”]
Pantel/Ogawa/Cohrs and Shoemake pertain to the use of spoken commands.  It would have been obvious to add the connected-device-specific trigger words of Shoemake to the system of the combination, which does teach having different trigger words for different applications, as one potential scenario of use of a device that can convey commands to other devices (like a universal remote or a personal assistant).  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Regarding the effect of the environment on the “Activation Words” see also:
Another Ogawa (U.S. 2017/0053650), Figure 3 shows the activation word databases and Figure 5 shows that which activation words are acknowledged depends on whether another device is detected.
Wang (U.S. 9691384) teaches that trigger words are recognized according to context and context is defined as the type of application that is installed on the phone.  
Weksler (U.S. 9542941) suspends the requirement of a trigger word for commands at night time and requires a trigger word to precede a command during the day.
For location and time as context/environment see Maghoul (U.S. 20090327263) cited in the rejection of Claim 9.
Teasley (U.S. 20140324431) has a different vocabulary subset for each room in the house for recognizing the commands pertaining to that room.

Independent Claims
1. A speech recognition method comprising: 
determining at least one activation word among a plurality of activation words based on information related to an operating environment in which the speech recognition apparatus is operating; 
receiving an input audio signal; 
performing speech recognition on the input audio signal, based on whether the input audio signal includes a speech signal of an utterance of an activation word included in the determined at least one activation word; and 
outputting a result of the performing of the speech recognition, 
wherein the performing of the speech recognition comprises: 
extracting text of an utterance of a user by performing speech recognition on the input audio signal, 
determining whether a speech command included in the input audio signal is a direct command or an indirect command, 
when it is determined that the speech command is the direct command, performing an operation of responding to the speech command, and 
when it is determined that the speech command is the indirect command: determining whether a confirmation command is detected, and 
in response to detecting the confirmation command from the user, 
performing the operation of responding to the speech command, 
wherein the at least one activation word corresponds to executable functions of the speech recognition apparatus.

11. A speech recognition apparatus comprising: 
a receiver configured to receive an input audio signal; and 
at least one processor configured to: 
determine at least one activation word comprising activation words among a plurality of activation words based on information related to an operating environment in which a speech recognition apparatus is operating, 
perform speech recognition on the input audio signal, when the input audio signal includes a speech signal of an utterance of an activation word included in the determined at least one activation word, and output a result of the speech recognition, 
wherein the performing of the speech recognition comprises: 
extracting text of an utterance of a user by performing speech recognition on the input audio signal, 
determining whether a speech command included in the input audio signal is a direct command or an indirect command, 
when it is determined that the speech command is the direct command, performing an operation of responding to the speech command, and 
when it is determined that the speech command is the indirect command:
determining whether a confirmation command is detected, and 
in response to detecting the confirmation command from the user, performing the operation of responding to the speech command, 
wherein the at least one activation word corresponds to executable functions of the speech recognition apparatus.

19. A non-transitory computer-readable recording medium having recorded thereon at least one program comprising instructions for allowing a speech recognition apparatus to execute a speech recognition method, 
the speech recognition method comprising: 
determining at least one activation word among a plurality of activation words based on information related to an operating environment in which the speech recognition apparatus is operating; 
receiving an input audio signal; 
performing speech recognition on the input audio signal, when the input audio signal includes a speech signal of an utterance of an activation word included in the determined at least one activation word; and 
outputting a result of the performing of the speech recognition, 
wherein the performing of the speech recognition comprises: 
extracting text of an utterance of a user by performing speech recognition on the input audio signal, 
determining whether a speech command included in the input audio signal is a direct command or an indirect command, 
when it is determined that the speech command is the direct command, performing an operation of responding to the speech command, and 
when it is determined that the speech command is the indirect command: determining whether a confirmation command is detected, and 
in response to detecting the confirmation command from the user, performing the operation of responding to the speech command, 
wherein the at least one activation word corresponds to executable functions of the speech recognition apparatus.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on Monday through Thursday 9am to 4pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FARIBA SIRJANI/
Primary Examiner, Art Unit 2659