Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
DETAILED ACTION
Claims 1-20 are pending. Claims 1, 9 and 17 are independent.  The independent Claims have been amended and the Dependent Claims have been amended to include a “voice communication session.”  Claim 3 and counterparts 11 and 19 are amended to include two voice calls.
This Application was published as U.S. 20200043486.   
Apparent priority is 2 August 2018.
Applicant’s amendments and arguments have been considered but are either unpersuasive or moot in view of the new grounds of rejection, which, to the extent presented, were necessitated by the amendments to the Claims.
This action is Final.  First RCE.
Response to Arguments
Arguments, while having been rendered moot in view of the modified grounds of rejection over Locker, are nevertheless discussed below.
Claim 1 is amended to state:
1. An apparatus comprising: 
a sound sensor; 
one or more communication interfaces; 
one or more processor devices; and 
one or more memory devices storing instructions executable by the one or more processor devices to: 
receive a mute data setting indicating a muted or unmuted state of a voice communication session with a first party from a user;
generate first audio data based on sound detected by the sound sensor at a first time;
initiate transmission, via the one or more communication interfaces, of the first audio data to another device during the voice communication session based on a determination that the sound sensor is unmuted with respect to the voice communication session at the first time based on the state of the mute data setting indicating unmuted;
generate second audio data based on sound detected by the sound sensor at a second time; 
refrain from initiating transmission of the second audio data to the other device during the voice communication session based on a determination that the sound sensor is muted with respect to the voice communication session at the second time based on the state of the mute data setting indicating muted; and 
initiate a natural language processing operation on the second audio data based on detecting a wake phrase in the second audio data.

Applicant argues that Yu uses the “wakeup phrase” for muting while the Claim performs the muting first and then receives a “wake-up phrase”:

[Image omitted: media_image1.png]



As a preliminary matter and as provided by the Applicant, Yu may use a button press for turning on the VPA instead of using a spoken keyword.  Yu, [0016]-[0017] and [0021]-[0024].   See, e.g. Yu: “[0024] During the call, the VPA 60 can be either in a manual mode in which the user expressly turns on the VPA 60 via a switch, button, or some other mechanical operation, or it can be in a set to a live mode (i.e., the VPA 60 is listening). … The termination of this mode can be done by an explicit cue, such as a button press or use of a particular phrase, or a pause on the part of the user.”  
Likewise, the Claim language is left broad and it was argued that the Claim may also use a button press for muting because “receive a mute data setting indicating a muted or unmuted state of a voice communication session with a first party from a user” does not specify by what method the “mute data setting” is received.
On the other hand, the amended language still does not foreclose the use of the trigger word for both muting and start of the NLP on the received voice (trigger + command).  The Claim now “refrain[s] from initiating transmission of the second audio data to the other device during the voice communication session … based on the state of the mute data setting indicating muted; and initiate[s] a natural language processing operation on the second audio data based on detecting a wake phrase in the second audio data.”  The initiation of the NLP on the second audio data is not contingent upon determination of a mute state.  NLP here is only contingent on detecting the wake/trigger phrase.
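
The reading above can be sketched in code. The following is only a minimal illustration under stated assumptions: audio is modeled as text, and the names `Device`, `on_audio`, and `WAKE_PHRASE` are hypothetical and do not appear in the Claim or in Yu. The sketch shows that, as the limitations are drafted, the transmit decision turns on the mute data setting while the NLP decision turns only on the wake phrase.

```python
from dataclasses import dataclass, field

WAKE_PHRASE = "hey assistant"  # hypothetical wake phrase, not taken from the Claim or Yu


@dataclass
class Device:
    muted: bool = False  # stands in for the "mute data setting"
    sent_to_other_device: list = field(default_factory=list)
    sent_to_nlp: list = field(default_factory=list)

    def on_audio(self, audio_text: str) -> None:
        # Transmission to the other device depends only on the mute setting.
        if not self.muted:
            self.sent_to_other_device.append(audio_text)
        # As drafted, NLP depends only on the wake phrase; the mute state
        # is not consulted in this branch.
        if WAKE_PHRASE in audio_text.lower():
            self.sent_to_nlp.append(audio_text)


device = Device()
device.on_audio("hello there")              # first audio data, unmuted
device.muted = True
device.on_audio("Hey assistant, call 911")  # second audio data, muted
```

In this sketch the first utterance reaches only the other device and the second reaches only the NLP list. An unmuted utterance containing the wake phrase would reach both, which is the breadth noted above.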

Thus, Yu is still sufficient for teaching Claim 1.  The wake word can be both the trigger for muting and the trigger for NLP.  The two are not inconsistent.

Applicant continues:

[Image omitted: media_image2.png]


[Image omitted: media_image3.png]

(Applicant’s Response, pp. 9-10.)
Applicant argues that, in contrast, in the Claim the muting occurs first and then a wakeup phrase and a command are both sent to the NLP.  In other words, the Claim begins with “receive a mute data setting indicating a muted or unmuted state of a voice communication session with a first party from a user,” and therefore the Claim mutes the ongoing communication by some method before it gets to the sending of the WakePhrase+Command.

As provided above, it is not clear to the Examiner how the current language indicates that the “second audio data” is muted.  It only says that if you detect mute then don’t send the second audio data.  The last limitation could have been a wherein clause connected to the “refrain” step to indicate that the “initiate NLP” is also occurring when the second audio data is muted.
In Yu only one of the button press and the wake-up phrase is used.  So, if Yu 
In this respect Applicant has provided arguments regarding an inappropriate delay occurring in the operation:

[Image omitted: media_image4.png]

(Applicant’s Response, p. 10.)
The issue of delay is not on point.  The Claim has no language regarding any delay that could be caused or a method of avoiding it.  The mechanism of muting is left open in the Claim and if it were to be done by speech (trigger word) the same delay would occur in the Claim as in Yu.  Additionally, the durations of delay that are set forth in the response are not relevant in addition to not being supported by evidence.  See Locker [0015] which teaches buffering the input voice so it can be analyzed for commands and to make a determination of whether the input should be transmitted or not.  Locker speaks of a delay of milliseconds which it calls “essentially real time.”  “[0015] Embodiments provide for voice command devices that receive sound but do not transfer the voice data beyond the system unless certain voice-filtering criteria have been met. In addition, embodiments provide devices that support voice command operation while external voice data transmission is in mute operation mode. As such, devices according to embodiments may process voice data locally responsive to the voice data matching voice-filtering criteria. According to embodiments, voice command devices capture sound and analyze it in real-time on a word-by-word basis and decide whether to handle the voice data locally, transmit it externally, or both. Voice data received by a device may be buffered so that the device may analyze it according to embodiments. In addition, embodiments provide that any buffer delay may consist of a delay on the order of milliseconds. Thus, voice data transmission or voice activated commands may be executed essentially in real time or merely delayed within customary time periods experienced by similar devices.”
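
The buffer-then-decide flow quoted from Locker [0015] can be illustrated with a short sketch. This is an assumption-laden simplification, not Locker's implementation: utterances are modeled as text, the command list `LOCAL_COMMANDS` is hypothetical, the "both" outcome mentioned in [0015] is not modeled, and the label `"dropped"` for muted non-command speech is an assumption made for illustration.

```python
from collections import deque

# Hypothetical voice-filtering criteria; Locker does not list these exact strings.
LOCAL_COMMANDS = {"volume up", "volume down", "open calendar"}


def route(voice_data: str, muted: bool) -> str:
    """Decide how one buffered utterance is handled."""
    if voice_data.lower() in LOCAL_COMMANDS:
        return "local"  # matches the criteria: handled on the device only
    # Non-command speech is transmitted unless mute operation is enabled.
    return "dropped" if muted else "external"


def process(buffer: deque, muted: bool) -> list:
    """Drain the buffer in arrival order and record each routing decision."""
    results = []
    while buffer:
        utterance = buffer.popleft()
        results.append((utterance, route(utterance, muted)))
    return results


decisions = process(deque(["Volume up", "see you at noon"]), muted=False)
```

The decision is a lookup against the buffered utterance, which is consistent with the short buffer delay that Locker describes.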

Any special method that avoids a recognition delay has to be in the Claim before it can be argued.  If the Claim wishes to be limited to pressing a button for muting, it should say so. 

Nevertheless, the mapping will assume that the second audio data is muted and then after having been muted is analyzed for presence of a wake phrase to get over this line of arguments.  The breadth of the language permits different interpretations.

Applicant further provided the following explanation:

[Image omitted: media_image5.png]

(Applicant’s Response, p. 13.)
In the Claim, “a mute data setting” is received and this “mute data setting” is not limited to a button press or a keyword or a pause or anything else.  Thus, it remains broad and inclusive and taught by the methods of Yu.


However, to expedite prosecution, in the amended language the “mute data setting” is interpreted as separate from the “wake-up phrase” to avoid the above arguments.  A reference is added to address the amended language.

Additionally, to teach current Claim 1, only one reference in addition to Yu is required: a reference that teaches a virtual personal assistant that needs a combination of Wake Phrase + Command.  Using a wake-word followed by a command is the well-established state of the prior art, and any of a multiplicity of references that use a wake-up word to first get the attention of the VPA, such as Hey Siri, Alexa, OK Google, or Cortana, followed by the actual commands, would serve this purpose.
See rejection of Claim 7.

Further, note that the current language of the independent Claim still does not include two phone conversations.  Rather, there are two “voice communications” the second of which is to a NLP which is similar to what occurs in Yu.  Claim 1 is still similar to Yu except that it is interpreted to include an additional muting command (“mute setting”) and that the voice that it sends for NLP always includes a wake-up phrase plus a command.
Patentability of the other independent Claims is argued based on their similarity to Claim 1. Accordingly, the above provides a reply to those arguments as well.
Patentability of the dependent Claims is argued based on their dependence from their base independent Claims. Accordingly, the above provides a reply to those arguments as well.

Arguments Regarding Claims 3, 11, and 19
Applicant has separately provided arguments regarding Claims 3, 11, and 19.  (Applicant’s Response, pp. 11-13.)  Claim 3 is also amended:
3. The apparatus of claim 1, 
wherein initiating the natural language processing operation includes initiating transmission, via the one or more communication interfaces, of the second audio data to a remote natural language processing device, and
wherein the instructions further include instructions executable by the one or more processor devices 
to initiate a second voice communication session with a second party, via the one or more communication interfaces, based on a response from the natural language processing device to a spoken command detected in the second audio data and
to initiate transmission, via the one or more communication interfaces, of second audio data to a device of the second party during the second voice communication session based on a determination that the sound sensor is muted with respect to the voice communication session at the second time based on the state of the mute data setting indicating muted.

Support is found in Figure 3 and paragraph [0052] of the instant Application.
Paragraph [0052] states that the session with the Natural Language Processing Device 138 includes a command 202 to initiate the emergency call to the Emergency Comm. Device 302.  After the initiation of the Emergency Call (302), the two sessions: call to the Other Comm. Device (152) and the call to the Emergency Service (302) are held in parallel with neither session put on hold.  The mute setting 184 plays a role in which of 302 or 152 will be receiving voice. 
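
The role of the mute setting 184 in selecting which session receives voice can be sketched as follows. This is a minimal illustration only; the function name `destinations` and the string labels are hypothetical. The passage above does not state what the second session receives while the first session is unmuted, so the sketch assumes nothing is sent to it in that state.

```python
def destinations(muted_for_first_session: bool, second_session_active: bool) -> list:
    """Return the devices that receive the captured audio under the mute setting."""
    if not muted_for_first_session:
        # Unmuted: audio goes to the first party (Other Comm. Device 152).
        return ["other_comm_device_152"]
    if second_session_active:
        # Muted with respect to the first session: audio goes to the second
        # party (Emergency Comm. Device 302) while both sessions stay up.
        return ["emergency_comm_device_302"]
    # Muted and no second session: nothing is transmitted to a party.
    return []


during_emergency_call = destinations(muted_for_first_session=True, second_session_active=True)
```

Neither session is torn down or placed on hold in this sketch; only the destination of the audio changes with the setting.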
[Image omitted: media_image6.png]


[Image omitted: media_image7.png]


[Image omitted: media_image8.png]

Applicant also provides the following helpful explanatory material:

[Image omitted: media_image9.png]


[Image omitted: media_image10.png]


[Image omitted: media_image11.png]

(Applicant’s Response, pp. 12-13.)

Claim 3 begins a voice communication with another “party” which is interpreted as a second telephone conversation, such as that conducted with the 911 operator.
Yu sends the command to a NLP server for processing and determining the meaning of the spoken command and receives back the interpreted command at the VPA and carries out the command at the VPA.
Yu does not teach that the command dictates start of another telephone (voice) conversation session with another person.  The example of instant Application is the command to “call 911” and then establishing another call with the 911 operator.
Locker (Figure 3) teaches that the user (201) may be involved in a conference call with several other human participants (201 in conference call with 203 and 204).  Locker also teaches that the user may issue commands to the smart phone 205 which will not be sent to the other devices.  See Figure 1 for buffering the voice data to decide at the decision step 104 whether the voice should be maintained locally or transmitted externally. Locker, [0015]-[0016].  The examples that Locker shows for on-device command processing include Volume up/down, call <Name>, Calendar operations, and GPS operations.  See Figure 2, 209. Thus, it can be concluded that Locker teaches asking for another call to be made while in its command mode.  (Locker “[0025] … As a non-limiting example, a smart phone user engaged in a conversation may place the smart phone in mute operation such that the other caller may not hear the user's voice. However, the smart phone may still receive the user's voice for processing voice activated commands even though it is in mute operation and is not transmitting the user's voice externally.”)  
Locker additionally teaches that it can keep some conversations private from other ongoing phone calls:  “[0014] The operation of devices through voice commands is becoming more popular, especially for smart phones that have either small or no keyboards and for automobiles that require hands free operation of certain functions. However, a conflict may arise when a user needs to mute a device microphone due to background noise, feedback on a multi-party call, or to keep a side conversation private from others on a conference call. In addition, many devices according to current technology provide that a user manually switch off mute in order to use voice commands. This limitation appears to defeat the convenience and safety resulting from using a device in a `hands free` mode through voice commands. Furthermore, traditional muting may stop all voice operation of a device, while a user may want to maintain local operation but only mute the transmission of voice data. As such, a device that is able to treat voice data as locally active while outwardly muted would be highly desirable.”
Thus, as applied to Claim 3, Locker teaches that the user can mute his first call and command the smart phone to place a call to another person.  The muting is either manually done by the user pressing a button or touching the screen or by a spoken command that explicitly or impliedly requests the call to be muted.  Thus, in Locker, you would get two ongoing sets of phone calls (voice communications) one of which is muted in the interest of the other.  This teaches Claim 3.

Claim 11 is a computer program product system claim with limitations corresponding to the limitations of apparatus Claim 3.
Claim 19 is a method claim with limitations corresponding to the limitations of apparatus Claim 3.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 7 and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 7 includes “7. The apparatus of claim 1, wherein the second audio data includes a wake phrase.”
Claim 1, as amended, includes: “initiate a natural language processing operation on the second audio data based on detecting a wake phrase in the second audio data.”
Antecedent basis is not clear.  Is it intended to be: “7. The apparatus of claim 1, wherein the second audio data includes [[a]] the wake phrase?”

Additionally, and aside from the antecedent basis issue, Claim 1 already states that “a wake phrase” is detected in the “second audio data” and Claim 7, which depends from Claim 1, needs to be further limiting (112(d)).  Inclusion of Claim 7 indicates that the intent is to keep the language of Claim 1 broad and implies that the “wake phrase” may not be present in the “second audio data.”
Please clarify.

Claim 15 includes similar language and is rejected for similar reasons.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1-3, 5-6, 8-11, 13-14, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Yu (U.S. 2014/0297288) in view of Locker (U.S. 20120166184).
Regarding Claim 1, Yu teaches:
1. An apparatus comprising: 
a sound sensor; [Yu, Figure 1, “microphone 20.”]
one or more communication interfaces; [Yu, Figure 1, the “VPA smartphone 10” which is in communication with other phones and remote servers and therefore inherently includes communication interfaces.]
one or more processor devices; and [Yu, “[0047] The system or systems described herein may be implemented on any form of computer or computers and the components may be implemented as dedicated applications or in client-server architectures, including a web-based architecture, and can include functional programs, codes, and code segments. Any of the computers may comprise a processor, a memory for storing program data and executing it, a permanent storage such as a disk drive, a communications port for handling communications with external devices, and user interface devices, including a display, keyboard, mouse, etc….”  See also [0050].]
one or more memory devices storing instructions executable by the one or more processor devices to: [Yu, “[0047] … Any of the computers may comprise a processor, a memory for storing program data and executing it, a permanent storage such as a disk drive, a communications port for handling communications with external devices, and user interface devices, including a display, keyboard, mouse, etc….”  See also [0050].]
receive a mute data setting indicating a muted or unmuted state of  a voice communication session with a first party from a user; [Yu teaches that the device goes on mute with respect to sending the voice to the other side and one of the methods of invoking this mute is by keyword and the other method is by a “button” both of which would be received from a “user.”  Accordingly, Yu teaches this limitation.  Here the “mute data setting” is mapped to the button press of Yu:  “[0016] … When operating in a whisper mode, upon the trigger of voice commands (either a wake-up phrase or a button), the phone call client can suspend the transmission of voice or goes on mute. ….”  “[0024] During the call, the VPA 60 can be either in a manual mode in which the user expressly turns on the VPA 60 via a switch, button, or some other mechanical operation, or it can be in a set to a live mode (i.e., the VPA 60 is listening). … The termination of this mode can be done by an explicit cue, such as a button press or use of a particular phrase, or a pause on the part of the user.”  Yu, [0016]-[0017] and [0021]-[0024]. ]
generate first audio data based on sound detected by the sound sensor at a first time; [Yu, the “first audio data” of the Claim is the input voice intended for communication with the “other party phone 170.” “Phone speech” in Figure 1.]
initiate transmission, via the one or more communication interfaces, of the first audio data to another device during the voice communication session based on a determination that the sound sensor is unmuted with respect to the voice communication session at the first time based on the state of the mute data setting indicating unmuted; [Yu, Figure 1, this is when the user is using the phone for calling another person and no muting has been done.  “[0016] The smartphone 10 can pass along phone speech, local commands 70, remote commands 90, as well as prompts from the VPA 60, if it is not in whisper mode, through the telephone company network 150 to another party's phone 170 (generically, a second user equipment). …”  In normal phone operation, the “Voice Personal Assistant (VPA)” is unmuted with respect to user’s input voice intended for the party on the other end of the line.  “[0024] During the call, the VPA 60 can be either in a manual mode in which the user expressly turns on the VPA 60 via a switch, button, or some other mechanical operation, or it can be in a set to a live mode (i.e., the VPA 60 is listening). … The termination of this mode can be done by an explicit cue, such as a button press or use of a particular phrase, or a pause on the part of the user.”]
generate second audio data based on sound detected by the sound sensor at a second time; [Yu, the “second audio data” is the data from the voice received during the “whisper mode” which are not sent to the other user’s phone:  “5. … operating in a whisper mode in which local commands and remote commands are not communicated to the second user equipment.”]
refrain from initiating transmission of the second audio data to the other device during the voice communication session based on a determination that the sound sensor is muted with respect to the voice communication session at the second time based on the state of the mute data setting indicating muted; and [Yu, during the “whisper mode” the input commands are not sent to the other person’s phone.  The phone is muted with respect to communication to the other party on the phone call.  “[0016] … When operating in a whisper mode, upon the trigger of voice commands (either a wake-up phrase or a button), the phone call client can suspend the transmission of voice or goes on mute. …” ]
initiate a natural language processing operation on the second audio data based on detecting a wake phrase in the second audio data. [Yu, the input in Yu is in “natural language” and it may be subject to “natural language processing” either locally on the device or remotely by being sent to the “speech server 110.”  Figure 2 shows the “wake-up command” / “wake phrase” which signals the VPA to begin monitoring for remote commands.   “[0010] A natural language VPA is provided below that greatly enhances the use of voice commands on a telephone/smartphone device.”  “[0014] In an embodiment of the inventive phone 10, a VPA 60 is provided that can assist the user by dealing with various commands. These commands can be local commands 70 that are interpreted and handled by the VPA 60, or remote commands 80 that are passed on to a speech server 110, located in a cloud 100. In addition, the VPA 60 can interact with various applications 120, particularly once it has received an interpretation of speech received from the speech server 110 (or locally obtained and processed speech)”  “[0017] In one embodiment, the only local command 70 recognized interpreted and handled by the VPA 60 is a wake-up command. It should be noted that the wake-up command is a separate type of local command 70 (technically in a completely separate class), because the VPA 60 needs to listen for it all the time….”]

[Image omitted: media_image12.png]

[Image omitted: media_image13.png]



Based on the arguments of the Applicant, the “mute data setting” of the Claim and the “wake-up phrase” are considered to be different such that the “mute data setting” sets whether or not the voice of the user is muted for the party on the other side of a call and the “wake-up phrase” is used in combination with a following “command” and is issued as “wake-up phrase + command” to the smart phone / virtual personal assistant to invoke some function of the smart phone.
As provided in the arguments, the language is still broad.  But, to avoid further arguments in this respect a reference is added.

[Image omitted: media_image14.png]

Locker teaches:
1. An apparatus comprising: [Locker, Figure 2, cellular phone 201.]
a sound sensor; [Locker, Figure 2, “microphone 205.”]
one or more communication interfaces; [Locker, Figure 4, “a network interface 454 (for example, LAN).”]
one or more processor devices; and [Locker, Figure 4,  “one or more processors 422.”  ]
one or more memory devices storing instructions executable by the one or more processor devices to: [Locker, Figure 4, “[0029] …The architecture of the chipset 410 includes a core and memory control group 420 and an I/O controller hub 450 that exchanges information (for example, data, signals, commands, et cetera) via a direct management interface (DMI) 442 or a link controller 444. ….”]
receive a mute data setting indicating a muted or unmuted state of a voice communication session with a first party from a user; [Locker teaches manual mute control and voice mute control.  “[0014] …In addition, many devices according to current technology provide that a user manually switch off mute in order to use voice commands. …. As such, a device that is able to treat voice data as locally active while outwardly muted would be highly desirable.” “[0025] … As a non-limiting example, a smart phone user engaged in a conversation may place the smart phone in mute operation such that the other caller may not hear the user's voice. However, the smart phone may still receive the user's voice for processing voice activated commands even though it is in mute operation and is not transmitting the user's voice externally.”  “[0024] Embodiments provide that a device may determine whether to only maintain voice data locally based on whether the voice command mode has been enabled or disabled through a non-vocal method. Such non-vocal methods include, but are not limited to, button press, touchscreen gesture, face recognition, physical gesture with device, and physical gesture as detected by a camera….”]
generate first audio data based on sound detected by the sound sensor at a first time; [Locker, Figure 1 and Figure 2.  A user is speaking with two other users in a conference call.  His voice is received at the mic 205 and converted to audio data.  Figure 1 shows receiving of the voice and determining whether it should be transmitted externally or processed internally.  “[0016] Referring now to FIG. 1, therein is depicted an example embodiment. Voice data 101 is received by a voice data control system 102 and buffered 103. The voice data control system 102 analyzes the voice data 101 to determine whether the voice data 101 should be handled locally 105 or transmitted externally 106. …”]
initiate transmission, via the one or more communication interfaces, of the first audio data to another device during the voice communication session based on a determination that the sound sensor is unmuted with respect to the voice communication session at the first time based on the state of the mute data setting indicating unmuted; [Locker, Figure 2, the “cellular phone 201” is in conference call with two other phones.  “[0018] … As depicted in FIG. 2, the cellular phone 201 is engaged in a conference call 202 wherein it is communicating with two other cellular phones 203, 204….”  Figure 1, decision step 104 determines whether the input voice should be transmitted 106 or maintained locally and makes this determination based on whether the voice data corresponds with certain criteria at 104 which means if the voice data includes a mute command.]
generate second audio data based on sound detected by the sound sensor at a second time; [Locker, Figure 1, “voice data 101,” Figure 2, “open calendar 208.”]
refrain from initiating transmission of the second audio data to the other device during the voice communication session based on a determination that the sound sensor is muted with respect to the voice communication session at the second time based on the state of the mute data setting indicating muted; and [Locker, “9. The system according to claim 1, further comprising: a mute control configured to enable a mute operation mode; wherein responsive to the mute operation mode being enabled: the voice data is handled locally; and the voice data is not transmitted externally.”  “[0025] According to embodiments, voice command operation of a device is supported during mute operation. As such, certain embodiments provide for a `sound firewall` wherein a device microphone remains active, however, no sound is transferred beyond the device unless certain criteria are met. Embodiments provide that a user may activate mute operation on a device, such that the user's voice is not transmitted externally, but voice commands remain active. As a non-limiting example, a smart phone user engaged in a conversation may place the smart phone in mute operation such that the other caller may not hear the user's voice. However, the smart phone may still receive the user's voice for processing voice activated commands even though it is in mute operation and is not transmitting the user's voice externally.”]
initiate a natural language processing operation on the second audio data based on detecting a wake phrase in the second audio data. [Locker teaches that its system looks for certain discrete words / “wake phrase” or “multi-word phrase” / including the “wake phrase” in order to handle the data locally and conducts processing (Figure 1, 105) on it.  “[0021] Whether voice data is only processed locally at the device may be determined according to embodiments based on discrete words or pauses in a trained user's voice, including, but not limited to, detecting pauses in the speech that are contrary to normal conversational speech. As a non-limiting example, a GPS navigation device may have a "go to <location>" command for setting <location> as the user-specified location for the GPS navigation program.”  Locker also teaches that a combination of words (mapped to Wake phrase + command) indicates local processing of voice data as a command directed at some function of the smart phone such as GPS or Calendar or Volume.  “[0022] In addition, embodiments may determine whether voice data is local based on word-filtering criteria involving certain multi-word phrases or word pairings. As a non-limiting example, a GPS navigation device may not handle the word `navigate` in isolation as a voice command that must not be transmitted externally. However, the GPS navigation device may respond to voice commands that involve multiple word pairings such as `navigate on` or `navigate off` as voice commands. As such, the voice commands will only be processed locally by the device and will not be transmitted externally.”  In this example the “navigate” portion is the “wake phrase” and “on” or “off” are the “command” portions of the “second audio data” of the Claim.  “[0017] According to embodiments, voice-filtering criteria may include a list of predetermined or learned voice commands that are not transmitted and are only processed locally. A non-limiting example involves a cellular phone wherein a predetermined set of commands such as call, text, and volume commands activate certain functions and are only processed locally….”  Natural language processing is not express but is at the least suggested by the types of examples that require an understanding of the natural language input beyond keyword spotting:  “[0021] …On the other hand, if a user states that he would "like to go to <location> this week if I have the time," the device will discern that the phrase "go to <location>" was in normal conversation because it lacks distinguishing pauses. Thus, the device will not set the location of the GPS navigation program to <location> and will allow the voice data to be transmitted externally.”]

Yu and Locker both pertain to the use of a smart phone for the dual purpose of making phone calls and issuing spoken commands to the device directed at other functions such as Calendar, GPS, or Volume.  Both references teach a manual mute setting as one option.  It would have been obvious to combine the system of Locker, which teaches the use of “multi-word phrases or word pairings” as commands to be handled locally by the smart phone, with the system of Yu, with the option of using the press of a button for muting, to arrive at the system of the Claim where, when the mute control is set to muting, the system still would have to see the “navigate” / wake-phrase and “on” / command together in order to consider it a command directed at the smart phone internally and to process and execute it accordingly.  In Locker, the certain words that invoke a local function of the device ([0017]) can be mapped to the “wake phrase” of the Claim.  Additionally, the two embodiments within either Yu or Locker can also be combined within the one reference.  Two references were added for clarity and to showcase the capabilities of the prior art.  The combination provides added security that the command is issued only when the phone is otherwise muted and also that some function of the device is not inadvertently invoked.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
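
For illustration only, the combined behavior described in the rationale above may be sketched as follows.  This is a minimal sketch under stated assumptions and is not code from Yu, Locker, or the instant Application: the names `LOCAL_COMMANDS`, `Decision`, and `route_utterance` are hypothetical, the word pairings are taken from the `navigate on` / `navigate off` example of Locker [0022], and the choice to skip command detection while unmuted is an assumption made to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical word pairings, modeled on Locker [0022]:
# the first word plays the role of the wake phrase, the second the command.
LOCAL_COMMANDS = {("navigate", "on"), ("navigate", "off")}


@dataclass
class Decision:
    transmit: bool                            # send the audio to the other party?
    local_command: Optional[Tuple[str, str]]  # pairing to execute locally, if any


def route_utterance(text: str, muted: bool) -> Decision:
    """Decide what to do with one recognized utterance.

    Assumption: while unmuted, audio is simply transmitted and no local
    command is looked for. While muted, nothing is transmitted, and a local
    command is recognized only when wake word and command appear together.
    """
    if not muted:
        return Decision(transmit=True, local_command=None)

    words = text.lower().split()
    # Look at adjacent word pairs; a wake word in isolation does not qualify.
    for pair in zip(words, words[1:]):
        if pair in LOCAL_COMMANDS:
            return Decision(transmit=False, local_command=pair)
    return Decision(transmit=False, local_command=None)
```

In this sketch the mute setting alone governs transmission, while the word pairing alone governs whether a local command is executed, which mirrors the two separate mappings relied upon in the rejection.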

Regarding Claim 2, Yu teaches:
2. The apparatus of claim 1, wherein initiating the natural language processing operation includes initiating transmission, via the one or more communication interfaces, of the second audio data to a remote natural language processing device. [Yu, Figures 1 and 2.  The detection of the “wake-up command” by the VPA causes the following speech to be sent to the remote “speech server 110” for natural language processing and detection of other commands.  “[0020] FIG. 2 is a state diagram that shows the various states of the VPA 60 in an embodiment. Here, the VPA 60 starts out in a sleep state, or a "listen for wakeup phrase" state 200 (this naming also includes the equivalent of a "wait for button press" or other element for transitioning out of a sleep state for the VPA 60). Once the wakeup phrase is heard (or button pressed), the VPA 60 transitions 205 into an active state, or a "listen for command" state 210. In this state, the VPA 60 is actively listening for commands, and interpreting any local commands 70 that are provided, while streaming or sending any remote commands 80 to the speech server 110….”]
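
For illustration only, the state diagram of Yu, Figure 2, as described in [0020] and quoted in this action, may be sketched as a small transition table.  This is a minimal sketch under stated assumptions: the state values follow Yu's reference numerals 200, 210, and 220, while the event names and the helper functions `step` and `run` are hypothetical and do not appear in Yu.

```python
from enum import Enum


class State(Enum):
    LISTEN_FOR_WAKEUP = 200   # sleep state, "listen for wakeup phrase"
    LISTEN_FOR_COMMAND = 210  # active state, "listen for command"
    PROCESS_COMMAND = 220     # "process command"


# Event names are hypothetical labels for the triggers described in [0020].
TRANSITIONS = {
    (State.LISTEN_FOR_WAKEUP, "wakeup_phrase"): State.LISTEN_FOR_COMMAND,
    (State.LISTEN_FOR_WAKEUP, "button_press"): State.LISTEN_FOR_COMMAND,
    (State.LISTEN_FOR_COMMAND, "command_heard"): State.PROCESS_COMMAND,
    (State.PROCESS_COMMAND, "processing_complete"): State.LISTEN_FOR_COMMAND,
}


def step(state: State, event: str) -> State:
    """Return the next state; events with no listed transition are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.LISTEN_FOR_WAKEUP) -> State:
    """Apply a sequence of events starting from the sleep state."""
    for event in events:
        state = step(state, event)
    return state
```

The sketch shows only the transitions; whether a heard command is interpreted locally or streamed to the speech server 110 is outside its scope.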

Regarding Claim 3, Yu teaches:
3. The apparatus of claim 1, 
wherein initiating the natural language processing operation includes initiating transmission, via the one or more communication interfaces, of the second audio data to a remote natural language processing device, and [Yu, Figures 1 and 2.  The detection of the “wake-up command” by the VPA causes the speech following the “wake-up command” to be sent to the remote “speech server 110” for natural language processing and detection of other commands.  See [0020] and rejection of Claim 2.]
wherein the instructions further include instructions executable by the one or more processor devices to initiate a second voice communication session with a second party, via the one or more communication interfaces, based on a response from the natural language processing device to a spoken command detected in the second audio data and [Yu, Figure 1 shows transmission either to the “other party phone 170” which is the “first communication session” of Claim 1 or to the “speech server 110” which could have been the “second communication session” of this Claim if the second communication session were not a “voice communication session with a second party” which is interpreted as a phone call between the user and another person.  The transmission to the “speech server 110” is based on the detection of the “command” following the “wake-up phrase” / “wake phrase.”   “[0020] FIG. 2 is a state diagram that shows the various states of the VPA 60 in an embodiment. Here, the VPA 60 starts out in a sleep state, or a "listen for wakeup phrase" state 200 (this naming also includes the equivalent of a "wait for button press" or other element for transitioning out of a sleep state for the VPA 60). Once the wakeup phrase is heard (or button pressed), the VPA 60 transitions 205 into an active state, or a "listen for command" state 210. In this state, the VPA 60 is actively listening for commands, and interpreting any local commands 70 that are provided, while streaming or sending any remote commands 80 to the speech server 110. When a command is heard, the VPA 60 transitions 215 to a "process command" state 220. For a local command 70, the command is processed by a routine associated with the VPA 60. Once the processing of the command is complete, the VPA 60 transitions 225 into the "listen for command" state 210. 
For a remote command 80, the VPA 60 waits for the interpretation of the command 85 to come from the speech server 110, and the VPA 60 or routine associated with it executes based on the interpretation. …”]
to initiate transmission, via the one or more communication interfaces, of second audio data to a device of the second party during the second voice communication session based on a determination that the sound sensor is muted with respect to the voice communication session at the second time based on the state of the mute data setting indicating muted.

Yu sends the command to an NLP server for processing and determining the meaning of the spoken command, receives the interpreted command back at the VPA, and carries out the command at the VPA.
Yu does not teach that the command dictates the start of another telephone (voice) conversation session with another person.  The example in the instant Application is the command to “call 911.” 
Locker teaches:
 3. The apparatus of claim 1, 
wherein initiating the natural language processing operation includes initiating transmission, via the one or more communication interfaces, of the second audio data to a remote natural language processing device, and [Locker, Figure 1, “maintain voice data locally 105.”  Locker does not teach that it transmits the command data / second audio data to a remote NLP processor.]
wherein the instructions further include instructions executable by the one or more processor devices 
to initiate a second voice communication session with a second party, via the one or more communication interfaces, based on a response from the natural language processing device to a spoken command detected in the second audio data and [Locker, Figure 2, one of the types of local command is “call <name>” which is a command to the device to place a voice call to another party.]
to initiate transmission, via the one or more communication interfaces, of second audio data to a device of the second party during the second voice communication session based on a determination that the sound sensor is muted with respect to the voice communication session at the second time based on the state of the mute data setting indicating muted. [Locker, the result of the command of “call <name>” would be a call placed to <name> and all of this occurs during a period when the initially on-going conference call between 201, 203, and 204 is on mute.]
Yu and Locker pertain to the use of smart phones both for making voice calls and as a virtual personal assistant, and it would have been obvious to combine the teaching of Locker with respect to issuing a “call <name>” command to the VPA during another muted call with the teachings of Yu, which include commands to the VPA but do not specify making a second phone call as one of the commands, to improve the capabilities of the VPA.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 5, Yu teaches:
5. The apparatus of claim 3, wherein the instructions are executable by the one or more processor devices to conduct the voice communication session and the second voice communication session in parallel. [Yu, Figure 1 shows that both sessions can be happening in parallel.  Depending on the incoming voice, the audio will be sent to the other party’s phone or to the speech server.  See also:  “[0025] Actions that are performed can be based on an interpretation of the voice command. Activities that can be done using the VPA 60 while in the call can include contact searching, for example, obtaining information about someone mentioned in the call, or web searching, for example, to obtain information about a restaurant for which plans are being made. …”  The call is going on while the search is being done to accommodate the conversation.]
Yu does not teach that the second communication session is a voice communication session.
Locker teaches:
conduct the voice communication session and the second voice communication session in parallel. [Locker in Figure 2 shows a conference call that is ongoing when the user causes a mute operation either manually or by issuing a known local command 209 to the smart phone.  The commands 209 include a “call <name>” command, which means that a call to <name> can be made and would be ongoing while the previously occurring conference call is on mute.  Note that if the calls are joined, this simply generates a conference call; if the previous calls are muted, then the limitation is taught by Locker.]
Rationale for combination as provided for Claim 3.

Regarding Claim 6, Yu teaches:
6. The apparatus of claim 1, wherein initiating the natural language processing operation includes storing, in the one or more memory devices, a note associated with the voice communication session based on a spoken command detected in the second audio data. [Yu teaches adding items to a to-do list which teaches “a note associated with the voice communications session based on the spoken command”:  “[0025] …  In a further example, actions related to later recall/remembering may be implemented. For example, a to-do list can be activated, and items being discussed in the call can be added--or in a variation, an action item can be added to a list. The user could instruct the VPA 60 to record the last x seconds of a call that contains information that might be useful to access later.”]

Regarding Claim 8, Yu teaches:
8. The apparatus of claim 1, further comprising 
an audio output device, [Yu, Figure 1, “speaker 15.”]
wherein the instructions are further executable by the one or more processor devices to initiate output, via the audio output device, of sound based on communication data received, via the one or more communication interfaces, from the other device during the voice communication session. [Yu, Figure 1, the output of returning speech from the “other party phone 170” or from the servers in “cloud 100” is output from the “speaker 15” at 9.  “[0013] FIG. 1 illustrates an embodiment of the VPA smartphone 10 (generically, a first user equipment). As with any telephone, a user 5 can input a voice audio signal 7 into a microphone 20, and receive an audio signal back 9 from a speaker 15. The smartphone comprises a touch screen 30, a mobile operating system 40, and a phone call client 50 that serves to connect the user to another party's phone 170 over the telephone company network 150.”]

Claim 9 is a computer program product system claim with limitations corresponding to the limitations of apparatus Claim 1 and is rejected under similar rationale.
Additionally, Yu teaches “19. A non-transitory computer program product, comprising a computer usable medium having a computer readable program code embodied therein, said computer readable program code adapted to be executed to implement the method according to claim 1.”  See [0047].

Claim 10 is a computer program product system claim with limitations corresponding to the limitations of apparatus Claim 2 and is rejected under similar rationale.
Claim 11 is a computer program product system claim with limitations corresponding to the limitations of apparatus Claim 3 and is rejected under similar rationale.

Claim 13 is a computer program product system claim with limitations corresponding to the limitations of apparatus Claim 5 and is rejected under similar rationale.
Claim 14 is a computer program product system claim with limitations corresponding to the limitations of apparatus Claim 6 and is rejected under similar rationale.

Claim 16 is a computer program product system claim with limitations corresponding to the limitations of apparatus Claim 8 and is rejected under similar rationale.
Claim 17 is a method claim with limitations corresponding to the limitations of apparatus Claim 1 and is rejected under similar rationale.
Claim 18 is a method claim with limitations corresponding to the limitations of apparatus Claim 2 and is rejected under similar rationale.
Claim 19 is a method claim with limitations corresponding to the limitations of apparatus Claim 3 and is rejected under similar rationale.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Yu and Locker in view of Bundalo (U.S. 20190005953).
Regarding Claim 7, Yu teaches:
7. The apparatus of claim 1, wherein the second audio data includes a wake phrase. [Yu teaches that its “local command” which is “interpreted and executed by the VPA” includes “a wake-up command that the VPA always monitors for … that instructs the VPA to enter into an active mode” and after this wake-up command normal local commands can be issued by the user and executed by the device.  See claims 1, 2, and 3 of Yu.  “[0017] In one embodiment, the only local command 70 recognized interpreted and handled by the VPA 60 is a wake-up command. It should be noted that the wake-up command is a separate type of local command 70 (technically in a completely separate class), because the VPA 60 needs to listen for it all the time. This is typically done using special hardware. In contrast, a normal local command (e.g., simple voice commands on Android) do not require the VPA to be "always listening." Once recognition is triggered either by a wake-up phrase or a button, normal local commands 70 can be handled by software, which instructs the VPA 60 to begin listening for commands so that any other commands, which are remote commands 80, are streamed or sent….”  “[0020] FIG. 2 is a state diagram that shows the various states of the VPA 60 in an embodiment. Here, the VPA 60 starts out in a sleep state, or a "listen for wakeup phrase" state 200 (this naming also includes the equivalent of a "wait for button press" or other element for transitioning out of a sleep state for the VPA 60). Once the wakeup phrase is heard (or button pressed), the VPA 60 transitions 205 into an active state, or a "listen for command" state 210….”]
Thus, Yu teaches that a “wakeup phrase” can be used to transition the VPA to an active state.  However, in Yu, the same “wakeup phrase” is also used to mute the call with the party on the other side.  Therefore, as argued by the Applicant, the “wakeup phrase” of Claim 1 was interpreted to be input after the muting had been done by some other method.
To address the scenario of Claim 1, the muting of Claim 1 was mapped to muting by a button press, which is taught by Yu as an alternative to the “wakeup phrase,” and the “wakeup phrase” was mapped to Locker’s wakeup phrases.
To buttress the mapping and for clarity and to avoid arguments over a point that is considered minor in this situation, a third reference is added that expressly teaches that it is well-known to include a wakeup/attention/trigger phrase to alert a virtual personal assistant that it is being addressed (such as Hey Siri, OK Google, Alexa) followed by a further command.
Bundalo teaches:
wherein the second audio data includes a wake phrase. [Bundalo is directed to a two-tier detection of a wakeword and includes the well-known use of wake word plus command which uses a wakeword to get the attention of the device followed by an actual command to be executed and also includes options where the wakeword is not necessary:  “[0017] As used herein, the term "wakeword" may correspond to a "keyword" or "key phrase," an "activation word" or "activation words," or a "trigger," "trigger word," or "trigger expression." One exemplary wakeword may be a name, such as the name, "Alexa," however any word (e.g., "Amazon"), or series of words (e.g., "Wake Up" or "Hello, Alexa") may alternatively be used as the wakeword. Furthermore, the wakeword may be set or programmed by an individual operating a voice activated electronic device, and in some embodiments more than one wakeword (e.g., two or more different wakewords) may be available to activate a voice activated electronic device. In yet another embodiment, the trigger that is used to activate a voice activated device may be any series of temporally related sounds.”  “[0018] As used herein, the term "utterance" may correspond to a spoken word, statement, or sound. In some embodiments, an utterance may include the wakeword followed by an invocation, such as a request, question, or command. In this particular instance, the utterance may begin with the wakeword being spoken, and may end when a last word, phoneme, or sound is spoken. For example, an utterance may correspond to the question, "Alexa--What is the weather currently like?" As another example, an utterance may be, "Alexa--Play my workout music," or "Alexa--Buy that." Further still, an utterance, which need not include the wakeword, may be, "Turn up the volume" or "Call mom."”]

Yu, Locker, and Bundalo pertain to processing spoken commands directed at devices including PDAs, and it would have been obvious, for completeness, to add the express teaching of wakeword plus command from Bundalo to the system of Yu/Locker, which teaches the same concept but not in such black and white terminology.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 15 is a computer program product system claim with limitations corresponding to the limitations of apparatus Claim 7 and is rejected under similar rationale.

Claims 4, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yu and Locker in view of Busca (U.S. 2015/0331666).
Busca is directed to “System and Method for Processing Control Commands in a Voice Interactive System.”  Title.
In Busca, the spoken commands can be transmitted to a remote server 120 where they are remotely interpreted 190 and sent back for execution at the location of the Voice Interactive Personal Assistant (VIPA) local device 102.  One of the mentioned commands is the command for voice activated phone calling including an emergency call.  “[0059] Other interactive applications 124/services include voice interactive emergency services, voice activated phone calling, voice controlled intercom, voice control for automation, voice controlled email, voice controlled banking transactions, voice controlled audio news feeds, voice controlled internet search, voice controlled ordering and reading of electronic books (eBooks), audio targeted advertising, and/or audio voice reminders.”  “[0061] For voice controlled phone applications 124, the caller list can be synchronized and updated with a phone contact list on a typical computing device or mobile phone….”
“[0083] For the remote processing branch, in step 624, the local device 102 include speech phrase in a message and send message in a control stream 180 to VIPA interface 201 on remote server 120 to process the speech phrase, where the remote server 120 determines if the speech phrase matches a command in a set of remotely interpreted commands 190, and executes interactive applications 124 for remotely interpreted commands 190 matching the speech phrases in the messages.”




Regarding Claim 4, Yu teaches that the “second communication session” can be for “web searching, social network updating, calendar scheduling, and a media server.”  Yu does not teach the commands to be associated with an emergency service.  However, the teachings of Yu, e.g. in [0025], strongly suggest that anything that comes up during the conversation with the other party is fair game for the search function of the “second communication session.”
Locker does not teach that the additional calls that the Caller may make are to 911, although there are no limitations on the other calls that the User/Caller may make.
Busca teaches:
4. The apparatus of claim 3, wherein the second voice communication session is associated with an emergency service. [Busca is directed to voice commands, some of which are interpreted locally and some remotely, and includes a command associated with an emergency service as one example: “[0059] Other interactive applications 124/services include voice interactive emergency services, voice activated phone calling, voice controlled intercom, voice control for automation, voice controlled email, voice controlled banking transactions, voice controlled audio news feeds, voice controlled internet search, voice controlled ordering and reading of electronic books (eBooks), audio targeted advertising, and/or audio voice reminders.”]
Yu, Locker, and Busca pertain to processing spoken commands directed at devices including PDAs, and it would have been obvious to add the command for the emergency service application from Busca to the similar list of commands from Yu/Locker for a more complete set of commands, considering that the context of the conversation in Yu/Locker may necessitate a command to invoke emergency services, just as it may make the search of a contact name desirable.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 12 is a computer program product system claim with limitations corresponding to the limitations of apparatus Claim 4 and is rejected under similar rationale.
Claim 20 is a method claim with limitations corresponding to the limitations of apparatus Claim 4 and is rejected under similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Locker (U.S. 2012/0166184) also mutes the outgoing data but permits local receiving of voice data and executing commands in response to it:  “[0014] … Furthermore, traditional muting may stop all voice operation of a device, while a user may want to maintain local operation but only mute the transmission of voice data. As such, a device that is able to treat voice data as locally active while outwardly muted would be highly desirable.”  “[0015] Embodiments provide for voice command devices that receive sound but do not transfer the voice data beyond the system unless certain voice-filtering criteria have been met. In addition, embodiments provide devices that support voice command operation while external voice data transmission is in mute operation mode….”  “[0025] … As a non-limiting example, a smart phone user engaged in a conversation may place the smart phone in mute operation such that the other caller may not hear the user's voice. However, the smart phone may still receive the user's voice for processing voice activated commands even though it is in mute operation and is not transmitting the user's voice externally.”  “[0028] … In addition, the computer system and circuitry may also be utilized in other devices, including, but not limited to, a smart phone, Personal Digital Assistant (PDA), or a computing system embedded in an automobile.”

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499.  The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached on 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659