United States Patent and Trademark Office

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 15/226,054
Filing Date: 2 Aug 2016
Appellant(s): Sung et al.



__________________
Nicholas W. Stephens
For Appellant


EXAMINER’S ANSWER





This is in response to the appeal brief filed 03/10/2020.

(1) Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated 10/10/2019 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”

The following ground(s) of rejection are applicable to the appealed claims.

Claims 1, 2, 4, 11-14, 24-26, 28-31, and 33-36 are rejected under 35 U.S.C. 103 as being unpatentable over Bang et al. (hereinafter Bang), US 2016/0034253 A1, published on February 4, 2016, in view of Sumner et al. (hereinafter Sumner), US 2016/0260433 A1, published on September 8, 2016 (filed on August 28, 2015).

With respect to independent Claim 1, Bang teaches a computer-implemented method, comprising: 
- receiving, by a computing device, (i) an indication of a user’s interaction with an application on the computing device and (ii) an utterance that corresponds to the user’s interaction with the application (see ¶ 0097, showing execution of a voice recognition widget (i) and an inputted voice command (ii)).
- transmitting, from the computing device, (i) context data that describes information about the user’s interaction with the application and (ii) audio data for at least a portion of the utterance that corresponds to the user’s interaction with the 
- obtaining, by the computing device, a command for the application to perform an action indicated by the utterance, wherein the command was generated based on (i) performing automated speech recognition on the audio data for at least a portion of the utterance that corresponds to the user’s interaction with the application, and (ii) performing natural language processing on a textual output of the automated speech recognition, including (a) using the context data that describes information about the user’s interaction with the application … and (b) selecting the action … (see ¶¶ 0118-0122, showing that a control event is determined based on the voice command data and information about the current page, and that the control event is transmitted to the computing device; see ¶¶ 0339-0340, showing that result data is generated by analyzing the voice data using natural language processing, wherein the voice data can first be converted into text data and then analyzed with natural language processing).
- performing, by the application on the computing device, and in accordance with the command, the particular action indicated by the textual output (see Figs. 22A, 22C-E, ¶¶ 0123, 0340, 0346).
- generating, by the computing device, data that indicates an output of the particular action performed by the application (see Figs. 22A-E, ¶¶ 0338-50). 


With respect to “… identify the application to which the utterance is directed and (b) selecting the action from a plurality of pre-defined actions for the identified application based on matching at least a portion of the textual output of the automated speech recognition to a pre-defined text string for the action,” Bang suggests application-specific commands (see Figs. 22A, 22C-E, ¶¶ 0347-50, showing commands specific to OO Talk) that are performed in response to the user’s voice commands, and a skilled artisan would understand that a command corresponding to a different application would invoke a different set of commands specific to that application (see, for example, Fig. 27A, ¶ 0419; in that example, the subway application (and the commands corresponding to that application) is used rather than the chat application).  In the examples illustrated in Figs. 22A and 27A, the search command is used both times, but in different applications, based on the context data and the user’s utterance.  However, the teachings of Sumner can be relied upon for an explicit showing of this functionality.
Sumner is directed towards structured dictation using intelligent automated assistants (see Sumner, ¶ 0002).  Sumner teaches receiving speech input in the course of, or as a part of, an interaction with a digital assistant (see Sumner, ¶ 0239).  Sumner states that the speech input can be associated with an application, and further teaches that the speech input can include one or more ambiguous terms (i.e., terms that can be interpreted differently by different applications) (see Sumner, ¶¶ 0240, 0242).  Sumner 
Accordingly, a skilled artisan would understand that the natural language processing of input in Bang could have been modified at the time the instant application was filed to explicitly include a determination of an application to which the speech input is directed, as suggested by Sumner, in order to identify a particular domain(s) so that the natural language processing of the speech input is performed more quickly, efficiently, and accurately (see Sumner, ¶¶ 0256-57).


With respect to dependent Claim 2, Bang teaches the method wherein the utterance is spoken by the user into a microphone of the computing device or an additional computing device (¶ 0091; see also ¶ 0516).

With respect to dependent Claim 4, Bang teaches the method wherein: the application that the user interacts with is a foreground application running on the 

With respect to dependent Claim 11, Bang teaches the method wherein: the command comprises one or more data fields, the data fields identifying at least one of a command type, a command sub-type, or a characteristic of an element of digital content associated with the command (see Figs. 22C-E (elements 2204, 2206, 2207)).

With respect to dependent Claim 12, Bang teaches the method wherein: the computing device further comprises a microphone and a speaker; and the method further comprises: presenting, through the speaker, audible content associated with at least one of the utterance or the application; and in response to the presented audible content, receiving an additional utterance spoken by the user into the microphone (see ¶¶ 0091 (microphone), 0661 (speaker), 0389-90 (providing guide information and receiving additional input); although Bang illustrates providing the guide information visually, a skilled artisan would understand that such guide information could be outputted in various, well-known ways, including through speakers (i.e., text to speech), particularly if the computing device’s display was small or not present). 

With respect to Claims 13 and 14, these claims reflect a computing device comprising steps and/or features recited in Claims 1 and 2, respectively, and are thus rejected along the same rationale as those claims, above.

With respect to dependent Claim 24, Bang teaches the method wherein the context data that describes information about the user’s interaction with the application comprises data that identifies the application (see ¶ 0279; see also Sumner ¶¶ 0245, 0256).

With respect to dependent Claim 25, Bang teaches the method wherein the context data that describes information about the user’s interaction with the application comprises data that identifies a version or a particular release of the application (see ¶ 0279).

With respect to dependent Claim 26, Bang teaches the method wherein the context data that describes information about the user’s interaction with the application comprises data that characterizes content presented in a current view of the application (see ¶¶ 0278, 0280).

With respect to dependent Claim 28, Bang teaches the method wherein a format of the command is selected from among a plurality of action-specific formats that correspond to different ones of the plurality of pre-defined actions for the identified application (see ¶¶ 0347-50; see also Sumner, ¶ 0232).

With respect to Claims 29-31 and 33, these claims reflect the computing device comprising steps and/or features recited in Claims 24-26 and 28, respectively, and are thus rejected along the same rationale as those claims, above.


With respect to independent Claim 34, this claim reflects one or more non-transitory storage devices comprising steps and/or features recited in independent Claim 1 and is thus rejected along the same rationale as Claim 1, above.

With respect to dependent Claim 35, Bang in view of Sumner teaches the method of claim 1, as discussed above, and further teaches wherein a natural language processing system that performs the natural language processing on the textual output of the automated speech recognition is configured to generate commands for a plurality of applications, and the natural language processing system is further configured to select different application-specific actions for a same speech recognition result based on which of the plurality of applications a user interacted with to prompt a command generation process (see Bang, Fig. 27A, ¶ 0419, showing an example where the subway application (and the commands corresponding to that application) is used rather than the chat application; in the examples illustrated in Figs. 22A and 27A, the search command is used both times, but in different applications, based on the context data and the user’s utterance; 

With respect to dependent Claim 36, Bang in view of Sumner teaches the method of claim 1, as discussed above, and further teaches selecting the action from the plurality of pre-defined actions for the identified application includes mapping the textual output of the automated speech recognition to an event for the identified application; and generating the command comprises extracting data inputs for the event from the textual output of the automated speech recognition (see Sumner, ¶ 0256, showing using look-up tables to identify a domain(s) in order to determine an appropriate command/action; also note that “mapping” is not explicitly mentioned in the instant Specification).


Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Bang in view of Sumner, and further in view of Disano et al. (hereinafter Disano), US 2016/0077793 A1, published on March 17, 2016.

	With respect to dependent Claim 5 (and similarly, dependent Claim 16), while Bang in view of Sumner teaches the method of Claim 1, and “presenting, through the corresponding interface of the application, an interface element representing a microphone; receiving input from the user indicative of a selection of the presented interface element; performing operations that activate the microphone in response to 
Disano is directed towards gesture shortcuts for invocation of voice input (see Disano, Title, Abstract).  Disano recognizes a need to provide on-demand controls for voice input associated with desired events without requiring always-on listening mode (see Disano, ¶ 0001).  Disano teaches an “invocation” gesture which activates the microphone to receive a user command (see Disano, ¶¶ 0039-40).  Disano teaches various ways of initiating a voice-to-text session (such as push-to-talk scenario or time-out following a gesture, etc.) and further teaches a visual indicator corresponding to the activation of a dictation session (see Disano, Figs. 7A-C, ¶ 0051; see also Figs. 3A-D and 4, ¶ 0048, showing a floating microphone icon which can be a part of the invocation gesture and which visually indicates activation of a dictation session).
Accordingly, a skilled artisan would understand that the voice commands of Bang and Sumner could have been modified at the time the instant application was filed to explicitly include an “invocation” gesture as suggested by Disano in order to allow the user to select when the system is listening for commands rather than remaining in an always-on listening mode.


(2) Response to Argument

Beginning on page 4 of Appellant’s Brief (hereinafter Brief), Appellant argues specific issues, which are addressed below.  Appellant has chosen the groupings, and this response follows the same order.  Appellant has also chosen claim 1 as representative, such that claims 2, 4-5, 11-14, 16, 24-26, 28-31, and 33-36 stand or fall with claim 1, as no additional arguments are presented for the features of those claims.  Therefore, the examiner addresses the argued claims and assumes Appellant has nothing further to present on the claims not argued.

Appellant’s argument that the combination of cited references fails to teach a selection of an “action from a plurality of pre-defined actions for the identified application based on matching at least a portion of the textual output of the automated speech recognition to a pre-defined text string for the action” as recited in independent Claim 1
Appellant argues with respect to the primary reference, Bang, that “the Office Action’s characterization of Bang does not address the specific language recited by claim 1 in which the ‘action’ is selected ‘from a plurality of pre-defined actions for the identified application based on matching at least a portion of the textual output of the automated speech recognition to a pre-defined text string for the action’” (see Brief, pg. 
The examiner respectfully disagrees.
It is noted that Bang illustrates application-specific commands (such as “send” and “search,” corresponding to the chat application and the subway application, respectively) (see Bang, Figs. 22A (element 2200), 27A (element 2700); see also ¶ 0340, describing distinguishing between functions and variables in the command data; ¶ 0350, showing performance of the action corresponding to the “send” function based on the provided variables).  While Bang clearly teaches a selection of an action based on the generated command (see ¶ 0350, showing selection of “send” based on the processed voice data), Bang does not appear to explicitly discuss “selection … based on matching … to a pre-defined text string for the action,” as recited in claim 1.  However, a skilled artisan would understand that each application can perform multiple actions and that the voice data is processed in Bang in order to determine which action to perform (see Bang, ¶¶ 0340, 0350; see also Fig. 8A, illustrating different actions, such as “1:1 Chat” and “VoiceTalk,” available in an application; see also ¶ 0128, discussing predetermined functions that correspond to at least one of the functions provided by an installed application, such as sending a message, searching, making a call, sending a picture, playing content, etc.).  It follows that a skilled artisan would understand that an utterance, such as “send Kim Myungjun Message ‘Where are you’” (see Fig. 22A 
It is further noted that Sumner was relied upon to illustrate “identif[ication] of the application to which the utterance is directed” and the limiting of actions to those appropriate for the identified application.  The rejection of claim 1 was based on a combination of Bang and Sumner; however, Appellant appears to argue against the references individually.  One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references (see In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986)).  Sumner suggests using a look-up table with the metadata associated with the speech input to identify one or more domains corresponding to the identified application (see Sumner, ¶¶ 0244-45, 0256, further showing that the metadata can be generated based on the application corresponding to the speech input).
Contrary to Appellant’s conclusion in the Brief (see pgs. 6-9), Sumner clearly suggests that each domain can correspond to an action (i.e., an actionable intent) to be performed, such as set reminder, make a reservation, send a message, initiate a phone call, etc. (see Sumner, Figs. 7C, 8A-B, ¶¶ 0219-20; see also ¶ 0222, discussing a “super domain” which includes a plurality of actionable intents, similar to actions determined to be available in an application, as discussed by Bang).  Sumner further 


(3) Conclusion

For the above reasons, it is believed that the rejections should be sustained.
Respectfully submitted,

/DINO KUJUNDZIC/Primary Examiner, Art Unit 2179
Conferees:
/RENEE D CHAVEZ/Supervisory Patent Examiner, Art Unit 2179
/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2171


Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.