United States Patent and Trademark Office    
        
            
                                
            
        
    

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 16/519736
Filing Date: 03/01/2021
Appellant(s): Joseph Kessler, Suresh Bellam, Andre Coetzee, Dan Verdeyen.



__________________
Noah Tilton For Appellant


EXAMINER’S ANSWER





This is in response to the appeal brief filed 03/01/2021 appealing from the Office action mailed 08/25/2020.
(1) Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated 08/25/2020 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”
The following ground(s) of rejection are applicable to the appealed claims.


Claim Rejections - 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 7-12, 16-18, and 20 are rejected under AIA 35 U.S.C. § 103 as being unpatentable over Yu et al. (US 2013/0246050 A1, hereinafter Yu) in view of WILPON et al. (US 2009/0187410 A1, hereinafter WILPON) in view of Ashoori et al. (US 2018/0366144 A1, hereinafter Ashoori).

As to independent claims 1, 9, and 16, Yu teaches a voice control hub (paragraph [0051], FIG. 2 is a basic block diagram illustrating an exemplary user equipment (UE) 10 device; paragraph [0053], the voice controlled UE 10 comprises a microphone 30 via which voice commands are input) computing system for performing a task within a software application (paragraph [0054], FIG. 2A shows an alternative embodiment to that shown in FIG. 2 in which a custom speech/NLP engine 152, 154 is included in the application 110 and deployed as part of the application 110 on the UE 10; application 110 is the business software application used by an enterprise, and the task is the custom speech based actions), comprising 
one or more processors, and a memory containing instructions that, when executed (Fig. 2, processor 40, memory 100 of UE 10), cause the voice control hub computing system to: 
receive a handler registration request specifying an object handler to respond to voice commands (paragraph [0017], a registration mechanism that permits an association to be formed between an action-context pair and a handler in the voice controlled application; paragraph [0073], FIGS. 9 and 9A together show a flow chart for one embodiment of voice command dispatching.  It collects speech input from the user S905, send the speech to the backend engine S910, receives an answer from the backend engine S915, and tries to assemble a context according to the user's requirement before triggering the corresponding handler; wherein the speech input is voice command),
receive an utterance of a user of the software application (paragraph [0071], the user might say, "Give me directions from 100 E Main St.  to 152 W. Elm St."; paragraph [0054], FIG. 2A shows an alternative embodiment to that shown in FIG. 2 in which a custom speech/NLP engine 152, 154 is included in the application 110 and deployed as part of the application 110 on the UE 10; application 110 is the business software application used by an enterprise, and the custom speech engine receives the user’s utterance), 
transmit the utterance of the user to a remote cloud services layer (paragraph [0051], an exemplary user equipment (UE) 10 device and an exemplary backend engine 150 on the Cloud according to an embodiment of the invention; paragraph [0071], The voice control backend engine 150, including the NLP routines 154 and speech engine 152, interpret the speech and produce a command; the backend engine is provided by the cloud service), 
analyze to generate an intent and an entity corresponding to a task (paragraph [0071], The voice control backend engine 150, including the NLP routines 154 and speech engine 152, interpret the speech and produce a command or a command list (if multiple commands are present) 65 comprising an action (Action X="Give me directions" command); the action is the task), 
receive the intent and the entity from the remote cloud services layer, wherein the intent is associated with the entity (paragraph [0071], a command or a command list (if multiple commands are present) 65 comprising an action (Action X="Give me directions" command) along with the context; the action includes the entity and the intent, wherein “me” is the entity, and the “direction” is the intent; paragraph [0088], The custom speech engine S640 can be provided either on the Cloud or shipped to the UE 10 together with the application), and 
(paragraph [0071], Based on the registered actions for the application, the action is located in the registration table 250 and control is then passed to the appropriate handler 112).
Yu does not teach:
an enterprise business software application; 
convert the utterance of the user to a text string representing speech-to-text output using a custom speech model,
analyze the text string using one or more trained machine learning models to generate an intent corresponding to the task.
WILPON teaches:
convert the utterance of the user to a text string representing speech-to-text output using a custom speech model (paragraph [0043], A wireline component 244 communicates with an automatic speech recognition server that includes profiles, models and grammars 236 for converting audio into text.  This server represents a public, common network node.  The profiles, models and grammars may be custom tailored for a particular user as would be known in the art.  For example, the profiles, models and grammars may be trained for a particular user and periodically updated and improved),
analyze the text string using one or more models to generate an intent corresponding to the task (paragraph [0005], The text can be transmitted to a spoken language understanding (SLU) module which will seek to identify the intent or the purpose of the words spoken by the user).
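It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate these teachings of WILPON into Yu to improve on the ability of individuals and companies to create voice enabled services over a network (WILPON, paragraph [0005]).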
Yu/WILPON does not teach:
an enterprise business software application;
analyze the text string using one or more trained machine learning models.
Ashoori teaches:
an enterprise business software application (paragraph [0029], server 150 may transcribe the audio data 124 to text using known transcription technology 204. One (non-limiting) example of such transcription technology is the IBM Watson Speech To Text Service(TM)), 
analyze the text string using one or more trained machine learning models (paragraph [0030], server 150 may analyze 206 the transcribed text of the patient's speech to determine one or more attributes of the patient's speech; paragraph [0026], server 150 may be configured to analyze both structured and unstructured data by applying advanced natural language processing, and machine learning technologies; paragraph [0032], the machine learning module 208 may be trained using supervised learning where the training inputs and testing feature vectors may have the same elements).
Since Yu/WILPON teaches a method of voice control handler generating action based on an utterance of a user, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate an enterprise business software application, and to analyze the text string using one or more trained machine learning models, as taught by Ashoori, as the prior arts are in the same application field of user voice based speech recognition and system response, and Ashoori further teaches analyzing text by machine learning. Incorporating Ashoori into Yu/WILPON would improve the integrity of Yu/WILPON’s system by allowing the system to analyze the past pattern of a patient's mood swings to predict future behaviors (Ashoori, paragraph [0032]).
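For illustration only, the sequence of limitations mapped above for claim 1 (handler registration, receipt of an utterance, transmission to a remote cloud services layer, return of an intent and an entity, and passing of control to the registered handler) can be sketched as follows. This is a minimal sketch under stated assumptions: every name in it (VoiceControlHub, register_handler, handle_utterance, fake_cloud_layer) is hypothetical and appears in neither the claims nor Yu, WILPON, or Ashoori, and the cloud layer is a local stub rather than a real speech-to-text or machine learning service.

```python
from typing import Callable, Dict, Tuple

# All names below are hypothetical illustrations; none are taken from the
# claims or from the cited references.
Handler = Callable[[str, str], str]
CloudLayer = Callable[[bytes], Tuple[str, str]]


class VoiceControlHub:
    """Registers handlers and routes (intent, entity) pairs to them."""

    def __init__(self, cloud_layer: CloudLayer) -> None:
        # cloud_layer stands in for the remote cloud services layer.
        self._cloud_layer = cloud_layer
        self._handlers: Dict[str, Handler] = {}

    def register_handler(self, intent: str, handler: Handler) -> None:
        # Handler registration request: associate an intent with a handler.
        self._handlers[intent] = handler

    def handle_utterance(self, audio: bytes) -> str:
        # Transmit the utterance and receive the intent and entity back.
        intent, entity = self._cloud_layer(audio)
        handler = self._handlers.get(intent)
        if handler is None:
            raise KeyError(f"no handler registered for intent {intent!r}")
        # Pass control to the registered handler.
        return handler(intent, entity)


def fake_cloud_layer(audio: bytes) -> Tuple[str, str]:
    # Stub only: a real layer would run speech-to-text and a trained model.
    text = audio.decode("utf-8")
    prefix = "give me directions"
    if text.lower().startswith(prefix):
        return "direction", text[len(prefix):].strip()
    return "unknown", text
```

In this sketch the handler table plays a role comparable to the registration table 250 described in Yu, paragraph [0071], although the data structure chosen here is an assumption.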


As to dependent claims 2, 10, and 17, the rejection of claim 1 is incorporated. Yu teaches the voice control hub computing system of claim 1, including further instructions that, when executed, cause the voice control hub to: receive the text string representing speech-to-text output from the remote cloud services layer (paragraph [0051], an exemplary user equipment (UE) 10 device and an exemplary backend engine 150 on the Cloud according to an embodiment of the invention; paragraph [0012], Solutions on speech-to-text APIs (e.g., Dragon Mobile, MaCaption) help developers to translate speech input into text.  Developers use an API to feed speech input to the speech engine behind the solution, which in turn returns a piece of text (e.g., "How to I get home?")).


As to dependent claims 3, 11, and 18, the rejection of claim 1 is incorporated. Yu teaches the voice control hub computing system of claim 1, including further instructions that, when executed, cause the voice control hub to: receive a channel subscription specifying a channel and one or both of (i) the intent type, and (ii) the entity type (paragraph [0033], In a preferred embodiment of the framework, developers are given a specification and/or library categorizing common natural language sentences (e.g., "drive to", "how do I get to", "find direction to", . . . ) into actions (commands in an abstract format; e.g., the above sentences may all belong to a single action called "direction"); the categorized voice input is the “direction” channel with the “direction” as the intent type), and 
based on the channel subscription, dispatch the intent and the entity to the channel (paragraph [0033], developers may use an API to connect actions with code for handling them (e.g., code for getting direction)).
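For illustration only, the channel subscription limitation of claims 3, 11, and 18 (a subscription naming a channel and one or both of an intent type and an entity type, followed by dispatch to the subscribed channel) can be sketched as follows. This is a minimal sketch: ChannelRouter, subscribe, and dispatch are hypothetical names that appear in neither the claims nor Yu, and the matching rule (an omitted field matches anything) is an assumption.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

# Hypothetical illustration; names and matching rules are assumptions.
Message = Tuple[str, str, str]  # (intent_type, entity_type, entity_value)


class ChannelRouter:
    def __init__(self) -> None:
        self._subs: List[Tuple[str, Optional[str], Optional[str]]] = []
        self.channels: DefaultDict[str, List[Message]] = defaultdict(list)

    def subscribe(self, channel: str,
                  intent_type: Optional[str] = None,
                  entity_type: Optional[str] = None) -> None:
        # A subscription must name one or both of the two types.
        if intent_type is None and entity_type is None:
            raise ValueError("specify an intent type, an entity type, or both")
        self._subs.append((channel, intent_type, entity_type))

    def dispatch(self, intent_type: str, entity_type: str,
                 entity_value: str) -> List[str]:
        # Deliver to every channel whose subscription matches; a field left
        # as None in the subscription is treated as a wildcard.
        delivered: List[str] = []
        for channel, want_intent, want_entity in self._subs:
            if want_intent is not None and want_intent != intent_type:
                continue
            if want_entity is not None and want_entity != entity_type:
                continue
            self.channels[channel].append(
                (intent_type, entity_type, entity_value))
            delivered.append(channel)
        return delivered
```

A subscription such as subscribe("navigation", intent_type="direction") loosely corresponds to the grouping of sentences under a single "direction" action described in Yu, paragraph [0033].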


As to dependent claims 4 and 12, the rejection of claim 1 is incorporated. Yu teaches the voice control hub computing system of claim 1, wherein the object handler is a dynamically compiled function (paragraph [0018], an execution element that executes a specific VCA handler at the VCA handler execution address associated with the run-time action-context pair).


As to dependent claims 7 and 20, the rejection of claim 1 is incorporated. Yu teaches the voice control hub computing system of claim 1, including further instructions that, when executed, cause the voice control hub to: set a value in a global context visible to the object handler (paragraph [0071], The context is then used to instantiate the parameter of the handler when the handler is executed.  FIG. 4 shows that a context is a list of entries 260. Each entry can comprise a name N 262 and a value V 264; a name N 262 or a value V 264 is the set value).
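For illustration only, the limitation of claims 7 and 20 (setting a value in a global context visible to the object handler) can be sketched with name and value entries of the kind described in Yu, paragraph [0071]. This is a minimal sketch: GLOBAL_CONTEXT, set_context_value, and directions_handler are hypothetical names, and the use of a module-level dictionary is an assumption rather than anything stated in the claims or in Yu.

```python
from typing import Dict

# Hypothetical illustration: a shared store of name/value entries that any
# handler can read when it executes.
GLOBAL_CONTEXT: Dict[str, str] = {}


def set_context_value(name: str, value: str) -> None:
    # Set a value in the global context.
    GLOBAL_CONTEXT[name] = value


def directions_handler(destination: str) -> str:
    # The handler reads the shared context instead of receiving every
    # parameter explicitly; the fallback string is an assumption.
    origin = GLOBAL_CONTEXT.get("origin", "current location")
    return f"route from {origin} to {destination}"
```

Calling set_context_value("origin", "100 E Main St.") before the handler runs changes the handler's result without changing its signature, which is the sense in which the value is visible to the handler.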


As to dependent claim 8, the rejection of claim 1 is incorporated. Yu teaches the voice control hub computing system of claim 1, wherein the voice control hub computing system is packaged as a shared object that the software application can access to enable voice functionality in the application (paragraph [0071], In FIG. 3, when the user 5 is executing the exemplary navigation application, he speaks a voice command 60 that the developer has designed to be a part of the voice-controlled navigation application; wherein the navigation application is an application that can access the shared object to enable voice functionality; paragraph [0054], FIG. 2A shows an alternative embodiment to that shown in FIG. 2 in which a custom speech/NLP engine 152, 154 is included in the application 110 and deployed as part of the application 110 on the UE 10; application 110 is the business software application used by an enterprise, and the task is the custom speech based actions).
Yu/WILPON does not teach:
an enterprise business software application.
Ashoori teaches:
an enterprise business software application (paragraph [0029], server 150 may transcribe the audio data 124 to text using known transcription technology 204. One (non-limiting) example of such transcription technology is the IBM Watson Speech To Text Service(TM)). 
Since Yu/WILPON teaches a method of voice control handler generating action based on an utterance of a user, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate an enterprise business software application, as taught by Ashoori, as the prior arts are in the same application field of user voice based speech recognition and system response, and Ashoori further teaches an enterprise application. Incorporating Ashoori into Yu/WILPON would improve the integrity of Yu/WILPON’s system by allowing the system to analyze the past pattern of a patient's mood swings to predict future behaviors (Ashoori, paragraph [0032]).

Claims 5 and 13 are rejected under AIA 35 U.S.C. § 103 as being unpatentable over Yu et al. (US 2013/0246050 A1, hereinafter Yu) in view of WILPON et al. (US 2009/0187410 A1, hereinafter WILPON) in view of Ashoori et al. (US 2018/0366144 A1, hereinafter Ashoori) in view of CHUN et al. (US 2019/0348044 A1, hereinafter CHUN).

As to dependent claims 5 and 13, the rejection of claim 1 is incorporated. Yu/WILPON/Ashoori does not teach the voice control hub computing system of claim 1, wherein the utterance of the user is received in response to a wake word utterance of the user.
CHUN teaches:
the utterance of the user is received in response to a wake word utterance of the user (paragraph [0163], The controller 110 may execute a speech recognition function using a wake word uttered by the user).
Since Yu/WILPON/Ashoori teaches a method of voice control handler generating action based on an utterance of a user, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate receiving the utterance of the user in response to a wake word utterance of the user, as taught by CHUN, as the prior arts are in the same application field of user voice based speech recognition and system response, and CHUN further teaches a wake word utterance. Incorporating CHUN into Yu/WILPON/Ashoori would expand the utility of Yu/WILPON/Ashoori’s system by allowing a speech recognition user interface (UI) 250 to be displayed on the display 120 based on a predetermined wake word being input via the microphone (CHUN, paragraph [0162]).


Claims 6, 14-15, and 19 are rejected under AIA 35 U.S.C. § 103 as being unpatentable over Yu et al. (US 2013/0246050 A1, hereinafter Yu) in view of WILPON et al. (US 2009/0187410 A1, hereinafter WILPON) in view of Ashoori et al. (US 2018/0366144 A1, hereinafter Ashoori) in view of Hiroe et al. (US 2006/0020473 A1, hereinafter Hiroe).

As to dependent claims 6, 14, and 19, the rejection of claim 1 is incorporated. Yu/WILPON/Ashoori does not teach the voice control hub computing system of claim 1, including further instructions that, when executed, cause the voice control hub to: 
synthesize a speech response to the user responsive to the utterance of the user, and 
cause the speech response to be output in an audio speaker of a computing device of the user.
Hiroe teaches further instructions that, when executed, cause the voice control hub (paragraph [0065], This voice dialogue system includes a microphone 1, a speech recognizer 2, a controller 3, a response generator 4, a speech synthesizer 5 and a speaker 6, which are configured to interact via voice with a user) to:
synthesize a speech response to the user responsive to the utterance of the user (paragraph [0066], The microphone 1 converts a voice (speech) uttered by a user or the like into a voice signal in the form of an electric signal and supplies it to the speech recognizer 2; paragraph [0076], the speech synthesizer 5 produces a voice signal corresponding to the response sentence supplied from the controller 3 by using a speech synthesis technique such as speech synthesis by rule), and 
cause the speech response to be output in an audio speaker of a computing device of the user (paragraph [0076], the speech synthesizer 5 supplies the resultant voice signal to the speaker 6).
Since Yu/WILPON/Ashoori teaches a method of voice control handler generating action based on an utterance of a user, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the voice control hub to: synthesize a speech response to the user responsive to the utterance of the user, and cause the speech response to be output in an audio speaker of a computing device of the user, as taught by Hiroe, as the prior arts are in the same application field of user voice based speech recognition and system response. Incorporating Hiroe into Yu/WILPON/Ashoori would expand the utility of Yu/WILPON/Ashoori’s system by allowing a conclusive response sentence to be output in response to the input sentence (Hiroe, paragraph [0032]).

As to dependent claim 15, the rejection of claim 1 is incorporated. Yu/WILPON/Ashoori does not teach the computer-implemented method of claim 14, wherein synthesizing the speech response to the user responsive to the utterance to the user is part of a multi-turn interaction with the user.
Hiroe teaches synthesizing the speech response to the user responsive to the utterance to the user (paragraph [0076], the speech synthesizer 5 produces a voice signal corresponding to the response sentence supplied from the controller 3 by using a speech synthesis technique such as speech synthesis by rule, and the speech synthesizer 5 supplies the resultant voice signal to the speaker 6) is part of a multi-turn interaction with the user (paragraph [0079], In addition to or instead of outputting, from the speaker 6, a voice corresponding to a response sentence supplied from the controller 3, the response sentence may be displayed on a display or may be projected on a screen using a projector; the output response could be presented to the user either in voice or in another display format).
Since Yu/WILPON/Ashoori teaches a method of voice control handler generating action based on an utterance of a user, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate synthesizing the speech response to the user responsive to the utterance to the user as part of a multi-turn interaction with the user, as taught by Hiroe, as the prior arts are in the same application field of user voice based speech recognition and system response. Incorporating Hiroe into Yu/WILPON/Ashoori would expand the utility of Yu/WILPON/Ashoori’s system by allowing a conclusive response sentence to be output in response to the input sentence (Hiroe, paragraph [0032]).


(2) Response to Argument

Rejection of Claims 1, 9, and 16 under 35 U.S.C. 103 as being unpatentable over Yu, in view of WILPON and in further view of Ashoori

Independent Claims 1, 9, and 16:

1) the Advisory action dated November 10, 2020 mischaracterizes the state of the art by giving the interpretation of “trained machine learning model” (See Brief, page 8 3rd paragraph - page 9 1st paragraph).
2) the Office action fails to resolve the Graham factors in combining prior art Ashoori with Yu/WILPON (See Brief, page 9 2nd - 3rd paragraph).
3) the office action does not clearly identify KSR rationale by combining prior art Ashoori into Yu/WILPON (See Brief, page 9 4th paragraph).
4) the combination of prior art Ashoori with Yu/WILPON is distinct from the claimed invention, that is, the element “receiving an utterance of a user of the enterprise business software application, converting the utterance of the user to a text string representing speech-to-text output using a custom speech model, analyzing the text string using one or more trained machine learning models to generate an intent and an entity corresponding to a task” was not taught, and Ashoori’s cited rationale is not relevant to the claimed “user utterance task of an enterprise business software application” (See Brief, page 10 1st – page 11 2nd paragraph).
rd paragraph).
6) the 103 prior art rejection of Yu, in view of WILPON and in further view of Ashoori is improper based on the reasons listed above in 1) - 5) (See Brief, page 12 1st paragraph).

Examiner disagrees.
Regarding Claims 1, 9, and 16, Yu teaches a method of receiving a user utterance and analyzing the user utterance to generate an intent and an entity corresponding to a task (Yu, paragraph [0071]). Yu sets up a framework for receiving speech and analyzing intent, which is consistent with the main idea of the invention. Yu does not teach the elements “an enterprise business software application”, “convert the utterance of the user to a text string representing speech-to-text output using a custom speech model”, and “analyze the text string using one or more trained machine learning models to generate an intent corresponding to the task”.
As the combined prior art, WILPON teaches converting the utterance of the user to a text string representing speech-to-text output using a custom speech model, wherein the spoken language understanding (SLU) module is the speech model specifically tailored or machine trained to convert speech into text (WILPON, paragraph [0043]), and also teaches identifying the intent or the purpose of the words spoken by the user (WILPON, paragraph [0005]). In addition, Ashoori teaches an enterprise business software application which provides speech training (paragraph [0029]), and analyzing the text string using one or more trained machine learning models (Ashoori, paragraphs [0030] and [0032]).

Regarding argument 1), the Advisory action dated November 10, 2020 gave the examiner’s interpretation of the element “trained machine learning model”. Machine learning is a broad concept of data analysis that automates analytical model building by computer. Ashoori teaches analyzing the patient's speech text by applying advanced natural language processing and machine learning technologies to determine some attributes of the speech text data (Ashoori, paragraph [0030]), and specifically, Ashoori teaches that the machine learning module 208 is trained using supervised learning with the training inputs to generate output data, wherein the training model is set up by the system defined input/output model (Ashoori, paragraph [0032]). In conclusion, Ashoori teaches analyzing received user speech text using the trained machine learning module.

Regarding argument 2), as for the rationale to combine the prior art Ashoori with Yu/WILPON, Ashoori is a speech text analysis based prior art even though Ashoori’s application field is medical related. Thus Ashoori could be used to combine with Yu for speech data analysis features as they are from the same application field. Ashoori allows the system to analyze the past pattern of a patient's mood swings to predict future behaviors (Ashoori, paragraph [0032]). Analyzing the past pattern is to set up a training model using collected user data as the data input for the trained machine 

Regarding argument 3), as for the rationale to combine the prior art Ashoori with Yu/WILPON, Ashoori can be classified as prior art in the field of speech text analysis even though Ashoori’s application field is medical. Thus Ashoori can be combined with Yu for speech data analysis features, as they are from the same application field of user input text analysis. Ashoori’s cited rationale “to analyze the past pattern of a patient's mood swings to predict future behaviors” provides a specific function of the system as taught by Ashoori’s machine learning models, which is a text intent type application of a machine learning system. Ashoori’s cited rationale reasonably explains why the prior art Ashoori is combined with Yu/WILPON for the teaching of the machine learning element. 

Regarding argument 4), for the element “receiving an utterance of a user of the enterprise business software application, converting the utterance of the user to a text string representing speech-to-text output using a custom speech model, analyzing the text string using one or more trained machine learning models to generate an intent and an entity corresponding to a task”, first the primary prior art Yu teaches to receive an utterance of a user of the software application, transmit the utterance of the user to a remote cloud services layer, and to analyze to generate an intent and an entity corresponding to a task (Yu, paragraph [0051], [0054], paragraph [0071]). Yu does not teach an enterprise business software application, and to convert the utterance of the 

Regarding argument 5), Yu, WILPON and Ashoori focus on different aspects of speech analysis.  Yu sets up a speech receiving and intent analysis framework, WILPON teaches converting the utterance of the user to a text string representing speech-to-text output using a custom speech model, and Ashoori teaches analyzing the text string using one or more trained machine learning models. All three references relate to the application field of speech recognition and analysis. Even though Ashoori is a medical application, Ashoori’s teaching is directed to analyzing the patient’s speech for intent-related purposes. The speech analysis framework set up by the primary reference Yu reads on the main idea of the invention, with WILPON and Ashoori teaching additional elements that do not fundamentally change Yu’s method sequence. Thus the combination of the prior art is not based on hindsight.

Regarding argument 6), Yu, WILPON and Ashoori are reasonably combined for claims 1, 9 and 16 based on the reasons listed above in 1) - 5). 

For the above reasons, it is believed that the rejections should be sustained.
Respectfully submitted,


Examiner, Art Unit 2143	

Conferees:
/JENNIFER N TO/ Supervisory Patent Examiner, Art Unit 2143
/ABDULLAH AL KAWSAR/ Supervisory Patent Examiner, Art Unit 2171

Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.