DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities:
In the Abstract, “A system is provided.” is an unnecessary sentence, but could be consolidated as “A system is provided that includes at least one sentence . . . .”  A new Abstract should be submitted on a separate sheet.
In ¶[0079], “GUI 313” is not illustrated in Figure 3, or any of the drawings, but could be “GUI 310”.
In ¶[0141], “capsule DB 230 of FIG. 2” appears that it should be “capsule DB 230 of FIG. 1”, as there is no reference numeral 230 in Figure 2.
In ¶[0161], “capsule DB 230 of FIG. 2” appears that it should be “capsule DB 230 of FIG. 1”, as there is no reference numeral 230 in Figure 2.
In ¶[0169], “predesignates” should be “predesignated”.
In ¶[0251], “domain form” should be “domain from”.
Appropriate correction is required.





Claim Objections
Claims 1 to 26 are objected to because of the following informalities:  
Independent claims 1 and 13 set forth two limitations that include “when” constructions, which can give rise to issues of indefiniteness.  Here, these “when” constructions set forth conditions precedent that do not have to be met in their conditional limitations, and it becomes unclear how to interpret these limitations if their conditions precedent are not met.  Conceivably, these “when” constructions might be accorded no patentable weight if the conditions precedent are not met.  Applicants should redraft the limitations directed to “when the content does not comprise a business entity” and “when the content corresponds to the at least one domain” so that they set forth positive limitations that are not conditional.
Claims 6 and 18 set forth “an input”, which is ambiguous as to whether it is a same input of independent claims 1 and 13, or whether it is not necessarily an input from a user as set forth by these independent claims.  
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 6, 9 to 18, 21 to 23, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Hart et al. (U.S. Patent No. 10,580,408) in view of Liu et al. (U.S. Patent Publication 2017/0286404).
Concerning independent claims 1 and 13, Hart et al. discloses a system and method for speech recognition services, comprising:
“at least one memory; and at least one processor operatively connected to the at least one memory, wherein the at least one memory stores instructions that, when executed, cause the at least one processor to:” – voice-controlled device 106 includes a processor 502 and memory 504, where the memory 504 may include computer-readable storage media, which may be any available physical media accessible by processor 502 to execute instructions stored on the memory (column 16, line 65 to column 17, line 3: Figure 5);
“receive an input from a user selecting at least one domain” – example domains may include shopping, listening to music, calendaring, reminder setting, travel reservations, or to-do list creation (column 2, lines 53 to 56); NLU component may use ASR results and the context to determine multiple different potential intents associated with the user’s request, with each intent being associated with a respective domain; dialogue component may either select a domain or engage in a dialog with a user to determine a domain; dialog component may determine that the user’s request is associated with a ‘shopping domain’; a dialog engine may decide to engage in a dialog with the user to make this determination; a dialog engine may decide to ensure that the user wishes to purchase a DVD rather than a book or other version of the identified content item; dialog component may ask the user the question “Would you like to 
“after the input, receive a user utterance” – dialog component 124 may determine that the user’s speech is associated with the ‘music’ domain, and provides audio for output at device 106, with the audio being associated with question 128(4): “Okay. Would you like to play the channel ‘the String Cheese Incident’ on internet radio or play music from your personal storage?”; user 104 provides a response of reply 128(5): “Internet radio, please” (column 9, lines 7 to 18: Figure 1A); here, “Internet radio, please” is “a user utterance” received “after the input” to select a domain by a user of 
“recognize, content from the utterance” – an audio signal is provided to a speech recognition component; in response to receiving the audio signal, a speech recognition component may perform automatic speech recognition (ASR) on the audio signal to generate ASR results; ASR results may take the form of a list that includes the most likely words or phrases spoken by the user (column 2, lines 39 to 48); servers 112(1)-(P) may have access to and utilize a speech-recognition engine for receiving audio signals from device 106, and recognizing speech (column 7, lines 24 to 27: Figure 1A); in response to receiving an audio signal, speech recognition component 120 may perform automatic speech recognition (ASR) on the audio signal to generate ASR results (column 8, lines 5 to 8: Figure 1A); after device 106 captures the audio and provides a corresponding audio signal to resources 108, resources 108 may determine that user 104 wishes to launch an application that provides internet radio and begin playing the channel entitled, ‘the String Cheese Incident’ (column 9, lines 19 to 23: Figure 1A); when a user says “Internet radio, please”, the words “Internet radio” are “content” of “the utterance” that is ‘recognized’ by automatic speech recognition (ASR);
“when the content does not comprise a business entity, determine whether the content corresponds to the at least one domain” – a limitation of “when the content does not include a business entity” can be construed as a conditional limitation including a condition precedent, where this condition precedent is not met; here, the reference does not expressly disclose anything about content comprising “a business entity”; broadly, 
“when the content corresponds to the at least one domain, processing the content by using the at least one domain to generate a response” – after selecting an intent, a dialog component may perform one or more actions corresponding to the user’s speech; if the speech recognition platform determines that the user has requested a particular channel of a particular internet radio service, then the platform may provide audio to the device, e.g., “I will turn on your station momentarily”, as well as begin streaming the particular channel to the device (column 4, lines 12 to 23); servers 112(1)-(P) may recognize speech and cause performance of an action in response (column 7, lines 24 to 28: Figure 1A); response component 126 may perform a corresponding action, e.g., provide audio for output at device 106 (“I’ll begin playing your music shortly”), and begin streaming the channel to device 106 (column 9, lines 27 to 31: Figure 1A); here, “a response” can be construed to include performing the action associated with playing music in a ‘music’ domain, or providing audio for output directed to a ‘music’ domain at device 106 (“I’ll begin playing your music shortly”).
Concerning independent claims 1 and 13, Hart et al. could be construed as anticipating all of the limitations of these independent claims.  That is, Hart et al. broadly discloses that a user selects a domain by a spoken input, that subsequent utterances of spoken input from the user are then received in an interactive dialog to refine an intent within that domain, that content of these utterances is recognized by speech recognition, and that a response is provided that includes an audio confirmation of an action and a performance of the action within the domain by a service.  Hart et al. does not disclose anything about a “business entity”.  However, this limitation is conditional, and the condition precedent to this limitation is not necessarily met.  Hart et al., then, can be construed to meet this limitation because the conditional limitation is not necessarily a positive limitation when the condition precedent is not met.
Concerning independent claims 1 and 13, even if the conditional limitation of “when the content does not comprise a business entity, determine whether the content corresponds to the at least one domain” were considered to be a positive limitation, this limitation is obvious as taught by Liu et al.  Generally, Liu et al. teaches a distributed server system for language understanding that can utilize a distributed network of feature extractors on feature servers.  (Abstract)  A natural language understanding (NLU) system is responsible to extract semantic frames to represent a natural language input’s domain, intents, and semantic slots (or entities).  (¶[0018])  A ‘places’ domain may have a business name dictionary.  (¶[0020])  Feature extractors 110 may use feature set definitions to extract potential features from received NL input 116.  (¶[0028]: Figure 1)  Specifically, a feature specialty may include a business name extractor.  (¶[0029])  LU decoder 104 uses input features and trained LU models 106 to understand the query or extract semantic meaning from input 116.  LU decoder 104 generates a response 118 to received input based on the determined semantic meaning, where response 118 may be generated to include domain: reminder; intent: create a reminder; slots: reminder; content: call mom; reminder time: tomorrow.  (¶[0032] - ¶[0037])  NLU system 102 may be applied to a specific domain, where the Liu et al., then, teaches determining a domain from input and that content can “comprise a business entity”, although content does not necessarily have to comprise a business entity in every instance and could instead use an alternative feature, e.g., locations 212B (“when the content does not comprise a business entity”).  Accordingly, Liu et al. teaches any limitations that might not be disclosed by Hart et al. as directed to these conditional limitations of “content that does not comprise a business entity”.  
An objective is to reduce an amount of time, money, and resources to accomplish development of natural language understanding models, and to provide better development, update ability, productivity, and scalability.  (¶[0001] - ¶[0003])  It would have been obvious to one having ordinary skill in the art to determine when content does not include a business entity as taught by Liu et al. to provide speech recognition services in Hart et al. for a purpose of reducing time and resources to develop natural language understanding models and provide better development and scalability.  

Concerning claims 2 and 14, Hart et al. discloses using a natural language understanding (NLU) component to identify multiple different intents, where each intent is associated with a respective domain, e.g., a ‘shopping’ domain or a ‘music’ domain (column 2, line 57 to column 3, line 5); NLU component may use ASR results to determine multiple different potential intents associated with the user’s request, with each intent being associated with a respective domain (column 4, lines 51 to 54).  Similarly, Liu et al. teaches that NLU models (“a first natural language understanding 
Concerning claims 3 and 15, Hart et al. discloses using a natural language understanding (NLU) component to identify multiple different intents, where each intent is associated with a respective domain, e.g., a ‘shopping’ domain or a ‘music’ domain (column 2, line 57 to column 3, line 5); NLU component may use ASR results to determine multiple different potential intents associated with the user’s request, with each intent being associated with a respective domain (column 4, lines 51 to 54).  Hart et al., then, discloses that “the first natural language understanding model” is both “a domain determination model” and “an intent determination model”.  
Concerning claims 4 and 16, Hart et al. discloses that if a dialog component is able to determine a domain with a threshold amount of confidence, then the dialog component may proceed to select a domain (column 3, lines 44 to 46); after receiving N intents, each associated with a particular domain, dialog component 124 may attempt to select a domain most likely associated with the user’s speech; if dialog component 124 can make this determination with a threshold amount of confidence, then component 124 may select a domain (column 8, lines 24 to 33: Figure 1); NLU component 122 provides a list of intents across domains for selecting a domain; dialog engine 230 may determine based on the ranked list and corresponding probabilities associated with the intents, whether dialog engine 230 is able to select a domain with a confidence that is greater than a predefined threshold (column 12, lines 32 to 43: Figure 2).  Liu et al. teaches that potential features may include confidence scores for each determined item, where a confidence score is an indicator, e.g., a ranking or percentage, that signifies how accurate or how confident a feature extractor is about an identified item.  (¶[0028]: Figure 1)
Concerning claims 5 and 17, Liu et al. teaches that a natural language understanding (NLU) system is responsible to extract semantic frames to represent a natural language input’s domain, intents, and semantic slots (or entities) (¶[0018]);  potential features are inputted into trained LU models 106 to estimate input features for the NL input 116; LU models 106 are used to understand a query and generate a response 118 to received input based on the determined semantic meaning, where response 118 may be generated to include domain: reminder; intent: create a reminder; slots: reminder; content: call mom; reminder time: tomorrow (¶[0031] - ¶[0037]); here, there are a plurality of natural language understanding models 106 (“a first natural language understanding model” and “a second natural language understanding model”), where each model is used to extract domain, intent (“a user intent”), and semantic slots (“a parameter”).  Similarly, Hart et al. discloses that NLU component may identify intents (“determine at least one of a user intent”) within each of multiple different domains, and may fill slots (or ‘fields’) of the intent (“determine at least one of . . . a parameter by using the second natural language understanding model”) (column 3, lines 6 to 12); each domain is associated with a slot filter 226, which utilizes a received context to fill one or more slots associated with a particular intent (column 12, lines 4 to 20: Figure 2).
Concerning claims 6 and 18, Liu et al. teaches a method that includes retrieving training features and estimating model parameters based on a training algorithm to form a trained language understanding model (¶[0004]); an NLU system utilizes NLU models that are usually trained from domain specific inputs with semantic annotations (¶[0018]); LU models 106 are trained for a specific task based on the type of input signal 116 that the NLU system 102 received; NLU system 102 sends a request for training input for the specific task to the one or more feature extractors 110 on the feature servers 108 (¶[0025] - ¶[0026]: Figure 1); training features may include items of client intent, a domain, and entities (¶[0046]: Figure 3); here, Liu et al. teaches that LU models are being trained in a domain-specific way, which implies that there is some “input” for “selecting the at least one domain”; that is, “an input” does not necessarily have to be from a user, but can be from the system for training features that are annotated for domain specific inputs.
Concerning claims 9 and 21, Hart et al. discloses that remote computing resources 108 may provide audio for output back to the device to aid in determining a domain associated with the user’s speech by asking the user question 128(2): “Are you wishing to shop or listen to music?”; remote computing resources 108 are attempting to determine whether the user’s speech should be associated with a ‘music’ domain or a ‘shopping’ domain (column 8, line 60 to column 9, line 3: Figure 1A).  Broadly, this audio output is “a user interface” that “comprises a guide to select the at least one domain”.  That is, an audio output is an audio interface that aids (‘guides’) a determination of a domain by prompting the user to select a domain.  Remote computing resources may interact with applications hosted by one or more third-party services 130, where services 130 may comprise music applications, e.g., internet radio for providing music to device 106, a reminder application for providing reminders to device 106, and a weather application for providing weather forecasts to device 106 (“to select . . . a service”).  (Column 14, Lines 28 to 34: Figure 1A)  
Concerning claim 10, Hart et al. discloses that client computing devices 202 may be smart phones (“a mobile terminal”), tablet computing devices, desktop computers (“a stationary terminal”), etc.  (Column 9, Lines 55 to 59: Figure 1B)
Concerning claims 11 and 22, Hart et al. discloses that if a dialog component is able to determine a domain with a threshold amount of confidence, then the dialog component may proceed to select a domain (column 3, lines 44 to 46); after receiving N intents, each associated with a particular domain, dialog component 124 may attempt to select a domain most likely associated with the user’s speech; if dialog component 124 can make this determination with a threshold amount of confidence, then component 124 may select a domain (column 8, lines 24 to 33: Figure 1); NLU component 122 provides a list of intents across domains for selecting a domain; dialog engine 230 may determine based on the ranked list and corresponding probabilities associated with the intents, whether dialog engine 230 is able to select a domain with a confidence that is greater than a predefined threshold (column 12, lines 32 to 43: Figure 2).  Given that there are a plurality of domains that are ranked and compared to a threshold, this implies “determine whether a first confidence score is greater than a first threshold associated with a first domain” and “determine whether a second confidence score is Liu et al. teaches “when the content includes a business entity”.   (¶[0020]; ¶[0028] - ¶[0029]; ¶[0039] - ¶[0040]: Figure 2)
Concerning claims 12 and 23, Hart et al. discloses that remote computing resources may interact with applications hosted by one or more third-party services 130, where services 130 may include a music application that resources 108 utilize to cause the requested music channel to be streamed to device 106.  (Column 9, Lines 32 to 36: Figure 1A)  Third-party services 130 may comprise music applications, e.g. internet radio for providing music to device 106, a reminder application for providing reminders to device 106, and a weather application for providing weather forecasts to device 106 (“wherein the at least one domain is related to a type of service provided to the user to perform a user intent included in the user utterance or a subject that provides the service”).  (Column 14, Lines 28 to 34: Figure 1A)
Concerning claim 25, Liu et al. teaches “when the content includes a business entity” and “a first natural language understanding model associated with the business entity”.   (¶[0020]; ¶[0028] - ¶[0029]; ¶[0039] - ¶[0040]: Figure 2)  Hart et al. discloses NLU component may use ASR results and the context to determine multiple different e.g., “I will turn on your station momentarily”, as well as begin streaming the particular channel to the device (“performing an action based on the user intent and the parameter”) (column 4, lines 12 to 23).

Claims 7, 19, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Hart et al. (U.S. Patent No. 10,580,408) in view of Liu et al. (U.S. Patent Publication 2017/0286404) as applied to claims 1 to 6, 13 to 15, and 18 above, and further in view of Wu et al. (U.S. Patent Publication 2008/0215320).
Concerning claims 7 and 19, Liu et al. teaches training natural language understanding models in a way that is specific to domains, but omits a procedure directed to “rule information” in the limitations of “acquire rule information corresponding to at least one domain” and “train the second natural language understanding model by Wu et al.  Specifically, Wu et al. teaches an apparatus and method to reduce recognition errors through context relations among dialogue turns, where a rule set storage unit has a rule set containing one or more rules.  (Abstract)  An evolutionary rule generation module performs an evolutionary adaptation process to train the rule set using a dialogue log (dialogue history) as training data.  (¶[0014])  Evolutionary rule generation module 202 performs evolutionary adaptation from dialogue log 221 to train rule set 211.  Rule trigger unit 205 is connected to rule storage unit 201, and rule trigger 205, according to trained rule set 211 and dialogue history 223 of N previous dialogue turns, selects at least a rule 215a, and corresponding confidence measure 215b from trained rule set 211, to provide an ASR system 225 for reevaluating the recognition result.  (¶[0032]: Figure 2A)  Wu et al., then, teaches the concept of training a model using rule information that is applicable to training a natural language understanding model specific to a domain of Liu et al.  An objective is to reduce recognition errors among dialogue turns.  (¶[0001] and ¶[0012])  It would have been obvious to one having ordinary skill in the art to acquire rule information to train a model as taught by Wu et al. in training a natural language understanding model associated with a domain of Liu et al. for a purpose of reducing recognition errors among dialogue turns.
Concerning claim 24, Wu et al. teaches that an evolutionary rule generation module performs an evolutionary adaptation process to train the rule set using a dialogue log (dialogue history) as training data.  (¶[0014])  Evolutionary rule generation module 202 performs evolutionary adaptation from dialogue log 221 to train rule set 211.  Rule trigger unit 205 is connected to rule storage unit 201, and rule trigger 205, according to trained rule set 211 and dialogue history 223 of N previous dialogue turns, selects at least a rule 215a, and corresponding confidence measure 215b from trained rule set 211 to provide an ASR system 225 for reevaluating the recognition result.  (¶[0032]: Figure 2A)  Wu et al., then, teaches “generating training data based on use history information” and “applying the training data” to train a model “based on the rule information.”

Claims 8 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hart et al. (U.S. Patent No. 10,580,408) in view of Liu et al. (U.S. Patent Publication 2017/0286404) as applied to claims 1 and 13 above, and further in view of Leeb (U.S. Patent Publication 2018/0261216).
Hart et al. discloses that an NLU component identifies multiple different intents associated with multiple different domains, and may rank the intents based on one or more criteria to provide a ranked list of intents, where the ranked list of intents is provided to a coordination component, which may attempt to determine which domain the user is most likely requesting to operate within.  (Column 3, Lines 25 to 40)  After identifying multiple intents, an NLU component may rank the intents and provide a ranked list of intents to orchestration component, which in turn may provide this list to a dialog component, which may either select a domain or engage in a dialog with the user to determine a domain.  (Column 4, Lines 51 to 67)  In response to receiving the list, an orchestration component provides the list to a dialog component for selecting a domain.  (Column 5, Lines 43 to 45)  Hart et al., then, clearly discloses generating “a list of Hart et al. does not clearly disclose “wherein the input selects the at least one domain from a list of domains”.  Still, Hart et al. could be construed to arguably disclose this limitation because the ‘list’ is not expressly disclosed as being displayed to a user for selection.  That is, “the input” is a user’s response to select one of the domains, e.g., a ‘music’ domain, and there is a list of ranked domains generated by Hart et al., even if this list is not displayed to the user for selection.  
In any event, Leeb expressly teaches a speech-enabled system with domain disambiguation, where a user may provide disambiguation information that can be used to choose the best domain for a domain of conversation representing a subject area.  (Abstract; ¶[0025])  A speech-enabled system may determine one or more domains of conversation in the context of which the utterance is sensible.  Each interpretation is evaluated for how sensible it is within the domain, where a system would assign a high relevancy score in a weather domain but a low relevancy score in a cooking domain for an utterance, “It’s raining!”  A relevancy score assigned to each domain is compared to a threshold associated with the domain.  (¶[0028] - ¶[0030]: Figure 1A)  One embodiment presents a list of candidate domains to the user, and asks the user to choose a domain.  A list may simply name the domains, “Did you mean music or politics?”  (¶[0035] - ¶[0036]: Figure 1B)  Leeb, then, teaches these limitations directed to “the input selects the at least one domain from a list of multiple domains classified according to a selected standard”, where this list is described in a manner equivalent to the question “Are you wishing to shop or listen to music?” of Hart et al.  Accordingly, Hart et al.’s question “Are you wishing to shop or listen to music?” is equivalent to “a list of multiple domains” for selection by user input as taught by Leeb.  An objective is to perform disambiguation of a spoken query.  (Abstract; ¶[0001])  It would have been obvious to one having ordinary skill in the art to provide input from a user to select a domain from a list of domains as taught by Leeb in speech recognition services of Hart et al. for a purpose of disambiguating a spoken query.

Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Hart et al. (U.S. Patent No. 10,580,408) in view of Liu et al. (U.S. Patent Publication 2017/0286404) as applied to claims 13 and 25 above, and further in view of Stern et al.  (U.S. Patent Publication 2017/0084270).
Hart et al. discloses that remote computing resources may interact with applications hosted by one or more third-party services 130, where services 130 may include a music application that resources 108 utilize to cause the requested music channel to be streamed to device 106.  (Column 9, Lines 32 to 36: Figure 1A)  Third-party services 130 may comprise music applications, e.g. internet radio for providing music to device 106, a reminder application for providing reminders to device 106, and a weather application for providing weather forecasts to device 106.  (Column 14, Lines 28 to 34: Figure 1A).  Moreover, Hart et al. discloses a natural language understanding component 122, and Liu et al. teaches natural language understanding models 106 and business entities 212A.  However, Hart et al. and Liu et al. do not clearly disclose or teach “receiving the first natural language understanding model from the business entity or a third party that is designated to provide the model on behalf of the business entity.”  Hart et al., and models associated with these applications could be downloaded from a plurality of third party sources.   
Specifically, Stern et al. teaches fetching speech processing models from remote speech models when installing or removing an app.  (Abstract)  NLU models are usually trained for each task domain, e.g., dictation, SMS, and web search.  (¶[0006])  One embodiment relates to an installation, update, or removal of apps on a local device.  If a user downloads a shopping app, then the speech model manager on local device 102 can check for an existing local language model 118 that covers related product names, and if not the speech model manager can download the appropriate remote speech model 116.  (¶[0020]: Figure 1)  Stern et al., then, suggests that speech models can be downloaded corresponding to a shopping application to provide models of product names.  Implicitly, a shopping application for products is associated with a business, so downloading models relating to product names at least provides for “receiving the natural language understanding model from . . . a third party that is designated to provide the model on behalf of the business entity.”  An objective is to predict which speech processing models are needed based on context changes, and to increase practicality of embedded speech technology on local devices having limited storage space.  (¶[0012] and ¶[0015])  It would have been obvious to one having ordinary skill in the art to receive a natural language understanding model from a business entity or third party that is designated to provide the model on behalf of the business entity as taught by Stern et al. to provide speech recognition services of Hart et al. for a purpose 

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Kannan et al. and Kakirwar et al. disclose related prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached on Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/MARTIN LERNER/
Primary Examiner
Art Unit 2657
March 1, 2021