Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending. Claims 1, 16, and 20 are independent and are all method Claims. Claims 16-20 are withdrawn pursuant to an election by the Applicant on 10/10/2022. In the initial submission, there were two successive Claims numbered 4; in the most recent submission, the number of the second one has been changed to 5. Technically, a Claim number cannot be changed once submitted; the way to correct an error in claim numbering is to cancel both Claims and present them at the end with new numbers. Claims 1-15 are pending and under examination, of which Claim 1 is independent.
This Application was published as U.S. 2022/0157318.
            Apparent priority: 13 November 2020.

	The instant Application is directed to Device Arbitration and Cross-Device Task Execution using Pooled Capabilities of several devices.

	Inventor as his own lexicographer:  The instant Application includes numerous references to “Warm” in the Specification and Drawings.  “Warm,” as in “warm words” or “warm cues,” refers to commands/keywords that are recognized by the personal digital assistant systems.  “[0036] … Notably, detecting the occurrence of a warm cue causes a particular action to be performed even when the detected occurrence is not preceded by any wake cue. Accordingly, when a warm cue is a particular word or words, a user can simply speak the word(s), without needing to provide any wake cue(s), and cause performance of a corresponding particular action.”  “[0037] As one example, "stop" warm cue(s) can be active at least at times when a timer or alarm is being audibly rendered at assistant device 110A via automated assistant 120A. For instance, at such times the warm cue(s) engine 127A can continuously (or at least when VAD engine 128A1 detects voice activity) process a stream of audio data frames that are based on output from one or more microphones of the client device 110A, to monitor for an occurrence of "stop", "halt", or other limited set of particular warm word(s)…..”  “[0038] As another example, "volume up", "volume down", and "next" warm cue(s) can be active at least at times when music is being audibly rendered at assistant device 110A via automated assistant 120A….”
Claim Objections
Claim 8 is objected to because of informalities that may be addressed with the following suggested amendments: 
8. The method of claim 1, 
wherein the corresponding subset locally stored on the first assistant device comprises one or more pre-adaptation natural language understanding models that are utilized in performing semantic analysis of natural language input, 
wherein the one or more initial natural language understanding models occupy a first quantity of local disk space at the first assistant device; and
wherein the corresponding subset locally stored on the first assistant device comprises one or more post-adaptation natural language understanding models that include at least one additional natural language understanding model that is in addition to the one or more initial natural language understanding models, 
wherein the one or more post-adaptation natural language understanding models occupy a second quantity of the local disk space at the first assistant device, the second quantity being greater than the first quantity. 

Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3 and 8-13 are rejected under 35 U.S.C. 103 as being unpatentable over Carbune (U.S. 20180182397) in view of Slayton (U.S. 20150215350).
Regarding Claim 1, Carbune teaches:
1. A method implemented by one or more processors, the method comprising: [There is no specified place for execution of the steps of the method.]
generating an assistant device group of disparate assistant devices, the disparate assistant devices including at least a first assistant device and a second assistant device, [Carbune, Figure 1, “Device Group: Alice Phone, Bob Phone 110, 112.”  The two “computing devices 104a, 104b” as well as Figure 2, “identify a second computing device that is configured to respond to a particular, predefined hotword 210” teach the “generating … device group.”  See [0071] for 3 users Alice, Bob, and Carl as a group.  See [0076] for devices of a same user as a group.] 
wherein, at a time of generating the group: 
the first assistant device includes a first set of locally stored on-device models utilized in locally processing assistant requests directed to the first assistant device, and [Carbune, those “computing devices 104” that respond to a particular “hotword” teach both the “first” and “second” “assistant device [that] includes a first set of locally stored on-device models.”  “[0031] In more detail, the sequence of events in FIG. 1 begins at stage A. The computing devices 104a and 104b identify each other and other computing devices that are configured to respond to a particular, predefined hotword….”]  
the second assistant device includes a second set of locally stored on-device models utilized in locally processing assistant requests directed to the second assistant device; [Carbune, those “computing devices 104” that respond to a particular “hotword” teach both the “first” and “second” “assistant device [that] includes a first set of locally stored on-device models.”]
determining, based on corresponding processing capabilities for each of the disparate assistant devices of the assistant device group, a collective set of locally stored on-device models for utilization in cooperatively locally processing assistant requests directed to any of the disparate assistant devices of the assistant device group; [Carbune, Figure 1, 104a and 104b, [0030]-[0031] and  those “computing devices 104” that respond to a particular “hotword” include “a collective set of locally stored on-device models” of the Claim because they cooperate in processing the request.  In Figure 1, the request is directed to 104a and 104a includes the models that are necessary for processing the request 102:  “Ok, Computer, find me a good steakhouse …” and respond with 138 and further 104b is also equipped with the necessary models and responds with 148.  Examples of “models” that form the “collective set of locally stored on-device models” include French ([0055]), analyzing semantics of the input (i.e. NLP) ([0056]), Calendar ([0055], [0065]), Contacts ([0065]), Reservation ([0065]), Routing ([0071]), as some examples.]
in response to generating the assistant device group: 
causing each of the disparate assistant devices to locally store a corresponding subset of the collective set of locally stored on-device models, including causing the first assistant device to purge one or more first on-device models of the first set to provide storage space for the corresponding subset locally stored on the first assistant device and causing the second assistant device to purge one or more second on-device models of the second set to provide storage space for the corresponding subset locally stored on the second assistant device, and [Carbune teaches that each device has a number of programs/models (Calendar, Contacts, French, etc.) stored on it.  Carbune also teaches that the devices identify one another and communicate.  See, e.g., [0046].  It does not teach download or storage of new programs/models, although this is a well-known capability of the “computing devices 104a/104b” that are shown.]
assigning one or more corresponding processing roles to each of the disparate assistant devices of the assistant device group, each of the processing roles utilizing one or more corresponding of the locally stored on-device models; and [Carbune does assign “processing roles” according to their respective capabilities.  For example, if the incoming speech is in French, a device with French speech recognition capability is identified and used.  The “assigning … processing roles” is “in response to generating the assistant device group” because it is from among the members of the group that the particular assistant is selected/ “assigned a role utilizing its locally stored models/programs.”  “[0055] In some implementations, determining which device initially responds to a user utterance may comprise analyzing the settings of the computing device. For example, if the utterance is in French and there is one phone with French language settings in the vicinity, it is probably the computing device that the utterance was intended for.”  “[0056] In some implementations, determining which device initially responds to a user utterance may be done by analyzing the semantics of the command or the query included in the utterance and correlating it with the state and information of the computing devices. For example, if the query is "Who am I meeting with at two o'clock?" the speech-enabled system may determine that the query is intended for the computing device which is synchronized with a calendar, and has an appointment at two o'clock.”]
subsequent to assigning the corresponding processing roles to each of the disparate assistant devices of the assistant device group: [Carbune, each device has its processing roles according to the software/model that is loaded on it and further for some functions, like speech recognition, according to its distance from the speaker.  See also, Figure 2, “identify a second computing device that is configured to respond to a particular, predefined hotword.” 210.]
detecting, via microphones of at least one of the disparate assistant devices of the assistant device group, a spoken utterance, and [Carbune, Figure 1, Alice input of her speech.  Figure 2, “receive audio that corresponds to an utterance 220.”]
responsive to the spoken utterance being detected via the microphones of the assistant device group, causing the spoken utterance to be cooperatively locally processed by the disparate assistant devices of the assistant device group and utilizing their corresponding processing roles. [Carbune Figures 1 and 2 both show the “cooperative local processing” of the Claim.  See the example of Al’s steakhouse in Figure 1 and the example of Alice, Bob, and Carl coordinating their trip to the park and respective capabilities of their devices.  [0071]. ]

Carbune teaches all of the limitations of Claim 1 except that it does not teach moving around and purging/deleting/removing programs/models and storing a subset of the capabilities of the group on all of the devices once the group has been formed.
Slayton teaches:
in response to generating the assistant device group: [Slayton also teaches that Virtual Assistants can be grouped together:  “Virtual assistant systems ("VAs") operate on a distributed and interconnected network, such as a hierarchy or mesh, of virtual assistant platforms ("VAPs"). …  A VA may be configured to participate in a group VA in which knowledge and tasks can be shared and cooperatively executed. Cooperative execution can include distributing subtasks among VAs in the group VA, the subtasks together forming the task. Group VAs can share information with each other ….”  Abstract.  “[0071] The VA shared data store 52 may be completely or partially replicated across all devices 11 that subscribe to the VA 12….”  “16. The system of claim 1, wherein, for one or more of the users, one of the virtual assistants accessible by the user is a personal virtual assistant that is installed on an electronic device of the user….”]
causing each of the disparate assistant devices to locally store a corresponding subset of the collective set of locally stored on-device models, including causing the first assistant device to purge one or more first on-device models of the first set to provide storage space for the corresponding subset locally stored on the first assistant device and causing the second assistant device to purge one or more second on-device models of the second set to provide storage space for the corresponding subset locally stored on the second assistant device, and [Slayton, Figure 1, teaches a plurality of Virtual Assistants (VAs 12) that can be stored on a plurality of Devices 11, can be managed by an Administrator Virtual Assistant (AVA 14), and can share programs (Agents 22) and data in order to perform different tasks.  An “Agent 22” is a “program,” which teaches the “model” of the Claim.  The Agents 22 / “models” are written to or “deleted” / “purged” from the particular VAs as needed for a particular task.  “[0045] … The VA 12 may store its resources, which may include accessible agents 22, local data 19, and shared or VA-specific data 17, locally or through access to a data store maintained by the VAP 10. A VAP 10 may include an administrative virtual assistant ("AVA") 14 that is configured to manage the VAs 12 of the VAP 10. An administrator 20 may use the AVA 14 to add, delete, and configure VAs 12 according to the capabilities required of the VAP 10. Each VA 12 may perform tasks and communicate with the users 16, objects 18, other VAs 12, or other devices using one or more agents 22. An agent 22 may be an autonomous or semi-autonomous software or hardware component configured to perform a particular task, as described in more detail below.”  “[0046] Referring to FIG. 2, the VAP 10 may include an execution environment 24 configured to store and process agents 22, and further to provide VAP-implementation services 26 to the agents 22. 
VAP-implementation services 26 may enable the operation of agents 22, and therefore VAs 12, within the VAP 10 and between devices with which the VAs 12 communicate. Such services 26 may include, without limitation: an agent 22 registration service that creates, stores, searches, instantiates, manages, distributes, applies, and deletes agents 22 within a VA 12, and further tracks the agents 22 and VAs 12 with which an agent 22 may communicate; an agent 22 programming service for modifying the preprogrammed logic of the agent 22 as described below;….”  See [0079] about the download and update of an Agent 22 on a Device 11 where at least the update would automatically cause a “purge” of the previous version.]
Carbune and Slayton both pertain to grouping of digital assistants and collaborative/distributed processing of voice commands by a group of digital assistants.  It would have been obvious to combine the moving around and storing of a subset of the Agents/models of the group on the other VAs, which are stored on other devices, from Slayton with the system of Carbune in order to be able to perform the voice command at any of the devices in the group.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141; KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
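
For illustration only, the group-formation steps recited in Claim 1 (determining a collective set of models from device capabilities, causing each device to store a subset while purging other models, and assigning roles) can be sketched as follows.  The sketch is a reading aid for the claim language; it is not taken from the Application, Carbune, or Slayton.  All names (Device, plan_group, the model names, sizes, and roles) are hypothetical, and the greedy placement by free disk space is an assumption made only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    disk_space: int                              # hypothetical capacity units
    models: set = field(default_factory=set)     # models stored before grouping
    roles: list = field(default_factory=list)


def plan_group(devices, collective):
    """Place each model of the collective set on one device (greedy sketch).

    `collective` maps a hypothetical model name to (size, role).
    Returns {device name: set of models purged from that device}.
    """
    free = {d.name: d.disk_space for d in devices}
    subsets = {d.name: set() for d in devices}
    # Largest models first, each on the device with the most free space.
    ordered = sorted(collective.items(), key=lambda kv: (-kv[1][0], kv[0]))
    for model, (size, _role) in ordered:
        target = max(devices, key=lambda d: free[d.name])
        if free[target.name] < size:
            raise ValueError(f"no device can hold {model}")
        subsets[target.name].add(model)
        free[target.name] -= size

    purged = {}
    for d in devices:
        # Anything not in the device's assigned subset is purged.
        purged[d.name] = d.models - subsets[d.name]
        d.models = subsets[d.name]
        # Each role uses one of the models now stored on the device.
        d.roles = sorted(collective[m][1] for m in d.models)
    return purged
```

In this sketch, a device that loses its wake word model to another group member no longer holds a wake word detection role, which mirrors the arrangement recited in dependent Claim 2.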

Regarding Claim 2, Carbune teaches:
2. The method of claim 1, 
wherein causing the first assistant device to purge one or more first on-device models of the first set comprises causing the first assistant device to purge a first device wake word detection model, of the first set, that is utilized in detecting a first wake word; [Carbune teaches that the devices in the group identify one another based on the other device using the same hotword: “[0035] In some implementations, there may be more than two computing devices that are configured to respond to the particular hotword. Each computing device may identify the other computing devices that are configured to respond to the particular hotword, and may store the device identifiers for the other computing devices in the device group.”]
wherein the corresponding subset locally stored on the second assistant device comprises a second device wake word detection model that is utilized in detecting the first wake word; [Carbune teaches that the devices in a group including first, second, etc. devices have same wake word detection model and respond to the same wakeword.  “…  In one aspect, a method includes the actions of identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword; …”  Abstract.]
wherein assigning the corresponding processing roles comprises assigning, to the second assistant device, a first wake word detection role that utilizes the second device wake word detection model in monitoring for occurrence of the first wake word; 
wherein the spoken utterance comprises a first wake word followed by an assistant command; [Carbune: This is the definition of a wake word, wake up and receive further commands:  “[0003] Hotwords may be used in order to avoid picking up utterances made in the surrounding environment that are not directed to the system. A hotword (also referred to as an “attention word” or “voice action initiation command”) is a predetermined word or term that is spoken to invoke the attention of the system. In an example environment, the hotword used to invoke the system's attention are the words “OK computer.” When the system detects that the user has spoken the hotword, the system enters a ready state for receiving further user commands.”]
wherein, in the first wake word detection role, the second assistant device detects occurrence of the first wake word and causes performance of an additional of the corresponding processing roles in response to detecting the occurrence of the first wake word. [Carbune: This is the definition of a wake word, wake up and receive further commands and executes the command. Further, in Carbune, when one device is detected as responding the others stand down.  “[0004] In speech-enabled environments, devices may be continuously listening for hotwords. When there are multiple devices in the same environment that are configured to respond to a particular hotword, any utterance including the hotword may trigger all the devices and provide redundant responses from the multiple devices. ….”  “[0008] … determining that the audio data that corresponds to the utterance includes the particular, predefined hotword; and receiving data indicating that the second computing device is responding to the audio data.” ]
The addition and deletion of models/programs/software is taught by Slayton as applied to Claim 1.  Further, Slayton teaches the use of Natural Language Commands and also mentions Siri in the Background, which is a wake word for the Apple™ personal assistant.  “[0049] In some embodiments, message formatting and processing within the execution environment 24 may be implemented by a natural language processing pipeline. Natural language commands comprise phrases typically input by a user 16 and parsed according to sentence structure and parts of speech. The VAP API 28 or other elements of the execution environment 24 may be configured to execute the processing pipeline to determine the nature of the commands and distribute tasks and data to the appropriate agents 22…. analyzed to determine if multiple commands are present, such as when a user 16 enters a multiple-step script for one or more agents 22 to follow.”  For Siri see [0003]-[0005].  
Rationale for combination as provided for Claim 1.

Regarding Claim 3, Carbune teaches and the combination of teachings suggest:  
3. The method of claim 2, wherein the additional of the corresponding processing roles is performed by the first assistant device, and wherein the second assistant device causes performance of the additional of the corresponding processing roles by [Carbune, Figure 1, 102, 132 and Figure 2, 230, 250 teach that when the wake word “OK Computer” is detected by both devices 104a and 104b, one device lets the other device perform a part of the task: “[0064] In stage H, computing device 104b processes the audio data corresponding to the utterance 102 and the transcription 132 of the additional audio data provided by computing device 104b [This should be 104a, the arrow in Figure 1 goes from 104a to 104b], and generates a new transcription that corresponds to a response….”  “[0067] In stage I, the computing device 104b provides the generated output in response to the utterance 102 and the additional audio data provided by computing device 104a….”]
transmitting, to the first assistant device, an indication of detection of the first wake word. [Carbune, Figure 1, teaches that the server may detect the hotword and then send notification to the one or more of the computing devices.  “[0042] In some implementations, one or more of the computing devices sends the processed audio data to a server and the server computes a hotword confidence score. In this instance, the server includes a hotworder similar to hotworders 122 and 124. The hotworder on the server may determine that the utterance 102 includes the hotword and sends the notification to the one or more computing devices.”  Several locations in Carbune equate the Server with a Second Computing Device: [0018], [0019] and “[0045] In some implementation, the audio data from the computing devices may be sent to a server….”  Further, the server may be one of the devices:  “[0100] … [0045] In some implementation, the audio data from the computing devices may be sent to a server….”  Accordingly, the server detecting the Hotword and then transmitting the hotword to one of the devices is taught or suggested by the combination of teachings of Carbune.]
In this Claim, one device detects that the wake word has been received and tells another device to execute the command.  In Carbune, generally both devices receive the wake word, as shown by B in Figure 1.  However, in some configurations, the server may receive and detect the hotword and notify a device, and in some configurations, the server may be a mobile device.  Accordingly, one device detecting the hotword and telling another device is taught or suggested by Carbune.
Slayton does not address wakeword exchange between devices.  But, Slayton teaches that any device can be a server.  “[0071] The VA shared data store 52 may be completely or partially replicated across all devices 11 that subscribe to the VA 12. Through this redundancy of shared data, processing may be partially or fully decentralized as agents 22 on any server or device 11 may operate autonomously upon the shared data it requires to do so. …”
Rationale for combination as provided for Claim 1.
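
For illustration only, the arrangement of Claims 2 and 3 (one device holds the wake word detection role and transmits an indication of detection to another device, which performs the additional role) can be sketched as follows.  The sketch is not taken from the Application or the cited references.  The class and method names are hypothetical, and text transcripts stand in for audio; an actual wake word detector would operate on audio frames using a detection model.

```python
class AssistantDevice:
    def __init__(self, name, wake_word=None):
        self.name = name
        # None models a device whose wake word model has been purged.
        self.wake_word = wake_word
        self.peers = []   # other devices of the group to notify
        self.log = []     # indications received from other devices

    def on_audio(self, transcript):
        """Wake word detection role: only a device holding the model reacts."""
        if self.wake_word is None:
            return False
        text = transcript.lower()
        if not text.startswith(self.wake_word):
            return False
        command = text[len(self.wake_word):].strip(" ,")
        # Transmit an indication of detection to the other group members.
        for peer in self.peers:
            peer.on_wake_indication(self.name, command)
        return True

    def on_wake_indication(self, sender, command):
        """Additional processing role, triggered by the indication."""
        self.log.append((sender, command))
```

For example, if the second device is constructed with the wake word "ok computer" and lists the first device as a peer, the utterance "OK Computer, find me a good steakhouse" is detected only by the second device, and the first device receives the indication together with the command text.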

Regarding Claim 8, Carbune teaches that each device has a “speech recognizer 130/158” and the fact that the devices can make sense of the natural language question posed by the user Alice implies that they do include NLP capability.  “[0056] In some implementations, determining which device initially responds to a user utterance may be done by analyzing the semantics of the command or the query included in the utterance and correlating it with the state and information of the computing devices. For example, if the query is “Who am I meeting with at two o'clock?” the speech-enabled system may determine that the query is intended for the computing device which is synchronized with a calendar, and has an appointment at two o'clock.”
However, Carbune does not discuss NLP features of the devices.
This Claim is impliedly about adapting a natural language understanding model.
Slayton teaches:
8. The method of claim 1, 
wherein the corresponding subset locally stored on the first assistant device comprises one or more pre-adaptation natural language understanding models that are utilized in performing semantic analysis of natural language input, wherein the one or more initial natural language understanding models occupy a first quantity of local disk space at the first assistant device; [Slayton teaches that it has the capability for receiving natural language input and determining the commands from the natural language input.  See [0049].]
wherein the corresponding subset locally stored on the first assistant device comprises one or more post-adaptation natural language understanding models that include at least one additional natural language understanding model that is in addition to the one or more initial natural language understanding models, wherein the one or more post-adaptation natural language understanding models occupy a second quantity of the local disk space at the first assistant device, the second quantity being greater than the first quantity. [Slayton also teaches that its agents/software/models can learn and improve and adapt.  When a model is “capable of learning new tasks” generally (if not inherently) the storage capacity that it requires expands.  If the learning is accompanied with optimization then the storage required may not go up but learning “new tasks” without ditching the old and without some type of optimization requires more space.  “[0053] The agent 22 may comprise a processing module 32 and an agent data store 34 that may be accessed and modified by the processing module 32. The processing module 32 may comprise preprogrammed logic that defines the behavior of the agent 22. The preprogramming logic may include one or more algorithms, implemented with hardware or software modules, for processing input, deciding what action to take, if any, based on the input, and generating output according to the selected action. The behavior of the agent 22 may have a particular degree of complexity. In some embodiments, the agent 22 may be an intelligent agent capable of choosing and taking action in pursuit of accomplishing one or more tasks or subtasks. 
The agent 22 may further be capable of learning, in that the logic and its algorithms may change over time in light of input, output, and/or data in the agent data store 34.”  “[0056] According to the algorithms, rules, and data provided to it, the agent 22 may perform one or more tasks or subtasks, and may be dedicated to such tasks or subtasks or may be capable of learning new tasks or subtasks to perform. As non-limiting examples, an agent 22 may perform: speech recognition; text-to-speech conversion ….”]
Carbune and Slayton pertain to grouping of digital assistants and collaborative/distributed processing of the voice commands by a group of digital assistants and both utilize NLP to understand the received natural language command and it would have been obvious to combine the Learning/Adapting feature of Slayton with the NLP of Carbune to permit for expansion of capabilities of the NLP feature.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 9, Carbune teaches that each device has a “speech recognizer 130/158” and the fact that the devices can make sense of the natural language question posed by the user Alice implies that they do include NLP capability.  However, Carbune does not discuss NLP features of the devices.
Slayton teaches and the teachings suggest:
9. The method of claim 1, 
wherein the corresponding subset locally stored on the first assistant device comprises a first device natural language understanding model that is utilized in semantic analysis for one or more first classifications, and excludes any natural language understanding model that is utilized in semantic analysis for a second classification; and [Slayton teaches the storing of Natural Languages in the VAs and their corresponding devices.  See [0049].  Slayton also teaches the use of an Ontology which defines different Domains (Classifications of the Claim).  See [0054]-[0055].  The two teachings together suggest that the various Domains of the Ontology may correspond to different Natural Languages.   “[0049] In some embodiments, message formatting and processing within the execution environment 24 may be implemented by a natural language processing pipeline. Natural language commands comprise phrases typically input by a user 16 and parsed according to sentence structure and parts of speech. The VAP API 28 or other elements of the execution environment 24 may be configured to execute the processing pipeline to determine the nature of the commands and distribute tasks and data to the appropriate agents 22. In addition to user 16 input, natural language syntax may be used for communications between agents 22 in place of or in addition to artificial programming protocols….”  “[0054] The agent data store 34 may comprise one or more agent knowledge stores and one or more agent file stores. An agent knowledge store may include one or more ontologies. An ontology may be understood herein to mean a collection of data that defines the scope and procedures by which agents 22 may perform tasks. An ontology may contain facts, rules, and other types of structured and unstructured information typically found in a knowledge base. 
Data in an ontology may be unstructured or may be organized into files, databases, hierarchies, and the like…..”  “[0055] The agent 22 may update each ontology through receipt of input or other processing, or the VA 12 may update each ontology, such as when a software, firmware, or hardware upgrade is propagated in the VAP 10. The rules of each ontology may be organized into one or more rule sets that are interpreted by the processing module 32 in order for the agent 22 to perform tasks. Rules may be added, removed, or changed within each rule set as needed for the agent 22 to perform its tasks or subtasks….”  “[0060] … At the opposite end of the hierarchy, a domain ontology may govern access to a particular element or set of elements in the system (i.e., a domain) by defining the logic and data for the domain. A domain may be a file, a database or set of databases, an agent or set of agents, an object or set of objects, a VA (e.g., any VA described herein), etc. An upper ontology may be an ontology that defines logic and data for a set of domains. Definitions from higher ontologies may pass by inheritance to ontologies below within the hierarchy.”] 
wherein the corresponding subset locally stored on the second assistant device comprises a second device natural language understanding model that is utilized in semantic analysis for at least the second classification. [Slayton teaches the use of an Ontology with a Hierarchy of Domains/ Classifications and also teaches the use of Natural Language Understanding for deciphering complex commands.  The two teachings together suggest that each Domain/Classification may be subjected to a specialized Natural Language Understanding Model and Slayton teaches that the Rules of domains of an ontology are added or removed:   “[0055] … Rules may be added, removed, or changed within each rule set as needed for the agent 22 to perform its tasks or subtasks….”]
Rationale for combination as provided for Claim 1.

Regarding Claim 10, Carbune teaches and suggests:
10. The method of claim 1, wherein the corresponding processing capabilities for each of the disparate assistant devices of the assistant device group comprise a corresponding processor value based on capabilities of one or more on-device processors, a corresponding memory value based on size of on-device memory, and a corresponding disk space value based on available disk space. [Carbune, Figure 3, “[0096] The computing device 300 includes a processor 302, a memory 304, a storage device 306, a high-speed interface 308 connecting to the memory 304 and multiple high-speed expansion ports 310, and a low-speed interface 312 connecting to a low-speed expansion port 314 and the storage device 306….”  The “values” of this Claim are features known in the art.  The size of the memory, the amount of empty disk space, and the processing power of the processors determine their respective “values.”]

Regarding Claim 11, Carbune teaches:
11. The method of claim 1, wherein generating the assistant device group of disparate assistant devices is in response to user interface input that explicitly indicates a desire to group the disparate assistant devices. [Carbune, “[0077] … For example, a user may be setting up a home assistant and part of the set up process is to search for other computing device that are nearby and that respond to hotwords….” “[0036] … In some examples, the computing devices may be co-located virtually, e.g., when the computing devices participate in a telephone or video conference.”  This teaching of participants in a conference suggests a group explicitly indicated by a user (person inviting for the conference).  “[0032] …In this case, Alice may be logged into computing device 104a and Bob may be logged into computing device 104b. Alice and Bob may be a part of a group of users with associated devices that may be configured to respond to a particular, predefined hotword. The group of users may be a group of co-workers at a company, or a group of friends. ….”]

Regarding Claim 12, Carbune teaches:
12. The method of claim 1, wherein generating the assistant device group of disparate assistant devices is performed automatically in response to determining the disparate assistant devices satisfy one or more proximity conditions relative to one another. [Carbune, “[0036] In some implementations, the computing devices may be co-located such that they share a same location or place. The computing devices may be within a predetermined distance of each other, or within the same room. The computing devices may be in the same acoustic environment. ….”  “[0037] … computing devices may be located within a particular distance of each other, such as ten meters, as determined by GPS or signal strength…..”]
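For illustration only, a proximity condition of the kind quoted from Carbune (devices within a particular distance of each other, such as ten meters) can be sketched as a pairwise distance check.  The Python sketch below assumes that each device position is already available as planar coordinates in meters; the function name and the data layout are hypothetical and are not taken from Carbune.

```python
import math
from itertools import combinations


def within_proximity(positions, max_distance_m=10.0):
    """Return True when every pair of devices is within max_distance_m.

    positions maps a device name to hypothetical (x, y) coordinates in meters.
    """
    return all(
        math.dist(a, b) <= max_distance_m
        for a, b in combinations(positions.values(), 2)
    )
```

Under this sketch, a group would be formed automatically whenever within_proximity returns True for the candidate devices.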

Regarding Claim 13, Carbune does not address the scenario of leaving the group.
Slayton teaches and suggests (the last limitation is suggested):
13. The method of claim 1, further comprising, subsequent to assigning the corresponding processing roles to each of the disparate assistant devices of the assistant device group: [ Slayton teaches that the Agents 22 may or may not be located on a particular device:  “[0063] Agents 22 or dependent agents 44 of the VA 12 may engage a device 50 by communicating with the device client 30, and therefore the agents may not be instantiated on the device 50 itself. …  Alternatively or additionally, one or more agents may be instantiated on the device 50 itself, so that a network connection to the agents' location is not required. ….” Slayton, Figure 8, teaches Devices 1 to N that subscribe to VA 12.]
determining that the first assistant device is no longer in the group; and: [Slayton teaches that if a user who is part of a work group and therefore part of a group of VAs leaves, the information on his VA (and therefore his device) is transferred:  “[0092] …When the user 16 gets promoted or leaves the company all this data can be easily transferred to his successor. …”]
in response to determining that the first assistant device is no longer in the group: causing the first assistant device to supplant the corresponding subset, locally stored on the first assistant device, with the first on-device models of the first set. [Slayton teaches deleting the current agents/software; Slayton teaches updating the agents when a new version becomes available; the two teachings suggest that once a device has left the group, its software has the option of reverting to its original configuration.  “[0045] …An administrator 20 may use the AVA 14 to add, delete, and configure VAs 12 according to the capabilities required of the VAP 10. Each VA 12 may perform tasks and communicate with the users 16, objects 18, other VAs 12, or other devices using one or more agents 22. An agent 22 may be an autonomous or semi-autonomous software or hardware component configured to perform a particular task, as described in more detail below.”  Figures 1 and 2.  “[0046] … Such services 26 may include, without limitation: an agent 22 registration service that creates, stores, searches, instantiates, manages, distributes, applies, and deletes agents 22 within a VA 12 ….”  “[0055] The agent 22 may update each ontology through receipt of input or other processing, or the VA 12 may update each ontology, such as when a software, firmware, or hardware upgrade is propagated in the VAP 10….”  “[0078] Use of the agent 22 within a VA 12 may include discovery, delivery, and updating of the agent 22….”  “[0079] … Subsequent updating of the agent 22 may be required when a new version of the agent template becomes available. The AVA 14, through the agent store agent 90, may notify any VA 12 that had previously downloaded the agent 22 that a new version of the agent 22 is available. The VA 12 may then initiate a download and installation of the new version as described above.”]
Rationale for combination as provided for Claim 1.

Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Carbune and Slayton in view of Stevans (U.S. 20180108343).
Regarding Claim 4, Carbune teaches that the devices in the group respond to the same wakeword (see [0034] same hotword) and Slayton does not address particulars of wakewords.  However, the “agents” / programs /models of Slayton may very well be wake word detection models and this Claim repeats the requirement of each assistant device having its own set of models in the particular case that the model is a wake word detection model.
Stevans teaches:
4. The method of claim 1, 
wherein the corresponding subset locally stored on the first assistant device comprises a first device first wake word detection model that is utilized in detecting one or more first wake words, and excludes any wake word detection model that is utilized in detecting one or more second wake words; [Stevans teaches a two-step process in which one of multiple types of wake words is first detected by “wake-up phrase spotting” and the remainder of the utterance is then sent to a server for conducting ASR in one of several languages.  The wake word / “wake-up phrase” determines the language in which the Command following the Wakeword is intended to be.  Thus, each Virtual Assistant would have its own wake word, which can be considered a “wake word detection model.”  See Figure 1, “Phrase Spotting” followed by sending to “Server ASR.”  “A speech-enabled dialog system responds to a plurality of wake-up phrases. Based on which wake-up phrase is detected, the system's configuration is modified accordingly….”  Abstract.  “[0006] FIG. 1 illustrates a state diagram for a speech-enabled virtual assistant operating on a mobile phone. The virtual assistant initially runs a phrase spotter. If it detects a wake-up phrase and the phone has a network connection to a remote ASR server, it begins sending captured audio to the server to perform ASR using a full language vocabulary….”]
wherein the corresponding subset locally stored on the second assistant device comprises a second device second wake word detection model that is utilized in detecting the one or more second hot words, and excludes any wake word detection model that is utilized in detecting the one or more first wake words; [Stevans, Figure 1, as provided above, Stevans runs on multiple wake words such that each virtual assistant can get its own wake word: “[0011] Aspects of the present invention are directed to systems for, and methods of, configuring the behavior of a virtual assistant in response to distinct wake-up phrases. Accordingly, embodiments of the invention spot a plurality of wake-up phrases.”]
wherein assigning the corresponding processing roles comprises assigning, to the first assistant device, a first wake word detection role that utilizes the first device wake word detection model in monitoring for occurrence of the one or more first wake word; and [Stevans, Figure 1, “[0010] … To select the appropriate vocabulary for any given utterance, the phrase spotter listens for multiple wake-up phrases. One wake-up phrase causes the system to parse the full utterance in a first language vocabulary, another wake-up phrase causes the system to parse the full utterance in a second language, etc. For example, a virtual assistant could respond to “OK, Hound” by performing an English language parse of the full utterance, ….”]
wherein assigning the corresponding processing roles comprises assigning, to the second assistant device, a second wake word detection role that utilizes the second device wake word detection model in monitoring for occurrence of the one or more second wake words. [Stevans, Figure 1, “[0010] … the assistant could respond to “Anyong, Hound” by performing a Korean language parse of the full utterance (“Anyong” being the Romanized transliteration of the Korean word meaning “Hello”)….”]
Stevans does not teach that different keyword spotters are present on different devices.  However, the system of Slayton can send any program/model/agent to any device for which it is desirable for the particular issue.  Each version of the Keyword spotter is a different Agent within the meaning of Slayton.
Carbune/Slayton and Stevans pertain to Virtual Assistants and it would have been obvious to modify the system of the combination, which includes various Agents for performing different tasks that can be present on different devices, with the system of Stevans such that some of the Agents consist of the keyword spotter of Stevans that responds differently to different keywords, and to send each version of the keyword spotter responding to a particular language to a different device.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
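For illustration only, the proposed combination can be pictured as each device holding a spotter for only its own wake-up phrase, with the detected phrase selecting the language of the subsequent parse.  The Python sketch below uses the two example phrases quoted from Stevans at [0010]; the text-prefix matching, the function names, and the per-device phrase lists are hypothetical simplifications and do not reflect any actual implementation in Stevans or Slayton.

```python
# Wake-up phrase to parse language, per the examples quoted from Stevans [0010].
WAKE_PHRASE_LANGUAGE = {
    "ok, hound": "English",
    "anyong, hound": "Korean",
}


def spot_wake_phrase(utterance, phrases):
    """Return the first phrase in `phrases` that begins the utterance, else None.

    Matching on transcribed text is an assumption of this sketch; a real
    phrase spotter would operate on audio.
    """
    text = utterance.strip().lower()
    for phrase in phrases:
        if text.startswith(phrase):
            return phrase
    return None


def language_for(utterance, device_phrases):
    """Language selected on a device that stores only `device_phrases`."""
    phrase = spot_wake_phrase(utterance, device_phrases)
    return WAKE_PHRASE_LANGUAGE[phrase] if phrase else None
```

A device given only the phrase list ["ok, hound"] therefore ignores an utterance that begins with the other wake-up phrase, which mirrors the “excludes” language of the Claim.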

Regarding Claim 5, Carbune teaches that the devices in the group respond to the same wakeword (see [0034] same hotword) and Slayton does not address particulars of wakewords.
This Claim pertains to the situation where each VA performs speech recognition in a different natural language.
Stevans teaches:
5. The method of claim 1, 
wherein the corresponding subset locally stored on the first assistant device comprises a first language speech recognition model that is utilized in performing recognition of speech in a first language, and excludes any speech recognition model that is utilized in recognizing speech in a second language; [Stevans, Figure 1, Abstract, and [0010], e.g., where each wake-up phrase invokes a different speech recognition model for recognition of a different language at the server.  The speech recognition models in different languages are not locally stored and are at the server.]
wherein the corresponding subset locally stored on the second assistant device comprises a second language speech recognition model that is utilized in performing recognition of speech in a second language, and excludes any speech recognition model that is utilized in recognizing speech in the second language; [Stevans, “[0012] Beyond using different wake-up phrases to invoke speech recognition using different language vocabularies, embodiments of the invention support wake-up phrase selection to vary many other features or components of a virtual assistant, as well as attributes of such components. …”]
wherein assigning the corresponding processing roles comprises assigning, to the first assistant device, a first language speech recognition role that utilizes the first language speech recognition model in performing recognition of speech in the first language; and  [Stevans  “[0010] …To select the appropriate vocabulary for any given utterance, the phrase spotter listens for multiple wake-up phrases. One wake-up phrase causes the system to parse the full utterance in a first language vocabulary, another wake-up phrase causes the system to parse the full utterance in a second language, etc. For example, a virtual assistant could respond to “OK, Hound” by performing an English language parse of the full utterance,…”]
wherein assigning the corresponding processing roles comprises assigning, to the second assistant device, a second language speech recognition role that utilizes the second language speech recognition model in performing recognition of speech in the second language. [Stevans  “[0010] … and the assistant could respond to “Anyong, Hound” by performing a Korean language parse of the full utterance (“Anyong” being the Romanized transliteration of the Korean word meaning “Hello”). Such an assistant system determines the desired input language based on a small vocabulary of, known wake-up phrases, and processes the subsequent utterance accordingly.”]
Stevans does not teach that different speech recognition models are present on different devices.  Rather, all of the models are on the Server and the wake-up phrase triggers the selection of the proper model.  However, the system of Slayton can send any program/model/agent to any device for which it is desirable for the particular issue.  Each speech recognition model in a different language is a different Agent within the meaning of Slayton.
Rationale for combination as provided for Claim 4.

Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Carbune and Slayton in view of Georges (U.S. 20150058018).
Regarding Claim 6, Carbune does teach sending the task that is received at Device 104a to another Device 104b that is in the group/cluster of devices so that the second device responds to the task.  See Figures 1 and 2 of Carbune.  Carbune, as applied to Claim 7, also teaches this division of tasks between different devices.
Slayton also teaches sending the portions of tasks to different Agents which may be on different devices.
This Claim pertains to a distributed speech recognition situation where the VA devices each perform a portion of the recognition task.
Georges teaches:
6. The method of claim 1, 
wherein the corresponding subset locally stored on the first assistant device comprises a first portion of a speech recognition model that is utilized in performing a first portion of recognition of speech, and excludes a second portion of the speech recognition model; [Georges, Figure 1, “perform first speech processing pass, 120.” Figure 4 showing 3 devices in communication in a network of devices.  “[0047] FIG. 1 shows an illustrative process 100 for recognizing mixed-content speech having natural language and domain-specific content, in accordance with some embodiments. As described above, process 100 may be implemented using one device or multiple devices in a distributed manner. For example, process 100 may be implemented using any of the system configurations described in connection with the illustrative environment 400 described with reference to FIG. 4 below or using other configurations, as the multi-pass techniques are not limited for use with any particular system or system configuration, or with any particular environment.” ]
wherein the corresponding subset locally stored on the second assistant device comprises the second portion of the speech recognition model that is utilized in performing the second portion of recognition of speech, and excludes the first portion of the speech recognition model; [Georges, Figure 1, “perform second speech processing pass, 130.”  As provided at [0047], the second pass may be implemented on a second device.]
wherein assigning the corresponding processing roles comprises assigning, to the first assistant device, a first portion of a language speech recognition role that utilizes the first portion of the speech recognition model in generating a corresponding embedding of corresponding speech, and transmitting the corresponding embedding to the second assistant device; and [Georges, Figure 1, 125. “[0067] Performing the first speech processing pass (act 120) may result in first pass results 125 comprising natural language recognition results and information identifying one or more portions of speech input 115 containing domain-specific content (e.g., one or more tags)….”  “[0065] An example of embedding a filler model into a search network is described below using conventional finite state automata notation. In particular, to embed into the search network a transducer G' representing a filler model corresponding to a tag, source and destination states of transitions in the search network having the tag as an output label may be marked. Those transitions may have a .delta. input label to prevent being eliminated during determinization and minimization. The transducer G' is embedded using an .epsilon. transition, if a marked state is reached during decoding.….”]
wherein assigning the corresponding processing roles comprises assigning, to the second assistant device, a second language speech recognition role that utilizes the corresponding embedding, from the first assistant device, and the second language speech recognition model in generating a corresponding recognition of the corresponding speech. [Georges, Figure 1, 125 is received by the second stage and the output 135 is generated: “[0079] As may be appreciated from the above discussion, performing the second speech processing pass (act 130) may result in second pass results comprising recognition results for the speech input 115 including recognition results for the natural language and domain-specific portions of speech input 115. That is, second pass results 135 may include the 1-best recognition for speech input 115.”]
Carbune/Slayton and Georges include speech recognition and it would have been obvious to modify the system of the combination, which includes various Agents for performing different tasks that can be present on different devices, with the system of Georges, which is directed to distributed speech recognition and states that each portion of the recognition can be conducted on a different device, to match the processing power to the load by distributing the task.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 7, Carbune teaches:
7. The method of claim 1, 
wherein the corresponding subset locally stored on the first assistant device comprises a speech recognition model that is utilized in performing a first portion of recognition of speech; [Carbune, Figure 1, each device such as 104a includes a “Hotworder 122” which is used for recognizing the “hotword” such as “Ok, Computer 126.”  “[0040] In stage C, the respective audio subsystem 118 and 120 of each computing device 104a and 104b provide the processed audio data to respective hotworders 122 and 124. The respective hotworders 122 and 124 compare the processed audio data to known hotword data and compute respective hotword confidence scores that indicate the likelihood that the utterance 102 includes a hotword 126.”  Additionally, Figure 1, “speech recognizer 130” of 104a recognizes the input speech “find me a good steakhouse nearby for lunch 102.”]
wherein assigning the corresponding processing roles comprises assigning, to the first assistant device, a first portion of a language speech recognition role that utilizes the speech recognition model in generating output, and transmitting the corresponding output to the second assistant device; and [Carbune, Figure 1, “Audio transcription 132” (G) is being sent from 104a to 104b.  “[0062] In stage G, computing device 104b {this is a typo and should be 104a as shown in Figure 1} sends the transcription 132 that corresponds to the response to computing device 104b and any other identified computing devices.”  Additionally, the output generated by the 104a is “Al’s steakhouse at 123 Main Street has good reviews 138” which is heard by / “transmitted to” the device 104b] [In Claim 7 it is not clear what the “output” is. Is it the output of the “speech recognition model” as recognized text or transcript? Is it the response of the “assistant device” to the input command? The Claim says “utilizes the speech recognition model in generating output.”  It does not say that the output is the output of the speech recognizer operating on input speech.  Every output is generated “utilizing” the recognizer because the device has to understand the command in order to respond.]
wherein assigning the corresponding processing roles comprises assigning, to the second assistant device, a second language speech recognition role that performs a beam search, on the corresponding output from the first assistant device, in generating a corresponding recognition of the corresponding speech. [Carbune, Figure 1, the “second assistant device” / “104b” is receiving the output 138 of the first device 104a via its “microphone 116” and recognizing it with “speech recognizer 158” of 104b which prompts device 104b to generate its own output 148 in response to the output 138 of 104a.]
Carbune does not teach a “beam search” being performed by a second stage of speech recognition.
Neither does Slayton.
Georges as applied to Claim 6 teaches the two stages/passes of speech recognition and that they could each be implemented on a different device (Figures 1 and 4) and further teaches:
wherein assigning the corresponding processing roles comprises assigning, to the second assistant device, a second language speech recognition role that performs a beam search, on the corresponding output from the first assistant device, in generating a corresponding recognition of the corresponding speech. [Georges, “[0074] The second speech processing pass may be performed at least in part by using the results of the first speech processing pass. In this respect, some embodiments may include evaluating the likelihoods of various hypotheses, each hypothesis including potential recognition results for the natural language portions(s) of speech input 115 (obtained during the first pass at act 120) and potential recognition results for the portions of speech input 115 identified as having domain-specific content. As in the first pass, the evaluation of such hypotheses may be performed by using dynamic programming (e.g., a Viterbi beam search, a token-passing time-synchronous Viterbi beam search, etc.) and/or any other suitable technique.”  “[0041] … Additionally, such a multi-pass approach allows a natural language recognizer and a domain-specific recognizer to focus on recognizing speech for which they were adapted, allows recognition parameters to be optimally tuned for each speech processing pass and permits different decoding schemes to be used for each speech processing pass if desired (e.g., a Viterbi beam search for the first speech processing pass and a conditional random field (CRF) classifier for the second speech processing pass).”]
Carbune/Slayton and Georges pertain to multiple speech recognition processes, and it would have been obvious to use the beam pruning of Georges, which is a known process in speech recognition, in the system of the combination as one method of pruning the results of the previous pass/stage of speech recognition and arriving at the 1-best output.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
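For illustration only, beam pruning over the output of a prior pass can be sketched as follows.  The Python sketch below is a generic beam search and is not the Viterbi beam search of Georges; it assumes, hypothetically, that the first pass supplies one dictionary per time step mapping candidate tokens to log-probabilities, and the function name and data layout are likewise assumptions.

```python
import math


def beam_search(step_scores, beam_width=2):
    """Keep the beam_width best-scoring hypotheses after each time step.

    step_scores is a list of dicts, one per step, mapping a candidate token
    to its log-probability (hypothetical first-pass output).
    Returns a list of (token_tuple, total_log_prob), best first.
    """
    beams = [((), 0.0)]  # start with one empty hypothesis
    for scores in step_scores:
        # Extend every surviving hypothesis with every candidate token.
        candidates = [
            (seq + (tok,), total + lp)
            for seq, total in beams
            for tok, lp in scores.items()
        ]
        # Prune: retain only the highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams
```

The first entry of the returned list corresponds to the 1-best output referred to above; the remaining entries are the hypotheses that survived pruning.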

Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Carbune and Slayton in view of Phipps (U.S. 20180332118).
Regarding Claim 14, Carbune teaches that “[0032] …The group of users may be a group of co-workers at a company, or a group of friends. ….”  
Slayton teaches that the VA members of a Group VA perform the same general task (HR of a company) or are targeted with the same information (employees): Figure 7: “[0092] All of the company VAs 1124, 1126, 1128 may be implemented on the same VAP 1130 or on different VAPs. …  In one example, the company VA 1128 configured to communicate directly with the employee user's 16 VA 12 may be a human resources group VA that provides the employee with access to company resources and information, delivers urgent notifications to the user's 16 device, and traces employee physical locations. In another example, The company VA 1128 that directly communicates with the VA 12 may be an employee VA (EVA) that is designed to assist humans in a specified company role or position. When a new person is hired, his VA 12 is connected to the EVA, allowing all the company related information to be delivered and stored at the user's 16 EVA and not his personal VA 12. ….”
Phipps teaches:
14. The method of claim 1, wherein determining the collective set is further based on usage data that reflects past usage at one or more of the assistant devices of the group. [Phipps teaches that it uses “contextual information” that includes “resources usage” data (“[0085] … past and present network activities, background services, error logs, resources usage, etc….”) to help infer the user’s intent, which in turn determines which models/programs need to be used to respond to the user.  [0084].]
Carbune/Slayton and Phipps pertain to grouping of digital assistants and collaborative/distributed processing of the voice commands by a group of digital assistants and it would have been obvious to use the history of the use of the devices by a user to decide a future grouping or clustering configuration as is commonly done in the industry.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 15, Carbune does not address the capabilities of the devices.
Slayton teaches providing shared capabilities to a plurality of virtual assistants.  [0007].  Slayton teaches that the VAs are managed to have particular capabilities.  [0045].  Slayton teaches that the computing power of devices 11 registered with the group VAs is considered and exchanged between the VA pools:  “[0088] …Other subscriber VAs 79 may automatically acquire resources from the pool to perform complex computing tasks, such as to perform calculations for weather forecasting, medical data review, and the like. Users 16a . . . n may be able to store some of their data on other devices 11 that have excess capacity ….”  See also [0097].  Note that VAP is one or more devices:  “[0044] Referring to FIGS. 1 and 2, a virtual assistant platform ("VAP") 10 may be a computing hardware or software framework, or a combination thereof, that provides the computational resources in at least one computer or computing device upon which one or more virtual assistants ("VAs") 12 may operate….”
Slayton teaches providing VA capabilities to groups of devices according to their computing capability and, obviously, if a device does not have the processing/computing power, it will not receive a set of capabilities.
Phipps expressly teaches:
15. The method of claim 14, wherein determining the collective set comprises: 
determining, based on the corresponding processing capabilities for each of the disparate assistant devices of the assistant device group, multiple candidate sets that are each capable of being collectively locally stored and collectively locally utilized by the assistant devices of the group; and [Phipps, Figure 8 teaches that after a user request has been determined (802), a set of data corresponding to a second instance of the VA on a second device is obtained (804) and the settings of the device are updated based on the received data (806).  This set of data can be on both the first device and the second device, namely on “assistant devices of the group”:  “[0254] Exemplary sets of data corresponding to the second instance of the digital assistant are described below. As discussed below, the set of data can include settings (e.g., language-related settings, acoustic settings, user preferences) of the second instance of the digital assistant (e.g., electronic device 1022), …. As described in further detail below, the set of data can be used by the first electronic device to provide consistent operation of the digital assistant across the first electronic device and the second electronic device.”  “[0255] In some examples, obtaining the set of data corresponding to the second instance of the digital assistant comprises obtaining settings corresponding to the second instance of the digital assistant….”  “[0260] In some examples, to avoid duplicate presentation of the same content by multiple instances of the digital assistant, obtaining the set of data corresponding to the second instance of the digital assistant comprises obtaining information corresponding to content presentable by the digital assistant. The information can include information related to a set of suggestions (e.g., tips) presentable by the digital assistant regarding various capabilities of the digital assistant….”]
selecting, based on the usage data, the collective set from the candidate sets. [Phipps teaches that it uses “contextual information” that includes “resources usage” data to help infer the user’s intent/domain, which in turn determines which models/programs need to be used to respond to the user.  [0084]-[0085].  For the domain ontology see Figure 7C.]
Rationale for combination as provided for Claim 14.  This Claim depends from Claim 14 and further limits a limitation of Claim 14.
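For illustration only, the two steps of Claim 15 (determining candidate sets from device capabilities, then selecting among them using usage data) can be sketched as follows.  The Python sketch below is not drawn from Phipps or Slayton; the function names, the use of a single pooled storage budget, and the usage-count scoring are hypothetical simplifications.  In particular, pooling capacity ignores the fact that a single model cannot be split across devices.

```python
from itertools import combinations


def candidate_sets(model_sizes, device_capacities):
    """Return every non-empty model set whose total size fits the pooled capacity.

    model_sizes maps model name -> size; device_capacities maps
    device name -> available storage, in the same (hypothetical) units.
    """
    budget = sum(device_capacities.values())
    names = sorted(model_sizes)
    result = []
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            if sum(model_sizes[m] for m in combo) <= budget:
                result.append(frozenset(combo))
    return result


def select_by_usage(candidates, usage_counts):
    """Pick the candidate set whose models were used most often in the past."""
    return max(candidates, key=lambda s: sum(usage_counts.get(m, 0) for m in s))
```

Under these assumptions, the capabilities of the devices limit which sets are candidates, and the past usage data alone decides which candidate becomes the collective set.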
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Lochan (U.S. 20160378455) Figures 5A, 5B and Abstract: “A first electronic device stores in its memory one or more installation files for a first version of an application. A local connection is established between the first electronic device and a second electronic device. The local connection is independent of Internet connectivity. Using the local connection, a determination is made as to whether the first version of the application is installed on the second electronic device. In response to a determination that the first version of the application is not installed on the second electronic device, the first electronic device sends, to the second electronic device through the local connection, the one or more installation files for the first version of the application.”
Nadkar (U.S. 20180122372):  Multiple Wake Words each invoking a different service:  “[0042] The wake component 304 determines a plurality of wake words for voice assistant services. For example, the wake component 304 may maintain a list of wake words for a plurality of voice assistant services that are currently available, as determined by the tracking component 302. The wake words may include a first wake word for a first voice assistant service and a second wake word for a second voice assistant service and so forth for each voice assistant service. The wake component 304 may store or determine one or more wake words for each service. For example, some voice assistant services may have more than one wake word. In one embodiment, the wake component 304 maintains a list of wake words that correspond to currently available voice assistants.”
Liu (U.S. 20210183397) : “In one embodiment, a method includes by a client system associated with a user, receiving, at the client system associated with the user, a user input, parsing the user input to identify an n-gram associated with a wake word from a plurality of wake words corresponding to a plurality of assistant systems associated with the client system, wherein each assistant system provides a particular set of functions, determining that the wake word corresponds to a first assistant system of the plurality of assistant systems, wherein the first assistant system provides a first set of functions, sending, to the first assistant system, a request to set an assistant xbot of the first assistant system into a listening mode, and receiving, from the first assistant system, an indication that the assistant xbot is in listening mode responsive to a determination that the user has permission to access the first assistant system.”
Baker (U.S. 20040148164): Figure 4 shows that the hypotheses produced at 440 by the second speech recognition search process (the first for purposes of the Claim) are sent to the first speech recognition search process, which at 460 prunes them using a beam search with a different beamwidth.  “[0093] Referring to block 460, if the answer to the determination in block 450 is that a particular hypothesis has been found to have a score that is better than the best score for that time frame for the hypotheses evaluated by the first speech recognition search process 420 by more than a predetermined amount, then restarting the first speech recognition search process at that point in time on a plurality of hypotheses that include the earlier pruned hypothesis and using a new pruning threshold.”  “[0008] In a yet further embodiment, the beam search process for the first speech recognition search process has a tighter pruning threshold than the beam search process for the second speech recognition search process.”
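For illustration only, the restart behavior quoted from Baker [0093] and [0008] can be sketched as follows. This is a minimal sketch and is not taken from Baker: the function names, the toy per-frame scores, and the convention that a higher log-score is "better" are all assumptions made for the example. It runs a tight-beam first search alongside a wide-beam second search and restarts the first search with the earlier pruned hypothesis and a new pruning threshold when the second search finds a hypothesis that beats the first search's best score by more than a predetermined margin.

```python
def prune(scores, beam):
    """Keep hypotheses whose score is within `beam` of the best score."""
    best = max(scores.values())
    return {h: s for h, s in scores.items() if s >= best - beam}


def two_pass_search(frame_scores, tight_beam, wide_beam, margin, restart_beam):
    """Hypothetical two-search loop (assumption: higher log-score is better).

    frame_scores maps a hypothesis name to its per-frame log-scores.
    Returns the first search's surviving hypotheses and a list of
    (frame_index, hypothesis) restart events.
    """
    n_frames = len(next(iter(frame_scores.values())))
    first = {h: 0.0 for h in frame_scores}   # tight-beam search
    second = {h: 0.0 for h in frame_scores}  # wide-beam search
    beam = tight_beam
    restarts = []
    for t in range(n_frames):
        # Extend every active hypothesis by one frame, then prune each search.
        first = prune({h: s + frame_scores[h][t] for h, s in first.items()}, beam)
        second = prune({h: s + frame_scores[h][t] for h, s in second.items()}, wide_beam)
        best_h, best_s = max(second.items(), key=lambda kv: kv[1])
        # Restart condition: a hypothesis the first search pruned earlier now
        # beats the first search's best score by more than `margin`.
        if best_h not in first and best_s > max(first.values()) + margin:
            first[best_h] = best_s   # re-admit the earlier pruned hypothesis
            beam = restart_beam      # continue with a new pruning threshold
            restarts.append((t, best_h))
    return first, restarts


# Toy data: "B" starts poorly (pruned by the tight beam) and later overtakes "A".
survivors, restarts = two_pass_search(
    {"A": [-1, -5, -5], "B": [-4, -1, -1]},
    tight_beam=2, wide_beam=10, margin=0.5, restart_beam=3,
)
```

In this toy run the tight beam drops "B" at the first frame, the wide beam keeps it, and the first search is restarted once "B" overtakes "A" by more than the margin.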

Phipps (U.S. 20180332118), cited in the PCT as an X reference:
Phipps teaches:
1. A method implemented by one or more processors, the method comprising: [Phipps, Figure 8, and [0248] teach that the steps of the method may be performed at a server, any of the client devices, or distributed among the devices.]
…
in response to generating the assistant device group: [Phipps, Figure 1, the two user devices 104, 122 that belong to the same user form a group.  Figure 11, “Identifies a second device associated with a same user.”  “[0045] … While only two user devices 104 and 122 are shown in FIG. 1, it should be appreciated that system 100, in some examples, includes any number and type of user devices configured in this proxy configuration to communicate with DA server system 106.”]
causing each of the disparate assistant devices to locally store a corresponding subset of the collective set of locally stored on-device models, including causing the first assistant device to purge one or more first on-device models of the first set to provide storage space for the corresponding subset locally stored on the first assistant device and causing the second assistant device to purge one or more second on-device models of the second set to provide storage space for the corresponding subset locally stored on the second assistant device, and [Phipps, Figure 2 includes a “User and Data Models 231” and Figure 8, “update one or more settings of the first instance of the digital assistant based on the received set of data” where the “received set of data” is “a set of data corresponding to a second instance of the digital assistant on a second electronic device 804.”  Phipps provides the models and data/settings of the user to the various devices in the group.  The “settings” include the “acoustic model” that is user-specific and teaches the “on-device models” of the Claim.  The settings, in the context of disambiguation, include the “particular application” that the user used previously on a different device.  Further, “updating a user-specific acoustic model” at 806 “adjusts” the data for hardware and software configurations of the two devices.  “[0254] Exemplary sets of data corresponding to the second instance of the digital assistant are described below. 
As discussed below, the set of data can include settings (e.g., language-related settings, acoustic settings, user preferences) of the second instance of the digital assistant (e.g., electronic device 1022), previous disambiguation inputs provided by the user to the second instance of the digital assistant, information related to content presentable by the digital assistant (e.g., tips for operating the digital assistant), information related to one or more interactions (e.g., sessions) between the user and the second instance of the digital assistant, or any combination thereof. As described in further detail below, the set of data can be used by the first electronic device to provide consistent operation of the digital assistant across the first electronic device and the second electronic device.”  “[0255] In some examples, obtaining the set of data corresponding to the second instance of the digital assistant comprises obtaining settings corresponding to the second instance of the digital assistant. In some examples, the settings include one or more language-related settings, such as language(s) in which the digital assistant can provide outputs. ....”  “[0256] In some examples, the settings include one or more settings used by the digital assistant to perform speech-to-text analysis. For example, the settings can include information related to a user-specific acoustic model….”  “[0264] In some examples, updating the settings of the first instance of the digital assistant comprises updating a user-specific acoustic model of the first instance of the digital assistant. By way of example, the received set of data may include one or more acoustic models for recognizing the user's speech. 
In some examples, the received set of data includes acoustic information that has been adjusted (e.g., by the second electronic device, by the settings server) to account for differences between hardware and/or software configurations (e.g., different microphones) of the first electronic device and the second electronic device. As such, the first electronic device does not need to further adjust the received acoustic information locally before updating settings of the first instance of the digital assistant. In other examples, the first electronic device adjusts the acoustic information locally (e.g., based on the different hardware/software configurations of the first and second electronic devices) before updating settings of the first instance of the digital assistant using the received acoustic information.”  “[0271] … As another example, if the user has previously asked the second instance of the digital assistant to "Book a car" and provided a disambiguation input selecting a particular ride booking application from multiple ride booking applications, the first instance of the digital assistant can automatically invoke the selected ride booking application if the user asks the first instance of the digital assistant to "Find a car".”]
assigning one or more corresponding processing roles to each of the disparate assistant devices of the assistant device group, each of the processing roles utilizing one or more corresponding of the locally stored on-device models; [Phipps teaches that the processing “role” may occur at any selected device or at the server or as a distributed/collaborative processing task.  See [0248] and “thin-client” in [0046].]

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached M-F, 9 to 5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659