DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings are objected to because the embodiment illustrated in Figure 7 does not match the description of that embodiment at ¶[0094] of the Specification.  Applicants’ Specification, ¶[0094], describes a user clicking on a ‘map’ option from a list of possible options, so that a machine may learn an intent of an utterance, but no ‘map’ option is illustrated in Figure 7.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application.   Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended.  The figure or figure number of an amended drawing should not be labeled as “amended.”  If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency.  Additional replacement sheets may be necessary to show the renumbering of the remaining figures.  Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d).  If the changes are not accepted by the 

Specification
The disclosure is objected to because of the following informalities:
In ¶[0030], “severs” should be “servers”.
In ¶[0071] to ¶[0073], there should be some reference to Step 402 corresponding to ‘knowledge transfer’ of Figure 4.  
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 4, 8 to 11, and 15 to 17 are rejected under 35 U.S.C. 103 as being unpatentable over Zitouni et al. (U.S. Patent Publication 2019/0034795) in view of Liu et al. (U.S. Patent Publication 2021/0142164).
Concerning independent claims 1, 8, and 15, Zitouni et al. discloses a method, system, and computer-readable medium (¶[0078] - ¶[0082]: Figure 5) for domain addition in natural language understanding, comprising:
e.g., ordering pizza, booking a flight, making a reservation, etc.; each domain 110 utilizes one or more domain experts 111A, 111B, . . . , 111N, where each domain expert 111 is a fully trained task specific learning model (“a base model”) (¶[0056] - ¶[0057]: Figure 1); a natural language understanding (NLU) system 108 can be implemented by computing device 500, which is a mobile telephone, a smart phone, a personal computer, or a laptop computer, and may include at least one processing unit 502 (“using at least one processor of an electronic device”) (¶[0078]: Figure 5); here, each domain expert 111 is a ‘model’, and a collection of existing models of domain experts is “a base model”;  
“generating, using the at least one processor, a first model expansion based on knowledge from the base model” – after an application is launched, developer 102 may determine that new domain 110C would be beneficial to application 100; a developer 104 may send a new domain 110C with training data 106 to NLU system 108 of application 100, where the training data is limited; new domain model 111C (“a first model expansion”) may utilize provided inputs to train new domain expert 111C to identify correct labels, intents, and/or slots for provided utterances (¶[0061]: Figure 1); communication between new domain 110C and already trained domains 110A, 110B, . . . , 110N includes a query that may ask each domain expert 111A, 111B, . . . 111N for 
“training, using the at least one processor, the first model expansion based on first utterances without modifying parameters of the base model” – a system and method is able to add a new domain utilizing data from one or more of the domains 

“determining, using the at least one processor, a meaning of the additional utterance using the base model and the first model expansion” – domain experts 111 are able to identify intents and/or slots because each expert is trained utilizing labeled data or label embeddings, where these labels provide meanings of different words or phrases (“a meaning of the additional utterance”) (¶[0058]: Figure 1); an action based on the user utterance is determined utilizing the new domain expert (“determining . . . a meaning of the additional utterance using . . . the first model expansion”); the action is determined by the new domain expert by predicting an intent and slots for the utterance and by predicting labels for words and/or phrases of the utterance (¶[0075]: Figure 4: Step 418); implicitly, natural language understanding determines “a meaning” of utterances; here, a next user utterance is understood by new domain expert 111C, but a given user utterance might be directed to any of domain experts 111A, 111B, . . . , 111N or new domain expert 111C; a meaning of a next user utterance, then, can be determined “using the base model and the first model expansion”.

Zitouni et al. arguably discloses all of the limitations of these independent claims.  Conceivably, Zitouni et al. might be argued to omit a concept of “a base model” because it provides a collection of models consisting of existing model resources of domain expert models 111A, 111B, . . . , 111N, which are collectively equivalent to “a base model”, but not a single “base model”.  Still, Liu et al. teaches whatever limitations might be omitted by Zitouni et al.  Generally, Liu et al. teaches multi-task learning with a larger teacher model and a smaller student model.  During training of the teacher model, its shared layers are initialized and then the teacher model is multi-task refined.  The teacher model predicts teacher logits.  During training of the student model, its shared layers are initialized, and knowledge distillation is employed to transfer knowledge from the teacher model to the student model by the student updating its shared layers and task layers according to the teacher logits of the teacher model.  (Abstract; ¶[0019])  A pre-trained language model may relate to a natural language understanding (NLU) neural network.  (¶[0035])  The architecture of the student model is independent of the teacher model.  (¶[0038])  Each task-specific layer 220a, 220b, 220c of teacher model 200 performs or implements a different natural language understanding (NLU) task.  (¶[0043]: Figure 3)  Liu et al., then, provides a teacher model that is equivalent to “a base model” and a student model that is equivalent to “a first model expansion”, where “knowledge from the base model” is transferred from the teacher model to train the student model.  Implicitly, a nature of this architecture is that the teacher model is unchanged by this procedure of training the student model.  Liu et al., then, teaches “training . . . the first model expansion . . . without modifying parameters of the base model”.  An objective is to reduce a model Zitouni et al. 
to a base model of Liu et al. for a purpose of reducing model size and computational cost while maintaining comparable quality of output.
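For illustration only, the teacher-to-student transfer attributed above to Liu et al. can be sketched in a few lines of Python.  This is a minimal sketch under stated assumptions, not the implementation of either reference: the three-class logits, the learning rate, and the names `softmax`, `cross_entropy`, and `distillation_step` are hypothetical, and the student is reduced to a bare logit vector.  The sketch shows only that the student is updated toward the teacher logits while the teacher values are read and never written.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    """Cross-entropy of distribution q measured against target p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def distillation_step(student_logits, teacher_logits, lr=0.5):
    """One gradient step on the student only (hypothetical helper).

    The gradient of cross_entropy(softmax(teacher), softmax(student))
    with respect to the student logits is softmax(student) - softmax(teacher).
    The teacher logits are only read, never modified.
    """
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return [s - lr * (ps - pt)
            for s, ps, pt in zip(student_logits, p_student, p_teacher)]

# Assumed toy values: a tuple keeps the teacher ("base model") read-only.
teacher_logits = (2.0, 0.5, -1.0)
student_logits = [0.0, 0.0, 0.0]

loss_before = cross_entropy(softmax(teacher_logits), softmax(student_logits))
for _ in range(200):
    student_logits = distillation_step(student_logits, teacher_logits)
loss_after = cross_entropy(softmax(teacher_logits), softmax(student_logits))
```

After the loop, `loss_after` is lower than `loss_before`, and `teacher_logits` still holds its original values, which mirrors the characterization that the teacher model is unchanged by training of the student model.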

Concerning claims 2, 9, and 16, these steps are simply an iteration of the prior steps of adding and training a new model from a base model as disclosed and taught by Zitouni et al. and Liu et al.  Implicitly, Zitouni et al. can iteratively add new domain models to perform natural language understanding for new intents and slots as a natural language understanding system is again updated to include new tasks with associated intents and slots, and this would not modify any of the existing domain models, including any previously added new domain models.  That is, Zitouni et al. can iteratively add new domain models 111C’ or 111(N+1).
Concerning claims 3 and 10, Zitouni et al. discloses that a user input including a user utterance is received, and the user input may be determined to be for the new domain expert.  The new domain expert may be further trained or updated in its learning model based on this user utterance.  (¶[0075]: Figure 4: Step 418)  Zitouni et al., then, can provide training using both “knowledge from the base model” and “based on first utterances” to train a model of domain expert 111C.  Implicitly, an iterative process may train a model of a next domain expert 111C’ based on a received user utterance.  
Zitouni et al. discloses that each domain expert 111A, 111B, . . . , 111N is a fully trained task specific learning model that performs intent classification and slot tagging.  Domain experts 111 are able to identify intents and/or slots because each expert is trained using labeled data or label embeddings.  NLU system 108 of application 100 receives user input 116 from client computing device 114, and determines one or more user intents and identifies any provided slots in user input 116.  (¶[0057] - ¶[0060]: Figure 1)  Here, intent and slots are obtained as tasks of natural language understanding using domain experts 111 including existing domain experts 111A, 111B, . . . , 111N (“slot prediction and intent prediction outputs of the base model”) and new domain expert 111C (“slot prediction and intent prediction outputs of the first model expansion”).  Implicitly, any output of intents and slots is a combination (“combining”) of intents identified and slots filled by existing domain experts 111A, 111B, . . . , 111N and intents identified and slots filled by new domain expert 111C.  
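For illustration only, one way that intent outputs of existing domain experts and of a new domain expert could be combined is sketched below.  This is a sketch under stated assumptions: neither the claims nor Zitouni et al. is represented here as specifying a particular combination rule, so the highest-score rule, the function name `combine_intent_scores`, the intent labels, and the scores are all hypothetical.

```python
def combine_intent_scores(base_scores, expansion_scores):
    """Merge per-intent scores from two sources and pick the best intent.

    Assumed rule: keep the higher score when both sources score the
    same intent, then select the intent with the highest merged score.
    """
    combined = dict(base_scores)  # copy, so the base output is not mutated
    for intent, score in expansion_scores.items():
        combined[intent] = max(score, combined.get(intent, float("-inf")))
    best_intent = max(combined, key=combined.get)
    return best_intent, combined

# Hypothetical outputs for a single utterance.
base_scores = {"order_pizza": 0.20, "book_flight": 0.10}  # existing experts
expansion_scores = {"show_map": 0.65}                     # new domain expert

best_intent, combined = combine_intent_scores(base_scores, expansion_scores)
```

With these assumed scores, the utterance is routed to the intent contributed by the new domain expert, while the dictionary of scores from the existing experts is left as it was.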

Claims 5, 7, 12, 14, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zitouni et al. (U.S. Patent Publication 2019/0034795) in view of Liu et al. (U.S. Patent Publication 2021/0142164) as applied to claims 1, 8, and 15 above, and further in view of Polovets (U.S. Patent Publication 2020/0312301).
Concerning claims 5, 12, and 18, Liu et al. provides that knowledge from a teacher model is transferred to a student model as logits.  (Abstract; ¶[0019]; ¶[0028]; ¶[0066] - ¶[0067])  A logit is not clearly equivalent to a ‘hidden state’, but may be similar.  Liu et al., then, does not clearly teach “wherein generating the first model expansion Liu et al. describes how a teacher model 200 includes one or more shared encoding layers 210a, 210b, 210c (shared layer 1, shared layer 2, . . . , shared layer n), and various task specific layers 220a, 220b, 220c (task layer 1, task layer 2, . . . task layer m).  (¶[0040]: Figure 2)  Student model 400 includes a plurality of shared bottom layers and a plurality of task specific top layers, where shared layers include hidden layers, and a student model computes hidden states.  (¶[0047] - ¶[0051]: Figure 4)  Conceivably, these shared layers of student model 400 include hidden layers that are equivalent to shared layers received from teacher model 200.  
Concerning claims 5, 12, and 18, Polovets teaches whatever might be omitted by Liu et al. as directed to hidden states that are knowledge from a base model.  Generally, Polovets teaches modular language model adaptation, where a set of adaptation training data and a set of parameters may be received from a recurrent neural network model.  (Abstract)  Parameter freezing may produce a lightweight solution to the problem of providing language adaptation to a recurrent neural network (RNN).  (¶[0012])  A base model is provided for modular language model adaptation.  Hidden states in hidden layer 215 may be used to summarize context.  Outputs of final hidden layer L are fed to projection layer 220 to produce logits.  A set of adaptation training data and a set of parameters from RNN model 115 are received from base model 120, and used to produce an output related to an item of adaptation training data for which an adaptation module is to be trained.  (¶[0016] - ¶[0019]: Figure 2)  The frozen pretrained model, e.g., base model 120, may include an input layer 305, embedded Polovets, then, teaches that information transferred from a base model for training includes hidden states that are gated according to context as adaptation data.  Compare Specification, ¶[0080] - ¶[0081]: Figure 5B, describing a context-gated transfer layer 560, which receives hidden states from a base model for training an expansion model.  An objective is to address updating pretrained models in a way that minimally increases total parameter size and processing required to perform adaptation.  (¶[0010] - ¶[0011])  It would have been obvious to one having ordinary skill in the art to perform model expansion based upon knowledge from a base model in Liu et al. by incorporating hidden states produced by a base model as taught by Polovets for a purpose of performing adaptation of a base model that updates models in a way that minimally increases total parameter size and required processing.
Concerning claims 7, 14, and 20, Zitouni et al. discloses that an objective of existing domain models and a new domain model is to fill slots and identify intents (“the base model is configured to . . . generate first slot predictions and a first intent Liu et al. teaches computing embedding vectors of input text sequences for student model 400.  Then, hidden states are computed from text sequences with context vectors for each token using an attention weight.  (¶[0049] - ¶[0052]: Figures 4 to 5)  Polovets teaches gate 455 is trained to determine in which contexts adaptation should be applied, where hidden states extracted from the base model are used as input to gate 455 to predict if the context comes from the pretraining or adaptation data.  (¶[0035]: Figure 4)  Compare Specification, ¶[0080] - ¶[0081]: Figure 5B, describing a context-gated transfer layer 560, which receives hidden states from a base model for training an expansion model.  Polovets, then, teaches that first hidden states from a base model are used by an adapted model according to gated context vectors of Liu et al. 
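For illustration only, a gate that takes base-model hidden states as input and decides how much adaptation to apply can be sketched as follows.  This is a sketch under stated assumptions rather than the architecture of Polovets or of Applicants’ Figure 5B: the sigmoid gate, the interpolation between base and adapted hidden states, the names `gate_value` and `gated_hidden_state`, and all numeric values are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gate_value(base_hidden, gate_weights, gate_bias):
    """Score the context from the base model's hidden state only."""
    activation = sum(w * h for w, h in zip(gate_weights, base_hidden))
    return sigmoid(activation + gate_bias)

def gated_hidden_state(base_hidden, adapted_hidden, gate_weights, gate_bias):
    """Interpolate between base and adapted hidden states (assumed rule).

    A gate near 1 favors the adapted state; a gate near 0 passes the
    base state through. The base hidden state is read, never modified.
    """
    g = gate_value(base_hidden, gate_weights, gate_bias)
    return [g * a + (1.0 - g) * b
            for a, b in zip(adapted_hidden, base_hidden)]

# Hypothetical two-dimensional hidden states.
base_hidden = [1.0, -2.0]
adapted_hidden = [3.0, 4.0]

# Zero weights and zero bias give a gate of exactly 0.5 (an even mix).
mixed = gated_hidden_state(base_hidden, adapted_hidden, [0.0, 0.0], 0.0)

# A strongly negative bias closes the gate, so the base state passes through.
closed = gated_hidden_state(base_hidden, adapted_hidden, [0.0, 0.0], -20.0)
```

Only the gate parameters and the adapted state would be trained in such an arrangement; the base hidden state serves as an input, which is consistent with the frozen base model described above.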

Allowable Subject Matter
Claims 6, 13, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Kiss et al., Rastow et al., Barton et al., Shen et al., Averboch et al., and Mahmud et al. disclose related prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608. The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and 





/MARTIN LERNER/Primary Examiner
Art Unit 2657
November 9, 2021