DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Applicant’s Response
In Applicant’s response dated 12/01/2022, Applicant amended Claims 1, 5, 8, 12, 19 and 20; and argued against the rejections previously set forth in the Office Action dated 09/01/2022.

Status of the Claims
	Claims 1 – 20 are rejected under 35 U.S.C. 102(a)(1).

Examiner Note
 	The Examiner cites particular columns, line numbers and/or paragraph numbers in the references as applied to the claims below for the convenience of the Applicant(s). Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the Applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.  


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 – 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Neil Houlsby et al. “Parameter-Efficient Transfer Learning for NLP”, Cornell University Library, February 2, 2019, 13 pages (hereinafter, Houlsby) (cited in IDS dated 01/13/2021).

	Regarding Claim 1,  Houlsby teaches a method for providing a variety of natural language processing (NLP) models during runtime (See Houlsby’s Abstract), comprising: 
	receiving a first input data comprising a first input text and a first task type, the first task type specifying one or more target NLP task types to be performed on the first input text (Houlsby in page 1, col 1, Introduction second paragraph, teaches that tasks arrive in a stream. The goal is to build a system that performs well on all of them, but without training an entire new model for every new task. Houlsby in page 2, col 1, line 32 – Col 2 line 2, teaches that continual learning systems aim to learn from an endless stream of tasks. Adapters differ in that the tasks do not interact and the shared parameters are frozen. This means that the model has a perfect memory of previous tasks using a small number of task-specific parameters. Houlsby demonstrates on a large and diverse set of text classification tasks that adapters yield parameter-efficient tuning for NLP. Adapter-based tuning yields a single, extensible model that attains near state-of-the-art performance in text classification); 
	dynamically generating a first model tuned to generate predictions for a first NLP task having the first task type, the generating comprising integrating, into a base model during runtime, a first model artifact comprising one or more adapter layers specific to the first task type (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks. Houlsby in page 4, col 2, lines 21 – 25, further teaches that for each task, AutoML is run for a week on CPUs, using 30 machines. In this time the algorithm explores over 10k models on average per task, and the best final model for each task is selected according to validation set accuracy); 
	generating, during the same runtime and based on processing the first input text with the first model generated during the same runtime, a prediction for the first NLP task (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. However, training the layer normalization parameters alone is insufficient for good performance. Houlsby in page 4 section 3.4, teaches a parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks. Houlsby in page 4, Col. 2, section 3.3, further teaches that for each task, AutoML ran for one week on CPUs, using 30 machines. In this time, the algorithm explores over 10k models on average per task. The results for the AutoML benchmark (“no BERT baseline”), fine-tuning, variable fine-tuning, and adapter-tuning are shown in Table 2); and 
	providing the prediction to one or more application instances (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4 section 3.4, teaches a parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks).
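	For illustration, the runtime integration recited in Claim 1 can be sketched as follows. This is a minimal Python sketch under stated assumptions and is not taken from Houlsby or from the application: `base_encode`, `AdapterArtifact`, `ARTIFACTS`, `handle_request`, the two task types, and the toy features are all hypothetical. It shows one shared, unmodified base function being combined, at request time, with a task-specific artifact (an additive adapter shift plus an output layer) selected by the task type received with the input text.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]


@dataclass(frozen=True)
class AdapterArtifact:
    """Hypothetical task-specific artifact: adapter shift plus output layer."""
    shift: Vector                      # added to the base output via a skip connection
    output_weights: Dict[str, Vector]  # label -> weight vector of the output layer


def base_encode(text: str) -> Vector:
    """Stand-in for a frozen base model shared by every task (toy features)."""
    return [float(len(text)), float(sum(ch.isupper() for ch in text))]


# Hypothetical artifact store keyed by task type.
ARTIFACTS: Dict[str, AdapterArtifact] = {
    "is_long": AdapterArtifact([-10.0, 0.0], {"long": [1.0, 0.0], "short": [-1.0, 0.0]}),
    "has_caps": AdapterArtifact([0.0, -0.5], {"yes": [0.0, 1.0], "no": [0.0, -1.0]}),
}


def handle_request(text: str, task_type: str) -> str:
    """Build the task model at request time and return its prediction."""
    artifact = ARTIFACTS[task_type]     # select the artifact by task type
    hidden = base_encode(text)          # shared base, never modified
    # Adapter applied on top of the base output (additive, skip-connection style).
    adapted = [h + s for h, s in zip(hidden, artifact.shift)]
    # Task-specific output layer: one linear score per label.
    scores = {
        label: sum(w * a for w, a in zip(weights, adapted))
        for label, weights in artifact.output_weights.items()
    }
    return max(scores, key=scores.get)  # highest-scoring label
```

	Under these assumptions, `handle_request("Hello", "has_caps")` returns "yes" and `handle_request("hello", "is_long")` returns "short", with the same `base_encode` serving both task types.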

	Regarding Claim 2, Houlsby teaches the limitations contained in parent Claim 1. Houlsby further teaches:
	wherein the base model comprises one or more encoder layers that process training data to generate base parameters (Houlsby in page 2 col. 1 lines 45 – 53, shows, on a large and diverse set of text classification tasks, that adapters yield parameter-efficient tuning for NLP. The key innovation is to design an effective adapter module and its integration with the base model. The strategy almost matches the performance of the fully fine-tuned BERT, but uses only 3% task-specific parameters, while fine-tuning uses 100% task-specific parameters. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters).  

	Regarding Claim 3, Houlsby teaches the limitations contained in parent Claim 2. Houlsby further teaches: 
	further comprising: 
	training the base parameters (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4 section 3.4, teaches a parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks), 
	the training comprising:
	 tokenizing one or more corpora of text data to generate a plurality of tokens (Houlsby in page 1, Col 1, introduction section, teaches BERT, a Transformer network trained on large text corpora with an unsupervised loss. Houlsby in page 3, Col. 2, lines 38 – 44, further teaches that BERT is the base model. To perform classification with BERT, the first token in each sequence is a special “classification token”. A linear layer is attached to the embedding of this token to predict the class label. Houlsby in Table 6 shows the dataset size (tokens)); 
	generating predictions for a first training task and a second training task based on the plurality of tokens (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks);
	determining an update for the base parameters (Houlsby in page 2, Col. 1 lines 14 – 31, teaches that fine-tuning involves adjusting the original parameters for each new task. For adapter tuning, a new function is defined, where parameters are copied over from pre-training. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters); and 
	applying the update to the base parameters (Houlsby in page 2, Col. 1 lines 14 – 31, teaches that fine-tuning involves adjusting the original parameters for each new task. For adapter tuning, a new function is defined, where parameters are copied over from pre-training. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters).

	Regarding Claim 4, Houlsby teaches the limitations contained in parent Claim 2. Houlsby further teaches:
	wherein the base parameters of the base model transfer knowledge generalizable to a variety of NLP tasks to the first model (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks).  

	Regarding Claim 5, Houlsby teaches the limitations contained in parent Claim 2. Houlsby further teaches:
	wherein the integrating of the first model artifact comprises inserting the one or more adapter layers into the encoder layers to alter a flow of data within the base model (Houlsby in page 2 col. 1 lines 45 – 53, shows, on a large and diverse set of text classification tasks, that adapters yield parameter-efficient tuning for NLP. The key innovation is to design an effective adapter module and its integration with the base model. The strategy almost matches the performance of the fully fine-tuned BERT, but uses only 3% task-specific parameters, while fine-tuning uses 100% task-specific parameters. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters); and 
	wherein the first model artifact comprises an output layer that generates the prediction for the first NLP task based on an output received from the one or more adapter layers (Houlsby in page 3, Col. 1 lines 31 – 38, further teaches that the output of each sub-layer is fed into layer normalization. The adapter is always applied directly to the output of the sub-layer. The output of the adapter is then passed directly into the following layer normalization).  

	Regarding Claim 6, Houlsby teaches the limitations contained in parent Claim 2. Houlsby further teaches:
	wherein the first model artifact comprises adapter parameters comprising knowledge related to a particular NLP task (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks).  

	Regarding Claim 7, Houlsby teaches the limitations contained in parent Claim 6. Houlsby further teaches:
	further comprising: 
	training the adapter parameters (Houlsby in the Abstract indicates that the use of adapters attains near state-of-the-art performance, whilst adding only a few parameters per task. Houlsby in page 2 col. 1 lines 45 – 53, shows, on a large and diverse set of text classification tasks, that adapters yield parameter-efficient tuning for NLP. The key innovation is to design an effective adapter module and its integration with the base model. The strategy almost matches the performance of the fully fine-tuned BERT, but uses only 3% task-specific parameters, while fine-tuning uses 100% task-specific parameters), 
	the training comprising: 
	tokenizing one or more corpora of text data to generate a plurality of tokens (Houlsby in page 1, Col 1, introduction section, teaches BERT, a Transformer network trained on large text corpora with an unsupervised loss. Houlsby in page 3, Col. 2, lines 38 – 44, further teaches that BERT is the base model. To perform classification with BERT, the first token in each sequence is a special “classification token”. A linear layer is attached to the embedding of this token to predict the class label. Houlsby in Table 6 shows the dataset size (tokens)); 
	combining the first model artifact with the base model by inserting the one or more adapter layers within the base model's architecture (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks); 
	freezing the base parameters included in the base model (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks); 
	generating predictions for a training task based on the plurality of tokens, wherein the training task has the same task type as the first NLP task (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks);
	determining an update for the adapter parameters (Houlsby in page 2, Col. 1 lines 14 – 31, teaches that fine-tuning involves adjusting the original parameters for each new task. For adapter tuning, a new function is defined, where parameters are copied over from pre-training. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters); 
	applying the update to the adapter parameters (Houlsby in page 2, Col. 1 lines 14 – 31, teaches that fine-tuning involves adjusting the original parameters for each new task. For adapter tuning, a new function is defined, where parameters are copied over from pre-training. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters); and 
	storing the adapter parameters in the first model artifact (Houlsby in page 2, Col. 1 lines 14 – 31, teaches that fine tuning involves adjusting the original parameters for each new task. For adapter tuning, a new function is defined, where parameters are copied over from pre-training). 
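	For illustration, the freeze, update, and store steps recited in Claim 7 can be sketched with a one-parameter toy model. This is a minimal Python sketch under stated assumptions and is not Houlsby's training procedure: the scalar model, the names `train_adapter`, `base_params`, `examples`, and `artifact`, the squared-error loss, and the learning rate are all hypothetical. Tokenization is omitted; the inputs are plain numbers.

```python
from typing import Dict, List, Tuple


def train_adapter(
    base: Dict[str, float],
    data: List[Tuple[float, float]],
    steps: int = 200,
    lr: float = 0.05,
) -> Dict[str, float]:
    """Fit only the adapter weight; the base weight is read but never written."""
    frozen_w = base["w"]   # frozen base parameter
    adapter = {"a": 0.0}   # zero start: the adapted model initially equals the base
    for _ in range(steps):
        grad = 0.0
        for x, target in data:
            pred = frozen_w * x + adapter["a"] * x   # base path plus adapter path
            grad += 2.0 * (pred - target) * x        # d(squared error)/d(adapter weight)
        adapter["a"] -= lr * grad / len(data)        # update applied to the adapter only
    return adapter


base_params = {"w": 2.0}                 # hypothetical pre-trained base parameter
examples = [(1.0, 3.0), (2.0, 6.0)]      # toy task whose ideal total weight is 3.0
# Store the trained adapter parameters in a task-specific artifact.
artifact = {"task_type": "toy_regression", "adapter": train_adapter(base_params, examples)}
```

	In this sketch the adapter weight converges toward 1.0 (so that base plus adapter gives 3.0), while `base_params` is left unchanged and could be reused for another task.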

	Regarding Claim 8, Houlsby teaches the limitations contained in parent Claim 6. Houlsby further teaches:
	wherein the adapter parameters comprise a smaller number of tunable parameters relative to the base parameters, and wherein the adapter parameters require less training time relative to the base parameters (Houlsby in the Abstract indicates that the use of adapters attains near state-of-the-art performance, whilst adding only a few parameters per task. Houlsby in page 2 col. 1 lines 45 – 53, shows, on a large and diverse set of text classification tasks, that adapters yield parameter-efficient tuning for NLP. The key innovation is to design an effective adapter module and its integration with the base model. The strategy almost matches the performance of the fully fine-tuned BERT, but uses only 3% task-specific parameters, while fine-tuning uses 100% task-specific parameters).

	Regarding Claim 9,  Houlsby teaches the limitations contained in parent Claim 1. Houlsby further teaches:
	wherein the one or more adapter layers include skip-connection bottleneck layers that alter a learned manifold within the base model (Houlsby in page 2, col 2 lines 15 – 32, further teaches a bottleneck adapter module. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks). 
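	For illustration, a skip-connection bottleneck layer of the kind recited in Claim 9 can be sketched as a down-projection, a nonlinearity, an up-projection, and a residual addition. This is a minimal Python sketch under stated assumptions rather than Houlsby's implementation: the function names, the choice of `tanh` as the nonlinearity, and the omission of bias terms from the forward pass are all assumptions made for brevity.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def bottleneck_adapter(h: Vector, w_down: Matrix, w_up: Matrix) -> Vector:
    """Down-project d -> m, apply a nonlinearity, up-project m -> d, add the skip."""
    z = [math.tanh(v) for v in matvec(w_down, h)]   # bottleneck activations (size m)
    delta = matvec(w_up, z)                         # projected back to size d
    return [hi + di for hi, di in zip(h, delta)]    # skip connection: h + delta


def adapter_param_count(d: int, m: int) -> int:
    """Weights (2*m*d) plus biases (d + m) of one bottleneck, if biases are used."""
    return 2 * m * d + d + m
```

	With all-zero projection weights the adapter reduces to the identity, so inserting it leaves the base model's output unchanged until the adapter is trained. As a sizing example under the same assumptions, `adapter_param_count(1024, 64)` gives 132,160 parameters for a single bottleneck.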

	Regarding Claim 10, Houlsby teaches the limitations contained in parent Claim 1. Houlsby further teaches:
	further comprising:  29 EAST\171349430.1Attorney Docket No. 327696-000369 
	receiving a second input data comprising a second input text and a second task type, the second task type specifying one or more target NLP task types to be performed on the second input text, wherein the second task type is different from the first task type (Houlsby in page 1, col 1, Introduction second paragraph, teaches that tasks arrive in a stream. The goal is to build a system that performs well on all of them, but without training an entire new model for every new task. Houlsby in page 2, col 1, line 32 – Col 2 line 2, teaches that continual learning systems aim to learn from an endless stream of tasks. Adapters differ in that the tasks do not interact and the shared parameters are frozen. This means that the model has a perfect memory of previous tasks using a small number of task-specific parameters. Houlsby demonstrates on a large and diverse set of text classification tasks that adapters yield parameter-efficient tuning for NLP. Adapter-based tuning yields a single, extensible model that attains near state-of-the-art performance in text classification); 
	generating a second model tuned to generate predictions for a second NLP task having the second task type, the generating comprising exchanging the first model artifact with a second model artifact comprising one or more adapter layers specific to the second task type (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks); 
	generating a second prediction for the second NLP task by processing the second input text using the second model (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4 section 3.4, teaches a parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks); and 
	distributing the second prediction to the one or more application instances (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4 section 3.4, teaches a parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks).  

	Regarding Claim 11, Houlsby teaches the limitations contained in parent Claim 10. Houlsby further teaches:
	wherein the exchanging comprises: 
	fetching the second model artifact (Houlsby in page 1, Col. 2, introduction, teaches that extensible models can be trained incrementally to solve new tasks, without forgetting previous ones, thus not sacrificing performance); 
	removing the one or more adapter layers specific to the first NLP task from the base model; and inserting one or more adapter layers specific to the second NLP task into the base model (Houlsby in page 2, col 2 lines 15 – 32, further teaches a bottleneck adapter module. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks). 
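	For illustration, the fetch, remove, and insert sequence recited in Claim 11 can be sketched as an in-place swap on a host that holds a single base model. This is a minimal Python sketch under stated assumptions and is not described in Houlsby: the `AdapterHost` class, its `exchange` and `forward` methods, and the use of a single scalar as the adapter are all hypothetical.

```python
from typing import Dict, List, Optional

Vector = List[float]


class AdapterHost:
    """Hypothetical host holding one base model and at most one active adapter."""

    def __init__(self, store: Dict[str, float]) -> None:
        self._store = store                     # task type -> adapter scale (toy artifact)
        self._adapter: Optional[float] = None   # currently inserted adapter, if any
        self.active_task: Optional[str] = None

    def exchange(self, task_type: str) -> None:
        fetched = self._store[task_type]   # 1. fetch the requested artifact
        self._adapter = None               # 2. remove the adapter currently in place
        self._adapter = fetched            # 3. insert the fetched adapter
        self.active_task = task_type

    def forward(self, hidden: Vector) -> Vector:
        if self._adapter is None:
            return list(hidden)            # base model alone
        # Skip connection plus the scaled adapter contribution.
        return [h + self._adapter * h for h in hidden]
```

	A usage example under the same assumptions: with `host = AdapterHost({"task_a": 0.5, "task_b": -1.0})`, calling `host.exchange("task_a")` and then `host.exchange("task_b")` changes the result of `host.forward([1.0, 2.0])` from `[1.5, 3.0]` to `[0.0, 0.0]` without creating a second copy of the base.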
  
	Regarding Claim 12, Houlsby teaches a machine learning (ML) system for serving a variety of NLP models comprising: a memory; and a processor in communication with the memory (Houlsby in page 4, col 2, lines 21 – 22, teaches that for each task, AutoML is run for a week on CPUs, using 30 machines) and configured to perform at least the following functions: 
	training a base model (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4 section 3.4, teaches a parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks);
	generating a plurality of model artifacts, wherein each model artifact comprises adapter parameters (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks); 
	receiving a first input data comprising a first input text and a first task type, the first task type specifying a category of NLP tasks to be performed on the first input text (Houlsby in page 1, col 1, Introduction second paragraph, teaches that tasks arrive in a stream. The goal is to build a system that performs well on all of them, but without training an entire new model for every new task. Houlsby in page 2, col 1, line 32 – Col 2 line 2, teaches that continual learning systems aim to learn from an endless stream of tasks. Adapters differ in that the tasks do not interact and the shared parameters are frozen. This means that the model has a perfect memory of previous tasks using a small number of task-specific parameters. Houlsby demonstrates on a large and diverse set of text classification tasks that adapters yield parameter-efficient tuning for NLP. Adapter-based tuning yields a single, extensible model that attains near state-of-the-art performance in text classification); 
	dynamically generating a first model tuned to generate predictions for a first NLP task having the first task type, the generating comprising integrating, into the base model during runtime, a first model artifact comprising one or more adapter layers specific to the first task type (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks. Houlsby in page 4, Col. 2, section 3.3, further teaches that for each task, AutoML ran for one week on CPUs, using 30 machines. In this time, the algorithm explores over 10k models on average per task. The results for the AutoML benchmark (“no BERT baseline”), fine-tuning, variable fine-tuning, and adapter-tuning are shown in Table 2); 
	generating, during the same runtime and based on processing the first input text with the first model generated during the same runtime, a prediction for the first NLP task (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4 section 3.4, teaches a parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks. Houlsby in page 4, Col. 2, section 3.3, further teaches that for each task, AutoML ran for one week on CPUs, using 30 machines. In this time, the algorithm explores over 10k models on average per task. The results for the AutoML benchmark (“no BERT baseline”), fine-tuning, variable fine-tuning, and adapter-tuning are shown in Table 2); and 
	providing the prediction to one or more application instances (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4 section 3.4, teaches a parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks).

	Regarding Claim 13, this Claim merely recites a machine learning (ML) system for serving a variety of NLP models comprising a memory; and a processor in communication with the memory and configured to perform instructions as similarly recited in Claim 10. Accordingly, Houlsby discloses/teaches every limitation of Claim 13, as indicated in the above rejection of Claim 10.

	Regarding Claim 14, this Claim merely recites a machine learning (ML) system for serving a variety of NLP models comprising a memory; and a processor in communication with the memory and configured to perform instructions as similarly recited in Claim 11. Accordingly, Houlsby discloses/teaches every limitation of Claim 14, as indicated in the above rejection of Claim 11.

	Regarding Claim 15, Houlsby teaches the limitations contained in parent Claim 12. Houlsby further teaches:
	wherein the ML system comprises an Adapter Service architecture that generates a plurality of models by integrating different model artifacts into a single instance of the base model, wherein each model included in the plurality of models generates predictions for a different NLP task (Houlsby in page 2 col. 1 lines 45 – 53, shows, on a large and diverse set of text classification tasks, that adapters yield parameter-efficient tuning for NLP. The key innovation is to design an effective adapter module and its integration with the base model. The strategy almost matches the performance of the fully fine-tuned BERT, but uses only 3% task-specific parameters, while fine-tuning uses 100% task-specific parameters. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters. Houlsby in page 4, col 2, lines 21 – 25, further teaches that for each task, AutoML is run for a week on CPUs, using 30 machines. In this time the algorithm explores over 10k models on average per task. The best final model for each task is selected according to validation set accuracy).

	Regarding Claim 16, Houlsby teaches the limitations contained in parent Claim 15. Houlsby further teaches:
	wherein the Adapter Service architecture includes a model deployment service that facilitates integrating the different model artifacts into the base model (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks).  

	Regarding Claim 17, Houlsby teaches the limitations contained in parent Claim 16. Houlsby further teaches:
	wherein the model deployment service generates predictions for a variety of NLP tasks by serving the plurality of models on demand (Houlsby in page 1, Col. 2, introduction, teaches that extensible models can be trained incrementally to solve new tasks, without forgetting previous ones, thus not sacrificing performance. Houlsby in page 4, col 2, lines 21 – 25, further teaches that for each task, AutoML is run for a week on CPUs, using 30 machines. In this time the algorithm explores over 10k models on average per task. The best final model for each task is selected according to validation set accuracy).

	Regarding Claim 18, this Claim merely recites a machine learning (ML) system for serving a variety of NLP models comprising a memory; and a processor in communication with the memory and configured to perform instructions as similarly recited in Claim 5. Accordingly, Houlsby discloses/teaches every limitation of Claim 18, as indicated in the above rejection of Claim 5.

	Regarding Claim 19, Houlsby teaches a method for providing a variety of natural language processing (NLP) models during runtime (See Houlsby’s Abstract) comprising:
	training a base model comprising base parameters (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. However, training the layer normalization parameters alone is insufficient for good performance. Houlsby in page 4 section 3.4, teaches parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks), the training comprising:
	tokenizing one or more corpora of text data to generate a plurality of tokens (Houlsby in page 1, Col 1, introduction section, teaches BERT, a Transformer network trained on large text corpora with an unsupervised loss. Houlsby in page 3, Col. 2, lines 38 – 44, further teaches that BERT is the base model. To perform classification with BERT, the first token in each sequence is a special “classification token”. A linear layer is attached to the embedding of this token to predict the class label. Houlsby in table 6, shows the dataset size (tokens));
	generating predictions for a first training task and a second training task based on the plurality of tokens (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks); 
	determining an update for the base parameters, wherein applying the update to the base parameters reduces a combined loss function for the first training task and the second training task (Houlsby in page 2, Col. 1 lines 14 – 31, teaches that fine tuning involves adjusting the original parameters for each new task. For adapter tuning, a new function is defined, where parameters are copied over from pre-training. Houlsby in page 2 col. 1 lines 45 – 53, shows, on a large and diverse set of text classification tasks, that adapters yield parameter-efficient tuning for NLP. The key innovation is to design an effective adapter module and its integration with the base model. The strategy almost matches the performance of the fully fine-tuned BERT, but uses only 3% task-specific parameters, while fine-tuning uses 100% task-specific parameters. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters); and
	applying the update to the base parameters (Houlsby in page 2, Col. 1 lines 14 – 31, teaches that fine tuning involves adjusting the original parameters for each new task. For adapter tuning, a new function is defined, where parameters are copied over from pre-training. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters);
	training, during runtime, one or more adapter layers comprising adapter parameters, wherein the one or more adapter layers are specific to an NLP task type (Houlsby in page 2, col 2 lines 15 – 32, further teaches a bottleneck adapter module. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks), the training one or more adapter layers comprising:
	inserting, during the same runtime, the one or more adapter layers into their correct position within the base model's architecture (Houlsby in page 2, col 2 lines 15 – 32, further teaches a bottleneck adapter module. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks);
	freezing, during the same runtime, the base parameters (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks);
	generating, during the same runtime, predictions for a third training task having the NLP task type specific to the one or more adapter layers inserted during the same runtime (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks. Houlsby in page 4, Col. 2, section 3.3, further teaches that for each task, AutoML ran for one week on CPUs, using 30 machines. In this time, the algorithm explores over 10k models on average per task. The results for the AutoMl benchmark (“no Bert base-line”), fine-tuning, variable fine-tuning, and adapter-tuning are shown in table 2);
	determining, during the same runtime, an update for the adapter parameters, wherein applying the update to the adapter parameters reduces a loss function for the third training task (Houlsby in page 2, Col. 1 lines 14 – 31, teaches that fine tuning involves adjusting the original parameters for each new task. For adapter tuning, a new function is defined, where parameters are copied over from pre-training. Houlsby in page 2 col. 1 lines 45 – 53, shows, on a large and diverse set of text classification tasks, that adapters yield parameter-efficient tuning for NLP. The key innovation is to design an effective adapter module and its integration with the base model. The strategy almost matches the performance of the fully fine-tuned BERT, but uses only 3% task-specific parameters, while fine-tuning uses 100% task-specific parameters. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters);
	applying, during the same runtime, the update to the adapter parameters (Houlsby in page 2, Col. 1 lines 14 – 31, teaches that fine tuning involves adjusting the original parameters for each new task. For adapter tuning, a new function is defined, where parameters are copied over from pre-training. Houlsby in page 2, Col. 2, lines 5 – 11, further teaches a strategy for tuning a large text model on several downstream tasks. One of the strategy's key properties is that it adds only a small number of additional parameters per task. Houlsby in page 4 section 3.2, teaches that the pre-trained BERT large model contains 24 layers and a total of 330M parameters); and
	using, during the same runtime, the updated adapter parameters for a prediction (Houlsby in page 2, Col. 1 lines 14 – 31, teaches that fine tuning involves adjusting the original parameters for each new task. For adapter tuning, a new function is defined, where parameters are copied over from pre-training. Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4, Col. 2, section 3.3, further teaches that for each task, AutoML ran for one week on CPUs, using 30 machines. In this time, the algorithm explores over 10k models on average per task. The results for the AutoML benchmark (“no BERT baseline”), fine-tuning, variable fine-tuning, and adapter-tuning are shown in table 2. Houlsby in page 4 section 3.4, teaches parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks).

	Regarding Claim 20, Houlsby teaches the limitations contained in parent Claim 19. Houlsby further teaches:
	wherein the prediction is for a first NLP task within a category of NLP tasks (Houlsby in page 3 Col. 2 section 3, teaches that adapters achieve parameter efficient transfer for text tasks. On the GLUE benchmark, adapter tuning is within 0.4% of full fine-tuning of BERT, but it adds only 3% of the number of parameters trained by fine-tuning. The result was confirmed on a further 17 public classification tasks and SQuAD question answering. Analysis shows that adapter-based tuning automatically focuses on the higher layers of the network. As indicated above, Houlsby in page 4, Col. 2, section 3.3, further teaches that for each task, AutoML ran for one week on CPUs, using 30 machines. In this time, the algorithm explores over 10k models on average per task), the method further comprising:
	receiving a first input data comprising a first input text and a first task type, the first task type specifying the category of NLP tasks to be performed on the first input text, wherein the first task type specifies the NLP task type specific to the one or more adapter layers (Houlsby in page 1, col 1, Introduction second paragraph, teaches that tasks arrive in a stream. The goal is to build a system that performs well on all of them, but without training an entire new model for every new task. Houlsby in page 2, col 1, lines 32 – Col 2 line 2, teaches that continual learning systems aim to learn from an endless stream of tasks. Adapters differ in that the tasks do not interact and the shared parameters are frozen. This means that the model has a perfect memory of previous tasks using a small number of task-specific parameters. Houlsby demonstrates on a large and diverse set of text classification tasks that adapters yield parameter-efficient tuning for NLP. Adapter-based tuning yields a single, extensible model that attains near state-of-the-art performance in text classification);
	generating a first model tuned to generate predictions for the first NLP task having the first task type, the generating comprising integrating, into the base model, a first model artifact comprising one or more adapter layers specific to the first task type (Houlsby in page 2, col 2 lines 5 – 32, further teaches a strategy for tuning a large text model on several downstream tasks. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. When performing vanilla fine-tuning of deep networks, a modification is made to the top layer of the network. Adapter modules perform more general architectural modifications to re-purpose a pretrained network for a downstream task. The adapter tuning strategy involves injecting new layers into the original network. In adapter tuning, the parameters of the original network are frozen and therefore may be shared by many tasks);
	generating, based on processing the first input text with the first model, a prediction for the first NLP task (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4 section 3.4, teaches parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks); and
	providing the prediction to one or more application instances (Houlsby in page 3 col. 1 line 54 – Col. 2 line 24, teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4 section 3.4, teaches parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks).

Response to Arguments
Applicant's arguments filed 12/01/2022 have been fully considered but they are not persuasive.
(1) Applicant argues that Houlsby is silent as to at least these features: “dynamically generating a first model tuned to generate predictions for a first NLP task having the first task type, the generating comprising integrating, into a base model during runtime, a first model artifact comprising one or more adapter layers specific to the first task type”
“generating, during the same runtime and based on processing the first input text with the first model generated during the same runtime, a prediction for the first NLP task”
Houlsby is silent as to the relative timing between generating the adapter modules and using them, let alone accomplishing both during the same runtime. For example, section 2 of Houlsby describes generating the adapter modules and section 3 describes testing the generated adapter modules. But there is no disclosure on whether both of these are performed “during the same runtime” as claimed. 
The Examiner respectfully disagrees.
Houlsby in page 2, section 2.0 teaches a strategy for tuning a large text model on several downstream tasks, training on tasks sequentially; that is, it does not require simultaneous access to all datasets. The strategy adds only a small number of additional parameters per task. Tuning with adapter modules involves adding a small number of new parameters to a model, which are trained on the downstream task. Houlsby in page 3, Col. 1, further teaches that alongside the layers in the adapter module, new layer normalization parameters per task were also trained. This technique, similar to conditional batch normalization and self-modulation, yields parameter-efficient adaptation of a network with only 2d parameters per layer. Houlsby in page 4, Col. 2, section 3.3, further teaches that for each task, AutoML ran for one week on CPUs, using 30 machines. In this time, the algorithm explores over 10k models on average per task. The results for the AutoML benchmark (“no BERT baseline”), fine-tuning, variable fine-tuning, and adapter-tuning are shown in table 2. Houlsby in page 4 section 3.4, teaches parameter/performance trade-off. Table 2 shows the test accuracy for additional classification tasks.
Accordingly, Houlsby teaches a method for dynamically generating a model by tuning a model for a particular task and determining the accuracy of the model for the particular task. During the same runtime, Houlsby re-purposes a pre-trained model for a particular task and executes the retrained model to calculate the accuracy of the model for the particular task. Therefore, Houlsby teaches or suggests “dynamically generating a first model tuned to generate predictions for a first NLP task having the first task type, the generating comprising integrating, into a base model during runtime, a first model artifact comprising one or more adapter layers specific to the first task type” as claimed in claim 1.
Furthermore, Houlsby in page 3 Col. 2 section 3, teaches that adapters achieve parameter efficient transfer for text tasks. On the GLUE benchmark, adapter tuning is within 0.4% of full fine-tuning of BERT, but it adds only 3% of the number of parameters trained by fine-tuning. The result was confirmed on a further 17 public classification tasks and SQuAD question answering. Analysis shows that adapter-based tuning automatically focuses on the higher layers of the network. As indicated above, Houlsby in page 4, Col. 2, section 3.3, further teaches that for each task, AutoML ran for one week on CPUs, using 30 machines. In this time, the algorithm explores over 10k models on average per task.
Accordingly, Houlsby, during the same runtime, processes the input data for a model and generates a prediction for the particular model for the particular task. Houlsby furthermore explores, during the same runtime, a plurality of models for each task. Therefore, Houlsby teaches or suggests “generating, during the same runtime and based on processing the first input text with the first model generated during the same runtime, a prediction for the first NLP task” as claimed in claim 1.
Applicant's remaining arguments with respect to the claims are substantially encompassed in the arguments above; therefore, the Examiner responds with the same rationale.
For at least the foregoing reasons, Examiner maintains prior art rejections.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ARIEL MERCADO VARGAS whose telephone number is (571)270-1701. The examiner can normally be reached M-F 8:00am - 4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley can be reached on 571-272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ARIEL MERCADO/           Primary Examiner, Art Unit 2176