DETAILED ACTION
1.	This communication is in response to the Application filed on 11/25/2019. Claims 1-26 are pending and have been examined. Claims 27-28 are cancelled.
Allowable Subject Matter
2.	Claims 20 and 23 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
 					Claim Rejections - 35 USC § 103
3.	Claims 1-4, 8, 16-19, 21-22, 24-26 are rejected under 35 U.S.C. 103 as being unpatentable over Yu, et al. (US 20140257803; hereinafter YU) in view of Rabinowitz (US 20210201116; hereinafter RABINOWITZ). 
As per claim 1, YU (Title: CONSERVATIVELY ADAPTING A DEEP NEURAL NETWORK IN A RECOGNITION SYSTEM) discloses “A computer-implemented method, the method comprising: 
obtaining, by one or more computing devices, a machine-learned model that has been previously trained on a first training dataset to perform a first task, the machine-learned model including a first set of learnable parameters (YU, [0002], in a mobile telephone equipped with an ASR <automatic speech recognizer>; [0007], a speaker independent (SI) CD-DNN-HMM system <read on a machine-learned model> that has been trained utilizing training data from a plurality of different users; [0005], ASR systems <read on a first task> that utilize DNNs are trained .. parameters (e.g., weights and weight biases) of the DNN ..); 
modifying, by the one or more computing devices, the machine-learned model to include [ a model patch, the model patch including a second set of learnable parameters ]; and after modifying the machine-learned model to include the model patch, training, by the one or more computing devices, the machine-learned model on a second training dataset to perform a second task that is different from the first task, wherein training, by the one or more computing devices, the machine-learned model on the second training dataset to perform the second task comprises learning new values for the second set of learnable parameters included in the model patch (YU, [0007], adapting at least one parameter <read on learning new values>  of a deep neural network (DNN) that is employed in a recognition system .. a context-dependent deep neural network hidden Markov model (CD-DNN-HMM) .. To improve recognition capabilities of the CD-DNN-HMM system for a particular user or context <read on a different task> .. it may be desirable to adapt the DNN to the particular user or context; [0059], amount of training data to be used when adapting parameters of the DNN <read on the second training dataset>).”  
YU does not expressly disclose “a model patch, the model patch including a second set of learnable parameters ..” However, this feature is taught by RABINOWITZ (Title: Progressive neural networks). See Specification 1A (Patch 1, Patch 2) where “model patch” can be broadly interpreted.
In the same field of endeavor, RABINOWITZ teaches: Fig. 1 and [Abstract] “performing a sequence of machine learning tasks. One system includes a sequence of deep neural networks (DNNs), including: a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks ..” where each DNN includes associated parameters.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of RABINOWITZ in the system taught by YU to provide a sequence of deep neural networks (DNNs) as model patches for performing a sequence of machine learning tasks.
As per claim 2 (dependent on claim 1), YU in view of RABINOWITZ further discloses “learning the new values for the second set of learnable parameters while keeping at least some the first set of learnable parameters fixed (YU, [0007], adapting at least one parameter <read on learning new values, where how many parameters are adapted is a system design choice> of a deep neural network .. To improve recognition capabilities of the CD-DNN-HMM system for a particular user or context .. it may be desirable to adapt the DNN to the particular user or context).”  
As per claim 3 (dependent on claim 1), YU in view of RABINOWITZ further discloses “learning the new values for the second set of learnable parameters while keeping at least a majority of the first set of learnable parameters fixed (YU, [0007], adapting at least one parameter <read on learning new values, where how many parameters are adapted is a system design choice> of a deep neural network .. To improve recognition capabilities of the CD-DNN-HMM system for a particular user or context .. it may be desirable to adapt the DNN to the particular user or context).”  
As per claim 4 (dependent on claim 1), YU in view of RABINOWITZ further discloses “learning the new values for the second set of learnable parameters while keeping all of the first set of learnable parameters fixed (YU, [0007], adapting at least one parameter <read on learning new values, where how many parameters are adapted is a system design choice> of a deep neural network .. To improve recognition capabilities of the CD-DNN-HMM system for a particular user or context .. it may be desirable to adapt the DNN to the particular user or context).”  
As per claim 8 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein: the machine-learned model comprises a plurality of layers; and at least some the second set of learnable parameters included in the model patch comprise one or both of scale and bias parameters for one or more layers of the plurality of layers (YU, [0007], to adapt the DNN <read on a plurality of layers> to the particular user or context; RABINOWITZ, Fig. 1 <also read on a plurality of layers, and where each layer of neural network typically consists of scale and bias parameters>).”   
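For illustration only, the following minimal sketch shows one way the limitations of claims 2-4 and 8 could look in practice: a previously trained parameter is held fixed while only a scale and a bias (standing in for the "model patch") are learned for a second task. It is not taken from YU, RABINOWITZ, or the application; the one-weight model and the names base_w, scale, bias, predict, and train_patch are hypothetical.

```python
# Illustrative only: a one-weight "base model" with a scale/bias "patch".
# All names (base_w, scale, bias, predict, train_patch) are hypothetical.

def predict(x, base_w, scale, bias):
    """Base output base_w * x, followed by the patch's scale and bias."""
    return scale * (base_w * x) + bias


def train_patch(data, base_w, scale=1.0, bias=0.0, lr=0.05, epochs=2000):
    """Gradient descent on mean squared error over the patch parameters only.

    base_w (the first set of learnable parameters) is never updated.
    """
    n = len(data)
    for _ in range(epochs):
        grad_scale = 0.0
        grad_bias = 0.0
        for x, y in data:
            err = predict(x, base_w, scale, bias) - y
            grad_scale += 2.0 * err * (base_w * x) / n
            grad_bias += 2.0 * err / n
        scale -= lr * grad_scale
        bias -= lr * grad_bias
    return scale, bias


# First task (assumed already learned): y = 2x, so base_w = 2.0.
base_w = 2.0
# Second task: y = 6x + 1, reachable with scale = 3 and bias = 1.
second_task = [(x, 6.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
scale, bias = train_patch(second_task, base_w)
```

In this sketch all of the first set of parameters stays fixed (the claim 4 case); updating a subset of base parameters alongside the patch would correspond to the claim 2 and claim 3 cases.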
As per claim 16 (dependent on claim 1), YU in view of RABINOWITZ further discloses “simultaneous with training, by the one or more computing devices, the machine-learned model including the model patch on the second training dataset to perform the second task: training, by the one or more computing devices, the machine-learned model excluding the model patch on the first training dataset to perform the first task (YU, [0005], ASR systems <read on a first task> that utilize DNNs are trained; RABINOWITZ, [Abstract], One system includes a sequence of deep neural networks (DNNs), including: a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <where whether the different tasks are trained simultaneously or not is a system design choice>).”  
As per claim 17 (dependent on claim 16), YU in view of RABINOWITZ further discloses “wherein training, by the one or more computing devices, the machine-learned model excluding the model patch on the first training dataset to perform the first task comprises training, by the one or more computing devices, the machine-learned model excluding the model patch but including an alternative model patch on the first training dataset to perform the first task (YU, [0005], ASR systems <read on a first task> that utilize DNNs are trained; RABINOWITZ, [Abstract], One system includes a sequence of deep neural networks (DNNs), including: a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <where which model patch is included for training is a system design choice>).”
As per claim 18 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein the first task comprises processing of first input data structured according to a first domain and the second task comprises processing of second input data structured according to a second domain that is different than the first domain (RABINOWITZ, [Abstract], One system includes a sequence of deep neural networks (DNNs), including: a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <where each task can be on any domain per system design choice>).”
As per claim 19 (dependent on claim 18), YU in view of RABINOWITZ further discloses “after training, by the one or more computing devices, the machine-learned model on the second training dataset to perform the second task: receiving, by the one or more computing devices, new input data; and when the new input data is structured according to the first domain, employing, by the one or more computing devices, the machine-learned model excluding the model patch to process the new input data to generate a first prediction; and when the new input data is structured according to the second domain, employing, by the one or more computing devices, the machine-learned model including the model patch to process the new input data to generate a second prediction (YU, [0007], a context-dependent deep neural network hidden Markov model (CD-DNN-HMM) .. To improve recognition capabilities of the CD-DNN-HMM system for a particular user or context <read on a different domain> .. it may be desirable to adapt the DNN to the particular user or context <read on to use the original DNN or adapted DNN, i.e., with the model patch which can be broadly interpreted>; RABINOWITZ, [Abstract], a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <where each task can be on any domain per system design choice>).”
As per claim 21 (dependent on claim 18), YU in view of RABINOWITZ further discloses “wherein the first domain comprises a first image resolution, the first task comprises processing imagery of the first input resolution, the second domain comprises a second image resolution that is smaller than the first image resolution, and the second task comprises processing imagery of the second input resolution (RABINOWITZ, [Abstract], One system includes a sequence of deep neural networks (DNNs), including: a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <where each different task reads on image recognition under different resolution>).”
As per claim 22 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein the machine-learned model comprises a neural network and the model patch comprises a patch subnetwork (RABINOWITZ, [Abstract], One system includes a sequence of deep neural networks (DNNs), including: a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <where each subsequent DNN reads on a patch subnetwork which can be broadly interpreted>).”
As per claim 24 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein the first task comprises object detection and the second task comprises image classification (RABINOWITZ, [Abstract], One system includes a sequence of deep neural networks (DNNs), including: a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <where each task can be object detection or image classification or else per system design choice>).”
Claim 25 (similar in scope to claim 1) is rejected under the same rationales as applied above for claim 1. Furthermore, YU teaches: [0064] “the functions can be stored on or transmitted over as one or more instructions or code on a computer-readable medium” and RABINOWITZ teaches: [0068] “to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.” Both teach ready mechanisms for transmitting any information (such as the first set but not the second set of learnable parameters) between any two devices per system design choice. Note that Claim 25 recites “by the one or more computing devices,” which lacks antecedent basis and is rejectable under 35 U.S.C. 112(b).
Claim 26 (similar in scope to claim 1) is rejected under the same rationales as applied above for claim 1. Note that the references cited in Claim 1 teach ready mechanisms to realize all the limitations recited in Claim 26.

4.	Claims 5-7, 11, 15 are rejected under 35 U.S.C. 103 as being unpatentable over YU in view of RABINOWITZ, and further in view of Cai, et al. (IEEE International Conference on Multimedia and Expo (ICME), 2018; hereinafter CAI).
As per claim 5 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein, after modification of the machine-learned model to include the model patch, at least [a portion of the model patch is positioned structurally prior to a final layer of the machine-learned model ].”
YU in view of RABINOWITZ does not expressly disclose “a portion of the model patch is positioned structurally prior to a final layer of the machine-learned model ..” However, this feature is taught by CAI (Title: Enhancing CNN Incremental Learning Capability with an Expanded Network). 
In the same field of endeavor, CAI teaches: Fig. 2; [Introduction, para 3] “keeps filters of the original networks on one hand, yet adds additional filters to the convolutional layers and the fully connected (FC) layers on the other hand.” Note that where a portion of the model patch is placed is a system design choice.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of CAI in the system taught by YU and RABINOWITZ to place a portion of the model patch per system design choice.
As per claim 6 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein, after modification of the machine-learned model to include the model patch, at least [a portion of the model patch is included in an intermediate layer of the machine-learned model].”
YU in view of RABINOWITZ does not expressly disclose “a portion of the model patch is included in an intermediate layer of the machine-learned model ..” However, this feature is taught by CAI (Title: Enhancing CNN Incremental Learning Capability with an Expanded Network). 
In the same field of endeavor, CAI teaches: Fig. 2; [Introduction, para 3] “keeps filters of the original networks on one hand, yet adds additional filters to the convolutional layers and the fully connected (FC) layers on the other hand.” Note that where a portion of the model patch is placed is a system design choice.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of CAI in the system taught by YU and RABINOWITZ to place a portion of the model patch per system design choice.
As per claim 7 (dependent on claim 5), YU in view of RABINOWITZ further discloses “wherein the model patch further includes the final layer of the machine-learned model (CAI, Fig. 2; [Introduction, para 3], keeps filters of the original networks on one hand, yet adds additional filters to the convolutional layers and the fully connected (FC) layers on the other hand <Note that what the machine-learned model includes is a system design choice>).”
As per claim 11 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein: the machine-learned model comprises a convolutional machine-learned model that includes one or more convolutional filters; and modifying, by the one or more computing devices, the machine-learned model to include the model patch comprises [ replacing, by the one or more computing devices, at least one of the convolutional filters with a reduced-parameter version of the convolutional filter ] (RABINOWITZ, [Abstract], a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <read on model patches>).”    
YU in view of RABINOWITZ does not expressly disclose “replacing .. at least one of the convolutional filters with a reduced-parameter version of the convolutional filter ..” However, this feature is taught by CAI (Title: Enhancing CNN Incremental Learning Capability with an Expanded Network). 
In the same field of endeavor, CAI teaches: Fig. 2, 3; [Introduction, para 3] “keeps filters of the original networks on one hand, yet adds additional filters to the convolutional layers and the fully connected (FC) layers on the other hand” and [Conclusion] “modifications such as pruning [16] can be used to reduce the size of the ExpandNet.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of CAI in the system taught by YU and RABINOWITZ to reduce the complexity of any neural network per system design choice.
As per claim 15 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein: the machine-learned model comprises a plurality of layers; and the model patch comprises at least one additional intermediate layer that is structurally positioned between at least two of the plurality of layers (see Claim 5 rejections).”

5.	Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over YU in view of RABINOWITZ, and further in view of Lillicrap, et al. (US 20170024643; hereinafter LILLICRAP).
As per claim 9 (dependent on claim 8), YU in view of RABINOWITZ further discloses “wherein the scale and bias parameters for the one or more layers comprise scale and bias parameters for one or more [ batch normalization ] operations performed respectively for the one or more layers (RABINOWITZ, [Abstract], deep neural networks (DNNs) <where each layer of neural network typically consists of scale and bias parameters>).”   
YU in view of RABINOWITZ does not expressly disclose “batch normalization ..” However, this feature is taught by LILLICRAP (Title: Continuous control with deep reinforcement learning).
In the same field of endeavor, LILLICRAP teaches: [0028] “the critic neural network 140, the actor neural network 110, or both include one or more batch normalization layers in order to minimize covariance shift during training.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of LILLICRAP in the system taught by YU and RABINOWITZ to provide batch normalization for more efficient training.
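For illustration only, the following minimal sketch shows where scale and bias parameters sit in a batch normalization operation: the batch is normalized to zero mean and roughly unit variance, then multiplied by a learnable scale and shifted by a learnable bias. It is not taken from LILLICRAP or the application; the function name batch_norm and the parameter names gamma, beta, and eps are assumptions following common usage.

```python
import math

# Illustrative only: batch normalization over a batch of scalar activations.
# gamma (scale) and beta (bias) are the learnable parameters; names are assumed.

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars, then apply the learnable scale and bias."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(batch)                    # scale 1, bias 0
shifted = batch_norm(batch, gamma=2.0, beta=0.5)  # same batch, different scale and bias
```

Because gamma and beta are the only learnable quantities in the operation, they can be retrained for a new task without touching the weights of the surrounding layers.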
As per claim 10 (dependent on claim 8), YU in view of RABINOWITZ further discloses “wherein the scale and bias parameters for the one or more layers comprise scale and bias parameters for one or more layer normalization operations, [ one or more batch renormalization operations ], or one or more group normalization operations performed respectively for the one or more layers.”
YU in view of RABINOWITZ does not expressly disclose “one or more batch renormalization operations ..” However, this feature is taught by LILLICRAP (Title: Continuous control with deep reinforcement learning).
In the same field of endeavor, LILLICRAP teaches: [0028] “the critic neural network 140, the actor neural network 110, or both include one or more batch normalization layers in order to minimize covariance shift during training.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of LILLICRAP in the system taught by YU and RABINOWITZ to provide batch normalization for more efficient training.

6.	Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over YU in view of RABINOWITZ, and further in view of Roblek, et al. (US 20170330586; hereinafter ROBLEK).
As per claim 12 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein: the machine-learned model comprises a convolutional machine-learned model that includes one or more convolutional filters; and modifying, by the one or more computing devices, the machine-learned model to include the model patch comprises replacing, by the one or more computing devices, at least one of the convolutional filters with [ a depth-wise separable convolution ] (RABINOWITZ, [Abstract], a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <where each subsequent DNN reads on a model patch>).”
YU in view of RABINOWITZ does not expressly disclose “a depth-wise separable convolution.” However, this feature is taught by ROBLEK (Title: Frequency based audio analysis using neural networks).
In the same field of endeavor, ROBLEK teaches: [0019] “convolutional neural network stage comprises at least (i) a 1×1×2 convolutional layer with row stride of 1 followed by a max pooling layer with row stride 2, (ii) a 3×1×D convolutional layer, and (iii) a 3×1×D depthwise-separable convolution with row stride 1 followed by a max pooling layer with row stride 2, wherein D represents a convolutional layer filter depth.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of ROBLEK in the system taught by YU and RABINOWITZ to provide a depth-wise separable convolution.
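For illustration only, the following minimal sketch compares the weight count of a standard k×k convolution with that of a depth-wise separable convolution (a k×k depth-wise step followed by a 1×1 point-wise step), which is why the latter is commonly regarded as a reduced-parameter substitute. It is not taken from ROBLEK or the application; the function names and the example sizes (3×3 kernel, 64 input channels, 128 output channels) are hypothetical, and bias terms are ignored.

```python
# Illustrative only: weight counts with bias terms ignored. Names and sizes are assumed.

def standard_conv_params(k, c_in, c_out):
    """A k x k convolution mixes all input channels for every output channel."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """One k x k filter per input channel, then a 1 x 1 convolution across channels."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


standard = standard_conv_params(3, 64, 128)         # 3*3*64*128 = 73728
separable = depthwise_separable_params(3, 64, 128)  # 3*3*64 + 64*128 = 8768
```

For these example sizes the separable form uses roughly one eighth of the weights of the standard filter.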
 
7.	Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over YU in view of RABINOWITZ, and further in view of Hu, et al. (IEEE Transactions on Pattern Analysis and Machine Intelligence, Sep. 2017; hereinafter HU).
As per claim 13 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein: the machine-learned model comprises a plurality of layers; and at least some the second set of learnable parameters included in the model patch comprise parameters included in one or both of [ a squeeze function or an excite function ] for one or more layers of the plurality of layers (RABINOWITZ, [Abstract], a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <where each subsequent DNN reads on a model patch>).”
YU in view of RABINOWITZ does not expressly disclose “a squeeze function or an excite function ..” However, this feature is taught by HU (Title: Squeeze-and-Excitation Networks).
In the same field of endeavor, HU teaches: [Abstract] “Convolutional neural networks ..  the “Squeeze-and-Excitation” (SE) block .. SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of HU in the system taught by YU and RABINOWITZ to provide “Squeeze-and-Excitation” (SE) blocks for neural network operation.
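For illustration only, the following minimal sketch shows the general shape of a squeeze function and an excite function: the squeeze step reduces each channel to a single average, and the excite step passes those averages through two small fully connected layers (ReLU, then sigmoid) to produce a per-channel gate between 0 and 1 that rescales the channel. It is a simplified reading of the SE block and is not code from HU or the application; the function names and the weight matrices w1 and w2 are hypothetical.

```python
import math

# Illustrative only: a squeeze-and-excite style block on nested Python lists.
# feature_maps is a list of channels; each channel is a list of rows of floats.
# w1 and w2 are hypothetical weight matrices for the two fully connected layers.

def squeeze(feature_maps):
    """Global average pooling: one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]


def excite(z, w1, w2):
    """Two fully connected layers: ReLU, then sigmoid, giving one gate per channel."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]


def se_block(feature_maps, w1, w2):
    """Rescale every value in a channel by that channel's gate."""
    gates = excite(squeeze(feature_maps), w1, w2)
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature_maps)]


maps = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 2.0], [2.0, 0.0]]]  # 2 channels, 2x2 each
w1 = [[0.5, 0.5]]      # 2 channels reduced to 1 hidden unit
w2 = [[1.0], [-1.0]]   # 1 hidden unit expanded back to 2 gates
out = se_block(maps, w1, w2)
```

The weights w1 and w2 are the learnable parameters of the block, so they are the quantities that would be trained if such a block were added to an existing network.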
8.	Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over YU in view of RABINOWITZ, and further in view of Danihelka (US 20170228633; hereinafter DANIHELKA).
As per claim 14 (dependent on claim 1), YU in view of RABINOWITZ further discloses “wherein: the machine-learned model comprises a plurality of layers; and at least some the second set of learnable parameters included in the model patch comprise parameters included in [ a gating function ] for one or more layers of the plurality of layers (RABINOWITZ, [Abstract], a first DNN corresponding to a first machine learning task .. and one or more subsequent DNNs corresponding to one or more respective machine learning tasks <where each subsequent DNN reads on a model patch>).”
YU in view of RABINOWITZ does not expressly disclose “a gating function ..” However, this feature is taught by DANIHELKA (Title: Generative neural networks).
In the same field of endeavor, DANIHELKA teaches: [0042] “Each LSTM memory block can include one or more cells that each include an input gate, a forget gate, and an output gate that allow the cell to store previous activations generated by the cell, e.g., as a hidden state for use in generating a current activation or to be provided to other components of the LSTM neural network” and [0068] “a convolutional gated recurrent unit (CGRU) neural network.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of DANIHELKA in the system taught by YU and RABINOWITZ to provide a gating function for neural network operation.
						Conclusion 
9.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG, whose telephone number is (571)272-4609. The examiner can normally be reached M-F (8:00-5:30). The fax phone number for the organization where this application or proceeding is assigned is 571-273-4609.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir (SPE), can be reached at (571)272-7799.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FENG-TZER TZENG/		7/29/2022
Primary Examiner, Art Unit 2659