DETAILED ACTION
1. 	This action is in response to the application filed 12/10/2018 which claims foreign priority to KR10-2018-0144354 filed on 11/21/2018. Claims 1-20 are pending and have been considered. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/12/2018 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Specification
The disclosure is objected to because of the following informalities: In paragraph [0026], line 14 "dataset ." should read "dataset.".  
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
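(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.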

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
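This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: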
Meta model unit configured to determine in claim 13.
Meta model training unit configured to generate in claim 14.
Transfer learning unit configured to perform transfer learning in claim 15. 
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 13, 14, 16, 19, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

Regarding claim 13, 
Step 1 Analysis: Claim 13 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 13 recites, in part, determine a form and amount of information to be transferred, generating an attention map to be used for transfer learning, determining the form of information to be transferred, and determining the amount of data to be transferred. These limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind or with pen and paper, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements – “meta model unit”, “first meta model”, and “second meta model”. These elements invoke 112(f) and can be interpreted to be a processor as disclosed in ¶[0097] of the specification. Thus, the elements in the claim are recited at a high level of generality (i.e. as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a meta model unit, first meta model and second meta model to perform the steps of the claimed process amount to no more than mere instructions to apply an exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible. 

Regarding claim 14, the rejection of claim 13 is further incorporated, and further, the claim recites: further comprising a meta model training unit configured to generate a virtual source dataset and a virtual target dataset through the source dataset used by the pre-trained model, train a virtual pre-trained model and a virtual target model, and train the meta model in order to be of help to training. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 13 above. 
The claim recites the additional element “transfer learning unit”, however it does not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 13 above. The claim is not patent eligible. 

Regarding claim 16, the rejection of claim 13 is further incorporated, and further, the claim recites: wherein: the amount of data to be transferred is a constant value output through the second meta model, and the constant value is differently applied for each pair of layers. The claim recites additional mathematical steps in addition to the mental steps identified in the rejection of claim 13, and thus recites a judicial exception. 
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 19, the rejection of claim 14 is further incorporated, and further, the claim recites: wherein the meta model training unit trains the meta model and the virtual target model to minimize a loss function. The claim recites additional mathematical steps in addition to the mental steps identified in the rejection of claim 13, thus recites a judicial exception. 
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 20, the rejection of claim 13 is further incorporated, and further, the claim recites: wherein: the pre-trained model and the target model comprise a deep learning model, and the target model is trained through the new target dataset using a previously trained deep learning model. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 13 above. 


Claims 1-12, 15, and 17 recite additional elements or steps that amount to a practical application of the abstract idea or significantly more than the exception and would be eligible if incorporated into the respective parent independent claim. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.

4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Luo et al. ("Label Efficient Learning of Transferable Representations across Domains and Tasks", hereinafter "Luo") in view of Finn et al. ("Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks", hereinafter "Finn").

Regarding claim 1, Luo teaches A transfer learning method [See §3 Method], comprising steps of: determining a form and amount of information to be transferred, used by a pre-trained model, (“Fine-tuning has been broadly applied to reduce the number of labeled examples needed for learning new tasks, such as recognizing new object categories after ImageNet pre-training [54, 18], or learning new label structures such as detection after classification pre-training [14, 50]. Here we focus on transfer in the case of a shared label structure (e.g. classification of different category sets) We assume the source domain contains ns images, xs ∈ XS , with associated labels, ys ∈ YS . Similarly, the target domain consists of nt unlabeled images, x˜t ∈ X˜T, as well as mt images, xt ∈ XT, with associated labels, yt ∈ YT . We assume that the target domain is only sparsely labeled so that the number of image-label pairs is much smaller than the number of unlabeled images, mt << nt. Additionally, the number of source labeled images is assumed to be much larger than the number of target labeled images” [pg. 3, §3. Method, ¶2-3; Luo discloses classification pre-training which would implicitly correspond to a pre-trained model. Examiner is interpreting a form of information to correspond to an image.]) based on similarity between a source dataset and a new target dataset (“For each unlabeled target image, x˜t , we compute the similarity, ψ(·), to each labeled example or to each prototypical example [56] per class in the labeled set. For simplicity of presentation let us consider semantic transfer from the source to the target domain first. For each target unlabeled image we compute a similarity vector where the ith element is the similarity between this target image and the ith labeled source image” [pg. 5, §3.3 Cross category similarity for semantic transfer]); 
and performing transfer-learning on a target model (“We then initialize the target model (depicted in green in Figure 1) with the source parameters and begin our adaptive transfer learning.” [pg. 4, §3.1 Joint domain and semantic transfer, ¶2]).
However Luo fails to explicitly teach using a meta model 
using the form and amount of information of the pre-trained model determined by the meta model
Finn teaches using a meta model (See § 1. Introduction ¶2, Finn discloses a meta learning algorithm which trains a meta model)
using the form and amount of information of the pre-trained model determined by the meta model (“We consider a model, denoted f, that maps observations x to outputs a. During meta-learning, the model is trained to be able to adapt to a large or infinite number of tasks.” [pg. 2, §2.1. Meta-Learning Problem Set-Up, ¶2; Finn discloses a meta learning model which would correspond to a meta model, the model would be able to determine and modify information (i.e. tasks)])
	Luo and Finn are both in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings with Finn’s teachings to include a meta model as a part of the transfer learning method. One would have been motivated to use a meta model because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2] 
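For illustration only, the similarity vector described in the Luo passage quoted above for claim 1 can be sketched as follows. The quoted text does not specify the similarity function ψ(·); the dot product of feature embeddings used here, the function names `dot` and `similarity_vector`, and the sample embeddings are all assumptions made for this sketch and are not taken from Luo or from the application.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product, used here as an assumed stand-in for the similarity psi."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return sum(a * b for a, b in zip(u, v))


def similarity_vector(
    target_embedding: Sequence[float],
    source_embeddings: Sequence[Sequence[float]],
) -> List[float]:
    """Element i is the similarity between the target image and the
    i-th labeled source image."""
    return [dot(target_embedding, s) for s in source_embeddings]


# Hypothetical 2-D embeddings: three labeled source images, one unlabeled target image.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [2.0, 3.0]
sims = similarity_vector(target, source)
print(sims)  # [2.0, 3.0, 5.0]
```

The sketch produces one similarity value per labeled source image, which is the only property the quoted passage states; any normalization of the vector would be a further assumption.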

	Regarding claim 8, the combination of Luo and Finn teaches The transfer learning method of claim 1, where Luo further teaches wherein: the pre-trained model and the target model comprise a deep learning model, and the target model is trained through the new target dataset using a previously trained deep learning model (“In addition, this setting matches closely to the most common practical approach for training deep models which is to use a large labeled source dataset (often ImageNet [6, 52]) to train an initial representation and then to continue supervised learning with a new set of data and often with new concepts.” [pg. 1, § 1. Introduction, ¶3, lines 6-9; Luo discloses training deep models using pre-trained models (see §3. Methods) which would correspond to previously trained deep model.]).
Regarding claim 9, Luo teaches A transfer learning system implemented as a computer (“In computer vision, examples of transfer learning include which try to overcome the deficit of training samples for some categories by adapting classifiers trained for other categories” [pg. 2, § 2 Related work, ¶2; computer vision would implicitly need to use a computer to perform the transfer learning]), comprising: at least one processor implemented to execute instructions readable by a computer, wherein the at least one processor is configured to: determine a form and amount of information to be transferred, used by a pre-trained model (“Fine-tuning has been broadly applied to reduce the number of labeled examples needed for learning new tasks, such as recognizing new object categories after ImageNet pre-training [54, 18], or learning new label structures such as detection after classification pre-training [14, 50]. Here we focus on transfer in the case of a shared label structure (e.g. classification of different category sets) We assume the source domain contains ns images, xs ∈ XS , with associated labels, ys ∈ YS . Similarly, the target domain consists of nt unlabeled images, x˜t ∈ X˜T, as well as mt images, xt ∈ XT, with associated labels, yt ∈ YT . We assume that the target domain is only sparsely labeled so that the number of image-label pairs is much smaller than the number of unlabeled images, mt << nt. Additionally, the number of source labeled images is assumed to be much larger than the number of target labeled images” [pg. 3, §3. Method, ¶2-3; Luo discloses classification pre-training which would implicitly correspond to a pre-trained model. Examiner is interpreting a form of information to correspond to an image.]), based on similarity between a source dataset and a new target dataset (“For each unlabeled target image, x˜t , we compute the similarity, ψ(·), to each labeled example or to each prototypical example [56] per class in the labeled set. For simplicity of presentation let us consider semantic transfer from the source to the target domain first. For each target unlabeled image we compute a similarity vector where the ith element is the similarity between this target image and the ith labeled source image” [pg. 5, §3.3 Cross category similarity for semantic transfer]); and perform transfer-learning on a target model (“We then initialize the target model (depicted in green in Figure 1) with the source parameters and begin our adaptive transfer learning.” [pg. 4, §3.1 Joint domain and semantic transfer, ¶2]).
Luo fails to explicitly teach using a meta model 
using the form and amount of information of the pre-trained model determined by the meta model.
Finn teaches using a meta model (See § 1. Introduction ¶2, Finn discloses a meta learning algorithm which trains a meta model)
using the form and amount of information of the pre-trained model determined by the meta model (“We consider a model, denoted f, that maps observations x to outputs a. During meta-learning, the model is trained to be able to adapt to a large or infinite number of tasks.” [pg. 2, §2.1. Meta-Learning Problem Set-Up, ¶2; Finn discloses a meta learning model which would correspond to a meta model, the model would be able to determine and modify information (i.e. tasks)])
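Luo and Finn are both in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings with Finn’s teachings to include a meta model as a part of the transfer learning system. One would have been motivated to use a meta model because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]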

Claims 2, 7, 10 are rejected under 35 U.S.C. 103 as being unpatentable over Luo in view of Finn and in further view of Peng et al. ("VisDA: The Visual Domain Adaptation Challenge", hereinafter "Peng").

Regarding claim 2, the combination of Luo and Finn teaches The transfer learning method of claim 1, where Finn further teaches and training the meta model in order to be of help to the training (“In our meta-learning scenario, we consider a distribution over tasks p(T ) that we want our model to be able to adapt to. In the K-shot learning setting, the model is trained to learn a new task Ti drawn from p(T) from only K samples drawn from qi and feedback LTi generated by Ti . During meta-training, a task Ti is sampled from p(T ), the model is trained with K samples and feedback from the corresponding loss LTi from Ti, and then tested on new samples from Ti. The model f is then improved by considering how the test error on new data from qi changes with respect to the parameters.” [pg. 2, 2.1. Meta-Learning Problem Set-Up, ¶3; training the meta model would be a part of the transfer learning process thus would correspond to “helping” the training]).

Peng teaches further comprising a step of generating a virtual source dataset and a virtual target dataset through the source dataset used by the pre-trained model (“The goal in both tracks is to first train a model on simulated, synthetic data in the source domain and then adapt it to perform well on real image data in the unlabeled test domain. Our dataset is the largest one to date for cross-domain object classification, with over 280K images across 12 categories in the combined training, validation and testing domains. The image segmentation dataset is also large-scale with over 30K images across 18 categories in the three domains” [Abstract, pg. 1, see further Table 2 for existing synthetic objects datasets which the examiner interprets to be a virtual target dataset.]), training a virtual pre-trained model and a virtual target model (“We perform in-domain (i.e. train and test on the same domain) experiments to obtain approximate “oracle” performance, as well as source-only (i.e. train only on the source domain) to obtain the lower bound results of no adaptation. In total, we have 152,397 images as the source domain and 55,388 images as the target domain for validation. In our in-domain experiments, we follow a 70%/30% split for training and testing, i.e., 106,679 training images, 45,718 test images for the synthetic domain and 38,772 training images, 16,616 test images for the real domain” [pg. 4, § 3.2. Experiments, lines 8-12; synthetic domain would correspond to a virtual target model, experiment is done using synthetic source domain (corresponds to virtual pre-trained model) see pg. 5, right col, ¶1])
Luo, Finn and Peng are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Peng discloses visual adaptation challenge which uses synthetic datasets and adapts it to a validation and real target dataset. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to include synthetic datasets and models as part of the transfer learning method. One would be motivated to use synthetic models and datasets in order to train and test the model to transfer knowledge from a large source dataset to an unlabeled target dataset. [See Fig. 1, pg. 1, Peng] 
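For illustration only, the meta-training loop described in the Finn passage quoted above for claim 2 (sample a task, train on K samples, test on new samples from the same task, then improve the model from the test error) can be sketched as follows. This is a simplified sketch and not Finn's implementation: the scalar model f(x) = theta * x, the slope-regression task distribution, the step sizes `alpha` and `beta`, and all function names are assumptions chosen so that the gradients can be written out by hand.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def mse(theta: float, data: List[Sample]) -> float:
    """Mean squared error of the scalar model f(x) = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def grad(theta: float, data: List[Sample]) -> float:
    """Derivative of mse with respect to theta."""
    return sum(2.0 * (theta * x - y) * x for x, y in data) / len(data)


def sample_task(rng: random.Random, k: int) -> Tuple[List[Sample], List[Sample]]:
    """Draw a task (an assumed random slope) and return K training
    samples plus K new test samples from that same task."""
    slope = rng.uniform(-2.0, 2.0)

    def draw() -> List[Sample]:
        xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        return [(x, slope * x) for x in xs]

    return draw(), draw()


def adapt(theta: float, train: List[Sample], alpha: float) -> float:
    """One gradient step on the K training samples of a task."""
    return theta - alpha * grad(theta, train)


def meta_train(steps: int = 200, k: int = 5, alpha: float = 0.5,
               beta: float = 0.05, seed: int = 0) -> float:
    rng = random.Random(seed)
    theta = 1.5  # arbitrary starting parameter
    for _ in range(steps):
        train, test = sample_task(rng, k)
        adapted = adapt(theta, train, alpha)
        # Chain rule: d(adapted)/d(theta) = 1 - alpha * (second derivative of train loss).
        curvature = sum(2.0 * x * x for x, _ in train) / len(train)
        meta_grad = grad(adapted, test) * (1.0 - alpha * curvature)
        # Improve theta based on how the test error changes with respect to it.
        theta -= beta * meta_grad
    return theta


theta = meta_train()
new_train, new_test = sample_task(random.Random(1), 5)
print(mse(theta, new_test), mse(adapt(theta, new_train, 0.5), new_test))
```

The two printed values are the test error on a new task before and after a single adaptation step; the sketch is intended only to show the sample, adapt, test, and update structure of the quoted passage.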

Regarding claim 7, the combination of Luo, Finn, and Peng teaches The transfer learning method of claim 2, wherein in the step of training the meta model, where Finn further teaches the meta model are trained to minimize a loss function (“Formally, each task T = {L(x1, a1, . . . , xH, aH), q(x1), q(xt+1|xt, at), H} consists of a loss function L, a distribution over initial observations q(x1), a transition distribution q(xt+1|xt, at), and an episode length H.” [pg. 2, §2.1. Meta-Learning Problem Set-Up, ¶2; note: Loss functions are always minimized]). 
Peng further teaches the virtual target model are trained to minimize a loss function (“It improved their source only ResNet-152 model from 45.3% to 92.8%, a 104% relative improvement. Their method consisted of optimizing two losses: 1) a mean cross entropy between ground truth and predictions of the so-called student network on samples from the source domain, and 2) a mean square difference between predictions of student and teacher networks on all samples from both domains.” [pg. 6, top left col, ¶1; optimizing loss would be equivalent to minimizing a loss function.]).
Luo, Finn and Peng are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Peng discloses visual adaptation challenge which uses synthetic datasets and adapts it to a validation and real target dataset. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to include synthetic datasets and models as part of the transfer learning method. One would be motivated to use synthetic models and datasets in order to train and test the model to transfer knowledge from a large source dataset to an unlabeled target dataset. [See Fig. 1, pg. 1, Peng] 
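For illustration only, the two losses named in the Peng passage quoted above for claim 7 (a mean cross entropy between ground truth and student predictions, and a mean square difference between student and teacher predictions) can be sketched as follows. The function names, the toy probabilities, and the choice to average the squared difference over every class entry are assumptions made for this sketch; the quoted passage does not state how the two losses are weighted or combined.

```python
import math
from typing import Sequence


def mean_cross_entropy(probs: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> float:
    """Mean of -log p[label], where probs holds the student network's
    class probabilities for labeled source samples."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def mean_square_difference(student: Sequence[Sequence[float]],
                           teacher: Sequence[Sequence[float]]) -> float:
    """Mean squared difference between student and teacher predictions,
    averaged over every entry (an assumed convention)."""
    total = 0.0
    count = 0
    for s_row, t_row in zip(student, teacher):
        for s, t in zip(s_row, t_row):
            total += (s - t) ** 2
            count += 1
    return total / count


# Hypothetical two-class predictions for two samples.
student = [[0.5, 0.5], [0.25, 0.75]]
teacher = [[0.5, 0.5], [0.75, 0.25]]
labels = [0, 1]

ce = mean_cross_entropy(student, labels)
msd = mean_square_difference(student, teacher)
print(ce, msd)
```

Both quantities are non-negative and shrink as the predictions improve or agree, which is the sense in which optimizing them corresponds to minimizing a loss function.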

Regarding claim 10, the combination of Luo and Finn teaches The transfer learning system of claim 9, where Finn further teaches wherein the at least one processor is configured to:
and train the meta model in order to be of help to the training (“In our meta-learning scenario, we consider a distribution over tasks p(T ) that we want our model to be able to adapt to. In the K-shot learning setting, the model is trained to learn a new task Ti drawn from p(T) from only K samples drawn from qi and feedback LTi generated by Ti . During meta-training, a task Ti is sampled from p(T ), the model is trained with K samples and feedback from the corresponding loss LTi from Ti, and then tested on new samples from Ti. The model f is then improved by considering how the test error on new data from qi changes with respect to the parameters.” [pg. 2, 2.1. Meta-Learning Problem Set-Up, ¶3; training the meta model would be a part of the transfer learning process thus would correspond to “helping” the training]).
However the combination of Luo and Finn fails to explicitly teach generate a virtual source dataset and a virtual target dataset through the source dataset used by the pre-trained model, train a virtual pre-trained model and a virtual target model
Peng teaches generate a virtual source dataset and a virtual target dataset through the source dataset used by the pre-trained model (“The goal in both tracks is to first train a model on simulated, synthetic data in the source domain and then adapt it to perform well on real image data in the unlabeled test domain. Our dataset is the largest one to date for cross-domain object classification, with over 280K images across 12 categories in the combined training, validation and testing domains. The image segmentation dataset is also large-scale with over 30K images across 18 categories in the three domains” [Abstract, pg. 1, see further Table 2 for existing synthetic objects datasets which the examiner interprets to be a virtual target dataset.]), train a virtual pre-trained model and a virtual target model (“We perform in-domain (i.e. train and test on the same domain) experiments to obtain approximate “oracle” performance, as well as source-only (i.e. train only on the source domain) to obtain the lower bound results of no adaptation. In total, we have 152,397 images as the source domain and 55,388 images as the target domain for validation. In our in-domain experiments, we follow a 70%/30% split for training and testing, i.e., 106,679 training images, 45,718 test images for the synthetic domain and 38,772 training images, 16,616 test images for the real domain” [pg. 4, § 3.2. Experiments, lines 8-12; synthetic domain would correspond to a virtual target model, experiment is done using synthetic source domain (corresponds to virtual pre-trained model) see pg. 5, right col, ¶1])
Luo, Finn and Peng are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Peng discloses visual adaptation challenge which uses synthetic datasets and adapts it to a validation and real target dataset. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to include synthetic datasets and models as part of the transfer learning method. One would be motivated to use synthetic models and datasets in order to train and test the model to transfer knowledge from a large source dataset to an unlabeled target dataset. [See Fig. 1, pg. 1, Peng] 
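For illustration only, the 70%/30% split figures in the Peng passage quoted above for claim 10 can be checked with simple arithmetic. The image counts are copied from the quotation; the dictionary layout and the helper name `check_split` are assumptions made for this sketch.

```python
# Image counts as quoted from Peng, § 3.2.
splits = {
    "synthetic": {"total": 152_397, "train": 106_679, "test": 45_718},
    "real": {"total": 55_388, "train": 38_772, "test": 16_616},
}


def check_split(total: int, train: int, test: int) -> float:
    """Confirm that train + test equals the domain total and return
    the training fraction."""
    if train + test != total:
        raise ValueError("train and test counts do not add up to the total")
    return train / total


for name, counts in splits.items():
    print(name, round(check_split(**counts), 3))
# synthetic 0.7
# real 0.7
```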


Claims 3-6, 11-13, 15-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Luo in view of Finn and further in view of Zagoruyko et al. ("Paying .

Regarding claim 3, the combination of Luo and Finn teaches The transfer learning method of claim 1, Luo further teaches wherein the step of determining a form and amount of information to be transferred using a meta model comprises steps of: 
and determining an amount of data to be transferred in each of the pre-trained model and the target model (“Fine-tuning has been broadly applied to reduce the number of labeled examples needed for learning new tasks, such as recognizing new object categories after ImageNet pre-training [54, 18], or learning new label structures such as detection after classification pre-training [14, 50]. Here we focus on transfer in the case of a shared label structure (e.g. classification of different category sets) We assume the source domain contains ns images, xs ∈ XS , with associated labels, ys ∈ YS . Similarly, the target domain consists of nt unlabeled images, x˜t ∈ X˜T, as well as mt images, xt ∈ XT, with associated labels, yt ∈ YT . We assume that the target domain is only sparsely labeled so that the number of image-label pairs is much smaller than the number of unlabeled images, mt << nt. Additionally, the number of source labeled images is assumed to be much larger than the number of target labeled images” [pg. 3, §3. Method, ¶2-3; Luo discloses classification pre-training which would implicitly correspond to a pre-trained model. Examiner is interpreting a form of information to correspond to an image.]), 
(“For each target unlabeled image we compute a similarity vector where the ith element is the similarity between this target image and the ith labeled source image” [pg. 5, §3.3 Cross category similarity for semantic transfer])
Finn further teaches using a second meta model (“We consider a model, denoted f, that maps observations x to outputs a. During meta-learning, the model is trained to be able to adapt to a large or infinite number of tasks.” [pg. 2, §2.1. Meta-Learning Problem Set-Up, ¶2; Finn discloses a meta model that is able to modify a large number of tasks.]).
However the combination of Luo and Finn fails to explicitly teach generating an attention map to be used for the transfer learning as output when a feature map of the pre-trained model or target model is input to a first meta model as input and determining the form of information to be transferred in the transfer learning;
Zagoruyko teaches generating an attention map to be used for the transfer learning as output when a feature map of the pre-trained model or target model is input to a first meta model as input and determining the form of information to be transferred in the transfer learning; (“Let us consider a CNN layer and its corresponding activation tensor A ∈ RC×H×W, which consists of C feature planes with spatial dimensions H ×W. An activation-based mapping function F (w.r.t. that layer) takes as input the above 3D tensor A and outputs a spatial attention map, i.e., a flattened 2D tensor defined over the spatial dimensions” [pg. 3, § 3.1 Activation-based Attention Transfer, ¶1; note: Zagoruyko discloses a model which performs attention transfer using student and teacher networks. Additionally, the form of information to be transferred would be an image (See Figure 1, pg. 2)]);
Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to substitute Zagoruyko’s model with a meta model disclosed by Finn to perform the attention transfer step. One would have been motivated to use meta models because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]
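For illustration only, the activation-based mapping quoted from Zagoruyko (a C×H×W activation tensor in, a flattened spatial attention map out) might be sketched as follows. This is a minimal sketch and not the reference's implementation: it assumes the mapping sums squared activations across the C feature planes, which is only one possible choice of F, and the function name attention_map and the nested-list tensor layout are hypothetical.

```python
def attention_map(activation):
    """Collapse a C x H x W activation tensor (nested lists) into a
    flattened spatial attention map of length H * W.

    Assumption (not taken from the cited passage): the mapping F sums
    squared activations across the channel dimension.
    """
    channels = len(activation)
    height = len(activation[0])
    width = len(activation[0][0])
    flat = []
    for h in range(height):
        for w in range(width):
            # Aggregate the C feature planes at spatial position (h, w).
            flat.append(sum(activation[c][h][w] ** 2 for c in range(channels)))
    return flat


# Two feature planes (C = 2) over a 2 x 2 spatial grid.
A = [
    [[1.0, 0.0], [2.0, 0.0]],
    [[0.0, 3.0], [2.0, 0.0]],
]
print(attention_map(A))
```

The output has one entry per spatial location, so feature maps with different channel counts but the same spatial resolution yield attention maps of the same length.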

Regarding claim 4, the combination of Luo, Finn and Zagoruyko teaches The transfer learning method of claim 3, where Luo further teaches wherein in the step of determining an amount of data to be transferred, the amount of data to be transferred is a constant value output (“Here, we define a new semantic transfer objective, LST, which transfers information from a labeled set of data to an unlabeled set of data by minimizing the entropy of the softmax with temperature of the similarity vector between an unlabeled point and all labeled points. Thus, this loss may be applied either between the source and unlabeled target data or between the labeled and unlabeled target data.” [pg. 5, § 3.3 Cross category similarity for semantic transfer, ¶1; note: Examiner is interpreting the temperature of the similarity vector to be equivalent to a constant value.]), and the constant value is differently applied for each pair of layers (“We introduce a domain discriminator which aligns source and target representations across multiple layers of the network through domain adversarial learning. We enable semantic transfer through minimizing the entropy of the pairwise similarity between unlabeled and labeled target images and use the temperature of the softmax over the similarity vector to allow for non-overlapping label spaces.” [pg. 3, Figure 1]).
Finn further teaches through the second meta model (“We consider a model, denoted f, that maps observations x to outputs a. During meta-learning, the model is trained to be able to adapt to a large or infinite number of tasks.” [pg. 2, §2.1. Meta-Learning Problem Set-Up, ¶2; Finn discloses a meta model that is able to modify a large amount of tasks.]).
Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Zagoruyko’s teachings to include a meta model disclosed by Finn as a part of the transfer learning method. One would have been motivated to use a meta model because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]
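For illustration only, the quantity described in the Luo passage quoted for claim 4 (the entropy of a softmax with temperature over a similarity vector) might be computed as in the sketch below. The function names and the example similarity values are hypothetical, and the temperature is treated as a fixed constant, consistent with the examiner's interpretation above; this is not Luo's implementation.

```python
import math


def softmax_with_temperature(similarities, temperature):
    """Softmax over a similarity vector, scaled by a constant temperature."""
    scaled = [s / temperature for s in similarities]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical similarities between one unlabeled image and three labeled images.
sims = [0.9, 0.2, 0.1]
for temperature in (1.0, 0.1):
    probs = softmax_with_temperature(sims, temperature)
    print(temperature, entropy(probs))
```

With the same similarity vector, a smaller temperature concentrates the softmax on the most similar labeled example and therefore lowers the entropy.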

Regarding claim 5, the combination of Luo and Finn teaches The transfer learning method of claim 1, however fails to explicitly teach wherein in the step of performing transfer learning on the target model, the transfer learning is performed in such a manner that an attention map of the target model generated through the meta model becomes similar to an attention map of the pre-trained model generated through the meta model.
Zagoruyko teaches wherein in the step of performing transfer learning on the target model, the transfer learning is performed in such a manner that an attention map of the target model generated through the meta model becomes similar to an attention map of the pre-trained model generated through the meta model (“In attention transfer, given the spatial attention maps of a teacher network (computed using any of the above attention mapping functions), the goal is to train a student network that will not only make correct predictions but will also have attentions maps that are similar to those of the teacher.” [pg. 4, bottom para, lines 1-4; student network would correspond to a target model and teacher network would correspond to a pre-trained model]).
Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to substitute Zagoruyko’s model with a meta model disclosed by Finn to perform the attention transfer step. One would have been motivated to use meta models because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]

Regarding claim 6, the combination of Luo, Finn and Zagoruyko teaches The transfer learning method of claim 5, where Zagoruyko further teaches wherein in the step of performing transfer learning on the target model, the transfer learning is performed to reduce an additional loss in such a manner that the attention map of the target model generated through the meta model becomes similar to the attention map of the pre-trained model generated through the meta model (“Without loss of generality, we assume that transfer losses are placed between student and teacher attention maps of same spatial resolution, but, if needed, attention maps can be interpolated to match their shapes. Let S, T and WS, WT denote student, teacher and their weights correspondingly, and let L(W, x) denote a standard cross entropy loss. Let also L denote the indices of all teacher-student activation layer pairs for which we want to transfer attention maps.” [pg. 5, ¶2; note: Zagoruyko discloses a loss function to minimize loss which would correspond to reducing an additional loss.]).
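Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to substitute Zagoruyko’s model with a meta model disclosed by Finn to perform the attention transfer step. One would have been motivated to use meta models because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]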
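For illustration only, a transfer loss placed between student and teacher attention maps over a set of layer pairs, as described in the Zagoruyko passage quoted for claim 6, might look like the sketch below. The specific distance (Euclidean distance between L2-normalized maps), the weight beta, and all function names are assumptions made for this sketch and are not taken from the quoted text.

```python
import math


def l2_normalize(vec):
    """Scale a flattened attention map to unit L2 norm (unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def attention_transfer_loss(student_maps, teacher_maps):
    """Sum, over layer pairs, of the distance between normalized maps."""
    total = 0.0
    for s_map, t_map in zip(student_maps, teacher_maps):
        s_hat, t_hat = l2_normalize(s_map), l2_normalize(t_map)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(s_hat, t_hat)))
    return total


def total_loss(task_loss, student_maps, teacher_maps, beta=1.0):
    """Task loss (e.g. cross entropy) plus the additional attention term."""
    return task_loss + beta * attention_transfer_loss(student_maps, teacher_maps)


# One layer pair; the student map differs from the teacher map.
teacher = [[1.0, 9.0, 8.0, 0.0]]
student = [[4.0, 1.0, 1.0, 6.0]]
print(total_loss(0.7, student, teacher, beta=0.5))
```

The additional term is zero when each student map matches its teacher map up to scale, so reducing it drives the student's attention maps toward the teacher's.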

Regarding claim 11, the combination of Luo and Finn teaches The transfer learning system of claim 9, where Luo further teaches wherein the at least one processor is configured to: determine the form and amount of information to be transferred using the meta model (“Fine-tuning has been broadly applied to reduce the number of labeled examples needed for learning new tasks, such as recognizing new object categories after ImageNet pre-training [54, 18], or learning new label structures such as detection after classification pre-training [14, 50]. Here we focus on transfer in the case of a shared label structure (e.g. classification of different category sets) We assume the source domain contains ns images, xs ∈ XS , with associated labels, ys ∈ YS . Similarly, the target domain consists of nt unlabeled images, x˜t ∈ X˜T, as well as mt images, xt ∈ XT, with associated labels, yt ∈ YT . We assume that the target domain is only sparsely labeled so that the number of image-label pairs is much smaller than the number of unlabeled images, mt << nt. Additionally, the number of source labeled images is assumed to be much larger than the number of target labeled images” [pg. 3, §3. Method, ¶2-3; Luo discloses classification pre-training which would implicitly correspond to a pre-trained model. Examiner is interpreting a form of information to correspond to an image.]), and determine an amount of data to be transferred in each of the pre-trained model and the target model (“Fine-tuning has been broadly applied to reduce the number of labeled examples needed for learning new tasks, such as recognizing new object categories after ImageNet pre-training [54, 18], or learning new label structures such as detection after classification pre-training.” [pg. 3, §3. Method, ¶2]), based on the similarity between the source dataset and the target dataset (“For each target unlabeled image we compute a similarity vector where the ith element is the similarity between this target image and the ith labeled source image” [pg. 5, §3.3 Cross category similarity for semantic transfer]).
Finn further teaches using a second meta model (“We consider a model, denoted f, that maps observations x to outputs a. During meta-learning, the model is trained to be able to adapt to a large or infinite number of tasks.” [pg. 2, §2.1. Meta-Learning Problem Set-Up, ¶2; Finn discloses a meta model that is able to modify a large amount of tasks.]).
However, the combination of Luo and Finn fails to explicitly teach generate an attention map to be used for transfer learning as output when a feature map of the pre-trained model or target model is input to a first meta model as input and determine the form of information to be transferred in the transfer learning.
Zagoruyko teaches generate an attention map to be used for transfer learning as output when a feature map of the pre-trained model or target model is input to a first meta model as input and determine the form of information to be transferred in the transfer learning (“Let us consider a CNN layer and its corresponding activation tensor A ∈ RC×H×W, which consists of C feature planes with spatial dimensions H ×W. An activation-based mapping function F (w.r.t. that layer) takes as input the above 3D tensor A and outputs a spatial attention map, i.e., a flattened 2D tensor defined over the spatial dimensions” [pg. 3, § 3.1 Activation-based Attention Transfer, ¶1; note: Zagoruyko discloses a model which performs attention transfer using student and teacher networks. Additionally, the form of information to be transferred would be an image (See Figure 1, pg. 2)]) 
Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to substitute Zagoruyko’s model with a meta model disclosed by Finn to perform the attention transfer step. One would have been motivated to use meta models because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]

Regarding claim 12, the combination of Luo and Finn teaches The transfer learning system of claim 9, where Luo further teaches wherein the at least one processor is configured to: perform transfer learning on the target model (“We then initialize the target model (depicted in green in Figure 1) with the source parameters and begin our adaptive transfer learning.” [pg. 4, §3.1 Joint domain and semantic transfer, ¶2]), 

Zagoruyko teaches and perform the transfer learning in such a manner that an attention map of the target model generated through the meta model becomes similar to an attention map of the pre-trained model generated through the meta model (“In attention transfer, given the spatial attention maps of a teacher network (computed using any of the above attention mapping functions), the goal is to train a student network that will not only make correct predictions but will also have attentions maps that are similar to those of the teacher.” [pg. 4, bottom para, lines 1-4; student network would correspond to a target model and teacher network would correspond to a pre-trained model]).
Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to substitute Zagoruyko’s model with a meta model disclosed by Finn to perform the attention transfer step. One would have been motivated to use meta models because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]
Regarding claim 13, Luo teaches A transfer learning system [See §3 Method], comprising: a meta model unit (Luo discloses computer vision which implicitly uses computers (i.e. processors) [pg. 2, §Transfer learning]) configured to determine a form and amount of information to be transferred, used by a pre-trained model (“Fine-tuning has been broadly applied to reduce the number of labeled examples needed for learning new tasks, such as recognizing new object categories after ImageNet pre-training [54, 18], or learning new label structures such as detection after classification pre-training [14, 50]. Here we focus on transfer in the case of a shared label structure (e.g. classification of different category sets) We assume the source domain contains ns images, xs ∈ XS , with associated labels, ys ∈ YS . Similarly, the target domain consists of nt unlabeled images, x˜t ∈ X˜T, as well as mt images, xt ∈ XT, with associated labels, yt ∈ YT . We assume that the target domain is only sparsely labeled so that the number of image-label pairs is much smaller than the number of unlabeled images, mt << nt. Additionally, the number of source labeled images is assumed to be much larger than the number of target labeled images” [pg. 3, §3. Method, ¶2-3; Luo discloses classification pre-training which would implicitly correspond to a pre-trained model. Examiner is interpreting a form of information to correspond to an image.]), based on similarity between a source dataset and a new target dataset (“For each unlabeled target image, x˜t , we compute the similarity, ψ(·), to each labeled example or to each prototypical example [56] per class in the labeled set. For simplicity of presentation let us consider semantic transfer from the source to the target domain first. For each target unlabeled image we compute a similarity vector where the ith element is the similarity between this target image and the ith labeled source image” [pg. 5, §3.3 Cross category similarity for semantic transfer]), wherein the meta model unit comprises:
However, Luo fails to explicitly teach a first meta model of generating an attention map to be used for transfer learning as output when a feature map of the pre-trained model or a target model is received as input and determining the form of information to be transferred in the transfer learning;
and a second meta model of determining the amount of data to be transferred in each layer of the pre-trained model and the target model based on the similarity between the source dataset and the target dataset.
Finn teaches and a second meta model of determining the amount of data to be transferred in each layer of the pre-trained model and the target model based on the similarity between the source dataset and the target dataset (“We consider a model, denoted f, that maps observations x to outputs a. During meta-learning, the model is trained to be able to adapt to a large or infinite number of tasks.” [pg. 2, §2.1. Meta-Learning Problem Set-Up, ¶2; Finn discloses a meta learning model which would correspond to a meta model, the model would be able to determine and modify information (i.e. tasks)]).
Luo and Finn are both in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings with Finn’s teachings to include [Finn, § 1. Introduction, ¶2]
The combination of Luo and Finn fails to explicitly teach a first meta model of generating an attention map to be used for transfer learning as output when a feature map of the pre-trained model or a target model is received as input and determining the form of information to be transferred in the transfer learning;
Zagoruyko teaches a first meta model of generating an attention map to be used for transfer learning as output when a feature map of the pre-trained model or a target model is received as input and determining the form of information to be transferred in the transfer learning (“Let us consider a CNN layer and its corresponding activation tensor A ∈ RC×H×W, which consists of C feature planes with spatial dimensions H ×W. An activation-based mapping function F (w.r.t. that layer) takes as input the above 3D tensor A and outputs a spatial attention map, i.e., a flattened 2D tensor defined over the spatial dimensions” [pg. 3, § 3.1 Activation-based Attention Transfer, ¶1; note: Zagoruyko discloses a model which performs attention transfer using student and teacher networks. Additionally, the form of information to be transferred would be an image (See Figure 1, pg. 2)]);
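Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to substitute Zagoruyko’s model with a meta model disclosed by Finn to perform the attention transfer step. One would have been motivated to use meta models because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]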

Regarding claim 15, the combination of Luo, Finn and Zagoruyko teaches The transfer learning system of claim 13, where Luo further teaches further comprising a transfer learning unit (Luo discloses computer vision which implicitly uses computers (i.e. processors) [pg.2, § Transfer learning]) configured to perform transfer learning on the target model (“We then initialize the target model (depicted in green in Figure 1) with the source parameters and begin our adaptive transfer learning.” [pg. 4, §3.1 Joint domain and semantic transfer, ¶2]) using the form and amount of information to be transferred (“Fine-tuning has been broadly applied to reduce the number of labeled examples needed for learning new tasks, such as recognizing new object categories after ImageNet pre-training [54, 18], or learning new label structures such as detection after classification pre-training [14, 50]. Here we focus on transfer in the case of a shared label structure (e.g. classification of different category sets) We assume the source domain contains ns images, xs ∈ XS , with associated labels, ys ∈ YS . Similarly, the target domain consists of nt unlabeled images, x˜t ∈ X˜T, as well as mt images, xt ∈ XT, with associated labels, yt ∈ YT . We assume that the target domain is only sparsely labeled so that the number of image-label pairs is much smaller than the number of unlabeled images, mt << nt. Additionally, the number of source labeled images is assumed to be much larger than the number of target labeled images” [pg. 3, §3. Method, ¶2-3; Luo discloses classification pre-training which would implicitly correspond to a pre-trained model. Examiner is interpreting a form of information to correspond to an image.]), 
Finn further teaches determined by the meta model (“We consider a model, denoted f, that maps observations x to outputs a. During meta-learning, the model is trained to be able to adapt to a large or infinite number of tasks.” [pg. 2, §2.1. Meta-Learning Problem Set-Up, ¶2; Finn discloses a meta learning model which would correspond to a meta model, the model would be able to determine and modify information (i.e. tasks)]).
Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Zagoruyko’s teachings to include a meta model disclosed by Finn as a part of the transfer learning method. One would have been motivated to use a meta model because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]

Regarding claim 16, the combination of Luo, Finn, and Zagoruyko teaches The transfer learning system of claim 13, where Luo further teaches wherein: the amount of data to be transferred is a constant value output (“Here, we define a new semantic transfer objective, LST, which transfers information from a labeled set of data to an unlabeled set of data by minimizing the entropy of the softmax with temperature of the similarity vector between an unlabeled point and all labeled points. Thus, this loss may be applied either between the source and unlabeled target data or between the labeled and unlabeled target data.” [pg. 5, § 3.3 Cross category similarity for semantic transfer, ¶1; note: Examiner is interpreting the temperature of the similarity vector to be equivalent to a constant value.]), and the constant value is differently applied for each pair of layers (“We introduce a domain discriminator which aligns source and target representations across multiple layers of the network through domain adversarial learning. We enable semantic transfer through minimizing the entropy of the pairwise similarity between unlabeled and labeled target images and use the temperature of the softmax over the similarity vector to allow for non-overlapping label spaces.” [pg. 3, Figure 1]).
Finn further teaches through the second meta model (“We consider a model, denoted f, that maps observations x to outputs a. During meta-learning, the model is trained to be able to adapt to a large or infinite number of tasks.” [pg. 2, §2.1. Meta-Learning Problem Set-Up, ¶2; Finn discloses a meta model that is able to modify a large amount of tasks.]).
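Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Zagoruyko’s teachings to include a meta model disclosed by Finn as a part of the transfer learning method. One would have been motivated to use a meta model because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]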

Regarding claim 17, the combination of Luo, Finn, and Zagoruyko teaches The transfer learning system of claim 15, where Zagoruyko further teaches wherein the transfer learning unit performs transfer learning in such a manner that an attention map of the target model generated through the meta model becomes similar to an attention map of the pre-trained model generated through the meta model (“In attention transfer, given the spatial attention maps of a teacher network (computed using any of the above attention mapping functions), the goal is to train a student network that will not only make correct predictions but will also have attentions maps that are similar to those of the teacher.” [pg. 4, bottom para, lines 1-4; student network would correspond to a target model and teacher network would correspond to a pre-trained model]).
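Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to substitute Zagoruyko’s model with a meta model disclosed by Finn to perform the attention transfer step. One would have been motivated to use meta models because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]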

Regarding claim 18, the combination of Luo, Finn, and Zagoruyko teaches The transfer learning system of claim 17, where Zagoruyko further teaches wherein the transfer learning unit is trained to reduce an additional loss when the transfer learning is performed in such a manner that the attention map of the target model generated through the meta model becomes similar to the attention map of the pre-trained model generated through the meta model (“Without loss of generality, we assume that transfer losses are placed between student and teacher attention maps of same spatial resolution, but, if needed, attention maps can be interpolated to match their shapes. Let S, T and WS, WT denote student, teacher and their weights correspondingly, and let L(W, x) denote a standard cross entropy loss. Let also L denote the indices of all teacher-student activation layer pairs for which we want to transfer attention maps.” [pg. 5, ¶2; note: Zagoruyko discloses a loss function to minimize loss which would correspond to reducing an additional loss.]).
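Luo, Finn and Zagoruyko are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings and Finn’s teachings to substitute Zagoruyko’s model with a meta model disclosed by Finn to perform the attention transfer step. One would have been motivated to use meta models because they are used to modify information for specific tasks and are trained to be able to learn on a large number of different tasks. [Finn, § 1. Introduction, ¶2]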

Regarding claim 20, the combination of Luo, Finn, and Zagoruyko teaches The transfer learning system of claim 13, where Luo further teaches wherein: the pre-trained model and the target model comprise a deep learning model, and the target model is trained through the new target dataset using a previously trained deep learning model (“In addition, this setting matches closely to the most common practical approach for training deep models which is to use a large labeled source dataset (often ImageNet [6, 52]) to train an initial representation and then to continue supervised learning with a new set of data and often with new concepts.” [pg. 1, § 1. Introduction, ¶3, lines 6-9; Luo discloses training deep models using pre-trained models (see §3. Methods) which would correspond to previously trained deep model.]).

Claims 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Luo in view of Finn and Zagoruyko as applied to claim 13 above, and further in view of Peng.

Regarding claim 14, the combination of Luo, Finn, and Zagoruyko teaches The transfer learning system of claim 13, where Finn further teaches further comprising a meta model training unit configured to 
and train the meta model in order to be of help to training (“In our meta-learning scenario, we consider a distribution over tasks p(T ) that we want our model to be able to adapt to. In the K-shot learning setting, the model is trained to learn a new task Ti drawn from p(T) from only K samples drawn from qi and feedback LTi generated by Ti . During meta-training, a task Ti is sampled from p(T ), the model is trained with K samples and feedback from the corresponding loss LTi from Ti, and then tested on new samples from Ti. The model f is then improved by considering how the test error on new data from qi changes with respect to the parameters.” [pg. 2, 2.1. Meta-Learning Problem Set-Up, ¶3; training the meta model would be a part of the transfer learning process thus would correspond to “helping” the training]).
However, the combination of Luo, Finn and Zagoruyko fails to explicitly teach generate a virtual source dataset and a virtual target dataset through the source dataset used by the pre-trained model, train a virtual pre-trained model and a virtual target model.
Peng teaches generate a virtual source dataset and a virtual target dataset through the source dataset used by the pre-trained model (“The goal in both tracks is to first train a model on simulated, synthetic data in the source domain and then adapt it to perform well on real image data in the unlabeled test domain. Our dataset is the largest one to date for cross-domain object classification, with over 280K images across 12 categories in the combined training, validation and testing domains. The image segmentation dataset is also large-scale with over 30K images across 18 categories in the three domains” [Abstract, pg. 1, see further Table 2 for existing synthetic objects datasets which the examiner interprets to be a virtual target dataset.]), train a virtual pre-trained model and a virtual target model (“We perform in-domain (i.e. train and test on the same domain) experiments to obtain approximate “oracle” performance, as well as source-only (i.e. train only on the source domain) to obtain the lower bound results of no adaptation. In total, we have 152,397 images as the source domain and 55,388 images as the target domain for validation. In our in-domain experiments, we follow a 70%/30% split for training and testing, i.e., 106,679 training images, 45,718 test images for the synthetic domain and 38,772 training images, 16,616 test images for the real domain” [pg. 4, § 3.2. Experiments, lines 8-12; synthetic domain would correspond to a virtual target model, experiment is done using synthetic source domain (corresponds to virtual pre-trained model) see pg. 5, right col, ¶1]).
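Luo, Finn, Zagoruyko and Peng are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. Peng discloses visual adaptation challenge which uses synthetic datasets and adapts it to a validation and real target dataset. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings, Finn’s teachings, and Zagoruyko’s teachings to include synthetic datasets and models as part of the transfer learning method. One would be motivated to use synthetic models and datasets in order to train and test the model to transfer knowledge from a large source dataset to an unlabeled target dataset. [See Fig. 1, pg. 1, Peng]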

Regarding claim 19, the combination of Luo, Finn, Zagoruyko, and Peng teaches The transfer learning system of claim 14, where Finn further teaches wherein the meta model training unit trains the meta model to minimize a loss function (“Formally, each task T = {L(x1, a1, . . . , xH, aH), q(x1), q(xt+1|xt, at), H} consists of a loss function L, a distribution over initial observations q(x1), a transition distribution q(xt+1|xt, at), and an episode length H.” [pg. 2, §2.1. Meta-Learning Problem Set-Up, ¶2; note: Loss functions are always minimized]).
Peng further teaches and the virtual target model to minimize a loss function (“It improved their source only ResNet-152 model from 45.3% to 92.8%, a 104% relative improvement. Their method consisted of optimizing two losses: 1) a mean cross entropy between ground truth and predictions of the so-called student network on samples from the source domain, and 2) a mean square difference between predictions of student and teacher networks on all samples from both domains.” [pg. 6, top left col, ¶1; optimizing loss would be equivalent to minimizing a loss function.]).
Luo, Finn, Zagoruyko and Peng are all in analogous fields of task and domain adaptation. Luo discloses a task and domain adaptation method by using transfer learning. Finn discloses a meta learning algorithm that trains a plurality of meta models in order to modify specific tasks. Zagoruyko discloses an attention map transfer learning method using convolutional neural networks. Peng discloses visual adaptation challenge which uses synthetic datasets and adapts it to a validation and real target dataset. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Luo’s teachings, Finn’s teachings, and Zagoruyko’s teachings to include synthetic datasets and models as part of the transfer learning method. One would be motivated to use synthetic models and datasets in order to train and test the model to transfer knowledge from a large source dataset to an unlabeled target dataset. [See Fig. 1, pg. 1, Peng] 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Liu et al. ("Sparse Deep Transfer Learning for Convolutional Neural Network") discloses deep transfer learning for CNNs.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491.  The examiner can normally be reached on Mon-Fri 8:30AM-4:30PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/M.H.H./           Examiner, Art Unit 2122                                                                                                                                                                                             
/ERIC NILSSON/           Primary Examiner, Art Unit 2122