DETAILED ACTION
Response to Arguments
Applicant’s arguments filed 10/14/2021 with respect to 35 U.S.C. § 112(f), 35 U.S.C. § 112(a), and 35 U.S.C. § 112(b) have been fully considered and are persuasive. Accordingly, the rejections under 35 U.S.C. § 112(f), 35 U.S.C. § 112(a), and 35 U.S.C. § 112(b) are withdrawn.
Applicant’s arguments with respect to claims 1, 11, and 17 under 35 U.S.C. § 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 13 is objected to because of the following informality: it depends on canceled claim 12. Appropriate correction is required. For purposes of this Office Action, claim 13 is examined as depending on claim 11.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 6, 9-10, 17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang, Zehan et al. (US 2017/0347110 A1, hereinafter Wang 1) in view of Ammar et al. ("An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning", July 2014, hereinafter Ammar) and further in view of Volpi, Riccardo, et al. ("Adversarial Feature Augmentation for Unsupervised Domain Adaptation", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, hereinafter Volpi).
Regarding claim 1, Wang 1 teaches: A system, comprising: a memory; and a processor operably coupled to the memory ([0110] e.g., "Visual data is often encoded prior to transmission across a network, or storage in a memory" [0430] e.g., "In this embodiment, original video data 70 is transmitted to an off-site computing system wherein one or more of the processes outlined in this specification take place. A different section of video data may be processed in parallel on the off-site computing system." Examiner notes that a computing system has at least a processor to process data.), wherein the processor
identifies a pre-trained neural network model associated with the source data set based on the similarity metric to perform the target machine learning task ([0220] "In some embodiments, extracted standardised features are used to produce a value or series of values based on a metric from the input data. In these embodiments, the metric can then be used to select the pre-trained model from the library which is most appropriate for the input data, as each model in the library has associated metric values based on the input data from which the models were respectively trained, the selection based on the similarity between the metrics associated with the input data and each of the pre-trained models."). 
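For illustration only, and not as part of the cited disclosure, the metric-based selection described in Wang 1 [0220] can be sketched as follows. The Euclidean distance, the three-element metric vectors, the library contents, and the names `euclidean`, `select_model`, and `library` are all assumptions made for this sketch; Wang 1 does not specify them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length metric vectors (assumed distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_model(input_metric, library):
    """Return the name of the library entry whose stored metric is closest to input_metric."""
    return min(library, key=lambda name: euclidean(input_metric, library[name]))


# Hypothetical library: each pre-trained model is stored with metric values
# derived from the data on which it was trained.
library = {
    "model_faces": [0.9, 0.1, 0.3],
    "model_text": [0.2, 0.8, 0.5],
    "model_landscape": [0.4, 0.4, 0.9],
}

# Hypothetical metric values extracted from the input data.
selected = select_model([0.25, 0.75, 0.45], library)
print(selected)
```

In this sketch the model whose stored metric values lie nearest to those of the input data is selected, which corresponds to a selection "based on the similarity between the metrics associated with the input data and each of the pre-trained models."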
	Wang 1 does not explicitly teach: assesses a similarity metric between a source data set and a sample data set from a target machine learning task.
	However, Ammar teaches: an assessment component that assesses a similarity metric between a source data set and a sample data set from a target machine learning task ([p. 31, Introduction, par. 3, and Fig. 1] e.g., "Transfer learning agents must be able to automatically identify source tasks that are most most [sic] similar to and helpful for learning a target task. In RL, where tasks are represented by Markov decision processes (MDPs), agents could use an MDP similarity measure to assess the relatedness of each potential source task to the given target." [p. 31, Introduction, par. 4] e.g., "this approach does not require a model of the MDP, but can estimate this metric from samples gathered through an agent’s interaction with the environment.").


[Figure omitted: media_image1.png]


Wang 1 and Ammar are analogous art because both are directed to neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 to incorporate the similarity measure of Ammar. The motivation for doing so would have been to improve the reinforcement learning of a new task by allowing the agent to reuse knowledge acquired from other source tasks (Ammar [Abstract] e.g., "Transfer learning can improve the reinforcement learning of a new task by allowing the agent to reuse knowledge acquired from other source tasks.").
Wang 1 does not teach: and merge one or more pre-existing neural network models of different domains and generate one or more new hybrid pre-trained neural network models, wherein the one or more images comprised within a one of the one or more pre-existing neural network models associated with a first one of the different domains has an associated knowledge label not described by the one of the one or more pre-existing neural network models, wherein the knowledge label is employed to perform an analysis in a second one of the one or more pre-existing neural network models associated with a second one of the different domains, wherein the first one of the different domains is distinct from the second one of the different domains, and wherein respective data streams from the one of the one or more pre-existing neural network models associated with the first one of the different domains and from the second one of the one or more pre-existing neural network models associated with the second one of the different domains are merged as a one of the new hybrid pre-trained neural network models within a single layer to produce a multi-modal output.
However, Volpi teaches: and merge one or more pre-existing neural network models of different domains and generate one or more new hybrid pre-trained neural network models (pg. 3, e.g., “First, we need to train a feature extractor on source data ($C \circ E_s$) [i.e., merge one or more pre-existing neural network models of different domains]. This step is necessary because we need a reference feature space and a reference classifier that performs well on it. Secondly, we need to train a feature generator ($S$) to perform data augmentation in the source feature space. We can train it by playing a GAN minimax game against features extracted through $E_s$. Finally, we can train a domain invariant feature extractor ($E_I$) by playing a GAN minimax game against features generated through $S$ [i.e., merge one or more pre-existing neural network models of different domains]. This module can then be combined with the softmax layer [on] previously trained ($C \circ E_I$) [i.e., generate one or more new hybrid pre-trained neural network models] to perform inference on both source and target samples.”),
wherein the one or more images comprised within a one of the one or more pre-existing neural network models associated with a first one of the different domains has an associated knowledge label not described by the one of the one or more pre-existing neural network models (pg. 3, e.g., “The model $C \circ E_s$ is trained to classify source samples. $E_s$ represents a ConvNet feature extractor and $C$ represents a fully connected softmax layer, with a size that depends on the problem. The optimization problem consists in the minimization of the following cross-entropy loss (CE Loss in Figure 1): $\min_{\theta_{E_s}, \theta_C} \ell_0 = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} H(C \circ E_s(x_i), y_i)$ [i.e., within a one of the one or more pre-existing neural network models associated with a first one of the different domains]…where $\theta_{E_s}$ and $\theta_C$ indicate the parameters of $E_s$ and $C$, respectively, $X_s$, $Y_s$ are the distributions of source samples ($x_i$) and source labels ($y_i$) [i.e., has an associated knowledge label not described by the one of the one or more pre-existing neural network models], respectively, and $H$ represents the softmax cross-entropy function.”),
wherein the knowledge label is employed to perform an analysis in a second one of the one or more pre-existing neural network models associated with a second one of the different domains (pg. 4, e.g., “The domain-invariant encoder $E_I$ is trained via the following minimax game, after being initialized with weights optimized on Step 0 (note that $E_s$ and $E_I$ have the same architecture), a requirement to reach optimal convergence: $\min_{\theta_{E_I}} \max_{\theta_{D_2}} \ell_2 = \mathbb{E}_{x_i \sim X_s \cup X_t} \| D_2(E_I(x_i)) - 1 \|^2$ [i.e., in a second one of the one or more pre-existing neural network models associated with a second one of the different domains] $+ \, \mathbb{E}_{(z, y_i) \sim (p_z(z), Y_s)} \| D_2(S(z \,||\, y_i)) \|^2$ [i.e., wherein the knowledge label is employed to perform an analysis] where $\theta_{E_I}$ and $\theta_{D_2}$ indicate the parameters of $E_I$ and $D_2$, respectively. Since the model $E_I$ is trained using both source and target domains, the feature extractor results domain-invariant.”),
wherein the first one of the different domains is distinct from the second one of the different domains (pg. 4, e.g. “Both datasets consist of white digits on a solid black background. We tested two different protocols: the first one (P1) consists in sampling 2,000 MNIST [i.e. the first one of the different domains is distinct]…images and 1, 800 USPS… images [i.e. the second one of the different domains].”), 
and wherein respective data streams from the one of the one or more pre-existing neural network models associated with the first one of the different domains and from the second one of the one or more pre-existing neural network models associated with the second one of the different domains are merged as a one of the new hybrid pre-trained neural network models within a single layer to produce a multi-modal output (pg. 4, e.g., “Being the latter trained to produce features indistinguishable from the source ones, the feature extractor $E_I$ [i.e. from the one of the one or more pre-existing neural network models associated with the first one of the different domains] can be combined with the classification layer of Step 0 (C) [i.e. from the second one of the one or more pre-existing neural network models associated with the second one of the different domains] and used for inference… $\tilde{y}_i = C \circ E_I(x_i)$, [i.e. are merged as a one of the new hybrid pre-trained neural network models within a single layer to produce a multi-modal output] where $x_i$ is a generic image from the source or the target data distribution and $\tilde{y}_i$ is the inferred label (dashed box in Figure 1, right).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Wang 1 with the teachings of Volpi. The motivation to do so would be to learn target features that are indistinguishable from the source features, producing domain-invariant classifiers that are more accurate (pg. 1, e.g. “[U]se a GAN objective to learn target features that are indistinguishable from the source ones, leading to a pair of feature extractors, one for the source and one for the target samples… [t]he CGAN generator is thus able to learn the class distribution in the feature space, and therefore to generate an arbitrary number of labeled feature vectors. Our results show that forcing domain-invariance and augmenting features are both valuable approaches in the unsupervised domain adaptation setting, leading to higher classification accuracies.”).
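For illustration only, the composition quoted from Volpi above, $\tilde{y}_i = C \circ E_I(x_i)$, can be sketched as ordinary function composition. The names `compose`, `toy_extractor`, and `toy_classifier` are hypothetical stand-ins introduced here; they are not taken from Volpi and do not represent its trained networks.

```python
def compose(classifier, extractor):
    """Return the function x -> classifier(extractor(x)), i.e. C composed with E_I."""
    return lambda x: classifier(extractor(x))


def toy_extractor(x):
    # Hypothetical stand-in for a feature extractor E_I:
    # maps an input vector to two hand-made features.
    return [sum(x), max(x)]


def toy_classifier(features):
    # Hypothetical stand-in for a classification layer C:
    # returns the index of the largest feature as the inferred label.
    return max(range(len(features)), key=lambda k: features[k])


# Inference applies the extractor first, then the classifier.
infer = compose(toy_classifier, toy_extractor)
label = infer([0.2, 0.1])
```

The sketch shows only the order of application (extractor, then classifier); any real extractor or classifier would replace the toy functions.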


Regarding claim 6, Wang 1 in view of Ammar and in view of Volpi teaches: The system of claim 1. 
Wang 1 further teaches: wherein the processor also identifies the pre-trained neural network model from a library of pre-existing models ([0220] e.g., "In these embodiments, the metric can then be used to select the pre-trained model from the library which is most appropriate for the input data, as each model in the library has associated metric values based on the input data from which the models were respectively trained, the selection based on the similarity between the metrics associated with the input data and each of the pre-trained models.").
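For illustration only, the library selection described in the quoted passage of Wang 1 ([0220]) can be sketched as a nearest-metric lookup. The use of Euclidean distance as the similarity measure, the name `select_model`, and the contents of `MODEL_LIBRARY` are assumptions made for this sketch; Wang 1 does not specify them.

```python
import math

# Hypothetical library: each pre-trained model name maps to the metric
# values associated with the data on which that model was trained.
MODEL_LIBRARY = {
    "model_a": [0.1, 0.9],
    "model_b": [0.8, 0.2],
}


def metric_distance(a, b):
    """Euclidean distance between two metric vectors (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_model(input_metric, library):
    """Return the name of the library model whose stored metric is closest to input_metric."""
    return min(library, key=lambda name: metric_distance(input_metric, library[name]))


chosen = select_model([0.75, 0.25], MODEL_LIBRARY)
```

A smaller distance is treated here as greater similarity; a different similarity measure could be substituted without changing the selection logic.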
 

Regarding claim 9, Wang 1 in view of Ammar and in view of Volpi teaches: The system of claim 1. 
Wang 1 further teaches: wherein the processor also assesses the similarity metric in a cloud computing environment ([0199] e.g., "In some embodiments off-site, or ‘cloud computing’, systems allow for the performance of computerised tasks on a server not necessarily local to the site of the recording or reconstruction of a section of visual data.").  

Regarding claim 10, Wang 1 in view of Ammar and in view of Volpi teaches: The system of claim 1. 
Wang 1 further teaches: wherein the processor further applies a data processing technique to the pre-trained neural network model, and wherein the data processing technique is selected from a group consisting of data normalization, data rotation, and data scaling ([0339] e.g., " the transmission of the down-sampled visual; and an image enhancement process to upscale from the down-sampled visual data to a higher-resolution visual data, using a combination of the received down-sampled visual data and a convolutional neural network selected from a library, the convolutional neural network able to use super resolution techniques to increase the resolution of the received down-sampled visual data").

Regarding claim 17, a computer program product that facilitates using a pre-trained neural network model to enhance performance of a target machine learning task, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor (Wang 1, [0320] "Aspects and/or embodiments include a computer program product comprising software code to effect the method and/or apparatus of other aspects and/or embodiments herein described." [0254] e.g., "Providing a plurality of pre-trained sets of parameters or models in some embodiments allows the training of a machine learning process to be accelerated." [0110] e.g., "Visual data is often encoded prior to transmission across a network, or storage in a memory" [0430] e.g., "In this embodiment, original video data 70 is transmitted to an off-site computing system wherein one or more of the processes outlined in this specification take place. A different section of video data may be processed in parallel on the off-site computing system." Examiner notes that a computing system has at least a processor to process data.) to: claim 1, and is similarly analyzed.

Regarding claim 19, the claim recites a computer program product corresponding to the system of claim 6, and is similarly analyzed.


Claims 11 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Wang, Zehan et al. (US 2017/0347110 A1, hereinafter Wang 1) in view of Ammar et al. ("An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning", July 2014, hereinafter Ammar) and in view of Wei, Yunchao, et al. "CNN: Single-label to multi-label." arXiv preprint arXiv:1406.5726 (2014)(“Wei”).
Regarding claim 11, Wang 1 teaches: A computer-implemented method, comprising: identifying, by the system, a pre-trained neural network model associated with the source data set based on the similarity metric to perform the target machine learning task ([0220] "In some embodiments, extracted standardised features are used to produce a value or series of values based on a metric from the input data. In these embodiments, the metric can then be used to select the pre-trained model from the library which is most appropriate for the input data, as each model in the library has associated metric values based on the input data from which the models were respectively trained, the selection based on the similarity between the metrics associated with the input data and each of the pre-trained models."). 
	Wang 1 does not explicitly teach: assessing, by a system operatively coupled to a processor, a similarity metric between a source data set of one or more source data sets and a sample data set from a target machine learning task.
	However, Ammar teaches: assessing, by a system operatively coupled to a processor, a similarity metric between a source data set of one or more source data sets and a sample data set from a target machine learning task ([p. 31, Introduction, par. 3, and Fig. 1] e.g., "Transfer learning agents must be able to automatically identify source tasks that are most most [sic] similar to and helpful for learning a target task. In RL, where tasks are represented by Markov decision processes (MDPs), agents could use an MDP similarity measure to assess the relatedness of each potential source task to the given target." [p. 31, Introduction, par. 4] e.g., "this approach does not require a model of the MDP, but can estimate this metric from samples gathered through an agent’s interaction with the environment.").


[Image: media_image1.png]


Wang 1 and Ammar are analogous art because they are directed to neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 to incorporate the method of the similarity measure of Ammar. The motivation/suggestion for doing this would be for the purpose of improving the reinforcement learning of a new task by allowing the agent to reuse knowledge acquired from other source tasks. (Ammar [Abstract] e.g., "Transfer learning can improve the reinforcement learning of a new task by allowing the agent to reuse knowledge acquired from other source tasks.").
Wang 1 does not teach: identifying, by the system, penultimate layer outputs of the pre-trained neural network model associated with the source data set as feature descriptors; aggregating, by the system, a plurality of the feature descriptors that characterize the one or more of the source data sets employing one or more statistical aggregation techniques; and averaging, by the system, the plurality of the feature descriptors characterizing the one or more source data sets within a defined category to compute a category feature representation.
However, Wei teaches: identifying, by the system, penultimate layer outputs of the pre-trained neural network model associated with the source data set as feature descriptors (pg. 6, e.g. “During the I-FT process, as shown in Figure 4, the parameters of the first seven layers are initialized by the parameters pre-trained on ImageNet [i.e. identifying, by the system, penultimate layer outputs of the pre-trained neural network model associated with the source data set as feature descriptors] and the parameters of the last fully-connected layer are randomly initialized with a Gaussian distribution $G(\mu, \sigma)$ $(\mu = 0, \sigma = 0.01)$.”); aggregating, by the system, a plurality of the feature descriptors that characterize the one or more of the source data sets employing one or more statistical aggregation techniques (pg. 6, e.g. “As shown in Fig. 5, the second row and the third row indicate the generated hypotheses and the corresponding outputs from the shared CNN [i.e. aggregating, by the system, a plurality of the feature descriptors that characterize the one or more of the source data sets]. For each object independent hypothesis, there is a high response on the corresponding category (e.g., for the first hypothesis, the response on car is very high). After cross-hypothesis max-pooling operation, as indicated by the last row in Fig. 5 [i.e. employing one or more statistical aggregation techniques], the high responses (i.e., car, horse and person), which can be considered as the predicted labels, are reserved.”); and averaging, by the system, the plurality of the feature descriptors characterizing the one or more source data sets within a defined category to compute a category feature representation (pg. 6, e.g., “Different from the pre-training, squared loss is used during I-FT. Suppose there are N images in the multi-label image set…the predictive probability vector is $p_i = [p_{i1}, p_{i2}, \ldots, p_{ic}]$. And then the cost function to be minimized is defined as $J = \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{c}(p_{ik} - \hat{p}_{ik})^2$ [i.e. averaging, by the system, the plurality of the feature descriptors characterizing the one or more source data sets within a defined category to compute a category feature representation].”).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Wang 1 with the teachings of Wei. The motivation to do so would be to use existing pre-trained layers for a different classification task (pg. 2, e.g., “The shared CNN can be well pre-trained with a large-scale single-label image dataset. To address the problem of insufficient multi-label training images, based on the Hypotheses-CNN-Pooling architecture, the shared CNN can be first well pre-trained on some large-scale single-label dataset, e.g., ImageNet, and then fine-tuned on the target multi-label dataset.”).
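For illustration only, the squared loss quoted from Wei above, in which the squared differences between predicted and ground-truth probabilities are summed over the c categories and averaged over the N images, can be sketched as follows. The function name `squared_loss` and the sample values are hypothetical and are not taken from Wei.

```python
def squared_loss(predicted, ground_truth):
    """Compute J = (1/N) * sum_i sum_k (p_ik - p_hat_ik)^2.

    predicted:    list of N predictive probability vectors, each of length c
    ground_truth: list of N ground-truth probability vectors, each of length c
    """
    n = len(predicted)
    total = 0.0
    for p_i, p_hat_i in zip(predicted, ground_truth):
        for p_ik, p_hat_ik in zip(p_i, p_hat_i):
            total += (p_ik - p_hat_ik) ** 2
    return total / n


# Hypothetical example with N = 2 images and c = 2 categories.
predicted = [[1.0, 0.0], [0.5, 0.5]]
ground_truth = [[1.0, 0.0], [0.0, 1.0]]
loss = squared_loss(predicted, ground_truth)
```

The first image contributes nothing to the sum and the second contributes 0.25 + 0.25, so the averaged loss over the two images is 0.25.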

Regarding claim 15, Wang 1 in view of Ammar and Wei teaches: The method of claim 11.
Wang 1 further teaches: wherein the identifying comprises identifying, by the system, the pre-trained neural network model from a library of pre-existing models ([0220] e.g., "In these embodiments, the metric can then be used to select the pre-trained model from the library which is most appropriate for the input data, as each model in the library has associated metric values based on the input data from which the models were respectively trained, the selection based on the similarity between the metrics associated with the input data and each of the pre-trained models.").



Claims 2-4 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang 1 in view of Ammar and Volpi and further in view of Akahori et al. (US 2005/0125368 A1, hereinafter Akahori).

Regarding claim 2, Wang 1 in view of Ammar and in view of Volpi teaches: The system of claim 1. 
Wang 1 in view of Ammar and in view of  Volpi does not explicitly teach: wherein the assessment component uses a feature extractor and a statistical aggregation technique to create a first vector representation of the source data set and a second vector representation of the sample data set, and wherein the assessment component assesses the similarity metric using a distance computation technique regarding the first vector representation and the second vector representation.
However, Akahori teaches: wherein the assessment component uses a feature extractor and a statistical aggregation technique to create a first vector representation of the source data set and a second vector representation of the sample data set, and wherein the assessment component assesses the similarity metric using a distance computation technique regarding the first vector representation and the second vector representation ([0015] "Moreover, in the present invention, "the similarity" means an indicator of a degree of similarity between feature vectors. For example, a Euclidean distance or an inner product between two feature vectors in a feature vector space can be used." [0139] "Next, in Step S132, the feature vector extracting part 46 extracts a feature vector from one of the block images specified in Step S130. Here, the extracted feature vector is a vector which has, as components, feature quantities of the same sorts as the components of the sample feature vectors learned to derive the self-organizing map used for the meaning determination, in other words, ten feature quantities including inter-pixel average values and standard deviations of three component values of the block image expressed by the YCC color system as well as inter-pixel average values and standard deviations of respective absolute values of component values of a vertical edge image and a horizontal edge image which are derived from the block image.").  
Wang 1, Ammar, Volpi and Akahori are analogous art because they are directed to machine learning. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 in view of Ammar and in view of Volpi to incorporate the method of the similarity measure of Akahori. The motivation/suggestion for doing this would be for the purpose of improving accuracy of meaning determination in accordance with characteristics of the target image or the image region. (Akahori [0036] e.g., "Thus, it is possible to gradually improve accuracy of meaning determination in accordance with characteristics of the target image or the image region, which is an actual target for the meaning determination.").
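For illustration only, the quoted passages of Akahori ([0139] and [0015]) describe a feature vector built from average values and standard deviations, and a similarity expressed as a Euclidean distance or an inner product between two feature vectors. A minimal sketch under those assumptions follows; the function names and sample data are hypothetical, and the population standard deviation is an assumed choice that Akahori does not specify.

```python
import math
import statistics


def describe(samples):
    """Aggregate rows of equal-length measurements into one vector:
    per-column means followed by per-column (population) standard deviations."""
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means + stds


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def inner_product(u, v):
    """Inner product between two feature vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical source and sample data sets, two measurements per row.
source_vector = describe([[1.0, 2.0], [3.0, 6.0]])
sample_vector = describe([[1.0, 2.0], [1.0, 2.0]])
distance = euclidean_distance(source_vector, sample_vector)
```

A smaller Euclidean distance, or a larger inner product, would indicate more similar vector representations.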

Regarding claim 3, Wang 1 in view of Ammar and in view of Volpi and further in view of Akahori teaches: The system of claim 2. 
Wang 1 in view of Ammar and in view of Volpi does not explicitly teach: wherein the distance computation technique is selected from a group consisting of Kullback-Leibler divergence, Euclidean distance, cosine similarity, Manhattan distance, Minkowski distance, Jenson Shannon distance, chi-square distance, and Jaccard similarity.
However, Akahori teaches: wherein the distance computation technique is selected from a group consisting of Kullback-Leibler divergence, Euclidean distance, cosine similarity, Manhattan distance, Minkowski distance, Jenson Shannon distance, chi-square distance, and Jaccard similarity ([0015] "Moreover, in the present invention, "the similarity" means an indicator of a degree of similarity between feature vectors. For example, a Euclidean distance or an inner product between two feature vectors in a feature vector space can be used.").  
Wang 1, Ammar, Volpi, and Akahori are analogous art because they are directed to machine learning. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 in view of Ammar and in view of Volpi to incorporate the method of the similarity measure of Akahori. The motivation/suggestion for doing this would be for the purpose of improving accuracy of meaning determination in accordance with characteristics of the target image or the image region. (Akahori [0036] e.g., "Thus, it is possible to gradually improve accuracy of meaning determination in accordance with characteristics of the target image or the image region, which is an actual target for the meaning determination.").

Regarding claim 4, Wang 1 in view of Ammar and in view of Volpi and further in view of Akahori teaches: The system of claim 2. 
Wang 1 in view of Ammar and in view of Volpi does not explicitly teach: wherein the statistical aggregation technique is selected from a group consisting of a mean average, a code book, a standard deviation, and a median average.
However, Akahori teaches: wherein the statistical aggregation technique is selected from a group consisting of a mean average, a code book, a standard deviation, and a median average ([0139] "Next, in Step S132, the feature vector extracting part 46 extracts a feature vector from one of the block images specified in Step S130. Here, the extracted feature vector is a vector which has, as components, feature quantities of the same sorts as the components of the sample feature vectors learned to derive the self-organizing map used for the meaning determination, in other words, ten feature quantities including inter-pixel average values and standard deviations of three component values of the block image expressed by the YCC color system as well as inter-pixel average values and standard deviations of respective absolute values of component values of a vertical edge image and a horizontal edge image which are derived from the block image.").  
Wang 1, Ammar, Volpi, and Akahori are analogous art because they are directed to machine learning. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 in view of Ammar and in view of Volpi to incorporate the method of the similarity measure of Akahori. The motivation/suggestion for doing this would be for the purpose of improving accuracy of meaning determination in accordance with characteristics of the target image or the image region. (Akahori [0036] e.g., "Thus, it is possible to gradually improve accuracy of meaning determination in accordance with characteristics of the target image or the image region, which is an actual target for the meaning determination.").

Regarding claim 18, the claim recites a computer program product corresponding to the system of claim 2, and is similarly analyzed.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Wang, Zehan et al. (US 2017/0347110 A1, hereinafter Wang 1) in view of Ammar et al. ("An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning", July 2014, hereinafter Ammar) and in view of Wei, Yunchao, et al. "CNN: Single-label to multi-label." arXiv preprint arXiv:1406.5726 (2014)(“Wei”) and further in view of Akahori et al. (US 2005/0125368 A1, hereinafter Akahori).

Regarding claim 13, Wang 1 in view of Ammar and in view of Wei teaches: The method of claim 11. 
Wang 1 in view of Ammar and in view of Wei does not explicitly teach: wherein the distance computation technique is selected from a group consisting of Kullback-Leibler divergence, Euclidean distance, cosine similarity, Manhattan distance, Minkowski distance, Jenson Shannon distance, chi-square distance, and Jaccard similarity.
However, Akahori teaches: wherein the distance computation technique is selected from a group consisting of Kullback-Leibler divergence, Euclidean distance, cosine similarity, Manhattan distance, Minkowski distance, Jenson Shannon distance, chi-square distance, and Jaccard similarity ([0015] "Moreover, in the present invention, "the similarity" means an indicator of a degree of similarity between feature vectors. For example, a Euclidean distance or an inner product between two feature vectors in a feature vector space can be used.").  
Wang 1, Ammar, Wei, and Akahori are analogous art because they are directed to machine learning. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 in view of Ammar and in view of Wei to incorporate the method of the similarity measure of Akahori. The motivation/suggestion for doing this would be for the purpose of improving accuracy of meaning determination in accordance with characteristics of the target image or the image region. (Akahori [0036] e.g., "Thus, it is possible to gradually improve accuracy of meaning determination in accordance with characteristics of the target image or the image region, which is an actual target for the meaning determination.").

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Wang 1 in view of Ammar, and in view of Volpi and further in view of Wang, Xiaogang et al. (US 2018/0341872 A1, hereinafter Wang 2).

Regarding claim 5, Wang 1 in view of Ammar and in view of Volpi teaches: The system of claim 1. 
Wang 1 in view of Ammar and in view of Volpi does not explicitly teach: further comprising: a training component that performs a training pass using a target data set from the target machine learning task on the pre-trained neural network model.
However, Wang 2 teaches: further comprising: a training component that performs a training pass using a target data set from the target machine learning task on the pre-trained neural network model ([0009] "In one embodiment of the present application, the training comprises: feeding a first training sample forward through the pre-trained CNN and the adaptive CNN to generate a first output image, wherein the first training sample is obtained according to a first frame of the target video; comparing the generated first output image with a first ground truth derived from the first frame to obtain a plurality of first training errors for the adaptive convolution kernels, respectively;" [0010] "In one embodiment of the present application, the optimizing comprises: feeding the second training sample forward through the pre-trained CNN and the adaptive CNN to generate a second output image, wherein the second training sample is obtained according to a second frame of the target video").    
Wang 1, Ammar, Volpi, and Wang 2 are analogous art because they are directed to neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 in view of Ammar and in view of Volpi to incorporate the transfer learning method of Wang 2, which first pre-trains a deep CNN on a source task with a large scale training data set and then fine-tunes the learned feature on the target task. The motivation/suggestion for doing this would be for the purpose of outputting a plurality of second feature maps with improved adaptability. (Wang 2 [Abstract] e.g., "convolving each of the sub-feature maps with one of a plurality of adaptive convolution kernels, respectively, to output a plurality of second feature maps with improved adaptability; training, frame by frame, the adaptive convolution kernels.").

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Wang 1 in view of Ammar, and in view of Wei and further in view of Wang, Xiaogang et al. (US 2018/0341872 A1, hereinafter Wang 2).

Regarding claim 14, Wang 1 in view of Ammar and in view of Wei teaches: The method of claim 11. 
Wang 1 in view of Ammar and in view of Wei does not explicitly teach: further comprising: a training component that performs a training pass using a target data set from the target machine learning task on the pre-trained neural network model.
However, Wang 2 teaches: further comprising: a training component that performs a training pass using a target data set from the target machine learning task on the pre-trained neural network model ([0009] "In one embodiment of the present application, the training comprises: feeding a first training sample forward through the pre-trained CNN and the adaptive CNN to generate a first output image, wherein the first training sample is obtained according to a first frame of the target video; comparing the generated first output image with a first ground truth derived from the first frame to obtain a plurality of first training errors for the adaptive convolution kernels, respectively;" [0010] "In one embodiment of the present application, the optimizing comprises: feeding the second training sample forward through the pre-trained CNN and the adaptive CNN to generate a second output image, wherein the second training sample is obtained according to a second frame of the target video").    
Wang 1, Ammar, Wei, and Wang 2 are analogous art because they are directed to neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 in view of Ammar and in view of Wei to incorporate the transfer learning method of Wang 2, which first pre-trains a deep CNN on a source task with a large scale training data set and then fine-tunes the learned feature on the target task. The motivation/suggestion for doing this would be for the purpose of outputting a plurality of second feature maps with improved adaptability. (Wang 2 [Abstract] e.g., "convolving each of the sub-feature maps with one of a plurality of adaptive convolution kernels, respectively, to output a plurality of second feature maps with improved adaptability; training, frame by frame, the adaptive convolution kernels.").

Claims 7-8 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang 1 in view of Ammar, and in view of Volpi and further in view of Liu et al. (US 2019/0050890 A1, hereinafter Liu).

Regarding claim 7, Wang 1 in view of Ammar and in view of Volpi teaches: The system of claim 1. 
Wang 1 does not explicitly teach: wherein the assessment component assesses the similarity metric between the plurality of source data sets and the sample data set.
However, Ammar teaches: wherein the assessment component assesses the similarity metric between the plurality of source data sets and the sample data set ([p. 31, Introduction, par. 3, and Fig. 1] e.g., "Transfer learning agents must be able to automatically identify source tasks that are most most [sic] similar to and helpful for learning a target task. In RL, where tasks are represented by Markov decision processes (MDPs), agents could use an MDP similarity measure to assess the relatedness of each potential source task to the given target." [p. 31, Introduction, par. 4] e.g., "this approach does not require a model of the MDP, but can estimate this metric from samples gathered through an agent’s interaction with the environment.").
Wang 1 and Ammar are analogous art because they are directed to neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 to incorporate the method of the similarity measure of Ammar. The motivation/suggestion for doing this would be for the purpose of improving the reinforcement learning of a new task by allowing the agent to reuse knowledge acquired from other source tasks. (Ammar [Abstract] e.g., "Transfer learning can improve the reinforcement learning of a new task by allowing the agent to reuse knowledge acquired from other source tasks.").
Wang 1 in view of Ammar and in view of Volpi does not explicitly teach: wherein the source data set is comprised within a plurality of source data sets, 
wherein the identification component further generates the pre-trained neural network model using the source data set and a second source data set from the plurality of source data sets.
However, Liu teaches: wherein the source data set is comprised within a plurality of source data sets ([0019] e.g., “The system mentioned about may further include a descriptor relationship learning module, configured to train and generate a descriptor semantic model based on a plurality of datasets”),
wherein the identification component further generates the pre-trained neural network model using the source data set and a second source data set from the plurality of source data sets ([0084] "Specifically, the descriptor relationship learning module 12 uses the plurality of datasets 3 collected by the data collection module 11 for pre-training then generates the descriptor semantic model 120." [0085] "In an exemplary embodiment, the descriptor relationship learning module 12 uses deep learning/artificial intelligence").  
Wang 1, Ammar, Volpi, and Liu are analogous art because they are directed to neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 in view of Ammar and in view of Volpi to incorporate the method of generating a pre-trained model of Liu. The motivation/suggestion for doing this would be for the purpose of allowing the prediction result to be more accurate. (Liu [0118] e.g., "when the AR prediction module 18 of the present disclosure is predicting the predicted AR values for each ADC, it responds effectively to the individual audience behavior in order to allow the prediction result to be more accurate; consequently, the objective of providing personalized advertisement can be achieved.").

Regarding claim 8, Wang 1 in view of Ammar and in view of Volpi and further in view of Liu teaches: The system of claim 7. 
Wang 1 further teaches: wherein the source data set is associated with a vision-based model ([0308] "It should also be noted that visual data, in some embodiments, may comprise image and/or video data.").
Wang 1 in view of Ammar and in view of Volpi does not explicitly teach: the second source data set is associated with a knowledge-based model.
However, Liu teaches: the second source data set is associated with a knowledge-based model ([0057] e.g., "The data collection module 11 is connected to the internet. A plurality of dataset 3 is collected by accessing any public data via the internet. Specifically, dataset 3 may be general data such as encyclopedia, text book, or data updated as time revolves such as Wikipedia, internet news or comments (e.g. video comments on YouTube or text comments on Facebook), etc.").  
Wang 1, Ammar, Volpi, and Liu are analogous art because they are directed to neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 in view of Ammar and in view of Volpi to incorporate the method of generating a pre-trained model of Liu. The motivation/suggestion for doing this would be for the purpose of allowing the prediction result to be more accurate. (Liu [0118] e.g., "when the AR prediction module 18 of the present disclosure is predicting the predicted AR values for each ADC, it responds effectively to the individual audience behavior in order to allow the prediction result to be more accurate; consequently, the objective of providing personalized advertisement can be achieved.").

Regarding claim 20, it is the computer program product counterpart of claim 7 and is similarly analyzed.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Wang 1 in view of Ammar, and in view of Wei and further in view of Liu et al. (US 2019/0050890 A1, hereinafter Liu).
	
Regarding claim 16, Wang 1 in view of Ammar and in view of Wei teaches: The method of claim 11. 
Wang 1 does not explicitly teach: wherein the assessment component assesses the similarity metric between the plurality of source data sets and the sample data set.
However, Ammar teaches: wherein the assessment component assesses the similarity metric between the plurality of source data sets and the sample data set ([p. 31, Introduction, par. 3, and Fig. 1] e.g., "Transfer learning agents must be able to automatically identify source tasks that are most most [sic] similar to and helpful for learning a target task. In RL, where tasks are represented by Markov decision processes (MDPs), agents could use an MDP similarity measure to assess the relatedness of each potential source task to the given target." [p. 31, Introduction, par. 4] e.g., "this approach does not require a model of the MDP, but can estimate this metric from samples gathered through an agent’s interaction with the environment.").
Wang 1 and Ammar are analogous art because they are directed to neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 to incorporate the method of the similarity measure of Ammar. The motivation/suggestion for doing this would be for the purpose of improving the reinforcement learning of a new task by allowing the agent to reuse knowledge acquired from other source tasks. (Ammar [Abstract] e.g., "Transfer learning can improve the reinforcement learning of a new task by allowing the agent to reuse knowledge acquired from other source tasks.").
Wang 1 in view of Ammar and in view of Wei does not explicitly teach: wherein the source data set is comprised within a plurality of source data sets, wherein the identification component further generates the pre-trained neural network model using the source data set and a second source data set from the plurality of source data sets.
However, Liu teaches: wherein the source data set is comprised within a plurality of source data sets ([0019] e.g., “The system mentioned about may further include a descriptor relationship learning module, configured to train and generate a descriptor semantic model based on a plurality of datasets”),
wherein the identification component further generates the pre-trained neural network model using the source data set and a second source data set from the plurality of source data sets ([0084] "Specifically, the descriptor relationship learning module 12 uses the plurality of datasets 3 collected by the data collection module 11 for pre-training then generates the descriptor semantic model 120." [0085] "In an exemplary embodiment, the descriptor relationship learning module 12 uses deep learning/artificial intelligence").  
Wang 1, Ammar, Wei, and Liu are analogous art because they are directed to neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Wang 1 in view of Ammar and in view of Wei to incorporate the method of generating a pre-trained model of Liu. The motivation/suggestion for doing this would be for the purpose of allowing the prediction result to be more accurate. (Liu [0118] e.g., "when the AR prediction module 18 of the present disclosure is predicting the predicted AR values for each ADC, it responds effectively to the individual audience behavior in order to allow the prediction result to be more accurate; consequently, the objective of providing personalized advertisement can be achieved.").

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
                                                                                                                                                                                                   
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-7PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Adam Clark Standke
Assistant Examiner
Art Unit 2129


/MICHAEL J HUNTLEY/
Supervisory Patent Examiner, Art Unit 2129