Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Status of Claims
The following claim(s) is/are pending in this Office action: 1-20.
Claim(s) 1-20 are rejected.  This rejection is NON-FINAL.

Claim Objections
Claims 1, 7-8, and 16 stand objected to because of the following informalities:
(a)       Claim 1: The examiner suggests amending claim 1 to recite “using a third neural network, determine whether the output from the first layer is from the first neural network, the third neural network being different from the first and second neural networks; and based on a determination that the output from the first layer is not from the first neural network, adjust one or more weights of the first layer.”
(b)       Claim 7: The examiner suggests amending claim 7 to recite “wherein the third neural network operates in a supervised mode to, using labeled data, learn to correctly classify outputs from layers of either of the first neural network and the second neural network.”
(c)        Claim 8: The examiner suggests amending claim 8 to recite “using a third neural network, determining whether the output from the first layer is from the first neural network, the third neural network being different from the first and second neural networks; and based on determining that the output from the first layer is not from the first neural network, adjusting one or more weights of the first layer.”
(d)       Claim 16: The limitation “output a classification of the target data set, wherein the target data set is classified by a domain adaptation module comprising a domain classifier to inverse a gradient and back-propagate the gradient to a main model” contains a minor informality because the term “inverse” can be used as a noun or an adjective but not a verb as recited in claim 16. The examiner suggests amending the above limitation to recite “output a classification of the target data set, wherein the target data set is classified by a domain adaptation module comprising a domain classifier to invert a gradient …”.
Appropriate correction is required.
 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
 
 
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
 
 
Claims 1-20 stand rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
(a)       Independent claims 1 and 8:
identical to a separate output of a separate layer in the first neural network …”.
(2) The examiner further notes that the purpose of the claimed “first neural network” is unclear because only the second neural network produces an output by a first layer and is trained in these claims, and there is no link between the limitations pertaining to the second neural network and the “first neural network” other than the recitation that the output is or is not from the first neural network. For the purpose of examination, the aforementioned interpretation pertaining to the first neural network is also adopted here, yet clarification is requested.
 
(b) Dependent claims 2-7 and 9-15:  

 
(c)        Claims 3-4 and 10-11:
(1) the limitation “based on a determination that the output from the first layer is from the first neural network, …” is indefinite because the base claim 1 already recites and thus requires “a determination that the output from the first layer is not from the first neural network”.  Therefore, these two instances of “determination” cannot be the same, and the output from the first layer cannot be “from the first neural network” as recited in claims 3-4 and 10-11. For the purpose of examination, the above limitation is interpreted as “based on a separate determination that the output from the first layer is from the first neural network, …”. 
(2) The limitation “one or more weights of the first layer” recited in claims 3-4 and 10-11 is confusable with the limitation “one or more weights of the first layer” respectively recited in base claims 1 and 8. For the purpose of examination, the limitation “one or more weights of the first layer” recited in claims 3-4 and 10-11 is interpreted as “one or more separate weights of the first layer” that are different from the “one or more weights of the first layer” respectively recited in base claims 1 and 8.
(3) The limitation “one or more weights” is indefinite because it is confusable with the limitation “one or more weights” recited in base claim 1, so it is unclear whether the “one or more weights” in claim 3 are the same as or different from the “one or more weights” in claim 1. For the purpose of examination, the “one or more weights” in these claims are interpreted as “one or more separate weights” to distinguish them from the “one or more weights” in the respective base claims.
 
(c)        Claims 4 and 11:
(1) The limitation “a hidden layer of the second neural network” is confusable with the limitation “a hidden layer of the second neural network” recited in the base claim 1 and is thus indefinite. For the purpose of examination, this limitation is interpreted as “a different hidden layer of the second neural network”.
(2) The limitation “using the third neural network, determine whether the second output is from the first neural network; and” is indefinite, especially in view of the limitation “the second layer also being a hidden layer of the second neural network”. Again, the function word “from” carries the definition of indicating “a starting point of a physical movement or a starting point in measuring or reckoning or in a statement of limits”. See Merriam-Webster.   Therefore, it is unclear how the output generated by the “second layer” of a second neural network can possibly be “from the first neural network”.  For the purpose of examination, this limitation is interpreted as “using the third neural network, determine whether the second output is [[from]]identical to a separate output generated by a corresponding layer in the first neural network; and”.   
 
(d)       Independent claim 16:
The limitation “output a classification of the target data set, wherein the target data set is classified by a domain adaptation module comprising a domain classifier to inverse a gradient and back-propagate the gradient to a main model” is indefinite because this invert a gradient into a negative gradient and back-propagate the negative gradient to a main model”. 
 
(e)       Dependent claims 17-20:
Dependent claims 17-20 depend from independent claim 16 and are thus rejected due to at least their dependency and hence inheritance of the aforementioned deficiencies from independent claim 16.
 
(f)        Claim 20: The limitation “receiving data from a spatial model and a temporal model” in “wherein the domain classifier inverses the gradient using a gradient reversal layer (GRL) receiving data from a spatial model and a temporal model” is indefinite because it is unclear whether it is the domain classifier or the gradient reversal layer (GRL) that is “receiving data from a spatial model and a temporal model”.  For the purpose of examination, this limitation is interpreted as “wherein the domain classifier inverses the gradient using a gradient reversal layer (GRL), the domain classifier receiving data from a spatial model and a temporal model”.
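For illustration only, the gradient inversion recited in claims 16 and 20 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's or any reference's implementation: the class name GradientReversal and the scale parameter are hypothetical, and the layer is modeled as an identity in the forward pass that negates the gradient in the backward pass.

```python
from dataclasses import dataclass


@dataclass
class GradientReversal:
    """Hypothetical GRL: identity going forward, negated gradient going backward."""

    scale: float = 1.0  # assumed scaling factor; not recited in the claims

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # The gradient is inverted (turned into a negative gradient)
        # before it is back-propagated to the main model.
        return [-self.scale * g for g in grad_from_domain_classifier]


grl = GradientReversal(scale=0.5)
features = [0.2, -1.0, 3.0]
forwarded = grl.forward(features)
reversed_grad = grl.backward([1.0, -2.0, 4.0])
```

Under this reading, the reversal acts only on the gradient path, which is consistent with the interpretation that the domain classifier, rather than the GRL, receives the data.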
 
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
 
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 7, 10, and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka).
 
With respect to claim 1, Tzeng teaches an apparatus, comprising:
 
access a first neural network, the first neural network being associated with a first data type; (Tzeng, FIG. 3:  


[Image: media_image1.png (Tzeng, FIG. 3)]

          FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” P. 7172, § 5, ¶ 1: “We now evaluate ADDA for unsupervised classification adaptation across four different domain shifts. We explore three digits datasets of varying difficulty: MNIST [18], USPS, and SVHN [19]. We additionally evaluate on the NYUD [20] dataset to study adaptation across modalities.”
	The examiner first notes that Tzeng’s source encoder CNN teaches a first neural network, and that Tzeng’s pre-training its source encoder CNN with source images and labels or sending source images to its CNN during adversarial adaptation teaches accessing a first neural network.  Moreover, the examiner further notes that Tzeng’s evaluating any one of the four different types of domain shifts to learn the source mapping (e.g., Ms in § 3) for its source encoder CNN to a different type of the four types of domain shifts for adaptation across modalities to a target domain shift (e.g., any one of the remaining three different types of domain shifts) teaches that the source encoder CNN is associated with a first data type. 
Further, the examiner notes that Tzeng interchangeably uses the terms “classifier” (e.g., “source classifier” and “target classifier” in § 3, “task-specific classifier” in § 3.1, “classifier” in FIG. 3 and its caption), “encoder” (e.g., “target encoder” in FIGS. 1 and 3 as well as their respective captions), and “encoder CNN” (e.g., “source encoder CNN” and “target encoder CNN” in the caption of FIG. 3), and that “classifier,” “encoder,” and “encoder CNN” are thus interpreted as functional and/or structural equivalents of each other.)
 
 (Tzeng, FIG. 3 and p. 7172, § 5, ¶ 1, supra. FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” 
The examiner notes that Tzeng’s target encoder CNN teaches a second neural network, and that Tzeng’s target encoder CNN’s receiving target images during adversarial adaptation illustrated in FIG. 3 teaches accessing a second neural network.  The examiner further notes that Tzeng’s evaluating any one of the remaining three different types of domain shifts (e.g., three different shifts other than the aforementioned domain shift evaluated by the first neural network) including adaptation across modalities teaches that the target encoder CNN is associated with a different, second data type while the source encoder CNN above is associated with a first data type (see rationale for the limitation immediately above).)
 
provide, as input, first training data to the second neural network; (Tzeng, FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” The examiner notes that Tzeng’s learning a target encoder CNN (also referred to as “target encoder” above) teaches training the second neural network, and that Tzeng’s providing the “target examples” in the caption of FIG. 3 or the “target images” illustrated in FIG. 3 to the target encoder CNN (the claimed second neural network) for learning the target encoder CNN teaches providing, as input, first training data to the second neural network (e.g., Tzeng’s target encoder CNN/target CNN cited above). Therefore, the examiner asserts that at least the aforementioned passages and figure teach the above limitation.)
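For illustration only, the adversarial adaptation described in the FIG. 3 caption can be sketched with a toy one-dimensional model. This is a simplified sketch, not Tzeng's implementation: the scalar encoders (w_s, w_t), the logistic discriminator parameters (a, b), the learning rate, and the data values are all assumptions made for the example. The discriminator is updated to tell source encodings (label 1) from target encodings (label 0), and the target encoder is updated with inverted labels so that the discriminator cannot reliably predict the domain.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def discriminator_step(a, b, z_src, z_tgt, lr):
    """One gradient step on the discriminator loss (source labeled 1, target labeled 0)."""
    grad_a = (mean((sigmoid(a * z + b) - 1.0) * z for z in z_src)
              + mean(sigmoid(a * z + b) * z for z in z_tgt))
    grad_b = (mean(sigmoid(a * z + b) - 1.0 for z in z_src)
              + mean(sigmoid(a * z + b) for z in z_tgt))
    return a - lr * grad_a, b - lr * grad_b


def mapping_loss(w_t, a, b, x_tgt):
    """Target-encoder loss with inverted labels: target encodings should look like source."""
    return -mean(math.log(sigmoid(a * w_t * x + b)) for x in x_tgt)


def target_encoder_step(w_t, a, b, x_tgt, lr):
    """One gradient step on mapping_loss with respect to the target encoder weight."""
    grad_w = mean((sigmoid(a * w_t * x + b) - 1.0) * a * x for x in x_tgt)
    return w_t - lr * grad_w


# Toy data: the source encoder weight is fixed; the target encoder starts from it.
w_s = 1.0
w_t = w_s
x_src = [1.0, 2.0, 3.0]
x_tgt = [2.0, 4.0, 6.0]
a, b = 0.0, 0.0
for _ in range(20):
    z_src = [w_s * x for x in x_src]
    z_tgt = [w_t * x for x in x_tgt]
    a, b = discriminator_step(a, b, z_src, z_tgt, lr=0.05)
    w_t = target_encoder_step(w_t, a, b, x_tgt, lr=0.05)
```

In this sketch only the target encoder weight and the discriminator are updated, mirroring the caption's statement that the source encoder parameters are fixed during adaptation.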
 
select a first layer, the first layer being a hidden layer of the second neural network; (Tzeng, § 3.1, ¶ 3: “Once the mapping parameterization is determined for the source, we must decide how to parametrize the target mapping Mt. In general, the target mapping almost always matches the source in terms of the specific functional layer (architecture), but different methods have proposed various regularization techniques. All methods initialize the target mapping parameters with the source, but different methods choose different constraints between the source and target mappings, ψ(Ms,Mt).” 
§ 3.1, ¶ 4: “Consider a layered representations where each layer parameters are denoted as, $M_s^{\ell}$ or $M_t^{\ell}$, for a given set of equivalent layers, {ℓ1, . . . , ℓn}. Then the space of constraints explored in the literature can be described through layerwise equality constraints as follows:
$$\psi(M_s, M_t) \triangleq \{\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i})\}_{i \in \{1 \ldots n\}} \qquad (4)$$
where each individual layer can be constrained independently. A very common form of constraint is source and target layerwise equality:
$$\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i}) = (M_s^{\ell_i} = M_t^{\ell_i}) \qquad (5)$$
.”
The examiner notes that any of the i-th layers, where $i \in \{2 \ldots n-1\}$, to which the mapping (e.g., $M_t^{\ell_i}$ in Eq. (4) or (5) above) applies to produce a mapping distribution (e.g., $M_t^{\ell_i}(X_t)$) teaches a hidden layer. That is, the first layer (ℓ1) may teach an input layer, the last layer (ℓn) may teach an output layer, and each of the intervening layers (ℓi where $i \in \{2 \ldots n-1\}$) in the second neural network (e.g., the target encoder CNN) for learning the layer parameters ($M_t^{\ell}$ in § 3.1, supra) is interpreted as a hidden layer. The examiner further notes that Tzeng’s independent modeling and constraining of each layer in the second neural network (e.g., Tzeng’s target encoder CNN) with respect to the corresponding individual layer in the first neural network (e.g., Tzeng’s source encoder CNN) as shown in Eqns. (4)-(5) above teaches selecting a first layer, which is a hidden layer, in the second neural network.)
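For illustration only, the layerwise equality constraint of Eqns. (4)-(5) and the reading of layers ℓ2 through ℓn-1 as hidden layers can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the toy parameter values are hypothetical, and each layer's parameters are modeled as a plain list of numbers.

```python
def layerwise_equality(source_layers, target_layers):
    """Evaluate Eq. (5) per layer: True where source and target layer parameters are equal."""
    if len(source_layers) != len(target_layers):
        raise ValueError("source and target mappings must have the same number of layers")
    return [s == t for s, t in zip(source_layers, target_layers)]


def hidden_layer_indices(num_layers):
    """1-based indices i in {2, ..., n-1}: every layer except the first and the last."""
    return list(range(2, num_layers))


# Toy parameters for n = 4 layers (values are made up for illustration).
M_s = [[0.1, 0.2], [0.3], [0.5, 0.5], [1.0]]
M_t = [[0.1, 0.2], [0.3], [0.4, 0.6], [1.0]]

tied = layerwise_equality(M_s, M_t)      # psi evaluated layer by layer, as in Eq. (4)
hidden = hidden_layer_indices(len(M_t))  # candidate choices for the claimed "first layer"
untied_hidden = [i for i in hidden if not tied[i - 1]]
```

Here the third layer is a hidden layer whose target parameters are not tied to the source, so it is the kind of independently constrained layer that the rationale above maps to the selected first layer.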
 
identify an output from the first layer that was generated based on the first training data; (Tzeng, p. 7169, § 3, ¶ 3: “In adversarial adaptive methods, the main goal is to regularize the learning of the source and target mappings, Ms and Mt, so as to minimize the distance between the empirical source and target mapping distributions: Ms(Xs) and Mt(Xt).” p. 7169, § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, we can derive a generic formulation for domain adversarial techniques below: 
[Image: media_image2.png (the generic formulation referenced in the quotation)]
”
p. 7170, § 3.1, ¶ 4: “Consider a layered representations where each layer parameters are denoted as, $M_s^{\ell}$ or $M_t^{\ell}$, for a given set of equivalent layers, {ℓ1, . . . , ℓn}. Then the space of constraints explored in the literature can be described through layerwise equality constraints as follows:
                             
                            ψ
                            
                                
                                    
                                        
                                            M
                                        
                                        
                                            s
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            M
                                        
                                        
                                            t
                                        
                                    
                                
                            
                            ≜
                            {
                            
                                
                                    ψ
                                
                                
                                    
                                        
                                            l
                                        
                                        
                                            i
                                        
                                    
                                
                            
                            
                                
                                    
                                        
                                            M
                                        
                                        
                                            s
                                        
                                        
                                            
                                                
                                                    l
                                                
                                                
                                                    i
                                                
                                            
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            M
                                        
                                        
                                            t
                                        
                                        
                                            
                                                
                                                    l
                                                
                                                
                                                    i
                                                
                                            
                                        
                                    
                                
                            
                            
                                
                                    }
                                
                                
                                    i
                                    ∈
                                    
                                        
                                            1
                                            …
                                            n
                                        
                                    
                                
                            
                             
                             
                             
                             
                             
                            (
                            4
                            )
                        
where each individual layer can be constrained independently. A very common form of constraint is source and target layerwise equality:
$$\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i}) = (M_s^{\ell_i} = M_t^{\ell_i}) \qquad (5)$$.
The examiner notes that Tzeng’s i-th layer (e.g., $\ell_i$ where $i \in \{2 \dots n-1\}$) teaches a first layer of the second neural network (e.g., Tzeng’s target encoder CNN), and that the second neural network (e.g., Tzeng’s target encoder neural network) generating the target mapping distribution for the i-th layer, $M_t^{\ell_i}(X_t)$, by learning the target representation mapping, $M_t^{\ell_i}$, of the i-th layer ($\ell_i$) for the target training input images, Xt, teaches an output (e.g., the aforementioned target mapping distribution, $M_t^{\ell_i}(X_t)$, for the target input images Xt), and that Tzeng thus teaches the above limitation.)
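
For illustration only, and not as part of the cited reference, the layerwise equality constraint of Tzeng’s Eqns. (4)-(5) quoted above can be sketched as follows. The function name `layerwise_equality`, the dictionary representation of layer parameters, and the sample weight values are assumptions made for this sketch.

```python
from typing import Dict, List, Sequence

# Assumption: each layer's parameters are represented as a flat list of weights.
Layer = List[float]


def layerwise_equality(
    source: Dict[str, Layer],
    target: Dict[str, Layer],
    layers: Sequence[str],
) -> Dict[str, bool]:
    """Evaluate the equality constraint of Eq. (5) independently for each layer.

    The returned mapping plays the role of the set in Eq. (4): one
    constraint result per layer l_i in the given set of equivalent layers.
    """
    return {name: source[name] == target[name] for name in layers}


# Hypothetical source and target layer parameters; only layer "l2" is untied.
source_model = {"l1": [0.1, 0.2], "l2": [0.3, 0.4], "l3": [0.5]}
target_model = {"l1": [0.1, 0.2], "l2": [0.9, 0.4], "l3": [0.5]}
constraints = layerwise_equality(source_model, target_model, ["l1", "l2", "l3"])
```

In this sketch, a layer whose entry is `False` corresponds to a layer whose source and target parameters are not tied.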

using a third neural network, determine whether the output from the first layer is from the first neural network, the third neural network being different from the first and second neural networks; (Tzeng, FIG. 3, supra.  § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, D is optimized according to a standard supervised loss, $\mathcal{L}_{\mathrm{adv}_D}(X_s, X_t, M_s, M_t)$ where the labels indicate the origin domain, defined below:

    [Equation image media_image3.png; reproduction omitted]

§ 3, ¶ 5: “Thus, we can derive a generic formulation for domain adversarial techniques below: 

    [Equation image media_image2.png; reproduction omitted]
”
The examiner notes that Tzeng’s network including the functional blocks “generative or discriminative model?”, “weights tied or untied?”, “Which adversarial objective?”, and/or “discriminator” illustrated in FIG. 3 for Tzeng’s domain adaptation between the source domain and the target domain teaches a third neural network that is different from the first neural network (e.g., the source CNN in FIG. 3) and the second neural network (e.g., Tzeng’s target encoder CNN).  The examiner further notes that Tzeng’s training both the source and target encoder CNNs based on minimizing the losses in Eqns. (2)-(3) to optimize the source representation for the i-th layer, $M_s^{\ell_i}$, for the source encoder CNN and the target representation for the i-th layer, $M_t^{\ell_i}$, for the target encoder CNN so that the source encoder CNN is adapted to the target domain teaches determining whether the source mapping distribution for the i-th layer, $M_s^{\ell_i}$(Xi), is sufficiently close to the target mapping distribution for the i-th layer, $M_t^{\ell_i}$, and thus renders obvious determining whether the output from the first layer in the second neural network (e.g., $M_t^{\ell_i}$(Xi) from the target encoder CNN) is from the first neural network (e.g., $M_s^{\ell_i}$(Xi) from the source encoder CNN).)
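
For illustration only, and not as part of the cited reference, a discriminator that classifies whether a layer output originates from the source network can be sketched as follows. The logistic form of the discriminator, the 0.5 decision threshold, the binary cross-entropy form of the loss (label 1 for source, label 0 for target), and all names and numeric values are assumptions made for this sketch.

```python
import math
from typing import Sequence


def discriminator(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical discriminator D: probability that the features came from the source network."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def is_from_source(features: Sequence[float], weights: Sequence[float],
                   bias: float, threshold: float = 0.5) -> bool:
    """Determination step: True when D attributes the layer output to the source network."""
    return discriminator(features, weights, bias) >= threshold


def discriminator_loss(source_features: Sequence[float], target_features: Sequence[float],
                       weights: Sequence[float], bias: float) -> float:
    """Supervised loss whose labels indicate the origin domain (assumed cross-entropy form)."""
    p_source = discriminator(source_features, weights, bias)
    p_target = discriminator(target_features, weights, bias)
    return -math.log(p_source) - math.log(1.0 - p_target)


# Hypothetical discriminator parameters and layer outputs.
d_weights, d_bias = [1.0, -1.0], 0.0
source_out = [2.0, 0.0]
target_out = [0.0, 2.0]
loss = discriminator_loss(source_out, target_out, d_weights, d_bias)
```

In this sketch, a lower loss means the discriminator separates source outputs from target outputs more reliably.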
 
based on a determination that the output from the first layer is not from the first neural network, adjust one or more weights of the first layer. (Tzeng, § 4, ¶ 1: “Speciﬁcally, we use a discriminative base model, unshared weights, and the standard GAN loss. We illustrate our overall sequential training procedure in Figure 3.” § 4, ¶ 3: “Next, we choose to allow independent source and target mappings by untying the weights. This is a more ﬂexible learing [sic] paradigm as it allows more domain speciﬁc feature extraction to be learned. However, note that the target domain has no label access, and thus without weight sharing a target model may quickly learn a degenerate solution if we do not take care with proper initialization and training procedures.” § 4, ¶ 4: “In doing so, we are effectively learning an asymmetric mapping, in which we modify the target model so as to match the source distribution.” § 1, ¶ 3: “For example, [11, 12] share weights and learn a symmetric mapping of both source and target images to the shared feature space, while [13] decouple some layers thus learning a partially asymmetric mapping.” Eqns. (1)-(3) (reproduction omitted). 
The examiner notes that Tzeng uses the terms “weight” and “mapping” interchangeably (see § 1, ¶ 3, supra).  The examiner further notes that incurring a loss between the source encoder CNN output and the target encoder CNN output renders obvious that the output from the first layer of the second neural network is different from the corresponding output of the first neural network and is thus not from the first neural network, and that Tzeng’s modifying the target mapping (e.g., $M_t^{\ell_i}$ for the i-th layer), which represents “layer parameters” such as weights, based on adversarial training and standard GAN loss (e.g., Eqns. (1)-(3)) teaches modifying one or more weights of the first layer (e.g., the i-th layer).)
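
For illustration only, and not as part of the cited reference, adjusting the weights of a target layer in response to a discriminator determination can be sketched as follows. The elementwise layer, the logistic discriminator, the loss of the form -log D, the learning rate, the threshold, and all names are assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def source_probability(x: Sequence[float], layer_w: Sequence[float],
                       d_w: Sequence[float], d_b: float) -> float:
    """Probability, per a hypothetical discriminator, that the layer output is from the source."""
    features = [w * xi for w, xi in zip(layer_w, x)]  # assumed elementwise target layer
    return sigmoid(sum(d * f for d, f in zip(d_w, features)) + d_b)


def maybe_adjust_layer(x: Sequence[float], layer_w: Sequence[float],
                       d_w: Sequence[float], d_b: float,
                       lr: float = 0.1, threshold: float = 0.5) -> List[float]:
    """Take one gradient step on the layer weights only if the output is judged not from the source."""
    p = source_probability(x, layer_w, d_w, d_b)
    if p >= threshold:
        return list(layer_w)  # judged to be from the source: weights left unchanged
    # Gradient of -log(p) with respect to w_j is -(1 - p) * d_j * x_j,
    # so gradient descent adds lr * (1 - p) * d_j * x_j to each weight.
    return [w + lr * (1.0 - p) * d * xi for w, d, xi in zip(layer_w, d_w, x)]


# Hypothetical input, target-layer weights, and discriminator parameters.
x = [1.0, 1.0]
target_w = [0.0, 0.0]
d_w, d_b = [1.0, 1.0], -1.0
before = source_probability(x, target_w, d_w, d_b)
new_w = maybe_adjust_layer(x, target_w, d_w, d_b)
after = source_probability(x, new_w, d_w, d_b)
```

In this sketch, the step moves the target layer output toward what the discriminator attributes to the source network, so `after` exceeds `before`.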
 
Tzeng does not appear to teach:
at least one processor, and
at least one computer storage that is not a transitory signal and that comprises instructions executable by the at least one processor to:
 
Csurka does, however, teach:
at least one processor, and (Csurka, ¶ [0042]: “The digital processor 30 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like. The exemplary digital processor 30, in addition to controlling the operation of the computer system 10, executes the instructions 28 stored in memory 26 for performing the method outlined in FIG. 3.” ¶ [0044]: “The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth.”)

at least one computer storage that is not a transitory signal and that comprises instructions executable by the at least one processor to: (Csurka, supra.)

Tzeng and Csurka are analogous art because both pertain to domain adaptation of generative neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng’s determining a layer of a first neural network of a first data type is not from a second neural network of a second data type (Tzeng, supra) with Csurka’s processor and computer storage (Csurka, supra).  The modification communicatively links a computer processor to memory so that the processor controls the computing system and executes the instructions stored in memory for any software processes (Csurka, ¶ [0042]: “The digital processor 30 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like. The exemplary digital processor 30, in addition to controlling the operation of the computer system 10, executes the instructions 28 stored in memory 26 for performing the method outlined in FIG. 3.”)
 
 
With respect to claim 2, Tzeng modified by Csurka teaches the apparatus of claim 1, and Tzeng further teaches:
wherein the instructions are executable by the at least one processor to: initially establish the second neural network by a copying of the first neural network. (Tzeng, § 4, ¶ 3: “However, note that the target domain has no label access, and thus without weight sharing a target model may quickly learn a degenerate solution if we do not take care with proper initialization and training procedures. Therefore, we use the pre-trained source model as an intitialization [sic] for the target representation space and fix the source model during adversarial training.”  The examiner notes that Tzeng’s using the same, pre-trained source model as an initialization for the target domain teaches copying a first neural network to initially establish the second neural network.)
 
With respect to claim 3, Tzeng modified by Csurka teaches the apparatus of claim 1, and Csurka further teaches:
wherein the instructions are executable by the at least one processor to: based on a determination that the output from the first layer is from the first neural network, decline to adjust one or more weights of the first layer. (Csurka, ¶ [0094]: “Therefore another stopping criterion may be added, as follows. At each iteration, the classification accuracy of the learned DSCM classifier on the original labeled set T1 is evaluated and if the classification performance in step r+1 incurs a stronger degradation than a predefined tolerance threshold (e.g., 1%) compared to the accuracy obtained in step r, iterating is stopped and Wr, the metric obtained before degradation, is retained. As will be appreciated, other stopping criteria can also be considered, such as measuring the variation between iterations of the TDAS.” The examiner notes that Csurka’s classification performance at a step teaches the output of a layer.  The examiner further notes that Csurka’s terminating weight updates based on a tolerance threshold (e.g., 1% for the accuracy between two classification outputs) teaches declining to adjust one or more weights of the first layer.)
Tzeng and Csurka are analogous art because both pertain to domain adaptation of generative neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng’s determining a layer of a first neural network of a first data type is not from a second neural network of a second data type (Tzeng, supra) with Csurka’s stopping criterion for weight updates (Csurka, supra).  The modification preserves an optimal accuracy of neural networks and computing resources by terminating the update process when a stopping criterion is satisfied (Csurka, ¶ [0101]: “The transformation matrix is then updated as a function of the current transformation matrix, the current training set, and the current domain-specific weights. The current weights may then be updated. At S114, if a stopping point criterion is not met, the method may return to S106 or S108 for a further iteration.” ¶ [0084]: Algorithm 1: “9. If stopping criteria [sic] is met (classification accuracy degraded or no more data available to add or remove), quit the loop.”)
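
For illustration only, and not as part of the cited reference, the stopping criterion quoted from Csurka ¶ [0094] can be sketched as follows. The function name `retained_step`, the treatment of the 1% tolerance as an absolute difference in accuracy, and the sample accuracy values are assumptions made for this sketch.

```python
from typing import Sequence


def retained_step(accuracies: Sequence[float], tolerance: float = 0.01) -> int:
    """Return the index r of the step whose result is retained.

    Iterating stops at the first step r for which the accuracy at step r+1
    degrades by more than `tolerance` relative to step r; the result from
    step r (before the degradation) is kept. If no such degradation occurs,
    the last step is retained.
    """
    for r in range(len(accuracies) - 1):
        if accuracies[r] - accuracies[r + 1] > tolerance:
            return r
    return len(accuracies) - 1


# Hypothetical per-iteration classification accuracies on the labeled set.
history = [0.80, 0.82, 0.83, 0.80, 0.85]
kept = retained_step(history)
```

In this sketch, the drop from 0.83 to 0.80 exceeds the tolerance, so the step with accuracy 0.83 is retained and later steps are not evaluated.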
 
With respect to claim 4, Tzeng modified by Csurka teaches the apparatus of claim 3, and Tzeng further teaches:
 
wherein the output is a first output, and wherein the instructions are executable by the at least one processor to: based on a determination that the first output from the first layer is from the first neural network, select a second layer, the second layer also being a hidden layer of the second neural network; (Tzeng, § 3, ¶ 4 and § 3.1, ¶ 4 cited for claim 1, supra.  § 4, ¶ 3: “Next, we choose to allow independent source and target mappings by untying the weights. This is a more ﬂexible learing [sic] paradigm as it allows more domain speciﬁc feature extraction to be learned. However, note that the target domain has no label access, and thus without weight sharing a target model may quickly learn a degenerate solution if we do not take care with proper initialization and training procedures. Therefore, we use the pre-trained source model as an initialization for the target representation space and ﬁx the source model during adversarial training.” § 4, ¶ 4: “In doing so, we are effectively learning an asymmetric mapping, in which we modify the target model so as to match the source distribution.”
The examiner notes that Tzeng teaches a layerwise domain adaptation technique that optimizes the source mapping in a source layer and the target mapping in a target layer for each individually modeled layer as shown in Eq. (3) in § 3, ¶ 4 and Eqns. (4)-(5) in § 3.1, ¶ 4, supra. The examiner further notes that when the adversarial loss ($\mathcal{L}_{\mathrm{adv}_M}$ in Eq. (3), supra) for a first layer (e.g., $M_s^{\ell_i}$ in Eq. (4), supra) is sufficiently small or zero (e.g., after the modification of the mapping / weights described in § 4, ¶ 4, supra), the target layer in the target encoder CNN is then determined to be sufficiently similar to the corresponding source layer in the source encoder CNN, and Tzeng then proceeds to another layer (see, e.g., another layer in {ℓ1, . . . , ℓn} as described in § 3.1, ¶ 4, supra) and repeats the same optimization for “each individual layer” until all the individually constrained and modeled layers in {ℓ1, . . . , ℓn} are processed.  That is, any of the i-th layers, other than the first layer selected in claim 1, where $i \in \{2 \dots n-1\}$, to which the mapping (e.g., $M_t^{\ell_i}$ in Eq. (4) or (5) cited for claim 1, supra) applies for producing a mapping distribution (e.g., $M_t^{\ell_i}(X_t)$) teaches the second layer being a hidden layer (the examiner notes that the first layer (i=1) is usually considered the input layer, the last layer (i=n) is usually considered the output layer, and any layers between the first and the last layers are considered hidden layers). The examiner thus notes that Tzeng’s iteratively proceeding through each individual layer in the target encoder CNN and performing the aforementioned optimization and modification of the mappings / weights teaches selecting a second, hidden layer of the second neural network.)
 
(Tzeng, p. 7169, § 3, ¶ 3: “In adversarial adaptive methods, the main goal is to regularize the learning of the source and target mappings, Ms and Mt, so as to minimize the distance between the empirical source and target mapping distributions: Ms(Xs) and Mt(Xt).” p. 7169, § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, we can derive a generic formulation for domain adversarial techniques below:
[Equation image media_image2.png; reproduction omitted]”
p. 7170, § 3.1, ¶ 4: “Consider a layered representations where each layer parameters are denoted as, $M_s^{\ell}$ or $M_t^{\ell}$, for a given set of equivalent layers, {ℓ1, . . . , ℓn}. Then the space of constraints explored in the literature can be described through layerwise equality constraints as follows:
[Equation image media_image4.png; reproduction omitted]
where each individual layer can be constrained independently. A very common form of constraint is source and target layerwise equality:
[Equation image media_image5.png; reproduction omitted].”
The examiner notes that Tzeng’s j-th layer (e.g., $M_t^{\ell_j}$, where $j \in \{1, \ldots, n\}$ and $j \neq i$ in claim 1) teaches a second layer of the second neural network (e.g., Tzeng’s target encoder CNN), and that Tzeng’s target mapping distribution, Mt(Xt), for the aforementioned j-th layer (e.g., $M_t^{\ell_j}$, where $j \in \{1, \ldots, n\}$ and $j \neq i$ in claim 1, in Eq. (4) or (5), supra) teaches an output of the second layer.)
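
For illustration only, and not as part of the cited reference, the layerwise equality constraint quoted above can be sketched in Python as follows. The function name, the dictionary representation of layer parameters, and the layer names are hypothetical; the sketch merely shows that each layer can be checked independently and that only the layers chosen to be constrained are compared.

```python
def layerwise_equality_report(source_layers, target_layers, constrained):
    """Check source/target parameter equality one layer at a time.

    source_layers, target_layers: dict mapping a layer name to a list of weights.
    constrained: names of the layers on which equality is imposed (assumed input).
    """
    report = {}
    for name in constrained:
        # Each layer is evaluated independently of every other layer.
        report[name] = source_layers[name] == target_layers[name]
    return report


source = {"l1": [1.0, 2.0], "l2": [3.0]}
target = {"l1": [1.0, 2.0], "l2": [3.5]}
print(layerwise_equality_report(source, target, ["l1", "l2"]))
```

Under these assumptions, a layer left out of `constrained` is simply not compared, which corresponds to leaving that layer's weights untied.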
 
(Tzeng, FIG. 2 (reproduction omitted).  § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, D is optimized according to a standard supervised loss, LadvD(Xs, Xt, Ms, Mt) where the labels indicate the origin domain, defined below:

[Equation image media_image3.png; reproduction omitted]”

§ 3, ¶ 5: “Thus, we can derive a generic formulation for domain adversarial techniques below:

[Equation image media_image2.png; reproduction omitted]”
The examiner notes that Tzeng’s “generative or discriminative model?”, “weights tied or untied?”, and “Which adversarial objective?” for Tzeng’s domain adaptation between the source domain and the target domain teach a third neural network.  The examiner further notes that Tzeng’s training both the source and target encoder CNNs in a pairwise manner based on minimizing the losses in Eqns. (2)-(3) to optimize the source representation, Ms, for the source encoder CNN and the target representation, Mt, for the target encoder CNN so that the source encoder CNN can be adapted to the target domain teaches determining whether the source representation, Ms, is sufficiently close to the target representation, Mt, and thus teaches determining whether the output from the first layer in the second neural network (e.g., Mt from the target encoder CNN) is from the first neural network (e.g., Ms from the source encoder CNN).)
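
For illustration only, a “standard supervised loss ... where the labels indicate the origin domain” of the kind quoted above can be sketched as a binary cross-entropy. Because the cited equation image is not reproduced in this action, the exact form below is an assumption: label 1 is taken to mean “source,” label 0 to mean “target,” and each domain's term is averaged separately before the two terms are summed. The function name and inputs are hypothetical.

```python
import math


def discriminator_loss(source_probs, target_probs, eps=1e-12):
    """Binary cross-entropy for a domain discriminator (assumed form).

    source_probs: discriminator outputs D(.) in (0, 1) for encoded source examples.
    target_probs: discriminator outputs D(.) in (0, 1) for encoded target examples.
    """
    # Source examples should be scored close to 1.
    src_term = -sum(math.log(max(p, eps)) for p in source_probs) / len(source_probs)
    # Target examples should be scored close to 0.
    tgt_term = -sum(math.log(max(1.0 - p, eps)) for p in target_probs) / len(target_probs)
    return src_term + tgt_term


confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
at_chance = discriminator_loss([0.5, 0.5], [0.5, 0.5])
print(confident < at_chance)
```

In this sketch, a discriminator that cannot tell the two domains apart (all outputs at 0.5) incurs a higher loss than one that separates them, which is the condition the adversarial adaptation drives the target encoder toward.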
 
based on a determination that the second output is not from the first neural network, adjust one or more weights of the second layer. based on a determination that the output from the first layer is not from the first neural network, adjust one or more weights of the first layer. (Tzeng, § 4, ¶ 1: “Specifically, we use a discriminative base model, unshared weights, and the standard GAN loss. We illustrate our overall sequential training procedure in Figure 3.” § 4, ¶ 3: “Next, we choose to allow independent source and target mappings by untying the weights. This is a more flexible learing [sic] paradigm as it allows more domain specific feature extraction to be learned. However, note that the target domain has no label access, and thus without weight sharing a target model may quickly learn a degenerate solution if we do not take care with proper initialization and training procedures. Therefore, we use the pre-trained source model as an initialization for the target representation space and fix the source model during adversarial training.” P. 7171, § 4, ¶ 4: “In doing so, we are effectively learning an asymmetric mapping, in which we modify the target model so as to match the source distribution.” § 1, ¶ 3: “For example, [11, 12] share weights and learn a symmetric mapping of both source and target images to the shared feature space, while [13] decouple some layers thus learning a partially asymmetric mapping.”
The examiner notes that Tzeng uses the terms “weight” and “mapping” interchangeably (see § 1, ¶ 3, supra).  The examiner further notes that Tzeng’s modifying the target mapping (e.g., $M_t^{\ell_i}$ for the i-th layer), which represents “layer parameters” such as weights, in Tzeng’s iteratively minimizing the distance between the source mapping/parameters (Ms) and the target mapping/parameters (Mt) teaches modifying one or more weights of the second layer (e.g., the j-th layer).)
 
With respect to claim 7, Tzeng modified by Csurka teaches the apparatus of claim 6, and Tzeng further teaches:
wherein the third neural network operates in unsupervised mode to, using labeled data, learn to correctly classify outputs from layers of either of the first neural network and the second neural network. (Tzeng, p. 7169, § 3, ¶ 1: “We present a general framework for adversarial unsupervised adaptation methods. In unsupervised adaptation, we assume access to source images Xs and labels Ys drawn from a source domain distribution ps(x, y), as well as target images Xt drawn from a target distribution pt(x, y), where there are no label observations. Our goal is to learn a target representation, Mt and classifier Ct that can correctly classify target images into one of K categories at test time, despite the lack of in domain annotations. Since direct supervised learning on the target is not possible, domain adaptation instead learns a source representation mapping, Ms, along with a source classifier, Cs, and then learns to adapt that model for use in the target domain.” The examiner notes that Tzeng’s unsupervised adaptation using labels from the source domain teaches a third neural network (e.g., FIG. 2, supra) operating in unsupervised mode and using labeled data. The examiner further notes that Tzeng’s adapting the source model to the target domain by minimizing the distances between the source and target domain with an adversarial loss function (see p. 7170, § 3.2, ¶ 1, p. 7171, § 4, ¶ 3, and p. 7171, § 4, ¶ 4 cited for claim 6, supra) teaches that Tzeng’s neural network, once the adaptation is complete, learns to correctly classify outputs from layers of either the source model or the target model as claimed.)
 

With respect to claim 10, Tzeng teaches the method of claim 8, and Csurka further teaches:
based on determining that the output from the first layer is from the first neural network, declining to adjust one or more weights of the first layer. (Csurka, ¶ [0094]: “Therefore another stopping criterion may be added, as follows. At each iteration, the classification accuracy of the learned DSCM classifier on the original labeled set $T_1$ is evaluated and if the classification performance in step r+1 incurs a stronger degradation than a predefined tolerance threshold (e.g., 1%) compared to the accuracy obtained in step r, iterating is stopped and Wr, the metric obtained before degradation, is retained. As will be appreciated, other stopping criteria can also be considered, such as measuring the variation between iterations of the TDAS.” The examiner notes that Csurka’s classification performance at a step teaches the output of a layer.  The examiner further notes that Csurka’s terminating weight updates based on a tolerance threshold (e.g., 1% for the accuracy between two classification outputs) teaches declining to adjust one or more weights of the first layer.)
Tzeng and Csurka are analogous art because both pertain to domain adaptation of generative neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng’s determining a layer of a first neural network of a first data type is not from a second neural network of a second data type (Tzeng, supra) with Csurka’s stopping criterion for weight updates (Csurka, supra).  The modification preserves an optimal accuracy of neural networks and maintains computing resources by terminating the update process when a stopping criterion is satisfied (Csurka, ¶ [0101]: “The transformation matrix is then updated as a function of the current transformation matrix, the current training set, and the current domain-specific weights. The current weights may then be updated. At S114, if a stopping point criterion is not met, the method may return to S106 or S108 for a further iteration.” ¶ [0094]: “At each iteration, the classification accuracy of the learned DSCM classifier on the original labeled set $T_1$ is evaluated and if the classification performance in step r+1 incurs a stronger degradation than a predefined tolerance threshold (e.g., 1%) compared to the accuracy obtained in step r, iterating is stopped and Wr, the metric obtained before degradation, is retained.”)
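
For illustration only, the stopping criterion quoted from Csurka ¶ [0094] can be sketched in Python as below. The function name, the list-based inputs, and the reading of the 1% tolerance as an absolute drop in accuracy are assumptions made for this sketch and do not appear in Csurka.

```python
def retained_metric(accuracies, metrics, tolerance=0.01):
    """Return (r, W_r) for the last step before accuracy degrades too far.

    accuracies[r]: accuracy on the original labeled set after step r.
    metrics[r]: the metric W_r learned at step r (any object).
    tolerance: maximum accepted drop between steps r and r+1 (assumed absolute).
    """
    for r in range(len(accuracies) - 1):
        degradation = accuracies[r] - accuracies[r + 1]
        if degradation > tolerance:
            # Stop iterating and keep the metric obtained before degradation.
            return r, metrics[r]
    # No step degraded beyond the tolerance; keep the final metric.
    last = len(accuracies) - 1
    return last, metrics[last]


print(retained_metric([0.80, 0.85, 0.83, 0.90], ["W0", "W1", "W2", "W3"]))
```

In this sketch the drop from 0.85 to 0.83 exceeds the tolerance, so iteration stops and W1 is retained; no further weight updates are made after that point.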
 
With respect to claim 16, Tzeng teaches: an apparatus, comprising: access a first domain, the first domain being associated with a first domain genre; (Tzeng, FIG. 3: 

[FIG. 3 image media_image1.png; reproduction omitted]

FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” P. 7172, § 5, ¶ 1: “We now evaluate ADDA for unsupervised classification adaptation across four different domain shifts. We explore three digits datasets of varying difficulty: MNIST [18], USPS, and SVHN [19]. We additionally evaluate on the NYUD [20] dataset to study adaptation across modalities.”  The examiner notes that Tzeng’s source domain to which the source encoder CNN belongs teaches a first domain, and that Tzeng’s source encoder CNN’s receiving source input teaches accessing a first domain.  The examiner further notes that Tzeng’s evaluating any of the four different types of domain shifts including adaptation across modalities teaches that the source domain is associated with a domain genre.)
 
(Tzeng, FIG. 3 and p. 7172, § 5, ¶1, supra. FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” 
The examiner notes that Tzeng’s target domain to which the target encoder CNN belongs teaches a second domain, and that Tzeng’s target encoder CNN’s receiving target input teaches accessing a second domain.  The examiner further notes that Tzeng’s evaluating any of the remaining three different types of domain shifts (e.g., excluding the domain shift evaluated by the first neural network as delineated above) including adaptation across modalities teaches that the target encoder CNN is associated with a different, second domain genre while the source domain above is associated with a first domain genre.)

using training data provided to the first and second domains, classify a target data set; and (Tzeng, FIG. 3: 

[FIG. 3 image media_image1.png; reproduction omitted]

The examiner notes that the source domain to which the “Source CNN” belongs teaches the first domain, and that the target domain to which the “Target CNN” belongs teaches the second domain. The examiner further notes that the “source images” and “target images” in FIG. 3 teach training data provided to the first and second domains, and that FIG. 3’s classifier and/or Target CNN classifying the “target image” into a “class label” teaches classifying a target data set using the aforementioned training data.)
 
output a classification of the target data set, (Tzeng, FIG. 3, ¶ 1, supra. The examiner notes that “class label” for “target image” in FIG. 3 teaches “a classification of the target data set”, and that Tzeng’s classifier and/or “Target CNN” classifying “target image” into “class label” teaches outputting a classification of the target data set.)
 
wherein the target data set is classified by a domain adaptation module comprising a domain classifier to inverse a gradient and back-propagate the gradient to a main model. (Tzeng, P. 7168, § 2, ¶ 3: “The gradient reversal algorithm (ReverseGrad) proposed in [11] also treats domain invariance as a binary classification problem, but directly maximizes the loss of the domain classifier by reversing its gradients.” P. 7171, § 3.2, ¶ 2: “The gradient reversal layer of [19] optimizes the mapping to maximize the discriminator loss directly: Eq. (6) (reproduction omitted)”; p. 7172, § 3.2, ¶ 4: “When training GANs, rather than directly using the minimax loss, it is typical to train the generator with the standard loss function with inverted labels [10]. This splits the optimization into two independent objectives, one for the generator and one for the discriminator, where LadvD remains unchanged, but LadvM becomes: Eq. (7) (reproduction omitted)”. 
The examiner notes that Tzeng’s neural network illustrated in FIG. 3 teaches a domain adaptation module, and that Tzeng’s neural network including at least the “source discriminator” and/or the “target discriminator” as well as the aforementioned gradient reversal algorithm or gradient reversal layer teaches a domain classifier in the aforementioned domain adaptation module.  The examiner further notes that Tzeng’s classifying target input with the aforementioned discriminator and gradient reversal layer/algorithm teaches the target data set is classified by the aforementioned domain adaptation module having a classifier to reverse a gradient, that the aforementioned “gradient reversal algorithm,” “gradient reversal layer,” and/or the “loss function with inverted labels” teaches inverting a gradient, and that the gradient reversal layer reversing the gradients in optimizing the mapping (e.g., weights as taught in p. 7168, § 1, ¶ 2) teaches backpropagating the reversed gradients to a main model (e.g., Tzeng’s neural network in FIG. 2 or FIG. 3) and hence the above limitation.)
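
For illustration only, the mechanism of a gradient reversal layer as generally understood in the art (identity in the forward pass, sign-flipped gradient in the backward pass) can be sketched in plain Python as below. The class name, the `scale` parameter, and the list-based tensors are hypothetical and are not taken from Tzeng or from the references Tzeng cites.

```python
class GradientReversal:
    """Identity on the forward pass; negated (and scaled) gradient on the backward pass."""

    def __init__(self, scale=1.0):
        # scale is an assumed strength factor for the reversed gradient.
        self.scale = scale

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, upstream_grad):
        # The gradient arriving from the domain classifier is sign-flipped
        # before it is back-propagated to the feature extractor (main model).
        return [-self.scale * g for g in upstream_grad]


layer = GradientReversal()
print(layer.forward([1.0, -2.0]))
print(layer.backward([0.5, -0.25]))
```

Because the sign is flipped, a descent step on the feature extractor moves it in the direction that increases the domain classifier's loss, which is the “maximizes the loss of the domain classifier by reversing its gradients” behavior described in the quoted passage.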

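Tzeng does not appear to explicitly teach:
at least one computer storage that is not a transitory signal and that comprises instructions executable by at least one processor to: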

Csurka does, however, teach: 
at least one computer storage that is not a transitory signal and that comprises instructions executable by at least one processor to: (Csurka, ¶ [0042]: “The digital processor 30 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like. The exemplary digital processor 30, in addition to controlling the operation of the computer system 10, executes the instructions 28 stored in memory 26 for performing the method outlined in FIG. 3.” ¶ [0044]: “The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth.”)
Tzeng and Csurka are analogous art because both pertain to domain adaptation of generative neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng’s determining a layer of a first neural network of a first data type is not from a second neural network of a second data type (Tzeng, supra) with Csurka’s computer storage (Csurka, supra).  The modification provides a large variety of common forms of non-transitory storage devices for storing software instructions for computers to read and use (Csurka, ¶ [0114]: “The computer program product may comprise a non-transitory computer-readable recording medium on which a control program is recorded (stored), such as a disk, hard drive, or the like. Common forms of non-transitory computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, or any other non-transitory medium from which a computer can read and use.”)

Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka) and further in view of Ashmore, S. Evaluating the Intrinsic Similarity between Neural Networks (2015) (hereinafter Ashmore).

With respect to claim 5, Tzeng modified by Csurka teaches the apparatus of claim 4 but does not appear to explicitly teach:
wherein the first and second layers of the second neural network are selected randomly.
Ashmore does, however, teach:
wherein the first and second layers of the second neural network are selected randomly. (Ashmore, § 4 “Implementation”, p. 9, ¶ 1: “When two models are aligned, the elements of that model may be compared in a pair-wise manner to evaluate the similarity (or dissimilarity) between the two models.”  § 6.2 “Ensemble”, p. 22, ¶ 3: “For this experiment, we created a single baseline neural network for comparison, created with the same random number generator seed.” P. 25, § 6.2 “Ensemble”, ¶ 2: “For Figure 6.2 we postulate the small difference in some previously large error rates to be a result of the random seed that was used. The seed determines the order of training, and in some cases we can get lucky with how the ensemble of neural networks was trained. In the case of these, we believe that the neural networks were using some similar weights for similar functionality. Those networks were more aligned to begin with, so simple averaging did not completely ruin the accuracy.” The examiner notes that Ashmore’s using a random order in training its forward bipartite alignment method that performs pair-wise alignment and comparison for each pair of layers of two neural networks to determine similarity of each pair teaches that the pairs of layers for alignment and comparison and hence each individual layer are selected randomly.)
Tzeng, Csurka, and Ashmore are analogous art because all three references pertain to domain adaptation of generative neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Csurka with Ashmore’s random selection of multiple layers from a target encoder CNN in determining whether these layers in the target encoder CNN are similar to the corresponding layers in an aligned neural network (Ashmore, supra).  The modification addresses the key problem posed by the semi-random nature of neural network training by using Ashmore’s Forward Bipartite Alignment (Ashmore, p. 7, § 3.1, ¶ 2: “The key problem addressed in this thesis is that weights which have been trained to represent some part of a problem could be in different places in the networks”; and “This is due to the randomness of initializing weights, the semi-random nature of training, but more importantly due to the nonlinear combinations that are made and the large number of iterations that it takes to train.” Pp. 7-8, § 3.1, ¶ 3: “To solve this important problem, transforming the neural network such that it is aligned to the other network is one possible solution. This solution is the basis for Forward Bipartite Alignment.” p. 25, § 6.2 “Ensemble”, ¶ 2: “Consider Figure 6.2 where FBA-Wagging does well and reaches 0 percent error rate, simply averaging can cause such a negative effect that the error rate of that model reached almost 85 percent. If that model had been aligned before averaging, the error rate would have been nearly 0 percent”; and “For Figure 6.2 we postulate the small difference in some previously large error rates to be a result of the random seed that was used. The seed determines the order of training”.)

Claim(s) 6 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka) and further in view of Chen et al. Large-Scale Visual Font Recognition (2014) (hereinafter Chen).

With respect to claim 6, Tzeng modified by Csurka teaches the apparatus of claim 1, and Tzeng further teaches:
wherein the instructions are executable by the at least one processor to: prior to using the third neural network to determine whether the output from the first layer is from the first neural network, (Tzeng, p. 7169, § 3, ¶ 1: “Our goal is to learn a target representation, Mt and classifier Ct that can correctly classify target images into one of K categories at test time, despite the lack of in domain annotations. Since direct supervised learning on the target is not possible, domain adaptation instead learns a source representation mapping, Ms, along with a source classifier, Cs, and then learns to adapt that model for use in the target domain.” P. 7170, § 3.1, ¶ 3: “Once the mapping parameterization is determined for the source, we must decide how to parametrize the target mapping Mt. In general, the target mapping almost always matches the source in terms of the specific functional layer (architecture), but different methods have proposed various regularization techniques.” P. 7170, § 3.2, ¶ 1: “Once we have decided on a parametrization of Mt, we employ an adversarial loss to learn the actual mapping.” P. 7171, § 4, ¶ 3: “Therefore, we use the pre-trained source model as an intitialization [sic] for the target representation space and fix the source model during adversarial training.” P. 7171, § 4, ¶ 4: “In doing so, we are effectively learning an asymmetric mapping, in which we modify the target model so as to match the source distribution.” P. 7172, § 4, last paragraph: “Through this framework we are able to motivate a novel domain adaptation method, ADDA, and offer insight into our design decisions. In the next section we demonstrate promising results on unsupervised adaptation benchmark tasks, studying adaptation across visual domains and across modalities.”
The examiner notes that Tzeng’s approach (1) learns the source representation mapping for parameters such as weights, and/or Ms, and source classifier, Cs, as taught in p. 7169, § 3, ¶ 1, supra; (2) after the source mapping parameterization is determined, determines the target representation mapping, Mt, by initializing it from the pre-trained source model as taught in § 3.1, ¶ 3 and § 4, ¶ 3, supra; (3) after the target mapping parameterization of Mt is determined, optimizes the target representation mapping (Mt) and target classifier (Ct) by modifying the target mapping/parameters (e.g., weights), Mt, via minimizing the distances between the source and target mapping distributions (Ms(Xs) and Mt(Xt)) with the inverted label GAN loss function as taught in § 3.2, ¶ 1 and § 4, ¶ 3, supra; and (4) uses Tzeng’s ADDA framework to produce “promising results” and insights into domain adaptation when compared to prior methods explicitly taught in § 4, last paragraph, supra.  The examiner further notes that (4) delineated above teaches providing insight to domain adaptation such as whether a first layer’s output is from a first neural network and thus teaches determining whether the output from the first layer is from the first neural network.  The examiner also notes that (1)-(3) delineated above occur prior to (4) and thus teach prior to using the third neural network to perform (4).  Therefore, Tzeng teaches the above limitation for at least the foregoing reasons.)
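
For illustration only, the ordering of steps (2) and (3) delineated above (initialize the target model from the pre-trained source model, then update only the target while the source stays fixed) can be sketched in Python as below. The function names, the dictionary-of-lists weight format, the learning rate, and the externally supplied gradients are all hypothetical; the adversarial loss that would produce those gradients is not modeled here.

```python
import copy


def init_target_from_source(source_weights):
    """Step (2): start the target model as an independent copy of the source model."""
    return copy.deepcopy(source_weights)


def adapt_target(source_weights, target_weights, target_grads, lr=0.1):
    """Step (3): one update of the target weights only; the source is left fixed.

    target_grads is assumed to come from an adversarial objective computed elsewhere.
    """
    updated = {
        name: [w - lr * g for w, g in zip(values, target_grads[name])]
        for name, values in target_weights.items()
    }
    return source_weights, updated


source = {"conv1": [1.0, 2.0]}           # step (1) is assumed to have produced these
target = init_target_from_source(source)
source, target = adapt_target(source, target, {"conv1": [1.0, -1.0]})
print(source, target)
```

The sketch keeps the source weights untouched across the update, mirroring the quoted statement that the source model is fixed during adversarial training while the target model is modified.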
 
(Chen, p. 515, left-hand column, ¶ 5: “Fig. 1 shows a typical DBN architecture, which is composed of a stack of Restricted Boltzmann Machines (RBMs) and/or one or more additional layers for discrimination tasks.” “Once the structure of a DBN is determined, the goal for training is to learn the weights (and biases) between layers.” p. 516, left-hand column, ¶ 4: “After pre-training, information about the input data is stored in the weights between every adjacent layers. The DBN then adds a final layer representing the desired outputs and the overall network is fine tuned using labeled data and back propagation strategies for better discrimination (in some implementations, on top of the stacked RBMs, there is another layer called associative memory determined by supervised learning methods).” p. 516, left-hand column, ¶ 6: “In summary, DBNs use a greedy and efficient layer-by-layer approach to learn the latent variables (weights) in each hidden layer and a back propagation method for fine-tuning.” 
The examiner notes that Chen’s DBN, which performs discriminative tasks as does Tzeng’s discriminator, teaches a third neural network.  The examiner further notes that Chen’s layer-by-layer fine-tuning weights of its DBN teaches adjusting one or more weights of one or more layers of the third neural network to correctly classify input to its DBN.  Therefore, the examiner asserts that Chen’s teaching, when combined with Tzeng’s outputs from layers of either of the first neural network and the second neural network, teaches the above limitation.)
Tzeng, Csurka, and Chen are analogous art because all three references pertain to transfer learning using discriminative neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Csurka with Chen’s adjusting weight(s) of one or more layers for correct performance of discrimination tasks (Chen, supra).  The modification learns weights in one or more layers of a discriminative neural network to improve both the generative performance and the discriminative power of the discriminative neural network (Chen, p. 516, left-hand column, last paragraph: “In summary, DBNs use a greedy and efficient layer-by-layer approach to learn the latent variables (weights) in each hidden layer and a back propagation method for fine-tuning. This hybrid training strategy thus improves both the generative performance and the discriminative power of the network.”)

With respect to claim 19, Tzeng modified by Csurka teaches the apparatus of claim 16 but does not appear to explicitly teach:  
wherein the first domain pertains to standard font text and the second domain pertains to cursive script.
Chen does, however, teach: 
wherein the first domain pertains to standard font text and the second domain pertains to cursive script. (Chen, p. 3, § 2, ¶ 3: “For each font class, we generate one image per English word, which gives 2.42 million synthetic images for the whole dataset.” p. 3, § 2, ¶ 4: “Besides the synthetic data, we also collected 325 real world test images for the font classes we have in the training set”. p. 5, last paragraph: “When new data or font classes are added to the database, we only need to calculate the new class mean vectors, and estimate the within-class covariances to update the WCCN metric incrementally. As the template model is universally shared by all classes, the template weights do not need to be retrained. Therefore, our algorithm can easily adapt to new data or new classes at little added cost.” FIG. 6. “(a) Real world images that are correctly classified (rank one).”

[Image: Chen, FIG. 6(a) (media_image6.png)]

	The examiner notes that in the above FIG. 6(a), a standard font (e.g., “Space Coast,” “Claude,” etc.) in a real-world domain teaches a first domain, and a cursive font (e.g., “Classic,” “of the way,” etc.) in a synthetic domain teaches a second domain.)
Tzeng, Csurka, and Chen are analogous art because all three references pertain to dataset shift or mismatch of neural networks across multiple domains such as real-world domain and synthetic domain.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Csurka to incorporate Chen’s recognizing fonts of different font styles (Chen, supra).  The modification enables Tzeng modified by Csurka, when modified by Chen, to automatically recognize the typeface without any knowledge of content and to facilitate a scalable solution. (Chen, Abstract: “This paper addresses the large-scale visual font recognition (VFR) problem, which aims at automatic identification of the typeface, weight, and slope of the text in an image or photo without any knowledge of content. Although visual font recognition has many practical applications, it has largely been neglected by the vision community. To address the VFR problem, we construct a large-scale dataset containing 2,420 font classes, which easily exceeds the scale of most image categorization datasets in computer vision. As font recognition is inherently dynamic and open-ended, i.e., new classes and data for existing categories are constantly added to the database over time, we propose a scalable solution based on the nearest class mean classifier (NCM). The core algorithm is built on local feature embedding, local feature metric learning and max-margin template selection, which is naturally amenable to NCM and thus to such open-ended classification problems. The new algorithm can generalize to new classes and new data at little added cost. Extensive experiments demonstrate that our approach is very effective on our synthetic test images, and achieves promising results on real world test images”)

Claim(s) 8-9, 11, and 14-15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng).

With respect to claim 8, Tzeng teaches a method, comprising:
(Tzeng, FIG. 3: 


[Image: Tzeng, FIG. 3 (media_image1.png)]

FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” p. 7172, § 5, ¶ 1: “We now evaluate ADDA for unsupervised classification adaptation across four different domain shifts. We explore three digits datasets of varying difficulty: MNIST [18], USPS, and SVHN [19]. We additionally evaluate on the NYUD [20] dataset to study adaptation across modalities.”  
The examiner first notes that Tzeng’s source encoder CNN teaches a first neural network, and that Tzeng’s pre-training its source encoder CNN with source images and labels or sending source images to its CNN during adversarial adaptation teaches accessing a first neural network.  The examiner further notes that Tzeng’s evaluating any one of the four different types of domain shifts for adaptation across modalities to a target domain shift (e.g., any one of the remaining three different types of domain shifts) teaches that the source encoder CNN is associated with a first data type. Further, the examiner notes that Tzeng interchangeably uses the terms “classifier” (e.g., “source classifier” and “target classifier” in § 3, “task-specific classifier” in § 3.1, “classifier” in FIG. 3 and its caption), “encoder” (e.g., “target encoder” in FIGS. 1 and 3 as well as their respective captions), and “encoder CNN” (e.g., “source encoder CNN” and “target encoder CNN” in the caption of FIG. 3), and that “classifier,” “encoder,” and “encoder CNN” are thus interpreted as functional and/or structural equivalents of each other.)
 
accessing a second neural network, the second neural network being associated with a second data type different from the first data type; (Tzeng, FIG. 3 and p. 7172, § 5, ¶ 1, supra. FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” 
The examiner notes that Tzeng’s target encoder CNN teaches a second neural network, and that Tzeng’s target encoder CNN’s receiving target images during adversarial adaptation illustrated in FIG. 3 teaches accessing a second neural network.  The examiner further notes that Tzeng’s evaluating any one of the remaining three different types of domain shifts (e.g., three different shifts other than the aforementioned domain shift evaluated by the first neural network) including adaptation across modalities teaches that the target encoder CNN is associated with a different, second data type while the source encoder CNN above is associated with a first data type.)
 
providing, as input, first training data to the second neural network; (Tzeng, FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” The examiner notes that Tzeng’s learning a target encoder CNN (also referred to as “target encoder” above) teaches training the second neural network, and that Tzeng’s providing the “target examples” in FIG. 3’s caption or the “target images” illustrated in FIG. 3 to the target encoder CNN (the claimed second neural network) for learning the target encoder CNN teaches providing, as input, first training data to the second neural network (e.g., Tzeng’s target encoder CNN/target CNN cited above). Therefore, the examiner asserts that at least the aforementioned passages and figure teach the above limitation.)
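For illustration only, the data flow described in the FIG. 3 caption (a target encoder initialized from the pre-trained source encoder and then fed target examples) can be sketched as follows. This is a toy sketch under stated assumptions and does not reproduce Tzeng's networks: the one-layer `encode` function, the weight values, and the sample inputs are all hypothetical stand-ins for the encoder CNNs and the target images.

```python
import copy

def encode(weights, x):
    """Toy one-layer encoder: elementwise scaling stands in for an encoder CNN."""
    return [w * v for w, v in zip(weights, x)]

# Hypothetical pre-trained source encoder parameters (held fixed during adaptation).
source_encoder = [1.0, 2.0]

# The target encoder starts as a separate copy of the source encoder parameters.
target_encoder = copy.copy(source_encoder)

# Hypothetical target examples provided as input (first training data).
target_examples = [[1.0, 1.0], [0.5, 2.0]]

# Encoded target examples are what the discriminator would see.
encoded_target = [encode(target_encoder, x) for x in target_examples]
```

Because the target encoder is a separate copy, its weights can be adjusted during adversarial adaptation without changing the fixed source encoder.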
 
selecting a first layer, the first layer being a hidden layer of the second neural network; (Tzeng, § 3.1, ¶ 3: “Once the mapping parameterization is determined for the source, we must decide how to parametrize the target mapping Mt. In general, the target mapping almost always matches the source in terms of the specific functional layer (architecture), but different methods have proposed various regularization techniques. All methods initialize the target mapping parameters with the source, but different methods choose different constraints between the source and target mappings, ψ(Ms,Mt).” 
§ 3.1, ¶ 4: “Consider a layered representations where each layer parameters are denoted as, $M_s^{\ell}$ or $M_t^{\ell}$, for a given set of equivalent layers, {ℓ1, . . . , ℓn}. Then the space of constraints explored in the literature can be described through layerwise equality constraints as follows: $\psi(M_s, M_t) \triangleq \{\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i})\}_{i \in \{1 \ldots n\}}$ (4) where each individual layer can be constrained independently. A very common form of constraint is source and target layerwise equality: $\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i}) = (M_s^{\ell_i} = M_t^{\ell_i})$ (5)”  
The examiner notes that any of the i-th layers, where $i \in \{2 \ldots n-1\}$, to which the mapping (e.g., $M_t^{\ell_i}$ in Eq. (4) or (5) above) applies to produce a mapping distribution (e.g., $M_t^{\ell_i}(X_t)$) teaches a hidden layer. That is, the first layer (ℓ1) may teach an input layer, the last layer (ℓn) may teach an output layer, and the intervening layers (ℓi where $i \in \{2 \ldots n-1\}$) in the second neural network (e.g., the target encoder CNN) for learning the layer parameters ($M_t^{\ell}$ in § 3.1, supra) are interpreted as hidden layers. The examiner further notes that Tzeng’s independent modeling and constraining each layer in the second neural network (e.g., Tzeng’s target encoder CNN) with respect to the corresponding individual layer in the first neural network (e.g., Tzeng’s source encoder CNN) as shown in Eqns. (4)-(5) above teaches selecting a first layer, which is a hidden layer, in the second neural network.)
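For illustration only, the interpretation above (layers ℓ2 through ℓn-1 treated as hidden layers, with each layer constrained independently as in Eqns. (4)-(5)) can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names, the four-layer parameter lists, and the rule for picking the selected layer are hypothetical and do not appear in Tzeng.

```python
def hidden_layer_indices(n):
    """1-based indices of the layers strictly between input layer 1 and output layer n."""
    return list(range(2, n))

def layerwise_equality(source_layers, target_layers):
    """Evaluate the equality constraint of Eq. (5) independently for each layer."""
    return [s == t for s, t in zip(source_layers, target_layers)]

# Hypothetical per-layer parameters for a four-layer source and target network.
source = [[0.1, 0.2], [0.3], [0.5, 0.6], [0.7]]
target = [[0.1, 0.2], [0.3], [0.9, 0.6], [0.7]]  # layer 3 has been adapted

hidden = hidden_layer_indices(len(target))
tied = layerwise_equality(source, target)

# Select the first hidden layer whose parameters are not tied to the source.
selected_layer = next(i for i in hidden if not tied[i - 1])
```

In this toy setting the constraint is checked one layer at a time, so an individual hidden layer of the target network can be singled out without reference to the other layers.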
 
identifying an output from the first layer that was generated based on the first training data; (Tzeng, p. 7169, § 3, ¶ 3: “In adversarial adaptive methods, the main goal is to regularize the learning of the source and target mappings, Ms and Mt, so as to minimize the distance between the empirical source and target mapping distributions: Ms(Xs) and Mt(Xt).” p. 7169, § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, we can derive a generic formulation for domain adversarial techniques below: 

[Image: Tzeng, generic formulation for domain adversarial techniques (media_image2.png)]
”
p. 7170, § 3.1, ¶ 4: “Consider a layered representations where each layer parameters are denoted as, $M_s^{\ell}$ or $M_t^{\ell}$, for a given set of equivalent layers, {ℓ1, . . . , ℓn}. Then the space of constraints explored in the literature can be described through layerwise equality constraints as follows: $\psi(M_s, M_t) \triangleq \{\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i})\}_{i \in \{1 \ldots n\}}$ (4) where each individual layer can be constrained independently. A very common form of constraint is source and target layerwise equality:
                            
                                
                                    ψ
                                
                                
                                    
                                        
                                            l
                                        
                                        
                                            i
                                        
                                    
                                
                            
                            
                                
                                    
                                        
                                            M
                                        
                                        
                                            s
                                        
                                        
                                            
                                                
                                                    l
                                                
                                                
                                                    i
                                                
                                            
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            M
                                        
                                        
                                            t
                                        
                                        
                                            
                                                
                                                    l
                                                
                                                
                                                    i
                                                
                                            
                                        
                                    
                                
                            
                            =
                            
                                
                                    
                                        
                                            M
                                        
                                        
                                            s
                                        
                                        
                                            
                                                
                                                    l
                                                
                                                
                                                    i
                                                
                                            
                                        
                                    
                                    =
                                     
                                    
                                        
                                            M
                                        
                                        
                                            t
                                        
                                        
                                            
                                                
                                                    l
                                                
                                                
                                                    i
                                                
                                            
                                        
                                    
                                
                            
                             
                             
                             
                             
                            (
                            5
                            )
                        
                    .
The examiner notes that Tzeng’s i-th layer (e.g., $\ell_i$ where $i \in \{2, \dots, n-1\}$) teaches a first layer of the second neural network (e.g., Tzeng’s target encoder CNN), and that the second neural network (e.g., Tzeng’s target encoder neural network) generating the target mapping distribution for the i-th layer, $M_t^{\ell_i}$(Xt), by learning the target representation mapping, $M_t^{\ell_i}$, of the i-th layer ($\ell_i$) for the target training input images, Xt, teaches an output (e.g., the aforementioned target mapping distribution, $M_t^{\ell_i}$(Xt), for the target input images Xt), and that Tzeng thus teaches the above limitation.)
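For illustration only, the layerwise equality constraint quoted above (Eqns. (4)-(5)) can be sketched as a per-layer comparison of source and target parameters. This is a minimal sketch and not Tzeng’s implementation; the dictionary layout, the layer names, and the function `layerwise_equality` are hypothetical and introduced solely for this example.

```python
from typing import Dict, List

# Hypothetical representation: each layer name maps to a flat list of weights.
Weights = List[float]


def layerwise_equality(
    source: Dict[str, Weights],
    target: Dict[str, Weights],
    constrained_layers: List[str],
) -> Dict[str, bool]:
    """For each constrained layer, report whether source and target weights are equal."""
    return {name: source[name] == target[name] for name in constrained_layers}


# Assumed toy parameters: layers l1 and l3 are tied, layer l2 is untied.
source = {"l1": [0.1, 0.2], "l2": [0.3], "l3": [0.5, 0.6]}
target = {"l1": [0.1, 0.2], "l2": [0.4], "l3": [0.5, 0.6]}

result = layerwise_equality(source, target, ["l1", "l2", "l3"])
print(result)
```

Because each layer is checked independently, a model may satisfy the constraint for some layers and not for others, which corresponds to the partially shared arrangement mentioned in the quoted passage.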
 
based on determining that the output from the first layer is not from the first neural network, adjusting one or more weights of the first layer. (Tzeng, § 4, ¶ 1: “Specifically, we use a discriminative base model, unshared weights, and the standard GAN loss. We illustrate our overall sequential training procedure in Figure 3.” § 4, ¶ 3: “Next, we choose to allow independent source and target mappings by untying the weights. This is a more flexible learing [sic] paradigm as it allows more domain specific feature extraction to be learned. However, note that the target domain has no label access, and thus without weight sharing a target model may quickly learn a degenerate solution if we do not take care with proper initialization and training procedures. Therefore, we use the pre-trained source model as an initialization for the target representation space and fix the source model during adversarial training.” § 4, ¶ 4: “In doing so, we are effectively learning an asymmetric mapping, in which we modify the target model so as to match the source distribution.” § 1, ¶ 3: “For example, [11, 12] share weights and learn a symmetric mapping of both source and target images to the shared feature space, while [13] decouple some layers thus learning a partially asymmetric mapping.” Eqns. (1)-(3) (reproduction omitted).
The examiner notes that Tzeng uses the terms weight and mapping interchangeably (see § 1, ¶ 3, supra). The examiner further notes that incurring a loss between the source classification and the target classification teaches that the output from the first layer is not from the first neural network, and that Tzeng’s modifying the target mapping (e.g., $M_t^{\ell_i}$ for the i-th layer), which represents “layer parameters” such as weights, based on adversarial training and the standard GAN loss (e.g., Eqns. (1)-(3)), teaches modifying one or more weights of the first layer (e.g., the i-th layer).)

using a third neural network, determining whether the output from the first layer is from the first neural network, the third neural network being different from the first and second neural networks; (Tzeng, FIG. 3, supra. § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, D is optimized according to a standard supervised loss, $L_{\mathrm{adv}_D}(X_s, X_t, M_s, M_t)$ where the labels indicate the origin domain, defined below:
[Equation image media_image3.png: definition of the discriminator loss]”
§ 3, ¶ 5: “Thus, we can derive a generic formulation for domain adversarial techniques below:
[Equation image media_image2.png: generic formulation for domain adversarial techniques]”
The examiner notes that Tzeng’s network including the functional blocks “generative or discriminative model?”, “weights tied or untied?”, “Which adversarial objective?”, and/or “discriminator” illustrated in FIG. 3 for Tzeng’s domain adaptation between the source domain and the target domain teaches a third neural network that is different from the first neural network (e.g., the source CNN in FIG. 3) and the second neural network (e.g., Tzeng’s target encoder CNN). The examiner further notes that Tzeng’s training both the source and target encoder CNNs based on minimizing the losses in Eqns. (2)-(3) to optimize the source representation for the i-th layer, $M_s^{\ell_i}$, for the source encoder CNN and the target representation for the i-th layer, $M_t^{\ell_i}$, for the target encoder CNN so that the source encoder CNN is adapted to the target domain renders obvious a decision of whether the source mapping distribution for the i-th layer, $M_s^{\ell_i}$(Xi), is sufficiently close to the target mapping distribution for the i-th layer, $M_t^{\ell_i}$, and thus renders obvious whether the output from the first layer in the second neural network (e.g., $M_t^{\ell_i}$(Xi) from the target encoder CNN) is from the first neural network (e.g., $M_s^{\ell_i}$(Xi) from the source encoder CNN).)
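For illustration only, the “standard supervised loss ... where the labels indicate the origin domain” quoted above can be sketched as a binary cross-entropy over layer outputs. This is a minimal sketch under stated assumptions and not Tzeng’s implementation: the cross-entropy form, the label convention (1 for source, 0 for target), the scalar features, and the names `discriminator_loss` and `toy_discriminator` are all assumptions introduced for this example.

```python
import math
from typing import Callable, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def discriminator_loss(
    d: Callable[[float], float],
    source_outputs: Sequence[float],
    target_outputs: Sequence[float],
) -> float:
    """Binary cross-entropy with assumed domain labels: source = 1, target = 0."""
    eps = 1e-12  # guards against log(0)
    source_term = -sum(math.log(d(f) + eps) for f in source_outputs) / len(source_outputs)
    target_term = -sum(math.log(1.0 - d(f) + eps) for f in target_outputs) / len(target_outputs)
    return source_term + target_term


def toy_discriminator(feature: float) -> float:
    """Hypothetical discriminator: estimated probability that a layer output is from the source network."""
    return sigmoid(4.0 * feature)


# Well-separated layer outputs are easy to attribute to a network, so the loss is low.
separated = discriminator_loss(toy_discriminator, [1.0, 1.2], [-1.0, -0.8])
# Overlapping layer outputs are hard to attribute, so the loss is higher.
overlapping = discriminator_loss(toy_discriminator, [0.0, 0.1], [0.0, -0.1])
print(separated, overlapping)
```

In this sketch, a low loss corresponds to a discriminator that can tell which network produced a given layer output, and a higher loss corresponds to outputs that are difficult to distinguish.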
based on determining that the output from the first layer is not from the first neural network, adjusting one or more weights of the first layer. (Tzeng, § 4, ¶ 1: “Specifically, we use a discriminative base model, unshared weights, and the standard GAN loss. We illustrate our overall sequential training procedure in Figure 3.” § 4, ¶ 3: “Next, we choose to allow independent source and target mappings by untying the weights. This is a more flexible learing [sic] paradigm as it allows more domain specific feature extraction to be learned. However, note that the target domain has no label access, and thus without weight sharing a target model may quickly learn a degenerate solution if we do not take care with proper initialization and training procedures.” § 4, ¶ 4: “In doing so, we are effectively learning an asymmetric mapping, in which we modify the target model so as to match the source distribution.” § 1, ¶ 3: “For example, [11, 12] share weights and learn a symmetric mapping of both source and target images to the shared feature space, while [13] decouple some layers thus learning a partially asymmetric mapping.” Eqns. (1)-(3) (reproduction omitted).
The examiner notes that Tzeng uses the terms weight and mapping interchangeably (see § 1, ¶ 3, supra). The examiner further notes that incurring a loss between the source encoder CNN output and the target encoder CNN output renders obvious that the output from the first layer of the second neural network is different from the corresponding output of the first neural network and is thus not from the first neural network, and that Tzeng’s modifying the target mapping (e.g., $M_t^{\ell_i}$ for the i-th layer), which represents “layer parameters” such as weights, based on adversarial training and the standard GAN loss (e.g., Eqns. (1)-(3)), renders obvious modifying one or more weights of the first layer (e.g., the i-th layer).)
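For illustration only, adjusting a target-layer weight in response to the discriminator can be sketched as one gradient step on a GAN-style mapping loss. This is a minimal sketch under stated assumptions and not Tzeng’s implementation: the one-weight linear target layer, the fixed logistic discriminator with parameters `a` and `b`, the loss form `-log D(M_t(x_t))`, and the learning rate are all hypothetical choices made for this example.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mapping_loss(w: float, xs: Sequence[float], a: float, b: float) -> float:
    """Assumed mapping loss: mean of -log D(w * x), with D(f) = sigmoid(a * f + b)."""
    return -sum(math.log(sigmoid(a * w * x + b)) for x in xs) / len(xs)


def mapping_grad(w: float, xs: Sequence[float], a: float, b: float) -> float:
    """Derivative of mapping_loss with respect to the target-layer weight w."""
    return -sum((1.0 - sigmoid(a * w * x + b)) * a * x for x in xs) / len(xs)


# Hypothetical setup: the discriminator is held fixed while the target layer is updated.
a, b = 1.5, -1.0
target_inputs = [1.0, 2.0]
learning_rate = 0.5

w_before = 0.1
loss_before = mapping_loss(w_before, target_inputs, a, b)

# One gradient step adjusts the target weight so its output looks more source-like to D.
w_after = w_before - learning_rate * mapping_grad(w_before, target_inputs, a, b)
loss_after = mapping_loss(w_after, target_inputs, a, b)
print(w_before, w_after, loss_before, loss_after)
```

The source model does not appear in the update, which mirrors the quoted statement that the target model is modified “so as to match the source distribution” while the source model is fixed.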

Tzeng teaches the method of claim 8, comprising, and Tzeng further teaches:
determining, using the third neural network, whether the output from the first layer is from the first neural network at least in part by using the third neural network to identify the output from the first layer as pertaining to the first data type. (Tzeng, p. 7169, § 3, ¶ 4: “We are now able to describe our full general framework view of adversarial adaptation approaches. We note that all approaches minimize source and target representation distances through alternating minimization between two functions. First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain.” P. 7172, FIG. 3 “Adversarial Adaptation”: 

[FIG. 3 image media_image7.png: “Adversarial Adaptation”]

The examiner notes that Tzeng’s “discriminator” that classifies whether the output is from a source CNN receiving source images in a first data type or from a target CNN receiving target images in a second data type until the target CNN is trained as illustrated in FIG. 3 teaches the above limitation.)
 
With respect to claim 11, Tzeng teaches the method of claim 10, wherein the output is a first output, and Tzeng further teaches: 
(Tzeng, § 3, ¶ 4 and § 3.1, ¶ 4 cited for claim 1, supra. § 4, ¶ 3: “Next, we choose to allow independent source and target mappings by untying the weights. This is a more flexible learing [sic] paradigm as it allows more domain specific feature extraction to be learned. However, note that the target domain has no label access, and thus without weight sharing a target model may quickly learn a degenerate solution if we do not take care with proper initialization and training procedures. Therefore, we use the pre-trained source model as an initialization for the target representation space and fix the source model during adversarial training.” § 4, ¶ 4: “In doing so, we are effectively learning an asymmetric mapping, in which we modify the target model so as to match the source distribution.”
The examiner notes that Tzeng teaches a layerwise domain adaptation technique that optimizes the source mapping in a source layer and the target mapping in a target layer for each individually modeled layer as shown in Eq. (3) in § 3, ¶ 4 and Eqns. (4)-(5) in § 3.1, ¶ 4, supra. The examiner further notes that when the adversarial loss ($L_{\mathrm{adv}_M}$ in Eq. (3), supra) for a first layer (e.g., $M_s^{\ell_i}$ in Eq. (4), supra) is sufficiently small or zero (e.g., after the modification of the mapping / weights described in § 4, ¶ 4, supra), the target layer in the target encoder CNN is then sufficiently similar to the source layer in the source encoder CNN, and Tzeng then proceeds to another layer (see e.g., another layer in {ℓ1, . . . , ℓn} as described in § 3.1, ¶ 4, supra) and repeats the same optimization for “each individual layer” until all the individually constrained and modeled layers in {ℓ1, . . . , ℓn} are processed. The examiner thus notes that Tzeng’s iteratively proceeding through each individual layer in the target encoder CNN and performing the aforementioned optimization and modification of the mappings / weights teaches selecting a second, hidden layer of the second neural network.)
 
identifying a second output from the second layer; (Tzeng, p. 7169, § 3, ¶ 3: “In adversarial adaptive methods, the main goal is to regularize the learning of the source and target mappings, Ms and Mt, so as to minimize the distance between the empirical source and target mapping distributions: Ms(Xs) and Mt(Xt).” § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, we can derive a generic formulation for domain adversarial techniques below:
[Equation image media_image2.png: generic formulation for domain adversarial techniques]”
The examiner notes that Tzeng’s j-th layer (e.g., $M_t^{\ell_j}$ where $j \in \{1, \dots, n\}$ and $j \neq i$ in claim 1) teaches a second layer of the second neural network (e.g., Tzeng’s target encoder CNN), and that Tzeng’s target mapping distribution, Mt(Xt), for the aforementioned j-th layer (e.g., $M_t^{\ell_j}$ where $j \in \{1, \dots, n\}$ and $j \neq i$ in claim 1, in Eq. (4) or (5), supra) teaches an output of the second layer.)

using the third neural network, determining whether the second output is from the first neural network; and (Tzeng, FIG. 2 (reproduction omitted). § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, D is optimized according to a standard supervised loss, $L_{\mathrm{adv}_D}(X_s, X_t, M_s, M_t)$ where the labels indicate the origin domain, defined below:
[Equation image media_image3.png: definition of the discriminator loss]”
§ 3, ¶ 5: “Thus, we can derive a generic formulation for domain adversarial techniques below:
[Equation image media_image2.png: generic formulation for domain adversarial techniques]”
The examiner notes that Tzeng’s “generative or discriminative model?”, “weights tied or untied?” and “Which adversarial objective?” for Tzeng’s domain adaptation between the source domain and the target domain teach a third neural network. The examiner further notes that Tzeng’s training both the source and target encoder CNNs in a pairwise manner based on minimizing the losses in Eqns. (2)-(3) to optimize the source representation, Ms, for the source encoder CNN and the target representation, Mt, for the target encoder CNN so that the source encoder CNN can be adapted to the target domain teaches whether the source representation, Ms, is sufficiently close to the target representation, Mt, and thus teaches whether the output from the first layer in the second neural network (e.g., Mt from the target encoder CNN) is from the first neural network (e.g., Ms from the source encoder CNN).)
 
(Tzeng, § 4, ¶ 1: “Specifically, we use a discriminative base model, unshared weights, and the standard GAN loss. We illustrate our overall sequential training procedure in Figure 3.” § 4, ¶ 3: “Next, we choose to allow independent source and target mappings by untying the weights. This is a more flexible learing [sic] paradigm as it allows more domain specific feature extraction to be learned. However, note that the target domain has no label access, and thus without weight sharing a target model may quickly learn a degenerate solution if we do not take care with proper initialization and training procedures. Therefore, we use the pre-trained source model as an initialization for the target representation space and fix the source model during adversarial training.” P. 7171, § 4, ¶ 4: “In doing so, we are effectively learning an asymmetric mapping, in which we modify the target model so as to match the source distribution.” § 1, ¶ 3: “For example, [11, 12] share weights and learn a symmetric mapping of both source and target images to the shared feature space, while [13] decouple some layers thus learning a partially asymmetric mapping.”
The examiner notes that Tzeng uses the terms weight and mapping interchangeably (see § 1, ¶ 3, supra). The examiner further notes that Tzeng’s modifying the target mapping (e.g., $M_t^{\ell_i}$ for the i-th layer), which represents “layer parameters” such as weights, in Tzeng’s iteratively minimizing the distance between the source mapping/parameters (Ms) and the target mapping/parameters (Mt) teaches modifying one or more weights of the second layer (e.g., the j-th layer).)

With respect to claim 14, Tzeng teaches the method of claim 8, and Tzeng further teaches: 
wherein the third neural network operates in unsupervised mode to, using labeled data, learn to correctly classify outputs from layers of either of the first neural network and the second neural network. (Tzeng, p. 7169, § 3, ¶ 1: “We present a general framework for adversarial unsupervised adaptation methods. In unsupervised adaptation, we assume access to source images Xs and labels Ys drawn from a source domain distribution ps(x, y), as well as target images Xt drawn from a target distribution pt(x, y), where there are no label observations. Our goal is to learn a target representation, Mt and classifier Ct that can correctly classify target images into one of K categories at test time, despite the lack of in domain annotations. Since direct supervised learning on the target is not possible, domain adaptation instead learns a source representation mapping, Ms, along with a source classifier, Cs, and then learns to adapt that model for use in the target domain.” The examiner notes that Tzeng’s unsupervised adaptation using labels from the source domain teaches a third neural network (e.g., FIG. 2, supra) operating in unsupervised mode and using labeled data. 
The examiner further notes that Tzeng’s adapting the source model to the target domain by minimizing the distances between the source and target domain with an adversarial loss function (see p. 7170, § 3.2, ¶ 1, p. 7171, § 4, ¶ 3, and p. 7171, § 4, ¶ 4 cited for claim 6, supra) teaches that Tzeng’s neural network, once the adaptation is complete, learns to correctly classify outputs from layers of either the source model or the target model as claimed.)
 
With respect to claim 15, Tzeng teaches the method of claim 8, and Tzeng further teaches: 
initially establishing the second neural network by a copying of the first neural network. (Tzeng, p. 7171, § 4, ¶ 3: “However, note that the target domain has no label access, and thus without weight sharing target model may quickly learn a degenerate solution if we do not take care with proper initialization and training procedures. Therefore, we use the pre-trained source model as an intitialization [sic] for the target representation space and fix the source model during adversarial training.”  The examiner notes that Tzeng’s using the same, pre-trained source model as an initialization for the target domain teaches copying a first neural network to initially establish the second neural network.)
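
For illustration only, the initialization quoted above can be sketched as follows. The sketch is not taken from Tzeng: the dictionary of weights and the names `source_model` and `target_model` are hypothetical stand-ins for a pre-trained source network and the target network initialized from it.

```python
import copy

# Hypothetical stand-in for a pre-trained source model: layer name -> weights.
source_model = {"layer1": [0.5, -0.2], "layer2": [1.0, 0.3]}

# Initially establish the target model as an independent copy of the source model.
target_model = copy.deepcopy(source_model)

# During adaptation only the target copy is modified; the source stays fixed.
target_model["layer1"][0] += 0.1
```

Because the copy is deep, modifying a weight in `target_model` leaves `source_model` unchanged, which mirrors fixing the source model during adversarial training.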
 
Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Clinchant et al. USPGPub 2018/0024968 published on Jan. 25, 2018 (hereinafter Clinchant).

With respect to claim 12, Tzeng teaches the method of claim 8, but does not appear to teach wherein the first layer is selected based on a command from a human supervisor. 
Clinchant does, however, teach: 
wherein the first layer is selected based on a command from a human supervisor. (Clinchant, ¶ [0058]: “As will be appreciated, while the steps of the method may all be computer implemented, in some embodiments one or more of the steps may be at least partially performed manually.” ¶ [0063]: “The exemplary mapping component 52 used herein can be based on the stacked marginalized Denoising Autoencoder (sMDA) described in Chen 2012, which will now be briefly described. The sMDA is a version of the multi-layer neural network trained to reconstruct input data from partial random corruption (see, P. Vincent, et al., ‘Extracting and composing robust features with denoising autoencoders,’ ICML pp. 1096-1103, 2008). In the method of Chen, the random corruption is marginalized out, yielding the optimal reconstruction weights in the closed-form and avoids the need for backpropagation in tuning.” The examiner notes that Clinchant’s explicit description that one or more steps in its method may be performed manually, together with the optimal reconstruction of weights in Clinchant’s method, teaches manually reconstructing a weight of a layer and hence manually selecting the layer to which the weight for reconstruction belongs, and that a user’s manual selection of a layer for reconstructing a weight thereof teaches a command from a human supervisor.)
Tzeng and Clinchant are analogous art because both pertain to domain adaptation of generative neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng to incorporate Clinchant’s manual identification of a layer for reconstructing a weight (Clinchant, supra).  The modification allows human feedback, such as manual review and labeling, to be incorporated into classifier training (Clinchant, ¶ [00156]: “Accordingly, in embodiments employing unsupervised learning the classifier training may include manual review of and labeling of the resulting clusters. Other human feedback for the classifier training is also contemplated, such as providing initial conditions for initiating an iterative classifier learning process.”)

Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Chen et al. Large-Scale Visual Font Recognition (2014) (hereinafter Chen).
With respect to claim 13, Tzeng teaches the method of claim 8, and Tzeng further teaches: 
prior to using the third neural network to determine whether the output from the first layer is from the first neural network, (Tzeng, p. 7169, § 3, ¶ 1: “Our goal is to learn a target representation, Mt and classifier Ct that can correctly classify target images into one of K categories at test time, despite the lack of in domain annotations. Since direct supervised learning on the target is not possible, domain adaptation instead learns a source representation mapping, Ms, along with a source classifier, Cs, and then learns to adapt that model for use in the target domain.” P. 7170, § 3.1, ¶ 3: “Once the mapping parameterization is determined for the source, we must decide how to parametrize the target mapping Mt. In general, the target mapping almost always matches the source in terms of the specific functional layer (architecture), but different methods have proposed various regularization techniques.” P. 7170, § 3.2, ¶ 1: “Once we have decided on a parametrization of Mt, we employ an adversarial loss to learn the actual mapping.” P. 7171, § 4, ¶ 3: “Therefore, we use the pre-trained source model as an intitialization [sic] for the target representation space and fix the source model during adversarial training.” P. 7171, § 4, ¶ 4: “In doing so, we are effectively learning an asymmetric mapping, in which we modify the target model so as to match the source distribution.” P. 7172, § 4, last paragraph: “Through this framework we are able to motivate a novel domain adaptation method, ADDA, and offer insight into our design decisions. In the next section we demonstrate promising results on unsupervised adaptation benchmark tasks, studying adaptation across visual domains and across modalities.”
The examiner notes that Tzeng’s approach (1) learns the source representation mapping for parameters such as weights, Ms and/or Mt, and source classifier, Cs, as taught in p. 7169, § 3, ¶ 1, supra; (2) after the source mapping parameterization is determined, determines the target representation mapping, Mt, by initializing it with the pre-trained source model as taught in § 3.1, ¶ 3 and § 4, ¶ 3, supra; (3) after the target mapping parameterization of Mt is determined, optimizes the target representation mapping (Mt) and target classifier (Ct) by modifying the target mapping/parameters (e.g., weights), Mt, via minimizing the distances between the source and target mapping distributions (Ms(Xs) and Mt(Xt)) with the inverted label GAN loss function as taught in § 3.2, ¶ 1 and § 4, ¶ 3, supra; and (4) uses Tzeng’s ADDA framework to produce “promising results” and insights into domain adaptation when compared to prior methods explicitly taught in § 4, last paragraph, supra.  The examiner further notes that (4) delineated above teaches providing insight to domain adaptation such as whether a first layer’s output is from a first neural network and thus teaches determining whether the output from the first layer is from the first neural network.  The examiner also notes that (1)-(3) delineated above occur prior to (4) and thus teach prior to using the third neural network to perform (4).  Therefore, Tzeng teaches the above limitation for at least the foregoing reasons.)
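
For illustration only, the inverted label GAN loss referenced in step (3) can be sketched as follows. The sketch assumes the standard GAN formulation in which the discriminator outputs the estimated probability that a feature came from the source mapping; the function names and the list-of-probabilities inputs are hypothetical and are not taken from Tzeng.

```python
import math

def discriminator_loss(d_source, d_target):
    """Discriminator objective: label source features 1 and target features 0.

    d_source, d_target: discriminator outputs in (0, 1) for features
    produced by the source mapping and the target mapping, respectively.
    """
    src = -sum(math.log(p) for p in d_source) / len(d_source)
    tgt = -sum(math.log(1.0 - p) for p in d_target) / len(d_target)
    return src + tgt

def target_mapping_loss(d_target):
    """Inverted-label objective: the target mapping is trained as if its
    features carried the source label, so it is rewarded for fooling
    the discriminator."""
    return -sum(math.log(p) for p in d_target) / len(d_target)
```

Under these assumptions, `target_mapping_loss` decreases as the discriminator becomes more likely to mistake target features for source features, while `discriminator_loss` decreases as the discriminator separates the two more cleanly.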

Tzeng does not appear to explicitly teach:  
adjusting one or more weights of one or more layers of the third neural network so that the third neural network learns to correctly classify outputs from layers of either of the first neural network and the second neural network. 

Chen does, however, teach: 
adjusting one or more weights of one or more layers of the third neural network so that the third neural network learns to correctly classify outputs from layers of either of the first neural network and the second neural network. (Chen, p. 515, left-hand column, ¶ 5: “Fig. 1 shows a typical DBN architecture, which is composed of a stack of Restricted Boltzmann Machines (RBMs) and/or one or more additional layers for discrimination tasks.” “Once the structure of a DBN is determined, the goal for training is to learn the weights (and biases) between layers.” p. 516, left-hand column, ¶ 4: “After pre-training, information about the input data is stored in the weights between every adjacent layers. The DBN then adds a final layer representing the desired outputs and the overall network is fine tuned using labeled data and back propagation strategies for better discrimination (in some implementations, on top of the stacked RBMs, there is another layer called associative memory determined by supervised learning methods).” p. 516, left-hand column, ¶ 6: “In summary, DBNs use a greedy and efficient layer-by-layer approach to learn the latent variables (weights) in each hidden layer and a back propagation method for fine-tuning.” 
The examiner notes that Chen’s DBN, which performs discriminative tasks as does Tzeng’s discriminator, teaches a third neural network.  The examiner further notes that Chen’s layer-by-layer fine-tuning weights of its DBN teaches adjusting one or more weights of one or more layers of the third neural network to correctly classify input to its DBN.  Therefore, the examiner asserts that Chen’s teaching, when combined with Tzeng’s outputs from layers of either of the first neural network and the second neural network, teaches the above limitation.)
Tzeng and Chen are analogous art because both references pertain to transfer learning using discriminative neural networks. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng with Chen’s adjusting weight(s) of one or more layers for correct performance of discrimination tasks (Chen, supra).  The modification learns weights in one or more layers of a discriminative neural network to improve both the generative performance and the discriminative power of the discriminative neural network (Chen, p. 516, left-hand column, last paragraph: “In summary, DBNs use a greedy and efficient layer-by-layer approach to learn the latent variables (weights) in each hidden layer and a back propagation method for fine-tuning. This hybrid training strategy thus improves both the generative performance and the discriminative power of the network.”)

Claim(s) 17 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka) and further in view of Csurka, G. Domain Adaptation for Visual Applications: A Comprehensive Survey (30 Mar. 2017) (hereinafter Csurka 1).
With respect to claim 17, Tzeng modified by Csurka teaches the apparatus of claim 16 but does not appear to explicitly teach: 
wherein the first domain pertains to real world video and the second domain pertains to computer game video.

Csurka 1 does, however, teach: 
wherein the first domain pertains to real world video and the second domain pertains to computer game video. (Csurka 1, pp. 27-28, § 5, ¶ 3: “The recent progresses in computer graphics and modern high-level generic graphics platforms such as game engines enable to generate photo-realistic virtual worlds with diverse, realistic, and physically plausible events and actions. Popular virtual words are SYNTHIA37 [176], Virtual KITTI38 [177] and GTA-V [178] (see also Figure 23).” P. 28, § 5, ¶ 4: “The Cool Temporal Segment Network [189] is an end-to-end action recognition model for real-world target categories that combines a few examples of labeled real-world videos with a large number of procedurally generated synthetic videos. The model uses a deep multi-task representation learning architecture, able to mix synthetic and real videos even if the action categories differ between the real and synthetic sets (see Figure 24).”)
Tzeng, Csurka, and Csurka 1 are analogous art because all three references pertain to domain adaptation of generative neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Csurka to incorporate Csurka 1’s use of real-world videos and computer game videos for domain adaptation between a real-world domain and a synthetic domain (e.g., a computer video game domain) (Csurka 1, supra).  The modification not only provides great promise for deep learning across a variety of computer vision problems but also helps to adjust the models trained in one domain to the other domain, especially when no or few labeled examples are available (Csurka 1, p. 28, § 5, ¶ 2: “Such virtually generated and controlled environments come with different levels of labeling for free and therefore have great promise for deep learning across a variety of computer vision problems, including optical flow [179, 180, 181, 182], object trackers [183, 177], depth estimation from RGB [184], object detection [185, 186, 187] semantic segmentation [188, 176, 178] or human actions recognition [189].” P. 28, § 5, ¶ 3: “In most cases, the synthetic data is used to enrich the real data for building the models. However, DA techniques can further help to adjust the model trained with virtual data (source) to real data (target) especially when no or few labeled examples are available in the real domain [190, 191, 176, 189].”)
 
With respect to claim 20, Tzeng modified by Csurka teaches the apparatus of claim 16 but does not appear to explicitly teach: 
wherein the domain classifier inverses the gradient using a gradient reversal layer (GRL) receiving data from a spatial model and a temporal model. 
Csurka 1 does, however, teach: 
wherein the domain classifier inverses the gradient using a gradient reversal layer (GRL) receiving data from a spatial model and a temporal model. (Csurka 1, FIG. 20 Caption: “Fig. 20 The DANN architecture including a feature extractor (green) and a label predictor (blue), which together form a standard feed-forward architecture. Unsupervised DA is achieved by the gradient reversal layer that multiplies the gradient by a certain negative constant during the backpropagation-based training to ensures that the feature distributions over the two domains are made indistinguishable.” P. 28, ¶ 3: “In most cases, the synthetic data is used to enrich the real data for building the models. However, DA techniques can further help to adjust the model trained with virtual data (source) to real data (target) especially when no or few labeled examples are available in the real domain [190, 191, 176, 189]. As such, [190] propose a deep spatial feature point architecture for visuomotor representation which, using synthetic examples and a few supervised examples, transfer the pretrained model to real imagery.” P. 28, ¶ 4: “The Cool Temporal Segment Network [189] is an end-to-end action recognition model for real-world target categories that combines a few examples of labeled real-world videos with a large number of procedurally generated synthetic videos.”
The examiner notes that Csurka 1’s gradient reversal layer in its neural network teaches the domain classifier inverses the gradient using a gradient reversal layer.  The examiner further notes that Csurka 1’s domain adaptation between a synthetic domain with a deep spatial feature point architecture teaches receiving data from a spatial model, and that Csurka 1’s domain adaptation between a real-world domain with a Cool Temporal Segment Network teaches a temporal model.)
Tzeng, Csurka, and Csurka 1 are analogous art because all three references pertain to domain adaptation of generative neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Csurka to incorporate Csurka 1’s gradient reversal layer that receives data from a spatial model as well as a temporal model and inverts the gradient (Csurka 1, supra).  The modification provides great promise for deep learning across a variety of computer vision problems with unsupervised domain adaptation that utilizes a gradient reversal layer in backpropagation to ensure that feature distributions over two domains are indistinguishable, as well as the capability of mixing synthetic and real data even when classification categories differ between the real and synthetic datasets (Csurka 1, p. 28, ¶ 2: “Such virtually generated and controlled environments come with different levels of labeling for free and therefore have great promise for deep learning across a variety of computer vision problems, including optical flow [179, 180, 181, 182], object trackers [183, 177], depth estimation from RGB [184], object detection [185, 186, 187] semantic segmentation [188, 176, 178] or human actions recognition [189].” P. 28, ¶ 3: “However, DA techniques can further help to adjust the model trained with virtual data (source) to real data (target) especially when no or few labeled examples are available in the real domain [190, 191, 176, 189]. As such, [190] propose a deep spatial feature point architecture for visuomotor representation which, using synthetic examples and a few supervised examples, transfer the pretrained model to real imagery.”  P. 28, ¶ 4: “The model uses a deep multi-task representation learning architecture, able to mix synthetic and real videos even if the action categories differ between the real and synthetic sets (see Figure 24).”)
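
For illustration only, the gradient reversal behavior described in the quoted FIG. 20 caption (multiplying the gradient by a negative constant during backpropagation) can be sketched as follows. The class name `GradientReversal`, the parameter `lam`, and the list-based gradients are hypothetical and are not taken from Csurka 1.

```python
class GradientReversal:
    """Identity on the forward pass; on the backward pass the incoming
    gradient is multiplied by the negative constant -lam."""

    def __init__(self, lam=1.0):
        self.lam = lam  # assumed positive scaling factor

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # The sign flip pushes the feature extractor to make the two
        # domains harder for the domain classifier to tell apart.
        return [-self.lam * g for g in grad_from_domain_classifier]


grl = GradientReversal(lam=0.5)
out = grl.forward([0.2, -1.0, 3.0])
grad_to_extractor = grl.backward([1.0, -2.0, 4.0])
```

In this sketch the features reach the domain classifier unmodified, and only the gradient returned to the feature extractor is inverted and scaled.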

Claim(s) 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka) and further in view of Catanzaro et al. US PGPub 20170148431 published on May 25, 2017 (hereinafter Catanzaro).

With respect to claim 18, Tzeng modified by Csurka teaches the apparatus of claim 16 but does not appear to explicitly teach: 
wherein the first domain pertains to information derived from a first voice and the second domain pertains to information derived from a second voice. 
Catanzaro does, however, teach: 
wherein the first domain pertains to information derived from a first voice and the second domain pertains to information derived from a second voice. (Catanzaro, ¶ [0038]: “In embodiments, an English speech system was trained on 11,940 hours of speech, while a Mandarin system was trained on 9,400 hours. In embodiments, data synthesis was used to further augment the data during training.” The examiner notes that the English domain to which Catanzaro’s English speech system belongs teaches a first domain, and that the Mandarin domain to which Catanzaro’s Mandarin speech system belongs teaches a second domain.)
Tzeng, Csurka, and Catanzaro are analogous art because all three references pertain to domain adaptation of neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Csurka to incorporate Catanzaro’s adapting between an English domain and a Mandarin domain (Catanzaro, supra).  The modification not only recognizes speech of vastly different languages but can also be inexpensively deployed, delivering low latency when serving users (Catanzaro, Abstract: “Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.”)


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Tsatsin et al. USPGPub 2017/0357896 published on Dec. 14, 2017 teaches a method of training a neural network to create an embedding space including a catalog of documents, the method including providing a plurality of training sets of K+2 training documents to a computer system, each training document being represented by a corresponding training vector x, each set of training documents including a target document represented by a vector xt, a favored document represented by a vector xs, and K>1 unfavored documents represented respectively by vectors xi u, where i is an integer from 1 to K, and each of the vectors including a plurality of input vector elements, for each given one of the training sets, passing, by the computer system, the vector representing each document of the training set through a neural network to derive a corresponding output vector yt, a corresponding output vector ys, and corresponding output vectors yi u, each of the output vectors including a plurality of output vector elements, the neural network including a set of adjustable parameters which dictate an amount of influence that is imposed on each input vector element of an input vector to derive each output vector element of the output vector, adjusting the parameters of the neural network so i u of [D(yt,ys)−D(yt,yi u)], where D is a distance between two vectors, and for each given one of the training sets, passing the vector representing each document of the training set through the neural network having the adjusted parameters to derive the output vectors.
Ganin et al. Domain-Adversarial Training of Neural Networks (April 2016) teaches a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. The approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains.
Opitz et al. Deep Metric Learning with BIER: Boosting Independent Embeddings Robustly (15 Jan. 2018) teaches additional loss functions which make learners more diverse from each other. Opitz presents two different loss functions to encourage the diversity of learners. These can either be used for weight initialization or as an auxiliary loss function during training (see Section 3.3). The first loss function, which Opitz denotes as Activation Loss, optimizes the embeddings such that for a given sample, only a single embedding is active and all other embeddings are close to zero (see Section 3.2.1). As a second loss function, Opitz proposes an Adversarial Loss. Opitz trains a regressor on top of the embeddings which maps one embedding to a different embedding, maximizing their similarity. By inserting a gradient reversal layer between the 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERICH C. TZOU whose telephone number is (571)272-9852. The examiner can normally be reached Monday-Friday 6:00AM-5:00PM PST with alternative Fridays off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached on 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANN J LO/Supervisory Patent Examiner, Art Unit 2126