Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Status of Claims
The following claim(s) is/are pending in this Office action: 1-20.
Claim(s) 1-20 are rejected.  This rejection is NON-FINAL.

Claim Objections
Claims 4 and 9 stand objected to because of the following informalities:
(a) Claim 4: The limitation “wherein the ratio is identified and the equation is applied using cross-domain batch normalization (CDBN)” joins two independent clauses without proper punctuation. The examiner suggests amending this limitation to recite “wherein the ratio is identified, and the equation is applied using cross-domain batch normalization (CDBN)”.
(b) Claim 9: The claim contains a minor clerical informality in that a conjunction is missing between the last two blocks of limitations. The examiner suggests amending the last two blocks of limitations to recite “identifying a ratio to normalize the first output and the second output; and applying the ratio to outputs from the hidden layer of the second neural network to normalize the outputs from the hidden layer of the second neural network.”
Appropriate correction is required.
 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
 
 
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
 
 
Claims 18 and 19 stand rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
(a) Claim 18: The limitation “source and target data” is indefinite because it is unclear whether the claimed “source” refers to a source of data or to “source data”. Further, it is unclear whether “target data” refers to the claimed “target data set,” “first domain of training data,” and/or “second domain of training data” recited in base claim 12. For purpose of examination, the claimed “source and target data” is interpreted as “source data and target data”, and the claimed “target data” is interpreted as being different from the claimed “target data set” recited in base claim 12. Clarification is nevertheless required.
(b) Claim 19: (1) The two instances of “statistics” in “to use the ratio and statistics related to the target to normalize statistics for both the source and the target” are indefinite because it is unclear whether the “statistics” used together with “the ratio” are the same as or different from the “statistics” that are to be normalized. For purpose of examination, the above two instances of “statistics” are interpreted as the same “statistics”. (2) The limitation “the source and the target” lacks proper antecedent basis and is indefinite because it is unclear what “the source” refers to and what “the target” refers to. For purpose of examination, “the source” and “the target” are interpreted as “a first domain” and “a second domain”, where either term may correspond to either domain so long as “the source” is different from “the target”. Clarification is nevertheless required.
 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 stand rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more.
 
Step 1: Claims 1-8 are directed to the statutory class of machines; claims 9-11 are directed to the statutory category of processes; and claims 12-20 are directed to the statutory category of machines.
 
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions: Step 2A – Prong One: 
Claim 1 recites the following judicial exceptions:
access a first neural network, the first neural network being associated with a first data type; (mental process - The examiner notes that this limitation, under the broadest reasonable interpretation, covers performance of the limitation by a human with a physical aid such as a pen and paper or a computer. For example, a human can use a pen and paper to draft a first tree (e.g., first neural network) of processes for processing a first type of data. See MPEP § 2106.04(a).)
 
access a second neural network, the second neural network being associated with a second data type different from the first data type; (mental process - The examiner notes that this limitation, under the broadest reasonable interpretation, covers performance of the limitation by a human with a physical aid such as a pen and paper or a computer. For example, a human can use a pen and paper to draft a second tree of processes (e.g., second neural network) for processing a second type of data. See MPEP § 2106.04(a).)
 
identify a ratio to normalize the first output and the second output; (Mathematical concept/principles/calculations: The examiner notes that this limitation, under its broadest reasonable interpretation, merely covers identifying or determining a numeric ratio (e.g., a numeric value between 0 and 1, exclusive) which is a mathematical concept or principle that has been determined to be an abstract idea. See MPEP § 2106.04(a)(2).)
 
apply an equation that accounts for the ratio to change one or more weights of the intermediate layer of the second neural network. (Mathematical concept/principles/calculations: The examiner notes that this limitation, under its broadest reasonable interpretation, merely covers applying a numeric ratio to two output datasets to combine the two output datasets at the ratio and normalizing the resulting dataset, both of which are mathematical concepts, principles, or calculations that have been determined to be an abstract idea. See MPEP § 2106.04(a)(2).)

Additional Elements – Step 2A Prong Two & Step 2B: 
            But for the recitation of the following insignificant additional elements, claim 1, under the broadest reasonable interpretation, merely covers the aforementioned judicial exceptions.  Nonetheless, these additional elements fail to amount to significantly more than the claimed judicial exceptions. 
at least one processor, and (mere instructions to apply an exception: The examiner notes that this additional element merely amounts to a recitation of the words “apply it” (or an equivalent) to implement a judicial exception on a computer or a generic computer component (e.g., at least one processor), and that this has been held to be insufficient to satisfy Step 2A Prong Two and Step 2B. See MPEP § 2106.05(f).)
at least one computer storage that is not a transitory signal and that comprises instructions executable by the at least one processor to: (mere instructions to apply an exception: The examiner notes that this additional element merely amounts to a recitation of the words “apply it” (or an equivalent) to implement a judicial exception on a computer or a generic computer component (e.g., at least one computer storage that comprises instructions executable by the at least one processor), and that this has been held to be insufficient to satisfy Step 2A Prong Two and Step 2B. See MPEP § 2106.05(f).)
provide, as input, first training data to the first neural network; (insignificant extra-solution (pre-solution) activity: The examiner notes that this additional element merely covers providing input and is thus directed to insignificant pre-solution activity that has been held to fail to integrate a judicial exception into a practical application because it does not impose any meaningful limits on practicing the abstract idea.  See MPEP § 2106.05(b)(III).)
provide, as input, second training data to the second neural network, the first training data being different from the second training data; (insignificant extra-solution (pre-solution) activity: The examiner notes that this additional element merely covers providing input and is thus directed to insignificant pre-solution activity that has been held to fail to integrate a judicial exception into a practical application because it does not impose any meaningful limits on practicing the abstract idea.  See MPEP § 2106.05(b)(III).)
identify a first output from an intermediate layer of the first neural network, the first output being based on the first training data; (insignificant extra-solution activity: The examiner notes that this additional element merely covers data gathering, namely collecting output, and is thus directed to insignificant extra-solution activity that has been held to fail to integrate a judicial exception into a practical application because it does not impose any meaningful limits on practicing the abstract idea.  See MPEP § 2106.05(b)(III).)
identify a second output from an intermediate layer of the second neural network, the second output being based on the second training data, the respective intermediate layers of the first and second neural networks being parallel layers; (insignificant extra-solution activity: The examiner notes that this additional element merely covers data gathering, namely collecting output, and is thus directed to insignificant extra-solution activity that has been held to fail to integrate a judicial exception into a practical application because it does not impose any meaningful limits on practicing the abstract idea.  See MPEP § 2106.05(b)(III).)
Therefore, the examiner asserts that these additional elements fail to integrate the claimed judicial exceptions into a practical application and thus fail to satisfy Prong Two for at least the foregoing reasons.  In addition, these additional elements fail to amount to significantly more than the claimed judicial exceptions and thus also fail to satisfy Step 2B, the same rationale applying.
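For illustration only, the interpretation applied above to the last two limitations of claim 1 (identifying a numeric ratio between 0 and 1, exclusive, combining two output datasets at that ratio, and normalizing the result) can be sketched as plain arithmetic. The function name `mix_and_normalize`, the parameter names `alpha` and `eps`, and the choice to blend the mean and variance of the two outputs are assumptions made for this sketch; they are not taken from the claims or the specification and are not a representation of the applicant's CDBN.

```python
from statistics import fmean, pvariance


def mix_and_normalize(first_output, second_output, alpha, eps=1e-5):
    """Normalize second_output using statistics blended at ratio alpha.

    Hypothetical illustration: alpha weights the statistics of the first
    output and (1 - alpha) weights the statistics of the second output.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha is assumed to lie strictly between 0 and 1")
    # Blend the means and variances of the two outputs at the ratio alpha.
    mean = alpha * fmean(first_output) + (1.0 - alpha) * fmean(second_output)
    var = alpha * pvariance(first_output) + (1.0 - alpha) * pvariance(second_output)
    # Standardize the second output with the blended statistics.
    scale = (var + eps) ** 0.5
    return [(x - mean) / scale for x in second_output]


# Toy values standing in for outputs of parallel intermediate layers.
first = [1.0, 2.0, 3.0]
second = [2.0, 4.0, 6.0]
normalized = mix_and_normalize(first, second, alpha=0.5)
```

In this sketch the ratio is a single number and every step is a weighted average, a subtraction, or a division applied to lists of numbers.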
 
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 2 recites the following limitations:
The apparatus of claim 1, wherein the ratio pertains to a mean value. (Mathematical concept/principle: The examiner notes that claim 2, under its broadest reasonable interpretation, merely recites a mathematical concept – average or mean value – that has been held to be a judicial exception that fails to satisfy Step 2A Prong One. See MPEP § 2106.04.  
Claim 2 does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or that amount to significantly more than the claimed judicial exception to satisfy Step 2B.  Claim 2 is thus not patent eligible.)
 
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 3 recites the following limitations:
wherein mean and variance between the first output and the second output are both analyzed to apply the equation. (mental process – collecting information, analyzing it, and displaying certain results of the collection and analysis: The examiner notes that claim 3, under its broadest reasonable interpretation, merely covers a mental process of collecting and analyzing information (e.g., mean and variance of first and second outputs) for an intended purpose of applying an equation. This has been held to constitute a mental process that fails to satisfy Step 2A. See MPEP § 2106.04(III)(A)(a).
Claim 3 does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or that amount to significantly more than the claimed judicial exception to satisfy Step 2B.  Claim 3 is thus not patent eligible.) 
 
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 4 recites the following limitations:
wherein the ratio is identified and the equation is applied using cross-domain batch normalization (CDBN). (Mathematical concept/principle: The examiner notes that claim 4, under its broadest reasonable interpretation, merely recites a mathematical concept – standardizing or normalizing data – that has been held to be a judicial exception that fails to satisfy Step 2A Prong One. See MPEP § 2106.04.  
Claim 4 does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or that amount to significantly more than the claimed judicial exception to satisfy Step 2B.  Claim 4 is thus not patent eligible.)
 
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 5 recites the following limitations:
wherein the second neural network is established by a copy of the first neural network prior to the second training data being provided to the second neural network. (mental process: The examiner notes that claim 5, under its broadest reasonable interpretation, merely covers a mental process that can be performed by a human. For example, a human may use a physical aid (e.g., a pen and paper or a computer) to replicate a network representation (e.g., a neural network that does not necessarily require the use of a computer for execution) into multiple replicas.  This claimed mental process has been held to fail to satisfy step 2A regardless of whether or not a computer is used. See MPEP § 2106.04(a)(2)(III)(C)(1)-(3). 
Claim 5 does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or that amount to significantly more than the claimed judicial exception to satisfy Step 2B.  Claim 5 is thus not patent eligible.)
 
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 6 recites the following limitations:
wherein the intermediate layers of the first and second neural networks are layers other than output layers. (Mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review a network representation and judge which layer(s) constitutes output layer(s) and which layer(s) constitutes intermediate layer(s). A mental process has been held to be insufficient to satisfy Step 2A. See MPEP § 2106.04(a)(2)(III).
Claim 6 does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or that amount to significantly more than the claimed judicial exception to satisfy Step 2B.  Claim 6 is thus not patent eligible.)
 
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions – Step 2A Prong One: 
But for the recitation of some insignificant additional elements, claim 7 recites the following judicial exceptions:
wherein the first training data is related to the second training data, (Mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review two pieces of data and judge whether these two pieces of data are related. A mental process has been held to be insufficient to satisfy Step 2A. See MPEP § 2106.04(a)(2)(III).)
 
wherein the first training data is related to the second training data in that the first and second training data both pertain to a same action. (Mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review two pieces of data and judge whether these two pieces of data represent the same action (e.g., whether two images represent playing tennis). A mental process has been held to be insufficient to satisfy Step 2A. See MPEP § 2106.04(a)(2)(III).)
 
Additional Elements – Step 2A Prong Two & Step 2B: 
wherein the first and second neural networks pertain to action recognition, and (Generally linking the use of a judicial exception to a particular technological environment or field of use: The examiner notes that this limitation, under its broadest reasonable interpretation, merely generally links the claimed judicial exception to a particular technological environment or a field of use (e.g., action recognition).  This has been held to be insufficient to integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two. See MPEP § 2106.04(d)(I).  This has also been held not to be enough to qualify as “significantly more”.  See MPEP § 2106.05(A).)
Therefore, claim 7 is not patent eligible for at least the foregoing reasons. 
 
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 8 recites the following limitations:
Judicial Exceptions – Step 2A Prong One: 
But for the recitation of some insignificant additional elements, claim 8 recites the following judicial exceptions:
wherein the first training data is related to the second training data, (Mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review two pieces of data and judge whether these two pieces of data are related. A mental process has been held to be insufficient to satisfy Step 2A. See MPEP § 2106.04(a)(2)(III).)
 
wherein the first training data is related to the second training data in that the first and second training data both pertain to a same object. (Mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review two pieces of data and judge whether these two pieces of data represent the same object. A mental process has been held to be insufficient to satisfy Step 2A. See MPEP § 2106.04(a)(2)(III).)
 
Additional Elements – Step 2A Prong Two & Step 2B: 
wherein the first and second neural networks pertain to object recognition, and (Generally linking the use of a judicial exception to a particular technological environment or field of use: The examiner notes that this limitation, under its broadest reasonable interpretation, merely generally links the claimed judicial exception to a particular technological environment or a field of use (e.g., object recognition).  This has been held to be insufficient to integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two. See MPEP § 2106.04(d)(I).  This has also been held not to be enough to qualify as “significantly more”.  See MPEP § 2106.05(A).)
Therefore, claim 8 is not patent eligible for at least the foregoing reasons. 
 
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions: Step 2A – Prong One: 
Claim 9 recites the following judicial exceptions:
accessing a first neural network, the first neural network being associated with a first data type; (mental process - The examiner notes that this limitation, under the broadest reasonable interpretation, covers performance of the limitation by a human with a physical aid such as a pen and paper or a computer. For example, a human can use a pen and paper to draft a first tree (e.g., first neural network) of processes for processing a first type of data. See MPEP § 2106.04(a).)
 
accessing a second neural network, the second neural network being associated with a second data type different from the first data type; (mental process - The examiner notes that this limitation, under the broadest reasonable interpretation, covers performance of the limitation by a human with a physical aid such as a pen and paper or a computer. For example, a human can use a pen and paper to draft a second tree of processes (e.g., second neural network) for processing a second type of data. See MPEP § 2106.04(a).)
 
identifying a ratio to normalize the first output and the second output; (Mathematical concept/principles/calculations: The examiner notes that this limitation, under its broadest reasonable interpretation, merely covers identifying or determining a numeric ratio (e.g., a numeric value between 0 and 1, exclusive) which is a mathematical concept or principle that has been determined to be an abstract idea. See MPEP § 2106.04(a)(2).)
 
applying the ratio to outputs from the hidden layer of the second neural network to normalize the outputs from the hidden layer of the second neural network. (Mathematical concept/principles/calculations: The examiner notes that this limitation, under its broadest reasonable interpretation, merely covers applying a numeric ratio to two output datasets to combine the two output datasets at the ratio and normalizing the resulting dataset, both of which are mathematical concepts, principles, or calculations that have been determined to be an abstract idea. See MPEP § 2106.04(a)(2).)
 
Additional Elements – Step 2A Prong Two: 
            But for the recitation of the following insignificant additional elements, claim 9, under the broadest reasonable interpretation, merely covers the aforementioned judicial exceptions.  Nonetheless, these additional elements fail to amount to significantly more than the claimed judicial exceptions. 
providing, as input, first training data to the first neural network; (insignificant extra-solution (pre-solution) activity: The examiner notes that this additional element merely covers providing input to a software program and is thus directed to insignificant pre-solution activity that has been held to fail to integrate a judicial exception into a practical application because it does not impose any meaningful limits on practicing the abstract idea.  See MPEP § 2106.05(b)(III).)
 
providing, as input, second training data to the second neural network, the first training data being different from the second training data; (insignificant extra-solution (pre-solution) activity: The examiner notes that this additional element, like the immediately preceding additional element, also merely covers providing input to a software program and is thus directed to insignificant pre-solution activity that has been held to fail to integrate a judicial exception into a practical application because it does not impose any meaningful limits on practicing the abstract idea.  See MPEP § 2106.05(b)(III).)
 
identifying a first output from a hidden layer of the first neural network, the first output being based on the first training data; (insignificant extra-solution activity: The examiner notes that this additional element merely covers data gathering – collecting output – from a software program and is thus directed to insignificant extra-solution activity that has also been held to fail to integrate a judicial exception into a practical application because it does not impose any meaningful limits on practicing the abstract idea.  See MPEP § 2106.05(b)(III).)
 
identifying a second output from a hidden layer of the second neural network, the second output being based on the second training data, the respective hidden layers of the first and second neural networks being parallel layers; (insignificant extra-solution activity: The examiner notes that this additional element, like the immediately preceding one, also merely covers data gathering – collecting output – from a software program and is thus directed to insignificant extra-solution activity that has also been held to fail to integrate a judicial exception into a practical application because it does not impose any meaningful limits on practicing the abstract idea.  See MPEP § 2106.05(b)(III).)
          Therefore, the examiner asserts that the additional elements recited in claim 9 fail to integrate the claimed abstract idea into a practical application and thus fail to satisfy prong two of step 2A.
 
Additional Elements – Step 2A Prong Two & Step 2B: 
          Claim 9 merely recites additional elements that fail to amount to significantly more than the claimed judicial exception. As discussed above with respect to the failure of the additional elements to integrate the claimed abstract idea into a practical application, the additional elements of providing input merely amount to insignificant pre-solution activities of providing input data that not only fail to integrate the claimed judicial exception into a practical application but also fail to amount to significantly more than the claimed judicial exception.  See MPEP § 2106.05(b)(III). Moreover, the additional elements of identifying respective outputs from two layers of two neural networks also merely amount to insignificant extra-solution activities of mere data gathering that not only fail to integrate the claimed judicial exception into a practical application but also fail to amount to significantly more than the claimed judicial exception.  See MPEP § 2106.05(b)(III).
The examiner notes that mere insignificant extra-solution activities cannot provide an inventive concept and thus asserts that claim 9 is not patent eligible.
 
With respect to claim 10, claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 10 recites wherein the ratio pertains to a mean value.
Judicial Exceptions: Step 2A – Prong One: 
          The examiner notes that this limitation, under the broadest reasonable interpretation, covers a mathematical relationship of a numeric ratio pertaining to an arithmetic or geometric mean which has been found to be an abstract idea that fails to satisfy prong one of step 2A. See MPEP § 2106.04(a).  
 
Additional Elements – Step 2A – Prong Two & Step 2B:
          Claim 10 does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application. Claim 10 is thus rejected under 35 U.S.C. § 101.
 
With respect to claim 11, claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 11 recites wherein the identifying and applying steps are performed using a cross-domain batch normalization (CDBN) module.
Judicial Exceptions: Step 2A – Prong One: 
The examiner notes that this limitation, under the broadest reasonable interpretation, covers the mathematical concept/principles/calculations for identifying a ratio and another mathematical concept/principles/calculations for applying the ratio, both of which have been found to be abstract ideas that fail to satisfy prong one of step 2A. See MPEP § 2106.04(a).  
 
Additional Elements – Step 2A Prong Two & Step 2B: 
          Claim 11 recites the additional element of performing the aforementioned abstract ideas using a computer module recited at a high level of generality (“using a cross-domain batch normalization (CDBN) module”), such as a generic computer performing a generic computer function of performing arithmetic or floating-point operations using a numeric value (e.g., a ratio), such that it amounts to no more than mere instructions to apply the judicial exceptions using a generic computer module. The examiner notes that mere instructions to apply a judicial exception using a generic computer module recited at a high level of generality have been found to be insufficient to integrate the claimed judicial exception into a practical application to satisfy prong two of step 2A.  See e.g., MPEP § 2106.05(f). 
          Claim 11 also does not include any additional elements that amount to significantly more than the claimed judicial exception. As discussed above with respect to the failure of additional elements in integrating the claimed abstract ideas into a practical application, the additional element of using a generic computer component recited at a high level of generality merely amounts to mere instructions to implement the claimed judicial exceptions on a computer and thus fails to amount to significantly more than the claimed judicial exception to satisfy step 2B.  See MPEP § 2106.05(f).
The examiner thus asserts that claim 11 is not patent eligible.
 
With respect to claim 12, claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions: Step 2A – Prong One: 
Claim 12 recites the following judicial exceptions:
access a first domain of training data, the first domain being associated with a first domain genre; (mental process - The examiner notes that this limitation, under the broadest reasonable interpretation, covers performance of the limitation by a human with a physical aid such as a pen and paper or a computer. For example, a human can collect (e.g., access) images of real-world objects in a first type of data (real-world images). See MPEP § 2106.04(a).)
 
access a second domain of training data, the second domain being associated with a second domain genre different from the first domain genre; (mental process - The examiner notes that this limitation, under the broadest reasonable interpretation, covers performance of the limitation by a human with a physical aid such as a pen and paper or a computer. For example, a human can collect (e.g., access) synthetically generated images of a second type by, for instance, taking screen shots in a software application. See MPEP § 2106.04(a).)
 
using the training data from the first and second domains, classify a target data set; and (mental process - The examiner notes that this limitation, under the broadest reasonable interpretation, covers performance of the limitation by a human with or without a physical aid. For example, a human can, after reviewing some labels of some sample images of real-world objects and synthetically generated images, look at a first image of a real-world fruit and a second image of a synthetically generated fruit and determine what these fruits are (classification). See MPEP § 2106.04(a).)
 
output a classification of the target data set, (mental process - The examiner notes that this limitation, under the broadest reasonable interpretation, covers performance of the limitation by a human with or without a physical aid. For example, the aforementioned human can, after determining what the fruits in the images are, record what the human thinks these fruits are. See MPEP § 2106.04(a).)
  
Additional Elements – Step 2A Prong Two & Step 2B: 
            But for the recitation of the following insignificant additional elements, claim 12, under the broadest reasonable interpretation, merely covers the aforementioned judicial exceptions.  Nonetheless, these additional elements fail to amount to significantly more than the claimed judicial exceptions. 
at least one computer storage that is not a transitory signal and that comprises instructions executable by at least one processor to: (mere instructions to apply an exception: The examiner notes that this additional element merely amounts to a recitation of the words “apply it” (or an equivalent) to implement a judicial exception on a computer or a generic computer component (e.g., at least one processor), and that this has been held to be insufficient to satisfy Step 2A Prong Two and Step 2B. See MPEP § 2106.05(f).)
 
wherein the target data set is classified by a domain adaptation module comprising a cross-domain batch normalization (CDBN) module to adaptively select domain statistics to normalize inputs. (data gathering – the additional element of selecting a particular type of data: The examiner notes that adaptively selecting domain statistics, under the broadest reasonable interpretation, merely covers selecting a particular type of data to be manipulated (e.g., to be normalized as claimed), which has been found to be mere data gathering that fails to satisfy prong two of step 2A.  See MPEP § 2106.05(g)(3). Moreover, the additional elements of a domain adaptation module and a CDBN module are recited at a high level of generality and are thus mere instructions to apply the claimed judicial exceptions on generic computer components performing generic computer functions of reading data according to input instructions.  This has been found to be insufficient to integrate the claimed judicial exception into a practical application to satisfy prong two of step 2A.  See e.g., MPEP § 2106.05(f).)
          Claim 12 also does not include additional elements that amount to significantly more than the claimed judicial exception. As discussed above with respect to the failure of additional elements in integrating the claimed abstract idea into a practical application, the additional elements of computer storage, processor, and selecting data using a computer module recited at a high level of generality amount to no more than mere instructions to apply the judicial exceptions using generic computer components which has been found to fail to amount to significantly more than the claimed judicial exception.  See MPEP § 2106.05(b)(III).
The examiner notes that mere insignificant extra-solution activities cannot provide an inventive concept and thus asserts that claim 12 is not patent eligible.
 
With respect to claim 13, claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions: Step 2A – Prong One: 
Claim 13 recites: 
wherein the first domain comprises real world video and the second domain comprises computer game video. (Mental process such as an observation, judgment, and/or opinion by a human: The examiner notes that claim 13 merely recites an abstract idea – mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review an image and determine whether this image depicts a real-world object or a synthetically generated object (e.g., by a computer program). See MPEP § 2106.04(a)(2)(III).)
Additional Elements – Step 2A Prong Two & Step 2B: 
Claim 13 does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application or that amount to significantly more than the claimed judicial exception.  Claim 13 is thus not patent eligible.
 
With respect to claim 14, claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions: Step 2A – Prong One: 
Claim 14 recites: 
wherein the first domain comprises information derived from a first voice and the second domain comprises information derived from a second voice. (Mental process such as an observation, judgment, and/or opinion by a human: The examiner notes that claim 14 merely recites an abstract idea – mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review voice data and determine whether the voice data corresponds to the same person. See MPEP § 2106.04(a)(2)(III).)
Additional Elements – Step 2A Prong Two & Step 2B: 
Claim 14 does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application or that amount to significantly more than the claimed judicial exception. Claim 14 is thus not patent eligible.
 
With respect to claim 15, claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions: Step 2A – Prong One: 
Claim 15 recites: 
wherein the first domain comprises standard font text and the second domain comprises cursive script. (Mental process such as an observation, judgment, and/or opinion by a human: The examiner notes that claim 15 merely recites an abstract idea – mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review two pieces of writing and determine that a first piece of writing includes a standard font type, and that a second piece of writing includes a cursive script font type. See MPEP § 2106.04(a)(2)(III).)
Additional Elements – Step 2A Prong Two & Step 2B: 
Claim 15 does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application or that amount to significantly more than the claimed judicial exception. Claim 15 is thus not patent eligible.
 
With respect to claim 16, claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Additional Elements – Step 2A Prong Two & Step 2B: 
Claim 16 recites the apparatus of claim 12, comprising the at least one processor. (Mere instructions to apply an exception: The examiner notes that this additional element merely amounts to a recitation of the words “apply it” (or an equivalent) to implement a judicial exception on a computer or a generic computer component (e.g., at least one computer storage that comprises instructions executable by the at least one processor), and that this has been held to be insufficient to satisfy Step 2A Prong Two and Step 2B. See MPEP § 2106.05(f).)
 
With respect to claim 17, claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions: Step 2A – Prong One: 
Claim 17 recites: 
wherein the CDBN module is operatively disposed after a fully connected layer in a spatial model. (Mathematical concept/principle: The examiner notes that this limitation, but for the recitation of generic computer modules (e.g., CDBN module, fully connected layer, and a spatial model) recited at a high level of generality, merely covers a mathematical relationship that delineates the performance of mathematical operations of batch normalization after the performance of mathematical operations by a fully connected layer.  Such a mathematical relationship has been found to be an abstract idea that fails to satisfy prong one of step 2A. See MPEP § 2106.04(a).)

Additional Elements – Step 2A Prong Two & Step 2B: 
          Claim 17 recites the additional elements of a CDBN module and a fully connected layer in a spatial model at a high level of generality, in such a way as to perform the abstract ideas (e.g., mathematical operations by a batch normalization layer and by a fully connected layer) with generic computer modules, such that these additional elements amount to no more than mere instructions to apply the judicial exceptions using generic computer modules.  The examiner notes that mere instructions to apply a judicial exception using a generic computer module recited at a high level of generality have been found to be insufficient to integrate the claimed judicial exception into a practical application to satisfy prong two of step 2A.  See e.g., MPEP § 2106.05(f). 
          Claim 17 also does not include additional elements that amount to significantly more than the claimed judicial exception. As discussed above with respect to the failure of additional elements in integrating the claimed abstract ideas into a practical application, the additional element of using generic computer modules recited at a high level of generality amounts to no more than mere instructions to implement the claimed judicial exceptions on a computer and thus fails to amount to significantly more than the claimed judicial exception to satisfy step 2B.  See MPEP § 2106.05(f).
The examiner therefore asserts that claim 17 is not patent eligible.
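For illustration only, the ordering of operations discussed for claim 17 (a fully connected layer followed by batch normalization) can be sketched as plain arithmetic. The sketch below assumes ordinary single-domain batch normalization without a learned scale or shift; it is not the claimed CDBN module, and the function names, weights, bias, and batch values are hypothetical.

```python
import math


def fully_connected(x, weights, bias):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batch_normalize(batch, eps=1e-5):
    """Normalize each feature to roughly zero mean and unit variance
    across the batch (no learned scale or shift)."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(sample[d] for sample in batch) / n for d in range(dims)]
    variances = [sum((sample[d] - means[d]) ** 2 for sample in batch) / n
                 for d in range(dims)]
    return [[(sample[d] - means[d]) / math.sqrt(variances[d] + eps)
             for d in range(dims)]
            for sample in batch]


# Hypothetical parameters and inputs, chosen only to make the arithmetic easy to follow.
WEIGHTS = [[1.0, 0.0], [0.5, 0.5]]
BIAS = [0.0, 1.0]
BATCH = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Step 1: fully connected layer applied to every sample in the batch.
hidden = [fully_connected(x, WEIGHTS, BIAS) for x in BATCH]

# Step 2: batch normalization applied to the fully connected outputs.
normalized = batch_normalize(hidden)
```

In this sketch each step reduces to sums, products, and a square root taken over the batch.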
 
With respect to claim 18, claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Judicial Exceptions: Step 2A – Prong One: 
Claim 18 recites: 
wherein the instructions are executable to execute a training operation to learn a ratio to normalize both source and target data. (Mathematical concept / principle: The examiner notes that this limitation, but for the recitation of the additional elements of executable instructions and “to normalize both source and target data”, merely covers the performance of a mathematical concept or principle of learning by, for example, comparing predicted results with expected results.  Such a mathematical concept or principle has been found to be an abstract idea that fails to satisfy prong one of step 2A. See MPEP § 2106.04(a).)
 
Additional Elements – Step 2A Prong Two & Step 2B: 
          Claim 18 recites the additional element of “instructions are executable” in such a way as to perform the abstract idea (e.g., mathematical concept or principle of comparing predicted results with expected results) with generic computer functions (e.g., “instructions” that are “executable”) recited at a high level of generality, such that this additional element amounts to no more than mere instructions to apply the judicial exceptions using generic computer functions.  The examiner notes that mere instructions to apply a judicial exception using generic computer functions recited at a high level of generality have been found to be insufficient to integrate the claimed judicial exception into a practical application to satisfy prong two of step 2A.  See e.g., MPEP § 2106.05(f). 
          Moreover, claim 18 further recites the additional element of “to normalize both source and target data”.  The examiner notes that this additional element, under its broadest reasonable interpretation, is merely directed to an intended use or a field of use limitation – the claimed “ratio” is intended to be used to normalize data or in the field of normalization, without requiring the ratio to be actually used.  It has been well settled that such an intended use or field of use limitation cannot integrate a judicial exception into a practical application.  See e.g., MPEP § 2106.04(d)(2).
          Claim 18 also does not include additional elements that amount to significantly more than the claimed judicial exception. As discussed above with respect to the failure of additional elements in integrating the claimed abstract idea into a practical application, the additional element of generic computer functions (e.g., executable instructions) amounts to no more than mere instructions to apply the judicial exceptions using generic computer components, which has been found to fail to amount to significantly more than the claimed judicial exception.  See MPEP § 2106.05(b)(III).
          Further, the additional element of an intended use or field of use limitation (e.g., to normalize source and target data) merely links the judicial exception to a particular field of use (e.g., to be used in normalization on a computer), which has been found to be insufficient to amount to an inventive concept to satisfy step 2B.  See e.g., MPEP § 2106.05(I)(A)(iv).
The examiner thus asserts that claim 18 is not patent eligible.
 
With respect to claim 19, claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions: Step 2A – Prong One: 
Claim 19 recites: 
wherein the instructions are executable to execute a test operation to use the ratio and statistics related to the target to normalize statistics for both the source and the target. (Mathematical concept / principle / algorithm: The examiner notes that this limitation, but for the recitation of the additional elements of executable instructions and “to normalize statistics for both the source and the target”, merely covers the performance of a mathematical concept or principle of testing different numeric values (e.g., the ratio and the statistics) to compare the results obtained through these different numeric values.  Such a mathematical concept or principle has been found to be an abstract idea that fails to satisfy prong one of step 2A. See MPEP § 2106.04(a).)
 
Additional Elements – Step 2A Prong Two & Step 2B: 
          Claim 19 recites the additional element of “instructions are executable” in such a way as to perform the abstract idea (e.g., mathematical concept or principle of comparing predicted results with expected results) with generic computer functions (e.g., “instructions” that are “executable”) recited at a high level of generality, such that this additional element amounts to no more than mere instructions to apply the judicial exceptions using generic computer functions.  The examiner notes that mere instructions to apply a judicial exception using generic computer functions recited at a high level of generality have been found to be insufficient to integrate the claimed judicial exception into a practical application to satisfy prong two of step 2A.  See e.g., MPEP § 2106.05(f). 
          Moreover, claim 19 further recites the additional element of “to normalize statistics for both the source and the target”.  The examiner notes that this additional element, under its broadest reasonable interpretation, is merely directed to an intended use or a field of use limitation – the claimed “ratio” and “statistics” are intended to be used to normalize statistics or in the field of normalization, without requiring the ratio to be actually used or the statistics to be normalized.  It has been well settled that such an intended use or field of use limitation cannot integrate a judicial exception into a practical application.  See e.g., MPEP § 2106.04(d)(2).
          Claim 19 also does not include additional elements that amount to significantly more than the claimed judicial exception. As discussed above with respect to the failure of additional elements in integrating the claimed abstract idea into a practical application, the additional element of generic computer functions (e.g., executable instructions) amounts to no more than mere instructions to apply the judicial exceptions using generic computer components, which has been found to fail to amount to significantly more than the claimed judicial exception.  See MPEP § 2106.05(b)(III).
          Further, the additional element of an intended use or field of use limitation (e.g., to normalize statistics for both the source and the target) merely links the judicial exception to a particular field of use (e.g., to be used in normalization on a computer), which has been found to be insufficient to amount to an inventive concept to satisfy step 2B.  See e.g., MPEP § 2106.05(I)(A)(iv).
The examiner thus asserts that claim 19 is not patent eligible.
 
With respect to claim 20, claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions: Step 2A – Prong One: 
Claim 20 recites: 
wherein the instructions are executable to use entropy loss to separate unlabeled target data. (Mathematical concept / principle: The examiner notes that this limitation, but for the recitation of the additional elements of executable instructions and “to separate unlabeled target data”, merely covers the performance of a mathematical concept or principle of executing a function (e.g., a cross entropy function).  Such a mathematical concept or principle has been found to be an abstract idea that fails to satisfy prong one of step 2A. See MPEP § 2106.04(a).)
 
Additional Elements – Step 2A Prong Two & Step 2B: 
          Claim 20 recites the additional element of “instructions are executable” in such a way as to perform the abstract idea (e.g., mathematical concept or principle of comparing predicted results with expected results) with generic computer functions (e.g., “instructions” that are “executable”) recited at a high level of generality, such that this additional element amounts to no more than mere instructions to apply the judicial exceptions using generic computer functions.  The examiner notes that mere instructions to apply a judicial exception using generic computer functions recited at a high level of generality have been found to be insufficient to integrate the claimed judicial exception into a practical application to satisfy prong two of step 2A.  See e.g., MPEP § 2106.05(f). 
          Moreover, claim 20 further recites the additional element of “to separate unlabeled target data”.  The examiner notes that this additional element, under its broadest reasonable interpretation, is merely directed to an intended use or a field of use limitation – the claimed “entropy loss” is intended to be used to separate target data, without requiring the target data to be actually separated.  It has been well settled that such an intended use or field of use limitation cannot integrate a judicial exception into a practical application.  See e.g., MPEP § 2106.04(d)(2).
          Claim 20 also does not include additional elements that amount to significantly more than the claimed judicial exception. As discussed above with respect to the failure of additional elements in integrating the claimed abstract idea into a practical application, the additional element of generic computer functions (e.g., executable instructions) amounts to no more than mere instructions to apply the judicial exceptions using generic computer components, which has been found to fail to amount to significantly more than the claimed judicial exception.  See MPEP § 2106.05(b)(III).
          Further, the additional element of an intended use or field of use limitation (e.g., to separate unlabeled target data) merely links the judicial exception to a particular field of use (e.g., to be used in separating data on a computer), which has been found to be insufficient to amount to an inventive concept to satisfy step 2B.  See e.g., MPEP § 2106.05(I)(A)(iv).
The examiner thus asserts that claim 20 is not patent eligible.
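For illustration only, one common reading of an entropy loss on unlabeled data is the mean Shannon entropy of a model's predicted class probabilities. The sketch below assumes that reading, which the claim language quoted above does not itself specify; the function names and the logit values are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_loss(batch_logits):
    """Mean Shannon entropy (in nats) of the predicted class distributions.

    No labels are used: the value depends only on how confident
    the predictions are, so it can be computed on unlabeled samples.
    """
    total = 0.0
    for logits in batch_logits:
        probs = softmax(logits)
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(batch_logits)


# Hypothetical two-class predictions for two unlabeled samples.
uncertain = [[0.0, 0.0], [0.0, 0.0]]        # 50/50 predictions
confident = [[10.0, -10.0], [-10.0, 10.0]]  # near one-hot predictions

loss_uncertain = entropy_loss(uncertain)
loss_confident = entropy_loss(confident)
```

Under this assumption, a lower value corresponds to predictions that sit farther from the decision boundary, which is the sense in which such a loss is said to separate unlabeled samples.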
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
 
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
With respect to claim 1, Tzeng teaches: 
An apparatus, comprising: access a first neural network, the first neural network being associated with a first data type; (Tzeng, FIG. 3:  


[Image: Tzeng, FIG. 3 (media_image1.png)]

FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” P. 7172, § 5, ¶ 1: “We now evaluate ADDA for unsupervised classification adaptation across four different domain shifts. We explore three digits datasets of varying difficulty: MNIST [18], USPS, and SVHN [19]. We additionally evaluate on the NYUD [20] dataset to study adaptation across modalities.” 
            The examiner first notes that Tzeng’s source encoder CNN teaches a first neural network, and that Tzeng’s pre-training its source encoder CNN with source images and labels or sending source images to its CNN during adversarial adaptation teaches accessing a first neural network.  Moreover, the examiner further notes that Tzeng’s evaluating any one of the four different types of domain shifts to learn the source mapping (e.g., Ms in § 3) for its source encoder CNN to a different type of the four types of domain shifts for adaptation across modalities to a target domain shift (e.g., any one of the remaining three different types of domain shifts) teaches that the source encoder CNN is associated with a first data type.
Further, the examiner notes that Tzeng interchangeably uses the terms “classifier” (e.g., “source classifier” and “target classifier” in § 3, “task-specific classifier” in § 3.1, “classifier” in FIG. 3 and its caption), “encoder” (e.g., “target encoder” in FIGS. 1 and 3 as well as their respective captions), and “encoder CNN” (e.g., “source encoder CNN” and “target encoder CNN” in the caption of FIG. 3), and that “classifier,” “encoder,” and “encoder CNN” are thus interpreted as functional and/or structural equivalents of each other.)

access a second neural network, the second neural network being associated with a second data type different from the first data type; (Tzeng, FIG. 3 and p. 7172, § 5, ¶ 1, supra. FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.”
The examiner notes that Tzeng’s target encoder CNN teaches a second neural network, and that Tzeng’s target encoder CNN’s receiving target images during adversarial adaptation illustrated in FIG. 3 teaches accessing a second neural network.  The examiner further notes that Tzeng’s evaluating any one of the remaining three different types of domain shifts (e.g., three different shifts other than the aforementioned domain shift evaluated by the first neural network) including adaptation across modalities teaches that the target encoder CNN is associated with a different, second data type while the source encoder CNN above is associated with a first data type (see rationale for the limitation immediately above).)

provide, as input, first training data to the first neural network; (Tzeng, FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples.”
The examiner notes that Tzeng’s source images provided to its source CNN illustrated in FIG. 3 or the labeled source image examples described in FIG. 3’s caption teaches first training data.  The examiner further notes that Tzeng’s providing the source images to its source CNN during adversarial adaptation or providing the source images plus labels to the source CNN during pre-training teaches provide, as input, first training data to the first neural network.) 


provide, as input, second training data to the second neural network, the first training data being different from the second training data; (Tzeng, FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” 
The examiner notes that Tzeng’s learning a target encoder CNN (also referred to as “target encoder” above) teaches training the second neural network, and that Tzeng’s providing the target examples described in FIG. 3’s caption or the target images illustrated in FIG. 3 to the target encoder CNN (the claimed second neural network) for learning the target encoder CNN teaches provide, as input, second training data to the second neural network (e.g., Tzeng’s target encoder CNN/target CNN cited above). Therefore, the examiner asserts that at least the aforementioned passages and figure teach the above limitation.)

identify a first output from an intermediate layer of the first neural network, the first output being based on the first training data; (Tzeng, § 3.1, ¶ 3: “Once the mapping parameterization is determined for the source, we must decide how to parametrize the target mapping Mt. In general, the target mapping almost always matches the source in terms of the specific functional layer (architecture), but different methods have proposed various regularization techniques. All methods initialize the target mapping parameters with the source, but different methods choose different constraints between the source and target mappings, ψ(Ms,Mt).” § 3.1, ¶ 4: “Consider a layered representations [sic] where each layer parameters are denoted as, $M_s^{\ell}$ or $M_t^{\ell}$, for a given set of equivalent layers, {ℓ1, …, ℓn}. Then the space of constraints explored in the literature can be described through layerwise equality constraints as follows: $\psi(M_s, M_t) \triangleq \{\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i})\}_{i \in \{1 \ldots n\}}$ (4) where each individual layer can be constrained independently. A very common form of constraint is source and target layerwise equality:
                
                    
                        ψ
                    
                    
                        
                            
                                l
                            
                            
                                i
                            
                        
                    
                
                
                    
                        
                            
                                M
                            
                            
                                s
                            
                            
                                
                                    
                                        l
                                    
                                    
                                        i
                                    
                                
                            
                        
                        ,
                         
                        
                            
                                M
                            
                            
                                t
                            
                            
                                
                                    
                                        l
                                    
                                    
                                        i
                                    
                                
                            
                        
                    
                
                =
                 
                
                    
                        
                            
                                M
                            
                            
                                s
                            
                            
                                
                                    
                                        l
                                    
                                    
                                        i
                                    
                                
                            
                        
                        =
                        
                            
                                M
                            
                            
                                t
                            
                            
                                
                                    
                                        l
                                    
                                    
                                        i
                                    
                                
                            
                        
                    
                
                 
                 
                 
                (
                5
                )
            
        .”  p. 7169, § 3, ¶ 3: “In adversarial adaptive methods, the main goal is to regularize the learning of the source and target mappings, Ms and Mt, so as to minimize the distance between the empirical source and target mapping distributions: Ms(Xs) and Mt(Xt).” p. 7169, § 3, ¶ 4: “First a domain discriminator, D, which classiﬁes whether a data point is drawn from the source or the target domain. Thus, we can derive a generic formulation for domain adversarial techniques below: 

[Image: media_image2.png, greyscale]
”
The examiner notes that any i-th layer of Tzeng’s first neural network (e.g., Tzeng’s source encoder CNN), where i ∈ {2, 3, …, n-1} (i.e., ℓi ∈ {ℓ2, …, ℓn-1}), teaches an intermediate layer of the first neural network, because the first layer, ℓ1, may constitute an input layer, the last layer, ℓn, may constitute an output layer of the first neural network, and any layers in between are generally accepted as intermediate or hidden layers. The examiner also notes that the source mapping distribution for the i-th layer, $M_s^{\ell_i}(X_s)$, generated by the i-th layer ($\ell_i$) of the first neural network for the source training input images (Xs), teaches a first output of the intermediate layer of the first neural network, and that Tzeng thus teaches the above limitation.)

identify a second output from an intermediate layer of the second neural network, the second output being based on the second training data, the respective intermediate layers of the first and second neural networks being parallel layers; (Tzeng, § 3.1, ¶ 3: “Once the mapping parameterization is determined for the source, we must decide how to parametrize the target mapping Mt. In general, the target mapping almost always matches the source in terms of the specific functional layer (architecture), but different methods have proposed various regularization techniques. All methods initialize the target mapping parameters with the source, but different methods choose different constraints between the source and target mappings, ψ(Ms,Mt).” § 3.1, ¶ 4: “Consider a layered representations [sic] where each layer parameters are denoted as, $M_s^{\ell}$ or $M_t^{\ell}$, for a given set of equivalent layers, {ℓ1, …, ℓn}. Then the space of constraints explored in the literature can be described through layerwise equality constraints as follows: $\psi(M_s, M_t) \triangleq \{\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i})\}_{i \in \{1 \ldots n\}}$ (4) where each individual layer can be constrained independently. A very common form of constraint is source and target layerwise equality: $\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i}) = (M_s^{\ell_i} = M_t^{\ell_i})$ (5).” p. 7169, § 3, ¶ 3: “In adversarial adaptive methods, the main goal is to regularize the learning of the source and target mappings, Ms and Mt, so as to minimize the distance between the empirical source and target mapping distributions: Ms(Xs) and Mt(Xt).” p. 7169, § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, we can derive a generic formulation for domain adversarial techniques below: 

[Image: media_image2.png, greyscale]
”
The examiner notes that any i-th layer of Tzeng’s second neural network (e.g., Tzeng’s target encoder CNN), where i ∈ {2, 3, …, n-1} (i.e., ℓi ∈ {ℓ2, …, ℓn-1}), teaches an intermediate layer of the second neural network, because the first layer, ℓ1, may constitute an input layer, the last layer, ℓn, may constitute an output layer of the second neural network, and any layers in between are generally accepted as intermediate or hidden layers. The examiner also notes that the target mapping distribution for the i-th layer, $M_t^{\ell_i}(X_t)$, generated by the second neural network (e.g., Tzeng’s target encoder neural network) via learning the target representation mapping, $M_t^{\ell_i}$, of the i-th layer ($\ell_i$) for the target training input images (Xt), teaches a second output (e.g., the aforementioned target mapping distribution, $M_t^{\ell_i}(X_t)$, for the target input images Xt), and that Tzeng thus teaches the above limitation. Further, the examiner notes that for the same index “i”, the source layer, $M_s^{\ell_i}$, and the target layer, $M_t^{\ell_i}$, are parallel layers for Tzeng’s layerwise computation (e.g., Eqns. (3)-(5), supra).)
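
For illustration only, the layerwise notation quoted from Tzeng can be sketched in Python. The sketch is not taken from Tzeng: the function names `intermediate_layer_indices` and `tied_layers` and the toy parameter values are assumptions introduced here. It assumes each mapping is represented as a list of per-layer parameters, selects the indices i ∈ {2, …, n-1} treated above as intermediate layers, and evaluates the layerwise equality of Eq. (5) for each pair of parallel layers.

```python
def intermediate_layer_indices(n):
    """Return the 1-based indices i in {2, ..., n-1}, i.e. the layers between l_1 and l_n."""
    return list(range(2, n))


def tied_layers(source_params, target_params):
    """Evaluate the layerwise equality of Eq. (5) for each pair of parallel layers.

    source_params[i-1] and target_params[i-1] stand in for M_s^{l_i} and M_t^{l_i}.
    Returns a dict mapping the 1-based layer index i to True when the two are equal.
    """
    if len(source_params) != len(target_params):
        raise ValueError("source and target must have the same number of layers")
    pairs = zip(source_params, target_params)
    return {i: s == t for i, (s, t) in enumerate(pairs, start=1)}


# Hypothetical per-layer parameters for a 4-layer source and target mapping.
M_s = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
M_t = [[0.1, 0.2], [0.3, 0.4], [0.9, 0.6], [0.7, 0.1]]

hidden = intermediate_layer_indices(len(M_s))  # layers 2 and 3
constraint = tied_layers(M_s, M_t)             # layers 1 and 2 tied, 3 and 4 untied
```

Under these assumed values, layers sharing the same index i in `M_s` and `M_t` are the parallel layers, and only the first two pairs satisfy the equality constraint.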

Tzeng does not appear to explicitly teach: 
identify a ratio to normalize the first output and the second output; 
apply an equation that accounts for the ratio to change one or more weights of the intermediate layer of the second neural network.

Tsai does, however, teach: 
identify a ratio to normalize the first output and the second output; (Tsai, p. 5084, left-hand column, ¶ 2: “With the above goal, the objective function of our CDLS can be formulated as follows:

[Image: media_image3.png, greyscale]
”
p. 5084, left-hand column, ¶ 3: “In (4), δ ∈ [0, 1] controls the portion of cross-domain data in each class to be utilized for adaptation. If δ = 0, our CDLS would turn into its supervised version as described in Section 3.2.1. While we fix δ = 0.5 in our work, additional analysis on the selection and effect of δ will be provided in our experiments.” p. 5084, left-hand column, last paragraph – right-hand column, first paragraph: “To match cross-domain conditional data distributions via EC, we apply SVM trained from labeled cross-domain data to predict the pseudo-labels $\tilde{y}_u^i$ for $\hat{x}_u^i$ (as described later in Section 3.3.3). With $\{\tilde{y}_u^i\}_{i=1}^{n_U}$ assigned for XU, the EC term in (4) can be expressed as:
$E_C(A, D_S, D_L, X_U, \alpha, \beta) = \sum_{c=1}^{C} E_{cond}^{c} + \frac{1}{e_c} E_{embed}^{c}$,		(6)
where

[Image: media_image4.png, greyscale]
”
p. 5084, right-hand column, ¶ 2: “It can be seen that, EC term in (6) is extended from (3) by utilizing unlabeled target-domain data with pseudo-labels. Similar to (3), we match cross-domain class-conditional distributions and preserve local embedding of transformed data of each class using via $E_{cond}^{c}$ and $E_{embed}^{c}$, respectively. The normalization term in (6) is calculated as $e_c = \delta n_S^c n_L^c + \delta n_L^c n_U^c + \delta^2 n_U^c n_S^c$.” p. 5085, left-hand column, Algorithm 1: “Input: Labeled source and target-domain data $D_S = \{X_S^i, Y_S^i\}_{i=1}^{n_S}$, $D_L = \{X_l^i, Y_l^i\}_{i=1}^{n_L}$; unlabeled target domain data $\{X_U^i\}_{i=1}^{n_U}$; feature dimension m; ratio δ, parameter λ.” 
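
For illustration only, the normalization term quoted from Tsai can be evaluated with a short Python sketch. The function name `normalization_term`, its argument names, and the per-class sample counts used in the example are assumptions introduced here and do not appear in Tsai. Only the formula $e_c = \delta n_S^c n_L^c + \delta n_L^c n_U^c + \delta^2 n_U^c n_S^c$ and the stated range δ ∈ [0, 1] come from the quoted passage.

```python
def normalization_term(delta, n_s, n_l, n_u):
    """Compute e_c = delta*n_S^c*n_L^c + delta*n_L^c*n_U^c + delta^2*n_U^c*n_S^c.

    delta is the ratio in [0, 1]; n_s, n_l and n_u are the class-c counts of
    source, labeled target and unlabeled target samples, respectively.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * n_s * n_l + delta * n_l * n_u + delta ** 2 * n_u * n_s


# Hypothetical class-c counts with the value delta = 0.5 fixed in the quoted passage.
e_c = normalization_term(0.5, n_s=4, n_l=2, n_u=6)  # 4.0 + 6.0 + 6.0 = 16.0
```

With these assumed counts, the second term of Eq. (6) would be weighted by 1/e_c = 1/16.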
The examiner notes that Tsai’s ratio, δ, for computing the normalization term $e_c = \delta n_S^c n_L^c + \delta n_L^c n_U^c + \delta^2 n_U^c n_S^c$ in Eq. (6) teaches the claimed ratio, and that Tsai’s source domain data (e.g., 
    PNG
    media_image5.png
    69
    162
    media_image5.png
    Greyscale
in Eq. (6) where [Symbol font/0x61]i and             
                
                    
                        
                            
                                α
                            
                            
                                i
                            
                        
                         
                        a
                        n
                        d
                         
                        x
                    
                    
                        S
                    
                    
                        i
                        ,
                         
                         
                        c
                    
                
            
         respectively denote weights and source domain input) as well as target domain data (e.g., 
    PNG
    media_image6.png
    55
    106
    media_image6.png
    Greyscale
 that denotes the product of the weight [Symbol font/0x62]i and unlabeled target domain data             
                
                    
                        
                            
                                x
                            
                            ^
                        
                    
                    
                        u
                    
                    
                        i
                        ,
                        c
                    
                
            
        ) respectively teach outputs of the first and the second intermedaite layers. The examiner thus asserts that Tsai’s determining the ratio for normalization teaches the above limitation.) 
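
For illustration only, the normalization term quoted from Tsai above can be evaluated numerically for a single class c. The following is a minimal sketch under stated assumptions and is not taken from Tsai: the function name `normalization_term` and the per-class sample counts are hypothetical, and only the formula itself comes from the quoted passage.

```python
def normalization_term(n_s: int, n_l: int, n_u: int, delta: float) -> float:
    """Evaluate e^c = delta*nS*nL + delta*nL*nU + delta^2*nU*nS for one class c.

    n_s, n_l, n_u are the per-class counts of source, labeled target, and
    unlabeled target instances; delta is the ratio from the quoted passage.
    """
    return delta * n_s * n_l + delta * n_l * n_u + delta ** 2 * n_u * n_s


# Hypothetical counts (not from Tsai): 20 source, 3 labeled target,
# and 10 unlabeled target instances in class c, with delta = 0.5.
e_c = normalization_term(n_s=20, n_l=3, n_u=10, delta=0.5)
print(e_c)
```

The sketch shows only that the ratio δ scales every product of per-class counts in the term; it does not reproduce the remainder of Eq. (6).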

apply an equation that accounts for the ratio to change one or more weights of the intermediate layer of the second neural network. (Tsai, p. 5084, left-hand column, ¶ 2: “With the above goal, the objective function of our CDLS can be formulated as follows:

[image: media_image3.png]
”
p. 5084, right-hand column, ¶ 2: “It can be seen that, EC term in (6) is extended from (3) by utilizing unlabeled target-domain data with pseudo-labels. Similar to (3), we match cross-domain class-conditional distributions and preserve local embedding of transformed data of each class using via $E_{cond}^c$ and $E_{embed}^c$, respectively. The normalization term in (6) is calculated as $e^c = \delta\, n_S^c n_L^c + \delta\, n_L^c n_U^c + \delta^2 n_U^c n_S^c$.” P. 5085, left-hand column, Algorithm 1: “Input: Labeled source and target-domain data $D_S = \{X_S^i, Y_S^i\}_{i=1}^{n_S}$, $D_L = \{X_l^i, Y_l^i\}_{i=1}^{n_L}$; unlabeled target domain data $\{X_U^i\}_{i=1}^{n_U}$; feature dimension m; ratio δ, parameter λ.” “5: Update landmark weights {α, β} by (9)” p. 5084, right-hand column, ¶ 3: “With both EM and EC defined, we address semi-supervised HDA by solving (4). This allows us to learn the proper weights α and β for the representative instances from both domains (i.e., cross-domain landmarks).” P. 5085, § 3.3.2, Eqns. (8)-(9).
The examiner notes that Tsai’s ratio, δ, for computing the normalization term $e^c = \delta\, n_S^c n_L^c + \delta\, n_L^c n_U^c + \delta^2 n_U^c n_S^c$ in Eq. (6) teaches the claimed ratio, and that Tsai’s solving Eq. (4) with EM and EC to learn weights from the target domain teaches changing one or more weights of the intermediate layer of the second neural network. The examiner further notes that Eq. (9), which is based on Eq. (8) that minimizes the objective function based on the aforementioned ratio for normalization, teaches an equation that accounts for the ratio and is applied to change and optimize the weights {α, β}.)

Tzeng and Tsai are analogous art because both pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng with Tsai’s applying an equation accounting for a ratio for normalizing first and second outputs respectively from a first and a second neural network (Tsai, supra).  The modification exploits heterogeneous data across domains with the ability to identify the adaptation ability of each instance with a properly assigned weight by determining a ratio that controls the portion of cross-domain landmarks to be exploited and normalized for adaptation (Tsai, p. 5084, left-hand column, ¶ 1: “Extended from (1), our proposed algorithm cross-domain landmark selection (CDLS) exploits heterogeneous data across domains, with the ability to identify the adaptation ability of each instance with a properly assigned weight. Instances in either domain with a nonzero weight will be considered as a landmark.” P. 5088, left-hand column, ¶ 1: “Recall that the ratio in (4) controls the portion of cross-domain landmarks to be exploited for adaptation. Figure 6(c) presents the performance of CDLS with different values. It is worth repeating that, CDLS would be simplified as the supervised version CDLS_sup if δ = 0, as illustrated by the flat dotted line in Figure 6(c). As expected, using all (δ = 1) or none (δ = 0) of the cross-domain data as landmarks would not be able to achieve satisfactory HDA performance. Therefore, the choice of δ = 0.5 would be reasonable in our experiments.”)

Tzeng and Tsai do not appear to explicitly teach: 
at least one processor, and 
at least one computer storage that is not a transitory signal and that comprises instructions executable by the at least one processor to: 

Csurka does, however, teach: 
at least one processor, and (Csurka, ¶ [0042]: “The digital processor 30 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like. The exemplary digital processor 30, in addition to controlling the operation of the computer system 10, executes the instructions 28 stored in memory 26 for performing the method outlined in FIG. 3.” ¶ [0044]: “The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth.”)

at least one computer storage that is not a transitory signal and that comprises instructions executable by the at least one processor to: (Csurka, supra.)

Tzeng, Tsai, and Csurka are analogous art because all three references pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai to incorporate Csurka’s processor and computer storage (Csurka, supra).  The modification provides the capability of storing instructions for software in a storage medium and configuring a computer or a digital system with the instructions to perform desired or required tasks (Csurka, ¶ [0044]: “The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth.”)
 
With respect to claim 2, Tzeng modified by Csurka and Tsai teaches the apparatus of claim 1, and Tsai further teaches: 
wherein the ratio pertains to a mean value. (Tsai, p. 5083, right-hand column, ¶ 1: “In order to match cross-domain data distributions for such standard supervised HDA settings, we formulate the problem formulation as follows: 

[image: media_image7.png]”

p. 5083, right-hand column, ¶ 2: “To solve (1), we resort to statistics to measure the probability density for describing data distributions. As suggested in [26, 24], we adopt empirical Maximum Mean Discrepancy (MMD) [16] to measure the difference between the above cross-domain data distributions.” The examiner notes that Tsai’s formulating the domain adaptation problem with the data distributions (EM and EC) measuring the differences between cross-domain marginal and conditional data distributions and using maximum mean discrepancy to measure the differences between the aforementioned cross-domain data distributions teaches that EM and EC pertain to a respective mean value. The examiner further notes that Tsai’s re-formulating Eq. (1) into Eq. (4), which minimizes the objective function in terms of EM and EC above based on the ratio, δ, teaches that the ratio pertains to the aforementioned maximum mean discrepancy and hence teaches the above limitation.)
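
As a hedged illustration of the mean-based character of the maximum mean discrepancy relied upon above: with a linear kernel, the squared empirical MMD between two sample sets reduces to the squared Euclidean distance between their sample means. The sketch below is an assumption-laden simplification and is not Tsai’s formulation, which additionally involves a learned feature transformation and landmark weights; the function names and sample values are hypothetical.

```python
from typing import List, Sequence


def column_means(rows: Sequence[Sequence[float]]) -> List[float]:
    """Return the per-feature mean of a set of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd_squared(source: Sequence[Sequence[float]],
                       target: Sequence[Sequence[float]]) -> float:
    """Squared empirical MMD under a linear kernel.

    With k(x, y) = <x, y>, the (biased) estimate equals the squared
    Euclidean distance between the two sample means.
    """
    gap = [s - t for s, t in zip(column_means(source), column_means(target))]
    return sum(g * g for g in gap)


# Hypothetical two-dimensional samples standing in for two domains.
source_samples = [[0.0, 0.0], [2.0, 2.0]]
target_samples = [[1.0, 1.0], [3.0, 3.0]]
print(linear_mmd_squared(source_samples, target_samples))
```

The linear kernel is chosen here only because it makes the dependence on the sample means explicit; other kernels compare the distributions through richer statistics.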

Tzeng, Tsai, and Csurka are analogous art because all three references pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai to incorporate Tsai’s ratio pertaining to a mean value (Tsai, supra).  The modification adopts a mean value (e.g., the aforementioned mean discrepancy in measuring the differences between cross-domain data distributions) that allows rewriting a symbolic minimization of an objective function for domain adaptation into computable relations for cross-domain data distributions (EM) and cross-domain conditional data distributions (EC). More importantly, these computable relations obtained through the aforementioned mean in turn allow for solving the optimization problem and learning the proper weights to provide improved adaptation capability across multiple domains (Tsai, p. 5083, right-hand column, Eq. (2) for cross-domain data distributions, EM, and Eq. (3) for cross-domain conditional data distributions, EC. P. 5084, right-hand column, ¶ 3: “With both EM and EC defined, we address semi-supervised HDA by solving (4). This allows us to learn the proper weights α and β for the representative instances from both domains (i.e., cross-domain landmarks). As a result, our derived feature transformation A will result in a feature subspace with improved adaptation capability.”)
 
With respect to claim 3, Tzeng modified by Csurka and Tsai teaches the apparatus of claim 1, and Tsai further teaches: 
wherein mean and variance between the first output and the second output are both analyzed to apply the equation. (Tsai, p. 5083, right-hand column, ¶ 1: “In order to match cross-domain data distributions for such standard supervised HDA settings, we formulate the problem formulation as follows: 

[image: media_image7.png]”

p. 5083, right-hand column, ¶ 2: “To solve (1), we resort to statistics to measure the probability density for describing data distributions. As suggested in [26, 24], we adopt empirical Maximum Mean Discrepancy (MMD) [16] to measure the difference between the above cross-domain data distributions.”
The examiner notes that it is well known in the art that maximum mean discrepancy includes the determination of both the mean and variance as evidenced by, for example, reference number [16] cited in Tsai that establishes the level of knowledge of one of ordinary skill in the art, and that Tsai’s optimizing the objective function (e.g., Eq. (1) or (4) cited for claim 1, supra) by adopting the maximum mean discrepancy (MMD) to learn the weights for domain adaptation teaches the above limitation.)

Tzeng, Tsai, and Csurka are analogous art because all three references pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai and Csurka to further incorporate Csurka’s analyzing both the mean and variance between two sets of outputs (Csurka, supra).  The modification analyzes the class means of a plurality of classes and assigns an example to the class with the nearest mean having the smallest distance, and further analyzes the variance to construct an invertible covariance matrix so that each model for each class of a plurality of classes is simplified into an equal weighted Gaussian mixture distribution with the shared inverted covariance matrix ($W^\top W$). As a result, a classifier can assign a class to a sample using arithmetic operations according to a simple formula for the posterior of a mixture distribution (Csurka, ¶ [0046]: “The Nearest Class Mean (NCM) classifier assigns xi to the class c*∈ Yc = {1, …, C} whose mean μc is the closest: Eq. (1) (reproduction omitted)”.  ¶ [0053]: “The model for each class becomes an equal weighted Gaussian mixture distribution with mjc as means and ($W^\top W$) being the shared inverse covariance matrix.”)
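
For illustration only, the nearest-class-mean assignment described in the quoted passage of Csurka can be sketched as follows. This is a minimal sketch under stated assumptions and is not Csurka’s implementation: it uses a plain squared Euclidean distance (i.e., it assumes the learned metric is the identity), and the function name `nearest_class_mean`, the class labels, and the class means are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def nearest_class_mean(x: Sequence[float],
                       class_means: Dict[Hashable, Sequence[float]]) -> Hashable:
    """Return the label of the class whose mean is closest to sample x.

    Distance is plain squared Euclidean distance; a learned metric is
    deliberately omitted in this sketch (assumed to be the identity).
    """
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(class_means, key=lambda label: squared_distance(x, class_means[label]))


# Hypothetical class means for two classes in a two-dimensional feature space.
means = {1: [0.0, 0.0], 2: [5.0, 5.0]}
print(nearest_class_mean([1.0, 0.5], means))
print(nearest_class_mean([4.0, 4.0], means))
```

The sketch covers only the assignment step; it does not model the Gaussian mixture posterior or the shared inverse covariance matrix referenced in ¶ [0053].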

With respect to claim 4, Tzeng modified by Csurka and Tsai teaches the apparatus of claim 1, and Tsai further teaches: 
wherein the ratio is identified and the equation is applied using cross-domain batch normalization (CDBN). (Tsai, p. 5084, left-hand column, last paragraph – right-hand column, first paragraph: “To match cross-domain conditional data distributions via EC, we apply SVM trained from labeled cross-domain data to predict the pseudo-labels $\tilde{y}_u^i$ for $\hat{x}_u^i$ (as described later in Section 3.3.3). With $\{\tilde{y}_u^i\}_{i=1}^{n_U}$ assigned for XU, the EC term in (4) can be expressed as: Eq. (6) (reproduction omitted).”
p. 5084, right-hand column, ¶ 2: “It can be seen that, EC term in (6) is extended from (3) by utilizing unlabeled target-domain data with pseudo-labels. Similar to (3), we match cross-domain class-conditional distributions and preserve local embedding of transformed data of each class using via $E_{cond}^c$ and $E_{embed}^c$, respectively. The normalization term in (6) is calculated as $e^c = \delta\, n_S^c n_L^c + \delta\, n_L^c n_U^c + \delta^2 n_U^c n_S^c$.” P. 5085, left-hand column, Algorithm 1: “Input: Labeled source and target-domain data $D_S = \{X_S^i, Y_S^i\}_{i=1}^{n_S}$, $D_L = \{X_l^i, Y_l^i\}_{i=1}^{n_L}$
        ; unlabeled target domain data             
                
                    
                        
                            
                                
                                    
                                        X
                                    
                                    
                                        U
                                    
                                    
                                        i
                                    
                                
                            
                        
                    
                    
                        i
                        =
                        1
                    
                    
                        
                            
                                n
                            
                            
                                U
                            
                        
                    
                
            
        ; feature dimension m; ratio  [Symbol font/0x64] , parameter [Symbol font/0x6C].”
The examiner notes that Tsai’s source and target domain examples teach transmitting the data in at least two separate batches – one to the first neural network, and the other to the second neural network. The examiner further notes that Tsai’s determining the normalization term in Eq. (6) from the ratio of source and target domain examples, the source domain examples ($n_S^c$) as well as the labeled target domain examples ($n_L^c$) and the unlabeled target domain examples ($n_U^c$) teaches that the equation (e.g., Tsai’s solving Eq. (4) cited for claim 1, supra) is applied using cross-domain batch normalization.)
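For illustration only, and not as part of Tsai or of the record, the normalization term quoted above can be evaluated as in the following minimal sketch. The function name `normalization_term` and the per-class example counts are hypothetical and were chosen solely to show how the ratio δ weights the three count products.

```python
def normalization_term(delta, n_s, n_l, n_u):
    """Evaluate e^c = d*n_S^c*n_L^c + d*n_L^c*n_U^c + d^2*n_U^c*n_S^c.

    delta: ratio in [0, 1], as in the quoted passage.
    n_s, n_l, n_u: per-class counts of source, labeled-target and
    unlabeled-target examples (hypothetical values in this sketch).
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * n_s * n_l + delta * n_l * n_u + delta ** 2 * n_u * n_s


# Hypothetical counts for one class c, with delta = 0.5 as fixed in the quote.
e_c = normalization_term(delta=0.5, n_s=20, n_l=3, n_u=10)
print(e_c)  # 30 + 15 + 50 = 95.0
```

The sketch only restates the quoted arithmetic; it does not reproduce how Tsai uses the term inside Eq. (6).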
Tzeng, Tsai, and Csurka are analogous art because all three references pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai and Csurka to further incorporate Tsai’s cross-domain batch normalization (Tsai, supra).  The modification allows for rewriting a symbolic minimization of an objective function for domain adaptation into computable relations for cross-domain data distributions (EM) and cross-domain conditional data distributions (EC) which in turn allow for solving the optimization problem and learning of the proper weights to achieve improved adaptation capability (Tsai, p. 5083, right-hand column, Eq. (2) for cross-domain data distributions, EM, and Eq. (3) for cross-domain data conditional distributions, EC. P. 5084, right-hand column, ¶ 3: “With both EM and EC defined, we address semisupervised HDA by solving (4). This allows us to learn the proper weights and for the representative instances from both domains (i.e., cross-domain landmarks). As a result, our derived feature transformation A will result in a feature subspace with improved adaptation capability.”)

 
With respect to claim 5, Tzeng modified by Csurka and Tsai teaches the apparatus of claim 1, and Tzeng further teaches: 
wherein the second neural network is established by a copy of the first neural network prior to the second training data being provided to the second neural network. (Tzeng, § 4, ¶ 3: “However, note that the target domain has no label access, and thus without weight sharing target model may quickly learn a degenerate solution if we do not take care with proper initialization and training procedures. Therefore, we use the pre-trained source model as an intitialization [sic] for the target representation space and fix the source model during adversarial training.”  The examiner notes that Tzeng’s using the same, pre-trained source model as an initialization for the target domain teaches a copy of a first neural network, and that Tzeng’s initializing the second neural network by using a copy of a pre-trained first neural network teaches establishing the second neural network prior to the second training data being provided to the second neural network.)
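For illustration only, the initialization described in the quoted passage can be sketched as follows. The dictionaries of weights, the layer names, and the single update step are hypothetical stand-ins and are not taken from Tzeng; the sketch merely shows a second model being established as a copy of the first before any target data is processed, with the first model left fixed afterwards.

```python
import copy

# Hypothetical pre-trained source encoder weights, stored as plain lists.
source_encoder = {"conv1": [0.12, -0.40], "fc": [0.75, 0.05, -0.31]}

# Establish the target encoder as a copy of the source encoder
# before any target-domain data is provided to it.
target_encoder = copy.deepcopy(source_encoder)
assert target_encoder == source_encoder

# Stand-in for one adaptation update: only the target copy changes,
# while the source encoder stays fixed.
target_encoder["fc"][0] -= 0.1
print(source_encoder["fc"][0], target_encoder["fc"][0])
```

A deep copy is used so that later updates to the target weights cannot alter the source weights.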
 
With respect to claim 6, Tzeng modified by Csurka and Tsai teaches the apparatus of claim 1, and Tzeng further teaches: 
wherein the intermediate layers of the first and second neural networks are layers other than output layers. (Tzeng, § 3.1, ¶ 4: “Consider a layered representations [sic] where each layer parameters are denoted as, $M_s^{\ell}$ or $M_t^{\ell}$, for a given set of equivalent layers, {ℓ1, …, ℓn}. Then the space of constraints explored in the literature can be described through layerwise equality constraints as follows: Eq. (4) (reproduction omitted).”
The examiner notes that any of Tzeng’s i-th layers in the first neural network (e.g., Tzeng’s source encoder CNN) where i ∈ {2, 3, …, n-1} (e.g., ℓi ∈ {ℓ2, …, ℓn-1}) teaches an intermediate layer of the first neural network because the first layer, ℓ1, may constitute an input layer, and the last layer, ℓn, may constitute an output layer of the first neural network, and any layers in between are generally accepted as intermediate or hidden layers. Therefore, the i-th layer where i ∈ {2, 3, …, n-1} teaches that the intermediate layers of the first and the second neural networks are layers other than output layers as claimed.)
 
With respect to claim 8, Tzeng modified by Csurka and Tsai teaches the apparatus of claim 1, and Tzeng further teaches: 
wherein the first training data is related to the second training data, (Tzeng, Figure 3 Caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” The examiner notes that Tzeng’s source images plus labels for pre-training the source CNN and/or the source images during adversarial adaptation (the claimed first training data) and the target images (the claimed second training data) are provided to Tzeng’s ADDA for learning the target encoder CNN and are thus related to each other.)

wherein the first and second neural networks pertain to object recognition, and (Tzeng, p. 8, Figure 5 description and Caption: “We observe that our unsupervised adaptation algorithm results in a space more conducive to recognition of the most prevalent class of chair.” P. 8, right-hand column, ¶ 2: “In contrast, the classifier trained using ADDA predicts a much wider variety of classes. This leads to decreased accuracy for the pillow class, but significantly higher accuracies for many of the other classes.”
The examiner notes that Tzeng’s source encoder CNN and target encoder CNN in FIG. 3 respectively teach the first and the second neural networks (see citations for claim 1, supra).  The examiner further notes that Tzeng’s recognition of the chair class of the input image using its adversarial discriminative domain adaptation (ADDA) architecture in the description and caption of Figure 5 teaches that the first and the second neural networks pertain to object recognition.)

wherein the first training data is related to the second training data in that the first and second training data both pertain to a same object. (Tzeng, p. 3, left-hand column, ¶ 2: “In adversarial adaptive methods, the main goal is to regularize the learning of the source and target mappings, Ms and Mt, so as to minimize the distance between the empirical source and target mapping distributions: Ms(Xs) and Mt(Xt). If this is the case then the source classification model, Cs, can be directly applied to the target representations, elimating [sic] the need to learn a separate target classifier and instead setting, C = Cs = Ct.” p. 8, Figure 5 description and Caption: “We observe that our unsupervised adaptation algorithm results in a space more conducive to recognition of the most prevalent class of chair.”
The examiner notes that Tzeng’s minimizing the distance between the source mapping distribution (Ms(Xs)) from the first neural network (e.g., Tzeng’s source encoder CNN) and target mapping distribution (Mt(Xt)) from the second neural network (e.g., Tzeng’s target encoder CNN) teaches that the source image and the target image are classified as the same object (e.g., “the most prevalent class of chair” in Figure 5 caption).  Therefore, the examiner asserts that the first and the second training data pertain to the same object as claimed.)

With respect to claim 12, Tzeng teaches:
An apparatus, comprising: access a first domain of training data, the first domain being associated with a first domain genre; (Tzeng, FIG. 3:  


[Image media_image1.png: greyscale reproduction of Tzeng, FIG. 3]

FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” P. 7172, § 5, ¶ 1: “We now evaluate ADDA for unsupervised classification adaptation across four different domain shifts. We explore three digits datasets of varying difficulty: MNIST [18], USPS, and SVHN [19]. We additionally evaluate on the NYUD [20] dataset to study adaptation across modalities.” 
            The examiner first notes that Tzeng’s source domain teaches a first domain.  The examiner further notes that Tzeng’s source images used in pre-training the source CNN and/or source images used in adversarial adaptation for the source CNN teaches first domain training data. The examiner further notes that Tzeng’s pre-training its source encoder CNN with source images and labels or sending source images to its CNN during adversarial adaptation teaches accessing a first domain of training data.  Moreover, the examiner also notes that Tzeng’s evaluating any one of the four different types of domain shifts to learn the source mapping (e.g., Ms in § 3) for its source encoder CNN to a different type of the four types of domain shifts for adaptation across modalities to a target domain shift (e.g., any one of the remaining three different types of domain shifts) teaches that the first domain (e.g., Tzeng’s source domain) is associated with a first domain genre.
Further, the examiner notes that Tzeng interchangeably uses the terms “classifier” (e.g., “source classifier” and “target classifier” in § 3, “task-specific classifier” in § 3.1, “classifier” in FIG. 3 and its caption), “encoder” (e.g., “target encoder” in FIGS. 1 and 3 as well as their respective captions), and “encoder CNN” (e.g., “source encoder CNN” and “target encoder CNN” in the caption of FIG. 3), and that “classifier,” “encoder,” and “encoder CNN” are thus interpreted as functional and/or structural equivalents of each other.)

access a second domain of training data, the second domain being associated with a second domain genre different from the first domain genre; (Tzeng, FIG. 3 and p. 7172, § 5, ¶ 1, supra. FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.”
The examiner notes that Tzeng’s target domain teaches a second domain, and that Tzeng’s target images used for the target CNN during adversarial adaptation teach a second domain of training data.  The examiner also notes that Tzeng’s target encoder CNN’s receiving target images during adversarial adaptation illustrated in FIG. 3 teaches accessing a second domain of training data.  The examiner further notes that Tzeng’s evaluating any one of the remaining three different types of domain shifts (e.g., three different shifts other than the aforementioned domain shift evaluated by a first neural network) including adaptation across modalities teaches that the target domain is associated with a different, second genre while the source domain above is associated with a first genre (see rationale for the limitation immediately above).)

using the training data from the first and second domains, classify a target data set; and (Tzeng, FIG. 3, supra. The examiner notes that Tzeng’s target image in FIG. 3 teaches a target data set, and that Tzeng’s classifying the above target image by the Classifier of the source CNN in FIG. 3 teaches classifying a target data set.  
The examiner further notes that the aforementioned classification of the target data set uses the source images plus labels for the source CNN as well as the source images and target images during the adversarial adaptation that learns the target CNN.  Therefore, the examiner asserts that Tzeng’s classifying the target data set in FIG. 3 uses the training data from the first and second domains.) 

output a classification of the target data set, (Tzeng, FIG. 3: “class label”.  The examiner notes that the class label generated by Classifier for the target image in FIG. 3 teaches outputting a classification of the target data set.)

Tzeng does not appear to explicitly teach: 
wherein the target data set is classified by a domain adaptation module comprising a cross-domain batch normalization (CDBN) module to adaptively select domain statistics to normalize inputs.

Tsai does, however, teach: 
wherein the target data set is classified by a domain adaptation module comprising a cross-domain batch normalization (CDBN) module to adaptively select domain statistics to normalize inputs. (Tsai, p. 5088, § 5, ¶ 1: “We proposed Cross-Domain Landmark Selection (CDLS) for performing heterogeneous domain adaptation (HDA). In addition to the ability to associate heterogeneous data across domains in a semi-supervised setting, our CDLS is able to learn representative cross-domain landmarks for deriving a proper feature subspace for adaptation and classification purposes. Since the derived feature subspace matches cross-domain data distribution while eliminating the domain differences, we can simply project labeled cross-domain data to this domain-invariant subspace for recognizing the unlabeled target-domain instances.” p. 5083, right-hand column, ¶ 2: “To solve (1), we resort to statistics to measure the probability density for describing data distributions. As suggested in [26, 24], we adopt empirical Maximum Mean Discrepancy (MMD) [16] to measure the difference between the above cross-domain data distributions.” p. 5084, left-hand column, ¶ 3: “In (4), δ ∈ [0, 1] controls the portion of cross-domain data in each class to be utilized for adaptation. If δ = 0, our CDLS would turn into its supervised version as described in Section 3.2.1. While we fix δ = 0.5 in our work, additional analysis on the selection and effect of δ will be provided in our experiments.” p. 5084, right-hand column, ¶ 2: “It can be seen that, EC term in (6) is extended from (3) by utilizing unlabeled target-domain data with pseudo-labels. Similar to (3), we match cross-domain class-conditional distributions and preserve local embedding of transformed data of each class using via $E_{cond}^c$ and $E_{embed}^c$, respectively. The normalization term in (6) is calculated as $e^c = \delta n_S^c n_L^c + \delta n_L^c n_U^c + \delta^2 n_U^c n_S^c$.” P. 5085, left-hand column, Algorithm 1: “Input: Labeled source and target-domain data $D_S = \{X_S^i, Y_S^i\}_{i=1}^{n_S}$, $D_L = \{X_l^i, Y_l^i\}_{i=1}^{n_L}$; unlabeled target domain data $\{X_U^i\}_{i=1}^{n_U}$; feature dimension m; ratio δ, parameter λ.”
The examiner notes that Tsai’s cross-domain landmark selection (CDLS) for adaptation teaches a cross-domain batch normalization (CDBN) module. The examiner further notes that Tsai’s CDLS includes the MMD algorithm that selects statistics (e.g., probability density for describing data distributions, discrepancies, means, etc.) to measure the data distributions and computes the normalization term from both the source and target domains (the claimed first and second domain genres) for normalizing inputs (e.g., source and target domain images) for adaptation, which teaches that the cross-domain batch normalization (CDBN) module adaptively selects domain statistics to normalize inputs as claimed.) 
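For illustration only, the following minimal sketch shows one common way an empirical Maximum Mean Discrepancy (MMD) of the kind referenced in the quoted passage can be computed from two sets of samples. It assumes a linear kernel, for which the squared MMD reduces to the squared distance between the two sample means, and it assumes both sets already lie in a common feature space; neither assumption is drawn from Tsai, and the function names and data values are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd_squared(source, target):
    """Squared empirical MMD under a linear kernel (an assumption here):
    the squared Euclidean distance between the two sample means."""
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


# Hypothetical two-dimensional features for each domain.
source = [[0.0, 1.0], [2.0, 3.0]]  # mean is [1.0, 2.0]
target = [[1.0, 2.0], [3.0, 6.0]]  # mean is [2.0, 4.0]
print(linear_mmd_squared(source, target))  # (1)^2 + (2)^2 = 5.0
```

A value of zero under this sketch indicates only that the two sample means coincide; richer kernels compare more than first-order statistics.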

Tzeng and Tsai are analogous art because both pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng with Tsai’s adaptively selecting statistics to normalize inputs (Tsai, supra).  The modification not only accommodates existing HDA (heterogeneous domain adaptation) approaches that use labeled data during training, by adaptively selecting statistics to measure the distances between cross-domain distributions in solving the domain adaptation problem (e.g., Eq. (1) in § 3.2.1), but also provides improved adaptation by adopting unlabeled target-domain data and adaptively selecting statistics in solving a derived version of the aforementioned domain adaptation problem (e.g., Eq. (4) in § 3.2.2) (Tsai, p. 5083, right-hand column, ¶ 1: “Recall that, as discussed in Section 2, most existing HDA works consider a supervised setting. That is, only labeled data in the target domain are utilized during training. In order to match cross-domain data distributions for such standard supervised HDA settings, we formulate the problem formulation as follows: Eq. (1) (reproduction omitted)”. P. 5083, right-hand column, ¶ 2: “To solve (1), we resort to statistics to measure the probability density for describing data distributions.” P. 5083, right-hand column, last paragraph: “To adopt the information of unlabeled target-domain data for improved adaptation, we advocate the learning of the landmarks from cross-domain data when deriving the aforementioned domain-invariant feature subspace.”) 

Tzeng modified by Tsai does not appear to explicitly teach: 
at least one computer storage that is not a transitory signal and that comprises instructions executable by at least one processor to: 

Csurka does, however, teach: 
at least one computer storage that is not a transitory signal and that comprises instructions executable by at least one processor to: (Csurka, ¶ [0042]: “The digital processor 30 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like. The exemplary digital processor 30, in addition to controlling the operation of the computer system 10, executes the instructions 28 stored in memory 26 for performing the method outlined in FIG. 3.” ¶ [0044]: “The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth.”)
Tzeng, Tsai, and Csurka are analogous art because all three references pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai to incorporate Csurka’s processor and computer storage (Csurka, supra).  The modification provides the capability of storing instructions for software in a storage medium and configuring a computer or a digital system with the instructions to perform desired or required tasks (Csurka, ¶ [0044]: “The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth.”)

With respect to claim 18, Tzeng modified by Tsai and Csurka teaches the apparatus of claim 12, and Tsai further teaches: 
wherein the instructions are executable to execute a training operation to learn a ratio to normalize both source and target data. (Tsai, p. 5088, § 5, ¶ 1: “We proposed Cross-Domain Landmark Selection (CDLS) for performing heterogeneous domain adaptation (HDA). In addition to the ability to associate heterogeneous data across domains in a semi-supervised setting, our CDLS is able to learn representative cross-domain landmarks for deriving a proper feature subspace for adaptation and classification purposes. Since the derived feature subspace matches cross-domain data distribution while eliminating the domain differences, we can simply project labeled cross-domain data to this domain-invariant subspace for recognizing the unlabeled target-domain instances.” p. 5083, right-hand column, § 3.2.1, ¶ 2: “To solve (1), we resort to statistics to measure the probability density for describing data distributions. As suggested in [26, 24], we adopt empirical Maximum Mean Discrepancy (MMD) [16] to measure the difference between the above cross-domain data distributions.” p. 5084, left-hand column, § 3.2.2, ¶ 3: “In (4), δ ∈ [0, 1] controls the portion of cross-domain data in each class to be utilized for adaptation. If δ = 0, our CDLS would turn into its supervised version as described in Section 3.2.1. While we fix δ = 0.5 in our work, additional analysis on the selection and effect of δ will be provided in our experiments.”  p. 5084, right-hand column, § 3.2.2, ¶ 2: “It can be seen that, EC term in (6) is extended from (3) by utilizing unlabeled target-domain data with pseudo-labels. Similar to (3), we match cross-domain class-conditional distributions and preserve local embedding of transformed data of each class using via $E_{cond}^c$ and $E_{embed}^c$, respectively. The normalization term in (6) is calculated as
                
                    
                        e
                    
                    
                        c
                    
                
                =
                 
                δ
                
                    
                        n
                    
                    
                        S
                    
                    
                        c
                    
                
                
                    
                        n
                    
                    
                        L
                    
                    
                        c
                    
                
                +
                 
                δ
                
                    
                        n
                    
                    
                        L
                    
                    
                        c
                    
                
                
                    
                        n
                    
                    
                        U
                    
                    
                        c
                    
                
                +
                 
                
                    
                        δ
                    
                    
                        2
                    
                
                
                    
                        n
                    
                    
                        U
                    
                    
                        c
                    
                
                
                    
                        n
                    
                    
                        S
                    
                    
                        c
                    
                
            
        .” P. 5085, left-hand column, Algorithm 1: “Input: Labeled source and target-domain data             
                
                    
                        D
                    
                    
                        S
                    
                
                =
                 
                
                    
                        
                            
                                
                                    
                                        X
                                    
                                    
                                        S
                                    
                                    
                                        i
                                    
                                
                                ,
                                 
                                
                                    
                                        Y
                                    
                                    
                                        S
                                    
                                    
                                        i
                                    
                                
                            
                        
                    
                    
                        i
                        =
                        1
                    
                    
                        
                            
                                n
                            
                            
                                S
                            
                        
                    
                
            
        ,             
                
                    
                        D
                    
                    
                        L
                    
                
                =
                 
                
                    
                        
                            
                                
                                    
                                        X
                                    
                                    
                                        l
                                    
                                    
                                        i
                                    
                                
                                ,
                                 
                                
                                    
                                        Y
                                    
                                    
                                        l
                                    
                                    
                                        i
                                    
                                
                            
                        
                    
                    
                        i
                        =
                        1
                    
                    
                        
                            
                                n
                            
                            
                                L
                            
                        
                    
                
            
        ; unlabeled target domain data             
                
                    
                        
                            
                                
                                    
                                        X
                                    
                                    
                                        U
                                    
                                    
                                        i
                                    
                                
                            
                        
                    
                    
                        i
                        =
                        1
                    
                    
                        
                            
                                n
                            
                            
                                U
                            
                        
                    
                
            
        ; feature dimension m; ratio  [Symbol font/0x64] , parameter [Symbol font/0x6C].” 
The examiner notes that Tsai’s ratio controlling the portion of cross-domain data cited above teaches a ratio, and that Tsai’s determining the normalization term in Eq. (6) for both the source and target data based on the above ratio teaches normalizing both source and target data. The examiner further notes that Tsai’s training and learning described in §§ 3.2.1 and 3.2.2, supra, during which the cross-domain data distributions (EM), the cross-domain conditional data distributions (EC), and the aforementioned ratio for batch normalization are determined, teaches a training operation, and that Tsai’s training and learning to determine the aforementioned ratio teaches learning a ratio.  Therefore, the examiner notes that Tsai teaches the above limitation in its entirety.)
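
For illustration only, the normalization term quoted above from Tsai, e^c = δ·n_S^c·n_L^c + δ·n_L^c·n_U^c + δ²·n_U^c·n_S^c, can be evaluated as sketched below. This is a minimal sketch and not code from Tsai; the function name and the per-class sample counts are hypothetical values chosen only to show how the ratio δ scales each term.

```python
def normalization_term(delta, n_s, n_l, n_u):
    """Evaluate e^c = delta*n_S^c*n_L^c + delta*n_L^c*n_U^c + delta^2*n_U^c*n_S^c.

    delta : ratio in [0, 1] controlling the portion of cross-domain data used
    n_s   : number of source-domain instances of class c (hypothetical count)
    n_l   : number of labeled target-domain instances of class c (hypothetical)
    n_u   : number of unlabeled target-domain instances of class c (hypothetical)
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * n_s * n_l + delta * n_l * n_u + delta ** 2 * n_u * n_s


# Hypothetical class with 20 source, 3 labeled-target, and 10 unlabeled-target
# instances, using the value delta = 0.5 that the quoted passage says is fixed.
e_c = normalization_term(0.5, n_s=20, n_l=3, n_u=10)
print(e_c)
```

With δ = 0 every term vanishes, which is consistent with the quoted statement that CDLS then reduces to its supervised version.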

Tzeng and Tsai are analogous art because both pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng with Tsai’s applying an equation accounting for a ratio for normalizing first and second outputs respectively from a first and a second neural network (Tsai, supra).  The modification exploits heterogeneous data across domains with the ability to identify the adaptation ability of each instance with a properly assigned weight by determining a ratio that controls the portions of cross-domain landmarks to be exploited and normalized for adaptation (Tsai, p. 5084, left-hand column, ¶ 1: “Extended from (1), our proposed algorithm cross-domain landmark selection (CDLS) exploits heterogenous data across domains, with the ability to identify the adaptation ability of each instance with a properly assigned weight. Instances in either domain with a nonzero weight will be considered as a landmark.” P. 5088, left-hand column, ¶ 1: “Recall that the ratio δ in (4) controls the portion of cross-domain landmarks to be exploited for adaptation. Figure 6(c) presents the performance of CDLS with different δ values. It is worth repeating that, CDLS would be simplified as the supervised version CDLS_sup if δ = 0, as illustrated by the flat dotted line in Figure 6(c). As expected, using all (δ = 1) or none (δ = 0) of the cross-domain data as landmarks would not be able to achieve satisfactory HDA performance. Therefore, the choice of δ = 0.5 would be reasonable in our experiments.”)

With respect to claim 19, Tzeng modified by Tsai and Csurka teaches the apparatus of claim 18, and Tzeng further teaches: 
wherein the instructions are executable to execute a test operation to use the ratio and statistics related to the target to normalize statistics for both the source and the target. (Tzeng, FIG. 3 cited for claim 12, supra. The examiner notes that Tzeng’s Testing illustrated in FIG. 3 teaches a test operation.)  
	Tzeng does not appear to explicitly teach that the test operation is to “use the ratio and statistics related to the target to normalize statistics for both the source and the target”. 
	Tsai does, however, teach: 
	use the ratio and statistics related to the target to normalize statistics for both the source and the target. (Tsai, p. 5083, right-hand column, § 3.2.1, ¶ 2, p. 5084, left-hand column, § 3.2.2, ¶ 3, and p. 5084, right-hand column, § 3.2.2, ¶ 2 cited for claim 18, supra.  The examiner notes that these passages teach an operation that uses statistics pertaining to both the source and target data, and that Tsai’s Maximum Mean Discrepancy, which measures the difference between the source and target domain data distributions, teaches statistics related to the target. The examiner also notes that Tsai’s normalizing both the source and target data, such as statistics, based on the aforementioned ratio, when combined with Tzeng’s test operation, teaches the above limitation in its entirety.)
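
For illustration only, the empirical Maximum Mean Discrepancy referenced in the cited passages can be sketched in its simplest form as the squared distance between the mean feature vectors of the source and target samples. This is a minimal sketch under stated assumptions, not Tsai’s formulation: it assumes a linear kernel and omits the feature transformation and landmark weights that Tsai learns; the function names and sample values are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd_squared(source, target):
    """Squared empirical MMD with a linear kernel (assumption):
    the squared Euclidean distance between the two sample means."""
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


# Hypothetical two-dimensional features for two source and two target instances.
source = [[0.0, 1.0], [2.0, 3.0]]
target = [[1.0, 1.0], [3.0, 5.0]]
print(linear_mmd_squared(source, target))
```

A value of zero indicates that the two sample means coincide under this simplified measure; larger values indicate a larger difference between the two distributions’ first-order statistics.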

Tzeng, Tsai, and Csurka are analogous art because all three references pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai and Csurka to further incorporate Tsai’s normalizing statistics for both the source domain data and the target domain data (Tsai, supra).  The modification provides improved domain adaptation capability by accommodating both labeled data, as most existing domain adaptation techniques do, and unlabeled data and enables learning proper weights from heterogeneous domains by solving heterogeneous domain adaptation problems with a unique normalization term (Tsai, p. 5083, right-hand column, ¶ 1: “Recall that, as discussed in Section 2, most existing HDA works consider a supervised setting. That is, only labeled data in the target domain are utilized during training. In order to match cross-domain data distributions for such standard supervised HDA settings, we formulate the problem formulation as follows: Eq. (1) (reproduction omitted)”. P. 5083, right-hand column, last paragraph: “To adopt the information of unlabeled target-domain data for improved adaptation, we advocate the learning of the landmarks from cross-domain data when deriving the aforementioned domain-invariant feature subspace.” P. 5084, right-hand column, ¶ 3: “With both EM and EC defined, we address semi-supervised HDA by solving (4). This allows us to learn the proper weights α and β for the representative instances from both domains (i.e., cross-domain landmarks). As a result, our derived feature transformation A will result in a feature subspace with improved adaptation capability.”)

•	Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Tsai et al. Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation (2016) (hereinafter Tsai) and Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka) and further in view of Wu et al. Cross-view Action Recognition over Heterogeneous Feature Spaces (2013) (hereinafter Wu).
With respect to claim 7, Tzeng modified by Tsai and Csurka teaches the apparatus of claim 1, and Tzeng further teaches: 
wherein the first training data is related to the second training data, (Tzeng, Figure 3 Caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” The examiner notes that Tzeng’s source images plus labels for pre-training the source CNN and/or the source images during adversarial adaptation (the claimed first training data) and the target images (the claimed second training data) are provided to Tzeng’s ADDA for learning the target encoder CNN and are thus related to each other.)

Tzeng modified by Tsai and Csurka does not appear to explicitly teach: 
wherein the first and second neural networks pertain to action recognition, and 
wherein the first training data is related to the second training data in that the first and second training data both pertain to a same action.

Wu does, however, teach: 
wherein the first and second neural networks pertain to action recognition, and (Wu, p. 609, § I, ¶ 2: “In this work, we propose a new transfer learning approach, namely Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC), for cross-view action recognition over heterogeneous feature spaces.” Pp. 609-610, § I, ¶ 3: “In order to adapt multiple source views to the target view, we additionally present a joint weight learning method to effectively combine multiple transferred source-view classifiers to generate the target-view classifiers.” 
The examiner notes that Wu’s source-view classifier and target-view classifier respectively teach a first neural network and a second neural network. The examiner further notes that Wu’s source views and target views respectively teach first training data and second training data.  Further, the examiner notes that Wu’s performing its HTDCC for action recognition with the aforementioned source-view classifier and target-view classifier teaches that the first and second neural networks pertain to action recognition.)

wherein the first training data is related to the second training data in that the first and second training data both pertain to a same action. (Wu, p. 610, § 2, ¶ 1: “These two methods rely on simultaneous observations of the same action instance from multiple views.” Wu, p. 614, § 6 Conclusions, ¶ 1: “Our method neither requires the same type of feature shared by different views nor limits to any corresponding action instances in different views. Two projection matrices are learned to respectively map the data from source and target views to a common space, by simultaneously minimizing the canonical correlations of inter-class samples, maximizing the intra-class canonical correlations, and reducing the data distribution mismatch between source and target views.” 
The examiner notes that Wu’s source views and target views respectively teach first and second training data (see e.g., § I, ¶¶ 2-3, supra).  The examiner further notes that Wu’s performing action recognition on views of the same action instance as well as views that are not limited to those corresponding to the same action instance teaches that the first training data and the second training data both pertain to the same action. )

Tzeng, Tsai, Csurka, and Wu are analogous art because all four references pertain to domain adaptation of a neural network from one domain to another domain with heterogeneous data distributions. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai and Csurka to further incorporate Wu’s multiple neural networks pertaining to action recognition using two pieces of training data that pertain to the same action (Wu, supra).  The modification provides a transferable dictionary pair that is transferrable among action recognition models for different domains and includes two dictionaries that correspond to the source domain data and the target domain data by transferring action recognition modules between multiple domains and thus provides the capability of action recognition across multiple domains that exhibit data distribution mismatch (Wu, p. 610, § 2, ¶ 1: “Their work requires feature-to-feature correspondence at the frame-level to train a classifier. Liu et al. [9] proposed a bipartite graph-based approach to learn bilingual-words from source-view and target-view vocabularies, and then transferred action models between two views via the bag-of-bilingual-words model. Zheng et al. [20] presented a transferable dictionary pair consisting of two dictionaries that correspond to the source and target views respectively, and learned the same sparse representation of each video in the pair views. These two methods rely on simultaneous observations of the same action instance from multiple views.”)

•	Claim(s) 9-11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Tsai et al. Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation (2016) (hereinafter Tsai).
With respect to claim 9, Tzeng teaches a method, comprising:
accessing a first neural network, the first neural network being associated with a first data type; (Tzeng, FIG. 3:  


[Image: Tzeng, FIG. 3 (media_image1.png)]

FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” P. 7172, § 5, ¶ 1: “We now evaluate ADDA for unsupervised classification adaptation across four different domain shifts. We explore three digits datasets of varying difficulty: MNIST [18], USPS, and SVHN [19]. We additionally evaluate on the NYUD [20] dataset to study adaptation across modalities.” 
            The examiner first notes that Tzeng’s source encoder CNN teaches a first neural network, and that Tzeng’s pre-training its source encoder CNN with source images and labels or sending source images to its CNN during adversarial adaptation teaches accessing a first neural network.  Moreover, the examiner further notes that Tzeng’s evaluating any one of the four different types of domain shifts to learn the source mapping (e.g., Ms in § 3) for its source encoder CNN to a different type of the four types of domain shifts for adaptation across modalities to a target domain shift (e.g., any one of the remaining three different types of domain shifts) teaches that the source encoder CNN is associated with a first data type.
Further, the examiner notes that Tzeng interchangeably uses the terms “classifier” (e.g., “source classifier” and “target classifier” in § 3, “task-specific classifier” in § 3.1, “classifier” in FIG. 3 and its caption), “encoder” (e.g., “target encoder” in FIGS. 1 and 3 as well as their respective captions), and “encoder CNN” (e.g., “source encoder CNN” and “target encoder CNN” in the caption of FIG. 3), and that “classifier,” “encoder,” and “encoder CNN” are thus interpreted as functional and/or structural equivalents of each other.)

accessing a second neural network, the second neural network being associated with a second data type different from the first data type; (Tzeng, FIG. 3 and p. 7172, § 5, ¶ 1, supra. FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.”
The examiner notes that Tzeng’s target encoder CNN teaches a second neural network, and that Tzeng’s target encoder CNN’s receiving target images during adversarial adaptation illustrated in FIG. 3 teaches accessing a second neural network.  The examiner further notes that Tzeng’s evaluating any one of the remaining three different types of domain shifts (e.g., three different shifts other than the aforementioned domain shift evaluated by the first neural network) including adaptation across modalities teaches that the target encoder CNN is associated with a different, second data type while the source encoder CNN above is associated with a first data type (see rationale for the limitation immediately above).)
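
For illustration only, the test-time flow described in the quoted FIG. 3 caption (target images are mapped by the target encoder into the shared feature space and then classified by the source classifier) can be sketched as a simple composition of two functions. This is a minimal sketch and not Tzeng’s implementation: the linear encoder, the arg-max classifier, the weights, and all names are hypothetical stand-ins for trained networks.

```python
def make_linear_encoder(weights):
    """Return a stand-in encoder that maps an input vector to features
    by a fixed linear map (a hypothetical substitute for a trained CNN)."""
    def encode(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return encode


def source_classifier(features):
    """Stand-in classifier: return the index of the largest feature value."""
    return max(range(len(features)), key=lambda k: features[k])


def predict_target(image, target_encoder, classifier):
    """Test-time flow: encode a target input, then classify the encoding."""
    return classifier(target_encoder(image))


# Hypothetical target encoder weights and a two-element "image".
target_encoder = make_linear_encoder([[1.0, 0.0], [0.0, 2.0]])
label = predict_target([3.0, 2.0], target_encoder, source_classifier)
print(label)
```

In this sketch the classifier is never retrained for the target input; only the encoder differs between domains, which mirrors the fixed-parameter arrangement that the caption attributes to the dashed lines in FIG. 3.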

providing, as input, first training data to the first neural network; (Tzeng, FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples.”
The examiner notes that Tzeng’s source images provided to its source CNN illustrated in FIG. 3 or the labeled source image examples described in FIG. 3’s caption teaches first training data.  The examiner further notes that Tzeng’s providing the source images to its source CNN during adversarial adaptation or providing the source images plus labels to the source CNN during pre-training teaches provide, as input, first training data to the first neural network.) 

providing, as input, second training data to the second neural network, the first training data being different from the second training data; (Tzeng, FIG. 3 caption: “An overview of our proposed Adversarial Discriminative Domain Adaptation (ADDA) approach. We first pre-train a source encoder CNN using labeled source image examples. Next, we perform adversarial adaptation by learning a target encoder CNN such that a discriminator that sees encoded source and target examples cannot reliably predict their domain label. During testing, target images are mapped with the target encoder to the shared feature space and classified by the source classifier. Dashed lines indicate fixed network parameters.” 
The examiner notes that Tzeng’s learning a target encoder CNN (also referred to as “target encoder” above) teaches training the second neural network, and that Tzeng’s providing target examples in FIG. 3’s caption or the target images illustrated in FIG. 3 to the target encoder CNN (the claimed second neural network) for learning the target encoder CNN teaches providing, as input, second training data to the second neural network (e.g., Tzeng’s target encoder CNN/target CNN cited above). Therefore, the examiner asserts that at least the aforementioned passages and figure teach the above limitation.)

identifying a first output from a hidden layer of the first neural network, the first output being based on the first training data; (Tzeng, § 3.1, ¶ 3: “Once the mapping parameterization is determined for the source, we must decide how to parametrize the target mapping Mt. In general, the target mapping almost always matches the source in terms of the speciﬁc functional layer (architecture), but different methods have proposed various regularization techniques. All methods initialize the target mapping parameters with the source, but different methods choose different constraints between the source and target mappings, ψ(Ms,Mt).” § 3.1, ¶ 4: “Consider a layered representations [sic] where each layer parameters are denoted as,             
                
                    
                        M
                    
                    
                        s
                    
                    
                        l
                    
                
                 
            
        or             
                
                    
                        M
                    
                    
                        t
                    
                    
                        l
                    
                
            
        , for a given set of equivalent layers, {ℓ1, …, ℓn}. Then the space of constraints explored in the literature can be described through layerwise equality constraints as follows:             
                ψ
                
                    
                        
                            
                                M
                            
                            
                                s
                            
                        
                        ,
                         
                        
                            
                                M
                            
                            
                                t
                            
                        
                    
                
                ≜
                
                    
                        
                            
                                
                                    
                                        ψ
                                    
                                    
                                        
                                            
                                                l
                                            
                                            
                                                i
                                            
                                        
                                    
                                
                                
                                    
                                        
                                            
                                                M
                                            
                                            
                                                s
                                            
                                            
                                                
                                                    
                                                        l
                                                    
                                                    
                                                        i
                                                    
                                                
                                            
                                        
                                        |
                                        ,
                                         
                                        
                                            
                                                M
                                            
                                            
                                                t
                                            
                                            
                                                
                                                    
                                                        l
                                                    
                                                    
                                                        i
                                                    
                                                
                                            
                                        
                                    
                                
                            
                        
                    
                    
                        i
                        ∈
                        
                            
                                1
                                …
                                n
                            
                        
                    
                
                 
                 
                 
                 
                (
                4
                )
            
         where each individual layer can be constrained independently. A very common form of constraint is source and target layerwise equality: $\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i}) = (M_s^{\ell_i} = M_t^{\ell_i})$ (5)
.” p. 7169, § 3, ¶ 3: “In adversarial adaptive methods, the main goal is to regularize the learning of the source and target mappings, Ms and Mt, so as to minimize the distance between the empirical source and target mapping distributions: Ms(Xs) and Mt(Xt).” p. 7169, § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, we can derive a generic formulation for domain adversarial techniques below:
[Equation image: media_image2.png]
”
The examiner notes that any i-th layer of Tzeng’s first neural network (e.g., Tzeng’s source encoder CNN), where i ∈ {2, 3, …, n-1} (i.e., ℓi ∈ {ℓ2, …, ℓn-1}), teaches an intermediate layer of the first neural network, because the first layer, ℓ1, may constitute an input layer, the last layer, ℓn, may constitute an output layer of the first neural network, and any layers in between are generally accepted as intermediate or hidden layers. The examiner also notes that the source mapping distribution for the i-th layer, $M_s^{\ell_i}(X_s)$, generated by the i-th layer ($\ell_i$) of the first neural network for the source training input images (Xs), teaches a first output of the intermediate layer of the first neural network, and that Tzeng thus teaches the above limitation.)

identifying a second output from a hidden layer of the second neural network, the second output being based on the second training data, the respective hidden layers of the first and second neural networks being parallel layers; (Tzeng, § 3.1, ¶ 3: “Once the mapping parameterization is determined for the source, we must decide how to parametrize the target mapping Mt. In general, the target mapping almost always matches the source in terms of the specific functional layer (architecture), but different methods have proposed various regularization techniques. All methods initialize the target mapping parameters with the source, but different methods choose different constraints between the source and target mappings, ψ(Ms,Mt).” § 3.1, ¶ 4: “Consider a layered representations [sic] where each layer parameters are denoted as, $M_s^{\ell}$ or $M_t^{\ell}$, for a given set of equivalent layers, {ℓ1, …, ℓn}. Then the space of constraints explored in the literature can be described through layerwise equality constraints as follows:
$\psi(M_s, M_t) \triangleq \{\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i})\}_{i \in \{1 \ldots n\}}$ (4)
         where each individual layer can be constrained independently. A very common form of constraint is source and target layerwise equality: $\psi_{\ell_i}(M_s^{\ell_i}, M_t^{\ell_i}) = (M_s^{\ell_i} = M_t^{\ell_i})$ (5)
.” p. 7169, § 3, ¶ 3: “In adversarial adaptive methods, the main goal is to regularize the learning of the source and target mappings, Ms and Mt, so as to minimize the distance between the empirical source and target mapping distributions: Ms(Xs) and Mt(Xt).” p. 7169, § 3, ¶ 4: “First a domain discriminator, D, which classifies whether a data point is drawn from the source or the target domain. Thus, we can derive a generic formulation for domain adversarial techniques below:
[Equation image: media_image2.png]
”
The examiner notes that any i-th layer of Tzeng’s second neural network (e.g., Tzeng’s target encoder CNN), where ℓi ∈ {ℓ2, …, ℓn-1}, teaches an intermediate layer of the second neural network, because the first layer, ℓ1, may constitute an input layer, the last layer, ℓn, may constitute an output layer of the second neural network, and any layers in between are generally accepted as intermediate or hidden layers. The examiner also notes that the target mapping distribution for the i-th layer, $M_t^{\ell_i}(X_t)$, generated by the second neural network (e.g., Tzeng’s target encoder neural network) via learning the target representation mapping, $M_t^{\ell_i}$, of the i-th layer ($\ell_i$) for the target training input images (Xt), teaches a second output (e.g., the aforementioned target mapping distribution, $M_t^{\ell_i}(X_t)$, for the target input images Xt), and that Tzeng thus teaches the above limitation. Further, the examiner notes that, for the same index “i”, the source layer, $M_s^{\ell_i}$, and the target layer, $M_t^{\ell_i}$, are parallel layers for Tzeng’s layerwise computation (e.g., Eqns. (3)-(5), supra).)
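
For illustration only, the mapping above can be sketched in a few lines of Python. The toy dense layers, the weight values, and the helper names (dense_relu, layer_outputs, layerwise_equality) are hypothetical and do not come from Tzeng; the sketch assumes two networks of identical architecture, with the target initialized from the source, and shows how outputs taken at the same layer index correspond to the first and second outputs from parallel hidden layers, and how the layerwise equality of Eq. (5) can be checked per layer.

```python
# Illustrative sketch only: toy layers and names are assumptions, not Tzeng's code.

def dense_relu(weights, inputs):
    """Apply one fully connected layer (no bias) followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]

def layer_outputs(layers, inputs):
    """Return the output of every layer l_1..l_n for a single input vector."""
    outputs = []
    activation = inputs
    for weights in layers:
        activation = dense_relu(weights, activation)
        outputs.append(activation)
    return outputs

def layerwise_equality(source_layers, target_layers):
    """Per-layer check of Eq. (5): True where M_s^{l_i} equals M_t^{l_i}."""
    return [s == t for s, t in zip(source_layers, target_layers)]

# Source mapping with three layers; the middle one plays the hidden layer.
source = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, -1.0]], [[0.5, 0.5]]]
# Target mapping initialized from the source, then the last layer is untied.
target = [[row[:] for row in layer] for layer in source]
target[2] = [[0.25, 0.75]]

x_s, x_t = [1.0, 2.0], [2.0, 1.0]  # one source sample and one target sample
i = 1                               # zero-based index of hidden layer l_2
first_output = layer_outputs(source, x_s)[i]   # M_s^{l_2}(x_s)
second_output = layer_outputs(target, x_t)[i]  # M_t^{l_2}(x_t)
tied = layerwise_equality(source, target)      # which layers satisfy Eq. (5)
```

Because both outputs are read at the same index i, they come from parallel layers in the sense discussed above; in this toy setup only the last layer is left unconstrained.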

Tzeng does not appear to explicitly teach: 
identify a ratio to normalize the first output and the second output; 
apply an equation that accounts for the ratio to change one or more weights of the intermediate layer of the second neural network.
Tsai does, however, teach: 

identifying a ratio to normalize the first output and the second output; (Tsai, p. 5084, left-hand column, ¶ 2: “With the above goal, the objective function of our CDLS can be formulated as follows:

[Equation image: media_image3.png]
”
p. 5084, left-hand column, ¶ 3: “In (4), δ ∈ [0, 1] controls the portion of cross-domain data in each class to be utilized for adaptation. If δ = 0, our CDLS would turn into its supervised version as described in Section 3.2.1. While we fix δ = 0.5 in our work, additional analysis on the selection and effect of δ will be provided in our experiments.”
p. 5084, left-hand column, last paragraph – right-hand column, first paragraph: “To match cross-domain conditional data distributions via EC, we apply SVM trained from labeled cross-domain data to predict the pseudo-labels $\tilde{y}_u^i$ for $\hat{x}_u^i$ (as described later in Section 3.3.3). With $\{\tilde{y}_u^i\}_{i=1}^{n_U}$ assigned for XU, the EC term in (4) can be expressed as:
$E_C(A, D_S, D_L, X_U, \alpha, \beta) = \sum_{c=1}^{C} E_{cond}^{c} + \frac{1}{e^{c}} E_{embed}^{c}$, (6)
where

[Equation image: media_image4.png]
”
p. 5084, right-hand column, ¶ 2: “It can be seen that, EC term in (6) is extended from (3) by utilizing unlabeled target-domain data with pseudo-labels. Similar to (3), we match cross-domain class-conditional distributions and preserve local embedding of transformed data of each class using via $E_{cond}^{c}$ and $E_{embed}^{c}$, respectively. The normalization term in (6) is calculated as $e^{c} = \delta n_S^{c} n_L^{c} + \delta n_L^{c} n_U^{c} + \delta^{2} n_U^{c} n_S^{c}$
.” p. 5085, left-hand column, Algorithm 1: “Input: Labeled source and target-domain data $D_S = \{X_S^i, Y_S^i\}_{i=1}^{n_S}$, $D_L = \{X_l^i, Y_l^i\}_{i=1}^{n_L}$; unlabeled target domain data $\{X_U^i\}_{i=1}^{n_U}$; feature dimension m; ratio δ, parameter λ.”
The examiner notes that Tsai’s ratio, δ, for computing the normalization term $e^c = \delta n_S^c n_L^c + \delta n_L^c n_U^c + \delta^2 n_U^c n_S^c$ in Eq. (6) teaches the claimed ratio, and that Tsai’s source domain data (e.g., [equation image: media_image5.png] in Eq. (6), where $\alpha_i$ and $x_S^{i,c}$ respectively denote weights and source domain input) as well as target domain data (e.g., [equation image: media_image6.png], which denotes the product of the weight $\beta_i$ and unlabeled target domain data $\hat{x}_u^{i,c}$) respectively teach outputs of the first and the second intermediate layers. The examiner thus asserts that Tsai’s determining the ratio for normalization teaches the above limitation.) 
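For illustration only, and not as a reproduction of any code in Tsai, the normalization term quoted above can be evaluated with a short Python sketch. The function name, the argument names, and the sample counts are hypothetical; the sketch assumes that $n_S^c$, $n_L^c$, and $n_U^c$ are per-class instance counts for the source, labeled target, and unlabeled target data, respectively.

```python
def normalization_term(delta, n_s, n_l, n_u):
    """Evaluate e^c = delta*n_S^c*n_L^c + delta*n_L^c*n_U^c + delta^2*n_U^c*n_S^c.

    delta : ratio in [0, 1] (Tsai reports using 0.5 in its experiments)
    n_s, n_l, n_u : assumed per-class counts of source, labeled target,
                    and unlabeled target instances (hypothetical inputs)
    """
    return delta * n_s * n_l + delta * n_l * n_u + delta ** 2 * n_u * n_s


# Hypothetical counts for a single class c.
e_c = normalization_term(0.5, n_s=20, n_l=3, n_u=10)
print(e_c)  # 0.5*20*3 + 0.5*3*10 + 0.25*10*20 = 95.0
```

Under these assumptions, every term of the expression carries a factor of δ, so the term evaluates to zero when δ = 0.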

applying the ratio to outputs from the hidden layer of the second neural network to normalize the outputs from the hidden layer of the second neural network. (Tsai, p. 5084, left-hand column, ¶ 2: “With the above goal, the objective function of our CDLS can be formulated as follows:

[equation image: media_image3.png]
”
p. 5084, right-hand column, ¶ 2: “It can be seen that, EC term in (6) is extended from (3) by utilizing unlabeled target-domain data with pseudo-labels. Similar to (3), we match cross-domain class-conditional distributions and preserve local embedding of transformed data of each class using via $E_{cond}^c$ and $E_{embed}^c$, respectively. The normalization term in (6) is calculated as $e^c = \delta n_S^c n_L^c + \delta n_L^c n_U^c + \delta^2 n_U^c n_S^c$.” P. 5085, left-hand column, Algorithm 1: “Input: Labeled source and target-domain data $D_S = \{X_S^i, Y_S^i\}_{i=1}^{n_S}$, $D_L = \{X_l^i, Y_l^i\}_{i=1}^{n_L}$; unlabeled target domain data $\{X_U^i\}_{i=1}^{n_U}$; feature dimension m; ratio δ, parameter λ.” “5: Update landmark weights {α, β} by (9)” p. 5084, right-hand column, ¶ 3: “With both EM and EC defined, we address semi-supervised HDA by solving (4). This allows us to learn the proper weights α and β for the representative instances from both domains (i.e., cross-domain landmarks).” P. 5085, § 3.3.2, Eqns. (8)-(9). 
The examiner notes that Tsai’s ratio, δ, for computing the normalization term $e^c = \delta n_S^c n_L^c + \delta n_L^c n_U^c + \delta^2 n_U^c n_S^c$ in Eq. (6) teaches the claimed ratio, and that Tsai’s solving Eq. (4) with EM and EC to learn weights from the target domain teaches changing one or more weights of the intermediate layer of the second neural network. The examiner further notes that Eq. (9), which is based on Eq. (8) that minimizes the objective function based on the aforementioned ratio for normalization, teaches an equation that accounts for the ratio and is applied to change and optimize the weights {α, β}.)
Tzeng and Tsai are analogous art because both pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng with Tsai’s applying an equation accounting for a ratio for normalizing first and second outputs respectively from a first and a second neural network (Tsai, supra).  The modification exploits heterogeneous data across domains with the ability to identify the adaptation ability of each instance with a properly assigned weight by determining a ratio that controls the portions of cross-domain landmarks to be exploited and normalized for adaptation (Tsai, p. 5084, left-hand column, ¶ 1: “Extended from (1), our proposed algorithm cross-domain landmark selection (CDLS) exploits heterogeneous data across domains, with the ability to identify the adaptation ability of each instance with a properly assigned weight. Instances in either domain with a nonzero weight will be considered as a landmark.” P. 5088, left-hand column, ¶ 1: “Recall that the ratio in (4) controls the portion of cross-domain landmarks to be exploited for adaptation. Figure 6(c) presents the performance of CDLS with different values. It is worth repeating that, CDLS would be simplified as the supervised version CDLS sup if δ = 0, as illustrated by the flat dotted line in Figure 6(c). As expected, using all (δ = 1) or none (δ = 0) of the cross-domain data as landmarks would not be able to achieve satisfactory HDA performance. Therefore, the choice of δ = 0.5 would be reasonable in our experiments.”)

With respect to claim 10, it is substantially similar or identical to claim 2 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 11, it is substantially similar or identical to claim 4 and is rejected in the same manner, the same art and reasoning applying. 

•	Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Tsai et al. Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation (2016) (hereinafter Tsai) and Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka) and further in view of Csurka, G. Domain Adaptation for Visual Applications: A Comprehensive Survey (30 Mar. 2017) (hereinafter Csurka 1).
With respect to claim 13, Tzeng modified by Tsai and Csurka teaches the apparatus of claim 12, wherein the first domain comprises real world video and the second domain comprises computer game video. (Csurka 1, pp. 27-28, § 5, ¶ 3: “The recent progresses in computer graphics and modern high-level generic graphics platforms such as game engines enable to generate photo-realistic virtual worlds with diverse, realistic, and physically plausible events and actions. Popular virtual words are SYNTHIA37 [176], Virtual KITTI38 [177] and GTA-V [178] (see also Figure 23).” P. 28, § 5, ¶ 4: “The Cool Temporal Segment Network [189] is an end-to-end action recognition model for real-world target categories that combines a few examples of labeled real-world videos with a large number of procedurally generated synthetic videos. The model uses a deep multi-task representation learning architecture, able to mix synthetic and real videos even if the action categories differ between the real and synthetic sets (see Figure 24).”)

Tzeng, Tsai, Csurka, and Csurka 1 are analogous art because all four references pertain to domain adaptation of generative neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai to incorporate Csurka 1’s use of real-world videos and computer game videos for domain adaptation between a real-world domain and a synthetic domain (e.g., a computer video game domain) (Csurka 1, supra).  The modification not only provides great promise for deep learning across a variety of computer vision problems but also helps to adjust the models trained in one domain to the other domain, especially when no or few labeled examples are available (Csurka 1, p. 28, § 5, ¶ 2: “Such virtually generated and controlled environments come with different levels of labeling for free and therefore have great promise for deep learning across a variety of computer vision problems, including optical flow [179, 180, 181, 182], object trackers [183, 177], depth estimation from RGB [184], object detection [185, 186, 187] semantic segmentation [188, 176, 178] or human actions recognition [189].” P. 28, § 5, ¶ 3: “In most cases, the synthetic data is used to enrich the real data for building the models. However, DA techniques can further help to adjust the model trained with virtual data (source) to real data (target) especially when no or few labeled examples are available in the real domain [190, 191, 176, 189].”)

•	Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Tsai et al. Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation (2016) (hereinafter Tsai) and Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka) and further in view of Catanzaro et al. US PGPub 20170148431 published on May 25, 2017 (hereinafter Catanzaro).
With respect to claim 14, Tzeng modified by Tsai and Csurka teaches the apparatus of claim 12 but does not appear to explicitly teach: 
wherein the first domain comprises information derived from a first voice and the second domain comprises information derived from a second voice. 

Catanzaro does, however, teach:  
wherein the first domain comprises information derived from a first voice and the second domain comprises information derived from a second voice. (Catanzaro, ¶ [0038]: “In embodiments, an English speech system was trained on 11,940 hours of speech, while a Mandarin system was trained on 9,400 hours. In embodiments, data synthesis was used to further augment the data during training.” The examiner notes that the English domain to which Catanzaro’s English speech system belongs teaches a first domain, and that the Mandarin domain to which Catanzaro’s Mandarin speech system belongs teaches a second domain.)
Tzeng, Tsai, Csurka, and Catanzaro are analogous art because all four references pertain to domain adaptation of neural networks.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai to incorporate Catanzaro’s adapting between first information derived from a first domain (e.g., an English domain) and second information derived from a second domain (e.g., a Mandarin domain) (Catanzaro, supra).  The modification not only recognizes speech of vastly different languages but can also be inexpensively deployed delivering low latency when serving users (Catanzaro, Abstract: “Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.”)

•	Claim(s) 15-16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Tsai et al. Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation (2016) (hereinafter Tsai) and Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka) and further in view of Chen et al. Large-Scale Visual Font Recognition (2014) (hereinafter Chen).
With respect to claim 15, Tzeng modified by Tsai and Csurka teaches the apparatus of claim 12 but does not appear to explicitly teach: 
wherein the first domain comprises standard font text and the second domain comprises cursive script. 
Chen does, however, teach:  
wherein the first domain comprises standard font text and the second domain comprises cursive script. (Chen, p. 3, § 2, ¶ 3: “For each font class, we generate one image per English word, which gives 2.42 million synthetic images for the whole dataset.” P. 3, § 2, ¶ 4: “Besides the synthetic data, we also collected 325 real world test images for the font classes we have in the training set”. p. 5, last paragraph: “When new data or font classes are added to the database, we only need to calculate the new class mean vectors, and estimate the within-class covariances to update the WCCN metric incrementally. As the template model is universally shared by all classes, the template weights do not need to be retrained.4 Therefore, our algorithm can easily adapt to new data or new classes at little added cost.” FIG. 6. “(a) Real world images that are correctly classified (rank one).”

[figure image: media_image8.png (Chen, FIG. 6(a))]

	The examiner notes that in the above FIG. 6(a), a standard font (e.g., “Space Coast,” “Claude,” etc.) in a real-world domain teaches a first domain, and a cursive font (e.g., “Classic,” “of the way,” etc.) in a synthetic domain teaches a second domain.)

Tzeng, Tsai, Csurka, and Chen are analogous art because all four references pertain to dataset shift or mismatch of neural networks across multiple domains such as real-world domain and synthetic domain.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai to incorporate Chen’s recognizing fonts of different font styles (Chen, supra).  The modification enables Tzeng modified by Csurka, when modified by Chen, to automatically recognize the typeface without any knowledge of contents and to facilitate a scalable solution that not only accommodates new font styles that are constantly introduced over time but also provides effective solutions for both real-world and synthetic data (Chen, Abstract: “This paper addresses the large-scale visual font recognition (VFR) problem, which aims at automatic identification of the typeface, weight, and slope of the text in an image or photo without any knowledge of content. Although visual font recognition has many practical applications, it has largely been neglected by the vision community. To address the VFR problem, we construct a large-scale dataset containing 2,420 font classes, which easily exceeds the scale of most image categorization datasets in computer vision. As font recognition is inherently dynamic and open-ended, i.e., new classes and data for existing categories are constantly added to the database over time, we propose a scalable solution based on the nearest class mean classifier (NCM). The core algorithm is built on local feature embedding, local feature metric learning and max-margin template selection, which is naturally amenable to NCM and thus to such open-ended classification problems. The new algorithm can generalize to new classes and new data at little added cost. Extensive experiments demonstrate that our approach is very effective on our synthetic test images, and achieves promising results on real world test images”)
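For illustration only, the nearest class mean (NCM) classification named in the Chen abstract quoted above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not Chen’s implementation: it uses plain Euclidean distance and omits the local feature embedding, the WCCN metric, and the template selection that Chen describes. The class name, method names, font labels, and two-dimensional feature vectors are hypothetical.

```python
import math


class NearestClassMean:
    """Toy NCM classifier: one mean vector per class, Euclidean distance."""

    def __init__(self):
        self.class_means = {}

    def add_class(self, label, samples):
        # Adding a class only computes that class's mean vector;
        # the means of existing classes are left untouched.
        n = len(samples)
        dims = len(samples[0])
        self.class_means[label] = [
            sum(sample[j] for sample in samples) / n for j in range(dims)
        ]

    def predict(self, features):
        # Return the label whose class mean is closest to the input.
        return min(
            self.class_means,
            key=lambda label: math.dist(features, self.class_means[label]),
        )


# Hypothetical 2-D feature vectors for two font classes.
ncm = NearestClassMean()
ncm.add_class("standard", [[0.0, 0.0], [0.2, 0.1]])
ncm.add_class("cursive", [[1.0, 1.0], [0.9, 1.1]])

# A new class is added later without retraining the existing ones.
ncm.add_class("display", [[3.0, 0.0], [3.2, 0.2]])
print(ncm.predict([1.0, 0.9]))
```

The sketch only shows why adding a class is inexpensive in an NCM scheme: a new class contributes one additional mean vector and leaves the stored means unchanged.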
 
With respect to claim 16, Tzeng modified by Tsai and Csurka teaches the apparatus of claim 12, and Csurka further teaches: 
the apparatus comprising the at least one processor. (Csurka, ¶ [0042]: “The digital processor 30 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like. The exemplary digital processor 30, in addition to controlling the operation of the computer system 10, executes the instructions 28 stored in memory 26 for performing the method outlined in FIG. 3.” ¶ [0044]: “The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth.”)

Tzeng, Tsai, and Csurka are analogous art because all three references pertain to domain adaptation of a neural network from one domain to another domain with dataset shift. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai and Csurka to further incorporate Csurka’s processor (Csurka, supra).  The modification provides the capability of configuring a computer or a digital system with the instructions to perform desired or required tasks (Csurka, ¶ [0044]: “The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth.”)

•	Claim(s) 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Tsai et al. Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation (2016) (hereinafter Tsai) and Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka) and further in view of Schlemper et al. US PGPub 20200034998 with effective filing date of 10/11/2018 (hereinafter Schlemper).
With respect to claim 17, Tzeng modified by Tsai and Csurka teaches the apparatus of claim 12 but does not appear to explicitly teach: 
wherein the CDBN module is operatively disposed after a fully connected layer in a spatial model. 

Schlemper does, however, teach: 
wherein the CDBN module is operatively disposed after a fully connected layer in a spatial model. (Schlemper, ¶ [0149]: “FIG. 5B illustrates the architecture of another example neural network model 520 for generating a magnetic resonance (MR) image from input MR spatial frequency data, in accordance with some embodiments of the technology described herein. Neural network 520 has a first neural network sub-model 522 with a batch normalization layer 507 following application of the fully connected layer and prior to the output of data from the first neural network sub-model 522 to the IFFT layer 508.” 
The examiner notes that Schlemper’s model 520 or the first neural network sub-model (522) for processing spatial frequency data teaches a spatial model, and that Schlemper’s batch normalization layer (507) teaches a CDBN module. The examiner further notes that Schlemper’s model 520 or the first neural network sub-model (522) comprising a batch normalization layer (507) that follows a fully connected layer teaches the above limitation.)
Tzeng, Tsai, Csurka, and Schlemper are analogous art because all four references pertain to domain adaptation of a neural network from one domain to another domain with heterogeneous data distributions. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai and Csurka to incorporate Schlemper’s operatively disposing a CDBN module after a fully connected layer (Schlemper, supra).  The modification improves the performance of the underlying neural network and reduces training time (Schlemper, ¶ [0149]: “Introducing a batch normalization layer at this juncture improves the performance of the neural network and may reduce the time required for training. In other respects, neural network models 520 and 500 are the same.”)
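For illustration only, the arrangement discussed above (a batch normalization layer that follows a fully connected layer) can be sketched in Python as follows. This is a minimal sketch of generic batch normalization, not Schlemper’s model 520 and not the claimed CDBN module. The function names, weights, and inputs are hypothetical, and the scale and shift parameters (gamma, beta) are assumed fixed rather than learned.

```python
import math


def fully_connected(batch, weights, bias):
    """Apply y = W x + b to every sample; weights holds one row per output unit."""
    return [
        [sum(w * x for w, x in zip(row, sample)) + b for row, b in zip(weights, bias)]
        for sample in batch
    ]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature to zero mean and unit variance over the batch."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(s[j] for s in batch) / n for j in range(dims)]
    variances = [sum((s[j] - means[j]) ** 2 for s in batch) / n for j in range(dims)]
    return [
        [gamma * (s[j] - means[j]) / math.sqrt(variances[j] + eps) + beta
         for j in range(dims)]
        for s in batch
    ]


# Hypothetical batch of three 2-D inputs and a 2-unit fully connected layer.
inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [[0.5, 0.0], [1.0, 1.0]]
bias = [0.0, 1.0]

hidden = fully_connected(inputs, weights, bias)  # fully connected layer output
normalized = batch_norm(hidden)                  # batch normalization applied after it
print(normalized)
```

In this sketch, each output feature of the fully connected layer has approximately zero mean and unit variance across the batch after normalization.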

•	Claim(s) 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tzeng et al. Adversarial Discriminative Domain Adaptation (17 Feb. 2017) (hereinafter Tzeng) in view of Tsai et al. Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation (2016) (hereinafter Tsai) and Csurka et al. USPGPub US20160078359A published on Mar. 17, 2016 (hereinafter Csurka) and further in view of Bousmalis et al. Unsupervised Pixel–Level Domain Adaptation with Generative Adversarial Networks (Aug. 23, 2017) (hereinafter Bousmalis).
With respect to claim 20, Tzeng modified by Tsai and Csurka teaches the apparatus of claim 18 but does not appear to explicitly teach: 
wherein the instructions are executable to use entropy loss to separate unlabeled target data. 

Bousmalis does, however, teach: 
wherein the instructions are executable to use entropy loss to separate unlabeled target data. (Bousmalis, p. 3, § 3, ¶ 1: “We begin by explaining our model for unsupervised pixel-level domain adaptation (PixelDA) in the context of image classification, though our method is not specific to this particular task. Given a labeled dataset in a source domain and an unlabeled dataset in a target domain, our goal is to train a classifier on data from the source domain that generalizes to the target domain.” P. 3, § 3.1, ¶ 2: “Our goal is to optimize the following minimax objective: 

[equation image: media_image9.png]

where α and β are weights that control the interaction of the losses. $L_d$ represents the domain loss:

[equation image: media_image10.png]

$L_t$ is a task-specific loss, and in the case of classification we use a typical softmax cross–entropy loss:

[equation image: media_image11.png]
”
	The examiner notes that Bousmalis’ unlabeled dataset in a target domain teaches unlabeled target data, and that Bousmalis’ using the cross-entropy loss and data from the source domain to train a classifier that classifies the unlabeled dataset in the target domain teaches using entropy loss to separate unlabeled target data as claimed.)
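For illustration only, a generic softmax cross-entropy loss of the kind named in the quoted passage can be sketched in Python as follows. This is the textbook form and not a reproduction of Bousmalis’ Eq. (3), which is not reproduced in text here. The function names, logits, and labels are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability that the softmax assigns to the true class index."""
    return -math.log(softmax(logits)[label])


def mean_cross_entropy(batch_logits, labels):
    """Average the per-sample loss over a labeled batch."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Hypothetical classifier outputs for two labeled samples and three classes.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 2]
print(mean_cross_entropy(batch_logits, labels))
```

The loss is small when the classifier assigns high probability to the labeled class and grows as that probability falls.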
	
Tzeng, Tsai, Csurka, and Bousmalis are analogous art because all four references pertain to domain adaptation of a neural network from one domain to another domain with heterogeneous data distributions. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Tzeng in view of Tsai and Csurka to further incorporate Bousmalis’ separating unlabeled data using entropy loss (Bousmalis, supra).  The modification greatly stabilizes training of a classifier by training the classifier with both non-adapted images and adapted images to optimize an objective function that is defined in terms of an entropy loss (Bousmalis, p. 3, right-hand column, § 3.1, ¶ 2: “Our goal is to optimize the following minimax objective: Eq. (1) (reproduction omitted)”; and “L_t is a task-specific loss, and in the case of classification we use a typical softmax cross–entropy loss: Eq. (3) (reproduction omitted).” P. 3, right-hand column, last paragraph – p. 4, left-hand column, first paragraph: “Notice that we train T with both adapted and non-adapted source images. When training T only on adapted images, it’s possible to achieve similar performance, but doing so may require many runs with different initializations due to the instability of the model. Indeed, without training on source as well, the model is free to shift class assignments (e.g. class 1 becomes 2, class 2 becomes 3 etc[.]) while still being successful at optimizing the training objective. We have found that training classifier T on both source and adapted images avoids this scenario and greatly stabilizes training (See Table 5).”)
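For illustration only, and not as part of the cited teaching, the task-specific loss quoted above can be sketched as follows. This is a minimal sketch that assumes the classifier outputs raw logits and that labels are one-hot vectors. The function names softmax_cross_entropy and task_loss are hypothetical and do not appear in Bousmalis. The sketch sums a softmax cross-entropy term for the adapted image and one for the non-adapted source image, consistent with the quoted statement that T is trained on both.

```python
import math


def softmax_cross_entropy(logits, one_hot):
    """Softmax cross-entropy for one example: -sum_k y_k * log(softmax(logits)_k)."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(y * (v - log_norm) for v, y in zip(logits, one_hot))


def task_loss(source_logits, adapted_logits, labels):
    """Batch average of the per-example sum of the adapted and source loss terms.

    source_logits:  classifier logits for the non-adapted source images (assumed input)
    adapted_logits: classifier logits for the adapted source images (assumed input)
    labels:         one-hot source labels shared by both versions of each image
    """
    total = 0.0
    for src, adp, y in zip(source_logits, adapted_logits, labels):
        total += softmax_cross_entropy(adp, y) + softmax_cross_entropy(src, y)
    return total / len(labels)
```

Under these assumptions, a uniform two-class prediction contributes log 2 per term, so a single example with uniform logits for both the source and the adapted image yields a task loss of 2 log 2.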

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
A.	Gretton et al. A Kernel Method for the Two-Sample Problem (15 May 2008) teaches a framework for analyzing and comparing heterogeneous distributions by obtaining a KL divergence estimate by approximating the ratio of densities (or its log) of two heterogeneous domains with a function in an RKHS (reproducing kernel Hilbert space).
B.	Simonyan et al. Two-Stream Convolutional Networks for Action Recognition in Videos (Nov. 12, 2014) teaches two-stream convolutional networks for action recognition that include a spatial stream convolution network operating on individual video frames and a temporal stream convolution network that exploits motion.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERICH C. TZOU whose telephone number is (571)272-9852. The examiner can normally be reached Monday-Friday 6:00AM-5:00PM PST with alternate Fridays off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached at 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.C.T./Examiner, Art Unit 2126
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126