Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed on 8/23/22 has been entered and made of record. Claims 1, 12, 20, and 27 are amended. Claims 2, 13, and 21 are cancelled. Claims 1, 3-12, 14-20, and 22-27 are pending.

Response to Arguments
Applicant’s arguments with respect to claims 1, 12 and 20 have been considered but they are not persuasive.
Applicant asserts that Herbster relates to "quantum machine learning" approaches that make use of a quantum computing device (e.g., a "programmable quantum annealing device" as recited in paragraph [0009] of Herbster), and that a person having ordinary skill in the art before the effective filing date of the present application would understand that it would not be obvious how to combine techniques that make use of classical adversarial machine learning, such as those described in Shen, with the quantum machine learning approaches described in Herbster, because the techniques rely on fundamentally different types of computing devices (classical computing devices versus quantum computing devices). Applicant therefore submits that there is no apparent reason why a person having ordinary skill in the art before the effective filing date of the present application would combine the teachings of Shen and Herbster to arrive at the claimed embodiments of claims 1, 12 and 20 of the application as filed (p. 16 of Remarks).
Examiner notes that Shen discloses data samples from an original domain and corresponding labels in a neural network at p. 3; mapping different domains to a common latent space where the feature distributions are close at p. 2; and estimating the empirical Wasserstein distance between the source and target feature representations at p. 1. Examiner cites Herbster to teach some well-known features of neural network applications, such as a machine learning approach applied to robotics applications in [0002]; encoder parameters in [0068] and classifier parameters in [0048]; using the encoder part to generate a feature vector in [0015]; and calculating an appropriate loss of function which measures a difference between the updated feature vector generated by the RBM and the original feature vector input to the input layer of the RBM by the encoder in [0073]. These teachings from Herbster do not depend on a quantum machine learning approach. Adding these features is merely the combination of known elements to perform their known functions in a neural network approach. "The combination of familiar elements according to known methods is likely to be obvious when it does no more than yield predictable results." KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 416 (2007).

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claim 4 is rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.
Claim 4 repeats the same limitations as its parent claim 1 and therefore fails to further limit the scope of parent claim 1. Applicant may cancel the claim, amend the claim to place the claim in proper dependent form, rewrite the claim in independent form, or present a sufficient showing that the dependent claim complies with the statutory requirements.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-8, 11-12, 14-16, 19-20, 22-24 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over IDS_Shen et al. (Wasserstein Distance Guided Representation Learning for Domain Adaptation) in view of Herbster et al. (US 2020/0005154) and IDS_Kolouri et al. (Sliced Wasserstein Kernels for Probability Distributions).
As to Claim 1, Shen teaches A method for training a controller to control a robotic system in a target domain (Shen, Abstract), the method comprising: 
receiving a neural network of an original controller for controlling the robotic system based on a plurality of origin data samples from an origin domain and corresponding labels in a label space, the neural network of the original controller comprising a plurality of encoder parameters and a plurality of classifier parameters (Shen discloses
[Image media_image1.png: excerpt from Shen, not reproduced];
“In our adversarial representation learning approach, there is a feature extractor which can be implemented by a neural network” at p. 3. Herbster further discloses that machine learning approaches to computer vision can be applied to robotics applications in [0002]; encoder parameters (biases and weights) in [0068] and classifier parameters in [0048, 0107]), the neural network being trained to:
map an input data sample from the origin domain to a feature vector in a feature space in accordance with the encoder parameters; and assign a label of the label space to the input data sample based on the feature vector in accordance with the classifier parameters (Shen discloses “while symmetric feature-based methods map different domains to a common latent space where the feature distributions are close” at p. 2;
[Images media_image2.png and media_image3.png: excerpts from Shen, not reproduced] at p. 3; see also Fig 1 and Algorithm 1 at p. 4; labeling function at p. 5. Herbster also discloses “The method may include using the encoder part to generate a feature vector φ(x) which is a compressed representation of the image data” in [0015]; “The encoder part is arranged to receive input image data (4) at its input layer and to compress the input image data into a feature vector containing an abstract representation of the most relevant extracted features of the input image” in [0066]; “The classifier is a neural network that takes the encoded/compressed image data as input and outputs a label corresponding to the class” in [0092], “The classifier outputs a label corresponding to the class that has been assigned to the image” in [0096]);
updating the encoder parameters to minimize a dissimilarity, in the feature space, between: a plurality of origin feature vectors computed from the origin data samples; and a plurality of target feature vectors computed from a plurality of target data samples from the target domain (Shen discloses “In this paper, we propose a domain invariant representation learning approach to reduce domain discrepancy for domain adaptation, namely Wasserstein Distance Guided Representation Learning (WDGRL)… WDGRL trains a domain critic network to estimate the empirical Wasserstein distance between the source and target feature representations. The feature extractor network will then be optimized to minimize the estimated Wasserstein distance in an adversarial manner. By iterative adversarial training, we finally learn feature representations invariant to the covariate shift between domains” at p. 1. Herbster also discloses “The quantum pre-training is arranged to calculate an appropriate loss of function which measures a difference between the updated feature vector generated by the RBM and the original feature vector input to the input layer of the RBM by the encoder. It is arranged to use the value of that loss function in adjusting the biases and weights applied to the nodes of the RBM in such a way as to minimize the value of the loss function thereby optimizing the accuracy of the updated feature vector generated by the RBM” in [0073]), 
the target data samples having a smaller cardinality than the origin data samples (Shen discloses “Domain adaptation defines the problem when the target domain labeled data is insufficient, while the source domain has much more labeled data” at p. 1); and 
updating the controller with the updated encoder parameters to control the robotic system in the target domain (Shen discloses “Since WDGRL guarantees transferability of the learned representations, the shared discriminator can be directly applied to target domain prediction when training finished” at p. 4. Herbster also discloses “The quantum pre-training is arranged to calculate an appropriate loss of function which measures a difference between the updated feature vector generated by the RBM and the original feature vector input to the input layer of the RBM by the encoder. It is arranged to use the value of that loss function in adjusting the biases and weights applied to the nodes of the RBM in such a way as to minimize the value of the loss function thereby optimizing the accuracy of the updated feature vector generated by the RBM” in [0073]; and that machine learning approaches to computer vision can be applied to robotics applications in [0002]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Shen with the teaching of Herbster so as to provide a trained machine used to initialize a neural network for image classification thereby providing a trained computer system for use in classification of an image (Herbster, Abstract).
Shen teaches “minimize the estimated Wasserstein distance” in Abstract. However, Shen and Herbster do not explicitly teach a sliced Wasserstein distance. Kolouri, in combination with Shen and Herbster, further teaches the following limitation:
wherein the dissimilarity is computed in accordance with a sliced Wasserstein distance between the origin feature vectors in the feature space and the target feature vectors in the feature space (Shen discloses “In this paper, we propose a domain invariant representation learning approach to reduce domain discrepancy for domain adaptation, namely Wasserstein Distance Guided Representation Learning (WDGRL)… WDGRL trains a domain critic network to estimate the empirical Wasserstein distance between the source and target feature representations.” at p. 1. Kolouri further discloses “In this paper, we exploit the widely used kernel methods and provide a family of provably positive definite kernels based on the Sliced Wasserstein distance” in Abstract; see also section 2.2 The Sliced Wasserstein distance at p. 5260).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Shen and Herbster with the teaching of Kolouri because the Sliced Wasserstein distance satisfies the basic requirements for being used as a positive definite kernel in a variety of regression-based pattern recognition tasks, and has concrete theoretical and practical advantages (Kolouri, p. 5259).
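
For illustration only, a sliced Wasserstein distance between two sets of feature vectors can be sketched as follows. This is a minimal Monte Carlo sketch under stated assumptions and is not taken from Shen, Herbster, Kolouri, or the application: the function name sliced_wasserstein, its parameters, and the use of a fixed quantile grid (so that the two sets may have different cardinalities) are all hypothetical choices made for this example.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=64, p=2, n_quantiles=100, seed=0):
    """Monte Carlo estimate of a sliced p-Wasserstein distance (illustrative).

    x: (n_x, d) array of feature vectors; y: (n_y, d) array. n_x may differ from n_y.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random unit directions ("slices") in the feature space.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both sets onto every direction: shapes (n_x, L) and (n_y, L).
    x_proj = x @ theta.T
    y_proj = y @ theta.T
    # In one dimension, the Wasserstein distance compares quantile functions.
    # A shared quantile grid handles sets of different sizes.
    qs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    xq = np.quantile(x_proj, qs, axis=0)
    yq = np.quantile(y_proj, qs, axis=0)
    # Average |difference|^p over quantiles and directions, then take the p-th root.
    return float(np.mean(np.abs(xq - yq) ** p) ** (1.0 / p))

# Hypothetical data: many origin feature vectors, fewer target feature vectors.
rng = np.random.default_rng(1)
origin_feats = rng.normal(size=(200, 3))
target_feats = rng.normal(size=(50, 3))
print(sliced_wasserstein(origin_feats, target_feats))
```

In this sketch the distance is zero when a set is compared with itself and grows as one set is translated away from the other, which is the behavior a dissimilarity measure in the feature space is expected to show.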

As to Claim 3, Shen in view of Herbster and Kolouri teaches The method of claim 1, wherein the updating the encoder parameters comprises iteratively computing a plurality of intermediate encoder parameters, each iteration comprising: 
computing the origin feature vectors in the feature space (Herbster discloses “The method may include using the encoder part to generate a feature vector φ(x) which is a compressed representation of the image data” in [0015]); 
computing the target feature vectors in the feature space in accordance with the intermediate encoder parameters; computing the dissimilarity between the origin feature vectors and the target feature vectors (Herbster discloses “The encoder part is arranged to receive input image data (4) at its input layer and to compress the input image data into a feature vector containing an abstract representation of the most relevant extracted features of the input image. This feature vector is then input to the input layer of the decoder (6) and subsequently passed through the layers of the decoder so as to de-compress the feature vector and reproduce a close approximation of the input image” in [0066]. Here, the encoder parameters used to reproduce a close approximation of the input image correspond to the “intermediate encoder parameters”. Herbster also discloses “An input image is first compressed, using the trained encoder. This produces a small compressed vector of data that is input to the classifier” in [0095]; “The quantum pre-training is arranged to calculate an appropriate loss of function which measures a difference between the updated feature vector generated by the RBM and the original feature vector input to the input layer of the RBM by the encoder” in [0073]; see also Fig 1); 
updating the intermediate encoder parameters to reduce the dissimilarity between the origin feature vectors and the target feature vectors (Herbster discloses “When training the RBM one may aim to find the values of the biases and the weights of the nodes of the RBM, which maximize the value of a log-likelihood function in respect of the training data applied to the RBM during its training” in [0040]; “and to input the value of that loss function into the encoder for use by the encoder in adjusting the biases and weights applied to the notes of the encoder in such a way as to minimize the value of the loss function thereby optimizing the accuracy of the compressed representation of the input image produced by the decoder” in [0068]); 
determining whether the dissimilarity is minimized; in response to determining that the dissimilarity is not minimized, proceeding with another iteration with the updated intermediate encoder parameters as the intermediate encoder parameters and in response to determining that the dissimilarity is minimized, outputting the intermediate encoder parameters as the updated encoder parameters (Shen discloses “we first train the domain critic network to optimality by optimizing the max operator via gradient ascent and then update the feature extractor by minimizing the classification loss computed by labeled source data and the estimated Wasserstein distance simultaneously. The learned representations can be domain invariant and target discriminative since the parameter θg receives the gradients from both the domain critic and the discriminator loss” at p. 4. Herbster further discloses “It is arranged to use the value of that loss function in adjusting the biases and weights applied to the nodes of the RBM in such a way as to minimize the value of the loss function thereby optimizing the accuracy of the updated feature vector generated by the RBM. In this sense, this iterative process of repeatedly updating and optimizing the feature vector input to the RBM, by repeated sampling from the Boltzmann distribution generated by the quantum annealer and subsequent back-propagation through the RBM for comparison with the original feature vector, is analogous to the operation of an auto-encoder in which the RBM serves as both the encoder and decoder parts” in [0073]).
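
For illustration only, the iterative structure recited in claim 3 (compute feature vectors, compute the dissimilarity, update the intermediate encoder parameters, and stop once the dissimilarity is treated as minimized) can be sketched as follows. This sketch is not the method of Shen or Herbster: it assumes a hypothetical linear encoder W, substitutes a simple squared difference of feature means for the dissimilarity, keeps the origin feature vectors fixed, and treats "minimized" as falling below a tolerance. The names mean_discrepancy and adapt_encoder are invented for this example.

```python
import numpy as np

def mean_discrepancy(origin_feats, target_feats):
    """Stand-in dissimilarity: squared distance between the feature means."""
    diff = target_feats.mean(axis=0) - origin_feats.mean(axis=0)
    return float(diff @ diff)

def adapt_encoder(W, origin_feats, target_x, tol=1e-8, max_iters=200):
    """Iteratively update a linear encoder W until the dissimilarity is <= tol.

    Assumes the mean of the target samples is not the zero vector.
    """
    m_t = target_x.mean(axis=0)       # mean target sample
    m_o = origin_feats.mean(axis=0)   # mean origin feature vector (held fixed)
    step = 0.25 / float(m_t @ m_t)    # step size that halves the residual each pass
    W = W.copy()                      # intermediate encoder parameters
    for _ in range(max_iters):
        # Compute target feature vectors with the intermediate parameters.
        loss = mean_discrepancy(origin_feats, target_x @ W.T)
        if loss <= tol:               # dissimilarity treated as minimized
            break
        # Gradient of ||W m_t - m_o||^2 with respect to W.
        grad = 2.0 * np.outer(W @ m_t - m_o, m_t)
        W -= step * grad              # update the intermediate parameters
    return W, mean_discrepancy(origin_feats, target_x @ W.T)

# Hypothetical data: 200 origin samples, 40 shifted target samples.
rng = np.random.default_rng(0)
origin_x = rng.normal(size=(200, 4))
target_x = rng.normal(loc=1.0, size=(40, 4))
W0 = rng.normal(size=(3, 4))
origin_feats = origin_x @ W0.T        # origin features from the original encoder
W_updated, final_loss = adapt_encoder(W0, origin_feats, target_x)
print(final_loss)
```

The returned W_updated plays the role of the updated encoder parameters that are output once the loop exits.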

Claim 4 is rejected based upon similar rationale as Claim 1.

As to Claim 5, Shen in view of Herbster and Kolouri teaches The method of claim 3, wherein the computing the origin feature vectors is performed by an origin encoder (Herbster, Fig 1 & 3).

As to Claim 6, Shen in view of Herbster and Kolouri teaches The method of claim 3, wherein the computing the origin feature vectors is performed in accordance with the intermediate encoder parameters (Herbster discloses “…and to input the value of that loss function into the encoder for use by the encoder in adjusting the biases and weights applied to the notes of the encoder in such a way as to minimize the value of the loss function thereby optimizing the accuracy of the compressed representation of the input image produced by the decoder” in [0068], see also Fig 1).

As to Claim 7, Shen in view of Herbster and Kolouri teaches The method of claim 1, wherein the target data samples comprise a plurality of target samples and a plurality of corresponding target labels (Shen discloses “Domain adaptation defines the problem when the target domain labeled data is insufficient, while the source domain has much more labeled data” at p. 1).

As to Claim 8, Shen in view of Herbster and Kolouri teaches The method of claim 1, wherein the target data samples comprise a plurality of unlabeled target samples (Shen discloses “Domain adaptation is a popular subject in transfer learning (Pan and Yang 2010). It concerns covariate shift between two data distributions, usually labeled source data and unlabeled target data” at p. 2).

As to Claim 11, Shen in view of Herbster and Kolouri teaches The method of claim 1, wherein the neural network comprises a convolutional neural network, a recurrent neural network, a capsule network, or combinations thereof (Shen teaches a convolutional neural network at p. 6, and a back-propagation training algorithm applied to a recurrent neural network at p. 4).

Claim 12 recites similar limitations as claim 1 but in a computer readable medium form. Therefore, the same rationale used for claim 1 is applied.

Claim 14 is rejected based upon similar rationale as Claim 3.
Claim 15 is rejected based upon similar rationale as Claim 7.
Claim 16 is rejected based upon similar rationale as Claim 8.

Claim 19 is rejected based upon similar rationale as Claim 11.
Claim 20 recites similar limitations as claim 1 but in a computer readable medium form. Therefore, the same rationale used for claim 1 is applied.

Claim 22 is rejected based upon similar rationale as Claim 3.
Claim 23 is rejected based upon similar rationale as Claim 7.
Claim 24 is rejected based upon similar rationale as Claim 8.
Claim 27 is rejected based upon similar rationale as Claim 11.


Claims 9-10, 17-18 and 25-26 are rejected under 35 U.S.C. 103 as being unpatentable over IDS_Shen in view of Herbster, Kolouri and Zou et al. (US 2019/0130220).
Claim 9 recites similar limitations as claims 3 & 6 except for the following limitations:
computing predicted labels for the target feature vectors in accordance with the classifier parameters, each of the predicted labels being associated with a confidence; defining a plurality of pseudo-labels corresponding to the predicted labels having confidences exceeding a threshold (Shen discloses target domain prediction at p. 4. Zou further discloses “The method includes determining a target segmentation loss for training a neural network to perform semantic segmentation on a target domain image, determining a value of a pseudo-label of the target image by reducing the target segmentation loss while providing a supervision of the training over the target domain” in [0004]; “Since the target domain ground truths are not available, self-training based domain adaptation generates network predictions on target images and incorporates the most confident predictions in network training as approximated target ground truths (herein referred to as pseudo-labels). Once the network parameters are updated, the updated network regenerates the pseudo-labels on target images, and incorporate them for a next round of network training. This process is iteratively repeated for multiple rounds. Mathematically, each round of pseudo-label generation and network training can be formulated as minimizing the loss function shown in Eq. (2)” in [0038]. Official notice is taken, see MPEP 2144.03, that selecting the “most confident” predictions as those whose confidence exceeds a threshold is well known in the art).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Shen, Herbster and Kolouri with the teaching of Zou so as to determine a value of a pseudo-label of the target image by reducing the target segmentation loss while providing a supervision of the training over the target domain (Zou, Abstract).
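
For illustration only, defining pseudo-labels from predicted labels whose confidence exceeds a threshold can be sketched as follows. This is a minimal sketch and not Zou's formulation: the function name select_pseudo_labels, the threshold value 0.9, and the use of the maximum class probability as the confidence are assumptions made for this example.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep predicted labels whose confidence exceeds the threshold.

    probs: (n_samples, n_classes) array of class probabilities from a classifier.
    Returns the indices of the retained samples and their pseudo-labels.
    """
    probs = np.asarray(probs, dtype=float)
    confidences = probs.max(axis=1)   # confidence of each predicted label
    predicted = probs.argmax(axis=1)  # predicted label for each sample
    keep = confidences > threshold    # only sufficiently confident predictions
    return np.flatnonzero(keep), predicted[keep]

# Hypothetical classifier outputs for three target samples and two classes.
probs = np.array([[0.95, 0.05],
                  [0.60, 0.40],
                  [0.08, 0.92]])
idx, labels = select_pseudo_labels(probs, threshold=0.9)
print(idx, labels)  # samples 0 and 2 are retained, with labels 0 and 1
```

The second sample is dropped because its highest class probability, 0.60, does not exceed the threshold.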

As to Claim 10, Shen in view of Herbster, Kolouri and Zou teaches The method of claim 9, wherein the updating the intermediate encoder parameters alternates between:
the minimizing the dissimilarity between the origin feature vectors and the target feature vectors (Herbster discloses “The quantum pre-training is arranged to calculate an appropriate loss of function which measures a difference between the updated feature vector generated by the RBM and the original feature vector input to the input layer of the RBM by the encoder. It is arranged to use the value of that loss function in adjusting the biases and weights applied to the nodes of the RBM in such a way as to minimize the value of the loss function thereby optimizing the accuracy of the updated feature vector generated by the RBM” in [0073]); and
the minimizing the classification loss of the origin data samples (Shen discloses “for domain adaptation to make the source and target feature representations indistinguishable” at p. 1; minimize the estimated Wasserstein distance between the source and target samples in Abstract).

Claim 17 is rejected based upon similar rationale as Claim 9.
Claim 18 is rejected based upon similar rationale as Claim 10.

Claim 25 is rejected based upon similar rationale as Claim 9.
Claim 26 is rejected based upon similar rationale as Claim 10.

Conclusion
THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEIMING HE whose telephone number is (571)270-1221.  The examiner can normally be reached on Monday-Friday, 8:30am-5:00pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached on 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Weiming He/
Primary Examiner, Art Unit 2612