Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 1 is objected to because of the following informalities:  
Claim 1, Line 12: “group truth” should read “ground truth”.  
Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because the “computer program embodied on computer-readable storage” can be construed as software/data per se, as computer-readable storage can include transitory forms of storage. Furthermore, the specification does not explicitly exclude transitory signals or forms of storage as possible storage media for the computer program. A recommended remedy is to recite that the computer program is embodied on “non-transitory” computer-readable storage.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-2, 6, 17, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lee et al., “Estimation of Individual Treatment Effect In Latent Confounder Models via Adversarial Learning”, arXiv preprint arXiv:1811.08943 (2018) (hereafter: Lee).

Regarding Claim 1, Lee teaches: a computer-implemented method of machine learning (Pages: 2-3), the method comprising: receiving a plurality of observed data points each comprising a respective vector of feature values, wherein for each observed data point, the respective feature values are values of a plurality of different features of a feature vector, and each observed data point represents a respective observation of a ground truth as observed in the form of the respective values of the feature vector; 
(Page 2: Causal Effect with Latent Confounders: the observational dataset corresponds to the observed data points where both contain a plurality of feature vectors with respective feature values.)

learning parameters of a machine-learning model based on the observed data points
(Page 6: Appendix: Optimization of CEGAN)

wherein the machine-learning model comprises one or more statistical models arranged to model a causal relationship between the feature vector and a latent vector, a classification and a manipulation vector

(Page 2: Figure 2: Figure shows modeling a relationship between the feature vector, x, the latent vector, z, classification, y, and manipulation vector, t.)

the manipulation vector representing an effect of potential manipulations occurring between the ground truth and the observation of the group truth as observed via said feature vector, 

(Page 1: Figure 1 shows causal mappings between x, t, y, and z, where the manipulation vector, t, affects the outcome of y given the feature vector, x.)
wherein the learning comprises learning parameters of the one or more statistical models to map between the feature vector, latent vector, classification and manipulation vector.

(Page 6: Appendix: Optimization of CEGAN. This section additionally describes the optimization of mapping the feature, latent, and manipulation vector and classification.)


Regarding Claim 2, Lee teaches: the method of claim 1, wherein the learning comprises at least a training phase wherein each of the data points used in the training phase further comprises a respective value of the classification.
(Page 6: Appendix: Optimization of CEGAN. This section additionally describes the optimization of mapping the feature, latent, and manipulation vector and classification.)

Regarding Claim 6, Lee teaches: the method of claim 1, wherein the statistical models comprises one or more first statistical models and one or more second statistical models, wherein the one or more second statistical models are arranged to model the causal relationship between the manipulation vector and the feature vector.

(Page 2: Figure 2: Figure shows modeling a relationship between the feature vector, x, the latent vector, z, classification, y, and manipulation vector, t. The right side of the figure shows the modeling of the feature vector, x, and manipulation vector, t.)
Regarding Claim 17, Lee teaches: the method of claim 1, wherein each of any one, some or all of said statistical models is a neural network, in which said parameters are weights (Page 7: Simulation Settings: “A dropout probability of 0.6 is assumed and Xavier and zero initializations are applied for weight matrices and bias vectors, respectively”; this section additionally describes the architecture of the model as a fully-connected neural network.).
Regarding Claim 19, Claim 19 recites a computer program embodied on computer-readable storage that implements the method of Claim 1. Therefore, the rejection of Claim 1 is equally applied (Page 7: Simulation Settings: suggests that the methods described are implemented on a computer system with a computer program.).
Regarding Claim 20, Claim 20 recites a computer system that implements the method of Claim 1. Therefore, the rejection of Claim 1 is equally applied (Page 7: Simulation Settings: suggests that the methods described are implemented on a computer system with a computer program.).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3-5 and 7-16 are rejected under 35 U.S.C. 103 as being unpatentable over Lee as applied to the claims above, and further in view of Kocaoglu, Murat, et al., "Causalgan: Learning causal implicit generative models with adversarial training." arXiv preprint arXiv:1709.02023 (2017) (hereafter: Kocaoglu).
Regarding Claim 3, Lee teaches: the method of claim 1, but does not explicitly disclose wherein the observed data points comprise a first group of the data points which do not include the effect of at least one manipulation, and a second group of said data points which do include the effect of the at least one manipulation.
In a related art, Kocaoglu teaches: wherein the observed data points comprise a first group of the data points which do not include the effect of at least one manipulation, and a second group of said data points which do include the effect of the at least one manipulation (Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; suggests that there are two separate data groups where the first group has no conditioning/manipulation and the second does.) for separating observed data points based on manipulation and conditional effects.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Lee with the above teachings of Kocaoglu to incorporate the separation of different data points into two groups. The motivation in doing so would lie in the ability to train a model using both data in which causal relationships are not accounted for and data in which they are accounted for.
Regarding Claim 4, Lee, in view of Kocaoglu, further teaches: the method of claim 3, wherein: the learning comprises at least a training phase wherein each of the data points used in the training phase further comprises a respective value of the classification; the learning further comprises a fine-tuning phase following the training phase, wherein each of the data points used in the training phase is not labelled with a value of the classification; and the data points used in the training phase comprise the first group, and the data points used in the fine-tuning phase comprise the second group (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; Kocaoglu discloses a two-step training method where a model is trained using data without conditioning/causal relationship and further refined using conditional/causal data.)
Regarding Claim 5, Lee, in view of Kocaoglu, teaches: the method of claim 4, wherein the statistical models comprises one or more first statistical models and one or more second statistical models, wherein the one or more second statistical models are arranged to model the causal relationship between the manipulation vector and the feature vector; 

(Lee: Page 2: Figure 2: Figure shows modeling a relationship between the feature vector, x, the latent vector, z, classification, y, and manipulation vector, t. The right side of the figure shows the modeling of the feature vector, x, and manipulation vector, t.)
a) when learning based on the first group of data points, the manipulation vector is set to a null value, and the parameters of the one or more first statistical models are learned whilst the parameters of the one or more second statistical models are fixed (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; training without conditional means that the manipulation vector is empty/null), whereas b) when learning based on the second group of data points, the manipulation vector is either set to a known value representing the at least one manipulation if known or the manipulation vector is inferred if the at least one manipulation is not known, and the parameters of the at least one or more second statistical models are learned (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; conditioned learning means the manipulation vector is not empty. The right side of Lee: Figure 2 further discloses an inference subnetwork to help determine manipulations.).
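For illustration only, and not as a characterization of the actual implementation in Lee, Kocaoglu, or the present application, the two-phase schedule recited in limitations a) and b) of claim 5 could be sketched as follows. The linear toy model, the function names (`predict`, `sgd_step`, `two_phase_fit`, `make_toy_data`), the learning rate, and the synthetic data are all assumptions introduced for this sketch. Phase b) here learns only the second model, which is one way of satisfying "at least" the second model being learned.

```python
import random

def predict(w_first, w_second, y, z, t):
    # Toy stand-in: the "first" model sees classification y and latent z,
    # the "second" model sees only the manipulation t.
    return w_first[0] * y + w_first[1] * z + w_second * t

def sgd_step(w_first, w_second, y, z, t, x, lr, train_first, train_second):
    # One squared-error gradient step; a frozen model is left untouched.
    err = predict(w_first, w_second, y, z, t) - x
    if train_first:
        w_first = [w_first[0] - lr * err * y, w_first[1] - lr * err * z]
    if train_second:
        w_second = w_second - lr * err * t
    return w_first, w_second

def two_phase_fit(first_group, second_group, lr=0.05, epochs=200):
    w_first, w_second = [0.0, 0.0], 0.0
    # Phase (a): data without manipulation; t is set to a null value (0.0)
    # and the second model's parameter stays fixed.
    for _ in range(epochs):
        for y, z, x in first_group:
            w_first, w_second = sgd_step(
                w_first, w_second, y, z, 0.0, x, lr, True, False)
    w_second_after_a = w_second
    # Phase (b): data with a known manipulation t; the second model is learned.
    for _ in range(epochs):
        for y, z, t, x in second_group:
            w_first, w_second = sgd_step(
                w_first, w_second, y, z, t, x, lr, False, True)
    return w_first, w_second, w_second_after_a

def make_toy_data(n=50, seed=0):
    # Synthetic ground truth (assumed for the sketch): x = 2*y - z + 3*t.
    rng = random.Random(seed)
    first_group, second_group = [], []
    for _ in range(n):
        y, z = rng.choice([0.0, 1.0]), rng.uniform(-1.0, 1.0)
        first_group.append((y, z, 2.0 * y - z))
        y, z = rng.choice([0.0, 1.0]), rng.uniform(-1.0, 1.0)
        t = rng.uniform(0.5, 1.5)
        second_group.append((y, z, t, 2.0 * y - z + 3.0 * t))
    return first_group, second_group
```

In this sketch the second model's parameter is unchanged at the end of phase (a), because the null manipulation is combined with an explicit freeze, and it is fitted only once the second group of data points is used.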
Regarding Claim 7, Lee, in view of Kocaoglu, further teaches: the method of claim 1, wherein the one or more statistical models comprise one or more generative models mapping from the classification, latent vector and manipulation vector as inputs to the feature vector as an output; 

(Lee: Page 2: Figure 2: Figure shows modeling a relationship between the feature vector, x, the latent vector, z, classification, y, and manipulation vector, t. The left side of the figure shows the modeling of the feature vector, x, manipulation vector, t, outcome, y, and latent vector, z.)
While Lee does not explicitly disclose mapping the classification, manipulation vector, and latent vector to the feature vector, Kocaoglu discloses that a causal generative model can be generated for a given causal graph (Kocaoglu: Abstract: “We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph.”). Kocaoglu therefore suggests that the different vectors of a generative model can be arranged in different combinations of inputs, including mapping the classification, manipulation vector, and latent vector to the feature vector. The motivation in doing so would lie in generating and using causal generative models for specific causal graphs to analyze different causal relationships.

the learning comprising learning parameters of the one or more generative models which map the classification, latent vector and manipulation to the feature vector.
(Lee: Page 6: Appendix: Optimization of CEGAN. This section additionally describes the optimization of mapping the feature, latent, and manipulation vector and classification.)
Regarding Claim 8, Lee, in view of Kocaoglu, teaches: the method of claim 7, wherein: the statistical models comprises one or more first statistical models and one or more second statistical models, (Lee: Figure 2)

wherein the one or more statistical models are arranged to model the causal relationship between the manipulation vector and the feature vector (Lee: Page 2: Figure 2: Figure shows modeling a relationship between the feature vector, x, the latent vector, z, classification, y, and manipulation vector, t. The right side of the figure shows the modeling of the feature vector, x, and manipulation vector, t.); the one or more first statistical models comprise a first one or more said generative models which take the latent vector and classification but not the manipulation vector as respective inputs (Kocaoglu: Abstract: “We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph.”; the inputs and vectors required for a causal implicit generative model are dependent on a given causal graph, which can include any number and arrangement of elements.); and the one or more second statistical models comprise a second, separate one of said generative models which takes the manipulation vector as respective input but not the latent vector nor the classification (Kocaoglu: Abstract: “We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph.”; the inputs and vectors required for a causal implicit generative model are dependent on a given causal graph, which can include any number and arrangement of elements.); wherein each of the first and second generative models is configured to map its respective input to a respective output, the outputs of the first and second generative models being mapped to the feature vector (Kocaoglu: Abstract: “We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph.”; the inputs and vectors required for a causal implicit generative model are dependent on a given causal graph, which can include any number and arrangement of elements; Lee: Figure 2 shows two separate generative models mapping their inputs to an output.).
Regarding Claim 9, Lee, in view of Kocaoglu, teaches: the method of claim 8, wherein: the first generative models comprise a generative model taking the classification as a respective input but not the latent vector nor the manipulation vector (Kocaoglu: Abstract: “We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph.”; the inputs and vectors required for a causal implicit generative model are dependent on a given causal graph, which can include any number and arrangement of elements.), and a separate generative model which takes the latent vector as an input but not the classification nor the manipulation vector (Kocaoglu: Abstract: “We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph.”; the inputs and vectors required for a causal implicit generative model are dependent on a given causal graph, which can include any number and arrangement of elements.).

Regarding Claim 10, Lee, in view of Kocaoglu, teaches: the method of claim 8, wherein the first statistical models further comprise another of said generative models arranged as a merging generative model, mapping the outputs of the first and second generative networks to the feature vector via the merging generative model.
(Lee: Page 2: Figure 2: Figure shows modeling a relationship between the feature vector, x, the latent vector, z, classification, y, and manipulation vector, t. The discriminator in Figure 2 combines the outputs of the two generative models on either side of it.)
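For illustration only, the arrangement recited in claims 8 and 10 (a first generative model fed by the classification and latent vector, a second generative model fed only by the manipulation vector, and a merging generative model mapping both outputs to the feature vector) could be sketched as follows. The scalar toy functions, their names, and the weight dictionary `w` are hypothetical and are not taken from Lee, Kocaoglu, or the application.

```python
import math

def first_generator(y, z, w):
    # Hypothetical first generative model: sees classification y and latent z only.
    return math.tanh(w["y"] * y + w["z"] * z)

def second_generator(t, w):
    # Hypothetical second generative model: sees the manipulation t only.
    return w["t"] * t

def merging_generator(h_first, h_second, w):
    # Hypothetical merging model: combines both outputs into one feature value.
    return w["a"] * h_first + w["b"] * h_second

def generate_feature(y, z, t, w):
    # Full generative path (y, z, t) -> x through the three models above.
    return merging_generator(first_generator(y, z, w), second_generator(t, w), w)
```

With a null manipulation (`t = 0.0`) the second branch contributes nothing in this sketch, so the generated feature depends only on the first generative model and the merging model.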
Regarding Claim 11, Lee, in view of Kocaoglu, teaches: the method of claim 8, wherein: the observed data points comprise a first group of the data points which do not include the effect of at least one manipulation, and a second group of said data points which do include the effect of at least one manipulation (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; suggests that there are two separate data groups where the first group has no conditioning/manipulation and the second does.); the learning comprises at least a training phase wherein each of the data points used in the training phase further comprises a respective value of the classification; the learning further comprises a fine-tuning phase following the training phase, wherein each of the data points used in the training phase is not labelled with a value of the classification; the data points used in the training phase comprise the first group, and the data points used in the fine-tuning phase comprise the second group (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; Kocaoglu discloses a two-step training method where a model is trained using data without conditioning/causal relationship and further refined using conditional/causal data.); the statistical models comprises one or more first statistical models and one or more second statistical models, wherein the one or more second statistical models are arranged to model the causal 
relationship between the manipulation vector and the feature vector;
(Lee: Page 2: Figure 2: Figure shows modeling a relationship between the feature vector, x, the latent vector, z, classification, y, and manipulation vector, t. The right side of the figure shows the modeling of the feature vector, x, and manipulation vector, t.)
a) when learning based on the first group of data points, the manipulation vector is set to a null value, and the parameters of the one or more first generative models and the merging generative model are learned, whilst the parameters of the second generative model are fixed (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; training without conditional means that the manipulation vector is empty/null) whereas b) when learning based on the second group of data points, the manipulation vector is either set to a known value representing the at least one manipulation if known or the manipulation vector is inferred if the at least one manipulation is not known, and the parameters of at least the second generative model are learned (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; conditioned learning means the manipulation vector is not empty. The right side of Lee: Figure 2 further discloses an inference subnetwork to help determine manipulations.).
Regarding Claim 12, Lee, in view of Kocaoglu, teaches: the method of claim 1, wherein the one or more statistical models comprise one or more inference models mapping from the classification, feature vector and manipulation vector as inputs to the latent vector as an output

(Lee: Page 2: Figure 2: Figure shows modeling a relationship between the feature vector, x, the latent vector, z, classification, y, and manipulation vector, t. The right side of the figure shows the modeling of the feature vector, x, and manipulation vector, t, using an inference subnetwork. Kocaoglu: Abstract: “We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph.”; the inputs and vectors required for a causal implicit generative model are dependent on a given causal graph, which can include any number and arrangement of elements.)

the learning comprising learning parameters of the one or more inference models which map the classification, feature vector and manipulation vector to the latent vector.

(Lee: Page 6: Appendix: Optimization of CEGAN. This section additionally describes the optimization of mapping the feature, latent, and manipulation vector and classification. Lee: Figure 2 further discloses an inference subnetwork to map the manipulation and feature vector to a latent vector)

Regarding Claim 13, Lee, in view of Kocaoglu, teaches: the method of claim 12, wherein the statistical models comprises one or more first statistical models and one or more second statistical models, wherein the one or more second statistical models are arranged to model the causal relationship between the manipulation vector and the feature vector (Lee: Figure 2); 

and the one or more first statistical models comprises at least a first of said inference models mapping from the classification, feature vector and manipulation vector to the latent vector (Lee: Figure 2 further discloses an inference subnetwork on the right-hand side to map the manipulation and feature vector to a latent vector. Kocaoglu: Abstract: “We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph.”; the inputs and vectors required for a causal implicit generative model are dependent on a given causal graph, which can include any number and arrangement of elements.).
Regarding Claim 14, Lee, in view of Kocaoglu, teaches: the method of claim 13, wherein the one or more second statistical models comprise at least a second, separate one of said inference models mapping from the feature vector to the manipulation vector (Kocaoglu: Abstract: “We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph.”; the inputs and vectors required for a causal implicit generative model are dependent on a given causal graph, which can include any number and arrangement of elements).
Regarding Claim 15, Lee, in view of Kocaoglu, teaches: the method of claim 14, wherein the observed data points comprise a first group of data points which do not include the effect of at least one manipulation, and a second group of said data points which do include the effect of the at least one manipulation (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; suggests that there are two separate data groups where the first group has no conditioning/manipulation and the second does.); the learning comprises at least a training phase wherein each of the data points used in the training phase further comprises a respective value of the classification; the learning further comprises a fine-tuning phase following the training phase, wherein each of the data points used in the training phase is not labelled with a value of the classification; the data points used in the training phase comprise the first group, and the data points used in the fine-tuning phase comprise the second group (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; Kocaoglu discloses a two-step training method where a model is trained using data without conditioning/causal relationship and further refined using conditional/causal data.); the statistical model comprises one or more first statistical models and one or more second statistical models, wherein the one or more second statistical models are arranged to model the causal 
relationship between the manipulation vector and the feature vector; 
(Lee: Page 2: Figure 2: Figure shows modeling a relationship between the feature vector, x, the latent vector, z, classification, y, and manipulation vector, t. The right side of the figure shows the modeling of the feature vector, x, and manipulation vector, t.)
a) when learning based on the first group of data points, the manipulation vector is set to a null value, and the parameters of the at least one first inference model are learned whilst the parameters of the second inference model are fixed (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; training without conditional means that the manipulation vector is empty/null) whereas b) when learning based on the second group of data points, the manipulation vector is either set to a known value representing the at least one manipulation if known or the manipulation vector is inferred if the at least one manipulation is not known, and the parameters of both the first and second inference models are learned (Kocaoglu: Section 5: Causal Generative Adversarial Networks: “First, train a generative model over the labels, then train a generative model for the images conditioned on the labels.”; conditioned learning means the manipulation vector is not empty. The right side of Lee: Figure 2 further discloses an inference subnetwork to help determine manipulations.).
Regarding Claim 16, Lee, in view of Kocaoglu, further teaches: the method of claim 1, wherein the one or more statistical models further include: a co-parent vector modelling a circumstance occurring within an environment of the ground truth having a similar effect to the ground truth, and/or a parent vector modelling a parent cause of the classification (Kocaoglu: Abstract: “We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph.”; maps containing multiple different causes for the classification/outcome require additional vectors in the models to describe those causes.).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Patel et al. (US 2020/0184955 A1), Mathieu et al. (US 2018/0137389 A1), Arik et al. (US 2019/0251952 A1).
Song, Yang, et al. "Pixeldefend: Leveraging generative models to understand and defend against adversarial examples." arXiv preprint arXiv:1710.10766 (2017).
Schott, Lukas, et al. "Towards the first adversarially robust neural network model on MNIST." arXiv preprint arXiv:1805.09190 (2018).
Samangouei, Pouya, Maya Kabkab, and Rama Chellappa. "Defense-gan: Protecting classifiers against adversarial attacks using generative models." arXiv preprint arXiv:1805.06605 (2018).
Madry, Aleksander, et al. "Towards deep learning models resistant to adversarial attacks." arXiv preprint arXiv:1706.06083 (2017).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JULIUS CHAI whose telephone number is (571) 272-4209. The examiner can normally be reached Monday-Friday, 8:00 AM - 4:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JULIUS CHAI/Examiner, Art Unit 2668    

/VU LE/Supervisory Patent Examiner, Art Unit 2668