DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 9/15/2021.
Applicant arguments/remarks made in amendment filed 9/15/2021.
Amendments to specification filed 9/15/2021.
Information disclosure statement filed 9/15/2021.
Objections to specification are withdrawn.
Claims 1, 8, and 15 are amended.
Claims 1-20 are presented for examination. 
Information Disclosure Statement
The information disclosure statement submitted on 9/15/2021 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is considered by the Examiner.
Response to Arguments
Applicant’s arguments filed 9/15/2021, that the prior art of record does not disclose the amended limitations, are moot in view of the new grounds of rejection.  Please see the detailed rejection below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al (Mitigating Unwanted Biases with Adversarial Learning, herein Zhang), and Kamishima et al (Fairness-Aware Classifier with Prejudice Remover Regularizer, herein Kamishima).
Regarding claim 1,
	Zhang teaches a computer-implemented method, the method comprising: (Zhang, page 335, column 1, paragraph 2, line 7 “The method is flexible and applicable to multiple definitions of fairness as well as a wide range of gradient-based learning models, including both regression and classification tasks.” In other words, the method is a computer-implemented method.)
	receiving by a reinforcement learning engine an original version of a machine learning model (MLM) including a plurality of parameter values, a plurality of hyperparameter values and an original fairness value that reflects fairness with respect to segmented relevant sub-groups; (Zhang, page 335, column 1, paragraph 1, line 4 “We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary.” And, page 335, column 2, paragraph 2, line 1 “Work on training machine learning systems that output fair decisions has defined several useful measurements for fairness: Demographic Parity, Equality of Odds, and Equality of Opportunity.  We consider supervised deep learning tasks in which the task is to predict an output variable Y given an input variable X, while remaining unbiased with respect to some variable Z.  We refer to Z as the protected variable. For these learning systems, the predictor is learning from a training set of (input, output, protected) tuples (X, Y, Z).  The predictor f is usually given access to the protected variable Z, though this is not strictly necessary.” And, page 3, column 1, paragraph 1, line 1 “We updated U to minimize $L_A$ at each training time step, according to the gradient. We modify W according to the expression: 
$$\nabla_W L_P - \mathrm{proj}_{\nabla_W L_A} \nabla_W L_P - \alpha \nabla_W L_A \qquad (1)$$
where $\alpha$ is a tunable hyperparameter that can vary at each time step and we define $\mathrm{proj}_v x = 0$ if $v = 0$.” And page 336, column 1, paragraph 3, line 2 “A predictor f will be trained to model Y as accurately as possible while satisfying one of the above equality constraints.  Demographic parity will be achieved by introducing an adversary g which will attempt to predict a value for Z from $\hat{Y}$. The gradient of g will then be incorporated into the weight update rule of f so as to reduce the amount of information about Z transmitted through $\hat{Y}$.”  See Figure 1.  In other words, the adversary is the reinforcement learning engine. The predictor which learns from a training set of (input, output, protected) tuples (X, Y, Z) is the original machine learning model (MLM) including a plurality of parameters and hyperparameters. Several useful measurements for fairness: Demographic Parity, Equality of Odds, and Equality of Opportunity are an original fairness value that reflects fairness with respect to segmented relevant sub-groups. The adversary provides reinforcement through its gradient g which is incorporated into the weight update rule of f, which is the MLM.  Examiner notes from specification paragraphs [0058] and [0059] of the instant application that all that is required of a reinforcement learning engine is that it “calculates a reward” which is based on performance and fairness of the supervised machine-learning model and gives feedback (i.e. reinforcement) to the machine learning model which then forces an adjustment.  This is precisely what is happening in Zhang.  The only difference is Zhang labels the reinforcement learning engine as “Adversary” instead of reinforcement learning engine.)

[Figure image omitted; see Zhang, Figure 1.]
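
For illustration, the weight update relied upon above can be sketched as follows. This is a minimal sketch, assuming the update direction for the predictor weights W is the gradient of the predictor loss, minus the projection of that gradient onto the gradient of the adversary loss, minus a tunable multiple of the adversary gradient, with the projection defined as zero when the adversary gradient is zero. The function names `project`, `predictor_update`, and `sgd_step`, the flat-list representation of gradients, and the learning rate `lr` are assumptions made for the example and do not appear in Zhang.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(x: Vector, v: Vector) -> Vector:
    """Projection of x onto v; the zero vector when v is zero."""
    vv = dot(v, v)
    if vv == 0.0:
        return [0.0] * len(x)
    scale = dot(x, v) / vv
    return [scale * vi for vi in v]


def predictor_update(grad_lp: Vector, grad_la: Vector, alpha: float) -> Vector:
    """Update direction for the predictor weights W.

    grad_lp: gradient of the predictor loss with respect to W
    grad_la: gradient of the adversary loss with respect to W
    alpha:   tunable hyperparameter weighting the adversary term
    """
    proj = project(grad_lp, grad_la)
    return [gp - pr - alpha * ga for gp, pr, ga in zip(grad_lp, proj, grad_la)]


def sgd_step(w: Vector, grad_lp: Vector, grad_la: Vector,
             alpha: float, lr: float) -> Vector:
    """One gradient-descent step on W along the update direction (lr is hypothetical)."""
    direction = predictor_update(grad_lp, grad_la, alpha)
    return [wi - lr * di for wi, di in zip(w, direction)]
```

Under these assumptions, removing the projection term keeps the predictor from moving in a direction that helps the adversary, and the `alpha` term moves the predictor so as to increase the adversary's loss.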

[adjusting by a reinforcement learning engine at least some of the parameter values and/or at least some of the hyperparameter values of the original version of the MLM to create a provisional version of the MLM;]
	determining by a reinforcement learning engine a fairness value for the provisional version of the MLM by operations including the following: receiving a reinforcement learning meta model (RLMM) that defines a plurality of fairness related objectives and a reward function reflecting the plurality of fairness related objectives; (Zhang, page 335, column 2, paragraph 2, line 1 “Work on training machine learning systems that output fair decisions has defined several useful measurements for fairness: Demographic Parity, Equality of Odds, and Equality of Opportunity.  We consider supervised deep learning tasks in which the task is to predict an output variable Y given an input variable X, while remaining unbiased with respect to some variable Z.  We refer to Z as the protected variable. For these learning systems, the predictor [formula image omitted] is learning from a training set of (input, output, protected) tuples (X, Y, Z).  The predictor f is usually given access to the protected variable Z, though this is not strictly necessary.” In other words, work on training machine learning systems that output fair decisions has several useful measurements for fairness, where using Z as the protected variable is determining a fairness value for the provisional version of the MLM, the measurements incorporated into a loss function are a reward function, and the gradient descent algorithm used with the adversarial learning framework is the reinforcement learning meta model.) 
	operating the provisional version of the MLM; during the operation of the provisional version of the MLM, calculating, by the RLMM, reward values based on the reward function; and determining a provisional fairness value for the provisional version of the MLM based upon the reward values; (Zhang, page 335, column 1, paragraph 1, line 4 “We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary.  The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income bracket, while the adversary tries to model a protected variable Z, here gender or zip code. The objective is to maximize the predictor’s ability to predict Y while minimizing the adversary’s ability to predict Z.  Applied to analogy completion, this method results in accurate predictions that exhibit less evidence of stereotyping Z.  … The method is flexible and applicable to multiple definitions of fairness as well as a wide range of gradient-based learning models, including both regression and classification tasks.” In other words, the predictor is the MLM, Z is the provisional fairness value, and the stochastic gradient descent algorithm combined with the adversarial learning network is the RLMM.)
	determining that the provisional fairness value is greater than the original fairness value; and responsive to the determination that the provisional fairness value is greater than the original fairness value, replacing, by a reinforcement learning engine the original version of the MLM with the provisional version of the MLM and replacing the original fairness value with the provisional fairness value. (Zhang, page 336, column 2, paragraph 3, line 1 “We begin with a model, which we call the predictor, trained to accomplish the task of predicting Y given X.  As in Figure 1, we assume that the model is trained by attempting to modify weights W to minimize some loss $L_P(\hat{y}, y)$ using a gradient-based method such as stochastic gradient descent.  The output layer of the predictor is then used as an input to another network called the adversary which attempts to predict Z.  This part of the network corresponds to the discriminator in a typical GAN [4].  We will suppose the adversary has loss term $L_A(\hat{z}, z)$ and weights U.
For Demographic Parity, the adversary gets the predicted label $\hat{Y}$.  Intuitively, this allows the adversary to try to predict the protected variable using nothing but the predicted label.  The goal of the predictor is to prevent the adversary from doing this.
For Equality of Odds, the adversary gets $\hat{Y}$ and the true label Y.
For Equality of Opportunity on a given class y, we can restrict the training set of the adversary to training examples where Y = y.”
In other words, updating weights U is responsive to the determination that the provisional fairness value is greater than the original fairness value, and the final model after applying the adversary is the replacement model for the original model.)
	Thus far, Zhang does not explicitly teach adjusting at least some of the parameter values and/or at least some of the hyperparameter values of the original version of the MLM to create a provisional version of the MLM;
	Kamishima teaches adjusting at least some of the parameter values and/or at least some of the hyperparameter values of the original version of the MLM to create a provisional version of the MLM; (Kamishima, page 36, paragraph 4, line 7 “Our method provides a way to control this trade-off by adjusting the regularization parameter. We propose a prejudice remover regularizer, which enforces a determination’s independence from sensitive information.” In other words, adjusting the regularization parameter is adjusting at least some of the parameter and/or at least some of the hyperparameter values.)	
	Both Kamishima and Zhang are directed to removing unfair bias in machine learning systems, among other things.  In view of the teaching of the combination of Zhang and Kamishima, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhang to include the parameter adjustment taught by Kamishima.
	One of ordinary skill in the art would be motivated to do this because it is important to be unbiased and nondiscriminatory in relation to sensitive features such as gender, religion, race, ethnicity, handicaps, and political convictions. (Kamishima, page 35, paragraph 2, line 1 “Data mining techniques are being increasingly used for serious determinations such as credit, insurance rates, employment applications, and so on. For example, credit scoring is frequently determined based on the records of past credit data together with statistical prediction techniques.  Needless to say, such serious determinations must guarantee fairness in both social and legal viewpoints; that is, they must be unbiased and nondiscriminatory in relation to sensitive features such as gender, religion, race ethnicity, handicaps, political convictions, and so on.”)
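
For illustration, the comparison and replacement step recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the fairness value is taken, hypothetically, to be one minus the largest gap in positive-prediction rate between protected groups (a demographic-parity style measure), and a model version is represented as a (name, fairness value) pair. Neither the claims, Zhang, nor Kamishima defines the fairness value in this way, and the names `positive_rate`, `fairness_value`, and `keep_fairer` are invented for the example.

```python
from typing import List, Tuple


def positive_rate(preds: List[int], groups: List[int], g: int) -> float:
    """Share of positive (1) predictions among members of group g."""
    members = [p for p, z in zip(preds, groups) if z == g]
    return sum(members) / len(members) if members else 0.0


def fairness_value(preds: List[int], groups: List[int]) -> float:
    """Hypothetical fairness value: 1 minus the largest gap in positive rate.

    1.0 means every group receives positive predictions at the same rate;
    lower values mean a larger gap between groups.
    """
    rates = [positive_rate(preds, groups, g) for g in sorted(set(groups))]
    return 1.0 - (max(rates) - min(rates))


def keep_fairer(original: Tuple[str, float],
                provisional: Tuple[str, float]) -> Tuple[str, float]:
    """Replace the original version only if the provisional one is strictly fairer."""
    return provisional if provisional[1] > original[1] else original
```

In this sketch, a provisional version whose fairness value merely equals the original value does not replace it, which mirrors the "greater than" wording of the limitation.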
Regarding claim 2,
	The combination of Zhang and Kamishima teaches the computer-implemented method of claim 1,
	further comprising: iteratively repeating the operations of until the original fairness value exceeds a predetermined threshold.  (Zhang, page 337, column 2, paragraph 1, 
“5 THEORETICAL GUARANTEES
[Image omitted: excerpt from Zhang, Section 5.]
Proof: 
Since the adversary converges, [equation image omitted] otherwise, since $L_A$ is convex in U, the adversary’s weights would move toward U0.  In other words, the adversary’s minimum is the point at which the adversary gains an advantage from using $\hat{Y}$.  Similarly, since the predictor converges, [equation image omitted]. Otherwise, the predictor would be able to increase the adversary’s loss by moving toward W0, and the projection term and negative weight on [formula image omitted] in Eqn. 1 would push the predictor to move towards 0.  Then:
[equation image omitted]
so we must have [equation image omitted]”
In other words, the learning model iterating through training and adjusting weights is iteratively repeating the operations, and the above proof shows that the debiasing is optimal, which exceeds a predetermined threshold.)


Regarding claim 3,
	The combination of Zhang and Kamishima teaches the computer-implemented method of claim 1,
	wherein the original MLM is a supervised MLM. (Zhang, page 335, column 2, paragraph 3, line 2 “We consider supervised deep learning tasks in which the task is to predict an output variable Y given an input variable X, while remaining unbiased with respect to some variable Z.” In other words, supervised deep learning tasks are supervised MLMs.)
Regarding claim 4,
	The combination of Zhang and Kamishima teaches the computer-implemented method of claim 1,
	wherein the fairness related objectives include at least one of the following: gender, age, nationality, religious beliefs, ethnicity and orientation.  (Zhang, page 336, column 2, paragraph 4, line 7 “For Demographic Parity, the adversary gets the predicted label $\hat{Y}$.  Intuitively, this allows the adversary to try to predict the protected variable using nothing but the predicted label.”  In other words, Demographic Parity is at least one of the following: gender, age, nationality, religious beliefs, ethnicity and orientation.)
Regarding claim 5,
	The combination of Zhang and Kamishima teaches the computer-implemented method of claim 1,
	further comprising: linking the original MLM to the reinforcement learning meta model based on a configuration and a read out.  (Zhang, Figure 1. In other words, the adversary model coupled with stochastic gradient descent is reinforcement learning, and the machine learning model, i.e., the predictor, is linked to the adversary model.) 
Regarding claim 6,
	The combination of Zhang and Kamishima teaches the computer-implemented method of claim 1,
	wherein the plurality of parameter values includes a value for at least one of the following parameter types: weighing factors and activation function variables.  (Zhang, page 336, column 2, paragraph 3, line 2 “As in Figure 1, we assume that the model is trained by attempting to modify weights W to minimize some loss $L_P(\hat{y}, y)$ using a gradient-based method such as stochastic gradient descent.”  In other words, modifying weights is at least one of the following parameter types: weight factors and activation function variables.)  
Regarding claim 7,
	The combination of Zhang and Kamishima teaches the computer-implemented method of claim 1,
	wherein the plurality of hyperparameter values include a value for at least one of the following hyperparameter types: type of activation function, number of nodes per layer, number of layers of a neural network and machine-learning model.  (Zhang, page 336, column 2, paragraph 3, line 1 “We begin with a model, which we call the predictor, trained to accomplish the task of predicting Y given X.  As in Figure 1, we assume that the model is trained by attempting to modify weights W to minimize some loss $L_P(\hat{y}, y)$, using a gradient-based method such as stochastic gradient descent.  The output layer of the predictor is then used as an input to another network called the adversary which attempts to predict Z.” In other words, the output layer of the predictor, and its type, is a hyperparameter of the machine-learning model.)
Claims 8-14 are computer program product claims comprising one or more non-transitory computer readable storage media and program instructions corresponding to computer-implemented method claims 1-7, respectively.  Otherwise, they are the same.  It is implicit that a computer-implemented method requires a processor and at least one non-transitory computer readable storage media, as well as program instructions in order to execute. Therefore, claims 8-14 are rejected for the same reasons as claims 1-7.
Claims 15-16, and 17-20 are computer system claims corresponding to computer-implemented method claims 1-2, and 4-7, respectively.  Otherwise, they are the same.  It is implicit that a computer-implemented method requires a computer system with one or more computer processors; one or more computer readable storage media; and program instructions stored on the one or more computer readable storage media in order to execute.  Therefore, claims 15-16, and 17-20 are rejected for the same reasons as claims 1-2, and 4-7, respectively.
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
Mnih, et al “Playing Atari with Deep Reinforcement Learning” shows a reinforcement learning method for training a convolutional neural network.
Uther, “Adversarial Reinforcement Learning” shows how adversarial agents can be used as reinforcement learning engines for training machine learning models. 
Conclusion
	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124                                                                                                                                                                                                        

/BRIAN M SMITH/Primary Examiner, Art Unit 2122