Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Mr. Peter Manzo on 06/01/2022.

This listing of claims will replace all prior versions and listings of claims in the application:

1. 	(Currently Amended) A computer-implemented method for providing fair deep reinforcement learning, the computer-implemented method comprising: 
receiving, by a computer, via a network, multimedia data capturing a robot performing an action to accomplish a task in a physical environment from a first set of sensors located in the physical environment; 
performing, by the computer, using an artificial neural network, an analysis of the multimedia data capturing the robot performing the action to accomplish the task in the physical environment to determine bias of the robot corresponding to a set of items located in the physical environment while the robot performs the action, the artificial neural network including a biased path of biased nodes having bias weights, a non-biased path of non-biased nodes having non-bias weights, and a limit function, wherein the artificial neural network executes in parallel the biased path of the biased nodes having the bias weights and the non-biased path of the non-biased nodes having the non-bias weights; 
identifying, by the computer, equal opportunity and disparate impact on protected attributes from the multimedia data during performance of the action by the robot to weight degree of bias based on a determined change in state of the physical environment in response to the robot performing the action;
performing, by the computer, post processing of [[a]] the weighted degree of bias to decrease the bias of the robot by merging the biased nodes having the bias weights in the biased path with the non-biased nodes having the non-bias weights in the non-biased path of the artificial neural network and limiting the bias weights using the limit function to form merged nodes having decreased bias; 
relabeling, by the computer, training data of a semi-supervised learning model that was used to previously train the robot to perform the action to accomplish the task by the robot in the physical environment based on the post processing of the weighted degree of bias; [[and]] 
retraining, by the computer, the robot to increase performance of the action to accomplish the task by the robot in the physical environment using the relabeled training data; 
recalculating, by the computer, a reward corresponding to the action based on the equal opportunity and disparate impact on the protected attributes during performance of the action by the robot; and 
updating, by the computer, a Q-table with the recalculated reward corresponding to the action.

2. 	(Currently Amended) The computer-implemented method of claim 1 further comprising:
	receiving, by the computer, the semi-supervised learning model corresponding to a set of two or more physical environments, wherein the physical environment is one physical environment in the set of two or more physical environments;
training, by the computer, the robot to perform the action to accomplish the task in the physical environment of the set of two or more physical environments based on the semi-supervised learning model; and
mapping, by the computer, the action to be performed by the robot in the physical environment to [[a]] the reward using [[a]] the Q-table.

3.	(Previously Presented) The computer-implemented method of claim 2 further comprising:
	determining, by the computer, change in state of the physical environment based on the analysis of the multimedia data capturing the robot performing the action to accomplish the task in the physical environment.

4.	(Canceled)

5. 	(Currently Amended) The computer-implemented method of claim 1 further comprising:
	receiving, by the computer, [[a]] the semi-supervised learning model corresponding to a set of physical environments; and
	training, by the computer, a swarm of robots to perform the action to accomplish the task in one or more other physical environments of the set of physical environments based on the semi-supervised learning model.

6.	(Previously Presented) The computer-implemented method of claim 5 further comprising:
	receiving, by the computer, via the network, multimedia data capturing the swarm of robots performing the action to accomplish the task in the one or more other physical environments from a second set of sensors;
analyzing, by the computer, the multimedia data capturing the swarm of robots performing the action to accomplish the task in the one or more other physical environments using the artificial neural network; and
determining, by the computer, change in state of the one or more other physical environments based on analysis of the swarm of robots performing the action to accomplish the task in the one or more other physical environments.

7. 	(Currently Amended) The computer-implemented method of claim 5 further comprising:
	identifying, by the computer, the equal opportunity and disparate impact on protected attributes during performance of the action by the swarm of robots to weight degree of bias based on determined change in state of the one or more other physical environments in response to the performance of the action.

8-9.	(Canceled)

10.	(Previously Presented) The computer-implemented method of claim 1, wherein the artificial neural network is a convolutional neural network.

11.	(Currently Amended) A computer system for providing fair deep reinforcement learning, the computer system comprising:
	a bus system;
	a storage device connected to the bus system, wherein the storage device stores program instructions; and
	a processor connected to the bus system, wherein the processor executes the program instructions to:
receive, via a network, multimedia data capturing a robot performing an action to accomplish a task in a physical environment from a first set of sensors located in the physical environment;
perform, using an artificial neural network, an analysis of the multimedia data capturing the robot performing the action to accomplish the task in the physical environment to determine bias of the robot corresponding to a set of items located in the physical environment while the robot performs the action, the artificial neural network including a biased path of biased nodes having bias weights, a non-biased path of non-biased nodes having non-bias weights, and a limit function, wherein the artificial neural network executes in parallel the biased path of the biased nodes having the bias weights and the non-biased path of the non-biased nodes having the non-bias weights;
identify equal opportunity and disparate impact on protected attributes from the multimedia data during performance of the action by the robot to weight degree of bias based on a determined change in state of the physical environment in response to the robot performing the action;
perform post processing of a weighted degree of bias to decrease the bias of the robot by merging the biased nodes having the bias weights in the biased path with the non-biased nodes having the non-bias weights in the non-biased path of the artificial neural network and limiting the bias weights using the limit function to form merged nodes having decreased bias;
relabel training data of a semi-supervised learning model that was used to previously train the robot to perform the action to accomplish the task by the robot in the physical environment based on the post processing of the weighted degree of bias; [[and]]
retrain the robot to increase performance of the action to accomplish the task by the robot in the physical environment using the relabeled training data; 
recalculate a reward corresponding to the action based on the equal opportunity and disparate impact on the protected attributes during performance of the action by the robot; and 
update a Q-table with the recalculated reward corresponding to the action.

12.	(Currently Amended) The computer system of claim 11, wherein the processor further executes the program instructions to: receive [[a]] the semi-supervised learning model corresponding to a set of two or more physical environments, wherein the physical environment is one physical environment in the set of two or more physical environments; train the [[agent]] robot to perform the action to accomplish the task in the physical environment of the set of two or more physical environments based on the semi-supervised learning model; and map the action to be performed by the [[agent]] robot in the physical environment to a reward using a Q-table.

13.	(Currently Amended) A computer program product for providing fair deep reinforcement learning, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising:
	receiving, by the computer, via a network, multimedia data capturing a robot performing an action to accomplish a task in a physical environment from a first set of sensors located in the physical environment;
performing, by the computer, using an artificial neural network, an analysis of the multimedia data capturing the robot performing the action to accomplish the task in the physical environment to determine bias of the robot corresponding to a set of items located in the physical environment while the robot performs the action, the artificial neural network including a biased path of biased nodes having bias weights, a non-biased path of non-biased nodes having non-bias weights, and a limit function, wherein the artificial neural network executes in parallel the biased path of the biased nodes having the bias weights and the non-biased path of the non-biased nodes having the non-bias weights;
identifying, by the computer, equal opportunity and disparate impact on protected attributes from the multimedia data during performance of the action by the robot to weight degree of bias based on a determined change in state of the physical environment in response to the robot performing the action;
performing, by the computer, post processing of a weighted degree of bias to decrease the bias of the robot by merging the biased nodes having the bias weights in the biased path with the non-biased nodes having the non-bias weights in the non-biased path of the artificial neural network and limiting the bias weights using the limit function to form merged nodes having decreased bias;
relabeling, by the computer, training data of a semi-supervised learning model that was used to previously train the robot to perform the action to accomplish the task by the robot in the physical environment based on the post processing of the weighted degree of bias; [[and]]
retraining, by the computer, the robot to increase performance of the action to accomplish the task by the robot in the physical environment using the relabeled training data; 
recalculating, by the computer, a reward corresponding to the action based on the equal opportunity and disparate impact on the protected attributes during performance of the action by the robot; and 
updating, by the computer, a Q-table with the recalculated reward corresponding to the action.

14. 	(Currently Amended) The computer program product of claim 13 further comprising:
	receiving, by the computer, the semi-supervised learning model corresponding to a set of two or more physical environments, wherein the physical environment is one physical environment in the set of two or more physical environments;
training, by the computer, the robot to perform the action to accomplish the task in the physical environment of the set of two or more physical environments based on the semi-supervised learning model; and
mapping, by the computer, the action to be performed by the robot in the physical environment to [[a]] the reward using [[a]] the Q-table.

15.	(Previously Presented) The computer program product of claim 14 further comprising:
	determining, by the computer, change in state of the physical environment based on the analysis of the multimedia data capturing the robot performing the action to accomplish the task in the physical environment.

16. 	(Canceled)

17. 	(Currently Amended) The computer program product of claim 13 further comprising:
	receiving, by the computer, [[a]] the semi-supervised learning model corresponding to a set of physical environments; and
	training, by the computer, a swarm of robots to perform the action to accomplish the task in one or more other physical environments of the set of physical environments based on the semi-supervised learning model.

18.	(Previously Presented) The computer program product of claim 17 further comprising:
	receiving, by the computer, via the network, multimedia data capturing the swarm of robots performing the action to accomplish the task in the one or more other physical environments from a second set of sensors;
analyzing, by the computer, the multimedia data capturing the swarm of robots performing the action to accomplish the task in the one or more other physical environments using the artificial neural network; and
determining, by the computer, change in state of the one or more other physical environments based on analysis of the swarm of robots performing the action to accomplish the task in the one or more other physical environments.

19. 	(Currently Amended) The computer program product of claim 17 further comprising:
	identifying, by the computer, the equal opportunity and disparate impact on protected attributes during performance of the action by the swarm of robots to weight degree of bias based on determined change in state of the one or more other physical environments in response to the performance of the action.

20.	(Canceled)

REASONS FOR ALLOWANCE
The following is an examiner’s statement of reasons for allowance:
Interpreting the claims in light of the specification, the examiner finds the claimed invention is patentably distinct from the prior art of record. The prior art does not expressly teach or render obvious the invention as recited in amended independent claims 1, 11, and 13.

	Hasselt (Hasselt et al., 2015, “Deep Reinforcement Learning with Double Q-learning”) teaches observing a microstate of an environment and the reaction of items in a plurality of microstates within the environment after an agent (i.e., robot, human) performs an action in the environment. Hasselt also discloses determining bias weights corresponding to the action for the microstate.

Li (US 20200226489 A1) discloses combining biased and unbiased data using an artificial neural network. Li also discloses determining where the bias is occurring in the semi-supervised training based on the bias weights and the non-bias weights in the artificial neural network.

	The features of performing, by the computer, using an artificial neural network, an analysis of the multimedia data capturing the robot performing the action to accomplish the task in the physical environment to determine bias of the robot corresponding to a set of items located in the physical environment while the robot performs the action, the artificial neural network including a biased path of biased nodes having bias weights, a non-biased path of non-biased nodes having non-bias weights, and a limit function, wherein the artificial neural network executes in parallel the biased path of the biased nodes having the bias weights and the non-biased path of the non-biased nodes having the non-bias weights; identifying, by the computer, equal opportunity and disparate impact on protected attributes from the multimedia data during performance of the action by the robot to weight degree of bias based on a determined change in state of the physical environment in response to the robot performing the action; recalculating, by the computer, a reward corresponding to the action based on the equal opportunity and disparate impact on the protected attributes during performance of the action by the robot; and updating, by the computer, a Q-table with the recalculated reward corresponding to the action, were not uncovered in the prior art teachings.
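
For illustration only, the recalculating and updating features named above can be sketched in Python. The sketch is not taken from the application or from the prior art of record. The function names, the penalty form of the recalculated reward, the group labels, and the learning-rate and discount values are all assumptions. The update shown is the standard tabular Q-learning rule, used here as a stand-in.

```python
from collections import defaultdict


def disparate_impact(outcomes, groups, protected, reference):
    """Ratio of favorable-outcome rates: protected group over reference group."""
    def rate(group):
        vals = [o for o, g in zip(outcomes, groups) if g == group]
        return sum(vals) / len(vals) if vals else 0.0

    ref_rate = rate(reference)
    return rate(protected) / ref_rate if ref_rate > 0 else 0.0


def equal_opportunity_gap(outcomes, labels, groups, protected, reference):
    """Absolute difference in true-positive rates between the two groups."""
    def tpr(group):
        vals = [o for o, y, g in zip(outcomes, labels, groups)
                if g == group and y == 1]
        return sum(vals) / len(vals) if vals else 0.0

    return abs(tpr(protected) - tpr(reference))


def recalculated_reward(base_reward, di, eo_gap, penalty=1.0):
    """Assumed form: subtract a penalty that grows as fairness degrades."""
    return base_reward - penalty * (abs(1.0 - di) + eo_gap)


def update_q_table(q, state, action, reward, next_state, actions,
                   alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update, fed with the recalculated reward."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Hypothetical observations: 1 = favorable outcome for an item in the environment.
outcomes = [1, 0, 1, 1]
labels = [1, 1, 1, 1]
groups = ["a", "a", "b", "b"]

di = disparate_impact(outcomes, groups, protected="a", reference="b")
gap = equal_opportunity_gap(outcomes, labels, groups, protected="a", reference="b")
reward = recalculated_reward(2.0, di, gap)

q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
update_q_table(q_table, "s0", "move", reward, "s1", ["move", "wait"])
```

In this sketch a disparate impact ratio of 1.0 and an equal opportunity gap of 0.0 leave the base reward unchanged, so the Q-table entry is lowered only when the observed outcomes differ across the protected attribute.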

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached on 7:30 AM - 5:30 PM. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/JUN KWON/
Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/
Supervisory Patent Examiner, Art Unit 2127