Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
1.	This action is responsive to communications: Application filed on September 6, 2022, and Drawings filed on September 6, 2022.
2.	Claims 1–17 are pending in this case. Claims 1 and 10 are independent claims. Applicant’s election in the reply filed on 9/6/2022 is acknowledged.


In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 2, 3, 10, 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shwartz et al., Pub. No.: 2020/0097357A1, in view of Yang, Pub. No.: 2021/0073995.
With regard to claim 1:
Shwartz discloses a computer-implemented method for training a machine learning (ML) model for robotic process automation (RPA) using reinforcement learning (The machine learning model is trained to perform automation functions, paragraph 13: “Another embodiment of the present invention is a cognitive automation-engine system that is trained by methods of machine learning. This system consists of hardware and software modules organized and connected in a specific structure to provide improved, artificially intelligent script-customization functionality. These modules include a processor, memory, and a computer-readable hardware storage device containing program code configured that is run by the processor to perform a method for machine-trainable automated-script customization. A script library stores previously recorded automation scripts. A customization-recorder module monitors and records identifications of data structures and activities related to the customization of a previously stored (or newly derived) automation script. These recording may include identifications of: an unexpected event that produces an adverse effect on a computing environment, a selection of a previously stored automation script from the script library, a customized script generated by applying customization steps to the selected script, the customization steps, a relative risk that running an automation script comprising an instruction associated with a first customization step of the customization steps would adversely affect operation of the computing environment, and an aggregate risk that running the customized script would adversely affect operation of the computing environment. An engine-training module trains a customization module to intelligently customize automation scripts. One or more corpora store historical training material that is submitted to the customization module by the engine-training module during machine-learning training sessions. 
The customization module, in response to receiving notice that an unexpected event has disrupted the system's computing environment, the customization module selects, from a script library, an automation script that is configured to address the type of problems caused by the disruption. The customization module then attempts to intelligently customize the selected script into a customized script that more specifically addresses the particular type of disruption associated with the unexpected event. If this attempt fails, the system requests assistance from a human expert. In either case, the customization module directs the customization recorder to record human or automated activities leading to each identified customization step. The customization module then uses cognitive methods, learned through the machine-learning trainings sessions, to identify a relative risk to the computing environment posted by each customization step and a resulting aggregate risk of running the entire customized script. If the aggregate risk exceeds a certain threshold limit, the customization module requests special authorization to run the customized script. If the script has low aggregate risk or is otherwise authorized to be run, the customization module generates and runs the script, adds the customized script to the script library, and directs a recording module to record the result of running the script. The customization module then adds to the one or more machine-learning corpora information describing the unexpected event, the adverse effects, the original and customized scripts, the customization steps, the degrees of risk of the customized script and of each step, and the results of running the customized script. 
The automation engine directs the engine-training module to use the updated corpora during the next machine-learning training session in order to train the customization module to more intelligently customize automation scripts.”), comprising: training the ML model by running simulations on training data using the ML model (See figure 5 for training automation using training data such as historic data, paragraph 86: “In embodiments of the present invention, system-management platform 405 comprises a cognitive automation engine 410 that automatically customizes and runs automation scripts in response to the occurrence of an unexpected condition in computing environment 4000. System-management platform 405 also comprises a training module 430 that trains the automation engine 410 to intelligently customize template scripts. This training is performed by methods of machine-learning that submit training data, in the form of corpus 435, to automation engine 410. In some embodiments, training module 430 or corpus 435 may be components of automation engine 410. Automation engine 410 comprises, among other components, a trainable customization module 415 that learns how to customize automation scripts and template scripts, retrieved from a script library 420, by analyzing historical information contained in one or more machine-learning corpora 435 submitted to the customization module 415 by engine-training module 430. Training module 430 determines in part how to update corpus 435 by analyzing historical recordings or logs from which may be inferred the success rates and results of previous attempts to customize a script. 
These recordings and logs may be gathered by customization-recorder module 425 from either automated or human-implemented script customizations.”), the ML model having a performance function; and when the ML model does not achieve convergence defined by the performance function based on one or more criteria (the system determines whether there are performance issues, such as engine 410 being unable to locate a script, paragraphs 98 and 100: “In certain instances, automation engine 410 may not run a retrieved script. This may happen if engine 410 is unable to locate a script that is appropriate or capable of addressing the unexpected event, or if engine 410 determines that the script requires such extensive customization that it would be impossible to successfully run the script in its original form. In some cases, a template script may even have been expressly designed so as to require customization before being performed. In step 515, the automation engine 410 or management platform 405 determines whether the automatic remedial action was successful. This step may be performed by any means known in the art, such as through system calls to a host operating system or hypervisor, through API calls, or through transactional messages exchanged with an application. In certain embodiments, this determination may be made as a function of an interactive or other communication with an administrator or other user. 
If the system in step 515 determines that the automated remediation was not successful, the automation engine 410 or management platform 405 requests that a human expert, such as a system administrator or application specialist, manually customize the retrieved script (or select a different script), using expert knowledge.” See also paragraphs 118 to 120.): requesting human assistance (the system requests expert help, paragraph 100: “If the system in step 515 determines that the automated remediation was not successful, the automation engine 410 or management platform 405 requests that a human expert, such as a system administrator or application specialist, manually customize the retrieved script (or select a different script), using expert knowledge.”), monitoring actions taken by a human on a computing system (the action of the expert is monitored, paragraph 103: “In step 525, recorder 425 stores the monitored customization steps in a remediation log. This log may take any form known in the art that is capable of storing information from which may be extracted or inferred semantic meanings of each customization step and the result of performing the customized script. In its simplest form, a remediation log could be a simple keystroke log that merely records the commands typed by a human expert when editing the retrieved script. If an expert edits the script in an integrated application development environment, the log might include only relevant logical revisions, such as a listing of each added or revised instruction generated by the expert.”), and modifying a policy network for the ML model, the performance function, or both, based on the actions taken by the human (the system is trained based on user action, paragraphs 118 to 120: “In step 540, customization module 415 uses these risk determinations to assign an overall degree of risk to the entire customized script. 
This overall risk is then used to determine whether to run the customized script. In step 545, customization module 415 uses the procedures and results of steps 500-540 to update machine-learning training corpus 435. The updated corpus 435 is then submitted to the customization module 415 in order to train the automation engine 410 to more intelligently customize future scripts. This training process may continue until automation engine 410 is deemed to have a success rate in customizing retrieved scripts that is at least equivalent to that of a human expert. In some embodiments, the machine-language training steps will continue indefinitely, allowing the automation engine 410 to continuously improve itself.”).
Shwartz does not disclose the aspect wherein the performance function is a reward function.
However, Yang discloses the ML model having a reward function, and determining whether or not the ML model achieves convergence defined by the reward function based on one or more criteria (the system can determine that the training is insufficient if convergence is not met, paragraph 79: “In at least one embodiment, at decision block 608, a computer system determines whether a reward determined at block 606 indicates that training of said second network is sufficient. In at least one embodiment, a reward is sufficient when it indicates that accuracy of output of said second network exceeds a desired threshold value. In at least one embodiment, a reward is sufficient when it indicates that a convergence rate associated with training said second network exceeds a target threshold value. In at least one embodiment, as a result of having determined that said reward is sufficient, execution advances to block 610 and training of said second network is complete. In at least one embodiment, if said computer system determines that said reward is insufficient, execution advances to block 612, where a process of retraining said second network with updated hyperparameters is initiated.”). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to apply Yang to Shwartz so that the system can use a reward function to determine more accurately whether convergence is achieved, and so that the reward function can help improve the ML model and help achieve convergence, whereby the robot performs its function correctly and effectively.
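For illustration only, the decision quoted from paragraph 79 of Yang can be sketched as follows. This is a minimal sketch and is not code from Yang: the Reward class, the function names, the numeric thresholds, and the choice to treat the two "sufficient" embodiments as an either/or test are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Reward:
    """Hypothetical container for the two quantities named in paragraph 79."""
    accuracy: float          # accuracy of the output of the second network
    convergence_rate: float  # convergence rate associated with training


def reward_is_sufficient(reward, accuracy_threshold=0.9, convergence_target=0.5):
    """Decision block 608 (assumed form): either measure exceeding its threshold suffices."""
    return (reward.accuracy > accuracy_threshold
            or reward.convergence_rate > convergence_target)


def next_block(reward):
    """Return 610 (training complete) or 612 (retrain with updated hyperparameters)."""
    return 610 if reward_is_sufficient(reward) else 612
```

Under these assumed thresholds, a reward with accuracy 0.95 advances to block 610, while a reward with accuracy 0.5 and convergence rate 0.1 advances to block 612.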

With regard to claim 2:
Shwartz and Yang disclose the computer-implemented method of claim 1, wherein the training of the ML model, requesting human assistance (Shwartz: the system requests expert help, paragraph 100: “If the system in step 515 determines that the automated remediation was not successful, the automation engine 410 or management platform 405 requests that a human expert, such as a system administrator or application specialist, manually customize the retrieved script (or select a different script), using expert knowledge.”), monitoring the actions taken by the human on the computing system (Shwartz: the action of the expert is monitored, paragraph 103: “In step 525, recorder 425 stores the monitored customization steps in a remediation log. This log may take any form known in the art that is capable of storing information from which may be extracted or inferred semantic meanings of each customization step and the result of performing the customized script. In its simplest form, a remediation log could be a simple keystroke log that merely records the commands typed by a human expert when editing the retrieved script. If an expert edits the script in an integrated application development environment, the log might include only relevant logical revisions, such as a listing of each added or revised instruction generated by the expert.”), and modifying the policy network, the reward function, or both, are performed by an RPA robot (Shwartz: the training is automated, paragraphs 118 to 120: “In step 540, customization module 415 uses these risk determinations to assign an overall degree of risk to the entire customized script. This overall risk is then used to determine whether to run the customized script. In step 545, customization module 415 uses the procedures and results of steps 500-540 to update machine-learning training corpus 435. 
The updated corpus 435 is then submitted to the customization module 415 in order to train the automation engine 410 to more intelligently customize future scripts. This training process may continue until automation engine 410 is deemed to have a success rate in customizing retrieved scripts that is at least equivalent to that of a human expert. In some embodiments, the machine-language training steps will continue indefinitely, allowing the automation engine 410 to continuously improve itself.”).
With regard to claim 3:
Shwartz and Yang disclose the computer-implemented method of claim 1, further comprising: completing the steps of running the simulations on the training data using the ML model (Shwartz: see figure 5 for training automation using training data such as historic data, paragraph 86: “In embodiments of the present invention, system-management platform 405 comprises a cognitive automation engine 410 that automatically customizes and runs automation scripts in response to the occurrence of an unexpected condition in computing environment 4000. System-management platform 405 also comprises a training module 430 that trains the automation engine 410 to intelligently customize template scripts. This training is performed by methods of machine-learning that submit training data, in the form of corpus 435, to automation engine 410. In some embodiments, training module 430 or corpus 435 may be components of automation engine 410. Automation engine 410 comprises, among other components, a trainable customization module 415 that learns how to customize automation scripts and template scripts, retrieved from a script library 420, by analyzing historical information contained in one or more machine-learning corpora 435 submitted to the customization module 415 by engine-training module 430. Training module 430 determines in part how to update corpus 435 by analyzing historical recordings or logs from which may be inferred the success rates and results of previous attempts to customize a script. 
These recordings and logs may be gathered by customization-recorder module 425 from either automated or human-implemented script customizations.”), requesting human assistance (Shwartz: the system requests expert help, paragraph 100: “If the system in step 515 determines that the automated remediation was not successful, the automation engine 410 or management platform 405 requests that a human expert, such as a system administrator or application specialist, manually customize the retrieved script (or select a different script), using expert knowledge.”), monitoring the actions taken by the human on the computing system (Shwartz: the action of the expert is monitored, paragraph 103: “In step 525, recorder 425 stores the monitored customization steps in a remediation log. This log may take any form known in the art that is capable of storing information from which may be extracted or inferred semantic meanings of each customization step and the result of performing the customized script. In its simplest form, a remediation log could be a simple keystroke log that merely records the commands typed by a human expert when editing the retrieved script. If an expert edits the script in an integrated application development environment, the log might include only relevant logical revisions, such as a listing of each added or revised instruction generated by the expert.”), and modifying the policy network, the reward function, or both, to achieve convergence (Shwartz: the system is trained based on user action, paragraphs 118 to 120: “In step 540, customization module 415 uses these risk determinations to assign an overall degree of risk to the entire customized script. This overall risk is then used to determine whether to run the customized script. In step 545, customization module 415 uses the procedures and results of steps 500-540 to update machine-learning training corpus 435. 
The updated corpus 435 is then submitted to the customization module 415 in order to train the automation engine 410 to more intelligently customize future scripts. This training process may continue until automation engine 410 is deemed to have a success rate in customizing retrieved scripts that is at least equivalent to that of a human expert. In some embodiments, the machine-language training steps will continue indefinitely, allowing the automation engine 410 to continuously improve itself.”).
Shwartz does not disclose the aspect of repeating the steps of running the simulations on the training data using the ML model and modifying the policy network, the reward function, or both, until convergence is achieved.
However, Yang discloses repeating the steps of running the simulations on the training data using the ML model and modifying the policy network, the reward function, or both, until convergence is achieved (see figure 6, wherein the steps are repeated until convergence is achieved, paragraph 79: “In at least one embodiment, at decision block 608, a computer system determines whether a reward determined at block 606 indicates that training of said second network is sufficient. In at least one embodiment, a reward is sufficient when it indicates that accuracy of output of said second network exceeds a desired threshold value. In at least one embodiment, a reward is sufficient when it indicates that a convergence rate associated with training said second network exceeds a target threshold value. In at least one embodiment, as a result of having determined that said reward is sufficient, execution advances to block 610 and training of said second network is complete. In at least one embodiment, if said computer system determines that said reward is insufficient, execution advances to block 612, where a process of retraining said second network with updated hyperparameters is initiated.”). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to apply Yang to Shwartz so that the system can use a reward function to determine more accurately whether convergence is achieved, and so that the reward function can help improve the ML model and help achieve convergence, whereby the robot performs its function correctly and effectively.
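For illustration only, the repeated sequence addressed in claim 3 (run the simulation, test for convergence, request human assistance, record the human's actions, modify the policy, and repeat) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Shwartz or Yang: every function name, the dictionary used as a stand-in for the policy, the threshold, and the toy "login"/"export" steps are hypothetical.

```python
def train_until_convergence(run_simulation, reward_fn, policy, ask_human,
                            threshold=1.0, max_rounds=10):
    """Repeat simulate -> check reward -> (human help -> update policy) until converged.

    All callables and the dict-based policy are hypothetical stand-ins.
    """
    remediation_log = []  # recorded human actions (cf. the remediation log of step 525)
    for round_no in range(1, max_rounds + 1):
        outcome = run_simulation(policy)
        if reward_fn(outcome) >= threshold:   # convergence criterion met
            return policy, remediation_log, round_no
        actions = ask_human(outcome)          # request human assistance
        remediation_log.extend(actions)       # monitor and record the actions
        for key, value in actions:            # modify the policy based on them
            policy[key] = value
    raise RuntimeError("no convergence within max_rounds")


# Toy usage: the "policy" maps a failing step to a fix supplied by the human.
def simulate(policy):
    return [step for step in ("login", "export") if step not in policy]


def reward(failing_steps):
    return 1.0 if not failing_steps else 0.0


def human(failing_steps):
    return [(failing_steps[0], "manual fix")]


policy, log, rounds = train_until_convergence(simulate, reward, {}, human)
```

In this toy run the human fixes one failing step per round, so the loop converges on the third round with two recorded actions.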


With regard to claim 10:
Shwartz discloses A computer-implemented method for training a machine learning (ML) model for robotic process automation (RPA) using reinforcement learning (The machine learning model is trained to perform automation functions, paragraph 13: “Another embodiment of the present invention is a cognitive automation-engine system that is trained by methods of machine learning. This system consists of hardware and software modules organized and connected in a specific structure to provide improved, artificially intelligent script-customization functionality. These modules include a processor, memory, and a computer-readable hardware storage device containing program code configured that is run by the processor to perform a method for machine-trainable automated-script customization. A script library stores previously recorded automation scripts. A customization-recorder module monitors and records identifications of data structures and activities related to the customization of a previously stored (or newly derived) automation script. These recording may include identifications of: an unexpected event that produces an adverse effect on a computing environment, a selection of a previously stored automation script from the script library, a customized script generated by applying customization steps to the selected script, the customization steps, a relative risk that running an automation script comprising an instruction associated with a first customization step of the customization steps would adversely affect operation of the computing environment, and an aggregate risk that running the customized script would adversely affect operation of the computing environment. An engine-training module trains a customization module to intelligently customize automation scripts. One or more corpora store historical training material that is submitted to the customization module by the engine-training module during machine-learning training sessions. 
The customization module, in response to receiving notice that an unexpected event has disrupted the system's computing environment, the customization module selects, from a script library, an automation script that is configured to address the type of problems caused by the disruption. The customization module then attempts to intelligently customize the selected script into a customized script that more specifically addresses the particular type of disruption associated with the unexpected event. If this attempt fails, the system requests assistance from a human expert. In either case, the customization module directs the customization recorder to record human or automated activities leading to each identified customization step. The customization module then uses cognitive methods, learned through the machine-learning trainings sessions, to identify a relative risk to the computing environment posted by each customization step and a resulting aggregate risk of running the entire customized script. If the aggregate risk exceeds a certain threshold limit, the customization module requests special authorization to run the customized script. If the script has low aggregate risk or is otherwise authorized to be run, the customization module generates and runs the script, adds the customized script to the script library, and directs a recording module to record the result of running the script. The customization module then adds to the one or more machine-learning corpora information describing the unexpected event, the adverse effects, the original and customized scripts, the customization steps, the degrees of risk of the customized script and of each step, and the results of running the customized script. 
The automation engine directs the engine-training module to use the updated corpora during the next machine-learning training session in order to train the customization module to more intelligently customize automation scripts.”), comprising: running simulations on training data using the ML model (See figure 5 for training automation using training data such as historic data, paragraph 86: “In embodiments of the present invention, system-management platform 405 comprises a cognitive automation engine 410 that automatically customizes and runs automation scripts in response to the occurrence of an unexpected condition in computing environment 4000. System-management platform 405 also comprises a training module 430 that trains the automation engine 410 to intelligently customize template scripts. This training is performed by methods of machine-learning that submit training data, in the form of corpus 435, to automation engine 410. In some embodiments, training module 430 or corpus 435 may be components of automation engine 410. Automation engine 410 comprises, among other components, a trainable customization module 415 that learns how to customize automation scripts and template scripts, retrieved from a script library 420, by analyzing historical information contained in one or more machine-learning corpora 435 submitted to the customization module 415 by engine-training module 430. Training module 430 determines in part how to update corpus 435 by analyzing historical recordings or logs from which may be inferred the success rates and results of previous attempts to customize a script. 
These recordings and logs may be gathered by customization-recorder module 425 from either automated or human-implemented script customizations.”), the ML model having a performance function; and when the ML model does not achieve convergence defined by the performance function based on one or more criteria (the system determines whether there are performance issues, such as engine 410 being unable to locate a script, paragraphs 98 and 100: “In certain instances, automation engine 410 may not run a retrieved script. This may happen if engine 410 is unable to locate a script that is appropriate or capable of addressing the unexpected event, or if engine 410 determines that the script requires such extensive customization that it would be impossible to successfully run the script in its original form. In some cases, a template script may even have been expressly designed so as to require customization before being performed. In step 515, the automation engine 410 or management platform 405 determines whether the automatic remedial action was successful. This step may be performed by any means known in the art, such as through system calls to a host operating system or hypervisor, through API calls, or through transactional messages exchanged with an application. In certain embodiments, this determination may be made as a function of an interactive or other communication with an administrator or other user. If the system in step 515 determines that the automated remediation was not successful, the automation engine 410 or management platform 405 requests that a human expert, such as a system administrator or application specialist, manually customize the retrieved script (or select a different script), using expert knowledge.”): monitoring actions taken by a human on a computing system (the action of the expert is monitored, paragraph 103: “In step 525, recorder 425 stores the monitored customization steps in a remediation log. 
This log may take any form known in the art that is capable of storing information from which may be extracted or inferred semantic meanings of each customization step and the result of performing the customized script. In its simplest form, a remediation log could be a simple keystroke log that merely records the commands typed by a human expert when editing the retrieved script. If an expert edits the script in an integrated application development environment, the log might include only relevant logical revisions, such as a listing of each added or revised instruction generated by the expert.”), and modifying a policy network for the ML model, the performance function, or both, based on the actions taken by the human (the system is trained based on user action, paragraph 118 to 120: “ In step 540, customization module 415 uses these risk determinations to assign an overall degree of risk to the entire customized script. This overall risk is then used to determine whether to run the customized script. In step 545, customization module 415 uses the procedures and results of steps 500-540 to update machine-learning training corpus 435. The updated corpus 435 is then submitted to the customization module 415 in order to train the automation engine 410 to more intelligently customize future scripts. This training process may continue until automation engine 410 is deemed to have a success rate in customizing retrieved scripts that is at least equivalent to that of a human expert. In some embodiments, the machine-language training steps will continue indefinitely, allowing the automation engine 410 to continuously improve itself.
”); and completing the steps of running the simulations on the training data using the ML model (Shwartz: see figure 5 for training automation using training data such as historic data, paragraph 86: “In embodiments of the present invention, system-management platform 405 comprises a cognitive automation engine 410 that automatically customizes and runs automation scripts in response to the occurrence of an unexpected condition in computing environment 4000. System-management platform 405 also comprises a training module 430 that trains the automation engine 410 to intelligently customize template scripts. This training is performed by methods of machine-learning that submit training data, in the form of corpus 435, to automation engine 410. In some embodiments, training module 430 or corpus 435 may be components of automation engine 410. Automation engine 410 comprises, among other components, a trainable customization module 415 that learns how to customize automation scripts and template scripts, retrieved from a script library 420, by analyzing historical information contained in one or more machine-learning corpora 435 submitted to the customization module 415 by engine-training module 430. Training module 430 determines in part how to update corpus 435 by analyzing historical recordings or logs from which may be inferred the success rates and results of previous attempts to customize a script. These recordings and logs may be gathered by customization-recorder module 425 from either automated or human-implemented script customizations.”), monitoring the actions taken by the human on the computing system (Shwartz: the action of the expert is monitored, paragraph 103: “In step 525, recorder 425 stores the monitored customization steps in a remediation log. 
This log may take any form known in the art that is capable of storing information from which may be extracted or inferred semantic meanings of each customization step and the result of performing the customized script. In its simplest form, a remediation log could be a simple keystroke log that merely records the commands typed by a human expert when editing the retrieved script. If an expert edits the script in an integrated application development environment, the log might include only relevant logical revisions, such as a listing of each added or revised instruction generated by the expert.”), and modifying the policy network, the reward function, or both, to achieve convergence (Shwartz: the system is trained based on user action, paragraphs 118 to 120: “In step 540, customization module 415 uses these risk determinations to assign an overall degree of risk to the entire customized script. This overall risk is then used to determine whether to run the customized script. In step 545, customization module 415 uses the procedures and results of steps 500-540 to update machine-learning training corpus 435. The updated corpus 435 is then submitted to the customization module 415 in order to train the automation engine 410 to more intelligently customize future scripts. This training process may continue until automation engine 410 is deemed to have a success rate in customizing retrieved scripts that is at least equivalent to that of a human expert. In some embodiments, the machine-language training steps will continue indefinitely, allowing the automation engine 410 to continuously improve itself.”).
Shwartz does not disclose the aspect wherein the performance function is a reward function, and repeating the steps of running the simulations on the training data using the ML model and modifying the policy network, the reward function, or both, until convergence is achieved.
However, Yang discloses the ML model having a reward function; determining whether or not the ML model achieves convergence defined by the reward function based on one or more criteria (the system can determine that the training is insufficient if convergence is not met, paragraph 79: “In at least one embodiment, at decision block 608, a computer system determines whether a reward determined at block 606 indicates that training of said second network is sufficient. In at least one embodiment, a reward is sufficient when it indicates that accuracy of output of said second network exceeds a desired threshold value. In at least one embodiment, a reward is sufficient when it indicates that a convergence rate associated with training said second network exceeds a target threshold value. In at least one embodiment, as a result of having determined that said reward is sufficient, execution advances to block 610 and training of said second network is complete. In at least one embodiment, if said computer system determines that said reward is insufficient, execution advances to block 612, where a process of retraining said second network with updated hyperparameters is initiated.”); and repeating the steps of running the simulations on the training data using the ML model and modifying the policy network, the reward function, or both, until convergence is achieved (see Figure 6, wherein the steps are repeated until convergence is achieved, paragraph 79: “In at least one embodiment, at decision block 608, a computer system determines whether a reward determined at block 606 indicates that training of said second network is sufficient. In at least one embodiment, a reward is sufficient when it indicates that accuracy of output of said second network exceeds a desired threshold value. In at least one embodiment, a reward is sufficient when it indicates that a convergence rate associated with training said second network exceeds a target threshold value. 
In at least one embodiment, as a result of having determined that said reward is sufficient, execution advances to block 610 and training of said second network is complete. In at least one embodiment, if said computer system determines that said reward is insufficient, execution advances to block 612, where a process of retraining said second network with updated hyperparameters is initiated.”). It would have been obvious to one of ordinary skill in the art, at the time the filing was made, to apply Yang to Shwartz so that the system can use a reward function to determine more accurately whether convergence is achieved, and so that the reward function can help improve the ML model and help achieve convergence, whereby the robot performs its functions correctly and effectively.
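For illustration only, the repeat-until-sufficient flow quoted from Yang paragraph 79 (decision block 608, completion at block 610, retraining at block 612) can be sketched as follows. The function names, the single learning-rate hyperparameter, the halving rule, and the toy trainer are all hypothetical assumptions made for this sketch; none of them appears in Yang or Shwartz.

```python
from typing import Callable, Tuple


def train_until_sufficient(
    train_once: Callable[[float], float],
    reward_threshold: float,
    learning_rate: float = 0.1,
    max_rounds: int = 50,
) -> Tuple[bool, int, float]:
    """Repeat training rounds until the reward exceeds the threshold.

    train_once is a hypothetical stand-in: it takes one hyperparameter
    (here a learning rate) and returns the reward observed for that round.
    Returns (sufficient, rounds_used, last_reward).
    """
    reward = float("-inf")
    for round_index in range(1, max_rounds + 1):
        reward = train_once(learning_rate)
        if reward > reward_threshold:
            # Analogous to advancing to block 610: training is complete.
            return True, round_index, reward
        # Analogous to block 612: retrain with updated hyperparameters.
        # Halving the learning rate is an arbitrary choice for this sketch.
        learning_rate *= 0.5
    return False, max_rounds, reward


def make_toy_trainer() -> Callable[[float], float]:
    """Toy trainer whose reward rises by 0.2 per round (illustration only)."""
    state = {"reward": 0.0}

    def train_once(learning_rate: float) -> float:
        state["reward"] += 0.2
        return state["reward"]

    return train_once
```

Under these assumptions, calling train_until_sufficient(make_toy_trainer(), reward_threshold=0.5) stops on the third round, the first round whose reward exceeds the threshold.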


With regard to claim 11:
Shwartz and Yang disclose the computer-implemented method of claim 10, wherein the method steps are performed by the RPA robot (Shwartz, the training is automated, paragraphs 118 to 120: “In step 540, customization module 415 uses these risk determinations to assign an overall degree of risk to the entire customized script. This overall risk is then used to determine whether to run the customized script. In step 545, customization module 415 uses the procedures and results of steps 500-540 to update machine-learning training corpus 435. The updated corpus 435 is then submitted to the customization module 415 in order to train the automation engine 410 to more intelligently customize future scripts. This training process may continue until automation engine 410 is deemed to have a success rate in customizing retrieved scripts that is at least equivalent to that of a human expert. In some embodiments, the machine-language training steps will continue indefinitely, allowing the automation engine 410 to continuously improve itself.”).


Claims 4, 5, 12, 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shwartz et al., Pub. No.: 2020/0097357A1, in view of Yang and further in view of Sirianni et al., Pub. No.: 2020/0374298A1. 
With regard to claims 4 and 12:
Shwartz and Yang disclose the computer-implemented method of claim 3, wherein after convergence is achieved, the method further comprises: deploying the ML model (Shwartz, paragraph 72: “Any of the components of the present invention could be created, integrated, hosted, maintained, deployed, managed, serviced, supported, etc. by a service provider who offers to facilitate a method for machine-trainable automated-script customization. Thus the present invention discloses a process for deploying or integrating computing infrastructure, comprising integrating computer-readable code into the computer system 301, wherein the code in combination with the computer system 301 is capable of performing a method for machine-trainable automated-script customization.”).
Shwartz and Yang do not disclose the aspect of calling the ML model at runtime, by an RPA robot.
However, Sirianni discloses the aspect of deploying the ML model; and calling the ML model at runtime, by an RPA robot (paragraph 30: “The deployment module 204 may deploy the trained machine-learning model to determine anomalous behavior of the system during runtime. For example, the deployment module 204 may take the system parameters and return values of system calls during runtime as input of the machine-learning model, and output the detection results of the anomalous behavior of the system.”). It would have been obvious to one of ordinary skill in the art, at the time the filing was made, to apply Sirianni to Shwartz and Yang so that the trained ML model can be deployed at runtime to perform the functions for which it was trained.

With regard to claims 5 and 13:
Shwartz, Yang, and Sirianni disclose the computer-implemented method of claim 4, wherein the deployment of the ML model comprises modifying one or more activities in an RPA workflow implemented by the RPA robot to call the trained ML model (Sirianni, paragraphs 29 and 30: “The training module 202 may monitor a system during the testing and evaluation phase to learn the normal behavior of the system. The training module may use different algorithms to train a machine-learning model. For example, the training module 202 may employ one or more algorithms and/or combination of different algorithms, such as n-gram, neural networks (NNs), support vector machine (SVMs), decision trees, linear and logistic regression, clustering, association rules, and scorecards for the machine-learning model training. The deployment module 204 may deploy the trained machine-learning model to determine anomalous behavior of the system during runtime. For example, the deployment module 204 may take the system parameters and return values of system calls during runtime as input of the machine-learning model, and output the detection results of the anomalous behavior of the system.”). It would have been obvious to one of ordinary skill in the art, at the time the filing was made, to apply Sirianni to Shwartz and Yang so that the trained ML model can be used at runtime to perform the functions for which it was trained.

Claims 6, 7, 14, 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shwartz et al., Pub. No.: 2020/0097357A1, in view of Yang and further in view of Guelman, Pub. No.: 2019/0279109A1. 
With regard to claims 6 and 14:
	Shwartz and Yang do not disclose the computer-implemented method of claim 4, further comprising: detecting, by the RPA robot, that performance of the ML model is declining beyond a predetermined performance threshold; and retraining the ML model until convergence is achieved.
	However, Guelman discloses the computer-implemented method of claim 4, further comprising: detecting, by the RPA robot, that performance of the ML model is declining beyond a predetermined performance threshold; and retraining the ML model until convergence is achieved (based on the determined features or causes of the deteriorating performance, MPM system 100 may be configured to generate a second set of training data to re-train and improve the performance of the machine learning model, paragraph 47: “Based on the performance data sets, MPM system 100 may be configured to detect, over time, that the machine learning model 130 has a deteriorating performance and in turn, determine one or more features that may have contributed to or caused the deteriorating performance. Based on the determined features or causes of the deteriorating performance, MPM system 100 may be configured to generate a second set of training data to re-train and improve the performance of the machine learning model 130. For example, MPM system 100 may determine, based on a mapping of the first set of training data and the output of the machine learning model, that the population stability is low, which means that the first set of training data is likely outdated. This may indicate that the medical images currently processed by the model belong to a population that have a different feature, such as a different mean age, compared to the first set of medical images that were used as training data to train the model. In this case, MPM system 100 may be configured to generate a second set of training data that may have the proper feature such as the correct mean age, in order to improve the performance of machine learning model 130.”). 
It would have been obvious to one of ordinary skill in the art, at the time the filing was made, to apply Guelman to Shwartz and Yang so that the system can determine the performance threshold by determining the performance of the model over time to see whether the performance has declined, for a concrete determination of the condition of the model and of whether user intervention is required.
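For illustration only, detecting that model performance has declined beyond a predetermined threshold, as discussed for claims 6 and 14, can be sketched with a rolling accuracy window. The class name, the window size, and the use of accuracy as the performance measure are hypothetical assumptions for this sketch; Guelman's MPM system 100 is not described at this level of detail in the quoted passage.

```python
from collections import deque


class PerformanceMonitor:
    """Track recent prediction outcomes and flag when retraining is needed."""

    def __init__(self, threshold: float, window: int = 5) -> None:
        self.threshold = threshold            # minimum acceptable accuracy
        self.outcomes = deque(maxlen=window)  # most recent outcomes only

    def record(self, correct: bool) -> None:
        """Store one outcome: True for a correct prediction, else False."""
        self.outcomes.append(1.0 if correct else 0.0)

    def accuracy(self) -> float:
        """Accuracy over the current window (0.0 when nothing is recorded)."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """True once a full window's accuracy falls below the threshold."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough observations yet
        return self.accuracy() < self.threshold
```

In this sketch a caller would record each runtime outcome and start a retraining pass whenever needs_retraining() returns True; waiting for a full window avoids reacting to the first few observations.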

With regard to claims 7 and 15:
	Shwartz, Yang, and Guelman disclose the computer-implemented method of claim 6, wherein the predetermined performance threshold comprises detection accuracy or a frequency with which convergence is achieved without user action (Yang, the system can determine that the training is insufficient if convergence is not met, paragraph 79: “In at least one embodiment, at decision block 608, a computer system determines whether a reward determined at block 606 indicates that training of said second network is sufficient. In at least one embodiment, a reward is sufficient when it indicates that accuracy of output of said second network exceeds a desired threshold value. In at least one embodiment, a reward is sufficient when it indicates that a convergence rate associated with training said second network exceeds a target threshold value. In at least one embodiment, as a result of having determined that said reward is sufficient, execution advances to block 610 and training of said second network is complete. In at least one embodiment, if said computer system determines that said reward is insufficient, execution advances to block 612, where a process of retraining said second network with updated hyperparameters is initiated.”). It would have been obvious to one of ordinary skill in the art, at the time the filing was made, to apply Guelman to Shwartz and Yang so that the system can determine the performance threshold by determining the performance of the model over time to see whether the performance has declined, for a concrete determination of the condition of the model and of whether user intervention is required.

Claims 8 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shwartz et al., Pub. No.: 2020/0097357A1, in view of Yang and further in view of Zhang, Pub. No.: 2019/0354746A1. 
With regard to claims 8 and 16:
Shwartz and Yang do not disclose the computer-implemented method of claim 1, wherein the one or more criteria comprise a predetermined number of trials, a predetermined amount of time, or a combination thereof.
However, Zhang discloses the aspect wherein the one or more criteria comprise a predetermined number of trials, a predetermined amount of time, or a combination thereof (paragraph 137: “In some embodiments, the deep neural network includes two branches, for example, the first branch includes a plurality of first convolutional layers, and the second branch includes a plurality of second convolutional layers. The deep neural network also includes an input layer, a fully connected layer, etc. The parameters of the plurality of first convolutional layers are the same as or different from those of the plurality of second convolutional layers according to requirements. The deep neural network is trained by using a training picture set, and during training, and back propagation is performed on the deep neural network by means of a set loss function, so that a more desired output is obtained for the input of the next training through the deep neural network of which the parameters are adjusted by means of the back propagation. If a set training condition is satisfied, for example, the loss obtained according to the output reaches a certain threshold, or the training is performed a certain number of times, it is considered that the deep neural network satisfies a convergence condition, and the training is stopped to obtain the trained deep neural network.”). It would have been obvious to one of ordinary skill in the art, at the time the filing was made, to apply Zhang to Shwartz and Yang for a more specific and precise model training system that gives replicable results and gives the user a better understanding of whether the training is successful and of the situations in which human assistance would be required.
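For illustration only, the criteria recited in claims 8 and 16 (a predetermined number of trials, a predetermined amount of time, or a combination thereof) can be sketched as a bounded loop. The function name, its parameters, and the injectable clock are hypothetical assumptions for this sketch; the quoted passage of Zhang mentions stopping after a certain number of training iterations or on a loss threshold and does not describe this code.

```python
import time
from typing import Callable, Optional, Tuple


def run_trials(
    step: Callable[[], None],
    max_trials: Optional[int] = None,
    max_seconds: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, str]:
    """Run step() until a trial limit, a time limit, or either one is hit.

    At least one limit must be given, otherwise the loop would never end.
    Returns (trials_completed, reason_for_stopping).
    """
    if max_trials is None and max_seconds is None:
        raise ValueError("provide max_trials, max_seconds, or both")
    start = clock()
    trials = 0
    while True:
        if max_trials is not None and trials >= max_trials:
            return trials, "trial limit"
        if max_seconds is not None and clock() - start >= max_seconds:
            return trials, "time limit"
        step()  # one training or simulation trial (caller-supplied)
        trials += 1
```

Passing both limits gives the "combination thereof" case, where whichever limit is reached first ends the run. The clock parameter exists only so the time limit can be exercised deterministically.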


Claims 9 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shwartz et al., Pub. No.: 2020/0097357A1, in view of Yang and further in view of Morales, Pub. No.: 2019/0347301A1. 
With regard to claims 9 and 17:
Shwartz and Yang do not disclose the computer-implemented method of claim 1, wherein the monitoring of the actions taken by the human comprises monitoring application programming interface (API) calls made based on the actions taken by the human.
However, Morales discloses the aspect wherein the monitoring of the actions taken by the human comprises monitoring application programming interface (API) calls made based on the actions taken by the human (the identity API can be called directly in order to track a user's activity, paragraph 95: “In some embodiments, activity 704 can comprise determining the one or more user segments in constant time. In the same or other embodiments, method 700 further can comprise determining an update to the one or more user conditions of the user and dynamically updating the one or more user segments based at least in part on the update to the one or more user conditions of the user. In some embodiments, determining the update to the one or more user conditions of the user can comprise tracking the user through a different channel (e.g., collecting data on the user when the user is watching a television show and using the data or user conditions on another channel (e.g., an application), such as introducing an advertisement for the television show) and compiling a global user identity for the user. In some embodiments, the identity API can be called directly in order to track a user's activity (including events and other user conditions) and/or receive or display personalized content.”). It would have been obvious to one of ordinary skill in the art, at the time the filing was made, to apply Morales to Shwartz and Yang to use API calls instead of a keystroke tracker to more accurately track user actions, so that the system can precisely duplicate user actions and perform the correct steps without having to convert key-press input into application actions.

Pertinent Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Taylor, Pub. No.: 2019/0236455A1: In some embodiments, an optional learning speed monitoring engine 108 may be provided. Engine 108 is configured, in some embodiments to track the progress of the machine learning model in achieving rewards, tracked in an optional training performance storage 152. In an embodiment, responsive to identification that the ability of the machine learning model to obtain rewards has not improved in a number of epochs (e.g., indicating that a convergence is not occurring quickly enough or not at all), a notification is generated requesting additional demonstrator data to help the machine learning model improve.

Takigawa et al., Pub. No.: US 2019/0235481A1, In the machine learning device, the learning unit may have at least one value function to which a learning result is reflected, and include an reward calculation unit and a value function update unit; when the failure occurrence state which is confirmed agrees with the failure occurrence state which is included in the outputted quantitative failure occurrence mechanism, the reward calculation unit may set a plus reward; when there is a difference between the failure occurrence state which is confirmed and the failure occurrence state which is included in the outputted quantitative failure occurrence mechanism, the reward calculation unit may set a first minus reward depending on a magnitude of the difference; when the failure occurrence state which is confirmed is inconsistent with the physical phenomenon or the physical mechanism included in the quantitative failure occurrence mechanism which is estimated in collation with the physical model, the reward calculation unit may set a second minus reward which is larger than the first minus reward; and the value function update unit may update the value function depending on the plus reward or the first or second minus reward set by the reward calculation unit.
	

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DI XIAO whose telephone number is (571)270-1758. The examiner can normally be reached 9 AM to 5 PM EST, Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Renee Chavez, can be reached at (571)270-1104. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DI XIAO/Primary Examiner, Art Unit 2179