DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 9/16/2022.
Applicant arguments/remarks made in amendment filed 9/16/2022.
Claims 1, 8, and 15 are amended.
Claims 1-20 are presented for examination.
Response to Arguments
Applicant presents several arguments.  Each is addressed. 
Applicant’s arguments, combined with the amendments, that claims 1-20 recite patent-eligible subject matter are persuasive. (Remarks, page 1, paragraph 2, line 1.) The rejections under 35 U.S.C. § 101 are withdrawn.
Applicant’s arguments that the prior art of record does not teach the amended claims are moot in view of new grounds of rejection necessitated by amendments. (Remarks, page 4, paragraph 2, line 1.) See below for detailed rejection.
Applicant’s arguments with respect to the prior art rejections of the dependent claims rely upon features recited in the independent claims. Therefore, the dependent claims are rejected for depending from rejected claims.  See below for detailed rejection.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-6, 8, 11-13, 15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Jin et al. (Automated Behavioral Regression Testing, herein Jin), Pei et al. (DeepXplore: Automated Whitebox Testing of Deep Learning Systems, herein Pei), Bhattacharjee et al. (IBM Deep Learning Service, herein Bhattacharjee), and Foo et al. (Mining Performance Regression Testing Repositories for Automated Performance Analysis, herein Foo).
Regarding claim 1,
	Jin teaches a computer-implemented method (Jin, page 137, column 1, line 12 “To address this issue, we propose a novel approach called Behavioral Regression Testing (BERT).”  In other words, Behavioral Regression Testing is a computer-implemented method.)
	for automatic regression detection in [machine learning (ML) models], the method being executed by one or more processors (Jin, Figure 4, and page 137, column 1, line 12 “To address this issue, we propose a novel approach called Behavioral Regression Testing (BERT).  Given two versions of a program, BERT identifies behavioral differences between the two versions through dynamical analysis, in three steps.  First, it generates a large number of test inputs that focus on the changed parts of the code.  Second, it runs the generated test inputs on the old and new versions of the code and identifies differences in the tests’ behavior.  Third, it analyzes the identified differences and presents them to the developers.”

    [Figure: Jin, Figure 4 (media_image1.png)]

In other words, BERT is a computer-implemented method, and behavioral regression testing is automatic regression detection.) and comprising:
	[determining, by an automated regression detection system (ARDS), that training of a [ML model] is complete, by: periodically checking, by a training and resource monitoring module of the ARDS, status of the container, within which the training of the ML model is performed,]
	the [ML model] comprising a version of a previously trained [ML model], the training and resource monitoring module being configured to allocate resources and monitor all training jobs executed by a ML server including the training of the ML model, and (Jin, Figure 4, and page 139, column 2, paragraph 3, line 1 “Figure 4 provides a high-level view of our approach compared to traditional regression testing.  In traditional regression testing (e.g., [3] – [5]), an existing test suite (T0) defined for the old version of a program (V0) is run on the modified version of a program (V1). Non-obsolete test cases that, according to their oracle, fail on V1 and did not fail on V0 are reported to the developers as warnings that may indicate the presence of regression faults.” In other words, V0 is the previously trained version, and test suite T0 is the training and resource monitoring module.)
[determining that the status of the container indicates that the container is shutdown indicating that the training of the ML model is complete;] 
and in response to determining that a software modification/[training of the ML model] is complete, automatically, by the ARDS: retrieving the [ML model], executing regression testing and detection using the [ML model], (Jin, page 137, column 1, paragraph 1, line 1 “When a program is modified during software evolution, developers typically run the new version of the program against its existing test suite to validate that the changes made on the program did not introduce unintended side effects (i.e., regression faults).” And, page 137, column 1, paragraph 1, line 12 “To address this issue, we propose a novel approach called Behavioral Regression Testing (BERT).” And, page 137, column 2, paragraph 2, line 8 “The goal of BERT is to accurately and automatically identify behavioral differences between two versions of a program by means of dynamic analysis.”  And, page 139, column 2, paragraph 3, line 1 “Figure 4 provides a high-level view of our approach compared to traditional regression testing.  In traditional regression testing (e.g., [3] – [5]), an existing test suite (T0) defined for the old version of a program (V0) is run on the modified version of a program (V1). Non-obsolete test cases that, according to their oracle, fail on V1 and did not fail on V0 are reported to the developers as warnings that may indicate the presence of regression faults.” In other words, BERT is an automated regression detection system (ARDS), and accurately and automatically identifying behavioral differences between the two versions of the program is automatically … retrieving and executing regression testing and detection.)
	generating regression results relative to the previously trained [ML model] (Jin, page 139, column 2, paragraph 3, line 1 “Figure 4 provides a high-level view of our approach compared to traditional regression testing.  In traditional regression testing (e.g., [3] – [5]), an existing test suite (T0) defined for the old version of a program (V0) is run on the modified version of a program (V1). Non-obsolete test cases that, according to their oracle, fail on V1 and did not fail on V0 are reported to the developers as warnings that may indicate the presence of regression faults.” In other words, run on the modified version of a program (V1) is generating regression results relative to the previously trained version, and reported to the developers is publishing the regression results.),
	[the regression results comprising a first set of results representative of performance of the previously trained ML model and a second set of results representative of performance of the ML model, each of the first set of results and the second set of results including a set of keys, and,
	for each key in the set of keys, a value for each attribute in a set of attributes is provided, the set of attributes comprising proposal rate, accuracy, and auto rate for each class of a set of classes predicted by the previously trained ML model and the ML model, and publishing the results.]
	Thus far, Jin does not explicitly teach machine learning (ML) models.
	Pei teaches machine learning (ML) models (Pei, page 1, column 1, paragraph 2, line 1 “We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems.” In other words, DL (deep learning) systems is machine learning (ML) models.)
	Both Jin and Pei are directed to software testing.  In view of the teaching of Jin, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Pei into Jin.  This would result in being able to perform regression tests on machine learning systems.
	One of ordinary skill in the art would be motivated to do so in order to automatically perform regression tests on retrained machine learning models for large enterprises. (Pei, page 1, column 1, paragraph 1, line 1 “Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including self-driving cars and malware detection, where the correctness and predictability of a system’s behavior for corner case inputs are of great importance. Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs.  We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems.”)
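For illustration only, the behavioral-difference idea quoted from Jin above (run the same test inputs on the old and new versions and report the inputs whose outputs differ) can be sketched as follows. This is a minimal sketch under stated assumptions, not Jin's implementation and not the claimed method: the function `behavioral_differences` and the two toy versions `old_abs` and `new_abs` are hypothetical names introduced here.

```python
from typing import Any, Callable, Iterable, List, Tuple


def behavioral_differences(
    old_version: Callable[[Any], Any],
    new_version: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> List[Tuple[Any, Any, Any]]:
    """Run each input on both versions and collect the inputs whose outputs differ."""
    diffs = []
    for x in inputs:
        old_out = old_version(x)
        new_out = new_version(x)
        if old_out != new_out:
            diffs.append((x, old_out, new_out))
    return diffs


# Hypothetical old version (V0): absolute value.
def old_abs(x: int) -> int:
    return x if x >= 0 else -x


# Hypothetical modified version (V1) with an unintended change:
# negative inputs are returned unchanged instead of being negated.
def new_abs(x: int) -> int:
    return x


diffs = behavioral_differences(old_abs, new_abs, [-2, 0, 3])
for x, old_out, new_out in diffs:
    print(f"input={x}: old={old_out}, new={new_out}")
```

Only the input on which the two versions disagree is reported; inputs with identical behavior produce no warning.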
	Thus far, the combination of Jin and Pei does not explicitly teach determining, by an automated regression detection system (ARDS), that training of a ML model is complete by periodically checking, by a training and resource monitoring module of the ARDS, status of the container, within which the training of the ML model is performed.
	Bhattacharjee teaches determining, …, that training of a ML model is complete by periodically checking,…, status of the container, within which the training of the ML model is performed (Bhattacharjee, Figure 3, and page 10, paragraph 4, line 4, “Each container holding either a learner or a parameter server shard is allocated a unique znode (Zookeeper path) by the LCM before deployment; a sidecar (auxiliary) process called the “watchdog” in the container monitors the learner/parameter server and updates its status in the corresponding znode.  Status updates can then be read by LCM from Zookeeper.  Through status monitoring, the LCM can determine when all learners have finished training, decommission them and reclaim computing resources allocated to them.” 

    [Figure: Bhattacharjee, Figure 3 (media_image2.png)]

In other words, learner is ML model, container is container, status monitoring is periodically checking status, and LCM can determine when all learners have finished training is determining that the status of the container indicates that the container is shutdown indicating that the training of the ML model is complete.),
	Bhattacharjee teaches determining that the status of the container indicates that the container is shutdown indicating that the training of the ML model is complete (Bhattacharjee, Figure 3, and page 10, paragraph 4, line 4, “Each container holding either a learner or a parameter server shard is allocated a unique znode (Zookeeper path) by the LCM before deployment; a sidecar (auxiliary) process called the “watchdog” in the container monitors the learner/parameter server and updates its status in the corresponding znode.  Status updates can then be read by LCM from Zookeeper.  Through status monitoring, the LCM can determine when all learners have finished training, decommission them and reclaim computing resources allocated to them.”  In other words, learner is ML model, container is container, status monitoring is determining the status of the container, and determine when all learners have finished training is determining that the training of the ML model is complete.);
	Both Bhattacharjee and the combination of Jin and Pei are directed to testing software that is part of a larger enterprise. The combination of Jin and Pei teaches an automated regression detection system.  It would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Bhattacharjee into the combination of Jin and Pei, thereby creating a combination that teaches determining, by an automated regression detection system (ARDS), that training of a ML model is complete by periodically checking status of the container, within which the training of the ML model is performed, and in response to the training being complete, performing regression testing on the model.
	One of ordinary skill in the art would be motivated to do this because training deep learning models has become increasingly complex and computationally intensive, making it prohibitive for typical users to accomplish. (Bhattacharjee, “Training deep neural networks, known as deep learning, is currently highly complex and computationally intensive. While GPUs have helped accelerate training, the amount of data as well as complexity of models have increased the computation need beyond the capability of a single GPU. For example, training on 2.5 million images on a single GPU can take 6 days on a simple model [3].  A typical user of deep learning, a data scientist, is also unnecessarily exposed to the details of the underlying hardware and software infrastructure, including configuring expensive GPU machines, installing deep learning libraries, and managing the jobs during execution to handle failures and recovery.”)
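For illustration only, the limitation of periodically checking the status of a container and treating a shutdown status as an indication that training is complete can be sketched as a polling loop. This is a minimal sketch under stated assumptions: the function `wait_for_training_complete`, the `get_status` callable, and the status strings "running" and "shutdown" are hypothetical and are not taken from Bhattacharjee, Zookeeper, or any container runtime API.

```python
import time
from typing import Callable


def wait_for_training_complete(
    get_status: Callable[[], str],
    poll_interval_s: float = 0.0,
    max_polls: int = 100,
) -> int:
    """Poll the container status until it reports 'shutdown'.

    Returns the number of polls that were needed. Raises TimeoutError if the
    container never reports 'shutdown' within max_polls checks.
    """
    for attempt in range(1, max_polls + 1):
        if get_status() == "shutdown":
            return attempt
        time.sleep(poll_interval_s)
    raise TimeoutError("container did not shut down within the polling budget")


# Simulated container whose status changes over successive checks (hypothetical data).
simulated_statuses = iter(["running", "running", "shutdown"])
polls_needed = wait_for_training_complete(lambda: next(simulated_statuses))
print(f"training reported complete after {polls_needed} status checks")
```

In a real deployment the `get_status` callable would wrap whatever status source is available (for example, a status record written by a monitoring process), and regression testing would be triggered once the loop returns.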
	Thus far, the combination of Jin, Pei and Bhattacharjee does not explicitly teach the regression results comprising a first set of results representative of performance of the previously trained ML model and a second set of results representative of performance of the ML model, each of the first set of results and the second set of results including a set of keys.
	Foo teaches the regression results comprising a first set of results representative of performance of the previously trained ML model and a second set of results representative of performance of the ML model, each of the first set of results and the second set of results including a set of keys (Foo, abstract, line 1, “Performance regression testing detects performance regressions in a system under load. Such regressions refer to situations where software performance degrades compared to previous releases, although the new version behaves correctly.” In other words, performance degrades compared to previous releases is performance of the previous model compared to results of the current model.), and,
	Foo also teaches for each key in the set of keys, a value for each attribute in a set of attributes is provided, the set of attributes comprising proposal rate, accuracy, and auto rate for each class of a set of classes predicted by the previously trained ML model and the ML model, and publishing the results (Foo, Figures 2(a), 2(b), and 2(c).

    [Figure: Foo, Figures 2(a), 2(b), and 2(c) (media_image3.png)]

Examiner notes from FIG. 3 of the instant application that a key is a column heading.  Similarly, proposal rate, accuracy, and auto rate are merely three different averaging calculations for measuring performance, comparing the first version of the software with the second version of the software in the regression tests. (Drawings of instant application, FIG. 3.) In other words, in Figure 2(a), the Severity column is a key; the three categories of performance (Application Server CPU Utilization, Application Server Memory Utilization, and Database Disk Read Bytes/sec) are three categories of averaged performance that correspond to proposal rate, accuracy, and auto rate; and the performance regression report is publishing the results.).
	Both Foo and the combination of Jin, Pei, and Bhattacharjee are directed to automated regression testing of software, among other things.  The combination of Jin, Pei, and Bhattacharjee teaches automated regression testing of software machine learning models using containers, but does not explicitly teach the use of three different measures of performance. Foo teaches automated regression testing of software using at least three different measures of performance, but does not explicitly teach automated regression testing of software machine learning models using containers.  In view of the teaching of Jin, Pei, and Bhattacharjee, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Foo into the combination of Jin, Pei, and Bhattacharjee.  This would result in being able to perform automated regression testing of software machine learning models using containers and to evaluate the models based on at least three measures of performance in the analysis.
	One of ordinary skill in the art would be motivated to do this because many of the problems with new software are performance related rather than matters of whether the new version is functionally correct. (Foo, page 32, column 1, paragraph 2, line 1 “Performance regression testing is integrated into traditional regression testing to reveal performance bottleneck and design problems early on [9]. Traditional regression testing focuses on verifying the functional correctness of a change [17]. However, research on large industrial projects shows that the primary problems observed in the field are often performance related [19].”)
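For illustration only, the claimed data layout (a first and a second set of results, each holding a set of keys, with a value for each of the attributes proposal rate, accuracy, and auto rate under every key) can be sketched as two nested mappings that are compared key by key. This is a minimal sketch under stated assumptions: the key name `class_a`, the numeric values, and the function `compare_results` are hypothetical, and the sketch does not define how proposal rate or auto rate are computed because the passages above do not define them.

```python
from typing import Dict

ATTRIBUTES = ("proposal_rate", "accuracy", "auto_rate")

Results = Dict[str, Dict[str, float]]


def compare_results(previous: Results, current: Results) -> Results:
    """For every key present in both result sets, return current minus previous per attribute."""
    deltas: Results = {}
    for key in sorted(previous.keys() & current.keys()):
        deltas[key] = {
            attr: round(current[key][attr] - previous[key][attr], 6)
            for attr in ATTRIBUTES
        }
    return deltas


# First set of results: previously trained model (hypothetical values).
previous_results = {
    "class_a": {"proposal_rate": 0.90, "accuracy": 0.80, "auto_rate": 0.50},
}
# Second set of results: newly trained model (hypothetical values).
current_results = {
    "class_a": {"proposal_rate": 0.92, "accuracy": 0.75, "auto_rate": 0.50},
}

deltas = compare_results(previous_results, current_results)
for key, attrs in deltas.items():
    print(key, attrs)
```

A negative delta for an attribute where higher is better (here, accuracy) would be a candidate regression to publish; how the sign is interpreted for each attribute is a design choice outside this sketch.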
Regarding claim 4,
	The combination of Jin, Pei, Bhattacharjee, and Foo teaches the method of claim 1,
	wherein the container is provisioned in a ML server to train the ML model. (Bhattacharjee, page 1, paragraph 1, line 6 “In this paper, we will discuss the details of the software architecture behind IBM’s deep learning as a service (DLaaS).” And, page 7, paragraph 2, line 1 “The DLaaS Platform Services layer provides the key building blocks for executing and managing long running training jobs.  At the core is a GPU-enabled Container Service that is responsible for executing a training job based on a predefined learning image from the Docker [12] Registry.” In other words, deep learning is ML, DLaaS is ML server, container is container, provides the key building blocks is provisioned, and executing and managing long running training jobs is train the ML model.)
Regarding claim 5,
	The combination of Jin, Pei, Bhattacharjee, and Foo teaches the method of claim 1,
	wherein publishing the regression results comprises transmitting one or more notifications to respective stakeholders through a communication platform.  (Jin, page 139, column 2, paragraph 3, line 1. “Figure 4 provides a high-level view of our approach compared to traditional regression testing.  In traditional regression testing (e.g., [3] – [5]), an existing test suite (T0) defined for the old version of a program (V0) is run on the modified version of a program (V1). Non-obsolete test cases that, according to their oracle, fail on V1 and did not fail on V0 are reported to the developers as warnings that may indicate the presence of regression faults.” In other words, reported to the developers is transmitting one or more notifications to respective stakeholders.)
Regarding claim 6,
	The combination of Jin, Pei, Bhattacharjee, and Foo teaches the method of claim 1,
	wherein publishing the regression results comprises providing a user interface (UI) for display, the UI graphically depicting regression results as between the ML model and the previously trained ML model.  (Pei, page 8, column 1, paragraph 3, line 1 “We implement DeepXplore using TensorFlow 1.0.1 [5] and Keras 2.0.3 [16] DL frameworks. Our implementation consists of around 7,086 lines of Python code.  Our code is built on TensorFlow/Keras but does not require any modifications to these frameworks.”  In other words, Keras is an API that provides a user interface, and can depict regression in graphs.)
Claims 8 and 11-13 are non-transitory computer-readable storage medium claims corresponding to computer-implemented method claims 1 and 4-6, respectively. Otherwise, they are the same. It is implicit that a computer-implemented method requires a non-transitory computer-readable storage medium and one or more processors in order to execute.  Therefore, claims 8 and 11-13 are rejected for the same reasons as claims 1 and 4-6, respectively.
Claims 15 and 18-20 are system claims corresponding to computer-implemented method claims 1 and 4-6, respectively.  Otherwise, they are the same.  It is implicit that a computer-implemented method requires a computing device, and a computer-readable storage device capable of storing instructions in order to execute.  Therefore, claims 15 and 18-20 are rejected for the same reasons as claims 1 and 4-6, respectively.
Claims 2-3, 7, 9-10, 14, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Jin, Pei, Bhattacharjee, Foo, and Nguyen et al (Automated Detection of Performance Regressions Using Statistical Process Control Techniques, herein Nguyen).
Regarding claim 2,
	The combination of Jin, Pei, Bhattacharjee, and Foo teaches the method of claim 1,
	Thus far, the combination of Jin, Pei, Bhattacharjee, and Foo does not explicitly teach wherein executing regression testing and detection using the ML model comprises determining variance in performance of the ML model using a Gaussian process (GP).
	Nguyen teaches wherein executing regression testing and detection using the ML model comprises determining variance in performance of the ML model using a Gaussian process (GP).  (Nguyen, page 299, column 1, paragraph 1, line 1 “The goal of performance regression testing is to check for performance regressions in a new version of a software system.” And, page 300, column 1, paragraph 2, line 1 “The contributions of this paper are:
We propose an approach based on control charts to identify performance regressions.
We derive effective solutions to satisfy the two assumptions of control charts about non-varying load and normality of the performance counters.
We show that our approach can automatically identify performance regressions by evaluating its accuracy on a large enterprise system and an open-source software system.”
And, page 300, column 2, paragraph 5, line 1 “Normality of process output.  Process output usually has a linear relationship with the process input.  This linear relation leads to a normal distribution of the process output which is the main underlying statistical foundation of control charts.” In other words, regression testing is regression testing, and using a normal distribution is using a Gaussian process.)
	Both Nguyen and the combination of Jin, Pei, Bhattacharjee, and Foo are directed to regression testing of software.  In view of the teaching of the combination of Jin, Pei, Bhattacharjee, and Foo, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Nguyen into the combination of Jin, Pei, Bhattacharjee, and Foo.  This would result in being able to use a statistical process for evaluation in the regression testing.
	One of ordinary skill in the art would be motivated to do this in order to reduce the time it takes to perform regression testing by using statistical methods. (Nguyen, page 1, column 1, paragraph 1, line 4 “Performance regression testing is very time consuming yet there is usually little time assigned for it. A typical test run would output thousands of performance counters.  Testers usually have to manually inspect these counters to identify performance regressions.  In this paper, we propose an approach to analyze performance counters across test runs using a statistical process control technique called control charts.”)
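For illustration only, the control-chart idea quoted from Nguyen above can be sketched as follows: control limits are derived from the baseline run's counter samples, and the fraction of target-run samples falling outside those limits is reported. This is a minimal sketch under stated assumptions, not Nguyen's implementation: it uses a common Shewhart-style construction (baseline mean plus or minus three standard deviations), which is a different statistical tool from a Gaussian process; the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def control_limits(baseline: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) control limits as baseline mean -/+ k standard deviations."""
    center = mean(baseline)
    spread = pstdev(baseline)
    return center - k * spread, center + k * spread


def violation_ratio(baseline: Sequence[float], target: Sequence[float], k: float = 3.0) -> float:
    """Fraction of target samples that fall outside the baseline control limits."""
    lower, upper = control_limits(baseline, k)
    violations = sum(1 for value in target if value < lower or value > upper)
    return violations / len(target)


# Hypothetical CPU utilization samples (percent) from a baseline run.
baseline_cpu = [40.0, 41.0, 39.0, 40.0, 40.0]

ok_ratio = violation_ratio(baseline_cpu, [39.0, 40.0, 41.0, 40.0])
bad_ratio = violation_ratio(baseline_cpu, [55.0, 54.0, 56.0, 40.0])
print(f"similar target run: {ok_ratio:.2f} of samples out of control")
print(f"degraded target run: {bad_ratio:.2f} of samples out of control")
```

A high violation ratio flags the target run for review; the threshold at which a run is called a regression is a policy choice that this sketch leaves open.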

Regarding claim 3, 
	The combination of Jin, Pei, Bhattacharjee, Foo, and Nguyen teaches the method of claim 2,
	wherein the variance comprises one or more of a negative-side variance indicating regression of the ML model relative to the previously trained ML model, and a positive-side variance indicating improvement of the ML model relative to the previously trained model. (Nguyen, Figure 2, and page 301, column 2, paragraph 1, line 1 “Detect performance regressions.  After the tests are done, the test engineers have to analyze the performance counters.  They compare the counters of the new version with the existing version.  The runs/counters of the existing version are called the baseline runs/counters.  The runs/counters of the new version are called the target runs/counters.  If the target counters are similar to the baseline counters, the test will pass, i.e., there is no performance regression.  Otherwise, the test engineers will alert the developers about the potential of performance regression in the new version.  For example, if the baseline run uses 40% of CPU on average and the target run uses 39% of CPU on average, the new version should be acceptable.  However, if the target run uses 55% of CPU on average, there is likely a performance problem with the new version.”

    [Figure: Nguyen, Figure 2 (media_image4.png)]

In other words, baseline is previously trained ML model, target is ML model, 40% CPU use is baseline average, 39% CPU on average is positive-side variance, and 55% is negative-side variance.)
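For illustration only, the mapping above (a target value better than the baseline is positive-side variance, a worse one is negative-side variance) can be sketched with the CPU figures from the quoted Nguyen passage. This is a minimal sketch under stated assumptions: the function `classify_variance`, its `higher_is_better` flag, and the returned labels are hypothetical; for CPU utilization the sketch assumes that lower is better.

```python
def classify_variance(
    baseline: float,
    target: float,
    higher_is_better: bool = True,
    tolerance: float = 0.0,
) -> str:
    """Label the change from baseline to target as improvement, regression, or no change."""
    delta = target - baseline
    if not higher_is_better:
        # For metrics such as CPU utilization, a decrease is the favorable direction.
        delta = -delta
    if delta > tolerance:
        return "positive-side variance (improvement)"
    if delta < -tolerance:
        return "negative-side variance (regression)"
    return "no change"


# Figures from the quoted passage: baseline 40% CPU, target runs at 39% and 55%.
label_39 = classify_variance(40.0, 39.0, higher_is_better=False)
label_55 = classify_variance(40.0, 55.0, higher_is_better=False)
print("39% CPU:", label_39)
print("55% CPU:", label_55)
```

The `tolerance` parameter allows small differences (such as 40% versus 39%) to be treated as no change when a band of acceptable variation is desired.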
Regarding claim 7,
	The combination of Jin, Pei, Bhattacharjee, Foo, and Nguyen teaches the method of claim 6,
	wherein the regression results are based on a set of attributes that each represent a respective performance of the ML model relative to the previously trained model. (Nguyen, Figure 2, and page 301, column 2, paragraph 1, line 1 “Detect performance regressions.  After the tests are done, the test engineers have to analyze the performance counters.  They compare the counters of the new version with the existing version.  The runs/counters of the existing version are called the baseline runs/counters.  The runs/counters of the new version are called the target runs/counters.  If the target counters are similar to the baseline counters, the test will pass, i.e., there is no performance regression.  Otherwise, the test engineers will alert the developers about the potential of performance regression in the new version.  For example, if the baseline run uses 40% of CPU on average and the target run uses 39% of CPU on average, the new version should be acceptable.  However, if the target run uses 55% of CPU on average, there is likely a performance problem with the new version.” In other words, compare counters of the new version with the existing version is based on a set of attributes that each represent a respective performance of the ML model relative to the previously trained model.)	
Claims 9, 10 and 14 are computer-readable storage medium claims corresponding to computer-implemented method claims 2, 3 and 7, respectively.  Otherwise, they are the same.  Therefore, claims 9, 10 and 14 are rejected for the same reasons as claims 2, 3 and 7, respectively.
Claims 16 and 17 are system claims corresponding to computer-implemented method claims 2 and 3, respectively.  Otherwise, they are the same.  Claims 16 and 17 are rejected for the same reasons as claims 2 and 3, respectively.
Conclusion
	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124


/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124