DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2022-08-17 has been entered.  The status of the claims is as follows:
Claims 1-9 and 16-20 are pending in the application.
Claims 1 and 16 have been amended.
Claims 10-15 and 21-23 are cancelled.
Response to Arguments
Applicant's arguments regarding the rejections under 35 USC 101 have been fully considered and are persuasive in light of the newly amended matter reciting retraining.  As amended, the claims are directed to a method of training a machine learning model, which is understood not to be a mental process (see MPEP 2106.04(a)(1)(vii)).  The rejections under 35 USC 101 have been withdrawn.
Applicant’s arguments regarding the rejections under 35 USC 103 have been fully considered but they are not persuasive.  Applicant argues on Remarks Page 6 that, because Gil discloses constraints provided by a user, Gil does not teach customer-specific data generated as a result of executions.  Examiner respectfully disagrees.  Gil teaches on Page 15 that “When retrieving a workflow template, the system would have to map the dataset constraints provided by the user to data variables in the template.”  Here, the data that the system maps from the user-provided constraints to the data variables of the template is “customer-specific data”, and this mapping is performed as a result of the system executing a process scenario.  Gil also teaches that the generic framework data results from process scenario executions, as Gil Page 15 further discloses:  “However, the request only specifies one parameter setting, so the system has to generate values for other parameters and as it turns out they depend on the parameter value that the user provided”.  Here, Gil discloses that the system generates values for other parameters, and these values are generic framework data produced by the system’s execution.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Gil et al. (“A semantic framework for automatic generation of computational workflows using distributed data and component catalogs”; hereinafter “Gil”) in view of Kumar et al. (US 2020/0004891 A1; hereinafter “Kumar”).
As per Claim 1, Gil teaches a computer implemented method (Gil, Pg 13 Para 1, discloses “memory”:  “We include here a parameter that is used in the Weka implementation and that specifies the allocation of memory to be used (indicated with a “j” argument identifier on both components) and should be set in proportion to the size of the input data sets.”)
mapping customer-specific data with generic framework data based on mapping of identifications of data objects included in the customer-specific data with identifications of data objects in the generic framework data (Gil concludes Page 15 Section 2.5 Para 1 by stating: “When retrieving a workflow template, the system would have to map the dataset constraints provided by the user to data variables in the template. Then the system would have to select the remaining datasets, components (if the template has abstract components), and parameters.”  Here, Gil discloses wherein the customer-specific data (“constraints provided by the user”) is generated during execution of instances of a single first process scenario from the plurality of process scenarios, and wherein the customer-specific data includes data (“parameter values”, “datasets”) for data objects identified (“parameters”, empty or default “datasets”) in the generic framework data (the “workflow instance”, including the system generated values).  Thus, Gil discloses mapping customer-specific data with generic framework data based on mapping of identifications of data objects included in the customer-specific data with identifications of data objects in the generic framework data (“the system would have to map the dataset constraints provided by the user to data variables in the template”).  Also note that Gil Page 8 under “Data Sets” explicitly discloses identifiers for datasets (“Id” column)).
wherein the generic framework data is generated as a result of executions of a plurality of process scenarios performed by a plurality of services implementing a generic workflow, (Gil, on Page 15 Section 2.5 Para 1, discloses:  “the system has to generate values for other parameters” and “The system has to select a dataset that satisfies those constraints.”  Here, Gil discloses that the “system” may select parameter values and input datasets.  Thus, Gil discloses generic framework data (the “workflow instance”, including the system-generated values) which is generated as a result of executions of a plurality of process scenarios performed by a plurality of services (“system has to generate”) implementing a generic workflow (“workflow template”).  Because the “system” is to “generate” the values, the system is executing a process scenario when it does so, and thus the data is generated as a result of executions.  Note that Gil discloses a plurality of process scenarios in Page 15 Section 2.5:  “The system has to specify all the aspects of the workflow that the user did not specify in the request. We illustrate this requirement for flexibility in the user input using the examples in Table 4. We show several examples of user requests, which refer to the workflows shown in Figure 1. For each request, the table shows whether workflow, component, parameter, and data are specified by the user or by the system.”  Note that Gil Figure 1 and Table 4 disclose, for example, 5 process scenarios.)
wherein the customer-specific data is generated as a result of executions of instances of a single first process scenario from the plurality of process scenarios (Gil suggests separate training and test instances on Page 10 Section 2.3 Para 2:  “The need for workflow templates places additional requirements for workflow representation. For example, for the workflows sketched in Figures 1(b) and 1(c) we would want to state one of the tenets of machine learning: that the training data must be a different dataset from the test data used within the same workflow.”  Thus, the user could provide customer-specific data for training in the single first process scenario, and then test data in a subsequent instance of the first process scenario.  This data is generated as a result of executions of the process scenario, as Gil Page 15 Section 2.5 discloses:  “When retrieving a workflow template, the system would have to map the dataset constraints provided by the user to data variables in the template.”  Thus, the customer-specific data is the user-provided data that is mapped by the system as a result of execution.), and
wherein the customer-specific data includes data for data objects identified in the generic framework data (Gil Page 8 under “Data Sets” explicitly discloses identifiers for datasets (“Id” column), as well as other data for each object in the data set).
based on the mapping, joining the customer-specific data with the generic framework data to generate an initial data set including data for a set of the data objects in the customer-specific data that maps to objects in the generic framework data, wherein the generated initial data set is to be provided for predicting one or more predictable variables for an execution of another instance of the first process scenario from the plurality of process scenarios associated with the generic workflow (As shown above, Gil concludes Page 15 Section 2.5 Para 1 by stating: “When retrieving a workflow template, the system would have to map the dataset constraints provided by the user to data variables in the template. Then the system would have to select the remaining datasets, components (if the template has abstract components), and parameters.”  Thus, Gil discloses joining the customer-specific data with generic framework data based on the mapping.  Also, as explained just above, Gil Page 10 Section 2.3 Para 2 suggests execution of another instance of the first process scenario (in which the user provides test data after having provided training data).  During this execution of another instance of the first process scenario, a prediction is made for a predictable variable, as Gil discloses in Page 7 Section 2.2 Para 3:  “An example of a component is an ID3 decision tree modeler. Given a dataset as input data, it uses the dataset as training data to learn a decision tree model that can be used to classify new data. It has an additional input argument, which is a parameter to specify which example feature is to be predicted by the learned model.”  Here, Gil discloses a classifier that makes a prediction.)
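For illustration only, the mapping-and-joining limitation discussed above can be pictured as a join of two record sets on a shared data-object identifier.  The following Python sketch is hypothetical: the field names (object_id, domain, area, year) and the inner-join behavior are assumptions chosen for clarity and are not drawn from Gil, Kumar, or the application.

```python
# Hypothetical sketch: join customer-specific records with generic framework
# records by a shared data-object identifier (an inner join). Field names are
# illustrative assumptions, not taken from Gil, Kumar, or the application.

def build_initial_data_set(customer_rows, framework_rows, key="object_id"):
    """Return merged rows for objects that appear in both sources."""
    # Index the generic framework data by its object identifier.
    framework_by_id = {row[key]: row for row in framework_rows}
    initial = []
    for cust in customer_rows:
        generic = framework_by_id.get(cust[key])
        if generic is None:
            continue  # no mapping: object is absent from the framework data
        merged = dict(generic)
        merged.update(cust)  # customer-specific values are added on top
        initial.append(merged)
    return initial


framework = [
    {"object_id": "D1", "domain": "weather", "area": "Santa Monica"},
    {"object_id": "D2", "domain": "weather", "area": "Pasadena"},
]
customer = [
    {"object_id": "D2", "year": 2007},
    {"object_id": "D9", "year": 2008},
]
print(build_initial_data_set(customer, framework))
# [{'object_id': 'D2', 'domain': 'weather', 'area': 'Pasadena', 'year': 2007}]
```

Only object D2 is identified in both sources, so only D2 survives into the initial data set; D9 has no mapping and is dropped.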
generating a machine-learning prediction model implementing a machine-learning algorithm for providing prediction results for process execution of a scenario instance of the plurality of process scenarios (Gil, Page 4 Last Paragraph, discloses:  “Figure 1 shows sketches of some very simple workflows that can be built with machine learning algorithms. The first workflow shows how to use 2007 weather data for Santa Monica to train an ID3 model, then use that model to make predictions of the weather in Pasadena using an ID3 classifier.”)
However, Gil does not explicitly teach defining one or more first features of the initial data set to correspond to independent variables for a machine learning prediction and one or more second features to correspond to the one or more predictable variables for the machine learning prediction; identifying input for performing the machine learning prediction for a process scenario execution, the input including the initial data set, an implementation of a machine learning algorithm, and data processing rules for data enhancement of the initial data set to generate an enhanced output data set; performing data adjustment based on the data processing rules over the initial data set to generate the enhanced output data set that supports predictive services associated with the execution of the other instance of the first process scenario; providing the enhanced output data set for evaluation by the implementation of the machine learning algorithm of the process scenario execution; and re-training the machine-learning prediction model based on the evaluation of the enhanced output data and actual output data from the other instance of the first process scenario from the plurality of process scenarios associated with the generic workflow.
Kumar teaches defining one or more first features of the initial data set to correspond to independent variables for a machine learning prediction and one or more second features to correspond to the one or more predictable variables for the machine learning prediction; (Kumar, Fig. 5, discloses:

[Image: Kumar Fig. 5]

Kumar describes this in [0041]:  “presenting example predictions for approval/disapproval of a loan based on input data, including a loan application ID, a loan amount, and a predicted outcome. The results can be generated through a prediction workflow”.  Thus, Kumar discloses one or more first features of the initial data set to correspond to independent variables (“loan application ID, a loan amount”) and one or more second features to correspond to the one or more predictable variables (“predicted outcome”).  The prediction is a machine learning prediction, as Kumar discloses training a model in [0043]:  “Models can be built by a data scientist user using the platform, using any type of data to train or otherwise create the model.”  Because such data can be used for training, which occurs before the prediction model is actually used, such training data may be considered an initial data set.)
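As a hypothetical illustration of designating first features (independent variables) and second features (predictable variables) within an initial data set, the following Python sketch splits records into feature vectors and target values.  The column names loosely follow the loan example quoted from Kumar [0041]; the helper name split_features and the specific split are assumptions, not Kumar's implementation.

```python
# Hypothetical sketch: designate some columns of an initial data set as
# independent variables and one column as the predictable variable.
# Column names loosely follow the loan example quoted from Kumar [0041].

def split_features(rows, feature_cols, target_col):
    """Return (X, y): per-row feature vectors and per-row target values."""
    X = [[row[c] for c in feature_cols] for row in rows]
    y = [row[target_col] for row in rows]
    return X, y


initial_data_set = [
    {"application_id": "A-1", "loan_amount": 5000, "approved": True},
    {"application_id": "A-2", "loan_amount": 90000, "approved": False},
]
X, y = split_features(initial_data_set, ["application_id", "loan_amount"], "approved")
print(X)  # [['A-1', 5000], ['A-2', 90000]]
print(y)  # [True, False]
```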
identifying input for performing the machine learning prediction [for a process scenario execution], the input including the initial data set, an implementation of a machine learning algorithm, and data processing rules for data enhancement of the initial data set to generate an enhanced output data set (Recall that Gil, as discussed above, teaches process scenarios.  Kumar, Para [0079], discloses “FIG. 10 depicts a flow diagram of an example process for prediction workflow determination and execution, according to implementations of the present disclosure.”  Here, Kumar discloses performing a prediction.  This is a machine learning prediction, as specified by Kumar [0021]:  “The interfaces 136 enable consuming entities 104 to access predictive insights (e.g., predictions) generated through ML operator(s) 142 applied by the subengine 138 to data from the data hub.”  Kumar, Para [0083], discloses:  “A workflow is determined (1102) for model generation based on inputs to the UI, including specification of data source(s) (e.g., training data), ML operators, and/or data preparation operators.”  Here, Kumar discloses identifying input (“inputs to the UI”), the input including the initial data set (“specification of data source(s) (e.g., training data)”), an implementation of a machine learning algorithm (“ML operators”), and data processing rules for data enhancement of the initial data set to generate an enhanced output data set (“data preparation operators”)).
performing data adjustment based on the data processing rules over the initial data set to generate the enhanced output data set that supports predictive services associated with the execution of the other instance of the first process scenario (Recall that Gil, as discussed above, discloses execution of a plurality of process scenarios.  Recall also that Kumar [0041], discussed above, discloses predictive services:  “presenting example predictions for approval/disapproval of a loan based on input data, including a loan application ID, a loan amount, and a predicted outcome. The results can be generated through a prediction workflow”.  Kumar, Para [0026], discloses “The DSP 102 consumes data from the data sources 122. In some examples, heterogeneous data sources can be combined using the native data hub functionality. Feature engineering data preparation operations from the model training phase can also be used. For example, master data and/or other enterprise data can be brought together with streaming data from IoT sensors (e.g., on a shop floor). Examples of data hub operators to achieve this can include Kafka™ (e.g., to help cross technology boundaries) and the data hub Open API REST client. The data hub SAP HANA™ client can also be used for HANA™ data sources. Some implementations enable a user to leverage SAP Agile Data Preparation™ to handle more complex data preparation transformations. Additional custom transformations can be achieved using native Data Hub Python and JavaScript operators.”  Here, Kumar discloses various data adjustments being performed on the data.  These adjustments result in an enhanced output data set, as implied by the term “transformations”, which indicates that data in a different form is produced.  The adjustments are based on data processing rules, as specified by Kumar [0041] (“data preparation operators”).
Kumar expands on these data preparation rules in [0036]:  “In the example shown, the user has composed a workflow that includes five data preparation operators (e.g., ToString converter, ToMessage converter, etc.)”.  Thus, Kumar teaches performing data adjustment based on the data processing rules over the initial data set to generate an enhanced output data set.)
providing the enhanced output data set for evaluation by the implementation of the machine learning algorithm [of the process scenario execution] (Recall that Gil, as discussed above, discloses process scenarios.  Kumar, Para [0005], discloses:  “the execution order further includes the at least one data preparation operator executed prior to the at least one ML operator”.  Because the data preparation operator is executed prior to the at least one ML operator, this implies that the enhanced output data set, previously shown to be disclosed by Kumar, is input to the machine learning algorithm.  Thus, Kumar discloses providing the enhanced output data set for evaluation by the implementation of the machine learning algorithm.)
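The execution order quoted from Kumar [0005] (data preparation operators executed prior to the ML operator) can be sketched hypothetically as follows.  The operator functions drop_missing, to_float, and mean_amount_model are toy stand-ins assumed for illustration; they are not Kumar's operators and do not reflect any actual data hub API.

```python
# Hypothetical sketch: run data preparation operators in order, then pass the
# prepared (enhanced) data to an ML operator. All operators are toy stand-ins,
# not Kumar's actual operators.

def run_workflow(data, prep_operators, ml_operator):
    for op in prep_operators:   # data preparation executes first
        data = op(data)
    return ml_operator(data)    # the ML operator sees only the prepared data


def drop_missing(rows):
    return [r for r in rows if r.get("amount") is not None]


def to_float(rows):
    return [{**r, "amount": float(r["amount"])} for r in rows]


def mean_amount_model(rows):
    # Toy "model": predicts the mean amount observed in the data.
    return sum(r["amount"] for r in rows) / len(rows)


raw = [{"amount": "10"}, {"amount": None}, {"amount": "30"}]
print(run_workflow(raw, [drop_missing, to_float], mean_amount_model))  # 20.0
```

Reversing the order (model first) would fail on the raw strings and missing value, which is why the preparation step must precede evaluation by the algorithm.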
and re-training the machine-learning prediction model based on the evaluation of the enhanced output data and actual output data from the other instance of the first process scenario from the plurality of process scenarios associated with the generic workflow (Recall above that Kumar discloses enhanced output data.  Kumar, Para [0039], discloses:  “To create a new model, a user can select data source(s) to input to the model generation workflow, and string together any number of suitable operators to generate the model based on the input (e.g., training) data from the selected data source(s). The trained model can then be stored and used in subsequent prediction workflows. The generation workflow can also be used to retrain the model based on new data as appropriate.”  Here, Kumar discloses that after an instance of the process scenario, the machine learning model can be retrained.)
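As a hypothetical illustration of re-training based on an evaluation against actual output data from a later execution, the following Python sketch compares a model's predictions with actual outputs and retrains when the error exceeds a threshold.  The mean-value "model", the mean-absolute-error metric, and the threshold are assumptions made for illustration; neither Kumar [0039] nor the claims specify them.

```python
# Hypothetical sketch: evaluate a model against actual outputs of a later
# execution and retrain when the error is too large. The mean-value "model",
# the error metric, and the threshold are illustrative assumptions.

def train(targets):
    """Toy model: always predict the mean of the training targets."""
    return sum(targets) / len(targets)


def evaluate_and_maybe_retrain(model, history, actual, threshold=5.0):
    # Mean absolute error of the model's constant prediction.
    error = sum(abs(model - a) for a in actual) / len(actual)
    if error > threshold:
        history = history + actual   # fold the new actual outputs in
        model = train(history)       # retrain on the enlarged data
    return model, history, error


history = [10.0, 12.0, 14.0]
model = train(history)               # 12.0
model, history, err = evaluate_and_maybe_retrain(model, history, [30.0, 32.0])
print(err, model)  # 19.0 19.6
```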
Gil and Kumar are analogous art because they are both in the field of endeavor of workflow predictions.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the customer specification of generic workflows of Gil with the data preparation of Kumar.  One would be motivated to do so because the combination would make it possible to generate predictions based on diverse and unstructured data across an entire enterprise, allowing businesses to reduce the cost of hiring data scientists and the time and resources needed to develop and maintain various scripts (Kumar [0032-0033]: “One approach to analyse the data is to create custom ML solutions, for example using Apache Spark™. However, one major drawback with this approach is that it requires both Big Data/Spark expertise and a Data Scientist skillset, both highly sought-after skills which are rarely found together in the same people. Adopters have traditionally taken the approach of developing custom scripts and/or custom programming code for their data analytics processing. This has introduced further challenges to create and maintain specific scripts/code for each business model and dataset.”).

As per Claim 2, the combination of Gil and Kumar teaches the method of Claim 1.  Gil teaches further comprising: receiving workflow data for the generic workflow, wherein the generic workflow is implemented in multiple process scenarios, and wherein the workflow data is generated during execution of instances of multiple process scenarios.  (Gil, Page 10 Section 2.3, discloses:  “Figure 1(a) shows a workflow that refers to specific datasets and has values for all component parameters. We refer to such a workflow as a workflow instance. Workflow instances are specific to the given datasets and parameters. We also want to represent workflows that are generally applicable to a range of types of data or that can take in a range of parameter values. The workflows shown in Figures 1(b) and 1(c) are examples of such generic workflows. We refer to such a workflow as a workflow template. Workflow templates are reusable, and represent commonly used analyses.”  Here, Gil discloses a generic workflow (“workflow template”) and an implementation of the generic workflow (“workflow instance”).  Gil, Pg 15 Section 2.5, discloses:  “An important consideration for automated workflow generation is that it must allow the user to have the flexibility to specify very little or alternatively to specify precisely what they want in the workflow request. The system has to specify all the aspects of the workflow that the user did not specify in the request. We illustrate this requirement for flexibility in the user input using the examples in Table 4. We show several examples of user requests, which refer to the workflows shown in Figure 1. For each request, the table shows whether workflow, component, parameter, and data are specified by the user or by the system. In user request RA a workflow template is provided that specifies all the components to be executed and the datasets to be used. However, the request only specifies one parameter setting, so the system has to generate values for other parameters and as it turns out they depend on the parameter value that the user provided.”  Here, Gil discloses receiving workflow data for the generic workflow (“data are specified by the user or by the system”) wherein the generic workflow is implemented in multiple process scenarios (Table 4 and Figure 1 show 5 process scenarios), and wherein the workflow data is generated during execution of instances of multiple process scenarios (some data is generated “by the system”, and thus during execution of instances of the process scenarios.)
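The division of labor described in Gil Section 2.5 (the user specifies part of a workflow request and the system supplies the rest) can be illustrated with the following hypothetical Python sketch.  The dataset names, the CATALOG metadata, and the rule setting memory in proportion to input size (loosely modeled on the Gil Page 13 passage about the memory parameter) are all assumptions for illustration, not Gil's actual representation.

```python
# Hypothetical sketch: the user supplies some workflow values and the system
# fills in the rest. Dataset names, catalog metadata, and the memory rule are
# illustrative assumptions, not Gil's actual representation.

CATALOG = {
    "weather-sm-2007": {"domain": "weather", "area": "Santa Monica", "size_mb": 40},
    "weather-pas-2007": {"domain": "weather", "area": "Pasadena", "size_mb": 25},
}


def complete_request(test_data, training_constraints, mb_per_input_mb=4):
    """Return the completed workflow and who specified each value."""
    # System step 1: select a training dataset satisfying the user's constraints.
    matches = [
        name for name, meta in CATALOG.items()
        if all(meta.get(k) == v for k, v in training_constraints.items())
    ]
    if not matches:
        raise ValueError("no dataset satisfies the constraints")
    training_data = matches[0]
    # System step 2: set a parameter in proportion to the input data sizes.
    total_mb = CATALOG[training_data]["size_mb"] + CATALOG[test_data]["size_mb"]
    workflow = {
        "test_data": test_data,
        "training_data": training_data,
        "memory_mb": total_mb * mb_per_input_mb,
    }
    specified_by = {"test_data": "user", "training_data": "system", "memory_mb": "system"}
    return workflow, specified_by


wf, who = complete_request("weather-sm-2007", {"domain": "weather", "area": "Pasadena"})
print(wf)
# {'test_data': 'weather-sm-2007', 'training_data': 'weather-pas-2007', 'memory_mb': 260}
```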

As per Claim 3, the combination of Gil and Kumar teaches the method of Claim 2 as well as the one or more predictable variables (see Rejection to Claim 1).  Gil teaches further comprising: based on the received workflow data, defining a generic framework to include data for features of the generic workflow and the one or more predictable variables of the generic workflow, wherein the features are determined based on the workflow data and comprise a feature to identify data objects associated with an executed instance of the generic workflow. (Gil, Page 10 Section 2.3 Para 1, shown above in the rejection to Claim 2, also discloses a “workflow instance” with “specific datasets” and “values for all component parameters”.  Gil, on Page 15 Section 2.5 Para 1, discloses:  “In user request RA a workflow template is provided that specifies all the components to be executed and the datasets to be used. However, the request only specifies one parameter setting, so the system has to generate values for other parameters and as it turns out they depend on the parameter value that the user provided. In user request RB only one dataset is specified, but constraints on another input dataset (training data) are given (domain must be weather and area must be Pasadena). The system has to select a dataset that satisfies those constraints.”  Here, Gil discloses that the “system” may select parameter values and input datasets.  Thus, Gil discloses generic framework data (the “workflow instance”, including the system-generated values).  Which data the system must supply depends on the workflow data received from the user, and thus Gil discloses that the features are determined based on the workflow data.  Gil concludes Page 15 Section 2.5 Para 1 by stating: “When retrieving a workflow template, the system would have to map the dataset constraints provided by the user to data variables in the template. Then the system would have to select the remaining datasets, components (if the template has abstract components), and parameters.”  Because there is a “mapping”, there is a way of identifying the data objects associated with an executed instance of the workflow, and thus Gil discloses wherein the features are determined based on the workflow data and comprise a feature to identify data objects associated with an executed instance of the generic workflow.  Gil also discloses the one or more predictable variables of the generic workflow, as Gil Pg 5 Figure 1 caption discloses classifiers performing a prediction:  “Figure 1: A high-level sketch of some workflows: (a) WA is a workflow to process 2007 weather data from Santa Monica to make weather predictions for Pasadena; (b) WB is a generic version of WA that uses ID3 to learn a model from training data, then use the model to classify test data; (c) WC is a generic workflow to use any algorithms that use decision trees to learn and classify continuous datasets after discretizing them; and (d) WD is a generic workflow that is customized for weather prediction using ID3, and that samples the training data to obtain results faster.”)

As per Claim 4, the combination of Gil and Kumar teaches the method of Claim 3.  Gil teaches further comprising: receiving customer-specific data, the customer-specific data being stored in relation to executions of the first process scenario from the plurality of process scenarios (Gil, Page 15 Section 2.5 Para 1, discloses:  “In user request RA a workflow template is provided that specifies all the components to be executed and the datasets to be used. However, the request only specifies one parameter setting, so the system has to generate values for other parameters and as it turns out they depend on the parameter value that the user provided. In user request RB only one dataset is specified, but constraints on another input dataset (training data) are given (domain must be weather and area must be Pasadena). The system has to select a dataset that satisfies those constraints.”  Here, Gil discloses that in each process scenario (“RA” is one of the 5 process scenarios given), the user provides some data, and the system provides the rest.  The user-supplied data is customer-specific data, and it is stored in relation to executions of the first process scenario (“RA”) from the plurality of process scenarios.)

As per Claim 16, Claim 16 is a system claim corresponding to method claim 1.  The difference is that it recites a computing device and a computer-readable storage device.  Kumar, Para [0007], discloses:  “Other implementations of any of the above aspects include corresponding systems, apparatus, and/or computer programs that are configured to perform the operations of the methods. The present disclosure also provides a computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations in accordance with implementations of the methods provided herein.”  Claim 16 is rejected for the same reasons as Claim 1.

As per Claim 17, Claim 17 is a system claim corresponding to method claim 3.  The difference is that it recites a computing device and a computer-readable storage device.  Claim 17 is rejected for the same reasons as Claim 3.

As per Claim 18, Claim 18 is a system claim corresponding to method claim 4.  The difference is that it recites a computing device and a computer-readable storage device.  Claim 18 is rejected for the same reasons as Claim 4.

Claims 5 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gil and Kumar in view of Tang et al. (US 2015/0379051 A1; hereinafter “Tang”).
As per Claim 5, the combination of Gil and Kumar teaches the method of claim 1.  Kumar teaches data processing rules and performing the data adjustment over the initial data set.  (Kumar, Para [0083], discloses:  “A workflow is determined (1102) for model generation based on inputs to the UI, including specification of data source(s) (e.g., training data), ML operators, and/or data preparation operators.”  Here, Kumar discloses an initial data set (“specification of data source(s) (e.g., training data)”) and data processing rules (“data preparation operators”).  Kumar, Para [0026], discloses “The DSP 102 consumes data from the data sources 122. In some examples, heterogeneous data sources can be combined using the native data hub functionality. Feature engineering data preparation operations from the model training phase can also be used. For example, master data and/or other enterprise data can be brought together with streaming data from IoT sensors (e.g., on a shop floor). Examples of data hub operators to achieve this can include Kafka™ (e.g., to help cross technology boundaries) and the data hub Open API REST client. The data hub SAP HANA™ client can also be used for HANA™ data sources. Some implementations enable a user to leverage SAP Agile Data Preparation™ to handle more complex data preparation transformations. Additional custom transformations can be achieved using native Data Hub Python and JavaScript operators.”  Here, Kumar discloses various ways of performing the data adjustment.)
However, the combination of Gil and Kumar does not explicitly teach performing data cleaning on the initial data set, the data cleaning being based on an evaluation of the initial data set according to data cleaning rules included in the data processing rules to generate a clean data set.
Tang teaches performing data cleaning on the initial data set, the data cleaning being based on an evaluation of the initial data set according to data cleaning rules [included in the data processing rules] to generate a clean data set. (Recall that Kumar, as discussed above, teaches data processing rules.  Tang, Para [0031-0032], discloses:  “There is therefore a need for improved data cleaning rules which seek to overcome the above problems. According to one aspect of the present invention, there is provided, a method for cleaning data stored in a database, the method comprising providing a set of fixing rules, each fixing rule incorporating a set of attribute values that capture an error in a plurality of semantically related attribute values, and a deterministic correction which is operable to replace one of the set of attribute values with a correct attribute value to correct the error, wherein the method further comprises comparing at least two of the fixing rules with one another to check that the error correction carried out by one fixing rule is consistent with the error correction carried out by another fixing rule.”  Here, Tang discloses performing data cleaning on the initial data set (“a method for cleaning data”), the data cleaning being based on an evaluation of the initial data set according to data cleaning rules (“for improved data cleaning rules… providing a set of fixing rules”) to generate a clean data set (Tang Para [0002]: “The term “cleaning” is used herein to mean correcting or repairing errors in values or attribute values which are stored as information in a database”)).
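To illustrate the structure of a fixing rule as quoted from Tang [0031-0032] (a set of attribute values capturing an error, plus a deterministic correction), the following Python sketch is offered as a hypothetical example.  The FixingRule class, its field names, and the country/capital values are assumptions for illustration; the sketch does not implement Tang's consistency check between rules.

```python
# Hypothetical sketch of a fixing rule in the sense quoted from Tang: evidence
# attribute values, known-wrong values for a target attribute, and a
# deterministic correction. Class, fields, and example values are assumptions.
from dataclasses import dataclass


@dataclass
class FixingRule:
    evidence: dict           # attribute -> value that must match
    target: str              # attribute to repair
    wrong_values: frozenset  # values known to be errors given the evidence
    correct_value: str       # deterministic replacement

    def apply(self, row):
        """Return a repaired copy of row, or row unchanged if the rule does not fire."""
        matches = all(row.get(a) == v for a, v in self.evidence.items())
        if matches and row.get(self.target) in self.wrong_values:
            return {**row, self.target: self.correct_value}
        return row


rule = FixingRule({"country": "China"}, "capital", frozenset({"Shanghai", "Hongkong"}), "Beijing")
rows = [{"country": "China", "capital": "Shanghai"}, {"country": "Canada", "capital": "Ottawa"}]
clean = [rule.apply(r) for r in rows]
print(clean)
# [{'country': 'China', 'capital': 'Beijing'}, {'country': 'Canada', 'capital': 'Ottawa'}]
```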
Tang and the combination of Gil and Kumar are analogous art because Tang’s data cleaning is reasonably pertinent to the problem faced by the combination of Gil and Kumar (see MPEP 2141.01(a):  “Rather, a reference is analogous art to the claimed invention if: (1) the reference is from the same field of endeavor as the claimed invention (even if it addresses a different problem); or (2) the reference is reasonably pertinent to the problem faced by the inventor (even if it is not in the same field of endeavor as the claimed invention). See Bigio, 381 F.3d at 1325, 72 USPQ2d at 1212”).
It would have been obvious before the effective filing date of the claimed invention to combine the data preparation rules of Gil and Kumar, with the data cleaning rules of Tang.  One of ordinary skill in the art would be motivated to do so to reduce the time and costs associated with needing to interact with users to detect and correct errors (Tang Para [0130]:  “The clear advantage of fixing rules, compared with the prior art, is that they can automatically detect errors and derive dependable repairs without interacting with the users, and without the assumption that some values have been validated to be correct.”)

As per Claim 7, the combination of Gil and Kumar teaches the method of claim 5.  Kumar teaches one or more first features corresponding to the independent variables (Kumar, Para [0041], discloses:  “presenting example predictions for approval/disapproval of a loan based on input data, including a loan application ID, a loan amount, and a predicted outcome. The results can be generated through a prediction workflow”.  Thus, Kumar discloses one or more first features of the initial data set that correspond to independent variables (“loan application ID, a loan amount”)).
However, the combination of Gil and Kumar does not teach wherein the data cleaning is performed based on evaluation of occurrences of values in stored data in relation to a feature from the one or more first features corresponding to the independent variables according to the data cleaning rules.
Tang teaches data cleaning is performed based on evaluation of occurrences of values in stored data in relation to a feature [from the one or more first features corresponding to the independent variables] according to the data cleaning rules (Recall that Kumar teaches one or more first features corresponding to the independent variables.  Tang, Para [0031-0032], discloses:  “There is therefore a need for improved data cleaning rules which seek to overcome the above problems. According to one aspect of the present invention, there is provided, a method for cleaning data stored in a database, the method comprising providing a set of fixing rules, each fixing rule incorporating a set of attribute values that capture an error in a plurality of semantically related attribute values, and a deterministic correction which is operable to replace one of the set of attribute values with a correct attribute value to correct the error, wherein the method further comprises comparing at least two of the fixing rules with one another to check that the error correction carried out by one fixing rule is consistent with the error correction carried out by another fixing rule.”  Here, Tang discloses data cleaning is performed (“a method for cleaning data”) according to data cleaning rules (“for improved data cleaning rules… providing a set of fixing rules”).  Tang, Para [0002], discloses: “The term “cleaning” is used herein to mean correcting or repairing errors in values or attribute values which are stored as information in a database”.  Here, Tang discloses cleaning is performed based on evaluation of occurrences of values (“errors in values”) in stored data (“which are stored as information in a database”) in relation to a feature (“attribute value”)).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Gil and Kumar with the teachings of Tang for at least the same reasons recited in Claim 5.
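The evaluation of occurrences of values in stored data in relation to a feature, as recited in Claim 7, can be illustrated with a simple frequency-based cleaning rule. This is a hypothetical example only: the functions `flag_rare_values` and `clean_feature` and the `min_count` threshold are assumptions and are not taken from Tang or from the claims.

```python
from collections import Counter
from typing import Dict, List, Set


def flag_rare_values(rows: List[Dict[str, str]], feature: str, min_count: int = 2) -> Set[str]:
    """Count how often each value of `feature` occurs in the stored rows and flag
    values seen fewer than `min_count` times as suspected errors."""
    counts = Counter(row[feature] for row in rows if feature in row)
    return {value for value, n in counts.items() if n < min_count}


def clean_feature(rows: List[Dict[str, str]], feature: str,
                  min_count: int = 2, replacement=None) -> List[dict]:
    """Replace flagged values of `feature` with `replacement` (None marks them missing);
    rows without the feature, or with a frequent value, are copied unchanged."""
    rare = flag_rare_values(rows, feature, min_count)
    return [
        {**row, feature: replacement} if feature in row and row[feature] in rare else dict(row)
        for row in rows
    ]
```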

Claims 6 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gil, Kumar, and Tang in view of Sharma et al. (US 2020/0089650 A1; hereinafter Sharma).
As per Claim 6, the combination of Gil, Kumar, and Tang teaches the method of claim 5 as well as the clean data set (Tang, see Rejection to Claim 5).
Kumar teaches that performing the data adjustment further comprises: evaluating features from the [clean] data set to generate the enhanced output data set to be provided for evaluation by the implementation of the machine learning algorithm, wherein the enhanced output data set includes a set of features defined based on evaluation and combination of the features from the clean data set according to data preparation rules (Kumar, Para [0026], discloses: “Some implementations enable a user to leverage SAP Agile Data Preparation™ to handle more complex data preparation transformations. Additional custom transformations can be achieved using native Data Hub Python and JavaScript operators.”  Here, Kumar discloses various data adjustments being performed on the data.  These adjustments result in an output data set, as implied by “transformations”, since a transformation yields data in a different form.  The adjustments are based on data processing rules, as specified by Kumar [0041] (“data preparation operators”).  Kumar expands on these data preparation rules in [0036]: “In the example shown, the user has composed a workflow that includes five data preparation operators (e.g., ToString converter, ToMessage converter, etc.”  Thus, Kumar teaches evaluating features from the data set to generate the enhanced output data set.  Kumar, Para [0005], discloses:  “the execution order further includes the at least one data preparation operator executed prior to the at least one ML operator”.  Since the data preparation operator is executed prior to the at least one ML operator, this implies that the output data set, as previously shown to be disclosed by Kumar, is input to the machine learning algorithm.  Thus, Kumar discloses the output data set to be provided for evaluation by the implementation of the machine learning algorithm.
Kumar, Para [0041], discloses:  “presenting example predictions for approval/disapproval of a loan based on input data, including a loan application ID, a loan amount, and a predicted outcome. The results can be generated through a prediction workflow”.  Thus, Kumar discloses output data set includes a set of features defined based on evaluation and combination of the features from the data set, wherein “loan application ID” and “loan amount” represent a set of features from the data set.)
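The execution order relied on from Kumar Para [0005], in which data preparation operators run before the ML operator so that the ML operator receives the prepared output data set, can be sketched as below. The operator functions and the threshold-based stand-in for an ML operator are hypothetical and are not Kumar's operators; the field names merely mirror the loan example quoted from Kumar Para [0041].

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Operator = Callable[[List[Row]], List[Row]]


def run_workflow(rows: List[Row], prep_ops: List[Operator], ml_op: Operator) -> List[Row]:
    """Run every data preparation operator, in order, before the ML operator."""
    data = rows
    for op in prep_ops:
        data = op(data)   # each preparation step transforms the data set
    return ml_op(data)    # the ML step only sees the prepared output data set


def amount_to_float(rows: List[Row]) -> List[Row]:
    """Hypothetical preparation operator: convert the loan amount to a number."""
    return [{**r, "loan_amount": float(r["loan_amount"])} for r in rows]


def threshold_model(rows: List[Row]) -> List[Row]:
    """Stand-in for a trained ML operator: approve loans below a fixed amount."""
    return [{**r, "predicted_outcome": "approve" if r["loan_amount"] < 100000 else "decline"}
            for r in rows]
```

Calling `threshold_model` directly on rows whose amounts are still strings would raise a `TypeError`, which is the dependency on execution order that the sketch is meant to show.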
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kumar with Gil for at least the reasons recited in Claim 1.
However, the combination of Gil, Kumar, and Tang thus far fails to teach evaluating features according to a type of data stored, according to data preparation rules for numerical and for categorical data. 
Sharma teaches evaluating features according to a type of data stored, according to data preparation rules for numerical and for categorical data (Sharma, Para [0041], discloses:  “In this regard, FIG. 2 is a flowchart summarizing a conventional approach to data preprocessing, and FIG. 3 is a flowchart summarizing an improved approach to data preprocessing in accordance with certain example embodiments. As shown in FIG. 2, in step S202, the data is read and the data types (e.g., one of categorical and numerical data types) of different records are identified. In step S204, missing values are filled using imputation techniques. In step S206, categorical variables are transformed using one-hot encoding or label encoding, and numerical variables are treated with scaling operations. In step S208, the preprocessed data is ready for consumption by machine learning algorithms.”  Here, Sharma discloses evaluating features according to a type of data stored (“the data is read and the data types… of different records are identified”), according to data preparation rules for numerical and for categorical data (“categorical variables are transformed using one-hot encoding or label encoding, and numerical variables are treated with scaling operations”)).
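The type-dependent preparation quoted from Sharma Para [0041] (identify categorical versus numerical columns, impute missing values, one-hot encode categorical columns, scale numerical columns) can be sketched with the Python standard library as follows. Mean/mode imputation and min-max scaling are assumed choices for illustration, since the quoted passage names imputation and scaling only generically; label encoding is not shown, and the function names are hypothetical.

```python
from statistics import mean, mode


def infer_type(present):
    """Treat a column as numerical if every non-missing value is an int or float."""
    return "numerical" if all(isinstance(v, (int, float)) for v in present) else "categorical"


def preprocess(columns):
    """columns: dict mapping column name -> list of raw values (None = missing)."""
    out = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present:
            continue  # nothing to impute from; the column is dropped in this sketch
        if infer_type(present) == "numerical":
            fill = mean(present)                               # impute with the mean
            filled = [fill if v is None else v for v in values]
            lo, hi = min(filled), max(filled)
            span = (hi - lo) or 1.0                            # guard against constant columns
            out[name] = [(v - lo) / span for v in filled]      # min-max scaling to [0, 1]
        else:
            fill = mode(present)                               # impute with the most common value
            filled = [fill if v is None else v for v in values]
            for category in sorted(set(filled)):               # one-hot encoding
                out[f"{name}={category}"] = [1 if v == category else 0 for v in filled]
    return out
```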
Sharma and the combination of Gil, Kumar, and Tang are analogous art because they are both in the field of endeavor of data analysis, as well as in the field of machine learning.
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the prediction workflow with data preparation and cleaning of the combination of Gil, Kumar, and Tang, with the automated data processing including rules by data type of Sharma.  One of ordinary skill in the art would be motivated to do so in order to automate the data preparation process to prepare the data for machine learning in a more efficient manner, thereby reducing time and costs for manual data preparation (Sharma Para [0029]:  “In certain example embodiments, it becomes feasible to predict the data cleansing operations for a particular column or for a complete dataset very quickly, which helps improve performance at the preprocessing phase in an automatic manner that removes subjectivity and does not require reliance on the accuracy values of the model performance.”)

As per Claim 19, Claim 19 is a system claim corresponding to method claims 6 and 7.  The difference is that it recites a computing device and a computer-readable storage device.  Kumar, Para [0007], discloses:  “Other implementations of any of the above aspects include corresponding systems, apparatus, and/or computer programs that are configured to perform the operations of the methods. The present disclosure also provides a computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations in accordance with implementations of the methods provided herein.”  Claim 19 is rejected for the same reasons as Claims 6 and 7.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gil and Kumar in view of Holle (US 2003/0172084 A1).
As per Claim 8, the combination of Gil and Kumar teaches the method of claim 1.  Kumar teaches data processing rules for data enhancement (Kumar, Para [0083], discloses:  “A workflow is determined (1102) for model generation based on inputs to the UI, including specification of data source(s) (e.g., training data), ML operators, and/or data preparation operators.”  Here, Kumar discloses data processing rules for data enhancement (“data preparation operators”)).
Kumar also teaches data entries for a feature from the one or more first features (Kumar, Para [0041], discloses:  “presenting example predictions for approval/disapproval of a loan based on input data, including a loan application ID, a loan amount, and a predicted outcome. The results can be generated through a prediction workflow”.  Thus, Kumar discloses data entries for a feature (“loan application ID” or “loan amount”) from the one or more first features that correspond to independent variables (“loan application ID, a loan amount”) and one or more second features that correspond to the one or more predictable variables (“predicted outcome”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kumar with Gil, for at least the reasons recited in Claim 1.
However, the combination of Gil and Kumar does not teach wherein the data processing rules for data enhancement comprise a rule associated with adjusting data entries for a feature from the one or more first features based on a maximum number of categories of data entries included in stored data for the feature.
Holle teaches a rule associated with adjusting data entries based on a maximum number of categories of data entries included in stored data (Holle, Para [0087], discloses:  “The following values control the heuristics used in the method which follows. The parameters shall be referred to with the following names: (i) Prefcats: preferred number of categories, e.g. 10; (ii) Maxcats: maximum number of categories, e.g. 25; (iii) Othratio: an acceptable ratio between count in the "Other" category and the biggest non-Other category, e.g. 0.5; and (iv) Maxothratio: largest acceptable ratio between "other" and the biggest non-Other, e.g. 1.”  Here, Holle teaches maximum number of categories of data entries included in stored data for the feature (“Maxcats: maximum number of categories”).  Also, Holle Fig. 2 illustrates making adjustments based on this maximum by creating categories, as seen below.


[Image: Holle, Fig. 2]

Above, Holle illustrates adjustments made based on “maxcats” in steps 707, 711, and 713.  Thus, Holle teaches a rule associated with adjusting data entries based on a maximum number of categories of data entries included in stored data.)
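A simplified rule that limits a feature to a maximum number of categories can be sketched as below. Only a `maxcats` limit is modeled; Holle's additional `Prefcats`, `Othratio`, and `Maxothratio` heuristics and the specific steps of Holle's figure are not reproduced, and the function name and the choice to retain the most frequent values are assumptions.

```python
from collections import Counter


def limit_categories(values, maxcats=25, other_label="Other"):
    """Keep at most `maxcats` categories in total: the (maxcats - 1) most frequent
    values survive and every remaining value is folded into `other_label`."""
    counts = Counter(values)
    if len(counts) <= maxcats:
        return list(values)  # already within the limit; no adjustment needed
    keep = {value for value, _ in counts.most_common(maxcats - 1)}
    return [v if v in keep else other_label for v in values]
```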
Holle and the combination of Gil and Kumar are analogous art because they are both in the field of endeavor of data analysis.
Therefore, it would have been obvious before the effective filing date of the claimed invention to combine the prediction workflow with data preparation rules of Gil and Kumar with the maximum number of categories of Holle.  The combination would result in a prediction algorithm that uses data grouped into a limited number of categories.  One of ordinary skill in the art would be motivated to do so in order to generate more accurate predictions by limiting the influence of outlier data, whether from continuous variables, as suggested by Holle below, or from rarely occurring categories (Holle Para [0029]:  “Categorization provides another benefit for data mining. Continuous variables in databases often have outliers, i.e., extreme values which are often errors or other anomalies which may be present in a very small number of cases, but by virtue of their values have significant influence on the algorithms, "distracting" them from seeing the patterns in the broader databases. Using categorized data instead of raw data, these problems are minimized.”).

Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gil and Kumar in view of Teller et al. (US 2010/0179930 A1; hereinafter Teller).
As per Claim 9, the combination of Gil and Kumar teaches the method of claim 1 as well as process scenario instance execution (“running service”) and generic framework (see Rejection to Claim 1).  Kumar teaches executing the machine learning algorithm based on the implementation, the enhanced output data set, and input data associated with the [process scenario] execution, to generate a prediction value (Kumar, Para [0004], discloses:  “and executing the workflow including executing the at least one ML operator and the at least one visualization operator in the execution order against data included in the at least one data source, wherein the workflow generates at least one prediction that is presented according to the at least one visualization operator.”  Here, Kumar discloses executing the machine learning algorithm based on the implementation (“executing the at least one ML operator”) and input data associated with the [process scenario] execution (“against data included in the at least one data source”), to generate a prediction value (“generates at least one prediction”).  Kumar, Para [0005], discloses:  “the execution order further includes the at least one data preparation operator executed prior to the at least one ML operator”.  Since the data preparation is included prior to the at least one ML operator, then this implies that the enhanced output data set, as previously shown to be disclosed by Kumar, is input to the machine learning algorithm.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kumar with Gil for at least the reasons recited in Claim 1.
However, the combination of Gil and Kumar thus far fails to teach storing the prediction value for the process scenario execution and an actual value for the process scenario execution, wherein the actual value is determined based on actual execution of the process scenario at a running service implementing a process scenario instance and associated with the input data used for generating the prediction value; and reevaluating the generic framework data based on an evaluation of stored prediction values and actual values, wherein reevaluating the generic framework data includes adjusting the features defined at the generic framework data.
Teller teaches storing the prediction value for the [process scenario] execution and an actual value for the [process scenario] execution, wherein the actual value is determined based on actual execution [of the process scenario at a running service implementing a process scenario instance] and associated with the input data used for generating the prediction value; (Teller, Para [0013], discloses:  “Rather than attempt to develop a closed model, the methods and systems disclosed herein take as inputs as many potential causal factors as possible, connecting thousands of data sources as inputs to a machine learning platform that makes predictions, compares predictions to actual results, and adjusts the weight that it gives to particular sources, strengthening the influence of data sources that lead to good predictions and weakening the influence of data sources that lead to poor predictions.” Here, Teller discloses storing the prediction value for the execution and an actual value for the execution (“compares predictions to actual results”, wherein these must be at least temporarily stored in memory to perform the subsequent comparison).  It is inherent that the actual value is determined based on actual execution of some process that is being represented by the machine learning model, because an actual value can only arise from the real outcome of that process.  Also, recall that Gil and Kumar disclose execution of a process (“workflow instance”).  The actual value is also determined based on the input data used (“thousands of data sources”) for generating the prediction value (“connecting thousands of data sources as inputs to a machine learning platform that makes predictions, compares predictions to actual results”))
and reevaluating the [generic framework] data based on an evaluation of stored prediction values and actual values, wherein reevaluating the [generic framework] data includes adjusting the features defined at the [generic framework] data (Teller, Para [0013], discloses:  “Rather than attempt to develop a closed model, the methods and systems disclosed herein take as inputs as many potential causal factors as possible, connecting thousands of data sources as inputs to a machine learning platform that makes predictions, compares predictions to actual results, and adjusts the weight that it gives to particular sources, strengthening the influence of data sources that lead to good predictions and weakening the influence of data sources that lead to poor predictions. Over time, the machine learning platform learns to make a prediction based on those input factors among the many it has considered that contribute most to accurate predictions. For certain kinds of predictions, especially those most dependent on small contributions from many different factors, the platform may generate predictions that are much more accurate than current models.”  Here, Teller discloses reevaluating the data based on an evaluation of stored prediction values and actual values (“compares predictions to actual results”), wherein reevaluating the data includes adjusting the features defined at the data (“adjusts the weight that it gives to particular sources, strengthening the influence of data sources that lead to good predictions and weakening the influence of data sources that lead to poor predictions.”)).
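The storing of predictions alongside actual results, and the reweighting of data sources quoted from Teller Para [0013], can be illustrated with a multiplicative-weights update. This technique is an assumed stand-in chosen for illustration; the quoted passage does not specify an update formula, and the function names, the `learning_rate` parameter, and the binary outcomes in the example are hypothetical.

```python
from typing import Dict, List, Tuple

History = List[Tuple[Dict[str, float], float]]


def record_outcome(history: History, source_predictions: Dict[str, float], actual: float) -> None:
    """Store each source's prediction alongside the actual value for later evaluation."""
    history.append((dict(source_predictions), actual))


def reweight_sources(weights: Dict[str, float], history: History,
                     learning_rate: float = 0.5) -> Dict[str, float]:
    """Shrink each source's weight by (1 - learning_rate) ** error for every stored
    prediction/actual pair, then renormalize so the weights sum to 1.
    learning_rate is assumed to lie in [0, 1)."""
    updated = dict(weights)
    for source_predictions, actual in history:
        for source in updated:
            error = abs(source_predictions[source] - actual)
            updated[source] *= (1.0 - learning_rate) ** error  # larger error -> smaller weight
    total = sum(updated.values())
    return {source: w / total for source, w in updated.items()}


def combined_prediction(weights: Dict[str, float], source_predictions: Dict[str, float]) -> float:
    """Weighted average of the per-source predictions."""
    return sum(weights[s] * source_predictions[s] for s in weights)
```

With equal starting weights and one stored outcome in which source `a` was right and source `b` was wrong, the sketch shifts weight toward `a` (two thirds versus one third).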
Teller and the combination of Gil and Kumar are analogous art because they are both in the field of endeavor of prediction analysis, as well as in the field of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the prediction workflow with data preparation of Gil and Kumar with the feature adjustment based on actual results of Teller.  One of ordinary skill in the art would be motivated to do so in order to improve the accuracy of the predictions (Teller, Para [0013]:  “Over time, the machine learning platform learns to make a prediction based on those input factors among the many it has considered that contribute most to accurate predictions. For certain kinds of predictions, especially those most dependent on small contributions from many different factors, the platform may generate predictions that are much more accurate than current models.”).

As per Claim 20, Claim 20 is a system claim corresponding to method claim 9.  The difference is that it recites a computing device and a computer-readable storage device.  Kumar, Para [0007], discloses:  “Other implementations of any of the above aspects include corresponding systems, apparatus, and/or computer programs that are configured to perform the operations of the methods. The present disclosure also provides a computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations in accordance with implementations of the methods provided herein.”  Claim 20 is rejected for the same reasons as Claim 9.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Herbst et al. (“Integrating machine learning and workflow management to support acquisition and adaptation of workflow models”) discloses using machine learning to implement workflows, and also includes a cyclical process in which the workflow can be retrained.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126