DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 20 is objected to because of the following informalities:  
In claim 20, line 15, “a third stage configured to” should read “a fourth stage configured to”.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
The limitations “a schema reference module”, “a data synthesizer module coupled to the schema inference module”, “an initializer module”, “an executor module coupled to the initializer module”, “an expander module coupled to the executor module”, “a terminator module coupled to the expander module”, and “an information repository coupled to the executor and terminator modules” recited in claim 17 have been interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because each uses a generic placeholder linked by “configured to” to the functional language “receive a raw dataset and to determine a schema of the raw dataset”, “receive one or more data quality metric goals”, “identify an initial set of validation nodes”, “execute the initial set of validation nodes”, “iteratively expand and execute a next set of validation nodes”, “iteratively determine the next set of validation nodes”, and “provide a corrected dataset of the raw dataset”, respectively, without reciting sufficient structure to achieve the function.
The limitation “the initial set of validation nodes” recited in claim 19 has been interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because it uses a generic placeholder linked by “configured to” to the functional language “identify all possible remediation actions for any data quality check” without reciting sufficient structure to achieve the function.
The limitations “a first stage”, “a second stage”, “a third stage”, and “a third stage” recited in claim 20 have been interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because each uses a generic placeholder linked by “configured to” to the functional language “perform a logical check of the raw dataset”, “for each new version of data produced, generate a data quality metric (DQM)”, “for each DQM of each new version of data produced, perform a comparison to the raw dataset”, and “select the operator of the new version of data produced that best meets the data quality metric goals”, respectively, without reciting sufficient structure to achieve the function.
Since the claim limitations invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, claims 17 and 19-20 have been interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof.
A review of the specification shows that the following appear to be the corresponding structures described in the specification for the 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph limitations: a schema reference module, a data synthesizer module coupled to the schema inference module, an initializer module, an executor module coupled to the initializer module, an expander module coupled to the executor module, a terminator module coupled to the expander module, and an information repository coupled to the executor and terminator modules in claim 17; the initial set of validation nodes in claim 19; and a first stage, a second stage, a third stage, and a third stage in claim 20 (see Para. 0010; 0011; 0013; 0014; 0017 of the publication).
If applicant wishes to provide further explanation or dispute the examiner’s interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action. 
If applicant does not intend to have the claim limitation(s) treated under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may amend the claim(s) so that it/they will clearly not invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, or present a sufficient showing that the claim recites/recite sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

4.	Claims 17-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.
In claim 17, the limitations “a schema reference module”, “a data synthesizer module coupled to the schema inference module”, “an initializer module”, “an executor module coupled to the initializer module”, “an expander module coupled to the executor module”, “a terminator module coupled to the expander module”, and “an information repository coupled to the executor and terminator modules” invoke 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for the claimed function. These modules and the information repository are disclosed in applicant’s specification, but no specific structure is described for the modules or the repository.
In claim 19, the limitation “the initial set of validation nodes” invokes 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for the claimed function. The initial set of validation nodes is disclosed in applicant’s specification, but no specific structure is described for the initial set of validation nodes.
In claim 20, the limitations “a first stage”, “a second stage”, “a third stage”, and “a third stage” invoke 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for the claimed function. A first stage, a second stage, a third stage, and a third stage are disclosed in applicant’s specification, but no specific structure is described for the stages.
Because claim 18 depends on claim 17, it inherits the features of claim 17 and does not cure the deficiencies set forth above with respect to claim 17. As such, claim 18 is rejected under 35 U.S.C. 112(b) for the same reasons set forth with respect to claim 17 above.
 	Applicant may:
(a)          Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112, sixth paragraph; or
(b)          Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the claimed function without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a)          Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b)          Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.


Claim Rejections - 35 USC § 103
5.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 1-25 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Haimi et al. (US 2019/0286620 A1) hereinafter Al-Haimi, in view of Marrelli et al. (US 2016/0070725 A1) hereinafter Marrelli. 
As to claim 1, Al-Haimi discloses a computing device comprising: a processor; a storage device coupled to the processor; an engine stored in the storage device, wherein an execution of the engine by the processor configures the computing device to perform acts comprising (Para. 51; 61): receiving a raw dataset (Fig. 1-2, Para. 25, The clinical data analysis system 100 may receive raw source clinical data 110, i.e. a raw dataset, from various data sources. The raw source clinical data set 110 may be any source data in any format.); 

receiving one or more data quality metric goals corresponding to the received raw dataset (Fig. 3, Para. 7, Using a prediction model, the system may predict the correct transformation to generate content corresponding to the source data but formatted according to the target data schema, i.e. data quality metric. In some instances the predicted transformation can be based upon the schema mapped to a set of raw clinical data. Para. 49, “At component 320, the four probability matrices may be used to generate a weighted probability matrix 325. Weightings for each classifier algorithm may be tuned based on particular applications and experience”. Thus, one or more data quality metric goals corresponding to the received raw dataset, in the form of the target data schema, are received.); 
determining a schema of the dataset (Fig. 4, Para. 9, “The method may comprise determining a source data set for transformation to a unified target data schema. The source data set may include a source data records organized according to a source data schema. The source data schema may include a plurality of source fields.”. Para. 23, According to some aspects, a system may utilize deep learning algorithms to determine a mapping from the source schema to the target schema through identifying the source schema, i.e. determining a schema of the dataset, and creating a correspondence between source fields and target fields, and a corresponding data transformation.); 
identifying an initial set of validation nodes based on the schema of the dataset (Para. 11, “The computer may determine appropriate data transformations to apply to the source data to generate the target data in an appropriate form. Based on the schema mapping, the computer may determine a transformation operative to transform source data content to appropriately formatted target data content.”. Para. 23, “According to some aspects, a system may utilize deep learning algorithms to determine a mapping from the source schema to the target schema through identifying the source schema and creating a correspondence between source fields and target fields, and a corresponding data transformation. Artificial neural networks, configured as schema-level and instance-level classifiers, may generate a set of predictions based on the fields of the source data set and fields of the target data schema”. Para. 60, “the system may determine a data transform based on the source data schema, the target data schema, and the determined mapping. The determined data transform may be applied to the source data set to generate a target data set”, where determining a data transform using a set of rules based on the source data schema indicates that the validation nodes, i.e. the set of rules, are identified based on the schema of the dataset.);
executing the initial set of validation nodes (Para. 11, As with determining the mapping, the computer may pre-process one or more fields of the source data set. Based on the schema mapping, the computer may determine a transformation operative to transform source data content to appropriately formatted target data content. Para. 48, The fields/columns of the source data set may be run, i.e. executing the initial set of validation nodes, through the series of four classification algorithms to generate the prediction and probability results which are used to build probability matrices 315A-D. Para. 26, “The data mapping and transformation module 120 may be an artificial intelligence engine that leverages one or more deep learning algorithms to substantially automatically map the raw source clinical data 110 to a standard data schema, and then perform substantially automatic data transformations on the mapped raw source clinical data 110 to correct data discrepancies.”. Therefore, the initial set of validation nodes is executed to transform source data content into appropriately formatted target data content.).
Al-Haimi does not explicitly disclose iteratively expanding and executing a next set of validation nodes based on the schema of the dataset until a termination criterion is reached; and providing a corrected dataset of the raw dataset based on the iterative execution of the initial and next set of validation nodes.
However, in the same field of endeavor, Marrelli discloses iteratively expanding and executing a next set of validation nodes based on the schema of the dataset until a termination criterion is reached (Fig. 3, Para. 95, “The generation of action plans and cleansing of data at step 315 and re-calculation at step 320 are repeated until the results of the data quality analysis are satisfactory (e.g., the source data is sufficiently clean for migration to the target system, etc.). For example, the data quality percentage values may satisfy corresponding thresholds or other criteria to indicate sufficient cleanliness of the source data”. Para. 40, “a table of a data domain may include a row (or record) for each customer, where the columns or data attributes for each row may include first name, last name, and address. Data attributes of a data domain include in-scope data attributes that are relevant to a future-state target environment (e.g., critical to one or more business or other processes of the target system, required by the target system, etc) and considered for data cleansing (e.g., provided with a nonzero weight as described below).”. Thus, a next set of validation nodes is iteratively expanded and executed based on the schema of the dataset until a termination criterion is reached, since the data attributes of a data domain are considered for data cleansing until the source data is sufficiently clean for migration.); and 
providing a corrected dataset of the raw dataset based on the iterative execution of the initial and next set of validation nodes (Fig. 4, Para. 103, “Once the source analysis phase of the data analysis is completed, the source data is initially cleansed to a sufficient level, and a target process phase of the data quality analysis may be performed. During the target process phase, data in staging areas 122 is converted to the common data model of alignment area 124 (e.g., via an ETL tool) and profiled by the business process hierarchy (BPH). Data quality engine 130 (e.g., via one or more server systems 110) determines actionable or problematic data prioritized for critical processes of the target system. Reports are routed to appropriate users and/or administrators by data quality reports module 132 (e.g., via one or more server systems 110).”. Para. 105, “The target process phase of the data quality analysis further determines whether the cleansing activities of the action plan (e.g., either in the source system or alignment area 124) have been performed correctly, and identifies the potential impact of actionable or problematic data relative to the business or other processes that the actionable data supports. In other words, the target process phase provides an indication of the cleanliness of source data for the particular business or other processes of the target system utilizing that source data”. Thus, a corrected dataset of the raw dataset is being provided based on the iterative execution of the initial and next set of validation nodes.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Al-Haimi such that the next set of validation nodes, such as the set of rules of Al-Haimi, is executed repeatedly until the results of the data quality analysis are satisfactory, as suggested by Marrelli (Para. 95). One of ordinary skill in the art would have been motivated to make this modification in order to migrate corrected data to the target system, as suggested by Marrelli (Para. 95).
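
For illustration only, the iterative limitation of claim 1 discussed above (execute an initial set of validation nodes, then expand and execute further nodes until a termination criterion is reached) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Al-Haimi, or Marrelli: the names `completeness`, `fill_missing`, `run_validation`, `schema_defaults`, and `goal` are hypothetical, the only data quality metric is the share of non-missing cells, and the only remediation action is filling a missing value with a default.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def completeness(rows: List[Row], columns: List[str]) -> float:
    """Hypothetical data quality metric: fraction of cells that are not None."""
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0
    filled = sum(1 for r in rows for c in columns if r.get(c) is not None)
    return filled / total


def fill_missing(column: str, default: object) -> Callable[[List[Row]], List[Row]]:
    """Hypothetical validation node: check one column for None, remediate with a default."""
    def node(rows: List[Row]) -> List[Row]:
        # Build new rows so the raw dataset is left untouched.
        return [{**r, column: default if r.get(column) is None else r[column]}
                for r in rows]
    return node


def run_validation(rows: List[Row], schema_defaults: Dict[str, object],
                   goal: float, max_rounds: int = 10) -> List[Row]:
    """Execute an initial node set, then expand one column at a time until done."""
    columns = list(schema_defaults)
    pending, remaining = columns[:1], columns[1:]  # initial set: first column only
    rounds = 0
    while pending and rounds < max_rounds:
        for col in pending:
            rows = fill_missing(col, schema_defaults[col])(rows)
        rounds += 1
        # Termination criterion (assumed): the metric goal is met.
        if completeness(rows, columns) >= goal:
            break
        # Expansion step (assumed): move on to the next column in the schema.
        pending, remaining = remaining[:1], remaining[1:]
    return rows
```

In this sketch the loop ends either when the metric goal is met, when no nodes remain to expand, or when `max_rounds` is exhausted; any of the three could serve as the termination criterion.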

As to claim 2, the claim is rejected for the same reasons as claim 1 above. In addition, Marrelli discloses wherein each validation node includes a data quality check and one or more remediation actions (Para. 44, each data attribute of a data domain is associated with a set of data quality rules, i.e. each validation node, for each of source systems 140 and for a corresponding data attribute of target system 150. The set of data quality rules typically span the data quality dimensions. These data quality rules may be pre-defined by a user. For example, a set of data quality rules for a data attribute of the target system may include a completeness rule (e.g., the data attribute must not be null), a validity rule (e.g., the data attribute must not contain special characters), and an accuracy rule (e.g., the data attribute must be a valid street name for a given zip code). Para. 41, “Actionable or problematic data is prioritized by business criticality and routed to appropriate users and/or administrators by data quality reports module 132. The actionable data is either cleansed in the source systems, or the mappings are updated with conversion rules.”. Para. 110, “the data quality engine may analyze the action plan and correct and/or add data based on the statuses and/or data quality rules violated by the data and indicated in the action plan.”. Therefore, each validation node includes a data quality check and one or more remediation actions.).

As to claim 3, the claim is rejected for the same reasons as claim 1 above. In addition, Marrelli discloses wherein execution of a validation node of the initial set of validation nodes comprises: identifying all possible remediation actions for any data quality check (Para. 119, “The load analysis phase is typically executed once for each integration test cycle, ideally with improved data quality and less process impact each time.”. Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Therefore, execution of a validation node of the initial set of validation nodes comprises: identifying all possible remediation actions for any data quality check.); 
transforming the data with each possible remediation action (Para. 92, “The action plan may indicate which data records are to be cleansed at the source system, data to be created at the source system prior to conversion, and conversion rules needed to transform source data to standards of the target system. For example, the action plan may be in the form of a listing of records indicating for each record the statuses of the data record attributes, the data quality rules (for the target system) violated, reasons for the violation, and recommended cleansing actions. The cleansing actions may be performed on data within the source systems and/or staging areas 122 manually and/or by the data quality engine as described below”. Thus, the data is being transformed with each possible remediation action.); and  
computing a plurality of data quality metrics (DQMs) to evaluate the transformations (Fig. 7A-7B, Para. 21, “Present invention embodiments compare data elements expected in the target system against corresponding data elements of one or more source systems and produce weighted data quality metrics that are meaningful to resources accountable for cleansing and transformation of the source data elements.”. Para. 135, “Any quantity of the data quality dimensions and/or metrics may be utilized to determine clean or actionable data. For example, data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.).”. Thus, a plurality of data quality metrics (DQMs) are computed to evaluate the transformations.).

As to claim 4, the claim is rejected for the same reasons as claim 1 above. In addition, Marrelli discloses wherein execution of a validation node includes a first stage, comprising: performing a logical check of the raw dataset by a validator object to detect one or more anomalies in the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 59, The data quality rules of the target system are utilized to identify data of the source systems that are actionable or problematic, i.e. to detect one or more anomalies in the raw dataset, with respect to the target system prior to migration to ensure the source data is accepted into the target system.); and 
performing different data transformations by way of a corresponding operator on the raw dataset to produce a new version of data for each data transformation, to correct the one or more detected anomalies (Para. 110, “the data quality engine may determine appropriate conversions or transformations and transform the corresponding data. Further, the data quality engine may analyze the action plan and correct and/or add data based on the statuses and/or data quality rules violated by the data and indicated in the action plan”. Thus, the data quality engine transforms the data using appropriate transformation to correct the one or more detected anomalies.).

As to claim 5, the claim is rejected for the same reasons as claim 4 above. In addition, Marrelli discloses wherein execution of the validation node includes a second stage comprising: for each new version of data produced, generating a data quality metric (DQM) by an internal quality evaluator (IQE) module; and generating a DQM for the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 21, Present invention embodiments compare data elements expected in the target system against corresponding data elements, i.e. the raw dataset, of one or more source systems and produce weighted data quality metrics that are meaningful to resources accountable for cleansing and transformation of the source data elements. Thus, the data quality metric for the raw dataset is generated.).

As to claim 6, the claim is rejected for the same reasons as claim 5 above. In addition, Marrelli discloses wherein each DQM of the second stage comprises at least one of (i) a summary of characteristics in multiple dimensions of the corresponding new version of data produced from the raw dataset; or (ii) a gain or change information of the corresponding new version of data produced from the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 98, “Interface screen 800 preferably provides a visual representation of the data quality of the data attributes within a data domain (e.g., Customer Master, Material Master, etc.). A data attribute 853 may be selected from interface screen 800 (e.g., via a mouse or other input device), where the actionable or problematic data records of the selected data attribute are presented. For example, data records containing a selected data attribute that violate data quality rules across in-scope (or relevant) data quality dimensions (e.g., accuracy, completeness, etc.) may be presented. This presentation may be used to generate action plans, where actionable or problematic data may be routed to users and/or administrators for correction or designation to other users/administrators for appropriate handling.”. Thus, each DQM of the second stage comprises a summary of characteristics in multiple dimensions of the corresponding new version of data produced from the raw dataset.). 

As to claim 7, the claim is rejected for the same reasons as claim 5 above. In addition, Marrelli discloses wherein execution of the validation node includes a third stage comprising: for each DQM of each new version of data produced and the DQM of the raw dataset, performing a comparison to the raw dataset to assess an improvement from the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 39, “A technical specification for each mapping (generated based on the logical mappings) describes the manner in which physical attributes of the source data models of staging areas 122 are mapped to the common physical data model (derived from the target system) employed as a baseline for alignment area 124. These mappings enable tracing of attributes from the target system back to one or more source systems and, therefore, allow correlation between source data quality metrics and target data quality metrics”, where the source data quality metrics indicate the DQM of the raw dataset and the target data quality metrics indicate an improvement from the raw dataset.).

As to claim 8, the claim is rejected for the same reasons as claim 7 above. In addition, Marrelli discloses wherein execution of the validation node includes a fourth stage comprising: selecting the operator of the new version of data produced that best meets the data quality metric goals (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Thus, execution of the validation node includes a fourth stage comprising: selecting the operator of the new version of data produced that best meets the data quality metric goals.).

As to claim 9, the claim is rejected for the same reasons as claim 8 above. In addition, Al-Haimi discloses wherein the operator that is selected has a highest gap between its corresponding DQM and the DQM of the raw dataset that is below a predetermined threshold (Para. 36, “The Levenshtein distance similarity classifier 253 may assess the similarity between the column names of a set of raw source clinical data and the column names of a possible standard target schema. Levenshtein distance may represent a similarity between source and target column names. Lower values may represent better, "closer" matches. Levenshtein distance similarity classifier 253 may determine a value representative of the similarity between the actual column name and a proposed column name. Using this value, the Levenshtein distance similarity classifier may predict a standard target schema for the data based on the Levenshtein distance between the fields, where determined distance values between source and target column names indicates gap between its corresponding DQM and the DQM of the raw dataset”. Para. 49, “Based on the weighted probability matrix 325, a maximum matching algorithm may be employed by prediction layer 260 to determine the best match predicted mapping between the source field and one or more target fields. The best match may be compared against an accuracy threshold at step 330.”. Thus, the operator that is selected has a highest gap between its corresponding DQM and the DQM of the raw dataset that is below a predetermined threshold since the similarity value is determined based on Levenshtein distance similarity classifier and the accuracy threshold.).
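
For illustration only, the Levenshtein-distance comparison of column names described in Al-Haimi (Para. 36) and the threshold comparison (Para. 49) may be sketched as follows. This is a minimal sketch and not Al-Haimi's implementation: the function names, the example column names, and the `max_distance` cutoff are hypothetical, and Al-Haimi's classifier additionally feeds a weighted probability matrix that is not reproduced here.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def best_match(source: str, targets: List[str], max_distance: int) -> Optional[str]:
    """Return the closest target column name, or None if it exceeds the cutoff.

    `max_distance` is a hypothetical stand-in for an accuracy threshold.
    """
    scored = [(levenshtein(source.lower(), t.lower()), t) for t in targets]
    dist, name = min(scored)  # lower distance means a closer match
    return name if dist <= max_distance else None


# Hypothetical source and target column names.
match = best_match("pat_id", ["patient_id", "visit_date"], max_distance=4)
```

Lower distances indicate closer matches, so the sketch selects the minimum and then compares it against the cutoff.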

As to claim 10, the claim is rejected for the same reasons as claim 1 above. In addition, Marrelli discloses wherein expanding a next set of validation nodes comprises at least one of: determining a validation node that best achieves one or more received quality metric goals; or determining a validation node based on mining an execution information repository to find all validation nodes that usually occur together (Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Therefore, expanding a next set of validation nodes comprises at least one of: determining a validation node that best achieves one or more received quality metric goals.).


As to claim 11, Al-Haimi discloses a non-transitory computer readable storage medium tangibly embodying a computer readable program code having computer readable instructions that, when executed (Para. 51; 61), causes a computer device to carry out a method of improving data quality to conserve computational resources (Para. 26), the method comprising receiving a raw dataset (Fig. 1-2, Para. 25, The clinical data analysis system 100 may receive raw source clinical data 110, i.e. a raw dataset, from various data sources. The raw source clinical data set 110 may be any source data in any format.); 
receiving one or more data quality metric goals corresponding to the received raw dataset (Fig. 3, Para. 7, Using a prediction model, the system may predict the correct transformation to generate content corresponding to the source data  but formatted according to the target data schema, i.e. data quality metric. In some instances the predicted transformation can be based upon the schema mapped to a set of raw clinical data. Para. 49, “At component 320, the four probability matrices may be used to generate a weighted probability matrix 325. Weightings for each classifier algorithm may be tuned based on particular applications an experience”. Thus, the target data schema such as the one or more data quality metric goals corresponding to the received raw dataset are being received.); 
determining a schema of the dataset (Fig. 4, Para. 9, “The method may comprise determining a source data set for transformation to a unified target data schema. The source data set may include a source data records organized according to a source data schema. The source data schema may include a plurality of source fields.”. Para. 23, According to some aspects, a system may utilize deep learning algorithms to determine a mapping from the source schema to the target schema through identifying the source schema, i.e. determining a schema of the dataset, and creating a correspondence between source fields and target fields, and a corresponding data transformation.); 
identifying an initial set of validation nodes based on the schema of the dataset (Para. 11, “The computer may determine appropriate data transformations to apply to the source data to generate the target data in an appropriate form. Based on the schema mapping, the computer may determine a transformation operative to transform source data content to appropriately formatted target data content.”. Para. 23, “According to some aspects, a system may utilize deep learning algorithms to determine a mapping from the source schema to the target schema through identifying the source schema and creating a correspondence between source fields and target fields, and a corresponding data transformation. Artificial neural networks, configured as schema-level and instance-level classifiers, may generate a set of predictions based on the fields of the source data set and fields of the target data schema”. Para. 60, “the system may determine a data transform based on the source data schema, the target data schema, and the determined mapping. The determined data transform may be applied to the source data set to generate a target data set”, where determining a data transform using set of rules based on the source data schema indicates the validation nodes such as the set of rules which is being identified based on the schema of the dataset.); 

executing the initial set of validation nodes (Para. 11, As with determining the mapping, the computer may pre-process one or more fields of the source data set. Based on the schema mapping, the computer may determine a transformation operative to transform source data content to appropriately formatted target data content. Para. 48, The fields/columns of the source data set may be run, i.e. executing the initial set of validation nodes, through the series of four classification algorithms to generate the prediction and probability results which are used to build probability matrices 315A-D. Para. 26, “The data mapping and transformation module 120 may be an artificial intelligence engine that leverages one or more deep learning algorithms to substantially automatically map the raw source clinical data 110 to a standard data schema, and then perform substantially automatic data transformations on the mapped raw source clinical data 110 to correct data discrepancies.”. Therefore, the initial set of validation nodes are being executed to transform source data content to appropriately formatted target data content.).
Al-Haimi does not explicitly disclose iteratively expanding and executing a next set of validation nodes based on the schema of the dataset until a termination criterion is reached; and providing a corrected dataset of the raw dataset based on the iterative execution of the initial and next set of validation nodes.
However, in the same field of endeavor, Marrelli discloses iteratively expanding and executing a next set of validation nodes based on the schema of the dataset until a termination criterion is reached (Fig. 3, Para. 95, “The generation of action plans and cleansing of data at step 315 and re-calculation at step 320 are repeated until the results of the data quality analysis are satisfactory (e.g., the source data is sufficiently clean for migration to the target system, etc.). For example, the data quality percentage values may satisfy corresponding thresholds or other criteria to indicate sufficient cleanliness of the source data”. Para. 40, “a table of a data domain may include a row (or record) for each customer, where the columns or data attributes for each row may include first name, last name, and address. Data attributes of a data domain include in-scope data attributes that are relevant to a future-state target environment (e.g., critical to one or more business or other processes of the target system, required by the target system, etc) and considered for data cleansing (e.g., provided with a nonzero weight as described below).”. Thus, a next set of validation nodes is iteratively expanded and executed based on the schema of the dataset until a termination criterion is reached, since data attributes of a data domain are considered for data cleansing until the source data is sufficiently clean for migration.); and 
providing a corrected dataset of the raw dataset based on the iterative execution of the initial and next set of validation nodes (Fig. 4, Para. 103, “Once the source analysis phase of the data analysis is completed, the source data is initially cleansed to a sufficient level, and a target process phase of the data quality analysis may be performed. During the target process phase, data in staging areas 122 is converted to the common data model of alignment area 124 (e.g., via an ETL tool) and profiled by the business process hierarchy (BPH). Data quality engine 130 (e.g., via one or more server systems 110) determines actionable or problematic data prioritized for critical processes of the target system. Reports are routed to appropriate users and/or administrators by data quality reports module 132 (e.g., via one or more server systems 110).”. Para. 105, “The target process phase of the data quality analysis further determines whether the cleansing activities of the action plan (e.g., either in the source system or alignment area 124) have been performed correctly, and identifies the potential impact of actionable or problematic data relative to the business or other processes that the actionable data supports. In other words, the target process phase provides an indication of the cleanliness of source data for the particular business or other processes of the target system utilizing that source data”. Thus, a corrected dataset of the raw dataset is being provided based on the iterative execution of the initial and next set of validation nodes.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Al-Haimi such that the next set of validation nodes, such as the set of rules of Al-Haimi, is executed repeatedly until the results of the data quality analysis are satisfactory, as suggested by Marrelli (Para. 95). One of ordinary skill in the art would have been motivated to make this modification in order to migrate corrected data to the target system, as suggested by Marrelli (Para. 95).
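
For illustration only, the repeated cleansing and re-calculation loop that Marrelli describes (Para. 95), in which cleansing repeats until the data quality percentage satisfies a threshold, may be sketched as follows. This is a minimal sketch under stated assumptions: the `Rule` pairing of a check with a fix, the `max_rounds` guard, and the example records are hypothetical and do not appear in either reference.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]
# Hypothetical rule shape: (quality check, remediation action).
Rule = Tuple[Callable[[Record], bool], Callable[[Record], Record]]


def quality_pct(records: List[Record], rules: List[Rule]) -> float:
    """Percentage of (record, rule) pairs whose check passes."""
    total = len(records) * len(rules)
    if total == 0:
        return 100.0
    passed = sum(1 for r in records for check, _ in rules if check(r))
    return 100.0 * passed / total


def cleanse_until_clean(records: List[Record], rules: List[Rule],
                        threshold_pct: float, max_rounds: int = 10):
    """Repeat cleansing and re-calculation until the threshold is met."""
    current = [dict(r) for r in records]  # leave the raw input untouched
    for _ in range(max_rounds):
        if quality_pct(current, rules) >= threshold_pct:
            break  # termination criterion reached
        cleaned = []
        for rec in current:
            for check, fix in rules:
                if not check(rec):
                    rec = fix(rec)
            cleaned.append(rec)
        current = cleaned
    return current, quality_pct(current, rules)


# Hypothetical completeness and validity rules on a "name" attribute.
rules: List[Rule] = [
    (lambda r: r["name"] is not None,
     lambda r: {**r, "name": "UNKNOWN"}),
    (lambda r: (r["name"] or "").isalnum(),
     lambda r: {**r, "name": "".join(c for c in (r["name"] or "") if c.isalnum())}),
]
raw = [{"name": None}, {"name": "Ann#"}]
cleaned, pct = cleanse_until_clean(raw, rules, threshold_pct=100.0)
```

The loop stops either when the quality percentage meets the threshold or when the hypothetical round limit is exhausted, so it cannot run indefinitely on data that no rule can repair.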



As to claim 12, the claim is rejected for the same reasons as claim 11 above. In addition, Marrelli discloses wherein each validation node includes a data quality check and one or more remediation actions (Para. 44, each data attribute of a data domain is associated with a set of data quality rules, i.e. each validation node, for each of source systems 140 and for a corresponding data attribute of target system 150. The set of data quality rules typically span the data quality dimensions. These data quality rules may be pre-defined by a user. For example, a set of data quality rules for a data attribute of the target system may include a completeness rule (e.g., the data attribute must not be null), a validity rule (e.g., the data attribute must not contain special characters), and an accuracy rule (e.g., the data attribute must be a valid street name for a given zip code). Para. 41, “Actionable or problematic data is prioritized by business criticality and routed to appropriate users and/or administrators by data quality reports module 132. The actionable data is either cleansed in the source systems, or the mappings are updated with conversion rules.”. Para. 110, “the data quality engine may analyze the action plan and correct and/or add data based on the statuses and/or data quality rules violated by the data and indicated in the action plan.”. Therefore, each validation node includes a data quality check and one or more remediation actions.).


As to claim 13, the claim is rejected for the same reasons as claim 11 above. In addition, Marrelli discloses wherein execution of a validation node of the initial set of validation nodes comprises: identifying all possible remediation actions for each data quality check (Para. 119, “The load analysis phase is typically executed once for each integration test cycle, ideally with improved data quality and less process impact each time.”. Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Therefore, execution of a validation node of the initial set of validation nodes comprises: identifying all possible remediation actions for any data quality check.); 
transforming the data with each possible remediation action (Para. 92, “The action plan may indicate which data records are to be cleansed at the source system, data to be created at the source system prior to conversion, and conversion rules needed to transform source data to standards of the target system. For example, the action plan may be in the form of a listing of records indicating for each record the statuses of the data record attributes, the data quality rules (for the target system) violated, reasons for the violation, and recommended cleansing actions. The cleansing actions may be performed on data within the source systems and/or staging areas 122 manually and/or by the data quality engine as described below”. Thus, the data is being transformed with each possible remediation action.); and 
computing a plurality of data quality metrics (DQMs) to evaluate the transformations (Fig. 7A-7B, Para. 21, “Present invention embodiments compare data elements expected in the target system against corresponding data elements of one or more source systems and produce weighted data quality metrics that are meaningful to resources accountable for cleansing and transformation of the source data elements.”. Para. 135, “Any quantity of the data quality dimensions and/or metrics may be utilized to determine clean or actionable data. For example, data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.).”. Thus, a plurality of data quality metrics (DQMs) computed to evaluate the transformations.).


As to claim 14, the claim is rejected for the same reasons as claim 11 above. In addition, Marrelli discloses wherein execution of a validation node includes: a first stage, comprising: performing a logical check of the raw dataset by a validator object to detect one or more anomalies in the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 59, The data quality rules of the target system are utilized to identify data of the source systems that are actionable or problematic, i.e. to detect one or more anomalies in the raw dataset, with respect to the target system prior to migration to ensure the source data is accepted into the target system.); and 
performing different data transformations by way of a corresponding operator on the raw dataset to produce a new version of data for each data transformation, to correct the one or more detected anomalies (Para. 110, “the data quality engine may determine appropriate conversions or transformations and transform the corresponding data. Further, the data quality engine may analyze the action plan and correct and/or add data based on the statuses and/or data quality rules violated by the data and indicated in the action plan”. Thus, the data quality engine transforms the data using appropriate transformation to correct the one or more detected anomalies.); 
a second stage, comprising: for each new version of data produced, generating a data quality metric (DQM) by an internal quality evaluator (IQE) module; and generating a DQM for the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 21, Present invention embodiments compare data elements expected in the target system against corresponding data elements, i.e. the raw dataset, of one or more source systems and produce weighted data quality metrics that are meaningful to resources accountable for cleansing and transformation of the source data elements. Thus, the data quality metric for the raw dataset are being generated.); 
a third stage, comprising: for each DQM of each new version of data produced, performing a comparison to the raw dataset to assess an improvement from the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 39, “A technical specification for each mapping (generated based on the logical mappings) describes the manner in which physical attributes of the source data models of staging areas 122 are mapped to the common physical data model (derived from the target system) employed as a baseline for alignment area 124. These mappings enable tracing of attributes from the target system back to one or more source systems and, therefore, allow correlation between source data quality metrics and target data quality metrics”, where the source data quality metrics indicates the DQM of the raw dataset and the target data quality metrics indicates an improvement from the raw dataset.); and 
a fourth stage, comprising: selecting the operator of the new version of data produced that best meets the data quality metric goals (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Thus, execution of the validation node includes a fourth stage comprising: selecting the operator of the new version of data produced that best meets the data quality metric goals.).
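
For illustration only, the four stages recited in claim 14 (detect anomalies and transform, generate a DQM per version and for the raw dataset, compare each DQM to the raw dataset, select the best operator) may be sketched as follows. This is a minimal sketch and does not represent the applicant's or Marrelli's implementation: the two operators, the two-dimension DQM (completeness and row retention), and the averaging score are all hypothetical choices made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Column = List[Optional[float]]


def fill_mean(col: Column) -> Column:
    """Hypothetical operator: replace missing values with the column mean."""
    present = [v for v in col if v is not None]
    mean = sum(present) / len(present) if present else 0.0
    return [mean if v is None else v for v in col]


def drop_missing(col: Column) -> Column:
    """Hypothetical operator: discard rows with missing values."""
    return [v for v in col if v is not None]


def dqm(version: Column, raw: Column) -> Dict[str, float]:
    """Hypothetical DQM summarising two dimensions of a data version."""
    completeness = (sum(v is not None for v in version) / len(version)
                    if version else 1.0)
    retention = len(version) / len(raw) if raw else 1.0
    return {"completeness": completeness, "retention": retention}


def score(metric: Dict[str, float]) -> float:
    return sum(metric.values()) / len(metric)


def run_validation_node(raw: Column,
                        operators: Dict[str, Callable[[Column], Column]]
                        ) -> Tuple[Optional[str], Column]:
    # Stage 1: logical check for anomalies, then one new version per operator.
    if all(v is not None for v in raw):
        return None, raw  # nothing to correct
    versions = {name: op(raw) for name, op in operators.items()}
    # Stage 2: a DQM for each new version and for the raw dataset.
    raw_score = score(dqm(raw, raw))
    scores = {name: score(dqm(v, raw)) for name, v in versions.items()}
    # Stage 3: compare each version's DQM to the raw dataset's DQM.
    gains = {name: s - raw_score for name, s in scores.items()}
    # Stage 4: select the operator with the largest improvement.
    best = max(gains, key=gains.get)
    return best, versions[best]


operators = {"fill_mean": fill_mean, "drop_missing": drop_missing}
chosen, corrected = run_validation_node([1.0, None, 3.0, None], operators)
```

In this example dropping rows restores completeness but halves retention, so it shows no net gain over the raw column, while mean imputation improves the averaged score and is selected.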

As to claim 15, the claim is rejected for the same reasons as claim 14 above. In addition, Al-Haimi discloses wherein the operator that is selected has a highest gap between its corresponding DQM and the DQM of the raw dataset that is below a predetermined threshold (Para. 36, “The Levenshtein distance similarity classifier 253 may assess the similarity between the column names of a set of raw source clinical data and the column names of a possible standard target schema. Levenshtein distance may represent a similarity between source and target column names. Lower values may represent better, "closer" matches. Levenshtein distance similarity classifier 253 may determine a value representative of the similarity between the actual column name and a proposed column name. Using this value, the Levenshtein distance similarity classifier may predict a standard target schema for the data based on the Levenshtein distance between the fields, where determined distance values between source and target column names indicates gap between its corresponding DQM and the DQM of the raw dataset”. Para. 49, “Based on the weighted probability matrix 325, a maximum matching algorithm may be employed by prediction layer 260 to determine the best match predicted mapping between the source field and one or more target fields. The best match may be compared against an accuracy threshold at step 330.”. Thus, the operator that is selected has a highest gap between its corresponding DQM and the DQM of the raw dataset that is below a predetermined threshold since the similarity value is determined based on Levenshtein distance similarity classifier and the accuracy threshold.).

As to claim 16, the claim is rejected for the same reasons as claim 11 above. In addition, Marrelli discloses wherein expanding a next set of validation nodes comprises at least one of: determining a validation node that best achieves one or more of the data quality metric goals; or determining a validation node based on mining an execution information repository to find all validation nodes that usually occur together (Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Therefore, expanding a next set of validation nodes comprises at least one of: determining a validation node that best achieves one or more received quality metric goals.).

As to claim 17, Al-Haimi discloses a system comprising: a schema inference module configured to receive a raw dataset (Fig. 1-2, Para. 25, The clinical data analysis system 100 may receive raw source clinical data 110, i.e. a raw dataset, from various data sources. The raw source clinical data set 110 may be any source data in any format.) and to determine a schema of the raw dataset (Fig. 4, Para. 9, “The method may comprise determining a source data set for transformation to a unified target data schema. The source data set may include a source data records organized according to a source data schema. The source data schema may include a plurality of source fields.”. Para. 23, According to some aspects, a system may utilize deep learning algorithms to determine a mapping from the source schema to the target schema through identifying the source schema, i.e. determining a schema of the dataset, and creating a correspondence between source fields and target fields, and a corresponding data transformation.); and 
a data synthesizer module coupled to the schema inference module and configured to receive one or more data quality metric goals corresponding to the received raw dataset from a knowledge base (Fig. 3, Para. 7, Using a prediction model, the system may predict the correct transformation to generate content corresponding to the source data  but formatted according to the target data schema, i.e. data quality metric. In some instances the predicted transformation can be based upon the schema mapped to a set of raw clinical data. Para. 49, “At component 320, the four probability matrices may be used to generate a weighted probability matrix 325. Weightings for each classifier algorithm may be tuned based on particular applications an experience”. Thus, the target data schema such as the one or more data quality metric goals corresponding to the received raw dataset are being received.), 
wherein the data synthesizer module comprises: an initializer module configured to identify an initial set of validation nodes based on the schema of the dataset (Para. 11, “The computer may determine appropriate data transformations to apply to the source data to generate the target data in an appropriate form. Based on the schema mapping, the computer may determine a transformation operative to transform source data content to appropriately formatted target data content.”. Para. 23, “According to some aspects, a system may utilize deep learning algorithms to determine a mapping from the source schema to the target schema through identifying the source schema and creating a correspondence between source fields and target fields, and a corresponding data transformation. Artificial neural networks, configured as schema-level and instance-level classifiers, may generate a set of predictions based on the fields of the source data set and fields of the target data schema”. Para. 60, “the system may determine a data transform based on the source data schema, the target data schema, and the determined mapping. The determined data transform may be applied to the source data set to generate a target data set”, where determining a data transform using set of rules based on the source data schema indicates the validation nodes such as the set of rules which is being identified based on the schema of the dataset.); 
an executor module coupled to the initializer module and configured to execute the initial set of validation nodes (Para. 11, As with determining the mapping, the computer may pre-process one or more fields of the source data set. Based on the schema mapping, the computer may determine a transformation operative to transform source data content to appropriately formatted target data content. Para. 48, The fields/columns of the source data set may be run, i.e. executing the initial set of validation nodes, through the series of four classification algorithms to generate the prediction and probability results which are used to build probability matrices 315A-D. Para. 26, “The data mapping and transformation module 120 may be an artificial intelligence engine that leverages one or more deep learning algorithms to substantially automatically map the raw source clinical data 110 to a standard data schema, and then perform substantially automatic data transformations on the mapped raw source clinical data 110 to correct data discrepancies.”. Therefore, the initial set of validation nodes are being executed to transform source data content to appropriately formatted target data content.).
Al-Haimi does not explicitly disclose an expander module coupled to the executor module and configured to iteratively expand and execute a next set of validation nodes based on the schema of the dataset, until a termination criterion is reached; and a terminator module coupled to the expander module and configured to iteratively determine the next set of validation nodes to consider by the expander module and to decide when to terminate the iterative determination; and an information repository coupled to the executor and terminator modules and configured to provide a corrected dataset of the raw dataset based on the iterative execution of the initial and next set of validation nodes.
However, in the same field of endeavor, Marrelli discloses an expander module coupled to the executor module and configured to iteratively expand and execute a next set of validation nodes based on the schema of the dataset, until a termination criterion is reached; and a terminator module coupled to the expander module and configured to iteratively determine the next set of validation nodes to consider by the expander module and to decide when to terminate the iterative determination (Fig. 3, Para. 95, “The generation of action plans and cleansing of data at step 315 and re-calculation at step 320 are repeated until the results of the data quality analysis are satisfactory (e.g., the source data is sufficiently clean for migration to the target system, etc.). For example, the data quality percentage values may satisfy corresponding thresholds or other criteria to indicate sufficient cleanliness of the source data”. Para. 40, “a table of a data domain may include a row (or record) for each customer, where the columns or data attributes for each row may include first name, last name, and address. Data attributes of a data domain include in-scope data attributes that are relevant to a future-state target environment (e.g., critical to one or more business or other processes of the target system, required by the target system, etc) and considered for data cleansing (e.g., provided with a nonzero weight as described below).”. Thus, a next set of validation nodes is iteratively expanded and executed based on the schema of the dataset until a termination criterion is reached, since data attributes of a data domain are considered for data cleansing until the source data is sufficiently clean for migration.); and 
an information repository coupled to the executor and terminator modules and configured to provide a corrected dataset of the raw dataset based on the iterative execution of the initial and next set of validation nodes (Fig. 4, Para. 103, “Once the source analysis phase of the data analysis is completed, the source data is initially cleansed to a sufficient level, and a target process phase of the data quality analysis may be performed. During the target process phase, data in staging areas 122 is converted to the common data model of alignment area 124 (e.g., via an ETL tool) and profiled by the business process hierarchy (BPH). Data quality engine 130 (e.g., via one or more server systems 110) determines actionable or problematic data prioritized for critical processes of the target system. Reports are routed to appropriate users and/or administrators by data quality reports module 132 (e.g., via one or more server systems 110).”. Para. 105, “The target process phase of the data quality analysis further determines whether the cleansing activities of the action plan (e.g., either in the source system or alignment area 124) have been performed correctly, and identifies the potential impact of actionable or problematic data relative to the business or other processes that the actionable data supports. In other words, the target process phase provides an indication of the cleanliness of source data for the particular business or other processes of the target system utilizing that source data”. Thus, a corrected dataset of the raw dataset is being provided based on the iterative execution of the initial and next set of validation nodes.).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Al-Haimi such that the next set of validation nodes, such as the set of rules of Al-Haimi, can be executed repeatedly until the results of the data quality analysis are satisfactory, as suggested by Marrelli (Para. 95). One of ordinary skill in the art would have been motivated to make this modification in order to migrate corrected data to the target system, as suggested by Marrelli (Para. 95).

As to claim 18, the claim is rejected for the same reasons as claim 17 above. In addition, Marrelli discloses wherein each validation node includes a data quality check and one or more remediation actions (Para. 44, each data attribute of a data domain is associated with a set of data quality rules, i.e. each validation node, for each of source systems 140 and for a corresponding data attribute of target system 150. The set of data quality rules typically span the data quality dimensions. These data quality rules may be pre-defined by a user. For example, a set of data quality rules for a data attribute of the target system may include a completeness rule (e.g., the data attribute must not be null), a validity rule (e.g., the data attribute must not contain special characters), and an accuracy rule (e.g., the data attribute must be a valid street name for a given zip code). Para. 41, “Actionable or problematic data is prioritized by business criticality and routed to appropriate users and/or administrators by data quality reports module 132. The actionable data is either cleansed in the source systems, or the mappings are updated with conversion rules.”. Para. 110, “the data quality engine may analyze the action plan and correct and/or add data based on the statuses and/or data quality rules violated by the data and indicated in the action plan.”. Therefore, each validation node includes a data quality check and one or more remediation actions.).

As to claim 19, the claim is rejected for the same reasons as claim 17 above. In addition, Marrelli discloses wherein the initial set of validation nodes are configured to: identify all possible remediation actions for any data quality check (Para. 119, “The load analysis phase is typically executed once for each integration test cycle, ideally with improved data quality and less process impact each time.”. Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Therefore, execution of a validation node of the initial set of validation nodes comprises: identifying all possible remediation actions for any data quality check.); 
transform the data with each possible remediation action (Para. 92, “The action plan may indicate which data records are to be cleansed at the source system, data to be created at the source system prior to conversion, and conversion rules needed to transform source data to standards of the target system. For example, the action plan may be in the form of a listing of records indicating for each record the statuses of the data record attributes, the data quality rules (for the target system) violated, reasons for the violation, and recommended cleansing actions. The cleansing actions may be performed on data within the source systems and/or staging areas 122 manually and/or by the data quality engine as described below”. Thus, the data is being transformed with each possible remediation action.); and 
compute a plurality of data quality metrics to evaluate the transformations (Fig. 7A-7B, Para. 21, “Present invention embodiments compare data elements expected in the target system against corresponding data elements of one or more source systems and produce weighted data quality metrics that are meaningful to resources accountable for cleansing and transformation of the source data elements.”. Para. 135, “Any quantity of the data quality dimensions and/or metrics may be utilized to determine clean or actionable data. For example, data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.).”. Thus, a plurality of data quality metrics (DQMs) computed to evaluate the transformations.).

As to claim 20, the claim is rejected for the same reasons as claim 17 above. In addition, Marrelli discloses wherein each validation node comprises: a first stage configured to: perform a logical check of the raw dataset by a validator object to detect one or more anomalies in the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 59, The data quality rules of the target system are utilized to identify data of the source systems that are actionable or problematic, i.e. to detect one or more anomalies in the raw dataset, with respect to the target system prior to migration to ensure the source data is accepted into the target system.); and 
perform different data transformations by way of a corresponding operator on the raw dataset to produce a new version of data for each data transformation, to correct the one or more detected anomalies (Para. 110, “the data quality engine may determine appropriate conversions or transformations and transform the corresponding data. Further, the data quality engine may analyze the action plan and correct and/or add data based on the statuses and/or data quality rules violated by the data and indicated in the action plan”. Thus, the data quality engine transforms the data using appropriate transformation to correct the one or more detected anomalies.); 
a second stage configured to: for each new version of data produced, generate a data quality metric (DQM) by an internal quality evaluator (IQE) module; and generate a DQM for the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 21, Present invention embodiments compare data elements expected in the target system against corresponding data elements, i.e. the raw dataset, of one or more source systems and produce weighted data quality metrics that are meaningful to resources accountable for cleansing and transformation of the source data elements. Thus, the data quality metric for the raw dataset is being generated.); 
a third stage configured to: for each DQM of each new version of data produced, perform a comparison to the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 39, “A technical specification for each mapping (generated based on the logical mappings) describes the manner in which physical attributes of the source data models of staging areas 122 are mapped to the common physical data model (derived from the target system) employed as a baseline for alignment area 124. These mappings enable tracing of attributes from the target system back to one or more source systems and, therefore, allow correlation between source data quality metrics and target data quality metrics”, where the source data quality metrics indicate the DQM of the raw dataset and the target data quality metrics indicate an improvement from the raw dataset.); and 
a third stage configured to: select the operator of the new version of data produced that best meets the data quality metric goals (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Thus, execution of the validation node includes a fourth stage comprising: selecting the operator of the new version of data produced that best meets the data quality metric goals.). 

As to claim 21, Al-Haimi discloses a computer implemented method of improving data quality to conserve computational resources (Para. 26), the method comprising receiving a raw dataset (Fig. 1-2, Para. 25, The clinical data analysis system 100 may receive raw source clinical data 110, i.e. a raw dataset, from various data sources. The raw source clinical data set 110 may be any source data in any format.); 
receiving one or more data quality metric goals corresponding to the received raw dataset (Fig. 3, Para. 7, Using a prediction model, the system may predict the correct transformation to generate content corresponding to the source data but formatted according to the target data schema, i.e. data quality metric. In some instances the predicted transformation can be based upon the schema mapped to a set of raw clinical data. Para. 49, “At component 320, the four probability matrices may be used to generate a weighted probability matrix 325. Weightings for each classifier algorithm may be tuned based on particular applications an experience”. Thus, the target data schema, such as the one or more data quality metric goals corresponding to the received raw dataset, is being received.); 
determining a schema of the dataset (Fig. 4, Para. 9, “The method may comprise determining a source data set for transformation to a unified target data schema. The source data set may include a source data records organized according to a source data schema. The source data schema may include a plurality of source fields.”. Para. 23, According to some aspects, a system may utilize deep learning algorithms to determine a mapping from the source schema to the target schema through identifying the source schema, i.e. determining a schema of the dataset, and creating a correspondence between source fields and target fields, and a corresponding data transformation.); 
identifying an initial set of validation nodes to be performed based on the schema of the dataset (Para. 11, “The computer may determine appropriate data transformations to apply to the source data to generate the target data in an appropriate form. Based on the schema mapping, the computer may determine a transformation operative to transform source data content to appropriately formatted target data content.”. Para. 23, “According to some aspects, a system may utilize deep learning algorithms to determine a mapping from the source schema to the target schema through identifying the source schema and creating a correspondence between source fields and target fields, and a corresponding data transformation. Artificial neural networks, configured as schema-level and instance-level classifiers, may generate a set of predictions based on the fields of the source data set and fields of the target data schema”. Para. 60, “the system may determine a data transform based on the source data schema, the target data schema, and the determined mapping. The determined data transform may be applied to the source data set to generate a target data set”, where determining a data transform using set of rules based on the source data schema indicates the validation nodes such as the set of rules which is being identified based on the schema of the dataset.); 
executing the initial set of validation nodes (Para. 11, As with determining the mapping, the computer may pre-process one or more fields of the source data set. Based on the schema mapping, the computer may determine a transformation operative to transform source data content to appropriately formatted target data content. Para. 48, The fields/columns of the source data set may be run, i.e. executing the initial set of validation nodes, through the series of four classification algorithms to generate the prediction and probability results which are used to build probability matrices 315A-D. Para. 26, “The data mapping and transformation module 120 may be an artificial intelligence engine that leverages one or more deep learning algorithms to substantially automatically map the raw source clinical data 110 to a standard data schema, and then perform substantially automatic data transformations on the mapped raw source clinical data 110 to correct data discrepancies.”. Therefore, the initial set of validation nodes are being executed to transform source data content to appropriately formatted target data content.). 

Al-Haimi does not explicitly disclose iteratively expanding and executing a next set of validation nodes based on the schema of the dataset until a termination criterion is reached; and providing a corrected dataset of the raw dataset based on the iterative execution of the initial and next set of validation nodes.
However, in the same field of endeavor, Marrelli discloses iteratively expanding and executing a next set of validation nodes based on the schema of the dataset until a termination criterion is reached (Fig. 3, Para. 95, “The generation of action plans and cleansing of data at step 315 and re-calculation at step 320 are repeated until the results of the data quality analysis are satisfactory (e.g., the source data is sufficiently clean for migration to the target system, etc.). For example, the data quality percentage values may satisfy corresponding thresholds or other criteria to indicate sufficient cleanliness of the source data”. Para. 40, “a table of a data domain may include a row (or record) for each customer, where the columns or data attributes for each row may include first name, last name, and address. Data attributes of a data domain include in-scope data attributes that are relevant to a future-state target environment (e.g., critical to one or more business or other processes of the target system, required by the target system, etc) and considered for data cleansing (e.g., provided with a nonzero weight as described below).”. Thus, a next set of validation nodes is being iteratively expanded and executed based on the schema of the dataset until a termination criterion is reached, since data attributes of a data domain are considered for data cleansing until the source data is sufficiently cleaned for migration.); and 
providing a corrected dataset of the raw dataset based on the iterative execution of the initial and next set of validation nodes (Fig. 4, Para. 103, “Once the source analysis phase of the data analysis is completed, the source data is initially cleansed to a sufficient level, and a target process phase of the data quality analysis may be performed. During the target process phase, data in staging areas 122 is converted to the common data model of alignment area 124 (e.g., via an ETL tool) and profiled by the business process hierarchy (BPH). Data quality engine 130 (e.g., via one or more server systems 110) determines actionable or problematic data prioritized for critical processes of the target system. Reports are routed to appropriate users and/or administrators by data quality reports module 132 (e.g., via one or more server systems 110).”. Para. 105, “The target process phase of the data quality analysis further determines whether the cleansing activities of the action plan (e.g., either in the source system or alignment area 124) have been performed correctly, and identifies the potential impact of actionable or problematic data relative to the business or other processes that the actionable data supports. In other words, the target process phase provides an indication of the cleanliness of source data for the particular business or other processes of the target system utilizing that source data”. Thus, a corrected dataset of the raw dataset is being provided based on the iterative execution of the initial and next set of validation nodes.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Al-Haimi such that the next set of validation nodes, such as the set of rules of Al-Haimi, can be executed repeatedly until the results of the data quality analysis are satisfactory, as suggested by Marrelli (Para. 95). One of ordinary skill in the art would have been motivated to make this modification in order to migrate corrected data to the target system, as suggested by Marrelli (Para. 95).
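
For illustration only, the repeat-until-satisfactory behavior relied upon from Marrelli (Para. 95) can be sketched as a loop that re-calculates a quality percentage after each cleansing pass and stops once a threshold is met. This is a minimal sketch and is not taken from Al-Haimi, Marrelli, or the application: the names `completeness`, `fill_missing_name`, and `cleanse_until_satisfactory`, the record layout, and the 0.95 threshold are all hypothetical assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record layout: one dict per row, attribute name -> value.
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], column: str) -> float:
    """Fraction of records whose `column` is present and non-empty."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(column) not in (None, ""))
    return filled / len(records)


def fill_missing_name(record: Record) -> Record:
    """Hypothetical cleansing action: replace a missing name with a marker."""
    if record.get("name") in (None, ""):
        record["name"] = "UNKNOWN"
    return record


def cleanse_until_satisfactory(
    records: List[Record],
    column: str,
    actions: List[Callable[[Record], Record]],
    threshold: float = 0.95,  # assumed termination criterion
) -> Tuple[List[Record], int]:
    """Apply cleansing actions one pass at a time, re-checking quality
    before each pass, until the threshold is met or actions run out."""
    current = [dict(r) for r in records]  # work on copies of the raw rows
    passes = 0
    for action in actions:
        if completeness(current, column) >= threshold:
            break  # data is already sufficiently clean
        current = [action(dict(r)) for r in current]
        passes += 1
    return current, passes
```

With three records of which two lack a name, a single pass of the hypothetical action raises the completeness percentage above the threshold, so the loop ends after one pass; a dataset that is already clean ends after zero passes.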

As to claim 22, the claim is rejected for the same reasons as claim 21 above. In addition, Marrelli discloses wherein each validation node includes a data quality check and one or more remediation actions (Para. 44, each data attribute of a data domain is associated with a set of data quality rules, i.e. each validation node, for each of source systems 140 and for a corresponding data attribute of target system 150. The set of data quality rules typically span the data quality dimensions. These data quality rules may be pre-defined by a user. For example, a set of data quality rules for a data attribute of the target system may include a completeness rule (e.g., the data attribute must not be null), a validity rule (e.g., the data attribute must not contain special characters), and an accuracy rule (e.g., the data attribute must be a valid street name for a given zip code). Para. 41, “Actionable or problematic data is prioritized by business criticality and routed to appropriate users and/or administrators by data quality reports module 132. The actionable data is either cleansed in the source systems, or the mappings are updated with conversion rules.”. Para. 110, “the data quality engine may analyze the action plan and correct and/or add data based on the statuses and/or data quality rules violated by the data and indicated in the action plan.”. Therefore, each validation node includes a data quality check and one or more remediation actions.).

As to claim 23, the claim is rejected for the same reasons as claim 21 above. In addition, Marrelli discloses wherein execution of a validation node of the initial set of validation nodes comprises: identifying all possible remediation actions for each data quality check (Para. 119, “The load analysis phase is typically executed once for each integration test cycle, ideally with improved data quality and less process impact each time.”. Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Therefore, execution of a validation node of the initial set of validation nodes comprises: identifying all possible remediation actions for any data quality check.); 
transforming the data with each possible remediation action (Para. 92, “The action plan may indicate which data records are to be cleansed at the source system, data to be created at the source system prior to conversion, and conversion rules needed to transform source data to standards of the target system. For example, the action plan may be in the form of a listing of records indicating for each record the statuses of the data record attributes, the data quality rules (for the target system) violated, reasons for the violation, and recommended cleansing actions. The cleansing actions may be performed on data within the source systems and/or staging areas 122 manually and/or by the data quality engine as described below”. Thus, the data is being transformed with each possible remediation action.); and 
computing a plurality of data quality metrics (DQMs) to evaluate the transformations (Fig. 7A-7B, Para. 21, “Present invention embodiments compare data elements expected in the target system against corresponding data elements of one or more source systems and produce weighted data quality metrics that are meaningful to resources accountable for cleansing and transformation of the source data elements.”. Para. 135, “Any quantity of the data quality dimensions and/or metrics may be utilized to determine clean or actionable data. For example, data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.).”. Thus, a plurality of data quality metrics (DQMs) computed to evaluate the transformations.).


As to claim 24, the claim is rejected for the same reasons as claim 21 above. In addition, Marrelli discloses wherein execution of a validation node includes: a first stage, comprising: performing a logical check of the raw dataset by a validator object to detect one or more anomalies in the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 59, The data quality rules of the target system are utilized to identify data of the source systems that are actionable or problematic, i.e. to detect one or more anomalies in the raw dataset, with respect to the target system prior to migration to ensure the source data is accepted into the target system.); and 
performing different data transformations by way of a corresponding operator on the raw dataset to produce a new version of data for each data transformation, to correct the one or more detected anomalies (Para. 110, “the data quality engine may determine appropriate conversions or transformations and transform the corresponding data. Further, the data quality engine may analyze the action plan and correct and/or add data based on the statuses and/or data quality rules violated by the data and indicated in the action plan”. Thus, the data quality engine transforms the data using appropriate transformation to correct the one or more detected anomalies.); 
a second stage, comprising: for each new version of data produced, generating a data quality metric (DQM) by an internal quality evaluator (IQE) module; and generating a DQM for the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 21, Present invention embodiments compare data elements expected in the target system against corresponding data elements, i.e. the raw dataset, of one or more source systems and produce weighted data quality metrics that are meaningful to resources accountable for cleansing and transformation of the source data elements. Thus, the data quality metric for the raw dataset is being generated.); 
a third stage, comprising: for each DQM of each new version of data produced, performing a comparison to the raw dataset (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 39, “A technical specification for each mapping (generated based on the logical mappings) describes the manner in which physical attributes of the source data models of staging areas 122 are mapped to the common physical data model (derived from the target system) employed as a baseline for alignment area 124. These mappings enable tracing of attributes from the target system back to one or more source systems and, therefore, allow correlation between source data quality metrics and target data quality metrics”, where the source data quality metrics indicate the DQM of the raw dataset and the target data quality metrics indicate an improvement from the raw dataset.); and 
a fourth stage, comprising: selecting the operator of the new version of data produced that best meets the data quality metric goals (Para. 42, “data from source systems 140 (FIG. 2) is received and stored in corresponding staging areas 122 for data quality assessment based on data quality rules for the target system at step 305”. Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Thus, execution of the validation node includes a fourth stage comprising: selecting the operator of the new version of data produced that best meets the data quality metric goals.).
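
For illustration only, the four stages recited in claim 24 (detect anomalies, apply each operator to produce a new version, score each version and the raw dataset, compare to the raw score, and select the best operator) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from either reference: the column representation, the operators `drop_missing` and `fill_with_zero`, and the `dqm` score (share of the original row count that is present and non-missing) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical dataset: a single numeric column where None marks a missing value.
Column = List[Optional[float]]


def has_anomaly(col: Column) -> bool:
    """Validator (assumed): a missing value counts as an anomaly."""
    return any(v is None for v in col)


def drop_missing(col: Column) -> Column:
    """Hypothetical operator: remove rows with missing values."""
    return [v for v in col if v is not None]


def fill_with_zero(col: Column) -> Column:
    """Hypothetical operator: replace missing values with 0.0."""
    return [0.0 if v is None else v for v in col]


def dqm(col: Column, expected_rows: int) -> float:
    """Assumed metric: share of the expected rows that hold a value."""
    if expected_rows == 0:
        return 1.0
    return sum(1 for v in col if v is not None) / expected_rows


def run_validation_node(
    raw: Column, operators: Dict[str, Callable[[Column], Column]]
) -> Tuple[Optional[str], Column]:
    # Stage 1: logical check, then one new version of the data per operator.
    if not has_anomaly(raw):
        return None, raw
    versions = {name: op(raw) for name, op in operators.items()}
    # Stage 2: a DQM for each new version and for the raw dataset.
    raw_score = dqm(raw, len(raw))
    scores = {name: dqm(v, len(raw)) for name, v in versions.items()}
    # Stage 3: compare each version's DQM against the raw dataset's DQM.
    gains = {name: score - raw_score for name, score in scores.items()}
    # Stage 4: select the operator whose version improves the DQM the most.
    best = max(gains, key=gains.get)
    if gains[best] <= 0:
        return None, raw  # no operator improved on the raw dataset
    return best, versions[best]
```

Under this assumed metric, dropping rows does not improve on the raw score because the rows are still lost, whereas filling them does, so the fill operator is selected for a column with missing values.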

As to claim 25, the claim is rejected for the same reasons as claim 21 above. In addition, Marrelli discloses wherein expanding a next set of validation nodes comprises at least one of: determining a validation node that best achieves one or more received quality metric goals; or determining a validation node based on mining an execution information repository to find all validation nodes that usually occur together (Para. 135, “data quality rules for a data object or attribute may span any quantity of data quality dimensions or metrics, where any desired quantity of rules satisfied (or violated) may determine clean (or actionable) data. Further, the data quality rules may be of any quantity, and be associated with one or more particular data objects and a corresponding system (e.g., source, target or other system, etc.). The action plans for the individual phases may include any desired information (e.g., listing of problematic or clean data items, violated rules, cleansing actions, etc.). Any portions of action plans may be generated and/or executed manually and/or automatically (e.g., via a computer system without user intervention)”. Therefore, expanding a next set of validation nodes comprises at least one of: determining a validation node that best achieves one or more received quality metric goals.).

Conclusion
7.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Nath et al. (US 2019/0155797 A1) teaches data quality management.
WONG et al. (US 2016/0267082 A1) teaches processing high volumes of data and generating insights within a pre-determined timeframe.
Sundaramoorthy et al. (US 2021/0004350 A1) teaches automatically extracting a data file from an upstream source based on ingestion parameters.
Lowry et al. (US 2012/0123994 A1) teaches analyzing data quality.

8.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD SOLAIMAN BHUYAN whose telephone number is (571)272-7843. The examiner can normally be reached on Monday - Friday 9:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Robert Beausoliel, can be reached at 571-272-3645. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/MOHAMMAD S BHUYAN/Examiner, Art Unit 2167    

/ROBERT W BEAUSOLIEL JR/Supervisory Patent Examiner, Art Unit 2167