DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This Non-Final Office Action is in response to remarks and amendment filed on 04/05/2022.
Amended claims 1, 8 and 15, filed on 04/05/2022, are being considered on the merits.
Claims 1-20 remain pending in the application.  

This action is in response to the remarks and amendments submitted on 04/05/2022. In response to the last Office Action: 
Claims 1, 8 and 15 have been amended.
The rejection of claims 1-2, 5-9, 12-16 and 19-20 under 35 USC § 101 as being directed to an abstract idea, set forth in the previous Office Action mailed on 01/05/2022, has been withdrawn. Applicant’s amendments and remarks regarding independent claims 1, 8 and 15, filed on 04/05/2022, which recite additional claim limitations and clarify that the combination of elements is integrated into a practical application, have overcome the previously set forth rejection.

Response to Arguments
The applicant’s remarks and/or arguments, filed on 04/05/2022, have been fully considered.
The examiner is entitled to give claim limitations their broadest reasonable interpretation in light of the specification. See MPEP 2111 [R-1] Interpretation of Claims-Broadest Reasonable Interpretation. The applicant always has the opportunity to amend the claims during prosecution, and broad interpretation by the examiner reduces the possibility that the claim, once issued, will be interpreted more broadly than is justified. In re Prater, 162 USPQ 541,550-51 (CCPA 1969).

Applicant’s claim amendments and remarks regarding the amended independent claims, filed on 04/05/2022, have been fully considered and are persuasive. Therefore, the previously set forth claim rejections under 35 USC 103 have been withdrawn. However, upon further consideration, new grounds of rejection are made in view of the newly found prior art: (US 2020/0210427 A1) issued to Dugan et al. (disclosing methods for processing column lineage and metadata propagation); and in view of (US 2015/0278214 A1) issued to Anand et al. (disclosing methods for ranking data visualizations using different data fields). Please see the rejection set forth below for further details.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3-6, 8, 10-13, 15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication (US 2020/0210427 A1) issued to Dugan et al. (hereinafter “DUGAN”), in view of US Patent Application Publication (US 2015/0278214 A1) issued to Anand et al. (hereinafter “ANAND”).
Regarding claim 1 (Currently Amended), DUGAN teaches a method comprising: 
receiving, by a processor, a script, the script including commands to access a composite dataset (DUGAN Para. [0015]: “Aspects of the present disclosure are directed to metadata generation for columns of a dataset. The dataset may be used or created as part of a data pipeline. A data pipeline may refer to an ordered set of logic (e.g., a collection of computer software scripts or programs) that performs a multi-step transformation of data obtained from data sources to produce one or more output datasets. Each data transformation step applies transformation code to one or more source datasets (i.e. collections of enterprise data) to produce one or more target datasets. For example, the transformation code can be software code, such as a script, that defines a set of instructions to transform source columns of one or more source datasets into target columns of one or more target datasets. In a data pipeline, the source dataset can result in hundreds if not thousands of derived (target) datasets.”; and
Fig. 1, Para. [0034]: “A source dataset may be raw (i.e., un-edited) data that comes directly from a data source (e.g., a full list of customer accounts) and represents the starting point of a data pipeline. Alternatively, a source dataset may be a target dataset, which is a dataset that is generated (i.e., built) by editing (e.g., manually or by executing logic of a data transformation step from pipeline repository 107) one or more source datasets. A target dataset may be potentially further transformed to provide one or more other datasets as input to the next data transformation step.”, 
the examiner notes that the reference's disclosure that each data transformation step applies transformation code to the source datasets corresponds to the claimed commands to access a composite dataset),
pre-processing, by the processor, the script to identify a set of columns associated with the composite dataset (DUGAN Para. [0015]: “…, the transformation code can be software code, such as a script, that defines a set of instructions to transform source columns of one or more source datasets into target columns of one or more target datasets. In a data pipeline, the source dataset can result in hundreds if not thousands of derived (target) datasets.”; and
Para. [0016]: “Provenance of a dataset, and in particular column provenance of columns of a dataset, can help users or systems determine whether a given dataset is trustworthy. An error or mistake in a dataset can propagate through the data pipeline if left uncorrected. Such an error or mistake can cause many problems including, for example, inaccurate data, failure of downstream processes that rely on the dataset, and so forth. Column lineage metadata provides granularity with respect to a column's history, which can be invaluable for identifying and correcting propagated errors in datasets.”); 
loading, by the processor, a metadata file associated with the composite dataset, the metadata file including an algebraic representation defining relationships among (DUGAN Fig. 3A/3B, Para. [0010]: “FIG. 3B illustrates a visual representation of derived relationships between one or more source columns and respective target columns using the logical query plan of FIG. 3A”; and
Para. [0020]: “…, the relationships between source column(s) of one or more source datasets and respective target column(s) of one or more target dataset(s) can be derived from a logical query plan. A logical query plan can refer to an ordered set of operations that is used to access data from one or more source datasets to generate one or more target datasets”; and 
Fig. 1/2, Para. [0043]: “…, planning analyzer 210 can receive or identify the transformation code when it is executed as part of a data transformation step of the data pipeline, resulting in creation of a target dataset. Planning analyzer 210 can parse and convert the transformation code into a logical query plan. As noted above, a logical query plan (also referred to as “logical plan” herein) can refer to an ordered set of operations that is used to access data from one or more source datasets to generate one or more new target datasets. The logical query plan can be a hierarchical structure expressed as a tree of nodes of logical operators (e.g., relational algebraic expression)… the logical query plan can have a particular syntax suitable for interpretation or execution by metadata management system 110”; and
Para. [0051]: “A logical query plan is generated and subsequently parsed to derive relationships between the source column (e.g., column Y) of the source dataset (e.g., dataset B) and the target column (e.g., column Z) of the target dataset (e.g., dataset C). Target column metadata, in particular column lineage metadata, is generated for the target column, column Z. The column lineage metadata can identify the currently derived relationship between the source column and the target column, and the existing column lineage metadata of column Y.”, 
the examiner notes that the logical query plan, which can be a hierarchical structure expressed as a tree of nodes of logical operators (e.g., relational algebraic expression), corresponds to the claimed metadata file associated with the composite dataset, the metadata file including an algebraic representation of the plurality of datasets); 
parsing, by the processor, the algebraic representation to identify one or more datasets that include a column in the set of columns, the one or more datasets comprising a subset of the plurality of datasets (DUGAN Para. [0021]: “…, the logical query plan can be parsed to derive relationships between the source columns of the source datasets and the respective target columns of the target datasets. The derived relationships can be used to generate target column metadata (e.g., current column lineage metadata). Further, existing column metadata of the source columns (e.g., existing column lineage metadata, column level access control metadata, or user comments) can be also included in the generated target column metadata to provide metadata of all ancestors of the target column in one place.”);
loading, by the processor, data from the one or more datasets (DUGAN Para. [0046]: “…, the preliminary relationship representation can identify the source dataset(s), the source columns in the source dataset(s), the target dataset(s), one or more target columns of the target dataset(s), and relationships between the source columns of the source dataset(s) and respective target columns of the target dataset(s). Additional details regarding preliminary relationship representation of the derived relationships is further described with respect to FIG. 3B (e.g., derived relationships model 350 of FIG. 3B).”; and
Fig. 5, Para. [0105]: “At operation 510 of method 500, processing logic finds, in the logical query plan, one or more keywords associated with one or more first logical query plan portions that each identify a source dataset of the source datasets. At operation 520, processing logic finds, in the logical query plan, one or more keywords associated with a second logical query plan portion that identifies the source columns of the source datasets. At operation 530, processing logic finds, in the logical query plan, one or more keywords associated with a third logical query plan portion that identifies the respective target columns of the target dataset. At operation 540, processing logic finds, for each of the respective target columns of the target dataset, one or more keywords associated with a fourth logical query plan portion describing a relationship between at least one of the source columns of the source datasets and the respective target column of the target dataset.”); and 
executing, by the processor, the script on the one or more datasets (DUGAN Fig. 1/2,  Para. [0027]: “…, data management platform 102 can include metadata management system 110, datastore 105 storing the underlying data (e.g., enterprise data), and pipeline repository 107 storing one or more data pipelines. A data pipeline includes a set of logic to execute a series of data transformation steps on one or more source datasets stored in datastore 105. Each data transformation step produces one or more target datasets (also referred to herein as “derived datasets”) that may also be stored in datastore 105.”).
  
However, DUGAN does not explicitly teach the composite dataset comprising a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset.
However, ANAND teaches the composite dataset comprising a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset (ANAND Fig. 2, Para. [0203]: “The data source 236 may be a SQL database, a spreadsheet, an XML file, a desktop database, a flat file, a CSV file, or other organized data source. Some implementations support combined or blended data sources, with data from two or more distinct sources. The data fields may be raw fields from the data source (i.e., the data field exists in the data source) or may be computed from one or more raw fields (e.g., computing a month, quarter, or year from a date field in the data source).”, 
the examiner notes that a data source of combined or blended raw data or computed data corresponds to a composite dataset containing raw and annotated data). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of DUGAN (disclosing methods for processing column lineage and metadata propagation) to include the teachings of ANAND (disclosing methods for ranking data visualizations using different data fields) and arrive at a method to manipulate datasets of combined data sources. One of ordinary skill in the art would have been motivated to make this combination because, by applying rules-based/scripted operations on derived datasets of combined sources, system users can process data of all sorts through data pipeline systems with increased efficiency, as also recognized by ANAND (Abstract, Para. [0014]-[0016]). In addition, DUGAN and ANAND are analogous art, as both are directed to the same field of endeavor of dataset transformation and processing.

Regarding claim 3 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 1.  Further, DUGAN teaches wherein parsing the metadata file comprises identifying one or more dataset objects stored within the metadata file (DUGAN Fig. 3, Para. [0075]: “If the parsing module 220 identifies keyword 320A, parsing module 220 can expect particular information in the identified portion(s) of the logical query plan 300 associated with keyword 320A and extract the particular information to identify one or more source datasets.”); and 
extracting schemas associated with each of the one or more dataset objects (DUGAN Fig. 3, Para. [0077]: “Parsing the logical query plan 300 to derive the relationships between the source columns of the one or more source datasets and the respective target columns of the target dataset can also include finding, in the logical query plan 300, one or more keywords associated with a specific logical query plan portion (second portion) that identifies the source columns of the source datasets.”, 
the examiner notes that the reference's derivation of the relationships between the source columns of the one or more source datasets corresponds to the claimed metadata file associated with the composite dataset).  

Regarding claim 4 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 3.  Further, DUGAN teaches wherein parsing the metadata file further comprises identifying a schema in the schemas that includes at least one column in the set of columns (DUGAN Fig. 3, Para. [0077]: “Parsing the logical query plan 300 to derive the relationships between the source columns of the one or more source datasets and the respective target columns of the target dataset can also include finding, in the logical query plan 300, one or more keywords associated with a specific logical query plan portion (second portion) that identifies the source columns of the source datasets.”).  

Regarding claim 5 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 1.  Further, DUGAN teaches wherein loading data from the one or more datasets comprises identifying file paths associated with the one or more datasets and loading data from files stored at the file paths (DUGAN Fig. 2/3, Para. [0078]: “…, parsing module 220 can parse the logical query plan 300 to identify keyword 320D (“Project”) arranged in a particular location with respect to logical query plan 300.”, the examiner notes that a particular location of the query plan corresponds to the claimed data path/location).  

Regarding claim 6 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 5.  Further, DUGAN teaches wherein executing the script comprises combining the one or more datasets to form a second composite dataset and using the second composite dataset while executing the script (DUGAN Para. [0034]: “…, a data transformation step may produce a target dataset by filtering records in an input dataset to those comprising a particular value or set of values, or by joining together two related input datasets, or by replacing references in an input dataset to values in another input dataset with actual data referenced.”).  

Regarding claim 8 (Currently Amended), DUGAN teaches a non-transitory computer-readable storage medium for tangibly storing computer program instructions capable of being executed by a computer processor (DUGAN Para. [0112]: “The machine-readable storage medium 628 may also be used to store instructions of column lineage and metadata propagation, as described here”), the computer program instructions defining the steps of: 
receiving a script, the script including commands to access a composite dataset (DUGAN Para. [0015]: “Aspects of the present disclosure are directed to metadata generation for columns of a dataset. The dataset may be used or created as part of a data pipeline. A data pipeline may refer to an ordered set of logic (e.g., a collection of computer software scripts or programs) that performs a multi-step transformation of data obtained from data sources to produce one or more output datasets. Each data transformation step applies transformation code to one or more source datasets (i.e. collections of enterprise data) to produce one or more target datasets. For example, the transformation code can be software code, such as a script, that defines a set of instructions to transform source columns of one or more source datasets into target columns of one or more target datasets. In a data pipeline, the source dataset can result in hundreds if not thousands of derived (target) datasets.”; and
Fig. 1, Para. [0034]: “A source dataset may be raw (i.e., un-edited) data that comes directly from a data source (e.g., a full list of customer accounts) and represents the starting point of a data pipeline. Alternatively, a source dataset may be a target dataset, which is a dataset that is generated (i.e., built) by editing (e.g., manually or by executing logic of a data transformation step from pipeline repository 107) one or more source datasets. A target dataset may be potentially further transformed to provide one or more other datasets as input to the next data transformation step.”, 
the examiner notes that the reference's disclosure that each data transformation step applies transformation code to the source datasets corresponds to the claimed commands to access a composite dataset); 
pre-processing, by the processor, the script to identify a set of columns associated with the composite dataset (DUGAN Para. [0015]: “…, the transformation code can be software code, such as a script, that defines a set of instructions to transform source columns of one or more source datasets into target columns of one or more target datasets. In a data pipeline, the source dataset can result in hundreds if not thousands of derived (target) datasets.”; and
Para. [0016]: “Provenance of a dataset, and in particular column provenance of columns of a dataset, can help users or systems determine whether a given dataset is trustworthy. An error or mistake in a dataset can propagate through the data pipeline if left uncorrected. Such an error or mistake can cause many problems including, for example, inaccurate data, failure of downstream processes that rely on the dataset, and so forth. Column lineage metadata provides granularity with respect to a column's history, which can be invaluable for identifying and correcting propagated errors in datasets.”); 
loading, by the processor, a metadata file associated with the composite dataset, the metadata file including an algebraic representation defining relationships among (DUGAN Fig. 3A/3B, Para. [0010]: “FIG. 3B illustrates a visual representation of derived relationships between one or more source columns and respective target columns using the logical query plan of FIG. 3A”; and
Para. [0020]: “…, the relationships between source column(s) of one or more source datasets and respective target column(s) of one or more target dataset(s) can be derived from a logical query plan. A logical query plan can refer to an ordered set of operations that is used to access data from one or more source datasets to generate one or more target datasets”; and 
Fig. 1/2, Para. [0043]: “…, planning analyzer 210 can receive or identify the transformation code when it is executed as part of a data transformation step of the data pipeline, resulting in creation of a target dataset. Planning analyzer 210 can parse and convert the transformation code into a logical query plan. As noted above, a logical query plan (also referred to as “logical plan” herein) can refer to an ordered set of operations that is used to access data from one or more source datasets to generate one or more new target datasets. The logical query plan can be a hierarchical structure expressed as a tree of nodes of logical operators (e.g., relational algebraic expression)… the logical query plan can have a particular syntax suitable for interpretation or execution by metadata management system 110”; and
Para. [0051]: “A logical query plan is generated and subsequently parsed to derive relationships between the source column (e.g., column Y) of the source dataset (e.g., dataset B) and the target column (e.g., column Z) of the target dataset (e.g., dataset C). Target column metadata, in particular column lineage metadata, is generated for the target column, column Z. The column lineage metadata can identify the currently derived relationship between the source column and the target column, and the existing column lineage metadata of column Y.”, 
the examiner notes that the logical query plan, which can be a hierarchical structure expressed as a tree of nodes of logical operators (e.g., relational algebraic expression), corresponds to the claimed metadata file associated with the composite dataset, the metadata file including an algebraic representation of the plurality of datasets); 
parsing, by the processor, the algebraic representation  (DUGAN Para. [0021]: “…, the logical query plan can be parsed to derive relationships between the source columns of the source datasets and the respective target columns of the target datasets. The derived relationships can be used to generate target column metadata (e.g., current column lineage metadata). Further, existing column metadata of the source columns (e.g., existing column lineage metadata, column level access control metadata, or user comments) can be also included in the generated target column metadata to provide metadata of all ancestors of the target column in one place.”);  
loading data from the one or more datasets (DUGAN Para. [0046]: “…, the preliminary relationship representation can identify the source dataset(s), the source columns in the source dataset(s), the target dataset(s), one or more target columns of the target dataset(s), and relationships between the source columns of the source dataset(s) and respective target columns of the target dataset(s). Additional details regarding preliminary relationship representation of the derived relationships is further described with respect to FIG. 3B (e.g., derived relationships model 350 of FIG. 3B).”; and
Fig. 5, Para. [0105]: “At operation 510 of method 500, processing logic finds, in the logical query plan, one or more keywords associated with one or more first logical query plan portions that each identify a source dataset of the source datasets. At operation 520, processing logic finds, in the logical query plan, one or more keywords associated with a second logical query plan portion that identifies the source columns of the source datasets. At operation 530, processing logic finds, in the logical query plan, one or more keywords associated with a third logical query plan portion that identifies the respective target columns of the target dataset. At operation 540, processing logic finds, for each of the respective target columns of the target dataset, one or more keywords associated with a fourth logical query plan portion describing a relationship between at least one of the source columns of the source datasets and the respective target column of the target dataset.”); and 
executing the script on the one or more datasets (DUGAN Fig. 1/2,  Para. [0027]: “…, data management platform 102 can include metadata management system 110, datastore 105 storing the underlying data (e.g., enterprise data), and pipeline repository 107 storing one or more data pipelines. A data pipeline includes a set of logic to execute a series of data transformation steps on one or more source datasets stored in datastore 105. Each data transformation step produces one or more target datasets (also referred to herein as “derived datasets”) that may also be stored in datastore 105.”).  

However, DUGAN does not explicitly teach the composite dataset comprising a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset.
However, ANAND teaches the composite dataset comprising a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset (ANAND Fig. 2, Para. [0203]: “The data source 236 may be a SQL database, a spreadsheet, an XML file, a desktop database, a flat file, a CSV file, or other organized data source. Some implementations support combined or blended data sources, with data from two or more distinct sources. The data fields may be raw fields from the data source (i.e., the data field exists in the data source) or may be computed from one or more raw fields (e.g., computing a month, quarter, or year from a date field in the data source).”, 
the examiner notes that a data source of combined or blended raw data or computed data corresponds to a composite dataset containing raw and annotated data). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of DUGAN (disclosing methods for processing column lineage and metadata propagation) to include the teachings of ANAND (disclosing methods for ranking data visualizations using different data fields) and arrive at a method to manipulate datasets of combined data sources. One of ordinary skill in the art would have been motivated to make this combination because, by applying rules-based/scripted operations on derived datasets of combined sources, system users can process data of all sorts through data pipeline systems with increased efficiency, as also recognized by ANAND (Abstract, Para. [0014]-[0016]). In addition, DUGAN and ANAND are analogous art, as both are directed to the same field of endeavor of dataset transformation and processing.

Regarding claim 10 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 8. Further, DUGAN teaches wherein parsing the metadata file comprises identifying one or more dataset objects stored within the metadata file (DUGAN Fig. 3, Para. [0075]: “If the parsing module 220 identifies keyword 320A, parsing module 220 can expect particular information in the identified portion(s) of the logical query plan 300 associated with keyword 320A and extract the particular information to identify one or more source datasets.”); and 
extracting schemas associated with each of the one or more dataset objects (DUGAN Fig. 3, Para. [0077]: “Parsing the logical query plan 300 to derive the relationships between the source columns of the one or more source datasets and the respective target columns of the target dataset can also include finding, in the logical query plan 300, one or more keywords associated with a specific logical query plan portion (second portion) that identifies the source columns of the source datasets.”, 
the examiner notes that the reference's derivation of the relationships between the source columns of the one or more source datasets corresponds to the claimed metadata file associated with the composite dataset).  

Regarding claim 11 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 10. Further, DUGAN teaches wherein parsing the metadata file further comprises identifying a schema in the schemas that includes at least one column in the set of columns (DUGAN Fig. 3, Para. [0077]: “Parsing the logical query plan 300 to derive the relationships between the source columns of the one or more source datasets and the respective target columns of the target dataset can also include finding, in the logical query plan 300, one or more keywords associated with a specific logical query plan portion (second portion) that identifies the source columns of the source datasets.”).  

Regarding claim 12 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 8.  Further, DUGAN teaches wherein loading data from the one or more datasets comprises identifying file paths associated with the one or more datasets and loading data from files stored at the file paths (DUGAN Fig. 2/3, Para. [0078]: “…, parsing module 220 can parse the logical query plan 300 to identify keyword 320D (“Project”) arranged in a particular location with respect to logical query plan 300.”, the examiner notes that a particular location of the query plan corresponds to the claimed data path/location).  

Regarding claim 13 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 12.  Further, DUGAN teaches wherein executing the script comprises combining the one or more datasets to form a second composite dataset and using the second composite dataset while executing the script (DUGAN Para. [0034]: “…, a data transformation step may produce a target dataset by filtering records in an input dataset to those comprising a particular value or set of values, or by joining together two related input datasets, or by replacing references in an input dataset to values in another input dataset with actual data referenced.”).  

Regarding claim 15 (Currently Amended), DUGAN teaches an apparatus comprising: a processor; and a storage medium for tangibly storing thereon program logic for execution by the processor (DUGAN Fig. 4 and Para. [0083]: “FIG. 4 is a flow diagram illustrating a method of generating column metadata for a target column of a target dataset, in accordance with some embodiments. The method 400 may be performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processor to perform hardware simulation), or a combination thereof”; and Para. [0112]: “The machine-readable storage medium 628 may also be used to store instructions of column lineage and metadata propagation, as described here”), the program logic causing the processor to perform operations of: 
receiving a script, the script including commands to access a composite dataset (DUGAN Para. [0015]: “Aspects of the present disclosure are directed to metadata generation for columns of a dataset. The dataset may be used or created as part of a data pipeline. A data pipeline may refer to an ordered set of logic (e.g., a collection of computer software scripts or programs) that performs a multi-step transformation of data obtained from data sources to produce one or more output datasets. Each data transformation step applies transformation code to one or more source datasets (i.e. collections of enterprise data) to produce one or more target datasets. For example, the transformation code can be software code, such as a script, that defines a set of instructions to transform source columns of one or more source datasets into target columns of one or more target datasets. In a data pipeline, the source dataset can result in hundreds if not thousands of derived (target) datasets.”; and
Fig. 1, Para. [0034]: “A source dataset may be raw (i.e., un-edited) data that comes directly from a data source (e.g., a full list of customer accounts) and represents the starting point of a data pipeline. Alternatively, a source dataset may be a target dataset, which is a dataset that is generated (i.e., built) by editing (e.g., manually or by executing logic of a data transformation step from pipeline repository 107) one or more source datasets. A target dataset may be potentially further transformed to provide one or more other datasets as input to the next data transformation step.”, 
the examiner notes that the reference's disclosure that each data transformation step applies transformation code to the source datasets corresponds to that of commands to access a composite dataset); 
pre-processing, by the processor, the script to identify a set of columns associated with the composite dataset (DUGAN Para. [0015]: “…, the transformation code can be software code, such as a script, that defines a set of instructions to transform source columns of one or more source datasets into target columns of one or more target datasets. In a data pipeline, the source dataset can result in hundreds if not thousands of derived (target) datasets.”; and
Para. [0016]: “Provenance of a dataset, and in particular column provenance of columns of a dataset, can help users or systems determine whether a given dataset is trustworthy. An error or mistake in a dataset can propagate through the data pipeline if left uncorrected. Such an error or mistake can cause many problems including, for example, inaccurate data, failure of downstream processes that rely on the dataset, and so forth. Column lineage metadata provides granularity with respect to a column's history, which can be invaluable for identifying and correcting propagated errors in datasets.”); 
loading, by the processor, a metadata file associated with the composite dataset, the metadata file including an algebraic representation defining relationships among  (DUGAN Fig. 3A/3B, Para. [0010]: “FIG. 3B illustrates a visual representation of derived relationships between one or more source columns and respective target columns using the logical query plan of FIG. 3A”; and
Para. [0020]: “…, the relationships between source column(s) of one or more source datasets and respective target column(s) of one or more target dataset(s) can be derived from a logical query plan. A logical query plan can refer to an ordered set of operations that is used to access data from one or more source datasets to generate one or more target datasets”; and 
Fig. 1/2, Para. [0043]: “…, planning analyzer 210 can receive or identify the transformation code when it is executed as part of a data transformation step of the data pipeline, resulting in creation of a target dataset. Planning analyzer 210 can parse and convert the transformation code into a logical query plan. As noted above, a logical query plan (also referred to as “logical plan” herein) can refer to an ordered set of operations that is used to access data from one or more source datasets to generate one or more new target datasets. The logical query plan can be a hierarchical structure expressed as a tree of nodes of logical operators (e.g., relational algebraic expression)… the logical query plan can have a particular syntax suitable for interpretation or execution by metadata management system 110”; and
Para. [0051]: “A logical query plan is generated and subsequently parsed to derive relationships between the source column (e.g., column Y) of the source dataset (e.g., dataset B) and the target column (e.g., column Z) of the target dataset (e.g., dataset C). Target column metadata, in particular column lineage metadata, is generated for the target column, column Z. The column lineage metadata can identify the currently derived relationship between the source column and the target column, and the existing column lineage metadata of column Y.”, 
the examiner notes that the logical query plan, which can be a hierarchical structure expressed as a tree of nodes of logical operators (e.g., relational algebraic expression), corresponds to that of a metadata file associated with the composite dataset, the metadata file including an algebraic representation of the plurality of datasets); 
parsing, by the processor, the algebraic representation to identify one or more datasets that include a column in the set of columns, the one or more datasets comprising a subset of the plurality of datasets (DUGAN Para. [0021]: “…, the logical query plan can be parsed to derive relationships between the source columns of the source datasets and the respective target columns of the target datasets. The derived relationships can be used to generate target column metadata (e.g., current column lineage metadata). Further, existing column metadata of the source columns (e.g., existing column lineage metadata, column level access control metadata, or user comments) can be also included in the generated target column metadata to provide metadata of all ancestors of the target column in one place.”); and 
executing the script on the one or more datasets (DUGAN Fig. 1/2,  Para. [0027]: “…, data management platform 102 can include metadata management system 110, datastore 105 storing the underlying data (e.g., enterprise data), and pipeline repository 107 storing one or more data pipelines. A data pipeline includes a set of logic to execute a series of data transformation steps on one or more source datasets stored in datastore 105. Each data transformation step produces one or more target datasets (also referred to herein as “derived datasets”) that may also be stored in datastore 105.”).  

However, DUGAN does not explicitly teach the composite dataset comprising a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset.
But, ANAND teaches the composite dataset comprising a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset (ANAND Fig. 2, Para. [0203]: “The data source 236 may be a SQL database, a spreadsheet, an XML file, a desktop database, a flat file, a CSV file, or other organized data source. Some implementations support combined or blended data sources, with data from two or more distinct sources. The data fields may be raw fields from the data source (i.e., the data field exists in the data source) or may be computed from one or more raw fields (e.g., computing a month, quarter, or year from a date field in the data source).”, 
the examiner notes that a combined or blended data source of raw data or computed data corresponds to that of a composite dataset containing raw and annotated data). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of DUGAN (disclosing a method for processing column lineage and metadata propagation) to include the teachings of ANAND (disclosing methods for ranking data visualizations using different data fields) and arrive at a method to manipulate datasets of combined data sources.  One of ordinary skill in the art would have been motivated to make this combination because, by applying rules-based/scripted operations on derived datasets of combined sources, system users can process data of all sorts through data pipeline systems with increased efficiency, as also recognized by (ANAND, Abstract, Para. [0014]-[0016]). In addition, the references of DUGAN and ANAND are directed to analogous art and to the same field of endeavor of dataset transformation and processing.  

Regarding claim 17 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 15. Further, DUGAN teaches wherein parsing the metadata file comprises identifying one or more dataset objects stored within the metadata file (DUGAN Fig. 3, Para. [0075]: “If the parsing module 220 identifies keyword 320A, parsing module 220 can expect particular information in the identified portion(s) of the logical query plan 300 associated with keyword 320A and extract the particular information to identify one or more source datasets.”); and 
extracting schemas associated with each of the one or more dataset objects (DUGAN Fig. 3, Para. [0077]: “Parsing the logical query plan 300 to derive the relationships between the source columns of the one or more source datasets and the respective target columns of the target dataset can also include finding, in the logical query plan 300, one or more keywords associated with a specific logical query plan portion (second portion) that identifies the source columns of the source datasets.”, 
the examiner notes that the reference's derivation of the relationships between the source columns of the one or more source datasets corresponds to that of a metadata file associated with the composite dataset).  

Regarding claim 18 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 17. Further, DUGAN teaches wherein parsing the metadata file further comprises identifying a schema in the schemas that includes at least one column in the set of columns (DUGAN Fig. 3, Para. [0077]: “Parsing the logical query plan 300 to derive the relationships between the source columns of the one or more source datasets and the respective target columns of the target dataset can also include finding, in the logical query plan 300, one or more keywords associated with a specific logical query plan portion (second portion) that identifies the source columns of the source datasets.”).  

Regarding claim 19 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 15.  Further, DUGAN teaches wherein loading data from the one or more datasets comprises identifying file paths associated with the one or more datasets and loading data from files stored at the file paths (DUGAN Fig. 2/3, Para. [0078]: “…, parsing module 220 can parse the logical query plan 300 to identify keyword 320D (“Project”) arranged in a particular location with respect to logical query plan 300.”, the examiner notes that a particular location of the query plan corresponds to that of a data path/location).  

Regarding claim 20 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 19.  Further, DUGAN teaches wherein executing the script comprises combining the one or more datasets to form a second composite dataset and using the second composite dataset while executing the script (DUGAN Para. [0034]: “…, a data transformation step may produce a target dataset by filtering records in an input dataset to those comprising a particular value or set of values, or by joining together two related input datasets, or by replacing references in an input dataset to values in another input dataset with actual data referenced.”).

Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication (US 2020/0210427 A1) issued to Dugan et al. (hereinafter “DUGAN”), in view of US Patent Application Publication (US 2015/0278214 A1) issued to Anand et al. (hereinafter “ANAND”), and in view of US Patent Application Publication (US 2013/0332449 A1) issued to Amos et al. (hereinafter “AMOS”).

Regarding claim 2 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 1.
However, the combination of DUGAN and ANAND does not explicitly teach wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands.
But, AMOS teaches wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands (AMOS Abstract, lines (1-3): “The present invention provides a computer-implemented code generation system that generates data processing code from a directed acyclic graph (DAG).”; and
Para. [0009], lines (3-5): “…, a table is defined as a collection of data values that has one or more columns and zero or more rows. If a table has zero rows then the table is empty. Each column has a name and data type (e.g., character, number, or date).”; and 
Para. [0010], lines (1-5): “Individuals can build a data processing model using a Directed Acyclic Graph (DAG) that shows the flow of data from input tables to output tables. Each node has attributes that specify a number of input tables, a number of output tables, and the operations performed on the data.”; and
Para. [0011], lines (2-12): “…, an open-source data-mining tool called KNIME is used to build a DAG. KNIME saves the DAG in XML files. ... The resulting DAG-XML file is used to generate Pig Latin and User Defined Functions (UDF) Java Archive (JAR) files for Apache Pig, or SQL scripts for a relational database. The resulting scripts are then run in Apache Pig or a relational database to process the data and produce the results.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of DUGAN (disclosing a method for processing column lineage and metadata propagation) and ANAND (disclosing methods for ranking data visualizations using different data fields) to include the teachings of AMOS (disclosing generating data processing code from a directed acyclic graph) and arrive at a method of analyzing a directed acyclic graph representing the script to identify associated data information.  One of ordinary skill in the art would have been motivated to make this combination because, by generating transformation script files based on DAGs and running the resulting scripts, the system user can efficiently process the input data and produce the desired results, as recognized by (AMOS, Abstract, Para. [0011]). In addition, the references of DUGAN, ANAND and AMOS are directed to analogous art and to the same field of endeavor of dataset transformation processing.

Regarding claim 9 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 8.
However, the combination of DUGAN and ANAND does not explicitly teach wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands.
But, AMOS teaches wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands (AMOS Abstract, lines (1-3): “The present invention provides a computer-implemented code generation system that generates data processing code from a directed acyclic graph (DAG).”; and
Para. [0009], lines (3-5): “…, a table is defined as a collection of data values that has one or more columns and zero or more rows. If a table has zero rows then the table is empty. Each column has a name and data type (e.g., character, number, or date).”; and 
Para. [0010], lines (1-5): “Individuals can build a data processing model using a Directed Acyclic Graph (DAG) that shows the flow of data from input tables to output tables. Each node has attributes that specify a number of input tables, a number of output tables, and the operations performed on the data.”; and
Para. [0011], lines (2-12): “…, an open-source data-mining tool called KNIME is used to build a DAG. KNIME saves the DAG in XML files. ... The resulting DAG-XML file is used to generate Pig Latin and User Defined Functions (UDF) Java Archive (JAR) files for Apache Pig, or SQL scripts for a relational database. The resulting scripts are then run in Apache Pig or a relational database to process the data and produce the results.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of DUGAN (disclosing a method for processing column lineage and metadata propagation) and ANAND (disclosing methods for ranking data visualizations using different data fields) to include the teachings of AMOS (disclosing generating data processing code from a directed acyclic graph) and arrive at a method of analyzing a directed acyclic graph representing the script to identify associated data information.  One of ordinary skill in the art would have been motivated to make this combination because, by generating transformation script files based on DAGs and running the resulting scripts, the system user can efficiently process the input data and produce the desired results, as recognized by (AMOS, Abstract, Para. [0011]). In addition, the references of DUGAN, ANAND and AMOS are directed to analogous art and to the same field of endeavor of dataset transformation processing.

Regarding claim 16 (Previously Presented), the combination of DUGAN and ANAND teaches the limitations of claim 15.
However, the combination of DUGAN and ANAND does not explicitly teach wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands.
But, AMOS teaches wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands (AMOS Abstract, lines (1-3): “The present invention provides a computer-implemented code generation system that generates data processing code from a directed acyclic graph (DAG).”; and
Para. [0009], lines (3-5): “…, a table is defined as a collection of data values that has one or more columns and zero or more rows. If a table has zero rows then the table is empty. Each column has a name and data type (e.g., character, number, or date).”; and 
Para. [0010], lines (1-5): “Individuals can build a data processing model using a Directed Acyclic Graph (DAG) that shows the flow of data from input tables to output tables. Each node has attributes that specify a number of input tables, a number of output tables, and the operations performed on the data.”; and
Para. [0011], lines (2-12): “…, an open-source data-mining tool called KNIME is used to build a DAG. KNIME saves the DAG in XML files. ... The resulting DAG-XML file is used to generate Pig Latin and User Defined Functions (UDF) Java Archive (JAR) files for Apache Pig, or SQL scripts for a relational database. The resulting scripts are then run in Apache Pig or a relational database to process the data and produce the results.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of DUGAN (disclosing a method for processing column lineage and metadata propagation) and ANAND (disclosing methods for ranking data visualizations using different data fields) to include the teachings of AMOS (disclosing generating data processing code from a directed acyclic graph) and arrive at a method of analyzing a directed acyclic graph representing the script to identify associated data information.  One of ordinary skill in the art would have been motivated to make this combination because, by generating transformation script files based on DAGs and running the resulting scripts, the system user can efficiently process the input data and produce the desired results, as recognized by (AMOS, Abstract, Para. [0011]). In addition, the references of DUGAN, ANAND and AMOS are directed to analogous art and to the same field of endeavor of dataset transformation processing.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication (US 2020/0210427 A1) issued to Dugan et al. (hereinafter “DUGAN”), in view of US Patent Application Publication (US 2015/0278214 A1) issued to Anand et al. (hereinafter “ANAND”), and in view of US Patent Application Publication (US 2020/0409952 A1) issued to Dean et al. (hereinafter “DEAN”).
Regarding claim 7 (Original), the combination of DUGAN and ANAND teaches the limitations of claim 1.
However, the combination of DUGAN and ANAND does not explicitly teach executing a predicate push down procedure prior to loading the data.
But, DEAN teaches executing a predicate push down procedure prior to loading the data (DEAN Para. [0041], lines (10-16): “For any remaining blocks that are not cached, the SQL processing engine can perform a partial predicate pushdown to remove the blocks which are already cached such that only the non-cached blocks are identified and fetched from the blockchain ledger.”, the examiner notes that performing a predicate pushdown that removes cached blocks, i.e., data, and then fetching the non-cached blocks corresponds to that of executing a predicate push down procedure prior to loading the data).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of DUGAN (disclosing a method for processing column lineage and metadata propagation) and ANAND (disclosing methods for ranking data visualizations using different data fields) to include the teachings of DEAN (disclosing a database predicate processing engine) and arrive at a method to perform a predicate push down procedure.  One of ordinary skill in the art would have been motivated to make this combination because performing a predicate pushdown procedure on a data block enables, for example, the SQL processing engine to fetch only a specific subset of blocks, thereby speeding up access to the desired data, as recognized by (DEAN, Abstract, Para. [0038]). In addition, the references of DUGAN, ANAND and DEAN are directed to analogous art and to the same field of endeavor of dataset transformation processing.  

Regarding claim 14 (Original), the combination of DUGAN and ANAND teaches the limitations of claim 8.
However, the combination of DUGAN and ANAND does not explicitly teach executing a predicate push down procedure prior to loading the data.
But, DEAN teaches executing a predicate push down procedure prior to loading the data (DEAN Para. [0041], lines (10-16): “For any remaining blocks that are not cached, the SQL processing engine can perform a partial predicate pushdown to remove the blocks which are already cached such that only the non-cached blocks are identified and fetched from the blockchain ledger.”, the examiner notes that performing a predicate pushdown that removes cached blocks, i.e., data, and then fetching the non-cached blocks corresponds to that of executing a predicate push down procedure prior to loading the data).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of DUGAN (disclosing a method for processing column lineage and metadata propagation) and ANAND (disclosing methods for ranking data visualizations using different data fields) to include the teachings of DEAN (disclosing a database predicate processing engine) and arrive at a method to perform a predicate push down procedure.  One of ordinary skill in the art would have been motivated to make this combination because performing a predicate pushdown procedure on a data block enables, for example, the SQL processing engine to fetch only a specific subset of blocks, thereby speeding up access to the desired data, as recognized by (DEAN, Abstract, Para. [0038]). In addition, the references of DUGAN, ANAND and DEAN are directed to analogous art and to the same field of endeavor of dataset transformation processing.  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Chen et al.; (US-20190332696-A1); “Methods for processing task that may include obtaining raw data of the first data set”.
Piedmonte et al.; (US-7865503-B2); “Methods for data storage and retrieval using virtual data sets”.
Heer et al.; (US-10346421-B1); “Data profiling of large datasets”.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Zuheir A Mheir whose telephone number is (571)272-4151.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Vital, can be reached at (571)272-4215.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
07/16/2022

/ZUHEIR A MHEIR/Patent Examiner, Art Unit 2162     

/PIERRE M VITAL/Supervisory Patent Examiner, Art Unit 2162