DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This Final Office Action is in response to the remarks and amendments filed on 09/07/2021.
Amended claims 1-6, 8-13 and 15-20, filed on 09/07/2021, are being considered on the merits.
Claims 1-20 remain pending in the application.  

In response to the last Office Action:
Claims 1-6, 8-13 and 15-20 have been amended.
The rejection of claims 1, 8 and 15 under 35 USC § 112(b), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter, previously set forth in the Non-Final Office Action mailed on 06/07/2021, has been withdrawn. Applicant has amended the aforementioned claims to define and clarify the reference to “the composite dataset”, which resulted in the withdrawal of the 35 USC § 112(b) rejection.
The rejection of claims 1-2, 5-9, 12-16 and 19-20 under 35 USC § 101 as being an abstract idea, previously set forth in the Non-Final Office Action mailed on 06/07/2021, has been maintained and reiterated below for applicant’s convenience.


Response to Arguments
The applicant’s remarks and/or arguments, filed on 09/07/2021 have been fully considered. 
The examiner is entitled to give claim limitations their broadest reasonable interpretation in light of the specification. See MPEP 2111 [R-1] Interpretation of Claims-Broadest Reasonable Interpretation. The applicant always has the opportunity to amend the claims during prosecution, and broad interpretation by the examiner reduces the possibility that the claim, once issued, will be interpreted more broadly than is justified. In re Prater, 162 USPQ 541, 550-51 (CCPA 1969).

Applicant's arguments regarding amended claims 1, 8 and 15, found on pages 8-10 of the remarks filed on 09/07/2021, have been fully considered but are not persuasive.

Applicant stated: “At no point does Marschner describe identifying columns in multiple datasets based on pre-processing a script." …, “Nothing in Marschner discloses a pre-stored metadata file. Further, nothing in Marschner describes an algebraic relationship between datasets in a composite dataset.” …, “Marschner includes no meaningful discussion of the structure of the dataset store.”
Regarding the aforementioned claim limitations, Examiner respectfully disagrees. Examiner asserts that the aforementioned limitations of independent claims 1, 8 and 15, as drafted and given the broadest reasonable interpretation, are disclosed by the cited prior art to Marschner. In particular, Marschner discloses in Col. 7, lines (11-17): “The data preprocessing system 100 receives datasets for processing from the sources of big data 110. A dataset comprises one or more attributes. In an embodiment, the attributes of the dataset are represented as columns and the dataset is represented as a set of columns. A column comprises a set of cells, each cell storing a cell value. An attribute may be a simple attribute or a composite attribute.”; in Fig. 3, Col. 8, lines (45-46): “The dataset store 380 stores datasets and metadata describing the datasets.”; and in Fig. 3, Col. 8, lines (59-63): “The metadata module 320 determines metadata 

Applicant stated: “As described in the specification, a composite data store comprises raw data and annotation data, wherein the raw data is supplemented after storage with annotation data.” Applicant’s arguments filed on 09/07/2021 with respect to the rejection(s) of amended independent claims 1, 8 and 15 under 35 U.S.C. 102 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of newly found prior art, US 2020/0394166 A1 to Vanhooser, in addition to the previously cited prior art. Vanhooser discloses rules-based dataset management methods; see the 35 USC 103 rejection set forth below for further details.

Regarding the rejections of the claims under 35 USC 101, Applicant's arguments found on pages 11-14 of the remarks filed on 09/07/2021 have been fully considered but are not persuasive.
Applicant stated: “Applicant respectfully disagrees with the Examiner's framing of the claims as a mental process. In the rejection, the Examiner does not interpret the claims according to either the plain meaning of the terms or the meaning of the terms imparted by the specification. Steps such as pre-processing a database script, loading metadata files based on the pre-processing, parsing portions of the 
Regarding the claim limitations and rejections under 35 USC 101, the examiner respectfully disagrees with Applicant’s remarks. Examiner points out that, under Step 2A, Prong 1 of the 35 USC 101 analysis, the claim concept is analyzed as a whole; it details steps of processing information, analyzing it, and taking an action(s) based on the collection and analysis of this information, which is nothing more than an abstract idea (see the detailed analysis set forth under the 35 USC 101 rejection below). Furthermore, the functions “receiving”, “loading”, and “parsing” are not considered mental processes but are instead analyzed under Step 2A, Prong 2. These are data manipulation activities that simply enable a user to manipulate information/data, and they are considered insignificant extra-solution activities to the judicial exception, where extra-solution activity includes both pre-solution and post-solution activity; see MPEP 2106.05(g).
Applicant alleges that claim 1 is not directed to an abstract idea because the “loading” step cannot be implemented in the human mind. The Examiner agrees that the “loading” function cannot be executed by the human mind; however, Applicant is reminded that the “loading” step is analyzed under Step 2A, Prong 2, not Prong 1. As noted above, although the “loading” step is a computing function, it is analyzed under Step 2A, Prong 2, as an extra-solution activity, and it is a well-known computing function when analyzed under Step 2B.
Additionally, under Step 2A, Prong 2 and Step 2B of the 35 USC 101 analysis, the method of claim 1 (and of independent claims 8 and 15) presents steps that are directed towards 
As for Applicant’s reliance on BASCOM Global Internet Services, Inc. v. AT&T Mobility LLC (Fed. Cir. June 27, 2016), Applicant identifies no feature of the claims that constitutes an unconventional or non-generic arrangement of additional elements for purposes of the Step 2B analysis. In the instant claims, the Examiner finds nothing unconventional or non-generic in the claimed functions.
If Applicant considers that limitations such as the “receiving”, “loading”, and “parsing” functions should be analyzed under Step 2B as amounting to significantly more, Applicant must persuade the Examiner that those limitations are not conventional or generic computer elements performing conventional or generic functions such as receiving, processing, and transmitting data.
Generic computers performing generic computer functions, alone, do not amount to significantly more than the abstract idea. Viewing the limitations as an ordered combination does not add anything further than looking at the limitations individually.
Furthermore, the courts have recognized the following computer functions to be well‐understood, routine, and conventional when they are claimed in a merely generic manner: receiving, processing, and storing data; electronic recordkeeping (Alice Corp.); and receiving or transmitting data over a network, e.g., using the Internet to gather data (Ultramercial, buySAFE, Cyberfone).
Please see the below set forth 35 USC 101 rejection for further details.

Claim Rejections - 35 USC § 101
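35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.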


Claims 1-2, 5-9, 12-16 and 19-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: 
The claims are directed to a process. The claimed process comprises receiving a script, the script including commands to access a composite dataset; identifying a set of columns; loading/reading a metadata file associated with the composite dataset; parsing the metadata file to identify one or more datasets that include a column in the set of columns; loading data from the one or more datasets; and executing the script on the one or more datasets.
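For illustration only, the identifying, parsing, and dataset-selection steps summarized above can be sketched in Python. This is a minimal sketch under stated assumptions and is not a characterization of Applicant's specification or of the cited art: the `col("...")` script syntax, the metadata layout (an expression such as `raw + labels + scores` naming the component datasets, plus a column list for each), and all function names are hypothetical.

```python
import re

# Hypothetical metadata for a composite dataset: an algebraic expression that
# names the component datasets, plus the column list (schema) of each one.
METADATA = {
    "expression": "raw + labels + scores",
    "datasets": {
        "raw": {"columns": ["id", "text"]},
        "labels": {"columns": ["id", "label"]},
        "scores": {"columns": ["id", "score"]},
    },
}


def identify_columns(script: str) -> set:
    """Pre-process the script: collect names used in col("...") references (assumed syntax)."""
    return set(re.findall(r'col\("([^"]+)"\)', script))


def parse_components(metadata: dict) -> list:
    """Parse the algebraic expression into the names of its operand datasets."""
    return [name.strip() for name in metadata["expression"].split("+")]


def select_datasets(metadata: dict, columns: set) -> list:
    """Keep only the component datasets whose schema contains a referenced column."""
    return [
        name
        for name in parse_components(metadata)
        if columns & set(metadata["datasets"][name]["columns"])
    ]


script = 'filter(col("label") == "spam"); select(col("text"))'
cols = identify_columns(script)
print(sorted(cols))                     # ['label', 'text']
print(select_datasets(METADATA, cols))  # ['raw', 'labels']
```

Under these assumptions, only `raw` and `labels` would be loaded before the script is executed; `scores` is never read because none of its columns is referenced.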
Step 2A – Prong One – The claims recite an abstract idea
Independent claims 1, 8 and 15 are directed to an abstract idea without significantly more. The claims recite receiving/reading a script, which can fairly be viewed as a number of simple logical steps for performing some activities on given information, i.e., a dataset, wherein this information can be new/raw or already modified/annotated. For example, a script can be a list of actions/activities to be performed by a person on some data, such as manipulating information provided to this person on a piece of paper, for which this person would use his/her mental capacity to perform these activities on the provided information/dataset. Further, the claim continues with the concept of analyzing, i.e., pre-processing, this information, in which the person inspects the information provided and locates certain tabulated data in one or more columns within the provided information, which is again a mental process. Further, the claim continues with the concept of examining a set of information that provides attributes, i.e., metadata, describing the information at hand, including how the pieces of this information are related to one another, i.e., the algebraic association of this information. Further, the claim continues to identify particular information in this 
Such a process of processing information, analyzing it, and taking an action(s) based on the collection and analysis of this information is nothing more than an abstract idea. If a claim limitation, under its broadest reasonable interpretation, covers a series of mental steps but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the aforementioned claims recite abstract ideas.
Step 2A – Prong Two - The abstract idea is not integrated into a practical application
This judicial exception is not integrated into a practical application. The aforementioned claim(s) recite the following additional activities: “receiving”, “loading”, and “parsing”, which are considered extra-solution activities of mere data gathering. In this context, these activities merely enable a user to manipulate information/data and are therefore insignificant extra-solution activities to the judicial exception, where extra-solution activity includes both pre-solution and post-solution activity. An example of pre-solution activity is a step of gathering data for use in a claimed process, e.g., a step of obtaining information about credit card transactions that is recited as part of a claimed process of analyzing and manipulating the gathered information by a series of steps in order to detect whether the transactions were fraudulent; see MPEP 2106.05(g).
The additional elements recited in the claims are “a processor” and “a non-transitory computer-readable storage medium”. Using a computer to receive data and manipulate data amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract 
Step 2B:  
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the steps of processing information, analyzing it, and taking an action(s) based on the collection and analysis of this information in a system are considered generic, well-understood, and routine in computing technology. The processor and storage medium are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component and cannot provide an inventive concept.
Additionally, the claim(s) recite the limitation “executing, by the processor, the script on the one or more datasets”. In this context, “executing” is equivalent to applying the list of actions, i.e., the script, to the information at hand, which is again a mental process. Furthermore, the aforementioned claim(s) recite the additional activities “receiving”, “loading”, and “parsing”, which are extra-solution activities of mere data gathering; in this context, they simply enable a user to manipulate information/data and are insignificant extra-solution activities to the judicial exception, where extra-solution activity includes both pre-solution and post-solution activity. Therefore, the method of claim 1 (and of independent claims 8 and 15) presents steps that are directed towards an abstract idea. These steps cannot improve computer functionality because the claim is directed towards an abstract idea, and an abstract idea cannot improve computer functionality.

Thus, there are no additional elements that amount to significantly more than the above-identified judicial exception (the abstract idea).  Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that any combination of elements improves the functioning of a computer or improves any other technology. The claim is not patent eligible.
Claim 2 is dependent on claim 1 and includes all the limitations of claim 1. Therefore, claim 2 recites the same abstract idea of a person analyzing and preparing some actions to perform on some information, i.e., a dataset, and associating this information with attributes that describe the information at hand. The claim recites the additional limitation of “analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands”. This additional limitation elaborates on the abstract idea described above: a person would now analyze the set of actions to take and, using pencil and paper, draw a graph that represents these actions, which does not amount to significantly more than the abstract idea.
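To make the limitation of claim 2 concrete, the following minimal Python sketch walks a directed acyclic graph representing a script and collects the column names referenced by its commands. The node names, the node layout, and `columns_in_dag` are hypothetical assumptions for illustration and are not drawn from Applicant's specification or the cited references.

```python
from collections import deque

# Hypothetical DAG for a script: each node is a command that may reference
# columns; "inputs" lists the upstream commands that feed it.
NODES = {
    "load":   {"columns": [],        "inputs": []},
    "filter": {"columns": ["label"], "inputs": ["load"]},
    "derive": {"columns": ["text"],  "inputs": ["load"]},
    "join":   {"columns": ["id"],    "inputs": ["filter", "derive"]},
    "output": {"columns": [],        "inputs": ["join"]},
}


def columns_in_dag(nodes: dict, sink: str) -> set:
    """Walk the DAG backward from the sink and collect every referenced column."""
    seen, found = set(), set()
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # shared ancestors (e.g. "load") are visited only once
        seen.add(node)
        found.update(nodes[node]["columns"])
        queue.extend(nodes[node]["inputs"])
    return found


print(sorted(columns_in_dag(NODES, "output")))  # ['id', 'label', 'text']
```

A breadth-first walk is used here only for simplicity; any traversal that visits each upstream node once would collect the same set of names.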
Claim 5 is dependent on claim 1 and includes all the limitations of claim 1. The claim recites the additional limitations of “identifying file paths associated with the one or more datasets and loading data from files stored at the file paths”, which elaborate on the abstract idea of a person locating the information of interest at a particular location in the information at hand.
Claim 6 is dependent on claim 5 and includes all the limitations of claim 5.  The claim recites the additional limitations of “combining the one or more datasets to form a second composite dataset and using the second composite dataset while executing the script”, which elaborates on the abstract idea of 
Claim 7 is dependent on claim 1 and includes all the limitations of claim 1. The claim recites the additional limitation of “executing a predicate push down procedure prior to loading the data”, which elaborates on the abstract idea of a person deciding to quickly skim through the information at hand and taking further action based on what this person was looking for in that information.
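For reference, predicate push-down generally means evaluating a filter condition while data is being read, before it is materialized, rather than afterwards. The minimal Python sketch below illustrates that general technique only; the row format, `load`, `is_spam`, and the sample data are hypothetical and are not taken from the claims, the specification, or the cited references.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


def load(source: Iterable[Row], predicate: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Materialize rows from a source; a pushed-down predicate is applied while reading."""
    return [row for row in source if predicate is None or predicate(row)]


def is_spam(row: Row) -> bool:
    """Hypothetical filter condition taken from a script."""
    return row["label"] == "spam"


# Hypothetical source data standing in for a stored dataset.
SOURCE = [
    {"id": 1, "label": "spam"},
    {"id": 2, "label": "ham"},
    {"id": 3, "label": "spam"},
]

# Without push-down: every row is loaded, then the script filters them.
loaded = load(SOURCE)
result_late = [row for row in loaded if is_spam(row)]

# With push-down: the filter runs during loading, so fewer rows are materialized.
result_early = load(SOURCE, predicate=is_spam)

print(len(loaded), len(result_early))  # 3 2
print(result_late == result_early)     # True
```

Both orderings yield the same rows; the difference is only in how much data is materialized before the script runs.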
Independent claims 8 and 15 recite limitations similar to those of claim 1 and are therefore rejected for the same reasons as explained above. Further, dependent claims 9, 12-14, 16, and 19-20 recite limitations similar to those of claims 2 and 5-7 and are rejected for similar reasons as detailed above.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the 

Claims 1, 3-6, 8, 10-13, 15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent (US 10,733,198 B1) to Marschner et al. (hereinafter “MARSCHNER”), in view of US Patent Application Publication (US 2020/0394166 A1) to Vanhooser (hereinafter “VANHOOSER”).
Regarding claim 1 (Currently Amended), MARSCHNER teaches a method comprising: 
receiving, by a processor, a script, the script including commands to access a composite dataset (MARSCHNER Abstract, lines (1-3): “A data preprocessing system builds transformation scripts for preprocessing datasets for processing by a data analysis system.”; and
Fig. 3, Col. 8, lines (18-25): “FIG. 3 shows the architecture of a data preprocessing system for preprocessing data for big data analysis, ... The data preprocessing system 100 includes a user interface manager 310, a metadata module 320, a nested data structure processor 230, a transformation recommendation module 350, a transformation execution engine 250, a sample store 360, a transformation script store 370, and a dataset store 380”),
pre-processing, by the processor, the script to identify a set of columns associated with the composite dataset (MARSCHNER Abstract, lines (9-13): “The data preprocessing system configures for display a new column representing the extracted value. The data preprocessing system adds a transformation for extracting the selected attribute from the composite attribute to the transformation script.”; and 
Fig. 1, Col 5, lines (32-39): “The step of preprocessing the data is also referred to as cleansing the data by removing data that does not satisfy criteria that determine whether the data can be processed by the big data analysis system 130. These criteria include various constraints that may be specified for the data, for example, properties of different types of data including format of the data, types of values that can be occupied by specific fields, and so on.”; and 
Fig 1. Col. 7, lines (11-17): “The data preprocessing system 100 receives datasets for processing from the sources of big data 110. A dataset comprises one or more attributes. In an embodiment, the attributes of the dataset are represented as columns and the dataset is represented as a set of columns. A column comprises a set of cells, each cell storing a cell value. An attribute may be a simple attribute or a composite attribute.”); 
loading, by the processor, a metadata file associated with the composite dataset, the metadata file including an algebraic representation of the plurality of datasets (MARSCHNER Fig. 3, Col. 8, lines (45-46): “The dataset store 380 stores datasets and metadata describing the datasets.”; and
Fig. 3, Col. 8, lines (59-63): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators”; and 
Fig. 4, Col. 12, lines (42-51): “A type of the attribute is associated with certain formatting rules (or type rules) associated with the data. An attribute 430 defines characteristics of the data of the attribute. For example, the attribute 430b represents a URL that is expected to be of the format “http:” followed by a website address. The attribute 430 storing description of the farmer's markets may be associated with the attribute 430 that the text representing the description may not include certain special characters, such as ‘?’.”, 
the examiner notes that the stored metadata/attributes, associated with the data through defined rules/characteristics, correspond to the claimed metadata file associated with the composite dataset, the metadata file including an algebraic representation of the plurality of datasets); 
parsing, by the processor, the algebraic representation, the one or more datasets comprising a subset of the plurality of datasets (MARSCHNER Fig. 3, Col. 8, lines (59-67): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators. By analyzing the various data values corresponding to columns, the metadata module 320 infers types of each column. …, the metadata module 320 sends information describing the various column types to the user via the user interface manager 310.”; and
Fig. 3, Col. 9, lines (16-22): “The data parsing module 340 parses data received by the data preprocessing system 100 to determine various parts of the data. The data parsing module 340 identifies record separators, for example, based on newline characters to determine where one record of the dataset ends and the next record begins”); 
loading, by the processor, data from the one or more datasets (MARSCHNER Fig. 3, Col. 9, lines (16-22): “The data parsing module 340 parses data received by the data preprocessing system 100 to determine various parts of the data. The data parsing module 340 identifies record separators, for example, based on newline characters to determine where one record of the dataset ends and the next record begins”); and 
executing, by the processor, the script on the one or more datasets (MARSCHNER Fig. 1/Fig. 2, Col. 6, lines (28-33): “A user may interact with the data preprocessing system 100 via the client device 260. The client device 260 executes a client application 210 that allows a user to interact with the data preprocessing system 100, for example, to develop and/or test transformation scripts that are used for preprocessing of the data.”; and
Fig. 2, Col. 9, lines (43-49): “The transformation execution engine 250 receives transformations and executes the transformations for a given set of input datasets. …, the transformation execution engine 250 receives transformation script and executes the transformation script for a given set of input datasets. A transformation script is a representation of a sequence (or set) of transformations.”).  
Although MARSCHNER teaches receiving, by a processor, a script, the script including commands to access a composite dataset (MARSCHNER Abstract, lines (1-3): “A data preprocessing system builds transformation scripts for preprocessing datasets for processing by a data analysis system.”; and
Fig. 3, Col. 8, lines (18-25): “FIG. 3 shows the architecture of a data preprocessing system for preprocessing data for big data analysis, ... The data preprocessing system 100 includes a user interface manager 310, a metadata module 320, a nested data structure processor 230, a transformation recommendation module 350, a transformation execution engine 250, a sample store 360, a transformation script store 370, and a dataset store 380”),
MARSCHNER does not explicitly teach that the composite dataset comprises a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset.
VANHOOSER, however, teaches that the composite dataset comprises a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset (VANHOOSER Fig. 1, Para. [0020], lines (1-8): “…, the datasets in datastore 105 are both “immutable” and “versioned” datasets. A dataset may be defined as a named collection of data. The datasets are “immutable” in the sense that it is not possible to overwrite existing dataset data in order to modify the dataset. The datasets are “versioned” in the sense that modifications to a dataset, including historical modifications, are separately identifiable.”; and
Fig. 1, Para. [0022], lines (1-26): “An initial dataset may be raw (i.e., un-edited) data that comes directly from a data source (e.g., a full list of customer accounts) and represents the starting point of a data pipeline. Alternatively, an initial dataset may be a derived dataset, which is a dataset that is generated (i.e., built) by editing (e.g., manually or by executing logic of a data transformation step from pipeline repository 107) one or more initial datasets. A derived dataset may be potentially further transformed to provide one or more other datasets as input to the next data transformation step. Each data transformation step may perform one or more operations on the input dataset(s) to produce one or more derived datasets. For example, a data transformation step may produce a derived dataset by filtering records in an input dataset to those comprising a particular value or set of values, or by joining together two related input datasets, or by replacing references in an input dataset to values in another input dataset with actual data referenced. Because derived datasets, like datasets generally, are immutable and versioned in the system, it is possible to trace dataset data to the data source data from which the dataset data was derived or obtained, even if the dataset data is no longer in the current version of the derived dataset and even if the data source data is no longer available from the data source.”, 
the examiner notes that a dataset can be a derived dataset consisting of both original and versioned/changed/updated/annotated data, which corresponds to a composite dataset containing raw and annotated data). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of MARSCHNER (disclosing a data preprocessing system that builds transformation scripts for preprocessing datasets) to include the teachings of VANHOOSER (disclosing rules-based dataset management methods) and arrive at a method to manipulate datasets of 

Regarding claim 3 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 1.  Further, MARSCHNER teaches  wherein parsing the metadata file comprises identifying one or more dataset objects stored within the metadata file (MARSCHNER Fig. 3, Col. 8, lines (45-46): “The dataset store 380 stores datasets and metadata describing the datasets.”;  and
Fig. 4, Col. 12, lines (42-51): “A type of the attribute is associated with certain formatting rules (or type rules) associated with the data. An attribute 430 defines characteristics of the data of the attribute. For example, the attribute 430b represents a URL that is expected to be of the format “http:” followed by a website address. The attribute 430 storing description of the farmer's markets may be associated with the attribute 430 that the text representing the description may not include certain special characters, such as ‘?’.”; and
Fig. 13, Col. 13, lines (56-67): “The user interface manager 310 receives 500 information identifying a dataset. The information identifying the dataset may be an address of a file stored locally on the data preprocessing system 100, a URI (uniform resource identifier) of a file on a remote system, a file on an external storage attached to the data preprocessing system, and so on. The data preprocessing system 100 uploads the dataset and may store the dataset in the dataset store 380 or may simply store metadata describing the data in the dataset store 380 such that the data itself may be retrieved from the source identified”); and 
extracting schemas associated with each of the one or more dataset objects (MARSCHNER Fig. 3, Col. 8, lines (59-66): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators. By analyzing the various data values corresponding to columns, the metadata module 320 infers types of each column.”, 
the examiner notes that the metadata/attributes the system determines for the data through defined rules/characteristics correspond to the claimed metadata file associated with the composite dataset).  

Regarding claim 4 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 3. Further, MARSCHNER teaches wherein parsing the metadata file further comprises identifying a schema in the schemas that includes at least one column in the set of columns (MARSCHNER Fig. 3, Col. 8, lines (59-66): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators. By analyzing the various data values corresponding to columns, the metadata module 320 infers types of each column.”; the examiner notes that the reference discloses a data layout with columns, which corresponds to a schema including at least one column).  

Regarding claim 5 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 1.  Further, MARSCHNER teaches wherein loading data from the one or more datasets comprises identifying file paths associated with the one or more datasets (MARSCHNER, Col. 2, lines (19-28): “The data preprocessing system builds a transformation script for processing the dataset by performing the following steps iteratively. The data preprocessing system receives a selection of an attribute of the composite attribute based on the structural representation of a value of the composite attribute. The data preprocessing system determines an expression representing the path of the selected attribute within the composite attribute. The data preprocessing system extracts values of the selected attribute from records of the dataset.”).  

Regarding claim 6 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 5.  Further, MARSCHNER teaches wherein executing the script comprises combining the one or more datasets to form a second composite dataset and using the second composite dataset while executing the script (MARSCHNER Fig. 2, Col. 9, lines (43-49): “The transformation execution engine 250 receives transformations and executes the transformations for a given set of input datasets. In an embodiment, the transformation execution engine 250 receives transformation script and executes the transformation script for a given set of input datasets. A transformation script is a representation of a sequence (or set) of transformations.”; and 
Col. 9, lines (50-61): “The transformation execution engine 250 includes instructions to execute various operators associated with the transformations. …, joining two or more datasets based on join keys, …”).  

Regarding claim 8 (Currently Amended), MARSCHNER teaches a non-transitory computer-readable storage medium for tangibly storing computer program instructions capable of being executed by a computer processor (MARSCHNER Fig. 1, Col. 6, lines (40-50): “…, a computer system executing code for the data preprocessing system 100 or the client device 260 is a conventional computer system executing, for example, a Microsoft Windows-compatible operating system (OS), Apple OS X, .... The computer system includes a non-transitory storage medium storing instructions that perform the various steps described herein”), the computer program instructions defining the steps of: 
receiving a script, the script including commands to access a composite dataset (MARSCHNER Abstract, lines (1-3): “A data preprocessing system builds transformation scripts for preprocessing datasets for processing by a data analysis system.”; and
Fig. 3, Col. 8, lines (18-25): “FIG. 3 shows the architecture of a data preprocessing system for preprocessing data for big data analysis, ... The data preprocessing system 100 includes a user interface manager 310, a metadata module 320, a nested data structure processor 230, a transformation recommendation module 350, a transformation execution engine 250, a sample store 360, a transformation script store 370, and a dataset store 380”); 
pre-processing, by the processor, the script to identify a set of columns associated with the composite dataset (MARSCHNER Abstract, lines (9-13): “The data preprocessing system configures for display a new column representing the extracted value. The data preprocessing system adds a transformation for extracting the selected attribute from the composite attribute to the transformation script.”; and 
Fig. 1, Col. 5, lines (32-39): “The step of preprocessing the data is also referred to as cleansing the data by removing data that does not satisfy criteria that determine whether the data can be processed by the big data analysis system 130. These criteria include various constraints that may be specified for the data, for example, properties of different types of data including format of the data, types of values that can be occupied by specific fields, and so on.”; and 
Fig. 1, Col. 7, lines (11-17): “The data preprocessing system 100 receives datasets for processing from the sources of big data 110. A dataset comprises one or more attributes. In an embodiment, the attributes of the dataset are represented as columns and the dataset is represented as a set of columns. A column comprises a set of cells, each cell storing a cell value. An attribute may be a simple attribute or a composite attribute.”); 
loading, by the processor, a metadata file associated with the composite dataset, the metadata file including an algebraic representation of the plurality of datasets (MARSCHNER Fig. 3, Col. 8, lines (45-46): “The dataset store 380 stores datasets and metadata describing the datasets.”; and
Fig. 3, Col. 8, lines (59-63): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators”; and 
Fig. 4, Col. 12, lines (42-51): “A type of the attribute is associated with certain formatting rules (or type rules) associated with the data. An attribute 430 defines characteristics of the data of the attribute. For example, the attribute 430b represents a URL that is expected to be of the format “http:” followed by a website address. The attribute 430 storing description of the farmer's markets may be associated with the attribute 430 that the text representing the description may not include certain special characters, such as ‘?’.”, 
the examiner equates the stored metadata/attributes, which are associated with the data through defined rules/characteristics, to a metadata file associated with the composite dataset, the metadata file including an algebraic representation of the plurality of datasets); 
parsing, by the processor, the algebraic representation, the one or more datasets comprising a subset of the plurality of datasets (MARSCHNER Fig. 3, Col. 8, lines (59-67): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators. By analyzing the various data values corresponding to columns, the metadata module 320 infers types of each column. …, the metadata module 320 sends information describing the various column types to the user via the user interface manager 310.”; and
Fig. 3, Col. 9, lines (16-22): “The data parsing module 340 parses data received by the data preprocessing system 100 to determine various parts of the data. The data parsing module 340 identifies record separators, for example, based on newline characters to determine where one record of the dataset ends and the next record begins”);  
loading data from the one or more datasets (MARSCHNER Fig. 3, Col. 9, lines (16-22): “The data parsing module 340 parses data received by the data preprocessing system 100 to determine various parts of the data. The data parsing module 340 identifies record separators, for example, based on newline characters to determine where one record of the dataset ends and the next record begins”); and 
executing the script on the one or more datasets (MARSCHNER Fig. 1/Fig. 2, Col. 6, lines (28-33): “A user may interact with the data preprocessing system 100 via the client device 260. The client device 260 executes a client application 210 that allows a user to interact with the data preprocessing system 100, for example, to develop and/or test transformation scripts that are used for preprocessing of the data.”; and
Fig. 2, Col. 9, lines (43-49): “The transformation execution engine 250 receives transformations and executes the transformations for a given set of input datasets. …, the transformation execution engine 250 receives transformation script and executes the transformation script for a given set of input datasets. A transformation script is a representation of a sequence (or set) of transformations.”).  
Although MARSCHNER teaches receiving, by a processor, a script, the script including commands to access a composite dataset (MARSCHNER Abstract, lines (1-3): “A data preprocessing system builds transformation scripts for preprocessing datasets for processing by a data analysis system.”; and
Fig. 3, Col. 8, lines (18-25): “FIG. 3 shows the architecture of a data preprocessing system for preprocessing data for big data analysis, ... The data preprocessing system 100 includes a user interface manager 310, a metadata module 320, a nested data structure processor 230, a transformation recommendation module 350, a transformation execution engine 250, a sample store 360, a transformation script store 370, and a dataset store 380”),
however, MARSCHNER does not explicitly teach the composite dataset comprising a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset.
But VANHOOSER teaches the composite dataset comprising a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset (VANHOOSER Fig. 1, Para. [0020], lines (1-8): “…, the datasets in datastore 105 are both “immutable” and “versioned” datasets. A dataset may be defined as a named collection of data. The datasets are “immutable” in the sense that it is not possible to overwrite existing dataset data in order to modify the dataset. The datasets are “versioned” in the sense that modifications to a dataset, including historical modifications, are separately identifiable.”; and
Fig. 1, Para. [0022], lines (1-26): “An initial dataset may be raw (i.e., un-edited) data that comes directly from a data source (e.g., a full list of customer accounts) and represents the starting point of a data pipeline. Alternatively, an initial dataset may be a derived dataset, which is a dataset that is generated (i.e., built) by editing (e.g., manually or by executing logic of a data transformation step from pipeline repository 107) one or more initial datasets. A derived dataset may be potentially further transformed to provide one or more other datasets as input to the next data transformation step. Each data transformation step may perform one or more operations on the input dataset(s) to produce one or more derived datasets. For example, a data transformation step may produce a derived dataset by filtering records in an input dataset to those comprising a particular value or set of values, or by joining together two related input datasets, or by replacing references in an input dataset to values in another input dataset with actual data referenced. Because derived datasets, like datasets generally, are immutable and versioned in the system, it is possible to trace dataset data to the data source data from which the dataset data was derived or obtained, even if the dataset data is no longer in the current version of the derived dataset and even if the data source data is no longer available from the data source.”, 
the examiner notes that a dataset can be a derived dataset consisting of both original and versioned/changed/updated/annotated data, which corresponds to a composite dataset containing raw and annotated data). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of MARSCHNER (disclosing a data preprocessing system that builds transformation scripts for preprocessing datasets) to include the teachings of VANHOOSER (disclosing rules-based dataset management methods) and arrive at a method to manipulate datasets of combined data sources.  One of ordinary skill in the art would have been motivated to make this combination because applying rules-based/scripted operations to derived datasets of combined sources allows system users to process data of all sorts through data pipeline systems with increased efficiency, as also recognized by VANHOOSER (Abstract, Para. [0002]-[0003]). In addition, MARSCHNER and VANHOOSER are analogous art because they are directed to the same field of endeavor, namely dataset transformation processing.

Regarding claim 10 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 8. Further, MARSCHNER teaches wherein parsing the metadata file comprises identifying one or more dataset objects stored within the metadata file (MARSCHNER Fig. 3, Col. 8, lines (45-46): “The dataset store 380 stores datasets and metadata describing the datasets.”;  and
Fig. 4, Col. 12, lines (42-51): “A type of the attribute is associated with certain formatting rules (or type rules) associated with the data. An attribute 430 defines characteristics of the data of the attribute. For example, the attribute 430b represents a URL that is expected to be of the format “http:” followed by a website address. The attribute 430 storing description of the farmer's markets may be associated with the attribute 430 that the text representing the description may not include certain special characters, such as ‘?’.”; and
Fig. 13, Col. 13, lines (56-67): “The user interface manager 310 receives 500 information identifying a dataset. The information identifying the dataset may be an address of a file stored locally on the data preprocessing system 100, a URI (uniform resource identifier) of a file on a remote system, a file on an external storage attached to the data preprocessing system, and so on. The data preprocessing system 100 uploads the dataset and may store the dataset in the dataset store 380 or may simply store metadata describing the data in the dataset store 380 such that the data itself may be retrieved from the source identified”); and 
extracting schemas associated with each of the one or more dataset objects (MARSCHNER Fig. 3, Col. 8, lines (59-66): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators. By analyzing the various data values corresponding to columns, the metadata module 320 infers types of each column.”, 
the examiner notes that the system determines the stored metadata/attributes associated with the data according to defined rules/characteristics, which corresponds to a metadata file associated with the composite dataset).  

Regarding claim 11 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 10. Further, MARSCHNER teaches wherein parsing the metadata file further comprises identifying a schema in the schemas that includes at least one column in the set of columns (MARSCHNER Fig. 3, Col. 8, lines (59-66): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators. By analyzing the various data values corresponding to columns, the metadata module 320 infers types of each column.”, the examiner notes that the reference discloses a data layout with columns, which corresponds to a schema including at least one column).  

Regarding claim 12 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches wherein loading data from the one or more datasets comprises identifying file paths associated with the one or more datasets and loading data from files stored at the file paths (MARSCHNER, Col. 2, lines (19-28): “The data preprocessing system builds a transformation script for processing the dataset by performing the following steps iteratively. The data preprocessing system receives a selection of an attribute of the composite attribute based on the structural representation of a value of the composite attribute. The data preprocessing system determines an expression representing the path of the selected attribute within the composite attribute. The data preprocessing system extracts values of the selected attribute from records of the dataset.”).  

Regarding claim 13 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 12.  Further, MARSCHNER teaches wherein executing the script comprises combining the one or more datasets to form a second composite dataset and using the second composite dataset while executing the script (MARSCHNER Fig. 2, Col. 9, lines (43-49): “The transformation execution engine 250 receives transformations and executes the transformations for a given set of input datasets. In an embodiment, the transformation execution engine 250 receives transformation script and executes the transformation script for a given set of input datasets. A transformation script is a representation of a sequence (or set) of transformations.”; and 
Col. 9, lines (50-61): “The transformation execution engine 250 includes instructions to execute various operators associated with the transformations. …, joining two or more datasets based on join keys, …”).  

Regarding claim 15 (Currently Amended), MARSCHNER teaches an apparatus comprising: a processor; and a storage medium for tangibly storing thereon program logic for execution by the processor (MARSCHNER Abstract, lines (1-3): “A data preprocessing system builds transformation scripts for preprocessing datasets for processing by a data analysis system.”; and 
Fig. 1, Col. 6, lines (40-50): “…, a computer system executing code for the data preprocessing system 100 or the client device 260 is a conventional computer system executing, for example, a Microsoft Windows-compatible operating system (OS), Apple OS X, .... The computer system includes a non-transitory storage medium storing instructions that perform the various steps described herein”), the 
receiving a script, the script including commands to access a composite dataset (MARSCHNER Abstract, lines (1-3): “A data preprocessing system builds transformation scripts for preprocessing datasets for processing by a data analysis system.”; and
Fig. 3, Col. 8, lines (18-25): “FIG. 3 shows the architecture of a data preprocessing system for preprocessing data for big data analysis, ... The data preprocessing system 100 includes a user interface manager 310, a metadata module 320, a nested data structure processor 230, a transformation recommendation module 350, a transformation execution engine 250, a sample store 360, a transformation script store 370, and a dataset store 380”); 
pre-processing, by the processor, the script to identify a set of columns associated with the composite dataset (MARSCHNER Abstract, lines (9-13): “The data preprocessing system configures for display a new column representing the extracted value. The data preprocessing system adds a transformation for extracting the selected attribute from the composite attribute to the transformation script.”; and 
Fig. 1, Col. 5, lines (32-39): “The step of preprocessing the data is also referred to as cleansing the data by removing data that does not satisfy criteria that determine whether the data can be processed by the big data analysis system 130. These criteria include various constraints that may be specified for the data, for example, properties of different types of data including format of the data, types of values that can be occupied by specific fields, and so on.”; and 
Fig. 1, Col. 7, lines (11-17): “The data preprocessing system 100 receives datasets for processing from the sources of big data 110. A dataset comprises one or more attributes. In an embodiment, the attributes of the dataset are represented as columns and the dataset is represented as a set of columns. A column comprises a set of cells, each cell storing a cell value. An attribute may be a simple attribute or a composite attribute.”); 
loading, by the processor, a metadata file associated with the composite dataset, the metadata file including an algebraic representation of the plurality of datasets (MARSCHNER Fig. 3, Col. 8, lines (45-46): “The dataset store 380 stores datasets and metadata describing the datasets.”; and
Fig. 3, Col. 8, lines (59-63): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators”; and 
Fig. 4, Col. 12, lines (42-51): “A type of the attribute is associated with certain formatting rules (or type rules) associated with the data. An attribute 430 defines characteristics of the data of the attribute. For example, the attribute 430b represents a URL that is expected to be of the format “http:” followed by a website address. The attribute 430 storing description of the farmer's markets may be associated with the attribute 430 that the text representing the description may not include certain special characters, such as ‘?’.”, 
the examiner equates the stored metadata/attributes, which are associated with the data through defined rules/characteristics, to a metadata file associated with the composite dataset, the metadata file including an algebraic representation of the plurality of datasets); 
parsing, by the processor, the algebraic representation, the one or more datasets comprising a subset of the plurality of datasets (MARSCHNER Fig. 3, Col. 8, lines (59-67): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators. By analyzing the various data values corresponding to columns, the metadata module 320 infers types of each column. …, the metadata module 320 sends information describing the various column types to the user via the user interface manager 310.”; and
Fig. 3, Col. 9, lines (16-22): “The data parsing module 340 parses data received by the data preprocessing system 100 to determine various parts of the data. The data parsing module 340 identifies record separators, for example, based on newline characters to determine where one record of the dataset ends and the next record begins”); 
loading data from the one or more datasets (MARSCHNER Fig. 3, Col. 9, lines (16-22): “The data parsing module 340 parses data received by the data preprocessing system 100 to determine various parts of the data. The data parsing module 340 identifies record separators, for example, based on newline characters to determine where one record of the dataset ends and the next record begins”); and 
executing the script on the one or more datasets (MARSCHNER Fig. 1/Fig. 2, Col. 6, lines (28-33): “A user may interact with the data preprocessing system 100 via the client device 260. The client device 260 executes a client application 210 that allows a user to interact with the data preprocessing system 100, for example, to develop and/or test transformation scripts that are used for preprocessing of the data.”; and
Fig. 2, Col. 9, lines (43-49): “The transformation execution engine 250 receives transformations and executes the transformations for a given set of input datasets. …, the transformation execution engine 250 receives transformation script and executes the transformation script for a given set of input datasets. A transformation script is a representation of a sequence (or set) of transformations.”).
Although MARSCHNER teaches receiving, by a processor, a script, the script including commands to access a composite dataset (MARSCHNER Abstract, lines (1-3): “A data preprocessing system builds transformation scripts for preprocessing datasets for processing by a data analysis system.”; and
Fig. 3, Col. 8, lines (18-25): “FIG. 3 shows the architecture of a data preprocessing system for preprocessing data for big data analysis, ... The data preprocessing system 100 includes a user interface manager 310, a metadata module 320, a nested data structure processor 230, a transformation recommendation module 350, a transformation execution engine 250, a sample store 360, a transformation script store 370, and a dataset store 380”),
however, MARSCHNER does not explicitly teach the composite dataset comprising a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset.
But VANHOOSER teaches the composite dataset comprising a plurality of datasets, the plurality of datasets including a raw dataset and one or more annotation datasets, the one or more annotation datasets created independent of the raw dataset (VANHOOSER Fig. 1, Para. [0020], lines (1-8): “…, the datasets in datastore 105 are both “immutable” and “versioned” datasets. A dataset may be defined as a named collection of data. The datasets are “immutable” in the sense that it is not possible to overwrite existing dataset data in order to modify the dataset. The datasets are “versioned” in the sense that modifications to a dataset, including historical modifications, are separately identifiable.”; and
Fig. 1, Para. [0022], lines (1-26): “An initial dataset may be raw (i.e., un-edited) data that comes directly from a data source (e.g., a full list of customer accounts) and represents the starting point of a data pipeline. Alternatively, an initial dataset may be a derived dataset, which is a dataset that is generated (i.e., built) by editing (e.g., manually or by executing logic of a data transformation step from pipeline repository 107) one or more initial datasets. A derived dataset may be potentially further transformed to provide one or more other datasets as input to the next data transformation step. Each data transformation step may perform one or more operations on the input dataset(s) to produce one or more derived datasets. For example, a data transformation step may produce a derived dataset by filtering records in an input dataset to those comprising a particular value or set of values, or by joining together two related input datasets, or by replacing references in an input dataset to values in another input dataset with actual data referenced. Because derived datasets, like datasets generally, are immutable and versioned in the system, it is possible to trace dataset data to the data source data from which the dataset data was derived or obtained, even if the dataset data is no longer in the current version of the derived dataset and even if the data source data is no longer available from the data source.”, 
the examiner notes that a dataset can be a derived dataset consisting of both original and versioned/changed/updated/annotated data, which corresponds to a composite dataset containing raw and annotated data). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of MARSCHNER (disclosing a data preprocessing system that builds transformation scripts for preprocessing datasets) to include the teachings of VANHOOSER (disclosing rules-based dataset management methods) and arrive at a method to manipulate datasets of combined data sources.  One of ordinary skill in the art would have been motivated to make this combination because applying rules-based/scripted operations to derived datasets of combined sources allows system users to process data of all sorts through data pipeline systems with increased efficiency, as also recognized by VANHOOSER (Abstract, Para. [0002]-[0003]). In addition, MARSCHNER and VANHOOSER are analogous art because they are directed to the same field of endeavor, namely dataset transformation processing.  

Regarding claim 17 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 15. Further, MARSCHNER teaches wherein parsing the metadata file comprises identifying one or more dataset objects stored within the metadata file (MARSCHNER Fig. 3, Col. 8, lines (45-46): “The dataset store 380 stores datasets and metadata describing the datasets.”;  and
Fig. 4, Col. 12, lines (42-51): “A type of the attribute is associated with certain formatting rules (or type rules) associated with the data. An attribute 430 defines characteristics of the data of the attribute. For example, the attribute 430b represents a URL that is expected to be of the format “http:” followed by a website address. The attribute 430 storing description of the farmer's markets may be associated with the attribute 430 that the text representing the description may not include certain special characters, such as ‘?’.”; and
Fig. 13, Col. 13, lines (56-67): “The user interface manager 310 receives 500 information identifying a dataset. The information identifying the dataset may be an address of a file stored locally on the data preprocessing system 100, a URI (uniform resource identifier) of a file on a remote system, a file on an external storage attached to the data preprocessing system, and so on. The data preprocessing system 100 uploads the dataset and may store the dataset in the dataset store 380 or may simply store metadata describing the data in the dataset store 380 such that the data itself may be retrieved from the source identified”); and 
extracting schemas associated with each of the one or more dataset objects (MARSCHNER Fig. 3, Col. 8, lines (59-66): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators. By analyzing the various data values corresponding to columns, the metadata module 320 infers types of each column.”, 
the examiner notes that the system determines the stored metadata/attributes associated with the data according to defined rules/characteristics, which corresponds to a metadata file associated with the composite dataset).  

Regarding claim 18 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 17. Further, MARSCHNER teaches wherein parsing the metadata file further comprises identifying a schema in the schemas that includes at least one column in the set of columns (MARSCHNER Fig. 3, Col. 8, lines (59-66): “The metadata module 320 determines metadata describing the datasets received by the data preprocessing system 100. …, the metadata module 320 takes a sample of rows and identifies row separators and column separators. By analyzing the various data values corresponding to columns, the metadata module 320 infers types of each column.”, the examiner notes that the reference discloses a data layout with columns, which corresponds to a schema including at least one column).  

Regarding claim 19 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 15.  Further, MARSCHNER teaches wherein loading data from the one or more datasets comprises identifying file paths associated with the one or more datasets and loading data from files stored at the file paths (MARSCHNER, Col. 2, lines (19-28): “The data preprocessing system builds a transformation script for processing the dataset by performing the following steps iteratively. The data preprocessing system receives a selection of an attribute of the composite attribute based on the structural representation of a value of the composite attribute. The data preprocessing system determines an expression representing the path of the selected attribute within the composite attribute. The data preprocessing system extracts values of the selected attribute from records of the dataset.”).  

Regarding claim 20 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 19.  Further, MARSCHNER teaches wherein executing the script comprises combining the one or more datasets to form a second composite dataset and using the second composite dataset while executing the script (MARSCHNER Fig. 2, Col. 9, lines (43-49): “The transformation execution engine 250 receives transformations and executes the transformations for a given set of input datasets. In an embodiment, the transformation execution engine 250 receives transformation script and executes the transformation script for a given set of input datasets. A transformation script is a representation of a sequence (or set) of transformations.”; and 
Col. 9, lines (50-61): “The transformation execution engine 250 includes instructions to execute various operators associated with the transformations. …, joining two or more datasets based on join keys, …”).

Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent (US 10,733,198 B1) issued to Marschner et al. (hereinafter as “MARSCHNER”), in view of US Patent Application Publication (US 2020/0394166 A1) issued to Vanhooser (hereinafter as “VANHOOSER”), and in view of US Patent Application Publication (US 2013/0332449 A1) issued to Amos et al. (hereinafter as “AMOS”).

Regarding claim 2 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 1.
However, the combination of MARSCHNER and VANHOOSER does not explicitly  wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands.
But, AMOS teaches wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands (AMOS Abstract, lines (1-3): “The present invention provides a computer-implemented code generation system that generates data processing code from a directed acyclic graph (DAG).”; and
Para. [0009], lines (3-5): “…, a table is defined as a collection of data values that has one or more columns and zero or more rows. If a table has zero rows then the table is empty. Each column has a name and data type (e.g., character, number, or date).”; and 
Para. [0010], lines (1-5): “Individuals can build a data processing model using a Directed Acyclic Graph (DAG) that shows the flow of data from input tables to output tables. Each node has attributes that specify a number of input tables, a number of output tables, and the operations performed on the data.”; and
Para. [0011], lines (2-12): “…, an open-source data-mining tool called KNIME is used to build a DAG. KNIME saves the DAG in XML files. ... The resulting DAG-XML file is used to generate Pig Latin and User Defined Functions (UDF) Java Archive (JAR) files for Apache Pig, or SQL scripts for a relational database. The resulting scripts are then run in Apache Pig or a relational database to process the data and produce the results.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of MARSCHNER (disclosing a data preprocessing system that builds transformation scripts for preprocessing datasets) and VANHOOSER (disclosing rules-based dataset management methods) to include the teachings of AMOS (disclosing generating data processing code from a directed acyclic graph) and arrive at a method of analyzing a directed acyclic graph representing the script to identify associated data information.  One of ordinary skill in the art would have been motivated to make this combination because generating transformation script files based on DAGs and running the resulting scripts enables the system user to efficiently process the input data and produce the desired results, as recognized by AMOS (Abstract; Para. [0011]). In addition, the references of MARSCHNER, VANHOOSER and AMOS are analogous art because they are directed to the same field of endeavor of dataset transformation processing.

Regarding claim 9 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 8.
However, the combination of MARSCHNER and VANHOOSER does not explicitly teach wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands.
But, AMOS teaches wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands (AMOS Abstract, lines (1-3): “The present invention provides a computer-implemented code generation system that generates data processing code from a directed acyclic graph (DAG).”; and
Para. [0009], lines (3-5): “…, a table is defined as a collection of data values that has one or more columns and zero or more rows. If a table has zero rows then the table is empty. Each column has a name and data type (e.g., character, number, or date).”; and 
Para. [0010], lines (1-5): “Individuals can build a data processing model using a Directed Acyclic Graph (DAG) that shows the flow of data from input tables to output tables. Each node has attributes that specify a number of input tables, a number of output tables, and the operations performed on the data.”; and
Para. [0011], lines (2-12): “…, an open-source data-mining tool called KNIME is used to build a DAG. KNIME saves the DAG in XML files. ... The resulting DAG-XML file is used to generate Pig Latin and User Defined Functions (UDF) Java Archive (JAR) files for Apache Pig, or SQL scripts for a relational database. The resulting scripts are then run in Apache Pig or a relational database to process the data and produce the results.”).
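It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of MARSCHNER (disclosing a data preprocessing system that builds transformation scripts for preprocessing datasets) and VANHOOSER (disclosing rules-based dataset management methods) to include the teachings of AMOS (disclosing generating data processing code from a directed acyclic graph) and arrive at a method of analyzing a directed acyclic graph representing the script to identify associated data information.  One of ordinary skill in the art would have been motivated to make this combination because generating transformation script files based on DAGs and running the resulting scripts enables the system user to efficiently process the input data and produce the desired results, as recognized by AMOS (Abstract; Para. [0011]). In addition, the references of MARSCHNER, VANHOOSER and AMOS are analogous art because they are directed to the same field of endeavor of dataset transformation processing.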

Regarding claim 16 (Currently Amended), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 15.
However, the combination of MARSCHNER and VANHOOSER does not explicitly teach wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands.
But, AMOS teaches wherein pre-processing the script comprises analyzing a directed acyclic graph representing the script to identify one or more column names included in the commands (AMOS Abstract, lines (1-3): “The present invention provides a computer-implemented code generation system that generates data processing code from a directed acyclic graph (DAG).”; and
Para. [0009], lines (3-5): “…, a table is defined as a collection of data values that has one or more columns and zero or more rows. If a table has zero rows then the table is empty. Each column has a name and data type (e.g., character, number, or date).”; and 
Para. [0010], lines (1-5): “Individuals can build a data processing model using a Directed Acyclic Graph (DAG) that shows the flow of data from input tables to output tables. Each node has attributes that specify a number of input tables, a number of output tables, and the operations performed on the data.”; and
Para. [0011], lines (2-12): “…, an open-source data-mining tool called KNIME is used to build a DAG. KNIME saves the DAG in XML files. ... The resulting DAG-XML file is used to generate Pig Latin and User Defined Functions (UDF) Java Archive (JAR) files for Apache Pig, or SQL scripts for a relational database. The resulting scripts are then run in Apache Pig or a relational database to process the data and produce the results.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of MARSCHNER (disclosing a data preprocessing system that builds transformation scripts for preprocessing datasets) and VANHOOSER (disclosing rules-based dataset management methods) to include the teachings of AMOS (disclosing generating data processing code from a directed acyclic graph) and arrive at a method of analyzing a directed acyclic graph representing the script to identify associated data information.  One of ordinary skill in the art would have been motivated to make this combination because generating transformation script files based on DAGs and running the resulting scripts enables the system user to efficiently process the input data and produce the desired results, as recognized by AMOS (Abstract; Para. [0011]). In addition, the references of MARSCHNER, VANHOOSER and AMOS are analogous art because they are directed to the same field of endeavor of dataset transformation processing.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent (US 10,733,198 B1) issued to Marschner et al. (hereinafter as “MARSCHNER”), in view of US Patent Application Publication (US 2020/0394166 A1) issued to Vanhooser (hereinafter as “VANHOOSER”), and in view of US Patent Application Publication (US 2020/0409952 A1) issued to Dean et al. (hereinafter as “DEAN”).
Regarding claim 7 (Original), the combination of MARSCHNER and VANHOOSER teaches the limitations of claim 1.
However, the combination of MARSCHNER and VANHOOSER does not explicitly teach executing a predicate push down procedure prior to loading the data.
But, DEAN teaches executing a predicate push down procedure prior to loading the data (DEAN Para. [0041], lines (10-16): “For any remaining blocks that are not cached, the SQL processing engine can perform a partial predicate pushdown to remove the blocks which are already cached such that only the non-cached blocks are identified and fetched from the blockchain ledger.”; the examiner notes that a predicate pushdown that removes the cached blocks (i.e., data) is performed before the non-cached blocks are fetched, which corresponds to executing a predicate push down procedure prior to loading the data).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of MARSCHNER (disclosing a data preprocessing system that builds transformation scripts for preprocessing datasets) and VANHOOSER (disclosing rules-based dataset management methods) to include the teachings of DEAN (disclosing a database predicate processing engine) and arrive at a method of performing a predicate push down procedure.  One of ordinary skill in the art would have been motivated to make this combination because performing a predicate pushdown procedure on data blocks enables, for example, the SQL processing engine to fetch only a specific subset of blocks, thereby speeding up access to the desired data, as recognized by DEAN (Abstract; Para. [0038]). In addition, the references of MARSCHNER, VANHOOSER and DEAN are analogous art because they are directed to the same field of endeavor of dataset transformation processing.  

Regarding claim 14 (Original), the combination of MARSCHNER and VANHOOSER teaches the limitations of the non-transitory computer-readable storage medium of claim 8.
However, the combination of MARSCHNER and VANHOOSER does not explicitly teach executing a predicate push down procedure prior to loading the data.
But, DEAN teaches executing a predicate push down procedure prior to loading the data (DEAN Para. [0041], lines (10-16): “For any remaining blocks that are not cached, the SQL processing engine can perform a partial predicate pushdown to remove the blocks which are already cached such that only the non-cached blocks are identified and fetched from the blockchain ledger.”; the examiner notes that a predicate pushdown that removes the cached blocks (i.e., data) is performed before the non-cached blocks are fetched, which corresponds to executing a predicate push down procedure prior to loading the data).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teachings of MARSCHNER (disclosing a data preprocessing system that builds transformation scripts for preprocessing datasets) and VANHOOSER (disclosing rules-based dataset management methods) to include the teachings of DEAN (disclosing a database predicate processing engine) and arrive at a method of performing a predicate push down procedure.  One of ordinary skill in the art would have been motivated to make this combination because performing a predicate pushdown procedure on data blocks enables, for example, the SQL processing engine to fetch only a specific subset of blocks, thereby speeding up access to the desired data, as recognized by DEAN (Abstract; Para. [0038]). In addition, the references of MARSCHNER, VANHOOSER and DEAN are analogous art because they are directed to the same field of endeavor of dataset transformation processing.  

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Kumar et al. (US 2015/0234870 A1): “Methods for dynamic mapping of extensible datasets to relational database schemas”.
Koenig et al. (US 2019/0114335 A1): “Database storing mixed datasets raw and updated”.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Zuheir Mheir whose telephone number is (571)272-4151.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
12/28/2021

/ZUHEIR A MHEIR/Patent Examiner, Art Unit 2162     


/Hares Jami/Primary Examiner, Art Unit 2162