DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 16-18 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 16-18 recite “the data analytics system” in line 1. There is insufficient antecedent basis for this limitation in the claims. Independent claim 15, from which claims 16-18 depend, recites only “a data analytics method”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-7 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Jones (US 11,269,911) in view of Park (US 2018/0075163) and further in view of Dugan (US 2020/0210427).
Regarding claim 1, Jones discloses:
A data analytics system, comprising: at least one processor; and at least one non-transitory computer-readable medium containing instructions that, when executed by the at least one processor, cause the data analytics system to perform operations comprising: creating at least one data storage at least by ([col. 5, lines 61-64] “Data storage service(s) 210 may also include various kinds of object or file data stores for putting, updating, and getting data objects or files, which may include data files of unknown file type.”);
creating a metadata store separate from the at least one data storage at least by ([col. 8, lines 55-66] “ETL service 220 may maintain data catalogs 360 that describe data sets (stored in provider network 200 or in external storage locations). ETL service 220 may identify unknown data objects, identify a data format for the unknown data objects and store the data format in a data catalog for the unknown data objects.” [col. 9, lines 11-23] “Storage for data catalog(s) 360 may be implemented by one or more storage nodes, services, or computing devices (e.g., system 1000 discussed below with regard to FIG. 10) to provide persistent storage for data catalogs generated by data catalog service 200. Such storage nodes (or other storage components of storage for data catalog(s) 360) may implement various query processing engines or other request handling components to provide access to data catalogs according to requests received via interface 310. For example, data catalog storage may be implemented as a non-relational database, in one embodiment, that stores file types and other metadata for data objects in table”) and ETL service 220 stores the metadata of the data objects within catalogs separately from data storage service 210, which stores the data objects themselves as shown in at least Fig. 2;
creating a flow storage at least by ([col. 8, lines 16-23, 24-30] “ETL job creation 320 may then generate code for selected transformations and construct the source code for executing the selected transformations. The code for the ETL job may be stored in ETL job store 350 for subsequent execution… ETL job creation 320 may also implement manual creation of ETL jobs. For example, transformation operations may be manually selected, combined, or assembled via graphical user interface to define a workflow of transformations to apply. Code corresponding to the workflow may be generated (or supplied by a user), edited, and stored for subsequent execution as part of ETL job store 350.”) and the flow storage is ETL job store 350;
and configuring a flow service using first received instructions to: obtain a first flow from the flow storage at least by ([col. 3, lines 17-19] “FIG. 1 illustrates a logical block diagram of using specified performance attributes to configure machine learning pipeline stages for an Extract Transform Load (ETL) job” [col. 12, lines 53-57] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job.”) and the first flow is any of the ETL jobs;
obtain metadata from the metadata storage at least by ([col. 6, lines 22-26] “ETL service 220 may access a data catalog generated by ETL service 220 in order to perform an ETL operation (e.g., a job to convert a data object from one file type into one or more other data objects of a different file type).” [col. 9, lines 11-23] “Storage for data catalog(s) 360 may be implemented by one or more storage nodes, services, or computing devices (e.g., system 1000 discussed below with regard to FIG. 10) to provide persistent storage for data catalogs generated by data catalog service 200. Such storage nodes (or other storage components of storage for data catalog(s) 360) may implement various query processing engines or other request handling components to provide access to data catalogs according to requests received via interface 310. For example, data catalog storage may be implemented as a non-relational database, in one embodiment, that stores file types and other metadata for data objects in table”);
and execute the flow, flow execution including: obtaining input data from the at least one data storage at least by ([col. 12, lines 57-61] “ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200).”);
generating output data at least in part by validating, transforming … the input data using the metadata at least by ([col. 3, lines 31-43] “ETL system 110 may perform an ETL job 110 which may identify the operations to perform the ETL job 110. For example, extract data operation 130 may identify where source data for the ETL job is to be found (e.g., network address, storage location, file handle, object identifier, etc.), what format the source data is in (e.g., file format, encryption scheme, compression scheme, etc.), and how to access the data (e.g., identity tokens, credentials, passwords, etc.). Transform data operations, such as transform data operation 140, may perform various types of transformations, such as dropping or filtering columns or fields, joining values, mapping values, renaming fields, splitting fields, unboxing fields, splitting rows, and so on” [col. 9, lines 51-64] “ETL job configuration 420 may handle requests or solicited input to select and configuration transformation operations 404… Requests or solicited input to select and configure transformation operations 404 may include what transformation operation (e.g., machine learning pipeline, data mapping, data filtering, data splitting, data joining, storage format conversion, etc.), various operational parameters for performing the transformation operation (e.g., which columns to join into a single column), among other information for performing a transform data operation” [col. 12, lines 24-39] “ETL job creation interface 610 illustrates an example of pipeline quality metrics 660. Pipeline quality metrics 660 may be displayed, which may identify various metrics, such as metrics 662 a and 662 b, and respective explanations. For example, metrics such as pipeline precision, recall, area under precision recall curve (AUPRC), or accuracy score (e.g., max F1) may be explained including the impact of performance attributes upon the quality metrics 660 in some embodiments.”) and the metadata is the operational parameters used to configure the ETL jobs that transform the data, together with the performance attributes underlying the displayed quality metrics (validating);
providing the output data for storage in the at least one data storage at least by ([0211] “ETL system may perform ETL jobs, such as ETL job 120, to access one or more data stores to retrieve data from one or more sources, perform one or more transformations on the retrieved data, and then store the data in storage location (e.g., different than the location from which the data was taken), in some embodiments. ETL system 110 may implement one or multiple computing systems, such as computing system 1000 discussed below with regard to FIG. 10.”).
Jones fails to disclose “…and serializing the input data using the metadata; generating additional metadata describing the output data; and providing the additional metadata for storage in the metadata storage”.
However, Park teaches …and serializing the input data using the metadata at least by ([0169] “the processing depicted in FIG. 12 may be performed by a node in the cluster of computing nodes 1012 in the distributed computing system 1002 each time a batch of events is received via a task 1020 (shown in FIG. 10).” [0152] “the disclosed distributed event processing system may be configured to perform the serialization and de-serialization of event data received via a continuous event stream. The serialization and de-serialization of event data enables the conversion of complex data objects in memory into sequences of bits that can be transferred to the computing nodes in the distributed event processing system” [0175] “FIG. 13A is an example flow diagram of a process 1300 that describes a set of operations for generating a set of serialized data values for a numeric attribute of an event”) and the metadata is the numeric attribute of the event for which the data values of the events are serialized.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Park into the teaching of Jones because the references similarly disclose data pipelines and/or flows. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Jones to include the serialization of input event data because such serialization “reduces latencies in exchanging input and output events between the processing nodes in the distributed event processing system and improves the overall performance of the distributed event processing system” (Park, [0152]).
Jones and Park fail to disclose “generating additional metadata describing the output data; and providing the additional metadata for storage in the metadata storage”.
However, Dugan teaches the above limitations at least by ([0040] “Datastore 270 may be part of datastore 105 or pipeline repository 107 or be separate from datastore 105 or pipeline repository 107 of FIG. 1.” [0060] “Responsive to determining that one or more source columns are associated with a column level access control policy, column level access control manger 240 can propagate the column level access control policy to the target column of the target dataset. For example, the column level access control manager 240 can associate the column level access control policy with the target column by, for example, storing the column level access control policy with the target column metadata or associating a pointer with the target column pointing to the respective one of the column level access control policies 280 stored at datastore 270”) and the metadata storage is datastore 270, which can be separate from the other datastores and stores propagated column level access control policies (additional metadata) within the target column metadata.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Dugan into the teaching of Jones and Park because the references similarly disclose data pipelines and/or flows. Consequently, one of ordinary skill in the art would be motivated to modify the system as in the combination of references to further include generating additional column metadata because “automatically generating target column metadata that includes column lineage metadata improves the trustworthiness of a dataset and aids in the detection and correction of errors, which improves overall system performance” (Dugan, [0023]).
As per claim 2, claim 1 is incorporated, Jones further discloses:
wherein: the flow service is configured as a stateless service at least by ([col. 12, lines 53-61] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job. ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200)”) and the jobs can be performed in parallel, meaning that each job executes independently and does not depend on the state or data of another job in order to operate.
As per claim 3, claim 1 is incorporated, Jones further discloses:
wherein: the first flow specifies a sequence of stages, at least one of the stages specifying a data transformation at least by ([col. 3, lines 62-67] “A machine learning pipeline, like machine learning pipeline 150 may implement multiple data processing stages, such as stages 152, 154, 156, and 158, to apply various techniques, like pre-processing, item selection, analysis, result refinement, in order to identify similar items, as discussed in more detail below with regard to FIG. 5.”).
As per claim 4, claim 1 is incorporated, Jones further discloses:
the operations further include creating an artifact storage; the first flow specifies a first data transformation; wherein the flow service is further configurable to obtain an artifact implementing the first data transformation from the artifact storage; and wherein generating output data includes executing the artifact to perform the first data transformation at least by ([col. 12, lines 53-61] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job. ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200).”) and the artifact storage is the ETL job store 350.
As per claim 5, claim 4 is incorporated, Jones further discloses:
wherein: the artifact comprises a script, executable binary, or module at least by ([col. 18, lines 51-61] “various examples of the ETL service including different components/modules, or arrangements of components/module that may be employed as part of implementing the ETL service are discussed. A number of different methods and techniques to implement using specified performance attributes to configure machine learning pipeline stages for an ETL job are then discussed, some of which are illustrated in accompanying flowcharts. Finally, a description of an example computing system upon which the various components, modules, systems, devices, and/or nodes may be implemented is provided.”).
As per claim 6, claim 1 is incorporated. Jones and Park fail to disclose “wherein: the metadata includes access metadata; and the flow service is further configurable to determine, using the access metadata, an authorization to: access the input data; or execute an object implementing a transformation of the input data, the transformation specified in the first flow”.
However, Dugan teaches the following limitations, wherein: the metadata includes access metadata at least by ([0023] “The generated target column metadata can include user comment metadata, column level access control policy metadata, or column lineage metadata.”);
and the flow service is further configurable to determine, using the access metadata, an authorization to: access the input data; or execute an object implementing a transformation of the input data, the transformation specified in the first flow at least by ([0015] “Aspects of the present disclosure are directed to metadata generation for columns of a dataset. The dataset may be used or created as part of a data pipeline. A data pipeline may refer to an ordered set of logic (e.g., a collection of computer software scripts or programs) that performs a multi-step transformation of data obtained from data sources to produce one or more output datasets.” [0059] “one or more source columns of the source dataset can be associated with one or more column level access control policies. In some embodiments, the column level access control policy can be stored or otherwise identified in source column metadata. A column level access control policy can restrict access (e.g., read, write, copy, view, or access) to a column or the data therein to persons or operations having adequate authority.”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Dugan into the teaching of Jones and Park because the references similarly disclose data pipelines and/or flows. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to include the access control policies as in Dugan because doing so “enable[s] improved data security” (Dugan, [0023]).
As per claim 7, claim 1 is incorporated, Jones further discloses:
wherein: the metadata specifies at least one of a physical or logical location of the input data; and the flow service is configured to access the input data using the specified physical or logical location at least by ([col. 3, lines 31-38] “ETL system 110 may perform an ETL job 110 which may identify the operations to perform the ETL job 110. For example, extract data operation 130 may identify where source data for the ETL job is to be found (e.g., network address, storage location, file handle, object identifier, etc.), what format the source data is in (e.g., file format, encryption scheme, compression scheme, etc.), and how to access the data (e.g., identity tokens, credentials, passwords, etc.).” [col. 7, lines 57-62] “ETL service 220 may provide access to data catalogs 360 and ETL jobs (for creation, management, and execution) via interface 310, which may be a programmatic interface (e.g., Application Programming Interface (API)), command line interface, and/or graphical user interface, in various embodiments.” [col. 9, lines 32-46] “The selection and configuration of an extraction operation input 402 may include where source data for the ETL job is to be found (e.g., network address, storage location, file handle, object identifier, etc.), what format the source data is in (e.g., file format, encryption scheme, compression scheme, etc.), and how to access the data (e.g., identity tokens, credentials, passwords, etc.)…For example, ETL job creation 320 may implement job configuration feature 420 to handle requests to create and configure ETL jobs. For example, job configuration 420 may implement a series of interactions via a GUI to guide a user through the configuration of an ETL job. For instance, job configuration 420 may solicit input that selects and configures an extraction operation 402 via interface 310.”).
As per claim 12, claim 1 is incorporated, Jones further discloses:
wherein the flow specifies that the output data can be accessed using at least one of GraphQL, SOAP, Odata, and OpenAPI at least by ([col. 18, lines 41-48] “In various embodiments, a network-based service may be requested or invoked through the use of a message that includes parameters and/or data associated with the network-based services request. Such a message may be formatted according to a particular markup language such as Extensible Markup Language (XML), and/or may be encapsulated using a protocol such as Simple Object Access Protocol (SOAP).”).
As per claim 13, claim 1 is incorporated, Jones further discloses:
wherein: the flow service is configured to execute the flow in response to storage of the input data into the at least one data storage at least by ([col. 6, lines 14-16] “ETL service 220 may also perform ETL jobs that extract, transform, and load from one or more of the various data storage service(s) 210 to another location.” [col. 3, lines 17-19] “FIG. 1 illustrates a logical block diagram of using specified performance attributes to configure machine learning pipeline stages for an Extract Transform Load (ETL) job”).

Claims 8-9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jones (US 11,269,911) in view of Park (US 2018/0075163) and Dugan (US 2020/0210427) and further in view of Fram (US 2015/0101066).
As per claim 8, claim 1 is incorporated, Jones further discloses:
wherein: the flow service is further configured, using received second instructions, to: obtain a second flow from the flow storage at least by ([col. 12, lines 53-57] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job.” [col. 10, lines 22-23] “ETL job code generation 430 may store encoded ETL jobs 470 in ETL job store 350.”) and ETL job store 350 stores multiple ETL jobs which can be identified by ETL job execution workers.
Dugan further discloses:
obtain the additional metadata from the metadata storage at least by ([0040] “Datastore 270 may be part of datastore 105 or pipeline repository 107 or be separate from datastore 105 or pipeline repository 107 of FIG. 1.” [0060] “Responsive to determining that one or more source columns are associated with a column level access control policy, column level access control manger 240 can propagate the column level access control policy to the target column of the target dataset. For example, the column level access control manager 240 can associate the column level access control policy with the target column by, for example, storing the column level access control policy with the target column metadata or associating a pointer with the target column pointing to the respective one of the column level access control policies 280 stored at datastore 270”);
and execute the second flow using output data obtained from the at least one data storage and the additional metadata, second flow execution… at least by ([0015] “Aspects of the present disclosure are directed to metadata generation for columns of a dataset. The dataset may be used or created as part of a data pipeline. A data pipeline may refer to an ordered set of logic (e.g., a collection of computer software scripts or programs) that performs a multi-step transformation of data obtained from data sources to produce one or more output datasets.” [0027] “datastore 105 storing the underlying data (e.g., enterprise data), and pipeline repository 107 storing one or more data pipelines.”) and the second flow is one of the one or more data pipelines, such as those stored within pipeline repository.
Jones, Park, and Dugan fail to disclose “… execution including generating a view of at least some of the output data using the additional metadata; and the view is provided for display on a user device”.
However, Fram teaches the following limitations, … execution including generating a view of at least some of the output data using the additional metadata; and the view is provided for display on a user device at least by ([0023] “As will be appreciated from context, the verb “restricting” refers to the actions taken in response to instructions executing in a processor to control information flow such as to a particular display” [0044] “The Device Interaction Database 164 provides storage and indexing of data related to the input and output functionality or capabilities of a display of a given type or particular device identification (e.g., a MAC address in the case of a device with an integrated display).” [0054] “The hardware connected on this path can be analyzed by code executing in a processor in order to determine the device type and/or ID of the device to which the dataset is to be transmitted. Whether transmission of the dataset is to be restricted can be assessed in view of this determination. If the device type or device ID is known to or approved by the system (e.g., based on look-ups to stored reference or authorization data), then that transmission can proceed.” [0057] “The system enables a user to review some portions of private or restricted data on standard display devices, e.g. computers, smartphones, computer tablets, interactive displays, while still ensuring that other portions that are more sensitive, are only accessible to a user on a “private display,” i.e., those displays that are not easily viewed by others nearby.” [0085] “Once the user is authenticated and the device characteristics have been evaluated by the system according to steps 631-633 of FIG. 6 c, the data requested is either transmitted to the display device or a placeholder statement is transmitted to the display device, according to the privacy level of the device, the user credentials and the privacy level of the information requested, as in step 635.”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Fram into the teaching of Jones, Park, and Dugan because the references similarly disclose the processing and/or displaying of stored data. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to include restricting the display of data as in Fram, which “allows a user to efficiently interact with a computer with a standard computer monitor, but only view restricted information using devices that have “private displays,”” (Fram, [0010]) in order to protect sensitive data.
As per claim 9, claim 8 is incorporated, Jones further discloses:
…flow service… at least by ([col. 3, lines 17-19] “FIG. 1 illustrates a logical block diagram of using specified performance attributes to configure machine learning pipeline stages for an Extract Transform Load (ETL) job” [col. 12, lines 53-57] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job.”) and the flow service is any of the ETL jobs, each of which involves machine learning pipeline stages.
Fram further discloses:
wherein: the … service is further configured to determine an authorization to generate the view using the additional metadata and an identity associated with the user device at least by ([0044] “The Device Interaction Database 164 provides storage and indexing of data related to the input and output functionality or capabilities of a display of a given type or particular device identification (e.g., a MAC address in the case of a device with an integrated display).” [0054] “The hardware connected on this path can be analyzed by code executing in a processor in order to determine the device type and/or ID of the device to which the dataset is to be transmitted. Whether transmission of the dataset is to be restricted can be assessed in view of this determination. If the device type or device ID is known to or approved by the system (e.g., based on look-ups to stored reference or authorization data), then that transmission can proceed.” [0085] “Once the user is authenticated and the device characteristics have been evaluated by the system according to steps 631-633 of FIG. 6 c, the data requested is either transmitted to the display device or a placeholder statement is transmitted to the display device, according to the privacy level of the device, the user credentials and the privacy level of the information requested, as in step 635.”).
As per claim 19, claim 1 is incorporated, Jones further discloses:
wherein: the flow service is further configured, using received second instructions, to: obtain a second flow from the flow storage at least by ([col. 12, lines 53-57] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job.” [col. 10, lines 22-23] “ETL job code generation 430 may store encoded ETL jobs 470 in ETL job store 350.”) and ETL job store 350 stores multiple ETL jobs which can be identified by ETL job execution workers.
Dugan further discloses:
obtain the additional metadata from the metadata storage at least by ([0040] “Datastore 270 may be part of datastore 105 or pipeline repository 107 or be separate from datastore 105 or pipeline repository 107 of FIG. 1.” [0060] “Responsive to determining that one or more source columns are associated with a column level access control policy, column level access control manger 240 can propagate the column level access control policy to the target column of the target dataset. For example, the column level access control manager 240 can associate the column level access control policy with the target column by, for example, storing the column level access control policy with the target column metadata or associating a pointer with the target column pointing to the respective one of the column level access control policies 280 stored at datastore 270”);
and based on the determination, execute the second flow using output data obtained from the at least one data storage and the additional metadata, second flow execution… at least by ([0015] “Aspects of the present disclosure are directed to metadata generation for columns of a dataset. The dataset may be used or created as part of a data pipeline. A data pipeline may refer to an ordered set of logic (e.g., a collection of computer software scripts or programs) that performs a multi-step transformation of data obtained from data sources to produce one or more output datasets.” [0027] “datastore 105 storing the underlying data (e.g., enterprise data), and pipeline repository 107 storing one or more data pipelines.”) and the second flow is one of the one or more data pipelines, such as those stored within pipeline repository.
Jones, Park, and Dugan fail to disclose “determine an authorization to generate the view using the additional metadata and an identity associated with a user device; … execution including generating a view of at least some of the output data using the additional metadata; and the view is provided for display on a user device”.
However, Fram teaches the following limitations, determine an authorization to generate the view using the additional metadata and an identity associated with a user device at least by ([0044] “The Device Interaction Database 164 provides storage and indexing of data related to the input and output functionality or capabilities of a display of a given type or particular device identification (e.g., a MAC address in the case of a device with an integrated display).” [0054] “The hardware connected on this path can be analyzed by code executing in a processor in order to determine the device type and/or ID of the device to which the dataset is to be transmitted. Whether transmission of the dataset is to be restricted can be assessed in view of this determination. If the device type or device ID is known to or approved by the system (e.g., based on look-ups to stored reference or authorization data), then that transmission can proceed.” [0085] “Once the user is authenticated and the device characteristics have been evaluated by the system according to steps 631-633 of FIG. 6 c, the data requested is either transmitted to the display device or a placeholder statement is transmitted to the display device, according to the privacy level of the device, the user credentials and the privacy level of the information requested, as in step 635.”);
… execution including generating a view of at least some of the output data using the additional metadata; and the view is provided for display on a user device at least by ([0023] “As will be appreciated from context, the verb “restricting” refers to the actions taken in response to instructions executing in a processor to control information flow such as to a particular display” [0044] “The Device Interaction Database 164 provides storage and indexing of data related to the input and output functionality or capabilities of a display of a given type or particular device identification (e.g., a MAC address in the case of a device with an integrated display).” [0054] “The hardware connected on this path can be analyzed by code executing in a processor in order to determine the device type and/or ID of the device to which the dataset is to be transmitted. Whether transmission of the dataset is to be restricted can be assessed in view of this determination. If the device type or device ID is known to or approved by the system (e.g., based on look-ups to stored reference or authorization data), then that transmission can proceed.” [0057] “The system enables a user to review some portions of private or restricted data on standard display devices, e.g. computers, smartphones, computer tablets, interactive displays, while still ensuring that other portions that are more sensitive, are only accessible to a user on a “private display,” i.e., those displays that are not easily viewed by others nearby.” [0085] “Once the user is authenticated and the device characteristics have been evaluated by the system according to steps 631-633 of FIG. 6 c, the data requested is either transmitted to the display device or a placeholder statement is transmitted to the display device, according to the privacy level of the device, the user credentials and the privacy level of the information requested, as in step 635.”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Fram into the teaching of Jones, Park, Dugan because the references similarly disclose the processing and/or displaying of stored data. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include restricting the display of data as in Fram which “allows a user to efficiently interact with a computer with a standard computer monitor, but only view restricted information using devices that have “private displays,”” ([Fram], [0010]) in order to protect sensitive data.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Jones (US 11,269,911) in view of Park (US 2018/0075163) and Dugan (US 2020/0210427) and Fram (US 2015/0101066) and further in view of Huber (US 2020/0186444) and Chan (US 2008/0320050).
As per claim 10, claim 8 is incorporated, Fram further discloses:
wherein: the data analytics system further includes a serving layer configured to: receive the view at least by ([0054] “The hardware connected on this path can be analyzed by code executing in a processor in order to determine the device type and/or ID of the device to which the dataset is to be transmitted. Whether transmission of the dataset is to be restricted can be assessed in view of this determination. If the device type or device ID is known to or approved by the system (e.g., based on look-ups to stored reference or authorization data), then that transmission can proceed.” [0057] “The system enables a user to review some portions of private or restricted data on standard display devices, e.g. computers, smartphones, computer tablets, interactive displays, while still ensuring that other portions that are more sensitive, are only accessible to a user on a “private display,” i.e., those displays that are not easily viewed by others nearby.” [0085] “Once the user is authenticated and the device characteristics have been evaluated by the system according to steps 631-633 of FIG. 6 c, the data requested is either transmitted to the display device or a placeholder statement is transmitted to the display device, according to the privacy level of the device, the user credentials and the privacy level of the information requested, as in step 635.”).
Jones, Park, Dugan, and Fram fail to disclose “determine a delivery API format based on a characteristic of the user device; transform the view for provision to the user device in the delivery API format; and provide the transformed view to the user device”.
However, Huber teaches the following limitations, determine a delivery API format based on a characteristic of the user device at least by ([0056] “The analytics engine 700 may provide an output to a requesting function or user. In some embodiments, an API can be called with information such as a device or application identifier and specified content package. The API may be configured to receive electronic messages that encode identifiers indicative of a data package analysis request for fulfillment by the analytics engine 700.”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Huber into the teaching of Jones, Park, Dugan, Fram because the references similarly disclose the processing and/or displaying of stored data. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the determining of an API for delivering data based on a characteristic of the device in order to avoid any potential errors or issues when displaying the data.
Jones, Park, Dugan, Fram, and Huber fail to disclose “transform the view for provision to the user device in the delivery API format; and provide the transformed view to the user device”.
However, Chan teaches the following limitations, transform the view for provision to the user device in the delivery API format at least by ([0013] “application programming interfaces (APIs) may be provided in the modified data view module which, when exposed, facilitate the building of customized data views by providing asynchronous update behavior for customized data view modules in a web page” [0020] “Thus, it will be appreciated that a data view module developer may “call” the exposed APIs 112 to provide access to the asynchronous update functions 108 when building on top of developer data form web part classes to create customized data view modules.”) and the transforming of the views is the building of the customized views using the APIs 112 (delivery API format) as shown in Fig. 1;
and provide the transformed view to the user device at least by ([0022] “The client computer 120 is in communication with the web server 102 and may include a data view design application 122 and a browser 124. As discussed above, the client computer 120 may be configured to receive web pages including data views from the web server 102. In accordance with various embodiments, the data view design application 122 may provide a user interface which may be utilized by a user to open a website, select data (from the SQL database 116 for example) for display in a data view by a modified data view module 110, and insert the data view in a web page for viewing and for asynchronous updating in the browser 124, without having to reload or refresh the web page.” [0038] “From operation 415, the routine 400 continues to operation 420, where, in response to user input, the server applications 104 insert the modified data view module (e.g., one of the modified data view modules 110) into a web page for displaying a data view. From operation 420, the routine 400 continues to operation 425 where the inserted modified data module 110 updates the displayed data view without reloading the web page.”) and the client computer is the user device.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Chan into the teaching of Jones, Park, Dugan, Fram, Huber because the references similarly disclose the processing and/or displaying of stored data. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the APIs as in Chan to “facilitate the building of customized data views by providing asynchronous update behavior for customized data view modules in a web page” (Chan, [0014]).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Jones (US 11,269,911) in view of Park (US 2018/0075163) and Dugan (US 2020/0210427) and further in view of Thatte (US 2020/0125540).
As per claim 11, claim 1 is incorporated, Jones, Park, and Dugan fail to disclose “wherein the flow comprises a JSON or YAML object”.
However, Thatte teaches the above limitation at least by ([0067] “This configuration may be captured as rules, and may be stored as part of a JSON definition that represents the pipeline.”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Thatte into the teaching of Jones, Park, Dugan because the references similarly disclose data pipelines and/or flows. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the pipelines as JSON definitions as in Thatte in order to be able to disparately store and call the definitions as needed for improved storage and processing efficiency.

Claims 14-18, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Jones (US 11,269,911) in view of Park (US 2018/0075163) and Dugan (US 2020/0210427) and further in view of Todd (US 2021/0124727).
As per claim 14, claim 1 is incorporated, Jones, Park, and Dugan fail to disclose “wherein: the at least one data store comprises an append-only data storage and a data lake; and the input data is retrieved from the append-only data store and the output data is written to the data lake”.
However, Todd teaches the above limitations at least by ([0016] “The data scored or ranked in the DCF system may be stored in various locations, such as a data lake, in a datacenter or the like.” [0069] “The reading or ingested data may be stored in immutable edge storage platform. A pointer to the storage platform may be placed in the ledger entry along with other trust metadata” [0085] “Storage may include data lakes or the like”) and the append-only storage is the immutable edge storage platform.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Todd into the teaching of Jones, Park, Dugan because the references similarly disclose data pipelines and/or data flows. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the append-only data stores and data lakes as in Todd in order to be able to protect the input data from being compromised and store the output data in any format for easier retrieval in the future.
Regarding claim 15, Jones discloses:
A data analytics method, comprising: configuring a stateless flow service at least by ([col. 12, lines 57-61] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job. ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200)”) and the jobs can be performed in parallel, meaning that the state of one job does not need to reference the data of another job in order to operate because they can execute independently using first received instructions to: obtain a first flow from a flow storage at least by ([col. 3, lines 17-19] “FIG. 1 illustrates a logical block diagram of using specified performance attributes to configure machine learning pipeline stages for an Extract Transform Load (ETL) job” [col. 12, lines 53-57] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job.”) and the first flow is any of the ETL jobs,
the first flow specifies a first data transformation at least by ([col. 2, lines 44-45] “ETL jobs may apply various and multiple transformation operations to extracted data.”);
obtain metadata from a metadata storage separate from the at least one data storage at least by ([col. 6, lines 22-26] “ETL service 220 may access a data catalog generated by ETL service 220 in order to perform an ETL operation (e.g., a job to convert a data object from one file type into one or more other data objects of a different file type).” [col. 9, lines 11-23] “Storage for data catalog(s) 360 may be implemented by one or more storage nodes, services, or computing devices (e.g., system 1000 discussed below with regard to FIG. 10) to provide persistent storage for data catalogs generated by data catalog service 200. Such storage nodes (or other storage components of storage for data catalog(s) 360) may implement various query processing engines or other request handling components to provide access to data catalogs according to requests received via interface 310. For example, data catalog storage may be implemented as a non-relational database, in one embodiment, that stores file types and other metadata for data objects in table”);
obtain an artifact implementing a first data transformation from the artifact storage at least by ([col. 12, lines 53-61] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job. ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200).”) and the artifact storage is the ETL job store 350,
the artifact comprising a script, executable binary, or module at least by ([col. 18, lines 51-61] “various examples of the ETL service including different components/modules, or arrangements of components/module that may be employed as part of implementing the ETL service are discussed. A number of different methods and techniques to implement using specified performance attributes to configure machine learning pipeline stages for an ETL job are then discussed, some of which are illustrated in accompanying flowcharts. Finally, a description of an example computing system upon which the various components, modules, systems, devices, and/or nodes may be implemented is provided.”);
and execute the flow, flow execution including: obtaining input data from at least one data storage at least by ([col. 12, lines 57-61] “ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200).”);
generating output data at least in part by validating, transforming … the input data using the metadata at least by ([col. 3, lines 31-43] “ETL system 110 may perform an ETL job 110 which may identify the operations to perform the ETL job 110. For example, extract data operation 130 may identify where source data for the ETL job is to be found (e.g., network address, storage location, file handle, object identifier, etc.), what format the source data is in (e.g., file format, encryption scheme, compression scheme, etc.), and how to access the data (e.g., identity tokens, credentials, passwords, etc.). Transform data operations, such as transform data operation 140, may perform various types of transformations, such as dropping or filtering columns or fields, joining values, mapping values, renaming fields, splitting fields, unboxing fields, splitting rows, and so on” [col. 9, lines 51-64] “ETL job configuration 420 may handle requests or solicited input to select and configuration transformation operations 404… Requests or solicited input to select and configure transformation operations 404 may include what transformation operation (e.g., machine learning pipeline, data mapping, data filtering, data splitting, data joining, storage format conversion, etc.), various operational parameters for performing the transformation operation (e.g., which columns to join into a single column), among other information for performing a transform data operation” [col. 12, lines 24-39] “ETL job creation interface 610 illustrates an example of pipeline quality metrics 660. Pipeline quality metrics 660 may be displayed, which may identify various metrics, such as metrics 662 a and 662 b, and respective explanations. 
For example, metrics such as pipeline precision, recall, area under precision recall curve (AUPRC), or accuracy score (e.g., max F1) may be explained including the impact of performance attributes upon the quality metrics 660 in some embodiments.”) and the metadata is the operational parameters for the configuring of ETL jobs which transform data and the performance attributes for the displayed quality metrics (validating),
the generation including executing the artifact to perform the first data transformation at least by ([col. 12, lines 53-61] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job. ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200).”) and the artifacts are the ETL jobs which perform the transformations;
providing the output data for storage in the at least one data storage at least by ([0211] “ETL system may perform ETL jobs, such as ETL job 120, to access one or more data stores to retrieve data from one or more sources, perform one or more transformations on the retrieved data, and then store the data in storage location (e.g., different than the location from which the data was taken), in some embodiments. ETL system 110 may implement one or multiple computing systems, such as computing system 1000 discussed below with regard to FIG. 10.”).
Jones fails to disclose “at least one data storage comprising an append-only data store and a data lake; …and serializing the input data using the metadata; generating additional metadata describing the output data; providing the additional metadata for storage in the metadata storage; and wherein the input data is retrieved from the append-only data store and the output data is written to the data lake”.
However, Park teaches …and serializing the input data using the metadata at least by ([0169] “the processing depicted in FIG. 12 may be performed by a node in the cluster of computing nodes 1012 in the distributed computing system 1002 each time a batch of events is received via a task 1020 (shown in FIG. 10).” [0152] “the disclosed distributed event processing system may be configured to perform the serialization and de-serialization of event data received via a continuous event stream. The serialization and de-serialization of event data enables the conversion of complex data objects in memory into sequences of bits that can be transferred to the computing nodes in the distributed event processing system” [0175] “FIG. 13A is an example flow diagram of a process 1300 that describes a set of operations for generating a set of serialized data values for a numeric attribute of an event”) and the metadata is the numeric attribute of the event for which the data values of the events are serialized.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Park into the teaching of Jones because the references similarly disclose data pipelines and/or flows. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Jones to include the serialization of inputted event data, which “reduces latencies in exchanging input and output events between the processing nodes in the distributed event processing system and improves the overall performance of the distributed event processing system” (Park, [0152]).
Jones and Park fail to disclose “at least one data storage comprising an append-only data store and a data lake; generating additional metadata describing the output data; providing the additional metadata for storage in the metadata storage; and wherein the input data is retrieved from the append-only data store and the output data is written to the data lake”.
However, Dugan teaches generating additional metadata describing the output data; providing the additional metadata for storage in the metadata storage at least by ([0040] “Datastore 270 may be part of datastore 105 or pipeline repository 107 or be separate from datastore 105 or pipeline repository 107 of FIG. 1.” [0060] “Responsive to determining that one or more source columns are associated with a column level access control policy, column level access control manger 240 can propagate the column level access control policy to the target column of the target dataset. For example, the column level access control manager 240 can associate the column level access control policy with the target column by, for example, storing the column level access control policy with the target column metadata or associating a pointer with the target column pointing to the respective one of the column level access control policies 280 stored at datastore 270”) and the metadata storage is datastore 270, which can be separate from the other datastores and stores propagated column level access control policies (additional metadata) within the target column metadata.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Dugan into the teaching of Jones, Park because the references similarly disclose data pipelines and/or flows. Consequently, one of ordinary skill in the art would be motivated to modify the system as in the combination of references to further include generating additional column metadata because “automatically generating target column metadata that includes column lineage metadata improves the trustworthiness of a dataset and aids in the detection and correction of errors, which improves overall system performance” (Dugan, [0023]).
Jones, Park, and Dugan fail to disclose “at least one data storage comprising an append-only data store and a data lake; and wherein the input data is retrieved from the append-only data store and the output data is written to the data lake”.
However, Todd teaches the above limitations at least by ([0016] “The data scored or ranked in the DCF system may be stored in various locations, such as a data lake, in a datacenter or the like.” [0069] “The reading or ingested data may be stored in immutable edge storage platform. A pointer to the storage platform may be placed in the ledger entry along with other trust metadata” [0085] “Storage may include data lakes or the like”) and the append-only storage is the immutable edge storage platform.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Todd into the teaching of Jones, Park, Dugan because the references similarly disclose data pipelines and/or data flows. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the append-only data stores and data lakes as in Todd in order to be able to protect the input data from being compromised and store the output data in any format for easier retrieval in the future.
As per claim 16, claim 15 is incorporated, Jones further discloses:
wherein: the first flow specifies a sequence of stages, at least one of the stages specifying a data transformation at least by ([col. 3, lines 62-67] “A machine learning pipeline, like machine learning pipeline 150 may implement multiple data processing stages, such as stages 152, 154, 156, and 158, to apply various techniques, like pre-processing, item selection, analysis, result refinement, in order to identify similar items, as discussed in more detail below with regard to FIG. 5.”).
As per claim 17, claim 15 is incorporated, Jones and Park fail to disclose “wherein: the metadata includes access metadata; and the flow service is further configurable to determine, using the access metadata, an authorization to: access the input data; or execute an object implementing a transformation of the input data, the transformation specified in the first flow”.
However, Dugan teaches the following limitations, wherein: the metadata includes access metadata at least by ([0023] “The generated target column metadata can include user comment metadata, column level access control policy metadata, or column lineage metadata.”);
and the flow service is further configurable to determine, using the access metadata, an authorization to: access the input data; or execute an object implementing a transformation of the input data, the transformation specified in the first flow at least by ([0015] “Aspects of the present disclosure are directed to metadata generation for columns of a dataset. The dataset may be used or created as part of a data pipeline. A data pipeline may refer to an ordered set of logic (e.g., a collection of computer software scripts or programs) that performs a multi-step transformation of data obtained from data sources to produce one or more output datasets.” [0059] “one or more source columns of the source dataset can be associated with one or more column level access control policies. In some embodiments, the column level access control policy can be stored or otherwise identified in source column metadata. A column level access control policy can restrict access (e.g., read, write, copy, view, or access) to a column or the data therein to persons or operations having adequate authority.”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Dugan into the teaching of Jones, Park because the references similarly disclose data pipelines and/or flows. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the access control policies as in Dugan to “enable[s] improved data security” (Dugan, [0023]).
As per claim 18, claim 15 is incorporated, Jones further discloses:
wherein: the metadata specifies at least one of a physical or logical location of the input data; and the flow service is configured to access the input data using the specified physical or logical location at least by ([col. 3, lines 31-38] “ETL system 110 may perform an ETL job 110 which may identify the operations to perform the ETL job 110. For example, extract data operation 130 may identify where source data for the ETL job is to be found (e.g., network address, storage location, file handle, object identifier, etc.), what format the source data is in (e.g., file format, encryption scheme, compression scheme, etc.), and how to access the data (e.g., identity tokens, credentials, passwords, etc.).” [col. 7, lines 57-62] “ETL service 220 may provide access to data catalogs 360 and ETL jobs (for creation, management, and execution) via interface 310, which may be a programmatic interface (e.g., Application Programming Interface (API)), command line interface, and/or graphical user interface, in various embodiments.” [col. 9, lines 32-46] “The selection and configuration of an extraction operation input 402 may include where source data for the ETL job is to be found (e.g., network address, storage location, file handle, object identifier, etc.), what format the source data is in (e.g., file format, encryption scheme, compression scheme, etc.), and how to access the data (e.g., identity tokens, credentials, passwords, etc.)…For example, ETL job creation 320 may implement job configuration feature 420 to handle requests to create and configure ETL jobs. For example, job configuration 420 may implement a series of interactions via a GUI to guide a user through the configuration of an ETL job. For instance, job configuration 420 may solicit input that selects and configures an extraction operation 402 via interface 310.”).
Regarding claim 20, Jones discloses:
A non-transitory computer-readable medium containing instructions that, when executed by at least one processor of a data analytics system, cause the data analytics system to perform operations comprising: configuring a stateless flow service at least by ([col. 12, lines 57-61] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job. ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200)”) and the jobs can be performed in parallel, meaning that the state of one job does not need to reference the data of another job in order to operate because they can execute independently using first received instructions to: obtain a first flow from a flow storage at least by ([col. 3, lines 17-19] “FIG. 1 illustrates a logical block diagram of using specified performance attributes to configure machine learning pipeline stages for an Extract Transform Load (ETL) job” [col. 12, lines 53-57] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job.”) and the first flow is any of the ETL jobs,
the first flow specifies a first data transformation at least by ([col. 2, lines 44-45] “ETL jobs may apply various and multiple transformation operations to extracted data.”);
obtain metadata from a metadata storage separate from the at least one data storage at least by ([col. 6, lines 22-26] “ETL service 220 may access a data catalog generated by ETL service 220 in order to perform an ETL operation (e.g., a job to convert a data object from one file type into one or more other data objects of a different file type).” [col. 9, lines 11-23] “Storage for data catalog(s) 360 may be implemented by one or more storage nodes, services, or computing devices (e.g., system 1000 discussed below with regard to FIG. 10) to provide persistent storage for data catalogs generated by data catalog service 200. Such storage nodes (or other storage components of storage for data catalog(s) 360) may implement various query processing engines or other request handling components to provide access to data catalogs according to requests received via interface 310. For example, data catalog storage may be implemented as a non-relational database, in one embodiment, that stores file types and other metadata for data objects in table”);
obtain an artifact implementing a first data transformation from the artifact storage at least by ([col. 12, lines 53-61] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job. ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200).”) and the artifact storage is the ETL job store 350,
the artifact comprising a script, executable binary, or module at least by ([col. 18, lines 51-61] “various examples of the ETL service including different components/modules, or arrangements of components/module that may be employed as part of implementing the ETL service are discussed. A number of different methods and techniques to implement using specified performance attributes to configure machine learning pipeline stages for an ETL job are then discussed, some of which are illustrated in accompanying flowcharts. Finally, a description of an example computing system upon which the various components, modules, systems, devices, and/or nodes may be implemented is provided.”);
and execute the flow, flow execution including: obtaining input data from at least one data storage at least by ([col. 12, lines 57-61] “ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200).”);
generating output data at least in part by validating, transforming … the input data using the metadata at least by ([col. 3, lines 31-43] “ETL system 110 may perform an ETL job 110 which may identify the operations to perform the ETL job 110. For example, extract data operation 130 may identify where source data for the ETL job is to be found (e.g., network address, storage location, file handle, object identifier, etc.), what format the source data is in (e.g., file format, encryption scheme, compression scheme, etc.), and how to access the data (e.g., identity tokens, credentials, passwords, etc.). Transform data operations, such as transform data operation 140, may perform various types of transformations, such as dropping or filtering columns or fields, joining values, mapping values, renaming fields, splitting fields, unboxing fields, splitting rows, and so on” [col. 9, lines 51-64] “ETL job configuration 420 may handle requests or solicited input to select and configuration transformation operations 404… Requests or solicited input to select and configure transformation operations 404 may include what transformation operation (e.g., machine learning pipeline, data mapping, data filtering, data splitting, data joining, storage format conversion, etc.), various operational parameters for performing the transformation operation (e.g., which columns to join into a single column), among other information for performing a transform data operation” [col. 12, lines 24-39] “ETL job creation interface 610 illustrates an example of pipeline quality metrics 660. Pipeline quality metrics 660 may be displayed, which may identify various metrics, such as metrics 662 a and 662 b, and respective explanations. For example, metrics such as pipeline precision, recall, area under precision recall curve (AUPRC), or accuracy score (e.g., max F1) may be explained including the impact of performance attributes upon the quality metrics 660 in some embodiments.”) and the metadata is the operational parameters for the configuring of ETL jobs which transform data and the performance attributes for the displayed quality metrics (validating),
the generation including executing the artifact to perform the first data transformation at least by ([col. 12, lines 53-61] “ETL Job execution worker(s) 720 may get information 722 (including executable code, invoked operations or transformations, and other information (e.g., machine learning models, configuration parameters) to execute the identified ETL job) from ETL job store 350 for the ETL job. ETL job execution worker(s) 720 may then perform the ETL job in parallel or serialized fashion, obtaining data 724 from the source data store 730 (which may be a data storage service 210 of provider network 200).”) and the artifacts are the ETL jobs which perform the transformations;
providing the output data for storage in the at least one data storage at least by ([0211] “ETL system may perform ETL jobs, such as ETL job 120, to access one or more data stores to retrieve data from one or more sources, perform one or more transformations on the retrieved data, and then store the data in storage location (e.g., different than the location from which the data was taken), in some embodiments. ETL system 110 may implement one or multiple computing systems, such as computing system 1000 discussed below with regard to FIG. 10.”).
Jones fails to disclose “at least one data storage comprising an append-only data store and a data lake; …and serializing the input data using the metadata; generating additional metadata describing the output data; providing the additional metadata for storage in the metadata storage; and wherein the input data is retrieved from the append-only data store and the output data is written to the data lake”.
However, Park teaches …and serializing the input data using the metadata at least by ([0169] “the processing depicted in FIG. 12 may be performed by a node in the cluster of computing nodes 1012 in the distributed computing system 1002 each time a batch of events is received via a task 1020 (shown in FIG. 10).” [0152] “the disclosed distributed event processing system may be configured to perform the serialization and de-serialization of event data received via a continuous event stream. The serialization and de-serialization of event data enables the conversion of complex data objects in memory into sequences of bits that can be transferred to the computing nodes in the distributed event processing system” [0175] “FIG. 13A is an example flow diagram of a process 1300 that describes a set of operations for generating a set of serialized data values for a numeric attribute of an event”) and the metadata is the numeric attribute of the event for which the data values of the events are serialized.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Park into the teaching of Jones because the references similarly disclose data pipelines and/or flows. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Jones to include the serialization of inputted event data because such serialization “reduces latencies in exchanging input and output events between the processing nodes in the distributed event processing system and improves the overall performance of the distributed event processing system” (Park, [0152]).
Jones and Park fail to disclose “at least one data storage comprising an append-only data store and a data lake; generating additional metadata describing the output data; providing the additional metadata for storage in the metadata storage; and wherein the input data is retrieved from the append-only data store and the output data is written to the data lake”.
However, Dugan teaches generating additional metadata describing the output data; providing the additional metadata for storage in the metadata storage at least by ([0040] “Datastore 270 may be part of datastore 105 or pipeline repository 107 or be separate from datastore 105 or pipeline repository 107 of FIG. 1.” [0060] “Responsive to determining that one or more source columns are associated with a column level access control policy, column level access control manger 240 can propagate the column level access control policy to the target column of the target dataset. For example, the column level access control manager 240 can associate the column level access control policy with the target column by, for example, storing the column level access control policy with the target column metadata or associating a pointer with the target column pointing to the respective one of the column level access control policies 280 stored at datastore 270”) and the metadata storage is datastore 270, which can be separate from the other datastores and stores propagated column level access control policies (additional metadata) within the target column metadata.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Dugan into the teaching of Jones and Park because the references similarly disclose data pipelines and/or flows. Consequently, one of ordinary skill in the art would be motivated to modify the system as in the combination of references to further include generating additional column metadata because “automatically generating target column metadata that includes column lineage metadata improves the trustworthiness of a dataset and aids in the detection and correction of errors, which improves overall system performance” (Dugan, [0023]).
Jones, Park, and Dugan fail to disclose “at least one data storage comprising an append-only data store and a data lake; and wherein the input data is retrieved from the append-only data store and the output data is written to the data lake”.
However, Todd teaches the above limitations at least by ([0016] “The data scored or ranked in the DCF system may be stored in various locations, such as a data lake, in a datacenter or the like.” [0069] “The reading or ingested data may be stored in immutable edge storage platform. A pointer to the storage platform may be placed in the ledger entry along with other trust metadata” [0085] “Storage may include data lakes or the like”) and the append-only storage is the immutable edge storage platform.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Todd into the teaching of Jones, Park, and Dugan because the references similarly disclose data pipelines and/or data flows. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to include the append-only data stores and data lakes as in Todd in order to protect the input data from being compromised and to store the output data in any format for easier retrieval in the future.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM P BARTLETT whose telephone number is (469)295-9085.  The examiner can normally be reached on M-Th 11:30-8:30, F 11-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at 571-272-4046.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/WILLIAM P BARTLETT/
Examiner, Art Unit 2169
/USMAAN SAEED/
Supervisory Patent Examiner, Art Unit 2169