Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is in response to the application filed on 02/18/2021.
Claims 1-20 are pending. 

Examiner’s Note
Please note that the Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over USPN 8,978,034 to Goodson et al. in view of US 20160267082 to Wong et al.
Per claim 1:
Goodson discloses:
1. A method comprising: 
extracting run-time metadata from a computer system (Col. 6, lines 42-43 “metadata is extracted as the workflow is processed”), the run-time metadata generated on execution, by the computer system, of a workflow including one or more data processing jobs (Col. 6, lines 45-58 “Since each transformation/aggregation/filter can modify the Source data schema, the schema of the data as it is processed at each stage in the workflow can also be maintained… the steps the data goes through as it is being processed. Workflow provenance includes metadata indicating where the data came from (e.g., the Source database or file), which processing stages have the data gone through, what types of transformations have been applied on the data”).

Goodson does not explicitly disclose determining a data lineage associated with the workflow based on the run-time metadata; and generating design-time information associated with the workflow based on the data lineage, the design-time information indicative of a design of any of the computer system, the workflow, or the one or more data processing jobs included in the workflow.
However, Wong discloses in an analogous computer system determining a data lineage associated with the workflow based on the run-time metadata (Paragraph [0235] “data lineage unit 114 may be configured to maintain a the "data lineage" of each and every element from a source file to reports”); and generating design-time information associated with the workflow based on the data lineage (Paragraph [0241] “a power user reporting interface is also available for ad-Hoc reports (users can build their own report from a list of tables and columns for in depth analysis or customized reporting). Specific formatting may be automatically applied based on the metadata tags”), the design-time information indicative of a design of any of the computer system, the workflow, or the one or more data processing jobs included in the workflow (Since this appears to be Markush-type language requiring at a minimum just one from the list, Wong teaches Paragraph [0244] “The reporting/analytics unit 116 may be configured to provide various dashboards and/or user interface functionality such that a user may be able to view, analyze, interpret various elements of information organized into reports, and in some embodiments, the dashboards may be configured to allow a user to take various actions, such as initiate the re-running of data extraction, flag data for low data quality, review a "data lineage", etc.”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of determining a data lineage associated with the workflow based on the run-time metadata; and generating design-time information associated with the workflow based on the data lineage, the design-time information indicative of a design of any of the computer system, the workflow, or the one or more data processing jobs included in the workflow as taught by Wong into the method of automated workflow for data management, and specifically to efficient data manipulation and information extraction techniques as taught by Goodson. The modification would have been obvious because one of ordinary skill in the art would have been motivated to incorporate the features of determining a data lineage associated with the workflow based on the run-time metadata; and generating design-time information associated with the workflow based on the data lineage, the design-time information indicative of a design of any of the computer system, the workflow, or the one or more data processing jobs included in the workflow to provide an efficient technique for incorporating such a large amount of data so that the workflows are generated for analyzing the large volume of data, as suggested by Wong (paragraphs [0003]-[0007]).
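For illustration only, the three steps recited in claim 1 (extracting run-time metadata, determining a data lineage, and generating design-time information) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the method of Goodson, of Wong, or of the applicant; the record fields (`job`, `inputs`, `outputs`) and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical run-time metadata: one record per executed data processing job.
RUN_TIME_METADATA = [
    {"job": "ingest", "inputs": ["raw.csv"], "outputs": ["staging"]},
    {"job": "aggregate", "inputs": ["staging"], "outputs": ["report"]},
]


def determine_lineage(records):
    """Return lineage edges as (source entity, job, target entity) tuples."""
    edges = []
    for rec in records:
        for src in rec["inputs"]:
            for dst in rec["outputs"]:
                edges.append((src, rec["job"], dst))
    return edges


def generate_design_time_info(edges):
    """Summarize, per job, the entities it reads and writes (assumed format)."""
    design = defaultdict(lambda: {"reads": set(), "writes": set()})
    for src, job, dst in edges:
        design[job]["reads"].add(src)
        design[job]["writes"].add(dst)
    return dict(design)


lineage = determine_lineage(RUN_TIME_METADATA)
design_info = generate_design_time_info(lineage)
```

Here the design-time information is a per-job summary recovered purely from what was observed at run time; any richer notion of "design" would need additional assumptions.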

Per claim 2:
Goodson discloses:
2. The method of claim 1, wherein determining the data lineage associated with the workflow includes: identifying, based on the run-time metadata, a plurality of entities involved in executing the one or more data processing jobs included in the workflow (Col. 5, lines 20-25 “A workflow compiler (element 105) takes the workflow definition and translates it into a set of actions that the processing system knows how to perform on the data. Typically, a workflow requires multiple of these actions to fully process the data. The processing system has a set of processing elements that are able to perform a specific action or set of actions on the data”); and

Goodson does not explicitly disclose determining, based on the run-time metadata, relationships between the plurality of entities.
However, Wong discloses in an analogous computer system determining, based on the run-time metadata, relationships between the plurality of entities (Paragraph [0235] “maintain the relationship between reporting information and source data… Such linkages may be maintained and updated in various metadata tags as reports are generated using the data”).
The feature of determining, based on the run-time metadata, relationships between the plurality of entities would have been obvious for the reasons set forth in the rejection of claim 1.




Per claim 3:
The rejection of claim 2 is incorporated and further, Goodson does not explicitly disclose wherein the plurality of entities include any one or more of a file, a directory, a table, a script, a query template, a query execution object, a job template, or a job execution object.
However, Wong discloses in an analogous computer system wherein the plurality of entities include any one or more of a file, a directory, a table, a script, a query template, a query execution object, a job template, or a job execution object (Since this appears to be Markush-type language requiring at a minimum just one from the list, Wong teaches Paragraph [0317] “employees of entities associated with a particular institution or combined with external data from social media, news sources, legal and financial documents, and other internet sources to discover business relationships between entities”).
The feature wherein the plurality of entities include any one or more of a file, a directory, a table, a script, a query template, a query execution object, a job template, or a job execution object would have been obvious for the reasons set forth in the rejection of claim 1.

Per claim 4:
The rejection of claim 2 is incorporated and further, Goodson does not explicitly disclose wherein the relationships between entities include any one or more of a data flow relationship, a parent-child relationship, a logical-physical relationship, or a control relationship.
However, Wong discloses in an analogous computer system wherein the relationships between entities include any one or more of a data flow relationship, a parent-child relationship, a logical-physical relationship, or a control relationship (Since this appears to be Markush-type language requiring at a minimum just one from the list, Wong teaches Paragraph [0322] “graph consists of vertices and edges that represent entities and relationships”).
The feature wherein the relationships between entities include any one or more of a data flow relationship, a parent-child relationship, a logical-physical relationship, or a control relationship would have been obvious for the reasons set forth in the rejection of claim 1.

Per claim 5:
Goodson discloses:
5. The method of claim 1, wherein the run-time metadata is extracted from one or more services for storing and processing data in the computer system (Col. 6, lines 41-60 “metadata is extracted as the workflow is processed… Workflow provenance includes metadata indicating where the data came from (e.g., the source database or file)”).

Per claim 6:
The rejection of claim 1 is incorporated and further, Goodson does not explicitly disclose determining a structure of a previous version of the workflow based on the design-time information.
However, Wong discloses in an analogous computer system determining a structure of a previous version of the workflow based on the design-time information (Paragraph [0163] “the system 100 may be configured to maintain data for a predefined period (e.g., 30 days (versions) of source RAW data)”).
The feature of determining a structure of a previous version of the workflow based on the design-time information would have been obvious for the reasons set forth in the rejection of claim 1.

Per claim 7:
Goodson discloses:
7. The method of claim 1, wherein the design-time information includes information regarding any one or more of: data processed by the workflow; operations performed on the data as part of the workflow; or services of the computer system utilized to perform the operations on the data (Since this appears to be Markush-type language requiring at a minimum just one from the list, Goodson teaches Col. 4, lines 17-22 “The workflow definition is then fed to a workflow compiler that takes the workflow definition and produces a description of processing stages that the data flows through (step 104). The processing stages transform the data according to the workflow definition”).

Per claim 8:
The rejection of claim 1 is incorporated and further, Goodson does not explicitly disclose wherein generating the design-time information includes: inferring, based on the data lineage associated with the workflow, logical connections between the one or more data processing jobs included in the workflow.
However, Wong discloses in an analogous computer system wherein generating the design-time information includes: inferring, based on the data lineage associated with the workflow, logical connections between the one or more data processing jobs included in the workflow (Paragraph [0064] “an analytics platform, which may periodically or continuously monitor the pre-processing, and may be configured to provide various reports that relate to the data quality and/or integrity of the pre-processed data, tracking a `data lineage` of data points relating to original source data streams”).
The feature wherein generating the design-time information includes: inferring, based on the data lineage associated with the workflow, logical connections between the one or more data processing jobs included in the workflow would have been obvious for the reasons set forth in the rejection of claim 1.
 
Per claim 9:
The rejection of claim 8 is incorporated and further, Goodson does not explicitly disclose wherein logical connections between data processing jobs may include any one or more of: sequencing of data processing jobs; scheduling of data processing jobs; dependencies between data processing jobs; or common parameters between data processing jobs.
However, Wong discloses in an analogous computer system wherein logical connections between data processing jobs may include any one or more of: sequencing of data processing jobs; scheduling of data processing jobs; dependencies between data processing jobs; or common parameters between data processing jobs (Since this appears to be Markush-type language requiring at a minimum just one from the list, Wong teaches Paragraph [0064] “The matrix structure may, for example, store the metadata tags in a linked list wherein the linkages define interrelationships between the metadata tags. Such established linkages may be especially helpful where the metadata tags are provided in association with data points of information where there are myriad linkages (e.g., various N:N, 1:N, N:1) between data points”).
The feature wherein logical connections between data processing jobs may include any one or more of: sequencing of data processing jobs; scheduling of data processing jobs; dependencies between data processing jobs; or common parameters between data processing jobs would have been obvious for the reasons set forth in the rejection of claim 1.
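For illustration only, one way that dependencies and sequencing between data processing jobs could be inferred from a data lineage (the limitations addressed in claims 8 and 9) is sketched below. This is a minimal Python sketch, not a technique taken from Goodson or Wong; the edge format and the names `LINEAGE` and `infer_dependencies` are hypothetical, and it assumes that a job depends on any other job that produced an entity it reads. `graphlib` is in the Python standard library (3.9 and later).

```python
from graphlib import TopologicalSorter

# Hypothetical lineage edges: (source entity, job, target entity).
LINEAGE = [
    ("raw.csv", "ingest", "staging"),
    ("staging", "aggregate", "report"),
    ("staging", "validate", "quality_log"),
]


def infer_dependencies(edges):
    """Map each job to the set of jobs whose outputs it reads."""
    producers = {}
    for _, job, dst in edges:
        producers.setdefault(dst, set()).add(job)
    deps = {job: set() for _, job, _ in edges}
    for src, job, _ in edges:
        # A job that reads an entity depends on every other job that wrote it.
        deps[job] |= producers.get(src, set()) - {job}
    return deps


dependencies = infer_dependencies(LINEAGE)
# One valid execution sequence: every job appears after the jobs it depends on.
sequence = list(TopologicalSorter(dependencies).static_order())
```

The relative order of `aggregate` and `validate` is not determined by the lineage alone, since neither reads the other's output.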

Per claim 10:
Goodson discloses:
10. The method of claim 1, further comprising: optimizing a structure of the workflow based on the design-time information (Col. 4, lines 4-8 “FIG. 1A describes the three main stages of the overall workflow management framework… definition of workflow (step 102), compilation of workflow (step 104) and processing of workflow (step 106)”); and generating an optimized workflow definition based on the optimized structure of the workflow (Col. 4, lines 14-16 “A workflow comprises a definition that describes how a stream of data is processed (step 102)”), the optimized workflow definition being executable by the computer system to process data according to the optimized structure of the workflow (Col. 4, lines 18-24 “The workflow definition is then fed to a workflow compiler that takes the workflow definition and produces a description of processing stages that the data flows through (step 104). The processing stages transform the data according to the workflow definition. Through the entire workflow processing dataflow, metadata is collected about the data and how it is being processed”).

Per claim 11:
Goodson discloses:
11. The method of claim 10, wherein optimizing the structure of the workflow includes any one or more of: changing the data processed according to the workflow; changing the sequencing and/or scheduling of data processing jobs included in the workflow; or changing one or more of the services utilized to store and process the data according to the workflow (Since this appears to be Markush-type language requiring at a minimum just one from the list, Goodson teaches Col. 5, lines 20-23 “A workflow compiler (element 105) takes the workflow definition and translates it into a set of actions that the processing system knows how to perform on the data”).


Per claim 12:
Goodson discloses:
12. The method of claim 10, wherein the structure of the workflow is optimized to improve data processing efficiency and/or data storage efficiency (Col. 4, lines 25-26 “metadata can be used to drive workflow improvement/data management policies”).

Per claim 13:
Goodson discloses:
13. The method of claim 1, further comprising: configuring the computer system to log and prepare the run-time metadata for extraction (Col. 8, lines 53-57 “Content of metadata may include where data physically resides, its characteristics (e.g., internal formats, size, access logs, etc.), the data owner… Metadata collected from the processing layer may include what transformations are being performed at what stages”; note that because access logs are maintained, a logging service is implied).

Per claim 14:
Goodson discloses:
14. The method of claim 1, wherein the workflow is heterogeneous, the heterogeneous workflow including a plurality of different types of data processing jobs performed by a plurality of different types of services associated with the computer system (Col. 4, lines 3-7 “FIG. 1A describes the three main stages of the overall workflow management framework, the stages being: definition of workflow (step 102), compilation of workflow (step 104) and processing of workflow (step 106)”).

Per claim 15:
Goodson discloses:
15. The method of claim 1, further comprising: determining if the workflow involves processing personally identifiable information (PII) based on the design-time information (Col. 6, line 61 to col. 11, line 5 “A workflow consisting of two input Sources where one input source 626 is an event stream of sensor data generated by home utility monitors and the second input source 628 is a database consisting of customer demographic information… customer demographic database and breaks down the event stream by user demographics (e.g., by gender and/or age and/or address)… demographic group aggregations are computed for neighborhood usage (e.g., within some Zipcode)”).

Per claim 16:
The rejection of claim 1 is incorporated and further, Goodson does not explicitly disclose generating a data lineage visualization based on the data lineage associated with the workflow; and causing display of the data lineage visualization.
However, Wong discloses in an analogous computer system generating a data lineage visualization based on the data lineage associated with the workflow; and causing display of the data lineage visualization (Paragraph [0241] “Specific formatting may be automatically applied based on the metadata tags, for example: conditional formatting (i.e. colour coding & applying of symbols) to highlight status of objects and daily process runs; and visualizations”).
The feature of generating a data lineage visualization based on the data lineage associated with the workflow; and causing display of the data lineage visualization would have been obvious for the reasons set forth in the rejection of claim 1.

Per claim 17:
The rejection of claim 16 is incorporated and further, Goodson does not explicitly disclose wherein the data lineage diagram visualization includes: a plurality of graphical entity nodes representative of at least some of the plurality of the identified entities involved in the processing of data according to the workflow, each of the plurality of graphical entity nodes visually linked to one or more of the other plurality of graphical entity nodes based on the identified relationships between the plurality of entities.
However, Wong discloses in an analogous computer system wherein the data lineage visualization includes: a plurality of graphical entity nodes representative of at least some of a plurality of the entities involved in the processing of data according to the workflow, each of the plurality of graphical entity nodes visually linked to one or more of the other plurality of graphical entity nodes based on relationships between the plurality of entities (Paragraph [0322] “The graph consists of vertices and edges that represent entities and relationships. The sequence for building this graph is as follows: Extract and process semi-structured data from external sources and databases, entities discovery, relationship discovery, and data extraction and processing semi-structured data”).
The feature wherein the data lineage visualization includes: a plurality of graphical entity nodes representative of at least some of a plurality of the entities involved in the processing of data according to the workflow, each of the plurality of graphical entity nodes visually linked to one or more of the other plurality of graphical entity nodes based on relationships between the plurality of entities would have been obvious for the reasons set forth in the rejection of claim 1.

Per claim 18:
The rejection of claim 17 is incorporated and further, Goodson does not explicitly disclose wherein at least some of the plurality of graphical entity nodes include interactive elements, which when interacted with by a user, display information regarding the represented entities.
However, Wong discloses in an analogous computer system wherein at least some of the plurality of graphical entity nodes include interactive elements, which when interacted with by a user, display information regarding the represented entities (Paragraph [0322] “The graph consists of vertices and edges that represent entities and relationships. The sequence for building this graph is as follows: Extract and process semi-structured data from external sources and databases, entities discovery, relationship discovery, and data extraction and processing semi-structured data”).
The feature wherein at least some of the plurality of graphical entity nodes include interactive elements, which when interacted with by a user, display information regarding the represented entities would have been obvious for the reasons set forth in the rejection of claim 1.

Claim 19 is the medium/product claim corresponding to method claim 1 and is rejected under the same rationale set forth in connection with the rejection of claim 1 as noted above.

Claim 20 is the apparatus/system claim corresponding to method claim 1 and is rejected under the same rationale set forth in connection with the rejection of claim 1 as noted above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Related cited art:
US 20180357333 A1 discloses a system that may use content classes to manage indexing of object data. A content class may include a set of one or more content properties. Each content property may include a name, an expression for extracting data, and an expression type. When object data is received, such as for indexing, the expression type of each content property may be compared with the data type of the received data. Based at least in part on determining that the expression type matches the data type, the system may extract a data value from the received data in accordance with the expression. The system may save the extracted data value to a data structure in association with the name of the content property, and may subsequently use the data value and the name of the content property when creating an index for the object data.
USPN 9015118 discloses methods and apparatus, including computer program products, implementing and using techniques for determining provenance and lineage for content elements in a content management system. An option to track provenance and lineage data for the content element is provided in response to a content element being entered into a content management system. A provenance metadata attribute and a lineage metadata attribute are associated with the content element in response to selecting the option to track provenance and lineage data. An extent of difference is determined between the original content element and the changed content element in response to a change of content being made to the content element. The provenance metadata attribute is updated to reflect the determined extent of difference. It is determined what user changed the content element, and the lineage metadata attribute is updated to reflect the user's involvement in changing the content element.
US 20150347193 A1 discloses methods, systems, and apparatus, including computer programs encoded on computer storage media, for workload automation and job scheduling information. One of the methods includes obtaining job dependency information, the job dependency information specifying an order of execution of a plurality of jobs. The method also includes obtaining data lineage information that identifies dependency relationships between data stores and transformation, wherein at least one transformation accepts data from a first data store and produces data for a second data store. The method also includes creating links between the job dependency information and the data lineage information. The method also includes determining an impact of a change in a planned execution of an application of the plurality of applications based on the job dependency information, the created links, and the data lineage information.

Berrick, Stephen W., et al. "Giovanni: a web service workflow-based data visualization and analysis system." 

Wang, Shaowen, et al. "Towards provenance-aware geographic information systems." 

Woodruff, Allison, and Michael Stonebraker. "Supporting fine-grained data lineage in a database visualization environment." 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Satish Rampuria at (571) 272-3732.  The examiner can normally be reached Monday-Friday between 8:30 am and 5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat Do, can be reached at (571) 272-3721.  Any inquiry of a general nature or relating to the status of this application should be directed to the TC 2100 Group receptionist: (571) 272-2100.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Satish Rampuria/
Primary Examiner, Art Unit 2193