Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 08/02/2021 have been fully considered but they are not persuasive.  Applicant’s arguments are reproduced and addressed below:

[Image: excerpt of Applicant’s argument (media_image1.png)]

	Examiner does not agree with Applicant’s argument above (page 8), since Sudhakar discloses that the higher level edges correspond to file folders (i.e., “The multilevel data lineage view system provides higher level edges such as a cluster level edge 120, a virtual cluster level edge 122, and a folder level edge 124…the path "Datastore/clusterA/MyVC1/MyFolder/Stream A" for the input node 104” (0015) and “solve the technological problem of determining which clusters, virtual clusters, folders, and other data source constituent parts of an input data source are related to which clusters, virtual clusters, folders, and other data source constituent parts of an output data source” (0012)).  Examiner notes that MyFolder, the cluster, and MyVC1 are file folders as in the claimed invention.  Further, Applicant argued that the reference fails to teach “generating, by one or more computer processors, a file folder, the file folder comprising data and having a file path.”  Examiner does not agree with Applicant, since Sudhakar discloses generating, by one or more computer processors, a file folder, the file folder comprising data and having a file path (i.e., “For example, the number of assets may be the number of files in a folder, number of tables in a folder, number of folders in virtual cluster, etc. In one implementation, the edges are added when the number of assets for a given constituent part of input node 104 is approximately similar to the number of assets for a given constituent part of output node 106” (0030); the output is generated as in the claimed invention, since the system is implemented to create the output).  Further, the paragraph gives an example of how to implement or create the folder comprising the data and having a file path (fig. 1) with file folders, for example, datastore/clusterB or MyVC2, etc., where the paths are datastore/clusterB or datastore/cluster/MyVC2, etc.
Other examples of how the file folder is created are found in Sudhakar (i.e., “the multilevel data lineage view system creates higher level edge from a constituent part of the input node 104 to a constituent part of the output node 106 that is closest in terms of the number of assets of the constituent part of the input node 104” (0032) or “the input node may be a file that is used in generating a file that represents the output node” (0046)).  Therefore, Applicant’s arguments are not persuasive.
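For illustration only, the hierarchical path quoted above from Sudhakar (0015) can be split into its levels as in the following minimal sketch. The function name parse_path and the choice to return cumulative prefixes are assumptions made for this example; they are not taken from Sudhakar or from the application.

```python
def parse_path(full_path: str) -> list:
    """Split a slash-separated path into one cumulative prefix per level.

    Hypothetical helper for illustration; not part of any cited reference.
    """
    # Drop empty segments so leading or trailing slashes are harmless.
    parts = [p for p in full_path.split("/") if p]
    # Build "Datastore", "Datastore/clusterA", ... up to the full path.
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    for level in parse_path("Datastore/clusterA/MyVC1/MyFolder/Stream A"):
        print(level)
```

Each printed prefix corresponds to one level of the hierarchy (datastore, cluster, virtual cluster, folder, stream) at which a higher level edge could be attached.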

[Image: excerpt of Applicant’s argument (media_image2.png)]

Examiner does not agree with Applicant, since Sudhakar discloses that datastores 202, 204 (fig. 2) store a plurality of resources such as tables, etc.  Each datastore has resources that are used to generate other resources (0034).  Sudhakar discloses using an application to generate other resources by using the table resource (i.e., “The resources 220, 222 may be used as source file for generating other resources” (0034)).  Further, Sudhakar discloses that a “multilevel data lineage view engine 230 generates higher level edges between various resources based on one or more lower level edges, such as edge 224” (0036) (i.e., “The resource path store 236 may receive such paths from the resource table 206” (0036) or “The multilevel data lineage view system 200 also includes a resource table 206 that stores the listing of the one or more of the various resources in various datastores… the edge database 208 may include an edge 224 identifying an edge between the resource 220 and the resource 222, where the resource 220 is an input resource and the resource 222 is an output resource” (0035); Examiner asserts that the resources 220 and 222 are tables).  Furthermore, Sudhakar discloses that the output is generated and that the output includes the table as in the claimed invention (i.e., “For example, the number of assets may be the number of files in a folder, number of tables in a folder, number of folders in virtual cluster, etc. In one implementation, the edges are added when the number of assets for a given constituent part of input node 104 is approximately similar to the number of assets for a given constituent part of output node 106” (0030)).  Fig. 3 shows creating edges between parts of equal or similar size at step 312, which corresponds to generating a data table associated with the data from the aforementioned file folder (edges).  Therefore, Applicant’s arguments are not persuasive.

[Image: excerpt of Applicant’s argument (media_image3.png)]


[Image: excerpt of Applicant’s argument (media_image4.png)]

(i.e., “the multilevel data lineage view system adds higher level edges between constituent parts of the input node 104 with constituent parts of the output node 106 based on number of assets at various constituent parts …the number of assets may be the number of files in a folder, number of tables in a folder, number of folders in virtual cluster, etc. In one implementation, the edges are added when the number of assets for a given constituent part of input node 104 is approximately similar to the number of assets for a given constituent part of output node 106” (0030); the output node records an overall lineage comprising the lineage of data into the construct identified).  Examiner asserts that, based on the input, the system generates the output in which “the edges are added when the number of assets for a given constituent part of input node 104 is approximately similar to the number of assets”.  Further, fig. 2 shows that the multilevel data lineage view engine 230 generates higher level edges between various resources based on one or more lower level edges, and that the output 260 is stored or recorded in the edge database 208 (fig. 2) (“The newly created higher-level edges may be added back to the edge database 208 as illustrated by 260” (0045)).  Therefore, Applicant’s arguments are not persuasive.
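For illustration only, the asset-count comparison quoted above from Sudhakar (0030) can be sketched as follows. The function names, the 20% relative tolerance used for "approximately similar", and the asset counts are all assumptions made for this example; Sudhakar as quoted here does not specify a threshold, and none of these values come from the reference or the application.

```python
def approximately_similar(a: int, b: int, tolerance: float = 0.2) -> bool:
    """Return True when two asset counts differ by at most `tolerance`
    relative to the larger count. The 20% default is an assumption."""
    if a == 0 and b == 0:
        return True
    return abs(a - b) <= tolerance * max(a, b)


def higher_level_edges(input_counts: dict, output_counts: dict,
                       tolerance: float = 0.2) -> list:
    """Pair each constituent part of the input path with each constituent
    part of the output path whose asset count is approximately similar."""
    edges = []
    for in_part, in_n in input_counts.items():
        for out_part, out_n in output_counts.items():
            if approximately_similar(in_n, out_n, tolerance):
                edges.append((in_part, out_part))
    return edges


if __name__ == "__main__":
    # Made-up asset counts (e.g., number of folders, files, or tables).
    input_counts = {"Datastore/clusterA": 100, "Datastore/clusterA/MyVC1": 40}
    output_counts = {"Datastore/clusterB": 95, "Datastore/clusterB/MyVC2": 12}
    print(higher_level_edges(input_counts, output_counts))
```

With these made-up counts, only the two cluster-level parts are paired, which mirrors the idea of a cluster level edge being added while no virtual cluster level edge is added.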

    PNG
[Image: excerpt of Applicant’s argument (media_image5.png)]
    Greyscale

Examiner does not agree with Applicant’s argument since Sudhakar discloses generating a folder including data and a file path, or generating a table associated with the data and the file path (see Examiner’s response above). Further, Sudhakar discloses tracking data into the folder or from the table (see rejection below).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 are rejected under 35 U.S.C. 103(a) as being unpatentable over Sudhakar et al. (U.S. Pub. 2002/0301910 A1) in view of Subramanian et al. (U.S. 10,445,170 B1).
With respect to claim 1, Sudhakar et al. discloses a computer implemented method for automatically extracting lineage data, the method comprising: 
generating, by one or more computer processors, a file folder, the file folder comprising data and having a file path (i.e., “The multilevel data lineage view system provides higher level edges such as a cluster level edge 120, a virtual cluster level edge 122, and a folder level edge 124…the path "Datastore/clusterA/MyVC1/MyFolder/Stream A" for the input node 104”(0015) and “solve the technological problem of determining which clusters, virtual clusters, folders, and other data source constituent parts of an input data source are related to which clusters, virtual clusters, folders, and other data source constituent parts of an output data source”(0012) and “if the input node is a file within a datastore, the operation 304 determines the full path to that file from the datastore level including all the intervening clusters, virtual clusters, folders, etc. An operation 306 parses the full paths to the nodes into its constituent parts” (0046)); 
generating, by the one or more computer processors, a data table associated with the data and the file path (i.e., “the number of assets may be the number of files in a folder, number of tables in a folder, number of folders in virtual cluster, etc.” (0030) and “A multilevel data lineage view engine 230 generates higher level edges between various resources based on one or more lower level edges, such as the edge 224…The resource path store 236 may receive such paths from the resource table 206” (0036); Examiner asserts that the engine 230 generates a table associated with the data and the file path based on the resource path store 236, which receives paths from the resource table 206 (fig. 2)); 
tracking, by the one or more computer processors, the lineage of data into the file folder (i.e., “FIG. 3 illustrates example operations 300 for providing multilevel data lineage view for a datastore…if the input node is a file within a datastore, the operation 304 determines the full path to that file from the datastore level including all the intervening clusters, virtual clusters, folders, etc. An operation 306 parses the full paths to the nodes into its constituent parts” (0046); Examiner asserts that operation 304 corresponds to the tracking of the claimed limitation); 
 tracking, by the one or more computer processors, the lineage of data from the data table (i.e., “determining relations between at least some of these objects at a first level in the hierarchy, and inferring relationships between the objects at a second level in the hierarchy based on the relationships between the objects at the first level in the hierarchy based on a count of assets of constituent parts of the objects at the first level in the hierarchy, wherein the second level is above the first level in the hierarchy”(abstract) and “the multilevel data lineage view system adds higher level edges between constituent parts of the input node 104 with constituent parts of the output node 106 based on number of assets at various constituent parts For example, the number of assets may be the number of files in a folder, number of tables in a folder” (0030)); 
recording, by the one or more computer processors, an overall lineage comprising the lineage of data into the file folder and from the data table (i.e., “Each of the datastores 202, 204 may store a plurality of resources, such as files, tables, etc.” (0034)).  Further, Sudhakar et al. discloses transforming from the input node and input source to target data (0012), but Sudhakar does not explicitly disclose utilizing, by the one or more computer processors, the overall lineage in conjunction with an extract - transform - load (ETL) job.  However, Subramanian et al. discloses utilizing, by the one or more computer processors, the overall lineage in conjunction with an extract - transform - load (ETL) job (i.e., “the plurality of data sources comprise at least one of databases, entity relationship models, extract transform and load ( ETL) systems, extract load and transform (ELT) systems, business intelligence reporting systems, and web configuration systems” (col. 4, lines 1-10) and “solutions do not use advanced machine learning algorithms and techniques to advantageously self-learn using existing data lineage information in conjunction with incident tickets (arising from data object errors) to discover both indirect relationships between data sources and assess the likelihood of failure if a data object is changed” (col. 1, lines 60-67)).
  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include ETL in order to use advanced machine learning algorithms and techniques to advantageously self-learn using existing data lineage information, which for the stated purpose has been well known in the art as evidenced by the teaching of Subramanian et al. (col. 1, lines 60-67).  Both references are in the same field, namely data lineage.


	

  
i.e., “A multilevel data lineage view system disclosed herein allows generating higher level data lineage views” (abstract)).  
Claims 3-5 and 7 are rejected under 35 U.S.C. 103(a) as being unpatentable over Sudhakar et al. (U.S. Pub. 2002/0301910 A1) and Subramanian et al. (U.S. 10,445,170 B1), and further in view of Bhide et al. (U.S. Pub. 2015/0134699 A1).
With respect to claim 3, Sudhakar and Subramanian et al. disclose all limitations recited in claim 1 except for wherein the file folder comprises a HADOOP distributed file system (HDFS) folder.  However, Bhide et al. discloses wherein the file folder comprises a HADOOP distributed file system (HDFS) folder (i.e., “data in distributed files systems, such as an Apache® Hadoop® Distributed File System (HDFS™), is accessed in the form of directories and files.” (0004)).  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include a HADOOP distributed file system in order to provide easy access to the folders and files, which for the stated purpose has been well known in the art as evidenced by the teaching of Bhide et al. (0004). 
With respect to claim 4, Bhide et al. discloses wherein the data table comprises a HIVE data table (i.e., “database-like tools may be used to allow inspection and querying of the data in the files. In certain embodiments, an Apache® Hive® data warehouse software is a database-like layer on top of the files in the Apache® Hadoop® Distributed File System (HDFS™) and allows for querying and managing the data in the files. In certain alternative embodiments, Apache® Hbase™ tables are created on top of the files. (Apache and Hbase are trademarks or registered trademarks of Apache Software Foundation in the United States and/or other countries)” (0018)).  
With respect to claim 5, Bhide et al. discloses wherein tracking the lineage of data into the file folder comprises tracking an ETL job loading data into the file folder (i.e., “These files are present in the folder that is mapped to the target table” (0055) and “The data movement engine 130a . . . 130n may make use of Extract, Transform, and Load (ETL) tools, as well as, change data capture technology. The data movement engine 130a . . . 130n (1) simplifies the data movement process, (2) ensures that all the data is moved (irrespective of presence of special delimiter characters” (0048)).  
With respect to claim 7, Bhide et al. discloses further comprising providing the overall lineage to a user, wherein the file folder comprises an HDFS folder (i.e., “data in distributed files systems, such as an Apache® Hadoop® Distributed File System (HDFS™),” (0004)), and the data table comprises a HIVE data table (i.e., “an Apache® Hive® data warehouse software is a database-like layer on top of the files in the Apache® Hadoop® Distributed File System (HDFS™) and allows for querying and managing the data in the files” (0018)).  
Claim 6 is rejected under 35 U.S.C. 103(a) as being unpatentable over Sudhakar et al. (U.S. Pub. 2002/0301910 A1) and Subramanian et al. (U.S. 10,445,170 B1), and further in view of Marrelli et al. (U.S. Pub. 2015/0134589 A1).
With respect to claim 6, Sudhakar and Subramanian et al. disclose all limitations recited in claim 1 except for wherein tracking the lineage of data from the data table comprises tracking an ETL job reading data from the data table.  However, Marrelli et al. discloses wherein tracking the lineage of data from the data table comprises tracking an ETL job reading data from the data table (i.e., “data flow aggregator module 410 also performs the following functions: (i) determines the number of accumulated record failures per table across all ETL functions; (ii) determines based on table relationships how many business object instances are affected by individual record failures;” (0051)).  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to include tracking 
With respect to claims 8-20, these claims are rejected on the same grounds as claims 1-7 above, since claims 8-20 are similar to claims 1-7 but presented in a different form.
Reference:
U.S. Pub. 2015/0134589 A1 (ETL with lineage from business processes in the target to the table source (0096))
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUNG T VY whose telephone number is (571)272-1954.  The examiner can normally be reached on M-F 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached on (571)272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  





/HUNG T VY/
Primary Examiner, Art Unit 2163
September 15, 2021