DETAILED ACTION
Response to Amendment
The amendment filed on 08/01/22 has been entered. Claims 1, 4-6, 11, 14-15, 18, and 22-33 are pending in the application. It is acknowledged that claims 3, 13, 20, and 21 have been cancelled and that claims 22-33 are newly added.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 5, 11, 18, 22-33 are rejected under 35 U.S.C. 103 as being unpatentable over Reynolds (US 2018/0262864) in view of Oberbreckling (US 2018/0075104) and further in view of Soza (US 2018/0096001) and Kinsella (US 2012/0124079).
Regarding claim 1, Reynolds discloses:
A method for data migration utilizing inference of location attributes from data entries in structured data sets, said method comprising: …receiving, from a plurality of servers in a cluster of servers…, a first set of data from a first data source and a second set of data from a second data source, said structured data sets comprising the first set of data and the second set of data, said first set of data including first location attributes having first location values in a first format, said second set of data including second location attributes having second location values in a second format at least by ([0072] “different entities 405 a, 405 b, and 405 n may each include a computing device 402 (e.g., representative of one or more servers and/or data processors) and one or more data storage devices 403 (e.g., representative of one or more database and/or data store technologies” [0102] “In the event that dataset 762 (and latitude 724 and longitude 726 data) are formatted differently than dataset 702, then latitude 724 and longitude 726 data may be converted to an atomized data format (e.g., compatible with RDF).”) and the cluster of servers are the servers represented by computing device 402; the first data set is one of the plurality of data sets, such as dataset 762 while the second dataset is another one of the data sets, such as dataset 702 which both comprise latitude and longitude data that are formatted differently;
identifying, by the plurality of servers in the cluster, geospatial and temporal information within the structured data sets utilizing machine learning at least by ([0072] “different entities 405 a, 405 b, and 405 n may each include a computing device 402 (e.g., representative of one or more servers and/or data processors) and one or more data storage devices 403 (e.g., representative of one or more database and/or data store technologies” [0106] describes inferring country codes (geospatial) based on machine learning (as in [0124] which also describes classifying dates (temporal information) for machine learning inferences) [0124] “As to the latter, a datatype, a data classification, etc., as well any dataset attribute, may be derived based on predictive inferences (e.g., via machine learning, etc.) using patterns in data 1203 a to 1203 d.”);
in response to a prior determination that the second format of the second location values does not match the first format of the first location values, … identifying, by the plurality of servers in the cluster using the parallel processing and using entity recognition, in the first and second sets of data within the structured data sets, location information pertaining to the first and second location attributes and the first and second location values therein respectively at least by ([0102] “In the event that dataset 762 (and latitude 724 and longitude 726 data) are formatted differently than dataset 702, then latitude 724 and longitude 726 data may be converted to an atomized data format (e.g., compatible with RDF). Thereafter, a supplemental atomized dataset can be formed by linking or integrating atomized latitude 724 and longitude 726 data with atomized population 704 data in an atomized version of dataset 702. Similarly, inference engine 780 may correlate columns 724 and 726 of dataset 722 to columns 761 and 764. As such, earthquake data in row 770 of dataset 762 may be correlated to the city in row 734 (“Springfield, Ill.”) of dataset 722 (or correlated to the city in row 716 of dataset 702 via the linking between columns 704 and 721). The earthquake data may be derived via latitude and longitude coordinate-to-earthquake correlations as supplemental data for dataset 702.”) and the location information pertaining to the first and second location attributes are the “city”, “lat” and “long” attributes;
… determining, by the plurality of servers in the cluster using the parallel processing, implied location information in the first and second sets of data based on the identified location information at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…As such, earthquake data in row 770 of dataset 762 may be correlated to the city in row 734 (“Springfield, Ill.”) of dataset 722 (or correlated to the city in row 716 of dataset 702 via the linking between columns 704 and 721). The earthquake data may be derived via latitude and longitude coordinate-to-earthquake correlations as supplemental data for dataset 702.”) and the implied location information could be the inferred or annotated information determined for each city record as including the name of a city in Illinois; the implied location information could also be the derived earthquake data as given in several examples throughout the reference,
wherein the determined implied location information includes N different sets of information in the first set of data or the second set of data, wherein the N sets of information provide N respective different clues suggesting N respective values of one location attribute, and wherein N is at least 2; …implied value/values… at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.” Further, inference engine 780 may correlate columns 704 and 721 of datasets 702 and 722, respectively. As such, each population number in rows 710 to 716 may be correlated to corresponding latitude 724 and longitude 726 coordinates in rows 730 to 734 of dataset 722. Thus, dataset 702 may be enriched by including latitude 724 and longitude 726 coordinates as a supplemental subset of data.”) and the listing of cities in column 704 provides more than 2 clues that suggest the additional attribute “IL” because they are all cities within Illinois, as well as further inferring the latitude and longitude of these cities and enriching dataset 702 to also include the latitude and longitude as inferred; that is, the suggesting of at least 2 respective values of one location attribute is the inference of the state “IL” as well as the latitude and longitude attributes for each of the cities listed in 704, respectively, and as shown in Fig. 7;
deriving, by the plurality of servers in the cluster, location values based on the identified location information and the determined implied location information using consolidation rules, resulting in a final set of location attributes and associated location values for the data entries at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the final set of location attributes are the cities listed in 721 of table 722 which included annotated portion “IL” for each city from table 702,
wherein the first set of data or the second set of data includes a file … at least by ([0047] “a file 105 is selected and dragged via pointer element 107 (e.g., a pointer device, or any other interface selection tool, including a finger) into file upload interface 106. Computing device 109 a may detect a data signal generated by the implementation of “create dataset” input 141, which may initiate creation of the dataset. As an example, consider that file 105 may include data formatted in a particular data arrangement, such as formatted as a CSV file, a TSV, an XLS file, or the like. In one example, a set of data 104 from file 105 may be uploaded, responsive to dragging icon of file 105 to upload interface 106, into collaborative dataset consolidation system 110, which, in turn, may generate an atomized dataset 142 a.”) and the files are any of the CSV, TSV, or XLS files,
wherein the file is formatted as a table at least by ([0060] “activation of the “create dataset” user input can initiate creation of an atomized dataset based on a set of data, which may include, for example, raw data in data file (e.g., a tabular data file, such as a XLS file, etc.).”);
and wherein each respective implied value of the N respective implied values is one or more data items inferred from the filename or inferred from content in a cell in the table at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the implied values are annotated portion “IL” determined based on each city from table 702 (inferred from content in a cell in the table);
associating, by plurality of servers in the cluster, the final set of location attributes with the data entries at least by ([0102] “Thereafter, a supplemental atomized dataset can be formed by linking or integrating atomized latitude 724 and longitude 726 data with atomized population 704 data in an atomized version of dataset 702. Similarly, inference engine 780 may correlate columns 724 and 726 of dataset 722 to columns 761 and 764.”) and the dataset 702 is enriched with location information, including the latitude and longitude values, from dataset 722 by associating and linking their attributes; and columns 724 and 726 of dataset 722 are associated with columns 761 and 764, which are now associated with column 704 of dataset 702 to form final consolidated datasets as shown in Fig. 7;
… transforming, by the plurality of servers in the cluster using the parallel processing, the first and second location attributes and the first and second location values therein respectively, said …transforming utilizing the implied location information at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the transforming could be the annotating of the cities 704 with “IL” to form city records 721, which include this annotative portion;
said simultaneously transforming resulting in a format of the transformed first location values and a format of the transformed second location values being matched, said … transforming further resulting in the first and second data sets being changed to include the transformed first and second location attributes and the first and second location values therein respectively at least by ([0060] “Thereafter, a supplemental atomized dataset can be formed by linking or integrating atomized latitude 724 and longitude 726 data with atomized population 704 data in an atomized version of dataset 702. Similarly, inference engine 780 may correlate columns 724 and 726 of dataset 722 to columns 761 and 764. As such, earthquake data in row 770 of dataset 762 may be correlated to the city in row 734 (“Springfield, Ill.”) of dataset 722 (or correlated to the city in row 716 of dataset 702 via the linking between columns 704 and 721). The earthquake data may be derived via lat/long coordinate-to-earthquake correlations as supplemental data for dataset 702. Thus, new links (or triples) may be formed to supplement population data 704 with earthquake magnitude data 768” [0065] “normalizing the data set to create a normalized data set includes modifying the data set having one format to an adjusted format as a normalized data set, the adjusted format being different from the format. A data set may be normalized by identifying one or more columns of data in the data set, and modifying a format of the data corresponding to the columns to the same format. 
For example, data having different formatted dates in a data set may be normalized by changing the formats to a common format for the dates that can be processed by profile engine 326.” [0094] “For example, the transform engine 322 can generate a transformation script to transform a column of dates based on a recommendation from recommendation engine 308 to modify, or convert, the formats of the dates in the column.” [0131] “transforms listed in panel 404 may have been applied at the direction of the user (e.g., in response to an instruction to apply the transform) or may have been applied automatically”);
automatically merging, by the plurality of servers in the cluster using the parallel processing, the changed first data set and the changed second data set to form a merged data set at least by ([0099] “dataset enrichment manager 636, according to some examples, may be configured to identify correlated datasets based on correlated attributes as determined, for example, by attribute correlator 663. The correlated attributes, as generated by attribute correlator 663, may facilitate the use of derived data or link-related data, as attributes, to form associate, combine, join, or merge datasets to form collaborative datasets”) and the merged data set corresponds to the collaborative datasets;
Reynolds fails to explicitly disclose “…automatically and simultaneously…; … using parallel processing; wherein said deriving location values comprises forming a Venn diagram comprising an intersection of the N respective … values of the one location attribute and determining that the intersection includes only one … value of the N respective … values, and wherein the final set of location attributes and associated location values includes the one location attribute whose associated value is the one … value included in the intersection; simultaneously transforming…; … a file having a filename; and automatically and simultaneously transmitting, by the plurality of servers in the cluster using the parallel processing, the merged data set to a plurality of computing systems for processing by an application in each computing system; wherein the plurality of servers in the cluster implement the parallel processing by working collectively as a single system in a Hadoop environment.”
However, Oberbreckling teaches the following limitations, “…automatically and simultaneously…; … using parallel processing” at least by ([0048] “Although a flowchart may describe the operations as a sequential process, many of the operations may be performed in parallel or concurrently.”);
and automatically and simultaneously transmitting, by the plurality of servers in the cluster using the parallel processing, the merged data set to a plurality of computing systems for processing by an application in each computing system at least by ([0127] “a transform engine can transform (e.g., repair and/or enrich) the normalized data based on the metadata. The resulting enriched data can be provided to the publish engine to be sent to one or more data targets”);
wherein the plurality of servers in the cluster implement the parallel processing by working collectively as a single system in a Hadoop environment at least by ([0041] “The data sources may be sampled, and the sampled data analyzed for enrichment, making large data sets more manageable. The identified data can be received and added to a distributed storage system (such as a Hadoop Distributed Storage (HDFS) system) accessible to the data enrichment service.” [0121] “Data enrichment service 302 may request data to be processed from data sources. The data sources may be sampled and the sampled data may be stored in a distributed storage system (such as a Hadoop Distributed Storage (HDFS) system) accessible to data enrichment service”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Oberbreckling into the teaching of Reynolds because the references similarly disclose the associating of disparate datasets. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Reynolds to further include the processing of the merged dataset by applications and the use of an HDFS as in Oberbreckling in order to provide the users of the applications with a unified view of the datasets.
Reynolds and Oberbreckling fail to explicitly disclose “wherein said deriving location values comprises forming a Venn diagram comprising an intersection of the N respective … values of the one location attribute and determining that the intersection includes only one … value of the N respective … values, and wherein the final set of location attributes and associated location values includes the one location attribute whose associated value is the one … value included in the intersection; … a file having a filename.”
However, Soza teaches the above limitations at least by ([0273] “The statistics computed are summarised below, with reference to FIG. 14B which shows a Venn diagram illustrating the overlap between column values for two columns A and B. Here, “a” represents the set of distinct valid (non-null and not excluded) values that appear in column A, whilst “b” represents the set of distinct valid (non-null and not excluded) values of column B. Intersection “c” represents the set of unique intersecting values; that is, distinct valid values that are common to (appear in both) column A and column B” [0296] discloses that the values of different columns from the different sources can have different formats that are standardized and that they can include time/date values (location values)) and any of the distinct values that are common to both column A and column B is the only one value of the set of unique intersecting values that are common to both columns. That is, [0062] of the applicant’s specification describes that the location attributes/values can be dates or date values in a date format, as analogously mentioned in Soza, which states that the different columns/values can be times/dates in different formats that can be standardized.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Soza into the teaching of Reynolds and Oberbreckling because the references similarly disclose the associating of disparate datasets. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the forming of a Venn diagram that shows intersecting values as in Soza in order to be able to display the overlapping data in a clearer and more concise graphical manner.
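For illustration only, the intersection-based consolidation recited in the limitation mapped to Soza may be sketched as follows. The sketch is not taken from Soza, from the other cited references, or from the applicant's specification; the function name consolidate_by_intersection, the clue sets, and the state abbreviations are hypothetical, and the sketch assumes that each of the N clues has already been reduced to a set of candidate values for one location attribute.

```python
from typing import Iterable, Optional, Set


def consolidate_by_intersection(candidate_sets: Iterable[Set[str]]) -> Optional[str]:
    """Return the single value common to every candidate set, else None.

    Hypothetical helper: each set holds the values that one clue suggests
    for the same location attribute.
    """
    sets = [set(s) for s in candidate_sets]
    if len(sets) < 2:
        return None  # the claim language requires N to be at least 2
    common = set.intersection(*sets)
    # Accept the result only when the intersection holds exactly one value.
    return next(iter(common)) if len(common) == 1 else None


# Hypothetical clues for the state of a record naming "Springfield".
clue_from_city_name = {"IL", "NE", "MA"}   # states that contain a Springfield
clue_from_neighboring_rows = {"IL"}        # other rows list Illinois cities
state = consolidate_by_intersection([clue_from_city_name, clue_from_neighboring_rows])
print(state)
```

Under these assumptions the two clue sets overlap in exactly one value, so that value is returned; when the overlap is empty or holds more than one value, the sketch returns None rather than choosing among candidates.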
Reynolds, Oberbreckling, and Soza fail to explicitly disclose “… a file having a filename.”
However, Kinsella teaches the above limitations at least by ([0038] “If the current geographical location data matches a geofence, the processor 300 can generate a filename comprising the label of the geofence at block 465. The processor 300 can receive camera data (block 430) and apply the filename comprising the label of the geofence to the camera data (block 435). Thus, when camera data is received at a location within the boundary of the geofence, the processor 300 can determine that the camera data is being captured within the geofence and can apply a filename comprising the geofence label to the camera data. With such a filename, the user of the mobile device can easily identify the content of the camera data as being associated with the geofence (for example, camera data having a filename containing the geofence label “School” can indicate that the camera data includes images of schoolmates, teachers, and classrooms)”) and the implied value is the label of the geofence, or location of the camera data, obtained from the file name.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Kinsella into the teaching of Reynolds, Oberbreckling, and Soza because the references similarly disclose the processing of file data. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the identifying of location information from filenames as in Kinsella in order to allow the user to “easily identify the file for retrieval and organization without having to rename the file after the camera data is captured” (Kinsella, [0040]).
As per claim 5, claim 1 is incorporated, Reynolds further discloses:
wherein said determining implied location information includes inferring using at least one of dictionaries and geocoding services, the implied location information using at least one of: a missing latitude/longitude, a missing address, a missing city, a missing state, a missing country, and a missing time zone at least by ([0132] “FIG. 13 may include structures and/or functions as similarly-named or similarly-numbered elements depicted in other drawings, or as otherwise described herein, in accordance with one or more examples. As shown, the dataset may be presented in a tabular format arranged in rows of data in accordance with a specific time (e.g., column 1303 data). The dataset is shown to include column data 1306 a (i.e., latitude coordinates), column data 1306 b (i.e., longitude coordinates),” [0133] “Hence, the “place” of an earthquake can be calculated (e.g., using a data derivation calculator or other logic) to determine a geographic location based on latitude and longitude data of an earthquake event (e.g., column data 1306 a and 1306 b) at a distance 1319 from a location of a nearest city. For example, an earthquake event and its data in row 1305 may include derived distance data of “16 km,” as a distance 1319, from a nearest city “Kaikoura, New Zealand” in derived row portion 1305 a.”) and Fig. 13 shows another example of deriving or inferring attributes and data such that the derived or inferred country data 1392 corresponding to the latitude/longitude, date, and time in the set of data 1304 is merged together to form a final set of location attributes.
Regarding claim 11, Reynolds discloses:
A computer program product comprising: a computer-readable storage device and a computer-readable program code stored in the computer-readable storage device, said computer readable program code containing instructions executable by one or more processors of a computer system to implement a method for data migration utilizing inference of location attributes from data entries in structured data sets, said method comprising:  …receiving, from a plurality of servers in a cluster of servers…, a first set of data from a first data source and a second set of data from a second data source, said structured data sets comprising the first set of data and the second set of data, said first set of data including first location attributes having first location values in a first format, said second set of data including second location attributes having second location values in a second format at least by ([0072] “different entities 405 a, 405 b, and 405 n may each include a computing device 402 (e.g., representative of one or more servers and/or data processors) and one or more data storage devices 403 (e.g., representative of one or more database and/or data store technologies” [0102] “In the event that dataset 762 (and latitude 724 and longitude 726 data) are formatted differently than dataset 702, then latitude 724 and longitude 726 data may be converted to an atomized data format (e.g., compatible with RDF).”) and the cluster of servers are the servers represented by computing device 402; the first data set is one of the plurality of data sets, such as dataset 762 while the second dataset is another one of the data sets, such as dataset 702 which both comprise latitude and longitude data that are formatted differently;
identifying, by the plurality of servers in the cluster, geospatial and temporal information within the structured data sets utilizing machine learning at least by ([0072] “different entities 405 a, 405 b, and 405 n may each include a computing device 402 (e.g., representative of one or more servers and/or data processors) and one or more data storage devices 403 (e.g., representative of one or more database and/or data store technologies” [0106] describes inferring country codes (geospatial) based on machine learning (as in [0124] which also describes classifying dates (temporal information) for machine learning inferences) [0124] “As to the latter, a datatype, a data classification, etc., as well any dataset attribute, may be derived based on predictive inferences (e.g., via machine learning, etc.) using patterns in data 1203 a to 1203 d.”);
in response to a prior determination that the second format of the second location values does not match the first format of the first location values, … identifying, by the plurality of servers in the cluster using the parallel processing and using entity recognition, in the first and second sets of data within the structured data sets, location information pertaining to the first and second location attributes and the first and second location values therein respectively at least by ([0102] “In the event that dataset 762 (and latitude 724 and longitude 726 data) are formatted differently than dataset 702, then latitude 724 and longitude 726 data may be converted to an atomized data format (e.g., compatible with RDF). Thereafter, a supplemental atomized dataset can be formed by linking or integrating atomized latitude 724 and longitude 726 data with atomized population 704 data in an atomized version of dataset 702. Similarly, inference engine 780 may correlate columns 724 and 726 of dataset 722 to columns 761 and 764. As such, earthquake data in row 770 of dataset 762 may be correlated to the city in row 734 (“Springfield, Ill.”) of dataset 722 (or correlated to the city in row 716 of dataset 702 via the linking between columns 704 and 721). The earthquake data may be derived via latitude and longitude coordinate-to-earthquake correlations as supplemental data for dataset 702.”) and the location information pertaining to the first and second location attributes are the “city”, “lat” and “long” attributes;
… determining, by the plurality of servers in the cluster using the parallel processing, implied location information in the first and second sets of data based on the identified location information at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…As such, earthquake data in row 770 of dataset 762 may be correlated to the city in row 734 (“Springfield, Ill.”) of dataset 722 (or correlated to the city in row 716 of dataset 702 via the linking between columns 704 and 721). The earthquake data may be derived via latitude and longitude coordinate-to-earthquake correlations as supplemental data for dataset 702.”) and the implied location information could be the inferred or annotated information determined for each city record as including the name of a city in Illinois; the implied location information could also be the derived earthquake data as given in several examples throughout the reference,
wherein the determined implied location information includes N different sets of information in the first set of data or the second set of data, wherein the N sets of information provide N respective different clues suggesting N respective values of one location attribute, and wherein N is at least 2; …implied value/values… at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.” Further, inference engine 780 may correlate columns 704 and 721 of datasets 702 and 722, respectively. As such, each population number in rows 710 to 716 may be correlated to corresponding latitude 724 and longitude 726 coordinates in rows 730 to 734 of dataset 722. Thus, dataset 702 may be enriched by including latitude 724 and longitude 726 coordinates as a supplemental subset of data.”) and the listing of cities in column 704 provides more than 2 clues that suggest the additional attribute “IL” because they are all cities within Illinois, as well as further inferring the latitude and longitude of these cities and enriching dataset 702 to also include the latitude and longitude as inferred; that is, the suggesting of at least 2 respective values of one location attribute is the inference of the state “IL” as well as the latitude and longitude attributes for each of the cities listed in 704, respectively, and as shown in Fig. 7;
deriving, by the plurality of servers in the cluster, location values based on the identified location information and the determined implied location information using consolidation rules, resulting in a final set of location attributes and associated location values for the data entries at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the final set of location attributes are the cities listed in 721 of table 722 which included annotated portion “IL” for each city from table 702,
wherein the first set of data or the second set of data includes a file … at least by ([0047] “a file 105 is selected and dragged via pointer element 107 (e.g., a pointer device, or any other interface selection tool, including a finger) into file upload interface 106. Computing device 109 a may detect a data signal generated by the implementation of “create dataset” input 141, which may initiate creation of the dataset. As an example, consider that file 105 may include data formatted in a particular data arrangement, such as formatted as a CSV file, a TSV, an XLS file, or the like. In one example, a set of data 104 from file 105 may be uploaded, responsive to dragging icon of file 105 to upload interface 106, into collaborative dataset consolidation system 110, which, in turn, may generate an atomized dataset 142 a.”) and the files are any of the CSV, TSV, or XLS files,
wherein the file is formatted as a table at least by ([0060] “activation of the “create dataset” user input can initiate creation of an atomized dataset based on a set of data, which may include, for example, raw data in data file (e.g., a tabular data file, such as a XLS file, etc.).”);
and wherein each respective implied value of the N respective implied values is one or more data items inferred from the filename or inferred from content in a cell in the table at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the implied values are annotated portion “IL” determined based on each city from table 702 (inferred from content in a cell in the table);
associating, by the plurality of servers in the cluster, the final set of location attributes with the data entries at least by ([0102] “Thereafter, a supplemental atomized dataset can be formed by linking or integrating atomized latitude 724 and longitude 726 data with atomized population 704 data in an atomized version of dataset 702. Similarly, inference engine 780 may correlate columns 724 and 726 of dataset 722 to columns 761 and 764.”) and the dataset 702 is enriched with location information, including the latitude and longitude values, from dataset 722 by associating and linking their attributes; and columns 724 and 726 of dataset 722 are associated with columns 761 and 764, which are now associated with column 704 of dataset 702 to form final consolidated datasets as shown in Fig. 7;
… transforming, by the plurality of servers in the cluster using the parallel processing, the first and second location attributes and the first and second location values therein respectively, said …transforming utilizing the implied location information at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the transforming could be the annotating of the cities 704 with “IL” to form city record 721, which includes this annotative portion;
said simultaneously transforming resulting in a format of the transformed first location values and a format of the transformed second location values being matched, said … transforming further resulting in the first and second data sets being changed to include the transformed first and second location attributes and the first and second location values therein respectively at least by ([0102] “Thereafter, a supplemental atomized dataset can be formed by linking or integrating atomized latitude 724 and longitude 726 data with atomized population 704 data in an atomized version of dataset 702. Similarly, inference engine 780 may correlate columns 724 and 726 of dataset 722 to columns 761 and 764. As such, earthquake data in row 770 of dataset 762 may be correlated to the city in row 734 (“Springfield, Ill.”) of dataset 722 (or correlated to the city in row 716 of dataset 702 via the linking between columns 704 and 721). The earthquake data may be derived via lat/long coordinate-to-earthquake correlations as supplemental data for dataset 702. Thus, new links (or triples) may be formed to supplement population data 704 with earthquake magnitude data 768” [0065] “normalizing the data set to create a normalized data set includes modifying the data set having one format to an adjusted format as a normalized data set, the adjusted format being different from the format. A data set may be normalized by identifying one or more columns of data in the data set, and modifying a format of the data corresponding to the columns to the same format. For example, data having different formatted dates in a data set may be normalized by changing the formats to a common format for the dates that can be processed by profile engine 326.” [0094] “For example, the transform engine 322 can generate a transformation script to transform a column of dates based on a recommendation from recommendation engine 308 to modify, or convert, the formats of the dates in the column.” [0131] “transforms listed in panel 404 may have been applied at the direction of the user (e.g., in response to an instruction to apply the transform) or may have been applied automatically”);
automatically merging, by the plurality of servers in the cluster using the parallel processing, the changed first data set and the changed second data set to form a merged data set at least by ([0099] “dataset enrichment manager 636, according to some examples, may be configured to identify correlated datasets based on correlated attributes as determined, for example, by attribute correlator 663. The correlated attributes, as generated by attribute correlator 663, may facilitate the use of derived data or link-related data, as attributes, to form associate, combine, join, or merge datasets to form collaborative datasets”) and the merged data set corresponds to the collaborative datasets;
Reynolds fails to explicitly disclose “…automatically and simultaneously…; … using parallel processing; wherein said deriving location values comprises forming a Venn diagram comprising an intersection of the N respective … values of the one location attribute and determining that the intersection includes only one … value of the N respective … values, and wherein the final set of location attributes and associated location values includes the one location attribute whose associated value is the one … value included in the intersection; simultaneously transforming…; … a file having a filename; and automatically and simultaneously transmitting, by the plurality of servers in the cluster using the parallel processing, the merged data set to a plurality of computing systems for processing by an application in each computing system; wherein the plurality of servers in the cluster implement the parallel processing by working collectively as a single system in a Hadoop environment”.
However, Oberbreckling teaches the following limitations, “…automatically and simultaneously…; … using parallel processing at least by ([0048] “Although a flowchart may describe the operations as a sequential process, many of the operations may be performed in parallel or concurrently.”);
and automatically and simultaneously transmitting, by the plurality of servers in the cluster using the parallel processing, the merged data set to a plurality of computing systems for processing by an application in each computing system at least by ([0127] “a transform engine can transform (e.g., repair and/or enrich) the normalized data based on the metadata. The resulting enriched data can be provided to the publish engine to be sent to one or more data targets”);
wherein the plurality of servers in the cluster implement the parallel processing by working collectively as a single system in a Hadoop environment at least by ([0041] “The data sources may be sampled, and the sampled data analyzed for enrichment, making large data sets more manageable. The identified data can be received and added to a distributed storage system (such as a Hadoop Distributed Storage (HDFS) system) accessible to the data enrichment service.” [0121] “Data enrichment service 302 may request data to be processed from data sources. The data sources may be sampled and the sampled data may be stored in a distributed storage system (such as a Hadoop Distributed Storage (HDFS) system) accessible to data enrichment service”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Oberbreckling into the teaching of Reynolds because the references similarly disclose the associating of disparate datasets. Consequently, one of ordinary skill in the art would be motivated to modify the system as in Reynolds to further include the processing of the merged dataset by applications and the use of an HDFS as in Oberbreckling in order to provide the users of the applications with a unified view of the datasets.
Reynolds and Oberbreckling fail to explicitly disclose “wherein said deriving location values comprises forming a Venn diagram comprising an intersection of the N respective … values of the one location attribute and determining that the intersection includes only one … value of the N respective … values, and wherein the final set of location attributes and associated location values includes the one location attribute whose associated value is the one … value included in the intersection; … a file having a filename”.
However, Soza teaches the above limitations at least by ([0273] “The statistics computed are summarised below, with reference to FIG. 14B which shows a Venn diagram illustrating the overlap between column values for two columns A and B. Here, “a” represents the set of distinct valid (non-null and not excluded) values that appear in column A, whilst “b” represents the set of distinct valid (non-null and not excluded) values of column B. Intersection “c” represents the set of unique intersecting values; that is, distinct valid values that are common to (appear in both) column A and column B” [0296] discloses that the values of different columns from the different sources can have different formats that are standardized and that they can include time/date values (location values)) and any distinct value that is common to both column A and column B is the only one value in the set of unique intersecting values common to both columns. That is, [0062] of the applicant’s specification describes that the location attributes/values can be dates or date values in a date format, as analogously mentioned in Soza, which states that the different columns/values can be times/dates in different formats that can be standardized.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Soza into the teaching of Reynolds and Oberbreckling because the references similarly disclose the associating of disparate datasets. Consequently, one of ordinary skill in the art would be motivated to modify the system as in the combination of references to further include the forming of a Venn diagram that shows intersecting values as in Soza in order to be able to display the overlapping data in a clearer and more concise graphical manner.
Reynolds, Oberbreckling, and Soza fail to explicitly disclose “… a file having a filename”.
However, Kinsella teaches the above limitations at least by ([0038] “If the current geographical location data matches a geofence, the processor 300 can generate a filename comprising the label of the geofence at block 465. The processor 300 can receive camera data (block 430) and apply the filename comprising the label of the geofence to the camera data (block 435). Thus, when camera data is received at a location within the boundary of the geofence, the processor 300 can determine that the camera data is being captured within the geofence and can apply a filename comprising the geofence label to the camera data. With such a filename, the user of the mobile device can easily identify the content of the camera data as being associated with the geofence (for example, camera data having a filename containing the geofence label “School” can indicate that the camera data includes images of schoolmates, teachers, and classrooms)”) and the implied value is the label of the geofence, or location of the camera data, obtained from the file name.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Kinsella into the teaching of Reynolds, Oberbreckling, and Soza because the references similarly disclose the processing of file data. Consequently, one of ordinary skill in the art would be motivated to modify the system as in the combination of references to further include the identifying of location information from filenames as in Kinsella in order to allow the user to “easily identify the file for retrieval and organization without having to rename the file after the camera data is captured” (Kinsella, [0040]).
Regarding claim 18, Reynolds discloses:
A computer system, comprising: one or more processors; a memory coupled to the one or more processors; and a computer readable storage device coupled to the one or more processors, said storage device containing instructions executable by the one or more processors via the memory to implement a method for data migration utilizing inference of location attributes from data entries in structured data sets, said method comprising: …receiving, from a plurality of servers in a cluster of servers…, a first set of data from a first data source and a second set of data from a second data source, said structured data sets comprising the first set of data and the second set of data, said first set of data including first location attributes having first location values in a first format, said second set of data including second location attributes having second location values in a second format at least by ([0072] “different entities 405 a, 405 b, and 405 n may each include a computing device 402 (e.g., representative of one or more servers and/or data processors) and one or more data storage devices 403 (e.g., representative of one or more database and/or data store technologies” [0102] “In the event that dataset 762 (and latitude 724 and longitude 726 data) are formatted differently than dataset 702, then latitude 724 and longitude 726 data may be converted to an atomized data format (e.g., compatible with RDF).”) and the cluster of servers are the servers represented by computing device 402; the first data set is one of the plurality of data sets, such as dataset 762 while the second dataset is another one of the data sets, such as dataset 702 which both comprise latitude and longitude data that are formatted differently;
identifying, by the plurality of servers in the cluster, geospatial and temporal information within the structured data sets utilizing machine learning at least by ([0072] “different entities 405 a, 405 b, and 405 n may each include a computing device 402 (e.g., representative of one or more servers and/or data processors) and one or more data storage devices 403 (e.g., representative of one or more database and/or data store technologies” [0106] describes inferring country codes (geospatial) based on machine learning (as in [0124] which also describes classifying dates (temporal information) for machine learning inferences) [0124] “As to the latter, a datatype, a data classification, etc., as well any dataset attribute, may be derived based on predictive inferences (e.g., via machine learning, etc.) using patterns in data 1203 a to 1203 d.”);
in response to a prior determination that the second format of the second location values does not match the first format of the first location values, … identifying, by the plurality of servers in the cluster using the parallel processing and using entity recognition, in the first and second sets of data within the structured data sets, location information pertaining to the first and second location attributes and the first and second location values therein respectively at least by ([0102] “In the event that dataset 762 (and latitude 724 and longitude 726 data) are formatted differently than dataset 702, then latitude 724 and longitude 726 data may be converted to an atomized data format (e.g., compatible with RDF). Thereafter, a supplemental atomized dataset can be formed by linking or integrating atomized latitude 724 and longitude 726 data with atomized population 704 data in an atomized version of dataset 702. Similarly, inference engine 780 may correlate columns 724 and 726 of dataset 722 to columns 761 and 764. As such, earthquake data in row 770 of dataset 762 may be correlated to the city in row 734 (“Springfield, Ill.”) of dataset 722 (or correlated to the city in row 716 of dataset 702 via the linking between columns 704 and 721). The earthquake data may be derived via latitude and longitude coordinate-to-earthquake correlations as supplemental data for dataset 702.”) and the location information pertaining to the first and second location attributes are the “city”, “lat” and “long” attributes;
… determining, by the plurality of servers in the cluster using the parallel processing, implied location information in the first and second sets of data based on the identified location information at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…As such, earthquake data in row 770 of dataset 762 may be correlated to the city in row 734 (“Springfield, Ill.”) of dataset 722 (or correlated to the city in row 716 of dataset 702 via the linking between columns 704 and 721). The earthquake data may be derived via latitude and longitude coordinate-to-earthquake correlations as supplemental data for dataset 702.”) and the implied location information could be the inferred or annotated information determined for each city record as including the name of a city in Illinois; the implied location information could also be the derived earthquake data as given in several examples throughout the reference,
wherein the determined implied location information includes N different sets of information in the first set of data or the second set of data, wherein the N sets of information provide N respective different clues suggesting N respective values of one location attribute, and wherein N is at least 2; …implied value/values… at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.” Further, inference engine 780 may correlate columns 704 and 721 of datasets 702 and 722, respectively. As such, each population number in rows 710 to 716 may be correlated to corresponding latitude 724 and longitude 726 coordinates in rows 730 to 734 of dataset 722. Thus, dataset 702 may be enriched by including latitude 724 and longitude 726 coordinates as a supplemental subset of data.”) and the listing of cities in column 704 provides more than 2 clues that suggest the additional attribute “IL” because they are all cities within Illinois, as well as further inferring the latitude and longitude of these cities and enriching dataset 702 to also include the latitude and longitude as inferred; that is, the suggesting of at least 2 respective values of one location attribute is the inference of the state “IL” as well as the latitude and longitude attributes for each of the cities listed in 704, respectively, and as shown in Fig. 7;
deriving, by the plurality of servers in the cluster, location values based on the identified location information and the determined implied location information using consolidation rules, resulting in a final set of location attributes and associated location values for the data entries at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the final set of location attributes are the cities listed in 721 of table 722 which included annotated portion “IL” for each city from table 702,
wherein the first set of data or the second set of data includes a file … at least by ([0047] “a file 105 is selected and dragged via pointer element 107 (e.g., a pointer device, or any other interface selection tool, including a finger) into file upload interface 106. Computing device 109 a may detect a data signal generated by the implementation of “create dataset” input 141, which may initiate creation of the dataset. As an example, consider that file 105 may include data formatted in a particular data arrangement, such as formatted as a CSV file, a TSV, an XLS file, or the like. In one example, a set of data 104 from file 105 may be uploaded, responsive to dragging icon of file 105 to upload interface 106, into collaborative dataset consolidation system 110, which, in turn, may generate an atomized dataset 142 a.”) and the files are any of the CSV, TSV, or XLS files,
wherein the file is formatted as a table at least by ([0060] “activation of the “create dataset” user input can initiate creation of an atomized dataset based on a set of data, which may include, for example, raw data in data file (e.g., a tabular data file, such as a XLS file, etc.).”);
and wherein each respective implied value of the N respective implied values is one or more data items inferred from the filename or inferred from content in a cell in the table at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the implied values are annotated portion “IL” determined based on each city from table 702 (inferred from content in a cell in the table);
associating, by the plurality of servers in the cluster, the final set of location attributes with the data entries at least by ([0102] “Thereafter, a supplemental atomized dataset can be formed by linking or integrating atomized latitude 724 and longitude 726 data with atomized population 704 data in an atomized version of dataset 702. Similarly, inference engine 780 may correlate columns 724 and 726 of dataset 722 to columns 761 and 764.”) and the dataset 702 is enriched with location information, including the latitude and longitude values, from dataset 722 by associating and linking their attributes; and columns 724 and 726 of dataset 722 are associated with columns 761 and 764, which are now associated with column 704 of dataset 702 to form final consolidated datasets as shown in Fig. 7;
… transforming, by the plurality of servers in the cluster using the parallel processing, the first and second location attributes and the first and second location values therein respectively, said …transforming utilizing the implied location information at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the transforming could be the annotating of the cities 704 with “IL” to form city record 721, which includes this annotative portion;
said simultaneously transforming resulting in a format of the transformed first location values and a format of the transformed second location values being matched, said … transforming further resulting in the first and second data sets being changed to include the transformed first and second location attributes and the first and second location values therein respectively at least by ([0102] “Thereafter, a supplemental atomized dataset can be formed by linking or integrating atomized latitude 724 and longitude 726 data with atomized population 704 data in an atomized version of dataset 702. Similarly, inference engine 780 may correlate columns 724 and 726 of dataset 722 to columns 761 and 764. As such, earthquake data in row 770 of dataset 762 may be correlated to the city in row 734 (“Springfield, Ill.”) of dataset 722 (or correlated to the city in row 716 of dataset 702 via the linking between columns 704 and 721). The earthquake data may be derived via lat/long coordinate-to-earthquake correlations as supplemental data for dataset 702. Thus, new links (or triples) may be formed to supplement population data 704 with earthquake magnitude data 768” [0065] “normalizing the data set to create a normalized data set includes modifying the data set having one format to an adjusted format as a normalized data set, the adjusted format being different from the format. A data set may be normalized by identifying one or more columns of data in the data set, and modifying a format of the data corresponding to the columns to the same format. For example, data having different formatted dates in a data set may be normalized by changing the formats to a common format for the dates that can be processed by profile engine 326.” [0094] “For example, the transform engine 322 can generate a transformation script to transform a column of dates based on a recommendation from recommendation engine 308 to modify, or convert, the formats of the dates in the column.” [0131] “transforms listed in panel 404 may have been applied at the direction of the user (e.g., in response to an instruction to apply the transform) or may have been applied automatically”);
automatically merging, by the plurality of servers in the cluster using the parallel processing, the changed first data set and the changed second data set to form a merged data set at least by ([0099] “dataset enrichment manager 636, according to some examples, may be configured to identify correlated datasets based on correlated attributes as determined, for example, by attribute correlator 663. The correlated attributes, as generated by attribute correlator 663, may facilitate the use of derived data or link-related data, as attributes, to form associate, combine, join, or merge datasets to form collaborative datasets”) and the merged data set corresponds to the collaborative datasets;
Reynolds fails to explicitly disclose “…automatically and simultaneously…; … using parallel processing; wherein said deriving location values comprises forming a Venn diagram comprising an intersection of the N respective … values of the one location attribute and determining that the intersection includes only one … value of the N respective … values, and wherein the final set of location attributes and associated location values includes the one location attribute whose associated value is the one … value included in the intersection; simultaneously transforming…; … a file having a filename; and automatically and simultaneously transmitting, by the plurality of servers in the cluster using the parallel processing, the merged data set to a plurality of computing systems for processing by an application in each computing system; wherein the plurality of servers in the cluster implement the parallel processing by working collectively as a single system in a Hadoop environment”.
However, Oberbreckling teaches the following limitations: “…automatically and simultaneously…; … using parallel processing” at least by ([0048] “Although a flowchart may describe the operations as a sequential process, many of the operations may be performed in parallel or concurrently.”);
and automatically and simultaneously transmitting, by the plurality of servers in the cluster using the parallel processing, the merged data set to a plurality of computing systems for processing by an application in each computing system at least by ([0127] “a transform engine can transform (e.g., repair and/or enrich) the normalized data based on the metadata. The resulting enriched data can be provided to the publish engine to be sent to one or more data targets”);
wherein the plurality of servers in the cluster implement the parallel processing by working collectively as a single system in a Hadoop environment at least by ([0041] “The data sources may be sampled, and the sampled data analyzed for enrichment, making large data sets more manageable. The identified data can be received and added to a distributed storage system (such as a Hadoop Distributed Storage (HDFS) system) accessible to the data enrichment service.” [0121] “Data enrichment service 302 may request data to be processed from data sources. The data sources may be sampled and the sampled data may be stored in a distributed storage system (such as a Hadoop Distributed Storage (HDFS) system) accessible to data enrichment service”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Oberbreckling into the teaching of Reynolds because the references similarly disclose the associating of disparate datasets. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Reynolds to further include the processing of the merged dataset by applications and using an HDFS as in Oberbreckling in order to provide the users of the applications with a unified view of the datasets.
Reynolds, Oberbreckling fail to explicitly disclose “wherein said deriving location values comprises forming a Venn diagram comprising an intersection of the N respective … values of the one location attribute and determining that the intersection includes only one … value of the N respective … values, and wherein the final set of location attributes and associated location values includes the one location attribute whose associated value is the one … value included in the intersection; … a file having a filename”
However, Soza teaches the above limitations at least by ([0273] “The statistics computed are summarised below, with reference to FIG. 14B which shows a Venn diagram illustrating the overlap between column values for two columns A and B. Here, “a” represents the set of distinct valid (non-null and not excluded) values that appear in column A, whilst “b” represents the set of distinct valid (non-null and not excluded) values of column B. Intersection “c” represents the set of unique intersecting values; that is, distinct valid values that are common to (appear in both) column A and column B” [0296] discloses that the values of different columns from the different sources can have different formats that are standardized and that they can include time/date values (location values)) and any of the distinct values that are common to both column A and column B corresponds to the claimed only one value within the set of unique intersecting values that are common to both columns. That is, [0062] of the applicant’s specification describes that the location attribute/values can be dates or date values in a date format as analogously mentioned in Soza which states that the different columns/values can be time/dates in different formats that can be standardized.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Soza into the teaching of Reynolds, Oberbreckling because the references similarly disclose the associating of disparate datasets. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the forming of a Venn diagram that shows intersecting values as in Soza in order to be able to display the overlapping data in a clearer and more concise graphical manner.
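For illustration only, the overlap described in Soza [0273] and the claimed determination that the intersection holds only one value can be sketched as a set operation. This is a minimal sketch under stated assumptions and does not characterize the implementation of any cited reference or of the application: the function names `distinct_valid` and `single_common_value` and the sample column values are hypothetical.

```python
def distinct_valid(values, excluded=frozenset()):
    """Distinct values that are non-null and not excluded (cf. sets "a" and "b")."""
    return {v for v in values if v is not None and v not in excluded}


def single_common_value(*columns):
    """Return the sole value common to every column, or None when the
    intersection (cf. set "c") is empty or holds more than one value."""
    sets = [distinct_valid(column) for column in columns]
    if not sets:
        return None
    common = set.intersection(*sets)
    return next(iter(common)) if len(common) == 1 else None


# Hypothetical sample data: two columns sharing exactly one valid value.
column_a = ["2022-08-01", "2022-08-02", None]
column_b = ["2022-08-01", "2022-09-15"]
result = single_common_value(column_a, column_b)
```

In this sketch an intersection of zero or several values yields no result, which mirrors the claim language requiring that the intersection include only one value.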
Reynolds, Oberbreckling, Soza fail to explicitly disclose “… a file having a filename”
However, Kinsella teaches the above limitations at least by ([0038] “If the current geographical location data matches a geofence, the processor 300 can generate a filename comprising the label of the geofence at block 465. The processor 300 can receive camera data (block 430) and apply the filename comprising the label of the geofence to the camera data (block 435). Thus, when camera data is received at a location within the boundary of the geofence, the processor 300 can determine that the camera data is being captured within the geofence and can apply a filename comprising the geofence label to the camera data. With such a filename, the user of the mobile device can easily identify the content of the camera data as being associated with the geofence (for example, camera data having a filename containing the geofence label “School” can indicate that the camera data includes images of schoolmates, teachers, and classrooms)”) and the implied value is the label of the geofence, or location of the camera data, obtained from the file name.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Kinsella into the teaching of Reynolds, Oberbreckling, Soza because the references similarly disclose the processing of file data. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the identifying of location information from filenames as in Kinsella in order to allow the user to “easily identify the file for retrieval and organization without having to rename the file after the camera data is captured” (Kinsella, [0040]).
As per claim 22, claim 1 is incorporated, Reynolds further discloses:
wherein one respective implied value of the N respective implied values is a plurality of data items at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the one implied value “Illinois” is applied to each of the plurality of cities (plurality of data items), as shown in Fig. 7.
As per claim 23, claim 1 is incorporated, Kinsella further discloses:
wherein one respective implied value of the N respective implied values is inferred from the filename at least by ([0038] “If the current geographical location data matches a geofence, the processor 300 can generate a filename comprising the label of the geofence at block 465. The processor 300 can receive camera data (block 430) and apply the filename comprising the label of the geofence to the camera data (block 435). Thus, when camera data is received at a location within the boundary of the geofence, the processor 300 can determine that the camera data is being captured within the geofence and can apply a filename comprising the geofence label to the camera data. With such a filename, the user of the mobile device can easily identify the content of the camera data as being associated with the geofence (for example, camera data having a filename containing the geofence label “School” can indicate that the camera data includes images of schoolmates, teachers, and classrooms)”) and the implied value is the label of the geofence, or location of the camera data, obtained from the file name.
As per claim 24, claim 1 is incorporated, Reynolds further discloses:
wherein one respective implied value of the N respective implied values is inferred from content in one cell in the table at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the implied values are annotated portion “IL” determined based on a cell that specifies each city from table 702 (inferred from content in a cell in the table).
As per claim 25, claim 1 is incorporated, Reynolds further discloses:
wherein N is at least 3 at least by ([0102] “Inference engine 780 may be configured to detect a pattern in the data of column 704 in dataset 702. For example, column 704 may be determined to relate to cities in Illinois based on the cities shown (or based on additional cities in column 704 that are not shown, such as Skokie, Cicero, etc.). Based on a determination by inference engine 780 that cities 704 likely are within Illinois, then row 716 may be annotated to include annotative portion (“IL”) 790 (e.g., as derived supplemental data) so that Springfield in row 716 can be uniquely identified as “Springfield, Ill.” rather than, for example, “Springfield, Nebr.” or “Springfield, Mass.”…”) and the implied values are annotated portion “IL” in table 722 determined based on four different cities from table 702. That is, N is at least four in the provided example because four inferences for the location attributes were made.

Claims 26-33 recite similar claim limitations as the method of claims 22-25, except that they set forth the claimed invention as a computer program product and a computer system, respectively; as such, they are rejected for the same reasons as applied hereinabove.

Claims 4, 14 are rejected under 35 U.S.C. 103 as being unpatentable over Reynolds (US 2018/0262864) in view of Oberbreckling (US 2018/0075104) and Soza (US 2018/0096001) and further in view of Witzel (US 8,649,777).
As per claim 4, claim 1 is incorporated, Reynolds, Oberbreckling, Soza fail to disclose “wherein said determining implied location information includes inferring, using dictionaries, the implied location information using at least one of: a missing latitude/longitude, a missing address, a missing city, a missing state, a missing country, and a missing time zone”
However, Witzel teaches the above limitation at least by ([col. 4, lines 5-13] “Preferably, said means for identifying a geographical position identifies an individual control node from where the mobile user entity is controlled. Said means for determining a time zone contain a data base in which the time zones of the respective control nodes are provided. Said means for determining the time zone can then use the data base to determine the time zone by deducing the time zone from the identified control node.” [col. 5, lines 48-57] “In connection with FIG. 6 the related decision logic in a database node such as the HLR is shown. In step 60 a map update location operation is received. In the next step 61 the presence user agent determines the geographical position of the serving MSC server by checking the MSC address country code. In step 62 it is asked whether the serving MSC server is located in the home country. If this is the case, the method ends in step 63. However, if the serving MSC server is not located in the home country, the database 43 may be queried in step 64 to derive the time zone of the serving MSC server.” [col. 6, lines 31-33] “In generic terms, any database offering a look-up whereto a call needs to be routed may be appropriate to serve the above purpose.”) and the dictionary is the database used to derive or infer the (missing) time zone information.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Witzel into the teaching of Reynolds, Oberbreckling, Soza because the references similarly disclose the inferring of data. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the deriving or inferring of location information such as time zones as in Witzel in order to fill in missing information to complete the dataset.
Claim 14 recites equivalent claim limitations as the method of claim 4, except that it sets forth the claimed invention as a computer program product; as such, it is rejected for the same reason as applied hereinabove.
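For illustration only, the look-up relied upon above (a database used to derive a missing time zone) can be sketched as a dictionary query. This is a minimal sketch under stated assumptions, not Witzel's implementation: the table `TIME_ZONE_BY_COUNTRY_CODE`, its entries, the record field names, and the function `fill_missing_time_zone` are all hypothetical.

```python
# Hypothetical dictionary keyed by country code; the entries are illustrative only.
TIME_ZONE_BY_COUNTRY_CODE = {
    "49": "Europe/Berlin",
    "46": "Europe/Stockholm",
}


def fill_missing_time_zone(record, dictionary=TIME_ZONE_BY_COUNTRY_CODE):
    """Return a copy of the record with a missing time zone inferred from
    its country code; an existing time zone is left unchanged."""
    filled = dict(record)
    if not filled.get("time_zone"):
        # Unknown country codes leave the time zone as None.
        filled["time_zone"] = dictionary.get(filled.get("country_code"))
    return filled


record = {"country_code": "49", "time_zone": None}
completed = fill_missing_time_zone(record)
```

The sketch returns a copy instead of mutating the input so that the original data entry remains available for comparison.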

Claims 6, 15 are rejected under 35 U.S.C. 103 as being unpatentable over Reynolds (US 2018/0262864) in view of Oberbreckling (US 2018/0075104) and Soza (US 2018/0096001) and further in view of Mitra (US 2014/0337358).
As per claim 6, claim 1 is incorporated, Reynolds, Oberbreckling, Soza fail to disclose “said method further comprising: determining, by the plurality of servers in the cluster using the parallel processing, a confidence rating for each of the location value within the merged data set; and providing, by the plurality of servers in the cluster using the parallel processing, the merged data set with the confidence for each of the location value within the merged data set”
However, Mitra teaches the above limitations, at least by ([0022] “receiving data from multiple heterogeneous data sources, the data including a plurality of entity attribute values each associated with an entity and an attribute, each attribute having an associated attribute type and an attribute confidence score” [0037] “Once there is a common representation of how each heterogeneous data source is formatted, it is still possible for the data sources to represent the same attribute values in a different way (e.g., {Seattle, USA} vs. {Seattle, Wash., USA}).” [0040] “if one was lacking information about the residence location of a particular people entity but it was known that the individual worked for Microsoft Corporation and was a Software Developer by profession, by examining other entities matching the same known attributes, it would be relatively easy to infer that Redmond, Wash. was a probably residence location for the individual” [0045] “To infer missing or ambiguous attribute values for an entity, known partials corresponding to that entity are identified in the partial-to-partial similarity graph …The attributes are then examined, in aggregate, across these similar/neighboring partials to estimate the possible value of the attribute for the entity of interest. It should be noted that each attribute can be multi-valued and hence in the output for the missing or ambiguous attribute for an entity, a ranked list of possible values based on confidence scores computed from the partial-to-partial similarity graph may be provided”) and an output for missing attributes, which can include or infer location information, for an entity can be presented as a ranked list of possible values based on computed confidence scores for each attribute.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Mitra into the teaching of Reynolds, Oberbreckling, Soza because the references similarly disclose the inferring of data. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in the combination of references to further include the determination of a confidence score to be used in ranking inferred data as in Mitra in order to infer the most relevant data to be included in the data set.
Claim 15 recites equivalent claim limitations as the method of claim 6, except that it sets forth the claimed invention as a computer program product; as such, it is rejected for the same reason as applied hereinabove.
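For illustration only, the ranked list of possible values with confidence scores discussed above can be sketched as follows. This is a minimal sketch under a stated assumption: the confidence score here is simply the share of similar entities that carry a given value, which is a stand-in and not the partial-to-partial similarity graph computation described in Mitra. The function name `rank_candidates` and the sample values are hypothetical.

```python
from collections import Counter


def rank_candidates(neighbor_values):
    """Rank candidate values for a missing attribute by a simple confidence
    score: the fraction of similar entities that carry each value."""
    counts = Counter(v for v in neighbor_values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return []
    scored = [(value, count / total) for value, count in counts.items()]
    # Highest confidence first; ties are broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical residence values observed on entities with matching attributes.
neighbors = ["Redmond, Wash.", "Redmond, Wash.", "Seattle, Wash.", None]
ranking = rank_candidates(neighbors)
```

Each location value in the output is paired with its score, so a merged data set could carry the confidence alongside the value.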

Response to Arguments
The following is in response to the amendment filed on 08/01/22.
Applicant’s arguments have been carefully and respectfully considered but are not persuasive.
Regarding 35 USC 103, on pg. 15, applicant argues that the cited references do not disclose “deriving, by the plurality of servers in the cluster, location values based on the identified location information and the determined implied location information using consolidation rules, resulting in a final set of location attributes and associated location values for the data entries, wherein said deriving location values comprises forming a Venn diagram comprising an intersection of the N respective implied values of the one location attribute and determining that the intersection includes only one implied value of the N respective implied values, and wherein the final set of location attributes and associated location values includes the one location attribute whose associated value is the one implied value included in the intersection, wherein the first set of data or the second set of data includes a file having a filename, wherein the file is formatted as a table, and wherein each respective implied value of the N respective implied values is one or more data items inferred from the filename or inferred from content in a cell in the table”.
In response to the preceding argument, examiner respectfully submits that Reynolds discloses the following limitations: deriving, by the plurality of servers in the cluster, location values based on the identified location information and the determined implied location information using consolidation rules, resulting in a final set of location attributes and associated location values for the data entries at least by [0102], wherein the final set of location attributes are the cities listed in 721 of table 722, which include annotated portion “IL” for each city from table 702; wherein the first set of data or the second set of data includes a file … at least by [0047], which discloses that the data can be input by files in a CSV, TSV, or XLS format; wherein the file is formatted as a table at least by [0060], which discloses that the input data may be a file (e.g., a tabular data file, such as a XLS file, etc.); and wherein each respective implied value of the N respective implied values is one or more data items inferred from the filename or inferred from content in a cell in the table at least by [0102], which discloses implied values, which are annotated portion “IL” determined based on each city from table 702 (inferred from content in a cell in the table).
Soza discloses the limitation, wherein said deriving location values comprises forming a Venn diagram comprising an intersection of the N respective implied values of the one location attribute and determining that the intersection includes only one implied value of the N respective implied values, and wherein the final set of location attributes and associated location values includes the one location attribute whose associated value is the one implied value included in the intersection at least by ([0273] which discloses a Venn diagram for two columns, A and B, which includes an intersection of the data, in section C; [0296] discloses that the values of different columns from the different sources can have different formats that are standardized and that they can include time/date values (location values)) and any of the distinct values that are common to both column A and column B corresponds to the claimed only one value within the set of unique intersecting values that are common to both columns. That is, [0062] of the applicant’s specification describes that the location attribute/values can be dates or date values in a date format as analogously mentioned in Soza which states that the different columns/values can be time/dates in different formats that can be standardized. Lastly, Kinsella discloses … a file having a filename at least by [0038] which discloses generating a filename comprising the label of the geofence and applying the filename comprising the label of the geofence to the camera data.
Regarding 35 USC 103, on pg. 16, applicant argues that amendments similar to the newly-amended limitations were discussed in the interview.
In response to the preceding argument, examiner respectfully submits that no agreements were reached in the interview.
Regarding 35 USC 103, on pgs. 16-17, applicant argues that Soza fails to disclose implied values for the Venn diagram.
In response to the preceding argument, examiner respectfully submits that Reynolds discloses the implied values, as aforementioned in the rejection itself and in response to previous arguments. Here, the applicant appears to be attacking the references individually. That is, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). That is, from the combination of references, Reynolds discloses the implied values in combination with Soza which discloses the limitations pertaining to the Venn diagram, as aforementioned.
Regarding 35 USC 103, on pgs. 17-20, applicant argues that the cited references do not disclose the newly-added claims.
In response to the preceding argument, examiner respectfully submits that Reynolds further discloses the limitations as recited in claims 22, 24-26, 28-30, 32-33. That is, Reynolds discloses wherein one respective implied value of the N respective implied values is a plurality of data items at least by [0102] wherein the one implied value “Illinois” is applied to each of the plurality of cities (plurality of data items), as shown in Fig. 7. Further, Reynolds discloses wherein N is at least 3 at least by [0102] wherein the implied values are annotated portion “IL” in table 722 which are determined based on four different cities from table 702. That is, N is at least four in the provided example because four inferences for the location attributes were made. Lastly, Reynolds discloses wherein one respective implied value of the N respective implied values is inferred from content in one cell in the table at least by [0102] wherein the implied values are annotated portion “IL” in table 722 determined based on each city from table 702 (inferred from content in a cell in the table). Kinsella discloses the limitations as recited in claims 23, 27, 31. That is, Kinsella discloses wherein one respective implied value of the N respective implied values is inferred from the filename at least by [0038] wherein the implied value is the label of the geofence, or location of the camera data, obtained from the file name.
	
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM P BARTLETT whose telephone number is (469)295-9085.  The examiner can normally be reached on M-Th 11:30-8:30, F 11-3.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571)272-4046.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WILLIAM P BARTLETT/
Examiner, Art Unit 2169

/USMAAN SAEED/
Supervisory Patent Examiner, Art Unit 2169