EXAMINER’S AMENDMENT
Authorization for this examiner’s amendment was given in an interview with Attorney Steven P. Skabrat on 2/16/2021.
In the claims: Please replace the current amendment with the amendment below:

	accessing, using one or more computer processors, a first dataset from a memory device, the first dataset comprising at least one column identifier and associated entries;
	accessing, using one or more computer processors, a dataset other than the first dataset, from a memory device, the accessed dataset other than the first dataset comprising at least one column identifier and associated entries, and for each accessed dataset other than the first dataset, performing:
	identifying, using one or more computer processors, a join key column identifier of the first dataset corresponding to a column identifier of the accessed dataset;
	determining, using one or more computer processors, a level of independence between the first dataset and the accessed dataset based at least in part on (i) an occurrence of a combination of an entry from the first dataset with multiple entries associated with the join key column identifier and (ii) an occurrence of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier; and
	recommending, using one or more computer processors, at least one dataset candidate to join with the first dataset based on the determined level of independence between the first dataset and the accessed dataset, the recommending comprising:

	receiving an indication to join a selected dataset with the first dataset, 
	causing a formation of a composite dataset, using one or more computer processors, by joining the selected dataset with the first dataset, and
	storing the composite dataset into a memory device.
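As an illustration only, the flow recited in claim 1 can be sketched in Python. The sketch is not the applicant's implementation: it assumes each dataset is held in memory as a dict mapping column identifiers to equal-length lists of entries, treats a join key as a column identifier shared by name, takes the independence measure as a caller-supplied function (the hypothetical `score_fn`), and forms the composite dataset with an inner join. None of these choices is stated in the claims.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory representation: column identifier -> list of entries.
Dataset = Dict[str, List[object]]


def find_join_key(first: Dataset, other: Dataset) -> Optional[str]:
    """Return the first column identifier of `first` that also names a column of `other`."""
    for column in first:
        if column in other:
            return column
    return None


def recommend(first: Dataset, others: Dict[str, Dataset],
              score_fn: Callable[[Dataset, Dataset, str], float]) -> List[Tuple[str, float]]:
    """Score each other dataset that shares a join key with `first`; best score first."""
    scored = []
    for name, other in others.items():
        key = find_join_key(first, other)
        if key is not None:  # datasets without a join key are not candidates
            scored.append((name, score_fn(first, other, key)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def join(first: Dataset, selected: Dataset, key: str) -> Dataset:
    """Inner-join `selected` onto `first` on `key`, returning a composite dataset."""
    # Index the selected dataset's rows by join-key value.
    rows_by_key: Dict[object, List[int]] = {}
    for j, value in enumerate(selected[key]):
        rows_by_key.setdefault(value, []).append(j)
    extra = [c for c in selected if c not in first]
    composite: Dataset = {c: [] for c in [*first, *extra]}
    for i, value in enumerate(first[key]):
        for j in rows_by_key.get(value, []):
            for c in first:
                composite[c].append(first[c][i])
            for c in extra:
                composite[c].append(selected[c][j])
    return composite


orders = {"customer_id": [1, 2, 2], "amount": [10, 20, 5]}
customers = {"customer_id": [1, 2], "region": ["east", "west"]}
print(join(orders, customers, "customer_id"))
# {'customer_id': [1, 2, 2], 'amount': [10, 20, 5], 'region': ['east', 'west', 'west']}
```

Here the caller's choice of `customers` stands in for the received indication to join; writing the composite dataset to a memory device is left out of the sketch.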

2. (Original)	The computer-implemented method of claim 1, comprising:
	determining, using one or more computer processors, the occurrence of a combination of an entry from the first dataset with multiple entries associated with the join key column identifier based at least in part on a number of occurrences of the combination of an entry from the first dataset with multiple entries associated with the join key column identifier; and
	determining, using one or more computer processors, the occurrence of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier based at least in part on a number of occurrences of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier.
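Claim 2 bases each occurrence on a number of occurrences. As a hedged illustration, that counting can be done with Python's `collections.Counter`, assuming a dataset is held as a dict of columns, that a "combination" is a row pairing a join-key value with an entry, and that a grouping of "multiple entries associated with the join key column identifier" is simply a set of join-key values. `combination_counts` and `grouping_count` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List

Dataset = Dict[str, List[Hashable]]  # hypothetical: column identifier -> entries


def combination_counts(dataset: Dataset, key: str, column: str) -> Counter:
    """Count the rows that pair each join-key value with each entry of `column`."""
    return Counter(zip(dataset[key], dataset[column]))


def grouping_count(counts: Counter, entry: Hashable, key_values: Iterable[Hashable]) -> int:
    """Occurrences of `entry` combined with any of several join-key values."""
    return sum(counts[(value, entry)] for value in key_values)


first = {"customer_id": [1, 1, 2, 3], "segment": ["retail", "retail", "retail", "b2b"]}
counts = combination_counts(first, "customer_id", "segment")
print(counts[(1, "retail")], counts[(3, "retail")])  # 2 0
print(grouping_count(counts, "retail", [1, 2]))       # 3
```

The same two functions apply unchanged to the accessed dataset, using its column that corresponds to the join key.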

4. (Currently Amended)	The computer-implemented method of claim 1, wherein a format of a dataset is compatible with one or more of: Salesforce applications, comma-separated values (CSV) file, Hadoop, Structured Query Language (SQL) server, MySQL, Netezza, Oracle applications, or PostgreSQL.

5. (Currently Amended)	The computer-implemented method of claim 1, comprising:
	applying a G-test of independence between the first dataset and each accessed dataset of the accessed datasets other than the first dataset; and
	applying a chi-squared distribution to rank the accessed datasets other than the first dataset as join candidates.
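Claim 5 names a G-test of independence and a chi-squared distribution without fixing how they are computed. One possible reading, sketched below in Python as an assumption rather than the applicant's method: pair each entry of a chosen first-dataset column with the entries of a candidate's column on rows that match on the join key, compute G = 2 Σ O ln(O/E) over the resulting contingency table with (r - 1)(c - 1) degrees of freedom, and convert G to a p-value with the chi-squared upper tail. The tail is computed here with the standard series and continued-fraction expansions of the regularized incomplete gamma function instead of a statistics library. Ranking by ascending p-value (strongest evidence of dependence first) is also an assumption; the claims do not say which direction is preferred. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Dataset = Dict[str, List[Hashable]]  # hypothetical: column identifier -> entries


def g_test(pairs: List[Tuple[Hashable, Hashable]]) -> Tuple[float, int]:
    """Return the G statistic and degrees of freedom for independence of (a, b) pairs."""
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)
    g = 0.0
    for (a, b), o in observed.items():  # empty cells contribute 0 to the sum
        expected = row_totals[a] * col_totals[b] / n
        g += o * math.log(o / expected)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return 2.0 * g, dof


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-squared distribution, Q(df/2, x/2)."""
    if df <= 0 or x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    prefactor = math.exp(a * math.log(z) - z - math.lgamma(a))
    if z < a + 1.0:
        # Series for the regularized lower incomplete gamma P(a, z); return 1 - P.
        term = total = 1.0 / a
        for n in range(1, 10_000):
            term *= z / (a + n)
            total += term
            if term < total * 1e-15:
                break
        return max(0.0, 1.0 - prefactor * total)
    # Modified Lentz continued fraction for the upper incomplete gamma Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return prefactor * h


def joined_pairs(first: Dataset, other: Dataset, key: str,
                 a_col: str, b_col: str) -> List[Tuple[Hashable, Hashable]]:
    """Pair first[a_col] with other[b_col] on every pair of rows whose keys match."""
    rows_by_key: Dict[Hashable, List[int]] = {}
    for j, value in enumerate(other[key]):
        rows_by_key.setdefault(value, []).append(j)
    return [(first[a_col][i], other[b_col][j])
            for i, value in enumerate(first[key])
            for j in rows_by_key.get(value, [])]


def rank_by_g_test(first: Dataset, key: str, a_col: str,
                   candidates: Dict[str, Tuple[Dataset, str]]) -> List[Tuple[str, float]]:
    """Rank candidates (name -> (dataset, column)) by ascending p-value, ties by larger G."""
    scored = []
    for name, (other, b_col) in candidates.items():
        pairs = joined_pairs(first, other, key, a_col, b_col)
        if pairs:
            g, dof = g_test(pairs)
            scored.append((name, chi2_sf(g, dof), g))
    scored.sort(key=lambda s: (s[1], -s[2]))
    return [(name, p) for name, p, _ in scored]
```

For instance, with a first dataset `{"id": [1, 2, 3, 4], "seg": ["x", "x", "y", "y"]}`, a candidate whose column mirrors `seg` yields G = 8 ln 2 with one degree of freedom (p < 0.02) and ranks ahead of a candidate whose column is independent of `seg` (G = 0, p = 1).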

6. (Original)	The computer-implemented method of claim 5, comprising:
identifying join set candidates for the first dataset based on a pre-computed rank.

7. (Original)	The computer-implemented method of claim 5, comprising:
storing a pre-computed rank for use prior to recommending at least one dataset candidate to join with the first dataset.
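Claims 6 and 7 rely on a rank that is computed ahead of time and stored, so that join candidates can be identified without recomputing the test. A minimal sketch of such a store, assuming a hypothetical class `PrecomputedRanks` and an in-memory dict standing in for whatever storage an implementation would use:

```python
from typing import Dict, Iterable, List


class PrecomputedRanks:
    """Hypothetical store of join-candidate rankings, keyed by first-dataset identifier."""

    def __init__(self) -> None:
        self._ranks: Dict[str, List[str]] = {}

    def store(self, first_id: str, ranked_names: Iterable[str]) -> None:
        # Save candidate names in rank order (best first) before any recommendation is requested.
        self._ranks[first_id] = list(ranked_names)

    def candidates(self, first_id: str, top_k: int = 3) -> List[str]:
        # Identify join candidates by reading the stored rank; nothing is recomputed here.
        return self._ranks.get(first_id, [])[:top_k]


ranks = PrecomputedRanks()
ranks.store("orders", ["customers", "regions", "products", "suppliers"])
print(ranks.candidates("orders"))  # ['customers', 'regions', 'products']
```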

8. (Currently Amended)	The computer-implemented method of claim 1, wherein determining, using one or more computer processors, the level of independence between the first dataset and the accessed dataset comprises determining at least:
P(AB) = O(first grouping with A) * O(first grouping with B) / (P(first grouping) * N2) + O(second grouping with A) * O(second grouping with B) / (P(second grouping) * N2), wherein
P(AB) represents a probability of A and B, where A and B are paired with a same grouping,
A represents the entry from the first dataset,
B represents the entry from the accessed dataset,
the first grouping is multiple entries associated with the join key column identifier,
the second grouping is multiple other entries associated with the join key column identifier, 
O represents an occurrence, and
N2 represents a number of rows in the first dataset.
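The claim 8 expression can be evaluated directly once the occurrence counts and grouping probabilities are known. The sketch below reads "N2" as N squared, with N the number of rows in the first dataset; that reading is an assumption. It is chosen because, when all counts are taken over the same N rows and P(g) = O(g)/N, each term O(g with A) * O(g with B) / (P(g) * N²) equals P(A | g) * P(B | g) * P(g), so the sum is the probability that A and B are paired through a shared grouping. The function name `prob_ab` and its argument layout are hypothetical.

```python
from typing import Dict, Hashable


def prob_ab(o_with_a: Dict[Hashable, int], o_with_b: Dict[Hashable, int],
            p_grouping: Dict[Hashable, float], n_rows: int) -> float:
    """Sum over groupings g of O(g with A) * O(g with B) / (P(g) * N**2).

    Assumes the claim's "N2" means N squared, N being the number of rows
    in the first dataset.
    """
    total = 0.0
    for g, p_g in p_grouping.items():
        if p_g > 0:  # a grouping that never occurs contributes nothing
            total += o_with_a.get(g, 0) * o_with_b.get(g, 0) / (p_g * n_rows ** 2)
    return total


# 10 rows: the first grouping covers 4 rows, the second grouping the other 6.
p = prob_ab({"first": 2, "second": 3}, {"first": 1, "second": 3},
            {"first": 0.4, "second": 0.6}, n_rows=10)
print(round(p, 6))  # 0.2, i.e. 0.5 * 0.25 * 0.4 + 0.5 * 0.5 * 0.6
```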

9 – 22.	(Canceled) 

23. (Newly presented) A data management system, comprising:
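at least one processing device and
at least one memory coupled to the at least one processing device, the at least one memory having instructions stored thereon that, in response to execution by the at least one processing device, cause the at least one processing device to: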
	access, using one or more computer processors, a first dataset from a memory device, the first dataset comprising at least one column identifier and associated entries;
	access, using one or more computer processors, a dataset other than the first dataset, from a memory device, the accessed dataset other than the first dataset comprising at least one column identifier and associated entries, and for each accessed dataset other than the first dataset, performing:
	identifying, using one or more computer processors, a join key column identifier of the first dataset corresponding to a column identifier of the accessed dataset;
	determining, using one or more computer processors, a level of independence between the first dataset and the accessed dataset based at least in part on (i) an occurrence of a combination of an entry from the first dataset with multiple entries associated with the join key column identifier and (ii) an occurrence of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier; and
	recommending, using one or more computer processors, at least one dataset candidate to join with the first dataset based on the determined level of independence between the first dataset and the accessed dataset, the recommending comprising:

	receiving an indication to join a selected dataset with the first dataset, 
	causing a formation of a composite dataset, using one or more computer processors, by joining the selected dataset with the first dataset, and
	storing the composite dataset into a memory device.

24. (New)	The data management system of claim 23, comprising the at least one memory coupled to the at least one processing device, the at least one memory having instructions stored thereon that, in response to execution by the at least one processing device, cause the at least one processing device to:
	determine, using one or more computer processors, the occurrence of a combination of an entry from the first dataset with multiple entries associated with the join key column identifier based at least in part on a number of occurrences of the combination of an entry from the first dataset with multiple entries associated with the join key column identifier; and
	determine, using one or more computer processors, the occurrence of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier based at least in part on a number of occurrences of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier.

25. (New)	The data management system of claim 23, wherein entries associated with the join key column identifier include unique values and no null values.
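Claim 25 requires join-key entries that are unique and free of nulls. A small check for that condition, as a sketch assuming nulls are represented as Python `None` (missing values from a real source would first have to be mapped to `None`); `is_valid_join_key` is a hypothetical name:

```python
from typing import Hashable, Optional, Sequence


def is_valid_join_key(entries: Sequence[Optional[Hashable]]) -> bool:
    """True when the column has no null (None) entries and no repeated values."""
    if any(entry is None for entry in entries):
        return False
    return len(set(entries)) == len(entries)


print(is_valid_join_key([101, 102, 103]))   # True
print(is_valid_join_key([101, 101, 103]))   # False (repeated value)
print(is_valid_join_key([101, None, 103]))  # False (null entry)
```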

26. (New)	The data management system of claim 23, wherein a format of a dataset is compatible with one or more of: Salesforce applications, comma-separated values (CSV) file, Hadoop, Structured Query Language (SQL) server, MySQL, Netezza, Oracle applications, or PostgreSQL.

27. (New)	The data management system of claim 23, comprising the at least one memory coupled to the at least one processing device, the at least one memory having instructions stored thereon that, in response to execution by the at least one processing device, cause the at least one processing device to:
	apply a G-test of independence between the first dataset and each accessed dataset of the accessed datasets other than the first dataset; and
	apply a chi-squared distribution to rank the accessed datasets other than the first dataset as join candidates.

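28. (New)	The data management system of claim 27, comprising the at least one memory coupled to the at least one processing device, the at least one memory having instructions stored thereon that, in response to execution by the at least one processing device, cause the at least one processing device to:
identify join set candidates for the first dataset based on a pre-computed rank.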

29. (New)	The data management system of claim 27, comprising the at least one memory coupled to the at least one processing device, the at least one memory having instructions stored thereon that, in response to execution by the at least one processing device, cause the at least one processing device to:
store a pre-computed rank for use prior to recommending at least one dataset candidate to join with the first dataset.

30. (New)	The data management system of claim 23, wherein determining, using one or more computer processors, a level of independence between the first dataset and the accessed dataset based at least in part on (i) an occurrence of a combination of an entry from the first dataset with multiple entries associated with the join key column identifier and (ii) an occurrence of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier comprises determining at least:
P(AB) = O(first grouping with A) * O(first grouping with B) / (P(first grouping) * N2) + O(second grouping with A) * O(second grouping with B) / (P(second grouping) * N2), wherein
P(AB) represents a probability of A and B, where A and B are paired with a same grouping,
A represents the entry from the first dataset,
B represents the entry from the accessed dataset,
the first grouping is multiple entries associated with the join key column identifier,
the second grouping is multiple other entries associated with the join key column identifier, 
O represents an occurrence, and
N2 represents a number of rows in the first dataset.

31. (New)	A non-transitory computer-readable storage medium having instructions encoded thereon which, when executed by at least one processing device, cause the at least one processing device to:
	access, using one or more computer processors, a first dataset from a memory device, the first dataset comprising at least one column identifier and associated entries;
	access, using one or more computer processors, a dataset other than the first dataset, from a memory device, the accessed dataset other than the first dataset comprising at least one column identifier and associated entries, and for each accessed dataset other than the first dataset, performing:
	identifying, using one or more computer processors, a join key column identifier of the first dataset corresponding to a column identifier of the accessed dataset;
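	determining, using one or more computer processors, a level of independence between the first dataset and the accessed dataset based at least in part on (i) an occurrence of a combination of an entry from the first dataset with multiple entries associated with the join key column identifier and (ii) an occurrence of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier; and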
	recommending, using one or more computer processors, at least one dataset candidate to join with the first dataset based on the determined level of independence between the first dataset and the accessed dataset, the recommending comprising:
	using one or more computer processors to cause display, in a remote graphical user interface, of a region displaying an identifier of the first dataset and at least one other join candidate dataset,
	receiving an indication to join a selected dataset with the first dataset, 
	causing a formation of a composite dataset, using one or more computer processors, by joining the selected dataset with the first dataset, and
	storing the composite dataset into a memory device.

32. (New)	The non-transitory computer-readable storage medium of claim 31 having instructions encoded thereon which, when executed by at least one processing device, cause the at least one processing device to:
	determine, using one or more computer processors, the occurrence of a combination of an entry from the first dataset with multiple entries associated with the join key column identifier based at least in part on a number of occurrences of the combination of an entry from the first dataset with multiple entries associated with the join key column identifier; and
	determine, using one or more computer processors, the occurrence of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier based at least in part on a number of occurrences of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier.

33. (New)	The non-transitory computer-readable storage medium of claim 31, wherein entries associated with the join key column identifier include unique values and no null values.

34. (New)	The non-transitory computer-readable storage medium of claim 31, wherein a format of a dataset is compatible with one or more of: Salesforce applications, comma-separated values (CSV) file, Hadoop, Structured Query Language (SQL) server, MySQL, Netezza, Oracle applications, or PostgreSQL.

35. (New)	The non-transitory computer-readable storage medium of claim 31 having instructions encoded thereon which, when executed by at least one processing device, cause the at least one processing device to:
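	apply a G-test of independence between the first dataset and each accessed dataset of the accessed datasets other than the first dataset; and
	apply a chi-squared distribution to rank the accessed datasets other than the first dataset as join candidates.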

36. (New)	The non-transitory computer-readable storage medium of claim 35 having instructions encoded thereon which, when executed by at least one processing device, cause the at least one processing device to:
identify join set candidates for the first dataset based on a pre-computed rank.

Allowable Subject Matter
Claims 1-8 and 23-36 are allowed.
The following is an examiner’s statement of reasons for allowance:
The prior art of record, such as Berkman, teaches displaying digital objects that are independent information subsets of a resource broken into a plurality of digital objects based on statistical frequencies of patterns within the structure of the source (paragraph 40), and constructing a new information resource from a selected relevant finite element combined with other finite elements associated with the selected relevant finite element (paragraph 112). Hofmann teaches combining the distributions of coincident (xy) counts and independent (x and y) counts for each point-wise computation (paragraph 286). Baer teaches that the results of all the Individual Queries are merged by entities to form one single Search Results Structure (paragraphs 549, 551).
	However, none of the prior art of record teaches:
for each accessed dataset other than the first dataset, performing:
	identifying, using one or more computer processors, a join key column identifier of the first dataset corresponding to a column identifier of the accessed dataset; determining, using one or more computer processors, a level of independence between the first dataset and the accessed dataset based at least in part on (i) an occurrence of a combination of an entry from the first dataset with multiple entries associated with the join key column identifier and (ii) an occurrence of a combination of an entry from the accessed dataset with multiple entries associated with the column identifier of the accessed dataset corresponding to the join key column identifier; and recommending, using one or more computer processors, at least one dataset candidate to join with the first dataset based on the determined level of independence between the first dataset and the accessed dataset.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAM-Y T TRUONG, whose telephone number is (571) 272-4042. The examiner can normally be reached at (571) 272-4042.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CAM Y T TRUONG/           Primary Examiner, Art Unit 2169