Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
2.	This action is in response to the filing with the Office dated 10/04/2021.
Claims 1-20 are pending in this office action.
Priority
3.	Applicant’s claim for the benefit of parent Application No. 15/264,377 filed on 09/13/2016 is acknowledged by the examiner.
4.	Applicant’s claim for the benefit of a prior-filed provisional Application No. 62/308,133 filed on 03/14/2016 is acknowledged by the examiner.
Information Disclosure Statement
5.	The information disclosure statement (IDS) submitted on 02/10/2022 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Langi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

6.	Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-14 and 17-22 of Patent No. US 11030183 B2. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the patent either anticipate or render obvious the claims of the instant application. The table below compares the claims of the instant application with the claims of the patent. NOTE: The differences between the two sets of claims are shown in bold.
 
Instant application (17/326,680)
Patent No. US 11030183 B2
1. A method, comprising: identifying, based at least in part on contents of a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns: a plurality of matching columns comprising: one or more columns among the first plurality of columns; and corresponding one or more matching columns among the second plurality of columns, wherein the one or more columns among the first plurality of columns and the corresponding one or more matching columns among the second plurality of columns have at least some matching content; and a plurality of non-matching columns comprising: one or more columns among the first plurality of columns that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns that do not match with any columns among the first plurality of columns; obtaining a specification of: a first one or more non-matching columns to be appended to a second one or more non-matching columns, the first one or more non-matching columns and the second one or more non-matching columns being selected among the plurality of non-matching columns; a change to the plurality of matching columns; or both; and appending at least a portion of the first data set and at least a portion of the second data set according to the plurality of matching columns and the specification.
1. A method, comprising: identifying, based at least in part on contents of a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns: a plurality of matching columns comprising: one or more columns among the first plurality of columns; and corresponding one or more matching columns among the second plurality of columns, wherein the one or more columns among the first plurality of columns and the corresponding one or more matching columns among the second plurality of columns have at least some matching content, wherein identifying the plurality of matching columns comprises performing a clustering technique on an extracted plurality of features of cells in the first plurality of columns of the first data set and an extracted plurality of features of cells in the second plurality of columns of the second data set; and a plurality of non-matching columns comprising: one or more columns among the first plurality of columns, that are included in the first data set and that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns, that are included in the second data set and that do not match with any columns among the first plurality of columns; obtaining a specification of: a selection of a first column among non-matching columns of the first data set to be appended to a second column among non-matching columns of the second data set, a selection of a first column among non-matching columns of the second data set to be appended to a second column among non-matching columns of the first data set, or both; a change to the plurality of matching columns; or both; and appending at least a portion of the first data set and at least a portion of the second data set according to the plurality of matching columns and the specification, including: appending at least some of the first one or more non-matching columns with at least some of the second one or more non-matching columns; appending matching columns that are not subject to the change to the plurality of matching columns; or both.
2. The method of claim 1, wherein the identification of the plurality of matching columns and the plurality of non-matching columns includes: extracting features of contents in the first plurality of columns of the first data set to generate a first plurality of results and extracting features of contents in the second plurality of columns of the second data set to generate a second plurality of results; and clustering the first plurality of results and the second plurality of results based on the extracted features.
2. The method of claim 1, wherein the identification of the plurality of matching columns and the plurality of non-matching columns includes: extracting the plurality of features of cells in the first plurality of columns of the first data set and extracting the plurality of features of cells in the second plurality of columns of the second data set.
3. The method of claim 2, wherein the clustering of the first plurality of results and the second plurality of results includes performing a K-means based clustering technique.
3. The method of claim 1, wherein performing the clustering technique includes performing a K-means based clustering technique.
4. The method of claim 2, further comprising identifying among clustering results one or more non-matching columns, one or more clusters with matching pairs, and one or more clusters with tied matching columns.
4. The method of claim 1, wherein the identification of the plurality of matching columns and the plurality of non-matching columns further includes identifying one or more clusters with matching pairs, and one or more clusters with tied matching columns in which multiple columns from the first or the second data set independently match at least one column from the other data set.
5. The method of claim 4, further comprising performing a pattern matching operation on the one or more clusters with the tied matching columns to identify one or more additional clusters with matching pairs.
5. The method of claim 4, further comprising performing a pattern matching operation on the one or more clusters with the tied matching columns to identify one or more additional clusters with matching pairs.

6. The method of claim 5, wherein the pattern matching operation is implemented as a TOPEI-based pattern matching operation.
6. The method of claim 5, wherein the pattern matching operation is implemented as a TOPEI-based pattern matching operation.
7. The method of claim 5, further comprising performing a title matching operation on one or more remaining clusters with the tied matching columns to identify one or more additional clusters with untied matching columns.
7. The method of claim 5, further comprising performing a title matching operation on one or more remaining clusters with the tied matching columns to identify one or more additional clusters with untied matching columns.
8. The method of claim 2, wherein the features that are extracted for a column include one or more of: number of spaces in cells of the column, number of punctuations in the cells of the column, average length of values in the cells of the column, variance of values in the cells of the column, total number of words in the cells of the column, average number of words in the cells of the column, and/or number of symbol type transitions in the cells of the column.
8. The method of claim 2, wherein the features that are extracted for a column include one or more of: number of spaces in cells of the column, number of punctuations in the cells of the column, average length of values in the cells of the column, variance of values in the cells of the column, total number of words in the cells of the column, average number of words in the cells of the column, and/or number of symbol type transitions in the cells of the column.
9. The method of claim 1, further comprising outputting the plurality of non-matching columns to be displayed.
9. The method of claim 1, further comprising outputting the plurality of non-matching columns to be displayed.	
10. The method of claim 1, further comprising causing a selection interface to be provided to a user, and the selection interface being configured for the user to: select a first column among non-matching columns of the first data set to be appended to a second column among non-matching columns of the second data set, select a first column among non-matching columns of the second data set to be appended to a second column among non-matching columns of the first data set, or both.
10. The method of claim 1, further comprising causing a selection interface to be provided to a user, and the selection interface being configured for the user to: select the first column among non-matching columns of the first data set to be appended to the second column among non-matching columns of the second data set, select the first column among non-matching columns of the second data set to be appended to the second column among non-matching columns of the first data set, or both.
11. The method of claim 1, wherein the plurality of matching columns are identified based at least in part on a plurality of N-gram feature vectors.
11. The method of claim 1, wherein the plurality of matching columns are identified based at least in part on a plurality of N-gram feature vectors.
12. The method of claim 1, further comprising: determining N-grams of entries in the first data set and in the second data set; forming a plurality of matrices based at least in part on the N-grams of the entries in the first data set and in the second data set; determining, based at least in part on the plurality of matrices, a first plurality of N-gram feature vectors corresponding to the first plurality of columns and a second plurality of N-gram feature vectors corresponding to the second plurality of columns; and comparing one or more vectors in the first plurality of N-gram feature vectors with one or more vectors in the second plurality of N-gram feature vectors to determine the matching columns.
12. The method of claim 1, further comprising: determining N-grams of entries in the first data set and in the second data set; forming a plurality of matrices based at least in part on the N-grams of the entries in the first data set and in the second data set; determining, based at least in part on the plurality of matrices, a first plurality of N-gram feature vectors corresponding to the first plurality of columns and a second plurality of N-gram feature vectors corresponding to the second plurality of columns; and comparing one or more vectors in the first plurality of N-gram feature vectors with one or more vectors in the second plurality of N-gram feature vectors to determine the matching columns.
13. The method of claim 12, wherein the comparing of the one or more vectors in the first plurality of N-gram feature vectors with the one or more vectors in the second plurality of N-gram feature vectors to determine the matching columns includes computing cosine similarities.
13. The method of claim 12, wherein the comparing of the one or more vectors in the first plurality of N-gram feature vectors with the one or more vectors in the second plurality of N-gram feature vectors to determine the matching columns includes computing cosine similarities.
14. The method of claim 12, wherein the comparing of the one or more vectors in the first plurality of N-gram feature vectors with the one or more vectors in the second plurality of N-gram feature vectors includes projecting the one or more vectors in the first plurality of N-gram feature vectors and the one or more vectors in the second plurality of N-gram feature vectors in a vector space.
14. The method of claim 12, wherein the comparing of the one or more vectors in the first plurality of N-gram feature vectors with the one or more vectors in the second plurality of N-gram feature vectors includes projecting the one or more vectors in the first plurality of N-gram feature vectors and the one or more vectors in the second plurality of N-gram feature vectors in a vector space.
15. A system, comprising: one or more processors configured to: identify, based at least in part on contents of a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns: a plurality of matching columns comprising: one or more columns among the first plurality of columns; and corresponding one or more matching columns among the second plurality of columns, wherein the one or more columns among the first plurality of columns and the corresponding one or more matching columns among the second plurality of columns have at least some matching content; and a plurality of non-matching columns comprising: one or more columns among the first plurality of columns that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns that do not match with any columns among the first plurality of columns; obtain a specification of: a first one or more non-matching columns to be appended to a second one or more non-matching columns, the first one or more non-matching columns and the second one or more non-matching columns being selected among the plurality of non-matching columns; a change to the plurality of matching columns; or both; and append at least a portion of the first data set and at least a portion of the second data set according to the plurality of matching columns and the specification; and one or more memories coupled to the one or more processors and configured to provide the one or more processors with instructions.
17. A system, comprising: one or more processors configured to: identify, based at least in part on contents of a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns: a plurality of matching columns comprising: one or more columns among the first plurality of columns; and corresponding one or more matching columns among the second plurality of columns, wherein the one or more columns among the first plurality of columns and the corresponding one or more matching columns among the second plurality of columns have at least some matching content, wherein to identify the plurality of matching columns includes to perform a clustering technique on an extracted plurality of features of cells in the first plurality of columns of the first data set and an extracted plurality of features of cells in the second plurality of columns of the second data set; and a plurality of non-matching columns comprising: one or more columns among the first plurality of columns, that are included in the first data set and that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns, that are included in the second data set and that do not match with any columns among the first plurality of columns; obtain a specification of: a selection of a first column among non-matching columns of the first data set to be appended to a second column among non-matching columns of the second data set, a selection of a first column among non-matching columns of the second data set to be appended to a second column among non-matching columns of the first data set, or both; a change to the plurality of matching columns; or both; and append at least a portion of the first data set and at least a portion of the second data set according to the plurality of matching columns and the specification, including to: append at least some of the first one or more non-matching columns with at least some of the second one or more non-matching columns; append matching columns that are not subject to the change to the plurality of matching columns; or both; and one or more memories coupled to the one or more processors and configured to provide the one or more processors with instructions.
16. The system of claim 15, wherein to identify the plurality of matching columns and the plurality of non-matching columns includes to: extract features of contents in the first plurality of columns of the first data set to generate a first plurality of results and extract features of contents in the second plurality of columns of the second data set to generate a second plurality of results; and cluster the first plurality of results and the second plurality of results based on the extracted features.
18. The system of claim 17, wherein to identify the plurality of matching columns and the plurality of non-matching columns includes to: extract the plurality of features of cells in the first plurality of columns of the first data set and extract the plurality of features of cells in the second plurality of columns of the second data set.
17. The system of claim 16, wherein to cluster the first plurality of results and the second plurality of results includes to perform a K-means based clustering technique.
19. The system of claim 17, wherein to perform the clustering technique includes to perform a K-means based clustering technique.
18. The system of claim 16, wherein the one or more processors are further configured to identify among clustering results one or more non-matching columns, one or more clusters with matching pairs, and one or more clusters with tied matching columns.
20. The system of claim 17, wherein to identify the plurality of matching columns and the plurality of non-matching columns further includes to identify one or more clusters with matching pairs, and one or more clusters with tied matching columns in which multiple columns from the first or the second data set independently match at least one column from the other data set.
19. The system of claim 18, wherein the one or more processors are further configured to perform a pattern matching operation on the one or more clusters with the tied matching columns to identify one or more additional clusters with matching pairs.
21. The system of claim 20, wherein the one or more processors are further configured to perform a pattern matching operation on the one or more clusters with the tied matching columns to identify one or more additional clusters with matching pairs.
20. The system of claim 19, wherein the pattern matching operation is implemented as a TOPEI-based pattern matching operation.
22. The system of claim 21, wherein the pattern matching operation is implemented as a TOPEI-based pattern matching operation.	


Independent claims 1 and 15 of the instant application are anticipated by claims 1 and 17 of the patent, respectively, which recite all of their limitations. The dependent claims of the instant application are likewise anticipated by the corresponding claims of the patent, with the exception of claims 2 and 16, which additionally recite clustering “the first plurality of results and the second plurality of results based on the extracted features”.
However, Jonathan; Young (US 20160055205 A1) teaches this limitation (Figs. 2 and 3; paragraph [0022] shows the grouping of matching columns when they have an identical attribute name in both data sets, and the grouping of non-matching attributes based on the data type and the field name; the clustering is done based on the data type, the field name, and the Array).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the claims of Patent No. US 11030183 B2 with the teachings of Jonathan et al., which provide a method and system in which a user can select a subset of tables and the system creates a join tree with recommended joins between the tables selected by the user; the recommended joins are used to create a structured query language statement that is executed to return a result to the user (Jonathan et al., Abstract). Doing so would allow more complex tables that arise from separate and distinct databases with different table structures to be joined without any design coordination, as taught by Jonathan et al. (paragraph [0004]).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7.	Claims 1-3, 8-10 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Welling; Girish (US 20110255782 A1) in view of Jonathan; Young (US 20160055205 A1).

Regarding independent claim 1, Welling; Girish (US 20110255782 A1) teaches a method, comprising: identifying, based at least in part on contents of a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns: (Paragraph [0115] a grouping system 442 and a data extraction system 452. [0246] The group identification system 730 uses a merging confidence that is determined from matching and mismatching criteria that is stored in the trained group database 732) a plurality of matching columns comprising: one or more columns among the first plurality of columns; and corresponding one or more matching columns among the second plurality of columns, (Paragraph [0149] System 534 is an image training system. The image training system 534 performs computations on the data in its document database corresponding to the image that are in place and generates datasets used by the image identification system for recognizing the content in source document images. (i.e., a plurality of documents having a plurality of columns that are stored in the database. The examiner interprets the first plurality of columns as coming from the document data set and the second plurality of columns as coming from the trained group data set of the training system, for matching the columns)), wherein the one or more columns among the first plurality of columns and the corresponding one or more matching columns among the second plurality of columns have at least some matching content (Paragraph [0246] The group identification system 730 uses a merging confidence that is determined from matching and mismatching criteria that is stored in the trained group database 732. Matching criteria between two groups contribute towards an increased confidence to merge the groups, while mismatching criteria contribute towards keeping the groups separate. The final merging confidence is used to decide whether to merge the two groups. This process is repeated for every pair of groups, in each iteration step of the process. [0255]-[0256] There also are features corresponding to the elements of certain composite features like table headers, table rows, and table columns. There are also features corresponding to form-specific items such as address blocks, phone numbers, and instruction blocks. The Feature data-structure supports operations to merge a set of features into another. For example, a label feature and a value feature that correspond to each other are merged into respective features (i.e., matching columns are merged based on the confidence of association between the labels and the columns));
Welling et al. fails to explicitly teach: a plurality of non-matching columns comprising: one or more columns among the first plurality of columns that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns that do not match with any columns among the first plurality of columns; obtaining a specification of: a first one or more non-matching columns to be appended to a second one or more non-matching columns, the first one or more non-matching columns and the second one or more non-matching columns being selected among the plurality of non-matching columns; a change to the plurality of matching columns; or both; and appending at least a portion of the first data set and at least a portion of the second data set according to the plurality of matching columns and the specification.
Jonathan; Young (US 20160055205 A1) teaches a plurality of non-matching columns comprising: one or more columns among the first plurality of columns that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns that do not match with any columns among the first plurality of columns (Paragraph [0044] an appropriate similarity (or difference) metric, given the type of the data in the fields, can be used to compare each pair of values. [0045] The similarity measure for the pair of fields is compared to a threshold to determine if there is sufficient similarity in the values of the data fields to make further analysis worthwhile. This threshold can be a user-defined setting, for example. If the comparison indicates that there is insufficient similarity, as illustrated at 502, processing ends, as illustrated at 504. Otherwise, this pair of fields is identified as a potential candidate and further analysis is performed (i.e., the columns in the data sets are similar. The examiner interprets "similar" as not identical, or non-matching));
obtaining a specification of: a first one or more non-matching columns to be appended to a second one or more non-matching columns, the first one or more non-matching columns and the second one or more non-matching columns being selected among the plurality of non-matching columns; a change to the plurality of matching columns; or both; and appending at least a portion of the first data set and at least a portion of the second data set according to the plurality of matching columns and the specification (Paragraph [0062] Given a score for a pair of fields F and G after step 706, a recommendation can be made 708 regarding that pair of fields. A minimum score optionally can be enforced by applying a threshold, such as 0.5, to the score for the pair of fields. Different pairs of fields can be ranked by their score as part of the recommendation. The computer system can present a user interface to a user that allows the user to select a pair of fields based on these scores. The user interface can include information about the different fields (e.g., field names, types and tables in which they reside) and optionally the score for each pair of fields (the examiner interprets obtaining a specification as a recommendation/suggestion)); (Paragraph [0063]-[0066] The recommendation (obtaining the specification based on the recommendation) generally will take one of four forms. For example, given a table A, this analysis could be performed by analyzing multiple other tables, of which one is table B. In such a case, suitable fields in table A are compared to suitable fields in other tables to identify good fields to support a join operation. The analysis identifies a field F in table A to be joined with a field G in a table B. [0068] As another example, given a table A and a field F, this analysis could be performed by analyzing multiple other tables, of which one is table B. In such a case, field F in table A is compared to suitable fields in other tables to identify good fields to support a join operation. The analysis identifies a field G in table B to be joined with the specified field F in table A. [0069] As another example, given a table A and a table B, this analysis could be performed by analyzing the fields of both tables A and B. In such a case, suitable fields in table A are compared to suitable fields in table B to identify good fields to support a join operation between the two tables A and B. The analysis identifies a field G in table B to be joined with a field F in table A. [0070] As another example, given a field F in a table A and a table B, this analysis could be performed by analyzing the fields of table B with respect to field F of table A. In such a case, suitable fields in table B are compared to field F in table A to identify good fields to support a join operation using field F in table A and a field in table B. The analysis identifies a field G in table B to be joined with the specified field F in table A. Also see Figs. 2 and 3 and paragraph [0034], which show appending the matching columns and non-matching columns based on the user selection).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Welling et al with the teachings of Jonathan et al, which provide a method and system in which a user can select a subset of tables and the system creates a join tree with recommended joins between the tables selected by the user, the recommended joins being used to create a structured query language statement that is executed to return a result to the user (Jonathan et al, Abstract). By doing so, more complex tables arising from separate and distinct databases with different table structures can be joined without any design coordination, as taught by Jonathan et al (Paragraph [0004]).
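For illustration only, the appending step recited in claim 1 can be pictured with a minimal Python sketch. It assumes that each data set is a list of rows keyed by column name, that the matching columns are given as (first, second) name pairs, and that the specification maps a non-matching column of the second data set onto a non-matching column of the first; the function name `append_datasets` is hypothetical and is not drawn from the application, Welling et al or Jonathan et al.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def append_datasets(
    rows_a: List[Row],
    rows_b: List[Row],
    matching: List[Tuple[str, str]],
    spec: Dict[str, str],
) -> List[Row]:
    """Append rows_b beneath rows_a (a row-wise union) under one schema.

    matching: (column in A, column in B) pairs identified as matching.
    spec:     hypothetical user specification mapping a non-matching column
              of B onto the non-matching column of A it is appended to.
    """
    # Map each B column to its output name; A's name wins for matched pairs
    # and for pairs named in the specification.
    b_to_out = {b: a for a, b in matching}
    b_to_out.update(spec)

    # Output schema: A's columns first, then any B columns left unmapped.
    out_columns: List[str] = []
    for row in rows_a:
        for col in row:
            if col not in out_columns:
                out_columns.append(col)
    for row in rows_b:
        for col in row:
            name = b_to_out.get(col, col)
            if name not in out_columns:
                out_columns.append(name)

    # Pad missing cells with None so every row has the full schema.
    result: List[Row] = [{c: row.get(c) for c in out_columns} for row in rows_a]
    for row in rows_b:
        renamed = {b_to_out.get(c, c): v for c, v in row.items()}
        result.append({c: renamed.get(c) for c in out_columns})
    return result
```

Columns of the second data set that are neither matched nor named in the specification are carried over as separate columns and padded with None; name collisions between such columns and columns of the first data set are not handled in this sketch.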

Regarding dependent claim 2, Welling et al and Jonathan et al teach, the method of claim 1. 
Jonathan et al further teaches, wherein the identification of the plurality of matching columns and the plurality of non-matching columns includes: extracting features of contents in the first plurality of columns of the first data set to generate a first plurality of results and extracting features of contents in the second plurality of columns of the second data set to generate a second plurality of results (Paragraph [0046] the name, email address and employer fields are extracted from the people table 200 in FIG. 2, and the name, email and author affiliation fields are extracted from the document table 220 in FIG. 2 (i.e., a plurality of features or attributes are extracted). [0048] The statistical processing engine selects each possible pair of fields from each data set, excluding the data sets that are not likely to enable a realistic join operation (identification of plurality of matching columns), and performs an analysis using data from the selected pair, and repeats this process for each possible pair. Accordingly, a field from the first data set is selected at 400 and a field from the second data set is selected at 402); 
and clustering the first plurality of results and the second plurality of results based on the extracted features (Figs. 2 and 3 and Paragraph [0022] show grouping of matching columns when they have an identical attribute name in both data sets, and grouping of non-matching attributes based on the data type and the field name; i.e., the clustering is done based on the data type, the field name and the array).

Regarding dependent claim 3, Welling et al and Jonathan et al teach, the method of claim 2. 
Welling et al further teaches, wherein the clustering of the first plurality of results and the second plurality of results includes performing a K-means based clustering technique (Fig. 15; Paragraphs [0184], [0212]: the class identification system 630 works much like the CTI class identification system; class identification system 630 compares the code vectors for each quadrant of source documents with code vectors in the trained class database 632 using the K-means approach (first result). The trained class database 632 is organized into clusters (clustering) representing documents in the training set with similar image properties as defined by the feature vectors (second results). Each document that requires training is manually identified and the extracted text (extracted feature)).
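The K-means clustering recited in claims 3 and 17 can be illustrated, for example only, with plain Lloyd's iterations over per-column feature vectors. This is a minimal sketch assuming numeric feature tuples and deterministic initialization from the first k points; it is not the class identification system 630 of Welling et al.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, iterations: int = 100) -> List[int]:
    """Lloyd's K-means: return a cluster label for each feature vector.

    Centroids start at the first k points (a simplifying assumption;
    k-means++ seeding is a common alternative).
    """
    centroids = [list(p) for p in points[:k]]

    def nearest(p: Vector) -> int:
        # Index of the centroid closest to p by Euclidean distance.
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels
```

Each point here would be the feature vector of one column (from either data set), so columns that land in the same cluster become candidate matches.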

Regarding dependent claim 8, Welling et al and Jonathan et al teach, the method of claim 2. 
Welling et al further teaches, wherein the features that are extracted for a column include one or more of: number of spaces in cells of the column, number of punctuations in the cells of the column, average length of values in the cells of the column (Paragraph [0145] the image identification system 530 selects the text value from a contextually limited lexicon (words and characters) and special characters that is stored in the trained image database 532), variance of values in the cells of the column (Paragraph [0131] performs Gaussian smoothening with a filter using variance of 0.5 and a 3×3 kernel or convolution mask on the distance transform on pixel (i.e., a cell of a column)), total number of words in the cells of the column, average number of words in the cells of the column, and/or number of symbol type transitions in the cells of the column (Paragraph [0145] the image identification system 530 selects the text value from a contextually limited lexicon (words and characters) and special characters that is stored in the trained image database 532). 
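The column features enumerated in claim 8 can be computed, as one possible reading, with the sketch below. The character classes used to count symbol type transitions, and the choice to take the variance over cell lengths so that it applies to text columns, are assumptions made for illustration.

```python
import statistics
import string
from typing import Dict, List


def _symbol_type(ch: str) -> str:
    # Coarse character classes used to count "symbol type transitions".
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    if ch.isspace():
        return "space"
    return "other"


def column_features(cells: List[str]) -> Dict[str, float]:
    """Compute the per-column features listed in claim 8 (illustrative)."""
    lengths = [len(c) for c in cells]
    word_counts = [len(c.split()) for c in cells]
    transitions = 0
    for c in cells:
        types = [_symbol_type(ch) for ch in c]
        # Count adjacent characters whose class changes, e.g. digit -> '-'.
        transitions += sum(1 for a, b in zip(types, types[1:]) if a != b)
    return {
        "spaces": sum(c.count(" ") for c in cells),
        "punctuation": sum(ch in string.punctuation for c in cells for ch in c),
        "avg_length": statistics.mean(lengths),
        "length_variance": statistics.pvariance(lengths),
        "total_words": sum(word_counts),
        "avg_words": statistics.mean(word_counts),
        "symbol_transitions": transitions,
    }
```

For the cells "1-123-456" and "ab cd", the sketch counts one space, two punctuation marks, an average length of 7 and six symbol type transitions.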

Regarding dependent claim 9, Welling et al and Jonathan et al teach, the method of claim 1. 
Jonathan et al further teaches, further comprising outputting the plurality of non-matching columns to be displayed (Paragraph [0062] Given a score for a pair of fields F and G after step 706, a recommendation can be made 708 regarding that pair of fields. A minimum score optionally can be enforced by applying a threshold, such as 0.5, to the score for the pair of fields. Different pairs of fields can be ranked by their score as part of the recommendation. The computer system can present a user interface to a user that allows the user to select a pair of fields based on these scores. The user interface can include information about the different fields (e.g., field names, types and tables in which they reside) and optionally the score for each pair of fields (Examiner interprets obtaining a specification as recommendation/suggestion)).

Regarding dependent claim 10, Welling et al and Jonathan et al teach, the method of claim 1. 
Jonathan et al further teaches, further comprising causing a selection interface to be provided to a user, and the selection interface being configured for the user to: select a first column among non-matching columns of the first data set to be appended to a second column among non-matching columns of the second data set, select a first column among non-matching columns of the second data set to be appended to a second column among non-matching columns of the first data set, or both (Paragraph [0063]-[0066] The recommendation (obtaining the specification based on the recommendation) generally will take one of four forms. For example, given a table A, this analysis could be performed by analyzing multiple other tables, of which one is table B. In such a case, suitable fields in table A are compared to suitable fields in other tables to identify good fields to support a join operation. The analysis identifies a field F in table A to be joined with a field G in a table B. [0068] As another example, given a table A and a field F, this analysis could be performed by analyzing multiple other tables, of which one is table B. In such a case, field F in table A is compared to suitable fields in other tables to identify good fields to support a join operation. The analysis identifies a field G in table B to be joined with the specified field F in table A. [0069] As another example, given a table A and a table B, this analysis could be performed by analyzing the fields of both tables A and B. In such a case, suitable fields in table A are compared to suitable fields in table B to identify good fields to support a join operation between the two tables A and B. The analysis identifies a field G in table B to be joined with a field F in table A. [0070] As another example, given a field F in a table A and a table B, this analysis could be performed by analyzing the fields of table B with respect to field F of table A. 
In such a case, suitable fields in table B are compared to field F in table A to identify good fields to support a join operation using field F in table A and a field in table B. The analysis identifies a field G in table B to be joined with the specified field F in table A. See also Figs. 2 and 3 and Paragraph [0034], which show appending the matching columns and non-matching columns based on the user selection).

Regarding independent claim 15, Welling; Girish (US 20110255782 A1) teaches a system, comprising: one or more processors (Paragraph [0337] An exemplary document data extraction system may include a host computer 1801 that contains volatile memory, 1802, a persistent storage device such as a hard drive, 1808, a processor, 1803, and a network interface), configured to: identify, based at least in part on contents of a first data set comprising a first plurality of columns and contents of a second data set comprising a second plurality of columns (Paragraph [0115] a grouping system 442 and a data extraction system 452. [0246] The group identification system 730 uses a merging confidence that is determined from matching and mismatching criteria that is stored in the trained group database 732): a plurality of matching columns comprising: one or more columns among the first plurality of columns; and corresponding one or more matching columns among the second plurality of columns (Paragraph [0149] System 534 is an image training system. The image training system 534 performs computations on the data in its document database corresponding to the image that are in place and generates datasets used by the image identification system for recognizing the content in source document images (i.e., a plurality of documents having a plurality of columns that are stored in the database; Examiner interprets the first plurality of columns as being from the document data set and the second plurality of columns as being from the trained group data set of the training system, for matching the columns)), wherein the one or more columns among the first plurality of columns and the corresponding one or more matching columns among the second plurality of columns have at least some matching content (Paragraph [0246] The group identification system 730 uses a merging confidence that is determined from matching and mismatching criteria that is stored in the trained group database 732. 
Matching criteria between two groups contribute towards an increased confidence to merge the groups, while mismatching criteria contribute towards keeping the groups separate. The final merging confidence is used to decide whether to merge the two groups. This process is repeated for every pair of groups, in each iteration step of the process. [0255]-[0256] There also are features corresponding to the elements of certain composite features like table headers, table rows, and table columns. There are also features corresponding to form-specific items such as address blocks, phone numbers, and instruction blocks. The Feature data-structure supports operations to merge a set of features into another. For example, a label feature and a value feature that correspond to each other are merged into respective features (i.e., matching columns are merged based on the confidence of association between the labels and the columns)); 
and one or more memories coupled to the one or more processors and configured to provide the one or more processors with instructions (Paragraph [0339] the flow charts included in this application describe the logical steps that are embodied as computer executable instructions that could be stored in computer readable medium, such as various memories and disks, that, when executed by a processor, such as a server or server cluster, cause the processor to perform the logical steps).
Welling et al fails to explicitly teach, and a plurality of non-matching columns comprising: one or more columns among the first plurality of columns that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns that do not match with any columns among the first plurality of columns; obtain a specification of: a first one or more non-matching columns to be appended to a second one or more non-matching columns, the first one or more non-matching columns and the second one or more non-matching columns being selected among the plurality of non-matching columns; a change to the plurality of matching columns; or both; and append at least a portion of the first data set and at least a portion of the second data set according to the plurality of matching columns and the specification.
Jonathan; Young (US 20160055205 A1) teaches, and a plurality of non-matching columns comprising: one or more columns among the first plurality of columns that do not match with any columns among the second plurality of columns; and one or more columns among the second plurality of columns that do not match with any columns among the first plurality of columns (Paragraph [0044] an appropriate similarity (or difference) metric, given the type of the data in the fields, can be used to compare each pair of values. [0045] The similarity measure for the pair of fields is compared to a threshold to determine if there is sufficient similarity in the values of the data fields to make further analysis worthwhile. This threshold can be a user-defined setting, for example. If the comparison indicates that there is insufficient similarity, as illustrated at 502, processing ends, as illustrated at 504. Otherwise, this pair of fields is identified as a potential candidate and further analysis is performed (i.e., the columns in the data sets are similar. Examiner interprets similar as not identical or non-matching));
obtain a specification of: a first one or more non-matching columns to be appended to a second one or more non-matching columns, the first one or more non-matching columns and the second one or more non-matching columns being selected among the plurality of non-matching columns; a change to the plurality of matching columns; or both; and append at least a portion of the first data set and at least a portion of the second data set according to the plurality of matching columns and the specification (Paragraph [0062] Given a score for a pair of fields F and G after step 706, a recommendation can be made 708 regarding that pair of fields. A minimum score optionally can be enforced by applying a threshold, such as 0.5, to the score for the pair of fields. Different pairs of fields can be ranked by their score as part of the recommendation. The computer system can present a user interface to a user that allows the user to select a pair of fields based on these scores. The user interface can include information about the different fields (e.g., field names, types and tables in which they reside) and optionally the score for each pair of fields (Examiner interprets obtaining a specification as recommendation/suggestion)). (Paragraphs [0063]-[0066] The recommendation (i.e., obtaining the specification based on the recommendation) generally will take one of four forms. For example, given a table A, this analysis could be performed by analyzing multiple other tables, of which one is table B. In such a case, suitable fields in table A are compared to suitable fields in other tables to identify good fields to support a join operation. The analysis identifies a field F in table A to be joined with a field G in a table B. [0068] As another example, given a table A and a field F, this analysis could be performed by analyzing multiple other tables, of which one is table B. 
In such a case, field F in table A is compared to suitable fields in other tables to identify good fields to support a join operation. The analysis identifies a field G in table B to be joined with the specified field F in table A. [0069] As another example, given a table A and a table B, this analysis could be performed by analyzing the fields of both tables A and B. In such a case, suitable fields in table A are compared to suitable fields in table B to identify good fields to support a join operation between the two tables A and B. The analysis identifies a field G in table B to be joined with a field F in table A. [0070] As another example, given a field F in a table A and a table B, this analysis could be performed by analyzing the fields of table B with respect to field F of table A. In such a case, suitable fields in table B are compared to field F in table A to identify good fields to support a join operation using field F in table A and a field in table B. The analysis identifies a field G in table B to be joined with the specified field F in table A. See also Figs. 2 and 3 and Paragraph [0034], which show appending the matching columns and non-matching columns based on the user selection).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Welling et al with the teachings of Jonathan et al, which provide a method and system in which a user can select a subset of tables and the system creates a join tree with recommended joins between the tables selected by the user, the recommended joins being used to create a structured query language statement that is executed to return a result to the user (Jonathan et al, Abstract). By doing so, more complex tables arising from separate and distinct databases with different table structures can be joined without any design coordination, as taught by Jonathan et al (Paragraph [0004]).

Regarding dependent claim 16, Welling et al and Jonathan et al teach, the system of claim 15. 
Jonathan et al further teaches, wherein to identify the plurality of matching columns and the plurality of non-matching columns includes to: extract features of contents in the first plurality of columns of the first data set to generate a first plurality of results and extract features of contents in the second plurality of columns of the second data set to generate a second plurality of results (Paragraph [0046] the name, email address and employer fields are extracted from the people table 200 in FIG. 2, and the name, email and author affiliation fields are extracted from the document table 220 in FIG. 2 (i.e., a plurality of features or attributes are extracted). [0048] The statistical processing engine selects each possible pair of fields from each data set, excluding the data sets that are not likely to enable a realistic join operation (identification of plurality of matching columns), and performs an analysis using data from the selected pair, and repeats this process for each possible pair. Accordingly, a field from the first data set is selected at 400 and a field from the second data set is selected at 402); 
and cluster the first plurality of results and the second plurality of results based on the extracted features (Figs. 2 and 3 and Paragraph [0022] show grouping of matching columns when they have an identical attribute name in both data sets, and grouping of non-matching attributes based on the data type and the field name; i.e., the clustering is done based on the data type, the field name and the array).

Regarding dependent claim 17, Welling et al and Jonathan et al teach, the system of claim 16. 
Welling et al further teaches, wherein to cluster the first plurality of results and the second plurality of results includes to perform a K-means based clustering technique (Fig. 15; Paragraphs [0184], [0212]: the class identification system 630 works much like the CTI class identification system; class identification system 630 compares the code vectors for each quadrant of source documents with code vectors in the trained class database 632 using the K-means approach (first result). The trained class database 632 is organized into clusters (clustering) representing documents in the training set with similar image properties as defined by the feature vectors (second results). Each document that requires training is manually identified and the extracted text (extracted feature)).

8. 	Claims 4, 5, 7, 11, 12-14, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Welling; Girish (US 20110255782 A1) in view of Jonathan; Young (US 20160055205 A1), and further in view of Abuelsaad; Tamer E. (US 20150032609 A1).

Regarding dependent claim 4, Welling et al and Jonathan et al teach, the method of claim 2. 
Welling et al and Jonathan et al fail to explicitly teach, further comprising identifying among clustering results one or more non-matching columns, one or more clusters with matching pairs, and one or more clusters with tied matching columns.
Abuelsaad; Tamer E. (US 20150032609 A1) teaches, further comprising identifying among clustering results one or more non-matching columns, one or more clusters with matching pairs, and one or more clusters with tied matching columns (Paragraph [0070] teaches identification of matching columns. Paragraph [0064] teaches non-matching columns and tied matching columns. If a column contains multiple data types, column ID program 106 may determine that the column contains a least generic applicable data type (Examiner interprets tied columns as pattern matching, in this case matching data types). For example, if the column contains both days of the week and names of states, the data type of the column may be determined to be an ADT (An Abstract Data Type (ADT) is a data type which identifies data with a semantically-valuable classification as taught in Paragraph [0017]) corresponding to dictionary words. If a column contains multiple data types, the column ID program 106 may match the column with multiple ADTs by grouping entries with patterns in common, in which case the column ID program 106 may group the entries logically or by rearranging them within the column data. Paragraph [0083], [0084] Column ID program 106 may identify multiple suspected matches (step 410) (where matching multiple columns based on pattern is a tied matching column), in which case column ID program 106 narrows the results (step 414) by determining how closely the column data complies with the ADT pattern definition of each suspected match. The suspected match with which the column data most closely complies is determined to be the ADT of the column (step 416)). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Welling et al and Jonathan et al with the teachings of Abuelsaad et al to provide a method that resolves a tie using a column ID program in the event that the column data is equally compliant with the ADT pattern definition of more than one suspected match, as taught by Abuelsaad et al (Paragraph [0084]).

Regarding dependent claim 5, Welling et al, Jonathan et al and Abuelsaad et al teach, the method of claim 4. 
Abuelsaad et al further teaches, further comprising performing a pattern matching operation on the one or more clusters with the tied matching columns to identify one or more additional clusters with matching pairs (Fig. 4; Paragraphs [0017], [0064], [0069], [0070]: column ID program inspects the column data to determine the formatting patterns followed by the entries (step 406) with the tied matching columns. Column ID program 106 matches the formatting patterns of the column data to known patterns of ADTs (step 412). The column data may include entries with different formats. For example, column data may include entries of "1-123-456-7890" and "1 (123) 456-7890." Despite the differences in formatting, both comply with the formatting conventions of an ADT corresponding to phone numbers).
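The formatting-pattern comparison described by Abuelsaad et al (phone numbers written as "1-123-456-7890" and "1 (123) 456-7890") can be approximated, for illustration only, by reducing each cell to a format signature and breaking a tie among candidate columns by signature overlap. The signature scheme, the Jaccard-overlap tie-break and the function names are assumptions of this sketch, not the column ID program 106 of Abuelsaad et al.

```python
from typing import Dict, List, Optional, Set


def signature(value: str) -> str:
    """Format signature: digits -> '9', letters -> 'A', separators dropped.

    Dropping spaces and punctuation lets differently formatted values such as
    "1-123-456-7890" and "1 (123) 456-7890" share one signature.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
    return "".join(out)


def signatures(cells: List[str]) -> Set[str]:
    return {signature(c) for c in cells}


def break_tie(column: List[str], candidates: Dict[str, List[str]]) -> Optional[str]:
    """Pick the tied candidate whose signature set best overlaps `column`.

    Returns None when the best overlap is itself tied (left unresolved).
    """
    target = signatures(column)
    scores = {}
    for name, cells in candidates.items():
        other = signatures(cells)
        union = target | other
        # Jaccard overlap of the two signature sets.
        scores[name] = len(target & other) / len(union) if union else 0.0
    best = max(scores.values(), default=0.0)
    winners = [n for n, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None
```

Returning None for an unresolved tie mirrors the option, noted in Abuelsaad et al, of leaving a tie unresolved rather than forcing a match.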

Regarding dependent claim 7, Welling et al, Jonathan et al and Abuelsaad et al teach, the method of claim 5. 
Abuelsaad et al further teaches, further comprising performing a title matching operation on one or more remaining clusters with the tied matching columns to identify one or more additional clusters with untied matching columns (Paragraph [0064] If there is no detectible pattern to the column data, column ID program 106 may associate the column with an ADT corresponding to unknown data or raw data (i.e., not grouped into a cluster). See also Paragraphs [0070] and [0090], where, if the confidence score exceeds the threshold (YES branch, decision 508) (here, the YES branch is grouping/clustering the matching columns), and if the confidence score does not exceed the threshold (NO branch, decision 508), comparison program 108 compares the first data set to a plurality of data sets (here, the NO branch is grouping/clustering the non-matching columns). Paragraph [0084] In the event that the column data is equally compliant with the ADT pattern definition of more than one suspected match, then column ID program 106 may resolve the tie (i.e., identifying untied matching columns), for example, through additional context or by prompting a user for resolution. Alternatively, column ID program 106 may leave the tie unresolved, in which case it may associate the column with multiple ADTs, no ADTs, and/or an identifier indicating a tie. Additional context may include, for example, the ADTs of any other columns of the data set, as certain columns may be expected to co-occur within a data set (e.g., first names and last names), or a probabilistic analysis based on which ADT is more common (i.e., performing a match based on the additional clusters/column titles, such as first names and last names)).
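The title matching recited in claim 7 can be sketched, as one hypothetical approach, by normalizing column titles and comparing them with the Python standard library's `difflib.get_close_matches`. The normalization rule, the function names and the 0.6 cutoff (difflib's default) are assumptions of this sketch and are not taken from Abuelsaad et al.

```python
import difflib
import re
from typing import List, Optional


def normalize_title(title: str) -> str:
    """Lower-case a column title and keep only its alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def title_match(
    title: str, candidate_titles: List[str], cutoff: float = 0.6
) -> Optional[str]:
    """Return the candidate title closest to `title`, or None below `cutoff`."""
    by_norm = {normalize_title(t): t for t in candidate_titles}
    best = difflib.get_close_matches(
        normalize_title(title), list(by_norm), n=1, cutoff=cutoff
    )
    return by_norm[best[0]] if best else None
```

In this sketch, "Phone_Number" resolves to a candidate titled "phone number", while a title with no close candidate is left unmatched.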

Regarding dependent claim 11, Welling et al and Jonathan et al teach, the method of claim 1. 
Abuelsaad et al teaches, wherein the plurality of matching columns are identified based at least in part on a plurality of N-gram feature vectors (Paragraph [0031] use a variety of semantic analysis techniques, including tokenization, synonym analysis, acronym expansion, and n-gram analysis, which may be used separately or in combination for a plurality of data sets).

Regarding dependent claim 12, Welling et al and Jonathan et al teach, the method of claim 1. 
Abuelsaad et al teaches, further comprising: determining N-grams of entries in the first data set and in the second data set; forming a plurality of matrices based at least in part on the N-grams of the entries in the first data set and in the second data set (Paragraph [0031] use a variety of semantic analysis techniques, including tokenization, synonym analysis, acronym expansion, and n-gram analysis, which may be used separately or in combination for a plurality of data sets);
determining, based at least in part on the plurality of matrices, a first plurality of N-gram feature vectors corresponding to the first plurality of columns and a second plurality of N-gram feature vectors corresponding to the second plurality of columns (Paragraphs [0099]-[0100] extracting features/data from each of the plurality of data sets); 
and comparing one or more vectors in the first plurality of N-gram feature vectors with one or more vectors in the second plurality of N-gram feature vectors to determine the matching columns (Paragraphs [0100]-[0103] comparing each header/data from the first data set with each header/data of the second data set. See also Paragraph [0081] Column ID program 106 may determine suspected matches by using semantic analysis techniques on the entire text of the ADT identifier, the tokens into which the ADT identifier was broken, and/or the variations, combinations, and/or permutations of those tokens. Column ID program 106 may also use n-gram analysis in order to infer many related terms from a single term. For example, the unigram "phone" occurs in the context of the bigram "phone number" with a high TF/IDF frequency, so column ID program 106 can infer "work phone number," "phone number," and "work number" from "work phone." (i.e., the column ID program is performed on all the datasets as taught in Paragraph [0036])).
Jonathan et al also further teaches, further comprising: determining N-grams of entries in the first data set and in the second data set; forming a plurality of matrices based at least in part on the N-grams of the entries in the first data set and in the second data set; determining, based at least in part on the plurality of matrices, a first plurality of N-gram feature vectors corresponding to the first plurality of columns and a second plurality of N-gram feature vectors corresponding to the second plurality of columns (Paragraph [0029] The analysis data 106 can be structured as one or more ordered data structures, such as an array, list, matrix or the like, in which each value is stored at an indexed location. In general, from each data set, separately accessible analysis data is generated for each field which has been selected for analysis from the data set. The analysis data from a data set can be structured, for example, as a one-to-one mapping of values for or from a field to values in a data structure (e.g., an array), or can be a many-to-one mapping of values from multiple fields to values in a data structure (e.g., a matrix));
and comparing one or more vectors in the first plurality of N-gram feature vectors with one or more vectors in the second plurality of N-gram feature vectors to determine the matching columns (Paragraphs [0048], [0050] similarity of the data in the two fields is measured. In particular, the values of the selected field from the first data set are compared to the values of the selected field from the second data set).
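The N-gram steps of claim 12 (N-grams of entries, matrices built from them, and per-column N-gram feature vectors) can be illustrated with character trigrams. In this minimal sketch, which assumes character rather than word N-grams, space padding of each entry, and a vocabulary shared by both data sets, each row of a returned matrix is one column's N-gram feature vector; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Character N-grams of one entry, space-padded at both ends."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_matrices(
    first: Dict[str, List[str]],
    second: Dict[str, List[str]],
    n: int = 3,
) -> Tuple[List[str], List[List[int]], List[List[int]]]:
    """Build one column-by-N-gram count matrix per data set.

    Both matrices share one vocabulary, so row i of either matrix is the
    N-gram feature vector of a column in the same vector space.
    """
    counts = {}
    for side, data in (("first", first), ("second", second)):
        for col, cells in data.items():
            counts[(side, col)] = Counter(g for cell in cells for g in char_ngrams(cell, n))
    vocab = sorted({g for c in counts.values() for g in c})

    def matrix(side: str, data: Dict[str, List[str]]) -> List[List[int]]:
        # One row per column, one count per vocabulary N-gram.
        return [[counts[(side, col)][g] for g in vocab] for col in data]

    return vocab, matrix("first", first), matrix("second", second)
```

A shared vocabulary is what makes rows from the two data sets directly comparable, for example by the cosine similarity recited in claim 13.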

Regarding dependent claim 13, Welling et al, Jonathan et al and Abuelsaad et al teach, the method of claim 12. 
Abuelsaad et al further teaches, wherein the comparing of the one or more vectors in the first plurality of N-gram feature vectors with the one or more vectors in the second plurality of N-gram feature vectors (Paragraphs [0100]-[0103] comparing each header/data from the first data set with each header/data of the second data set. See also Paragraph [0081] Column ID program 106 may determine suspected matches by using semantic analysis techniques on the entire text of the ADT identifier, the tokens into which the ADT identifier was broken, and/or the variations, combinations, and/or permutations of those tokens. Column ID program 106 may also use n-gram analysis in order to infer many related terms from a single term. For example, the unigram "phone" occurs in the context of the bigram "phone number" with a high TF/IDF frequency, so column ID program 106 can infer "work phone number," "phone number," and "work number" from "work phone." (i.e., the column ID program is performed on all the datasets as taught in Paragraph [0036])).
Jonathan et al also further teaches, wherein the comparing of the one or more vectors in the first plurality of N-gram feature vectors with the one or more vectors in the second plurality of N-gram feature vectors (Paragraph [0029] The analysis data 106 can be structured as one or more ordered data structures, such as an array, list, matrix or the like, in which each value is stored at an indexed location. In general, from each data set, separately accessible analysis data is generated for each field which has been selected for analysis from the data set. The analysis data from a data set can be structured, for example, as a one-to-one mapping of values for or from a field to values in a data structure (e.g., an array), or can be a many-to-one mapping of values from multiple fields to values in a data structure (e.g., a matrix). Paragraphs [0048], [0050] similarity of the data in the two fields is measured. In particular, the values of the selected field from the first data set are compared to the values of the selected field from the second data set).
Welling et al further teaches, wherein the comparing to determine the matching columns includes computing cosine similarities (Paragraphs [0177], [0178], [0238]: When the cosine distance nears 0, that means the vectors are orthogonal and when it nears 1 it means the vectors are in the same direction or similar. The cosine distance is used as a similarity measure).
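The cosine-similarity comparison recited in claim 13 can be sketched as follows. The greedy one-to-one pairing, the function names and the 0.5 threshold (borrowed from the example threshold in Jonathan et al, Paragraph [0062]) are assumptions; columns left unpaired correspond to the non-matching columns.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def match_columns(
    first: Dict[str, Sequence[float]],
    second: Dict[str, Sequence[float]],
    threshold: float = 0.5,
) -> Tuple[List[Tuple[str, str, float]], List[str], List[str]]:
    """Greedily pair columns by cosine similarity of their feature vectors.

    Returns (matching pairs, unmatched first columns, unmatched second columns).
    """
    # Score every cross-data-set pair, best first.
    scored = sorted(
        ((cosine(u, v), a, b) for a, u in first.items() for b, v in second.items()),
        reverse=True,
    )
    used_a, used_b = set(), set()
    pairs: List[Tuple[str, str, float]] = []
    for score, a, b in scored:
        if score < threshold:
            break
        if a not in used_a and b not in used_b:
            pairs.append((a, b, score))
            used_a.add(a)
            used_b.add(b)
    rest_a = [a for a in first if a not in used_a]
    rest_b = [b for b in second if b not in used_b]
    return pairs, rest_a, rest_b
```

Greedy pairing keeps each column in at most one match; an optimal assignment (for example, the Hungarian algorithm) would be an alternative design choice.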

Regarding dependent claim 14, Welling et al, Jonathan et al and Abuelsaad et al teach, the method of claim 12. 
Abuelsaad et al further teaches, wherein the comparing of the one or more vectors in the first plurality of N-gram feature vectors with the one or more vectors in the second plurality of N-gram feature vectors includes projecting the one or more vectors in the first plurality of N-gram feature vectors and the one or more vectors in the second plurality of N-gram feature vectors in a vector space (Paragraph [0031] use a variety of semantic analysis techniques, including tokenization, synonym analysis, acronym expansion, and n-gram analysis, which may be used separately or in combination for a plurality of data sets. [0039]-[0044] Comparison program 108 receives a first data set from storefront program 104 and compares the first data set to the second data set to generate a relevancy score. Comparison program 108 sends the relevancy score to storefront program 104. In one embodiment, comparison program 108 also sends a data set to storefront program 104 (i.e., suggesting to the user a list for the first data set and also suggesting to the user a list for the second data set, as taught in Paragraph [0044]; Examiner interprets projecting as displaying/notifying)).
Jonathan et al further teaches, wherein the comparing of the one or more vectors in the first plurality of N-gram feature vectors with the one or more vectors in the second plurality of N-gram feature vectors includes projecting the one or more vectors in the first plurality of N-gram feature vectors and the one or more vectors in the second plurality of N-gram feature vectors in a vector space (Paragraph [0062] Given a score for a pair of fields F and G after step 706, a recommendation can be made 708 regarding that pair of fields. A minimum score optionally can be enforced by applying a threshold, such as 0.5, to the score for the pair of fields. Different pairs of fields can be ranked by their score as part of the recommendation. The computer system can present a user interface to a user that allows the user to select a pair of fields based on these scores. The user interface can include information about the different fields (e.g., field names, types and tables in which they reside) and optionally the score for each pair of fields. Examiner interprets obtaining a specification as recommendation/suggestion).

Regarding dependent claim 18, Welling et al and Jonathan et al teach the system of claim 16.
Welling et al and Jonathan et al fail to explicitly teach, wherein the one or more processors are further configured to identify among clustering results one or more non-matching columns, one or more clusters with matching pairs, and one or more clusters with tied matching columns.
Abuelsaad; Tamer E. (US 20150032609 A1) teaches, wherein the one or more processors are further configured to identify among clustering results one or more non-matching columns, one or more clusters with matching pairs, and one or more clusters with tied matching columns (Paragraph [0070] teaches identification of matching columns. Paragraph [0064] teaches non-matching columns and tied matching columns. If a column contains multiple data types, column ID program 106 may determine that the column contains a least generic applicable data type. For example, if the column contains both days of the week and names of states, the data type of the column may be determined to be an ADT (An Abstract Data Type (ADT) is a data type which identifies data with a semantically-valuable classification as taught in Paragraph [0017]) corresponding to dictionary words. If a column contains multiple data types, the column ID program 106 may match the column with multiple ADTs by grouping entries with patterns in common, in which case the column ID program 106 may group the entries logically or by rearranging them within the column data. Paragraphs [0083], [0084] Column ID program 106 may identify multiple suspected matches (step 410) (where matching multiple columns is a tied matching column), in which case column ID program 106 narrows the results (step 414) by determining how closely the column data complies with the ADT pattern definition of each suspected match. The suspected match with which the column data most closely complies is determined to be the ADT of the column (step 416)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Welling et al and Jonathan et al with the teachings of Abuelsaad et al to resolve a tie using the column ID program in the event that the column data is equally compliant with the ADT pattern definition of more than one suspected match, as taught in Paragraph [0084].

Regarding dependent claim 19, Welling et al, Jonathan et al and Abuelsaad et al teach the system of claim 18.
Abuelsaad et al further teaches, wherein the one or more processors are further configured to perform a pattern matching operation on the one or more clusters with the tied matching columns to identify one or more additional clusters with matching pairs (Fig. 4; Paragraphs [0017], [0064], [0069], [0070] column ID program inspects the column data to determine the formatting patterns followed by the entries (step 406) with the tied matching columns. Column ID program 106 matches the formatting patterns of the column data to known patterns of ADTs (step 412). The column data may include entries with different formats. For example, column data may include entries of "1-123-456-7890" and "1 (123) 456-7890." Despite the differences in formatting, both comply with the formatting conventions of an ADT corresponding to phone numbers (i.e., tied matching is pattern matching)).
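For illustration only, the format-pattern matching described in the cited passages of Abuelsaad et al can be sketched as follows: the two differently formatted entries "1-123-456-7890" and "1 (123) 456-7890" both satisfy a single phone-number pattern. The regular expression, the function names `matches_phone_pattern` and `column_complies`, and the all-entries compliance rule are hypothetical assumptions, not Abuelsaad's actual ADT pattern definitions.

```python
import re

# Hypothetical phone-number pattern: optional leading "1" plus space or hyphen,
# a 3-digit area code (optionally in parentheses), then 3-digit and 4-digit
# groups, with a space or hyphen after the area code and a hyphen before the last group.
PHONE_PATTERN = re.compile(r"(1[ -])?(\(\d{3}\)|\d{3})[ -]\d{3}-\d{4}")


def matches_phone_pattern(entry):
    """Return True if the whole entry complies with the hypothetical phone pattern."""
    return PHONE_PATTERN.fullmatch(entry) is not None


def column_complies(entries):
    """Assumed rule: a column complies with the pattern only if every entry does."""
    return all(matches_phone_pattern(e) for e in entries)


column = ["1-123-456-7890", "1 (123) 456-7890"]
print(column_complies(column))          # True: both formats fit one pattern
print(matches_phone_pattern("Monday"))  # False
```

Under this sketch, a column whose entries differ only in formatting would still resolve to a single phone-number type.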

9.	Claims 6 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Welling; Girish (US 20110255782 A1) in view of Jonathan; Young (US 20160055205 A1), in view of Abuelsaad; Tamer E. (US 20150032609 A1), and in further view of Christopher Scaffidi, "Unsupervised Inference of Data Formats in Human-Readable Notation", Institute for Software Research, School of Computer Science, Carnegie Mellon University, 2007 (Applicant admitted prior art dated 02/10/2022).

Regarding dependent claim 6, Welling et al, Jonathan et al and Abuelsaad et al teach the method of claim 5.
Welling et al, Jonathan et al and Abuelsaad et al fail to explicitly teach, wherein the pattern matching operation is implemented as a TOPEI-based pattern matching operation.
Christopher Scaffidi teaches, wherein the pattern matching operation is implemented as a TOPEI-based pattern matching operation (Page 2, Col. 2, Lines 17 and 24: Topei helps users validate (matching) strings. Topei includes patterns that help identify most outliers. Page 4, Col. 1, Line 41: Topei identifies the format’s parts and each part’s composite character class. Second, Topei identifies constraints. The discussion below uses six example email addresses to demonstrate the algorithm).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Welling et al, Jonathan et al and Abuelsaad et al with the teachings of Scaffidi, such that the patterns being matched are checked against patterns inferred from the columns, as taught by Scaffidi.

Regarding dependent claim 20, Welling et al, Jonathan et al and Abuelsaad et al teach the system of claim 19.
Welling et al, Jonathan et al and Abuelsaad et al fail to explicitly teach, wherein the pattern matching operation is implemented as a TOPEI-based pattern matching operation.
Christopher Scaffidi teaches, wherein the pattern matching operation is implemented as a TOPEI-based pattern matching operation (Page 2, Col. 2, Lines 17 and 24: Topei helps users validate (matching) strings. Topei includes patterns that help identify most outliers. Page 4, Col. 1, Line 41: Topei identifies the format’s parts and each part’s composite character class. Second, Topei identifies constraints. The discussion below uses six example email addresses to demonstrate the algorithm).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Welling et al, Jonathan et al and Abuelsaad et al with the teachings of Scaffidi, such that the patterns being matched are checked against patterns inferred from the columns, as taught by Scaffidi.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUMAN RAJAPUTRA, whose telephone number is (571) 272-4669. The examiner can normally be reached from 8:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas (571) 272-0631 can be reached. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S. R./
Examiner, Art Unit 2164

/ASHISH THOMAS/Supervisory Patent Examiner, Art Unit 2164