DETAILED ACTION
1.	This office action is in response to application 17/136,124 filed on 12/29/2020. Claims 1-10 are pending in this office action.


Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
3.	Claims 3 and 4 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, and 5-10 are rejected under 35 U.S.C. 103 as being unpatentable over US 2017/0351717 (hereinafter Kabra) in view of US 2019/0188288 (hereinafter Holm).

As for claim 1, Kabra discloses:
receiving, by an input/output interface, an input data from an input file, wherein the input data is in the form of tabular data having a plurality of rows and the plurality of columns (202) (See paragraphs 0029 and 0051; note the system can receive a file including the data);
preprocessing, by one or more hardware processors, the input data (204) (See paragraph 0012; note normalization of data is a preprocessing of the data to standardize the data for comparison. Further see paragraph 0046, noting that the normalized data is populated into a database as part of the preprocessing of the data);
deriving, by the one or more hardware processors, a statistical score for each pair of columns from possible pairs of columns in the plurality of columns in the preprocessed input data (206) (See paragraphs 0012-0014 and 0023-0025; note a score is calculated for each column that indicates a match, and if the score is above the threshold the columns are considered a match/duplicate);
selecting, by the one or more hardware processors, a first set of pair of columns from the possible pairs of columns, wherein the first set of pair of columns satisfies a predefined condition (See paragraphs 0024-0029; note the system uses thresholds and weights each factor of importance based on an attribute being a predefined condition from which matches are determined based on aggregated conditions that define the total match threshold score), and wherein the first predefined condition is the statistical score for the pair of columns is more than a first threshold value (208) (See paragraphs 0024, 0032 and 0046; note the threshold is based on the statistical analysis of attributes and similarity);
performing, by the one or more hardware processors, a row level analysis on the selected first set of pair of columns (See paragraph 0029; note the row based analysis determines if the entities are the same based on the attributes stored within) using one or more of: a fuzzy logic technique, a semantic level analysis using a word embedding technique, wherein the word present in the plurality of columns checking the concurrence of a plurality of words on the selected first set of pair of columns, and utilizing a look up table after converting unit of measures of the input data (See paragraphs 0045-0048; note the system uses a lookup table to determine correlations and appropriate weights to determine matches), wherein the row level analysis results in generation of a row level score (210) (See paragraph 0028; note the entity matching score is determined based on the row while the attribute and weight scores are determined based on the columns which represent the attributes);
selecting, by the one or more hardware processors, a set of characteristic pair of columns out of the first set of pair of columns if the generated row level score is more than a second threshold value (212) (See paragraphs 0022-0024 and 0043; note there can be different threshold levels for the attributes and an overall matching threshold).
Kabra does not disclose: identifying, by the one or more hardware processors, the selected set of characteristic pair of columns as duplicate columns in the form of an output file (214). Holm, however, discloses identifying, by the one or more hardware processors, the selected set of characteristic pair of columns as duplicate columns in the form of an output file (214) (See paragraphs 0084-0086; note the query specifies how to perform deduplication and the results are output to a file). It would have been obvious to an artisan of ordinary skill in the pertinent art at the time the instantly claimed invention was filed to have incorporated the teaching of Holm into the system of Kabra. The modification would have been obvious because the two references are concerned with the solution to the problem of data deduplication (See Holm paragraph 0085 and Kabra abstract); therefore there is an implicit motivation to combine these references (i.e., motivation from the references themselves). In other words, the ordinary skilled artisan, during his/her quest for a solution to the cited problem, would look to the cited references at the time the invention was made. Consequently, the ordinary skilled artisan would have been motivated to combine the cited references since Holm’s teaching would enable users of the Kabra system to perform more efficient deduplication of records.
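For illustration only, the sequence of steps recited in claim 1 (202-214) can be sketched as follows. This sketch is not taken from the application, Kabra, or Holm. The function names, the use of Jaccard overlap as the statistical score, the use of Python's difflib as the fuzzy technique, and both threshold values are assumptions chosen to make the two-stage selection concrete.

```python
import csv
import io
from difflib import SequenceMatcher
from itertools import combinations


def preprocess(rows):
    """Step 204 (assumed form): trim whitespace and lowercase every cell."""
    return [{k: str(v).strip().lower() for k, v in row.items()} for row in rows]


def column_score(rows, a, b):
    """Step 206 (assumed score): Jaccard overlap of the two columns' value sets."""
    set_a = {r[a] for r in rows}
    set_b = {r[b] for r in rows}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def row_level_score(rows, a, b):
    """Step 210 (assumed fuzzy technique): mean per-row string similarity."""
    if not rows:
        return 0.0
    ratios = [SequenceMatcher(None, r[a], r[b]).ratio() for r in rows]
    return sum(ratios) / len(ratios)


def find_duplicate_columns(rows, first_threshold=0.5, second_threshold=0.9):
    """Return column pairs that pass both thresholds (steps 206-212)."""
    rows = preprocess(rows)
    columns = list(rows[0].keys()) if rows else []
    # Step 208: keep pairs whose column-level score exceeds the first threshold.
    candidates = [
        (a, b) for a, b in combinations(columns, 2)
        if column_score(rows, a, b) > first_threshold
    ]
    # Step 212: keep pairs whose row-level score exceeds the second threshold.
    return [
        (a, b) for a, b in candidates
        if row_level_score(rows, a, b) > second_threshold
    ]


def write_report(pairs, handle):
    """Step 214: emit the identified duplicate column pairs as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["column_a", "column_b"])
    writer.writerows(pairs)


if __name__ == "__main__":
    sample = [
        {"name": "Acme Corp", "company": " acme corp ", "city": "Austin"},
        {"name": "Globex", "company": "GLOBEX", "city": "Boston"},
        {"name": "Initech", "company": "Initech", "city": "Denver"},
    ]
    buffer = io.StringIO()
    write_report(find_duplicate_columns(sample), buffer)
    print(buffer.getvalue())
```

In this sketch the column-level score acts as an inexpensive filter over all possible pairs, so that the costlier row-by-row comparison runs only on the pairs that survive the first threshold.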

As for claim 2, the rejection of claim 1 is incorporated and further Kabra discloses: the step of providing intervention by a subject matter expert by manually screening the identified duplicate columns (See paragraph 0024; note the administrator is the expert that can be tasked with performing a manual review/screening).

As for claim 5, the rejection of claim 1 is incorporated and further Kabra discloses: the step of removing the identified duplicate columns (See paragraph 0029; note the duplicate records can be automatically removed).

As for claim 6, the rejection of claim 1 is incorporated and further Holm discloses: wherein the input file and the output file are in the form of at least one or more of comma separated value (.csv) format, XLS format, XLSX format (See paragraph 0084).

As for claim 7, the rejection of claim 1 is incorporated and further Holm discloses: wherein the step of preprocessing is preceded by the step of merging the input data received from more than one type input files (See paragraphs 0071-0078; note that during the initial processing data is merged, with duplicates retained and displayed side by side).

Claims 8-9 are system claims substantially corresponding to the method of claims 1 and 6 and are thus rejected for the same reasons as set forth in the rejection of claims 1 and 6.

Claim 10 is a non-transitory storage medium claim substantially corresponding to the method of claim 1 and is thus rejected for the same reasons as set forth in the rejection of claim 1.

Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ELIYAH STONE HARPER whose telephone number is (571)272-0759. The examiner can normally be reached Monday-Friday, 10:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Featherstone, can be reached on (571)270-3750. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Eliyah S. Harper/
Primary Examiner, Art Unit 2166
September 8, 2022