Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Applicant’s application filed on 06/02/2021 has been reviewed.
Claims 1-24 have been examined.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 22-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
The claimed invention as recited in claims 22-23 is directed to "an automated and dynamic system" that can be interpreted as referring to lines of programming within a computer system, rather than to the system as a physical object comprising a hardware processor executing instructions stored in a non-transitory computer-readable storage medium. The claimed invention also recites “storage systems…pair generator...”, which can be interpreted as either a hardware system or a software system. Therefore, Examiner interprets the claimed system as a software system.
Accordingly, the claims recite no more than software, logic, or a data structure (i.e., an abstraction) and do not fall within any statutory category. In re Warmerdam, 33 F.3d 1354, 1361 (Fed. Cir. 1994). Significantly, "[a]bstract software code is an idea without physical embodiment." Microsoft Corp. v. AT&T Corp., 550 U.S. 437, 449 (2007).
As such, the claims are not limited to statutory subject matter and are therefore non-statutory.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claims 20-21 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.
Regarding claim 20, the claim depends on itself.
Claim 21 inherits the deficiency of claim 20 and is rejected by virtue of its dependency.
Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1, 3, 4, 18-22 and 24 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by U.S. Patent Application Publication No. 20200364243 to Tamayo-Rios et al. (hereinafter “Tamayo-Rios”).
As to claim 1, Tamayo-Rios teaches an automated computer-implemented method for grouping data records for improving the efficiency of a clustering process, the method comprising (par. 0003-0005, 0053-0058, computer-implemented method in a system comprising a processor and a non-transitory computer-readable storage medium): 
a) accessing, from one or more storage systems, an initial dataset of data records, each data record being structured with predetermined fields (par. 0020-0022, the initial dataset corresponds to a sub-group of records having similar field values, obtained by performing blocking); 
b) generating, by a processor, comparison vectors associated with pairs of data records from the initial dataset, each vector associated with a pair comprising a set of values, each value being associated with one of the predetermined fields and representing a comparison result of the values in said field for the first and second data records of a pair (par. 0024-0026, 0043-0049, records are expressed as vectors of extracted features within the feature matrix); 
c) inputting the comparison vectors into a trained non-linear similarity model, stored onto a storage medium, and generating therefrom similarity scores, each similarity score providing an indication of the degree of similarity between the two data records in the pair (par. 0018, 0027, 0043, trained non-linear similarity model); 
d) inputting, by the processor, the similarity scores into a clustering algorithm, and creating therefrom clusters of data records (par. 0028-0029, clustering based on similarity scores); 
e) removing, by the processor, from the dataset, data records in the created clusters that have been determined as reconciled (par. 0029, records are removed and stored in a secure database after clustering).
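For illustration only, and without forming part of the grounds of rejection, steps b) to e) of claim 1 can be sketched in Python as follows. The field names, the logistic scoring function standing in for a trained non-linear similarity model, the 0.5 score threshold, the single-link grouping used in place of the clustering algorithm, and the treatment of every multi-record cluster as reconciled are all hypothetical assumptions; the sketch does not represent the implementation of Tamayo-Rios or of Applicant.

```python
import math
from itertools import combinations

FIELDS = ("sender", "receiver", "amount")  # hypothetical predetermined fields


def compare(a, b):
    """Step b): one comparison value per predetermined field (1.0 = equal)."""
    return [1.0 if a[f] == b[f] else 0.0 for f in FIELDS]


def similarity(vec):
    """Step c) stand-in: a logistic (non-linear) function of the vector.
    A real system would use a trained model; these weights are made up."""
    z = 4.0 * sum(vec) - 6.0
    return 1.0 / (1.0 + math.exp(-z))


def cluster(n, scores, threshold=0.5):
    """Step d) stand-in: single-link grouping via union-find over the
    pairs whose similarity score meets the threshold."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), s in scores.items():
        if s >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def reconcile(records):
    """Steps b)-e): score every pair, cluster, then remove the records of
    multi-record clusters (assumed reconciled) from the dataset."""
    scores = {(i, j): similarity(compare(records[i], records[j]))
              for i, j in combinations(range(len(records)), 2)}
    clusters = cluster(len(records), scores)
    reconciled = {i for c in clusters if len(c) > 1 for i in c}
    remaining = [r for k, r in enumerate(records) if k not in reconciled]
    return clusters, remaining
```

With two identical records and one unrelated record, `reconcile` groups the first two into one cluster and leaves only the unrelated record in the dataset for the next iteration.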
As to claim 3, Tamayo-Rios teaches the computer-implemented method according to claim 2, comprising removing, after each iteration of step e), reconciled data records from the initial dataset and from the additional dataset(s) (par. 0028, the clustering algorithm moves data from sub-groups to clusters, while removing records whose similarity scores do not meet the threshold).
As to claim 4, Tamayo-Rios teaches the computer-implemented method according to claim 3, wherein entire clusters of reconciled data records are removed after each iteration of step e) (par. 0029, records are removed and stored in a secure database after clustering).
As to claim 18, Tamayo-Rios teaches the computer-implemented method according to claim 1, wherein training of the non-linear similarity model comprises the following steps: i) providing a training dataset of training data records, the training data records being structured with the same predetermined fields as those of the data records of the initial and additional datasets (par. 0037); ii) generating training comparison vectors associated to pairs of training data records, each training comparison vector being associated with a pair comprising a set of values, each value being associated to one field and representing a comparison result of the values in said field for the first and second training data records of a pair (par. 0043-0048); and iii) training a non-linear similarity model by inputting therein the training comparison vectors, to determine or predict a similarity between pairs of data records (par. 0044-0052).
As to claim 19, Tamayo-Rios teaches the computer-implemented method according to claim 19, comprising determining groups of training data records before generating comparison vectors, wherein groups are based on the values contained in at least some of the fields of the training data records, so as to classify the data records of the training dataset into said groups and train a non-linear similarity model for each group (par. 0037-0042).
As to claim 20, Tamayo-Rios teaches the computer-implemented method according to claim 20, wherein the trained non-linear similarity models are either gradient boosting models or neural network models (par. 0043-0048, neural network).
As to claim 21, Tamayo-Rios teaches the computer-implemented method according to claim 20, wherein the data records that have been removed are added to the training dataset of the corresponding group, whereby the non-linear similarity model associated to the group is retrained with data records from the initial and additional datasets (par. 0037-0052, various types of records are added to the training data to optimize precision and accuracy).
As to claim 22, Tamayo-Rios teaches an automated and dynamic system for clustering data records pertaining to different datasets, the system comprising (par. 0003-0005, 0053-0058, computer-implemented method in a system comprising a processor and a non-transitory computer-readable storage medium): 
one or more storage systems for storing an initial dataset of data records, each data record being structured with predetermined fields (par. 0020-0022, the initial dataset corresponds to a sub-group of records having similar field values, obtained by performing blocking); 
a pair generator and a comparison algorithm toolbox for generating comparison vectors associated with pairs of data records from the initial dataset, each vector associated with a pair comprising a set of values, each value being associated with one field and representing a comparison result of the values in said field for the first and second data records of a pair (par. 0024-0026, 0043-0049, records are expressed as vectors of extracted features within the feature matrix); 
at least one trained non-linear similarity model receiving as an input the comparison vectors, and generating therefrom a matrix of similarity scores, each similarity score providing an indication of the degree of similarity between the two data records in the pair of the group (par. 0018, 0027, 0043, trained non-linear similarity model); 
a clustering algorithm for receiving as an input the matrix of similarity scores, and creating therefrom clusters of data records (par. 0028-0029, clustering based on similarity scores); and 
a graphical user interface for receiving as input reconciled data records in a given one of the clusters and for removing reconciled data records from the initial dataset (par. 0029, 0031-0036, records are removed and stored in a secure database after clustering, including removing links of reconciled data records).
Regarding claim 24, the claim is essentially the same as claim 1, except that it sets forth the claimed invention as a non-transitory storage medium rather than a method, and it is rejected for the same reasons as applied hereinabove. 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 2, 5, 6, 9 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 20200364243 to Tamayo-Rios et al. (hereinafter “Tamayo-Rios”), and further in view of U.S. Patent Application Publication No. 20190019083 to Trunck et al. (hereinafter “Trunck”).
As to claim 2, Tamayo-Rios teaches the computer-implemented method according to claim 1. While Tamayo-Rios teaches steps b) to e), Tamayo-Rios does not explicitly teach wherein the data records pertain to different datasets, and wherein the method comprises periodically repeating steps with additional datasets of data records while keeping the remaining data records of previous datasets that have not been removed or reconciled, thereby improving a reconciliation rate of the data records that are scattered between the different datasets as claimed.
Trunck teaches wherein the data records pertain to different datasets, and wherein the method comprises periodically repeating steps with additional datasets of data records while keeping the remaining data records of previous datasets that have not been removed or reconciled, thereby improving a reconciliation rate of the data records that are scattered between the different datasets (Fig. 9, par. 0087-0090, periodic synchronization for data records.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Tamayo-Rios with the teaching of Trunck because they are in the same field of endeavor. One of ordinary skill in the art would have been motivated to do so because the teaching of Trunck would allow Tamayo-Rios to reduce the complexity in identifying relationships in data (Trunck, par. 0002-0006).
As to claim 5, the rejection of claim 2 is hereby incorporated by reference. The combination of Tamayo-Rios and Trunck teaches the computer-implemented method according to claim 2, comprising automatically classifying the data records into a plurality of groups, based on values contained in at least some of the predetermined fields (Tamayo-Rios, par. 0028-0029, clustering based on similarity scores), and wherein steps b) to e) are performed for each group using a non-linear model (Tamayo-Rios, par. 0018, 0027, 0043). Tamayo-Rios does not explicitly teach a distinct trained model being associated with each group, for reducing computational requirements when comparing pairs of data records as claimed.
Trunck teaches a distinct trained model being associated with each group, for reducing computational requirements when comparing pairs of data records (par. 0007-0009, 0042, Fig. 1, 2, 7, 9, distinct machine learning models for input data).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Tamayo-Rios with the teaching of Trunck because they are in the same field of endeavor. One of ordinary skill in the art would have been motivated to do so because the teaching of Trunck would allow Tamayo-Rios to reduce the complexity in identifying relationships in data (Trunck, par. 0002-0006).
As to claim 6, the rejection of claim 5 is hereby incorporated by reference. The combination of Tamayo-Rios and Trunck teaches the computer-implemented method according to claim 5, comprising a step of adjusting a parameter of the clustering algorithm, for each of the groups, said parameter setting a threshold that determines whether or not a given data record is to be attributed to a given cluster (Trunck, par. 0051-0053, forward and reverse processes for setting a threshold for data records).
As to claim 9, the rejection of claim 5 is hereby incorporated by reference. The combination of Tamayo-Rios and Trunck teaches the computer-implemented method according to claim 5, wherein classifying the data records in a group is made by using a transaction type field or a transaction characteristic field of the data records (par. 0023-0028).
As to claim 23, Tamayo-Rios teaches the automated and dynamic system according to claim 22, further comprising: a grouping module for automatically classifying the data records into groups, based on values contained in at least some of the predetermined fields, for receiving as an input the comparison vectors of a group (par. 0018, 0027, 0028, 0043, the clustering algorithm moves data from sub-groups to clusters, while removing records whose similarity scores do not meet the threshold).
Trunck further teaches wherein the at least one trained non-linear similarity model comprises a plurality trained non-linear similarity models associated with each group (par. 0007-0009, 0042, Fig. 1, 2, 7, 9, plurality of machine learning models for input data).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Tamayo-Rios with the teaching of Trunck because they are in the same field of endeavor. One of ordinary skill in the art would have been motivated to do so because the teaching of Trunck would allow Tamayo-Rios to reduce the complexity in identifying relationships in data (Trunck, par. 0002-0006).
Claim(s) 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 20200364243 to Tamayo-Rios et al. (hereinafter “Tamayo-Rios”), U.S. Patent Application Publication No. 20190019083 to Trunck et al. (hereinafter “Trunck”), and further in view of publication “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise” to Ester et al. (hereinafter “Ester”), published 1996 by University of Munich.
As to claim 7, the combination of Tamayo-Rios and Trunck teaches the computer-implemented method according to claim 6. The combination of Tamayo-Rios and Trunck does not explicitly teach wherein the clustering algorithm is a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm as claimed.
Ester teaches wherein the clustering algorithm is a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm (Abstract, Section 4, 4.1, 4.2, DBSCAN: Density Based Spatial Clustering of Applications with Noise.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tamayo-Rios and Trunck with the teaching of Ester because they are in the same field of endeavor. One of ordinary skill in the art would have been motivated to do so because the teaching of Ester would allow the combination of Tamayo-Rios and Trunck to improve efficiency in data clustering (Ester, Abstract, Introduction, Performance Evaluation).
As to claim 8, the rejection of claim 7 is hereby incorporated by reference. The combination of Tamayo-Rios, Trunck and Ester teaches the computer-implemented method according to claim 7, wherein the parameter is an epsilon parameter, the method comprising a step of adjusting the epsilon parameter of the DBSCAN clustering algorithm, for each of the groups (Ester, Section 4.2, Determining the Parameters Eps and MinPts).
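For illustration only, the role of the Eps (epsilon) parameter discussed in Ester, Section 4.2, can be sketched with a minimal DBSCAN over a precomputed NxN distance matrix, such as one derived from the NxN similarity-score matrix of claim 15 as 1 minus each score. The distance conversion, the function name, and the sample values are assumptions made for illustration; calling the function with a different eps value for each group would correspond to the per-group adjustment recited in claim 8.

```python
def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN over a precomputed NxN distance matrix.

    Returns one label per record: a cluster id (0, 1, ...) or -1 for noise.
    """
    n = len(dist)
    labels = [None] * n  # None marks a record not yet visited

    def region(p):
        # Eps-neighborhood of p (includes p itself, since dist[p][p] == 0)
        return [q for q in range(n) if dist[p][q] <= eps]

    cluster_id = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = region(p)
        if len(neighbors) < min_pts:
            labels[p] = -1  # noise for now; may later become a border point
            continue
        cluster_id += 1  # p is a core point: start a new cluster
        labels[p] = cluster_id
        seeds = [q for q in neighbors if q != p]
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster_id  # former noise becomes a border point
                continue
            if labels[q] is not None:
                continue  # already assigned
            labels[q] = cluster_id
            q_neighbors = region(q)
            if len(q_neighbors) >= min_pts:
                seeds.extend(q_neighbors)  # q is also a core point: expand
    return labels


# Hypothetical similarity scores for three records, converted to distances.
sim = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.2], [0.1, 0.2, 1.0]]
dist = [[1.0 - s for s in row] for row in sim]
```

With these sample values, a tight eps of 0.1 places the first two records in one cluster and marks the third as noise, whereas a looser eps of 0.85 merges all three, which illustrates why the parameter may need tuning per group.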
Claim(s) 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 20200364243 to Tamayo-Rios et al. (hereinafter “Tamayo-Rios”), U.S. Patent Application Publication No. 20190019083 to Trunck et al. (hereinafter “Trunck”), and further in view of U.S. Patent Application Publication No. 20190340533 to Jack Copper (hereinafter “Copper”).
As to claim 10, the rejection of claim 5 is hereby incorporated by reference. The combination of Tamayo-Rios and Trunck teaches the computer-implemented method according to claim 5. The combination of Tamayo-Rios and Trunck does not explicitly teach comprising a step of estimating values of data records having unpopulated or missing fields, prior to classifying the records into groups, the estimated values being obtained by using a classifier model trained on data records in which fields are all populated as claimed.
Copper teaches estimating values of data records having unpopulated or missing fields, prior to classifying the records into groups, the estimated values being obtained by using a classifier model trained on data records in which fields are all populated (par. 0025-0029, 0034-0036, 0046-0047, 0051-0052, estimating values of data records having unpopulated or missing fields.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tamayo-Rios and Trunck with the teaching of Copper because they are in the same field of endeavor. One of ordinary skill in the art would have been motivated to do so because the teaching of Copper would allow the combination of Tamayo-Rios and Trunck to prepare data for a machine learning algorithm in a way that accounts for missing or invalid data and increases the ability of the model representing the algorithm to generate a more accurate and useful output when the model is placed in service (Copper, par. 0019).
As to claim 11, the rejection of claim 10 is hereby incorporated by reference. The combination of Tamayo-Rios, Trunck and Copper teaches the computer-implemented method according to claim 10, wherein the classifier model is a decision tree type classifier model or a neural network model (Copper, par. 0042, 0060-0066).
Claim(s) 12-15 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 20200364243 to Tamayo-Rios et al. (hereinafter “Tamayo-Rios”), U.S. Patent Application Publication No. 20190019083 to Trunck et al. (hereinafter “Trunck”), U.S. Patent Application Publication No. 20190340533 to Jack Copper (hereinafter “Copper”), and further in view of U.S. Patent Application Publication No. 20080215602 to Samson et al. (hereinafter “Samson”).
As to claim 12, the rejection of claim 11 is hereby incorporated by reference. The combination of Tamayo-Rios, Trunck and Copper teaches the computer-implemented method according to claim 11. The combination of Tamayo-Rios, Trunck and Copper does not explicitly teach wherein the values of the comparison vectors are generated using one or more comparison models, comprising true/false comparison models for categorical or entity values and difference comparison models or distance models for numeral values as claimed.
Samson teaches wherein the values of the comparison vectors are generated using one or more comparison models, comprising true/false comparison models for categorical or entity values and difference comparison models or distance models for numeral values (par. 0021, 0025, 0030, 0041-0042, 0045, 0050, binary comparison model and distance models.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tamayo-Rios, Trunck and Copper with the teaching of Samson because they are in the same field of endeavor. One of ordinary skill in the art would have been motivated to do so because the teaching of Samson would allow the combination of Tamayo-Rios, Trunck and Copper to overcome many difficulties in data fusion and correlation (Samson, par. 0002-0008).
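For illustration only, the comparison models recited in claims 12 and 13 can be sketched as follows: a true/false comparison for categorical or entity values, an absolute-difference comparison for numeral values, and a standardizing step that converts every result to a numerical value. The function names, field names, sample records, and the choice of absolute difference as the distance are hypothetical; the sketch is not taken from Samson or from Applicant's specification.

```python
def compare_field(a, b):
    """Hypothetical comparison models for one field of a record pair."""
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        return abs(a - b)  # difference/distance model for numeral values
    return a == b  # true/false model for categorical or entity values


def standardize(values):
    """Convert each comparison result to a float (True -> 1.0, False -> 0.0)."""
    return [float(v) for v in values]


def comparison_vector(r1, r2, fields):
    """One standardized comparison value per predetermined field."""
    return standardize([compare_field(r1[f], r2[f]) for f in fields])


r1 = {"currency": "CAD", "amount": 100.0, "receiver": "ACME"}
r2 = {"currency": "CAD", "amount": 97.5, "receiver": "Acme"}
print(comparison_vector(r1, r2, ["currency", "amount", "receiver"]))
# prints [1.0, 2.5, 0.0]
```

The exact-match model treats "ACME" and "Acme" as different entities; a string-distance model could be substituted for such fields, but that choice is likewise an assumption.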
As to claim 13, the rejection of claim 12 is hereby incorporated by reference. The combination of Tamayo-Rios, Trunck, Copper and Samson teaches the computer-implemented method according to claim 12, comprising a step of standardizing the values of the comparison vectors into numerical values, prior to inputting the comparison vectors into the trained non-linear similarity model (par. 0042-0048).
As to claim 14, the rejection of claim 13 is hereby incorporated by reference. The combination of Tamayo-Rios, Trunck, Copper and Samson teaches the computer-implemented method according to claim 13, wherein the trained non-linear similarity models comprise at least one of: a XGBoost machine learning algorithm, a Random Forest or a Neural Nets machine learning algorithm (par. 0018, Neural Nets).
As to claim 15, the rejection of claim 14 is hereby incorporated by reference. The combination of Tamayo-Rios, Trunck, Copper and Samson teaches the computer-implemented method according to claim 14, wherein the similarity scores outputted by the non-linear similarity model are comprised in an NxN matrix which is inputted into the clustering algorithm, wherein N corresponds to the number of data records in the group (par. 0042-0044).
Claim(s) 16 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 20200364243 to Tamayo-Rios et al. (hereinafter “Tamayo-Rios”), and further in view of U.S. Patent Application Publication No. 20080275916 to James Bohannon et al. (hereinafter “Bohannon”).
As to claim 16, Tamayo-Rios teaches the computer-implemented method according to claim 1. Tamayo-Rios does not explicitly teach wherein at least one of the predetermined fields of each data record comprises a monetary value, and wherein the sum of the monetary values of the at least one field of each data record in a cluster that is removed is below a predetermined threshold as claimed.
Bohannon teaches wherein at least one of the predetermined fields of each data record comprises a monetary value, and wherein the sum of the monetary values of the at least one field of each data record in a cluster that is removed is below a predetermined threshold (par. 0099-0102, the enterprise system may prevent the submitted activity report from being applied to the member database 550, such as “For example, a campaign manager may expect the purchase amount for a certain campaign job not to be below $20 because there may be no units of smaller value that may be sold as part of the given campaign. As discussed above, a proprietor may configure this expected purchase via proprietor interface 450 and by communicating with a screen 570 or a similar interface. Each record reporting an amount below $20 may then be considered to violate this particular expectation, which may be referred to as the "$20 or more" rule. Accordingly, a threshold value associated with the purchase amount parameter may correspond to the maximum number of candidate records within a single activity report that can violate the "$20 or more" rule without triggering a rejection of the activity report.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Tamayo-Rios with the teaching of Bohannon because they are in the same field of endeavor. One of ordinary skill in the art would have been motivated to do so because the teaching of Bohannon would allow Tamayo-Rios to improve the quality of data in a database (Bohannon, par. 0008-0011).
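For illustration only, the limitation of claim 16 can be sketched as a check applied to a cluster before its removal. The sketch assumes, hypothetically, that debit and credit entries carry opposite signs, that the sum is compared in absolute value, and that amounts are held as decimal strings; the field name `amount`, the function name, and the threshold value are likewise assumptions.

```python
from decimal import Decimal


def removable(cluster, field="amount", threshold=Decimal("0.01")):
    """Hypothetical check: True if the monetary values of the records in the
    cluster net (in absolute value) to less than the threshold."""
    total = sum((Decimal(r[field]) for r in cluster), Decimal("0"))
    return abs(total) < threshold


offsetting = [{"amount": "100.00"}, {"amount": "-100.00"}]
unbalanced = [{"amount": "100.00"}, {"amount": "-99.00"}]
```

Under these assumptions, the offsetting pair would qualify for removal, while the pair leaving a 1.00 residual would remain in the dataset. Decimal arithmetic is used so that currency amounts are summed without binary floating-point rounding.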
Claim(s) 17 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 20200364243 to Tamayo-Rios et al. (hereinafter “Tamayo-Rios”), and further in view of U.S. Patent Application Publication No. 20050273452 to Molloy et al. (hereinafter “Molloy”).
As to claim 17, Tamayo-Rios teaches the computer-implemented method according to claim 1. Tamayo-Rios does not explicitly teach wherein the predetermined fields of a data record comprise at least one of: a sender identification, a receiver identification, a date and time, a transit number, one or more types or characteristics of a transaction as claimed.
Molloy teaches wherein the predetermined fields of a data record comprise at least one of: a sender identification, a receiver identification, a date and time, a transit number, one or more types or characteristics of a transaction (par. 0041-0063.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Tamayo-Rios with the teaching of Molloy because they are in the same field of endeavor. One of ordinary skill in the art would have been motivated to do so because the teaching of Molloy would allow Tamayo-Rios to “match records in one database with records in another database so as to identify records which are intended to describe the same item, event, transaction or other instance of a particular phenomenon” (Molloy, par. 0002-0004).
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANHTAI V TRAN whose telephone number is (571)270-5129.  The examiner can normally be reached on Monday through Thursday from 8:00 AM to 4:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya can be reached on (571)272-4034.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANHTAI V TRAN/Primary Examiner, Art Unit 2168