DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-20 are pending, of which claims 1, 6, and 19 are in independent form.
Claims 1-20 are rejected under 35 U.S.C. 103.

Response to Arguments
Applicant's arguments filed 8/17/2021 have been fully considered but they are not persuasive. 

Applicant’s Argument:
Applicant argues, on pages 8-10 of the “Remarks,” that the prior art of record does not teach the amended features of claim 1, stating: “Applicant respectfully submits that the Cassidy, Chetia, and Copper references, neither in combination nor standing alone, disclose or suggest at least the features of "identifying a particular subset of the set of clusters having one or more logs to be categorized with a transaction type, based at least in part on one or more metrics of the set of clusters," as recited in independent claim 1 as presently amended.”

Examiner’s Response:
Examiner respectfully disagrees; the combination of Cassidy, Chetia and Copper clearly teaches identifying a particular subset of the set of clusters having one or more logs to be categorized with a transaction type (Chetia: In some non-limiting embodiments, a first data field of the plurality of data fields may include a categorical data field, and/or each transaction record may include categorical data), based at least in part on one or more metrics of the set of clusters (The present invention also allows for comprehensive, real-time status reporting such that all users involved in a particular process are aware of the status of the form ¶ [0028], [0403]) during a joint session (Copper: The data structure can be used in generating replacement values for fields with invalid values in data records intended for use by the primary model. In addition, tertiary clustering models can implement clustering algorithms that calculate similarity metrics in order for the algorithms to associate data records with nodes which identify groups of data records that are similar, in order to identify candidate nodes to effect generation of replacement values for fields when multiple fields in a subject data record contain invalid values ¶ [0094]-[0097]).

Examiner believes that the combination of the references above clearly teaches the limitations of independent claims 1, 6 and 19. Additionally, Examiner performed an additional search and found prior art that teaches the newly added amendments; however, the newly discovered prior art has not been used for this rejection and is cited merely for Applicant’s knowledge:
Agarwal; Nipun et al. (US 20190266635 A1), ¶ [0004], [0006], [0009], [0011], [0124], [0149].
Shao; Dongxu et al. (US 20200126144 A1), ¶ [0082].
Kumar; Debesh et al. (US 20190005502 A1), ¶ [0125], [0149], [0150].
Chigurupati; Vedavyas et al. (US 20160117702 A1), ¶ [0032].
Isaksson; Lennart et al. (US 20130254294 A1), [Abstract], ¶ [0010], [0012], [0038], [0065]. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cassidy; Hugh et al. (US 20170308557 A1) [Cassidy] in view of Chetia; Chiranjeet et al. (US 20200341954 A1) [Chetia], and further in view of Copper; Jack et al. (US 20190340533 A1) [Copper].

	Regarding claims 1, 6 and 19, Cassidy discloses, a computer-implemented method for processing transaction records, comprising:  [receiving transaction data]; generating, based on the transaction data, a cleaned data set (CDS) comprising logs for a plurality of transactions, [wherein each log is associated with text], at least a portion of which has been processed using natural language processing (NLP) (Method and system for cleansing and de-duplicating data in database are provided. The method includes filtering garbage records from a plurality of records based on data fields, and applying cleansing rules to create a cleansed database. A similarity vector is generated, where each vector corresponds to pairwise comparison of distinct data entries in cleansed database. Matching rules 
clustering at least a subset of the logs from the CDS [based at least in part on the associated text and a similarity threshold], wherein clustering results in a set of clusters (Further, records in each cluster are merged to obtain de-duplicated cleansed database using predefined consolidated rules [abstract]. The computer-implemented method then processes all the vectors labeled as match to create clusters of records that are duplicates of each other in the cleansed database. Further, the records in each cluster are merged to obtain a de-duplicated cleansed database using predefined consolidated rules ¶ [0006]-[0008], [0054]-[0055], [0062] and [0072]).
However, Cassidy does not explicitly disclose receiving transaction data; wherein each log is associated with text; based at least in part on the associated text and a similarity threshold; identifying a particular subset of the set of clusters having one or more logs to be categorized with a transaction type; receiving a user determination of a transaction type for a representative log for each cluster of the particular subset, wherein received transaction types are associated with each log of the respective cluster; generating a transaction report, the transaction report comprising a plurality of calculated parameters determined based at least in part on the logs and their associated transaction types.
Chetia discloses, receiving transaction data (Provided is a computer-implemented method for monitoring and improving data quality of transaction data that may include receiving transaction data associated with a plurality of payment transactions from an acquirer system. The transaction data may include a transaction record associated with each payment transaction of the plurality of payment transactions. Each transaction record may include a plurality of data fields [abstract]. Also see ¶ [0009] and [0012]-[0025]);
wherein each log is associated with text (Additionally or alternatively, whether the feature values associated with the textual data field satisfy at least one rule associated with the parsing layer of the NLP model may be determined. Additionally or alternatively, determining the data quality score may include determining the data quality score for the textual data field included in the transaction data based on determining whether the feature values associated with the textual data field satisfy the at least one rule associated with the parsing layer of the NLP model ¶ [0012], [0025], [0038], [0053], [0067], [0144]);
based at least in part on the associated text and a similarity threshold (Additionally or alternatively, categorizing may include categorizing the respective data field into the categorical type based on a statistical distribution of values in the data contained in the respective data field and a threshold of unique values ¶ [0011]. Also see ¶ [0052], [0172]);
identifying a particular subset of the set of clusters (As shown in FIG. 3, at step 304, process 300 may include categorizing fields of the data. For example, transaction service provider system 102 may categorize each respective data field of the transaction records into a respective type of a plurality of types ¶ [0136], ¶ [0139]. Also see ¶ [0142], [0149], [0153]-[0155] and [0178]) having one or more logs to be categorized with a transaction type (In some non-limiting embodiments, a first data field of the plurality of data fields may include a categorical data field, and/or each transaction record may include categorical data associated with the categorical data field. In some non-limiting embodiments, the categorical data associated with the categorical data field in each transaction record may be determined to match historical categorical data associated with the categorical data field in at least one historical transaction record. Additionally or alternatively, a percentage of transaction records for which the categorical data associated with the categorical data field therein matches the historical categorical data associated with the categorical data field in the at least one historical transaction record may be determined. In some non-limiting embodiments, determining the data quality score may include 
receiving a user determination of a transaction type for a representative log for each cluster of the particular subset, wherein received transaction types are associated with each log of the respective cluster (As shown in FIG. 3, at step 304, process 300 may include categorizing fields of the data. For example, transaction service provider system 102 may categorize each respective data field of the transaction records into a respective type of a plurality of types ¶ [0136]. If the respective data field is not categorized into the date type, transaction service provider system 102 may categorize the respective data field into the categorical type based on a statistical distribution of values in the data contained in the respective data field and a threshold of unique values. If the respective data field is not categorized into the date type or the categorical type, transaction service provider system 102 may categorize the respective data field into the identifier type based on a degree of uniqueness of the values in the data contained in the respective data field. If the respective data field is not categorized into the date type, the categorical type, or the identifier type, transaction service provider system 102 may categorize the respective data field into the textual type based on at least one of a plurality of regular expression functions; a number of combinations of punctuation, alphabetical characters, and/or digits of the data contained in the respective data field; any combination thereof; and/or the like. If the respective data field is not categorized into the date type, the categorical type, the identifier type, or the textual type, transaction service provider system 102 may categorize the respective data field into the numeric type if the data contained in the respective data field includes only digits and up to one decimal point ¶ [0139]. Also see ¶ [0142], [0149], [0153]-[0155] and [0178]);  
generating a transaction report, the transaction report comprising a plurality of calculated parameters determined based at least in part on the logs and their associated transaction types (As shown in FIG. 3, at step 308, process 300 may include reporting and/or using the data quality score. For 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Chetia’s system would have allowed Cassidy to facilitate receiving transaction data; wherein each log is associated with text; based at least in part on the associated text and a similarity threshold; identifying a particular subset of the set of clusters having one or more logs to be categorized with a transaction type; receiving a user determination of a transaction type for a representative log for each cluster of the particular subset, wherein received transaction types are associated with each log of the respective cluster; generating a transaction report, the transaction report comprising a plurality of calculated parameters determined based at least in part on the logs and their associated transaction types. The motivation to combine is apparent in the Cassidy reference, because there is a need for a system, method, and product for monitoring and improving data quality of transaction data.
However, neither Cassidy nor Chetia explicitly discloses training a prediction model using logs from the particular subset of clusters and the associated transaction types; determining, using the predictive model, transaction types for logs in the CDS not yet associated with a transaction type; based at least in part on one or more metrics of the set of clusters.
Copper discloses, training a prediction model using logs from the particular subset of clusters and the associated transaction types (Historical data used to train machine learning algorithms can have thousands of records with hundreds of fields, and inevitably includes faulty data that affects the 
determining, using the predictive model, transaction types for logs in the CDS not yet associated with a transaction type (The clean dataset is used to produce a secondary model machine learning algorithm trained to generate from plural complete data records a replacement value for a single invalid data value in a data record, and a tertiary model machine learning clustering algorithm trained to generate from plural complete data records replacement values for multiple invalid data values [abstract]. In one aspect the present invention improves known systems and methods for replacing instances of missing or invalid data in historical data to improve the accuracy and utility of a primary machine learning algorithm developed using supervised machine learning to predict or classify a phenomenon of interest, or a primary machine learning algorithm developed using unsupervised learning to identify clusters of interest, when the primary algorithm is applied to new data ¶ [0034], [0037]-[0039], [0045], [0061], [0062], [0095], [0096]); 
based at least in part on one or more metrics of the set of clusters (The present invention also allows for comprehensive, real-time status reporting such that all users involved in a particular process are aware of the status of the form ¶ [0028], [0403]) during a joint session (Copper: The data structure can be used in generating replacement values for fields with invalid values in data records intended for 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the cited references because Copper’s system would have allowed Cassidy and Chetia to facilitate training a prediction model using logs from the particular subset of clusters and the associated transaction types; determining, using the predictive model, transaction types for logs in the CDS not yet associated with a transaction type; based at least in part on one or more metrics of the set of clusters. The motivation to combine is apparent in the Cassidy and Chetia references, because there is a need for systems and methods for improving the integrity and quality of data used in training and applying machine learning algorithms to increase the utility and accuracy of computer implementations and executions of such algorithms.

Regarding claims 2, 7 and 20, the combination of Cassidy, Chetia and Copper discloses, identifying an additional subset of the set of clusters and determining, using a set of base mapping rules, transaction types to be associated with the logs of the additional subset of clusters, wherein training the prediction model further comprises training the model using the logs from the additional subset and the associated transaction types (Copper: FIG. 7 illustrates one embodiment for training a self-organizing map for use as a tertiary model for replacing multiple instances of missing/invalid data in a particular record ¶ [0027]. For example, a secondary model named "Credit_F002_M001" would be a model for field no. 2 (F002, value v2, FIG. 3) in a data record that is mapped to input no. 1 (M001, the 

Regarding claims 3 and 13, the combination of Cassidy, Chetia and Copper discloses, wherein identifying the particular subset of clusters comprises selecting clusters for the subset based at least in part on a  transaction value associated with each cluster (Chetia: If the respective data field is not categorized into the date type, transaction service provider system 102 may categorize the respective data field into the categorical type based on a statistical distribution of values in the data contained in the respective data field and a threshold of unique values. If the respective data field is not categorized into the date type or the categorical type, transaction service provider system 102 may categorize the respective data field into the identifier type based on a degree of uniqueness of the values in the data contained in the respective data field. If the respective data field is not categorized into the date type, the categorical type, or the identifier type, transaction service provider system 102 may categorize the respective data field into the textual type based on at least one of a plurality of regular expression functions; a number of combinations of punctuation, alphabetical characters, and/or digits of the data contained in the respective data field; any combination thereof; and/or the like ¶ [0139]. Also see ¶ [0172], [0173], [0191]).

Regarding claim 4, the combination of Cassidy, Chetia and Copper discloses, wherein identifying the particular subset of the set of clusters comprises selecting clusters based at least in part on a distance metric between respective clusters in the subset (Copper: the weight W1-3 is the third weight value in node no. 1. The degree of similarity of a particular record 520.sub.D in the auxiliary clean dataset 510 to a node's weights is calculated by a similarity metric such as the Euclidean distance between any two d-dimensional vectors X and Y in a d-dimensional space, defined as [(X.sub.1-

Regarding claims 5 and 9, the combination of Cassidy, Chetia and Copper discloses, wherein generating the CDS comprises removing out-of-scope or duplicate logs (Cassidy: Method and system for cleansing and de-duplicating data in database are provided. The method includes filtering garbage records from a plurality of records based on data fields, and applying cleansing rules to create a cleansed database. The method analyzes the vectors labeled as matched and unmatched to train a machine learning model to identify duplicates in the cleansed database. Further, records in each cluster are merged to obtain de-duplicated cleansed database using predefined consolidated rules [abstract]. Also see ¶ [0006]-[0008], [0012]-[0015], [0024]-[0029], [0032]).

Regarding claim 8, the combination of Cassidy, Chetia and Copper discloses, generating a cleaned data set (CDS) of logs from the transaction data, wherein generating the CDS comprises processing [at least a portion of the text associated with] the logs using natural language processing (NLP) (Cassidy: Method and system for cleansing and de-duplicating data in database are provided. The method includes filtering garbage records from a plurality of records based on data fields, and applying cleansing rules to create a cleansed database. A similarity vector is generated, where each vector corresponds to pairwise comparison of distinct data entries in cleansed database. Matching rules are applied to label each vector as one of matched, unmatched and unclassified. The method analyzes the vectors labeled as matched and unmatched to train a machine learning model to identify duplicates in 
at least a portion of the text associated with (Chetia: Additionally or alternatively, whether the feature values associated with the textual data field satisfy at least one rule associated with the parsing layer of the NLP model may be determined. Additionally or alternatively, determining the data quality score may include determining the data quality score for the textual data field included in the transaction data based on determining whether the feature values associated with the textual data field satisfy the at least one rule associated with the parsing layer of the NLP model ¶ [0012], [0025], [0038], [0053], [0067], [0144]);

Regarding claim 10, the combination of Cassidy, Chetia and Copper discloses, providing for display a representative log for a cluster of the particular subset prior to receiving the respective user determination of the transaction type for the cluster (Chetia: As shown in FIG. 3, at step 308, process 300 may include reporting and/or using the data quality score. For example, transaction service provider system 102 may report and/or use the data quality scores for each data field of the transaction data and/or the overall data quality score. In some non-limiting embodiments, transaction service provider system 102 may display a graphical user interface (e.g., GUI) including an indication of at least one of the data quality scores for each data field of the transaction data, the overall data quality score, any combination thereof, and/or the like ¶ [0159], [0160], [0166], [0167], [0176] ).

Regarding claim 11, the combination of Cassidy, Chetia and Copper discloses, providing, for display, two representative logs for a cluster of the particular subset (Chetia: see Fig. 4, element 402d. Also see ¶ [0137], [0159], [0160], [0176]);  
receiving different user determinations of the transaction types for the two representative logs (Chetia: As used herein, the terms "issuer," "issuer institution," "issuer bank," or "payment device issuer," may refer to one or more entities that provide accounts to individuals (e.g., users, customers, and/or the like) for conducting payment transactions such as credit payment transactions and/or debit payment transactions ¶ [0093], [0124], [0129]); and
splitting the cluster based at least in part on the different user determinations for the transaction types for the two representative logs (Chetia: categorizing may include categorizing each respective data field of the subset into the respective type of the plurality of types ¶ [0010], [0023], [0036], [0050], [0137] and [0178]).

Regarding claim 12, the combination of Cassidy, Chetia and Copper discloses, further comprising validating the prediction types provided by the trained prediction model using a Human-in-the-Loop method (Cassidy: The data cleansing and de-duplication system 108 includes a data diagnostic module 202, a cleansing module 204, a rules defining module 206, a vector generation module 208, a labeling module 210, a machine learning algorithm module 212, a human assisted checking module 214, a cluster creation module 216 and a merge record module 218 ¶ [0032]. In an embodiment, the machine learning algorithm module 212 classifies the remaining vectors and returns a confidence level with each label. The labeled vectors are then checked by users using the human assisted checking module 214 ¶ [0044]-[0045]).

Regarding claim 14, the combination of Cassidy, Chetia and Copper discloses, performing an automated quality and error analysis of the transaction types predicted by the prediction model; and  modifying at least one model parameter based on the quality and error analysis (Chetia: In some non-limiting embodiments, a regression may be performed on each tuple of the plurality of tuples to 

Regarding claim 15, the combination of Cassidy, Chetia and Copper discloses, performing an automated quality and error analysis of the transaction types predicted by the prediction model;  identifying one or more logs for manual tagging based on the quality and error analysis; and  receiving a user determination of a transaction type for each of the one or more logs (Chetia: In some non-limiting embodiments, a regression may be performed on each tuple of the plurality of tuples to provide an error value for each tuple of the plurality of tuples. Additionally or alternatively, the error value of at least one tuple of the plurality of tuples may be determined to satisfy a data quality threshold. In some non-limiting embodiments, at least one of a coefficient value, an intercept value, or any combination thereof may be stored based on the regression for the at least one tuple that satisfies the data quality threshold ¶ [0018], [0031], [0044], [0060]).

Regarding claim 16, the combination of Cassidy, Chetia and Copper discloses, further comprising training the prediction model using the one or more logs and the associated transaction types (Copper: Historical data used to train machine learning algorithms can have thousands of records with hundreds of fields, and inevitably includes faulty data that affects the accuracy and utility of a primary model machine learning algorithm. To improve dataset integrity it is segregated into a clean dataset having no invalid data values and a faulty dataset having the invalid data values. The clean dataset is used to produce a secondary model machine learning algorithm trained to generate from 

Regarding claim 17, the combination of Cassidy, Chetia and Copper discloses, wherein predicting transaction types for the additional logs results in logs being associated with more than 95% of the plurality of logs (Chetia: As shown in FIG. 3, at step 304, process 300 may include categorizing fields of the data. For example, transaction service provider system 102 may categorize each respective data field of the transaction records into a respective type of a plurality of types ¶ [0136]. If the respective data field is not categorized into the date type, transaction service provider system 102 may categorize the respective data field into the categorical type based on a statistical distribution of values in the data contained in the respective data field and a threshold of unique values. If the respective data field is not categorized into the date type or the categorical type, transaction service provider system 102 may categorize the respective data field into the identifier type based on a degree of uniqueness of the values in the data contained in the respective data field. If the respective data field is not categorized into the date type, the categorical type, or the identifier type, transaction service provider system 102 may categorize the respective data field into the textual type based on at least one of a plurality of regular expression functions; a number of combinations of punctuation, alphabetical characters, and/or digits of the data contained in the respective data field; any combination thereof; and/or the like. If the respective data field is not categorized into the date type, the categorical type, the identifier type, or the textual type, transaction service provider system 102 may categorize the 

Regarding claim 18, the combination of Cassidy, Chetia and Copper discloses, further comprising using the trained model to predict transaction types for logs from another set of transaction data (Copper: The clean dataset is used to produce a secondary model machine learning algorithm trained to generate from plural complete data records a replacement value for a single invalid data value in a data record, and a tertiary model machine learning clustering algorithm trained to generate from plural complete data records replacement values for multiple invalid data values [abstract]. In one aspect the present invention improves known systems and methods for replacing instances of missing or invalid data in historical data to improve the accuracy and utility of a primary machine learning algorithm developed using supervised machine learning to predict or classify a phenomenon of interest, or a primary machine learning algorithm developed using unsupervised learning to identify clusters of interest, when the primary algorithm is applied to new data ¶ [0034], [0037]-[0039], [0045], [0061], [0062], [0095], [0096]).

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD S ROSTAMI, whose telephone number is (571)270-1980. The examiner can normally be reached Mon-Fri from 9 a.m. to 5 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain T Alam, can be reached at (571)272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





11/17/2021
/MOHAMMAD S ROSTAMI/
Primary Examiner, Art Unit 2154