Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance: 
Bergeron (USPAP. 20200210442) discloses a system for processing data. During operation, the system extracts text windows of varying length from text in one or more content items associated with an entity. Next, the system applies a machine learning model to features for the text windows to produce scores representing the likelihoods that the text windows contain addresses. The system then identifies, based on the scores and validation rules applied to the text windows, one of the text windows as an address for the entity. Finally, the system stores the selected text window as the address for the entity (Abstract; Pars. 33-46).
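The window extraction and scoring summarized above for Bergeron can be pictured with a minimal sketch. The function names, the whitespace tokenization, the window sizes, and the caller-supplied `score` function (standing in for the machine learning model) are assumptions for illustration and are not taken from the reference.

```python
def text_windows(text, min_len=2, max_len=4):
    """Return every run of min_len..max_len consecutive tokens in text."""
    tokens = text.split()  # assumption: whitespace tokenization
    windows = []
    for size in range(min_len, max_len + 1):
        for start in range(len(tokens) - size + 1):
            windows.append(" ".join(tokens[start:start + size]))
    return windows


def best_window(text, score, min_len=2, max_len=4):
    """Pick the highest-scoring window; score is a hypothetical stand-in
    for the model that rates how likely a window is to hold an address."""
    return max(text_windows(text, min_len, max_len), key=score, default=None)
```

A validation rule, as described in the reference, could then be applied to the selected window before it is stored.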
Zhuang et al. (USPN. 7707129) discloses improvements to the support vector machine (SVM) classification model. When text data is significantly unbalanced (i.e., positive and negative labeled data are in disproportion), the classification quality of standard SVM deteriorates. Embodiments of the invention are directed to a weighted proximal SVM (WPSVM) model that achieves substantially the same accuracy as the traditional SVM model while requiring significantly less computational time. A weighted proximal SVM (WPSVM) model in accordance with embodiments of the invention may include a weight for each training error and a method for estimating the weights, which automatically solves the unbalanced data problem. And, instead of solving the optimization problem via the KKT (Karush-Kuhn-Tucker) conditions and the Sherman-Morrison-Woodbury formula, embodiments of the invention use an iterative algorithm to solve an unconstrained optimization problem, which makes WPSVM suitable for classifying relatively high dimensional data (Abstract; cols. 3-5 and Figs. 2-4). 
Sathyanarayana (USPAP. 20120102002) discloses systems and methods for data validation and correction. Such systems and methods can reduce costs, improve productivity, improve scalability, improve data quality, improve accuracy, and enhance data security. A data manager can execute such data validation and correction. The data manager identifies one or more anomalies from a given data set using both contextual information and validation rules, and then automatically corrects any identified anomalies or missing information. Identification of anomalies includes generating similar data elements, and correlating against contextual information and validation rules (Abstract; Pars. 34-39).

However, regarding claim 1, the closest prior art of record either alone or in combination fails to anticipate or render obvious the combination wherein "determining a respective data vector category for each respective data vector, wherein a data vector category is based on the feature data included in each respective data vector; assigning each respective data vector to the respective data vector category determined for the respective data vector; evaluating the dataset based on each respective data vector category; in response to a result of the evaluation of the dataset, determining whether the dataset satisfies a data quality metric; in response to the data set failing to satisfy the data quality metric, generating an alarm; and reporting how the dataset failed to satisfy the data quality metric and provide a link to the dataset that caused the alarm to be generated" in combination with other limitations in the claims as defined by Applicant.
Claims 2-9 depend from allowed claim 1 and therefore are also allowed.
Regarding claim 10, the closest prior art of record either alone or in combination fails to anticipate or render obvious the combination wherein "generate a data vector for each respective text string of the plurality of text strings, wherein the generated data vector has a preset data length and includes feature data indicating features of the respective text string; determine a respective data vector category for each respective data vector, wherein a category is based on the feature data included in each respective data vector; evaluate the dataset based on each respective data vector category; in response to evaluating the dataset, determine whether the dataset satisfies a data quality metric; in response to the data set failing to satisfy the data quality metric, generate an alarm; and report how the dataset failed to satisfy the data quality metric and provide a link to the dataset that caused the alarm to be generated" in combination with other limitations in the claims as defined by Applicant.
Claims 11-14 depend from allowed claim 10 and therefore are also allowed.
Regarding claim 15, the closest prior art of record either alone or in combination fails to anticipate or render obvious the combination wherein "capture attributes of the respective variable length character string based on the plurality of computed features; populate a data vector with the captured attributes, wherein the data vector has a predetermined length and includes one or more of the captured attributes of the respective variable length character string; assign a category to each respective data vector using a machine learning algorithm; based on the category assigned to each respective data vector, evaluate the dataset; and in response to evaluating the dataset based on the category assigned to each respective data vector, determine whether the dataset satisfies a data quality metric" in combination with other limitations in the claims as defined by Applicant.
Claims 16-20 depend from allowed claim 15 and therefore are also allowed.
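The sequence of limitations quoted from claim 15 can be pictured with a minimal sketch. The four features, the rule-based `assign_category` function (a stand-in for the recited machine learning algorithm), the expected category, and the 0.9 threshold are all assumptions for illustration and are not drawn from the application.

```python
from collections import Counter


def to_vector(text):
    """Populate a fixed-length (4-element) vector with attributes of a
    variable-length character string. The feature choice is hypothetical."""
    return [
        len(text),
        sum(ch.isdigit() for ch in text),
        sum(ch.isalpha() for ch in text),
        sum(ch.isspace() for ch in text),
    ]


def assign_category(vector):
    """Rule-based stand-in for the machine learning algorithm (assumption)."""
    length, digits, letters, spaces = vector
    if length == 0:
        return "empty"
    if digits == length:
        return "numeric"
    if letters + spaces == length:
        return "alphabetic"
    return "mixed"


def evaluate_dataset(strings, expected="numeric", threshold=0.9):
    """Evaluate the dataset by category and test a hypothetical data
    quality metric: the share of strings in the expected category."""
    counts = Counter(assign_category(to_vector(s)) for s in strings)
    share = counts[expected] / len(strings) if strings else 0.0
    return {"counts": dict(counts), "share": share, "satisfied": share >= threshold}
```

Under these assumptions, a column of four entries in which one entry is alphabetic yields a share of 0.75 and fails the metric, which is the point at which the alarm and report recited in claims 1 and 10 would be generated.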
Conclusion

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUONG HUYNH whose telephone number is (571)272-2718. The examiner can normally be reached M-F: 9:00AM-5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew M Schechter, can be reached at 571-272-2302. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PHUONG HUYNH/
Primary Examiner, Art Unit 2857
June 7, 2022