DETAILED ACTION
This is in response to the application filed on 08/27/2019, in which claims 1-80 are presented for examination; of which claims 1, 21, 41, and 61 are in independent form.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 08/27/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 21, 41, and 61 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 6, 12, 18, and 24, respectively, of copending Application No. 17/021,770 (reference application). 
As shown in the following table, claims 6, 12, 18, and 24 of copending Application No. 17/021,770 (as amended on 05/24/2022) anticipate and fully disclose all the limitations of claims 1, 21, 41, and 61 of the instant application.

Application No. 17/021,770 (as amended on 05/24/2022):
1. (Currently Amended) A method for consistently preparing data for a machine learning (ML) system, comprising: receiving a tabular training data set, the training data set including a set of one or more source columns; identifying column labels from the tabular training data set, the column labels associated with a received column of data points from the set of source columns; determining, for an identified column label, a root category based on at least one of a user specification, data types, or distribution properties associated with the data points in the received column from the set of source columns; performing one or more data transformations for data points in the received column, the one or more data transformations for extracting a grammatical structure shared between entries of a categoric feature set to obtain a transformed data set; recording column categories determined for each identified column label and properties of the data transformations performed for each source column in a metadata database; outputting the metadata database and transformed training data set, wherein the transformed training data set is for training a ML system, and wherein the metadata database is output for use by a user for additional data sets; receiving a tabular additional data set and the metadata database; performing the one or more data transformations for data points in corresponding additional columns of the tabular additional data set using the recorded column categories and properties of the data transformations from the metadata database to obtain a transformed additional data set; and outputting the transformed additional data set for use with the ML system.
6. (Original) The method of claim 1, wherein the data transformations for extracting the grammatical structure comprise: performing one or more data transformations for data points in a received column in an order based on defined primitives of a transformation tree to obtain a transformed data set, the transformation tree including defined primitive category entries associated with each root category, wherein the defined primitives associated with the received column are based on a root category associated with the received column, wherein the defined primitive category entries for the root category are associated with a defined transformation function set.

Instant application:
1. (Currently Amended) A method for consistently preparing data for a machine learning (ML) system, comprising: receiving a tabular training data set, the training data set including a set of one or more source columns; identifying column labels from the training data set, the column labels associated with a source column of data points; determining, for each identified column label, a root category based on at least one of a user specification, data types, or distribution properties associated with the data points in each column from the set of source columns; performing one or more data transformations for data points in each column in an order based on defined primitives of a transformation tree to obtain a transformed data set, the transformation tree including defined primitive category entries associated with each root category, wherein the defined primitives associated with the source column are based on a root category associated with the source column, wherein the defined primitive category entries for the root category are associated with a defined transformation function set; recording the column categories determined for each identified column label and properties of the data transformations performed for each source column in a metadata database; outputting the metadata database and transformed training data set for training a ML system; receiving a tabular additional data set and the metadata database; performing the one or more data transformations for data points in corresponding additional columns of the tabular additional data set using the recorded column categories and properties of the data transformations from the metadata database to obtain a transformed additional data set; and outputting the transformed additional data set for use with the ML system.

Claims 12, 18, and 24 of Application No. 17/021,770 correspond to claims 21, 41, and 61 of the instant application, respectively.


This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8, 15-17, 20-28, 35-37, 40-48, 55-57, 60-68, 75-77, and 80 are rejected under 35 U.S.C. 103 as being unpatentable over MCFALL et al., US 2020/0327252 A1 (hereinafter, MCFALL) in view of Stojanovic et al. U.S. 2016/0092475 A1 (hereinafter, Stojanovic). 
Regarding claim 1, 
MCFALL discloses a method for consistently preparing data for a machine learning (ML) system, comprising: 
receiving a tabular training data set (MCFALL [0511], [0524], [0626]-[0633], [0844]-[0846], e.g. training dataset, tabular datasets), the training data set including a set of one or more source columns; identifying column labels (MCFALL [0511], [0524], [0626]-[0633], e.g. labels, patterns) from the tabular training data set, the column labels associated with a source column of data points (MCFALL [0511], [0524], [0626]-[0633], e.g. [0511] If several columns are marked as interesting, Publisher will concatenate the values of all those columns for each row and treat the resulting value as the interesting value for this record. Each combination of values, for instance pairs of gender and eye colour, will be treated as a different label, i.e. (female; blue), (female; green), (male; blue) and (male; green). [0524] If the sensitive column chosen by the user contains categorical labels then each cluster in the output data will contain at least 1 distinct class labels.); 
determining, for each identified column label, a root category (MCFALL [0468], [0471], [0482], [0557], e.g. root category/node) based on at least one of a user specification, data types, or distribution properties associated with the data points in each column from the set of source columns (MCFALL [0468], [0471], [0482], [0557], e.g. [0468] Publisher can generalise nominal columns by supplying a generalisation hierarchy and instructing the system to generalise to a level (measured in distance from the root node) within the hierarchy. [0471] In Publisher, automatic generalisation is implemented as a "top-down" algorithm, meaning that every quasi-identifying attribute starts as fully generalised, and then gets repeatedly specialized (made more specific). Fully generalised means the most general possible--for numerical columns, this is the full range of the variable (e.g. "0-100") while for categorical columns, this is the root node of the generalisation hierarchy. 5.2.4 Splitting Options [0482] A hierarchical category always splits into its child node categories (in the example above, the root category will always be split into a "Vegetable" category and a "Fruit" category). Note that the number of records that fall in each child category can therefore be unbalanced (e.g. there may be 80 "Potato" or "Carrot" records but only 20 "Apple" or "Orange" records). [0557] Information loss on generalised categorical columns as the average "generalisation height" across data values. Generalisation height is the number of levels up the hierarchy that the value ended up, normalized by the total distance between the leaf node and the root node. For instance, if a value "January" has a parent "Winter" which has a parent "Any", the root node, and it is generalised to "Winter", then this is a 50% generalisation height); 
performing one or more data transformations for data points in each column in an order based on defined primitives of a transformation tree to obtain a transformed data set (MCFALL [0369], e.g. [0369] Rules may also reference other rules. The system builds up a graph of all of the operations that are to be performed on the rows so that it can apply them in the correct order), the transformation tree including defined primitive category entries associated with each root category (MCFALL [0468], [0471], [0482], [0557], e.g. root category/node), wherein the defined primitives associated with the source column are based on a root category associated with the source column, wherein the defined primitive category entries for the root category are associated with a defined transformation function set (MCFALL [0504]-[0505], e.g. [0504] After repartitioning, the data for each node has been moved to its own partition, so we can now run exactly the same top-down specialisation "locally"--that is, the top down operations can proceed on the data locally in one of the executors, with all the data for the partition held in local memory. This is much faster than the distributed splitting. The amount of distributed splitting required to reach the "repartition point" depends on the size of the input data and the number of partitions. [0505] FIG. 26 shows an example with a diagram illustrating the top down decision tree approach. A tree structure of nodes is built wherein each node may hold a list of rows and a value for each quasi-identifying column. The first node (n.sub.1) at the top represents the data that is generalised the most and hence has the highest privacy level); 
recording column categories determined for each identified column label and properties of the data transformations performed for each source column in a metadata database (MCFALL [0106], [0434], [0448], [0451], [0510], [0525], [0977], e.g. 3.6 Configuration Database [0106] Lens uses a relational database (e.g., PostgreSQL) to store configuration information, the audit log, and metadata about the loaded datasets. As configuration information, Lens stores the permitted users, metadata about each user, privacy parameters for each user, as well as the access control information for each (user, column) pair where the user has access to the column. As audit log, Lens stores every query that is asked, as well as the results and the privacy budget spent (if applicable). Lens also stores any alerts that have been triggered. As metadata about the loaded datasets, Lens stores the names of all the tables and columns, as well as the types of columns, and certain other metadata such as the column ranges and the options for categorical columns. Lens captures this dataset metadata when the dataset is uploaded. [0434] This section describes the generalisation functionalities of Publisher. Generalisation is the process of replacing values with less specific values. For categorical columns, less specific values means broader categories: for instance, "Smartphone" is less specific than "iPhone" or "Blackberry". For numerical columns, less specific values means wider intervals: for instance, "10-20" is less specific than "15-20", and "15-20" is less specific than "18". Publisher supports generalising certain columns); 
outputting the metadata database and transformed training data set for training a ML system (MCFALL [1068]-[1084], e.g. [1068] D.14 Information may be assembled into rich input to a rules engine or a machine learning classifier. [1069] Machine learning or a rules engine applied to sensitive columns: the system uses machine learning approaches or a rules engine to build a model that can output to a user a score for an unknown column being sensitive or non-sensitive. [1070] Machine learning or a rules engine applied to identifying and quasi-identifying columns: the system uses machine learning approaches or a rules engine to build a model that can output to a user a score for an unknown column being identifying or non-identifying or quasi-identifying or non-quasi-identifying. [1071] The following features may be used in either machine learning process: Any information that could indicate sensitivity or that data in a column is identifying or quasi-identifying, for example: [1072] The number of distinct values [1073] The mean, median, mode, min, max and variance of the numeric values [1074] The type of the value (decimal, integer, string, date) [1075] The column name [1076] Length of column name [1077] The n-grams of the column name (where underscores are considered as breaks between words) [1078] Entropy of the value set [1079] Metadata [1080] Policies [1081] jobs [1082] Data lineage [1083] Join all of the above [1084] Label); 
receiving a tabular additional data set and the metadata database; performing the one or more data transformations for data points in corresponding additional columns of the tabular additional data set using the recorded column categories and properties of the data transformations from the metadata database to obtain a transformed additional data set (MCFALL [0565], [1053], [1068]-[1084], e.g. [0565] This section describes Publisher's features for automatically detecting sensitive, quasi-identifying, or identifying columns. These features allow the program to assist the user in properly configuring the anonymisation of input datasets and, additionally, in identifying new datasets to anonymise. Publisher takes several approaches to detecting sensitive, quasi-identifying, or identifying columns including using metadata, measuring correlation with known columns, and using machine learning. [1053] D.6 Identification of Primary Identifiers: when assessing whether a column is potentially identifying, the system implements one or more of the following techniques: measures the cardinality of columns; analyses column names against a list of names associated with personal identifiers; takes values from previously known sources of identifiers and finds similarity between those sources and the new data in question; uses a set of patterns representing common formats to identify the presence of standard identifier types; scans unstructured columns (for example, log files, chat/email messages, call transcriptions or contracts) for substrings that are equal to values in other columns marked as identifying; and the system compares any of these metrics with a threshold to determine whether or not to inform the user that the new column is potentially identifying. [1068] D.14 Information may be assembled into rich input to a rule engine or a machine learning classifier. 
[1069] Machine learning or a rules engine applied to sensitive columns: the system uses machine learning approaches or a rules engine to build a model that can output to a user a score for an unknown column being sensitive or nonsensitive. [1070] Machine learning or a rules engine applied to identifying and quasi-identifying columns: the system uses machine learning approaches or a rules engine to build a model that can output to a user a score for an unknown column being identifying or non-identifying or quasi-identifying or non-quasi-identifying); and outputting the transformed additional data set for use with the ML system (MCFALL [0565], [1053], [1068]-[1084]). 
Although MCFALL substantially teaches the above limitations, MCFALL does not explicitly teach using the recorded column categories and properties of the data transformations from the metadata database. 
On the other hand, Stojanovic teaches the limitations by stating receiving a tabular additional data set and the metadata database; performing the one or more data transformations for data points in corresponding additional columns of the tabular additional data set using the recorded column categories and properties of the data transformations from the metadata database to obtain a transformed additional data set (Stojanovic [0073], [0085]-[0091], [0114], [0125]-[0133], e.g. [0073] As discussed above, profile engine 326 can analyze data from a data source to determine whether any patterns exist, and if so, whether a pattern can be classified. Once data obtained from a data source is normalized, the data may be parsed to identify one or more attributes or fields in the structure of the data. Patterns may be identified using a collection of regular expressions, each having a label ("tag") and being defined by a category. The data may be compared to different types of patterns to identify a pattern. [0085] Column-specific statistics may include populated rows (e.g., K-most frequent, K-least frequent unique values, unique patterns, and (where applicable) types), frequency distributions, text metrics (e.g., minimum, maximum, mean values of: text length, token count, punctuation, pattern-based tokens, and various useful derived text properties), token metrics, data type and subtype, statistical analysis of numeric columns, L-most/least probable simple or compound terms or n-grams found in columns with mostly unstructured data, and reference knowledge categories matched by this naive lexicon, date/time pattern discovery and formatting, reference data matches, and imputed column heading label. 
[0086] The resulting profile can be used to classify content for subsequent analyses, to suggest, directly or indirectly, transformations of the data, to identify relationships among data sources, and to validate newly acquired data before applying a set of transformations designed based on the profile of previously acquired data. [0088] In some embodiments, the recommendation engine 308 can generate transform recommendations based on the matched patterns received from the knowledge service 310. For example, for the data including social security numbers, the recommendation engine can recommend a transform that obfuscates the entries (e.g., truncating, randomizing, or deleting, all or a portion of the entries). Other examples of transformation may include, reformatting data (e.g., reformatting a date in data), renaming data, enriching data (e.g., inserting values or associating categories with data), searching and replacing data (e.g., correcting spelling of data), change case of letter (e.g., changing a case from upper to lower case), and filter based on black list or white list terms. [0114] In some embodiments, data enrichment service 302 can recommend additional columns of data to be added to a data source. As shown in FIG. 4D, continuing with the city example, transforms 418 have been accepted to enrich the data with new columns including city population, and city location detail including longitude and latitude. When selected, the user's data set is enriched to include this additional information 420. The data set now includes information that was not previously available to the user in a comprehensive and automated fashion. The user's data set can now be used to produce a nationwide map of locations and population zones associated with other data in the dataset (for example, this may be associated with a company's web site transactions). 
[0131] In at least one example, entity extraction engine 704 can identify entity information (e.g., address information stored in the address column) and identify data related to entity information using, e.g., knowledge service 340. As shown in FIG. 7, the transform engine 322 can join new columns Zip Code and Population to data set 602 when forming enriched data set 708. The enriched data set 708 can then be passed to publish engine 324 to be pushed to one or more data targets 330. [0133] In some embodiments, metadata 802 may include additional data determined by enriching data ingested by data enrichment service 302. For example, metadata 802 may be displayed in GUI 800 based on performing the process described with reference to FIG. 11. In the example shown in FIG. 8A, metadata 802 may display columns 804 (or suggested names) that represents a category (e.g., a classification) for an attribute of data for each entity in the data ingested by data enrichment service 302. Metadata 802 in each row displayed in GUI 800 may correspond to a different entity in the ingested data. The category represented by column 804 may represent a discovered type or classification of an attribute of an entity corresponding to each row of data in the ingested data). 
Therefore, it would have been obvious to one of ordinary skill in the art before the time the invention was effectively filed to modify the teachings of MCFALL with Stojanovic’s teaching in order to implement the above function with a reasonable expectation of success. The motivation for doing so would have been to enable related data to be analyzed for classification, which can be used to enrich the data (Stojanovic [0014]).
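For illustration only, the limitation at issue in claim 1 (transforming an additional data set using the column categories and transformation properties recorded for the training data set) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the instant application, MCFALL, or Stojanovic: the function names, the two root categories ("numeric" and "categoric"), the choice of z-score scaling and one-hot encoding, and the use of a plain dictionary as the "metadata database" are all hypothetical.

```python
from statistics import mean, pstdev


def infer_root_category(values):
    """Assumed rule: pick a root category from the data type of the values."""
    if all(isinstance(v, (int, float)) for v in values):
        return "numeric"
    return "categoric"


def prepare_training(table):
    """table maps column label -> list of values.

    Returns the transformed table and a metadata dict recording, per source
    column, its category and the properties of the transformation applied.
    """
    transformed, metadata = {}, {}
    for label, values in table.items():
        category = infer_root_category(values)
        if category == "numeric":
            mu = mean(values)
            sigma = pstdev(values) or 1.0  # avoid dividing by zero
            transformed[label] = [(v - mu) / sigma for v in values]
            metadata[label] = {"category": category, "mean": mu, "std": sigma}
        else:
            levels = sorted(set(values))
            for level in levels:
                transformed[f"{label}_{level}"] = [int(v == level) for v in values]
            metadata[label] = {"category": category, "levels": levels}
    return transformed, metadata


def prepare_additional(table, metadata):
    """Transform an additional data set using only the recorded metadata."""
    transformed = {}
    for label, props in metadata.items():
        values = table[label]
        if props["category"] == "numeric":
            transformed[label] = [(v - props["mean"]) / props["std"] for v in values]
        else:
            # Values unseen during training map to all zeros.
            for level in props["levels"]:
                transformed[f"{label}_{level}"] = [int(v == level) for v in values]
    return transformed


train = {"age": [20, 30, 40], "color": ["red", "blue", "red"]}
train_out, meta = prepare_training(train)
extra_out = prepare_additional({"age": [30], "color": ["green"]}, meta)
```

In this sketch the additional data set is never inspected to choose a transformation; every parameter comes from the metadata recorded during training, which is the consistency property the claim language describes.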
Regarding claim 2,
the combination of MCFALL and Stojanovic discloses determining, for one or more rows for each source column, where infill is needed; determining a type of infill for each source column based in part on the column categories; and filling the determined rows where infill is needed based on the determined type of infill (MCFALL [0094]-[0095], [0369]-[0370], [0472], [0511], [0524], [0626]-[0633], and [0844]-[0846]).
Regarding claim 3,
the combination of MCFALL and Stojanovic discloses wherein the type of infill is determined based on information received from a user (MCFALL [0094]-[0095], [0369]-[0370], [0472], [0511], [0524], [0626]-[0633], and [0844]-[0846]).
Regarding claim 4,
the combination of MCFALL and Stojanovic  discloses wherein the determined type of infill for the column is a ML infill and further comprising: partitioning a training data set into at least two subsets based on a target column (MCFALL [0283], [0393], [0407]-[0409], and [0633]-[0652], partitioning data); determining a predictive model based on the column categories for the target column (MCFALL [0090] and [0691], predictive model); training the predictive model based on the partitioned subsets; predicting a set of data points for infilling (MCFALL [0090], [0283], [0393], [0407]-[0409], and [0633]-[0652]); inserting the set of data points for infilling in the determined rows where infill is needed; and recording the trained infill predictive model into the metadata database (MCFALL [0090], [0283], [0393], [0407]-[0409], and [0633]-[0652]).  
Regarding claim 5,
the combination of MCFALL and Stojanovic discloses wherein the infill type is recorded into the metadata database and further comprising: determining, for one or more rows for each column of the additional data set, where infill is needed; and filling the determined rows where infill is needed based on the determined type of infill recorded in the metadata database (MCFALL  [0094]-[0095], [0283], [0369]-[0370], [0393], [0472], [0511], [0524], [0626]-[0652], and [0844] - [0846]).  
Regarding claim 6,
the combination of MCFALL and Stojanovic discloses partitioning the additional data set into target column specific or target set of columns specific subsets for use as features to generate infill predictions from the trained infill predictive model recorded in the metadata database; predicting a set of data points for infilling; and filling the determined rows where infill is needed (MCFALL [0283], [0393], [0407]-[0409], and [0633]-[0652]).  
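For illustration only, the ML infill recited in claims 4 through 6 (partition the rows by whether the target column is populated, train a predictive model on the populated rows, fill the missing rows, record the trained model in the metadata, and reuse it on the additional data set) can be sketched as follows. This is a minimal sketch under stated assumptions: the single-feature least-squares line stands in for whatever predictive model is actually used, and the function names and dictionary-based metadata are hypothetical rather than drawn from the application or the cited references. Rows are modified in place.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; a stand-in predictive model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    return model["slope"] * x + model["intercept"]


def ml_infill_train(rows, feature, target, metadata):
    """Partition on the target column, train, fill, and record the model."""
    known = [r for r in rows if r[target] is not None]
    missing = [r for r in rows if r[target] is None]
    model = fit_line([r[feature] for r in known], [r[target] for r in known])
    metadata[target] = {"infill": "ml", "feature": feature, "model": model}
    for r in missing:
        r[target] = predict(model, r[feature])
    return rows


def ml_infill_apply(rows, target, metadata):
    """Fill missing target values using the model recorded in the metadata."""
    entry = metadata[target]
    for r in rows:
        if r[target] is None:
            r[target] = predict(entry["model"], r[entry["feature"]])
    return rows


meta = {}
train_rows = [{"x": 1, "y": 2.0}, {"x": 2, "y": 4.0}, {"x": 3, "y": None}]
ml_infill_train(train_rows, feature="x", target="y", metadata=meta)
extra_rows = ml_infill_apply([{"x": 5, "y": None}], target="y", metadata=meta)
```

No model is trained on the additional data set in this sketch; `ml_infill_apply` only reads the model stored during training.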
Regarding claim 7,
the combination of MCFALL and Stojanovic discloses wherein the root category associated with the source column is further based on a user provided indication of a category for the source column (MCFALL [0468], [0471], [0482], and [0557]).  
Regarding claim 8,
the combination of MCFALL and Stojanovic discloses wherein the order of data transformations associated with category entries of a first defined primitive and a second defined primitive of the transformation tree is derived based at least in part on a defined relationship between the first defined primitive and the second defined primitive (MCFALL [0369], [0468], [0471], [0482], and [0557]).  
Regarding claim 15,
the combination of MCFALL and Stojanovic discloses partitioning a validation data set from the training data set and performing the one or more data transformations for data points in corresponding columns of the validation data set using the recorded column categories and properties of the data transformations from the metadata database (MCFALL [0283], [0393], [0407]-[0409], [0434], [0448], [0451], [0504]-[0505], [0510], [0525], [0633]-[0652], and  [0977], partition data and transforming data).  
Regarding claim 16,
the combination of MCFALL and Stojanovic discloses wherein one or more category entries to the primitives of the transformation tree are defined based on information received from a user (MCFALL [0106], [0369], [0434], [0448], [0451], [0504]-[0505], [0510], [0525], and [0977]).
Regarding claim 17,
the combination of MCFALL and Stojanovic discloses wherein at least one transformation function associated with a category entry to a primitive of the transformation tree is defined based on information received from a user (MCFALL [0106], [0369], [0434], [0448], [0451], [0504]-[0505], [0510], [0525], and [0977]).
Regarding claim 20,
the combination of MCFALL and Stojanovic discloses identifying a source column not associated with a label; assigning the source column a label based on an order of columns associated with the training data set (MCFALL [0511], [0524], and [0633]).
Regarding claims 21-28, 35-37, and 40,
the scopes of the claims are substantially the same as claims 1-8, 15-17, and 20 respectively, and are rejected on the same basis as set forth for the rejections of claims 1-8, 15-17, and 20, respectively.
Regarding claims 41-48, 55-57, and 60,
the scopes of the claims are substantially the same as claims 1-8, 15-17, and 20 respectively, and are rejected on the same basis as set forth for the rejections of claims 1-8, 15-17, and 20, respectively.
Regarding claims 61-68, 75-77, and 80,
the scopes of the claims are substantially the same as claims 1-8, 15-17, and 20 respectively, and are rejected on the same basis as set forth for the rejections of claims 1-8, 15-17, and 20, respectively.


Claims 12, 32, 52, and 72 are rejected under 35 U.S.C. 103 as being unpatentable over MCFALL et al., US 2020/0327252 A1 in view of Stojanovic et al. U.S. 2016/0092475 A1 and further in view of Barthur, US 2020/0250477.
Regarding claim 12,
the combination of MCFALL and Stojanovic discloses the limitations as stated above including columns of transformed training data set. However, it does not explicitly teach removing data from the training data set and the additional data set based on one or more determined feature importance evaluation scores associated with each data of the training data set.
On the other hand, Barthur discloses removing data entries from a training data set based on gradient/threshold values (Barthur [0019], [0021], [0037], and [0042]). Therefore, it would have been obvious to one of ordinary skill in the art before the time the invention was effectively filed to modify the teachings of the combination of MCFALL and Stojanovic with Barthur’s teaching in order to remove one or more columns from the transformed training data set and the additional data set based on one or more determined feature importance evaluation scores associated with each of one or more columns of the training data set with a reasonable expectation of success. The motivation for doing so would have been to improve the accuracy of the training dataset by removing anomalous or low-importance data.
Regarding claims 32, 52, and 72,
the scopes of the claims are substantially the same as claim 12, and are rejected on the same basis as set forth for the rejection of claim 12.
 
Allowable Subject Matter
Claims 9-11, 13-14, 18-19, 29-31, 33-34, 38-39, 49-51, 53-54, 58-59, 69-71, 73-74, and 78-79 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Yoldemir et al., US 2017/0364815 disclosing associating a relevant semantic data type (e.g., date) with incoming raw data (e.g., a column of digits) which lacks metadata. Assignment of semantic data type is inferred from a plurality of features. A first step determines a first feature comprising success rate in converting the raw data into various semantic data types. Then, alignment between observed/reference distributions of other features (e.g., data first digit, data length) is determined per-semantic data type.
Elliman, US 2020/0134083 disclosing employing machine learning concepts to accurately predict categories for unseen data assets, present the same to a user via a user interface for review, and assign the categories to the data assets responsive to user interaction confirming the same.
Jackson, JR. et al, US 2016/0104077 disclosing receiving at a computer system a document having one or more tables, each table having one or more whitespace features, processing the document using a first computer model executed by the computer system to classify each row of the one or more tables as a header row or a data row, processing the document using a second computer model executed by the computer system to classify each whitespace feature in each row conditional on classification of each row by the first computer model, the second computer model identifying whether a whitespace feature corresponds to information missing from the one or more tables, and generating an output of the classified whitespace features and storing the output in a digital file.

Points of Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HARES JAMI whose telephone number is (571)270-1291. The examiner can normally be reached M-F 9:00a-5:00p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Vital, can be reached at 571-272-4215. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Hares Jami/           Primary Examiner, Art Unit 2162
06/15/2022