Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

1.	The pending claims 1-24 are presented for examination.

Claim Rejections - 35 USC § 103
2.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
3.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
6.	Claims 1, 6-7, 12-13, 18-19 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over MCFALL et al (US 20200327252 A1, hereinafter “MCFALL”) in view of Stojanovic et al (US 20160092475 A1, hereinafter “Stojanovic 475”).
7.	With respect to claim 1,
MCFALL discloses a method for consistently preparing data for a machine learning (ML) system, comprising:
receiving a tabular training data set (MCFALL [0511], [0524], [0626] – [0633], [0844] – [0846] e.g. training dataset – tabular datasets), the training data set including a set of one or more source columns;
identifying column labels (MCFALL [0511], [0524], [0626] – [0633] e.g. labels, patterns) from the tabular training data set, the column labels associated with a received column of data points from the set of source columns (MCFALL [0511], [0524], [0626] – [0633] e.g. [0511] If several columns are marked as interesting, Publisher will concatenate the values of all those columns for each row and treat the resulting value as the interesting value for this record.  Each combination of values, for instance pairs of gender and eye colour, will be treated as a different label, i.e. (female; blue), (female; green), (male; blue) and (male; green). [0524] If the sensitive column chosen by the user contains categorical labels then each cluster in the output data will contain at least l distinct class labels. [0626] The user should select potential candidates based on the following criteria: [0627] The exact (or close approximate) value of the attribute for an individual contained in the dataset can be obtained from a secondary, auxiliary data source.  [0628] The attribute is not necessarily unique for the individual but there is a variance of labels contained in the data. [0629] Publisher provides a UI that gives the user sufficient guidance to make an informed choice about the potential set of quasi-identifiers: [0630] 1.  The user is encouraged to select all columns that are known to be linkable to the individual data subject through publicly available information.  [0631] 2.  Metadata machine learning approach to identifying sensitive or quasi-identifying columns.  Publisher constructs a set of training data using the column names and value sets of all datasets that pass through the system, and labels them according to whether they were marked as "sensitive" or not by the user, and separately, whether they were marked as "quasi-identifying" or not by the user.  Publisher can randomly subsample the value sets in order to limit the size of the training set);
determining, for an identified column label, a root category (MCFALL [0468], [0471], [0482], [0557] e.g. root category/node) based on at least one of a user specification, data types, or distribution properties associated with the data points in the received column from the set of source columns (MCFALL [0468], [0471], [0482], [0557] e.g. [0468] Publisher can generalise nominal columns by supplying a generalisation hierarchy and instructing the system to generalise to a level (measured in distance from the root node) within the hierarchy. columns, this is the full range of the variable (e.g. "0-100") while for categorical columns, this is the root node of the generalisation hierarchy. 5.2.4 Splitting Options [0482] A hierarchical category always splits into its child node categories (in the example above, the root category will always be split into a `Vegetable` category and a `Fruit` category).  Note that the number of records that fall in each child category can therefore be unbalanced (e.g. there may be 80 `Potato` or `Carrot` records but only 20 `Apple` or `Orange` records). [0557] Information loss on generalised categorical columns as the average "generalisation height" across data values.  Generalisation height is the number of levels up the hierarchy that the value ended up, normalized by the total distance between the leaf node and the root node.  For instance, if a value "January" has a parent "Winter" which has a parent "Any", the root node, and it is generalised to "Winter", then this is a 50% generalisation height);
performing one or more data transformations for data points in the received column (MCFALL [0093] – [0094] e.g. [0093] Feature extraction is often not a problem for privacy since no , the one or more data transformations for extracting a grammatical structure shared between entries of a categoric feature set to obtain a transformed data set (MCFALL [0277], [0448], [0508], [0617] – [0624], [0668] e.g. 3.2.1 Extracting Metadata from Data Objects [0448] Categorical columns are generalised according to a hierarchy of related terms.  A hierarchy is a tree structure with the actual raw values in the column in the leaf nodes of the tree.  The nodes above the leaf nodes contain "category" values whose semantic meaning encompasses the child values.  For instance a node with value "tree" might have child nodes "deciduous tree" and "evergreen".  By default, the system generates a flat hierarchy of common terms in the data and an "other" category for uncommon values, where "common" is defined as appearing more than "k" times in the dataset. [0508] The user of Publisher may select a set of interesting (or priority) columns.  The user can specify the set of columns in the dataset column because it is the attribute about which we want to detect meaningful patterns.  [0622] Fourthly, Publisher uses a set of patterns representing common formats to identify the presence of standard identifier types such as passport numbers, email addresses and telephone numbers.  These textual pattern descriptions are included with Publisher.  The patterns may be implemented as regular expressions or as more sophisticated `fuzzy` matching approaches; referring to the instant applicant’s specification [0068] “the grammatical structure may refer to patterns which may be embedded in the data, …”);
recording column categories determined for each identified column label and properties of the data transformations performed for each source column in a metadata database (MCFALL [0106], [0434], [0448], [0451], [0510], [0525], [0977] e.g. 3.6 Configuration Database [0106] Lens uses a relational database (e.g., PostgreSQL) to store configuration information, the audit log, and metadata about the loaded datasets.  As configuration information, Lens stores the ;
outputting the metadata database and transformed training data set for training a ML system (MCFALL [1068] – [1084] e.g. [1068] D.14 Information may be assembled into rich input to a rules engine or a machine learning classifier. [1069] Machine learning or a rules engine applied to sensitive columns: the system uses machine learning approaches or a rules engine to build a model that can output to a user a score for an unknown column being sensitive or non-sensitive. [1070] Machine learning or a rules engine applied to identifying and quasi-identifying columns: the system uses machine learning approaches or a rules engine to build a model that can output to a user a score for an unknown column being identifying or non-identifying or quasi-identifying or non-quasi-identifying. [1071] The following features may be used in either machine learning process: Any information that could indicate sensitivity or that data in a column is identifying or quasi-identifying, for example: [1072] The number of distinct values [1073] The mean, median, mode, min, max and variance of the numeric values [1074] The type of the value (decimal, integer, string, date) [1075] The column name [1076] Length of column name [1077] The n-grams of the column name (where underscores are considered as breaks between words) [1078] Entropy of the value set [1079] Metadata [1080] Policies [1081] jobs [1082] Data lineage [1083] Join all of the above [1084] Label);
receiving a tabular additional data set and the metadata database;
performing the one or more data transformations for data points in corresponding additional columns of the tabular additional data set to obtain a transformed additional data set (MCFALL [0565], [1053], [1068] – [1084] features for automatically detecting sensitive, quasi-identifying, or identifying columns.  These features allow the program to assist the user in properly configuring the anonymisation of input datasets and, additionally, in identifying new datasets to anonymise.  Publisher takes several approaches to detecting sensitive, quasi-identifying, or identifying columns including using metadata, measuring correlation with known columns, and using machine learning. [1053] D.6 Identification of Primary Identifiers: when assessing whether a column is potentially identifying, the system implements one or more of the following techniques: measures the cardinality of columns; analyses column names against a list of names associated with personal identifiers; takes values from previously known sources of identifiers and finds similarity between those sources and the new data in question; uses a set of patterns representing common formats to identify the presence of standard identifier types; scans unstructured columns (for example, log files, chat/email messages, call transcriptions or contracts) for substrings that are equal to values in other columns marked as identifying; and the system compares any of these metrics with a threshold to determine whether or not to inform the user that the new column is potentially identifying. [1068] D.14 Information may be assembled into rich input to a rules engine or a machine learning classifier. [1069] Machine learning or a rules engine applied to sensitive columns: the system uses machine learning approaches or a rules engine to build a model that can output to a user a score for an unknown column being sensitive or non-sensitive. 
[1070] Machine learning or a rules engine applied to identifying and quasi-identifying columns: the system uses machine learning approaches or a rules engine to build a model that can output to a user a score for an unknown column being identifying or non-identifying or quasi-identifying or non-quasi-identifying); and
outputting the transformed additional data set for use with the ML system.
Although MCFALL substantially teaches the claimed invention, MCFALL does not explicitly indicate using the recorded column categories and properties of the data transformations from the metadata database.
Stojanovic 475 teaches the limitations by stating
receiving a tabular additional data set and the metadata database;
performing the one or more data transformations for data points in corresponding additional columns of the tabular additional data set using the recorded column categories and properties of the data transformations from the metadata database to obtain a transformed additional data set (Stojanovic 475 [0073], [0085] – [0091], [0114], [0125] – [0133] e.g. [0073] As discussed above, profile engine 326 can analyze data from a data source to determine whether any patterns exist, and if so, whether a pattern can be classified.  Once data obtained from a Patterns may be identified using a collection of regular expressions, each having a label ("tag") and being defined by a category.  The data may be compared to different types of patterns to identify a pattern.  [0085] Column-specific statistics may include populated rows (e.g., K-most frequent, K-least frequent unique values, unique patterns, and (where applicable) types), frequency distributions, text metrics (e.g., minimum, maximum, mean values of: text length, token count, punctuation, pattern-based tokens, and various useful derived text properties), token metrics, data type and subtype, statistical analysis of numeric columns, L-most/least probable simple or compound terms or n-grams found in columns with mostly unstructured data, and reference knowledge categories matched by this naive lexicon, date/time pattern discovery and formatting, reference data matches, and imputed column heading label.[0086] The resulting profile can be used to classify content for subsequent analyses, to suggest, directly or indirectly, transformations of the data, to identify relationships among data sources, and to validate newly acquired data before applying a set of transformations designed based on the profile of previously acquired data. 
[0088] In some embodiments, the recommendation engine 308 can generate transform recommendations associating categories with data), searching and replacing data (e.g., correcting spelling of data), change case of letter (e.g., changing a case from upper to lower case), and filter based on black list or white list terms. [0114] In some embodiments, data enrichment service 302 can recommend additional columns of data to be added to a data source.  As shown in FIG. 4D, continuing with the city example, transforms 418 have been accepted to enrich the data with new columns including city population, and city location detail including longitude and latitude.  When selected, the user's data set is enriched to include this additional information 420.  The data set now includes information that was not previously available to the user in a comprehensive and automated fashion.  The user's data set can now be used to produce a nationwide map of locations and population zones associated with other data in the dataset (for example, this may be associated with a company's web site transactions). [0131] In at least one example, entity extraction engine 704 can identify entity information (e.g., address information stored in the address column) and identify data related to entity information using, e.g., knowledge service 340.  As shown in FIG. 7, the transform engine 322 can join new columns Zip Code and Population to data set 602 when forming enriched data set 708.  The enriched data set 708 can then be passed to publish engine 324 to be pushed to one or more data targets 330. [0133] In some embodiments, metadata 802 may include additional data determined by enriching data ingested by data enrichment service 302.  For example, metadata 802 may be displayed in GUI 800 based on performing the process described with reference to FIG. 11.  In the example shown in FIG. 8A, metadata 802 may display columns 804 (or suggested names) that represents a category (e.g., a classification) for an attribute of data for each entity in the data ingested by data enrichment service 302.  Metadata 802 in each row displayed in GUI 800 may correspond to a different entity in the ingested data.  The category represented by column 804 may represent a discovered type or classification of an attribute of an entity corresponding to each row of data in the ingested data).
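For illustration only, the fit-then-reuse flow recited in claim 1 (derive per-column categories and transformation properties from a training table, record them, then apply the recorded properties to an additional table) may be sketched as follows. The sketch is not drawn from MCFALL, Stojanovic 475, or the instant specification. The function names, the two root categories ("numeric" and "categoric"), the in-memory dictionary standing in for the metadata database, and the choice of standardization and one-hot encoding are all assumptions made solely for this example.

```python
from statistics import mean, pstdev


def infer_root_category(values):
    """Assumed rule: a column is 'numeric' if every value parses as a float."""
    try:
        for v in values:
            float(v)
        return "numeric"
    except (TypeError, ValueError):
        return "categoric"


def fit_column(values):
    """Derive the category and transformation properties for one training column."""
    category = infer_root_category(values)
    if category == "numeric":
        nums = [float(v) for v in values]
        # Guard against a zero standard deviation for constant columns.
        return {"category": category, "mean": mean(nums), "std": pstdev(nums) or 1.0}
    # Categoric columns: record the sorted set of labels seen in training.
    return {"category": category, "labels": sorted(set(values))}


def apply_column(values, props):
    """Transform a column using only the recorded properties."""
    if props["category"] == "numeric":
        return [(float(v) - props["mean"]) / props["std"] for v in values]
    # One-hot activations against recorded labels; unseen labels map to all zeros.
    return [[int(v == label) for label in props["labels"]] for v in values]


def prepare_training(table):
    """table maps a column label to its list of values; returns (transformed, metadata)."""
    metadata = {label: fit_column(col) for label, col in table.items()}
    transformed = {label: apply_column(col, metadata[label]) for label, col in table.items()}
    return transformed, metadata


def prepare_additional(table, metadata):
    """Apply the recorded transformations to the corresponding additional columns."""
    return {
        label: apply_column(col, metadata[label])
        for label, col in table.items()
        if label in metadata
    }


train = {"age": [20, 30, 40], "color": ["red", "blue", "red"]}
train_out, metadata = prepare_training(train)
extra_out = prepare_additional({"age": [30], "color": ["green"]}, metadata)
```

In this sketch the additional data is never used to recompute means or label sets, so a value such as "green" that was absent from training yields an all-zero activation row rather than a new column.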
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention, in view of the teachings of MCFALL and 
8.	With respect to claim 6,
	MCFALL further discloses performing one or more data transformations for data points in a received column in an order based on defined primitives of a transformation tree to obtain a transformed data set (MCFALL [0369] e.g. [0369] Rules may also reference other rules.  The system builds up a graph of all of the operations that are to be performed on the rows so that it can apply them in the correct order), the transformation tree including defined primitive category entries associated with each root category (MCFALL [0468], [0471], [0482], [0557] e.g. root category/node), wherein the defined primitives associated with the received column are based on a root category associated with the received column, wherein the defined primitive category entries for the root category are associated with a defined transformation function set (MCFALL [0504] – [0505] e.g. [0504] After repartitioning, the data for each node has been moved to its own partition, so we can now run exactly the same top-down specialisation `locally`--that is, the top-down operations can proceed on the data locally in one of the executors, with all the data for the partition held in local memory.  This is much faster than the distributed splitting.  The amount of distributed splitting required to reach the `repartition point` depends on the size of the input data and tree structure of nodes is built wherein each node may hold a list of rows and a value for each quasi-identifying column.  The first node (n.sub.1) at the top represents the data that is generalised the most and hence has the highest privacy level).
9.	Claims 7 and 12 are the same as claims 1 and 6 and are rejected for the same reasons as applied hereinabove.
10.	Claims 13 and 18 are the same as claims 1 and 6 and are rejected for the same reasons as applied hereinabove.
11.	Claims 19 and 24 are the same as claims 1 and 6 and are rejected for the same reasons as applied hereinabove.

12.	Claims 2-5, 8-11, 14-17 and 20-23 are rejected under 35 U.S.C. 103 as being unpatentable over MCFALL in view of Stojanovic 475, and further in view of GIBSON (U.S. 20210117437 A1 hereinafter, “GIBSON”).
13.	With respect to claim 2,
Stojanovic 475 further discloses wherein the data transformations for extracting the grammatical structure comprise:
comparing string character of entries to string character of other entries;
identifying overlaps shared between string character of entries; and
returning one or more returned columns with activations corresponding to identified overlaps from received entries of the categoric feature set (Stojanovic 475 [0082], [0097] e.g. [0082] Using the knowledge .
Although the combination of MCFALL and Stojanovic 475 substantially teaches the claimed invention, it does not explicitly indicate string character subsets of entries.
GIBSON teaches the limitations by stating string character subsets of entries (GIBSON [0055] – [0059] e.g. [0055] FIG. 5 shows the options available under the "Code patterns" tab 404.  For example, the "Code patterns" tab 404 may include options for structuring code.  This example illustrated in FIG. 5 may be for SQL language code.  The "Code patterns" tab 404 includes various conventions for prefixes and suffixes in SQL language code that the designer may set and/or modify.  [0059] In one example implementation, the new table name may default to be "dim_Product" using the dimension table prefix "dim_" that was set in the "Code patterns" tab 404 shown in FIG. 5.  The default name may be modified by the designer.  In some implementations, the choice between fact table or dimension table may be automatically determined using rule-based and/or machine learning-based algorithms to set the table type for the designer and/or to make a pre-selected default suggestion to the designer).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention, in view of the teachings of MCFALL, 
14.	With respect to claim 3,
	GIBSON further discloses
comparing string character subsets of entries to string character subsets of other entries;
identifying overlaps shared between string character subsets of entries; and
returning a returned column with entries from the categoric feature set consolidated into a fewer number of unique values according to the identified overlaps (GIBSON [0055] – [0059] e.g. [0055] FIG. 5 shows the options available under the "Code patterns" tab 404.  For example, the "Code patterns" tab 404 may include options for structuring code.  This example illustrated in FIG. 5 may be for SQL language code.  The "Code patterns" tab 404 includes various conventions for prefixes and suffixes in SQL language code that the designer may set and/or modify.  [0059] In one example implementation, the new table name may default to be "dim_Product" using the dimension table prefix "dim_" that was set in the "Code patterns" tab 404 shown in FIG. 5.  The default name may be modified by the designer.  In some implementations, the choice between fact table or dimension table may be automatically determined using rule-based and/or machine learning-based algorithms to set the table type for the designer and/or to make a pre-selected default suggestion to the designer).
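For illustration only, the comparison, overlap identification, and return steps recited in claims 2 and 3 may be sketched as follows. The sketch is not taken from any cited reference or from the instant specification. The function names, the minimum overlap length of four characters, the rule of keeping each entry's longest shared substring, and the sample entries (which merely echo the prefix convention quoted from GIBSON) are assumptions made for this example.

```python
def shared_substrings(entries, min_len=4):
    """Return substrings of at least min_len characters found in two or more distinct entries."""
    counts = {}
    for entry in set(entries):
        # Every substring of this entry with length >= min_len, counted once per entry.
        subs = {
            entry[i:j]
            for i in range(len(entry))
            for j in range(i + min_len, len(entry) + 1)
        }
        for s in subs:
            counts[s] = counts.get(s, 0) + 1
    return {s for s, n in counts.items() if n >= 2}


def longest_overlap(entry, overlaps):
    """Pick the longest shared substring present in entry (ties broken alphabetically)."""
    candidates = [s for s in overlaps if s in entry]
    return min(candidates, key=lambda s: (-len(s), s)) if candidates else None


def overlap_activations(entries, min_len=4):
    """Claim 2 style output: one 0/1 column per identified overlap."""
    overlaps = shared_substrings(entries, min_len)
    keys = sorted({longest_overlap(e, overlaps) for e in entries} - {None})
    return {k: [int(k in e) for e in entries] for k in keys}


def consolidate(entries, min_len=4):
    """Claim 3 style output: entries collapsed to fewer unique values by shared overlap."""
    overlaps = shared_substrings(entries, min_len)
    return [longest_overlap(e, overlaps) or e for e in entries]


names = ["dim_Product", "dim_Customer", "fact_Sales", "fact_Returns"]
activations = overlap_activations(names)
merged = consolidate(names)
```

With these sample entries the four unique values consolidate to two, "dim_" and "fact_", and the activation columns mark which entries contain each overlap.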
15.	With respect to claim 4,
	Stojanovic 475 further discloses
inspecting string character extracts of entries;
checking validity of the extracts as numeric character sets (Stojanovic 475 [0086] e.g. to validate newly acquired data before applying a set of transformations designed based on the profile of previously acquired data); and
returning a returned column with extracted numeric entries (Stojanovic 475 [0084] – [0085] e.g. numeric data can be analyzed statistically).
GIBSON further discloses string character subsets of entries (GIBSON [0055] – [0059] e.g. [0055] FIG. 5 shows the options available under the "Code patterns" tab 404.  For example, the "Code patterns" tab 404 may include options for structuring code.  This example illustrated in FIG. 5 may be for SQL language code.  The "Code patterns" tab 404 includes various conventions for prefixes and suffixes in SQL language code that the designer may set and/or modify.  [0059] In one example implementation, the new table name may default to be "dim_Product" using the dimension table prefix "dim_" that was set in the "Code patterns" tab 404 shown in FIG. 5.  The default name may be modified by the designer.  In some implementations, the choice between fact table or dimension table may be automatically determined using rule-based and/or machine learning-based algorithms to set the table type for the designer and/or to make a pre-selected default suggestion to the designer).
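For illustration only, the inspect, validate, and return steps recited in claim 4 may be sketched as follows. The sketch is not taken from any cited reference or from the instant specification. The function name, the regular expression used to pull a candidate extract, the use of `float()` as the validity check, and the use of `None` for entries without a valid numeric extract are assumptions made for this example.

```python
import re

# Candidate extract: an optional sign followed by a run of digits and dots.
_CANDIDATE = re.compile(r"[-+]?[\d.]+")


def extract_numeric(entries):
    """Return a column holding the first valid numeric extract of each entry, else None."""
    column = []
    for entry in entries:
        match = _CANDIDATE.search(str(entry))
        if match is None:
            column.append(None)  # no digit or dot characters at all
            continue
        try:
            # Validity check: the extract must parse as a number.
            column.append(float(match.group()))
        except ValueError:
            column.append(None)  # e.g. "1.2.3" is extracted but is not a valid number
    return column


numeric_column = extract_numeric(["12 oz", "approx 3.5kg", "v1.2.3", "n/a", "-7"])
```

Separating the loose extraction from the strict validation keeps malformed extracts such as "1.2.3" out of the returned column instead of silently truncating them.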
16.	With respect to claim 5,
	MCFALL further discloses
receiving one or more search terms as a parameter to a transformation function; and
returning one or more returned columns with activations associated with identified search terms present as string character in the entries of the categoric feature set (MCFALL [0109] e.g. At this stage, queries may also be altered to return more general results.  For instance, the query "SELECT AVG(salary) WHERE weight=207" might be altered to read "SELECT AVG(salary) WHERE weight>200 AND weight< 250").
	GIBSON further discloses string character subsets of entries (GIBSON [0055] – [0059] e.g. [0055] FIG. 5 shows the options available under the "Code patterns" tab 404.  For example, the "Code patterns" tab 404 may include options for structuring code.  This example illustrated in FIG. 5 may be for SQL language code.  The "Code patterns" tab 404 includes various conventions for prefixes and suffixes in SQL language code that the designer may set and/or modify.  [0059] In one example implementation, the new table name may default to be "dim_Product" using the dimension table prefix "dim_" that was set in the "Code patterns" tab 404 shown in FIG. 5.  The default name may be modified by the designer.  In some implementations, the choice between fact table or dimension table may be automatically determined using rule-based and/or machine learning-based algorithms to set the table type for the designer and/or to make a pre-selected default suggestion to the designer).
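For illustration only, the search-term steps recited in claim 5 may be sketched as follows. The sketch is not taken from any cited reference or from the instant specification. The function name, the `case_sensitive` parameter, and the plain substring test are assumptions made for this example.

```python
def search_activations(entries, search_terms, case_sensitive=False):
    """Return one 0/1 column per search term, marking entries that contain the term."""
    columns = {}
    for term in search_terms:
        needle = term if case_sensitive else term.lower()
        columns[term] = [
            int(needle in (entry if case_sensitive else entry.lower()))
            for entry in entries
        ]
    return columns


addresses = ["Main Street", "Oak Avenue", "High street"]
term_columns = search_activations(addresses, ["street", "avenue"])
```

Here the search terms are passed as a parameter to the transformation function, and each returned column carries an activation of 1 wherever the term appears as a character subset of the entry.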
17.	Claims 8-11 are the same as claims 2-5 and are rejected for the same reasons as applied hereinabove.
18.	Claims 14-17 are the same as claims 2-5 and are rejected for the same reasons as applied hereinabove.
19.	Claims 20-23 are the same as claims 2-5 and are rejected for the same reasons as applied hereinabove.

Conclusion
The prior art made of record, listed on form PTO-892, and not relied upon, if any, is considered pertinent to applicant's disclosure.
20.	The examiner requests, in response to this office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYLING YEN whose telephone number is (571)270-1306.  The examiner can normally be reached on 8am-4:30pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Featherstone can be reached at 571-270-3750.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



February 24, 2022