Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

1.	This action is responsive to the communication filed on 5/24/22.  Claims 1, 7, 13 and 19 have been amended. Claims 4-5, 10-11, 16-17 and 22-23 have been cancelled. Claims 1-3, 6-9, 12-15, 18-21 and 24 are pending.
2.	Applicants' arguments filed 5/24/22 have been fully considered but they are not deemed to be persuasive.  Rejections and/or objections not reiterated from previous office actions are hereby withdrawn.  The following rejections and/or objections are either reiterated or newly applied.  They constitute the complete set presently being applied to the instant application.

Claim Rejections - 35 USC § 103
3.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
4.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
7.	Claims 1, 6-7, 12-13, 18-19 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over MCFALL et al. (US 20200327252 A1, hereinafter “MCFALL”) in view of Stojanovic et al. (US 20160092475 A1, hereinafter “Stojanovic 475”).
8.	With respect to claim 1,
MCFALL discloses a method for consistently preparing data for a machine learning (ML) system, comprising:
receiving a tabular training data set (MCFALL [0511], [0524], [0626] – [0633], [0844] – [0846] e.g. training dataset – tabular datasets), the training data set including a set of one or more source columns;
identifying column labels (MCFALL [0511], [0524], [0626] – [0633] e.g. labels, patterns) from the tabular training data set, the column labels associated with a received column of data points from the set of source columns (MCFALL [0511], [0524], [0626] – [0633] e.g. [0511] If several columns are marked as interesting, Publisher will concatenate the values of all those columns for each row and treat the resulting value as the interesting value for this record.  Each combination of values, for instance pairs of gender and eye colour, will be treated as a different label, i.e. (female; blue), (female; green), (male; blue) and (male; green). [0524] If the sensitive column chosen by the user contains categorical labels then each cluster in the output data will contain at least l distinct class labels. [0626] The user should select potential candidates based on the following criteria: [0627] The exact (or close approximate) value of the attribute for an individual contained in the dataset can be obtained from a secondary, auxiliary data source.  [0628] The attribute is not necessarily unique for the individual but there is a variance of labels contained in the data. [0629] Publisher provides a UI that gives the user sufficient guidance to make an informed choice about the potential set of quasi-identifiers: [0630] 1.  The user is encouraged to select all columns that are known to be linkable to the individual data subject through publicly available information.  [0631] 2.  Metadata automatically obtained by Publisher will be used to highlight columns for known, typically quasi-identifying information such as date of birth, age, or postal codes.  This is based on HIPAA names and analysis of textual patterns (regular expressions, etc) in the data values. [0632] As the selection of quasi-identifiers requires domain-specific knowledge this task is not fully automated in Publisher but clear guidance and automated suggestions are provided. 
[0633] Publisher supports a machine learning approach to identifying sensitive or quasi-identifying columns.  Publisher constructs a set of training data using the column names and value sets of all datasets that pass through the system, and labels them according to whether they were marked as "sensitive" or not by the user, and separately, whether they were marked as "quasi-identifying" or not by the user.  Publisher can randomly subsample the value sets in order to limit the size of the training set);
determining, for an identified column label, a root category (MCFALL [0468], [0471], [0482], [0557] e.g. root category/node) based on at least one of a user specification, data types, or distribution properties associated with the data points in the received column from the set of source columns (MCFALL [0468], [0471], [0482], [0557] e.g. [0468] Publisher can generalise nominal columns by supplying a generalisation hierarchy and instructing the system to generalise to a level (measured in distance from the root node) within the hierarchy. [0471] In Publisher, automatic generalisation is implemented as a `top-down` algorithm, meaning that every quasi-identifying attribute starts as fully generalised, and then gets repeatedly specialized (made more specific).  Fully generalised means the most general possible--for numerical columns, this is the full range of the variable (e.g. "0-100") while for categorical columns, this is the root node of the generalisation hierarchy. 5.2.4 Splitting Options [0482] A hierarchical category always splits into its child node categories (in the example above, the root category will always be split into a `Vegetable` category and a `Fruit` category).  Note that the number of records that fall in each child category can therefore be unbalanced (e.g. there may be 80 `Potato` or `Carrot` records but only 20 `Apple` or `Orange` records). [0557] Information loss on generalised categorical columns as the average "generalisation height" across data values.  Generalisation height is the number of levels up the hierarchy that the value ended up, normalized by the total distance between the leaf node and the root node.  For instance, if a value "January" has a parent "Winter" which has a parent "Any", the root node, and it is generalised to "Winter", then this is a 50% generalisation height);
performing one or more data transformations for data points in the received column (MCFALL [0093] – [0094] e.g. [0093] Feature extraction is often not a problem for privacy since no information needs to be returned.  For example, a data analyst may wish to transform a dataset before querying over it. [0094] Similar in format to the code for an arbitrary query, the feature extraction feature takes as input the code for an arbitrary row transformation.  A row transformation is a function that takes in a row and outputs a row--the output row need not be the same length as the input row, and may contain transformations, combinations, or arbitrary functions of one or more values in the input row), the one or more data transformations for extracting a grammatical structure shared between entries of a categoric feature set to obtain a transformed data set (MCFALL [0277], [0448], [0508], [0617] – [0624], [0668] e.g. 3.2.1 Extracting Metadata from Data Objects [0448] Categorical columns are generalised according to a hierarchy of related terms.  A hierarchy is a tree structure with the actual raw values in the column in the leaf nodes of the tree.  The nodes above the leaf nodes contain "category" values whose semantic meaning encompasses the child values.  For instance a node with value "tree" might have child nodes "deciduous tree" and "evergreen".  By default, the system generates a flat hierarchy of common terms in the data and an "other" category for uncommon values, where "common" is defined as appearing more than "k" times in the dataset. [0508] The user of Publisher may select a set of interesting (or priority) columns.  The user can specify the set of columns in the dataset that will be especially meaningful for any post-processing or downstream analysis.  All columns that are of interest, have a particular structure that should be preserved, or for which the resolution loss should be minimal can be selected as a priority column.  
For instance, if the data is going to be fed into a linear classifier, the target variable can be selected as interesting column because it is the attribute about which we want to detect meaningful patterns.  [0622] Fourthly, Publisher uses a set of patterns representing common formats to identify the presence of standard identifier types such as passport numbers, email addresses and telephone numbers.  These textual pattern descriptions are included with Publisher.  The patterns may be implemented as regular expressions or as more sophisticated `fuzzy` matching approaches; referring to the instant applicant’s specification [0068] “the grammatical structure may refer to patterns which may be embedded in the data, …”);
recording column categories determined for each identified column label and properties of the data transformations performed for each source column in a metadata database (MCFALL [0106], [0434], [0448], [0451], [0510], [0525], [0977] e.g. 3.6 Configuration Database [0106] Lens uses a relational database (e.g., PostgreSQL) to store configuration information, the audit log, and metadata about the loaded datasets.  As configuration information, Lens stores the permitted users, metadata about each user, privacy parameters for each user, as well as the access control information for each (user, column) pair where the user has access to the column.  As audit log, Lens stores every query that is asked, as well as the results and the privacy budget spent (if applicable).  Lens also stores any alerts that have been triggered.  As metadata about the loaded datasets, Lens stores the names of all the tables and columns, as well as the types of columns, and certain other metadata such as the column ranges and the options for categorical columns.  Lens captures this dataset metadata when the dataset is uploaded. [0434] This section describes the generalisation functionalities of Publisher.  Generalisation is the process of replacing values with less specific values.  For categorical columns, less specific values means broader categories: for instance, "Smartphone" is less specific than "iPhone" or "Blackberry".  For numerical columns, less specific values means wider intervals: for instance, "10-20" is less specific than "15-20", and "15-20" is less specific than "18".  Publisher supports generalising certain columns);
outputting the metadata database and transformed training data set, wherein the transformed training data set is for training a ML system (MCFALL [0232], [0242], [0308], [0433], [0465], [0559] – [0580], [0908], [1068] – [1084] e.g. [0232] Using the web application the user defines how the dataset is represented, which transformations should be applied to the dataset and where the dataset can be found.  The web application allows the user to submit the transformation program to their own compute cluster alongside processing instructions.  Once processing is complete the transformation program will persist the anonymised data in the cluster.  The transformation program also writes summary results (such as number of rows processed) in the cluster, and the web application then retrieves these results for display to the user.  [0242] Any data object that conforms to a given Schema can be anonymised by use of a compatible Policy.  It is also possible to have multiple Policies that can transform the same Schema. [0308] Hadoop distributions provide support for data lineage metadata in HDFS, which allows files that have been derived in some way from other files to be connected, recording their origin.  Publisher integrates with such systems to record the production of anonymised files from input files.  When Publisher writes an output file to HDFS, a description of the Policy is created as metadata, and used to connect the sensitive input file with the safe output file.  This metadata is shown in FIG. 18. [0433] Incoming values are read from the input queue and buffered into `micro-batches`--the values within that batch are then tokenised together and transformed into anonymised outputs which are then added to the output queue. [0465] Publisher executes the generalisation by transforming certain columns in a table of data.  The way that columns are transformed is discussed in this section. 
[0559] The automatic generalisation algorithm transforms columns through binning and hierarchical generalisation. 6.  Automatic Privacy Analysis of Raw Data & Guided Policy Setup [0565] These features allow the program to assist the user in properly configuring the anonymisation of input datasets and, additionally, in identifying new datasets to anonymise. Publisher takes several approaches to detecting sensitive, quasi-identifying, or identifying columns including using metadata, measuring correlation with known columns, and using machine learning. [0566] Publisher effectively combines sensitive data discovery, policy management, and anonymisation to increase the value of each.  This operates in both directions: sensitive data discovery informs, and is also informed by policy and anonymisation activity, so each is improved by the other.  [0567] 1) Identifying and classifying sensitive data, including identifiers, quasi-identifiers and sensitive values, based on the data and metadata about it, as well as policy management user activity, and anonymisation activity.  [0575] Analysing how other this data has been classified and managed in other privacy policies--if there exist policies requiring a column to be tokenised, that is a strong indicator that it is sensitive; [0577] Reading metadata and data lineage information generated from the anonymisation process, in order to tell the difference between sensitive data, and very realistic anonymised data of the same structure.  Since the tokenisation process produces fields of the same structure as the original, and the generalisation process preserves the data distributions, anonymised data looks very like raw sensitive data, and the metadata recording that it has been anonymised is necessary. [0579] Evaluate how privacy risk and data sensitivity is reduced by anonymisation. 
[0580] These pieces of information may be assembled into rich input to a rules engine or a machine learning classifier [as outputting the metadata database and transformed training data set, wherein the transformed training data set (e.g. transformed - tokenized, anonymized and/or generalized) is for training a ML system (e.g. machine learning classifier)]. [0908] C.2 We can re-state this as a method in which a computer-based system processes a sensitive dataset and publishes a derivative dataset such that privacy is preserved in the derivative dataset by generalising data values to less specific values by transforming columns in a table of data such that the derivative dataset achieves a required level of k-anonymity and l-diversity;), and wherein the metadata database is output for use by a user for additional data sets (MCFALL [0160], [0330] – [0336], [0308], [0447] e.g. [0308] Hadoop distributions provide support for data lineage metadata in HDFS, which allows files that have been derived in some way from other files to be connected, recording their origin.  Publisher integrates with such systems to record the production of anonymised files from input files.  When Publisher writes an output file to HDFS, a description of the Policy is created as metadata, and used to connect the sensitive input file with the safe output file.  This metadata is shown in FIG. 18. [0330] Generating datasets for sharing involves running a kind of Publisher Job where additional metadata fields are specified, which may include, but are not limited to: [0331] Authoriser (name/email address/company); [0332] Recipient (name/email address/company); [0333] Intended purpose (text description); [0334] Expiry date (date); [0335] Terms of Use; [0336] (Arbitrary other fields as configured). [0447] For date columns there are three options available.  By default dates are treated as numeric fields.  
Generalising dates as numerics produces lower distortion but may end up with dates that don't align with standard date periods such as internal accounting periods.  Alternatively dates can be generalised using a hierarchy.  Publisher provides a default hierarchy (decades->years->months->days), but alternatively the user can specify a custom hierarchy.  This could include quarters as an additional level, or have year boundaries set to financial years instead of calendar years. [0565] This section describes Publisher's features for automatically detecting sensitive, quasi-identifying, or identifying columns.  These features allow the program to assist the user in properly configuring the anonymisation of input datasets and, additionally, in identifying new datasets to anonymise.  Publisher takes several approaches to detecting sensitive, quasi-identifying, or identifying columns including using metadata, measuring correlation with known columns, and using machine learning [as and wherein the metadata database (e.g. metadata) is output (e.g. output) for use by a user for additional data sets (e.g. additional/new metadata fields/datasets)]);
receiving a tabular additional data set and the metadata database;
performing the one or more data transformations for data points in corresponding additional columns of the tabular additional data set to obtain a transformed additional data set (MCFALL [0565], [1053], [1068] – [1084] e.g. [0565] This section describes Publisher's features for automatically detecting sensitive, quasi-identifying, or identifying columns.  These features allow the program to assist the user in properly configuring the anonymisation of input datasets and, additionally, in identifying new datasets to anonymise.  Publisher takes several approaches to detecting sensitive, quasi-identifying, or identifying columns including using metadata, measuring correlation with known columns, and using machine learning. [1053] D.6 Identification of Primary Identifiers: when assessing whether a column is potentially identifying, the system implements one or more of the following techniques: measures the cardinality of columns; analyses column names against a list of names associated with personal identifiers; takes values from previously known sources of identifiers and finds similarity between those sources and the new data in question; uses a set of patterns representing common formats to identify the presence of standard identifier types; scans unstructured columns (for example, log files, chat/email messages, call transcriptions or contracts) for substrings that are equal to values in other columns marked as identifying; and the system compares any of these metrics with a threshold to determine whether or not to inform the user that the new column is potentially identifying. [1068] D.14 Information may be assembled into rich input to a rules engine or a machine learning classifier. [1069] Machine learning or a rules engine applied to sensitive columns: the system uses machine learning approaches or a rules engine to build a model that can output to a user a score for an unknown column being sensitive or non-sensitive. 
[1070] Machine learning or a rules engine applied to identifying and quasi-identifying columns: the system uses machine learning approaches or a rules engine to build a model that can output to a user a score for an unknown column being identifying or non-identifying or quasi-identifying or non-quasi-identifying); and
outputting the transformed additional data set for use with the ML system.
Although MCFALL substantially teaches the claimed invention, MCFALL does not explicitly indicate using the recorded column categories and properties of the data transformations from the metadata database.
Stojanovic 475 teaches the limitations by stating
receiving a tabular additional data set and the metadata database;
performing the one or more data transformations for data points in corresponding additional columns of the tabular additional data set using the recorded column categories and properties of the data transformations from the metadata database to obtain a transformed additional data set (Stojanovic 475 [0073], [0085] – [0091], [0114], [0125] – [0133] e.g. [0073] As discussed above, profile engine 326 can analyze data from a data source to determine whether any patterns exist, and if so, whether a pattern can be classified.  Once data obtained from a data source is normalized, the data may be parsed to identify one or more attributes or fields in the structure of the data.  Patterns may be identified using a collection of regular expressions, each having a label ("tag") and being defined by a category.  The data may be compared to different types of patterns to identify a pattern.  [0085] Column-specific statistics may include populated rows (e.g., K-most frequent, K-least frequent unique values, unique patterns, and (where applicable) types), frequency distributions, text metrics (e.g., minimum, maximum, mean values of: text length, token count, punctuation, pattern-based tokens, and various useful derived text properties), token metrics, data type and subtype, statistical analysis of numeric columns, L-most/least probable simple or compound terms or n-grams found in columns with mostly unstructured data, and reference knowledge categories matched by this naive lexicon, date/time pattern discovery and formatting, reference data matches, and imputed column heading label. [0086] The resulting profile can be used to classify content for subsequent analyses, to suggest, directly or indirectly, transformations of the data, to identify relationships among data sources, and to validate newly acquired data before applying a set of transformations designed based on the profile of previously acquired data. 
[0088] In some embodiments, the recommendation engine 308 can generate transform recommendations based on the matched patterns received from the knowledge service 310.  For example, for the data including social security numbers, the recommendation engine can recommend a transform that obfuscates the entries (e.g., truncating, randomizing, or deleting, all or a portion of the entries).  Other examples of transformation may include, reformatting data (e.g., reformatting a date in data), renaming data, enriching data (e.g., inserting values or associating categories with data), searching and replacing data (e.g., correcting spelling of data), change case of letter (e.g., changing a case from upper to lower case), and filter based on black list or white list terms. [0114] In some embodiments, data enrichment service 302 can recommend additional columns of data to be added to a data source.  As shown in FIG. 4D, continuing with the city example, transforms 418 have been accepted to enrich the data with new columns including city population, and city location detail including longitude and latitude.  When selected, the user's data set is enriched to include this additional information 420.  The data set now includes information that was not previously available to the user in a comprehensive and automated fashion.  The user's data set can now be used to produce a nationwide map of locations and population zones associated with other data in the dataset (for example, this may be associated with a company's web site transactions). [0131] In at least one example, entity extraction engine 704 can identify entity information (e.g., address information stored in the address column) and identify data related to entity information using, e.g., knowledge service 340.  As shown in FIG. 7, the transform engine 322 can join new columns Zip Code and Population to data set 602 when forming enriched data set 708.  
The enriched data set 708 can then be passed to publish engine 324 to be pushed to one or more data targets 330. [0133] In some embodiments, metadata 802 may include additional data determined by enriching data ingested by data enrichment service 302.  For example, metadata 802 may be displayed in GUI 800 based on performing the process described with reference to FIG. 11.  In the example shown in FIG. 8A, metadata 802 may display columns 804 (or suggested names) that represents a category (e.g., a classification) for an attribute of data for each entity in the data ingested by data enrichment service 302.  Metadata 802 in each row displayed in GUI 800 may correspond to a different entity in the ingested data.  The category represented by column 804 may represent a discovered type or classification of an attribute of an entity corresponding to each row of data in the ingested data).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention, in view of the teachings of MCFALL and Stojanovic 475, to enable related data to be analyzed for classification, which can be used to enrich the data (Stojanovic 475 [0014]).
9.	With respect to claim 6,
	MCFALL further discloses performing one or more data transformations for data points in a received column in an order based on defined primitives of a transformation tree to obtain a transformed data set (MCFALL [0369] e.g. [0369] Rules may also reference other rules.  The system builds up a graph of all of the operations that are to be performed on the rows so that it can apply them in the correct order), the transformation tree including defined primitive category entries associated with each root category (MCFALL [0468], [0471], [0482], [0557] e.g. root category/node), wherein the defined primitives associated with the received column are based on a root category associated with the received column, wherein the defined primitive category entries for the root category are associated with a defined transformation function set (MCFALL [0504] – [0505] e.g. [0504] After repartitioning, the data for each node has been moved to its own partition, so we can now run exactly the same top-down specialisation `locally`--that is, the top-down operations can proceed on the data locally in one of the executors, with all the data for the partition held in local memory.  This is much faster than the distributed splitting.  The amount of distributed splitting required to reach the `repartition point` depends on the size of the input data and the number of partitions. [0505] FIG. 26 shows an example with a diagram illustrating the top down decision tree approach.  A tree structure of nodes is built wherein each node may hold a list of rows and a value for each quasi-identifying column.  The first node (n1) at the top represents the data that is generalised the most and hence has the highest privacy level).
10.	Claims 7 and 12 are the same as claims 1 and 6 and are rejected for the same reasons as applied hereinabove.
11.	Claims 13 and 18 are the same as claims 1 and 6 and are rejected for the same reasons as applied hereinabove.
12.	Claims 19 and 24 are the same as claims 1 and 6 and are rejected for the same reasons as applied hereinabove.

13.	Claims 2-3, 8-9, 14-15 and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over MCFALL in view of Stojanovic 475, and further in view of GIBSON (U.S. 20210117437 A1 hereinafter, “GIBSON”).
14.	With respect to claim 2,
Stojanovic 475 further discloses wherein the data transformations for extracting the grammatical structure comprise:
comparing string character of entries to string character of other entries;
identifying overlaps shared between string character of entries; and
returning one or more returned columns with activations corresponding to identified overlaps from received entries of the categoric feature set (Stojanovic 475 [0082], [0097] e.g. [0082] Using the knowledge sources 340, knowledge service 310 can match, in context, the patterns identified by profile engine 326.  Knowledge service 310 may compare the identified patterns in the data or the data if in text to entity information for different entities stored by a knowledge source.  The entity information may be obtained from one or more knowledge sources 340 using knowledge service 310.  Examples of known entity may include social security numbers, telephone numbers, address, proper names, or other personal information.  The data may be compared to entity information for different entities to determine if there is a match with one or more entities based on the identified pattern. [0097] Using the input data set, which may include added data, knowledge service 310 can implement matching methods (e.g., a graph matching method) to compare the words from the augmented data set to categories of data from knowledge source 340.  Knowledge service 310 can implement a method to determine the semantic similarity between the augmented data set and each category in knowledge source 340 to identify a name for the category.  The name of the category may be chosen based on a highest similarity metric.  The similarity metric may be computed based on the number of terms in the data set that match a category name.  The category may be chosen based on the highest number of terms matching based on the similarity metric.  Techniques and operations performed for similarity analysis and categorization are further described below).
Although the combination of MCFALL and Stojanovic 475 substantially teaches the claimed invention, it does not explicitly indicate string character subsets of entries.
GIBSON teaches the limitations by stating string character subsets of entries (GIBSON [0055] – [0059] e.g. [0055] FIG. 5 shows the options available under the "Code patterns" tab 404.  For example, the "Code patterns" tab 404 may include options for structuring code.  This example illustrated in FIG. 5 may be for SQL language code.  The "Code patterns" tab 404 includes various conventions for prefixes and suffixes in SQL language code that the designer may set and/or modify.  [0059] In one example implementation, the new table name may default to be "dim_Product" using the dimension table prefix "dim_" that was set in the "Code patterns" tab 404 shown in FIG. 5.  The default name may be modified by the designer.  In some implementations, the choice between fact table or dimension table may be automatically determined using rule-based and/or machine learning-based algorithms to set the table type for the designer and/or to make a pre-selected default suggestion to the designer).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, in view of the teachings of MCFALL, Stojanovic 475 and GIBSON, to enable related data to be analyzed for classification, which can be used to enrich the data (Stojanovic 475 [0014]).
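For illustration only, the term-count similarity metric described in the quoted passage of Stojanovic 475 [0097] may be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of any cited reference: the function name best_category, the representation of each category as a set of known terms, and the sample data are all hypothetical.

```python
def best_category(terms, categories):
    """Pick the category whose known terms match the most input terms.

    `categories` maps a category name to a set of lowercase terms.
    Both the name and the data layout are hypothetical, chosen only
    to illustrate a count-based similarity metric.
    """
    scores = {
        name: sum(1 for term in terms if term.lower() in vocab)
        for name, vocab in categories.items()
    }
    # The category with the highest number of matching terms is chosen.
    chosen = max(scores, key=scores.get)
    return chosen, scores


# Hypothetical sample data.
categories = {
    "city": {"paris", "london", "tokyo"},
    "color": {"red", "blue"},
}
chosen, scores = best_category(["Paris", "Tokyo", "red"], categories)
```

In this sketch, two of the three input terms match the "city" category and one matches "color", so "city" is chosen.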
15.	With respect to claim 3,
	GIBSON further discloses
comparing string character subsets of entries to string character subsets of other entries;
identifying overlaps shared between string character subsets of entries; and
returning a returned column with entries from the categoric feature set consolidated into a fewer number of unique values according to the identified overlaps (GIBSON [0055] – [0059] e.g. [0055] FIG. 5 shows the options available under the "Code patterns" tab 404.  For example, the "Code patterns" tab 404 may include options for structuring code.  This example illustrated in FIG. 5 may be for SQL language code.  The "Code patterns" tab 404 includes various conventions for prefixes and suffixes in SQL language code that the designer may set and/or modify.  [0059] In one example implementation, the new table name may default to be "dim_Product" using the dimension table prefix "dim_" that was set in the "Code patterns" tab 404 shown in FIG. 5.  The default name may be modified by the designer.  In some implementations, the choice between fact table or dimension table may be automatically determined using rule-based and/or machine learning-based algorithms to set the table type for the designer and/or to make a pre-selected default suggestion to the designer).
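For illustration only, the three steps recited above (comparing string character subsets, identifying overlaps, and consolidating entries into fewer unique values) may be sketched as follows. This is a minimal sketch under stated assumptions, and it characterizes neither the claimed method nor GIBSON: the function name consolidate, the use of a fixed-length leading prefix as the "string character subset", and the sample table names are all hypothetical.

```python
from collections import defaultdict


def consolidate(entries, prefix_len=4):
    """Map entries that share a leading character subset to one value.

    Hypothetical rule: two entries "overlap" when their first
    `prefix_len` characters are equal, ignoring case.
    """
    groups = defaultdict(list)
    for entry in entries:
        # Compare the character subset of each entry with the others.
        groups[entry[:prefix_len].lower()].append(entry)
    # Use the first entry seen in each group as its representative.
    representative = {key: members[0] for key, members in groups.items()}
    # Return a column of the same length with fewer unique values.
    return [representative[entry[:prefix_len].lower()] for entry in entries]


# Hypothetical sample column.
column = ["dim_Product", "dim_Customer", "fact_Sales", "fact_Orders"]
returned = consolidate(column)
```

In this sketch, the four distinct entries are consolidated into two unique values, one for each shared prefix.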
16.	Claims 8-9 are the same as claims 2-3 and are rejected for the same reasons as applied hereinabove.
17.	Claims 14-15 are the same as claims 2-3 and are rejected for the same reasons as applied hereinabove.
18.	Claims 20-21 are the same as claims 2-3 and are rejected for the same reasons as applied hereinabove.

Response to Arguments
19.	On pages 10-13, Applicant alleges MCFALL does not teach “the transformed training data set is for training a ML system” or “the metadata database is output for use by a user for additional data sets.”
	Examiner disagrees because:
As described in MCFALL [0232], [0242], [0308], [0433], [0465], [0559] – [0580], [0908], [1068] – [1084], raw data is transformed (such as tokenized, anonymized and/or generalized), and is inputted into a machine learning classifier.
The disclosure reasonably describes the argued limitation of "the transformed training data set is for training a ML system".
As described in MCFALL [0160], [0330] – [0336], [0308], [0447], metadata is generated and output, and is used to generate additional metadata fields and/or new datasets.
The disclosure reasonably describes the argued limitation of "the metadata database is output for use by a user for additional data sets".

Conclusion
20.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SyLing Yen whose telephone number is 571-270-1306.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Featherstone, can be reached at 571-270-3750.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist whose telephone number is 571-272-2100. 

/SYLING YEN/ Primary Examiner, Art Unit 2166
May 29, 2022