DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s amendment, filed 5/6/2021, amended claims 1, 2, 4-6, 8-11, 15 and 20-21, added claims 22 and 23, and did not cancel any claims. Therefore, claims 1-2, 4-17 and 20-23 are pending. Claims 18 and 19 were previously cancelled in the amendment filed on November 25, 2020, and claim 3 was previously cancelled and claim 21 was previously added in the amendment filed June 26, 2020.
The rejections of claims 1-2 and 4-21 under 35 U.S.C. § 103, set forth in the previous Office Action, have been withdrawn due to Applicant’s amendments to the claims filed 5/6/2021 and the examiner’s amendment discussed below.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given by Applicant’s representative Wei Yuan, Registration No. 71,772, on July 2, 2021.

The application has been amended as follows: 
1. (Currently Amended) A system for tagging datasets comprising:
at least one processor; and
at least one non-transitory memory storing instructions that, when executed by
the at least one processor cause the at least one processor to perform operations comprising:
receiving a dataset;
applying a series of nodes to the dataset, comprising:
applying a first node of the series of nodes to the dataset, the first node comprising at least one first machine learning model outputting at least one first probability, the first node preceding a plurality of second nodes in the series of nodes;
determining a first tag based on the at least one first probability; 
applying a transition rule to the at least one first probability;
in response to a result of the application of the transition rule to the at least one first probability:
selecting a first one of the second nodes for processing the dataset;
processing the dataset with the selected first one of the second nodes; and
selectively preventing a data flow of the dataset through a second one of the second nodes and a third node following the second one of the second nodes; and 
training the series of nodes, the training comprising:
training the at least one first machine learning model to classify a column of data within a first category, each of the second nodes comprising a second machine learning model; and
training the second machine learning models to classify the column of data within a plurality of subcategories of the first category;
generating a data structure comprising the first and second probabilities and the first and second tags; and
outputting the data structure as metadata.
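
For illustration only, the routing recited in claim 1 can be sketched as follows. This is a minimal sketch and not the applicant's implementation: the names Node, top_tag and tag_column, the toy models, and the use of a single inequality (probability >= threshold) as the transition rule are all assumptions made for the example. Nodes that are not selected, and any nodes following them, are simply never applied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a trained model: maps a column to {tag: probability}.
Model = Callable[[List[str]], Dict[str, float]]


@dataclass
class Node:
    name: str
    model: Model
    # Following nodes, keyed by the tag that routes data to them (assumed layout).
    children: Dict[str, "Node"] = field(default_factory=dict)


def top_tag(probabilities):
    """Return the most probable tag and its probability."""
    tag = max(probabilities, key=probabilities.get)
    return tag, probabilities[tag]


def tag_column(first_node, column, threshold=0.5):
    """Apply the first node, then at most one second node chosen by the rule."""
    applied = [first_node.name]
    tag, p = top_tag(first_node.model(column))
    metadata = [{"tag": tag, "probability": p}]
    # Assumed transition rule: route onward only if the inequality p >= threshold holds.
    selected = first_node.children.get(tag) if p >= threshold else None
    if selected is not None:
        applied.append(selected.name)
        tag2, p2 = top_tag(selected.model(column))
        metadata.append({"tag": tag2, "probability": p2})
    # Unselected second nodes and the nodes after them are never invoked.
    return metadata, applied


# Toy models, purely illustrative.
def first_model(column):
    share = sum(v.isdigit() for v in column) / len(column)
    return {"numeric": share, "text": 1.0 - share}


def numeric_model(column):
    short = sum(len(v) <= 5 for v in column) / len(column)
    return {"zip_code": short, "account_number": 1.0 - short}


def text_model(column):
    return {"name": 0.5, "other": 0.5}


tree = Node("first", first_model, {
    "numeric": Node("numeric", numeric_model),
    "text": Node("text", text_model, {"name": Node("name_detail", text_model)}),
})

metadata, applied = tag_column(tree, ["02139", "10001", "94105", "n/a"])
```

In this sketch the returned list of tag and probability records plays the role of the data structure output as metadata, and the "text" node and the "name_detail" node following it receive no data.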

2. (Currently Amended) The system of claim 1, wherein:
applying the first node comprises applying the trained at least one first machine learning model to the dataset; and
selecting the first one of the second nodes comprises selecting one of the trained second machine learning models.



4. (Previously Presented) The system of claim 1, wherein the rule comprises one or more inequalities.

5. (Previously Presented) The system of claim 4, wherein the transition rule further comprises a threshold configured to halt the application processing of the dataset.

6. (Previously Presented) The system of claim 1, wherein the second tag comprises a subcategory of the first tag.

7. (Original) The system of claim 1, wherein the data structure comprises at least one of
a tree structure or a vector.

8. (Previously Presented) The system of claim 1, wherein the at least one machine learning
model comprises at least one of a neural network type, a Bayesian network type,
a support vector machine type, or a random forest type.

9. (Currently Amended) The system of claim 1, wherein the training comprises 



11. (Previously Presented) A system for building an ensemble model for tagging datasets, comprising:
at least one processor; and
at least one non-transitory memory storing instructions that, when executed by the at least one processor cause the at least one processor to perform operations comprising:
training a series of nodes, the nodes comprising machine learning models, the training comprising:
training at least one first machine learning model to classify a column of data within a first category, the first category comprising a plurality of subcategories;
training a plurality of second machine learning models to classify the column of data; and
iteratively training third machine learning models to perform subclassifications of the subcategories; 
arranging the trained first, second, and third models in the series of nodes by assigning the trained first, second, and third models to the nodes according to hierarchical levels of the first category, the subcategories, and the subclassifications; and
determining a plurality of transition probabilities governing data movement between the arranged models by applying one or more respective transition rules to outputs from the first, second, and third models in the series of nodes, the transition rules comprising one or more inequalities and a threshold configured to:

selectively prevent a flow of data through:
a second one of the nodes corresponding to a second one of the trained second models; and
a third one of the nodes following the second one of the nodes, the third one of the nodes corresponding to one of the trained third models.
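
For illustration only, a transition rule comprising one or more inequalities and a threshold, as recited in claim 11, might be represented as sketched below. The class TransitionRule, its fields, the helper open_paths, and the example node names and bounds are hypothetical; the claim does not specify any particular representation.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Tuple

# An inequality is a comparison function plus the bound it compares against.
Inequality = Tuple[Callable[[float, float], bool], float]


@dataclass(frozen=True)
class TransitionRule:
    target: str                     # node guarded by this rule (hypothetical name)
    inequalities: List[Inequality]  # all must hold for data to flow to the target
    halt_below: float = 0.0         # assumed threshold that stops the flow entirely

    def allows(self, probability: float) -> bool:
        if probability < self.halt_below:
            return False
        return all(op(probability, bound) for op, bound in self.inequalities)


def open_paths(rules, probability):
    """Return the targets that may receive data; all other nodes are skipped."""
    return [rule.target for rule in rules if rule.allows(probability)]


rules = [
    TransitionRule("address", [(operator.ge, 0.6)], halt_below=0.2),
    TransitionRule("name", [(operator.lt, 0.6), (operator.ge, 0.3)], halt_below=0.2),
]
```

Under these assumed bounds, a model output of 0.7 opens only the path to "address", 0.4 opens only the path to "name", and 0.1 falls below the threshold so that no following node receives data.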

12. (Original) The system of claim 11, wherein the at least one first machine learning
model and at least one of the second machine learning models comprise
different machine learning model types.

13. (Original) The system of claim 12, wherein the different machine learning model
types comprise at least one of a neural network type, a Bayesian network type, a
support vector machine type, or a random forest type.

14. (Original) The system of claim 11, wherein the operations further comprise training
the at least one first machine learning model and at least one of the second
machine learning models using different training sets.

15. (Previously Presented) The system of claim 11, wherein the operations further comprise:
training at least one replacement machine learning model using an updated
training set;

updating one or more of the transition probabilities to integrate the at least one
replacement machine learning model into the series of nodes.
 
16. (Original) The system of claim 15, wherein a machine learning model type of the at
least one replacement machine learning model is different than a machine
learning model type of the replaced model.

17. (Original) The system of claim 16, wherein the machine learning model types
comprise at least one of a neural network type, a Bayesian network type, a
support vector machine type, or a random forest type.

18. (Cancelled).

19. (Cancelled).

20. (Currently Amended) A system for tagging datasets, comprising:
at least one processor; and
at least one non-transitory memory storing instructions that, when executed by
the at least one processor cause the at least one processor to perform operations comprising:
receiving at least one dataset;
applying a series of nodes to the at least one dataset, comprising:

determining a first tag to the at least one dataset based on the at least one first probability;
applying a transition rule to the at least one first probability, the series of nodes further comprising second nodes;
in response to a result of the application of the transition rule:
selecting a first one of the second nodes for processing the dataset; and
selectively preventing a data flow of the dataset through a second one of the second nodes and one or more third nodes following the second one of the second nodes in the series of nodes;
applying the selected first node of the second nodes to output a second probability and an associated second tag;
terminating the iterative application upon one of the following conditions:
a final node in the series has been applied; or
the second probability is below a threshold;
training the series of nodes, the training comprising:
training the first machine learning model to classify a column of data within a first category, each of the second nodes comprising a second machine learning model; and
training the second machine learning models to classify the column of data within a plurality of subcategories of the first category;
generating a data structure comprising the first and second probabilities and the first and second tags; and
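
For illustration only, the two termination conditions recited in claim 20 (a final node has been applied, or a probability is below a threshold) can be sketched as a simple loop. The names ChainNode and apply_series are hypothetical, each node carries a fixed (tag, probability) pair in place of a trained model, and applying one threshold to every node's probability is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ChainNode:
    name: str
    output: Tuple[str, float]  # fixed (tag, probability); stands in for a trained model
    next_nodes: Dict[str, "ChainNode"] = field(default_factory=dict)


def apply_series(first: ChainNode, threshold: float) -> List[dict]:
    """Apply nodes one after another until a final node or a low probability."""
    records = []
    node: Optional[ChainNode] = first
    while node is not None:
        tag, probability = node.output
        records.append({"node": node.name, "tag": tag, "probability": probability})
        if probability < threshold:
            break  # condition: probability below the threshold
        node = node.next_nodes.get(tag)  # None once a final node has been applied
    return records


leaf = ChainNode("zip", ("zip_code", 0.9))
confident = ChainNode(
    "first", ("address", 0.95),
    {"address": ChainNode("address", ("postal", 0.8), {"postal": leaf})},
)
unsure = ChainNode(
    "first", ("address", 0.95),
    {"address": ChainNode("address", ("postal", 0.3), {"postal": leaf})},
)
```

With a threshold of 0.5, the first series runs through all three nodes and stops at the final node, while the second stops after the node whose probability is 0.3.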


21. (Currently Amended) The system of claim 1, wherein:
the is trained to perform a first classification task[[,]];
the selected first one of the second nodes comprises [[a]] the second machine learning model trained to perform a second classification task different from the first classification task; and 
applying the selected first one of the second nodes comprises applying the second machine learning model to the dataset.

22. (Previously Presented) The system of claim 20, wherein the second tag comprises a subcategory of the first tag.

23. (Previously Presented) The system of claim 20, wherein the first machine learning model comprises at least one of: a neural network model, a Bayesian network model, a support vector machine model, or a random forest model.

Allowable Subject Matter
The following is an examiner’s statement of reasons for allowance:
The currently amended claims 1-2, 4-17 and 20-23 are allowed over the prior art of record.


The prior art of record Ramachandran et al. (U.S. Patent Application Pub. No. 2016/0314123 A1, hereinafter “Ramachandran”) discloses that “the first layer of classification assigns an entity type to a token based on the different identities of the token based on the different contexts … inferred from the training set” and “a set of logical tables, containing data fitted into predefined categories” where “Each table generally contains one or more data categories logically arranged as columns” (see, e.g., paragraphs 34 and 75).

The prior art of record Guttmann et al. (U.S. Patent Application Pub. No. 2019/0294999 A1, hereinafter “Guttmann”) discloses that “datasets 610 may comprise data-points, and view 630 may comprise a rule for merging data-points, a rule for selecting a subset of the data-points, and so forth.” (i.e., selecting data-points/nodes from datasets 610 based on rules), “a rule for merging the information corresponding to a data-point to obtain new annotation information. Such rule may prioritize information from one annotation source over others, may include a decision mechanism to produce new annotation and/or select an annotation out of the original annotations, and so forth.” (i.e., transition rules), “annotations 620 may comprise information corresponding to data-points (i.e., tagged data), and view 630 may comprise a rule for selecting information corresponding to a subset of the data-points” and “a decision rule may compare a computed value to a threshold” (i.e., selecting information/nodes based on 

However, the prior art of record does not anticipate, nor does it render obvious in any reasonable combination to one of ordinary skill in the art at the time of Applicant's invention, the combination of recited limitations of independent claim 1.
For example, the prior art of record does not anticipate or render obvious the limitations “applying a first node of the series of nodes to the dataset, the first node comprising at least one first machine learning model outputting at least one first probability, the first node preceding a plurality of second nodes in the series of nodes;
determining a first tag based on the at least one first probability; 
applying a transition rule to the at least one first probability;
in response to a result of the application of the transition rule to the at least one first probability:
selecting a first one of the second nodes for processing the dataset;
processing the dataset with the selected first one of the second nodes; and
selectively preventing a data flow of the dataset through a second one of the second nodes and a third node following the second one of the second nodes; and 

training the series of nodes, the training comprising:
training the at least one first machine learning model to classify a column of data within a first category, each of the second nodes comprising a second machine learning model; and
training the second machine learning models to classify the column of data within a plurality of subcategories of the first category” as recited in independent claim 1 in combination with the other limitations in this claim.

Independent claims 11 and 20 recite similar distinguishing features.
Thus, independent claims 1, 11 and 20 are patently distinct over the prior art of record for at least the reasons above. 
The remaining claims are dependent claims; thus, they are also patently distinct over the prior art of record for at least the reasons above.
In particular, claims 2, 4-10 and 21, 12-17, and 22-23 each depend directly or indirectly from independent claims 1, 11 and 20, respectively, and as such, claims 2, 4-10 and 21, 12-17, and 22-23 each include all of the limitations of base claims 1, 11 and 20, respectively.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled "Comments on Statement of Reasons for Allowance."

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANDY K BALDWIN whose telephone number is (571)270-5222. The examiner can normally be reached on Mon - Fri 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.







/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125