DETAILED ACTION
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This communication is in response to the Applicant’s submission filed 31 January 2020, where:
Claims 1-14 are pending.
Claims 1-14 are rejected.
Information Disclosure Statement
3.	An information disclosure statement was submitted on 17 June 2020. The submission complies with the provisions of 37 CFR 1.97. Accordingly, the Examiner considered the information disclosure statement.
Claim Rejections - 35 U.S.C. § 103
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
5.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. 	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
6.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
7.	Claims 1-4, 7-11, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent 11048979 to Zhdanov et al. [hereinafter Zhdanov] in view of US Published Application 20180240031 to Huszar et al. [hereinafter Huszar] and Vongkulbhisal et al., “Unifying Heterogeneous Classifiers with Distillation,” IEEE (2019) [hereinafter Vongkulbhisal].
Regarding claims 1 and 8, Zhdanov teaches [a] method and a server for managing a dataset (Zhdanov 18:65-66 teaches a server; Zhdanov, 3:60-66, teaches [e]ach dataset can be a collection of homogeneous pieces of data (such as image data, video data, comma separated values (CSV) files, etc.). A dataset may be a raw unlabeled dataset, a partially labeled dataset, a gold standard dataset, or a training dataset (that is, data items). As used herein, a gold standard dataset may refer to a dataset that has been verified as being accurately labeled), the dataset comprising data items, labelling tasks associated to the data items and labels corresponding to answers to the labelling tasks (Zhdanov 4:15-18 teaches a label refers to the true underlying object property (that is, each “true underlying object property” are labels corresponding to the labelling tasks), while annotations refer to the tags or other outputs by a labeling task (e.g., by a human labeler or machine annotation) (that is, label tasks associated to the data items)), the method comprising:
determining an artificial intelligence (AI) model to be used on the dataset (Zhdanov 7:23-27, teaches [i]f there is no pre-trained model, but there is some prelabeled data, this data is used to train a default model for the selected modality. Even if the labels are not very reliable, prelabeled data can be used to train the model (that is, the “default model” being “trainable” is an artificial intelligence (AI) model, and accordingly, is determining an artificial intelligence (AI) model to be used on the dataset));
creating a data mask describing a labeling status of the data items of the dataset (Vongkulbhisal, right column of p. 3178, “3.4.1 Matrix Factorisation in Probability Space,” first paragraph, teaches [t]o account for these missing predictions, we define M ∈ {0, 1}^{L×N} as a mask matrix where M_{li} is 1 if l ∈ L_i and zero otherwise; Dutta, 2:59-64, teaches input data includes a mixture of labelled and unlabeled example; Zheng, right column of p. 4, “6.1 Training Data,” first paragraph); and
repeating a loop, until patience parameters are satisfied (Zhdanov, 2:52-57, teaches [a]s subsets of the dataset are labeled, this label data is used to train a model which can then identify additional objects in the dataset without manual intervention. The process may continue iteratively until the model converges (e.g., identifies objects within an accuracy threshold (that is, patience parameters)) (that is, repeating a loop, until patience parameters are satisfied)):
receiving one or more trusted labels provided by one or more trusted data labelers (Zhdanov, 4:15-18, teaches a label refers to the true underlying object property (that is, a “label” is one or more trusted labels), while annotations refer to the tags or other outputs by a labeling task (e.g., by a human labeler or machine annotation); Zhdanov, 5:19-23, teaches [active learning service (ALS)] 112 can perform active learning for unlabeled or partially unlabeled datasets and use machine learning to evaluate unlabeled raw datasets and provide input into the data labeling process by identifying a subset of the input data to be labeled by manual labelers (that is, receiving one or more trusted labels provided by one or more trusted data labelers));
updating . . . by changing the labeling status of the data items for which a trusted label is received (Zhdanov, 8:44-49, teaches the dataset may include a manifest file which describes dataset properties and records. A record may include named attributes, including metadata such as image size, or labels such as "dog" or "cat". Other attributes may include raw data which needs labeling (that is, labeling status of the data items for which a trusted label is received), such as image or sentences in natural language processing (NLP); Zhdanov 13:39-43 teaches “updating,” in which training the machine learning model using the plurality of labels to generate an updated machine learning model (that is, for an “updated machine learning model,” the labeling status of the data items for which a trusted label is received is by changing the labeling status), wherein the updated detection model is used to perform auto-labeling of in a next iteration of the active learning loop);
from a labelled data items subset . . . training the AI model (Zhdanov, 2:52-55, teaches [a]s subsets of the dataset are labeled, this label data is used to train a model which can then identify additional objects in the dataset without manual intervention);
* * *
from an unlabelled data items subset . . . creating a randomized unlabeled subset having fewer members than the unlabelled data items subset (Zhdanov, 5:23-26, teaches ALS 112 randomly selects a sample of the input dataset for labeling. In some embodiments, ALS 112 selects the subset of the dataset using uncertainty sampling (that is, creating a randomized unlabeled subset having fewer members than the unlabeled data items subset));
at a cluster manager server (Zhdanov 19:2-4 teaches server(s) also may be capable of executing programs or scripts in response requests from user devices (that is, the “server(s)” are a cluster manager server); generally, the Specification recites that an “AI server 2100 may also comprise a cluster manager 2500.” (Specification ¶ 0033)), chunking the randomized unlabeled subset into a plurality of data subsets for dispatching to one or more of the processing nodes (Zhdanov, 5:24-25, teaches ALS 112 randomly selects a sample of the input dataset for labeling (that is, chunking the randomized unlabeled subset into a plurality of data subsets); Zhdanov 19:33-39 teaches [o]nce a subset of the input dataset is identified to be auto-labeled, the subset may be annotated. For example, in some embodiments, the subset may be sent to machine annotation service 114, as shown at numeral 4. Machine annotation service 114 may use an existing model that has been trained on the same or similar labelspace which is selected for the input dataset (that is, for dispatching to one or more of the processing nodes);
[Examiner notes that this “for dispatching” claim language is an intended use that is not positively recited by the claims]);
at the cluster manager server, receiving an indication that one or more predicted label answers have been inferred by the one or more processing nodes using the local AI model (Zhdanov 7:35-37 teaches [t]he main loop starts by running inference with the model on the validation dataset. After that, every object is given a confidence level (that is, receiving an indication that one or more predicted label answers have been inferred by the one . . . processing nodes using the local AI model)); and
computing a model uncertainty measurement from statistical analysis of the one or more predicted label answers (Zhdanov 7:43-46 teaches the inference on the unlabeled data is performed, and the threshold is applied on the resulting inferences. All objects with the confidence larger than the threshold get auto-annotated and put into the labeled dataset (that is, “confidence” is a measure of uncertainty, such that Zhdanov is computing a model uncertainty measurement from statistical analysis of the one or more predicted label answers));
wherein the patience parameters include one or more of: a threshold value on the model uncertainty measurement and information gain between different training cycles (Zhdanov, 13:8-10 teaches selecting the subset of the input dataset having a confidence score lower than a threshold value. . . . In some embodiments, the operations may further include . . . performing regression on the accuracy of the plurality of auto-annotations to determine a confidence interval (that is, a “confidence interval” comports to a model uncertainty measurement) for each accuracy value, and determining the threshold value based on the confidence interval for a selected accuracy value).
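For illustration only, and not as a characterization of Zhdanov or of the Applicant's implementation, the sampling, chunking, uncertainty, and stopping steps recited above can be sketched as follows. Every name here (`chunk`, `entropy`, `patience_satisfied`) is hypothetical, and the use of Shannon entropy over the predicted labels is an assumed example of a "statistical analysis"; neither the claims nor the cited art requires it.

```python
import math
import random
from collections import Counter


def chunk(items, n_chunks):
    """Split items into at most n_chunks contiguous parts of near-equal size."""
    if not items:
        return []
    size = math.ceil(len(items) / n_chunks)
    return [items[i:i + size] for i in range(0, len(items), size)]


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels.

    Assumed stand-in for a model uncertainty measurement.
    """
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def patience_satisfied(uncertainty, threshold):
    """Hypothetical stopping rule: stop once uncertainty is at or below threshold."""
    return uncertainty <= threshold


rng = random.Random(0)                # fixed seed so the sketch is repeatable
unlabelled = list(range(100))         # ids of unlabelled data items
subset = rng.sample(unlabelled, 10)   # randomized subset with fewer members
chunks = chunk(subset, 3)             # pieces to dispatch to processing nodes
```

Under these assumptions, a loop would repeat the sample, chunk, and infer steps and exit once `patience_satisfied` returns true for the measured uncertainty.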
Though Zhdanov teaches machine learning to automate annotation and management of the datasets to increase efficiency of labeling tasks and reducing the time required to perform labeling, Zhdanov does not explicitly teach -
* * *
cloning the trained AI model into a local AI model on the processing nodes;
* * *
But Huszar teaches -
* * *
cloning the trained AI model into a local AI model on the processing nodes (Huszar ¶ 0020 & Fig. 1, teaches an active learning system to build a highly accurate classifier or other machine learning system in less time and with greatly reduced number of labeled examples (Examiner annotations in dashed-text boxes):

[Huszar Fig. 1, with Examiner annotations in dashed-text boxes; image not reproduced]

Huszar ¶ 0026 teaches the committee generator 110 initializes each committee member by training it using one of the different training sets generated by the committee generator 110. For training of each committee member the system can use any algorithm for training a deep neural network, without modification (that is, “each committee member [of Deep Neural Network 150_1, thru 150_n]” is cloning the trained AI model into a local AI model on the processing nodes); Huszar ¶ 0027 teaches the committee generator 110 may train a single deep neural network on the labeled objects 2015 . . . [that] may be referred to as the source neural network (that is, the trained AI model));
* * *
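For illustration only, the claimed "cloning the trained AI model into a local AI model on the processing nodes" can be pictured as an independent copy per node. The sketch below is an assumption about one plain reading of the claim language; it is not Huszar's committee-generation procedure, which trains each member on a different training set. `TinyModel` and the node names are hypothetical.

```python
import copy


class TinyModel:
    """Hypothetical stand-in for a trained AI model."""

    def __init__(self):
        self.weights = []

    def train(self, values):
        # Toy "training": remember the mean of the labelled values.
        self.weights = [sum(values) / len(values)]


source_model = TinyModel()
source_model.train([1.0, 2.0, 3.0])

# One independent local copy per (hypothetical) processing node.
node_names = ["node-a", "node-b"]
local_models = {name: copy.deepcopy(source_model) for name in node_names}

# Changing one local copy leaves the source and the other copies untouched.
local_models["node-a"].weights[0] = 99.0
```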
Zhdanov and Huszar are from the same or similar field of invention. Zhdanov teaches machine learning to automate annotation and management of the datasets. Huszar teaches committee members, for data labelling, that the system can train using any algorithm for training a deep neural network. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Zhdanov pertaining to data labeling through machine learning with the artificial intelligence models of the committee members of Huszar.
The motivation for doing so is to learn a strong machine learning model from a much smaller set of labelled examples than is conventionally used to train a system. (Huszar ¶ 0012).
Though Zhdanov and Huszar teach the features of machine learning models for data labelling, the combination of Zhdanov and Huszar does not explicitly teach identifying unlabelled data items “using a data mask, . . . .”
But Vongkulbhisal teaches identifying unlabeled data items “using a data mask” (Vongkulbhisal, right column of p. 3178, “3.4.1 Matrix Factorisation in Probability Space,” first paragraph, teaches a matrix P ∈ [0, 1]^{L×N} where we set P_{li} (the element in row l and column i) to p_i(Y = l) if l ∈ L_i (that is, “labelled”) and zero otherwise (that is, “unlabeled”). This matrix P is similar to the decision profile matrix in ensemble methods [23], but here we fill in 0 for the classes that C_i’s cannot predict. To account for these missing predictions (that is, “unlabeled data”), we define M ∈ {0, 1}^{L×N} as a mask matrix where M_{li} is 1 if l ∈ L_i and zero otherwise (that is, “unlabeled data items”). With regard to updating the data mask, Vongkulbhisal shows an accuracy of the heterogeneous classifiers (Vongkulbhisal, Fig. 3(c))).
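For illustration only, the mask matrix in the passage quoted above can be built as shown below. The sketch assumes the reading that rows index the L classes and columns index the N classifiers C_i, each with a label set L_i; the function name `mask_matrix` and the example label sets are hypothetical and do not come from Vongkulbhisal.

```python
def mask_matrix(num_classes, label_sets):
    """Return M with M[l][i] = 1 if class l is in label set L_i, else 0."""
    return [
        [1 if l in label_set else 0 for label_set in label_sets]
        for l in range(num_classes)
    ]


# Three hypothetical classifiers, each able to predict a subset of 4 classes.
label_sets = [{0, 1}, {1, 2}, {0, 2, 3}]
M = mask_matrix(4, label_sets)
```

Each zero entry marks a class that the corresponding classifier cannot predict, which is the "missing prediction" the quoted passage accounts for.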
Zhdanov, Huszar, and Vongkulbhisal are from the same or similar field of invention. Zhdanov teaches machine learning to automate annotation and management of the datasets. Huszar teaches committee members, for data labelling, that the system can train using any algorithm for training a deep neural network. Vongkulbhisal teaches cross-entropy minimization and mask matrix factorisation methods for estimating soft labels of the unlabelled data. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Zhdanov and Huszar pertaining to data labeling through machine learning having the artificial intelligence models of the committee members with the mask matrix of Vongkulbhisal.
The motivation for doing so is to achieve a robust approach to unify heterogeneous classifiers into a single classifier. (Vongkulbhisal, right column of p. 3182, “5. Conclusion,” first paragraph).
Examiner notes that the term "processing module" recited in Applicant's claims is interpreted to be a well-known hardware structure. 
Examiner also notes that the Applicant’s preamble does not afford patentable weight to the Applicant’s claims because the claim preamble is not “necessary to give life, meaning, and vitality” to the claim. Moreover, because the Applicant’s preamble merely states the purpose or intended use of the invention rather than any distinct definition of any of the claimed invention’s limitations, the preamble is not considered a limitation and is of no significance to claim construction.
Regarding claims 2 and 9, the combination of Zhdanov, Huszar, and Vongkulbhisal teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
further comprising updating the dataset by concatenating the predicted label answers received from the one or more processing nodes into an updated dataset to be used in a next iteration of the loop (Zhdanov, 11:48-53, teaches [u]sing the new labeled portions of the input dataset, the machine learning model can be further trained. This active learning loop may then be repeated on new portions of the input dataset, with each iteration adding to the labeled dataset (that is, updating the dataset by concatenating) and further training the model, until the input dataset has been labeled (that is, an updated dataset to be used in a next iteration of the loop)).
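For illustration only, the concatenation recited in claims 2 and 9 can be sketched as a merge of per-node prediction results into a copy of the dataset. The function name, the dictionary layout, and the choice to keep an existing label over a predicted one are all assumptions made for the example; they are not drawn from Zhdanov or from the Specification.

```python
def concatenate_predictions(dataset, node_results):
    """Merge predicted labels from each node into a copy of the dataset.

    dataset maps item id -> label, with None meaning "not yet labelled".
    Existing labels are kept (an assumed policy for this sketch).
    """
    updated = dict(dataset)
    for result in node_results:
        for item_id, label in result.items():
            if updated.get(item_id) is None:
                updated[item_id] = label
    return updated


dataset = {"img1": "cat", "img2": None, "img3": None}
node_results = [{"img2": "dog"}, {"img3": "cat", "img1": "dog"}]
updated = concatenate_predictions(dataset, node_results)
```

The returned dictionary would serve as the dataset for the next iteration of the loop, while the input dictionary is left unchanged.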
Regarding claims 3 and 10, the combination of Zhdanov, Huszar, and Vongkulbhisal teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
wherein receiving the indication further comprises receiving a local model uncertainty measurement for the local AI model from the respective one or more processing nodes (Zhdanov, 6:1-4, teaches [a]nnotation consolidation may refer to the process of taking annotations from multiple annotators (e.g., humans and/or machines) (that is, receiving . . . for the local AI model from the respective one or more processing nodes) and consolidating these together (e.g., using majority-consensus heuristics, removing bias or low-quality annotators, using probabilistic distribution that minimizes a risk function for observed, predicted and true labels, or other techniques). For example, based on each annotators' accuracy history, their annotations can be weighted (that is, receiving a local model uncertainty measurement); Applicant’s specification recites “a model-uncertainty measurement representing the prediction confidence of the model may be computed for each data item” (Specification ¶ 0026); notably, Zhdanov 7:35-39 teaches running inference with the model on the validation dataset. After that, every object is given a confidence level (that is, confidence level is synonymous with “uncertainty measurement”)).
Regarding claims 4 and 11, the combination of Zhdanov, Huszar, and Vongkulbhisal teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
further comprising receiving a computed information gain from the one or more processing nodes (Zhdanov 6:55-61 teaches the data labeling service 108 may also output various performance metrics, such as performance against the annotation budget, quality score of annotated labels and performance against the defined quality threshold, logs and metrics in a monitoring dashboard, and/or an audit trail of annotations tasks as performed by annotators (that is, the “various performance metrics” are receiving a computed information gain from the one or more processing nodes); the Specification recites that “[t]he information gain may be seen as the amount of information gained by training the AI model on a new trusted label of a labeling task. . . . In a preferred embodiment, the information gain may be considered as an average accuracy gain of the model over several iterations of the training” (Specification ¶ 0040); accordingly, Examiner notes that the “computed information gain” of the claims has a BRI that covers the performance metrics of Zhdanov).
Regarding claims 7 and 14, the combination of Zhdanov, Huszar, and Vongkulbhisal teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
wherein the data mask is a vector in which each component is 1 if the components is related to a labelled data item and 0 if the components is related to an unlabelled data item (Vongkulbhisal, right column of p. 3178, “3.4.1 Matrix Factorisation in Probability Space,” first paragraph, teaches a matrix P ∈ [0, 1]^{L×N} where we set P_{li} (the element in row l and column i) to p_i(Y = l) if l ∈ L_i (that is, each component is 1 if the component is related to a labelled data item) and zero otherwise (that is, “unlabeled”). This matrix P is similar to the decision profile matrix in ensemble methods [23], but here we fill in 0 for the classes that C_i’s cannot predict. To account for these missing predictions (that is, “unlabeled data”), we define M ∈ {0, 1}^{L×N} as a mask matrix (that is, a vector) where M_{li} is 1 if l ∈ L_i and zero otherwise (that is, 0 if the components is related to an unlabeled data item). With regard to updating the data mask, Vongkulbhisal shows an accuracy of the heterogeneous classifiers (Vongkulbhisal, Fig. 3(c))).
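For illustration only, the data mask of claims 7 and 14 (a vector whose components are 1 for labelled data items and 0 for unlabelled ones) can be sketched as follows. The helper names and the convention that `None` denotes a missing label are assumptions made for the example.

```python
def make_mask(labels):
    """Return 1 for each labelled item and 0 for each unlabelled item."""
    return [0 if label is None else 1 for label in labels]


def mark_labelled(mask, index):
    """Return a new mask with the item at index marked as labelled."""
    new_mask = list(mask)
    new_mask[index] = 1
    return new_mask


def unlabelled_indices(mask):
    """Positions of the data items that still need a label."""
    return [i for i, bit in enumerate(mask) if bit == 0]


labels = ["cat", None, "dog", None]
mask = make_mask(labels)
mask_after = mark_labelled(mask, 1)   # a trusted label arrives for item 1
```

In this sketch, updating the mask when a trusted label is received amounts to flipping a single component from 0 to 1.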
8.	Claims 5, 6, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent 11048979 to Zhdanov et al. [hereinafter Zhdanov] in view of US Published Application 20180240031 to Huszar et al. [hereinafter Huszar] and Vongkulbhisal et al., “Unifying Heterogeneous Classifiers with Distillation,” IEEE (2019) [hereinafter Vongkulbhisal], and further in view of US Patent 11120364 to Gokalp et al. [hereinafter Gokalp].
Regarding claims 5 and 12, the combination of Zhdanov, Huszar, and Vongkulbhisal teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
further comprising receiving a computed relevancy values for one or more predicted labels from the one or more processing nodes (Gokalp, 39:40-60, teaches class descriptors parameter 2305 may be used to specify the target classes into which data items are to be categorized in the depicted embodiment, and/or one or more specific attribute values (such as keywords included in the titles/descriptions of the data items) that may be used to create an initial training set. . . . Attribute descriptors 2314 may indicate the names and descriptions of various relevant attributes of the data items (that is, “attribute values” and “attribute descriptors” are receiving a computed relevancy values for one or more predicted labels from the one or more processing nodes), and how various relevant attributes of data items may be parsed/extracted from the raw data items if needed in at least some embodiments).
Zhdanov, Huszar, Vongkulbhisal, and Gokalp are from the same or similar field of invention. Zhdanov teaches machine learning to automate annotation and management of the datasets. Huszar teaches committee members, for data labelling, that the system can train using any algorithm for training a deep neural network. Vongkulbhisal teaches cross-entropy minimization and mask matrix factorisation methods for estimating soft labels of the unlabelled data. Gokalp teaches an artificial intelligence system in which respective status indicators for classifier training iterations are determined. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Zhdanov, Huszar, and Vongkulbhisal pertaining to data labeling through machine learning having the artificial intelligence models of the committee members using a mask matrix with the attribute values of the class descriptor parameters of Gokalp.
The motivation for doing so is for efficient training of machine learning models such as classifiers using an automated workflow. (Gokalp 3:49-51).
Regarding claims 6 and 13, the combination of Zhdanov, Huszar, Vongkulbhisal, and Gokalp teaches all of the limitations of claims 5 and 12, respectively, as described above in detail.
further comprising requesting trusted labels for data items having associated therewith higher relevancy value compared to other ones of the data items (Gokalp 26:24-36 teaches a prediction score which indicates that the current user-suggested label is incorrect (that is, a lower relevancy value) may be indicated in the reconsideration request . . . . [T]he request to reconsider a previously-supplied label may be sent to a different individual/user than the source of the previously-supplied label-e.g., to one of a set of trusted individuals who are permitted to change previously-provided labels (that is, requesting trusted labels for data items having associated therewith higher relevancy value compared to other ones of the data items)).
Conclusion
9.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
(US Published Application 20130097103 to Chari et al.) teaches in each iteration, semi-supervised clustering is used to embed prior knowledge (i.e., labeled samples) to produce clusters close(r) to the true classes. For each data set 80% of the data set was randomly selected to be used to generate the training set and use classifiers trained with this training set to classify the remaining 20% of the samples.
(US Patent 8554703 to Lin et al.) teaches a recency weighted predictive model can be generated by creating a clone of a corresponding trained predictive model from the predictive model repository 215. The recency weighted predictive model is updated with the training data stored in the training data queue 213.
(Kittler et al., “On Combining Classifiers,” IEEE 1998) teaches the combination rule developed under the most restrictive assumptions—the sum rule—outperforms other classifier combinations schemes. A sensitivity analysis of the various schemes to estimation errors is carried out to show that this finding can be justified theoretically.
10.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI, can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.L.S./
Examiner, Art Unit 2122
/KAKALI CHAKI/
Supervisory Patent Examiner, Art Unit 2122