DETAILED ACTION
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
2.	This communication is in response to the Applicant’s submission filed 02 May 2022 [hereinafter Response], where:
Claims 1 and 8 have been amended.
Claims 1-14 are pending.
Claims 1-14 are rejected.
Claim Rejections - 35 U.S.C. § 103
3.	The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
4.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. 	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
5.	This application currently names joint inventors. In considering patentability of the claims the Examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the Examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
6.	Claims 1-4, 7-11, and 14 are rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 11048979 to Zhdanov et al. [hereinafter Zhdanov] in view of US Published Application 20180240031 to Huszar et al. [hereinafter Huszar] and Vongkulbhisal et al., “Unifying Heterogeneous Classifiers with Distillation,” IEEE (2019) [hereinafter Vongkulbhisal].
Regarding claims 1 and 8, Zhdanov teaches [a] method and a server for managing a dataset (Zhdanov 18:65-66 teaches a server; Zhdanov, 3:60-66, teaches [e]ach dataset can be a collection of homogeneous pieces of data (such as image data, video data, comma separated values (CSV) files, etc.). A dataset may be a raw unlabeled dataset, a partially labeled dataset, a gold standard dataset, or a training dataset (that is, data items). As used herein, a gold standard dataset may refer to a dataset that has been verified as being accurately labeled), the dataset comprising data items, labelling tasks associated to the data items and labels corresponding to answers to the labelling tasks (Zhdanov 4:15-18 teaches a label refers to the true underlying object property (that is, each “true underlying object property” are labels corresponding to the labelling tasks), while annotations refer to the tags or other outputs by a labeling task (e.g., by a human labeler or machine annotation) (that is, label tasks associated to the data items)), the method comprising:
determining an artificial intelligence (AI) model to be used on the dataset (Zhdanov 7:23-27, teaches [i]f there is no pre-trained model, it but there is some prelabeled data, this data is used to train a default model for the selected modality. Even if the labels are not very reliable, prelabeled data can be used to train the model (that is, the “default model” being “trainable” is an artificial intelligence (AI) model, and accordingly, is determining an artificial intelligence (AI) model to be used on the dataset));
creating a data mask describing a labeling status of the data items of the dataset (Vongkulbhisal, right column of p. 3178, “3.4.1 Matrix Factorisation in Probability Space,” first paragraph, teaches [t]o account for these missing predictions, we define M ∈ {0, 1}^{L×N} as a mask matrix where M_{li} is 1 if l ∈ L_i and zero otherwise); and
repeating a loop, until patience parameters are satisfied (Zhdanov, 2:52-57, teaches [a]s subsets of the dataset are labeled, this label data is used to train a model which can then identify additional objects in the dataset without manual intervention. The process may continue iteratively until the model converges (e.g., identifies objects within an accuracy threshold (that is, patience parameters)) (that is, repeating a loop, until patience parameters are satisfied)):
receiving one or more trusted labels provided by one or more trusted data labelers (Zhdanov, 4:15-18, teaches a label refers to the true underlying object property (that is, a “label” is one or more trusted labels), while annotations refer to the tags or other outputs by a labeling task (e.g., by a human labeler or machine annotation); Zhdanov, 5:19-23, teaches [active learning service (ALS)] 112 can perform active learning for unlabeled or partially unlabeled datasets and use machine learning to evaluate unlabeled raw datasets and provide input into the data labeling process by identifying a subset of the input data to be labeled by manual labelers (that is, receiving one or more trusted labels provided by one or more trusted data labelers));
updating . . . by changing the labeling status of the data items for which a trusted label is received (Zhdanov, 8:44-49, teaches the dataset may include a manifest file which describes dataset properties and records. A record may include named attributes, including metadata such as image size, or labels such as "dog" or "cat". Other attributes may include raw data which needs labeling (that is, labeling status of the data items for which a trusted label is received), such as image or sentences in natural language processing (NLP); Zhdanov 13:39-43 teaches “updating,” in which training the machine learning model using the plurality of labels to generate an updated machine learning model (that is, for an “updated machine learning model,” the labeling status of the data items for which a trusted label is received is by changing the labeling status), wherein the updated detection model is used to perform auto-labeling of in a next iteration of the active learning loop);
from a labelled data items subset . . . training the AI model (Zhdanov, 2:52-55, teaches [a]s subsets of the dataset are labeled, this label data is used to train a model which can then identify additional objects in the dataset without manual intervention);
* * *
from an unlabelled data items subset . . . creating a randomized unlabeled subset having fewer members than the unlabelled data items subset (Zhdanov, 5:23-26, teaches ALS 112 randomly selects a sample of the input dataset for labeling. In some embodiments, ALS 112 selects the subset of the dataset using uncertainty sampling (that is, creating a randomized unlabeled subset having fewer members than the unlabeled data items subset));
at a cluster manager server (Zhdanov 19:2-4 teaches server(s) also may be capable of executing programs or scripts in response requests from user devices (that is, the “server(s)” are a cluster manager server); generally, the Specification recites that an “AI server 2100 may also comprise a cluster manager 2500.” (Specification ¶ 0033)), chunking the randomized unlabeled subset into a plurality of data subsets, and dispatching the chunked data subsets to one or more of the processing nodes (Zhdanov, 5:24-25, teaches ALS 112 randomly selects a sample of the input dataset for labeling (that is, chunking the randomized unlabeled subset into a plurality of data subsets); Zhdanov 19:33-39 teaches [o]nce a subset of the input dataset is identified (that is, the “subset” is chunked) to be auto-labeled, the subset may be annotated. For example, in some embodiments, the subset may be sent to machine annotation service 114, as shown at numeral 4. Machine annotation service 114 may use an existing model that has been trained on the same or similar labelspace which is selected for the input dataset (that is, for dispatching to one or more of the processing nodes); also Zhdanov 7:17-19 teaches a random subset of the unlabeled dataset is selected for validation and sent to [a plurality of] human annotators (that is, the “random subset” being “sent to a plurality of human annotators” is chunking));
at the cluster manager server, receiving an indication that one or more predicted label answers have been inferred by the one or more processing nodes using the local AI model (Zhdanov 7:35-37 teaches [t]he main loop starts by running inference with the model on the validation dataset. After that, every object is given a confidence level (that is, receiving an indication that one or more predicted label answers have been inferred by the one . . . processing nodes using the local AI model)); and
computing a model uncertainty measurement from statistical analysis of the one or more predicted label answers (Zhdanov 7:43-46 teaches the inference on the unlabeled data is performed, and the threshold is applied on the resulting inferences. All objects with the confidence larger than the threshold get auto-annotated and put into the labeled dataset (that is, “confidence” is a measure of uncertainty, such that Zhdanov is computing a model uncertainty measurement from statistical analysis of the one or more predicted label answers));
wherein the patience parameters include one or more of: a threshold value on the model uncertainty measurement and information gain between different training cycles (Zhdanov, 13:8-10 teaches selecting the subset of the input dataset having a confidence score lower than a threshold value. . . . In some embodiments, the operations may further include . . . performing regression on the accuracy of the plurality of auto-annotations to determine a confidence interval (that is, a “confidence interval” comports to a model uncertainty measurement) for each accuracy value, and determining the threshold value based on the confidence interval for a selected accuracy value).
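For illustration only, and without characterizing the implementation of the Applicant or of any cited reference, the recited “model uncertainty measurement” and “patience parameters” may be sketched as follows. The sketch assumes that Shannon entropy over the predicted label answers serves as the “statistical analysis,” and that an accuracy difference between consecutive training cycles serves as the “information gain” (cf. Specification ¶ 0040); the function names, the threshold values, and the choice of statistic are hypothetical and are not recited in the claims.

```python
import math
from collections import Counter


def model_uncertainty(predicted_labels):
    """Shannon entropy (in bits) of the predicted label answers for one data item.

    Assumption: entropy is only one possible statistic; the claims do not
    name a specific one.
    """
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def patience_satisfied(uncertainty, accuracy_history,
                       max_uncertainty=0.5, min_gain=0.01):
    """Hypothetical stopping rule for the loop.

    The loop stops when the uncertainty is at or below a threshold, or when
    the accuracy gain between the last two training cycles falls below min_gain.
    """
    below_threshold = uncertainty <= max_uncertainty
    stalled = False
    if len(accuracy_history) >= 2:
        gain = accuracy_history[-1] - accuracy_history[-2]
        stalled = gain < min_gain
    return below_threshold or stalled
```

Under these assumptions, unanimous predictions from the processing nodes yield an uncertainty of zero, while an even split between two labels yields one bit.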
Though Zhdanov teaches machine learning to automate annotation and management of the datasets using pre-trained supervised machine learning model (that is, a trained AI model) to increase efficiency of labeling tasks and reduce the time required to perform labeling (Zhdanov 7:3-7), Zhdanov does not explicitly teach -
* * *
cloning the . . . AI model into a local AI model on the processing nodes;
* * *
But Huszar teaches -
* * *
cloning the . . . AI model into a local AI model on the processing nodes (Huszar ¶ 0009 teaches [i]n Bayesian inference, the answer to a machine learning problem is not just a single deep learning model, but a whole distribution of deep learning models, called the posterior distribution (that is, each “deep learning model” is a local AI model on the processing nodes); Huszar ¶ 0020 & Fig. 1, teaches an active learning system to build a highly accurate classifier or other machine learning system in less time and with greatly reduced number of labeled examples (Examiner annotations in dashed-text boxes):

[Figure: Huszar, Fig. 1, with Examiner annotations (media_image1.png)]

Huszar ¶ 0026 teaches the committee generator 110 initializes each committee member by training it using one of the different training sets generated by the committee generator 110. For training of each committee member the system can use any algorithm for training a deep neural network, without modification (that is, “each committee member [of Deep Neural Network 150_1, thru 150_n]” is cloning the . . . AI model into a local AI model on the processing nodes); Huszar ¶ 0027 teaches the committee generator 110 may train a single deep neural network on the labeled objects 2015 . . . [that] may be referred to as the source neural network (that is, the trained AI model));
* * *
Zhdanov and Huszar are from the same or similar field of invention. Zhdanov teaches machine learning to automate annotation and management of the datasets. Huszar teaches committee members for data labelling, for each of which the system can use any algorithm for training a deep neural network. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Zhdanov pertaining to data labeling through machine learning with the artificial intelligence models of the committee members of Huszar.
The motivation for doing so is to learn a strong machine learning model from a much smaller set of labelled examples than is conventionally used to train a system. (Huszar ¶ 0012).
Though Zhdanov and Huszar teach the features of machine learning models for data labelling, the combination of Zhdanov and Huszar, however, does not explicitly teach identifying unlabelled data items “using a data mask, . . . .”
But Vongkulbhisal teaches identifying unlabeled data items “using a data mask.” (Vongkulbhisal, right column of p. 3178, “3.4.1 Matrix Factorisation in Probability Space,” first paragraph, teaches a matrix P ∈ [0, 1]^{L×N} where we set P_{li} (the element in row l and column i) to p_i(Y = l) if l ∈ L_i (that is, “labelled”) and zero otherwise (that is, “unlabeled”). This matrix P is similar to the decision profile matrix in ensemble methods [23], but here we fill in 0 for the classes that C_i’s cannot predict. To account for these missing predictions (that is, “unlabeled data”), we define M ∈ {0, 1}^{L×N} as a mask matrix where M_{li} is 1 if l ∈ L_i and zero otherwise (that is, “unlabeled data items”). With regard to updating the data mask, Vongkulbhisal teaches varying an accuracy of the heterogeneous classifiers (see Vongkulbhisal, Fig. 3(c); also, Vongkulbhisal teaches an adjustment set, in which [t]o vary the accuracy of each [classifier] C_i, we take 50 samples per class from training data as the adjustment set, completely train each C_i from the remaining training data, then inject increasing Gaussian noise into the last [fully connected] layer until its accuracy on the adjustment set drops to the desired value (Vongkulbhisal, right column at p. 3182, “4.2 Sensitivity Analysis,” first partial paragraph))).
Zhdanov, Huszar, and Vongkulbhisal are from the same or similar field of invention. Zhdanov teaches machine learning to automate annotation and management of the datasets. Huszar teaches committee members for data labelling, for each of which the system can use any algorithm for training a deep neural network. Vongkulbhisal teaches cross-entropy minimization and mask matrix factorisation methods for estimating soft labels of the unlabelled data. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Zhdanov and Huszar pertaining to data labeling through machine learning having the artificial intelligence models of the committee members with the mask matrix of Vongkulbhisal.
The motivation for doing so is to achieve a robust approach to unify heterogeneous classifiers into a single classifier. (Vongkulbhisal, right column of p. 3182, “5. Conclusion,” first paragraph).
Examiner notes that the term "processing module" recited in Applicant's claims is interpreted to be a well-known hardware structure. 
Examiner also notes that the Applicant’s preamble does not afford patentable weight to the Applicant’s claims because the claim preamble is not “necessary to give life, meaning, and vitality” to the claim. Moreover, because the Applicant’s preamble merely states the purpose or intended use of the invention rather than any distinct definition of any of the claimed invention’s limitations, the preamble is not considered a limitation and is of no significance to claim construction.
Regarding claims 2 and 9, the combination of Zhdanov, Huszar, and Vongkulbhisal teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
further comprising updating the dataset by concatenating the predicted label answers received from the one or more processing nodes into an updated dataset to be used in a next iteration of the loop (Zhdanov, 11:48-53, teaches [u]sing the new labeled portions of the input dataset, the machine learning model can be further trained. This active learning loop may then be repeated on new portions of the input dataset, with each iteration adding to the labeled dataset (that is, updating the dataset by concatenating) and further training the model, until the input dataset has been labeled (that is, an updated dataset to be used in a next iteration of the loop)).
Regarding claims 3 and 10, the combination of Zhdanov, Huszar, and Vongkulbhisal teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
wherein receiving the indication further comprises receiving a local model uncertainty measurement for the local AI model from the respective one or more processing nodes (Zhdanov, 6:1-4, teaches [a]nnotation consolidation may refer to the process of taking annotations from multiple annotators (e.g., humans and/or machines) (that is, receiving . . . for the local AI model from the respective one or more processing nodes) and consolidating these together (e.g., using majority-consensus heuristics, removing bias or low-quality annotators, using probabilistic distribution that minimizes a risk function for observed, predicted and true labels, or other techniques). For example, based on each annotators' accuracy history, their annotations can be weighted (that is, receiving a local model uncertainty measurement); Applicant’s specification recites “a model-uncertainty measurement representing the prediction confidence of the model may be computed for each data item” (Specification ¶ 0026); notably, Zhdanov 7:35-39 teaches running inference with the model on the validation dataset. After that, every object is given a confidence level (that is, confidence level is synonymous with “uncertainty measurement”)).
Regarding claims 4 and 11, the combination of Zhdanov, Huszar, and Vongkulbhisal teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
further comprising receiving a computed information gain from the one or more processing nodes (Zhdanov 6:55-61 teaches the data labeling service 108 may also output various performance metrics, such as performance against the annotation budget, quality score of annotated labels and performance against the defined quality threshold, logs and metrics in a monitoring dashboard, and/or an audit trail of annotations tasks as performed by annotators (that is, the “various performance metrics” are receiving a computed information gain from the one or more processing nodes); the Specification recites that “[t]he information gain may be seen as the amount of information gained by training the AI model on a new trusted label of a labeling task. . . . In a preferred embodiment, the information gain may be considered as an average accuracy gain of the model over several iterations of the training” (Specification ¶ 0040); accordingly, Examiner notes that the “computed information gain” of the claims has a BRI that covers the performance metrics of Zhdanov).
Regarding claims 7 and 14, the combination of Zhdanov, Huszar, and Vongkulbhisal teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
wherein the data mask is a vector in which each component is 1 if the components is related to a labelled data item and 0 if the components is related to an unlabelled data item (Vongkulbhisal, right column of p. 3178, “3.4.1 Matrix Factorisation in Probability Space,” first paragraph, teaches a matrix P ∈ [0, 1]^{L×N} where we set P_{li} (the element in row l and column i) to p_i(Y = l) if l ∈ L_i (that is, each component is 1 if the component is related to a labelled data item) and zero otherwise (that is, “unlabeled”). This matrix P is similar to the decision profile matrix in ensemble methods [23], but here we fill in 0 for the classes that C_i’s cannot predict. To account for these missing predictions (that is, “unlabeled data”), we define M ∈ {0, 1}^{L×N} as a mask matrix (that is, a vector) where M_{li} is 1 if l ∈ L_i and zero otherwise (that is, 0 if the components is related to an unlabeled data item). With regard to updating the data mask, Vongkulbhisal teaches varying an accuracy of the heterogeneous classifiers (Vongkulbhisal, Fig. 3(c))).
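For illustration only, the vector form of the data mask recited in claims 7 and 14 (1 for a labelled data item, 0 for an unlabelled data item) may be sketched as follows. The function names and the use of `None` to denote a missing label are hypothetical and appear in neither the claims nor the cited references.

```python
def create_data_mask(labels):
    """Return a 0/1 vector: 1 where a data item has a label, 0 where it does not.

    Assumption: a missing label is represented by None.
    """
    return [0 if label is None else 1 for label in labels]


def update_data_mask(mask, trusted_labels):
    """Flip the labeling status to 1 for each item index that received a trusted label.

    trusted_labels maps a data item index to its newly received trusted label.
    """
    updated = list(mask)
    for index in trusted_labels:
        updated[index] = 1
    return updated


def split_by_mask(items, mask):
    """Obtain the labelled and unlabelled data item subsets using the data mask."""
    labelled = [item for item, bit in zip(items, mask) if bit == 1]
    unlabelled = [item for item, bit in zip(items, mask) if bit == 0]
    return labelled, unlabelled
```

In this sketch the mask is kept separate from the data items, so that a change in labeling status requires changing only one component of the vector.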
7.	Claims 5, 6, 12, and 13 are rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 11048979 to Zhdanov et al. [hereinafter Zhdanov] in view of US Published Application 20180240031 to Huszar et al. [hereinafter Huszar] and Vongkulbhisal et al., “Unifying Heterogeneous Classifiers with Distillation,” IEEE (2019) [hereinafter Vongkulbhisal], and further in view of US Patent 11120364 to Gokalp et al. [hereinafter Gokalp].
Regarding claims 5 and 12, the combination of Zhdanov, Huszar, and Vongkulbhisal teaches all of the limitations of claims 1 and 8, respectively, as described above in detail.
further comprising receiving a computed relevancy values for one or more predicted labels from the one or more processing nodes (Gokalp, 39:40-60, teaches class descriptors parameter 2305 may be used to specify the target classes into which data items are to be categorized in the depicted embodiment, and/or one or more specific attribute values (such as keywords included in the titles/descriptions of the data items) that may be used to create an initial training set. . . . Attribute descriptors 2314 may indicate the names and descriptions of various relevant attributes of the data items (that is, the “attribute values” and “attribute descriptors” are receiving a computed relevancy values for one or more predicted labels from the one or more processing nodes), and how various relevant attributes of data items may be parsed/extracted from the raw data items if needed in at least some embodiments).
Zhdanov, Huszar, Vongkulbhisal, and Gokalp are from the same or similar field of invention. Zhdanov teaches machine learning to automate annotation and management of the datasets. Huszar teaches committee members for data labelling, for each of which the system can use any algorithm for training a deep neural network. Vongkulbhisal teaches cross-entropy minimization and mask matrix factorisation methods for estimating soft labels of the unlabelled data. Gokalp teaches an artificial intelligence system in which respective status indicators for classifier training iterations are determined. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Zhdanov, Huszar, and Vongkulbhisal pertaining to data labeling through machine learning having the artificial intelligence models of the committee members using a mask matrix with the attribute values of the class descriptor parameters of Gokalp.
The motivation for doing so is for efficient training of machine learning models such as classifiers using an automated workflow. (Gokalp 3:49-51).
Regarding claims 6 and 13, the combination of Zhdanov, Huszar, Vongkulbhisal, and Gokalp teaches all of the limitations of claims 5 and 12, respectively, as described above in detail.
further comprising requesting trusted labels for data items having associated therewith higher relevancy value compared to other ones of the data items (Gokalp 26:24-36 teaches a prediction score which indicates that the current user-suggested label is incorrect (that is, a lower relevancy value) may be indicated in the reconsideration request . . . . [T]he request to reconsider a previously-supplied label may be sent to a different individual/user than the source of the previously-supplied label-e.g., to one of a set of trusted individuals who are permitted to change previously-provided labels (that is, requesting trusted labels for data items having associated therewith higher relevancy value compared to other ones of the data items)).
Response to Argument
8.	Examiner has fully considered Applicant’s arguments; however, such arguments are unpersuasive for the reasons set out below. 
9.	Applicant argues, referring to instant claim 1 as an exemplar claim, that “the claimed chunked data subsets may be different from the claimed randomized unlabeled subset. Accordingly, Applicant respectfully submits that the sample obtained in Zhdanov by randomly selecting a sample of the input dataset for labelling is not, from the perspective of the skilled person, considered as teaching the claimed chunked data subsets and the randomized unlabeled subset at the same time.” (Response at p. 8).
Examiner respectfully disagrees because the BRI of the claim elements covers the teachings of the cited prior art of Zhdanov, and because the argument appears to rely on limitations not set out in the claims.
Referring to claim 1 as an exemplar claim, the claim recites, inter alia:
* * *
from an unlabelled data items subset obtained using the data mask, creating a randomized unlabeled subset having fewer members than the unlabelled data items subset;
at a cluster manager server, chunking the randomized unlabeled subset into a plurality of data subsets, and dispatching the chunked data subsets to one or more of the processing nodes;
* * *
(claim 1, lines 16-21; see also claim 8, lines 19-25).
a.	BRI of the claim term “chunking” covers the teachings of Zhdanov
Applicant argues, with respect to Zhdanov, that “the claimed chunked data subsets may be different from the claimed randomized unlabeled subset.” However, the claim merely recites “chunking the randomized unlabeled subset into a plurality of data subsets.” The claim does not recite the distinction argued by Applicant. 
Moreover, the plain meaning of the term “chunking” is to provide for delivery of a plurality of subsets for processing by nodes. Turning to the Specification, “chunking” is simply “[t]o be managed, the dataset is chunked into several subsets in order to train a plurality of local AI models of a plurality of processing nodes.” (Specification ¶ 0048). 
Accordingly, the BRI of “chunking” reads on the cited prior art of Zhdanov because the plain meaning of “chunking” is not inconsistent with the Applicant’s specification.
b.	BRI of “chunking” does not require a sequencing limitation as suggested by Applicant
With regard to “sequencing,” Applicant appears to argue that Zhdanov does not teach “sequencing” with respect to “chunking” data and “dispatching” chunked data.
The claims, under their BRI, do not recite “sequencing,” and accordingly, there is no such constraint with regard to “chunking” and “dispatching.” Simply put, the plain meaning of the term “chunking” is distributing a subset to multiple receivers, or nodes, regardless of whether the distribution occurs “at the same time” or not, and this meaning is not inconsistent with the Applicant’s Specification.
Though the Specification recites that “[t]he randomized unlabeled subset is subsequently chunked 207 into a plurality of data subsets to be dispatched to one or more of the processing nodes,” (Specification ¶ 0050), and the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Moreover, the rejection clearly sets forth which claim limitations are taught by each of the prior art references, and the reason why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine their teachings.
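For illustration only, the claim language reproduced above (creating a randomized unlabeled subset having fewer members than the unlabelled data items subset, chunking it into a plurality of data subsets, and dispatching the chunks to processing nodes) may be sketched as follows. The function names, the fixed chunk size, and the round-robin assignment to nodes are hypothetical; the claims do not recite any particular chunking or dispatching scheme, and the sketch does not characterize Zhdanov's implementation.

```python
import random


def randomized_subset(unlabelled_items, size, seed=None):
    """Randomly sample a subset with fewer members than the unlabelled subset."""
    if size >= len(unlabelled_items):
        raise ValueError("subset must have fewer members than the source")
    rng = random.Random(seed)
    return rng.sample(unlabelled_items, size)


def chunk(subset, chunk_size):
    """Split the subset into consecutive chunks of at most chunk_size items."""
    return [subset[i:i + chunk_size] for i in range(0, len(subset), chunk_size)]


def dispatch(chunks, nodes):
    """Assign chunks to processing nodes round-robin (a hypothetical policy)."""
    assignment = {node: [] for node in nodes}
    for position, data_chunk in enumerate(chunks):
        assignment[nodes[position % len(nodes)]].append(data_chunk)
    return assignment
```

In this sketch the randomized subset and the chunks derived from it are distinct objects, and nothing in the sketch requires that chunking and dispatching occur in any particular temporal relationship.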
10.	Applicant argues, referring to instant claim 1 as an exemplar claim, that “[t]he Examiner states on page 4 of the Office Action that the combination of Vongkulbhisal, Dutta and Zheng teaches creating a data mask describing a labeling status of the data items of the dataset. Applicant respectfully disagrees.” (Response at p. 8). In sum, Applicant submits that “Applicant reiterates, that Vongkulbhisal's mask matrix in not configured to be applied to data items. The mask matrix is rather configured to be applied to the classifiers. In fact, Vongkulbhisal's mask matrix M is oblivious to the data items. Therefore, even if Dutta was to suggest that input data may include a mixture of labelled and unlabeled examples, Vongkulbhisal's mask matrix M would not be able to provide a data mask describing a labeling status of the data items of the dataset.” (Response at pp. 10-12).
Examiner respectfully disagrees because Applicant argues against the cited prior art reference of Vongkulbhisal individually, and because Applicant relies on distinctions not recited by Applicant’s claims.
With respect to Vongkulbhisal, Applicant argues “[t]here is no indication of a data mask in Zhdanov.” (Response at p. 8). Examiner agrees. As set out in detail in the rejections hereinabove, Examiner points out that though Zhdanov and Huszar teach the features of machine learning models for data labelling, the combination of Zhdanov and Huszar, however, does not explicitly teach identifying unlabelled data items “using a data mask, . . . .” Examiner relies on this feature taught by Vongkulbhisal as set out hereinabove.
Applicant argues the teachings of Vongkulbhisal cannot be relied upon because Zhdanov does not teach a data mask. (see Response at p. 8). Also, Applicant extensively argues that the data mask of Vongkulbhisal is not the “data mask” of the Applicant’s claims. (see Response at pp. 9-11).
Examiner respectfully disagrees. First, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986); MPEP § 2145. Examiner submits that Applicant attacks Vongkulbhisal individually, where the rejections hereinabove are based on combinations of references.
Second, Applicant argues the nature of the data matrix feature of Vongkulbhisal is not covered, under a broadest reasonable interpretation, by the “data mask” of the instant claims. (see Response at pp. 8-11). The instant claims, however, broadly recite “creating a data mask describing a labeling status of the data items of the dataset,” “updating the data mask by changing the labeling status of the data items,” and from “a labeled items subset obtained using the data mask, training the AI model.” (see instant claim 1).
Vongkulbhisal, Figure 2, teaches unifying heterogeneous classifiers (UHC):

[Figure: Vongkulbhisal, Figure 2 (media_image2.png)]

Vongkulbhisal teaches that to account for missing [labeling] predictions, we define a . . . mask matrix where M_{li} is 1 if l ∈ L_i (that is, labelled) and zero otherwise (that is, unlabeled) (See Vongkulbhisal, at p. 3178, “3.4.1 Matrix Factorisation in Probability Space,” first paragraph). That is, Vongkulbhisal teaches, inter alia, the feature of “creating a data mask describing a labeling status of the data items of the dataset.” (see instant claim 1, line 5). Accordingly, Vongkulbhisal teaches the feature of the “data mask” of the instant claims.
Third, Applicant argues Vongkulbhisal does not teach the features of Applicant’s claims relating to a “data mask.” Though Vongkulbhisal teaches a labeling status of a data set on a probabilistic basis, Applicant’s instant claim simply recites “creating a data mask describing a labeling status of the data items of the dataset.” (Instant claim 1, line 5). Also, Vongkulbhisal teaches a data mask in the form of a matrix vector P ∈ [0, 1]^{L×N}, as set out in the rejections hereinabove. (See instant claim 7, lines 1-3).
Accordingly, the BRI of the “data mask” element of Applicant’s claims covers the teachings of Vongkulbhisal.
Moreover, the rejections clearly set forth which claim limitations are taught by each of the prior art references, and the reasons why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine their teachings.
Further, Applicant appears to argue that Vongkulbhisal “teaches away” because teachings of Vongkulbhisal are “technically inaccurate.” But Vongkulbhisal teaches unifying heterogeneous classifiers with distillation concerning the problem of unifying knowledge from a set of classifiers with different architectures and target classes into a single classifier, given only a generic set of unlabeled data. (Vongkulbhisal, Abstract). Applicant does not specifically point out how the language of the instant claims is patentably distinguishable over the cited references. That is, Applicant has not explained why the cited prior art references cannot be combined in the manner set forth in the rejection.
11.	Applicant points out that “Dutta and Zheng are not listed on the Notice of References Cited and, likewise, do not appear in the description of the rejections of page 3.” (Response at pp. 11-12).
Examiner agrees. The features taught by Vongkulbhisal are relied upon for teaching the feature of Applicant’s “data mask.” The references of Dutta and Zheng were extraneous notes and are now set out below as prior art made of record and not relied upon, but considered pertinent to Applicant's disclosure.
12.	Applicant argues that “Huszar clearly states that the committee members are algorithms for training a deep neural network. These algorithms are used in Huszar without modification. In Huszar, these algorithms are trained using different training sets generated by the committee generator 110. Applicant respectfully submits that a person skilled in the art would recognize that there is a difference between an algorithm for training a deep neural network and a trained AI model (in this case, a trained AI model is the result of training an algorithm for training a deep neural network on a dataset).” (Response at p. 13).
Examiner respectfully disagrees, because one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986); MPEP § 2145. Examiner submits that Applicant attacks Huszar individually, where the rejections hereinabove are based on combinations of references.
As set out above, Zhdanov teaches machine learning to automate annotation and management of datasets using pre-trained supervised machine learning models (see Zhdanov 7:3-7), but does not explicitly teach “cloning the . . . AI model into a local AI model on the processing nodes.” (see instant claim 1, lines 14-15).
Huszar, which teaches an active learning system, teaches a constant supply of unlabeled objects 120 that have not been used to train the committee members 150 or that need classification using the trained classifier 180. (Huszar ¶ 0024). Clones of the AI model are placed at each committee member 150-1, 150-2, through 150-n. These models are initialized “by training it using one of the different training sets generated by the committee generator 110.” (Huszar ¶ 0026). Huszar ¶ 0030 teaches that any of the deep neural networks, e.g., 150-1 to 150-n, can be used as a trained classifier 180 (that is, the “deep neural networks” are cloned DNN frameworks). Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the pre-trained supervised machine learning models of Zhdanov with each of the deep neural networks 150-1 to 150-n of Huszar.
Also, the rejections clearly set forth which claim limitations are taught by each of the prior art references, and the reasons why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine their teachings.
Conclusion
13.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
14.	The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
(US Patent 10339470 to Dutta et al.) teaches that input data includes a mixture of labelled and unlabeled examples, and that the model must learn the structures to organize the data as well as make predictions.
(Zheng et al., “Regularized Singular Value Decomposition and Application to Recommender System,” arXiv (2018)) teaches an evaluation methodology that (1) constructs training data by converting some 1s in the rating matrix into 0s, which is called “maskout,” and (2) checks whether recommender algorithms can correctly recommend these masked-out ratings.
(Yao et al., “Locating Anomalies using Bayesian Factorizations and Masks,” ESANN (2011)) teaches performing anomaly location by which each instance receives a “mask” indicating both its anomaly score and the locations of its potentially anomalous attributes.
15.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI, can be reached at 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.L.S./
Examiner, Art Unit 2122
/BRIAN M SMITH/Primary Examiner, Art Unit 2122