Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Citation of Relevant Prior Art 
1.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. See MPEP 707.05. The prior art discloses some claimed and several unclaimed limitations, for example:

	Petra Schneider (Hyperparameter learning in probabilistic prototype-based models, Neurocomputing 73 (2010) 1117–1124) describes two approaches to extend Robust Soft Learning Vector Quantization (RSLVQ). This algorithm for nearest prototype classification is derived from an explicit cost function and follows the dynamics of a stochastic gradient ascent. The RSLVQ cost function is defined in terms of a likelihood ratio and involves a hyperparameter which is kept constant during training. The authors propose to adapt the hyperparameter in the training phase based on the gradient information. In addition, the authors propose to base the classifier’s decision on the value of the likelihood ratio instead of using the distance based classification approach. Experiments on artificial and real life data show that the hyperparameter crucially influences the performance of RSLVQ. However, it is not possible to estimate the best value from the data prior to learning. The authors show that the proposed variant of RSLVQ is very robust with respect to the initial value of the hyperparameter. The classification approach based on the likelihood ratio turns out to be superior to distance based classification, if local hyperparameters are adapted for each prototype.
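For illustration only, the likelihood-ratio decision summarized in the Schneider reference can be sketched as follows. The sketch assumes each prototype is an isotropic Gaussian component with equal prior and its own width sigma2[j] (the "local hyperparameter"). The function names, data layout and example values are hypothetical and are not taken from the reference, and the gradient-based adaptation of the hyperparameter is not shown.

```python
import math

def likelihood_ratio(x, prototypes, labels, sigma2):
    """Return {class: p(x, class | W) / p(x | W)} for a prototype mixture.

    Assumption: equal priors and isotropic Gaussian components, one
    width sigma2[j] per prototype.
    """
    dim = len(x)
    densities = []
    for w, s2 in zip(prototypes, sigma2):
        d2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
        norm = (2.0 * math.pi * s2) ** (-dim / 2.0)
        densities.append(norm * math.exp(-d2 / (2.0 * s2)))
    total = sum(densities)
    ratios = {}
    for dens, c in zip(densities, labels):
        # Sum the share of every prototype that carries class label c.
        ratios[c] = ratios.get(c, 0.0) + dens / total
    return ratios

def classify(x, prototypes, labels, sigma2):
    """Pick the class with the largest likelihood ratio."""
    ratios = likelihood_ratio(x, prototypes, labels, sigma2)
    return max(ratios, key=ratios.get)

# Hypothetical example: a narrow prototype "a" and a wide prototype "b".
protos = [(0.0, 0.0), (4.0, 0.0)]
classes = ["a", "b"]
print(classify((1.5, 0.0), protos, classes, [0.01, 25.0]))
```

In this hypothetical example the point (1.5, 0) lies nearer to prototype "a" in Euclidean distance, yet the likelihood ratio favors "b" because of the differing widths. This is the kind of case in which the two decision rules can disagree.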
	
Bui (US 2021/0295191 A1) describes non-transitory computer readable media that can generate hyper-parameters for machine learning models by utilizing modified Bayesian optimization to select hyper-parameters based on a combination of accuracy and efficiency metrics. In particular, the disclosed systems can utilize a unified Bayesian optimization framework for jointly optimizing models for both prediction accuracy (i.e., effectiveness) and training efficiency. More specifically, the disclosed systems can utilize an objective function that reflects both an accuracy acquisition function and an efficiency acquisition function to model the tradeoff between accuracy and training efficiency within a hyper-parameter search space. The disclosed systems can also apply a principled Bayesian optimization framework to select hyper-parameters based on the tradeoff. The disclosed systems can further account for extrinsic hyper-parameters such as training set size within the hyper-parameter space. In this manner, the disclosed systems can select hyper-parameters for machine learning models that improve model accuracy and training efficiency.
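As a hedged illustration of selecting hyper-parameters by trading accuracy against training efficiency, the sketch below scores candidates with a weighted sum of two terms. The particular choices are assumptions made for illustration and are not drawn from Bui: expected improvement stands in for the accuracy acquisition function, a normalized training-cost term stands in for the efficiency acquisition function, and the dictionary keys and the tradeoff parameter are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best):
    """Expected improvement of a Gaussian prediction (mu, sigma) over best."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def select_candidate(candidates, best_accuracy, tradeoff=0.5):
    """Return the candidate with the best combined score.

    tradeoff = 0 considers accuracy only; tradeoff = 1 considers
    training efficiency only (assumed weighting, for illustration).
    """
    max_cost = max(c["train_seconds"] for c in candidates)

    def score(c):
        acc = expected_improvement(c["acc_mu"], c["acc_sigma"], best_accuracy)
        eff = 1.0 - c["train_seconds"] / max_cost  # cheaper is better
        return (1.0 - tradeoff) * acc + tradeoff * eff

    return max(candidates, key=score)

# Hypothetical surrogate predictions for two hyper-parameter settings.
candidates = [
    {"params": {"lr": 0.01}, "acc_mu": 0.92, "acc_sigma": 0.02, "train_seconds": 1000.0},
    {"params": {"lr": 0.10}, "acc_mu": 0.85, "acc_sigma": 0.02, "train_seconds": 100.0},
]
print(select_candidate(candidates, best_accuracy=0.88, tradeoff=0.0)["params"])
print(select_candidate(candidates, best_accuracy=0.88, tradeoff=1.0)["params"])
```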

Baker (US 2021/0182631 A1) describes a non-transitory machine-readable medium including instructions that, when executed by a machine, cause the machine to perform operations for classifying an object, the operations comprising: receiving data of an object to be classified; and determining, using a neural network, a hyper-opinion classification of the object including an indication of the probabilities of base classes and composite classes that are “or” combinations of proper subsets of the base classes, wherein the neural network is trained using a cost function that includes one or more of an entropy component, a penalty for selecting uncertainty over the composite classes, or a least squares component that includes a hyper parameter indicating cost for choosing a composite class of the composite classes, wherein the cost function includes two or more of an entropy component, a penalty for selecting uncertainty over the composite classes, or a least squares component that includes a hyper parameter indicating cost for choosing a composite class of the composite classes.

Tang (US 2017/0031953 A1) describes a Multi-Task Learning approach based on Discriminative Gaussian Process Latent Variable Model (MTL-DGPLVM) for face verification. The MTL-DGPLVM model is based on Gaussian Processes (GPs), a non-parametric Bayesian kernel method. The present application uses the GPs method mainly due to at least one of the following three notable advantages. Firstly, it is a non-parametric method, which means it is flexible and can cover complex data distributions in the real world. Secondly, the GPs method can be computed effectively because its marginal probability is a closed-form expression. Furthermore, its hyper-parameters can be learned from data automatically without using model selection methods such as cross validation, thereby avoiding the high computational cost. Thirdly, the inference of GPs is based on Bayesian rules, resulting in the robustness to over-fitting. According to one embodiment of the present application, the discriminative information constraint is used to enhance the discriminability of GPs. Considering that GPs depend on the covariance function, it is logical to adopt Kernel Fisher Discriminant Analysis (KFDA) as the discriminative regularizer. In order to take advantage of more data from multiple source-domains to improve the performance in the target-domain, the present application also introduces the multi-task learning constraint to GPs. Here, it investigates the asymmetric multi-task learning because the present application only focuses on the performance improvement of the target task. From the perspective of information theory, this constraint is to maximize the mutual information between the distributions of target-domain data and multiple source-domains data. The MTL-DGPLVM model can be optimized effectively using the gradient descent method. The proposed MTL-DGPLVM model can be applied to face verification in two different ways: as a binary classifier and as a feature extractor.
For the first way, given a pair of face images, it directly computes the posterior likelihood for each class to make a prediction. In the second way, it automatically extracts high-dimensional features for each pair of face images, and then feeds them to a classifier to make the final decision. In one aspect, there is disclosed a method for verifying facial data, comprising a step of retrieving a plurality of source-domain datasets from a first database and a target-domain dataset from a second database different from the first database, a step of determining a latent subspace matching with target-domain dataset best, and a posterior distribution for the determined latent subspace from the target-domain dataset and the source-domain datasets; a step of determining information shared between the target-domain data and the source-domain datasets; and a step of establishing a Multi-Task learning model from the posterior distribution P, and the shared information M on the target-domain dataset and the source-domain datasets. In another aspect of the present application, there is disclosed an apparatus for verifying facial data, comprising a model establishing module, wherein the model establishing module comprises a retrieve unit configured to retrieve a plurality of source-domain datasets from a first database and a target-domain dataset from a second database different from the first database and a model establisher configured to determine a latent subspace matching with target-domain dataset best, and a posterior distribution for the determined latent subspace from the target-domain dataset and the source-domain datasets; determine information shared between the target-domain data and the source-domain datasets; and establish a Multi-Task learning model from the posterior distribution, and the shared information on the target-domain dataset and the source-domain datasets.

Ye (US 10013477 B2) describes a method of clustering data objects representative of images, videos, biological processes, genetic sequences, or documents using a scalable optimization technique for scaling computational cost of the clustering to a reduced level, comprising the steps of: a) providing, at a computer processor including parallel processors, a set of data objects, each object being representative of an image, a video, a biological process, genetic sequence, or a document; b) performing an initial data segmentation on the set of data objects to divide the data into trunks before performing a discrete distribution (D2) clustering operation, wherein one of the parallel processors performs the initial data segmentation step and distributes the data trunks to each of the parallel processors; c) performing, by all the parallel processors, a discrete distribution (D2) clustering operation, including: i) optimizing Wasserstein centroids by using a scalable and parallel optimization technique, each data object being handled on one of the parallel processors that contains the data object, local aggregated results based on each data trunk being communicated to all of the parallel processors that uses the local aggregated results to compute the Wasserstein centroids for all the data objects, ii) assigning each data object to the nearest Wasserstein centroid, the nearest Wasserstein centroid being a label of the data object, and iii) iterating i) and ii) until a single segmentation is achieved, the number of Wasserstein centroids is reduced to a predefined number, a predefined threshold based on the distances of the data objects to the Wasserstein centroids is satisfied, the number of changed labels of objects is less than a predefined number, or the number of iterations reaches a predefined number; wherein the scalable optimization technique is a Bregman alternating direction method of multiplier (B-ADMM) or an ADMM method or a subgradient descent method, such that the computational cost of the clustering is reduced to a level which can be handled by each of the parallel processors, the subgradient descent method being a method that optimizes for Wasserstein centroids using subgradients in optimal transport problems solved via linear programming, the ADMM method optimizing for Wasserstein centroids using updates of ADMM by decoupling one set of constraints in optimal transport into a distributed formulation, such that a number of submodules are solved from quadratic programming, the B-ADMM approach optimizing for Wasserstein centroids using updates of B-ADMM, which is a variant of ADMM, by decoupling two sets of constraints in an optimal transport into a distributed formulation, such that a number of submodules are solved in a closed formula; and d) outputting information regarding a way in which the objects are clustered in terms of similarity.
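The alternation recited in steps c) i) through iii) (update the Wasserstein centroids, assign each object to its nearest centroid, repeat until labels stop changing) can be illustrated with the single-processor sketch below. It makes a deliberate simplification that is not from the reference: every data object is a one-dimensional sample set of equal size with uniform weights. Under that assumption the squared 2-Wasserstein distance and its barycenter have closed forms via sorting, so the B-ADMM, ADMM and subgradient solvers named in the claim are not needed and are not shown. All function names are hypothetical.

```python
def w2_squared(a, b):
    """Squared 2-Wasserstein distance between equal-size 1-D samples."""
    return sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b))) / len(a)

def barycenter(members):
    """1-D W2 barycenter: average the sorted samples position by position."""
    ordered = [sorted(m) for m in members]
    n = len(ordered[0])
    return [sum(m[i] for m in ordered) / len(ordered) for i in range(n)]

def d2_cluster(objects, centroids, max_iter=20):
    """Alternate nearest-centroid assignment and centroid update."""
    centroids = [list(c) for c in centroids]  # do not mutate the caller's list
    labels = [None] * len(objects)
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: w2_squared(obj, centroids[k]))
            for obj in objects
        ]
        if new_labels == labels:  # stop when no label changes
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [o for o, lab in zip(objects, labels) if lab == k]
            if members:
                centroids[k] = barycenter(members)
    return labels, centroids

# Hypothetical data: four small empirical distributions and two initial centroids.
objects = [[0.0, 1.0, 2.0], [1.0, 0.0, 2.5], [10.0, 11.0, 12.0], [11.0, 12.0, 10.5]]
labels, centroids = d2_cluster(objects, [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
print(labels, centroids)
```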

Tang (US 10339177 B2) describes an apparatus for generating a Discriminative Multi-Task Gaussian Process (DMTGP) model and using the DMTGP model as a binary classifier in facial recognition, the apparatus comprising: at least one processor, a memory configured to store computer program instructions that, when executed by the at least one processor, cause the at least one processor to be configured to: generate the DMTGP model by: retrieving a plurality of source-domain datasets Xi from a first database and a target-domain dataset Xt from a second database different from the first database; determining a latent subspace Zt matching with the target-domain dataset Xt best and a posterior distribution P for the determined latent subspace Zt from the target-domain dataset Xt and the source-domain datasets Xi; determining information M shared between the target-domain data Xt and the source-domain datasets Xi; and establishing a multi-task learning model Lmodel from the posterior distribution P, the shared information M on the target-domain dataset Xt and the source-domain datasets Xi, the multi-task learning model Lmodel being based on Gaussian processes that comprise hyper-parameters, and the hyper-parameters being learned from data automatically without using model selection methods; obtain a pair of images to be compared by the DMTGP model to determine whether the pair of images correspond to a same person or correspond to different people, a first one of the images corresponding to a first face A and a second one of the images corresponding to a second face B, a first plurality of multiple scale features m1, m2, . . . mp being extracted from the first face A, and a second plurality of multiple scale features n1, n2, . . . np being extracted from the second face B in different landmarks of the first face A and the second face B; determine similarities S1, S2, . . . Sp of each two features in same landmarks, S1 referring to a similarity of m1 and n1, S2 referring to a similarity of m2 and n2, . . . , and Sp referring to a similarity of mp and np; feed the similarities S1, S2, . . . Sp to the DMTGP model to determine whether the first face A matches the second face B by applying the multi-task learning model Lmodel to the similarities S1, S2, . . . Sp to determine whether the first face A matches the second face B; and verify that the first face A matches the second face B according to a facial recognition result from the DMTGP model.

Ye (US 2017/0083608 A1) describes a method of clustering complex data objects, comprising the steps of: a) performing an initial segmentation of the data objects; b) performing a series of discrete distribution (D2) clustering operations on the data objects using a scalable method to optimize a set of Wasserstein centroids within each segment; c) combining the centroids determined in step b) into one data set and performing a segmentation of this data set; d) iteratively repeating steps b) and c) at higher levels in a hierarchy, if necessary, until a single segmentation is achieved, the number of centroids is reduced to an acceptable level, or another stopping criterion is satisfied; and wherein the D2 clustering operations are performed by parallel processors or a single processor in sequence.

IJAZ (US 2021/0192381 A1) describes quantum computing with pre-training, and, in particular, to quantum neural networks (QNNs) including one or more pre-trained layers. In some embodiments, a method includes training a first QNN by sending a first dataset into the first QNN to generate a first output and configuring the first QNN into a first setting based on the training. The method also includes receiving a second dataset and using at least a portion of the first QNN to generate a second output based on the second dataset and using the first setting. The second output is sent to a second QNN, operatively coupled to the first QNN, to train the second QNN. The second QNN is configured in a fixed setting during the training of the first QNN. In some embodiments, a non-transitory, processor-readable medium is configured to store code representing instructions to be executed by a processor. The code comprises code to cause the processor to receive a first dataset, and to use at least a portion of a first QNN to generate a first output using a first setting. The first setting is determined based on training the first QNN using a second dataset. The code also comprises code to cause the processor to send the first output to a second QNN, operatively coupled to the first QNN, to train the second QNN. The second QNN is configured in a first fixed setting during training of the first QNN. In some embodiments, an apparatus includes a first quantum neural network (QNN) configured in a fixed setting based on training. The first QNN is configured to receive a first dataset and generate a first output using the fixed setting. The apparatus also includes a second QNN operatively coupled to the first QNN and being differentiable. The second QNN is configured to receive the first output and generate a second output.

Asar (US 2008/0133434 A1) describes a method and apparatus for predictive modeling & analysis for knowledge discovery comprising: selecting a specific target for which predictive modeling and analysis is to be performed; importing the dataset into learning and testing data sets; learning dataset is further divided into training and validation datasets; normalizing and cleaning the dataset; systematic dimensionality reduction of features from the learning dataset in order to improve the performance of creating models without sacrificing speed; configuring the apparatus for either a single-class or multi-class classification modeling or a regression modeling or optionally both; optionally selecting an appropriate linear or non-linear kernel for modeling; selecting an auto-tuning parameter for automatically optimizing and selecting the best model with the highest accuracy for correct predictions of activity including selecting a linear or non-linear kernel that yields the best model with the highest accuracy; creating models using support vector machines and other algorithms such as Naive Bayes, Random Forest, Ridge Regression with the learning dataset and auto-selecting the best model with the best accuracy for correct predictions of activity; testing the test dataset against the auto-selected best model to determine over-fitting; discovering dominant features and characteristics as in the learning dataset for the given target and the selected model; performing cluster analysis on the learning dataset to discover different classes and series of similar data-points and discovering dominant features and characteristics of each cluster; further systematic dimensionality reduction of features from the learning dataset in order to further improve accuracy based on the selected auto-tuning parameter; iteratively re-creating models using support vector machines or other algorithms including Naive Bayes, Random Forest and Ridge Regression with the learning dataset with reduced features and then auto-selecting the best model with the best accuracy for correct predictions of activity; discovering noise in the training dataset by performing Noise Discovery Cross Validation Algorithm; predicting activity and level of activity of data-points with unknown ground truth using the selected best model; discovering dominant features and characteristics of the data-points in the prediction dataset for the given target; performing similarity discovery to discover if the prediction dataset and training dataset come from similar distribution and series; packaging and exporting models to be integrated and used with other third party applications; recreating the best model by only training on the support vectors in case the algorithm used for training is Support Vector Machines; allowing users to add additional data to the original training dataset for retraining and generating local models that are more specific to the user's problem domain; ability to perform incremental learning by adding new training data to improve the model without having to re-run and re-generate model.

Malik (US 2010/0174670 A1) describes a computer-implemented method comprising: using a computer comprising a processor to perform: initializing a model, the model including a plurality of classes; selecting subsets of patterns from a set of available patterns in a training instance selected from a training set of training instances, the selecting subsets including selecting a subset of size-1 patterns and selecting a subset of size-2 patterns; initializing a weight of each size-1 pattern in the subset of size-1 patterns; including each size-1 pattern in the subset of size-1 patterns in each class in the plurality of classes in the model; calculating an overall significance value of each size-2 pattern in the training instance; sorting the size-2 patterns using the overall significance; selecting the highest k sorted size-2 patterns; initializing a weight of each selected highest k size-2 pattern; adjusting the weights on the size-1 and size-2 patterns; and presenting the model organized with the plurality of classes, each class including the size-1 patterns, the highest k size-2 patterns, and the weights of the size-1 and size-2 patterns.

Dhaka (US 11281686 B2) describes an information processing apparatus comprising a processor and memory storing instructions, the processor is configured to execute the instructions to: acquire multiple data streams, each of the data streams representing the time sequence of observed data along with time stamps thereof; recursively perform, till a preset termination condition is met: assign one of clusters to each input data stream by sampling a cluster identity for each input data stream from a cluster identity distribution, wherein in the assignment of the cluster identity, the cluster identity distribution is updated through optimization of parameters of the cluster identity distribution; for each cluster, update dynamics of the cluster by optimizing a proposal posterior for the dynamics of the cluster; for each data stream, update individual response of the data stream by optimizing a proposal posterior for the individual response of the data stream, the individual response of the data stream representing sensitivity of the data stream towards the dynamics of the cluster to which the data stream is assigned; for each data stream, update the latent states of the data stream based on the updated individual response of the data stream; for each data stream, for each time stamp, update an observation model of the data stream at the time stamp by transforming the latent states of the data stream at the time stamp into parameters of the observation model using transformation function corresponding to a data type of the data stream, transformation function being different for each data type; and for each cluster, generate a model data based on cluster identity distribution, the dynamics of the cluster, the dynamics of each data stream assigned to the cluster, and the latent states corresponding to each data stream assigned to the cluster. 
A control method performed by a computer, the control method comprising: acquiring multiple data streams, each of the data streams representing the time sequence of observed data along with time stamps thereof; recursively performing, till a preset termination condition is met: assigning one of clusters to each input data stream by sampling a cluster identity for each input data stream from a cluster identity distribution, wherein in the assignment of the cluster identity, the cluster identity distribution is updated through optimization of parameters of the cluster identity distribution; for each cluster, updating dynamics of the cluster by optimizing a proposal posterior for the dynamics of the cluster; for each data stream, updating individual response of the data stream by optimizing a proposal posterior for the individual response of the data stream, the individual response of the data stream representing sensitivity of the data stream towards the dynamics of the cluster to which the data stream is assigned; for each data stream, updating the latent states of the data stream based on the updated individual response of the data stream; for each data stream, for each time stamp, updating an observation model of the data stream at the time stamp by transforming the latent states of the data stream at the time stamp into parameters of the observation model using transformation function corresponding to a data type of the data stream, transformation function being different for each data type; and for each cluster, generating a model data based on cluster identity distribution, the dynamics of the cluster, the dynamics of each data stream assigned to the cluster, and the latent states corresponding to each data stream assigned to the cluster.
Fuchs (US 10810736 B2) describes a method of training models for classifying images, comprising: identifying, by an image classifier executing on one or more processors, a plurality of tiles from an image, the image associated with a label indicating one of a presence or an absence of a condition within the image; applying, by the image classifier, an inference model having a plurality of parameters to the plurality of tiles to determine a tile-specific score for each tile of the plurality of tiles from the image, the tile-specific score indicating a likelihood of one of the presence or the absence of the condition within the tile; selecting, by the image classifier, a subset of tiles from the plurality of tiles based on the determined tile-specific score for each tile; comparing, by the image classifier, the tile-specific score determined for each tile of the subset of tiles to a threshold value for the image associated with the label indicating one of the presence or the absence of the condition; and modifying, by the image classifier, at least one parameter of the inference model based on comparing the tile-specific score of each tile to the threshold value. 
A method of training models for classifying images, comprising: identifying, by an image classifier executing on one or more processors, a subset of tiles from a plurality of tiles of an image, the image associated with a label indicating one of a presence or an absence of a condition within the image; applying, by the image classifier, an aggregation model having a plurality of parameters to the subset of tiles in a sequence to determine a classification result for the image from which the subset of tiles are identified, the classification result indicating one of the image having at least one feature corresponding to the presence of the condition or the image lacking any feature corresponding to the absence of the condition; comparing, by the image classifier, the classification result determined by the aggregation model to the label indicating one of the presence or the absence of the condition; modifying, by the image classifier, at least one parameter of the aggregation model based on comparing the classification result to the label.

Thayer (US 2007/0118297 A1) describes a computing system for identifying populations of events in a multi-dimensional data set obtained from a flow cytometer, the populations associated with blood components in a sample of human or animal blood, the improvement comprising: one or more machine readable storage media for use with the computing system, the machine readable storage media storing: a) data representing a finite mixture model, the model comprising a weighted sum of multi-dimensional Gaussian probability density functions associated with populations of events expected in the data set; b) an expert knowledge set comprising (1) one or more data transformations and (2) one or more logical statements, the transformations and logical statements encoding a priori expectations as to the populations of events in the data set; and c) program code for the computing system comprising instructions for operating on the multi-dimensional data using the finite mixture model and the expert knowledge set to thereby identify populations of events in the multi-dimensional data set associated with said blood components, wherein the program code comprises: a pre-optimization module performing scaling of the multi-dimensional data set; an optimization module iteratively performing (1) an expectation operation on at least a subset of the multi-dimensional data set, (2) an application of the expert knowledge set to data resulting from the expectation operation, and (3) a maximization operation updating parameters associated with the density functions of the finite mixture model based on the application of the expert knowledge set; and a classification module responsive to the output of the maximization operation for classifying the multidimensional data set into one or more populations.
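The three-part iteration recited for the optimization module (expectation, application of the expert knowledge set, maximization) can be illustrated with the one-dimensional sketch below. It is a simplified stand-in and not the reference's implementation: the data are scalar rather than multi-dimensional, the expert knowledge set is reduced to a single hypothetical logical statement that edits the responsibilities, and the pre-optimization scaling module is omitted. All names and values are assumptions.

```python
import math

def gaussian(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_with_expert_rules(data, weights, means, variances, rule, iters=25):
    """EM for a 1-D Gaussian mixture with a rule applied after each E-step."""
    weights, means, variances = list(weights), list(means), list(variances)
    k = len(means)
    resp = []
    for _ in range(iters):
        # (1) Expectation: unnormalized responsibilities per event.
        resp = []
        for x in data:
            r = [weights[j] * gaussian(x, means[j], variances[j]) for j in range(k)]
            # (2) Expert knowledge: the rule may veto or reweight populations.
            r = rule(x, r)
            total = sum(r)
            if total <= 0.0:  # fall back to uniform if everything was vetoed
                r, total = [1.0] * k, float(k)
            resp.append([v / total for v in r])
        # (3) Maximization: update mixture parameters from the edited responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj <= 0.0:
                continue
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a degenerate component
    # Classification: assign each event to its most responsible population.
    labels = [max(range(k), key=lambda j, r=r: r[j]) for r in resp]
    return weights, means, variances, labels

def population_1_only_above_5(x, r):
    """Hypothetical expert rule: population 1 never occurs below x = 5."""
    return [r[0], 0.0 if x < 5.0 else r[1]]

data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
result = em_with_expert_rules(data, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0],
                              population_1_only_above_5)
print(result[1], result[3])
```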

Fuchs (US 10445879 B1) describes a method of training models for classifying biomedical images. An image classifier executing on one or more processors may generate a plurality of tiles from each biomedical image of a plurality of biomedical images. The plurality of biomedical images may include a first biomedical image and a second biomedical image. The first biomedical image may have a first label indicating a presence of a first condition and the second biomedical image may have a second label indicating a lack of presence of the first condition or a presence of a second condition. The image classifier may establish an inference system to determine, for each tile of the plurality of tiles in each biomedical image of the plurality of biomedical images, a score indicating a likelihood that the tile includes a feature indicative of the presence of the first condition. For the first biomedical image, the image classifier may select a first subset of tiles from the plurality of tiles having the highest scores. The image classifier may compare the scores of the tiles in the first subset to a first threshold value corresponding to the presence of the first condition. The image classifier may modify the inference system responsive to determining that the scores of at least one tile of the first subset of tiles is below the first threshold value. For the second biomedical image, the image classifier may select a second subset of tiles from the plurality of tiles having the highest scores. The image classifier may compare the scores of the tiles in the second subset to a second threshold value corresponding to the lack of the presence of the first condition or the presence of the second condition. The image classifier may modify the inference system responsive to determining that the scores of at least one tile of the second subset of tiles is above the second threshold value.
In some embodiments, the image classifier may determine, for the at least one tile of the first subset, a first error metric between the score of the at least one tile to a first value corresponding to the presence of the first condition. In some embodiments, modifying the inference system may include modifying the inference system based on the first error metric of the at least one tile of the first subset. In some embodiments, the image classifier may determine, for the at least one tile of the second subset, a second error metric between the score of the at least one tile to a second value corresponding to the lack of the presence of the first condition. In some embodiments, modifying the inference system may include modifying the inference system based on the second error metric of the at least one tile of the second subset. In some embodiments, the image classifier may maintain the inference system responsive to determining that scores of none of a plurality of tiles for a third biomedical image of the plurality of biomedical images is below the first threshold. The third biomedical image may have the first label indicating the presence of the first condition. In some embodiments, the image classifier may maintain the inference system responsive to determining that scores of none of a plurality of tiles for a fourth biomedical image of the plurality of biomedical images is below the second threshold. The fourth biomedical image may have the first label indicating the lack of the presence of the first condition.
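A minimal sketch of the tile-level training step summarized for the Fuchs references follows: score every tile, keep the highest-scoring subset, compare those scores to a threshold that depends on the image label, and modify the model only when a selected tile falls on the wrong side. The logistic scoring model, the gradient-style update, and all names and default values are assumptions chosen for illustration and are not taken from the references.

```python
import math

def tile_score(weights, bias, tile):
    """Likelihood-style score in (0, 1) for one tile's feature vector."""
    z = bias + sum(w * f for w, f in zip(weights, tile))
    return 1.0 / (1.0 + math.exp(-z))

def train_step(weights, bias, tiles, label, k=2, threshold=0.5, lr=0.5):
    """One update on one image; label is True when the condition is present."""
    # Select the k tiles with the highest scores under the current model.
    top = sorted(tiles, key=lambda t: tile_score(weights, bias, t), reverse=True)[:k]
    target = 1.0 if label else 0.0
    new_w, new_b, updated = list(weights), bias, False
    for tile in top:
        s = tile_score(weights, bias, tile)
        # Positive image: a top tile below the threshold is an error.
        # Negative image: a top tile above the threshold is an error.
        wrong = s < threshold if label else s > threshold
        if wrong:
            err = target - s  # error metric between the score and its target value
            new_w = [w + lr * err * f for w, f in zip(new_w, tile)]
            new_b += lr * err
            updated = True
    return new_w, new_b, updated

# Hypothetical image with three tiles of two features each.
tiles = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(train_step([0.0, 0.0], -1.0, tiles, label=True))
print(train_step([0.0, 0.0], -1.0, tiles, label=False))
```

When no selected tile violates its threshold, the function returns the parameters unchanged, which corresponds to the "maintain the inference system" embodiments.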

Thayer (US 7299135 B2) describes an improvement comprising one or more machine readable storage media for use with the computing system, the machine readable storage media storing:
a) data representing a finite mixture model, the model comprising a weighted sum of multi-dimensional Gaussian probability density functions associated with populations of events expected in the data set;
b) an expert knowledge set, comprising one or more data transformations for operation on the multi-dimensional data set and one or more logical statements (“expert rules” herein), the transformations and logical statements encoding a priori expectations as to the populations of events in the data set; and
c) program code for the computing system comprising instructions for operating on the multi-dimensional data, the finite mixture model, and the expert knowledge set, to thereby identify populations of events in the multi-dimensional data set.
The identification of populations in the multi-dimensional data set can be converted to quantitative or qualitative data presented in a human-perceptible form, such as a graph or plot of the data with color coding to identify discrete populations in the data set, or as an output in terms of numbers or percentages of data points in the data set which are associated with the populations. As another example, the identified populations can be represented as one or more files in electronic form which can be stored in memory in the computer system, or transferred over a network to a computer workstation for further analysis or display to an operator (e.g., hematologist, veterinarian, or primary care physician). The use of the expert knowledge set in combination with the finite mixture model allows for more robust and accurate methods of automatically classifying data into one or more populations. In the context of flow cytometry and blood samples, an expert hematologist approaches a given flow cytometry data set expecting to find evidence of the five WBC types and has, as a result of previous information derived from blood manipulation studies, a good idea where they fall in one or more two-dimensional projections of the seven-dimensional data. There are no necessary bounds on what might comprise an expert's a priori knowledge set, but examples can include cluster position (e.g., in a two-dimensional projection or plot of a subset of the data), geometric shape of a cluster within some two-dimensional projections, and cluster position relative to other clusters. Such relationships often correspond to, and encode, known differences between the cell types (e.g., neutrophils are larger than most lymphocytes, and eosinophils contain more dense organelles than monocytes), but could also arise from instrument-specific knowledge.
The present inventive methods provide an automated classification system and methods that rely on similar types of information and, importantly, code such knowledge into an expert knowledge set of data transformations and logical statements or operations, and use such knowledge set on a data set or data derived from the data set (“hidden data” herein) to more accurately classify the data set into populations.
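
As a minimal sketch of how a finite mixture model and an expert rule might interact, the example below scores a data point against a weighted sum of Gaussian densities and lets a logical statement veto a population. It is not Thayer's implementation: the diagonal covariances, the two-dimensional data, the component weights and means, the rule threshold, and the names gaussian_pdf and classify are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at point x."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= math.exp(-((xi - mi) ** 2) / (2.0 * vi)) / math.sqrt(2.0 * math.pi * vi)
    return density


def classify(x, components, rules):
    """Assign x to the population with the largest rule-adjusted posterior."""
    scores = {}
    for name, (weight, mean, var) in components.items():
        # An expert rule is a predicate; a population with no rule is always allowed.
        allowed = rules.get(name, lambda point: True)(x)
        scores[name] = weight * gaussian_pdf(x, mean, var) if allowed else 0.0
    total = sum(scores.values())
    if total == 0.0:
        return None, scores
    posteriors = {name: score / total for name, score in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical mixture: name -> (weight, mean, per-axis variance).
components = {
    "lymphocyte": (0.4, (2.0, 2.0), (1.0, 1.0)),
    "neutrophil": (0.6, (5.0, 5.0), (1.0, 1.0)),
}
# Hypothetical expert rule: neutrophils are larger, so require size (axis 0) above 3.
rules = {"neutrophil": lambda point: point[0] > 3.0}

label, posteriors = classify((4.5, 4.0), components, rules)
vetoed_label, vetoed_posteriors = classify((2.5, 5.0), components, rules)
```

The second call shows the rule at work: the point fails the size requirement, so the neutrophil component contributes nothing and the point is assigned to the remaining population.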

Hegde (US 2018/0189664 A1) describes a method of processing a data set to detect outliers, comprising the steps of:
a) receiving as an input a data set of N data points;
b) iterating through the entire data set a plurality of times, wherein each iteration comprises:
(i) inferring a prior cluster profile having one or more clusters, each of which is identified as a non-outlier cluster or an outlier cluster, and each of which is characterised by its own probability distribution parameters, and by a weighting based on the relative number of data points assigned to the cluster, the prior cluster profile being inferred according to the output of the preceding iteration or, on the first iteration, according to an initial inferred cluster profile;
(ii) for each data point in the data set:
(aa) evaluating the probability that the data point belongs to each of the existing clusters in the prior cluster profile and that it belongs to a new cluster identified as a non-outlier cluster or an outlier cluster;
(bb) assigning the data point to one of the existing clusters or creating a new cluster in a probabilistic fashion according to the evaluated probabilities in (aa);
(iii) updating the prior cluster profile to reflect the assignment of the data points to existing clusters or the creation of new clusters and assignment of data points to the new clusters and returning the number of clusters, the identification of each cluster as outlier or non-outlier, and the probability distribution parameters and weightings for each cluster, for use as a prior cluster profile in the next iteration;
c) after a predetermined number of iterations through the entire data set, computing the most likely number of clusters in the data set according to all iterations in order to label data items and clusters as non-outliers and outliers along with determining the parameters for the cluster
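
The iteration in step b) can be loosely sketched as below. This is an illustrative simplification and not Hegde's method: it assumes one-dimensional data, Gaussian clusters with a fixed standard deviation, a broad base density for opening a new cluster, and a hypothetical concentration parameter alpha. Treating singleton clusters as outliers is likewise an assumption, and the sketch omits the computation of the most likely number of clusters across all iterations in step c).

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * std * std)) / (std * math.sqrt(2.0 * math.pi))


def sweep(data, assignments, alpha=1.0, std=1.0, base_std=10.0, rng=random):
    """One pass over the data, reassigning each point in a probabilistic fashion."""
    for i, x in enumerate(data):
        assignments[i] = None  # take the point out of its current cluster
        clusters = {}
        for j, label in enumerate(assignments):
            if label is not None:
                clusters.setdefault(label, []).append(data[j])
        labels = list(clusters)
        # Existing cluster: weight by member count times likelihood under the cluster mean.
        weights = [
            len(clusters[c]) * normal_pdf(x, sum(clusters[c]) / len(clusters[c]), std)
            for c in labels
        ]
        # New cluster: weight by alpha times a broad base density centered at zero.
        labels.append(max(labels, default=-1) + 1)
        weights.append(alpha * normal_pdf(x, 0.0, base_std))
        assignments[i] = rng.choices(labels, weights=weights, k=1)[0]
    return assignments


data = [0.1, -0.2, 0.0, 0.3, 8.0]  # hypothetical data set with one distant point
rng = random.Random(0)
assignments = [None] * len(data)
for _ in range(20):  # predetermined number of iterations
    sweep(data, assignments, rng=rng)

sizes = {c: assignments.count(c) for c in set(assignments)}
outlier_clusters = {c for c, n in sizes.items() if n == 1}  # assumed outlier criterion
```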

Allowable Subject Matter
2.	Claims 1-20 are allowed.
Reasons for Allowance
3.	The following is an examiner's statement of reasons for allowance:
Independent claims 1, 10 and 16 contain allowable subject matter. None of the prior art of record shows or fairly suggests the claimed invention. 

Regarding claim 1:
The primary reason for the allowance of claim 1 is the inclusion of a computer-implemented method comprising: generating, by one or more computer processors, one or more synthetic data points for each identified cluster utilizing a corresponding calculated probability distribution; and quantitatively assessing, by one or more computer processors, the one or more generated synthetic data points. It is these features found in the claim, as they are claimed in the combination and claimed elements arranged as in the claim, that have not been found, taught or suggested by the prior art of record, which makes this claim allowable over the prior art.
Claims 2-9 are allowed due to their dependency on claim 1.
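
Purely to illustrate the two features quoted above, and without construing the claims, the following sketch generates synthetic data points for each cluster from a fitted distribution and then assesses them quantitatively. The claim language quoted here does not specify the distribution or the assessment metric; the one-dimensional normal fit, the standardized mean gap, the sample data, and the names generate_synthetic and assess are assumptions.

```python
import random
import statistics


def generate_synthetic(clusters, per_cluster, rng):
    """Fit a normal distribution to each cluster and sample new points from it."""
    synthetic = {}
    for label, points in clusters.items():
        mean = statistics.mean(points)
        std = statistics.stdev(points)
        synthetic[label] = [rng.gauss(mean, std) for _ in range(per_cluster)]
    return synthetic


def assess(clusters, synthetic):
    """Score each cluster by the real-vs-synthetic mean gap in units of the real standard deviation."""
    report = {}
    for label, points in clusters.items():
        gap = abs(statistics.mean(synthetic[label]) - statistics.mean(points))
        report[label] = gap / statistics.stdev(points)
    return report


# Hypothetical clusters, assumed to have been identified by an earlier step.
clusters = {"a": [1.0, 1.2, 0.8, 1.1], "b": [5.0, 5.5, 4.5, 5.2]}
rng = random.Random(42)
synthetic = generate_synthetic(clusters, per_cluster=200, rng=rng)
report = assess(clusters, synthetic)
```

A smaller value in report indicates synthetic points whose mean lies closer to that of the cluster they were generated for.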

Regarding claim 10:
The primary reason for the allowance of claim 10 is the inclusion of a computer program product comprising: program instructions to generate one or more synthetic data points for each identified cluster utilizing a corresponding calculated probability distribution; and program instructions to quantitatively assess the one or more generated synthetic data points. It is these features found in the claim, as they are claimed in the combination and claimed elements arranged as in the claim, that have not been found, taught or suggested by the prior art of record, which makes this claim allowable over the prior art.
Claims 11-15 are allowed due to their dependency on claim 10.


Regarding claim 16:
The primary reason for the allowance of claim 16 is the inclusion of a computer system comprising: program instructions to generate one or more synthetic data points for each identified cluster utilizing a corresponding calculated probability distribution; and program instructions to quantitatively assess the one or more generated synthetic data points. It is these features found in the claim, as they are claimed in the combination and claimed elements arranged as in the claim, that have not been found, taught or suggested by the prior art of record, which makes this claim allowable over the prior art.
Claims 17-20 are allowed due to their dependency on claim 16.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Contact information

4.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tung Lau whose telephone number is (571)272-2274, email is Tungs.lau@uspto.gov. The examiner can normally be reached on Tuesday-Friday 7:00 AM-5:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Breene, can be reached on 571-272-4107. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TUNG S LAU/
Primary Examiner, Art Unit 2862
Technology Center 2800 
May 25, 2022