DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application 17/179,265, filed on 02/18/2021, is being examined under the first inventor to file provisions of the AIA.

Drawings
2.	The drawings received on 02/18/2021 are accepted by the Examiner.
Information Disclosure Statement
3.	The information disclosure statements (IDS) submitted on 05/20/2011, 06/22/2021, 06/30/2022 and 07/07/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Review under 35 USC § 101
4.	Claims 1-20, which are directed to a method, an article of manufacture and a system, have been reviewed. Claims 1-9 appear to be in one of the statutory categories [e.g. a process]. The process is a method for performing conditional sampling of a particular dataset. Claims 1-9 do not fall within at least one of the groupings of abstract ideas enumerated in the 2019 PEG. Claim 10 appears to be in one of the statutory categories [e.g. a process]. The process is a method of generating a random sample dataset for each bucket of the plurality of buckets. Claim 10 does not fall within at least one of the groupings of abstract ideas enumerated in the 2019 PEG. Claims 11-19 appear to be in one of the statutory categories [e.g. an article of manufacture]. The article of manufacture is a non-transitory computer readable medium storing one or more sequences of instructions having stored thereon instructions for performing conditional sampling of a particular dataset. Claims 11-19 do not fall within at least one of the groupings of abstract ideas enumerated in the 2019 PEG. Claim 20 appears to be in one of the statutory categories [e.g. an article of manufacture]. The article of manufacture is a non-transitory computer readable medium storing one or more sequences of instructions having stored thereon instructions for generating a random sample dataset for each bucket of the plurality of buckets. Claim 20 does not fall within at least one of the groupings of abstract ideas enumerated in the 2019 PEG. Therefore, claims 1-20 qualify as eligible subject matter under 35 USC 101.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 3, 4, 10, 13, 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Das (US 2021/0287136 A1) in view of Reymond et al. (US 2022/0300528 A1), hereinafter Reymond.
As to claims 3 and 13, Das discloses wherein randomly sampling the particular dataset to identify the random sample dataset of the particular dataset (See para. [0028], the system identifies a set of k-nearest neighbor [kNN] data points in a majority class based on their distance to other points in the same majority class).  
Das does not explicitly disclose generating a KD-Tree that comprises the data instances of the particular dataset; wherein the KD-Tree comprises a plurality of buckets.
Reymond discloses generating a KD-Tree that comprises the data instances of the particular dataset (See para. [0151] and para. [0154], constructing a K nearest graph or tree comprising clusters or buckets through branches and sub-branches of a dataset); wherein the KD-Tree comprises a plurality of buckets; wherein each bucket, of the plurality of buckets, includes a unique set of similar data instances from the particular dataset; for each bucket of the plurality of buckets, including, in the random sample dataset, a randomly-selected subset of the unique set of similar data instances in said each bucket (See para. [0154], random sampling (d, d,f) from a set of classes and k-nearest neighbors using the constructed K nearest graph or tree, for example, 20 nearest neighbors of a randomly selected compound from a random sample).
Hence, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Das’s system to include a KD-tree that includes buckets or clusters, as taught by Reymond, in order to locate similar data points in very large databases efficiently (See Reymond, para. [0004]). In addition, both references (Das and Reymond) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as using machine-learned K-nearest neighbor techniques to locate similar data points. This close relation between both references highly suggests an expectation of success.
As to claims 4 and 14, Das discloses obtaining a first prediction, for the target data instance, using an ML model; obtaining a second prediction, for a particular generated data instance of the set of generated data instances, using the ML model; determining a difference metric that measures a difference between the first prediction and the second prediction; and using the difference metric to determine an importance score for the particular tested feature with respect to the ML model (See para. [0044]-para. [0046], the classification modeling system 104 provides, to the data processing system 102, an option to use a simple likelihood classification (SLC) algorithm that may be used for the classification of a data set; the SLC algorithm may return a score, for a given input point, ranging between 0 and 1. Using this score, and thresholds defined by the data processing system 102 or other entity, the SLC algorithm may determine how to classify an input point. The score may be interpreted as the probability that the input point belongs to a particular class [e.g. first prediction class or a second prediction class], whereby the closer the score is to a value corresponding to the class, the more likely that the input point belongs to that class. The SLC algorithm is implemented to process imbalanced data, whereby few data points for a particular class may be available within the data set).
Referring to claims 10 and 20, Das discloses a method performed by one or more computing devices (See para. [0011] and Figure 1) and randomly sampling the particular dataset to identify the random sample dataset of the particular dataset (See para. [0028], the system identifies a set of k-nearest neighbor [kNN] data points in a majority class based on their distance to other points in the same majority class).
Das does not explicitly disclose generating a KD-Tree that comprises the data instances of the particular dataset; wherein the KD-Tree comprises a plurality of buckets.
Reymond discloses generating a KD-Tree that comprises the data instances of a particular dataset to be sampled; wherein the KD-Tree comprises a plurality of buckets (See para. [0151] and para. [0154], constructing a K nearest graph or tree comprising clusters or buckets through branches and sub-branches of a dataset); wherein each bucket, of the plurality of buckets, includes a unique set of similar data instances from the particular dataset; generating a random sample dataset from the particular dataset by, for each bucket of the plurality of buckets, including, in the random sample dataset, a randomly-selected subset of the unique set of similar data instances in said each bucket (See para. [0154], random sampling (d, d,f) from a set of classes and k-nearest neighbors using the constructed K nearest graph or tree, for example, 20 nearest neighbors of a randomly selected compound from a random sample).
Hence, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Das’s system to include a KD-tree that includes buckets or clusters, as taught by Reymond, in order to locate similar data points in very large databases efficiently (See Reymond, para. [0004]). In addition, both references (Das and Reymond) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as using machine-learned K-nearest neighbor techniques to locate similar data points. This close relation between both references highly suggests an expectation of success.
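For illustration only, the limitation recited in claims 10 and 20 (a KD-Tree whose buckets each hold similar data instances, with a randomly-selected subset drawn from each bucket) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Das, Reymond, or the application: the function names kd_buckets and sample_per_bucket, the median split, the leaf size, and the fixed number of instances drawn per bucket are all hypothetical.

```python
import random

def kd_buckets(points, leaf_size, depth=0):
    """Split points at the median of one coordinate per tree level until
    each leaf (bucket) holds at most leaf_size points. (Hypothetical helper.)"""
    if len(points) <= leaf_size:
        return [points]
    axis = depth % len(points[0])          # cycle through the coordinates
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                # median split
    return (kd_buckets(ordered[:mid], leaf_size, depth + 1)
            + kd_buckets(ordered[mid:], leaf_size, depth + 1))

def sample_per_bucket(buckets, per_bucket, rng):
    """Draw a random subset from every bucket and pool the draws.
    (Assumption: a fixed count per bucket; the claims do not fix the rule.)"""
    sample = []
    for bucket in buckets:
        sample.extend(rng.sample(bucket, min(per_bucket, len(bucket))))
    return sample

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(64)]
buckets = kd_buckets(points, leaf_size=8)
sample = sample_per_bucket(buckets, per_bucket=2, rng=rng)
```

Because every bucket contributes to the sample, the pooled sample covers each region of the feature space that the tree partitions, which is the property the per-bucket language of the claims appears to target.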
6.	Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Das (US 2021/0287136 A1) in view of Reymond (US 2022/0300528 A1) and further in view of Zhu (US 2020/0097775 A1), hereinafter Zhu.

As to claims 5 and 15, Das does not explicitly disclose the ML model is configured to identify anomalous data instances in a test data set.
However, Zhu discloses the ML model is configured to identify anomalous data instances in a test data set (See para. [0005] and para. [0070], identifying test data that has one or more anomalous instances).
Hence, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Das’s system to include anomalous data instances, as taught by Zhu, in order to detect and classify an anomalous feature since the training data may not have been prepared and/or scaled in a large database system (See Zhu, para. [0070]). In addition, all references (Zhu, Das and Reymond) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as using machine-learned algorithms to locate similar data points. This close relation among the references highly suggests an expectation of success.
Claim Rejections - 35 USC § 102
7.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 6-9, 11, 12 and 16-19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Das et al. (US 2021/0287136 A1), hereinafter Das.
Referring to claims 1 and 11, Das discloses a computer-executed method for performing conditional sampling of a particular dataset (See para. [0004], identifying which sampling algorithm, satisfying one or more criteria or conditions, will produce the best results for a given data set), comprising: randomly sampling the particular dataset to generate, in memory, a random sample dataset of the particular dataset (See para. [0024], para. [0026], para. [0027], the system performs re-sampling of the dataset according to a classification model to address potential misclassification of imbalanced data); and
after randomly sampling the particular dataset to identify the random sample dataset (See para. [0024], para. [0026], para. [0028], the system identifies a sample data set which is obtained from a data repository maintained by a classification model system or from a third party provider that maintains sample data sets): identifying a set of nearest neighbor data instances, from the random sample dataset, based on one or more similarities between a target data instance in the particular dataset and the nearest neighbor data instances of the set of nearest neighbor data instances (See para. [0028], the system identifies a set of k-nearest neighbor [kNN] data points in a majority class based on their distance to other points in the same majority class); wherein the set of nearest neighbor data instances has a particular number of data instances (See para. [0027]-para. [0030], the kNN algorithm includes m data points corresponding to the minority class and the desired sampling ratio is an N:1 ratio of majority class data points to minority class data points); wherein each data instance, of the set of nearest neighbor data instances, is one of the particular number of data instances nearest to the target data instance among the data instances of the random sample dataset (See para. [0027] and para. [0028], perform under-sampling of data points in the majority class based on their distance to other points in the same class using a k-nearest neighbors (kNN) algorithm, the kNN algorithm is based on feature similarity, whereby the algorithm is used to determine, based on how closely out-of-sample features resemble a data set, how to classify any given data point. The system uses the kNN algorithm to shortlist k-nearest neighbors in the majority class for every data point in the minority class. Subsequently, the system executing the kNN algorithm may calculate the average distance of these k-nearest neighbors from their respective minority class. The system may maintain only those data points from the majority class whose average distance is the smallest from the minority class. This may result in a sample number of data points of the majority class); and wherein the method is performed by one or more computing devices (See para. [0011] and Figure 1).
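For illustration only, the two steps recited in claims 1 and 11 (randomly sampling the dataset, then identifying a particular number of nearest neighbors of a target data instance within that sample) may be sketched as follows. This is a minimal sketch, not the implementation of Das or of the application: the function name sample_then_nearest, the Euclidean distance, and the sample_fraction parameter are assumptions introduced here.

```python
import math
import random

def sample_then_nearest(dataset, target, sample_fraction, n_neighbors, seed=0):
    """Randomly sample the dataset, then return the sample together with the
    n_neighbors rows of the sample closest to target.
    (Hypothetical helper; Euclidean distance is an assumption.)"""
    rng = random.Random(seed)
    # Keep at least n_neighbors rows so the neighbor set can be filled.
    sample_size = max(n_neighbors, int(len(dataset) * sample_fraction))
    sample = rng.sample(dataset, sample_size)
    # Order the sample by distance to the target and keep the closest rows.
    sample.sort(key=lambda row: math.dist(row, target))
    return sample, sample[:n_neighbors]

dataset = [(float(i), float(i % 7)) for i in range(100)]
target = (50.0, 1.0)
sample, neighbors = sample_then_nearest(
    dataset, target, sample_fraction=0.25, n_neighbors=3
)
```

The neighbor search runs only over the sampled rows, so its cost scales with the sample size rather than with the full dataset; a sample_fraction of 1/k would correspond to the "1/kth" language of claims 6 and 16.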
As to claims 2 and 12, Das discloses generating a set of generated data instances, based on the target data instance, by generating, for each nearest neighbor data instance of the set of nearest neighbor data instances, a corresponding generated data instance comprising: a feature value of said each nearest neighbor data instance for a particular tested feature of the corresponding generated data instance, and feature values of the target data instance for all features of the corresponding generated data instance other than the particular tested feature (See para. [0027] and para. [0028], perform under-sampling of data points in the majority class based on their distance to other points in the same class using a k-nearest neighbors (kNN) algorithm, the kNN algorithm is based on feature similarity, whereby the algorithm is used to determine, based on how closely out-of-sample features resemble a data set, how to classify any given data point. The system uses the kNN algorithm to shortlist k-nearest neighbors in the majority class for every data point in the minority class. Subsequently, the system executing the kNN algorithm may calculate the average distance of these k-nearest neighbors from their respective minority class. The system may maintain only those data points from the majority class whose average distance is the smallest from the minority class. This may result in a sample number of data points of the majority class).
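For illustration only, the generated data instances recited in claims 2 and 12 may be sketched as follows: each generated instance copies the target data instance and takes only the tested feature's value from one nearest neighbor. The function name generate_instances and the tuple representation of a data instance are assumptions introduced here, not details from Das or the application.

```python
def generate_instances(target, neighbors, tested_feature):
    """Build one generated instance per neighbor: the target's values for
    every feature except tested_feature, which comes from the neighbor.
    (Hypothetical helper; instances are modeled as tuples of floats.)"""
    generated = []
    for neighbor in neighbors:
        instance = list(target)
        instance[tested_feature] = neighbor[tested_feature]
        generated.append(tuple(instance))
    return generated

target = (1.0, 2.0, 3.0)
neighbors = [(1.1, 2.5, 2.9), (0.9, 1.5, 3.2)]
generated = generate_instances(target, neighbors, tested_feature=1)
# generated == [(1.0, 2.5, 3.0), (1.0, 1.5, 3.0)]
```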
As to claims 6 and 16, Das discloses wherein the random sample dataset represents 1/kth of the particular dataset (See para. [0027] and para. [0051], the random sample may be determined by selecting a particular ratio).
As to claims 7 and 17, Das discloses wherein the particular number of nearest neighbor data instances in the set of nearest neighbor data instances is one (See para. [0032], another algorithm uses a 1-nearest neighbor rule to iteratively determine if a data point should be removed or not).
As to claims 8 and 18, Das discloses obtaining a first prediction, for the target data instance, using an ML model; obtaining a second prediction, for a particular generated data instance of the set of generated data instances, using the ML model; determining a difference metric that measures a difference between the first prediction and the second prediction; and using the difference metric to determine an importance score for the particular tested feature (See para. [0044]-para. [0046], the classification modeling system 104 provides, to the data processing system 102, an option to use a simple likelihood classification (SLC) algorithm that may be used for the classification of a data set; the SLC algorithm may return a score, for a given input point, ranging between 0 and 1. Using this score, and thresholds defined by the data processing system 102 or other entity, the SLC algorithm may determine how to classify an input point. The score may be interpreted as the probability that the input point belongs to a particular class [e.g. first prediction class or a second prediction class], whereby the closer the score is to a value corresponding to the class, the more likely that the input point belongs to that class. The SLC algorithm is implemented to process imbalanced data, whereby few data points for a particular class may be available within the data set).
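For illustration only, the difference metric and importance score recited in claims 8 and 18 may be sketched as follows. This is a minimal sketch under stated assumptions: toy_model is a hypothetical stand-in for the ML model, and using the mean absolute difference between predictions as the importance score is one possible choice that the claim language does not require.

```python
def toy_model(x):
    """Hypothetical stand-in for an ML model: a fixed linear scorer."""
    return 2.0 * x[0] + 0.5 * x[1]

def importance_score(model, target, generated):
    """Compare the prediction for the target (first prediction) with the
    prediction for each generated instance (second prediction) and average
    the absolute differences. (The averaging rule is an assumption.)"""
    first = model(target)
    differences = [abs(first - model(g)) for g in generated]
    return sum(differences) / len(differences)

target = (1.0, 2.0)
generated_f0 = [(2.0, 2.0), (0.0, 2.0)]   # only feature 0 varied
generated_f1 = [(1.0, 3.0), (1.0, 1.0)]   # only feature 1 varied
score_f0 = importance_score(toy_model, target, generated_f0)  # 2.0
score_f1 = importance_score(toy_model, target, generated_f1)  # 0.5
```

In this toy case, feature 0 receives the larger score because the stand-in model weights it more heavily, so changing it moves the prediction further from the prediction for the target.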
As to claims 9 and 19, Das discloses wherein each nearest neighbor data instance, of the set of nearest neighbor data instances, comprises first one or more feature values for one or more features that are more similar, to second one or more feature values for the one or more features of the target data instance, than data instances in the random sample dataset that are excluded from the set of nearest neighbor data instances (See para. [0027] and para. [0028], perform under-sampling of data points in the majority class based on their distance to other points in the same class using a k-nearest neighbors (kNN) algorithm, the kNN algorithm is based on feature similarity, whereby the algorithm is used to determine, based on how closely out-of-sample features resemble a data set, how to classify any given data point. The system uses the kNN algorithm to shortlist k-nearest neighbors in the majority class for every data point in the minority class. Subsequently, the system executing the kNN algorithm may calculate the average distance of these k-nearest neighbors from their respective minority class. The system may maintain only those data points from the majority class whose average distance is the smallest from the minority class. This may result in a sample number of data points of the majority class).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
	Goyal et al. (US 2022/0222931 A1) discloses an ensemble-learning-based method for binary classification on an imbalanced dataset. The imbalanced dataset has a minority class comprising positive samples and a majority class comprising negative samples. The method includes: generatively oversampling the imbalanced dataset by synthetically generating minority class examples, thereby generating a generated dataset; using the generated dataset to generate subsamples, and learning a base classifier on each of the subsamples to determine a plurality of base classifiers; and learning a weighted majority vote classifier by combining outputs of the base classifiers. Each of the base classifiers is assigned a weight in such a way that a diversity between the base classifiers on the positive samples is minimized.

	Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUK TING CHOI whose telephone number is (571)270-1637. The examiner can normally be reached Monday-Friday 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alford W. Kindred, can be reached at 571-272-4037. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/YUK TING CHOI/Primary Examiner, Art Unit 2153