DETAILED ACTION
This office action is in response to the correspondence filed on 05/29/2019. Claims 1-20 are pending and are examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite identifying a plurality of scores generated by a prediction model, determining a plurality of buckets, assigning each entity to one bucket; generating a probability distribution function based on the plurality of scores and a number of scores belonging to each bucket of the plurality of buckets; determining, based on the probability distribution function and a score corresponding to each entity, a probability of sampling said entity; sampling a subset of the plurality of entities based on the probability determined for each entity.
The limitations of the determining, assigning, and generating steps, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. That is, other than reciting “by one or more computing devices/processors,” nothing in the claim elements precludes the steps from practically being performed 
This judicial exception is not integrated into a practical application. In particular, the claim recites only one additional element: using a processor to perform the identifying, determining, assigning, generating, and sampling steps. The processor in these steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of determining and sampling a subset of the plurality of entities) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processor to perform the identifying, determining, assigning, generating, and sampling steps amounts to no more than mere instructions to 

Claims 11-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because the term "storage media" encompasses signals per se and is thus non-statutory.
Examiner notes that even though the term "storage media" is defined in paragraph [0061] of the specification, the definition does not conclusively exclude all transitory signals: it appears to refer to any non-transitory media that store data and/or instructions, but such storage media may also comprise non-volatile media and/or volatile media, which does not exclude signals. Examiner notes that the phrase "which do not include signals" could be added to the claim term to limit the “storage media” to only statutory subject matter.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4, 6, 8, 10-12, 14, 16, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Grechka et al. (US Pub No. 2019/0034827 A1, referred to as Grechka), in view of Rimoldini (NPL, “Weighted skewness and kurtosis unbiased by sample size”, referred to as Rimoldini).
Regarding claim 1, Grechka discloses,
1. A method comprising:
identifying a plurality of scores generated by a prediction model, each score corresponding to a different entity of a plurality of entities; (Grechka: [0004]; (3) for each of the content items (entities), a machine-learning classification model to generate a score for the content item that indicates a likelihood that the content item is of the class of content.)
determining a plurality of buckets, each bucket corresponding to a different range of scores; (Grechka: [0004]; (4)(a) generating buckets that each is assigned a range of scores from the machine learning classification model.)
for each entity of the plurality of entities, assigning, based on the score corresponding to said each entity, said each entity to one bucket of the plurality of buckets; (Grechka: [0004]; (4)(b) each bucket contains a subset of the content items whose scores fall within the range of scores.)
generating a probability distribution function based on the plurality of scores and a number of scores belonging to each bucket of the plurality of buckets; (Grechka: [0005]; (1) a first probability metric of each of the buckets that indicates a probability that a sampled content item will fall into the bucket (bucket are based on a range of scores).)
for each entity of the plurality of entities… a probability of sampling said each entity; (Grechka: [0004]; (5) determining a sampling rate for each of the buckets that minimizes a variance metric of the estimator, (6) selecting, from each of the buckets, a portion of content items according to the sampling rate of the bucket (sampling rate/probability applies to content items in the buckets).)
sampling a subset of the plurality of entities based on the probability determined for each entity of the plurality of entities; (Grechka: [0004]; (6) selecting, from each of the buckets, a portion of content items according to the sampling rate of the bucket, and (7) sending the portion of content items (sampling a subset based on sampling rate for labeling).)
wherein the method is performed by one or more computing devices. (Grechka: [0009].)
Grechka does not explicitly disclose, however Rimoldini teaches,
…determining, based on the probability distribution function and a score corresponding to said each entity, a probability of sampling said each entity… (Rimoldini: p. 2 of 33: (i) weights (e.g. scores) assign more importance to some data (probabilities) at the expense of other ones, effectively reducing the sample size as results depend mostly on fewer `relevant' measurements.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teachings of Rimoldini into the teachings of Grechka with a motivation to help identify reliable measurements from uncertain or spurious outliers, quantify the relevance of measurements, and enhance targeted features of the data depending on the objectives of the analysis by weighting data (Rimoldini abstract and p. 2 of 33).
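For illustration only, the steps recited in claim 1 (scores, buckets, a distribution over buckets, per-entity sampling probabilities, and sampling) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Grechka, or Rimoldini: the names bucket_index, sampling_probabilities, and sample, the bucket edges, and the rule that spreads a review budget evenly across non-empty buckets are all hypothetical.

```python
import bisect
import random
from collections import Counter

def bucket_index(score, edges):
    """Return the index of the score range (bucket) that contains `score`."""
    i = bisect.bisect_right(edges, score) - 1
    # Clamp so out-of-range scores land in the first or last bucket.
    return min(max(i, 0), len(edges) - 2)

def sampling_probabilities(scores, edges, budget):
    """Map each entity to a bucket and derive a per-entity sampling probability.

    ASSUMPTION: the review budget is split evenly across non-empty buckets,
    so entities in sparsely populated buckets are sampled at a higher rate.
    """
    buckets = {e: bucket_index(s, edges) for e, s in scores.items()}
    counts = Counter(buckets.values())
    total = len(scores)
    # Empirical distribution: fraction of all scores that fall in each bucket.
    pdf = {b: n / total for b, n in counts.items()}
    probs = {
        e: min(1.0, budget / (len(pdf) * total * pdf[b]))
        for e, b in buckets.items()
    }
    return buckets, pdf, probs

def sample(probs, rng):
    """Include each entity independently with its own probability."""
    return [e for e, p in probs.items() if rng.random() < p]

scores = {"a": 0.05, "b": 0.10, "c": 0.15, "d": 0.90}
buckets, pdf, probs = sampling_probabilities(scores, [0.0, 0.5, 1.0], budget=2)
subset = sample(probs, random.Random(0))
```

In this toy input, three of the four scores fall in the lower bucket, so each of those entities receives a lower sampling probability than the single entity in the upper bucket.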


Regarding claims 2 and 12, taking claim 2 as exemplary, the combination of Grechka and Rimoldini discloses,
2. The method of Claim 1, wherein:
Grechka further discloses,
each entity of the plurality of entities is associated with an activity measure; (Grechka: [0004]; (1) selecting an estimator of a prevalence of a class of content within an online system (e.g., a class of content that violates a content policy of the online system), (3) using, for each of the content items, a machine-learning classification model to generate a score for the content item that indicates a likelihood (e.g. activity relating to likelihood of violation of a content policy).)
assigning said each entity to one bucket of the plurality of buckets is further based on the activity measure associated with said each entity. (Grechka: [0004]; (4)(b) each bucket contains a subset of the content items whose scores fall within the range of scores.)


Regarding claims 4 and 14, taking claim 4 as exemplary, the combination of Grechka and Rimoldini discloses,
4. The method of Claim 1, 
Grechka further discloses,
wherein the subset of the plurality of entities corresponds to a set of scores, (Grechka: [0004]; (4)(b) each bucket contains a subset of the content items whose scores fall within the range of scores.)
the method further comprising:
receiving, from one or more human reviewers, feedback indicating which entities in the subset of the plurality of entities are associated with a particular classification; (Grechka: [0004]; (7) sending the portion of content items from each of the buckets to one or more human labelers for labeling. In some examples, the estimator may rely upon labeled content items that have been labeled by the one or more human labelers as being of the class of content.)
based on the feedback, generating an estimate of the number of the plurality of entities that are associated with the particular classification. (Grechka: [0048]; selecting content items to send to human labelers for prevalence estimation purposes.)

Regarding claims 6 and 16, taking claim 6 as exemplary, the combination of Grechka and Rimoldini discloses,
6. The method of Claim 4, wherein:
Grechka further discloses,
each entity of the plurality of entities is associated with an activity measure; (Grechka: [0004]; (1) selecting an estimator of a prevalence of a class of content within an online system (e.g., a class of content that violates a content policy of the online system), (3) using, for each of the content items, a machine-learning classification model to generate a score for the content item that indicates a likelihood that the content item is of the class of content (e.g. activity relating to likelihood of violation of a content policy).)
assigning said each entity to one bucket of the plurality of buckets is further based on the activity measure associated with said each entity, (Grechka: [0004]; (4)(b) each bucket contains a subset of the content items whose scores fall within the range of scores.)
the method further comprising, generating an estimate of activity of entities that are estimated to be associated with the particular classification. (Grechka: [0004]; (3) for each of the content items (entities), a machine-learning classification model to generate a score for the content item that indicates a likelihood that the content item is of the class of content.)


Regarding claims 8 and 18, taking claim 8 as exemplary, the combination of Grechka and Rimoldini discloses,
8. The method of Claim 1, 
Grechka further discloses,
wherein the plurality of scores are below a particular threshold that is associated with the prediction model, wherein each bucket of the plurality of buckets is below the particular threshold. (Grechka: [0040]; a decision-tree based algorithm to recursively split scored content items into buckets and may stop splitting the scored content items of a bucket into additional buckets once the scores of the content items in the bucket reach a uniformity threshold (below the threshold) (e.g., a threshold based on an entropy measurement of the scores of the content items that are contained in the bucket).)


Regarding claims 10 and 20, taking claim 10 as exemplary, the combination of Grechka and Rimoldini discloses,
10. The method of Claim 1, 
Grechka further discloses,
wherein each entity of the plurality of entities is a content item and (Grechka: [0004]; (2) sampling content items from the online system.) each score of the plurality of scores is associated with a likelihood that the corresponding entity is a fraudulent entity, (Grechka: [0004]; (3) using, for each of the content items, a machine-learning classification model to generate a score for the content item that indicates a likelihood that the content item is of the class of content (e.g. activity relating to likelihood of violation of a content policy such as fraudulent entity).) wherein the content item is one of an online article, an online posting, or a job posting. (Grechka: [0004]; content within an online system (e.g. online posting).)


Regarding claim 11, Grechka discloses,
11. One or more storage media storing instructions which, when executed by one or more processors, cause: (Grechka: [0009].)
identifying a plurality of scores generated by a prediction model, each score corresponding to a different entity of a plurality of entities; (Grechka: [0004]; (3) for each of the content items (entities), a machine-learning classification model to generate a score for the content item that indicates a likelihood that the content item is of the class of content.)
determining a plurality of buckets, each bucket corresponding to a different range of scores; (Grechka: [0004]; (4)(a) generating buckets that each is assigned a range of scores from the machine learning classification model.)
for each entity of the plurality of entities, assigning, based on the score corresponding to said each entity, said each entity to one bucket of the plurality of buckets; (Grechka: [0004]; (4)(b) each bucket contains a subset of the content items whose scores fall within the range of scores.)
generating a probability distribution function based on the plurality of scores and a number of scores belonging to each bucket of the plurality of buckets; (Grechka: [0005]; (1) a first probability metric of each of the buckets that indicates a probability that a sampled content item will fall into the bucket (bucket are based on a range of scores).)
for each entity of the plurality of entities… a probability of sampling said each entity; (Grechka: [0004]; (5) determining a sampling rate for each of the buckets that minimizes a variance metric of the estimator, (6) selecting, from each of the buckets, a portion of content items according to the sampling rate of the bucket (sampling rate/probability applies to content items in the buckets).)
sampling a subset of the plurality of entities based on the probability determined for each entity of the plurality of entities. (Grechka: [0004]; (6) selecting, from each of the buckets, a portion of content items according to the sampling rate of the bucket, and (7) sending the portion of content items (sampling a subset based on sampling rate for labeling).)
Grechka does not explicitly disclose, however Rimoldini teaches,
…determining, based on the probability distribution function and a score corresponding to said each entity, a probability of sampling said each entity… (Rimoldini: p. 2 of 33: (i) weights (e.g. scores) assign more importance to some data (probabilities) at the expense of other ones, effectively reducing the sample size as results depend mostly on fewer `relevant' measurements.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teachings of Rimoldini into the teachings of Grechka with a motivation to help identify reliable measurements from uncertain or spurious outliers, quantify the relevance of measurements, and enhance targeted features of the data depending on the objectives of the analysis by weighting data (Rimoldini abstract and p. 2 of 33).


Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Grechka, in view of Rimoldini, further in view of Civil et al. (US Pub No. 2013/0041710 A1, referred to as Civil).
Regarding claims 3 and 13, taking claim 3 as exemplary, the combination of Grechka and Rimoldini discloses,
3. The method of Claim 2, further comprising:
The combination of Grechka and Rimoldini does not explicitly disclose, however Civil teaches,
performing a log transformation of the activity measure, wherein determining the probability of sampling the entity is also based on the log transformation of the activity measure associated with the entity. (Civil: Fig. 1; [0018]; observed data may be transformed if desired. Use the natural logarithm (can use log transformation to samples before additional statistical analysis).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teachings of Civil into the combination of Grechka and Rimoldini with a motivation to reduce the amount of variance if the distribution is skewed by using the natural logarithm of the observed variance to suppress outliers. As a result, the data will have more similarity or symmetry in variance, which may improve the rate of sensitivity for a given rate of false alarms (Civil: [0018]).
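For illustration only, the log transformation of an activity measure recited in claim 3 can be sketched as follows. This is a hedged sketch and not the method of the application or of Civil: the names log_activity and weighted_probabilities are hypothetical, the use of log1p (the natural logarithm of 1 + x, so that zero activity stays defined) is an assumption, and so is the rule that scales a base probability by the transformed activity relative to its mean.

```python
import math

def log_activity(activity):
    """Natural-log transform of an activity measure (ASSUMPTION: log1p)."""
    return math.log1p(activity)

def weighted_probabilities(base_probs, activity, cap=1.0):
    """Scale each base sampling probability by log-transformed activity.

    ASSUMPTION: weight = 1 + ln(1 + activity), normalized by the mean weight,
    so a very active entity is favored without dominating the sample.
    """
    weights = {e: 1.0 + log_activity(activity[e]) for e in base_probs}
    mean_w = sum(weights.values()) / len(weights)
    return {e: min(cap, base_probs[e] * weights[e] / mean_w) for e in base_probs}

base = {"a": 0.2, "b": 0.2}
activity = {"a": 0.0, "b": math.e - 1.0}  # ln(1 + activity) is 0 and about 1
adjusted = weighted_probabilities(base, activity)
```

The transform compresses large activity values, which is the outlier-suppressing effect that the cited motivation attributes to the natural logarithm.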


Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Grechka, in view of Rimoldini, further in view of Zhu et al. (US Pub No. 2013/0185230 A1, referred to as Zhu).
Regarding claims 9 and 19, taking claim 9 as exemplary, the combination of Grechka and Rimoldini discloses,
9. The method of Claim 1, 
The combination of Grechka and Rimoldini does not explicitly disclose, however Zhu teaches,
wherein each entity of the plurality of entities is an account and each score of the plurality of scores is associated with a likelihood that the corresponding entity is a fraudulent entity. (Zhu: [0012]; machine-learning techniques may be used to extract features from training data to distinguish malicious accounts from benign accounts, and to generate a classification model to determine a score on likelihood that the account is malicious (fraudulent entity).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teachings of Zhu into the combination of Grechka and Rimoldini with a motivation to avoid the need for manual identification of malicious accounts, which is costly, labor .



Allowable Subject Matter
Claims 5, 7, 15, and 17 contain allowable subject matter but remain rejected under 35 U.S.C. 101. They are also objected to as being dependent upon rejected base claims, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the stated rejection(s) are resolved.
The following is an examiner’s statement of reasons for allowance: 
Although the prior art references Grechka, Rimoldini, Civil, and Zhu disclose the limitations of the claims rejected above (see rejections above), none of the prior art of record, alone or in combination, discloses computing a ratio for each entity, computing a sum of the set of ratios, and calculating an estimate based on the sum and a number of entities, as described in the claims.
At the effective filing date of the application, the above limitations would not have been obvious over the prior art of record.
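For illustration only, the ratio, sum, and estimate steps named above can be sketched as follows. This office action does not state what the ratio is, so the sketch assumes an inverse-probability weighting in which each reviewed entity contributes its label divided by its sampling probability; the name estimate_from_ratios and the input values are hypothetical, and the claims themselves may define the ratio differently.

```python
def estimate_from_ratios(labels, probs, population_size):
    """Estimate how many entities carry a classification from a reviewed sample.

    ASSUMPTION: ratio = reviewer label (1 or 0) / sampling probability.
    The sum of ratios estimates a count; dividing by the number of entities
    gives an estimated proportion.
    """
    ratios = {e: labels[e] / probs[e] for e in labels}
    total = sum(ratios.values())
    return ratios, total, total / population_size

labels = {"a": 1, "b": 0, "d": 1}        # hypothetical reviewer feedback
probs = {"a": 0.5, "b": 0.5, "d": 1.0}   # hypothetical sampling probabilities
ratios, total, proportion = estimate_from_ratios(labels, probs, population_size=4)
```

Under this assumption, an entity sampled with probability 0.5 stands in for two entities in the population, which is why its ratio is 2.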


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The listed references disclose relevant inventions for detecting fraudulent content or accounts online.
Strauss; Emanuel Alexandre et al.	US-PGPUB	US 20170262635 A1
Lin; Jiun-Ren et al.			US-PGPUB	US 20190036966 A1
Awadallah; Amr et al.			US-PGPUB	US 20100070620 A1

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KA SHAN CHOY whose telephone number is (571) 272-1569.  The examiner can normally be reached on MON - FRI: 9AM-5:30PM EST Alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Joseph Hirl can be reached on (571) 272-3685.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KA SHAN CHOY/Examiner, Art Unit 2435