DETAILED ACTION
This action is responsive to the application filed 12/11/2020.
Claims 1-20 are pending.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-9, 11, 12, and 18-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Cormack, et al., U.S. PGPUB No. 2014/0280238 (“Cormack”).
Regarding independent claim 1, Cormack discloses a method for generating a natural language model, the method comprising:
selecting by one or more processors in a natural language platform, from a pool of documents, a first set of documents to be annotated ([0067]-[0068], a subset of documents may be selected by the initial document set generator.);
for each document in the first set of documents, generating, by the one or more processors, a first human readable prompt configured to elicit an annotation of said document ([0072], a user may review the initial document set in a user interface 1011 at FIG. 5A or the interface 1011 of FIG. 10, to elicit coding decisions for each document.);
receiving annotations of the first set of documents elicited by the first human readable prompts ([0072], the user may determine a class and/or subclass to be associated with each document in the initial document set.);
training, by the one or more processors, a natural language model using the annotated first set of documents ([0066] and [0073], generating classifiers using the user coding decisions.);
determining, by the one or more processors, documents in the pool having uncertain natural language processing results according to the trained natural language model and/or the received annotations; selecting by the one or more processors, from the pool of documents, a second set of documents to be annotated comprising one or more of the documents having uncertain natural language processing results ([0137], selecting documents scored close to one or more calculated threshold values, i.e., through uncertainty sampling.);
for each document in the second set of documents, generating, by the one or more processors, a second human readable prompt configured to elicit an annotation of said document ([0138], after selection by an active learning module, the document may be transmitted to a user for review, and an active learning interface provided to the user.);
receiving annotations of the second set of documents elicited by the second human readable prompts ([0143], a user coding decision is received from the user.); and
retraining, by the one or more processors, a natural language model using the annotated second set of documents ([0143], the active learning module updates the classifiers.).
Independent claim 19 is directed towards an apparatus equivalent to the method of independent claim 1, and is therefore similarly rejected.
Independent claim 20 is directed towards a non-transitory computer readable medium equivalent to the method of independent claim 1, and is therefore similarly rejected.
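For illustration only, the sequence of steps mapped above for claim 1 (annotate a first set, train, select uncertain documents, annotate a second set, retrain) can be sketched as a generic active learning loop with uncertainty sampling. The sketch below is not drawn from Cormack or from the application; the toy word-count model, the function names, and the `oracle` callback standing in for the human reviewer's coding decision are all assumptions made for the example.

```python
from collections import Counter

def train(annotated):
    """Toy model (assumption): per-word counts for relevant vs. non-relevant documents."""
    pos, neg = Counter(), Counter()
    for text, label in annotated:
        (pos if label else neg).update(text.lower().split())
    return pos, neg

def score(model, text):
    """Estimated relevance in [0, 1]; 0.5 means the model has no or balanced evidence."""
    pos, neg = model
    words = text.lower().split()
    p = sum(pos[w] for w in words)
    n = sum(neg[w] for w in words)
    return 0.5 if p + n == 0 else p / (p + n)

def most_uncertain(model, pool, k):
    """Uncertainty sampling: the k pool documents scored closest to the 0.5 threshold."""
    return sorted(pool, key=lambda d: abs(score(model, d) - 0.5))[:k]

def active_learning(pool, oracle, seed_size=2, batch=1, rounds=2):
    """oracle(doc) -> bool is a hypothetical stand-in for the prompted human annotation."""
    pool = list(pool)
    # First set: here simply the first seed_size documents in the pool.
    annotated = [(d, oracle(d)) for d in pool[:seed_size]]
    pool = pool[seed_size:]
    model = train(annotated)
    for _ in range(rounds):
        if not pool:
            break
        # Second set: documents with the most uncertain results under the current model.
        second = most_uncertain(model, pool, batch)
        annotated += [(d, oracle(d)) for d in second]
        pool = [d for d in pool if d not in second]
        model = train(annotated)  # retrain on all annotations received so far
    return model, annotated
```

In this sketch a document such as "contract lunch", which mixes words seen in relevant and non-relevant seed documents, scores exactly 0.5 and is therefore the first one routed back to the reviewer.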
Regarding claim 2, Cormack discloses wherein the annotations of the first and second sets of documents comprise classification of the documents into one or more categories among a plurality of categories. [0079] discloses indicating a coding decision comprises classification of the documents into 3 or 4 categories, e.g., relevant, non-relevant, unsure/skip, flag.
Regarding claim 3, Cormack discloses wherein the annotations of the first and second sets of documents comprise selection of one or more portions of the documents relevant to one or more topics. [0044] discloses classifying documents as members of classes or subclasses, or issues, also [0160] discloses classification involves subject matter.
Regarding claim 4, Cormack discloses wherein the steps of determining documents having uncertain natural language processing results; selecting a second set of documents to be annotated; generating a second human readable prompt; receiving annotations of the second set of documents; and retraining a natural language model using the annotated second set of documents, are repeated until the trained model has reached a predetermined performance level. [0177] discloses iteration of active learning or training repeated until maximum agreement percentage reaches an acceptable level.
Regarding claim 5, Cormack discloses wherein selecting the first set of documents comprises selecting documents that are evenly distributed among different document types. [0068] discloses a subset of documents from the document collection may be randomly selected to create the initial document set.
Regarding claim 6, Cormack discloses wherein selecting the first set of documents comprises selecting at least one document within each of a plurality of machine-discovered topics. [0008] discloses unsupervised learning methods that cluster or group together documents purportedly pertaining to the same subject matter, without human intervention, and [0135] discloses, in certain embodiments, using the unsupervised learning techniques in selecting documents.
Regarding claim 7, Cormack discloses wherein selecting the first set of documents comprises selecting documents based on a keyword search. [0069] discloses the initial document set generator executing a keyword search operation.
Regarding claim 8, Cormack discloses wherein selecting the first set of documents comprises selecting documents based on confidence levels generated by analysis of the documents by one or more pre-existing natural language models. [0070] discloses only those documents above a certain rank or score may be added to an initial document set.
Regarding claim 9, Cormack discloses wherein selecting the first set of documents comprises a manual selection. [0068] discloses a manual selection process including entering keywords, rules or regular expressions to identify documents from the collection meeting the specified criteria, and all or some of the identified documents may be added to an initial document set.
Regarding claim 11, Cormack discloses wherein selecting the first set of documents comprises selecting documents based on features contained therein. Cormack uses a document information profile as analogous to the term feature at [0091], discloses generating features at [0096]-[0106], and uses them for the initial document set at [0107].
Regarding claim 12, Cormack discloses wherein determining documents having uncertain natural language processing results is based on confidence levels generated by analysis of the documents by the trained model. [0180] discloses selecting a random sample of those documents above a confidence level threshold, and a random sample of those documents below the threshold, as determined by a previous classification effort, and presenting the selected documents to a user.
Regarding claim 18, Cormack discloses wherein the second human readable prompt is configured to elicit a true-or-false answer aimed at resolving uncertainty in the natural language processing results. [0180] discloses the classification system presenting the selected documents to a user to elicit a coding decision whether those documents are relevant or non-relevant in a user interface described at [0086].

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Cormack, in view of Payette, et al., U.S. Patent No. 7,730,113 (“Payette”).
Regarding claim 10, Cormack discloses all the limitations of independent claim 1.
However, Cormack does not explicitly teach wherein selecting the first set of documents comprises removing exact duplicates and/or near duplicates from the first set of documents.
Payette is in the same field of electronic discovery (Payette, at column 4, lines 42-50) and teaches determining that a certain electronic file is a duplicate, whereupon the duplicate electronic file may be discarded, obsoleted, or removed (id. at column 19, lines 7-20).
Accordingly, it would have been obvious to one of ordinary skill in the art to modify Cormack's method of providing efficient active learning tools that classify and rank each one of a plurality of documents in a collection of electronically stored information with removing duplicates from electronic files as taught by Payette. One of ordinary skill in the art would have been motivated to make such a modification because a reviewing attorney may produce the same electronic document marked as “privileged” by another attorney, thereby inadvertently waiving privilege. To reduce the chances of such errors from occurring, the reviewing legal .
Claims 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over Cormack in view of Brian Dolan, US 20120303559 A1 (hereinafter, Dolan).
Regarding claim 13, Cormack discloses all the limitations of independent claim 1.
However, Cormack does not explicitly teach further comprising training, by the one or more processors, a plurality of additional natural language models using the annotated first set of documents, wherein determining documents having uncertain natural language processing results is based on a level of disagreement among the plurality of additional natural language models.
Dolan is in the same field of creating and training computer-based discovery avatars (Dolan, at ¶ [0037]) and teaches training the computer-based discovery avatars, wherein each discovery avatar manages a queue of data stream elements to aid at least one human analyst who is conducting an investigation, including tokenizing source data within a data stream presented to an analyst so that the source data may be extracted based on a topic, such that with each cycle of avatar formation, queuing, and analyst rating, the avatar increasingly reflects the human ratings (id. at ¶ [0039]). In case of disagreement on document labeling among a plurality of avatars, the avatar sends a red flag to examine a record and compare it to the responses of other team members (id. at ¶ [0079]).
Accordingly, it would have been obvious to one of ordinary skill in the art to modify Cormack's method of providing efficient active learning tools that classify and rank each one of 
Regarding claim 14, Cormack in view of Dolan discloses all the limitations of independent claim 1 and its dependent claim 13. However, Cormack does not explicitly teach wherein the level of disagreement is determined by assigning more weight to models with better known performance levels than models with worse known performance levels.
Dolan is in the same field of creating and training computer-based discovery avatars (Dolan, at ¶ [0037]) and teaches assigning more weight to avatars of a more experienced person than to avatars of a less experienced or junior person in contributing to an analytic investigation (id. at ¶ [0042]).
Accordingly, it would have been obvious to one of ordinary skill in the art to modify Cormack's method of providing efficient active learning tools that classify and rank each one of a plurality of documents in a collection of electronically stored information with assigning more weight to avatars of more experienced person than avatars of less experienced or junior person in contributing to analytic investigation as taught by Dolan. One of ordinary skill in the art would have been motivated to make such modification because each avatar represents an 
Regarding claim 15, Cormack discloses all the limitations of independent claim 1.
However, Cormack does not explicitly teach wherein determining documents having uncertain natural language processing results is based on a level of disagreement among more than one annotator.
Dolan is in the same field of creating and training computer-based discovery avatars (Dolan, at ¶ [0037]) and teaches alerting users of constant disagreement when there is disagreement among more than one annotator (id. at ¶ [0078]).
Accordingly, it would have been obvious to one of ordinary skill in the art to modify Cormack's method of providing efficient active learning tools that classify and rank each one of a plurality of documents in a collection of electronically stored information with training computer-based discovery avatars using the annotated documents, wherein determining documents having uncertain processing results is based on a level of disagreement among the plurality of additional avatars as taught by Dolan. One of ordinary skill in the art would have been motivated to make such modification because annotators are vulnerable to human error (Dolan, at ¶ [0078]).
Regarding claim 16, Cormack in view of Dolan discloses all the limitations of independent claim 1 and its dependent claim 15. However, Cormack does not explicitly teach wherein the level of disagreement is determined by assigning more weight to annotators with better known performance levels than annotators with worse known performance levels.
id. at ¶ [0042]).
Accordingly, it would have been obvious to one of ordinary skill in the art to modify Cormack's method of providing efficient active learning tools that classify and rank each one of a plurality of documents in a collection of electronically stored information with assigning more weight to annotators with better known experience levels than to annotators with lower experience levels as taught by Dolan. One of ordinary skill in the art would have been motivated to make such a modification because collective ratings may benefit from the knowledge of very skilled expert annotators according to the ratings of those annotators (Dolan, at ¶ [0040]).
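For illustration only, the weighting recited in claims 14 and 16 (a level of disagreement in which better-performing models or annotators count for more) can be sketched as a weighted vote. This is a minimal example under stated assumptions, not the method of Dolan or of the application: the function name, the binary labels, and the particular formula (1.0 for an evenly split weighted vote, 0.0 for a unanimous one) are hypothetical choices.

```python
def weighted_disagreement(votes, weights):
    """Disagreement level for one document.

    votes:   dict mapping annotator (or model) name -> bool label
    weights: dict mapping the same names -> non-negative weight,
             e.g. higher for better known performance (assumption)
    Returns a value in [0, 1]: 0.0 if the weighted vote is unanimous,
    1.0 if the weighted vote is split evenly.
    """
    total = sum(weights[name] for name in votes)
    if total == 0:
        return 0.0
    yes = sum(weights[name] for name, label in votes.items() if label)
    share = yes / total  # weighted fraction voting "relevant"
    return 1.0 - abs(2.0 * share - 1.0)

# Usage: one senior reviewer (weight 2) disagrees with two junior reviewers (weight 1 each).
votes = {"senior": True, "junior1": False, "junior2": False}
level = weighted_disagreement(votes, {"senior": 2, "junior1": 1, "junior2": 1})
```

With equal weights the same votes give a lower disagreement level, because the two junior reviewers form a clear majority; weighting the senior reviewer more heavily turns the same votes into an even split.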
Regarding claim 17, Cormack in view of Dolan discloses all the limitations of independent claim 1 and its dependent claim 15. Cormack discloses wherein selecting the second set of documents comprises selecting documents similar to documents (Cormack, at ¶ [0135], discloses algorithms that select similar documents having a similar score that were coded similarly in the same class or subclass or documents have the distance is inside.). However, Cormack does not explicitly teach that have a high level of disagreement among more than one annotator. Dolan teaches selecting mislabeled documents among more than one annotator (Dolan, at ¶ [0078]).
Accordingly, it would have been obvious to one of ordinary skill in the art to modify Cormack's method of providing efficient active learning tools that classify and rank each one of a plurality of documents in a collection of electronically stored information, and selecting 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH D BLOOMQUIST whose telephone number is (571)270-7718.  The examiner can normally be reached on M-F, 8:30-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached on 571-272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KEITH D BLOOMQUIST/Primary Examiner, Art Unit 2178
8/11/2021