DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to communications: Application filed on 10/07/2020.
Claims 1-20 are pending. Claims 1, 8, and 15 are independent.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Siracusa et al. (US2020/0380301) in view of Zuo et al. (US2020/0184340).

In regards to claim 1, Siracusa et al. substantially discloses a computer-implemented method, the method comprising: 
classifying, by one or more processors, documents of a first subset of the set of documents based, at least in part, on a text analysis of the documents of the first subset (Siracusa et al. fig. 1, para[0036]-[0037], uses text analysis on a first subset of documents (106)); 
training, by one or more processors, a document classifier using, as training data: (i) results of the classifying of the documents of the first subset, and (ii) metadata associated with the documents of the first subset (Siracusa et al. para[0071], trains model using analysis of first subset and metadata associated with the first subset of documents); and 
classifying, by one or more processors, documents of a second subset of the set of documents by providing metadata of the documents of the second subset to the trained document classifier (Siracusa et al. para[0097], classifies second subset of documents (108) using trained classifier).  
Siracusa et al. does not explicitly disclose identifying, by one or more processors, a set of documents for classification.
However, Zuo et al. substantially discloses identifying, by one or more processors, a set of documents for classification (Zuo et al. para[0023], identifies a set of input documents).
It would have been obvious to one of ordinary skill in the art before the filing date of the invention to have combined the teachings of Siracusa et al. with the hybrid model of Zuo et al. in order to train a model to identify documents that provide a false positive (Zuo et al. para[0003]).
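
For illustration only, the three steps recited in claim 1 may be sketched as follows. The sketch is not drawn from Siracusa et al., Zuo et al., or the application. The sample documents, the keyword rule standing in for text analysis, the vote-counting metadata classifier, and all function names are hypothetical assumptions chosen to keep the example small.

```python
from collections import Counter, defaultdict

# Hypothetical document set; each document has text and metadata.
DOCS = [
    {"id": 1, "text": "customer ssn on file", "meta": {"ext": "csv", "dept": "hr"}},
    {"id": 2, "text": "quarterly roadmap slides", "meta": {"ext": "pptx", "dept": "eng"}},
    {"id": 3, "text": "employee ssn list", "meta": {"ext": "csv", "dept": "hr"}},
    {"id": 4, "text": "payroll export", "meta": {"ext": "csv", "dept": "hr"}},
    {"id": 5, "text": "design review", "meta": {"ext": "pptx", "dept": "eng"}},
]

def classify_by_text(doc):
    """Assumed stand-in for text analysis: flag documents mentioning 'ssn'."""
    return "non-compliant" if "ssn" in doc["text"].lower() else "compliant"

def train_metadata_classifier(docs, labels):
    """Count how often each (metadata key, value) pair co-occurs with each label."""
    counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for item in doc["meta"].items():
            counts[item][label] += 1
    return counts

def classify_by_metadata(model, doc, default="compliant"):
    """Sum the label counts of the document's metadata pairs; pick the top label."""
    votes = Counter()
    for item in doc["meta"].items():
        votes.update(model.get(item, {}))
    return votes.most_common(1)[0][0] if votes else default

# Step 1: classify the first subset by text analysis.
first_subset, second_subset = DOCS[:3], DOCS[3:]
text_labels = [classify_by_text(d) for d in first_subset]

# Step 2: train on (i) the text-analysis results and (ii) the metadata.
model = train_metadata_classifier(first_subset, text_labels)

# Step 3: classify the second subset from metadata alone (text is not read).
predictions = [classify_by_metadata(model, d) for d in second_subset]
print(predictions)
```

In this sketch the second subset is labeled without reading document text, which is the distinction the claim draws between the first and second classifying steps.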

In regards to claim 2, Siracusa et al. as modified by Zuo et al. substantially discloses the computer-implemented method of claim 1, further comprising: 
further classifying, by one or more processors, the documents of the second subset based, at least in part, on a text analysis of the documents of the second subset (Siracusa et al. para[0111]); 
comparing, by one or more processors, results of the classifying of the documents of the second subset and results of the further classifying of the documents of the second subset (Siracusa et al. para[0111]); and 
further training, by one or more processors, the document classifier based, at least in part, on the comparing (Siracusa et al. para[0108]).  

In regards to claim 3, Siracusa et al. as modified by Zuo et al. substantially discloses the computer-implemented method of claim 2, further comprising: 
determining, by one or more processors, whether an exit criterion for training the document classifier has been met (Siracusa et al. para[0119]).  

In regards to claim 4, Siracusa et al. as modified by Zuo et al. substantially discloses the computer-implemented method of claim 3, further comprising: in response to determining that the exit criterion has been met, classifying, by one or more processors, the remaining documents of the set of documents by providing metadata of the remaining documents to the further trained document classifier (Siracusa et al. para[0119]).  

In regards to claim 5, Siracusa et al. as modified by Zuo et al. substantially discloses the computer-implemented method of claim 3, further comprising: 
in response to determining that the exit criterion has not been met: classifying, by one or more processors, documents of a third subset of the set of documents by providing metadata of the documents of the third subset to the further trained document classifier (Siracusa et al. para[0110]); 
further classifying, by one or more processors, the documents of the third subset based, at least in part, on a text analysis of the documents of the third subset (Siracusa et al. para[0111]); 
comparing, by one or more processors, results of the classifying of the documents of the third subset and results of the further classifying of the documents of the third subset (Siracusa et al. para[0111]); and 
further training, by one or more processors, the document classifier based, at least in part, on the comparing of the results of the classifying of the documents of the third subset and the results of the further classifying of the documents of the third subset (Siracusa et al. para[0108]).  
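
For illustration only, the iterative refinement recited in claims 2-5 may be sketched as a loop over successive subsets. The sketch is not drawn from either reference or from the application. In particular, the exit criterion (an agreement rate between the metadata-based and text-based results reaching a threshold), the choice to retrain on the text-analysis labels, and every name and sample document below are hypothetical assumptions.

```python
from collections import Counter, defaultdict

def classify_by_text(doc):
    """Assumed stand-in for text analysis: flag documents mentioning 'ssn'."""
    return "non-compliant" if "ssn" in doc["text"].lower() else "compliant"

def update_model(model, docs, labels):
    """Add label counts for each (metadata key, value) pair."""
    for doc, label in zip(docs, labels):
        for item in doc["meta"].items():
            model[item][label] += 1

def classify_by_metadata(model, doc, default="compliant"):
    votes = Counter()
    for item in doc["meta"].items():
        votes.update(model.get(item, {}))
    return votes.most_common(1)[0][0] if votes else default

def refine(model, subsets, threshold=1.0):
    """Process non-empty subsets in order; return how many were consumed."""
    for used, subset in enumerate(subsets, start=1):
        meta_labels = [classify_by_metadata(model, d) for d in subset]
        text_labels = [classify_by_text(d) for d in subset]
        # Compare the two sets of results for this subset.
        agreement = sum(m == t for m, t in zip(meta_labels, text_labels)) / len(subset)
        # Further train using the text-analysis labels (one possible reading).
        update_model(model, subset, text_labels)
        # Assumed exit criterion: stop once agreement reaches the threshold.
        if agreement >= threshold:
            return used
    return len(subsets)

def doc(text, ext):
    return {"text": text, "meta": {"ext": ext}}

# Initial training on a first subset labeled by text analysis.
model = defaultdict(Counter)
first = [doc("ssn list", "csv"), doc("ssn table", "csv"), doc("roadmap", "pptx")]
update_model(model, first, [classify_by_text(d) for d in first])

second = [doc("ssn export", "txt"), doc("slides", "pptx")]
third = [doc("ssn backup", "txt"), doc("ssn report", "csv")]
used = refine(model, [second, third])

# Exit criterion met: classify the remaining documents from metadata alone.
remaining = [doc("unread", "txt"), doc("unread", "pptx")]
remaining_labels = [classify_by_metadata(model, d) for d in remaining]
print(used, remaining_labels)
```

Here the second subset yields only partial agreement, so the loop continues to the third subset, where the two classifications agree and the remaining documents are then classified by metadata only.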

In regards to claim 6, Siracusa et al. as modified by Zuo et al. substantially discloses the computer-implemented method of claim 1, wherein the trained document classifier classifies one or more documents of the set of documents as non-compliant (Zuo et al. para[0024]).  
It would have been obvious to one of ordinary skill in the art before the filing date of the invention to have combined the teachings of Siracusa et al. with the hybrid model of Zuo et al. in order to train a model to identify documents that provide a false positive (Zuo et al. para[0003]).

In regards to claim 7, Siracusa et al. as modified by Zuo et al. substantially discloses the computer-implemented method of claim 6, further comprising: remediating, by one or more processors, the one or more documents classified as non-compliant by: (i) purging the one or more documents classified as non-compliant from the set of documents, (ii) storing the one or more documents classified as non-compliant to a different location than the set of documents, and (iii) informing owners of the one or more documents classified as non-compliant that the owners own a non-compliant document (Zuo et al. para[0036]-[0037] and [0050]).
It would have been obvious to one of ordinary skill in the art before the filing date of the invention to have combined the teachings of Siracusa et al. with the hybrid model of Zuo et al. in order to train a model to identify documents that provide a false positive (Zuo et al. para[0003]).

Claims 8-14 recite substantially similar limitations to claims 1-7. Thus, claims 8-14 are rejected under the same rationale as claims 1-7.

Claims 15-20 recite substantially similar limitations to claims 1-6. Thus, claims 15-20 are rejected under the same rationale as claims 1-6.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Ghulati et al. (US2021/0019339) teaches a method of dividing and labeling a set of documents to train a machine learning model.
Guastella et al. (US2019/0258985) teaches separating data into a training set, a validation set, and a test set.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS HASTY whose telephone number is (571)270-7775. The examiner can normally be reached Monday-Friday 8:30am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached at (571)272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/N.H/Examiner, Art Unit 2178
/STEPHEN S HONG/Supervisory Patent Examiner, Art Unit 2178