DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/12/2019 is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 7, 8, 10, 13, 14, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al., “Representative Sampling for Text Classification Using Support Vector Machines” (published in 25th European Conference on Information Retrieval Research, ECIR 2003, Vol. 2633, pages 393-407, January 2003) in view of Tsuchida et al., U.S. Publication No. 2012/0030157.

Regarding claim 7, Xu teaches a computer system for training a machine learning model (see Xu Abstract) configured to 

generate a vector representation for each document in a collection of documents (see section 3.1, first paragraph and step 2); 

cluster the documents based on the vector representations of the documents to produce a plurality of clusters (see section 3.1, step 3); 

produce a training set by selecting one or more documents from each cluster, wherein the selected documents represent a sample of the collection of documents to train the machine learning model (see section 3.1, steps 3 and 4); and 

train the machine learning model by applying the training set to the machine learning model (see section 3.1, final paragraph).
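
For reference, the four recited operations (vectorize, cluster, select from each cluster, train) can be sketched as below. This is a minimal illustration under stated assumptions, not the algorithm of Xu and not the applicant's implementation: the term-count vectors, the plain k-means routine, the nearest-to-centroid selection rule, and the toy nearest-centroid model, along with every function name, are hypothetical stand-ins chosen only to make the sequence of steps concrete.

```python
import math
import random
from collections import Counter


def vectorize(docs):
    """Map each document to a term-count vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means (an assumed clustering choice); returns (assignments, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every document to its nearest centroid.
        assign = [min(range(k), key=lambda c: distance(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = mean(members)
    return assign, centroids


def select_training_set(vectors, assign, centroids):
    """Pick, for each non-empty cluster, the document index nearest its centroid."""
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            chosen.append(min(members, key=lambda i: distance(vectors[i], centroid)))
    return chosen


def train_nearest_centroid(vectors, labels):
    """Toy stand-in for the model: one mean vector per label."""
    by_label = {}
    for v, y in zip(vectors, labels):
        by_label.setdefault(y, []).append(v)
    return {y: mean(vs) for y, vs in by_label.items()}
```

Calling `vectorize`, then `kmeans`, then `select_training_set` yields one document index per cluster; labeling those documents and passing them to `train_nearest_centroid` completes the sequence.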

Xu does not expressly teach wherein the computer system comprises one or more computer processors; one or more computer readable storage media; program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising instructions to perform the above steps.

However, Tsuchida, in a similar invention in the same field of endeavor, teaches a computer system for training a machine learning model (see Tsuchida Figure 12 and Abstract) configured to cluster documents (see Figure 1, unit 21 and paragraph [0028]) and produce a training set of documents (see Figure 1, unit 22), as taught in Xu, wherein the computer system comprises

one or more computer processors (see Figure 12, processor 300); one or more computer readable storage media (see Figure 12, storage medium 302); program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising instructions to perform the above steps (see claim 22).

One of ordinary skill in the art before the effective filing date of the invention would have found it obvious to combine the teaching of using a processor and memory storing instructions to perform a training method as taught in Tsuchida with the system taught in Xu, the motivation being to automate the training method thereby increasing its speed and efficiency.

Independent claims 1 and 13 recite similar limitations as claim 7, and are rejected under similar rationale.

Regarding claim 8, Xu in view of Tsuchida teaches all the limitations of claim 7, and further teaches wherein the program instructions to select the one or more documents from each cluster further comprise instructions to: 

select the one or more documents from each cluster over a plurality of iterations based on selection criteria (see Xu section 3.1, step 3 referring to the k medoid documents and step 5).

Xu in view of Tsuchida does not expressly teach wherein the selection criteria for each iteration include one of closest to a centroid, farthest from the centroid, and a random selection.

However, one of ordinary skill in the art before the effective filing date of the invention would have found it obvious as a matter of simple substitution to replace the selection criteria taught in Xu in view of Tsuchida with the claimed selection criteria to yield the predictable results of successfully selecting training documents for the system.
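
The three claimed selection criteria can be illustrated with the short sketch below. It is offered only to make the criteria concrete and assumes details the record does not supply: the function name `select_from_cluster`, the string labels for the criteria, the rule that a document is not picked twice, and the use of a centroid computed once per cluster are all hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of the cluster's vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def select_from_cluster(points, criteria, seed=0):
    """Return indices picked from one cluster, one per iteration.

    `criteria` is a sequence of "closest", "farthest", or "random".
    A document that has already been picked is not picked again.
    """
    rng = random.Random(seed)
    center = centroid(points)
    remaining = list(range(len(points)))
    picked = []
    for criterion in criteria:
        if not remaining:
            break
        if criterion == "closest":
            i = min(remaining, key=lambda j: math.dist(points[j], center))
        elif criterion == "farthest":
            i = max(remaining, key=lambda j: math.dist(points[j], center))
        elif criterion == "random":
            i = rng.choice(remaining)
        else:
            raise ValueError(f"unknown criterion: {criterion}")
        picked.append(i)
        remaining.remove(i)
    return picked
```

Each criterion is a different rule applied to the same set of cluster members, so switching between them changes only the rule used to pick the next index.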

Claims 2 and 14 recite similar limitations as claim 8, and are rejected under similar rationale.

Regarding claim 10, Xu in view of Tsuchida teaches all the limitations of claim 7, and further teaches wherein the vector representation for each document is generated based on dictionaries with keywords, and wherein the vector representation for each document indicates terms of the document in the dictionaries (see Xu section 4.1, first paragraph).
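
A dictionary-based vector of the kind recited can be sketched as follows. This is an assumed reading for illustration only: the binary present/absent encoding, the alphabetical ordering of dictionaries and keywords, the simple tokenizer, and the name `dictionary_vector` are hypothetical and are not taken from Xu or from the application.

```python
import re


def dictionary_vector(text, dictionaries):
    """Return one 0/1 entry per dictionary keyword, in a fixed order.

    `dictionaries` maps a dictionary name to its keywords. An entry is 1
    when the keyword occurs as a token in the document, otherwise 0.
    """
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    vector = []
    for name in sorted(dictionaries):
        for keyword in sorted(dictionaries[name]):
            vector.append(1 if keyword in tokens else 0)
    return vector


dictionaries = {
    "finance": ["bond", "stock"],
    "sports": ["goal", "match"],
}
# Entry order: bond, stock, goal, match.
print(dictionary_vector("Stock prices fell after the match.", dictionaries))
# prints [0, 1, 0, 1]
```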

Claims 4 and 16 recite similar limitations as claim 10, and are rejected under similar rationale.

Claims 3, 6, 9, 12, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al., “Representative Sampling for Text Classification Using Support Vector Machines” (published in 25th European Conference on Information Retrieval Research, ECIR 2003, Vol. 2633, pages 393-407, January 2003) in view of Tsuchida et al., U.S. Publication No. 2012/0030157 and English et al., U.S. Publication No. 2018/0232380.

Regarding claim 9, Xu in view of Tsuchida teaches all the limitations of claim 7, but does not expressly teach wherein the vector representation for each document is generated by applying a latent Dirichlet allocation model, and wherein the vector representation for each document indicates associations of the document to corresponding topics.

However, English, in a similar invention in the same field of endeavor, teaches a computer system (see English Figure 2) configured to create a vector representation for documents (see paragraph [0024]), as taught in Xu in view of Tsuchida, wherein

the vector representation for each document is generated by applying a latent Dirichlet allocation model, and wherein the vector representation for each document indicates associations of the document to corresponding topics (see paragraph [0024]).


One of ordinary skill in the art before the effective filing date of the invention would have found it obvious as a matter of simple substitution to replace the generic method of generating a vector representation of documents taught in Xu in view of Tsuchida with that taught in English to yield the predictable results of successfully converting the documents to vectors.

Claims 3 and 15 recite similar limitations as claim 9, and are rejected under similar rationale.

Regarding claim 12, Xu in view of Tsuchida and English teaches all the limitations of claim 9, and further teaches instructions to: 

label the selected documents with a corresponding topic based on the indicated associations to corresponding topics in response to an absence of topic labels for the selected documents (see section 3.1, step 4 and section 4.2, second paragraph).
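
One way to read the labeling limitation is sketched below, purely as an illustration. It assumes that each selected document already has a vector of topic weights (for example, as a topic model might produce), that a missing label is represented by `None`, and that the label assigned is the highest-weight topic. The function name `label_by_dominant_topic` and these conventions are hypothetical and do not come from the cited references.

```python
def label_by_dominant_topic(topic_vectors, topic_names, labels):
    """Fill in missing labels (None) with the name of the highest-weight topic.

    Documents that already carry a label keep it unchanged.
    """
    result = []
    for weights, label in zip(topic_vectors, labels):
        if label is None:
            best = max(range(len(weights)), key=weights.__getitem__)
            label = topic_names[best]
        result.append(label)
    return result
```

For example, a document with topic weights `[0.7, 0.2, 0.1]` and no existing label would receive the first topic's name, while a document that already has a label is left as it is.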

Claims 6 and 18 recite similar limitations as claim 12, and are rejected under similar rationale.

Allowable Subject Matter
Claims 5, 11, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

NOTE: Computer readable medium (CRM) claims 13-18 were not rejected under 35 USC § 101 because paragraph [0064] of the application explicitly excludes non-eligible signal embodiments of CRM.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CASEY L KRETZER whose telephone number is (571)272-5639. The examiner can normally be reached M-F, 10:00 AM-7:00 PM PDT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DAVID C PAYNE, can be reached at (571)272-3024. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CASEY L KRETZER/Examiner, Art Unit 2637