Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action
This Office Action is responsive to the Amendments and Remarks filed on 28 September 2021.  As directed by the Amendments, claims 1, 4, and 5 have been amended.  Claims 1-5 are pending in the application.


Response to Arguments
The arguments presented on pages 6-8 of the Remarks filed on 28 September 2021 have been fully considered by the Examiner, but are not persuasive.

On page 7 of the Remarks, the Applicant states: 
    [Image: media_image1.png (excerpt from the Applicant's Remarks)]


	The Examiner respectfully disagrees.  While Baxter discloses selecting or choosing “[a] number of documents” from the ranked query results to form “further training document subset D1”, the Examiner contends that this “number of documents” may be one, and that the “document subset D1” may contain a single training document.
	Figure 3 of Baxter depicts the iterative expansion of a “Training Set” for a machine learning model, wherein the “Training Set” for each time step (after initial Time Step 0) is comprised of the “Training Set” from the previous time step, augmented with the “Query Results” R from the previous time step.  For example, element 321 of Figure 3 depicting the Training Set Dt for time step t shows that the Training Set Dt is comprised of the Training Set Dt-1 and the subset of ranked query results Rt-1.  
Baxter discloses no restrictions on the “number of documents” to be selected to augment the training set in each time step, or on the size of the resulting subset R.  In the case where the “number of documents” is one and the subset R for each time step therefore consists of a single training document selected from the ranked query results, the effect is that during each iteration, a single additional training document is added to the previous Training Set.  As each iteration produces a new, expanded “Training Set”, this process reads on the amended claim language generating a plurality of teacher data sets (the multiple “Training Sets” generated by Baxter, one per time step) obtained by adding a teacher data element, one by one, (the subset R of ranked query results added to the Training Set for the subsequent time step may be a single result) from the plurality of teacher data elements (the ranked query results generated at each time step by the classifier 302/312/322).
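For illustration only, the single-document case described above can be sketched as follows. This is a minimal sketch prepared for this discussion and is not code from Baxter; the function name expand_training_sets, the parameter per_step, and the document identifiers are hypothetical.

```python
def expand_training_sets(initial_set, ranked_results_per_step, per_step=1):
    """Return the training set after every time step, where D_t = D_{t-1} + R_{t-1}.

    per_step is the "number of documents" taken from the ranked results;
    the value 1 is the single-document case (an assumption for illustration).
    """
    training_sets = [list(initial_set)]            # D_0
    for ranked in ranked_results_per_step:
        selected = ranked[:per_step]               # R_{t-1}: top-ranked results
        training_sets.append(training_sets[-1] + selected)
    return training_sets


# Hypothetical ranked query results for three time steps.
sets = expand_training_sets(["d0"], [["d7", "d3"], ["d5", "d2"], ["d9"]])
for t, training_set in enumerate(sets):
    print(t, training_set)
```

With per_step set to one, each successive set differs from its predecessor by exactly one document, so the sequence of sets is a plurality of sets built up one element at a time.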



The arguments regarding dependent claims 2 and 3 are based upon their dependence from independent claim 1 and are not addressed separately here.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-5 are rejected under 35 U.S.C. 103 as being unpatentable over Baxter (US 2008/0281764) (previously cited) in view of Phillipps et al., (US 2014/0372346, hereinafter “Phillipps”) (previously cited).

Regarding claim 1, Baxter discloses [a]n information processing apparatus comprising: a memory configured to store therein a plurality of teacher data elements; (Baxter, ¶ [0067] “Corpus 100 is a data set consisting of a plurality of text documents such as web pages.”)
and a processor configured to perform a process including: reading the plurality of teacher data elements from the memory; (Baxter, ¶ [0069] “A subset D0 of text documents is selected 110 from corpus 100 and labeled 120 by a person as to whether the document has a predetermined characteristic […]”)
extracting, from the plurality of teacher data elements, a plurality of potential features each included in at least one of the plurality of teacher data elements; (Baxter, ¶ [0067] “In practice each document is represented by a vector d=[w1, … wn] which is an element of the high-dimensional vector-space consisting of all terms.  In this representation, wi is non-zero for document d only if the document contains term ti.”)
calculating, based on a number of the plurality of teacher data elements including a potential feature which is one of the plurality of potential features, a degree of importance of each of the plurality of potential features in machine learning; (Baxter, ¶ [0067] “The numerical value of wi can be set in a variety of ways, ranging […] i in d, through to the use of more sophisticated weighting schemes such as tfidf (term frequency inverse document frequency) where each matching term in a document is weighted by its frequency in the document multiplied by the logarithm of the total number of documents in the corpus divided by the number of documents in which the term appears.”) [Setting the weight value wi using tfidf corresponds to claimed “degree of importance of each of the plurality of potential features.”]
calculating an information amount of each of the plurality of teacher data elements, using degrees of importance calculated respectively for a plurality of potential features included in said each teacher data element; (Baxter, ¶ [0069] “In a typical prior art system, a generic linear classifier is employed wherein a weight wi is generated for each feature [w1,…wn] (usually most of the wi are zero) and the score assigned by the classifier c to document d is then given as the sum of the weights of the features in d: [equation (1)]”) [Each document is scored based on the summed weights of the individual features contained within the document.]
generating a plurality of teacher data sets obtained by adding a teacher data element, one by one, from the plurality of teacher data elements, the teacher data element being added in an order based on the information amount of each of the plurality of teacher data elements; (Baxter, ¶ [0069] “A subset D0 of text documents is selected 110 from corpus 100 and labeled 120 by a person as to whether the document has a predetermined characteristic […]” [The subset D0 is an initial set]; Baxter, ¶ [0077] “Once the classifier has been trained, criteria are determined 140 for further labeling candidates. Corpus 100 [corresponds to claimed “plurality of teacher data elements”] is then searched 150 for label candidates D1 which are then labeled 160, and further incorporated 170 into the training set to train classifier 130.” [Additional documents are selected for adding to the machine learning training set.]; Baxter, ¶ [0086] “Results are then ranked 250 from the inverted index query. As would be apparent to those skilled in the art the results can be ranked according to a formula that depends on the number of features in the document matching the query, the frequency and proximity of features within the document [corresponds to claimed “information amount”] such as tfidf or alternatively incorporate a separate "boost" (weight) associated with the features in the query. A number of documents are then selected or chosen from the ranked results [corresponds to claimed “in an order based on the information amount of each of the plurality of teacher data elements”] and labeled 260 thereby forming further training document subset D1 which is added to initial set D0 270 and the classifier is then trained 230 on this new example set.” [Additional documents from the training corpus are ranked, possibly using an information amount metric such as tfidf, and a number of those additional documents form a subset D1 to be added to the initial set D0.  The “number of documents” chosen or selected may be one, and the “further training document subset D1” may therefore consist of a single additional document.])
repeating, […], selecting a teacher data set from the plurality of teacher data sets and generating a learning model by performing the machine learning based on the teacher data set;[…]. (Baxter, Figure 3 and ¶ [0088], “Generalizing to time step t 320, a linear classifier ct, 322 is trained to fit data Dt, 321 which is formed from training data Dt-1 311 at time step t-1 310 and labeled documents Rt-1 313. Classifier ct is formed from the classifier ct-1 from the previous time-step and new features wtft with a weight wt t on the new training set Dt.  As would be appreciated by those skilled in the art, several rounds of training could be performed at each time step generating a series of features f1... fk, each minimizing some error which are then added to the existing classifier ct-1 + [Sigma](i=1 to k) wi fi for each time step t.” [Corresponds to repeatedly selecting additional training document(s) to add to the existing set and forming a classifier (corresponds to learning model) based on the previous documents and the newly added document(s) in order to minimize an error at each step.  As noted above, the system of Baxter is operable to add a single new training document at each iteration.])
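For illustration only, the tfidf weighting and document ranking relied upon in the mapping above can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Baxter: the function names tfidf_weights and rank_by_information_amount and the toy corpus are hypothetical, and treating the sum of a document's tfidf weights as its “information amount” is an assumption made to mirror the mapping.

```python
import math


def tfidf_weights(documents):
    """Map each document id to {term: tf * log(N / df)}.

    documents: dict of doc_id -> list of terms (hypothetical toy format).
    """
    n_docs = len(documents)
    doc_freq = {}
    for terms in documents.values():
        for term in set(terms):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {
        doc_id: {
            term: terms.count(term) * math.log(n_docs / doc_freq[term])
            for term in set(terms)
        }
        for doc_id, terms in documents.items()
    }


def rank_by_information_amount(documents):
    """Rank documents by the sum of their tfidf weights (assumed metric)."""
    weights = tfidf_weights(documents)
    amounts = {doc_id: sum(w.values()) for doc_id, w in weights.items()}
    order = sorted(amounts, key=amounts.get, reverse=True)
    return order, amounts


corpus = {
    "a": ["cat", "dog", "dog"],
    "b": ["cat"],
    "c": ["cat", "fish"],
}
order, amounts = rank_by_information_amount(corpus)
print(order)
```

In this toy corpus the term "cat" appears in every document and therefore receives a weight of zero, so document "a" ranks first, "c" second, and "b" last.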

	Although Baxter as discussed immediately above discloses repeatedly adding new documents to a training set and training a learning model at each step to minimize an error, Baxter does not explicitly disclose until result of the machine learning satisfies a prescribed condition
-or-
and outputting the learning model generated when the prescribed condition is satisfied.

Phillipps teaches until result of the machine learning satisfies a prescribed condition (Phillipps, ¶ [0063] “To evaluate these versions of the organized data sets, the supervised learning module 206 can be configured to generate one or more machine learning ensemble based on each of the multiple versions of the structured data set. Each of these machine learning ensembles 222a-c can be evaluate[d] by the supervised learning module 206, which can then determine which version exhibits the highest predictive performance.”)
-and-
and outputting the learning model generated when the prescribed condition is satisfied (Ibid., “The data intelligence module 102 may use the machine learning ensemble with the highest predictive performance to provide predictive functionality to the user.” [corresponds to claimed “outputting the learning model generated when the prescribed condition (the model with the highest predictive performance) is satisfied.”])
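For illustration only, the combination of repeated training with a stopping condition can be sketched as follows. This is a minimal sketch and is not code from Baxter or Phillipps: the function train_until_condition, the callables train and evaluate, and the threshold target are hypothetical, and the use of a fixed score threshold as the “prescribed condition” is an assumption (Phillipps describes selecting the ensemble with the highest predictive performance).

```python
def train_until_condition(training_sets, train, evaluate, target):
    """Train on successive training sets and stop at the first model whose
    score reaches target. Returns (model, score), or (None, None) if no
    model satisfies the condition."""
    for data in training_sets:
        model = train(data)
        score = evaluate(model)
        if score >= target:          # prescribed condition (assumed threshold)
            return model, score
    return None, None


# Toy stand-ins: the "model" is the training-set size, the score is 10x that.
toy_sets = [["d0"], ["d0", "d7"], ["d0", "d7", "d5"], ["d0", "d7", "d5", "d9"]]
model, score = train_until_condition(
    toy_sets,
    train=lambda data: len(data),
    evaluate=lambda m: m * 10,
    target=30,
)
print(model, score)
```

The loop returns as soon as the condition is met, so the model that is output is the one generated at the step where the condition was first satisfied.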

	Phillipps is analogous art, as it is directed to the task of repeatedly training learning models using varying sets of training data.
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the training document selection of Baxter 
[0063] “Predictive performance may indicate which machine learning ensemble can predict unknown values with the highest degree of accuracy.”

	Claims 4-5 recite similar limitations as claim 1 and are rejected under the same rationale as applied to claim 1 above.

Regarding claim 2, the combination of Baxter and Phillipps as applied to claim 1 above teaches [t]he information processing apparatus according to claim 1.  Further, Baxter discloses wherein the generating a plurality of teacher data sets includes selecting a prescribed number of teacher data elements in descending order of information amount or teacher data elements with information amounts larger than or equal to a threshold. (Baxter, ¶ [0070] “To perform classification, a threshold t is chosen and any document d whose score c(d) exceeds t is deemed to belong to the positive class.”; Baxter, ¶ [0083] “In this embodiment the documents are returned in descending score order but clearly other alternative ranking schemes can be used, such as returning all documents whose scores are close to the decision threshold t (as the latter group may be more informative in some instances).” [Classified 

Regarding claim 3, the combination of Baxter and Phillipps as applied to claim 1 above teaches [t]he information processing apparatus according to claim 1.  Further, Baxter discloses wherein the generating a plurality of teacher data sets includes generating a first teacher data set (Baxter, ¶ [0069] “A subset D0 of text documents is selected 110 from corpus 100 and labeled 120 by a person as to whether the document has a predetermined characteristic […]”) and a second teacher data set, (Baxter, ¶ [0077] “Once the classifier has been trained, criteria are determined 140 for further labeling candidates. Corpus 100 is then searched 150 for label candidates D1 which are then labeled 160, and further incorporated 170 into the training set to train classifier 130.”) [Data sets D0 and D1 are generated.]
the first teacher data set including a first teacher data element and not including a second teacher data element with a smaller information amount than the first teacher data element, (Baxter, ¶ [0069] “A subset D0 of text documents is selected 110 from corpus 100 and labeled 120 by a person as to whether the document has a predetermined characteristic […]”) [The initial data set D0 consists only of the documents that contain a particular feature, while other documents that may later be selected for set D1 or a subsequent set do not contain that predetermined characteristic.  As such, the documents in set D0 have a greater “information amount” regarding that 0 but may later be included in subsequent sets.]
the second teacher data set including the first teacher data element and the second teacher data element, (Baxter, ¶ [0077] “Once the classifier has been trained, criteria are determined 140 for further labeling candidates. Corpus 100 is then searched 150 for label candidates D1 which are then labeled 160, and further incorporated 170 into the training set to train classifier 130.”) [Additional documents are selected for adding to the machine learning training set, such that set D1 contains the documents in set D0 as well as additional documents.]

and Phillipps teaches and the process further includes obtaining a first result of the machine learning performed on the first teacher data set and a second result of the machine learning performed on the second teacher data set, and searching for a subset including a plurality of teacher data elements that produce a result of the machine learning satisfying the prescribed condition, based on the first result and the second result. (Phillipps, ¶ [0063] “To evaluate these versions of the organized data sets, the supervised learning module 206 can be configured to generate one or more machine learning ensemble based on each of the multiple versions of the structured data set. [corresponds to claimed “first result” and “second result” of the “first teacher data set” and “second teacher data set.”] Each of these machine learning ensembles 222a-c can be evaluate[d] by the supervised learning module 206, which can then determine which version exhibits the highest predictive performance [corresponds to claimed “prescribed condition”]. Predictive performance may indicate which machine learning ensemble can predict unknown values with the highest degree of accuracy.”)

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCOTT R GARDNER whose telephone number is (469)295-9128. The examiner can normally be reached 8:00am - 5:00pm M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J Lo, can be reached on 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SCOTT R GARDNER/Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126