Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-21 are rejected under 35 U.S.C. 103 as being unpatentable over Baughman: 20180197531, hereinafter Bau, further in view of Lockett: 10108902, hereinafter Loc.
Regarding claims 1, 8, and 15
Bau teaches:
A system, method and coded instructions for document analysis comprising: 
a processor; a data store, comprising a first target corpus of electronic documents and a second target corpus of electronic documents; and a non-transitory computer readable medium (Bau: ¶ 15-22, 86-91; Fig 1, 8: first, second, etc. corpus persisted in a memory datastore, accessible by processor of exemplary computing device) comprising instructions for: 
receiving a definition of a first code in association with the first target corpus (Bau: ¶ 42-48, 52-63; Fig 3, 4: feature vectors received from a first external corpus for which an inclusion test determines a confidence score for a particular code, term, etc. in the first corpus and determines inclusion of the term in a model); 
creating a first dataset for the first code (Bau: ¶ 42-48, 52-63; Fig 3, 4: a dictionary of confidence scores created for each of a set of hypernyms determined for each term subject to inclusion test); 
receiving an indication that the first code is to be boosted with a second code (Bau: ¶ 52-63; Fig 4: each hypernym comprising a second, etc. term is boosted using a collective confidence score and ultimately included in a model to boost the first code if judged to impart a positive sensitivity to the current domain language model),

the second dataset comprising a first set of positive signals associated with the second code and documents of the second corpus and a first set of negative signals associated with the second code and documents of the second target corpus  (Bau: ¶ 52-63; Fig 4: a first set comprising negative and positive signals used to curate a current domain language model); 
adding the second dataset associated with the second code and the second target corpus to the first dataset of the first code such that the first dataset comprises a boosting dataset including the second dataset comprising the first set of positive signals associated with the second code and documents of the second corpus and the first set of negative signals associated with the second code and documents of the second target corpus (Bau: ¶ 52-63; Fig 4: a second code comprising positive and negative signals determined to induce a positive sensitivity into the current domain language model is included in the current domain language model, the first set of positive signals and the first set of negative signals comprise a unary set wherein a particular value determines a positive or negative value with regard to the term); 
training a first machine learning model for the first code on the boosting dataset of the first dataset (Bau: ¶ 49-51; Fig 3: the current domain language model, boosted with terms included from a second, third, etc. domain used to train and evaluate an expanded domain language model); 
generating predictive scores for the first code for documents of the first target corpus using the first machine learning model (Bau: ¶ 25-27, 49-51; Fig 3: predictive relevance, ground truth, and/or error values of the newly added terms included in the expanded domain language model by evaluation).
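
For illustration only, the limitations mapped above (adding a second code's positive and negative signals to a first dataset, training on the resulting boosting dataset, and scoring documents of the first corpus) can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Bau, Loc, or the application: the names `CodeDataset`, `boost`, `train`, and `score` are hypothetical, and the token log-odds model is only a toy stand-in for whatever machine learning model is actually used.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CodeDataset:
    """Signals for one code: documents coded positive or negative (hypothetical structure)."""
    positives: list = field(default_factory=list)
    negatives: list = field(default_factory=list)


def boost(first: CodeDataset, second: CodeDataset) -> CodeDataset:
    """Return a boosting dataset: the second code's signals added to the first dataset."""
    return CodeDataset(first.positives + second.positives,
                       first.negatives + second.negatives)


def train(dataset: CodeDataset) -> dict:
    """Fit per-token log-odds weights with add-one smoothing (toy stand-in model)."""
    pos = Counter(tok for doc in dataset.positives for tok in doc.lower().split())
    neg = Counter(tok for doc in dataset.negatives for tok in doc.lower().split())
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {t: math.log((pos[t] + 1) / pos_total) - math.log((neg[t] + 1) / neg_total)
            for t in vocab}


def score(model: dict, doc: str) -> float:
    """Predictive score in (0, 1): sigmoid of the summed token weights."""
    z = sum(model.get(tok, 0.0) for tok in doc.lower().split())
    return 1.0 / (1.0 + math.exp(-z))


# A newly defined first code with no coding decisions of its own yet (assumed).
first = CodeDataset()
# Signals already collected for a second code on a second corpus (made-up examples).
second = CodeDataset(
    positives=["merger agreement draft", "signed merger contract"],
    negatives=["lunch menu friday", "parking notice"],
)

boosting = boost(first, second)
model = train(boosting)
scores = {doc: score(model, doc) for doc in ["contract for merger", "friday lunch"]}
```

In this sketch the first code has no native signals, so the model is trained entirely on the borrowed signals; documents of the first corpus that resemble the second code's positive signals receive scores above 0.5.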

In a related field of endeavor Loc teaches a system, method and instructions for document analysis comprising: 
a processor; a data store, comprising a first target corpus of electronic documents and a second target corpus of electronic documents; and a non-transitory computer readable medium (Loc: Col 3:04-4:25; Fig 1: first, second, etc. documents viewed, edited, and added to a corpus 101 by a user of interface 107) comprising instructions for: 
receiving a definition of a first code in association with the first target corpus (Loc: Col 3:04-4:25; Fig 1: documents determined for potential inclusion based on first, second, etc. code, tag, terms, etc.); 
creating a first dataset for the first code (Loc: Col 3:04-4:25; Fig 1: first, second, etc. documents added to a corpus based on a tag or tag likelihood); 
generating predictive scores for the first code for documents of the first target corpus using the first machine learning model (Loc: Abstract; Col 4:36-4:52; Fig 1: a model generates predicted values of likelihood of a tag with respect to a data object such as a document or a document with respect to a tag or tags); and 
presenting the predictive scores in association with documents of the first target corpus to a user (Loc: Abstract; Fig 8: a user interface displays predicted values of the relationships between documents and a target tag).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to adapt the Bau system and method to include a user interface for the display of corpus, tag, and machine learning data as taught or suggested by Loc. The average skilled practitioner would have been motivated to do


Regarding claims 2, 9, and 16
Bau in view of Loc teaches or suggests:
A system, method and coded instructions, wherein the first machine learning model is trained only on the boosting dataset.
While Bau and Loc teach the boosting of the first model subsequent to an initial training of the model, the determination of when and how to initiate and/or iterate the training of the machine learning model would have been obvious as a matter of design choice, as doing so would have emerged naturally from routine experimentation by an average skilled practitioner, without undue experimentation and with full expectation of predictable results.

Regarding claims 3, 10, and 17
Bau in view of Loc teaches or suggests:
A system, method and coded instructions, wherein the instructions further comprise instructions for: 
receiving coding decisions for documents of the first target corpus with respect to the first code (Bau: ¶ 15-18, 30-36, 52-63: a target domain invokes a language modelling process in which positive and negative signals for a first corpus that are determined to induce a positive sensitivity into the current domain language model are included in the current domain language model); (Loc: Abstract; Col 3:04-4:52: automatic and user-determined decisions include a document, or terms thereof, in a first corpus); 

training a second machine learning model for the first code on the native dataset of the first dataset (Loc: Col 9:35-10:21; 16:18-17:23: system implements a best among a plurality of first, second, etc. machine learning models; further a plurality of machine learning models feed a classification pipeline of Fig 10); 
evaluating the first machine learning model and the second machine learning model to select a best machine learning model based on a test set of documents of the first corpus (Loc: Col 9:35-10:21; 16:18-17:23: system implements a best among a plurality of first, second, etc. machine learning models; models optimized or otherwise selected based on a cost function or other methods based on a class of a particular model); 
generating predictive scores for the first code for documents of the first target corpus using the best machine learning model (Loc: Col 9:35-10:21; 16:18-17:23: system optimizes and selects a best machine learning model); and 
presenting the predictive scores in association with documents of the first target corpus to a user (Loc: Abstract: a user interface displays predicted values of the relationships between documents and a target tag).
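
For illustration only, the evaluation and selection limitation mapped above (two models for the same code compared on a test set of documents of the first corpus, with the better one used for scoring) can be sketched as follows. This is a hedged sketch, not the method of Loc or of the application: the accuracy metric, the 0.5 threshold, and the names `accuracy`, `select_best`, `boosted_model`, and `native_model` are all assumptions chosen for the example.

```python
from typing import Callable, Dict, Sequence, Tuple

# A model maps a document to a predictive score in [0, 1] (assumed interface).
Model = Callable[[str], float]
TestSet = Sequence[Tuple[str, bool]]


def accuracy(model: Model, test_set: TestSet, threshold: float = 0.5) -> float:
    """Fraction of test documents whose thresholded score matches the coding decision."""
    hits = sum((model(doc) >= threshold) == label for doc, label in test_set)
    return hits / len(test_set)


def select_best(models: Dict[str, Model], test_set: TestSet) -> str:
    """Return the name of the model with the highest test-set accuracy."""
    return max(models, key=lambda name: accuracy(models[name], test_set))


# Hypothetical stand-ins for a model trained on the boosting dataset
# and a model trained on the native dataset.
def boosted_model(doc: str) -> float:
    return 0.9 if "merger" in doc else 0.2


def native_model(doc: str) -> float:
    return 0.9 if "contract" in doc else 0.2


# Made-up test set of (document, coding decision) pairs from the first corpus.
test_set = [
    ("merger agreement", True),
    ("merger contract", True),
    ("lunch menu", False),
    ("catering contract", False),
]

models = {"boosted": boosted_model, "native": native_model}
best = select_best(models, test_set)
predictive_scores = {doc: models[best](doc) for doc, _ in test_set}
```

Any other evaluation metric (for example, a cost function of the kind Loc is cited for) could be substituted for `accuracy` without changing the selection step.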

Regarding claims 4, 11, and 18
Bau in view of Loc teaches or suggests:
A system, method and coded instructions, wherein evaluating the first machine learning model and the second machine learning model to select the best machine learning model further comprises, 
determining a current best machine learning model from the first machine learning model and the second machine learning model (Loc: Col 9:35-10:21; 16:18-17:23: system implements a best among a plurality of first, second, etc. machine learning models; models optimized or otherwise selected based on a cost function or other methods based on a class of a particular model).
While Bau and Loc do not explicitly teach comparing the current best machine learning model to a previous best machine learning model using the test set of documents of the first target corpus to select the best model, Examiner takes official notice that comparing tested model outputs to generate or otherwise determine a best model for a test dataset was well known in the art before the effective filing date of the instant invention and would have comprised an obvious inclusion. The average skilled practitioner would have been motivated to do so for the purpose of auditing the results of a plurality of models and would have expected predictable results therefrom.

Regarding claims 5, 12, and 19
Bau in view of Loc teaches or suggests:
A system, method and coded instructions, wherein the first machine learning model for the first code is trained on the boosting dataset of the first dataset and the native dataset of the first dataset (Bau: ¶ 42-48, 52-63; Fig 3, 4: Bau performs learning on the first data set before and after the boosting by the second, etc. dataset).

Regarding claims 6, 13, and 20
Bau in view of Loc teaches or suggests:
A system, method and coded instructions, wherein the boosting dataset includes a plurality of datasets, each dataset having respective positive signals and negative signals and training the first machine learning model for the first code on the boosting dataset comprises selecting a positive signal and a negative signal from each of the plurality of datasets according to a balancing method (Bau: ¶ 52-63; Fig 4: a second, third, etc. code, each representing a subsequent domain, corpus, etc. and each comprising positive and negative signals determined to induce a positive sensitivity into the current domain language model, is included in the current domain language model; the sensitivity threshold is managed by a balancing method in order to adjust precision between positive and negative signals).
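
For illustration only, one possible reading of "selecting a positive signal and a negative signal from each of the plurality of datasets according to a balancing method" is equal-count sampling across datasets and classes. Neither the claim language quoted above nor the cited passages specify this particular method; the sketch below, including the name `balanced_sample` and the dictionary layout, is an assumption.

```python
import random
from typing import Dict, List, Tuple


def balanced_sample(
    datasets: Dict[str, Dict[str, List[str]]], seed: int = 0
) -> List[Tuple[str, str, str]]:
    """Draw the same number of positive and negative signals from every dataset.

    The per-class count is the size of the smallest class in any dataset, so no
    single code or polarity dominates the training sample (assumed balancing rule).
    """
    rng = random.Random(seed)  # seeded for repeatable sampling
    per_class = min(
        min(len(d["positive"]), len(d["negative"])) for d in datasets.values()
    )
    sample = []
    for name, d in datasets.items():
        for label in ("positive", "negative"):
            for doc in rng.sample(d[label], per_class):
                sample.append((name, label, doc))
    return sample


# Made-up boosting dataset composed of two codes' datasets.
datasets = {
    "code_a": {"positive": ["a1", "a2", "a3"], "negative": ["a4"]},
    "code_b": {"positive": ["b1", "b2"], "negative": ["b3", "b4"]},
}
sample = balanced_sample(datasets)
```

Here the smallest class has one signal, so one positive and one negative signal are drawn from each of the two datasets.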
 	
Regarding claims 7, 14, and 21
Bau in view of Loc teaches or suggests:
A system, method and coded instructions, wherein indication that the first code is to be boosted with the second code is received through an interface that presents the second code as one of a plurality of codes, each of the plurality of codes presented with an associated textual description in the interface.
Bau teaches determining boosting terms from a second, third, etc. domain (Bau: ¶ 15-18, 30-36, 52-63) and Loc teaches user curation of terms in a corpus using a presented interface (Loc: Abstract; Col 3:04-4:52, Col 9:35-10:21, 16:18-17:23, etc.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the Loc interface to select terms among the second, etc. corpora taught by Bau. The average skilled practitioner would have been motivated to do so for the purpose of curating a training set, machine learning model, etc. and would have expected predictable results therefrom.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL C MCCORD whose telephone number is (571)270-3701.  The examiner can normally be reached on 730-630 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VIVIAN CHIN, can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/PAUL C MCCORD/
Primary Examiner, Art Unit 2654