DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-10 are rejected under 35 U.S.C. 101.
Step 1 Analysis:
In the instant case, the claims are directed to a method. Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).

Step 2A Prong 1:
Claim 1
The claim is rejected under 35 U.S.C. 101.
The limitations of claim 1:
“a first step in which the processor creates a training set that includes, as an index term for extracting a document used for learning, one or more of the index terms assigned to the applicable documents and the index terms assigned to the non-applicable documents;”, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim element precludes the step from practically being performed in the mind. The context of this claim encompasses a person creating a training set. The claim includes the use of machine learning algorithms, which further amounts to a combination of a mental process and mathematical relationships.
“a second step in which the processor creates the document identification model that learns the document data assigned the index term included in the training set, among a plurality of pieces of document data aside from the training data sample”, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim element precludes the step from practically being performed in the mind. The context of this claim amounts to organizing and assigning indexes to data.
“a third step in which the processor uses the created document identification model and identifies evaluation data including the plurality of pieces of document data that are assigned in advance information indicating whether the document data is the applicable document or the non-applicable document, thereby creating an evaluation value of the created document identification model”, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim element precludes the step from practically being performed in the mind. The context of this claim amounts to organizing and assigning indexes to data.
“a fourth step in which the processor determines whether to use each index term included in the training set for creating the training data on the basis of the evaluation value”, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim element precludes the step from practically being performed in the mind. The context of this claim amounts to organizing and assigning indexes to data.
“a fifth step in which the processor adds as the applicable document data, to the training data, document data that is assigned an index term of an applicable document determined to be appropriate for use in creating the training data, among the plurality of pieces of document data aside from the training data sample, …”, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim element precludes the step from practically being performed in the mind. The context of this claim amounts to organizing and assigning indexes to data.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. 
The abstract idea is not integrated into a practical application. The claim recites generic computer components (e.g. “analytic system”, “an integration engine”, “an analytics engine”, “a machine learning engine”, “a visual analytics engine”, and “an adaptive graphical user interface modeler”) that amount to no more than mere instructions to apply the exception and therefore do not integrate the abstract idea into a practical application. See MPEP 2106.05(b) or 2106.05(f).
Step 2A Prong 2: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the step of “and adds document data assigned an index term of a non-applicable document determined to be appropriate for use in creating the training data to the training data as the non-applicable document data, to create the training data” is recited at a high level of generality (i.e., as a general means of gathering network traffic data for use in the comparison step) and amounts to mere data gathering, which is a form of insignificant extra-solution activity. Refer to MPEP 2106.05(g) Insignificant Extra-Solution Activity. These claims are not patent eligible. The examiner notes that the gathering of data is insignificant post-solution activity.
Step 2B: As discussed above with respect to integration of the abstract idea into a practical application, the use of generic computer components (e.g. “analytic system”, “an integration engine”, “an analytics engine”, “a machine learning engine”, “a visual analytics engine”, and “an adaptive graphical user interface modeler”) serves as mere instructions to implement the abstract idea on a computer according to MPEP 2106.05(f).
With respect to the step of “adds document data assigned an index term of a non-applicable document determined to be appropriate for use in creating the training data to the training data as the non-applicable document data, to create the training data”, a conclusion that an additional element is insignificant extra-solution activity in step 2A should be re-evaluated in step 2B. The receiving step is also well-understood, routine, conventional activity (see MPEP 2106.05(d)(II), Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information)).

Claim 10 is similarly rejected but for the recitation of a processor and a storage unit; refer to the analysis of claim 1 above.

Claim 2
Step 1: A method, as above. 
Step 2A Prong 1: The claim recites that “wherein, in the first step, the processor creates a plurality of said training sets, wherein, in the second step, the processor creates the document identification model for each of the plurality of training sets, wherein, in the third step, the processor creates the evaluation value for each of the created document identification models, and wherein, in the fourth step, the processor calculates an appearance frequency for each index term in the training set used to create the document identification model for which the evaluation value is greater than a prescribed standard, and determines that index terms with a high said appearance frequency should be used to create the training data.” This limitation merely limits the system further and does not change the nature of the underlying mental process. The examiner further notes that the claim amounts to mathematical concepts in addition to the mental process.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The mere recitation that the mental process is to be performed on generic computer components, or mere instructions to apply the abstract idea on a computer, does not integrate the judicial exception into a practical application.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The mere recitation that the mental process is to be performed on generic computer components does not amount to significantly more than the judicial exception. 
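For illustration only, and not as a construction of the claim, the appearance-frequency determination quoted in the claim 2 analysis above can be sketched as a simple counting operation. The function name, the sample index terms, and the `min_count` cutoff used to decide what counts as a “high” appearance frequency are assumptions introduced for this sketch; none of them appears in the claim.

```python
from collections import Counter


def frequent_index_terms(training_sets, evaluation_values, standard, min_count):
    """Count index terms across training sets whose model beat the standard.

    training_sets: list of sets of index terms (one set per model).
    evaluation_values: evaluation value of the model built from each set.
    standard: prescribed standard the evaluation value must exceed.
    min_count: hypothetical cutoff for a "high" appearance frequency.
    """
    counts = Counter()
    for terms, value in zip(training_sets, evaluation_values):
        if value > standard:
            # Each training set contributes at most one count per term.
            counts.update(set(terms))
    selected = sorted(term for term, n in counts.items() if n >= min_count)
    return counts, selected


# Hypothetical data: three training sets and the evaluation value of each model.
training_sets = [{"G06F", "G06N"}, {"G06N", "H04L"}, {"G06F", "H04L"}]
evaluation_values = [0.82, 0.77, 0.41]
counts, selected = frequent_index_terms(training_sets, evaluation_values, 0.5, 2)
```

In this hypothetical data only the first two training sets exceed the standard, so only terms appearing in both of them reach the cutoff.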

Claim 3 
Step 1: A method, as above. 
Step 2A Prong 1: The claim recites that “wherein, in the fourth step, the processor adds the document data to which the index term was assigned to the training data in order from the highest appearance frequency, creates the document identification model using the training data, and if the evaluation value of the created document identification model does not improve, determines that the index term should not be used to create the training data.” This limitation merely limits the system further and does not change the nature of the underlying mental process. The examiner further notes that the claim amounts to mathematical concepts in addition to the mental process.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The mere recitation that the mental process is to be performed on generic computer components, or mere instructions to apply the abstract idea on a computer, does not integrate the judicial exception into a practical application.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The mere recitation that the mental process is to be performed on generic computer components does not amount to significantly more than the judicial exception. 
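For illustration only, the ordered add-and-test determination quoted in the claim 3 analysis above can be sketched as a greedy loop. The `evaluate` callback is a hypothetical stand-in for creating a document identification model from the candidate terms and computing its evaluation value; the toy weights below are invented for the sketch and do not come from the application.

```python
def select_terms_greedily(terms_by_frequency, evaluate):
    """Try index terms from highest appearance frequency downward.

    A term is kept only if adding it improves the evaluation value;
    otherwise it is marked as not to be used.
    """
    kept, rejected = [], []
    best = evaluate(kept)  # baseline value with no added terms
    for term in terms_by_frequency:
        candidate = kept + [term]
        value = evaluate(candidate)
        if value > best:
            kept, best = candidate, value
        else:
            rejected.append(term)
    return kept, rejected, best


# Hypothetical evaluation: a baseline of 0.5 plus an invented per-term effect.
weights = {"G06N": 0.3, "G06F": 0.1, "H04L": -0.2}


def toy_evaluate(terms):
    return 0.5 + sum(weights[t] for t in terms)


kept, rejected, best = select_terms_greedily(["G06N", "H04L", "G06F"], toy_evaluate)
```

The loop retrains once per candidate term, which is a design choice of this sketch rather than something the claim language specifies.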

Claim 4 
Step 1: A method, as above. 
Step 2A Prong 1: The claim recites that “wherein, in the fourth step, the processor determines that said one or more index terms included in the training set used to create the document identification model for which the evaluation value is greater than a prescribed standard should be used to create the training data.” This limitation merely limits the system further and does not change the nature of the underlying mental process. The examiner further notes that the claim amounts to mathematical concepts in addition to the mental process. 
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The mere recitation that the mental process is to be performed on generic computer components, or mere instructions to apply the abstract idea on a computer, does not integrate the judicial exception into a practical application.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The mere recitation that the mental process is to be performed on generic computer components does not amount to significantly more than the judicial exception. 

Claim 5 
Step 1: A method, as above. 
Step 2A Prong 1: The claim recites that “wherein, in the fourth step, the processor determines that each index term included in the training set should be used for creating the training data if the evaluation value is greater than a prescribed standard, and wherein the prescribed standard is the evaluation value of the document identification model created by learning the training data sample.” This limitation merely limits the system further and does not change the nature of the underlying mental process. The examiner further notes that the claim amounts to mathematical concepts in addition to the mental process.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The mere recitation that the mental process is to be performed on generic computer components, or mere instructions to apply the abstract idea on a computer, does not integrate the judicial exception into a practical application.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The mere recitation that the mental process is to be performed on generic computer components does not amount to significantly more than the judicial exception. 

Claim 6
Step 1: A method, as above. 
Step 2A Prong 1: The claim recites that “wherein the evaluation value includes at least one of an F value, recall, precision, and accuracy.” This limitation merely limits the system further and does not change the nature of the underlying mental process. The examiner further notes that the claim amounts to mathematical concepts in addition to the mental process.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The mere recitation that the mental process is to be performed on generic computer components, or mere instructions to apply the abstract idea on a computer, does not integrate the judicial exception into a practical application.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The mere recitation that the mental process is to be performed on generic computer components does not amount to significantly more than the judicial exception. 
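For illustration only, the evaluation values named in the claim 6 analysis above (F value, recall, precision, and accuracy) are ordinary arithmetic over counts of correct and incorrect identifications. The sketch below assumes that the “F value” is the balanced F1 measure, which the claim does not state, and uses invented labels.

```python
def evaluation_values(predicted, actual):
    """Compute precision, recall, F1 and accuracy for binary labels.

    True means "applicable document"; False means "non-applicable document".
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_value": f_value, "accuracy": accuracy}


# Hypothetical model output versus labels assigned in advance.
predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]
scores = evaluation_values(predicted, actual)
```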

Claim 7 
Step 1: A method, as above. 
Step 2A Prong 1: The claim recites that “wherein, in the first step, the processor creates the training set including one or more of the index terms extracted randomly from among the index terms assigned to the applicable documents and the index terms assigned to the non-applicable documents.” This limitation merely limits the system further and does not change the nature of the underlying mental process. The examiner further notes that the claim amounts to mathematical concepts in addition to the mental process.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The mere recitation that the mental process is to be performed on generic computer components, or mere instructions to apply the abstract idea on a computer, does not integrate the judicial exception into a practical application.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The mere recitation that the mental process is to be performed on generic computer components does not amount to significantly more than the judicial exception. 
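For illustration only, the random extraction quoted in the claim 7 analysis above can be sketched as sampling without replacement from the combined pool of index terms. The function name, the `size` and `seed` parameters, and the sample terms are assumptions of this sketch and are not recited in the claim.

```python
import random


def random_training_set(applicable_terms, non_applicable_terms, size, seed=None):
    """Draw `size` index terms at random from both groups of index terms."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    # Sort the pool so the same seed always yields the same sample.
    pool = sorted(set(applicable_terms) | set(non_applicable_terms))
    return set(rng.sample(pool, min(size, len(pool))))


# Hypothetical index terms for each group of documents.
applicable = {"G06N", "G06F", "G10L"}
non_applicable = {"H04L", "A61B"}
first_draw = random_training_set(applicable, non_applicable, 3, seed=7)
second_draw = random_training_set(applicable, non_applicable, 3, seed=7)
```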

Claim 8 
Step 1: A method, as above. 
Step 2A Prong 1: The claim recites that “wherein, in the first step, the processor creates the training set so as not to include index terms assigned to both the applicable documents and the non-applicable documents.” This limitation merely limits the system further and does not change the nature of the underlying mental process. The examiner further notes that the claim amounts to mathematical concepts in addition to the mental process.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The mere recitation that the mental process is to be performed on generic computer components, or mere instructions to apply the abstract idea on a computer, does not integrate the judicial exception into a practical application.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The mere recitation that the mental process is to be performed on generic computer components does not amount to significantly more than the judicial exception. 
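For illustration only, the exclusion quoted in the claim 8 analysis above corresponds to removing the intersection of two sets. The function name and the sample index terms are assumptions introduced for this sketch.

```python
def exclusive_index_terms(applicable_terms, non_applicable_terms):
    """Drop index terms that are assigned to both groups of documents."""
    applicable = set(applicable_terms)
    non_applicable = set(non_applicable_terms)
    shared = applicable & non_applicable  # terms assigned to both groups
    return applicable - shared, non_applicable - shared


# Hypothetical index terms; "G06F" is assigned to documents in both groups.
only_applicable, only_non_applicable = exclusive_index_terms(
    {"G06N", "G06F"}, {"G06F", "H04L"}
)
```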

Claim 9
Step 1: A method, as above. 
Step 2A Prong 1: The claim recites that “a step in which, by the processor learning the training data created in the fifth step, the processor creates the document identification model for identifying whether the inputted document is the applicable document.” This limitation merely limits the system further and does not change the nature of the underlying mental process. The examiner further notes that the claim amounts to mathematical concepts in addition to the mental process.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The mere recitation that the mental process is to be performed on generic computer components, or mere instructions to apply the abstract idea on a computer, does not integrate the judicial exception into a practical application.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The mere recitation that the mental process is to be performed on generic computer components does not amount to significantly more than the judicial exception. 

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 4-10 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Agrawal (US6233575).

Regarding claim 1, Agrawal teaches a training data creation method executed by a computer system having a processor and a storage unit (Agrawal: Col 8, Lines 13-16 “An example hardware environment for an internet embodiment is shown in FIG. 1, which includes a user computer … The computer 10 operates in accordance with a software program stored on a computer readable medium, such as a floppy disk 13, hard disk (not shown) or other suitable storage medium.” Here, taught as the computer and storage medium.), wherein the storage unit stores a plurality of pieces of document data, each of which is assigned one or more index terms (Agrawal: Col 9, Lines 16-20 “In another embodiment, once the system is trained, test documents are analyzed. During this phase, a text document is first tokenized.” “for each topic node with suitably high score, the terms in the test document that are significantly more frequent than in the training set for that topic. These are then used for building a term index.” Here, taught as the document and used for building a term index.), wherein some of the plurality of pieces of document data are training data samples provided in advance as training data to be used for generating a document identification model (Agrawal: Col 9, Lines 16-19 “In another embodiment, once the system is trained, test documents are analyzed. During this phase, a text document is first tokenized. Of all the tokens in the document” Here, the examiner notes that the system is initially trained by using documents prior to the testing phase.), wherein the storage unit stores information indicating whether each piece of document data included in the training data sample is data of an applicable document that is subject to identification by the document identification model (Agrawal: Col 11, Lines 57-63 “If the action type is “new training document”, then the document is tokenized in block 66. Next, in block 68, statistics are appended.
In block 70, the appended statistics are stored in the raw statistics database. Also, the appended statistics are stored in the indexed database in block 90. Then, processing continues at block 72, as discussed above.” Lines 64-67 “If the action type is “test”, documents are tokenized in block 92. In block 94, the root topic is selected as a starting point. In block 96, a top topic is picked from the pool (i.e., a topic with a high goodness score). In block 98, using indexed statistics from block 90 (as indicated by the arrow from block 90 to block 98), the children of the picked topic are evaluated and the best ones (i.e., those with high goodness scores) are added to the pool.” Here, the examiner notes that the new documents provided are applicable documents based on their relevance to the root topic. Each model first starts by selecting a root topic as a starting point.) or a non-applicable document that is not subject to identification, and wherein the training data creation method comprises (Agrawal: Col 12, Lines 37-40 “Generally, in a binary model, the focus is on whether a term occurs, and so a term is either associated with zero (i.e., occurs) or one (i.e., does not occur).” Here, the examiner notes that the system only retrieves relevant documents and not irrelevant documents. This is based on whether the term occurs (i.e., applicable) or does not occur (i.e., non-applicable).): a first step in which the processor creates a training set that includes, as an index term for extracting a document used for learning, one or more of the index terms assigned to the applicable documents and the index terms assigned to the non-applicable documents (Agrawal: Col 8, Lines 66-5 “The training procedure, according to preferred embodiments, employs a plurality of training documents that have been pre-assigned manually to various terminal and intermediate nodes in the taxonomy.
The training documents are tokenized, and information related to the frequency of terms or tokens is recorded in a database. A discrimination value is determined for each term in the training documents, and a minimum discrimination value is determined.” Here, the examiner notes that the system employs a plurality of training documents which are tokenized, and information related to the frequency of the terms is further recorded/stored in a database.); a second step in which the processor creates the document identification model that learns the document data assigned the index term included in the training set, among a plurality of pieces of document data aside from the training data sample (Agrawal: Col 9, Lines 16-20 “In another embodiment, once the system is trained, test documents are analyzed. During this phase, a text document is first tokenized. Of all the tokens in the document, only those that are also in the feature set of the root topic in the taxonomy are considered useful.” Here, the examiner notes that the new documents and index terms are used to train the system. This learning is executed based on new text acquired from the documents (i.e., aside from the training data sample).); a third step in which the processor uses the created document identification model and identifies evaluation data including the plurality of pieces of document data that are assigned in advance information indicating whether the document data is the applicable document or the non-applicable document, thereby creating an evaluation value of the created document identification model (Agrawal: Col 9, Lines 15-29 “once the system is trained, test documents are analyzed. During this phase, a text document is first tokenized. Of all the tokens in the document, only those that are also in the feature set of the root topic in the taxonomy are considered useful.
The statistics related to these useful terms are retrieved from the database, and the statistics are used to compute a score for each of the children of the root node (nodes comprising the next level connected to the root node). A few children with high scores are then picked for further exploration. If any child is an intermediate node, it has associated with it another feature set. The set of all tokens in the test document is now intersected with this new feature set, and the procedure continues from the child in the same manner.” Here, the examiner notes that the third step, in which the processor uses the created document identification model, is equated to the model being trained and then moving into the testing phase. The terms/tokens from the documents are compared to the root topic to determine whether the documents are useful. A document is considered useful (i.e., applicable) or not useful (i.e., non-applicable).); a fourth step in which the processor determines whether to use each index term included in the training set for creating the training data on the basis of the evaluation value (Agrawal: Col 9, Lines 2-12 “The training documents are tokenized, and information related to the frequency of terms or tokens is recorded in a database. A discrimination value is determined for each term in the training documents, and a minimum discrimination value is determined.
Then, for each intermediate node, a set of feature terms is selected, where the feature terms are those that are in the training documents associated with the intermediate node or any of its descendants and that have discrimination values equal to or above the minimum discrimination value for the intermediate node.” Here, the examiner notes that the applicant's index/token term that is used on the basis of the evaluation value is equated to the indexes used to get identifiers of the sets of documents for analysis.); and a fifth step in which the processor adds as the applicable document data, to the training data, document data that is assigned an index term of an applicable document determined to be appropriate for use in creating the training data, among the plurality of pieces of document data aside from the training data sample (Agrawal: Col 9, Lines 30-35 “In a related embodiment, the system also computes, for each topic node with suitably high score, the terms in the test document that are significantly more frequent than in the training set for that topic. These are then used for building a term index.” and Col 11, Lines 58-63 “If the action type is “new training document”, then the document is tokenized in block 66. Next, in block 68, statistics are appended. In block 70, the appended statistics are stored in the raw statistics database. Also, the appended statistics are stored in the indexed database in block 90. Then, processing continues at block 72, as discussed above.” Here, the system generates index entries for the text extracted from documents or a subset of documents, which are then stored (i.e., stored as training data).
The text from these documents is deemed relevant and is used for the creation of training documents.), and adds document data assigned an index term of a non-applicable document determined to be appropriate for use in creating the training data to the training data as the non-applicable document data (Agrawal: Col 11, Lines 58-63 “If the action type is “new training document”, then the document is tokenized in block 66. Next, in block 68, statistics are appended. In block 70, the appended statistics are stored in the raw statistics database. Also, the appended statistics are stored in the indexed database in block 90. Then, processing continues at block 72, as discussed above.” The examiner notes that the indexed document text is used as a training document and meets a root topic threshold.), to create the training data (Agrawal: Col 9, Lines 1-5 “The training documents are tokenized, and information related to the frequency of terms or tokens is recorded in a database. A discrimination value is determined for each term in the training documents, and a minimum discrimination value is determined. Then, for each intermediate node, a set of feature terms is selected, where the feature terms are those that are in the training documents associated with the intermediate node or any of its descendants and that have discrimination values equal to or above the minimum discrimination value for the intermediate node.” Here, the training documents that meet the threshold are used for training.).

Regarding claim 4, Agrawal teaches the training data creation method according to claim 1. Agrawal further teaches wherein, in the fourth step, the processor determines that said one or more index terms included in the training set used to create the document identification model (Agrawal: Col 1, Lines 19-24 “manufacture for organizing and indexing information items such as documents by topic, and in preferred embodiments, to such a process, system and article which employ a topic hierarchy and involve a determination of discriminating terms and stop terms at each internal node in the topic hierarchy.” Here, Agrawal teaches indexing terms from documents based on the topics to create a classification model.) for which the evaluation value is greater than a prescribed standard should be used to create the training data (Agrawal: Col 9, Lines 1-7 “The training documents are tokenized, and information related to the frequency of terms or tokens is recorded in a database. A discrimination value is determined for each term in the training documents, and a minimum discrimination value is determined.” Here, the examiner notes that the evaluation value is taught as the discrimination value for each of the training documents used to select training data.).

Regarding claim 5, Agrawal teaches the training data creation method according to claim 1, wherein, in the fourth step, the processor determines that each index term included in the training set should be used for creating the training data if the evaluation value is greater than a prescribed standard (Agrawal: Col 10, Lines 30-37 “collecting a number of documents 32. For example, for classification and searching of documents available on the internet, the document collection may be performed with a suitable web crawler. Alternatively, a sample document collection may be provided with the system software 13 (FIG. 1) or manually collected from any suitable source.” Here, the examiner notes that the system collects a number of documents to generate new training data via the tokenized terms in the documents.), and wherein the prescribed standard is the evaluation value of the document identification model created by learning the training data sample (Agrawal: Col 10, Lines 46-54 “In block 46, statistics are collected from the statistics collection set 42, based on terms appearing in those documents and the known classes for those documents. These statistics are used in the determination of the discriminating power of terms in the documents from the collection set 42. The statistics are calculated for each node in the taxonomy, such that, for any one node, the discriminating power is calculated for the terms in all of the documents that are” Here, the examiner notes that the evaluation value is the statistical discrimination values calculated for the terms in the collected documents used to find the appropriate training data.).

Regarding claim 6, Agrawal teaches the training data creation method according to claim 1. Agrawal further teaches wherein the evaluation value includes at least one of an F value, recall, precision, and accuracy (Agrawal: Col 7, Lines 37-39 “For example, the term “precision” may be visibly associated with the term “recall” in a set of documents on IR, but not in a collection also including documents on machine tools.” Here, the precision and recall are related to a set of documents. The examiner further notes that a discrimination value is also calculated for each term in the set.).

Regarding claim 7, Agrawal teaches the training data creation method according to claim 1. Agrawal further teaches wherein, in the first step, the processor creates the training set including one or more of the index terms extracted randomly from among the index terms (Agrawal: Col 17, Lines 38-43 “The Fisher index of each term based on documents in set T is computed, and then documents in set V are classified using various prefixes F. Let N be the number of documents incorrectly classified when a prefix of k features is used, then (the smallest) k for which N is minimized is sought.” Here, the examiner notes that the Fisher index contains a list of each term based on the documents in the training set.) assigned to the applicable documents and the index terms assigned to the non-applicable documents (Agrawal: Col 11, Lines 58-63 “If the action type is “new training document”, then the document is tokenized in block 66. Next, in block 68, statistics are appended. In block 70, the appended statistics are stored in the raw statistics database. Also, the appended statistics are stored in the indexed database in block 90. Then, processing continues at block 72, as discussed above.” The examiner notes that the indexed document text is used as a training document and meets a root topic threshold.).
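For illustration only, the prefix selection described in the cited passage of Agrawal (Col 17), which seeks the smallest k that minimizes the number N of incorrectly classified documents, can be sketched as a linear scan over prefix lengths. The `count_errors` callback is a hypothetical stand-in for classifying the documents in set V with a given feature prefix; the error counts below are invented and are not data from the reference.

```python
def best_prefix_length(ranked_terms, count_errors):
    """Return (k, N): the smallest prefix length k with the fewest errors N.

    ranked_terms: terms ordered by a ranking score, best first.
    count_errors: hypothetical callback mapping a feature prefix to the
                  number of incorrectly classified validation documents.
    """
    best_k, best_n = None, None
    for k in range(1, len(ranked_terms) + 1):
        n = count_errors(ranked_terms[:k])
        # Strict comparison keeps the smallest k when several k tie.
        if best_n is None or n < best_n:
            best_k, best_n = k, n
    return best_k, best_n


# Invented error counts for prefixes of length 1 through 4.
errors_by_length = {1: 5, 2: 3, 3: 3, 4: 4}
ranked = ["term_a", "term_b", "term_c", "term_d"]
k, n = best_prefix_length(ranked, lambda prefix: errors_by_length[len(prefix)])
```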

Regarding claim 8, Agrawal teaches the training data creation method according to claim 1. Agrawal further teaches wherein, in the first step, the processor creates the training set so as not to include index terms (Agrawal: Col 17, Lines 38-43 “The Fisher index of each term based on documents in set T is computed, and then documents in Set V are classified using various prefixes F. Let N be the number of documents incorrectly classified when a prefix of k features is used, then (the smallest) k for which N is minimized is Sought.” Here, the examiner notes that the Fisher index contains a list of each term based on the documents in the training set.) assigned to both the applicable documents and the non-applicable documents (Agrawal: Col 11, Lines 58-63 “If the action type is “new training document”, then the document is tokenized in block 66. Next, in block 68, statistics are appended. In block 70, the appended statistics are stored in the raw statistics database. Also, the appended statistics are stored in the indexed database in block 90. Then, processing continues at block 72, as discussed above.” The examiner notes that the indexed document text is used as a training document and meets a root topic threshold.).

Regarding claim 9, Agrawal teaches the training data creation method according to claim 1. Agrawal further teaches the method further comprising: a step in which, by the processor learning the training data created in the fifth step, the processor creates the document identification model for identifying whether the inputted document is the applicable document (Agrawal: Abstract “The models are used in an estimation technique to assign topic paths to new unlabeled documents. The hierarchical technique, in which feature terms can be very different at different nodes, leads to an efficient context Sensitive classification technique. The hierarchical technique can handle millions of documents and tens of thousands of topics.” Here, the examiner notes that the models are used to label the new unlabeled documents.).

Regarding claim 10, Agrawal teaches a training data creation apparatus, comprising: a processor; and a storage unit (Agrawal: Col 8, Lines 13-16 “An example hardware environment for an internet embodiment is shown in FIG. 1, which includes a user computer …The computer 10 operates in accordance with a software program stored on a computer readable medium, such as a floppy disk 13, hard disk (not shown) or other suitable storage medium.” Here, taught as the computer and storage medium.), wherein the storage unit stores a plurality of pieces of document data, each of which is assigned one or more index terms (Agrawal: Col 9, Lines 16-20 “In another embodiment, once the system is trained, test documents are analyzed. During this phase, a text document is first tokenized.” “for each topic node with suitably high score, the terms in the test document that are significantly more frequent than in the training set for that topic. These are then used for building a term index.” Here, taught as the document and used for building a term index.), wherein some of the plurality of pieces of document data are training data samples provided in advance as training data to be used for generating a document identification model (Agrawal: Col 9, Lines 16-19 “In another embodiment, once the system is trained, test documents are analyzed. During this phase, a text document is first tokenized. Of all the tokens in the document” Here, the examiner notes that the system is initially trained by using documents prior to the testing phase.), wherein the storage unit stores information indicating whether each piece of document data included in the training data sample is data of an applicable document that is subject to identification by the document identification model (Agrawal: Col 11, Lines 57-63 “If the action type is “new training document”, then the document is tokenized in block 66. Next, in block 68, statistics are appended. 
In block 70, the appended statistics are stored in the raw statistics database. Also, the appended statistics are stored in the indexed database in block 90. Then, processing continues at block 72, as discussed above.” Lines 64-67 “If the action type is “test”, documents are tokenized in block 92. In block 94, the root topic is selected as a starting point. In block 96, a top topic is picked form the pool (i.e., a topic with a high goodness score). In block 98, using indexed statistics from block 90 (as indicated by the arrow from block 90 to block 98), the children of the picked topic are evaluated and the best ones (i.e., those with high goodness scores) are added to the pool.” Here, the examiner notes that the new documents provided are applicable documents based on their relevance to the root topic. Each model first starts by selecting a root topic as a starting point.) or a non-applicable document that is not subject to identification (Agrawal: Col 12, Lines 37-40 “Generally, in a binary model, the focus is on whether a term occurs, and so a term is either associated with zero (i.e., occurs) or one (i.e., does not occur).” Here, the examiner notes that the system only retrieves relevant documents and not irrelevant documents. This is based on whether the term occurs (i.e. applicable) or does not occur (i.e. non-applicable).), and wherein the processor creates a training set that includes, as an index term for extracting a document used for learning, one or more of the index terms assigned to the applicable documents and the index terms assigned to the non-applicable documents (Agrawal: Col 8, Lines 66-5 “The training procedure, according to preferred embodiments, employs a plurality of training documents that have been pre-assigned manually to various terminal and intermediate nodes in the taxonomy. The training documents are tokenized, and information related to the frequency of terms or tokens is recorded in a database. 
A discrimination value is determined for each term in the training documents, and a minimum discrimination value is determined.” Here, the examiner notes that the system employs a plurality of training documents which are tokenized, and information related to the frequency of the terms is further recorded/stored in a database.), creates the document identification model that learns the document data assigned the index term included in the training set, among a plurality of pieces of document data aside from the training data sample (Agrawal: Col 9, Lines 16-20 “In another embodiment, once the system is trained, test documents are analyzed. During this phase, a text document is first tokenized. Of all the tokens in the document, only those that are also in the feature set of the root topic in the taxonomy are considered useful.” Here, the examiner notes that the new documents and index terms are used to train the system. This learning is executed based on new text acquired from the documents (i.e. aside from the training data sample).), uses the created document identification model and identifies evaluation data including the plurality of pieces of document data that are assigned in advance information indicating whether the document data is the applicable document or the non-applicable document, thereby creating an evaluation value of the created document identification model (Agrawal: Col 9, Lines 15-29 “once the system is trained, test documents are analyzed. During this phase, a text document is first tokenized. Of all the tokens in the document, only those that are also in the feature set of the root topic in the taxonomy are considered useful. The statistics related to these useful terms are retrieved from the database, and the statistics are used to compute a score for each of the children of the root node (nodes comprising the next level connected to the root node). A few children with high scores are then picked for further exploration. 
If any child is an intermediate node, it has associated with it another feature set. The set of all tokens in the test document is now intersected with this new feature set, and the procedure continues from the child in the same manner.” Here, the examiner notes that the step in which the processor uses the created document identification model is equated to the model being trained and then moving into the testing phase. The terms/tokens from the documents are compared to the root topic to determine whether the documents are useful. A document is considered useful (i.e. applicable) or not useful (i.e. non-applicable).), determines whether to use each index term included in the training set for creating the training data on the basis of the evaluation value (Agrawal: Col 9, Lines 2-12 “The training documents are tokenized, and information related to the frequency of terms or tokens is recorded in a database. A discrimination value is determined for each term in the training documents, and a minimum discrimination value is determined. 
Then, for each intermediate node, a set of feature terms is selected, where the feature terms are those that are in the training documents associated with the intermediate node or any of its descendants and that have discrimination values equal to or above the minimum discrimination value for the intermediate node.” Here, the examiner notes that the applicant's index/token term which is used as the basis from the evaluation value is equated to the indexes used to get identifiers of the sets of documents for analysis.), and adds as the applicable document data, to the training data, document data that is assigned an index term of an applicable document determined to be appropriate for use in creating the training data, among the plurality of pieces of document data aside from the training data sample (Agrawal: Col 9, Lines 30-35 “In a related embodiment, the system also computes, for each topic node with suitably high score, the terms in the test document that are significantly more frequent than in the training set for that topic. These are then used for building a term index.” and Col 11, Lines 58-63 “If the action type is “new training document”, then the document is tokenized in block 66. Next, in block 68, statistics are appended. In block 70, the appended statistics are stored in the raw statistics database. Also, the appended statistics are stored in the indexed database in block 90. Then, processing continues at block 72, as discussed above.” Here, the system generates index entries for the text extracted from documents or a subset of documents which are then stored (i.e. stored as training data). 
The text from these documents is deemed relevant and is used for the creation of training documents.), and adds document data assigned an index term of a non-applicable document determined to be appropriate for use in creating the training data to the training data as the non-applicable document data (Agrawal: Col 11, Lines 58-63 “If the action type is “new training document”, then the document is tokenized in block 66. Next, in block 68, statistics are appended. In block 70, the appended statistics are stored in the raw statistics database. Also, the appended statistics are stored in the indexed database in block 90. Then, processing continues at block 72, as discussed above.” The examiner notes that the indexed document text is used as a training document and meets a root topic threshold.), to create the training data (Agrawal: Col 9, Lines 1-5 “The training documents are tokenized, and information related to the frequency of terms or tokens is recorded in a database. A discrimination value is determined for each term in the training documents, and a minimum discrimination value is determined. Then, for each intermediate node, a set of feature terms is selected, where the feature terms are those that are in the training documents associated with the intermediate node or any of its descendants and that have discrimination values equal to or above the minimum discrimination value for the intermediate node.” Here, the training documents that meet the threshold are used for training.).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal (US6233575) in view of Pickens (US20080052273).

Regarding claim 2, Agrawal teaches the training data creation method according to claim 1. Agrawal further teaches wherein, in the first step, the processor creates a plurality of said training sets (Agrawal: Col 8, Lines 65-67 “The training procedure, according to preferred embodiments, employs a plurality of training documents that have been pre-assigned manually to various terminal and intermediate nodes in the taxonomy.” Here, taught as the new training documents.), wherein, in the second step, the processor creates the document identification model for each of the plurality of training sets (Agrawal: Col 11, Lines 17-22 “Then, in block 48, feature terms and stop terms are determined for each internal topic node based on the model validation set 44. Finally, class models are constructed over the chosen features in block 49, preferably as described below in the section titled “Document Models.”” Here, taught as the training documents used to create the document models.), wherein, in the third step, the processor creates the evaluation value for each of the created document identification models (Agrawal: Col 11, Lines 23-26 “The class models and statistical information calculated in block 46 are provided to the classifier 50, for classifying the test documents 34 in a testing mode, as well as new documents when the system is deployed. Classification of test (or new) documents is carried out in the taxonomy, such that each test (or new) document is ultimately classified to correspond to one or more classes, designated by terminal or leaf nodes (or, in some cases, intermediate nodes in the hierarchy)” Here, the test or new training document is noted as corresponding to the one or more classes which is equated to the evaluation value.),…
Agrawal does not explicitly disclose wherein, in the fourth step, the processor calculates an appearance frequency for each index term in the training set used to create the document identification model for which the evaluation value is greater than a prescribed standard, and determines that index terms with a high said appearance frequency should be used to create the training data.
Pickens teaches wherein, in the fourth step, the processor calculates an appearance frequency for each index term in the training set used to create the document identification model for which the evaluation value is greater than a prescribed standard (Pickens: Paragraph [0072] “Term frequencies and document frequencies are still necessary. But, as mentioned previously, documents with a high term frequency and a low context score will not do as well as documents with a high term frequency and a high context score. Context model scores serve as a method for enhancing already good retrieval results.” Here, the examiner notes that the appearance frequency is taught as the term frequency, which is determined for terms in documents. It is further noted that the evaluation value is equated to the context scores from the context model which have good retrieval results.), and determines that index terms with a high said appearance frequency should be used to create the training data (Pickens: Paragraph [0072] “But, as mentioned previously, documents with a high term frequency and a low context score will not do as well as documents with a high term frequency and a high context score. Context model scores serve as a method for enhancing already good retrieval results.” The examiner notes that, in Pickens, terms with a high term frequency and a high context score (evaluation value) perform better than terms with a low frequency. Therefore, Pickens uses the high frequency terms.).
It would have been obvious to one of ordinary skill in the art to modify the document extraction and indexing system of Agrawal with the term frequencies of Pickens to allow classifying document rankings based on how relevant they were to a query based on indexed terms, thereby, using term frequency as the basis of the context scores when creating a model (Pickens: Paragraph [0007] “an algorithm that determines textual similarity not by comparing keywords, but by comparing contexts that are appropriate to those keywords.” [0068] “Systems are compared by examining the ranks of documents that are relevant to a user's information need. If one system has more relevant documents at higher ranks than another, it is considered better.”).
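For illustration only, the fourth-step limitation of claim 2 as read above can be sketched as follows. This is a minimal sketch of the claim language, not code from Agrawal, Pickens, or the application; the function name `frequent_terms` and the `min_count` cutoff used to decide what counts as a "high" appearance frequency are assumptions.

```python
from collections import Counter


def frequent_terms(training_sets, evaluation_values, standard, min_count):
    """Count index terms across training sets whose models beat the standard.

    training_sets: list of index-term lists, one per candidate training set.
    evaluation_values: evaluation value of the model built from each set.
    standard: the prescribed standard that an evaluation value must exceed.
    min_count: hypothetical cutoff for a "high" appearance frequency.
    """
    counts = Counter()
    for terms, value in zip(training_sets, evaluation_values):
        if value > standard:
            # Count each index term at most once per qualifying training set.
            counts.update(set(terms))
    # Index terms deemed suitable for creating the training data.
    return sorted(term for term, count in counts.items() if count >= min_count)
```

With three training sets `[["a", "b"], ["b", "c"], ["a", "c"]]`, evaluation values `[0.9, 0.8, 0.4]`, a standard of `0.5`, and `min_count=2`, only the first two sets qualify and only the term `"b"` appears in both.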

Regarding claim 3, Agrawal in view of Pickens teaches the training data creation method according to claim 2. Pickens further teaches wherein, in the fourth step, the processor adds the document data to which the index term was assigned to the training data in order from the highest appearance frequency (Pickens: Paragraph [0071] “Thus, the model may assign a low probability for a term in a document, even if the term frequency of that term is high. There will also be other documents in which the frequency for a term is low, but the context-based probability in that document will be high.”), creates the document identification model using the training data (Pickens: Paragraph [0062] “It is not limited to document boundaries. The inventive context model may be also trained using a set of data” Here, the examiner notes that the context model is trained using a set of data.), and if the evaluation value of the created document identification model does not improve, determines that the index term should not be used to create the training data (Pickens: Paragraph [0072] “But, as mentioned previously, documents with a high term frequency and a low context score will not do as well as documents with a high term frequency and a high context score. Context model scores serve as a method for enhancing already good retrieval results.” The examiner notes that, in Pickens, terms with a high term frequency and a high context score (evaluation value) perform better than terms with a low frequency. Therefore, Pickens uses the high frequency terms and disregards the low context score terms.).

It would have been obvious to one of ordinary skill in the art to modify the document extraction and indexing system of Agrawal with the term frequencies of Pickens to allow classifying document rankings based on how relevant they were to a query based on indexed terms, thereby, using term frequency as the basis of the context scores when creating a model (Pickens: Paragraph [0007] “an algorithm that determines textual similarity not by comparing keywords, but by comparing contexts that are appropriate to those keywords.” [0068] “Systems are compared by examining the ranks of documents that are relevant to a user's information need. If one system has more relevant documents at higher ranks than another, it is considered better.”).
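For illustration only, the claim 3 limitation (adding document data in order from the highest appearance frequency and rejecting an index term when the evaluation value does not improve) can be sketched as a simple greedy loop. This is an assumed reading of the claim language, not code from either reference; `select_terms` and the caller-supplied `train_and_evaluate` callback are hypothetical names.

```python
def select_terms(term_counts, train_and_evaluate, baseline):
    """Greedily keep index terms that improve the model's evaluation value.

    term_counts: mapping of index term -> appearance frequency.
    train_and_evaluate: hypothetical callback that trains a model on the
        documents for the given list of terms and returns its evaluation value.
    baseline: evaluation value of the model before any term is added.
    """
    kept = []
    best = baseline
    # Visit terms from highest to lowest frequency; break ties alphabetically.
    ordered = sorted(term_counts.items(), key=lambda item: (-item[1], item[0]))
    for term, _count in ordered:
        value = train_and_evaluate(kept + [term])
        if value > best:
            kept.append(term)  # the evaluation value improved: use this term
            best = value
        # Otherwise the term is determined not to be used for the training data.
    return kept, best
```

The callback keeps the sketch independent of any particular learning algorithm, since neither the claim nor the cited passages fix one.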

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHSIF A. SHEIKH whose telephone number is (571)272-2607. The examiner can normally be reached Mon-Fri 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/A.A.S./Examiner, Art Unit 2127
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126