DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 10/14/2020 has been entered.
 
Response to Arguments

Applicant Arguments:
	Claim 1 relates to "A method of classifying a set of unstructured text documents for a subject matter without using pre-classified training examples" which is not the purview of either Mathew or Cormack. Claim 1 recites "identifying a taxonomy of classes having class names for the subject matter." The Office Action points to Mathew at [0195], but it is clear that Mathew is using a user-provided "synonym dictionary 855" and user-provided "taxonomy 860." In Mathew, there is no mention of identifying class names or a taxonomy of classes for the subject matter. Mathew at best identifies synonyms to group words together and a user-provided taxonomy to group words.

Examiner Response:
	The examiner respectfully disagrees. Paragraph [0034] of Applicant's specification states: “For a given subject matter domain, a hierarchical taxonomy of classes must be made available. The taxonomy may be pre-existing in the literature or custom-built. In either case, the taxonomy becomes the input into which objects are to be classified. See, Specify Taxonomy A in Fig. 1.” This passage clarifies that the taxonomy may be pre-existing in the literature or custom-built. Mathew in paragraph [0094] recites “FIG. 8 depicts an exemplary process flow diagram 800 to identify themes in accordance with one or more embodiments… Process flow diagram 800 may include receiving target documents 810, taxonomy”. The examiner notes that Mathew teaches identifying themes based on a taxonomy. The examiner further notes that Applicant's claim language does not recite how the taxonomy is identified. Therefore, under the broadest reasonable interpretation, the taxonomy can be user-provided. Mathew in claim 1 recites “discover themes”, which teaches a theme detection component that identifies themes [i.e. taxonomy].

Applicant Remarks:
	Claim 1 next recites "searching at least some of said set of text documents with one or more of said class names, including extracting N-grams to construct rules for an approximate classifier." The Office Action points to Mathew at [0063] and [0205]. The passage at [0063] does mention "N grams" but only in the context of identifying language. See Fig. 3. Again, Mathew is identifying themes, not searching for class names to construct rules for an approximate classifier.
	Claim 1 at c) calls for "classifying at least some of the set of text documents into said classes using said approximate classifier and producing a confidence factor for each class where said confidence factor measures the likelihood that said document is associated with said class." The Office Action agrees that Mathew does not disclose such claim limitations, but cites Cormack at [0180] and [0154]. Claim 1 has been amended to make clear that the "approximate classifier" is constructed "without using pre-classified training examples." Cormack at [0180] presents documents to a user 210 for the user to determine relevance.
	Claim 1 at f) calls for "modifying said approximate classifier for each class based on said elimination criteria, including adding, removing, or refining an N-gram from said approximate classifier." The Office Action agrees that Mathew does not teach or suggest such a claim limitation, but instead points to Cormack at [0150]. While it is not clear what type of classification adjustment is being made at Cormack [0150], it appears to be a change in the weights of a Bayesian filter. It is clear that Cormack [0150] does not teach or suggest the language of claim 1 calling for "adding, removing, or refining an N-gram."
	Claim 1 at g) recites "repeating steps c)-f) until a stopping condition is met." The Office Action cites Cormack at [0192] as suggesting this claim limitation. However, upon examination of Cormack at [0191] it appears that Cormack determines whether another document should be processed. In claim 1, elements c)-f) relate to more than a decision regarding processing another document. E.g. element f) calls for "modifying said approximate classifier for each class based on said elimination criteria, including adding, removing, or refining an N-gram." This limitation is not suggested by Cormack at [0192].

elements of claim 1 or are combinable. For example, none of the references teach extracting N-grams or adding, removing, or refining an N-gram from an approximate classifier. Even if combined (and nothing provides such a motivation), the references fail to meet the present claim limitations of claim 1. As noted above, the prior art references must teach or suggest all the claim limitations to render a claim obvious under 35 USC § 103(a).
Examiner Response:
	Applicant’s arguments with respect to claim(s) 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Applicant Remarks:
Claim 14 
The Office Action further proposes that the combination of Mathew/Cormack renders obvious the elements of claim 14. See Office Action pp. 14-18. As noted above, none of the references teach extracting N-grams or adding, removing, or refining an N-gram from an approximate classifier. The Office Action agrees (p. 17) that Mathew does not teach such a claim limitation, but cites Cormack at [0150]. As noted above, it is clear that Cormack [0150] does not teach or suggest the language of claim 1 calling for "adding, removing, or refining an N-gram."

Examiner Response:


Applicant Remarks:
Claim 19 
Claim 19 was also rejected based on the combination of Mathew/Cormack (Office Action at pp. 21-24). As noted above, Mathew relates to grouping of themes and the build of a synonym dictionary. Neither Mathew nor Cormack relates to "classifying unstructured text documents, without the need for pre-classified training examples." The Office Action (p. 22) agrees that Mathew does not teach or suggest the claim limitation "recursively apply the approximate classifier to evaluate its performance, and modify the approximate classifier, including adding, removing, or refining an N-gram from said approximate classifier, using an elimination criteria until a stopping condition is met." Instead, the Office Action applies Cormack at [0174], [0150] and [0192] to this limitation. However, there is nothing in these passages remotely suggesting "adding, removing, or refining an N-gram from said approximate classifier" using an elimination criteria or a stopping condition. This claim limitation is not met.

Examiner Response:
Applicant’s arguments with respect to claim(s) 19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mathew (U.S. 2013/0268534) in view of Cormack (U.S. 20150324451) and Itoh (U.S. 20150279353).

Regarding claim 1, Mathew teaches a method of classifying a set of unstructured text documents for a subject matter without using pre-classified training examples (Mathew: Paragraph [0050] “process a collection of unstructured documents to determine significant themes that repeat across the collection.” Classifying a set of unstructured text documents for a subject matter without using pre-classified training examples is taught as processing a collection of unstructured documents to determine significant themes [subject matter] that repeat across the collection.), comprising: 
a) identifying a taxonomy of classes having class names for the subject matter (Mathew: Paragraph [0195] “Grouping 850 may include steps in which related themes are grouped together using: a synonym dictionary 855 to group words with similar meaning together, a taxonomy 860 to group child concepts together into an overreaching parent group” Identifying a taxonomy of classes having class names for the subject matter is taught as a taxonomy grouping child concepts together into an overreaching parent group. Class names for the subject matter are taught as the themes.); 
b) searching at least some of said set of text documents with one or more of said class names (Mathew: Paragraph [0205] “Rule generation 870 may include steps in which text patterns to identify a theme are identified… a document would be said to contain the said theme if any one of the rule conditions are met” Searching at least some of said set of text documents with one or more of said class names is taught by the process in which text patterns in documents are identified to determine whether any one of the rule conditions for a theme is met.), …; 
d) generating a list of plausible terms for a number of said classes based at least in part on said confidence factor (Mathew: Paragraph [0155] “Top-level themes may be identified by from the pool of words and bigrams. An overall sentiment distribution of a group of sentences may be calculated and an item sentiment distribution for sentences containing each item in the pool may be calculated. An item from the pool may be a candidate for a top-level theme if (1) it appears above a certain threshold in the set of documents” Generating a list of plausible terms for a number of said classes based at least in part on said confidence factor is taught as the pool of words and bigrams [from the part of Top-level themes] that may be calculated at least in part by the overall sentiment score.); 
e) eliminating plausible terms from the list for each class based at least in part on a set of elimination criteria (Mathew: Paragraph [0098] “Noise word filtering 825 may include steps to remove noise words in the documents from consideration, such that the noise words may be ignored.” Eliminating plausible terms from the list for each class based at least in part on a set of elimination criteria is taught as noise word filtering that includes steps to remove noise words from consideration [Claim 21 of the prior art teaches “determine the noise terms from of the themes; and remove the noise terms from the category model.”].); 
and 
Mathew does not explicitly disclose 
… including extracting N-grams to construct rules for an approximate classifier without using pre-classified training examples;…
c) classifying at least some of the set of text documents into said classes using said approximate classifier without using pre-classified training examples and producing a confidence factor for each class where said confidence factor measures the likelihood that said document is associated with said class;…
f) modifying said approximate classifier for each class based on said elimination criteria, including adding, removing, or refining an N-gram from said approximate classifier; and 
g) repeating steps c)-f) until a stopping condition is met. 

Cormack further teaches c) classifying at least some of the set of text documents into said classes … document (Cormack: Paragraph [0180] “Classification system 100 may then present the selected documents to a user 210 in order to determine how many documents are actually relevant to a class 130 or subclass” Classifying at least some of the set of text documents into said classes is taught as the classification system presenting the selected documents to a user in order to determine how many documents are actually relevant to a class 130 or subclass.); and
g) repeating steps c)-f) until a stopping condition is met (Cormack: Paragraph [0192] “Using the identified data, a set of documents in the document collection may be classified into one or more of the classes or subclasses and a determination may be made as to whether one or more stopping criteria have been met. If one or more stopping criteria have not been met, another document from the collection may be selected in order to continue classifying the documents of the document collection.” Repeating steps c)-f) until a stopping condition is met is taught as selecting another document from the collection, if one or more stopping criteria have not been met, in order to continue classifying the documents of the document collection. Refer to Paragraph [0193] for further analysis “Furthermore, instead of terminating learning at an arbitrary point in the process, learning continues and is refined (e.g., by continually updating classifiers) until substantially all of the relevant documents have been found (thus achieving high recall)”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified theme detection of unstructured data of Mathew with the stopping criteria of document classifiers of Cormack in order to allow iteratively applying an active learning process until a stopping criterion has been met, thereby ensuring that all the documents in the collection have been classified by the classification process (Cormack: Paragraph [0153] “active learning process 360 may continue to run until all documents in document collection 120 are processed and classified.”).

…using said approximate classifier without using pre-classified training examples and producing a confidence factor for each class where said confidence factor measures the likelihood that said [data] is associated with said class;…
f) modifying said approximate classifier for each class based on said elimination criteria, including adding, removing, or refining an N-gram from said approximate classifier; and 
Itoh further teaches … including extracting N-grams to construct rules for an approximate classifier without using pre-classified training examples (Itoh: Paragraph [0010] “unsupervised training method for an N-gram language model…reliability to select an N-gram entry; and training, by the computer, the N-gram language model about selected one of more of the N-gram entries using all recognition results.” Extracting N-grams to construct rules for an approximate classifier without using pre-classified training examples is taught as unsupervised training method [i.e. without using pre-classified training examples] for an N-gram language model. The speech recognition system returns N-gram entries from the recognition results which are then used to train the speech recognition system.);…
… using said approximate classifier without using pre-classified training examples and producing a confidence factor for each class where said confidence factor measures the likelihood that said [data]…is associated with said class (Itoh: Paragraph [0049] “The reliability acquisition section 306 calculates by itself or externally acquires a reliability indicating how reliable each text of the recognition results included in the corpus B304 is. A confidence measure to be newly derived in the future as well as currently known confidence measures (see Jiang) can be used as confidence measures to indicate the reliability to be acquired. Specifically, a confidence measure calculated as a logical sum of the likelihood of acoustic model and the likelihood of language model, a confidence measure using the posterior probability of a text unit obtained upon speech recognition, and a confidence measure calculated by the recall or precision of correct words as correct intersections between outputs of two or more speech recognition systems. ” Using said approximate classifier without using pre-classified training examples is taught by the unsupervised training method for an N-gram language model. Producing a confidence factor for each class where said confidence factor measures the likelihood that said [data]…is associated with said class is taught as a confidence measure indicating the likelihood of correctly identified words output from the recognition system.);…
f) modifying said approximate classifier for each class based on said elimination criteria, including adding, removing, or refining an N-gram from said approximate classifier (Itoh: Paragraph [0008] “pruning the language model in a size suitable for the application, in which all the highest order n-grams and their probabilities are removed from an n-gram language model MO to generate an initial base model, and some of the most important pruned n-gram probabilities are added to this initial base model to provide a pruned language model.” Modifying said approximate classifier for each class based on said elimination criteria, including adding, removing, or refining an N-gram from said approximate classifier is taught as pruning the language model in size by removing the n-grams to provide a pruned model [i.e. approximate classifier]); and… 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mathew and Cormack with the N-gram recognition model of Itoh in order to allow iteratively applying an unsupervised training method for the N-gram language model, thereby providing an improved, unsupervised training method, training system, and training program for an N-gram language model, which neither requires any manual correction nor causes any distortion in the probabilities of N-gram entries (Itoh: Paragraph [0019] “provide an improved, unsupervised training method, training system, and training program for an N-gram language model, which neither requires any manual correction nor causes any distortion in the probabilities of N-gram entries.”). Mathew, in paragraph [0063], recites “In some embodiments, language detection may be performed by identifying frequent N-grams”, which suggests the use of N-grams; Itoh teaches an improvement of that use in the same language detection field.
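
By way of illustration only, the limitation at issue (extracting N-grams to construct rules for an approximate classifier without pre-classified training examples, and producing a confidence factor for each class) might be sketched as follows. This sketch is not the method of Mathew, Cormack, Itoh, or Applicant. The function names, the `min_count` parameter, and the confidence formula (the fraction of a class's rules matched by a document) are all assumptions introduced for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_rules(documents, class_names, n=2, min_count=2):
    """Search unlabeled documents with each class name and keep recurring
    n-grams that contain the name. No pre-classified examples are used.
    (min_count is a hypothetical cut-off, not taken from any reference.)"""
    rules = {}
    for name in class_names:
        counts = Counter()
        for doc in documents:
            tokens = doc.lower().split()
            if name in tokens:
                counts.update(g for g in ngrams(tokens, n) if name in g.split())
        rules[name] = {g for g, c in counts.items() if c >= min_count}
    return rules


def classify(document, rules, n=2):
    """Assumed confidence factor: share of a class's rules found in the document."""
    grams = set(ngrams(document.lower().split(), n))
    return {
        name: (len(grams & class_rules) / len(class_rules) if class_rules else 0.0)
        for name, class_rules in rules.items()
    }


docs = ["the bed was soft", "the bed was hard", "rude staff today", "rude staff again"]
rules = build_rules(docs, ["bed", "staff"])
scores = classify("the bed was lumpy", rules)
```

In this toy corpus the rules for "bed" are the bigrams "the bed" and "bed was", so the new document matches every "bed" rule and no "staff" rule.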

Regarding claim 2, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches said taxonomy comprising a hierarchy of classes for said subject matter (Mathew: Paragraph [0212-0214] “grouping 850 using taxonomy 860, define all possible member children of the taxonomy grouping and identify all possible inflections of each child…Generated category model 875 represents a two-level hierarchy of themes or a one-level hierarchy of themes.” Taxonomy comprising a hierarchy of classes for said subject matter is taught as a two-level hierarchy of themes or a one-level hierarchy of themes.). 
Regarding claim 3, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches one or more words or phrases found in one or more documents related to said subject matter (Mathew: Paragraph [0058] “Theme identification 250 may include steps to statistically determine themes from isolated target documents 240 using natural language features identified using natural language process 220 including but not limited to stemmed words, named entities, bigrams and part-of-speech. In some embodiments, theme identification 250 may include steps to identify themes as a single or multi-level hierarchy with parent themes and child themes.” One or more words or phrases found in one or more documents related to said subject matter is taught as stemmed words, named entities, bigrams and part-of-speech from target documents being used to identify themes [subject matter].). 
Regarding claim 4, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches said constructing an approximate classifier comprising extracting a leaf node for inclusion as a term in said approximate classifier (Mathew: Paragraph [0058] “a single or multi-level hierarchy with parent themes and child themes.” Extracting a leaf node for inclusion as a term in said approximate classifier is taught as the child themes which are more specific than the parent themes. Refer to paragraph [0217-0218] for further analysis.). 
Regarding claim 5, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches said constructing an approximate classifier comprising, for a single word class name, concatenate the word to its parent class (Mathew: Paragraph [0217-0218] “At the highest level of the hierarchy in category model 940, there are parent level themes such as parent theme 910. Parent themes may automatically assigned a label. For example, parent theme 910 is named “Bed”. [0218] At the second level of the hierarchy in category model 940, there are child level themes, such as child theme 920 which is a child of parent theme 910. Child themes may be assigned a label. For example, child theme 920 is named “Size Bed.”” For a single word class name, concatenate the word to its parent class is taught as “Bed” [single word class name], which is linked to the second level of the hierarchy of the category model, “Size Bed”.). 
Regarding claim 6, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches said constructing an approximate classifier comprising applying a set of linguistic transformations to one or more terms in said approximate classifier (Mathew: Paragraph [0067] “Stemming 330 may include steps to strip each word of any morphological suffixes or prefixes so that the word can be reduced to its root form. For example the token RUDELY may be stemmed to RUDE so that a single concept called RUDE can be identified.” Constructing an approximate classifier comprising applying a set of linguistic transformations to one or more terms in said approximate classifier is taught as stripping each word of any morphological suffixes or prefixes so that the word can be reduced to its root form [linguistic transformation].). 
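
For illustration, the suffix stripping described in the cited passage of Mathew at [0067] can be sketched as below. Mathew does not specify a stemming algorithm, so the suffix list, the minimum root length, and the function name `stem` are assumptions of this sketch.

```python
# Hypothetical suffix list; the cited reference does not enumerate suffixes.
SUFFIXES = ("LY", "ING", "ED", "S")


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a root of at least three letters."""
    word = token.upper()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


root = stem("RUDELY")
```

With this suffix list, RUDELY and RUDE reduce to the same root, so both tokens count toward a single concept.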
Regarding claim 7, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches said generating a list of plausible terms step comprising an N-gram analysis (Mathew: Paragraph [0063] “In some embodiments, language detection may be performed by identifying frequent N-grams, which are sequences of word patterns, from the document and searching against a corpus.” Generating a list of plausible terms step comprising an N-gram analysis is taught as identifying frequent N-grams, which are sequences of word patterns, from the document and searching against a corpus.). 
Regarding claim 8, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches said generating a list of plausible terms step comprising a linguistic transformation procedure (Mathew: Paragraph [0067] “Stemming 330 may include steps to strip each word of any morphological suffixes or prefixes so that the word can be reduced to its root form. For example the token RUDELY may be stemmed to RUDE so that a single concept called RUDE can be identified.” Generating a list of plausible terms step comprising a linguistic transformation procedure is taught as stripping each word of any morphological suffixes or prefixes so that the word can be reduced to its root form [linguistic transformation].). 
Regarding claim 9, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches said eliminating plausible terms step comprising a single class N-gram selection procedure (Mathew: Paragraph [0098] “Noise word filtering 825 may include steps to remove noise words in the documents from consideration, such that the noise words may be ignored… tends to repeat very frequently and may be mistaken for a theme.” Eliminating plausible terms step comprising a single class N-gram selection procedure is taught as noise word filtering that includes steps to remove noise words from consideration [Claim 21 of the prior art teaches “determine the noise terms from of the themes; and remove the noise terms from the category model.”]. N-grams are identified in paragraph [0063] and the noise terms are removed. Tends to be mistaken for a theme is taught as leading to unsuccessful classification.). 
Regarding claim 10, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches said eliminating plausible terms step comprising a multi-class N-gram selection procedure (Mathew: Paragraph [0097] “In some embodiments filter stop words 820 may be performed by mathematically computing a frequency threshold; words with a frequency above the threshold may be removed from further processing.” Eliminating plausible terms step comprising a multi-class N-gram selection procedure is taught as removing words that have a higher frequency. [Higher frequency means they contribute to more matches] N-grams are identified in paragraph [0063] and the noise terms are removed.). 
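
For illustration, the frequency-threshold filtering quoted from Mathew at [0097] can be sketched as follows. Mathew does not state how the threshold is computed; the document-frequency ratio `max_doc_ratio` and the function name are assumptions of this sketch.

```python
from collections import Counter


def filter_frequent_words(documents, max_doc_ratio=0.8):
    """Remove words that appear in more than max_doc_ratio of the documents.
    The ratio-based threshold is an assumed stand-in for the reference's
    unspecified 'mathematically computed' threshold."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.lower().split()))  # count each word once per document
    threshold = max_doc_ratio * len(documents)
    removed = {word for word, count in doc_freq.items() if count > threshold}
    kept = [[w for w in doc.lower().split() if w not in removed] for doc in documents]
    return kept, removed


docs = ["the bed was soft", "the staff was rude", "the room had a view"]
kept, removed = filter_frequent_words(docs)
```

Here "the" appears in all three documents and is removed, while "was" appears in only two and is retained under the assumed 0.8 ratio.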
Regarding claim 11, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches said elimination criteria comprising applying a single class N-gram selection procedure to remove candidate terms unlikely to contribute to successful classification of documents (Mathew: Paragraph [0098] “Noise word filtering 825 may include steps to remove noise words in the documents from consideration, such that the noise words may be ignored… tends to repeat very frequently and may be mistaken for a theme.” Applying a single class N-gram selection procedure to remove candidate terms unlikely to contribute to successful classification of documents is taught as noise word filtering that includes steps to remove noise words from consideration [Claim 21 of the prior art teaches “determine the noise terms from of the themes; and remove the noise terms from the category model.”]. N-grams are identified in paragraph [0063] and the noise terms are removed. Tends to be mistaken for a theme is taught as leading to unsuccessful classification.). 
Regarding claim 12, Mathew in view of Cormack and Itoh teach the method of claim 1, Mathew further teaches said selection criteria comprising applying a multi-class N-gram selection procedure based on statistics indicating terms will contribute to successful classification of documents (Mathew: Paragraph [0097] “In some embodiments filter stop words 820 may be performed by mathematically computing a frequency threshold; words with a frequency above the threshold may be removed from further processing.” Applying a multi-class N-gram selection procedure based on statistics indicating terms will contribute to successful classification of documents is taught as removing words that have a higher frequency. [Higher frequency means they contribute to more matches] N-grams are identified in paragraph [0063] and the noise terms are removed.). 
Regarding claim 13, Mathew in view of Cormack and Itoh teach the method of claim 1, Cormack further teaches said stopping condition (Cormack: Paragraph [0192] “may be repeated until a stopping criterion is satisfied.” Stopping condition is taught as stopping criterion.)  comprising one or more of the following are met- 
a) the difference in the number of plausible terms resulting from repeating step g) is smaller than a pre-set threshold (Cormack: Paragraph [0192] “a stopping criterion may be satisfied when a measure or estimation of precision and recall reach an acceptable level.” The difference in the number of plausible terms resulting from repeating step g) is smaller than a pre-set threshold is taught as when the measure or estimation of precision and recall reaches an acceptable level.), 
b) the same number or more terms are being added in repeating step g) and removed in another repeating step g), or 
c) an approximate classifier has been created for every class in the taxonomy (Cormack: Paragraph [0153] “In some embodiments, active learning process 360 may continue to run until all documents in document collection 120 are processed and classified. In an alternative embodiment, active learning process 360 may stop when a specified or determined number of documents have been classified by classification process” An approximate classifier has been created for every class is taught as active learning process may continue to run until all documents in document collection 120 are processed and classified.). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified theme detection of unstructured data of Mathew with the stopping criteria of document classifiers of Cormack in order to allow iteratively applying an active learning process until a stopping criterion has been met, thereby ensuring that all the documents in the collection have been classified by the classification process (Cormack: Paragraph [0153] “active learning process 360 may continue to run until all documents in document collection 120 are processed and classified.”).
Regarding claim 14, Mathew teaches a system of classifying a set of unstructured textual documents, without (Mathew: Paragraph [0050] “process a collection of unstructured documents to determine significant themes that repeat across the collection.” Classifying a set of unstructured text documents for a subject matter without using pre-classified training examples is taught as processing a collection of unstructured documents to determine significant themes [subject matter] that repeat across the collection.), comprising: computer memory (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”) loaded with one or more class names and one or more computer processors (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”) programmed to expand the class name into a set of words and phrases (Mathew: Paragraph [0155] “Top-level themes may be identified by from the pool of words and bigrams.” Computer memory loaded with one or more class names and one or more computer processors programmed to expand the class name into a set of words and phrases is taught as the Top-level themes that are identified from the pool of words or bigrams [phrase].); 
computer memory (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”) loaded with a set of unstructured text documents and said one or more computer processors (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”) programmed to search the set of unstructured text documents (Mathew: Paragraph [0205] “Rule generation 870 may include steps in which text patterns to identify a theme are identified… a document would be said to contain the said theme if any one of the rule conditions are met” Computer memory loaded with a set of unstructured text documents and said one or more computer processors programmed to search the set of unstructured text documents is taught by the process in which text patterns in documents are identified to determine whether any one of the rule conditions for a theme is met.)…; 
said one or more computer processors (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”) programmed to classify at least some of the set of text documents into said classes (Mathew: Paragraph [0084] “Classify 620 may include steps to apply category model 630 to collection of documents 610 so that documents are mapped to a category within model 630 if the document contains a text pattern defined by a rule for the category” Classifying at least some of the set of text documents into said classes using said approximate classifier is taught by applying the category model to a collection of documents in order to map them to a category defined by the rules.) …;
said one or more computer processors (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”) programmed to generate a list of plausible terms for a number of said classes based at least in part on said confidence factor (Mathew: Paragraph [0155] “Top-level themes may be identified by from the pool of words and bigrams. An overall sentiment distribution of a group of sentences may be calculated and an item sentiment distribution for sentences containing each item in the pool may be calculated. An item from the pool may be a candidate for a top-level theme if (1) it appears above a certain threshold in the set of documents” Generating a list of plausible terms for a number of said classes based at least in part on said confidence factor is taught as the pool of words and bigrams [from the part of Top-level themes] that may be calculated at least in part by the overall sentiment score.); 
said one or more computer processors (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”) programmed to eliminate plausible terms from the list for each class based at least in part on an elimination criteria (Mathew: Paragraph [0098] “Noise word filtering 825 may include steps to remove noise words in the documents from consideration, Such that the noise words may be ignored.” Eliminating plausible terms from the list for each class based at least in part on a set of elimination criteria is taught as noise word filtering that includes steps to remove noise words from consideration [Claim 21 of the prior art teaches “determine the noise terms from of the themes; and remove the noise terms from the category model.”].)

Mathew does not explicitly disclose… to construct an approximate classifier without using pre-classified training examples …including N-gram extraction … without using pre-classified training examples using said approximate classifier and producing a confidence factor for each document classified…and to modify said approximate classifier for each class based on said elimination criteria including adding, removing, or refining an N-gram from said approximate classifier; 
and said one or more computer processors programmed to iteratively classify text documents, generate plausible terms and modify the approximate classifier until a stopping criteria is met. 

Cormack further teaches said one or more computer processors programmed (Cormack: Paragraph [0054] “one processor”) to iteratively classify text documents (Cormack: Paragraph [0153] “successive iteration of the active learning process may be operating with incomplete information, meaning that scores and classification decisions for all documents of the collection will have not been calculated.” Iteratively classifying text documents is taught as iterations of the active learning document classification process.), generate plausible terms and modify the approximate classifier until a stopping criteria is met (Cormack: Paragraph [0193] “Furthermore, instead of terminating learning at an arbitrary point in the process, learning continues and is refined (e.g., by continually updating classifiers) until substantially all of the relevant documents have been found (thus achieving high recall) or possibly until an objective measure of system performance is realized. Thus, overall classification effectiveness is improved.” Repeating steps c)-f) until a stopping condition is met is taught as iteratively refining the segment classifier based upon the performance measure until a stopping criterion is met.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified theme detection of unstructured data of Mathew with the stopping criteria of document classifiers of Cormack in order to allow iteratively applying an active learning process until a stopping criterion has been met, thereby ensuring that all the documents in the collection have been classified by the classification process (Cormack: Paragraph [0153] “active learning process 360 may continue to run until all documents in document collection 120 are processed and classified.”).
Mathew in view of Cormack does not explicitly disclose …to construct an approximate classifier without using pre-classified training examples …including N-gram extraction … without using pre-classified training examples using said approximate classifier and producing a confidence factor for each document classified…and to modify said approximate classifier for each class based on said elimination criteria including adding, removing, or refining an N-gram from said approximate classifier;
Itoh further teaches…to construct an approximate classifier without using pre-classified training examples …including N-gram extraction … without using pre-classified training examples using said approximate classifier (Itoh: Paragraph [0010] “unsupervised training method for an N-gram language model…reliability to select an N-gram entry; and training, by the computer, the N-gram language model about selected one of more of the N-gram entries using all recognition results.” Extracting N-grams to construct rules for an approximate classifier without using pre-classified training examples is taught as unsupervised training method [i.e. without using pre-classified training examples] for an N-gram language model. The speech recognition system returns N-gram entries from the recognition results which are then used to train the speech recognition system.) and producing a confidence factor for each document classified (Itoh: Paragraph [0049] “The reliability acquisition section 306 calculates by itself or externally acquires a reliability indicating how reliable each text of the recognition results included in the corpus B304 is. A confidence measure to be newly derived in the future as well as currently known confidence measures (see Jiang) can be used as confidence measures to indicate the reliability to be acquired. Specifically, a confidence measure calculated as a logical sum of the likelihood of acoustic model and the likelihood of language model, a confidence measure using the posterior probability of a text unit obtained upon speech recognition, and a confidence measure calculated by the recall or precision of correct words as correct intersections between outputs of two or more speech recognition systems. ” Using said approximate classifier without using pre-classified training examples is taught by the unsupervised training method for an N-gram language model. 
Producing a confidence factor for each class where said confidence factor measures the likelihood that said [data]…is associated with said class is taught as a confidence measure indicating the likelihood of correctly identified words output from the recognition system.)…and to modify said approximate classifier for each class based on said elimination criteria including adding, removing, or refining an N-gram from said approximate classifier (Itoh: Paragraph [0008] “pruning the language model in a size suitable for the application, in which all the highest order n-grams and their probabilities are removed from an n-gram language model MO to generate an initial base model, and some of the most important pruned n-gram probabilities are added to this initial base model to provide a pruned language model.” Modifying said approximate classifier for each class based on said elimination criteria, including adding, removing, or refining an N-gram from said approximate classifier is taught as pruning the language model in size by removing the n-grams to provide a pruned model [i.e. approximate classifier]); 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mathew (Mathew, in paragraph 0063, “In some embodiments, language detection may be performed by identifying frequent N-grams” suggests the use of N-grams, on which Itoh teaches an improvement in the same language detection field of use.) and Cormack with the N-gram recognition model of Itoh in order to allow iteratively applying an unsupervised training method for the N-gram language model, thereby providing an improved, unsupervised training method, training system, and training program for an N-gram language model, which neither requires any manual correction nor causes any distortion in the probabilities of N-gram entries (Itoh: Paragraph [0019] “provide an improved, unsupervised training method, training system, and training program for an N-gram language model, which neither requires any manual correction nor causes any distortion in the probabilities of N-gram entries.”).
Regarding claim 15, Mathew in view of Cormack and Itoh teaches the system of claim 14. Mathew further teaches said list of plausible terms being generated by an N-gram analysis (Mathew: Paragraph [0067] “In some embodiments, language detection may be performed by identifying frequent N-grams, which are sequences of word patterns, from the document and searching against a corpus.” Generating a list of plausible terms step comprising an N-gram analysis is taught as identifying frequent N-grams, which are sequences of word patterns, from the document and searching against a corpus.).
Regarding claim 16, Mathew in view of Cormack and Itoh teaches the system of claim 14. Mathew further teaches said elimination criteria comprising said one or more processors programmed (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”) to apply a single class N-gram selection procedure to remove candidate terms unlikely to contribute to successful classification of documents (Mathew: Paragraph [0098] “Noise word filtering 825 may include steps to remove noise words in the documents from consideration, Such that the noise words may be ignored… tends to repeat very frequently and may be mistaken for a theme.” Applying a single class N-gram selection procedure to remove candidate terms unlikely to contribute to successful classification of documents is taught as noise word filtering that includes steps to remove noise words from consideration [Claim 21 of the prior art teaches “determine the noise terms from of the themes; and remove the noise terms from the category model.”]. N-grams are identified in paragraph [0067] and the noise terms are removed. Tends to be mistaken for a theme is taught as leading to unsuccessful classification.).
Regarding claim 17, Mathew in view of Cormack and Itoh teaches the system of claim 14. Mathew further teaches said selection criteria comprising said one or more processors (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”) programmed to apply a multi-class N-gram selection procedure based on statistics indicating terms will contribute to successful classification of documents (Mathew: Paragraph [0097] “In some embodiments filter stop words 820 may be performed by mathematically computing a frequency threshold; words with a frequency above the threshold may be removed from further processing.” Applying a multi-class N-gram selection procedure based on statistics indicating terms will contribute to successful classification of documents is taught as removing words that have a higher frequency. [Higher frequency means they contribute to more matches.] N-grams are identified in paragraph [0067] and the noise terms are removed.).
Regarding claim 18, Mathew in view of Cormack and Itoh teaches the system of claim 14. Cormack further teaches said stopping criteria for stopping iteratively classifying (Cormack: Paragraph [0192] “may be repeated until a stopping criterion is satisfied.” Stopping condition is taught as stopping criterion.) of said one or more processors (Cormack: Paragraph [0054] “one processor”) comprising one or more of determining if:
(Cormack: Paragraph [0192] “a stopping criterion may be satisfied when a measure or estimation of precision and recall reach an acceptable level.” The difference in the number of plausible terms resulting from repeating step g) is smaller than a pre-set threshold is taught as when the measure or estimation of precision and recall reaches an acceptable level.), 
the same number or more terms are being added during iteration and removed in another iteration, or 
an approximate classifier has been created for every class (Cormack: Paragraph [0153] “In some embodiments, active learning process 360 may continue to run until all documents in document collection 120 are processed and classified. In an alternative embodiment, active learning process 360 may stop when a specified or determined number of documents have been classified by classification process” An approximate classifier has been created for every class is taught as active learning process may continue to run until all documents in document collection 120 are processed and classified.). 

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified theme detection of unstructured data of Mathew with the stopping criteria of document classifiers of Cormack in order to allow iteratively applying an active learning process until a stopping criterion has been met, thereby ensuring that all the documents in the collection have been classified by the classification process (Cormack: Paragraph [0153] “active learning process 360 may continue to run until all documents in document collection 120 are processed and classified.”).
	

Regarding claim 19, Mathew teaches a system for classifying a set of unstructured text documents into a plurality of classes without using pre-classified training examples (Mathew: Paragraph [0050] “process a collection of unstructured documents to determine significant themes that repeat across the collection.” Classifying a set of unstructured text documents for a subject matter without using pre-classified training examples is taught as processing a collection of unstructured documents to determine significant themes [subject matter] that repeat across the collection.), comprising: a processor (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”); and a storage device coupled to the processor and configurable for storing instructions (Mathew: Paragraph [0235] “computing device 1400 may include a bus 1410, a processor 1420, a memory 1430”), which when executed by the processor cause the processor to: use a class name into a set of semantically related terms (Mathew: Paragraph [0155] “Top-level themes may be identified by from the pool of words and bigrams.” Use a class name into a set of semantically related terms is taught as the Top-level themes that are identified from the pool of words or bigrams [phrase].), search at least some of said set of unstructured text documents with one or more of said terms (Mathew: Paragraph [0205] “Rule generation 870 may include steps in which text patterns to identify a theme are identified… a document would be said to contain the said theme if any one of the rule conditions are met” Searching at least some of said set of unstructured text documents with one or more of said terms is taught by the rule generation process in which text patterns of documents are identified to determine whether a theme is present based on whether any one of the rule conditions is met.)…,

Mathew does not explicitly disclose…including extracting N-grams to construct an approximate classifier without using pre-classified training examples… recursively apply the approximate classifier to evaluate its performance, and modify the approximate classifier, including adding, removing, or refining an N-gram from said approximate classifier;  using an elimination criteria until a stopping condition is met. 
Cormack further teaches recursively apply the approximate classifier to evaluate its performance (Cormack: Paragraph [0174] “quality control mode assesses the performance of the classification system by comparing the scores and classifications calculated by classification process 350 of FIG. 3 with the user coding decisions 555 for the same document.” Recursively applying the approximate classifier to evaluate its performance is taught as assessing the performance of the classification system by comparing the scores and classifications calculated by the classification process.)… using an elimination criteria until a stopping condition is met (Cormack: Paragraph [0192] “Using the identified data, a set of documents in the document collection may be classified into one or more of the classes or subclasses and a determination may be made as to whether one or more stopping criteria have been met. If one or more stopping criteria have not been met, another document from the collection may be selected in order to continue classifying the documents of the document collection.” Repeating steps c)-f) until a stopping condition is met is taught as follows: if one or more stopping criteria have not been met, another document from the collection may be selected in order to continue classifying the documents of the document collection. Refer to Paragraph [0193] for further analysis: “Furthermore, instead of terminating learning at an arbitrary point in the process, learning continues and is refined (e.g., by continually updating classifiers) until substantially all of the relevant documents have been found (thus achieving high recall)”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified theme detection of unstructured data of Mathew with the stopping criteria of document classifiers of Cormack in order to allow iteratively applying an active learning process until a stopping criterion has been met, thereby ensuring that all the documents in the collection have been classified by the classification process (Cormack: Paragraph [0153] “active learning process 360 may continue to run until all documents in document collection 120 are processed and classified.”).

Mathew in view of Cormack does not explicitly disclose …including extracting N-grams to construct an approximate classifier without using pre-classified training examples… and modify the approximate classifier, including adding, removing, or refining an N-gram from said approximate classifier;
Itoh further teaches …including extracting N-grams to construct an approximate classifier without using pre-classified training examples (Itoh: Paragraph [0010] “unsupervised training method for an N-gram language model…reliability to select an N-gram entry; and training, by the computer, the N-gram language model about selected one of more of the N-gram entries using all recognition results.” Extracting N-grams to construct rules for an approximate classifier without using pre-classified training examples is taught as unsupervised training method [i.e. without using pre-classified training examples] for an N-gram language model. The speech recognition system returns N-gram entries from the recognition results which are then used to train the speech recognition system.)… and modify the approximate classifier, including adding, removing, or refining an N-gram from said approximate classifier (Itoh: Paragraph [0008] “pruning the language model in a size suitable for the application, in which all the highest order n-grams and their probabilities are removed from an n-gram language model MO to generate an initial base model, and some of the most important pruned n-gram probabilities are added to this initial base model to provide a pruned language model.” Modifying said approximate classifier for each class based on said elimination criteria, including adding, removing, or refining an N-gram from said approximate classifier is taught as pruning the language model in size by removing the n-grams to provide a pruned model [i.e. approximate classifier]);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mathew (Mathew, in paragraph 0063, “In some embodiments, language detection may be performed by identifying frequent N-grams” suggests the use of N-grams, on which Itoh teaches an improvement in the same language detection field of use.) and Cormack with the N-gram recognition model of Itoh in order to allow iteratively applying an unsupervised training method for the N-gram language model, thereby providing an improved, unsupervised training method, training system, and training program for an N-gram language model, which neither requires any manual correction nor causes any distortion in the probabilities of N-gram entries (Itoh: Paragraph [0019] “provide an improved, unsupervised training method, training system, and training program for an N-gram language model, which neither requires any manual correction nor causes any distortion in the probabilities of N-gram entries.”).
Regarding claim 20, Mathew in view of Cormack and Itoh teaches the system of claim 19. Cormack further teaches the system further comprising instructions to apply a stopping condition (Cormack: Paragraph [0192] “may be repeated until a stopping criterion is satisfied.” Stopping condition is taught as stopping criterion.) comprising one or more of the following: 
a) the difference in the number of terms resulting from recursively applying the approximate classifier is smaller than a pre-set threshold (Cormack: Paragraph [0192] “a stopping criterion may be satisfied when a measure or estimation of precision and recall reach an acceptable level.” The difference in the number of plausible terms resulting from repeating step g) is smaller than a pre-set threshold is taught as when the measure or estimation of precision and recall reaches an acceptable level.), 
b) the same number or more terms are being added in recursively applying the approximate classifier and removed in recursively applying the approximate classifier, or 
c) an approximate classifier has been created for every class (Cormack: Paragraph [0153] “In some embodiments, active learning process 360 may continue to run until all documents in document collection 120 are processed and classified. In an alternative embodiment, active learning process 360 may stop when a specified or determined number of documents have been classified by classification process” An approximate classifier has been created for every class is taught as active learning process may continue to run until all documents in document collection 120 are processed and classified.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified theme detection of unstructured data of Mathew with the stopping criteria of document classifiers of Cormack. Doing so would allow iteratively applying an active learning process until a stopping criterion has been met. This allows for ensuring that all the documents in the collection have been classified by the classification process (Cormack: Paragraph [0152-0153]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHSIF A. SHEIKH whose telephone number is (571)272-2607.  The examiner can normally be reached on Mon-Fri 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/A.A.S./Examiner, Art Unit 2123
/MICHAEL J HUNTLEY/Primary Examiner, Art Unit 2116