DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities:
In ¶[0003], “manger” should be “manager” (two occurrences).
In ¶[0014], “details description” should be “detailed description”.
In ¶[0046], “a positive response conclude” should be “a positive response concludes”.
In ¶[0049], it appears that “terms by L” should be “terms be L”.
In ¶[0049], it appears that “a dictionary of term L” should be “a dictionary of terms L”.
In ¶[0051], “usecase” does not appear to be a recognized word and should be “use case”.
In ¶[0051], the clause beginning “For example” is not a complete grammatical sentence; this could be corrected by changing “TODAY for” to “TODAY is for”.
In ¶[0052], “usecase” should likewise be “use case”.
In ¶[0053], “usecase” should likewise be “use case”.
In ¶[0059], “to iteratively manager” should be “to iteratively manage”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 3, 5, 7 to 9, 11, 13 to 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (U.S. Patent Publication 2008/0312911) in view of Clarkson et al. (U.S. Patent Publication 2016/0092435).
Concerning independent claims 1, 7, and 13, Zhang discloses a method, system, and computer program for dictionary determination, comprising:
“applying natural language processing (NPL) to a textual corpus, including applying a dictionary of seed terms to the corpus and identifying one or more matching items in the corpus” – a dictionary has words identified based on candidate words associated with characters found in documents (¶[0010]); a data store stores a document corpus; a processing device identifies candidate words by finding characters in documents of the document corpus (¶[0012]); composition input data store 126 includes an association of composition inputs and entries 128 stored in dictionary 124; input method editor engine 122 can use information in dictionary 124 and composition input data store 126 to identify one or more entries 128 in dictionary with one or more composition inputs in composition input data store 126 (¶[0052]: Figure 2);

“examining the constructed context pattern, including applying each constructed context pattern to the dictionary and identifying matching content between the seed term and content of the constructed context pattern and quantifying the identified matching content” – adding one or more of the candidate words to the input method editor dictionary includes adding a candidate word to the input method editor dictionary when a first count is larger than a second count and the first count is larger than a threshold value (¶[0017]); a candidate word is added to the input method editor dictionary when the first count is larger than the second count (¶[0021]); a first count represents a number of times that the word is the only word in search queries and a second count represents a number of times that the word and one or more other words are included in each of the search queries (¶[0024]); each candidate entry 422 is associated with a count that represents a number of occurrences of the candidate entry 
“identifying lexicon terms from the dictionary that have anomalous behavior reflected in the quantification” – after processing all the documents 420 to identify all of the candidate entries 422, engine 406 filters candidate entries 422 to remove candidate entries with counts less than a threshold value; a threshold value can be set to remove candidate entries 422 that contain errors, have words or phrases that are rarely used, or occur infrequently (¶[0065]: Figure 4); broadly, a word that contains an error, is rarely used, or occurs infrequently is a word that has “anomalous behavior”; Applicants’ Specification, ¶[0034], only defines anomalous behavior as reflected in a score; counts for candidate entries that are less than a threshold, then, represent “anomalous behavior reflected in the quantification”; that is, a count represents “the quantification”;
“selectively removing one or more seed words from the dictionary identified having anomalous behavior” – engine 408 removes from dictionary 412 candidate entries in which a query count is less than a threshold value; removing candidate entries 416 having a low query count can remove candidate entries 416 that contain errors or are rarely used (¶[0069]: Figure 4).
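For illustration only, the count-and-threshold filtering relied on in Zhang (¶[0065]; ¶[0069]) can be sketched as follows. This is a minimal sketch under stated assumptions, not Zhang's implementation: the whitespace tokenization, the single-token entries, and the function names count_entries and prune_low_count_entries are hypothetical. An entry whose count falls below the threshold is treated as exhibiting the “anomalous behavior reflected in the quantification” mapped above.

```python
from collections import Counter


def count_entries(documents, entries):
    """Count occurrences of each dictionary entry in a document corpus.

    Hypothetical sketch: documents are strings split on whitespace and
    every entry is a single token.
    """
    # Start every entry at zero so entries never seen still get a count.
    counts = Counter({entry: 0 for entry in entries})
    for doc in documents:
        for token in doc.split():
            if token in counts:
                counts[token] += 1
    return counts


def prune_low_count_entries(entries, counts, threshold):
    """Split entries into (kept, removed) using a minimum count threshold.

    Entries counted fewer than `threshold` times are removed, standing in
    for entries that contain errors or are rarely used.
    """
    kept = [e for e in entries if counts[e] >= threshold]
    removed = [e for e in entries if counts[e] < threshold]
    return kept, removed
```

For example, with documents "apple banana apple", "apple cherry", and "banana", and a threshold of 2, a misspelled entry such as "aple" (count 0) is removed together with the rarely used "cherry" (count 1).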
Zhang discloses the main features of comparing terms of a dictionary to a document corpus to identify matching items while taking into account context patterns in the corpus, quantifying matching content using these context patterns, and then removing words that represent anomalous behavior as reflected by this quantification.  Arguably, Zhang may be construed to disclose all of the limitations of these independent claims, but does not expressly disclose ‘natural language processing’, ‘seed terms’, and ‘linguistic properties’ of context patterns.  However, Clarkson et al. teaches whatever might be omitted by Zhang.  Generally, Clarkson et al. teaches building dictionaries of terms related to a seed set of terms, where each of a plurality of patterns is scored based on a plurality of candidate terms and a plurality of seed terms.  (Abstract)  Clarkson et al. states that developing a set of terms and phrases that represent a dictionary is central to natural language processing.  (¶[0007])  Concepts in text are identified based on patterns that approximately surround them in the text, and a set of patterns is iteratively expanded starting from a seed collection of terms.  (¶[0011])  Starting with a set of seed terms for a dictionary, one or more corpora of text are analyzed to locate instances of dictionary terms.  (¶[0013] - ¶[0014])  Potential patterns are created for each occurrence of a dictionary term in a corpus, where these potential patterns may comprise a predetermined number of additional terms to the left and to the right of the occurrence of the search term.  Each of the potential patterns is applied to the corpus to determine additional terms that may fall within the pattern.  (¶[0015] - ¶[0016])  The cardinality of the set of terms is determined as the support, and the fraction of those terms that occur in the seed set is referred to as the confidence.
Those potential patterns with support above a lower Clarkson et al., then, teaches whatever limitations might be omitted by Zhang as directed to “applying a dictionary of seed terms to the corpus” and “natural language processing”.  Clarkson et al. provides a similar “quantification” of “identified matching content” as in Zhang by determining candidates matching seed terms in a dictionary using a cardinality for support of the term – corresponding to a ‘count’ – and a fraction representing a confidence.  Moreover, Clarkson et al. broadly characterizes “linguistic properties” including “constructing a context pattern” by using patterns of a predetermined number of terms to the left and to the right of a word.  The Specification does not actually define what makes this ‘linguistic’, but appears to intend that “linguistic properties” are represented by a context of the words.  An objective is to develop and maintain a dictionary in a manner that does not rely exclusively on expert input and to significantly accelerate development of new dictionaries.  (¶[0010] - ¶[0012])  It would have been obvious to one having ordinary skill in the art to identify matching items in a corpus to seed terms in a dictionary as taught by Clarkson et al. to remove words that are anomalous because they contain errors or are rarely used in Zhang for a purpose of significantly accelerating development of new dictionaries that does not solely rely upon expert input.
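For illustration only, the pattern construction and scoring described in Clarkson et al. (¶[0015] - ¶[0018]) can be sketched as below: a context pattern is the k terms to the left and to the right of a seed-term occurrence, support is the number of distinct terms the pattern captures when applied to the corpus, and confidence is the fraction of those terms that are seed terms. This is a sketch under assumptions, not Clarkson et al.'s implementation; the window size k, the min_support and min_confidence values, the use of inclusive (>=) comparisons, and all function names are hypothetical, because the exact thresholds are not reproduced above.

```python
def build_patterns(corpus_tokens, seeds, k=1):
    """Create (left, right) context patterns around each seed-term occurrence."""
    patterns = set()
    for sent in corpus_tokens:
        for i, tok in enumerate(sent):
            # Only build a pattern when a full k-term window fits on both sides.
            if tok in seeds and i - k >= 0 and i + k < len(sent):
                left = tuple(sent[i - k:i])
                right = tuple(sent[i + 1:i + 1 + k])
                patterns.add((left, right))
    return patterns


def apply_pattern(corpus_tokens, pattern):
    """Return the set of distinct terms that fill the slot of `pattern`."""
    left, right = pattern
    n_left, n_right = len(left), len(right)
    terms = set()
    for sent in corpus_tokens:
        for i in range(n_left, len(sent) - n_right):
            if (tuple(sent[i - n_left:i]) == left
                    and tuple(sent[i + 1:i + 1 + n_right]) == right):
                terms.add(sent[i])
    return terms


def score_pattern(corpus_tokens, pattern, seeds):
    """Support = number of captured terms; confidence = share that are seeds."""
    terms = apply_pattern(corpus_tokens, pattern)
    support = len(terms)
    confidence = len(terms & seeds) / support if support else 0.0
    return support, confidence


def retain_patterns(corpus_tokens, patterns, seeds, min_support, min_confidence):
    """Keep patterns meeting hypothetical minimum support and confidence."""
    kept = {}
    for p in patterns:
        support, confidence = score_pattern(corpus_tokens, p, seeds)
        if support >= min_support and confidence >= min_confidence:
            kept[p] = (support, confidence)
    return kept


def candidate_terms(corpus_tokens, kept_patterns, seeds):
    """New candidate dictionary terms: captured by a kept pattern, not yet seeds."""
    found = set()
    for p in kept_patterns:
        found |= apply_pattern(corpus_tokens, p)
    return found - seeds
```

In this sketch, a pattern such as ("like", _, "a") that captures "apples", "pears", and "cars" with seeds {"apples", "pears"} has a support of 3 and a confidence of 2/3, and "cars" becomes a candidate term if the pattern is retained.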

Concerning claims 2 to 3, 8 to 9, and 14 to 15, Zhang discloses determining a first count representing a number of times that a candidate word is the only word, and a Zhang’s counts generally can be construed to be ‘scores’.  Similarly, Clarkson et al. teaches that each of the plurality of patterns is scored based on the plurality of candidate terms and the plurality of seed terms (“calculating a score for each identified matching content, the score characterizing matching items between the constructed context pattern and the dictionary seed terms”) (Abstract; ¶[0002]); a cardinality of a pattern for a term is determined as support, a fraction of those terms that occur in the seed set as a confidence, and these values of support and confidence are compared to a limit, i.e., a threshold.  (¶[0017] - ¶[0018]); Clarkson et al., then, teaches “calculating the score includes reviewing a set of terms produced in the constructed pattern and quantifying a match of each term in the set of terms with seed words in the dictionary.”
Concerning claims 5, 11, and 17, Clarkson et al. teaches an embodiment where only some occurrences or instances of dictionary terms are located in a corpus of text; a predetermined number of occurrences are located by sampling only a random subset of each corpus.  (¶[0014])  Generally, random sampling of a corpus appears to be a common technique for training when a size of a corpus can be arbitrarily large.
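For illustration only, the embodiment cited for claims 5, 11, and 17, in which occurrences of dictionary terms are located in only a random subset of a corpus (Clarkson et al., ¶[0014]), can be sketched as follows. The sampling fraction, the fixed random seed, the whitespace tokenization, and the function name are hypothetical; the fixed seed is used only so the sample is reproducible.

```python
import random


def locate_in_random_subset(corpus, seeds, fraction, rng_seed=0):
    """Locate seed-term occurrences in a random subset of the documents only.

    Hypothetical sketch: documents are strings tokenized on whitespace.
    Returns a list of (document, token position, term) tuples.
    """
    rng = random.Random(rng_seed)
    # Number of documents to sample, clamped to the size of the corpus.
    k = min(len(corpus), max(1, int(len(corpus) * fraction)))
    subset = rng.sample(corpus, k)
    occurrences = []
    for doc in subset:
        for pos, token in enumerate(doc.split()):
            if token in seeds:
                occurrences.append((doc, pos, token))
    return occurrences
```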

Claims 4, 10, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (U.S. Patent Publication 2008/0312911) in view of Clarkson et al. (U.S. Patent Publication 2016/0092435) as applied to claims 1 to 2, 7 to 8, and 13 to 14 above, and further in view of Erpenbach et al. (U.S. Patent Publication 2018/0113867).
Zhang discloses that a dictionary 124 can include scores or probability values each associated with one or more of the dictionary entries 128 to indicate how often entry 128 is used in general.  (¶[0051]: Figure 2)  Here, a dictionary entry is equivalent to a seed term of a dictionary as taught by Clarkson et al.  If a score is associated with each seed term in a dictionary of Zhang, then this score can be construed as ‘metadata’ that is ‘attached’ to a seed word in a dictionary.  However, Zhang does not expressly disclose that this ‘metadata’ represents a score as “a degree having a characteristic selected from the group consisting of: ambiguity and spuriousness of the term with respect to the corpus.”  Zhang is mainly directed to a score indicating how often an entry is used.  Still, ‘spuriousness’ may be understood to reflect words that are not commonly used.  The Specification, ¶[0021] and ¶[0023], does not actually define ‘spuriousness’, but implies that this reflects isolated patterns and noise.  Nonetheless, Erpenbach et al. teaches natural language processing review and override based on confidence analysis, where override processing component 240 may tag information objects with applicable metadata.  Components analyze confidence scores associated with processing component 220, and ambiguity analysis component 242 is construed to identify ambiguous terms and tag the corresponding information objects with metadata indicating ambiguous terms.  (¶[0048] - ¶[0049]: Figure 2)  An objective is to append metadata to information objects that specify what items are most important to Erpenbach et al. for entry scores or entry probability values that are associated with dictionary entries of Zhang for a purpose of improving a user experience by reducing a number of items that are most important to correct or confirm.

Claims 6, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (U.S. Patent Publication 2008/0312911) in view of Clarkson et al. (U.S. Patent Publication 2016/0092435) as applied to claims 1, 7, and 13 above, and further in view of Goud et al. (U.S. Patent Publication 2014/0288924).
Zhang discloses removal from a dictionary of one or more words that contain errors or are rarely used.  (¶[0065]; ¶[0069]: Figure 4)  However, Zhang does not clearly disclose that removing words from a dictionary “eliminates distracting items from the dictionary.”  The Specification uses “distracting” only in ¶[0034], and does not actually explain what constitutes a distracting word or why a word might be considered distracting.  Conceivably, a word can be ‘distracting’ simply because there is an error in the word as disclosed by Zhang.  Still, Goud et al. teaches automated personalized dictionary generation, where result quality is a strong function of a dictionary ordering strategy so that considerable effort is required to tune performance so that a user experience is satisfactory.  Specifically, poor candidates are a distraction rather than a benefit for a user, and well populated dictionaries are a virtual Goud et al., then, expressly teaches that a dictionary is tuned to remove distracting words.  Extracted words are compared against words preexisting in dictionary set 115.  Misspelling is resolved through comparison to frequently misspelled list 229; dictionary error distance may be calculated for words, and words having low error distances may be used to estimate which candidates are most likely to have been intended.  A simple query may be presented that enables removal of erroneously stored words.  (¶[0100] - ¶[0101]: Figure 11)  Goud et al., then, implies that words are distracting if they are erroneous as disclosed by Zhang.  An objective is to provide semi-automated dictionary population that is fast, efficient, and requires fewer storage resources and fewer distracting inputs from the user.  (¶[0046])  It would have been obvious to one having ordinary skill in the art to eliminate distracting items from a dictionary due to misspellings as taught by Goud et al. 
of erroneous words in Zhang for a purpose of providing semi-automated dictionary population that is fast, efficient, and requires fewer storage resources.
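For illustration only, the misspelling resolution relied on in Goud et al. (¶[0100] - ¶[0101]), in which a word is first checked against a list of frequently misspelled words and otherwise compared to dictionary words by an error distance so that low-distance candidates indicate the likely intended word, can be sketched as below. Levenshtein edit distance is assumed as the ‘error distance’, since the passage cited above does not specify a measure; the max_distance cutoff, the misspelled_map dictionary standing in for list 229, and all function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def likely_intended(word, dictionary, max_distance=2):
    """Dictionary words within max_distance of `word`, closest first."""
    scored = sorted((levenshtein(word, d), d) for d in dictionary)
    return [d for dist, d in scored if dist <= max_distance]


def resolve(word, dictionary, misspelled_map, max_distance=2):
    """Check a known-misspelling map first, then fall back to edit distance."""
    if word in misspelled_map:
        return [misspelled_map[word]]
    return likely_intended(word, dictionary, max_distance)
```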

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Klavans et al., Chiticariu et al., Sukhomlinov, and Feller et al. disclose related prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARTIN LERNER/Primary Examiner
Art Unit 2657
June 9, 2021