DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings are objected to because it appears that “contect” should be “context” in Step 11 of Figure 1.  
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office Action to avoid abandonment of the application.  Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended.  The figure or figure number of an amended drawing should not be labeled as “amended.”  If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency.  Additional replacement sheets may be necessary to show the renumbering of the remaining figures.  Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d).  If the changes are not accepted by the examiner, Applicant will be notified and informed of any required corrective action in the next Office Action.  The objection to the drawings will not be held in abeyance.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: Natural Language Processing with N-Gram Analysis of Narrative Information.
The disclosure is objected to because of the following informalities:
In ¶[0015], “particular file” should be “a particular file”.
In ¶[0016], “a user can also find trending themes that aren’t failure parts this can mitigate” appears to be incorrect, but could be “a user can also find trending themes.  This can mitigate”.  The phrase “that aren’t failure parts” appears to be extraneous, but there should be a sentence break before “this can mitigate”.
In ¶[0018], “results of the NLP process.” should be “and results of the NLP process.”
In ¶[0019], “If a user discovers that includes a word/phrase that is not related the intended purpose” should apparently be “If a user discovers a word/phrase that is not related to the intended purpose”.
Appropriate correction is required.

Claim Objections
Claims 2 and 3 are objected to because of the following informalities:  
Independent claims 2 and 3 set forth a limitation of “to clarify and filter”, which is somewhat indefinite in scope, but can be read broadly.  The only support for ‘clarify’ and ‘filter’ is at ¶[0015] of the Specification, but one skilled in the art would not know what is entailed by this limitation.  The Specification refers to Step 11, where the system sorts any n-grams found, collapses the result for repeated n-grams, and optionally searches for clarifying nouns and adjectives near the n-grams.  However, Applicant’s manner of drafting this limitation does not actually require all of these steps, but could be broadly construed as simply sorting n-grams and retaining n-grams that meet a threshold of occurrence.
Independent claim 3 combines the limitations of independent claim 1 and independent claim 2, but there appear to be duplicative limitations due to this combination.  Here, a second section is stated to “extract narrative information from the data files”, but then there is a subsequent limitation that repeats this as “extracting the narrative information from the data files”.  Similarly, a third section is stated to “execute a plurality of NLP algorithms on the narrative information to generate NLP results”, but then there is a subsequent limitation that repeats this as “applying a plurality of natural language processing (NLP) algorithms to the narrative information to generate NLP results”.  Additionally, a fifth section is stated to “output the NLP results to an output file”, but then there is a subsequent limitation that repeats this as “creating an output data file comprising the NLP results.”  
Independent claim 3 includes a variety of problems of antecedent basis due to the repetition of limitations.  The claim language sets forth “a plurality of NLP algorithms” in a third section, but then subsequently sets forth applying “a plurality of natural language processing algorithms”.  Similarly, the claim language sets forth “an output file” in a fifth section, but then subsequently sets forth creating “an output data file”.  Generally, any subsequent occurrence of a same limitation should be accompanied by a definite article of “the” or “said”, and only an initial occurrence of a limitation should have an indefinite article of “a” or “an”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 3 are rejected under 35 U.S.C. 103 as being unpatentable over Tiwari (U.S. Patent Publication 2019/0102374) in view of Shaner (U.S. Patent No. 5,991,714).
Concerning independent claim 1, Tiwari discloses a method of predicting trending topics, comprising:
“identifying at least one data file comprising narrative information, wherein the narrative information comprising a plurality of data entries” – text extractor 344 can receive, e.g., through interface 342, a set of posts or other content items (“at least one data file”) (¶[0032]: Figure 3); process 400 begins by obtaining a set of content items, e.g., posts; these can be all the posts from a social media website from a particular time period, e.g., the past six months; process 400 can set a first post of the obtained posts to be a current post to be operated on (¶[0040]: Figure 4: Steps 404 to 406); broadly, social media posts are “data entries” that include “narrative information”, where each social media post is “one data file”;
“extracting the narrative information from the data files” – text extractor 344 can extract text from a set of posts (¶[0032]: Figure 3); process 400 can extract text from a current post; extracting text can include using text included as part of the post (¶[0042]: Figure 4: Step 410);
“applying a plurality of natural language processing (NLP) algorithms to the narrative information to generate NLP results, wherein the NLP algorithms comprise finding word level n-grams for a plurality of word level n-gram lengths” – N-gram generator 348 can normalize the extracted text for each received post, tokenize the normalized text, and organize the tokenized text into n-grams of a particular length; normalizing the extracted text can include replacing with whitespace and removing special characters including punctuation; n-gram generator 348 can remove from the cumulative set, or not add to the cumulative set, n-grams that contain certain specified stop words (¶[0034] - ¶[0035]: Figure 3); process 400 can tokenize the text that was extracted from a current post, and organize the tokenized text into n-grams of a specified length, e.g., one, two, or three words (“for a plurality of word level n-gram lengths”); process 400 can remove from the cumulative set of n-grams those n-grams that include one or more stop words (¶[0044] - ¶[0047]: Figure 4: Steps 414 to 422); here, normalizing, tokenizing, and removing stop words are “a plurality of NLP algorithms”; Compare Specification, ¶[0016], which describes natural language processes as including tokenizing, removing punctuation, and removing stop words;
“counting repeating n-gram instances within each data entry; counting a number of repeating n-gram instances across the plurality of data entries” – frequency computer 350 can receive the cumulative set of n-grams, and compute a frequency score for each unique n-gram; a frequency score for a ‘unique’ n-gram is an occurrence value for all n-grams within that set that have the same sequence of tokens; an occurrence value is a total count, e.g., ‘here we go’ has an occurrence value of three, and ‘we’re on our way’ has an occurrence value of one; frequency computer 350 can count the number of times that a unique n-gram occurs total, and can provide the counts as a frequency score (¶[0036]: Figure 3); process 400 can determine a frequency value for each unique n-gram; the frequency value can be a total count of the occurrence of the n-gram (¶[0048]: Figure 4: Step 426); implicitly, counting a total number of times an n-gram occurs includes counting the number of times that n-gram occurs in a given post (“counting . . . within each data entry”) and summing the number of times that n-gram occurs over all of the given posts (“counting . . . across the plurality of data entries”);
“sorting the NLP results for each n-gram based on the number of repeated n-gram instances across the plurality of data entries” – n-grams can be sorted by their frequency score; n-grams with a frequency above a threshold, e.g., ‘high frequency n-grams’, can be passed to prediction engine 352 (¶[0036]: Figure 3); process 400 can select n-grams whose frequency value is above a threshold; n-grams can be selected whose frequency value is within the top 5% (¶[0049]: Figure 4: Step 428); process 400 can sort the n-grams, and select the top five n-grams in each category (¶[0051]: Figure 4: Step 432);
“creating an output data file comprising the NLP results” – top scoring n-grams with a prediction value above a threshold can be determined as likely to be trending in the future; identifications of these top-scoring n-grams can be provided, e.g., through interface 342 (“creating an output data file”); advertisers may want to know what topics are trending for their product (¶[0038]: Figure 3); n-grams selected can be surfaced to users in a variety of ways; a user may want to know what topics will be trending (¶[0052]: Figure 4: Step 436).
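For illustration only, the sequence of operations mapped above for claim 1 can be sketched in Python as follows.  This is a minimal sketch under stated assumptions and is not code from Tiwari or from Applicant’s Specification: the stop-word list, the function names (tokenize, ngrams, count_ngrams), and the n-gram lengths of one to three words are all hypothetical choices made here to show normalizing, tokenizing, stop-word removal, per-entry counting, cross-entry counting, and sorting.

```python
import re
from collections import Counter

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "the", "of", "to", "is"}


def tokenize(text):
    """Lowercase, replace punctuation with whitespace, split on whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def ngrams(tokens, lengths=(1, 2, 3)):
    """Yield word-level n-grams for each length, skipping any with a stop word."""
    for n in lengths:
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if not STOP_WORDS.intersection(gram):
                yield gram


def count_ngrams(entries, lengths=(1, 2, 3)):
    """Count n-grams within each entry, sum across entries, and sort by count."""
    per_entry = [Counter(ngrams(tokenize(e), lengths)) for e in entries]
    total = Counter()
    for counts in per_entry:
        total.update(counts)
    # Highest count first; ties broken by the n-gram itself for a stable order.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return per_entry, total, ranked
```

Under these assumptions, the total count for an n-gram is simply the sum of its per-entry counts, which is the sense in which a total count implicitly includes counting within each data entry and across the plurality of data entries.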

Concerning independent claim 2, Tiwari discloses a system for predicting trending topics, comprising:
“a non-transitory computer readable storage medium operable for storing a plurality of machine readable computer instructions operable to control one or more elements of a NLP processing system comprising:” – memory 150 includes program memory 160 that stores programs and software of topics trending system 164 (¶[0018]: Figure 1); implementations include computer-readable storage media (“a non-transitory computer readable storage medium”) that can store instructions that implement at least portions of the described technology (“for storing a plurality of machine readable computer instructions”) (¶[0062]);
“a first section of machine readable computer instructions adapted to initiate the system” – broadly, ‘initiating a system’ is implicit, as this can be construed as merely turning on the system, or selecting the program to execute from an interface of a computer; alternatively, this limitation can be construed as setting a first post of obtained posts to be a current post to be operated on by a loop between blocks 408 to 420 (¶[0041]: Figure 4: Step 406);
“a second section of machine readable computer instructions adapted to load data files into the system and extract narrative information from the data files” – text extractor 344 can receive, e.g., through interface 342, a set of posts or other content items (“load data files into the system”) (¶[0032]: Figure 3); process 400 begins by obtaining a set of content items, e.g., posts; these can be all the posts from a social media website from a particular time period, e.g., the past six months; process 400 can set a first post of the obtained posts to be a current post to be operated on (¶[0040]: Figure 4: Steps 404 to 406); broadly, social media posts are “data files” that include “narrative information” as “data entries”;
“a third section of machine readable computer instructions adapted to execute a plurality of NLP algorithms on the narrative information to generate NLP results” – N-gram generator 348 can normalize the extracted text for each received post, tokenize the normalized text, and organize the tokenized text into n-grams of a particular length; normalizing the extracted text can include replacing with whitespace and removing special characters including punctuation; n-gram generator 348 can remove from the cumulative set, or not add to the cumulative set, n-grams that contain certain specified stop words (¶[0034] - ¶[0035]: Figure 3); process 400 can tokenize the text that was extracted from a current post, and organize the tokenized text into n-grams of a specified length, e.g., one, two, or three words; process 400 can remove from the cumulative set of n-grams those n-grams that include one or more stop words (¶[0044] - ¶[0047]: Figure 4: Steps 414 to 422); here, normalizing, tokenizing, and removing stop words are “a plurality of NLP algorithms”; Compare Specification, ¶[0016], which describes natural language processes as including tokenizing, removing punctuation, and removing stop words;
“a fourth section of machine readable computer instructions adapted to clarify and filter the NLP results” – frequency computer 350 can receive the cumulative set of n-grams, and compute a frequency score for each unique n-gram; a frequency score for a ‘unique’ n-gram is an occurrence value for all n-grams within that set that have the same sequence of tokens; an occurrence value is a total count, e.g., ‘here we go’ has an occurrence value of three, and ‘we’re on our way’ has an occurrence value of one; frequency computer 350 can count the total number of times that a unique n-gram occurs, and can provide the counts as a frequency score (¶[0036]: Figure 3); n-grams can be sorted by their frequency score; n-grams with a frequency above a threshold, e.g., ‘high frequency n-grams’, can be passed to prediction engine 352 (¶[0036]: Figure 3); process 400 can select n-grams whose frequency value is above a threshold; n-grams can be selected whose frequency value is within the top 5% (¶[0049]: Figure 4: Step 428); process 400 can sort the n-grams, and select the top five n-grams in each category (¶[0051]: Figure 4: Step 432); here, including only top-scoring n-grams is equivalent to “clarify and filter the NLP results”; that is, n-grams that are top-scoring according to their count occurrences are ‘filtered’ according to these thresholds, and these top-scoring n-grams ‘clarify’ the most important of the n-grams; Compare Specification, ¶[0015], which describes ‘clarify and filter’ as including sorting n-grams;
“a fifth section of machine readable computer instructions adapted to output the NLP results to an output file” – top scoring n-grams with a prediction value above a threshold can be determined as likely to be trending in the future; identifications of these top-scoring n-grams can be provided, e.g., through interface 342 (“output the NLP results to an output file”); advertisers may want to know what topics are trending for their product (¶[0038]: Figure 3); n-grams selected can be surfaced to users in a variety of ways; a user may want to know what topics will be trending (¶[0052]: Figure 4: Step 436).
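For illustration only, the broad construction of “clarify and filter” applied above, i.e., sorting n-grams by count and retaining only those that meet a threshold, can be sketched in Python as follows.  This is a minimal sketch under stated assumptions and is not code from Tiwari or from Applicant’s Specification: the function name filter_ngrams and its parameters (min_count, top_fraction, top_k) are hypothetical, and are chosen only to mirror the three selections described in the reference, namely a frequency threshold, a top percentage, and a top number of n-grams.

```python
from collections import Counter


def filter_ngrams(counts, min_count=None, top_fraction=None, top_k=None):
    """Sort n-grams by count, then keep only those passing each given cutoff."""
    # Highest count first; ties broken by the n-gram itself for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if min_count is not None:
        # Keep n-grams whose count is strictly above the threshold.
        ranked = [(gram, c) for gram, c in ranked if c > min_count]
    if top_fraction is not None:
        # Keep the top fraction of the remaining n-grams (at least one).
        ranked = ranked[:max(1, int(len(ranked) * top_fraction))]
    if top_k is not None:
        # Keep at most the top k remaining n-grams.
        ranked = ranked[:top_k]
    return ranked


counts = Counter({"here we go": 3, "we go": 2, "on our way": 1})
above_threshold = filter_ngrams(counts, min_count=1)
```

In this sketch, each cutoff is optional and the cutoffs are applied in sequence, so any one of them alone is enough to perform the sorting and retaining that the broad construction requires.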

Concerning independent claim 3, Tiwari discloses a method that incorporates the limitations of independent claim 1 and independent claim 2.

Concerning independent claims 1, 2, and 3, Tiwari arguably discloses all of the limitations to anticipate these independent claims.  Conceivably, Tiwari does not clearly disclose the limitation of finding word-level n-grams “for a plurality of word level n-gram lengths”.  Still, Tiwari discloses that n-grams can be a specified length, e.g., one, two, or three words.  (¶[0045]: Figure 4: Step 410)  Even if a variable length of n-grams is not disclosed by Tiwari, this is taught by Shaner.
Concerning independent claims 1, 2, and 3, Shaner teaches characterizing data types of interest that includes a step of gathering at least one file for each data type, and counting the number of times each unique n-gram occurs in each file, where n is a range of integer values.  (Column 4, Lines 20 to 31)  A characterization process counts the number of times each unique n-gram occurs within the files for each data type of interest.  The value of n is a range of integers, e.g., from 1 to 8, that includes unique 1-grams, 2-grams, 3-grams, 4-grams, 5-grams, 6-grams, 7-grams, and 8-grams.  (Column 6, Lines 20 to 27)  Shaner, then, clearly teaches “finding word level n-grams for a plurality of word level n-gram lengths”.  An objective is to enable the data itself to indicate how it should best be characterized, as a fixed value for n is often optimal for one type of data but is not optimal for another.  (Column 6, Lines 27 to 36)  It would have been obvious to one having ordinary skill in the art to find word level n-grams for a plurality of word level n-gram lengths as taught by Shaner to predict future trending topics of Tiwari for a purpose of enabling the data itself to optimally indicate how it should best be characterized.
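For illustration only, counting unique n-grams over a range of integer values of n for the files of each data type can be sketched in Python as follows.  This is a minimal sketch under stated assumptions and is not code from Shaner: the function name characterize, the dictionary layout, and the use of whitespace-separated words as the n-gram unit are hypothetical choices made here, with the range of 1 to 8 taken from the example range noted above.

```python
from collections import Counter


def characterize(files_by_type, n_values=range(1, 9)):
    """Map each data type to {n: Counter of n-grams} across that type's files."""
    profile = {}
    for data_type, texts in files_by_type.items():
        by_n = {n: Counter() for n in n_values}
        for text in texts:
            tokens = text.lower().split()
            for n in n_values:
                # A text shorter than n tokens contributes no n-grams.
                by_n[n].update(
                    tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
                )
        profile[data_type] = by_n
    return profile


profile = characterize({"report": ["pump seal leak", "pump seal worn"]})
```

Because counts are kept separately for every value of n, no single fixed n-gram length has to be selected in advance, which is consistent with the stated objective of letting the data indicate how it is best characterized.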

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
Brants et al., Danielson et al., Eck, Govindarajan et al., Mishra et al., and Ng et al. disclose related prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608. The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARTIN LERNER/Primary Examiner
Art Unit 2657
October 31, 2022