EXAMINER'S STATEMENT OF REASONS FOR ALLOWANCE
The following is an examiner’s statement of reasons for allowance:
Independent claims 1, 9, and 15 are allowable because the prior art of record does not disclose or reasonably suggest a system, method, and computer program product for short text identification comprising determining a co-occurrence matrix for training words stored in a corpus, determining a word vector embedding for each of the training words in the corpus to relate each of the training words in the corpus to other ones of the training words in the corpus in an n-dimensional vector space, determining word tokens for words in short text in documents in a data repository that is separate and distinct from the corpus, determining word vectors for the word tokens based on the word vector embedding and the co-occurrence matrix, determining sentence vectors for short text based on the word vectors in each short text, and determining a plurality of topics in the documents based on clustering of sentence vectors, wherein the plurality of topics indicates topics that are predominant in the documents in the data repository, and wherein the determining the plurality of topics comprises lemmatizing the documents, removing stop words, removing at least some punctuation, removing sentences below a word length threshold, and pruning word vectors based on cosine proximity in the word vector embedding.
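For illustration only, and not as a characterization of the claims or of Applicants' implementation, the recited sequence of a co-occurrence matrix, word vectors, sentence vectors, and clustering might be sketched as follows.  This sketch rests on several assumptions that are not taken from the record: a word vector is assumed to be the word's row of the co-occurrence matrix, a sentence vector is assumed to be the mean of its word vectors, and a greedy cosine-threshold grouping stands in for whatever clustering technique is actually used.  All function names and the `window` and `threshold` parameters are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def cooccurrence_matrix(sentences, window=2):
    """Count how often each pair of words appears within `window` tokens."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def word_vector(word, counts, vocab):
    """Assumed embedding: the word's row of the co-occurrence matrix."""
    row = counts.get(word, {})
    return [float(row.get(other, 0)) for other in vocab]


def sentence_vector(tokens, counts, vocab):
    """Assumed sentence vector: the mean of the word vectors of its tokens."""
    vectors = [word_vector(t, counts, vocab) for t in tokens]
    if not vectors:
        return [0.0] * len(vocab)
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy stand-in for clustering: join the first cluster whose seed
    vector is at least `threshold` similar, otherwise start a new cluster."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for idx, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Training corpus, kept separate from the short texts being analyzed.
corpus = [["cat", "sat", "mat"], ["dog", "sat", "rug"]]
counts = cooccurrence_matrix(corpus, window=1)
vocab = sorted(counts)

short_texts = [["cat", "mat"], ["dog", "rug"], ["sat"]]
topics = cluster([sentence_vector(t, counts, vocab) for t in short_texts])
```

In this sketch each entry of `topics` is a group of short-text indices; the larger groups would correspond to the topics that are predominant in the data repository.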
Generally, the prior art of record does not appear to disclose or reasonably suggest an entire combination directed to determining topics that includes all of the limitations of removing stop words, lemmatizing words in a document, removing punctuation, removing sentences below a certain word length threshold, and pruning, e.g., roots of verbs.  Similarly, it is known in the prior art to determine topics in documents based on clustering of sentence vectors.  However, the entire combination of determining topics using a co-occurrence matrix, removing sentences below a word length threshold, and pruning word vectors based on cosine proximity is unobvious to one having ordinary skill in the art.
The Specification, ¶[0023] and ¶[0029], describes these limitations of removing sentences below a word length threshold and pruning word vectors based on cosine proximity.  
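By way of a hedged illustration only, the two limitations noted above might be sketched as follows.  The cited paragraphs of the Specification are not reproduced here, so this sketch assumes one possible reading of "pruning word vectors based on cosine proximity", namely dropping a word vector that is nearly parallel to one already kept.  The function names and the `min_words` and `max_similarity` parameters are hypothetical and are not taken from the Specification.

```python
from math import sqrt


def filter_short_sentences(sentences, min_words=3):
    """Keep only sentences with at least `min_words` whitespace-separated words."""
    return [s for s in sentences if len(s.split()) >= min_words]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prune_by_cosine_proximity(word_vectors, max_similarity=0.95):
    """Assumed reading: drop a word whose vector is nearly parallel to a kept one."""
    kept = {}
    for word, vec in word_vectors.items():
        if all(cosine_similarity(vec, other) < max_similarity
               for other in kept.values()):
            kept[word] = vec
    return kept


sentences = ["ok", "the server is down", "no power"]
long_enough = filter_short_sentences(sentences, min_words=3)

vectors = {"run": [1.0, 0.0], "running": [0.99, 0.05], "table": [0.0, 1.0]}
pruned = prune_by_cosine_proximity(vectors, max_similarity=0.95)
```

Under these assumptions, `long_enough` retains only the four-word sentence, and `pruned` retains "run" and "table" while dropping "running" as a near-duplicate direction.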
The Specification, ¶[0001] and ¶[0008], states objectives of improving evaluation of short text over manual context analysis that is time-consuming and labor intensive, and over machine learning techniques that have difficulty determining the interpretation of short text due to lack of context in the document.
Any comments considered necessary by Applicants must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/MARTIN LERNER/Primary Examiner
Art Unit 2657
January 5, 2022