Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2021-07-28 has been entered. Claims 1-18 remain pending in the application. 
Response to Arguments
Applicant’s arguments in response to rejections under 35 U.S.C. 101 have been fully considered but they are not persuasive.
Applicant argues that “predicting a distribution probability of a target document in each of multiple topics based on a Latent Dirichlet Allocation (LDA) model" is “not a mathematical concept. On the contrary, for a LDA model, a document is input and a distribution probability is output.”  Examiner respectfully disagrees.  MPEP 2106.04(a)(2)(I) describes Mathematical Concepts, and under subsection C, “Mathematical Calculations”, recites: “A claim that recites a mathematical calculation, when the claim is given its broadest reasonable interpretation in light of the specification, will be considered as falling within the "mathematical concepts" grouping. A mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods to determine a variable or number, e.g., performing an arithmetic operation such as exponentiation. There is no particular word or set of words that indicates a claim recites a mathematical calculation. That is, a claim does not have to recite the word "calculating" in order to be considered a mathematical calculation. For example, a step of "determining" a variable or number using mathematical methods or "performing" a 
While the next limitation also recites a mathematical concept (“calculating correlation between word vectors of respective words in multiple words of the target document and topic vectors of respective topics in the multiple topics”), Applicant argues that the “wherein” clause “wherein the word vectors of the respective words and the topic vectors of the respective topics are all generated based on a word vector model” is not a mathematical concept.  Examiner points out that the “wherein” clauses in this limitation merely state the nature of the data that is being used in the mathematical calculation, and place no meaningful limits on performing the calculation.  The limitation is still directed to a mathematical calculation.
Applicant argues that the use of the LDA model and word vector model result in integration into a practical application, as they are necessary components and change input data to a different state or thing.  Examiner points out that this is not a criterion for determining integration into a practical application.   Examiner also points out that simply using LDA or the word vector model amounts to “mere linking the use of a judicial exception to a particular technological environment or field of use” as stated in MPEP 2106.05(h) “Field of Use and Technological Environment”.  In other words, the mathematical concepts recited in the 
Applicant argues that the new matter in the amended limitation is inventive and results in more accurate results.  Examiner points out that this alone is not enough to amount to significantly more than the judicial exception, as the new amendment merely states the nature of the inputs to a mathematical calculation.  Examiner again points out MPEP 2106.05(h) “Field of Use and Technological Environment”.
Applicant's argument in response to rejections under 35 U.S.C. 103 has been fully considered and is persuasive.  Applicant argues that neither Niu nor Starr teaches the subject matter added to the independent claims by amendment.  Examiner agrees.  However, the argument is moot as new grounds of rejection have been presented, as necessitated by the amendment.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis:
In the instant case, claims are directed to a method (1-5, 16), device (6-10, 17), and non-transitory computer readable medium (11-15, 18). Thus, claims 1-18 fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2 Analysis:
Based on the claims being determined to be within one of the four categories (Step 1), it must be determined if the claims are directed to a judicial exception (i.e., law of nature, natural phenomenon, and abstract idea).  In this case the claims fall within the judicial exception of an abstract idea, specifically, “Mathematical Concepts” (mathematical relationships, mathematical formulas or equations, and mathematical calculations).
Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claims 1, 6, and 11:
“predicting a distribution probability” (mathematical concept);
“calculating correlation” (mathematical concept);
“extracting…words…according to distribution probabilities” (mathematical concept);
Step 2A:  Prong 2 analysis:
The judicial exception is not integrated into a practical application because the additional elements in claims 1, 6, and 11 “computer”, “artificial intelligence”, “processors”, “memory”, and “non-transitory computer readable medium” correspond to mere instructions to implement an abstract idea or other exception on a computer.  Also, additional limitations “based on a Latent Dirichlet Allocation (LDA)” and “wherein the word vectors of the respective words and the topic vectors of the respective topics are all generated based on a word vector model, and wherein the respective topics each corresponds to a topic identifier, and the topic vectors of the respective topics are obtained according to word vectors of respective word materials already trained in a word material repository, topic identifiers corresponding to respective word 
Step 2B analysis:
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations of claims 1, 6, and 11 “computer”, “artificial intelligence”, “processors”, “memory”, and “non-transitory computer readable medium” correspond to mere instructions to implement an abstract idea or other exception on a computer.  Also, additional limitations “based on a Latent Dirichlet Allocation (LDA)” and “wherein the word vectors of the respective words and the topic vectors of the respective topics are all generated based on a word vector model, and wherein the respective topics each corresponds to a topic identifier, and the topic vectors of the respective topics are obtained according to word vectors of respective word materials already trained in a word material repository, topic identifiers corresponding to respective word materials and the word vector model after the training with respect to the word vectors” amount to “mere linking the use of a judicial exception to a particular technological environment or field of use” as stated in MPEP 2106.05(h) “Field of Use and Technological Environment”. The mathematical concepts 
	Dependent claim(s) 2-5, 7-10, and 12-18 when analyzed as a whole are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitation(s) fail(s) to establish that the claim(s) is/are not directed to an abstract idea, as they recite further embellishment of the judicial exception.  
	Claims 2, 7, and 12 recite that the extracting step of Claims 1, 6, and 11 comprises calculating generation probabilities (mathematical concept) and extracting based on the probabilities (mathematical concept).
	Claims 3, 8, and 13 recite the same limitations as Claims 1, 6, and 11.  Additional limitations “obtaining…word vectors” and “obtaining…topic vectors” do not integrate the abstract idea into a practical application because they amount to insignificant extra solution activity (necessary data gathering and outputting - see MPEP 2106.05(g)(3)), nor are they sufficient to amount to significantly more than the judicial exception, since they are well understood, routine, conventional activity (storing and retrieving information in memory - see MPEP 2106.05(d) II (iv)).
Claims 4, 9, and 14 recite the same limitations as Claims 3, 8, and 13, as well as “training a word vector model” (mathematical concept) and “generating word material repository…according to…multiple documents”, which the Instant Specification specifies as word segmentation, which can be performed by a human with pen and paper (mental process).  A mental process is still an abstract idea, and thus together the limitations recite an abstract mere data gathering and outputting – See MPEP 2106.05(g)(3)), nor is it sufficient to amount to significantly more than the judicial exception, since it is well understood, routine, conventional activity (storing and retrieving information in memory - see MPEP 2106.05(d) II (iv)).
	Claims 5, 10, and 15 recite the same limitations as Claims 3, 8, and 13, as well as “obtaining topic identifiers…corresponding to respective word materials” (mathematical concept) and “training topic vectors” (mathematical concept); additional limitation “storing topic vectors” does not integrate the abstract idea into a practical application because it is insignificant extra solution activity (mere data gathering and outputting – See MPEP 2106.05(g)(3)), nor is it sufficient to amount to significantly more than the judicial exception, since it is well understood, routine, conventional activity (storing and retrieving information in memory - see MPEP 2106.05(d) II (iv)).
Claims 16, 17, and 18 recite that the correlation of Claims 1, 6, and 11 comprises calculating the cosine distance (mathematical concept).
	Viewed as a whole, these additional claim element(s) do not provide meaningful limitation(s) to transform the abstract idea into a patent eligible application of the abstract idea such that the claim(s) amounts to significantly more than the abstract idea itself.  Therefore, the claim(s) are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-18 are rejected under 35 U.S.C. 103 as being unpatentable over Starr et al. (US 2017/0124174 A1; hereinafter “Starr”), in view of Niu et al. (“Topic2Vec: Learning Distributed Representations of Topics”; hereinafter “Niu”) and Liu et al. (“Topical Word Embeddings”; hereinafter “Liu”).
As per Claim 1, Starr teaches a method for extracting keywords based on artificial intelligence by a computer device (Starr, Para [0024], discloses:  “One or more embodiments disclosed herein provide a content management system that improves the organization of electronic text documents by intelligently and accurately categorizing electronic text documents by topic. For example, in one or more embodiments a content management system can categorize electronic text documents by user specified topics. Further, the content management system identifies novel and emerging topics within electronic text documents”.  Here, Starr discloses extracting keywords (“identifies novel and emerging topics”).  Starr, Para [0114], discloses:  “In one or more embodiments, the probabilistic language model is based on a probability matrix and can be built using Latent Dirichlet Allocation (LDA).”  Here, Starr discloses based on artificial intelligence (“using Latent Dirichlet Allocation (LDA)”).  Starr, Para [0151], discloses by a computer device:  “Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below”).
wherein the method comprises:
predicting a distribution probability of a target document in each of multiple topics based on a Latent Dirichlet Allocation (LDA) model (Starr, Para [0114] Lines 18-26, discloses:  “The topic-document matrix also includes a values for the Dirichlet distribution. The topic-document matrix is expressed as alda (z|d), where z is the LDA topic and d is an individual electronic text document. Within the context of LDA matrices, topics refer to latent topics discovered by the algorithm and are not the “topics” that humans assign to responses as otherwise disclosed herein. In the LDA sense, a topic can be thought of as a type of probabilistically derived cluster.”  LDA, by definition, comprises predicting a distribution probability of a target document in each of multiple topics.  Here, Starr confirms that by reciting:  “The topic-document matrix also includes a values for the Dirichlet distribution. The topic-document matrix is expressed as alda (z|d), where z is the LDA topic and d is an individual electronic text document”).
extracting, from the multiple words, words as keywords of the target document, according to distribution probabilities of the target document in each of the multiple topics [and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics] (Starr, Para [0118] Equation (4), discloses 
Plda (w | d) = Sumz { Plda (w | z) * Plda (z | d)  }
, which Starr describes in [0118] as “the probability of a word given an electronic text document and over all topics”, where Plda (z | d) represents distribution probabilities of the target document in each of the multiple topics.  Starr then goes on to use this information to extract keywords.  In Para [0121] Equation (6), Starr uses Plda (w|d) as part of an equation to produce another term P(w|d).  Starr then states in Para [0122], that “To code and/or assign a topic to an electronic text document, as described above, the content management system 106 can use the probability matrix P(w|d)”.  Starr’s use of the term “topic” here has a general meaning, and is not to be confused with the LDA-specific meaning of “topic”.  Thus, Starr’s “assign a topic to an electronic text document” is analogous to extracting, from the multiple words, words as keywords of the target document.) *Extracting based also on the correlation is taught by the combination with Niu below.
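For illustration only, the summation in Starr's Equation (4) can be sketched as follows.  The function name `word_given_document` and the toy probability tables are hypothetical and do not appear in Starr; the sketch merely shows how Plda (z | d) weights Plda (w | z) when summing over the LDA topics.

```python
def word_given_document(p_w_given_z, p_z_given_d):
    """Return P(w | d) = sum over z of P(w | z) * P(z | d) for every word w.

    p_w_given_z: dict mapping topic -> {word: probability} (hypothetical toy data)
    p_z_given_d: dict mapping topic -> probability for one document d
    """
    scores = {}
    for topic, p_topic in p_z_given_d.items():
        for word, p_word in p_w_given_z[topic].items():
            # Accumulate this topic's contribution to the word's probability.
            scores[word] = scores.get(word, 0.0) + p_word * p_topic
    return scores


# Toy values, made up for illustration; they are not taken from Starr.
P_W_GIVEN_Z = {
    "z1": {"model": 0.5, "vector": 0.5},
    "z2": {"model": 0.25, "patent": 0.75},
}
P_Z_GIVEN_D = {"z1": 0.5, "z2": 0.5}
P_W_GIVEN_D = word_given_document(P_W_GIVEN_Z, P_Z_GIVEN_D)
```

With these toy tables, "model" receives a contribution from both topics, while "vector" and "patent" each receive a contribution from only one.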
	However, Starr fails to explicitly teach calculating correlation between word vectors of respective words in multiple words of the target document and topic vectors of respective topics in the multiple topics, wherein the word vectors of the respective words and the topic vectors of the respective topics are all generated based on a word vector model, and wherein the respective topics each corresponds to a topic identifier, and the topic vectors of the respective topics are obtained according to word vectors of respective word materials already trained in a word material repository, topic identifiers corresponding to respective word materials and the word vector model after the training with respect to the word vectors; extracting, from the multiple words, words as keywords of the target document, according to the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics.
Niu teaches calculating correlation between word vectors of respective words in multiple words of the target document and topic vectors of respective topics in the multiple topics, wherein the word vectors of the respective words and the topic vectors of the respective topics are all generated based on a word vector model (Niu, Abstract Lines 18-19, discloses “learn topic representations in the same semantic vector space with words”.  Here, Niu discloses word vectors of the respective words and the topic vectors of the respective topics are all generated based on a word vector model.  Niu, Section 4.2, last bullet, discloses:  “Topic2Vec: topics and words are equally represented as the low-dimensional vectors, we can immediately calculate the cosine similarity between words and topics. For each topic, we select higher similarity words.”  Here, Niu discloses calculating correlation between word vectors of respective words in multiple words of the target document and topic vectors of respective topics.)
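For illustration only, the cosine comparison between word vectors and a topic vector that Niu describes in Section 4.2 can be sketched as follows.  The function names, the two-dimensional vectors, and the parameter `k` are hypothetical and are not taken from Niu; the sketch assumes only that words and topics are represented in the same vector space.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Convention chosen for this sketch: zero vectors match nothing.
    return dot / (norm_u * norm_v)


def top_words_for_topic(topic_vector, word_vectors, k):
    """Return the k words whose vectors are most similar to the topic vector."""
    ranked = sorted(
        word_vectors,
        key=lambda word: cosine_similarity(word_vectors[word], topic_vector),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors, made up for illustration.
WORD_VECTORS = {"ball": [1.0, 0.0], "vote": [0.0, 1.0], "team": [1.0, 1.0]}
SPORTS_TOPIC = [1.0, 0.0]
TOP_WORDS = top_words_for_topic(SPORTS_TOPIC, WORD_VECTORS, 2)
```

Here "ball" is parallel to the topic vector, "team" lies at 45 degrees to it, and "vote" is orthogonal to it, so the first two are selected.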
extracting, from the multiple words, words as keywords of the target document, according to [distribution probabilities of the target document in each of the multiple topics] and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics (Recall that as shown above, Starr discloses extracting, from the multiple words, words as keywords of the target document, according to distribution probabilities of the target document in each of the multiple topics starting with Para [0118] Equation (4): 
Plda (w | d) = Sumz { Plda (w | z) * Plda (z | d)  }
, which Starr describes in [0118] as “the probability of a word given an electronic text document and over all topics”, where Plda (z | d) represents distribution probabilities of the target document in each of the multiple topics.  Furthermore, Plda (w | z) represents the distribution probability of a word given a topic.  
Niu, Page 2 Top Left Paragraph, discloses:  “Furthermore, words and topics naturally can estimate similarity and relevance with each other such as using cosine function rather than using probability.”  Here, Niu discloses “using cosine function rather than using probability”, with “rather than” being the key operative phrase.  That is, Niu discloses replacing Plda (w | z) with cos<w | z>.  Thus, the combination of Starr and Niu results in Plda (w | d) = Sumz { cos<w | z> * Plda (z | d)  }, which corresponds to extracting according to distribution probabilities of the target document in each of the multiple topics and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics.  As shown above, Starr then goes on to use this information to extract keywords.  In Para [0121] Equation (6), Starr uses Plda (w|d) as part of an equation to produce another term P(w|d).  Starr then states in Para [0122], that “To code and/or assign a topic to an electronic text document, as described above, the content management system 106 can use the probability matrix P(w|d)”.  Thus, the combination of Starr and Niu teaches the limitation of extracting, from the multiple words, words as keywords of the target document, according to distribution probabilities of the target document in each of the multiple topics and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics.)
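For clarity, the substitution relied upon above can be written out in standard notation.  This is only a restatement of the two expressions already given in this action; the subscript and summation symbols are a typesetting choice and are not copied from Starr or Niu.

```latex
\begin{aligned}
P_{lda}(w \mid d) &= \sum_{z} P_{lda}(w \mid z)\, P_{lda}(z \mid d)
  && \text{(Starr, Equation (4))} \\
P_{lda}(w \mid d) &= \sum_{z} \cos\langle w \mid z \rangle\, P_{lda}(z \mid d)
  && \text{(Starr as modified by Niu)}
\end{aligned}
```

The only term that changes between the two lines is the word-given-topic factor; the document-topic distribution Plda (z | d) is carried over unchanged.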
Starr and Niu are analogous art because they are both in the same field of endeavor of machine learning.  
st full paragraph, 2nd sentence)
However, the combination of Starr and Niu thus far fails to teach wherein the respective topics each corresponds to a topic identifier, and the topic vectors of the respective topics are obtained according to word vectors of respective word materials already trained in a word material repository, topic identifiers corresponding to respective word materials and the word vector model after the training with respect to the word vectors.
Liu teaches wherein the respective topics each corresponds to a topic identifier, and the topic vectors of the respective topics are obtained according to word vectors of respective word materials already trained in a word material repository, topic identifiers corresponding to respective word materials and the word vector model after the training with respect to the word vectors. (Liu, Page 2418 Paragraph 3, discloses:  “We employ the widely used latent Dirichlet allocation (LDA) (Blei, Ng, and Jordan 2003) to obtain word topics, and perform collapsed Gibbs sampling (Griffiths and Steyvers 2004) to iteratively assign latent topics for each word token. In this way, given a sequence of words D = {w1, ..., wM}, after LDA converges, each word token wi will be discriminated into a specific topic zi, forming a word-topic pair <wi, zi>, which can be used to learn topical word embeddings. We design three TWE models to learn topical word vectors, as shown in Figure 1, where the window size is 1, and wi-1 and wi+1 are contextual words of wi. 
TWE-1.
We regard each topic as a pseudo word, and learn topic embeddings and word embeddings separately. We then build the topical word embedding of <wi, zi> according to the embeddings of wi and zi.”
Here, Liu discloses wherein the respective topics each corresponds to a topic identifier (“specific topic zi”).  
Liu, Page 2420 “Optimization and Parameter Estimation” Paragraph 2, discloses: “Initialization is important for learning TWE models. In TWE-1, we first learn word embeddings using Skip-Gram. Afterwards, we initialize each topic vector with the average over all words assigned to this topics, and learn topic embeddings while keeping word embeddings unchanged.”  Here, Liu discloses the topic vectors of the respective topics (“each topic vector”) are obtained according to word vectors of respective word materials already trained in a word material repository (“initialize…with the average over all words… and learn topic embeddings while keeping word embeddings unchanged”), topic identifiers corresponding to respective word materials (“all words assigned to this topics”) and the word vector model after the training with respect to the word vectors (“first learn word embeddings”).  It is clear here that Liu is learning the topic vectors after the word vectors, as Liu states that the topic vectors are initialized with the word vectors, and the word vectors remain unchanged.)
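For illustration only, the initialization step quoted from Liu (each topic vector initialized with the average of the vectors of the words assigned to that topic) can be sketched as follows.  The function name and toy data are hypothetical, and the sketch assumes the average is taken over word tokens, which Liu's text does not state explicitly; the subsequent learning of topic embeddings with the word embeddings held fixed is not shown.

```python
def init_topic_vectors(word_vectors, word_topic_pairs):
    """Average the (already trained, unchanged) word vectors per assigned topic.

    word_vectors:     dict mapping word -> list of floats
    word_topic_pairs: iterable of (word, topic_id) pairs, e.g. from LDA
    """
    sums = {}
    counts = {}
    for word, topic_id in word_topic_pairs:
        vector = word_vectors[word]
        if topic_id not in sums:
            sums[topic_id] = [0.0] * len(vector)
            counts[topic_id] = 0
        sums[topic_id] = [s + x for s, x in zip(sums[topic_id], vector)]
        counts[topic_id] += 1
    return {
        topic_id: [s / counts[topic_id] for s in total]
        for topic_id, total in sums.items()
    }


# Toy data, made up for illustration.
WORD_VECTORS = {"ball": [1.0, 0.0], "team": [0.0, 1.0], "vote": [2.0, 2.0]}
WORD_TOPIC_PAIRS = [("ball", 0), ("team", 0), ("vote", 1)]
TOPIC_VECTORS = init_topic_vectors(WORD_VECTORS, WORD_TOPIC_PAIRS)
```

The word vectors are read but never modified, which mirrors the quoted statement that the word embeddings are kept unchanged while the topic vectors are derived from them.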

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the keyword extraction with common semantic space for words and topics of Starr and Niu, with the independent topic vector generation of Liu.  One would have been motivated to do so because independence of word and topic embeddings appears to result in improved performance for capturing semantic information (Liu, Page 2422 “Evaluation Results”:  “Table 3 shows the evaluation results of text classification on 20NewsGroup. We can observe that TWE-1 outperforms all baselines significantly, especially for topic models and embedding models. This indicates that our model can capture more precise semantic information of documents as compared to topic models and embedding models. Moreover, as compared to the BOW model, the TWE models manage to reduce the document feature space by 99.2 percent in this case.  Among three TWE models, it is amazing that the simplest TWE-1 model again achieves the best performance. As inspired by the anonymous reviewer, word and topic embeddings are learned independently in TWE-1 but are built interactively in TWE-2 and TWE-3. The independence assumption in TWE-1 may be the reason of better performance”).

	As per Claim 2, the combination of Starr, Niu, and Liu teaches the method according to claim 1.  Starr teaches wherein the extracting, from the multiple words, words as keywords of the target document, according to distribution probabilities of the target document in each of 
according to the generation probabilities of the respective words in the target document, extracting, from the multiple words, words as keywords of the target document.  (Starr, Para [0118] Equation (4), discloses 
Plda (w | d) = Sumz { Plda (w | z) * Plda (z | d)  }
, which Starr describes in [0118] as “the probability of a word given an electronic text document and over all topics”, corresponding to generation probabilities of the respective words in the target document.  Starr then goes on to use this information to extract keywords.  In Para [0121] Equation (6), Starr uses Plda (w|d) as part of an equation to produce another term P(w|d).  Starr then states in Para [0122], that “To code and/or assign a topic to an electronic text document, as described above, the content management system 106 can use the probability matrix P(w|d)”.  Starr’s use of the term “topic” here has a general meaning, and is not to be confused with the LDA-specific meaning of “topic”.  Thus, Starr’s “assign a topic to an electronic text document” is analogous to extracting, from the multiple words, words as keywords of the target document.)
calculating generation probabilities of the respective words in the target document, according to distribution probabilities of the target document in each of the multiple topics [and correlation between word vectors of respective words and topic vectors of respective topics in multiple topics]  (Starr, Para [0118] Equation (4), discloses 
Plda (w | d) = Sumz { Plda (w | z) * Plda (z | d)  }
, which Starr describes in [0118] as “the probability of a word given an electronic text document and over all topics”, corresponding to calculating generation probabilities of the respective words in the target document.  Also Plda (z | d) represents distribution probabilities of the target document in each of the multiple topics.)  *Extracting based also on the correlation is taught by the combination with Niu below.
	However, Starr fails to explicitly teach extracting, from the multiple words, words as keywords of the target document, according to the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics.
	Niu teaches extracting, from the multiple words, words as keywords of the target document, according to [distribution probabilities of the target document in each of the multiple topics] and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics (Recall that as shown above, Starr discloses extracting, from the multiple words, words as keywords of the target document, according to distribution probabilities of the target document in each of the multiple topics starting with Para [0118] Equation (4): 
Plda (w | d) = Sumz { Plda (w | z) * Plda (z | d)  }
, which Starr describes in [0118] as “the probability of a word given an electronic text document and over all topics”, where Plda (z | d) represents distribution probabilities of the target document in each of the multiple topics.   Furthermore, Plda (w | z) represents the distribution probability of a word given a topic.  
Niu, Page 2 Top Left Paragraph, discloses:  “Furthermore, words and topics naturally can estimate similarity and relevance with each other such as using cosine function rather than using probability.”  Here, Niu discloses “using cosine function rather than using probability”, with “rather than” being the key operative phrase.  That is, Niu discloses replacing Plda (w | z) with cos<w | z>.  Thus, the combination of Starr, Niu, and Liu results in Plda (w | d) = Sumz { cos<w | z> * Plda (z | d)  }, which corresponds to extracting according to distribution probabilities of the target document in each of the multiple topics and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics.)

As per Claim 3, the combination of Starr, Niu, and Liu teaches the method according to claim 1, wherein before calculating correlation between word vectors of respective words in multiple words of the target document and topic vectors of respective topics in the multiple topics, the method further comprises: 
obtaining, from a preset word material repository, word vectors of word materials corresponding to the respective words (Niu, Section 2.2 “Word2Vec”, discloses “Inspired by Neural Probabilistic Language Model (NPLM) (Bengio et al., 2003), Mikolov et al. (2013a) proposed Word2Vec including CBOW and Skip-gram for computing continuous vector representations of words from large data sets.”  Here, Niu discloses word vectors of word materials corresponding to the respective words (“continuous vector representations of words”), wherein the words are from a preset word material repository (“large data set”)).
and obtaining topic vectors of the respective topics from a preset topic vector repository (Niu, Section 3 “Topic2Vec”, discloses:  “Inspired by word2vec, we incorporate topics and words into the NPLM. We propose Topic2Vec as shown in Fig. 1 for learning distributed topic representations together with word representations”.  Here, Niu discloses topic vectors of the respective topics (“distributed topic representations”).  Niu, Section 3 Para 2, discloses:  “When training, given a word-topic sequence of a document D = {w1 : z1, ..., wM : zM}, where zi is the word wi’s topic inferred from LDA, the learning objective functions can be defined to maximize the following log-likelihoods, based on CBOW and Skip-gram, respectively.”  Here, Niu discloses that, before running Topic2Vec, topics are generated from LDA, which produces the topics, and thus a preset topic vector repository.)

As per claim 4, the combination of Starr, Niu, and Liu teaches the method according to claim 3 as shown above, as well as wherein before obtaining, from a preset word material repository, word vectors of word materials corresponding to the respective words, the method further comprises:
generating word material repository including several word materials, according to a preset document repository including multiple documents;  (Niu, Section 2.2 “Word2Vec”, discloses “Inspired by Neural Probabilistic Language Model (NPLM) (Bengio et al., 2003), Mikolov et al. (2013a) proposed Word2Vec including CBOW and Skip-gram for computing continuous vector representations of words from large data sets.”  Here, Niu discloses generating word material repository including several word materials (“continuous vector representations of words from large data sets”).  Niu, Section 4, discloses:  “We use the English Gigaword Fifth Edition as our training data for learning fundamental word and topic representations. We randomly extract part of documents and construct our training set described as follows: we chose 100,000 documents, where each consists of more than 1,000 characters from subfolder ltw_eng (Los Angeles Times) containing 411,032 documents.”  Here, Niu discloses according to a preset document repository including multiple documents (“we chose 100,000 documents”)).
training the word vector model and word vectors of the respective word materials, according to the respective word materials in the word material repository and co-occurrence information of the word materials with other word materials in respective documents in the document repository (Niu, Section 2.2 “Word2Vec” Para 2, discloses “When training, given a word sequence D = {w1, ..., wM}, the learning objective functions are defined to maximize the following log-likelihoods, based on CBOW and Skip-gram, respectively.” Here, Niu discloses training the word vector model and word vectors of the respective word materials (“when training”).  Niu, Section 2.2 Para 3, discloses “Here, in Equation (1a), wcxt indicates the context of the current word wi. In Equation (1b), k is the window size of context”.  Here, Niu discloses that Word2Vec is based upon the surrounding words (“context of the current word”, “window size of the context”), and is thus according to the respective word materials in the word material repository and co-occurrence information of the word materials with other word materials.  Niu, Section 4, discloses:  “We use the English Gigaword Fifth Edition as our training data for learning fundamental word and topic representations. We randomly extract part of documents and construct our training set described as follows: we chose 100,000 documents, where each consists of more than 1,000 characters from subfolder ltw_eng (Los Angeles Times) containing 411,032 documents.”  Here, Niu discloses that the words come from respective documents in the document repository.)
(Niu, Section 2.2 “Word2Vec”, discloses “Inspired by Neural Probabilistic Language Model (NPLM) (Bengio et al., 2003), Mikolov et al. (2013a) proposed Word2Vec including CBOW and Skip-gram for computing continuous vector representations of words from large data sets.”  Here, Niu discloses that Word2Vec produces word vectors of the respective word materials in the word material repository (“continuous vector representations of words from large data sets”).  Niu, Figure 2, discloses experimental results using these word vectors, and thus the word vectors must have been stored in memory.)
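
For illustration only, the co-occurrence information relied upon above (the “context of the current word” within a “window size”) can be sketched as follows.  This is a minimal sketch and is not code from Niu or from the application; the function name context_pairs and the sample word sequence are hypothetical, and only the window size k corresponds to a symbol quoted from Niu.

```python
from typing import List, Tuple


def context_pairs(words: List[str], k: int) -> List[Tuple[str, str]]:
    """Return (current word, context word) pairs within a window of size k.

    Hypothetical helper: it only enumerates which words co-occur with
    which, i.e. the pairs a Skip-gram style objective would be summed over.
    """
    pairs = []
    for i, current in enumerate(words):
        start = max(0, i - k)               # clip the window at the left edge
        stop = min(len(words), i + k + 1)   # clip the window at the right edge
        for j in range(start, stop):
            if j != i:                      # a word is not its own context
                pairs.append((current, words[j]))
    return pairs


if __name__ == "__main__":
    # Made-up word sequence standing in for one document of the repository.
    document = ["topic", "models", "assign", "words"]
    for pair in context_pairs(document, k=1):
        print(pair)
```

A larger k yields more co-occurring pairs per word; training itself (maximizing the log-likelihoods Niu describes) is outside the scope of this sketch.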

As per claim 5, the combination of Starr, Niu, and Liu as shown above teaches the method according to claim 3, the combination of Starr, Niu, and Liu further teaches wherein before obtaining topic vectors of the respective topics from a preset topic vector repository, the method further comprises: 
obtaining topic identifiers corresponding to the respective word materials; (Niu, Section 5 Para 2 Lines 4-6, discloses:  “Besides, we have to run LDA firstly to assign a topic for each word in the corpus before Topic2Vec.”  Here, Niu discloses obtaining a topic for each word.)  
according to word vectors of the respective word materials in the word material repository, topic identifiers corresponding to the respective word materials and the trained word vector model, training topic vectors of topics corresponding to the respective topic identifiers. (Niu, Abstract Lines 18-19, discloses to “learn topic representations in the same semantic vector space with words” and thus the topic vectors are according to word vectors of the respective word materials in the word material repository.  Niu, Section 5 Para 2 Lines 4-6, discloses:  “Besides, we have to run LDA firstly to assign a topic for each word in the corpus before Topic2Vec.” Here, Niu discloses topic identifiers corresponding to the respective word materials.  Niu, Section 2.2 “Word2Vec” Para 1-2, discloses achieving “continuous vector representations of words” via “training…given a word sequence”, or a trained word vector model.  Niu, Section 3 “Topic2Vec”, discloses training topic vectors of topics corresponding to the respective topic identifiers (“Topic2Vec aims at learning topic representations along with word representations”)).
storing topic vectors of the respective topics in the topic vector repository  (Niu, Figure 2, discloses experimental results using these topic vectors of the respective topics in the topic vector repository, and thus the topic vectors must have been stored in memory).
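
For illustration only, the relationship between the per-word topic identifiers and the word materials discussed above can be sketched as follows.  This is a minimal sketch under stated assumptions and is not Niu's implementation: the function name topic_context_pairs is hypothetical, the topic identifiers are assumed to have been assigned to each word beforehand (e.g., by an LDA run), and the sketch only enumerates (topic identifier, context word) pairs that a topic-vector training step could consume.

```python
from typing import List, Tuple


def topic_context_pairs(
    words: List[str], topic_ids: List[int], k: int
) -> List[Tuple[int, str]]:
    """Pair the topic identifier of each word with the words in its window.

    Hypothetical helper: topic_ids[i] is assumed to be the topic already
    assigned to words[i]; k is the context window size.
    """
    if len(words) != len(topic_ids):
        raise ValueError("each word needs exactly one topic identifier")
    pairs = []
    for i, topic in enumerate(topic_ids):
        start = max(0, i - k)
        stop = min(len(words), i + k + 1)
        for j in range(start, stop):
            if j != i:  # skip the word that carries the topic itself
                pairs.append((topic, words[j]))
    return pairs


if __name__ == "__main__":
    # Made-up words and topic assignments, for demonstration only.
    words = ["stocks", "rose", "after", "earnings"]
    topic_ids = [3, 3, 7, 3]
    for pair in topic_context_pairs(words, topic_ids, k=1):
        print(pair)
```

Because the pairs share the same context words used for the word vectors, topic vectors trained on them would reside in the same vector space as the word vectors, which is the property quoted from Niu's Abstract.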

As per claim 6, claim 6 is a device claim corresponding to method claim 1.  The difference is that the device claim recites a memory and one or more processors.  (Starr, Para [0151], discloses “Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below”).  Claim 6 is rejected for the same reasons as claim 1.

As per claim 7, claim 7 is a device claim corresponding to method claim 2.  The difference is that the device claim recites a memory and one or more processors.  Claim 7 is rejected for the same reasons as claim 2.

As per claim 8, claim 8 is a device claim corresponding to method claim 3.  The difference is that the device claim recites a memory and one or more processors.  Claim 8 is rejected for the same reasons as claim 3.

As per claim 9, claim 9 is a device claim corresponding to method claim 4.  The difference is that the device claim recites a memory and one or more processors.  Claim 9 is rejected for the same reasons as claim 4.

As per claim 10, claim 10 is a device claim corresponding to method claim 5.  The difference is that the device claim recites a memory and one or more processors.  Claim 10 is rejected for the same reasons as claim 5.

As per claim 11, claim 11 is a non-transitory computer readable medium claim corresponding to method claim 1.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  (Starr, Para [0151], discloses “In general, a processor (e.g., a microprocessor) receives instructions, from a non-transitory computer-readable medium, (e.g., a memory, etc.), and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein”).  Claim 11 is rejected for the same reasons as claim 1.

As per claim 12, claim 12 is a non-transitory computer readable medium claim corresponding to method claim 2.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  Claim 12 is rejected for the same reasons as claim 2.

As per claim 13, claim 13 is a non-transitory computer readable medium claim corresponding to method claim 3.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  Claim 13 is rejected for the same reasons as claim 3.

As per claim 14, claim 14 is a non-transitory computer readable medium claim corresponding to method claim 4.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  Claim 14 is rejected for the same reasons as claim 4.

As per claim 15, claim 15 is a non-transitory computer readable medium claim corresponding to method claim 5.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  Claim 15 is rejected for the same reasons as claim 5.

As per claim 16, the combination of Starr, Niu, and Liu teaches the method according to claim 1.  Niu teaches wherein the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics comprises the cosine distances between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics. (Niu, Section 4.2, last bullet, discloses:  “Topic2Vec: topics and words are equally represented as the low-dimensional vectors, we can immediately calculate the cosine similarity between words and topics. For each topic, we select higher similarity words.”)
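
For illustration only, the cosine similarity between word vectors and topic vectors referenced in the cited passage can be computed as sketched below.  This is a minimal sketch and is not taken from Niu or from the application; the function names and the example vectors are hypothetical, and the zero-vector handling is an assumption made so that the sketch does not divide by zero.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat a zero vector as uncorrelated
    return dot / (norm_u * norm_v)


def word_topic_similarities(
    word_vectors: Dict[str, Sequence[float]],
    topic_vectors: Dict[int, Sequence[float]],
) -> Dict[str, Dict[int, float]]:
    """Cosine similarity of every word vector against every topic vector."""
    return {
        word: {topic: cosine_similarity(wv, tv) for topic, tv in topic_vectors.items()}
        for word, wv in word_vectors.items()
    }


if __name__ == "__main__":
    # Made-up two-dimensional vectors, for demonstration only.
    words = {"bank": [1.0, 0.0], "river": [1.0, 1.0]}
    topics = {0: [2.0, 0.0], 1: [0.0, 3.0]}
    print(word_topic_similarities(words, topics))
```

Because words and topics are represented as vectors of the same dimension, the same function applies to any word/topic pair without conversion.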

As per claim 17, claim 17 is a device claim corresponding to method claim 16.  The difference is that the device claim recites a memory and one or more processors.  Claim 17 is rejected for the same reasons as claim 16.

As per claim 18, claim 18 is a non-transitory computer readable medium claim corresponding to method claim 16.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  Claim 18 is rejected for the same reasons as claim 16.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Moody (“Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec”) discloses combining LDA with word embeddings.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571) 272-9710.  The examiner can normally be reached on M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-
/L.A.S./Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126