Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2021-10-28 has been entered.  The status of the claims is as follows:
Claims 3, 8, and 13 have been cancelled.
Claims 1-2, 4-7, 9-12, and 14-18 remain pending in the application.
Claim 19 is new.
Claims 1, 4-6, 9-11, and 14-15 have been amended.

Response to Arguments
Applicant’s arguments in response to the rejections under 35 U.S.C. 101 have been fully considered.  The rejection under 35 U.S.C. 101 is withdrawn in view of the added limitation “training a word vector model”, because training a machine learning model is considered an improvement to the functioning of a computer (see PEG Example 39 in the 2019 Guidance).  See also MPEP 2106.04(a)(1)(vii), which indicates that training a machine learning model is not considered an abstract idea.
Applicant's arguments in response to the rejections under 35 U.S.C. 103 have been fully considered but are not persuasive.  Applicant argues (Remarks, pg. 11) that Starr’s method of “assigning topics” to documents is not applicable to “extracting keywords”.  Examiner respectfully disagrees.  As pointed out in the previous action, in Starr’s “assigning topics”, the 
Applicant argues (Remarks, pg. 12) that Niu does not teach extracting keywords or training topic vectors.  However, as explained above, Starr teaches extracting keywords, and Liu, as explained below and in the rejection under 35 U.S.C. 103, teaches the training limitations.
Applicant argues (Remarks, pg. 12) that the language of Liu is ambiguous and that it is not clear that the topic vectors are learned from already trained word vectors.  Examiner respectfully disagrees that Liu’s intention cannot be ascertained.  Liu, like Starr and like the instant application, first obtains topic identifiers by running LDA (pg 2419, above “TWE-1”:  “With the favor of latent Dirichlet allocation (LDA), we assign a latent topic zi ∈ T for each word wi, according to the probability Pr(zi|wi, d) ∝ Pr(wi|zi) Pr(zi|d).”)  One of ordinary skill in the art will appreciate that the topics produced by LDA are not topic embeddings, but rather vectors representing probabilities of being associated with given words.  Liu then suggests learning topic embeddings from these topic identifiers, based on word embeddings that correspond to 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-7, 9-12, and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Starr et al. (US 2017/0124174 A1; hereinafter “Starr”), in view of Niu et al. (“Topic2Vec: Learning Distributed Representations of Topics”; hereinafter “Niu”) and Liu et al. (“Topical Word Embeddings”; hereinafter “Liu”).
As per Claim 1, Starr teaches a method for extracting keywords based on artificial intelligence by a computer device (Starr, Para [0024], discloses:  “One or more embodiments disclosed herein provide a content management system that improves the organization of electronic text documents by intelligently and accurately categorizing electronic text documents by topic. For example, in one or more embodiments a content management system can categorize electronic text documents by user specified topics. Further, the content management system identifies novel and emerging topics within electronic text documents”.  Here, Starr discloses extracting keywords (“identifies novel and emerging topics”).  Starr, Para [0114], discloses:  “In one or more embodiments, the probabilistic language model is based on a probability matrix and can be built using Latent Dirichlet Allocation (LDA).”  Here, Starr discloses based on artificial intelligence (“using Latent Dirichlet Allocation (LDA)”).  Starr, Para [0151], discloses by a computer device:  “Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below”).
wherein the method comprises:
predicting a distribution probability of a target document in each of multiple topics based on a Latent Dirichlet Allocation (LDA) model (Starr, Para [0114] Lines 18-26, discloses:  “The topic-document matrix also includes a values for the Dirichlet distribution. The topic-document matrix is expressed as alda (z|d), where z is the LDA topic and d is an individual electronic text document. Within the context of LDA matrices, topics refer to latent topics discovered by the algorithm and are not the “topics” that humans assign to responses as otherwise disclosed herein. In the LDA sense, a topic can be thought of as a type of probabilistically derived cluster.”  LDA, by definition, comprises predicting a distribution probability of a target document in each of multiple topics.  Here, Starr confirms that by reciting:  “The topic-document matrix also includes a values for the Dirichlet distribution. The topic-document matrix is expressed as alda (z|d), where z is the LDA topic and d is an individual electronic text document”).
extracting, from the multiple words, words as keywords of the target document, according to distribution probabilities of the target document in each of the multiple topics [and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics] (Starr, Para [0118] Equation (4), discloses 
Plda (w | d) = Sumz { Plda (w | z) * Plda (z | d)  }
, which Starr describes in [0118] as “the probability of a word given an electronic text document and over all topics”, where Plda (z | d) represents distribution probabilities of the target document in each of the multiple topics.  Starr then goes on to use this information to extract keywords.  In Para [0121] Equation (6), Starr uses Plda (w|d) as part of an equation to produce another term P(w|d).  Starr then states in Para [0122], that “To code and/or assign a topic to an electronic text document, as described above, the content management system 106 can use the probability matrix P(w|d)”.  Starr’s use of the term “topic” here has a general meaning, and is not to be confused with the LDA-specific meaning of “topic”.  Thus, Starr’s “assign a topic to an electronic text document” is analogous to extracting, from the multiple words, words as keywords of the target document.) *Extracting based also on the correlation is taught by the combination with Niu below.
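For illustration only, the mixture in Starr’s Equation (4) can be sketched in Python. This is a minimal sketch under stated assumptions: the matrix `phi` (standing in for Plda (w | z)), the vector `theta_d` (standing in for Plda (z | d)), the toy vocabulary, and the top-k ranking are hypothetical and are not drawn from Starr, and ranking words directly by Plda (w | d) simplifies Starr’s further use of that term in Equation (6).

```python
def word_given_doc(phi, theta_d):
    """P(w | d) = sum over z of P(w | z) * P(z | d), for every word index w.

    phi[z][w]: P(w | z), one row of word probabilities per topic (toy values)
    theta_d[z]: P(z | d), the topic distribution of a single document d
    """
    n_words = len(phi[0])
    return [sum(phi[z][w] * theta_d[z] for z in range(len(theta_d)))
            for w in range(n_words)]


def top_keywords(vocab, scores, k):
    """Return the k vocabulary words with the highest scores."""
    ranked = sorted(range(len(vocab)), key=lambda i: scores[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]


# Hypothetical two-topic model over a four-word vocabulary.
vocab = ["price", "battery", "screen", "refund"]
phi = [
    [0.1, 0.6, 0.3, 0.0],  # topic 0
    [0.5, 0.0, 0.1, 0.4],  # topic 1
]
theta_d = [0.8, 0.2]  # document d leans heavily toward topic 0

p_w_d = word_given_doc(phi, theta_d)
print(top_keywords(vocab, p_w_d, 2))
```

Because each row of `phi` and the vector `theta_d` each sum to 1, the resulting Plda (w | d) values also sum to 1 over the vocabulary.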
	However, Starr fails to explicitly teach training a word vector model along with word vectors of respective word materials in a word material repository including multiple word materials; storing, in the word material repository, word vectors of the respective word materials already trained; obtaining topics corresponding to the respective word materials; training topic vectors of respective topics, according to the word vectors of the respective word materials already trained and the word vector model already trained; storing, in a topic vector repository, topic vectors of the respective topics already trained; obtaining, from the word material repository, word vectors of respective words in multiple words of the target document; obtaining, from the topic vector repository, topic vectors of respective topics in the multiple topics; and calculating correlation between word vectors of respective words in multiple words of the target document and topic vectors of respective topics in the multiple topics.
	Niu teaches calculating correlation between word vectors of respective words in multiple words of the target document and topic vectors of respective topics in the multiple topics, wherein the word vectors of the respective words and the topic vectors of the respective topics are all generated based on a word vector model (Niu, Abstract Lines 18-19, discloses “learn topic representations in the same semantic vector space with words”.  Here, Niu discloses word vectors of the respective words and the topic vectors of the respective topics are all generated based on a word vector model.  Niu, Section 4.2, last bullet, discloses:  “Topic2Vec: topics and words are equally represented as the low-dimensional vectors, we can immediately calculate the cosine similarity between words and topics. For each topic, we select higher similarity words.”  Here, Niu discloses calculating correlation between word vectors of respective words in multiple words of the target document and topic vectors of respective topics.)
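The word-topic cosine comparison that Niu describes can be sketched as follows, assuming words and topics already share one vector space. The vectors and the helper names `cosine` and `nearest_words` are hypothetical; the sketch shows only the similarity ranking (“For each topic, we select higher similarity words”), not how Topic2Vec learns the vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_words(topic_vec, word_vecs, k):
    """Return the k words whose vectors are most cosine-similar to topic_vec."""
    ranked = sorted(word_vecs, key=lambda w: cosine(topic_vec, word_vecs[w]), reverse=True)
    return ranked[:k]


# Hypothetical 3-d vectors for words and one topic in the same space.
word_vecs = {
    "battery": [0.9, 0.1, 0.0],
    "screen": [0.8, 0.3, 0.1],
    "refund": [0.0, 0.2, 0.9],
}
topic_hardware = [1.0, 0.2, 0.0]
print(nearest_words(topic_hardware, word_vecs, 2))
```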
extracting, from the multiple words, words as keywords of the target document, according to [distribution probabilities of the target document in each of the multiple topics] and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics (Recall that as shown above, Starr discloses extracting, from the multiple words, words as keywords of the target document, according to distribution probabilities of the target document in each of the multiple topics starting with Para [0118] Equation (4): 
Plda (w | d) = Sumz { Plda (w | z) * Plda (z | d)  }
, which Starr describes in [0118] as “the probability of a word given an electronic text document and over all topics”, where Plda (z | d) represents distribution probabilities of the target document in each of the multiple topics.   Furthermore, Plda (w | z) represents the distribution probability of a word given a topic.  
Niu, Page 2 Top Left Paragraph, discloses:  “Furthermore, words and topics naturally can estimate similarity and relevance with each other such as using cosine function rather than using probability.”  Here, Niu discloses estimating word-topic relevance “using cosine function rather than using probability”, with “rather than” being the key operative phrase; that is, Niu suggests replacing Plda (w | z) with cos<w | z>.  Thus, the combination of Starr and Niu results in Plda (w | d) = Sumz { cos<w | z> * Plda (z | d)  }, which is computed according to distribution probabilities of the target document in each of the multiple topics and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics.  As shown above, Starr then uses this information to extract keywords.  In Para [0121] Equation (6), Starr uses Plda (w|d) as part of an equation to produce another term P(w|d).  Starr then states in Para [0122] that “To code and/or assign a topic to an electronic text document, as described above, the content management system 106 can use the probability matrix P(w|d)”.  Thus, the combination of Starr and Niu teaches the limitation of extracting, from the multiple words, words as keywords of the target document, according to distribution probabilities of the target document in each of the multiple topics and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics.)
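The combined quantity Plda (w | d) = Sumz { cos<w | z> * Plda (z | d) } can be sketched as a word-ranking score. This is a minimal sketch assuming toy word vectors, topic vectors, and a document-topic distribution that are all hypothetical; because cosine values fall in [-1, 1], the sketch treats the result as a relevance score for ranking words rather than as a normalized probability.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def combined_scores(word_vecs, topic_vecs, theta_d):
    """score(w, d) = sum over z of cos<w, z> * P(z | d).

    Starr's Equation (4) with P(w | z) replaced by the cosine between
    the word vector and the topic vector.
    """
    return {
        word: sum(cosine(vec, t) * p for t, p in zip(topic_vecs, theta_d))
        for word, vec in word_vecs.items()
    }


# Hypothetical word vectors, topic vectors, and document-topic distribution.
word_vecs = {
    "battery": [0.9, 0.1, 0.0],
    "screen": [0.8, 0.3, 0.1],
    "refund": [0.0, 0.2, 0.9],
}
topic_vecs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
theta_d = [0.7, 0.3]

scores = combined_scores(word_vecs, topic_vecs, theta_d)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```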

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the keyword extraction of Starr with the joint semantic space embedding of words and topics of Niu.  One would have been motivated to do so because it “produces a better grouping and separation of the words in different topics.” (Niu, Section 4.3, right column, 1st full paragraph, 2nd sentence)
However, the combination of Starr and Niu thus far fails to teach training a word vector model along with word vectors of respective word materials in a word material repository including multiple word materials; storing, in the word material repository, word vectors of the respective word materials already trained; obtaining topics corresponding to the respective word materials; training topic vectors of respective topics, according to the word vectors of the respective word materials already trained and the word vector model already trained; storing, in a topic vector repository, topic vectors of the respective topics already trained; obtaining, from the word material repository, word vectors of respective words in multiple words of the target document; and obtaining, from the topic vector repository, topic vectors of respective topics in the multiple topics.
Liu, like Starr, also teaches predicting a distribution probability of a target document in each of multiple topics based on a Latent Dirichlet Allocation (LDA) model (Liu, Intro Para 5, discloses:  “We employ the widely used latent Dirichlet allocation (LDA) (Blei, Ng, and Jordan 2003) to obtain word topics, and perform collapsed Gibbs sampling (Griffiths and Steyvers 2004) to iteratively assign latent topics for each word token.”)
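Liu’s per-token topic assignment, Pr(zi|wi, d) ∝ Pr(wi|zi) Pr(zi|d), can be sketched as follows. This is a simplified sketch under stated assumptions: it takes already-estimated, hypothetical `phi` and `theta_d` values and picks the most probable topic for each token, whereas collapsed Gibbs sampling as Liu describes would instead sample a topic from the normalized distribution and update counts iteratively.

```python
def assign_topic(word_idx, phi, theta_d):
    """Return (best_topic, posterior) for one token, using
    Pr(z | w, d) proportional to Pr(w | z) * Pr(z | d).

    Simplification: a deterministic argmax over fixed phi/theta_d values,
    not a collapsed Gibbs sampling step.
    """
    weights = [phi[z][word_idx] * theta_d[z] for z in range(len(theta_d))]
    total = sum(weights)
    if total == 0:
        raise ValueError("word has zero probability under every topic")
    posterior = [x / total for x in weights]
    best = max(range(len(posterior)), key=lambda z: posterior[z])
    return best, posterior


# Hypothetical two-topic model, with phi[z][w] = Pr(w | z).
vocab = ["price", "battery", "screen", "refund"]
phi = [
    [0.1, 0.6, 0.3, 0.0],
    [0.5, 0.0, 0.1, 0.4],
]
theta_d = [0.8, 0.2]
doc = ["battery", "refund", "price"]

assignments = [assign_topic(vocab.index(w), phi, theta_d)[0] for w in doc]
print(assignments)
```

In this toy case the token “price” is assigned to topic 1 even though the document as a whole favors topic 0, because Pr(w | z) for that word is much higher under topic 1.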
Liu also teaches training a word vector model along with word vectors of respective word materials in a word material repository including multiple word materials (Liu, pg 2420 “Optimization and Parameter Estimation” Para 2, discloses:  “Initialization is important for learning TWE models. In TWE-1, we first learn word embeddings using Skip-Gram.”  Here, Liu trains a word vector model (“Skip-Gram”) that produces a word material repository including multiple word materials (“word embeddings”)).
storing, in the word material repository, word vectors of the respective word materials already trained (Liu, pg 2420 “Optimization and Parameter Estimation” Para 2, discloses:  “Initialization is important for learning TWE models. In TWE-1, we first learn word embeddings using Skip-Gram.”  Here, Liu trains a word vector model (“Skip-Gram”) that produces a word material repository including multiple word materials (“word embeddings”).  Liu goes on to use these produced embeddings for future operations, and thus they are stored in memory.)
obtaining topics corresponding to the respective word materials (Liu, pg 2420 “Optimization and Parameter Estimation” Para 2, discloses:  “Initialization is important for learning TWE models. In TWE-1, we first learn word embeddings using Skip-Gram. Afterwards, we initialize each topic vector with the average over all words assigned to this topics, and learn topic embeddings while keeping word embeddings unchanged.”  Here, Liu discloses obtaining topics corresponding to the word materials.)
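training topic vectors of respective topics, according to the word vectors of the respective word materials already trained and the word vector model already trained (Liu, pg 2420 “Optimization and Parameter Estimation” Para 2, discloses:  “Initialization is important for learning TWE models. In TWE-1, we first learn word embeddings using Skip-Gram. Afterwards, we initialize each topic vector with the average over all words assigned to this topics, and learn topic embeddings while keeping word embeddings unchanged.”  Here, Liu discloses “learn” “topic embeddings”, and thus suggests training topic vectors.  Because each topic vector is initialized from the average of the word embeddings already learned with Skip-Gram, and those word embeddings are kept unchanged, the topic vectors are trained according to the word vectors already trained and the word vector model already trained.)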
storing, in a topic vector repository, topic vectors of the respective topics already trained (Liu, pg 2420 “Optimization and Parameter Estimation” Para 2, discloses:  “Initialization is important for learning TWE models. In TWE-1, we first learn word embeddings using Skip-Gram. Afterwards, we initialize each topic vector with the average over all words assigned to this topics, and learn topic embeddings while keeping word embeddings unchanged.”  Here, Liu discloses “learn” “topic embeddings”, and thus suggests training topic vectors.  Liu, pg 2419 TWE-1 Para 3, discloses:  “In TWE-1, we get topical word embedding of a word w in topic z by concatenating the embedding of w and z, i.e., w^z = w ⊕ z, where ⊕ is the concatenation operation, and the length of w^z is double of w or z.”  Here, Liu discloses that the topic embeddings are used in subsequent operations, and thus they have been stored in memory.)
obtaining, from the word material repository, word vectors of respective words in multiple words of the target document (Liu, pg 2419 TWE-1 Para 3, discloses:  “In TWE-1, we get topical word embedding of a word w in topic z by concatenating the embedding of w and z, i.e., w^z = w ⊕ z, where ⊕ is the concatenation operation, and the length of w^z is double of w or z.”  Here, Liu discloses obtaining the word vectors and using them in a concatenation operation with the topic vectors.)
obtaining, from the topic vector repository, topic vectors of respective topics in the multiple topics (Liu, pg 2419 TWE-1 Para 3, discloses:  “In TWE-1, we get topical word embedding of a word w in topic z by concatenating the embedding of w and z, i.e., w^z = w ⊕ z, where ⊕ is the concatenation operation, and the length of w^z is double of w or z.”  Here, Liu discloses obtaining the topic vectors and using them in a concatenation operation with the word vectors.)
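Liu’s TWE-1 initialization and concatenation steps can be sketched as follows, assuming the word vectors have already been learned and are held fixed. The vectors, token list, topic assignments, and helper names are hypothetical, and the sketch covers only the averaging initialization and the concatenation of a word vector with a topic vector, not Liu’s subsequent training of the topic embeddings.

```python
def init_topic_vectors(tokens, assignments, word_vecs, n_topics):
    """Initialize each topic vector as the mean of the fixed word vectors of
    all tokens assigned to that topic (zero vector if a topic has no tokens)."""
    dim = len(next(iter(word_vecs.values())))
    sums = [[0.0] * dim for _ in range(n_topics)]
    counts = [0] * n_topics
    for word, z in zip(tokens, assignments):
        for j, x in enumerate(word_vecs[word]):
            sums[z][j] += x
        counts[z] += 1
    return [
        [s / counts[z] for s in sums[z]] if counts[z] else [0.0] * dim
        for z in range(n_topics)
    ]


def topical_word_embedding(word_vec, topic_vec):
    """TWE-1 style concatenation: the result is twice as long as either input."""
    return list(word_vec) + list(topic_vec)


# Hypothetical pre-trained (and frozen) word vectors and LDA topic assignments.
word_vecs = {"battery": [1.0, 0.0], "screen": [0.0, 1.0], "refund": [1.0, 1.0]}
tokens = ["battery", "screen", "refund", "battery"]
assignments = [0, 0, 1, 0]

topic_vecs = init_topic_vectors(tokens, assignments, word_vecs, n_topics=2)
twe = topical_word_embedding(word_vecs["battery"], topic_vecs[1])
print(topic_vecs, twe)
```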
Starr, Niu, and Liu are analogous art because they are all in the same field of endeavor of machine learning.  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the keyword extraction with common semantic space for words and topics of Starr and Niu, with the independent topic vector generation of Liu.  One would have been motivated to do so because independence of word and topic embeddings appears to result in improved performance for capturing semantic information (Liu, Page 2422 “Evaluation Results”:  “Table 3 shows the evaluation results of text classification on 20NewsGroup. We can observe that TWE-1 outperforms all baselines significantly, especially for topic models and embedding models. This indicates that our model can capture more precise semantic information of documents as compared to topic models and embedding models. Moreover, as compared to the BOW model, the TWE models manage to 

	As per Claim 2, the combination of Starr, Niu, and Liu teaches the method according to claim 1.  Starr teaches wherein the extracting, from the multiple words, words as keywords of the target document, according to distribution probabilities of the target document in each of the multiple topics and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics specifically comprises: 
according to the generation probabilities of the respective words in the target document, extracting, from the multiple words, words as keywords of the target document.  (Starr, Para [0118] Equation (4), discloses 
Plda (w | d) = Sumz { Plda (w | z) * Plda (z | d)  }
, which Starr describes in [0118] as “the probability of a word given an electronic text document and over all topics”, corresponding to generation probabilities of the respective words in the target document.  Starr then goes on to use this information to extract keywords.  In Para [0121] Equation (6), Starr uses Plda (w|d) as part of an equation to produce another term P(w|d).  Starr then states in Para [0122], that “To code and/or assign a topic to an electronic text document, as described above, the content management system 106 can use the probability matrix P(w|d)”.  Starr’s use of the term “topic” here has a general meaning, and is not to be confused with the LDA-specific meaning of “topic”.  Thus, Starr’s “assign a topic to an electronic text document” is analogous to extracting, from the multiple words, words as keywords of the target document.)
calculating generation probabilities of the respective words in the target document, according to distribution probabilities of the target document in each of the multiple topics [and correlation between word vectors of respective words and topic vectors of respective topics in multiple topics]  (Starr, Para [0118] Equation (4), discloses 
Plda (w | d) = Sumz { Plda (w | z) * Plda (z | d)  }
, which Starr describes in [0118] as “the probability of a word given an electronic text document and over all topics”, corresponding to calculating generation probabilities of the respective words in the target document.  Also Plda (z | d) represents distribution probabilities of the target document in each of the multiple topics.)  *Extracting based also on the correlation is taught by the combination with Niu below.
	However, Starr fails to explicitly teach extracting, from the multiple words, words as keywords of the target document, according to the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics.
	Niu teaches extracting, from the multiple words, words as keywords of the target document, according to [distribution probabilities of the target document in each of the multiple topics] and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics (Recall that as shown above, Starr discloses extracting, from the multiple words, words as keywords of the target document,  starting with Para [0118] Equation (4): 
Plda (w | d) = Sumz { Plda (w | z) * Plda (z | d)  }
, which Starr describes in [0118] as “the probability of a word given an electronic text document and over all topics”, where Plda (z | d) represents distribution probabilities of the target document in each of the multiple topics.   Furthermore, Plda (w | z) represents the distribution probability of a word given a topic.  
Niu, Page 2 Top Left Paragraph, discloses:  “Furthermore, words and topics naturally can estimate similarity and relevance with each other such as using cosine function rather than using probability.”  Here, Niu discloses estimating word-topic relevance “using cosine function rather than using probability”, with “rather than” being the key operative phrase; that is, Niu suggests replacing Plda (w | z) with cos<w | z>.  Thus, the combination of Starr, Niu, and Liu results in Plda (w | d) = Sumz { cos<w | z> * Plda (z | d)  }, which calculates the generation probabilities of the respective words according to distribution probabilities of the target document in each of the multiple topics and the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics.)

As per claim 4, the combination of Starr, Niu, and Liu teaches the method according to claim 1.  Niu teaches generating the word material repository including multiple word materials, according to a preset document repository including multiple documents (Niu, Section 2.2 “Word2Vec”, discloses “Inspired by Neural Probabilistic Language Model (NPLM) (Bengio et al., 2003), Mikolov et al. (2013a) proposed Word2Vec including CBOW and Skip-gram for computing continuous vector representations of words from large data sets.”  Here, Niu discloses generating a word material repository including multiple word materials (“continuous vector representations of words from large data sets”).  Niu, Section 4, discloses:  “We use the English Gigaword Fifth Edition as our training data for learning fundamental word and topic representations. We randomly extract part of documents and construct our training set described as follows: we chose 100,000 documents, where each consists of more than 1,000 characters from subfolder ltw_eng (Los Angeles Times) containing 411,032 documents.”  Here, Niu discloses according to a preset document repository including multiple documents (“we chose 100,000 documents”)).
wherein training a word vector model along with word vectors of respective word materials comprises:  training the word vector model and word vectors of the respective word materials, according to the respective word materials in the word material repository and co-occurrence information of the word materials with other word materials in respective documents in the document repository (Niu, Section 2.2 “Word2Vec” Para 2, discloses “When training, given a word sequence D = {w1, ..., wM}, the learning objective functions are defined to maximize the following log-likelihoods, based on CBOW and Skip-gram, respectively.”  Here, Niu discloses training the word vector model and word vectors of the respective word materials (“when training”).  Niu, Section 2.2 Para 3, discloses “Here, in Equation (1a), wcxt indicates the context of the current word wi. In Equation (1b), k is the window size of context”.  Here, Niu discloses that Word2Vec is based upon the surrounding words (“context of the current word”, “window size of context”), and is thus according to the respective word materials in the word material repository and co-occurrence information of the word materials with other word materials.  Niu, Section 4, discloses:  “We use the English Gigaword Fifth Edition as our training data for learning fundamental word and topic representations. We randomly extract part of documents and construct our training set described as follows: we chose 100,000 documents, where each consists of more than 1,000 characters from subfolder ltw_eng (Los Angeles Times) containing 411,032 documents.”  Here, Niu discloses that the words come from respective documents in the document repository.)
storing word vectors of the respective word materials in the word material repository (Niu, Section 2.2 “Word2Vec”, discloses “Inspired by Neural Probabilistic Language Model (NPLM) (Bengio et al., 2003), Mikolov et al. (2013a) proposed Word2Vec including CBOW and Skip-gram for computing continuous vector representations of words from large data sets.”  Here, Niu discloses that Word2Vec produces word vectors of the respective word materials in the word material repository (“continuous vector representations of words from large data sets”).  Niu, Figure 2, discloses experimental results using these word vectors, and thus the word vectors must have been stored in memory.)
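The co-occurrence information that a Skip-gram model trains on, namely the words within a window of size k around each word, can be sketched as follows. The toy documents, the window size, and the helper `skipgram_pairs` are hypothetical; the sketch only collects (center, context) pairs per document and counts them across the repository, and does not train any vectors.

```python
from collections import Counter


def skipgram_pairs(tokens, k):
    """All (center, context) pairs whose context lies within k positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical two-document repository; windows never cross document boundaries.
documents = [
    ["battery", "drains", "fast"],
    ["screen", "drains", "battery"],
]
counts = Counter(pair for doc in documents for pair in skipgram_pairs(doc, k=1))
print(counts[("battery", "drains")], counts[("battery", "fast")])
```

With k=1, “battery” and “fast” never fall within one position of each other, so that pair is never counted.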

As per claim 5, the combination of Starr, Niu, and Liu as shown above teaches the method according to claim 1. Liu teaches further comprising obtaining topic identifiers corresponding to the respective word materials; (Liu, Intro Para 5, discloses:  “We employ the widely used latent Dirichlet allocation (LDA) (Blei, Ng, and Jordan 2003) to obtain word topics, and perform collapsed Gibbs sampling (Griffiths and Steyvers 2004) to iteratively assign latent topics for each word token.”  Here, Liu discloses obtaining topic identifiers (“latent topics”) corresponding to the respective word materials (“for each word token”)).
(Liu, pg 2420 “Optimization and Parameter Estimation” Para 2, discloses:  “Initialization is important for learning TWE models. In TWE-1, we first learn word embeddings using Skip-Gram. Afterwards, we initialize each topic vector with the average over all words assigned to this topics, and learn topic embeddings while keeping word embeddings unchanged.”  Here, Liu discloses “learn” “topic embeddings”, and thus suggests training topic vectors.  This is done according to word vectors of the respective word materials (“initialize each topic vector with the average over all words”).  The topics are corresponding to topic identifiers as shown above in Liu, Intro Para 5, where the topic identifiers are found using LDA.)

As per claim 6, claim 6 is a device claim corresponding to method claim 1.  The difference is that the device claim recites a memory and one or more processors.  (Starr, Para [0151], discloses “Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below”).  Claim 6 is rejected for the same reasons as claim 1.

As per claim 7, claim 7 is a device claim corresponding to method claim 2.  The difference is that the device claim recites a memory and one or more processors.  Claim 7 is rejected for the same reasons as claim 2.

As per claim 9, claim 9 is a device claim corresponding to method claim 4.  The difference is that the device claim recites a memory and one or more processors.  Claim 9 is rejected for the same reasons as claim 4.

As per claim 10, claim 10 is a device claim corresponding to method claim 5.  The difference is that the device claim recites a memory and one or more processors.  Claim 10 is rejected for the same reasons as claim 5.

As per claim 11, claim 11 is a non-transitory computer readable medium claim corresponding to method claim 1.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  (Starr, Para [0151], discloses “In general, a processor (e.g., a microprocessor) receives instructions, from a non-transitory computer-readable medium, (e.g., a memory, etc.), and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein”).  Claim 11 is rejected for the same reasons as claim 1.

As per claim 12, claim 12 is a non-transitory computer readable medium claim corresponding to method claim 2.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  Claim 12 is rejected for the same reasons as claim 2.

As per claim 14, claim 14 is a non-transitory computer readable medium claim corresponding to method claim 4.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  Claim 14 is rejected for the same reasons as claim 4.

As per claim 15, claim 15 is a non-transitory computer readable medium claim corresponding to method claim 5.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  Claim 15 is rejected for the same reasons as claim 5.

As per Claim 16, the combination of Starr, Niu, and Liu teaches the method according to claim 1.  Niu teaches wherein the correlation between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics comprises the cosine distances between the word vectors of the respective words and the topic vectors of the respective topics in the multiple topics. (Niu, Section 4.2, last bullet, discloses:  “Topic2Vec: topics and words are equally represented as the low-dimensional vectors, we can immediately calculate the cosine similarity between words and topics. For each topic, we select higher similarity words.”)
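Niu speaks of cosine similarity, while claim 16 recites cosine distances. Assuming the common convention (used, for example, by SciPy’s `scipy.spatial.distance.cosine`) that cosine distance equals one minus cosine similarity, selecting words of higher similarity is the same as selecting words of lower distance, as the following sketch with hypothetical vectors illustrates.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_distance(u, v):
    """Cosine distance under the convention distance = 1 - similarity."""
    return 1.0 - cosine_similarity(u, v)


# Hypothetical topic and word vectors in one shared space.
topic = [1.0, 0.2, 0.0]
word_vecs = {
    "refund": [0.0, 0.2, 0.9],
    "screen": [0.8, 0.3, 0.1],
    "battery": [0.9, 0.1, 0.0],
}

by_similarity = sorted(word_vecs, key=lambda w: cosine_similarity(topic, word_vecs[w]), reverse=True)
by_distance = sorted(word_vecs, key=lambda w: cosine_distance(topic, word_vecs[w]))
print(by_similarity == by_distance, by_distance)
```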

As per claim 17, claim 17 is a device claim corresponding to method claim 16.  The difference is that the device claim recites a memory and one or more processors.  Claim 17 is rejected for the same reasons as claim 16.

As per claim 18, claim 18 is a non-transitory computer readable medium claim corresponding to method claim 16.  The difference is that the non-transitory computer readable medium claim recites a non-transitory computer readable medium and a processor.  Claim 18 is rejected for the same reasons as claim 16.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Starr in view of Niu and Liu, further in view of Goldberg et al. (“word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method”; hereinafter “Goldberg”).
As per Claim 19, the combination of Starr, Niu, and Liu teaches the method according to claim 1 and training topic vectors of each topic (see Rejection to Claim 1).  However, the combination of Starr, Niu, and Liu does not teach wherein the training topic vectors of respective topics comprises: training topic vector of each topic with positive word materials belonging to the topic and negative word materials not belonging to the topic, to make the 
Goldberg teaches wherein the training [topic] word vectors of respective [topics] words comprises: training [topic] word vector of each [topic] word with positive word materials belonging to the [topic] word and negative word materials not belonging to the [topic] word, to make the correlation between the [topic] word vector and word vectors of the positive word materials greater than or equal to a preset correlation threshold and/or the correlation between the [topic] word vector and word vectors of the negative word materials smaller than the preset correlation threshold.  (Recall from above that Niu and Liu teach topic vectors.  Goldberg teaches using both positive and negative examples to train word vectors.  Goldberg, pg 3, discloses:  “We need a mechanism that prevents all the vectors from having the same value, by disallowing some (w, c) combinations. One way to do so, is to present the model with some (w, c) pairs for which p(D = 1|w, c; θ) must be low, i.e. pairs which are not in the data. This is achieved by generating the set D′ of random (w, c) pairs, assuming they are all incorrect (the name “negative sampling” stems from the set D′ of randomly sampled negative examples). The optimization objective now becomes:”

[Equation image: Goldberg's negative-sampling optimization objective]

Here, Goldberg teaches learning a word vector based on its context of surrounding words, as detailed in Goldberg Section 3:  “Generally speaking, for a sentence of $n$ words $w_1, \ldots, w_n$, contexts of a word $w_i$ comes from a window of size $k$ around the word: $C(w) = w_{i-k}, \ldots, w_{i-1}, w_{i+1}, \ldots, w_{i+k}$, where $k$ is a parameter.”  The Goldberg passage quoted above teaches that there are both positive examples and negative examples (“generating the set D′ of random (w, c) pairs, assuming they are all incorrect”).  When this concept is applied to Liu’s learning of topic vectors based on already-learned word vectors, the result is that one trains the topic vector based on positive examples, in which the word is part of the topic, and negative examples, in which the word is not part of the topic.)
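For illustration only, the combination reasoned above can be sketched in Python under stated assumptions. The word vectors are taken as already trained and held fixed (consistent with the reading of Liu above), and only the topic vector t is updated, by gradient ascent on an objective patterned on the form of Goldberg's negative-sampling objective: the sum over positive words w of log σ(t · v_w) plus the sum over negative words w of log σ(−t · v_w), where σ is the logistic sigmoid. "Correlation" is assumed to be cosine similarity. The function names, learning rate, epoch count, threshold value, and toy vectors are all hypothetical; this is not Goldberg's, Liu's, or the Applicant's implementation.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def train_topic_vector(positive: List[List[float]], negative: List[List[float]],
                       lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Gradient ascent on sum_P log sigmoid(t.v) + sum_N log sigmoid(-t.v).

    Word vectors stay fixed; only the topic vector t is learned.
    """
    dim = len(positive[0])
    t = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for v in positive:
            # d/dx log sigmoid(x) = 1 - sigmoid(x): pulls t toward v
            g = 1.0 - sigmoid(dot(t, v))
            for i in range(dim):
                grad[i] += g * v[i]
        for v in negative:
            # d/dx log sigmoid(-x) = -sigmoid(x): pushes t away from v
            g = -sigmoid(dot(t, v))
            for i in range(dim):
                grad[i] += g * v[i]
        t = [ti + lr * gi for ti, gi in zip(t, grad)]
    return t


def meets_threshold(t: Sequence[float], positive, negative,
                    threshold: float) -> bool:
    """True if every positive word is at or above the threshold and every
    negative word is below it (the 'and' branch of 'and/or')."""
    return (all(cosine(t, v) >= threshold for v in positive)
            and all(cosine(t, v) < threshold for v in negative))


positive = [[0.9, 0.1], [0.8, 0.3]]  # hypothetical words belonging to the topic
negative = [[0.1, 0.9], [0.2, 0.8]]  # hypothetical words not belonging to it
topic = train_topic_vector(positive, negative)
print(meets_threshold(topic, positive, negative, threshold=0.3))  # True
```

In this sketch the positive terms raise t · v_w for words belonging to the topic and the negative terms lower it for words not belonging to the topic, which is the positive/negative weighting relied upon in the motivation to combine.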
Niu, Liu, and Goldberg are analogous art because they are all in the same field of endeavor of machine learning and word embeddings.  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the word and topic embeddings of Niu and Liu with the negative sampling of Goldberg.  Negative sampling is well known in the art, and the resulting combination would positively weight good word-topic associations and negatively weight bad word-topic associations.  One would have been motivated to do so to produce more accurate embeddings (Goldberg, Section 4:  “Why does this produce good word representations?  Good question. We don’t really know. The distributional hypothesis states that words in similar contexts have similar meanings. The objective above clearly tries to increase the quantity vw · vc for good word-context pairs, and decrease it for bad ones. Intuitively, this means that words that share many contexts will be similar to each other (note also that contexts sharing many words will also be similar to each other). This is, however, very 
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710.  The examiner can normally be reached on M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/L.A.S./Examiner, Art Unit 2126 

/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145