Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed on 06/13/2019.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 08/19/2019 and 01/16/2020 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Specification
The disclosure is objected to because of the following informalities: "based on date generated" should be changed to read "based on data generated" on Page 5, Line 25.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


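Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.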

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6, 7, 8, 13, 14, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zwicklbauer et al. (“Robust and Collective Entity Disambiguation through Semantic Embeddings”; hereinafter Zwicklbauer) in view of Amiri et al. (“Learning Text Pair Similarity with Context-sensitive Autoencoders”; hereinafter Amiri).
As per Claim 1, Zwicklbauer teaches a text processing method based on ambiguous entity words, comprising (Zwicklbauer, Abstract, discloses “Entity disambiguation is the task of mapping ambiguous terms in natural-language text to its entities in a knowledge base”)
obtaining a context of a text to be disambiguated and at least two candidate entities represented by the text to be disambiguated, wherein the at least two candidate entities have different semantics (Zwicklbauer, Intro, discloses “Entity disambiguation refers to the task of linking phrases in a text, also called surface forms, to a set of candidate meanings, referred to as the knowledge base (KB), by resolving the correct semantic meaning of the surface form”.  Here we have “phrases in a text, also called surface forms” (i.e., context of a text), to be disambiguated (“resolving the correct semantic meaning of the surface form”) from among a “set of candidate meanings” (i.e., at least two candidate entities)).
generating a semantic vector of the context based on a trained word vector model (Zwicklbauer, Sec 5.1 Criteria 2, discloses “Context Similarity: We select the top x entities ranked by their context matching. To this end, we compute the cosine similarity between the entity-context embeddings and the Doc2Vec inferred context vector of the surface form.” Examiner’s Note:  Here, a semantic vector of the context is disclosed (“context vector of the surface form”), where the “surface form” is the context.  This vector is based on a trained word vector model, as “Doc2Vec” is disclosed.  Doc2Vec is well known in the art as an extension of the Word2Vec algorithm, which produces a trained word vector model.)
determining a similarity between the context and each candidate entity according to the semantic vector of the context and the first entity vector of each of the at least two candidate entities (Zwicklbauer, Sec 5.1 Criteria 2, discloses “We select the top x entities ranked by their context matching. To this end, we compute the cosine similarity between the entity-context embeddings and the Doc2Vec inferred context vector of the surface form.”  Zwicklbauer, Sec 6.1 Paragraph 2 Lines 4-6, discloses “To create the Doc2Vec entity-context embeddings, we parse the entities’ Wikipedia pages and remove all Wikipedia syntax elements as well as tables.” Examiner’s Note: Here, a “cosine similarity” is determined between the “context vector of the surface form” (i.e., semantic vector of the context) and the “entity-context embeddings” (i.e., each candidate entity).  The “entity-context embeddings” are disclosed to be Doc2Vec embeddings (i.e., first entity vector of each of the at least two candidate entities).  Also, note that the examiner is interpreting “candidate” in the generic sense, not in the specific sense of Zwicklbauer, who uses the similarity measure to find “candidates” for further processing.  Some of these entity-context embeddings will become future “candidates” of Zwicklbauer, and the similarity is computed on them regardless of whether Zwicklbauer considers them to be “candidates” at this point in time.)
and determining a target entity represented by the text to be disambiguated in the context from the at least two candidate entities according to the similarity between the context and each candidate entity. (Zwicklbauer, Sec 5.2 Final Paragraph, discloses “After constructing the disambiguation graph, we apply the PR algorithm and compute a relevance score for each entity candidate. Depending on the disambiguation task, our approach decides which entity candidate is the correct target entity or abstains if no appropriate candidate is available (cf. Algorithm 1).” Here, Zwicklbauer calculates a relevance score for each candidate.  In order for Zwicklbauer to even consider an entity as a candidate, the cosine similarity has to meet a given threshold.  Therefore, deciding the correct target entity is based on the similarity between the context and each candidate entity.)
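For illustration only, the similarity-then-select steps mapped above can be sketched as follows. This is a minimal sketch, assuming the context vector and each candidate's entity vector are already available as equal-length lists of floats; the candidate names and toy vectors are hypothetical. It shows a plain cosine-similarity argmax and does not reproduce Zwicklbauer's PageRank-based relevance scoring.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_target_entity(context_vec: List[float],
                       candidates: Dict[str, List[float]]) -> str:
    """Return the candidate whose entity vector is most similar to the context vector."""
    scores = {name: cosine(context_vec, vec) for name, vec in candidates.items()}
    return max(scores, key=scores.get)


# Hypothetical usage: an ambiguous surface form with two candidate entities.
context = [0.9, 0.1, 0.0]
candidates = {"Apple (company)": [1.0, 0.2, 0.0], "apple (fruit)": [0.0, 0.3, 1.0]}
print(pick_target_entity(context, candidates))
```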
However, Zwicklbauer does not explicitly teach generating a first entity vector of each of the at least two candidate entities based on a trained unsupervised neural network model, wherein text semantics of respective entities and a relationship between entities have been learned by the unsupervised neural network model.
Amiri teaches generating a first entity vector of each of the at least two candidate entities based on a trained unsupervised neural network model, wherein text semantics of respective entities and a relationship between entities have been learned by the unsupervised neural network model (Amiri, Abstract, discloses “We present a pairwise context-sensitive Autoencoder for computing text pair similarity. Our model encodes input text into context-sensitive representations and uses them to compute similarity between text pairs. Our model outperforms the state-of-the-art models in two semantic retrieval tasks and a contextual word similarity task. For retrieval, our unsupervised approach that merely ranks inputs with respect to the cosine similarity between their hidden representations shows comparable performance with the state-of-the-art supervised models and in some cases outperforms them.”  Examiner’s Note:  Amiri discloses an “autoencoder”, which is well known in the art as a type of unsupervised neural network, and Amiri discloses “our unsupervised approach”.  It is used to compute similarity between text pairs (i.e., two candidate entities).  The model has learned “semantic retrieval tasks” (i.e., semantics of respective entities) and a “contextual word similarity task” (i.e., relationship between entities).  Amiri, Section 2.1, goes on to disclose “Autoencoders are trained using a local unsupervised criterion (Vincent et al., 2010; Hinton and Salakhutdinov, 2006; Vincent et al., 2008). Specifically, the basic autoencoder in Figure 1(a) locally optimizes the hidden representation h of its input x such that h can be used to accurately reconstruct x”.  Here, h is a vector of lower dimension than the input (i.e., first entity vector of the at least two candidate entities).)
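To illustrate the basic autoencoder described in Amiri, Section 2.1 (a hidden representation h of input x that is optimized so that x can be reconstructed from h), a minimal sketch follows. It is a generic single-hidden-layer sigmoid autoencoder trained by gradient descent on squared reconstruction error; the layer sizes, learning rate, loss, and toy data are assumptions, and it is not Amiri's pairwise context-sensitive model.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyAutoencoder:
    """Encode x -> h (lower-dimensional), decode h -> x_hat; no labels are used."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.c = [0.0] * n_in

    def encode(self, x: List[float]) -> List[float]:
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h: List[float]) -> List[float]:
        return [sigmoid(sum(v * hj for v, hj in zip(row, h)) + ck)
                for row, ck in zip(self.V, self.c)]

    def loss(self, x: List[float]) -> float:
        x_hat = self.decode(self.encode(x))
        return 0.5 * sum((a - b) ** 2 for a, b in zip(x_hat, x))

    def train_step(self, x: List[float], lr: float = 0.5) -> None:
        h = self.encode(x)
        x_hat = self.decode(h)
        # Error at the output pre-activations (squared error through a sigmoid).
        d_out = [(xh - xi) * xh * (1 - xh) for xh, xi in zip(x_hat, x)]
        # Backpropagate to the hidden layer using V before V is updated.
        d_hid = [sum(d_out[k] * self.V[k][j] for k in range(len(d_out))) * h[j] * (1 - h[j])
                 for j in range(len(h))]
        for k in range(len(self.V)):
            for j in range(len(h)):
                self.V[k][j] -= lr * d_out[k] * h[j]
            self.c[k] -= lr * d_out[k]
        for j in range(len(self.W)):
            for i in range(len(x)):
                self.W[j][i] -= lr * d_hid[j] * x[i]
            self.b[j] -= lr * d_hid[j]


# Hypothetical usage: compress four one-hot inputs through a 2-unit hidden layer.
data = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
ae = TinyAutoencoder(n_in=4, n_hidden=2)
loss_before = sum(ae.loss(x) for x in data)
for _ in range(500):
    for x in data:
        ae.train_step(x)
loss_after = sum(ae.loss(x) for x in data)
```

After training, `ae.encode(x)` plays the role of the hidden representation h for each input.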
Zwicklbauer and Amiri are analogous art because they are in the field of endeavor of natural language processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the entity linking of Zwicklbauer, with the context-sensitive autoencoder of 

As per Claim 6, the combination of Zwicklbauer and Amiri teaches the method according to claim 1 as shown above, as well as before generating the semantic vector of the context based on the trained word vector model, further comprising: generating training corpus corresponding to various application scenes (Amiri, Section 4.1, discloses a training corpus for three different application scenes: “We use three datasets: “SCWS” a word similarity dataset with ground-truth labels on similarity of pairs of target words in sentential context from Huang et al. (2012); “qAns” a TREC QA dataset with ground-truth labels for semantically relevant questions and (single-sentence) answers from Wang et al. (2007); and “qSim” a community QA dataset crawled from Stack Exchange with ground-truth labels for semantically equivalent questions from Dos Santos et al. (2015).”)
and performing word vector model training by using the training corpus corresponding to various application scenes, to obtain word vector models respectively applicable to various application scenes (Amiri, Section 4.1, discloses obtaining word vector models in specific ways that are applicable to the three application scenes: “We consider local and global context for target words in SCWS. The local context of a target word is its ten neighboring words (five before and five after) (Huang et al., 2012), and its global context is a short paragraph that contains the target word (surrounding sentences). We compute average word embeddings to create context vectors for target words. Also, we consider question title and body and answer text as input in qSim and qAns and use NMF to create global context vectors for questions and answers”)
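As a sketch of the context-vector construction quoted from Amiri, Section 4.1 (a local context of five words before and five after the target word, and averaged word embeddings), the following keeps one hypothetical embedding table per application scene. The scene names, embedding values, and whitespace tokenization are assumptions for illustration; the training of each scene's word vector model is not shown.

```python
from typing import Dict, List


def local_context(tokens: List[str], index: int, window: int = 5) -> List[str]:
    """Up to `window` tokens before and after tokens[index], excluding the target itself."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


def average_embedding(words: List[str], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean of the embeddings of in-vocabulary words; zero vector if none are known."""
    known = [table[w] for w in words if w in table]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def context_vector(scene: str, tokens: List[str], index: int,
                   models: Dict[str, Dict[str, List[float]]], dim: int,
                   window: int = 5) -> List[float]:
    """Build the context vector with the word vector model trained for `scene`."""
    return average_embedding(local_context(tokens, index, window), models[scene], dim)


# Hypothetical per-scene embedding tables (one model per application scene).
models = {
    "finance": {"raised": [1.0, 0.0], "interest": [0.0, 1.0]},
    "geography": {"river": [1.0, 1.0]},
}
tokens = "the bank raised interest rates".split()
print(context_vector("finance", tokens, 1, models, dim=2, window=2))
```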

As per Claim 7, the combination of Zwicklbauer and Amiri teaches the method according to claim 1 as shown above, as well as after generating the first entity vector of each of the at least two candidate entities, further comprising: determining a similarity between each two different candidate entities based on the first entity vector of each of the at least two candidate entities (Amiri, Section 3, discloses determining a similarity between each two different candidate entities based on their first entity vectors h1n and h2n: “In unsupervised settings, given a pair of input texts with their corresponding context vectors, (x1, cx1) and (x2, cx2), we determine their semantic similarity score by computing the cosine similarity between their hidden representations h1n and h2n respectively.”)
and performing entity relationship mining or entity recommendation based on the similarity between each two different candidate entities. (Amiri, Section 4.4, discloses using this similarity (“relevance score”) to make an entity recommendation (“retrieve correct answers from a set of candidates”):  “We evaluate the performance of our model in the answer ranking task in which a model should retrieve correct answers from a set of candidates for test questions. For this evaluation, we rank answers with respect to each test question according to the “relevance score” between question and each answer.”)
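The pairwise-similarity and ranking steps mapped above can be illustrated with a minimal sketch. It assumes each candidate entity already has a vector (for example, a hidden representation from the unsupervised model); the entity names, vectors, and the top-k recommendation rule are hypothetical and are not taken from Amiri.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def pairwise_similarities(vectors: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Similarity for each unordered pair of distinct entities."""
    return {(a, b): cosine(vectors[a], vectors[b]) for a, b in combinations(sorted(vectors), 2)}


def recommend(entity: str, vectors: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Rank the other entities by similarity to `entity` and return the top k."""
    others = [e for e in vectors if e != entity]
    others.sort(key=lambda e: cosine(vectors[entity], vectors[e]), reverse=True)
    return others[:k]


# Hypothetical entity vectors.
vectors = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
print(recommend("A", vectors, k=1))
```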

As per Claim 8, Claim 8 is a device claim corresponding to method claim 1.  The difference is that it recites one or more processors and a memory.  (Zwicklbauer, Section 6.1 top of page 430, discloses one or more processors and a memory: “The Word2Vec training time takes 90 minutes on our personal computer with a 4x3.4GHz Intel Core i7 processor and 16 GB RAM (1 corpus iteration)”).  Claim 8 is rejected for the same reasons as Claim 1.

As per Claim 13, Claim 13 is a device claim corresponding to method claim 6.  The difference is that it recites one or more processors and a memory.  Claim 13 is rejected for the same reasons as Claim 6.

As per Claim 14, Claim 14 is a device claim corresponding to method claim 7.  The difference is that it recites one or more processors and a memory.  Claim 14 is rejected for the same reasons as Claim 7.

As per Claim 15, Claim 15 is a non-transitory computer readable storage medium claim corresponding to method claim 1.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  (Zwicklbauer, Section 6.1 top of page 430, discloses a non-transitory computer readable storage medium and a processor: “The Word2Vec training time takes 90 minutes on our personal computer with a 4x3.4GHz Intel Core i7 processor and 16 GB RAM (1 corpus iteration)”).  Claim 15 is rejected for the same reasons as Claim 1.

As per Claim 20, Claim 20 is a non-transitory computer readable storage medium claim corresponding to method claim 6.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 20 is rejected for the same reasons as Claim 6.

Claims 2, 5, 9, 12, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Zwicklbauer and Amiri further in view of Mikolov et al. (“Distributed Representations of Words and Phrases and their Compositionality”; hereinafter Mikolov).

As per Claim 2, the combination of Zwicklbauer and Amiri teaches the method according to claim 1 as shown above, as well as before generating the first entity vector of each of the at least two candidate entities based on the trained unsupervised neural network model, further comprising:
generating a second entity vector of each entity in a preset knowledge base by using a trained [supervised] neural network model, wherein semantics of respective entities have been learned by the supervised neural network model (Amiri, Intro Para 2-3, discloses “We represent context information as low dimensional vectors that will be injected to deep autoencoders. To the best of our knowledge, this is the first work that enables integrating context into autoencoders. In representation learning, context may appear in various forms. For example, the context of a current sentence in a document could be either its neighboring sentences (Lin et al., 2015; Wang and Cho, 2015), topics associated with the sentence (Mikolov and Zweig, 2012; Le and Mikolov, 2014), the document that contains the sentence (Huang et al., 2012), as well as their combinations (Ji et al., 2016).”  Examiner’s Note:  Here, Amiri discloses representing “context information as low dimensional vectors” (i.e., a second entity vector of each entity).  Amiri goes on to disclose that the context “may appear in various forms”.  One of the forms suggested is Doc2Vec, as Amiri recites “topics associated with the sentence…Le and Mikolov, 2014”.  Doc2Vec uses a neural network; therefore, the second entity vector of each entity is generated using a trained neural network model.  Doc2Vec in its default form is unsupervised, but Mikolov, one of its authors, provides a supervised addition to this suite of tools, which is described below.  Amiri, Section 4.1, also discloses the entities coming from a knowledge base:  “We use three datasets: “SCWS” a word similarity dataset with ground-truth labels on similarity of pairs of target words in sentential context from Huang et al. (2012); “qAns” a TREC QA dataset with ground-truth labels for semantically relevant questions and (single-sentence) answers from Wang et al. (2007); and “qSim” a community QA dataset crawled from Stack Exchange with ground-truth labels for semantically equivalent questions from Dos Santos et al. (2015). Table 1 shows statistics of these datasets. To enable direct comparison with previous work, we use the same training, development, and test data provided by Dos Santos et al. (2015) and Wang et al. (2007) for qSim and qAns respectively and the entire data of SCWS (in unsupervised setting).”)
initializing first entity vectors of respective entities output by the unsupervised neural network model based on the second entity vector of each entity in the preset knowledge base (Amiri, Intro, discloses “We represent context information as low dimensional vectors that will be injected to deep autoencoders. To the best of our knowledge, this is the first work that enables integrating context into autoencoders.”  Here Amiri discloses that “low dimensional vectors” (i.e., second entity vector of each entity) are injected to deep autoencoders (i.e., unsupervised neural network model)).
and training the initialized unsupervised neural network model based on an association relationship between entities.  (Amiri, Abstract, discloses “We present a pairwise context-sensitive Autoencoder for computing text pair similarity. Our model encodes input text into context-sensitive representations and uses them to compute similarity between text pairs”.  Here, Amiri discloses training an autoencoder (i.e., unsupervised neural network model) based on similarity between text pairs (i.e., association relationship between entities).)
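The initialize-then-train sequence recited in claim 2 can be pictured with the following minimal sketch. It assumes the second entity vectors are supplied as a dictionary, initializes the first entity vectors by copying them (random values for entities without a pretrained vector), and then applies a deliberately simplified "pull associated entities together" update. The update rule, names, and values are hypothetical stand-ins chosen for illustration; they are not Amiri's autoencoder objective.

```python
import random
from typing import Dict, List, Tuple


def init_from_pretrained(entities: List[str], pretrained: Dict[str, List[float]],
                         dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Initialize first entity vectors from second entity vectors where available."""
    rng = random.Random(seed)
    table: Dict[str, List[float]] = {}
    for e in entities:
        if e in pretrained:
            table[e] = list(pretrained[e])  # copy so training never mutates the source table
        else:
            table[e] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return table


def pull_associated(table: Dict[str, List[float]], pairs: List[Tuple[str, str]],
                    lr: float = 0.1) -> None:
    """Hypothetical update: move each associated pair of vectors toward each other."""
    for a, b in pairs:
        va, vb = table[a], table[b]
        delta = [y - x for x, y in zip(va, vb)]
        table[a] = [x + lr * d for x, d in zip(va, delta)]
        table[b] = [y - lr * d for y, d in zip(vb, delta)]


def squared_distance(u: List[float], v: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical second entity vectors from the supervised model.
pretrained = {"Paris": [1.0, 0.0], "France": [0.0, 1.0]}
table = init_from_pretrained(["Paris", "France", "Seine"], pretrained, dim=2)
before = squared_distance(table["Paris"], table["France"])
pull_associated(table, [("Paris", "France")])
after = squared_distance(table["Paris"], table["France"])
```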
However, the combination of Zwicklbauer and Amiri does not explicitly teach that the neural network used to generate a second entity vector of each entity is supervised.
Mikolov teaches that the neural network used to generate a second entity vector of each entity is supervised. (As discussed above, Amiri discloses using Le and Mikolov’s Doc2Vec algorithm (based on the Word2Vec algorithm) to produce the second entity vector of each entity.  Mikolov, Section 2.2, discloses “We define Negative sampling (NEG) by the objective (Eq 4) which is used to replace every log P(wO|wI) term in the Skip-gram objective. Thus the task is to distinguish the target word wO from draws from the noise distribution Pn(w) using logistic regression, where there are k negative samples for each data sample”  Examiner’s Note:  The “negative sampling” disclosed by Mikolov is a supervised process, using data that is labelled as being “noise” that should generate a negative result from the algorithm.  This labelled data is used to train the neural network, and the process is therefore supervised.)
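The negative sampling objective cited from Mikolov, Section 2.2, scores one observed (input, output) word pair against k noise words with logistic regression: log σ(v'_wO · v_wI) + Σ_{i=1..k} log σ(-v'_wi · v_wI), where the expectation over Pn(w) is approximated by the k drawn samples. The sketch below evaluates that objective for given vectors; the toy vectors are hypothetical, and the drawing of noise words and the gradient updates are not shown.

```python
import math
from typing import List


def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def neg_objective(v_in: List[float], v_out: List[float],
                  negatives: List[List[float]]) -> float:
    """NEG objective (to be maximized) for one positive pair and k noise samples."""
    positive_term = log_sigmoid(dot(v_out, v_in))          # observed pair treated as label 1
    noise_terms = sum(log_sigmoid(-dot(v_neg, v_in)) for v_neg in negatives)  # noise as label 0
    return positive_term + noise_terms


# Hypothetical vectors: a well-separated case versus a badly separated one.
good = neg_objective([1.0, 0.0], [2.0, 0.0], [[-2.0, 0.0]])
bad = neg_objective([1.0, 0.0], [-2.0, 0.0], [[2.0, 0.0]])
```

The "labels" in the comments reflect how the objective treats the observed pair and the noise draws, which is the basis of the examiner's characterization above.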
Zwicklbauer, Amiri, and Mikolov are analogous art because they are in the field of endeavor of natural language processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the entity linking with autoencoder of the combination of Zwicklbauer and Amiri, with the negative sampling of Mikolov. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the accuracy of word representations (Mikolov, Section 3 Para 2 Lines 5-8, discloses “The table shows that Negative Sampling outperforms the Hierarchical Softmax on the analogical reasoning task, and has even slightly better performance than the Noise Contrastive Estimation. The subsampling of the frequent words improves the training speed several times and makes the word representations significantly more accurate.”).

As per Claim 5, the combination of Zwicklbauer, Amiri, and Mikolov teaches the method according to claim 2 as shown above, as well as wherein training the initialized unsupervised neural network model based on an association relationship between entities comprises: training the initialized unsupervised neural network model based on entities in the knowledge base that have the association relationship, and/or based on entities in a search log that have a co-occurrence relationship (Amiri, Abstract, discloses “We present a pairwise context-sensitive Autoencoder for computing text pair similarity. Our model encodes input text into context-sensitive representations and uses them to compute similarity between text pairs.”  Here, Amiri trains an autoencoder (i.e., unsupervised neural network model) based on similarity between text pairs (i.e., entities that have an association relationship).  Amiri, Section 4.1, discloses at least one knowledge base: “We use three datasets: “SCWS” a word similarity dataset with ground-truth labels on similarity of pairs of target words in sentential context from Huang et al. (2012); “qAns” a TREC QA dataset with ground-truth labels for semantically relevant questions and (single-sentence) answers from Wang et al. (2007); and “qSim” a community QA dataset crawled from Stack Exchange with ground-truth labels for semantically equivalent questions from Dos Santos et al. (2015). Table 1 shows statistics of these datasets. To enable direct comparison with previous work, we use the same training, development, and test data provided by Dos Santos et al. (2015) and Wang et al. (2007) for qSim and qAns respectively and the entire data of SCWS (in unsupervised setting).”)
and determining that training the unsupervised neural network model is finished when a distance between first entity vectors output by the unsupervised neural network model corresponds to a closeness between entities.  (Amiri, Abstract, discloses “We present a pairwise context-sensitive Autoencoder for computing text pair similarity”, where an autoencoder is an unsupervised neural network.  Amiri, Section 3, discloses “In unsupervised settings, given a pair of input texts with their corresponding context vectors, (x1, cx1) and (x2, cx2), we determine their semantic similarity score by computing the cosine similarity between their hidden representations h1n and h2n respectively”.  Here, the “hidden representations” are first entity vectors output by the unsupervised neural network.  Amiri discloses that training is finished when a distance between the vectors (“cosine similarity”) indicates closeness between the entities since, per the Abstract, this is the stated goal of the neural network:  “Autoencoder for computing text pair similarity”.)
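One way to picture the claimed stopping condition (vector distances corresponding to closeness between entities) is a ranking-agreement check: every pair known to be close must be more similar than every pair known to be unrelated. The sketch below implements that check. The criterion itself, the margin parameter, and the example pairs are hypothetical assumptions for illustration; this specific test is not disclosed in Amiri.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def closeness_reached(vectors: Dict[str, List[float]],
                      close_pairs: List[Tuple[str, str]],
                      far_pairs: List[Tuple[str, str]],
                      margin: float = 0.0) -> bool:
    """True when the least similar close pair still beats the most similar far pair."""
    worst_close = min(cosine(vectors[a], vectors[b]) for a, b in close_pairs)
    best_far = max(cosine(vectors[a], vectors[b]) for a, b in far_pairs)
    return worst_close > best_far + margin


# Hypothetical first entity vectors output by the model at some training step.
vectors = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
print(closeness_reached(vectors, close_pairs=[("A", "B")], far_pairs=[("A", "C")]))
```

In a training loop, such a check could be evaluated periodically and training stopped once it returns True.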

As per Claim 9, Claim 9 is a device claim corresponding to method claim 2.  The difference is that it recites one or more processors and a memory.  (Zwicklbauer, Section 6.1 top of page 430, discloses one or more processors and a memory: “The Word2Vec training time takes 90 minutes on our personal computer with a 4x3.4GHz Intel Core i7 processor and 16 GB RAM (1 corpus iteration)”).  Claim 9 is rejected for the same reasons as Claim 2.

As per Claim 12, Claim 12 is a device claim corresponding to method claim 5.  The difference is that it recites one or more processors and a memory.  Claim 12 is rejected for the same reasons as Claim 5.

As per Claim 16, Claim 16 is a non-transitory computer readable storage medium claim corresponding to method claim 2.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  (Zwicklbauer, Section 6.1 top of page 430, discloses a non-transitory computer readable storage medium and a processor: “The Word2Vec training time takes 90 minutes on our personal computer with a 4x3.4GHz Intel Core i7 processor and 16 GB RAM (1 corpus iteration)”).  Claim 16 is rejected for the same reasons as Claim 2.

As per Claim 19, Claim 19 is a non-transitory computer readable storage medium claim corresponding to method claim 5.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 19 is rejected for the same reasons as Claim 5.

Claims 3, 4, 10, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Zwicklbauer, Amiri, and Mikolov further in view of Bilgin et al. (“Sentiment Analysis on Twitter data with Semi-Supervised Doc2Vec”; hereinafter Bilgin).
As per Claim 3, the combination of Zwicklbauer, Amiri, and Mikolov teaches the method according to claim 2 as shown above, as well as before generating the second entity vector of each entity in the preset knowledge base by using the trained supervised neural network model, further comprising:
generating negative examples of the training samples based on all text description information of respective entities in the knowledge base (Mikolov, Section 2.2, discloses “there are k negative samples for each data sample”.  Mikolov, Section 2.3, discloses “In very large corpora, the most frequent words can easily occur hundreds of millions of times (e.g., “in”, “the”, and “a”). Such words usually provide less information value than the rare words. We chose this subsampling formula because it aggressively subsamples words whose frequency is greater than t while preserving the ranking of the frequencies”.  Here, Mikolov discloses generating negative samples (“k negative samples…subsampling formula”) based on “very large corpora”.  A corpus can be considered a type of knowledge base, and Mikolov samples words from the entire corpus (i.e., based on all text description information of respective entities in the knowledge base)).
	However, the combination of Zwicklbauer, Amiri, and Mikolov does not explicitly teach generating positive examples of training samples based on an attribute of each entity in the knowledge base and a keyword extracted from introduction information of each entity; training the supervised neural network model based on the training samples, wherein the supervised neural network model comprises a first layer configured to input a semantic vector of text in the training samples and a second layer configured to predict an entity described by the text input in the first layer, and a parameter of the second layer configured to generate the second entity vector; and determining that training the supervised neural network model is finished when the text in the training samples input to the first layer is configured to describe an entity output by the second layer.
	Bilgin teaches generating positive examples of training samples based on an attribute of each entity in the knowledge base and a keyword extracted from introduction information of each entity (Bilgin, Abstract, discloses “In this study, it was aimed to perform sentiment analysis on Turkish and English Twitter messages using Doc2Vec. The Doc2Vec algorithm was run on Positive, Negative and Neutral tagged data using the Semi-Supervised learning method and the results were recorded”.  Here, Doc2Vec is used with a semi-supervised learning method.  The positive examples are based on a tag (i.e., attribute) of a collection of Twitter messages (i.e., entities in a knowledge base), the knowledge base being disclosed by Bilgin in Sec II D:  “English data set contains 1774 unlabelled and 58817 labeled data sets.”  The “Positive”, “Negative”, and “Neutral” tags can both be considered an attribute and a keyword extracted from introduction information.  The sentiment of a Twitter message can be broadly considered introduction information, i.e., “the following message will be a positive message”).  Also note that while “positive” in Bilgin refers to the sentiment of text, it can also be interpreted as in the instant application, where “positive” means correlation with an entity.  This interpretation also applies to Bilgin, as a tweet with positive sentiment has a correlation with an entity (i.e., “positive tweet”).
	training the supervised neural network model based on the training samples, wherein the supervised neural network model comprises a first layer configured to input a semantic vector of text in the training samples and a second layer configured to predict an entity described by the text input in the first layer, and a parameter of the second layer configured to generate the second entity vector (Bilgin, Section II B Doc2Vec, discloses Figure 3.  Figure 3 shows the DM embodiment of the Doc2Vec algorithm, wherein the input layer consists of a Document ID as well as a plurality of words denoted by V(t +/- n).  Bilgin does not specify whether these words are individually represented by pre-trained Word2Vec vectors that have learned semantic information.  However, even if they are simply one-hot representations of words, the entire plurality of words input to the neural network comprises a vector input to the neural network [word1 word2 word3…wordn], which inherently comprises semantic information about the document, as learning an overall semantic representation of a block of text is the underlying goal of Doc2Vec.  The output of Doc2Vec is another vector (i.e., second entity vector).  In Bilgin, this output is used to predict the overall sentiment, positive or negative (i.e., predict an entity “positive post” or “negative post”), described by a tweet (i.e., the text input)).
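The DM (distributed memory) arrangement shown in Bilgin's Figure 3 can be sketched as a forward pass: the document vector and the context word vectors form the input layer, they are averaged into a hidden vector, and a second layer scores every output word through a softmax. This is a minimal sketch assuming averaging (rather than concatenation) and tiny hypothetical vectors; training, and Bilgin's downstream sentiment classifier, are not shown.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pv_dm_predict(doc_vec: List[float], context_words: List[str],
                  word_vecs: Dict[str, List[float]],
                  out_matrix: Dict[str, List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Average the document vector with context word vectors, then score each vocab word."""
    inputs = [doc_vec] + [word_vecs[w] for w in context_words]
    hidden = [sum(col) / len(inputs) for col in zip(*inputs)]  # first-layer output
    # Second layer: each row of out_matrix is that layer's parameter vector for one output.
    scores = [sum(o * h for o, h in zip(out_matrix[w], hidden)) for w in vocab]
    return dict(zip(vocab, softmax(scores)))


# Hypothetical toy parameters.
probs = pv_dm_predict(
    doc_vec=[1.0, 0.0],
    context_words=["the"],
    word_vecs={"the": [1.0, 0.0], "cat": [0.0, 1.0]},
    out_matrix={"sat": [2.0, 0.0], "ran": [0.0, 2.0]},
    vocab=["sat", "ran"],
)
```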
	determining that training the supervised neural network model is finished when the text in the training samples input to the first layer is configured to describe an entity output by the second layer (Bilgin, Section II A, discloses “It is one of the learning methods used in machine learning. The input data comprises large amount of unlabeled data and small quantities of labeled data. This method is generally useful when the labeled data is low and the unlabeled data is readily available. Semi-Supervised learning is seen in Figure 1.”  Here, Bilgin discloses that learning is semi-supervised, the supervision being accomplished by labelled data.  The label on the labelled data is the sentiment (positive, negative, neutral).  Supervised means that the labels are used to accomplish training, and therefore training must accomplish the task of sufficiently assigning these labels (i.e., determining that training is finished when the text input in the first layer is configured to describe an entity output in the second layer).  Bilgin discloses that correctly classified samples are the end goal of the trained neural network in Section III: “Accuracy metric, the most popular and simple method used to measure model performance, is the accuracy rate of the model. The number of correctly classified samples (TP + TN) is the ratio of the total sample counts (TP + TN + FP + FN). The error rate is 1 of this value. In other words, the number of misclassified samples (FP + FN) is the ratio of the total number of samples (TP + TN + FP + FN)”.)
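The accuracy metric quoted from Bilgin, Section III, is the fraction of correctly classified samples, (TP + TN) / (TP + TN + FP + FN), and the error rate is its complement, (FP + FN) / (TP + TN + FP + FN). A minimal sketch with hypothetical counts:

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples")
    return (tp + tn) / total


def error_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of misclassified samples; equals 1 - accuracy."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples")
    return (fp + fn) / total


# Hypothetical confusion counts: 40 TP, 50 TN, 5 FP, 5 FN.
print(accuracy(40, 50, 5, 5), error_rate(40, 50, 5, 5))
```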
Zwicklbauer, Amiri, Mikolov, and Bilgin are analogous art because they are in the field of endeavor of natural language processing. 


As per Claim 4, the combination of Zwicklbauer, Amiri, Mikolov, and Bilgin teaches the method according to claim 3 as shown above, as well as wherein generating the negative examples of the training samples based on all text description information of respective entities in the knowledge base comprises: performing word-segmentation on all text description information of respective entities in the knowledge base, and performing term frequency statistics on terms obtained by the word-segmentation; and performing negative sampling on each term based on a term frequency of each term, to obtain the negative examples of the training samples. (Mikolov, Section 2.2, discloses negative sampling:  “Thus the task is to distinguish the target word w_O from draws from the noise distribution P_n(w) using logistic regression, where there are k negative samples for each data sample.”  Mikolov, Section 2.3, then goes on to describe the selection of words for negative sampling:  “In very large corpora, the most frequent words can easily occur hundreds of millions of times (e.g., “in”, “the”, and “a”). Such words usually provide less information value than the rare words. To counter the imbalance between the rare and frequent words, we used a simple subsampling approach: each word w_i in the training set is discarded with probability computed by the formula P(w_i) = 1 − sqrt(t / f(w_i)) (Eq. 5), where f(w_i) is the frequency of word w_i and t is a chosen threshold, typically around 10^-5. We chose this subsampling formula because it aggressively subsamples words whose frequency is greater than t while preserving the ranking of the frequencies.”  Here, Mikolov discloses “each word w_i” (i.e., word segmentation) “in very large corpora” (i.e., all text description information of respective entities in the knowledge base) using the function “where f(w_i) is the frequency of word w_i” (i.e., performing term frequency statistics on terms obtained by the word segmentation).  
Mikolov also discloses “subsamples words whose frequency is greater than t while preserving the ranking of the frequencies” (i.e., performing negative sampling on each term based on a term frequency of each term, to obtain the negative examples)).
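For illustration only, the following Python sketch (not drawn from the application or from any cited reference) shows how the steps mapped above fit together: word segmentation of entity descriptions, term frequency statistics, the discard probability of Mikolov Eq. 5, and drawing negative terms from a frequency-based noise distribution. The whitespace tokenizer, the function names, and the sample descriptions are hypothetical; the 3/4 exponent follows the noise distribution U(w)^(3/4)/Z that Mikolov, Section 2.2, reports using, and is an assumption as applied to the claimed method.

```python
import math
import random
from collections import Counter


def segment(text):
    # Hypothetical stand-in for word segmentation: lowercase and split on whitespace.
    return text.lower().split()


def term_frequencies(descriptions):
    # Term frequency statistics over all entity description texts (relative frequencies).
    counts = Counter()
    for text in descriptions:
        counts.update(segment(text))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def discard_probability(freq, t=1e-5):
    # Mikolov Eq. 5: P(w_i) = 1 - sqrt(t / f(w_i)); clipped at 0 for words rarer than t.
    return max(0.0, 1.0 - math.sqrt(t / freq))


def noise_distribution(freqs, power=0.75):
    # Unigram frequencies raised to the 3/4 power and renormalized (assumed P_n(w)).
    weights = {term: f ** power for term, f in freqs.items()}
    z = sum(weights.values())
    return {term: w / z for term, w in weights.items()}


def draw_negatives(noise, k, rng):
    # Draw k negative terms, each chosen with probability given by the noise distribution.
    terms = list(noise)
    probs = [noise[term] for term in terms]
    return rng.choices(terms, weights=probs, k=k)


if __name__ == "__main__":
    descriptions = ["the apple is a fruit", "apple inc makes the iphone"]
    freqs = term_frequencies(descriptions)
    noise = noise_distribution(freqs)
    print(draw_negatives(noise, k=5, rng=random.Random(0)))
```

In this sketch, more frequent terms (e.g., “the”) are both more likely to be discarded under Eq. 5 and more likely to be drawn as negatives, which is the frequency dependence relied upon in the mapping above.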

As per Claim 10, Claim 10 is a device claim corresponding to method claim 3.  The difference is that it recites one or more processors and a memory.  (Zwicklbauer, Section 6.1, top of page 430, discloses one or more processors and a memory: “The Word2Vec training time takes 90 minutes on our personal computer with a 4x3.4GHz Intel Core i7 processor and 16 GB RAM (1 corpus iteration)”).  Claim 10 is rejected for the same reasons as Claim 3.

As per Claim 11, Claim 11 is a device claim corresponding to method claim 4.  The difference is that it recites one or more processors and a memory.  Claim 11 is rejected for the same reasons as Claim 4.

As per Claim 17, Claim 17 is a non-transitory computer readable storage medium claim corresponding to method claim 3.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  (Zwicklbauer, Section 6.1, top of page 430, discloses a non-transitory computer readable storage medium and a processor: “The Word2Vec training time takes 90 minutes on our personal computer with a 4x3.4GHz Intel Core i7 processor and 16 GB RAM (1 corpus iteration)”).  Claim 17 is rejected for the same reasons as Claim 3.

As per Claim 18, Claim 18 is a non-transitory computer readable storage medium claim corresponding to method claim 4.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 18 is rejected for the same reasons as Claim 4.

	Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Zhang et al. (CN 107102989 A) discloses a disambiguation method based on semantic feature vectors constructed according to text information in a knowledge base, wherein the candidate with the maximum cosine similarity is chosen as the target entity.
Perianin et al. ("Exploiting Synonymy and Hypernymy to Learn Efficient Meaning Representations") discloses using an unsupervised neural network (Autoencoder) to learn representations for each meaning of a word.
Francis-Landau et al. ("Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks") discloses inputting Word2Vec vectors into a Convolutional Neural Network to produce vectors that capture semantic correspondence between a context and a target entity.
Meij et al. (US 2016/0189047 A1) discloses entity linking using Word2Vec vectors with negative sampling and calculating the cosine distance between vectors to identify the target entity.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710.  The examiner can normally be reached on M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.





/L.A.S./Examiner, Art Unit 2126

/ANN J LO/Supervisory Patent Examiner, Art Unit 2126