DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2021-10-14 has been entered.  The status of the claims is as follows:
Claims 1, 3-8, 10-15, and 17-20 remain pending in the application.
Claims 2, 9, and 16 are cancelled.
Claims 1, 3, 6-8, 10, 12-15, 17, and 19-20 are amended.
Examiner Notes on Understanding of the Invention
	As part of the RCE process, Examiner has again read the Specification several times as with the first action.  Examiner believes it useful to present Examiner’s understanding of the current invention.  The Specification, in several places states that “at least two candidate entities are input to the trained unsupervised neural network model”, but it is not clear in what form these candidate entities “inputs” are.  Since a “semantic vector of the context” is input to the unsupervised neural network (hereinafter “UNN”) model to produce a first entity vector of the context of the text to be disambiguated, and Examiner assumes that inputs to the UNN 






Response to Arguments Regarding 35 U.S.C. 103
Applicant’s argument on Pg. 15 that “Zwicklbauer does not disclose training an unsupervised neural network”, and “does not learn the relationship between respective entities”, therefore “does not disclose how to obtain the unsupervised neural network model in which the text semantics of respective entities and the relationship between respective entities have been learned” has been fully considered but is not persuasive.  Examiner has not alleged that Zwicklbauer taught these limitations, as the combination with Amiri was relied upon to teach the unsupervised neural network that learns the relationship between respective entities.  One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
Applicant’s arguments on Pg. 15-16 detailing the steps of Zwicklbauer and concluding that “However, Zwicklbauer's above steps for calculating the similarity do not involve any feature for inputting both the context vector and the entity vector to the same unsupervised neural network (the unsupervised neural network has learned the text semantics of respective entities and the relationship between the entities) to calculate the similarity in the same vector space” have been fully considered but are not persuasive.  Examiner has not alleged that Zwicklbauer taught these limitations alone, as the combination with Amiri was relied upon to teach inputting vectors to the unsupervised neural network.  One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 
Applicant’s argument on Page 16-17 that “Amiri does not consider the semantic disambiguation between at least two words having different semantics” has been fully considered but is not persuasive.  Examiner has not alleged that Amiri taught this limitation alone, as this was established by Zwicklbauer.  Zwicklbauer discloses disambiguation of a polysemous word between at least two words having different semantics, and Amiri discloses an unsupervised neural network that can reveal similarity between two words.  The combination of these arts was relied upon to read upon instant claim 1.  One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
Applicant’s argument on Pages 17-18 that “although both the present invention and Amiri uses the Doc2vec technology, the established neural network models are completely different” has been fully considered, but is not persuasive.  In response to applicant's argument that the references fail to show certain features of applicant's invention, it is noted that the features upon which applicant relies (i.e., a structure of the claimed supervised neural network that distinguishes it over the prior art of record) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Applicant’s argument on Page 18 that, regarding Examiner’s mapping for “knowledge base” from Amiri, “It is obvious that, none of them are entity knowledge base. On the other hand, as described in paragraph [0048] of the Specification, "each entity in the knowledge base entity knowledge base”.  The requirements cited by Applicant are even met by Amiri’s data sets, which can reasonably be interpreted as having attributes and attribute values (labels, questions, answers, etc.).   Also the term “knowledge base” may be broadly interpreted as “a store of information or data that is available to draw on”. The claim merely recites “each entity in a preset knowledge base”, and Applicant does not specify any special closed definition for the term “knowledge base” in the instant Specification.  Nevertheless, in the rejections below for Claim 1, Examiner has also cited Zwicklbauer’s explicit recitation of a knowledge base in addition to Amiri to strengthen the rejection, although this was not strictly necessary.
Applicant’s argument on Pages 18-19 that “the ‘low dimension vector’ disclosed by Amiri is a vector representing context information rather than an entity vector” has been fully considered but is not persuasive.  Examiner respectfully points out that Amiri’s “context information as low dimensional vectors” is a context of a word (which is an entity), and can thus be considered an “entity vector”.  In fact, Amiri discloses on Page 1882 Intro one possible way of producing this “entity vector” is by using Doc2Vec: “In representation learning, context may appear in various forms. For example, the context of a current sentence in a document could be either its neighboring sentences (Lin et al., 2015; Wang and Cho, 2015), topics associated with the sentence (Mikolov and Zweig, 2012; Le and Mikolov, 2014).”  The Le and Mikolov reference is Doc2Vec.  Doc2Vec is the same way that the instant specification is producing the “second 
Drawings
Color photographs and color drawings are not accepted in utility applications unless a petition filed under 37 CFR 1.84(a)(2) is granted. Any such petition must be accompanied by the appropriate fee set forth in 37 CFR 1.17(h), one set of color drawings or color photographs, as appropriate, if submitted via EFS-Web or three sets of color drawings or color photographs, as appropriate, if not submitted via EFS-Web, and, unless already present, an amendment to include the following language as the first paragraph of the brief description of the drawings section of the specification:
The patent or application file contains at least one drawing executed in color. Figure 4 of the drawings contains gray shading, rather than strictly black and white.  Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Color photographs will be accepted if the conditions for accepting color drawings and black and white photographs have been satisfied. See 37 CFR 1.84(b)(2).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the 

Claims 1, 5-8, 12-15, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zwicklbauer et al. (“Robust and Collective Entity Disambiguation through Semantic Embeddings”; hereinafter “Zwicklbauer”) in view of Amiri et al. (“Learning Text Pair Similarity with Context-sensitive Autoencoders”; hereinafter “Amiri”) and Mikolov et al. (“Distributed Representations of Words and Phrases and their Compositionality”; hereinafter “Mikolov”).
As per Claim 1, Zwicklbauer teaches a text processing method based on ambiguous entity words, comprising
obtaining a context of a text to be disambiguated and at least two candidate entities represented by the text to be disambiguated, wherein the at least two candidate entities have different semantics (Zwicklbauer, Pg 425 Intro, begins:  “Entity disambiguation refers to the task of linking phrases in a text, also called surface forms, to a set of candidate meanings, referred to as the knowledge base (KB), by resolving the correct semantic meaning of the surface form.” Here, Zwicklbauer discloses text (“phrases in a text, also called surface forms”) to be disambiguated (“resolving the correct semantic meaning of the surface form”). Zwicklbauer, Pg. 427 top right, discloses “Since we want to compute the similarity between an entity-context embedding and the surrounding context of a surface form, one needs to perform an inference step to compute the surrounding context vector.” Here, Zwicklbauer discloses obtaining a context of a text to be disambiguated (“surrounding context of a surface form”).  Zwicklbauer, Pg. 428 Section 5.1, discloses:  “First, we search for all those entities that have already been annotated with mi in a corpus. All entities that provide an exact surface form matching serve as entity candidate.”  Here, Zwicklbauer discloses at least two candidate entities (“All entities that provide an exact surface form matching serve as entity candidate”), and it has already been established above that Zwicklbauer is “resolving the correct semantic meaning”, and thus the entity candidates have different semantics.)
generating a semantic vector of the context based on a trained word vector model (Zwicklbauer, Pg. 427 top right, discloses:  “The Doc2Vec model is trained on the entity-context corpus yielding the entity-context embeddings (see next section), and the same model is later used to generate the surface-form-context embeddings.” Here, a “surface-form-context embedding”, which is a vector produced from the text to be disambiguated and its context, is generated.  This is based on a trained word vector model, as “Doc2Vec” is disclosed.  Doc2Vec is well-known in the art, as an extension of the well-known Word2Vec algorithm, which is a trained word vector model.
In this case, Examiner is considering the output of the intermediate step of using Word2Vec to be the “semantic vector of the context”, rather than the final output of Doc2Vec (“surface-form-context embedding”) itself.  Note that Zwicklbauer suggests using the PV-DM version of Doc2Vec:  “The authors of Doc2Vec report consistently better results with the PV-DM architecture. PV-DM also outperforms PV-DBOW in the context of our disambiguation algorithm”.  Note that Zwicklbauer is citing Le et al. (“Distributed Representations of Sentences and Documents”; hereinafter “Le”).  Note that Le, Pg. 1 Intro Para 3, states:  “At prediction time, the paragraph vectors are inferred by fixing the word vectors and training the new paragraph vector until convergence.”  Le, Pg. 3 Section 2.2, describing the PV-DM model, states: “The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use concatenation as the method to combine the vectors.”  Thus, Zwicklbauer’s method, which uses Le’s method, discloses learning word vectors with Word2Vec and concatenating them together.  This is what Examiner is considering to be the “semantic vector of the context”.
This appears to be similar to what is described in the Instant Specification Pg. 4 Line 22 to Page 5 Line 12:  “The context of the text to be disambiguated may be represented by matrix A: A = [w1 w2 w3 w4 … wN] in which N is a length of the context of the text to be disambiguated”  …  “At block 102, a semantic vector of the context is generated based on a trained word vector model. In detail, the context of the text to be disambiguated is input to the trained word vector model. A semantic vector table corresponding to all entities in the knowledge base is already generated in the trained word vector model. A semantic vector corresponding to each word in the context of the text to be disambiguated may be obtained by looking up in the table.”  Here Applicant describes applying word2vec to each word in the context.  Thus, the intention of the “semantic vector of the context” appears to be a matrix (a vector of vectors) consisting of vector embeddings of each word in the context, essentially a concatenation of each word’s semantic vector.  Thus, when the “paragraph vector and word vectors are averaged or concatenated to predict the next word in a context” during Doc2Vec, the concatenated word vectors are the “semantic vector of the context” (rather than the final result produced by Doc2Vec).  See Le Figure 2 below for a visual:

[Le, Figure 2: image not reproduced]

Le’s caption below the Figure states:  “The paragraph vector represents the missing information from the current context and can act as a memory of the topic of the paragraph.”  Thus, this represents Zwicklbauer’s “surrounding context vector” of the surface form.)
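For illustration only, the table lookup and concatenation discussed above (one vector per context word, arranged as A = [w1 w2 ... wN] or joined end to end) can be sketched as follows. The embedding table, its values, the vector dimension, and the zero-vector fallback for unknown words are assumptions made for this sketch; they are not taken from Zwicklbauer, Le, or the Instant Specification.

```python
import numpy as np

# Hypothetical toy lookup table (word -> 3-dimensional vector); values are illustrative only.
EMBEDDINGS = {
    "apple": np.array([0.9, 0.1, 0.0]),
    "released": np.array([0.2, 0.7, 0.1]),
    "phone": np.array([0.8, 0.3, 0.2]),
}
UNKNOWN = np.zeros(3)  # assumed fallback for out-of-vocabulary words


def context_matrix(context_words):
    """Look up each context word and stack the vectors as columns: A = [w1 w2 ... wN]."""
    return np.stack([EMBEDDINGS.get(w, UNKNOWN) for w in context_words], axis=1)


def concatenated_context_vector(context_words):
    """Join the per-word vectors end to end (w1, then w2, ...) into one flat vector."""
    return np.concatenate([EMBEDDINGS.get(w, UNKNOWN) for w in context_words])


words = ["apple", "released", "phone"]
A = context_matrix(words)                  # one column per context word
flat = concatenated_context_vector(words)  # same values, flattened word by word
```

The two functions hold the same values in different layouts, which is why the matrix form and the concatenated form are treated as equivalent in the discussion above.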
However, Zwicklbauer does not teach generating a first entity vector of each of the at least two candidate entities based on a trained unsupervised neural network model, wherein text semantics of respective entities and a relationship between respective entities have been learned by the unsupervised neural network model.
Amiri teaches generating a first entity vector of [each of the at least two candidate] entities based on a trained unsupervised neural network model, wherein text semantics of respective entities and a relationship between respective entities have been learned by the unsupervised neural network model (Recall above that Zwicklbauer established two candidate entities. Amiri, Pg. 1882 Abstract, discloses “We present a pairwise context-sensitive Autoencoder for computing text pair similarity. Our model encodes input text into context-sensitive representations and uses them to compute similarity between text pairs. Our model outperforms the state-of-the-art models in two semantic retrieval tasks and a contextual word similarity task. For retrieval, our unsupervised approach that merely ranks inputs with respect to the cosine similarity between their hidden representations shows comparable performance with the state-of-the-art supervised models and in some cases outperforms them.”  Examiner’s Note:  Amiri discloses an “autoencoder”, which is well known in the art as a type of unsupervised neural network, and Amiri discloses “our unsupervised approach”.  It is used to compute similarity between text pairs (i.e., two candidate entities).  The model has learned “semantic retrieval tasks” (semantics of respective entities) and “contextual word similarity task” (“similarity” is a type of relationship, and thus it has learned a relationship between respective entities).  Amiri, Pg. 1883 Section 2.1, goes on to disclose “Autoencoders are trained using a local unsupervised criterion (Vincent et al., 2010; Hinton and Salakhutdinov, 2006; Vincent et al., 2008). Specifically, the basic autoencoder in Figure 1(a) locally optimizes the hidden representation h of its input x such that h can be used to accurately reconstruct x”.  Here, h is a lower dimensional vector than the input dimension, and “hidden representation h” is a “first entity vector of the entity”).
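For illustration only, the basic autoencoder idea quoted above (a hidden representation h of an input x, optimized so that x can be reconstructed from h, using no labels) can be sketched as follows. This is a generic single-hidden-layer autoencoder and not Amiri’s context-sensitive model; the toy data, layer sizes, tanh activation, learning rate, and number of steps are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))   # toy inputs x (assumed data, no labels)
d, k = X.shape[1], 2           # input size and a smaller hidden size (assumed)

W1 = 0.1 * rng.normal(size=(k, d))  # encoder weights
b1 = np.zeros(k)
W2 = 0.1 * rng.normal(size=(d, k))  # decoder weights
b2 = np.zeros(d)


def encode(inputs):
    """Hidden representation h = tanh(W1 x + b1)."""
    return np.tanh(inputs @ W1.T + b1)


def reconstruction_loss(inputs):
    """Mean of 0.5 * ||x_hat - x||^2 over the inputs."""
    err = encode(inputs) @ W2.T + b2 - inputs
    return 0.5 * np.mean(np.sum(err ** 2, axis=1))


initial_loss = reconstruction_loss(X)
lr = 0.05
n = X.shape[0]
for _ in range(200):
    H = encode(X)
    E = H @ W2.T + b2 - X          # reconstruction error
    dZ = (E @ W2) * (1.0 - H ** 2)  # backpropagate through tanh
    W2 -= lr * (E.T @ H) / n
    b2 -= lr * E.mean(axis=0)
    W1 -= lr * (dZ.T @ X) / n
    b1 -= lr * dZ.mean(axis=0)
final_loss = reconstruction_loss(X)

h = encode(X[0])  # lower-dimensional hidden vector for one input
```

The only training signal here is the input itself, which is the sense in which an autoencoder is described as unsupervised.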
Returning to Zwicklbauer, Zwicklbauer teaches determining a similarity between the context and each candidate entity according to the semantic vector of the context and the [first entity] vector of each of the at least two candidate entities (Recall above that Amiri discloses a “first entity vector” that is a vector representing an entity that is the output of an unsupervised neural network that has learned semantics and relationships between entities.  Zwicklbauer also discloses a vector representing an entity that has learned semantics of an entity, called “entity-context embeddings”.  Zwicklbauer, Pg. 428 Section 5.1 Criteria 2, discloses “We select the top x entities ranked by their context matching. To this end, we compute the cosine similarity between the entity-context embeddings and the Doc2Vec inferred context vector of the surface form.”  Thus, Zwicklbauer’s entity-context embeddings are analogous in function to the “first entity vector” mapped from Amiri.  Zwicklbauer, Pg. 429 Sec 6.1 Paragraph 2 Lines 4-6, discloses “To create the Doc2Vec entity-context embeddings, we parse the entities’ Wikipedia pages and remove all Wikipedia syntax elements as well as tables.”  Here, Zwicklbauer discloses that Doc2Vec is used to create the “entity-context embeddings” and “context vector of the surface form”.   Note that the “context vector of the surface form” is the output of Doc2Vec, which is “according to” the “semantic vector of the context”, which Examiner above noted is the intermediate concatenation of Word2Vec vectors.  The “entity-context embeddings” are vector representations of each candidate entity and its context.  A “cosine similarity” is determined between the “context vector of the surface form” and the “entity-context embeddings”, and is thus “according to” the semantic vector of the context and the first entity vector of each of the at least two candidate entities.  Also note that the entities are candidate entities, as Zwicklbauer Pg. 428 Section 5.1 discloses:  “All entities that provide an exact surface form matching serve as entity candidates”.)
and determining a target entity represented by the text to be disambiguated in the context from the at least two candidate entities according to the similarity between the context and each candidate entity. (Zwicklbauer, Sec 5.2 Final Paragraph, discloses “After constructing the disambiguation graph, we apply the PR algorithm and compute a relevance score for each entity candidate. Depending on the disambiguation task, our approach decides which entity candidate is the correct target entity or abstains if no appropriate candidate is available (cf. Algorithm 1).” Here, Zwicklbauer calculates a relevance score for each candidate.  In order for Zwicklbauer to reach this point, the cosine similarity has to first meet a given threshold, as shown in Zwicklbauer, Pg. 428 Section 5.1 Criteria 2:  “We select the top x entities ranked by their context matching. To this end, we compute the cosine similarity between the entity-context embeddings and the Doc2Vec inferred context vector of the surface form.” Therefore, deciding the correct target entity is based on the similarity between the context and each candidate entity.)
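For illustration only, the similarity-based selection discussed above (cosine similarity between a context vector and each candidate entity vector, ranking, and choosing a target or abstaining) can be sketched as follows. This sketch deliberately omits Zwicklbauer’s disambiguation graph and PR step and simply takes the highest-scoring candidate; the vectors, candidate names, `top_x` default, and `min_score` threshold are hypothetical.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0


def rank_candidates(context_vec, candidate_vecs, top_x=2):
    """Return the top_x (name, score) pairs, highest cosine similarity first."""
    scores = {name: cosine_similarity(context_vec, vec)
              for name, vec in candidate_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_x]


def pick_target(context_vec, candidate_vecs, min_score=0.0):
    """Pick the best-scoring candidate, or abstain (None) below min_score."""
    ranked = rank_candidates(context_vec, candidate_vecs, top_x=1)
    if not ranked or ranked[0][1] < min_score:
        return None
    return ranked[0][0]


# Hypothetical vectors that are assumed to live in the same vector space.
context = np.array([1.0, 0.0, 0.2])
candidates = {
    "Apple_Inc.": np.array([0.9, 0.1, 0.3]),
    "Apple_(fruit)": np.array([0.0, 1.0, 0.1]),
}
best = pick_target(context, candidates)
```

The comparison is only meaningful when the context vector and the candidate vectors have the same dimension and come from the same embedding space, which is the premise of the similarity step discussed above.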
each entity in a preset knowledge base (Zwicklbauer, Pg. 429 Section 6.1, discloses:  “Before our algorithm is able to disambiguate entities, we first have to perform some preprocessing steps. First, we choose a KB whose entities define our target entity set . In the context of this work, we make use of the current version of DBpedia 2015-04 as entity data base, which reflects information from the last years Wikipedia version.”  Here, Zwicklbauer discloses a KB (knowledge base) which is an “entity data base”).
However, Zwicklbauer does not explicitly teach wherein the unsupervised neural network model in which the text semantics of respective entities and the relationship between respective entities have been learned is obtained by the following steps:  generating a second entity vector of each entity in a preset knowledge base by using a trained supervised neural network model, wherein semantics of respective entities have been learned by the supervised neural network model; initializing first entity vectors of respective entities output by the unsupervised neural network model based on the second entity vector of each entity in the preset knowledge base; and training the initialized unsupervised neural network model based on an association relationship between respective entities.
Amiri teaches wherein the unsupervised neural network model in which the text semantics of respective entities and the relationship between respective entities have been learned is obtained by the following steps:  generating a second entity vector of each entity in a preset knowledge base by using a trained [supervised] neural network model, wherein semantics of respective entities have been learned by the supervised neural network model (Recall above that Zwicklbauer established the entities being in a preset knowledge base.  Amiri also discloses a knowledge base in Pg. 1886 Section 4.1: “We use three datasets: “SCWS” a word similarity dataset with ground-truth labels on similarity of pairs of target words in sentential context from Huang et al. (2012); “qAns” a TREC QA dataset with ground-truth labels for semantically relevant questions and (single-sentence) answers from Wang et al. (2007); and “qSim” a community QA dataset crawled from Stack Exchange with ground-truth labels for semantically equivalent questions from Dos Santos et al. (2015). Table 1 shows statistics of these datasets. To enable direct comparison with previous work, we use the same training, development, and test data provided by Dos Santos et al. (2015) and Wang et al. (2007) for qSim and qAns respectively and the entire data of SCWS (in unsupervised setting).”  Amiri, Intro Para 2-3, discloses “We represent context information as low dimensional vectors that will be injected to deep autoencoders. To the best of our knowledge, this is the first work that enables integrating context into autoencoders. In representation learning, context may appear in various forms. For example, the context of a current sentence in a document could be either its neighboring sentences (Lin et al., 2015; Wang and Cho, 2015), topics associated with the sentence (Mikolov and Zweig, 2012; Le and Mikolov, 2014), the document that contains the sentence (Huang et al., 2012), as well as their combinations (Ji et al., 2016).”  Examiner’s Note:  Here, Amiri discloses representing “context information as low dimensional vectors” (i.e., a second entity vector of each entity).  Amiri goes on to disclose that the context “may appear in various forms”.  One of the forms suggested is Doc2Vec, as Amiri recites “topics associated with the sentence…Le and Mikolov, 2014”.  Doc2Vec uses a neural network, therefore the second entity vector of each entity is generated using a trained neural network model.  This is similar to how the “second entity vector” is produced in the Instant Specification: Pg. 5 Lines 19-20: “a vector generated by the supervised neural network model is referred to as a second entity vector” and Pg. 11 Lines 14-16:  “As a possible implementation, the two layers of the supervised neural network model may be connected and trained by using a document vector (Doc2vec) technology”.)  *Note that Doc2Vec in its default form is unsupervised, but its author Mikolov provides a supervised addition to this suite of tools which will be described below.*
initializing first entity vectors of respective entities output by the unsupervised neural network model based on the second entity vector of each entity [in the preset knowledge base] (Recall above Zwicklbauer established the preset knowledge base.  Amiri, Intro, discloses “We represent context information as low dimensional vectors that will be injected to deep autoencoders. To the best of our knowledge, this is the first work that enables integrating context into autoencoders.”  Here Amiri discloses that “low dimensional vectors” described above (the “second entity vector of each entity”) are “injected” to deep autoencoders (unsupervised neural network model).  This “injecting” is part of the input, and can thus be said to be “initializing” the unsupervised neural network model.  Since this neural network model is what produces the “first entity vectors of respective entities”, this step can be said to be initializing the first entity vectors of respective entities.  Furthermore, Amiri explicitly discloses using Doc2Vec to initialize:  “Input vectors could be initialized through more accurate approaches (Mikolov et al., 2013b; Li and Hovy, 2014)”, wherein the Mikolov reference is Doc2Vec).  
and training the initialized unsupervised neural network model based on an association relationship between respective entities.  (Amiri, Abstract, discloses “We present a pairwise context-sensitive Autoencoder for computing text pair similarity. Our model encodes input text into context-sensitive representations and uses them to compute similarity between text pairs”.  Here, Amiri discloses training an autoencoder (i.e., unsupervised neural network model) based on similarity between text pairs (i.e., association relationship between respective entities)).
Since the above limitations taught by Amiri initialize and train the unsupervised neural network model, then these limitations thus accomplish wherein the unsupervised neural network model in which the text semantics of respective entities and the relationship between respective entities have been learned is obtained by the following steps, as the initialization and training result in “obtaining” the model.
wherein, the determining the similarity between the context and each candidate entity according to the semantic vector of the context and the first entity vector of each of the at least two candidate entities comprising
inputting the semantic vector of the context to the unsupervised neural network model, to obtain the first entity vector corresponding to the context, so as to calculate the similarity between the first entity vector corresponding to the context and the first entity vectors (Recall above that Zwicklbauer discloses calculating the cosine distance as part of the process to calculate the similarity between the “surface-form-context embedding” and two “entity-context vectors” which are each similar in function to the “first entity vector” (“hidden representations”) produced by Amiri’s unsupervised neural network, which also learns semantic context in its embeddings.  Amiri also discloses a similar comparison process.  In order to measure similarity between words, Amiri states in the Pg. 1882 Abstract:  “For retrieval, our unsupervised approach that merely ranks inputs with respect to the cosine similarity between their hidden representations”.  Here, Amiri discloses taking the cosine distance between “hidden representations” produced by the unsupervised neural network (“first entity vectors”).  Cosine distance requires the vectors be in the same vector space.
Thus, Zwicklbauer and Amiri both disclose calculating the similarity between two semantic vectors of context.  Zwicklbauer inputs the semantic vector of the context (concatenated Word2Vec embeddings) to Doc2Vec to produce a “surface-form-context embedding” to compare to “entity-context vectors”.  Amiri inputs Doc2Vec results into an unsupervised neural network to produce “hidden representations” (“first entity vectors”) to compare to one another.   Therefore, the combination of the two suggests inputting the semantic vector of the context to the unsupervised neural network model to obtain a first entity vector corresponding to the context, to calculate similarities with first entity vectors of at least two candidate entities.  Even if Zwicklbauer’s Doc2Vec is used as an intermediate step, the “semantic vector of the context” of concatenated Word2Vec can still be considered “input” to a system comprising the unsupervised neural network.)

It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the entity linking of Zwicklbauer, with the context sensitive autoencoder of Amiri. The modification would have been obvious because one of ordinary skill in the art would be motivated to achieve better performance in semantic processing tasks (Amiri, Abstract, discloses “Our model outperforms the state-of-the-art models in two semantic retrieval tasks and a contextual word similarity task. For retrieval, our unsupervised approach that merely ranks inputs with respect to the cosine similarity between their hidden representations shows comparable performance with the state-of-the-art supervised models and in some cases outperforms them.”)
However, the combination of Zwicklbauer and Amiri does not explicitly teach that the neural network used to generate a second entity vector of each entity is supervised.
Mikolov teaches that the neural network used to generate a second entity vector of each entity is supervised. (As discussed above, Amiri discloses using Mikolov’s Doc2Vec algorithm (based on the Word2Vec algorithm) to produce the second entity vector of each entity.  Mikolov, Page 3 Section 2.2, discloses “We define Negative sampling (NEG) by the objective (Eq 4) which is used to replace every log P(wO|wI ) term in the Skip-gram objective. Thus the task is to distinguish the target word wO from draws from the noise distribution Pn(w) using logistic regression, where there are k negative samples for each data sample”.  Examiner’s Note:  The “negative sampling” disclosed by Mikolov is a supervised process, using data that is labelled as being “noise” that should generate a negative result from the algorithm.  This labelled data is used to train the neural network, and is therefore a supervised process.)
Mikolov and the combination of Zwicklbauer and Amiri are analogous art because they are in the field of endeavor of natural language processing.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the entity linking with autoencoder of the combination of Zwicklbauer and Amiri, with the negative sampling of Mikolov. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the accuracy of word representations (Mikolov, Pg 5 Section 3 Para 2 Lines 5-8, discloses “The table shows that Negative Sampling outperforms the Hierarchical Softmax on the analogical reasoning task, and has even slightly better performance than the Noise Contrastive Estimation. The subsampling of the frequent words improves the training speed several times and makes the word representations significantly more accurate.”)
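For illustration only, the negative sampling objective quoted from Mikolov above (distinguishing the target word wO from k draws from a noise distribution using logistic regression) can be sketched for a single training pair as follows. The vector values, the choice of k = 2, and the function names are assumptions made for this sketch; the noise words are supplied directly rather than sampled from Pn(w).

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -np.logaddexp(0.0, -x)


def neg_sampling_objective(v_in, v_out_target, v_out_noise):
    """Per-pair objective: log s(v_target . v_in) + sum_i log s(-v_noise_i . v_in).

    v_in:         input-word vector, shape (d,)
    v_out_target: output vector of the observed target word, shape (d,)
    v_out_noise:  output vectors of the k noise words, shape (k, d)
    """
    positive = log_sigmoid(np.dot(v_out_target, v_in))       # observed pair, treated as label 1
    negative = np.sum(log_sigmoid(-(v_out_noise @ v_in)))    # noise pairs, treated as label 0
    return float(positive + negative)


# Hypothetical vectors: the target aligns with the input, the noise words point away.
v_in = np.array([1.0, 0.5])
v_target = np.array([2.0, 1.0])
v_noise = np.array([[-1.0, -1.0], [-2.0, 0.0]])
good = neg_sampling_objective(v_in, v_target, v_noise)

# Baseline with all-zero vectors: every term equals log(0.5).
baseline = neg_sampling_objective(np.zeros(2), np.zeros(2), np.zeros((2, 2)))
```

The objective is larger (closer to zero) when the observed pair scores high and the noise pairs score low, which is the binary distinction the quoted passage describes.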

As per Claim 5, the combination of Zwicklbauer, Amiri, and Mikolov teaches the method according to claim 1.  Amiri teaches wherein training the initialized unsupervised neural network model based on an association relationship between respective entities comprises: training the initialized unsupervised neural network model based on entities in the knowledge base that have the association relationship, and/or based on entities in a search log that have a co-occurrence relationship (Amiri, Pg. 1882 Abstract, discloses “We present a pairwise context-sensitive Autoencoder for computing text pair similarity. Our model encodes input text into context-sensitive representations and uses them to compute similarity between text pairs.”  Here, Amiri trains an autoencoder (i.e., unsupervised neural network model) based on similarity between text pairs (i.e., entities that have an association relationship).  Amiri, Pg. 1886 Section 4.1, discloses at least one knowledge base: “We use three datasets: “SCWS” a word similarity dataset with ground-truth labels on similarity of pairs of target words in sentential context from Huang et al. (2012); “qAns” a TREC QA dataset with ground-truth labels for semantically relevant questions and (single-sentence) answers from Wang et al. (2007); and “qSim” a community QA dataset crawled from Stack Exchange with ground-truth labels for semantically equivalent questions from Dos Santos et al. (2015). Table 1 shows statistics of these datasets. To enable direct comparison with previous work, we use the same training, development, and test data provided by Dos Santos et al. (2015) and Wang et al. (2007) for qSim and qAns respectively and the entire data of SCWS (in unsupervised setting).”  Here, the entities have an association relationship, the relationships comprising questions and answers, similarities, and ground truth labels.)
and determining that training the unsupervised neural network model is finished when a distance between first entity vectors output by the unsupervised neural network model corresponds to a closeness between entities.  (Amiri, Pg. 1882 Abstract, discloses “We present a pairwise context-sensitive Autoencoder for computing text pair similarity”, where Autoencoder is an unsupervised neural network.  Amiri, Pg. 1885 Section 3, discloses “In unsupervised settings, given a pair of input texts with their corresponding context vectors, (x1,cx1 ) and (x2,cx2 ), we determine their semantic similarity score by computing the cosine similarity between their hidden representations h1n and h2n respectively”.   Here, the “hidden representations” are first entity vectors output by the unsupervised neural network.  Amiri discloses that training is finished when a distance between the vectors (“cosine similarity”) indicates closeness between the entities, as in the Abstract, this is the stated goal of the neural network:  “Autoencoder for computing text pair similarity”.)

As per Claim 6, the combination of Zwicklbauer, Amiri, and Mikolov teaches the method according to claim 1. Amiri teaches further comprising: generating training corpus corresponding to various application scenes (Amiri, Pg. 1886 Section 4.1, discloses a training corpus for 3 different application scenes: “We use three datasets: “SCWS” a word similarity dataset with ground-truth labels on similarity of pairs of target words in sentential context from Huang et al. (2012); “qAns” a TREC QA dataset with ground-truth labels for semantically relevant questions and (single-sentence) answers from Wang et al. (2007); and “qSim” a community QA dataset crawled from Stack Exchange with ground-truth labels for semantically equivalent questions from Dos Santos et al. (2015).”  Each of these three datasets can be considered to represent a different application scene.)
and performing word vector model training by using the training corpus corresponding to various application scenes, to obtain word vector models respectively applicable to various application scenes (Amiri, Pg. 1886 Section 4.1, discloses obtaining word vector models in specific ways that are applicable to the three application scenes: “We consider local and global context for target words in SCWS. The local context of a target word is its ten neighboring words (five before and five after) (Huang et al., 2012), and its global context is a short paragraph that contains the target word (surrounding sentences). We compute average word embeddings to create context vectors for target words. Also, we consider question title and body and answer text as input in qSim and qAns and use NMF to create global context vectors for questions and answers”.  Here, Amiri discloses a specific method of word embedding for the specific application scene of SCWS.)

As per Claim 7, the combination of Zwicklbauer, Amiri, and Mikolov teaches the method according to claim 1. Amiri teaches further comprising: determining a similarity between each two different candidate entities based on the first entity vector of each of the at least two candidate entities (Amiri, Pg. 1885 Section 3, discloses determining a similarity between each two different candidate entities based on their first entity vectors h1n and h2n: “In unsupervised settings, given a pair of input texts with their corresponding context vectors, (x1,cx1 ) and (x2,cx2 ), we determine their semantic similarity score by computing the cosine similarity between their hidden representations h1n and h2n respectively.”)
and performing entity relationship mining or entity recommendation based on the similarity between each two different candidate entities. (Amiri, Pg. 1887 Section 4.4, discloses using this similarity (“relevance score”) to make an entity recommendation (“retrieve correct answers from a set of candidates”):  “We evaluate the performance of our model in the answer ranking task in which a model should retrieve correct answers from a set of candidates for test questions. For this evaluation, we rank answers with respect to each test question according to the “relevance score” between question and each answer.”)
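For illustration only, the cosine-similarity scoring and candidate ranking relied upon above can be sketched as follows. This is a minimal sketch under stated assumptions: the vectors stand in for the hidden representations described by Amiri, and the names `cosine_similarity` and `rank_candidates` are hypothetical and do not appear in the reference.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_candidates(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The first entry of the returned list would be the candidate recommended for the query under this scoring.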

As per Claim 8, Claim 8 is a device claim corresponding to method claim 1.  The difference is that it recites one or more processors and a memory.  (Zwicklbauer, Section 6.1 top of page 430, discloses one or more processors and a memory: “The Word2Vec training time takes 90 minutes on our personal computer with a 4x3.4GHz Intel Core i7 processor and 16 GB RAM (1 corpus iteration)”).  Claim 8 is rejected for at least the same reasons as Claim 1.
As per Claim 12, Claim 12 is a device claim corresponding to method claim 5.  The difference is that it recites one or more processors and a memory.  Claim 12 is rejected for at least the same reasons as Claim 5.

As per Claim 13, Claim 13 is a device claim corresponding to method claim 6.  The difference is that it recites one or more processors and a memory.  Claim 13 is rejected for at least the same reasons as Claim 6.

As per Claim 14, Claim 14 is a device claim corresponding to method claim 7.  The difference is that it recites one or more processors and a memory.  Claim 14 is rejected for at least the same reasons as Claim 7.

As per Claim 15, Claim 15 is a non-transitory computer readable storage medium claim corresponding to method claim 1.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  (Zwicklbauer, Section 6.1 top of page 430, discloses a non-transitory computer readable storage medium and a processor: “The Word2Vec training time takes 90 minutes on our personal computer with a 4x3.4GHz Intel Core i7 processor and 16 GB RAM (1 corpus iteration)”).  Claim 15 is rejected for at least the same reasons as Claim 1.


As per Claim 19, Claim 19 is a non-transitory computer readable storage medium claim corresponding to method claim 5.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 19 is rejected for at least the same reasons as Claim 5.

As per Claim 20, Claim 20 is a non-transitory computer readable storage medium claim corresponding to method claim 6.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 20 is rejected for at least the same reasons as Claim 6.

Claims 3, 4, 10, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Zwicklbauer, Amiri, and Mikolov further in view of Bilgin et al. (“Sentiment Analysis on Twitter data with Semi-Supervised Doc2Vec”; hereinafter Bilgin).
As per Claim 3, the combination of Zwicklbauer, Amiri, and Mikolov teaches the method according to claim 1.  Mikolov teaches further comprising: generating negative examples of the training samples based on all text description information of respective entities in the knowledge base (Mikolov, Pg. 3-4 Section 2.2, discloses “there are k negative samples for each data sample”.  Mikolov, Pg. 4-5 Section 2.3, discloses “In very large corpora, the most frequent words can easily occur hundreds of millions of times (e.g., “in”, “the”, and “a”). Such words usually provide less information value than the rare words. We chose this subsampling formula because it aggressively subsamples words whose frequency is greater than t while preserving the ranking of the frequencies”.  Here, Mikolov discloses generating negative samples (“k negative samples…subsampling formula”) based on “very large corpora”.  A corpus can be considered a type of knowledge base, and Mikolov is sampling words from the entire corpus (i.e., based on all text description information of respective entities in the knowledge base)).
	However, the combination of Zwicklbauer, Amiri, and Mikolov does not explicitly teach generating positive examples of training samples based on an attribute of each entity in the knowledge base and a keyword extracted from introduction information of each entity; training the supervised neural network model based on the training samples, wherein the supervised neural network model comprises a first layer configured to input a semantic vector of text in the training samples and a second layer configured to predict an entity described by the text input in the first layer, and a parameter of the second layer configured to generate the second entity vector; and determining that training the supervised neural network model is finished when the text in the training samples input to the first layer is configured to describe an entity output by the second layer.
	Bilgin teaches generating positive examples of training samples based on an attribute of each entity in the knowledge base and a keyword extracted from introduction information of each entity (Bilgin, Pg. 661 Abstract, discloses “In this study, it was aimed to perform sentiment analysis on Turkish and English Twitter messages using Doc2Vec. The Doc2Vec algorithm was run on Positive, Negative and Neutral tagged data using the Semi-Supervised learning method and the results were recorded”.  Here, Doc2Vec is used with a semi-supervised learning method.  The positive examples are based on a tag (i.e., attribute) of a collection of Twitter messages (i.e., entities in a knowledge base), the knowledge base disclosed by Bilgin in Sec II D:  “English data set contains 1774 unlabelled and 58817 labeled data sets.”  The “Positive”, “Negative”, and “Neutral” tags can each be considered both an attribute and a keyword extracted from introduction information.  The sentiment of a Twitter message can be broadly considered introduction information, i.e. “the following message will be a positive message”).  Also note that while “positive” in Bilgin refers to sentiment of text, it can also be interpreted as in the instant application where “positive” means correlation with an entity.  One can see this also applies with Bilgin, as a tweet with positive sentiment has a correlation with an entity (i.e. “positive tweet”).
	training the supervised neural network model based on the training samples, wherein the supervised neural network model comprises a first layer configured to input a semantic vector of text in the training samples and a second layer configured to predict an entity described by the text input in the first layer, and a parameter of the second layer configured to generate the second entity vector (Bilgin, Pg. 662 Section II B Doc2Vec, discloses Figure 3.  In Figure 3, one can see the DM embodiment of the Doc2Vec algorithm, wherein the input layer consists of a Document ID as well as a plurality of words denoted by V(t +/- n).  These words may or may not individually be pre-trained Word2Vec vectors which have learned semantic information, as Bilgin does not say.  However, if they are not, and are simply one-hot representations of words, the entire plurality of words being input to the neural network comprises a vector input to the neural network [word1 word2 word3…wordn], which inherently comprises semantic information about the document, which is the underlying goal of Doc2Vec – to learn an overall semantic representation of a block of text.  The output of Doc2Vec is another vector (i.e., second entity vector).  In Bilgin, this is to predict the overall sentiment, positive or negative (i.e. predict an entity “positive post” or “negative post”) described by a tweet (i.e., the text input)).
	determining that training the supervised neural network model is finished when the text in the training samples input to the first layer is configured to describe an entity output by the second layer (Bilgin, Pg. 661-662 Section II A, discloses “It is one of the learning methods used in machine learning. The input data comprises large amount of unlabeled data and small quantities of labeled data. This method is generally useful when the labeled data is low and the unlabeled data is readily available. Semi-Supervised learning is seen in Figure 1.”  Here, Bilgin discloses that learning is semi-supervised, the supervision accomplished by labelled data.  The label on the labelled data is the sentiment (positive, negative, neutral).  Supervised means that the labels are used to accomplish training, and therefore training must accomplish the task of sufficiently assigning these labels (i.e. determining that training is finished when the text input in the first layer is configured to describe an entity output in the second layer).  Bilgin discloses that correctly classifying samples is the end goal of the trained neural network in Pg. 663 Section III: “Accuracy metric, the most popular and simple method used to measure model performance, is the accuracy rate of the model. The number of correctly classified samples (TP + TN) is the ratio of the total sample counts (TP + TN + FP + FN). The error rate is 1 of this value. In other words, the number of misclassified samples (FP + FN) is the ratio of the total number of samples (TP + TN + FP + FN)”).
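For illustration only, the accuracy and error-rate ratios quoted above from Bilgin can be sketched as follows. This is a minimal sketch: the function names and the example counts are hypothetical, and the only relationships used are the two ratios stated in the quotation, from which it follows arithmetically that the two values sum to one.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correctly classified samples (TP + TN) divided by all samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (tp + tn) / total


def error_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """Misclassified samples (FP + FN) divided by all samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (fp + fn) / total


# Hypothetical counts chosen only to exercise the formulas.
example_accuracy = accuracy(tp=40, tn=45, fp=10, fn=5)
example_error = error_rate(tp=40, tn=45, fp=10, fn=5)
```

With these hypothetical counts, 85 of 100 samples are classified correctly and 15 are misclassified.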
Bilgin and the combination of Zwicklbauer, Amiri, and Mikolov are analogous art because they are in the field of endeavor of natural language processing. 
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the entity linking with autoencoder and negative sampling of the combination of Zwicklbauer, Amiri, and Mikolov, with the supervised Doc2Vec algorithm of Bilgin. The modification would have been obvious because one of ordinary skill in the art would be motivated to properly capture the overall semantic meaning of a block of text with high accuracy, which is the motivation of Bilgin (Bilgin, Intro Para 2, discloses “For machine learning algorithms to be able to classify data with high accuracy, the pre-processing phase must be correctly organized.”).

As per Claim 4, the combination of Zwicklbauer, Amiri, Mikolov, and Bilgin teaches the method according to claim 3.  Mikolov teaches wherein generating the negative examples of the training samples based on all text description information of respective entities in the knowledge base comprises: performing word-segmentation on all text description information of respective entities in the knowledge base, and performing term frequency statistics on terms obtained by the word-segmentation; and performing negative sampling on each term based on a term frequency of each term, to obtain the negative examples of the training samples. (Mikolov, Pg. 3-4 Section 2.2, discloses negative sampling:  “Thus the task is to distinguish the target word wO from draws from the noise distribution Pn(w) using logistic regression, where there are k negative samples for each data sample.”  Mikolov, Pg. 4-5 Section 2.3, then goes on to describe the selection of words for negative sampling:  “In very large corpora, the most frequent words can easily occur hundreds of millions of times (e.g., “in”, “the”, and “a”). Such words usually provide less information value than the rare words. To counter the imbalance between the rare and frequent words, we used a simple subsampling approach: (Eq. 5) each word wi in the training set is discarded with probability computed by the formula where f(wi) is the frequency of word wi and t is a chosen threshold, typically around 10^-5. We chose this subsampling formula because it aggressively subsamples words whose frequency is greater than t while preserving the ranking of the frequencies.”  Here, Mikolov discloses “each word wi” (i.e., word segmentation) “in very large corpora” (i.e., all text description information of respective entities in the knowledge base) using the function “where f(wi) is the frequency of word wi” (i.e., performing term frequency statistics on terms obtained by the word segmentation).
Mikolov also discloses “subsamples words whose frequency is greater than t while preserving the ranking of the frequencies” (i.e., perform negative sampling on each term based on term frequency to obtain the negative examples)).
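For illustration only, the term-frequency statistics, the subsampling formula of Mikolov Section 2.3 (discard probability 1 - sqrt(t / f(wi))), and the drawing of k negative samples from a noise distribution can be sketched as follows. This is a minimal sketch under stated assumptions: whitespace splitting stands in for word segmentation, f(wi) is read as a relative frequency, the discard probability is clamped at zero for words rarer than t, and the noise distribution is the unigram distribution raised to the 3/4 power, which Mikolov Section 2.2 reports as working well. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, Iterable, List


def term_frequencies(texts: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each term; whitespace splitting is an assumed segmenter."""
    counts = Counter(token for text in texts for token in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def discard_probability(freq: float, t: float = 1e-5) -> float:
    """Subsampling discard probability 1 - sqrt(t / f), clamped at 0 (assumption)."""
    return max(0.0, 1.0 - math.sqrt(t / freq))


def draw_negative_samples(
    freqs: Dict[str, float], target: str, k: int, rng: random.Random
) -> List[str]:
    """Draw k terms other than `target`, weighted by frequency ** 0.75."""
    words = [w for w in freqs if w != target]
    weights = [freqs[w] ** 0.75 for w in words]
    return rng.choices(words, weights=weights, k=k)


# Hypothetical entity descriptions used only to exercise the functions.
freqs = term_frequencies(["the cat", "the dog"])
negatives = draw_negative_samples(freqs, target="cat", k=3, rng=random.Random(0))
```

In this sketch a more frequent term receives both a higher discard probability and a higher chance of being drawn as a negative example, so both steps depend on the term frequency statistics.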

As per Claim 10, Claim 10 is a device claim corresponding to method claim 3.  The difference is that it recites one or more processors and a memory.  (Zwicklbauer, Section 6.1 top of page 430, discloses one or more processors and a memory: “The Word2Vec training time takes 90 minutes on our personal computer with a 4x3.4GHz Intel Core i7 processor and 16 GB RAM (1 corpus iteration)”).  Claim 10 is rejected for at least the same reasons as Claim 3.

As per Claim 11, Claim 11 is a device claim corresponding to method claim 4.  The difference is that it recites one or more processors and a memory.  Claim 11 is rejected for at least the same reasons as Claim 4.

As per Claim 17, Claim 17 is a non-transitory computer readable storage medium claim corresponding to method claim 3.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  (Zwicklbauer, Section 6.1 top of page 430, discloses a non-transitory computer readable storage medium and a processor: “The Word2Vec training time takes 90 minutes on our personal computer with a 4x3.4GHz Intel Core i7 processor and 16 GB RAM (1 corpus iteration)”).  Claim 17 is rejected for at least the same reasons as Claim 3.

As per Claim 18, Claim 18 is a non-transitory computer readable storage medium claim corresponding to method claim 4.  The difference is that it recites a non-transitory computer readable storage medium and a processor.  Claim 18 is rejected for at least the same reasons as Claim 4.
	Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Liou et al. (“Autoencoder for words”) discloses an unsupervised neural network that produces word embeddings that can distinguish polysemous words.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710.  The examiner can normally be reached on M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/L.A.S./Examiner, Art Unit 2126  

/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145