DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings filed on 04/06/2020 are accepted.

Specification
The specification filed on 04/06/2020 is accepted.

Information Disclosure Statement
The examiner has considered the information disclosure statements (IDS) submitted on 04/06/2020.

Claim Objections
Claims 7, 15 and 20 are objected to because of the following informalities:
In claims 7, 15 and 20, “beginning of the electronic” should read “beginning of the electronic document”.
Appropriate correction is required.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 10-14 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Sun et al. (Modeling Mention, Context and Entity with Neural Networks for Entity Disambiguation – Applicant-provided NPL) in view of Yu et al. (US Pub. 2017/0286805) and further in view of Manning et al. (US Pub. 2017/0031920).
As per claim 1, Sun teaches a computer-implemented method [neural network method for entity disambiguation, Fig. 1, page 1334], comprising:
identifying, using the at least one processor [page 1337, section 3.3, 2nd paragraph, CPU], a set of one or more entity mentions in an electronic document [introduction, page 1333, 1st paragraph, “given a document and a mention which is usually a text span occurred in the document”], an entity mention to be linked to a page of a plurality of candidate pages in a knowledge base [introduction, page 1333, 1st paragraph, “entity disambiguation targets at mapping the mention to an entity from reference knowledge base”]; 
representing each entity mention as a plurality of word sequences capturing a context or topic of the entity mention at multiple granularities in the electronic document [Fig. 1, mention representation; page 1334, section 2.1, “neural network for entity disambiguation is given in Figure 1. As is shown, the input includes three parts, namely a mention, the context of mention and a candidate entity from reference knowledge base”; page 1333, introduction, 1st paragraph, “Given a document and a mention which is usually a text span occurred in the document … For example, given a text span “President Obama” in the document “After campaigning on the promise of health care reform, President Obama gave a speech in March 2010 in Pennsylvania.” as input, the purpose of entity disambiguation is to link the mention “President Obama” in this context to an entity”]; 
for each entity mention in the electronic document, identify a set of target candidate pages in the knowledge base that potentially refer to the entity mention in the document [page 1334, section 2.1, “neural network for entity disambiguation is given in Figure 1. As is shown, the input includes three parts, namely a mention, the context of mention and a candidate entity from reference knowledge base”; page 1334, Col. 1, 1st paragraph, “comparing the similarities between an input (mention, context) pair and candidate entities”]; 
applying a scoring function to obtain a relevance score for each said target candidate page of the corpus for each mention [page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$   (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”], said applying a scoring function comprising: 
running a CNN model using the plurality of word sequences of the entity mention and a candidate target page of the knowledge base to compute a first score representing a local similarity score between each entity mention and candidate target page [Fig. 1, page 1334, Col. 1, 1st paragraph, “We cast entity disambiguation as a ranking task by comparing the similarities between an input (mention, context) pair and candidate entities. Specifically, we embed mention, context and entity in continuous vector space to capture their semantic representations. The variable-sized context are modeled with convolutional neural networks”; section 2.1, “learn the continuous representations of context words with convolution neural networks, and produce its semantic composition with mention using a neural tensor network … learn the continuous representation of a candidate entity. We then apply the learned representations of context, mention and entity for calculating the similarity between a candidate entity and a given (mention, context) pair”]; and 
ranking each of said scores for the entity mention [page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$   (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result”]; and 
providing a link for linking the entity mention to the target candidates page of the knowledge base based on a highest score for the entity mention [page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$   (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”].  
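For illustration only, the scoring and selection step quoted above from Sun, section 2.3 (cosine similarity between the candidate entity vector and the mention-context vector, with the closest candidate selected), can be sketched as follows. This is a minimal sketch under stated assumptions and not Sun's implementation: the function names, entity labels, and three-dimensional vectors are hypothetical, and in Sun the vectors would be produced by the trained network rather than written by hand.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def link_mention(v_mc, candidates):
    """Score every candidate entity vector against the (mention, context)
    vector, rank by similarity (descending), and return the closest one."""
    scored = [(name, cosine(v_e, v_mc)) for name, v_e in candidates.items()]
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[0][0], ranked

# Hypothetical toy vectors standing in for learned representations.
v_mc = [0.9, 0.1, 0.3]
candidates = {
    "Barack Obama": [0.8, 0.2, 0.3],
    "Michelle Obama": [0.1, 0.9, 0.2],
}
best, ranked = link_mention(v_mc, candidates)
print(best)  # Barack Obama
```

In this sketch only a single similarity score per candidate is computed, which corresponds to the single-score teaching attributed to Sun above.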
Sun does not teach
running a RNN model that simultaneously models an interdependence among the other entity mentions in the document and other candidate pages to compute a second score; 
adding the first and second score to obtain a relevance score for the entity mention; 
ranking each of said obtained relevance scores for the entity mention (emphasis added); and 
providing a link for linking the entity mention to the target candidates page of the knowledge base based on a highest relevance score for the entity mention (emphasis added).
Yu teaches 
running a RNN model that simultaneously models an interdependence among the other entity mentions in the document and other candidate pages to compute a second score [paragraphs 0022-0024, “The one or more images that depict the entity and the candidate entity profiles can be provided as input to a machine learning model … the machine learning model can include various other suitable models, such as a recurrent neural network (e.g. a long short-term memory (LSTM) network, and/or a convolutional LSTM network) … the CNN can be configured to extract features from the images, while the LSTM can be configured to obtain text-related information from the candidate entity profile(s). The CNN may be further configured to provide data indicative of the extracted features to the LSTM. The LSTM may model at least a portion of the structured information from a candidate entity profile as a sequence of characters, such that a match score between the extracted features of the one or more images and the data from the candidate entity profile can be determined”; paragraph 0027 further recites “the machine learning model can include multiple LSTM networks along with a single CNN … The multiple LSTM networks can be configured to simultaneously determine a match score between an entity depicted in an image and multiple entity profiles. In this manner, data indicative of a different entity profile can be provided to each LSTM network. The CNN can extract features from one or more images depicting an entity and provide data indicative of the extracted features to each LSTM network. Each LSTM network may then be configured to determine a match score between the respective entity profiles and the entity depicted in the image(s) in a parallel manner”; Figs. 1 and 3 disclose the process of combining the CNN network and the RNN network to determine the match score; Fig. 4, paragraphs 0040-0041 disclose how the match scores are calculated and selected, “The machine learning model can be configured to determine match scores between entity 302 and the each candidate entity profile 304 … a match score of 0.727 between entity 302 and the candidate entity profile associated with Mr. Optical Inc. As shown, such match score can be indicative of a "match" between entity 302 and the candidate entity profile associated with Mr. Optical Inc. (highest match score)”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the entity linking method of Sun to include Yu's process of running an RNN model to compute a second score. Doing so would help generate a confidence value specifying the likelihood that the entity associated with image data is the same entity as the entity associated with entity data, and would provide speed increases in training the machine learning model (Yu, 0036 and 0039).
Sun and Yu do not teach
adding the first and second score to obtain a relevance score for the entity mention; 
ranking each of said obtained relevance scores for the entity mention (emphasis added); and 
providing a link for linking the entity mention to the target candidates page of the knowledge base based on a highest relevance score for the entity mention (emphasis added).
Manning teaches
adding the first and second score to obtain a relevance score for the entity mention [Fig. 2, paragraphs 0037-0040, “the hybrid recommender system 200 can be trained to determine each of the weight values via a technique such as a technique used in neural networks … The hybrid recommender system 200 can include a weight 206-1 and one or more other weights 206-2, ... 206-N. Each of the weights 206-1, 206-2, ...  206-N can be associated with a corresponding one of the recommender systems 202-1, 202-2, ... 202-N … the combiner 204 can produce the combined output by numerically combining the weighted outputs from the recommender systems 202-1, 202-2, ... 202-N”; paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN …”]; 
ranking each of said obtained relevance scores for the entity mention [Fig. 2, paragraphs 0037-0040, “the hybrid recommender system 200 can be trained to determine each of the weight values via a technique such as a technique used in neural networks … The hybrid recommender system 200 can include a weight 206-1 and one or more other weights 206-2, ... 206-N. Each of the weights 206-1, 206-2, ... 206-N can be associated with a corresponding one of the recommender systems 202-1, 202-2, ... 202-N … the combiner 204 can produce the combined output by numerically combining the weighted outputs from the recommender systems 202-1, 202-2, ... 202-N”; paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN …”; although Sun (as modified) teaches that the CNN and RNN networks are combined to generate a match/relevance score [Yu, Fig. 3 shows that the output of the CNN is input to the RNN network, and the output of the RNN (LSTM) is then used to generate the match score], Sun (as modified) is silent as to combining the scores/weights output from the CNN and RNN networks (the first and second scores) to generate the relevance score. Manning teaches that the relevance score is obtained by combining the scores/weights of the recommender systems 202-1, 202-2, ... 202-N, where the weights are determined via a technique used in neural networks, which include CNNs and RNNs, and Sun (as modified) further teaches how the scores are ranked [Sun, page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$   (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result”]; therefore, the combination of Sun (as modified) and Manning reads on the claim limitation]; and
providing a link for linking the entity mention to the target candidates page of the knowledge base based on a highest relevance score for the entity mention [Fig. 2, paragraphs 0037-0040, “the hybrid recommender system 200 can be trained to determine each of the weight values via a technique such as a technique used in neural networks … The hybrid recommender system 200 can include a weight 206-1 and one or more other weights 206-2, ... 206-N. Each of the weights 206-1, 206-2, ... 206-N can be associated with a corresponding one of the recommender systems 202-1, 202-2, ... 202-N … the combiner 204 can produce the combined output by numerically combining the weighted outputs from the recommender systems 202-1, 202-2, ... 202-N”; paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN …”; although Sun (as modified) teaches that the CNN and RNN networks are combined to generate a match/relevance score [Yu, Fig. 3 shows that the output of the CNN is input to the RNN network, and the output of the RNN (LSTM) is then used to generate the match score], Sun (as modified) is silent as to combining the scores/weights output from the CNN and RNN networks (the first and second scores) to generate the relevance score. Manning teaches that the relevance score is obtained by combining the scores/weights of the recommender systems 202-1, 202-2, ... 202-N, where the weights are determined via a technique used in neural networks, which include CNNs and RNNs, and Sun (as modified) further teaches linking the entity mention to the target candidates based on a highest score for the entity mention [Sun, page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$   (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”]; therefore, the combination of Sun (as modified) and Manning reads on the claim limitation].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the entity linking method of Sun to include Manning's process of adding the first and second scores to obtain a relevance score and ranking each of said obtained relevance scores. Doing so would help generate a master ranking via a weighted average of ranks of the RNN and CNN models (Manning, 0082).
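The distinction drawn above for claim 1, between a single similarity score per candidate and a relevance score obtained by adding a first (local) score and a second score before ranking, can be illustrated with a minimal sketch. It assumes both scores have already been computed for every candidate page; the function name, page identifiers, and score values are hypothetical, and the optional weights only loosely mirror Manning's weighted combiner (with both weights equal to 1.0 the combination reduces to the claimed addition).

```python
def combine_and_link(local_scores, global_scores, w_local=1.0, w_global=1.0):
    """Add a (weighted) first score and second score for each candidate page,
    rank the pages by the combined relevance score, and return the top page."""
    relevance = {
        page: w_local * local_scores[page] + w_global * global_scores[page]
        for page in local_scores
    }
    ranking = sorted(relevance, key=relevance.get, reverse=True)
    return ranking[0], ranking, relevance

# Hypothetical first (CNN-style) and second (RNN-style) scores for one mention.
first = {"page_A": 0.62, "page_B": 0.70, "page_C": 0.10}
second = {"page_A": 0.30, "page_B": 0.05, "page_C": 0.40}
top, ranking, relevance = combine_and_link(first, second)
print(top, ranking)  # page_A ['page_A', 'page_B', 'page_C']
```

In this toy case page_B has the highest first score alone, but page_A has the highest summed relevance score, which shows why the added second score can change which page is linked.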

As per claim 2, Sun, Yu and Manning teach the computer-implemented method as claimed in claim 1.
Sun further teaches
said plurality of word sequences of said entity mention comprises: a data triplet comprising a surface string, a context and the entire document having the entity mention [page 1334, Fig. 1, “the mention “President Obama” comes from an original document “After campaigning on the promise of health care reform, President Obama gave a speech in March 2010 in Pennsylvania””, where, the context is “After campaigning on the promise of health care reform, gave a speech in March 2010 in Pennsylvania”; introduction, page 1333, 2nd paragraph, “Representative mention features include document surface feature like lexical and part-of-speech tags of context words”] and, said page having an associated title and content potentially referring to said entity mention [Fig. 1 shows an entity (page) having a title Barack Obama; page 1335, Col. 2, last paragraph, “We model the semantics of an entity in knowledge base from two aspects: entity surface words and entity class. For example, the surface words of entity Barack Obama are barack and obama. Entity class of an entity is a word or a phrase provided in infobox of reference knowledge base, which indicates the category information of the entity. For example, the class of Barack Obama is president of the united states”; page 1333, introduction, 2nd paragraph, “compare the context of a mention with the text associated with a candidate entity (e.g. the text in the corresponding page in reference KB)”; page 1337, Col. 2, last paragraph, “a mention mi and an entity title ti”].  

As per claim 3, Sun, Yu and Manning teach the computer-implemented method as claimed in claim 1.
Sun further teaches
preserving the order of the entity mentions from the beginning to the end of an input document [page 1335, context modeling, 1st paragraph, “representation of a context is also influenced by the distance between a context word and the mention, this is based on the consideration that a closer context word might be more informative than a farther one for disambiguating the mention, the vector of each context word is made up of two parts: a word embedding and a position embedding, the position of a context word is its distance to the mention in a given piece of text”; page 1337, 4th paragraph, “an improvement is further achieved by incorporating the position information of context words, a closer context word might be more informative than a farther one for disambiguating the mention”; because the distance between a context word and the mention in the document is used, in part, to disambiguate the entity mention, the order of the mentions and the context words is understood to be preserved].
Yu further teaches 
said running said RNN model further comprises: computing the second scores for all the target candidates pages of all the entity mentions in each document simultaneously [Fig. 3, paragraphs 0038-0039, “LSTM networks 204-208 can receive entity data 212, 214, and 216 as input. In particular, LSTM 204 can receive entity 1 data 212, LSTM 206 can receive entity 2 data 214, and LSTM 208 can receive entity 3 data 216 … each LSTM 204-208 can receive the same feature parameters … LSTMs 204-208 can then be configured to obtain text-related data associated with the respective entity data 212-216, and to provide the data indicative of the extracted features and the text-related data to the respective classifiers 218, 220, and 222 to determine match scores between the entity associated with image data 210 and the respective entity profiles associated with entity data 212-216 in a parallel manner”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the entity linking method of Sun to include Yu's process of computing the second scores for all the target candidate pages of all the entity mentions in each document simultaneously. Doing so would help provide speed increases in training the machine learning model (Yu, 0039).

As per claim 4, Sun, Yu and Manning teach the computer-implemented method of claim 1.
Sun further teaches
encoding multiple variables of the word sequences, or target candidate pages, or both [abstract, “the model takes consideration of the semantic representations of mention, context and entity, encodes them in continuous vector space”]; 
transforming each encoded variable into a corresponding encoded vector using a word embedding table [page 1335, section 2.2, 2nd paragraph, “the vector of each context word is made up of two parts: a word embedding $e_w = L_w i_w$ and a position embedding $e_p = L_p i_p$, where $L_w$ and $L_p$ are the lookup tables of words and positions, respectively”]; and
running, using said at least one processor, CNN model operations on said encoded vectors to obtain a distributed representation for the encoded vectors [page 1335, Figs. 1-2, section 2.2, 2nd paragraph disclose context modeling with convolutional neural network, the input of context convolution includes word embedding and position embedding; using convolutional neural network to produce a fixed-length vector for a context].  
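The embedding step quoted above for claim 4, in which each context word is represented by a word embedding taken from a word lookup table together with a position embedding taken from a position lookup table (the position being the word's distance to the mention), can be sketched as follows. The sketch is illustrative only and rests on assumptions not stated in Sun's quoted text: the two parts are concatenated, the mention is a single token, distances beyond the table are clipped to its last row, and the toy tables are hypothetical. Selecting a row of a lookup table is equivalent to multiplying the table by a one-hot index vector.

```python
def context_word_vectors(tokens, mention_index, word_table, pos_table):
    """Return one vector per context word: its word embedding concatenated
    with the embedding of its distance to the (single-token) mention."""
    vectors = []
    for i, token in enumerate(tokens):
        if i == mention_index:
            continue  # the mention itself is represented separately
        e_w = word_table.get(token, word_table["<unk>"])  # row lookup
        distance = abs(i - mention_index)
        e_p = pos_table[min(distance, len(pos_table) - 1)]  # clip long distances
        vectors.append(e_w + e_p)  # list concatenation, tables are not mutated
    return vectors

# Hypothetical 2-d word embeddings and 1-d position embeddings.
word_table = {"<unk>": [0.0, 0.0], "gave": [0.2, 0.1], "speech": [0.5, 0.4]}
pos_table = [[0.0], [1.0], [0.5], [0.25]]
vecs = context_word_vectors(["Obama", "gave", "a", "speech"], 0, word_table, pos_table)
print(vecs)  # [[0.2, 0.1, 1.0], [0.0, 0.0, 0.5], [0.5, 0.4, 0.25]]
```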

As per claim 5, Sun, Yu and Manning teach the computer-implemented method of claim 4.
Sun further teaches
utilizing, using said at least one processor [page 1337, section 3.3, 2nd paragraph, CPU], a set of multiple window sizes to parameterize the CNN model operations on said encoded vectors to obtain resultant vectors [page 1335, Col. 1, 2nd paragraph, “the convolution layer is a list of linear layers whose parameters are shared in different filter windows as given in Figure 2. Formally, suppose the filter window size of each convolution layer is K, the output vector of a convolution layer is calculated as follow $O_{conv} = W_{conv}\, in_{conv} + b_{conv}$”], each window size corresponding to a convolution matrix of a predetermined dimensionality [page 1336, section 2.4, 1st paragraph, “set the dimension of word vector as 50, window size as 5”; page 1335, Fig. 3, section 2.2, 4th paragraph, “represent each input as the concatenation of mention vector and context vector, a bilinear layer is typically parameterized by a matrix $M \in \mathbb{R}^{N \times N}$, where N is the dimension of each input vector”].

As per claim 6, Sun, Yu and Manning teach the computer-implemented method of claim 5.
Sun further teaches
concatenating, using said at least one processor [page 1337, section 3.3, 2nd paragraph, CPU], the resultant vectors for each window size to obtain a distributed representation for encoded variable as a single concatenation vector [page 1335, Figs. 1-2, section 2.2, 2nd paragraph, “suppose the filter window size of each convolution layer is K … the output vector of a convolution layer is the concatenation of representations of K words in a filter window”].  
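The convolution operations discussed for claims 5 and 6 (a linear layer whose parameters are shared across filter windows, the K word vectors in each window forming its input, one such layer per window size, and the per-window-size results concatenated into a single vector) can be sketched as follows. This is a hedged illustration only: max pooling over window positions, the function names, and the toy parameters are assumptions of this sketch and are not taken from Sun's quoted text.

```python
def linear(W, b, x):
    """Compute W x + b for a weight matrix W (list of rows) and bias vector b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def convolve(word_vecs, K, W, b):
    """Slide a window of K word vectors; the same (W, b) is applied at every
    window position to the concatenated window (O = W in + b). Max pooling
    over positions is an assumption of this sketch."""
    outputs = []
    for start in range(len(word_vecs) - K + 1):
        in_conv = [x for vec in word_vecs[start:start + K] for x in vec]
        outputs.append(linear(W, b, in_conv))
    return [max(column) for column in zip(*outputs)]

def multi_window_representation(word_vecs, params):
    """Run one convolution per window size K and concatenate the pooled
    results into a single vector. `params` maps K -> (W, b)."""
    result = []
    for K in sorted(params):
        W, b = params[K]
        result.extend(convolve(word_vecs, K, W, b))
    return result

# Hypothetical 2-d word vectors and one-filter layers for window sizes 1 and 2.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
params = {
    1: ([[1.0, 0.0]], [0.0]),            # W is 1 x (1 * 2)
    2: ([[1.0, 1.0, 1.0, 1.0]], [0.0]),  # W is 1 x (2 * 2)
}
rep = multi_window_representation(word_vecs, params)
print(rep)  # [1.0, 3.0]
```

The length of the concatenated vector is the total number of filters across all window sizes, so adding a window size lengthens the representation rather than replacing it.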

As per claim 10, Sun teaches a computer program product comprising: 
a non-transitory computer-readable storage medium [page 1337, section 3.3, 2nd paragraph, memory], at least one processor [page 1337, section 3.3, 2nd paragraph, CPU],
identifying a set of one or more entity mentions in an electronic document [introduction, page 1333, 1st paragraph, “given a document and a mention which is usually a text span occurred in the document”], an entity mention to be linked to a page of a plurality of candidate pages in a knowledge base [introduction, page 1333, 1st paragraph, “entity disambiguation targets at mapping the mention to an entity from reference knowledge base”]; 
representing each entity mention as a plurality of word sequences capturing a context or topic of the entity mention at multiple granularities in the electronic document [Fig. 1, mention representation; page 1334, section 2.1, “neural network for entity disambiguation is given in Figure 1. As is shown, the input includes three parts, namely a mention, the context of mention and a candidate entity from reference knowledge base”; page 1333, introduction, 1st paragraph, “Given a document and a mention which is usually a text span occurred in the document … For example, given a text span “President Obama” in the document “After campaigning on the promise of health care reform, President Obama gave a speech in March 2010 in Pennsylvania.” as input, the purpose of entity disambiguation is to link the mention “President Obama” in this context to an entity”]; 
for each entity mention in the electronic document, identify a set of target candidate pages in the knowledge base that potentially refer to the entity mention in the document [page 1334, section 2.1, “neural network for entity disambiguation is given in Figure 1. As is shown, the input includes three parts, namely a mention, the context of mention and a candidate entity from reference knowledge base”; page 1334, Col. 1, 1st paragraph, “comparing the similarities between an input (mention, context) pair and candidate entities”]; 
applying a scoring function to obtain a relevance score for each said target candidate page of the corpus for each mention [page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$   (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”], said applying a scoring function comprising: 
running a CNN model using the plurality of word sequences of the entity mention and a candidate target page of the knowledge base to compute a first score representing a local similarity score between each entity mention and candidate target page [Fig. 1, page 1334, Col. 1, 1st paragraph, “We cast entity disambiguation as a ranking task by comparing the similarities between an input (mention, context) pair and candidate entities. Specifically, we embed mention, context and entity in continuous vector space to capture their semantic representations. The variable-sized context are modeled with convolutional neural networks”; section 2.1, “learn the continuous representations of context words with convolution neural networks, and produce its semantic composition with mention using a neural tensor network … learn the continuous representation of a candidate entity. We then apply the learned representations of context, mention and entity for calculating the similarity between a candidate entity and a given (mention, context) pair”]; and 
ranking each of said scores for the entity mention [page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$   (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result”]; and 
providing a link for linking the entity mention to the target candidates page of the knowledge base based on a highest score for the entity mention [page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$   (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”].  
Sun does not teach
a non-transitory computer-readable storage medium having computer readable program instructions embodied therewith, the computer readable program instructions executable by at least one processor to cause a computer to perform a computer-implemented method comprising: 
running a RNN model that simultaneously models an interdependence among the other entity mentions in the document and other candidate pages to compute a second score; 
adding the first and second score to obtain a relevance score for the entity mention; 
ranking each of said obtained relevance scores for the entity mention (emphasis added); and 
providing a link for linking the entity mention to the target candidates page of the knowledge base based on a highest relevance score for the entity mention (emphasis added).
Yu teaches 
a non-transitory computer-readable storage medium having computer readable program instructions embodied therewith, the computer readable program instructions executable by at least one processor to cause a computer to perform a computer-implemented method comprising [paragraph 0060, “The one or more memory devices 614 can store information accessible by the one or more processors 612, including computer-readable instructions 616 that can be executed by the one or more processors 612. The instructions 616 can be any set of instructions that when executed by the one or more processors 612, cause the one or more processors 612 to perform operations”]: 
running a RNN model that simultaneously models an interdependence among the other entity mentions in the document and other candidate pages to compute a second score [paragraphs 0022-0024, “The one or more images that depict the entity and the candidate entity profiles can be provided as input to a machine learning model … the machine learning model can include various other suitable models, such as a recurrent neural network (e.g. a long short-term memory (LSTM) network, and/or a convolutional LSTM network) … the CNN can be configured to extract features from the images, while the LSTM can be configured to obtain text-related information from the candidate entity profile(s). The CNN may be further configured to provide data indicative of the extracted features to the LSTM. The LSTM may model at least a portion of the structured information from a candidate entity profile as a sequence of characters, such that a match score between the extracted features of the one or more images and the data from the candidate entity profile can be determined”; paragraph 0027 further recites “the machine learning model can include multiple LSTM networks along with a single CNN … The multiple LSTM networks can be configured to simultaneously determine a match score between an entity depicted in an image and multiple entity profiles. In this manner, data indicative of a different entity profile can be provided to each LSTM network. The CNN can extract features from one or more images depicting an entity and provide data indicative of the extracted features to each LSTM network. Each LSTM network may then be configured to determine a match score between the respective entity profiles and the entity depicted in the image(s) in a parallel manner”; Figs. 1 and 3 disclose the process of combining the CNN network and the RNN network to determine the match score; Fig. 4, paragraphs 0040-0041 disclose how the match scores are calculated and selected, “The machine learning model can be configured to determine match scores between entity 302 and the each candidate entity profile 304 … a match score of 0.727 between entity 302 and the candidate entity profile associated with Mr. Optical Inc. As shown, such match score can be indicative of a "match" between entity 302 and the candidate entity profile associated with Mr. Optical Inc. (highest match score)”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of entity linking of Sun to include the process of running an RNN model to compute a second score, as taught by Yu. Doing so would help generate a confidence value specifying the likelihood that the entity associated with image data is the same entity as the entity associated with entity data, and would provide speed increases in training the machine learning model (Yu, 0036 and 0039).
Sun and Yu do not teach
adding the first and second score to obtain a relevance score for the entity mention; 
ranking each of said obtained relevance scores for the entity mention (emphasis added); and 
providing a link for linking the entity mention to the target candidates page of the knowledge base based on a highest relevance score for the entity mention (emphasis added).
Manning teaches
adding the first and second score to obtain a relevance score for the entity mention [Fig. 2, paragraphs 0037-0040, “the hybrid recommender system 200 can be trained to determine each of the weight values via a technique such as a technique used in neural networks … The hybrid recommender system 200 can include a weight 206-1 and one or more other weights 206-2, ... 206-N. Each of the weights 206-1, 206-2, ...  206-N can be associated with a corresponding one of the recommender systems 202-1, 202-2, ... 202-N … the combiner 204 can produce the combined output by numerically combining the weighted outputs from the recommender systems 202-1, 202-2, ... 202-N”; paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN …”]; 
ranking each of said obtained relevance scores for the entity mention [Fig. 2, paragraphs 0037-0040, “the hybrid recommender system 200 can be trained to determine each of the weight values via a technique such as a technique used in neural networks … The hybrid recommender system 200 can include a weight 206-1 and one or more other weights 206-2, ... 206-N. Each of the weights 206-1, 206-2, ... 206-N can be associated with a corresponding one of the recommender systems 202-1, 202-2, ... 202-N … the combiner 204 can produce the combined output by numerically combining the weighted outputs from the recommender systems 202-1, 202-2, ... 202-N”; paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN …”; although Sun (as modified) teaches that the CNN and RNN networks are combined to generate a match/relevance score [Yu, Fig. 3 shows that the output of the CNN is input to the RNN network and that the output from the RNN (LSTM) is then used to generate the match score], Sun (as modified) is silent as to combining the scores/weights output from the CNN and RNN networks (the first and second scores) to generate the relevance score; Manning teaches that the relevance score is obtained by combining the scores/weights of the recommender systems 202-1, 202-2, ... 202-N, where the weights are determined via a technique used in neural networks, which include CNN and RNN networks; and Sun (as modified) further teaches how the scores are ranked [Sun, page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$ (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result”]; therefore, the combination of Sun (as modified) and Manning reads on the claim limitation]; and 
providing a link for linking the entity mention to the target candidates page of the knowledge base based on a highest relevance score for the entity mention [Fig. 2, paragraphs 0037-0040, “the hybrid recommender system 200 can be trained to determine each of the weight values via a technique such as a technique used in neural networks … The hybrid recommender system 200 can include a weight 206-1 and one or more other weights 206-2, ... 206-N. Each of the weights 206-1, 206-2, ... 206-N can be associated with a corresponding one of the recommender systems 202-1, 202-2, ... 202-N … the combiner 204 can produce the combined output by numerically combining the weighted outputs from the recommender systems 202-1, 202-2, ... 202-N”; paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN …”; although Sun (as modified) teaches that the CNN and RNN networks are combined to generate a match/relevance score [Yu, Fig. 3 shows that the output of the CNN is input to the RNN network and that the output from the RNN (LSTM) is then used to generate the match score], Sun (as modified) is silent as to combining the scores/weights output from the CNN and RNN networks (the first and second scores) to generate the relevance score; Manning teaches that the relevance score is obtained by combining the scores/weights of the recommender systems 202-1, 202-2, ... 202-N, where the weights are determined via a technique used in neural networks, which include CNN and RNN networks; and Sun (as modified) further teaches linking the entity mention to the target candidate based on the highest score for the entity mention [Sun, page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$ (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”]; therefore, the combination of Sun (as modified) and Manning reads on the claim limitation].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of entity linking of Sun to include the process of adding the first and second score to obtain a relevance score and ranking each of said obtained relevance scores, as taught by Manning. Doing so would help generate a master ranking via a weighted average of ranks of the RNN and CNN models (Manning, 0082).
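For illustration only, the scoring arrangement addressed above can be sketched in Python under stated assumptions: the first (local) score is Sun's cosine similarity between a candidate entity vector $v_e$ and a mention-context vector $v_{mc}$ (equation (3)); the second (global) score is supplied directly as an input, standing in for the RNN output of Yu; the two scores are added (with optional weights w1 and w2, loosely in the manner of Manning's combiner 204) to form the relevance score; the candidates are ranked; and the mention is linked to the highest-ranked candidate. The function names, vectors, weights, and scores are hypothetical and are not drawn from any reference.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (Sun, eq. (3))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def link_mention(v_mc, candidates, second_scores, w1=1.0, w2=1.0):
    """Rank candidate pages for one mention; return (best_page, ranking).

    v_mc: hypothetical mention-context vector.
    candidates: dict of page title -> hypothetical entity vector v_e.
    second_scores: dict of page title -> hypothetical global (RNN) score.
    w1, w2: combiner weights; w1 = w2 = 1 is the plain addition recited in
            the claim, other values illustrate a weighted combination.
    """
    relevance = {}
    for page, v_e in candidates.items():
        first = cosine(v_e, v_mc)  # local similarity score
        relevance[page] = w1 * first + w2 * second_scores[page]
    # Rank by relevance score, highest first; link to the top candidate.
    ranking = sorted(relevance.items(), key=lambda kv: kv[1], reverse=True)
    return ranking[0][0], ranking

# Toy example with made-up vectors and scores.
v_mc = [1.0, 0.0]
candidates = {"Barack Obama": [1.0, 0.0], "Michelle Obama": [0.0, 1.0]}
second = {"Barack Obama": 0.2, "Michelle Obama": 0.5}
best, ranking = link_mention(v_mc, candidates, second)
print(best)  # Barack Obama
```

With w1 = w2 = 1 the relevance score is the plain sum recited in the claim; lowering w1 to 0.1 on the same toy data reverses the ranking, which shows how a weighted combiner can change the linked page.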

As per claim 11, Sun, Yu and Manning teach the computer program product as claimed in claim 10.
Sun further teaches
said plurality of word sequences of said entity mention comprises: a data triplet comprising a surface string, a context and the entire document having the entity mention [page 1334, Fig. 1, “the mention “President Obama” comes from an original document “After campaigning on the promise of health care reform, President Obama gave a speech in March 2010 in Pennsylvania””, where the context is “After campaigning on the promise of health care reform, gave a speech in March 2010 in Pennsylvania”; introduction, page 1333, 2nd paragraph, “Representative mention features include document surface feature like lexical and part-of-speech tags of context words”] and, said page having an associated title and content potentially referring to said entity mention [Fig. 1 shows an entity (page) having the title “Barack Obama”; page 1335, Col. 2, last paragraph, “We model the semantics of an entity in knowledge base from two aspects: entity surface words and entity class. For example, the surface words of entity Barack Obama are barack and obama. Entity class of an entity is a word or a phrase provided in infobox of reference knowledge base, which indicates the category information of the entity. For example, the class of Barack Obama is president of the united states”; page 1333, introduction, 2nd paragraph, “compare the context of a mention with the text associated with a candidate entity (e.g. the text in the corresponding page in reference KB)”; page 1337, Col. 2, last paragraph, “a mention $m_i$ and an entity title $t_i$”].  

As per claim 12, Sun, Yu and Manning teach the computer program product as claimed in claim 10.
Sun further teaches
preserving the order of the entity mentions from the beginning to the end of an input document [page 1335, context modeling, 1st paragraph, “representation of a context is also influenced by the distance between a context word and the mention, this is based on the consideration that a closer context word might be more informative than a farther one for disambiguating the mention, the vector of each context word is made up of two parts: a word embedding and a position embedding, the position of a context word is its distance to the mention in a given piece of text”; page 1337, 4th paragraph, “an improvement is further achieved by incorporating the position information of context words, a closer context word might be more informative than a farther one for disambiguating the mention”; because the distance between a context word and the mention in the document is used, in part, to disambiguate the entity mention, it is understood that the order of the mentions and the context words is preserved].
Yu further teaches 
said running said RNN model further comprises: computing the second scores for all the target candidates pages of all the entity mentions in each document simultaneously [Fig. 3, paragraphs 0038-0039, “LSTM networks 204-208 can receive entity data 212, 214, and 216 as input. In particular, LSTM 204 can receive entity 1 data 212, LSTM 206 can receive entity 2 data 214, and LSTM 208 can receive entity 3 data 216 … each LSTM 204-208 can receive the same feature parameters … LSTMs 204-208 can then be configured to obtain text-related data associated with the respective entity data 212-216, and to provide the data indicative of the extracted features and the text-related data to the respective classifiers 218, 220, and 222 to determine match scores between the entity associated with image data 210 and the respective entity profiles associated with entity data 212-216 in a parallel manner”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of entity linking of Sun to include the process of computing the second scores for all the target candidate pages of all the entity mentions in each document simultaneously, as taught by Yu. Doing so would help provide speed increases in training the machine learning model (Yu, 0039).
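For illustration only, the parallel scoring described in Yu paragraphs 0038-0039 (the same extracted features provided to several LSTMs, each paired with one entity profile, with match scores determined in parallel) can be sketched as follows. The scoring function is a hypothetical token-overlap stand-in for an LSTM plus classifier, and the thread pool only illustrates concurrent evaluation; it is not Yu's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def match_score(features, profile):
    """Hypothetical stand-in for one LSTM + classifier branch.

    Returns the fraction of profile tokens that also appear in the shared
    feature set.
    """
    tokens = set(profile.lower().split())
    return len(tokens & features) / max(len(tokens), 1)

def score_all(features, profiles):
    """Score every candidate profile against the same features concurrently.

    Each task receives the same extracted features plus one profile, as in
    the arrangement of Yu's LSTMs 204-208; results keep the input order.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda p: match_score(features, p), profiles))

features = {"optical", "eyewear"}                          # made-up features
profiles = ["Mr. Optical Inc eyewear", "Acme Bakery Co"]   # made-up profiles
scores = score_all(features, profiles)
print(scores)  # [0.5, 0.0]
```

Because `pool.map` returns results in input order, each score stays paired with its candidate profile even though the profiles are scored concurrently.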

As per claim 13, Sun, Yu and Manning teach the computer program product as claimed in claim 10.
Sun further teaches
encoding multiple variables of the word sequences, or target candidate pages, or both [abstract, “the model takes consideration of the semantic representations of mention, context and entity, encodes them in continuous vector space”]; 
transforming each encoded variable into a corresponding encoded vector using a word embedding table [page 1335, section 2.2, 2nd paragraph, “the vector of each context word is made up of two parts: a word embedding $e_w = L_w i_w$ and a position embedding $e_p = L_p i_p$, where $L_w$ and $L_p$ are the lookup tables of words and positions, respectively”]; and 
running CNN model operations on said encoded vectors to obtain a distributed representation for the encoded vectors [page 1335, Figs. 1-2, section 2.2, 2nd paragraph disclose context modeling with convolutional neural network, the input of context convolution includes word embedding and position embedding; using convolutional neural network to produce a fixed-length vector for a context].  
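For illustration only, Sun's context modeling as cited for claim 13 (word and position embeddings looked up from tables $L_w$ and $L_p$, concatenated, and passed through a convolution layer computing $O_{conv} = W_{conv} \, in_{conv} + b_{conv}$ over each filter window of K words) can be sketched as below. The toy tables, filter weights, and the use of max pooling to obtain the fixed-length vector are assumptions; the cited passage states only that a fixed-length context vector is produced.

```python
def embed(words, positions, L_w, L_p):
    """Look up e_w = L_w[word] and e_p = L_p[position] and concatenate them.

    The lookup tables are plain dicts here; positions are each context
    word's distance to the mention.
    """
    return [L_w[w] + L_p[p] for w, p in zip(words, positions)]

def conv_max_pool(vectors, W, b, K):
    """Convolution over windows of K vectors, then max pooling (assumed).

    For each window, in_conv is the concatenation of the K vectors and
    O_conv = W in_conv + b; taking the max over all windows yields a
    fixed-length vector with one entry per row of W.
    """
    outputs = []
    for start in range(len(vectors) - K + 1):
        in_conv = [x for vec in vectors[start:start + K] for x in vec]
        outputs.append([sum(w * x for w, x in zip(row, in_conv)) + b_i
                        for row, b_i in zip(W, b)])
    return [max(column) for column in zip(*outputs)]

# Toy 2-d word embeddings and 1-d position embeddings (made-up values).
L_w = {"health": [1.0, 0.0], "care": [0.0, 1.0], "reform": [1.0, 1.0]}
L_p = {1: [0.1], 2: [0.2], 3: [0.3]}
vectors = embed(["health", "care", "reform"], [3, 2, 1], L_w, L_p)
W = [[1, 0, 0, 1, 0, 0],   # two filters over a window of K = 2 three-d vectors
     [0, 1, 0, 0, 1, 0]]
b = [0.0, 0.0]
context_vector = conv_max_pool(vectors, W, b, K=2)
print(context_vector)  # [1.0, 2.0]
```

The pooled output has one entry per filter, so a longer context produces a vector of the same length.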

As per claim 14, Sun, Yu and Manning teach the computer program product as claimed in claim 13.
Sun further teaches
utilizing, using said at least one processor [page 1337, section 3.3, 2nd paragraph, CPU], a set of multiple window sizes to parameterize the CNN model operations on said encoded vectors to obtain resultant vectors [page 1335, Col. 1, 2nd paragraph, “the convolution layer is a list of linear layers whose parameters are shared in different filter windows as given in Figure 2. Formally, suppose the filter window size of each convolution layer is K, the output vector of a convolution layer is calculated as follow $O_{conv} = W_{conv} \, in_{conv} + b_{conv}$”], each window size corresponding to a convolution matrix of a predetermined dimensionality [page 1336, section 2.4, 1st paragraph, “set the dimension of word vector as 50, window size as 5”; page 1335, Fig. 3, section 2.2, 4th paragraph, “represent each input as the concatenation of mention vector and context vector, a bilinear layer is typically parameterized by a matrix $M \in \mathbb{R}^{N \times N}$, where N is the dimension of each input vector”].  
concatenating the resultant vectors for each window size to obtain a distributed representation for encoded variable as a single concatenation vector [page 1335, Figs. 1-2, section 2.2, 2nd paragraph, “suppose the filter window size of each convolution layer is K … the output vector of a convolution layer is the concatenation of representations of K words in a filter window”].  
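For illustration only, the claimed use of multiple window sizes with concatenation of the resulting vectors can be sketched as follows. Sun, as cited, reports a single window size of 5; the set of window sizes, one weight matrix per window size, the filter values, and max pooling are all hypothetical choices made for this sketch.

```python
def conv_max_pool(vectors, W, b, K):
    """One convolution layer with window size K followed by max pooling."""
    outputs = []
    for start in range(len(vectors) - K + 1):
        in_conv = [x for vec in vectors[start:start + K] for x in vec]
        outputs.append([sum(w * x for w, x in zip(row, in_conv)) + b_i
                        for row, b_i in zip(W, b)])
    return [max(column) for column in zip(*outputs)]

def multi_window_representation(vectors, params):
    """Run one convolution per window size and concatenate the results.

    params maps window size K -> (W_K, b_K); each W_K has K * d columns,
    where d is the dimensionality of each input vector.
    """
    representation = []
    for K in sorted(params):
        W_K, b_K = params[K]
        representation.extend(conv_max_pool(vectors, W_K, b_K, K))
    return representation

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up encoded vectors, d = 2
params = {
    1: ([[1, 0]], [0.0]),                        # one filter for K = 1
    2: ([[0, 0, 1, 1]], [0.0]),                  # one filter for K = 2
    3: ([[1, 1, 1, 1, 1, 1]], [0.5]),            # one filter for K = 3
}
print(multi_window_representation(vectors, params))  # [1.0, 2.0, 4.5]
```

The length of the single concatenation vector equals the number of window sizes times the number of filters per size (here 3 × 1), independent of the number of input words.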

As per claim 17, Sun teaches a computer system comprising: 
at least one processor [page 1337, section 3.3, 2nd paragraph, CPU]; 
a memory [page 1337, section 3.3, 2nd paragraph, memory];
identify a set of one or more entity mentions in an electronic document [introduction, page 1333, 1st paragraph, “given a document and a mention which is usually a text span occurred in the document”], an entity mention to be linked to a page of a plurality of candidate pages in a knowledge base [introduction, page 1333, 1st paragraph, “entity disambiguation targets at mapping the mention to an entity from reference knowledge base”]; 
represent each entity mention as a plurality of word sequences capturing a context or topic of the entity mention at multiple granularities in the electronic document [Fig. 1, mention representation; page 1334, section 2.1, “neural network for entity disambiguation is given in Figure 1. As is shown, the input includes three parts, namely a mention, the context of mention and a candidate entity from reference knowledge base”; page 1333, introduction, 1st paragraph, “Given a document and a mention which is usually a text span occurred in the document … For example, given a text span “President Obama” in the document “After campaigning on the promise of health care reform, President Obama gave a speech in March 2010 in Pennsylvania.” as input, the purpose of entity disambiguation is to link the mention “President Obama” in this context to an entity”]; 
for each entity mention in the electronic document, identify a set of target candidate pages in the knowledge base that potentially refer to the entity mention in the document [page 1334, section 2.1, “neural network for entity disambiguation is given in Figure 1. As is shown, the input includes three parts, namely a mention, the context of mention and a candidate entity from reference knowledge base”; page 1334, Col. 1, 1st paragraph, “comparing the similarities between an input (mention, context) pair and candidate entities”]; 
apply a scoring function to obtain a relevance score for each said target candidate page of the corpus for each mention [page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$ (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”], said applying a scoring function comprising: 
running a CNN model using the plurality of word sequences of the entity mention and a candidate target page of the knowledge base to compute a first score representing a local similarity score between each entity mention and candidate target page [Fig. 1, page 1334, Col. 1, 1st paragraph, “We cast entity disambiguation as a ranking task by comparing the similarities between an input (mention, context) pair and candidate entities. Specifically, we embed mention, context and entity in continuous vector space to capture their semantic representations. The variable-sized context are modeled with convolutional neural networks”; section 2.1, “learn the continuous representations of context words with convolution neural networks, and produce its semantic composition with mention using a neural tensor network … learn the continuous representation of a candidate entity. We then apply the learned representations of context, mention and entity for calculating the similarity between a candidate entity and a given (mention, context) pair”]; and 
rank each of said scores for the entity mention [page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$ (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result”]; and 
provide a link for linking the entity mention to the target candidates page of the knowledge base based on a highest score for the entity mention [page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$ (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”].  
Sun does not teach
a memory storing instructions to be run at said at least one processor; said instructions configuring said at least one processor to perform a method to: 
running a RNN model that simultaneously models an interdependence among the other entity mentions in the document and other candidate pages to compute a second score; 
add the first and second score to obtain a relevance score for the entity mention; 
rank each of said obtained relevance scores for the entity mention (emphasis added); and 
provide a link for linking the entity mention to the target candidates page of the knowledge base based on a highest relevance score for the entity mention (emphasis added).
Yu teaches
 a memory storing instructions to be run at said at least one processor; said instructions configuring said at least one processor to perform a method to [paragraph 0060, “The one or more memory devices 614 can store information accessible by the one or more processors 612, including computer-readable instructions 616 that can be executed by the one or more processors 612. The instructions 616 can be any set of instructions that when executed by the one or more processors 612, cause the one or more processors 612 to perform operations”]: 
running a RNN model that simultaneously models an interdependence among the other entity mentions in the document and other candidate pages to compute a second score [paragraphs 0022-0024, “The one or more images that depict the entity and the candidate entity profiles can be provided as input to a machine learning model … the machine learning model can include various other suitable models, such as a recurrent neural network (e.g. a long short-term memory (LSTM) network, and/or a convolutional LSTM network) … the CNN can be configured to extract features from the images, while the LSTM can be configured to obtain text-related information from the candidate entity profile(s). The CNN may be further configured to provide data indicative of the extracted features to the LSTM. The LSTM may model at least a portion of the structured information from a candidate entity profile as a sequence of characters, such that a match score between the extracted features of the one or more images and the data from the candidate entity profile can be determined”; paragraph 0027 further recites “the machine learning model can include multiple LSTM networks along with a single CNN … The multiple LSTM networks can be configured to simultaneously determine a match score between an entity depicted in an image and multiple entity profiles. In this manner, data indicative of a different entity profile can be provided to each LSTM network. The CNN can extract features from one or more images depicting an entity and provide data indicative of the extracted features to each LSTM network. Each LSTM network may then be configured to determine a match score between the respective entity profiles and the entity depicted in the image(s) in a parallel manner”; Figs. 1 and 3 disclose the process of combining the CNN network and the RNN network to determine the match score; Fig. 4 and paragraphs 0040-0041 disclose how the match scores are calculated and selected, “The machine learning model can be configured to determine match scores between entity 302 and the each candidate entity profile 304 … a match score of 0.727 between entity 302 and the candidate entity profile associated with Mr. Optical Inc. As shown, such match score can be indicative of a "match" between entity 302 and the candidate entity profile associated with Mr. Optical Inc. (highest match score)”]; 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of entity linking of Sun to include the process of running an RNN model to compute a second score, as taught by Yu. Doing so would help generate a confidence value specifying the likelihood that the entity associated with image data is the same entity as the entity associated with entity data, and would provide speed increases in training the machine learning model (Yu, 0036 and 0039).
Sun and Yu do not teach
add the first and second score to obtain a relevance score for the entity mention; 
rank each of said obtained relevance scores for the entity mention (emphasis added); and 
provide a link for linking the entity mention to the target candidates page of the knowledge base based on a highest relevance score for the entity mention (emphasis added).
Manning teaches
add the first and second score to obtain a relevance score for the entity mention [Fig. 2, paragraphs 0037-0040, “the hybrid recommender system 200 can be trained to determine each of the weight values via a technique such as a technique used in neural networks … The hybrid recommender system 200 can include a weight 206-1 and one or more other weights 206-2, ... 206-N. Each of the weights 206-1, 206-2, ...  206-N can be associated with a corresponding one of the recommender systems 202-1, 202-2, ... 202-N … the combiner 204 can produce the combined output by numerically combining the weighted outputs from the recommender systems 202-1, 202-2, ... 202-N”; paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN …”]; 
rank each of said obtained relevance scores for the entity mention [Fig. 2, paragraphs 0037-0040, “the hybrid recommender system 200 can be trained to determine each of the weight values via a technique such as a technique used in neural networks … The hybrid recommender system 200 can include a weight 206-1 and one or more other weights 206-2, ... 206-N. Each of the weights 206-1, 206-2, ... 206-N can be associated with a corresponding one of the recommender systems 202-1, 202-2, ... 202-N … the combiner 204 can produce the combined output by numerically combining the weighted outputs from the recommender systems 202-1, 202-2, ... 202-N”; paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN …”; although Sun (as modified) teaches that the CNN and RNN networks are combined to generate a match/relevance score [Yu, Fig. 3 shows that the output of the CNN is input to the RNN network and that the output from the RNN (LSTM) is then used to generate the match score], Sun (as modified) is silent as to combining the scores/weights output from the CNN and RNN networks (the first and second scores) to generate the relevance score; Manning teaches that the relevance score is obtained by combining the scores/weights of the recommender systems 202-1, 202-2, ... 202-N, where the weights are determined via a technique used in neural networks, which include CNN and RNN networks; and Sun (as modified) further teaches how the scores are ranked [Sun, page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$ (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result”]; therefore, the combination of Sun (as modified) and Manning reads on the claim limitation]; and 
provide a link for linking the entity mention to the target candidates page of the knowledge base based on a highest relevance score for the entity mention [Fig. 2, paragraphs 0037-0040, “the hybrid recommender system 200 can be trained to determine each of the weight values via a technique such as a technique used in neural networks … The hybrid recommender system 200 can include a weight 206-1 and one or more other weights 206-2, ... 206-N. Each of the weights 206-1, 206-2, ... 206-N can be associated with a corresponding one of the recommender systems 202-1, 202-2, ... 202-N … the combiner 204 can produce the combined output by numerically combining the weighted outputs from the recommender systems 202-1, 202-2, ... 202-N”; paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN …”; although Sun (as modified) teaches that the CNN and RNN networks are combined to generate a match/relevance score [Yu, Fig. 3 shows that the output of the CNN is input to the RNN network and that the output from the RNN (LSTM) is then used to generate the match score], Sun (as modified) is silent as to combining the scores/weights output from the CNN and RNN networks (the first and second scores) to generate the relevance score; Manning teaches that the relevance score is obtained by combining the scores/weights of the recommender systems 202-1, 202-2, ... 202-N, where the weights are determined via a technique used in neural networks, which include CNN and RNN networks; and Sun (as modified) further teaches linking the entity mention to the target candidate based on the highest score for the entity mention [Sun, page 1336, section 2.3, “Given the representation of a candidate entity $v_e$ and the representation of a mention context pair $v_{mc}$, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely
$\mathrm{sim}(e, mc) = \mathrm{cosine}(v_e, v_{mc})$ (3)
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”]; therefore, the combination of Sun (as modified) and Manning reads on the claim limitation].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of entity linking of Sun to include the process of adding the first and second score to obtain a relevance score and ranking each of said obtained relevance scores, as taught by Manning. Doing so would help generate a master ranking via a weighted average of ranks of the RNN and CNN models (Manning, 0082).

As per claim 18, Sun, Yu and Manning teach the computer system of Claim 17.
Sun further teaches
encoding multiple variables of the word sequences, or target candidate pages, or both [abstract, “the model takes consideration of the semantic representations of mention, context and entity, encodes them in continuous vector space”]; 
transforming each encoded variable into a corresponding encoded vector using a word embedding table [page 1335, section 2.2, 2nd paragraph, “the vector of each context word is made up of two parts: a word embedding $e_w = L_w i_w$ and a position embedding $e_p = L_p i_p$, where $L_w$ and $L_p$ are the lookup tables of words and positions, respectively”]; and 
running CNN model operations on said encoded vectors to obtain a distributed representation for the encoded vectors [page 1335, Figs. 1-2, section 2.2, 2nd paragraph disclose context modeling with convolutional neural network, the input of context convolution includes word embedding and position embedding; using convolutional neural network to produce a fixed-length vector for a context].  

As per claim 19, Sun, Yu and Manning teach the computer system of Claim 17.
Sun further teaches
utilizing, using said at least one processor [page 1337, section 3.3, 2nd paragraph, CPU], a set of multiple window sizes to parameterize the CNN model operations on said encoded vectors to obtain resultant vectors [page 1335, Col. 1, 2nd paragraph, “the convolution layer is a list of linear layers whose parameters are shared in different filter windows as given in Figure 2. Formally, suppose the filter window size of each convolution layer is K, the output vector of a convolution layer is calculated as follow $O_{conv} = W_{conv} \, in_{conv} + b_{conv}$”], each window size corresponding to a convolution matrix of a predetermined dimensionality [page 1336, section 2.4, 1st paragraph, “set the dimension of word vector as 50, window size as 5”; page 1335, Fig. 3, section 2.2, 4th paragraph, “represent each input as the concatenation of mention vector and context vector, a bilinear layer is typically parameterized by a matrix $M \in \mathbb{R}^{N \times N}$, where N is the dimension of each input vector”].  
concatenating the resultant vectors for each window size to obtain a distributed representation for encoded variable as a single concatenation vector [page 1335, Figs. 1-2, section 2.2, 2nd paragraph, “suppose the filter window size of each convolution layer is K … the output vector of a convolution layer is the concatenation of representations of K words in a filter window”].  

Claims 9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sun et al. in view of Yu et al. in view of Manning et al. and further in view of Florance et al. (US Pub. 2017/0169493 – Applicant provided reference).
As per claim 9, Sun, Yu and Manning teach the computer-implemented method of claim 4.
Yu further teaches
receiving, at said at least one processor running said RNN model operations for a current entity mention, said distributed representations for the encoded vectors [paragraphs 0038-0040, “Machine learning model 200 includes a CNN 202 and LSTM networks 204, 206, and 208 … receive image data 210 as input … image 300 can be encoded, for instance, as a representation of pixel values, and provided as input to a machine learning model”];  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the method of entity linking of Sun to include the process of receiving, at the processor running said RNN model operations, the distributed representations for the encoded vectors, as taught by Yu. Doing so would help generate a confidence value specifying the likelihood that the entity associated with image data is the same entity as the entity associated with entity data (Yu, 0036).
Sun, Yu and Manning do not teach
accumulating information using said RNN model operations of previous entity mentions and target candidate pages, and providing them as the global constraints for a linking process of a current entity mention.  
Florance teaches
accumulating information using said RNN model operations of previous entity mentions and target candidate pages [paragraph 0005, “for the received query, determine historical value of the attribute that was associated with the particular candidate entity at a past time”], and providing them as the global constraints for a linking process of a current entity mention [paragraph 0005, “receiving a query, identifying candidate entities that are identified as responsive to the query, determining, for a particular candidate entity, (i) a current value of an attribute that is currently associated with the particular candidate entity and, (ii) a historical value of the attribute that was associated with the particular candidate entity at a past time, determining that the current value deviates more than a threshold amount from the historical value, and in response to determining that the current value deviates more than a threshold amount from the historical value, adjusting a ranking score for the particular candidate entity, and ranking the particular candidate entity among the identified candidate entities based at least on the ranking score for the particular candidate entity”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of entity linking of Sun to include the process of accumulating information of previous entity mentions and target candidate pages, and providing them as the global constraints for a linking process of a current entity mention of Florance. Doing so would help adjust a ranking score for the particular candidate entity (Florance, 0005).
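For illustration only, the ranking adjustment quoted from Florance paragraph 0005 can be sketched in Python. This is a hypothetical sketch, not Florance's implementation: the Candidate class, the use of an absolute difference as the measure of deviation, and the additive score adjustment are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    score: float             # initial ranking score
    current_value: float     # current value of the attribute
    historical_value: float  # value of the same attribute at a past time

def rank_candidates(candidates, threshold, adjustment):
    """Rank candidates by score after adjusting the score of every candidate
    whose current attribute value deviates from its historical value by more
    than `threshold`."""
    scored = []
    for c in candidates:
        score = c.score
        if abs(c.current_value - c.historical_value) > threshold:
            score += adjustment  # hypothetical additive adjustment
        scored.append((score, c.name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]

ranking = rank_candidates(
    [Candidate("A", 0.6, 10.0, 10.0), Candidate("B", 0.5, 30.0, 10.0)],
    threshold=5.0,
    adjustment=0.2,
)
print(ranking)  # ['B', 'A']
```

Candidate B starts below A, but its attribute value deviates from its historical value by more than the threshold, so its adjusted score moves it ahead of A.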

As per claim 16, Sun, Yu and Manning teach the computer program product as claimed in claim 13.
Yu further teaches
receiving, from running said RNN model operations for a current entity mention, said distributed representations for the encoded vectors [paragraphs 0038-0040, “Machine learning model 200 includes a CNN 202 and LSTM networks 204, 206, and 208 … receive image data 210 as input … image 300 can be encoded, for instance, as a representation of pixel values, and provided as input to a machine learning model”];  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of entity linking of Sun to include the process of receiving, from running said RNN model operations for a current entity mention, the distributed representations for the encoded vectors of Yu. Doing so would help generate a confidence value specifying the likelihood that the entity associated with image data is the same entity as the entity associated with entity data (Yu, 0036).
Sun, Yu and Manning do not teach
accumulating information using said RNN model operations of previous entity mentions and target candidate pages, and providing them as the global constraints for a linking process of a current entity mention.  
Florance teaches
accumulating information using said RNN model operations of previous entity mentions and target candidate pages [paragraph 0005, “for the received query, determine historical value of the attribute that was associated with the particular candidate entity at a past time”], and providing them as the global constraints for a linking process of a current entity mention [paragraph 0005, “receiving a query, identifying candidate entities that are identified as responsive to the query, determining, for a particular candidate entity, (i) a current value of an attribute that is currently associated with the particular candidate entity and, (ii) a historical value of the attribute that was associated with the particular candidate entity at a past time, determining that the current value deviates more than a threshold amount from the historical value, and in response to determining that the current value deviates more than a threshold amount from the historical value, adjusting a ranking score for the particular candidate entity, and ranking the particular candidate entity among the identified candidate entities based at least on the ranking score for the particular candidate entity”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of entity linking of Sun to include the process of accumulating information of previous entity mentions and target candidate pages, and providing them as the global constraints for a linking process of a current entity mention of Florance. Doing so would help adjust a ranking score for the particular candidate entity (Florance, 0005).

Allowable Subject Matter
Claims 7-8, 15 and 20 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and the objections above were overcome.
The following is a statement of reasons for the indication of allowable subject matter: 
Claim 7 would be allowable because it recites the computer-implemented method of claim 3, further comprising: 
providing a forward linking of the entity mentions to identified target candidate pages and ranking said forward linked target candidate pages based on their computed target candidate relevance scores; and 
traversing, using said RNN Model operations, the entity mentions from an end to the beginning of the electronic to provide a backward linking of the entity mentions to identified target candidates and ranking said backward linked target candidate pages based on their computed relevance scores.  
The closest references found are as follows:
Sun et al. (Modeling Mention, Context and Entity with Neural Networks for Entity Disambiguation) discloses a neural network for entity disambiguation in which the system calculates the similarity between a given entity mention and candidate entities in a knowledge base and selects the closest one as the output [page 1336, section 2.3, “Given the representation of a candidate entity v_e and the representation of a mention context pair v_mc, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely 
sim(e, mc) = cosine(v_e, v_mc)    (3) 
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”].
Yu et al. (US Pub. 2017/0286805) discloses a method of using a recurrent neural network (RNN) model to calculate a score indicating a match between the entity depicted in the input and the data from the candidate entity profile in the database [paragraph 0027, “the machine learning model can include multiple LSTM networks along with a single CNN … The multiple LSTM networks can be configured to simultaneously determine a match score between an entity depicted in an image and multiple entity profiles. In this manner, data indicative of a different entity profile can be provided to each LSTM network. The CNN can extract features from one or more images depicting an entity and provide data indicative of the extracted features to each LSTM network. Each LSTM network may then be configured to determine a match score between the respective entity profiles and the entity depicted in the image(s) in a parallel manner”; Fig. 4, paragraphs 0040-0041 disclose how the match scores are calculated and selected, “The machine learning model can be configured to determine match scores between entity 302 and the each candidate entity profile 304 … a match score of 0.727 between entity 302 and the candidate entity profile associated with Mr. Optical Inc”].
Manning et al. (US Pub. 2017/0031920) discloses a method of combining the rankings generated by different neural networks [paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN”].
Li et al. (US Pub. 2017/0109355 - IDS) discloses a method for disambiguating different entities using an RNN model by computing relation scores, in which the relation having the highest ranking score is identified as the correct relation for the candidate subject [paragraph 0029, “training a system to disambiguate different entities directly, the relations that each entity has are utilized to decide which one is more possible to appear in the question context”; paragraph 0097, “for each candidate subject, the relation having the highest ranking score is identified as the correct relation for the candidate subject”].
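For illustration only, the prediction step quoted from Sun (compute sim(e, mc) = cosine(v_e, v_mc) for each candidate entity and select the closest one) can be sketched in Python. This is a minimal sketch, not Sun's code; the function names, entity names and two-dimensional vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_entity(v_mc, candidate_vectors):
    """Return the candidate whose vector v_e maximizes cosine(v_e, v_mc)."""
    return max(candidate_vectors, key=lambda e: cosine(candidate_vectors[e], v_mc))

# Hypothetical candidate entity vectors and mention-context vector.
candidates = {"Apple_Inc": [0.9, 0.1], "Apple_(fruit)": [0.1, 0.9]}
print(select_entity([1.0, 0.0], candidates))  # Apple_Inc
```

Each candidate is scored independently against the mention-context vector, so this selection rule by itself carries no information from other mentions in the document.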
However, the prior art of record does not teach or suggest, individually or in combination,
providing a forward linking of the entity mentions to identified target candidate pages and ranking said forward linked target candidate pages based on their computed target candidate relevance scores; and 
traversing, using said RNN Model operations, the entity mentions from an end to the beginning of the electronic to provide a backward linking of the entity mentions to identified target candidates and ranking said backward linked target candidate pages based on their computed relevance scores.  
Therefore, the combination of features is considered to be allowable.
Claim 8 is considered to be allowable because it is dependent on claim 7.

Claim 15 would be allowable because it recites the computer program product of claim 12, wherein the method further comprises: 
providing a forward linking of the entity mentions to identified target candidate pages and ranking said forward linked target candidate pages based on their computed target candidate relevance scores; 
traversing, using said RNN Model operations, the entity mentions from an end to the beginning of the electronic to provide a backward linking of the entity mentions to identified target candidates and ranking said backward linked target candidate pages based on their computed relevance scores; and 
combining the computed relevance score of a forward linked target candidate page for an entity mention and the relevance score of the backward linked target candidate page for that entity mention to create a combined linking score for said ranking.  
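For illustration only, the forward linking, backward linking and combined linking score recited above can be sketched in Python. This is a hypothetical sketch, not the applicant's disclosed RNN model: a simple function toy_score, which adds a prior to a bonus for topical agreement with entities already linked in the current traversal direction, stands in for the RNN model operations, and the combined linking score is assumed to be the sum of the forward and backward scores. All entity names, priors and topics are invented for the example.

```python
def directional_scores(mentions, candidates, score_fn, reverse=False):
    """Traverse mentions in one direction and score every candidate of each
    mention, given the entities already linked earlier in that traversal
    (a stand-in for the state an RNN would carry)."""
    order = reversed(range(len(mentions))) if reverse else range(len(mentions))
    linked, scores = [], {}
    for i in order:
        mention = mentions[i]
        scores[i] = {c: score_fn(mention, c, linked) for c in candidates[mention]}
        linked.append(max(scores[i], key=scores[i].get))  # best candidate so far
    return scores

def bidirectional_link(mentions, candidates, score_fn):
    """Link each mention using the sum of its forward and backward scores."""
    forward = directional_scores(mentions, candidates, score_fn)
    backward = directional_scores(mentions, candidates, score_fn, reverse=True)
    links = []
    for i, mention in enumerate(mentions):
        combined = {c: forward[i][c] + backward[i][c] for c in candidates[mention]}
        links.append(max(combined, key=combined.get))
    return links

# Hypothetical priors and topics used only by the toy scoring function.
PRIOR = {"Jordan_(basketball)": 0.4, "Jordan_(country)": 0.6,
         "Chicago_Bulls": 0.7, "Chicago_(city)": 0.3}
TOPIC = {"Jordan_(basketball)": "sport", "Jordan_(country)": "geo",
         "Chicago_Bulls": "sport", "Chicago_(city)": "geo"}

def toy_score(mention, candidate, linked):
    """Prior plus 0.5 for each already-linked entity sharing the candidate's topic."""
    return PRIOR[candidate] + 0.5 * sum(TOPIC[e] == TOPIC[candidate] for e in linked)

mentions = ["Jordan", "Bulls"]
candidates = {"Jordan": ["Jordan_(basketball)", "Jordan_(country)"],
              "Bulls": ["Chicago_Bulls", "Chicago_(city)"]}
print(bidirectional_link(mentions, candidates, toy_score))
# ['Jordan_(basketball)', 'Chicago_Bulls']
```

In this toy input, the forward pass alone would link "Jordan" to Jordan_(country) and "Bulls" to Chicago_(city); adding the backward scores changes both links.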
The closest references found are as follows:
Sun et al. (Modeling Mention, Context and Entity with Neural Networks for Entity Disambiguation) discloses a neural network for entity disambiguation in which the system calculates the similarity between a given entity mention and candidate entities in a knowledge base and selects the closest one as the output [page 1336, section 2.3, “Given the representation of a candidate entity v_e and the representation of a mention context pair v_mc, we use the cosine similarity between these two vectors to represent their semantic relatedness, namely 
sim(e, mc) = cosine(v_e, v_mc)    (3) 
In the prediction process, we calculate the similarity between a context mention pair with each candidate entity, and select the closest one as the final result … The basic idea is that the output score of a correct entity should be larger than the score of a randomly selected candidate entity”].
Yu et al. (US Pub. 2017/0286805) discloses a method of using a recurrent neural network (RNN) model to calculate a score indicating a match between the entity depicted in the input and the data from the candidate entity profile in the database [paragraph 0027, “the machine learning model can include multiple LSTM networks along with a single CNN … The multiple LSTM networks can be configured to simultaneously determine a match score between an entity depicted in an image and multiple entity profiles. In this manner, data indicative of a different entity profile can be provided to each LSTM network. The CNN can extract features from one or more images depicting an entity and provide data indicative of the extracted features to each LSTM network. Each LSTM network may then be configured to determine a match score between the respective entity profiles and the entity depicted in the image(s) in a parallel manner”; Fig. 4, paragraphs 0040-0041 disclose how the match scores are calculated and selected, “The machine learning model can be configured to determine match scores between entity 302 and the each candidate entity profile 304 … a match score of 0.727 between entity 302 and the candidate entity profile associated with Mr. Optical Inc”].
Manning et al. (US Pub. 2017/0031920) discloses a method of combining the rankings generated by different neural networks [paragraph 0082, “The model rankings can be used to generate a master ranking via a weighted average of ranks of the RNN, CNN”].
Li et al. (US Pub. 2017/0109355) discloses a method for disambiguating different entities using an RNN model by computing relation scores, in which the relation having the highest ranking score is identified as the correct relation for the candidate subject [paragraph 0029, “training a system to disambiguate different entities directly, the relations that each entity has are utilized to decide which one is more possible to appear in the question context”; paragraph 0097, “for each candidate subject, the relation having the highest ranking score is identified as the correct relation for the candidate subject”].
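For illustration only, the master ranking quoted from Manning paragraph 0082 (a weighted average of the ranks produced by different models) can be sketched in Python. This is a hypothetical sketch, not Manning's implementation: the model names, weights and items are invented, ranks are counted from 1 (best), and every model is assumed to rank the same set of items.

```python
def master_ranking(model_rankings, weights):
    """Merge per-model rankings into one master ranking.

    Each item's position (1 = best) in every model's ranking is averaged with
    that model's weight; items are then sorted by ascending weighted rank.
    Assumes every model ranks the same set of items.
    """
    total_weight = sum(weights[m] for m in model_rankings)
    items = next(iter(model_rankings.values()))
    weighted_rank = {
        item: sum(weights[m] * (ranking.index(item) + 1)
                  for m, ranking in model_rankings.items()) / total_weight
        for item in items
    }
    return sorted(weighted_rank, key=weighted_rank.get)

rankings = {"rnn": ["A", "B", "C"], "cnn": ["B", "A", "C"]}
print(master_ranking(rankings, {"rnn": 0.7, "cnn": 0.3}))  # ['A', 'B', 'C']
print(master_ranking(rankings, {"rnn": 0.3, "cnn": 0.7}))  # ['B', 'A', 'C']
```

Swapping the model weights reverses the order of the two items on which the models disagree, while the item both models rank last stays last.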
However, the prior art of record does not teach or suggest, individually or in combination,
providing a forward linking of the entity mentions to identified target candidate pages and ranking said forward linked target candidate pages based on their computed target candidate relevance scores; 
traversing, using said RNN Model operations, the entity mentions from an end to the beginning of the electronic to provide a backward linking of the entity mentions to identified target candidates and ranking said backward linked target candidate pages based on their computed relevance scores; and 
combining the computed relevance score of a forward linked target candidate page for an entity mention and the relevance score of the backward linked target candidate page for that entity mention to create a combined linking score for said ranking.  
Therefore, the combination of features is considered to be allowable.
Claim 20 would be allowable because it recites subject matter similar to that of claim 15.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
De et al. (US Pub. 2013/0326325) describes a method for annotating an entity in a document corpus using cross-document signals.
Kambhatla et al. (US Pub. 2007/0061703) describes methods for annotating documents with one or more of entities, events and relations.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRI T NGUYEN whose telephone number is (571)272-0103. The examiner can normally be reached M-F, 8 AM-5 PM, (CT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ can be reached on 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TRI T NGUYEN/Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128