DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The filing date of the present application was 01/10/2017. 
This action is in response to the amended claims filed 01/15/2021 for application 15/403,118, filed January 10, 2017. Claims 1, 11 and 16 have been amended, and claims 2, 9, 12, and 17 have been cancelled. Claims 1, 3-8, 10-11, 13-16 and 18-23 are currently pending and have been examined.


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/12/2021 has been entered.
 
Claim Rejections - 35 USC § 103
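In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.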
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 10-11 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“SKILL: A System for Skill Identification and Normalization”, hereinafter: Zhao) in view of Liu et al. (“CompanyDepot: Employer Name Normalization in the Online Recruitment Industry”, hereinafter: Liu) and further in view of Quiros et al. (“LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task”, hereinafter: Quiros).
Regarding claim 1 (Currently Amended)
Zhao teaches a computer-implemented method comprising: receiving, by a computing system, (Although Zhao does not explicitly teach a computer with a processor, the examiner takes official notice that it is well known in the art for a neural network to be implemented on a computer with a processor; see pg. 4012 right col “We further confirmed from various cross-domain samples that frequency based descriptive approaches are inferior to the inference based neural network language models (results not shown).”)
a first identifier and a second identifier known to reference a first entity, (Examiner notes that the texts, also known as “surface forms”, correspond to identifiers because Named Entity Recognition (NER) and Named Entity Normalization (NEN) map the words or texts to the known entities; see abstract “Named Entity Recognition (NER) and Named Entity Normalization (NEN) refer to the recognition and normalization of raw texts to known entities.” and also see pg. 4012 right col first paragraph “There are two main challenges in identifying and realizing a skill from resumes or job descriptions: 1) the same skill represented in different ways (e.g., C# and C sharp); 2) the same term representing different skills in different contexts (e.g., Java in Java coffee and Java programming language). This naturally escalates the need for a more comprehensive and accurate system of Named Entity Recognition (NER) and Named Entity Normalization (NEN) to identify and map similar references to entities of skills. NER refers to the recognition of phrases of interest from text (commonly referred to as surface forms) and NEN refers to appropriate associations of these surface forms with a formal entity.”)
an original feature dimensionality associated with a plurality of identifiers, including the first identifier and the second identifier, (pg. 4014 right col “Surface form filtered vectors: original vectors filtered by surface forms. Namely, elements of original vectors are retained only if they are contained in the taxonomy. Note that the size of original vectors could reduce dramatically following the screening.”)
for a plurality of entities, including the first entity, (pg. 4016 section 5 “We developed, deployed, and evaluated SKILL, a system for skill entity recognition and normalization. SKILL is designed to meet the increasing business need of workforce analytics implemented at CareerBuilder in attempt to close the skill-gap in the U.S. labor market. There are 50K surface forms mapped to 30K skill entities in the current system” also see Pg. 4012 right col second paragraph “This premise rationalizes the foundation of the SKILL system: extracting surface forms from resumes, job postings and beyond, then properly normalizing them to eligible skill entities.”)
wherein the plurality of identifiers are associated with resume data; (pg. 4012 right col second paragraph “In this paper we present SKILL, an automated system for taxonomy generation and skills recognition and normalization. We assume that recruitment related documents are composed and edited in a professional manner. This premise rationalizes the foundation of the SKILL system: extracting surface forms[corresponds to the plurality of identifiers] from resumes, job postings and beyond, then properly normalizing them to eligible skill entities.”)
…
and providing, by the computing system, (Although Zhao does not explicitly teach a computer with a processor, the examiner takes official notice that it is well known in the art for a neural network to be implemented on a computer with a processor; see pg. 4012 right col “We further confirmed from various cross-domain samples that frequency based descriptive approaches are inferior to the inference based neural network language models (results not shown).”)
a first vector representation associated with the first identifier and a second vector representation associated with the second identifier (pg. 4014 right col “Surface form trained vectors: vectors trained solely by surface forms. Instead of using all the seed skill phrases as for original vectors, we only employ surface forms for training. This approach significantly increases the relevancy of skill vectors, but reduces coverage and complicates taxonomy expansion… We set min-count as 1 for surface form trained vectors to recruit all surface forms while default for training original vectors. Note that the min-count controls the level of noise allowed, of which the default value is observed to be optimal by our experiments when noise level is moderate. We increased the vector size to 200 from default of 100, because we have found the size most comprehensive in representation. It is important to mention that the assignment of a surface form with a related word2vec vector is a bijective mapping”)
as features to train a machine learning model, (pg. 4012 right col “We further confirmed from various cross-domain samples that frequency based descriptive approaches are inferior to the inference based neural network[corresponds to machine learning model] language models (results not shown).”)
Zhao does not explicitly teach wherein the first entity is at least one of an academic institution or an employer,
…
the original feature dimensionality associated with a total number of the plurality of identifiers for the plurality of entities; 
applying, by the computing system, the first identifier and the second identifier to an embedding model for generating vector representations in a vector space associated with a desired feature dimensionality representing a reduction in the original feature dimensionality associated with the total number of the plurality of identifiers for the plurality of entities; 
…
wherein the first vector representation and the second vector representation are associated with the desired feature dimensionality representing the reduction in the original feature dimensionality.
Liu teaches wherein the first entity is at least one of an academic institution (pg. 527 section 6.1.2 “To understand the generalization ability of CompanyDepot, we apply it to a similar task of academic institution name normalization [11]. We used the same five datasets that are used in the previous work… a list of 145 popular universities and colleges in UK from the website of 4ICU [1]”)
or an employer, (pg. 523 right col “Our task is to link the employer names in job postings or resumes to entities in an employer KB, as illustrated in Figure 1… The entities in the employer KB are denoted by E = {e1, e2, ..., ek}. The employer names and the associated location contexts extracted from job postings and resumes are denoted by Q = {q1, q2, ..., qc}, where qi = (ni, li) is a pair of employer name and the associated location”; Examiner notes that FIG. 1, “Illustration of employer name normalization in jobs and resumes”, shows business names (including CareerBuilder LLC, IBM, Macy, The Home Depot, etc.) along with headquarters)
…
the original feature dimensionality associated with a total number of the plurality of identifiers for the plurality of entities; (Examiner notes that Fig. 1 shows a candidate resume for Bob, whose work experience includes two different companies (Macys.com and Microsoft) that Bob has worked for, which corresponds to the total number of identifiers for the plurality of entities; see Fig. 1 on pg. 2 “Figure 1: Illustration of employer name normalization in jobs and resumes”)
Zhao and Liu are analogous art because they are both directed to training on data sets using machine learning models.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao to incorporate the teaching of Liu to include the employer name normalization task in the online recruitment industry, as disclosed by Liu (pg. 521 right col).
Zhao in view of Liu does not teach applying, by the computing system, the first identifier and the second identifier to an embedding model for generating vector representations in a vector space associated with a desired feature dimensionality representing a reduction in the original feature dimensionality associated with the total number of the plurality of identifiers for the plurality of entities;
…
wherein the first vector representation and the second vector representation are associated with the desired feature dimensionality representing the reduction in the original feature dimensionality.
Quiros teaches applying, by the computing system, (Although Quiros does not explicitly teach a computer with a processor, the examiner takes official notice that it is well known in the art for a machine learning algorithm to be implemented on a computer with a processor; see pg. 31 “As stated before, we have two main training models: Averaged centroids and the averaged centroids including the inverted document frequency, for both the lemmatized and not-lemmatized texts. We performed experiments using three different classifiers: Random Forests, Support Vector Machines and Logistic Regression because these classifiers often achieved the best results for text classification and sentiment analysis.”)
the first identifier and the second identifier to an embedding model for generating vector representations in a vector space (Examiner notes that the first and second identifiers correspond to feature vector entries of the words from the bag-of-words embedding model; see pg. 30 left col second paragraph “While the well-known Bag-of-Words (BoW) model involves a very large number of features (as many as the number of nonstopwords words with at least a minimum number of occurrences in the training data), the word embedding representation allows a significant reduction in the feature set size (in our case, from million to just 300). The dimensionality reduction is a desirable goal, because it helps in avoiding overfitting and leads to a reduction of the training and classification times, without any performance loss.”)
associated with a desired feature dimensionality representing a reduction in the original feature dimensionality associated with the total number of the plurality of identifiers for the plurality of entities; (Pg. 30 left col second paragraph “While the well-known Bag-of-Words (BoW) model involves a very large number of features (as many as the number of nonstopwords words with at least a minimum number of occurrences in the training data), the word embedding representation allows a significant reduction in the feature set size (in our case, from million to just 300). The dimensionality reduction is a desirable goal, because it helps in avoiding overfitting and leads to a reduction of the training and classification times, without any performance loss.” Examiner notes that the original feature vector has a dimensionality of 300 equal to the number of identifiers see pg. 30 right col second paragraph “Then, for each token, we search its vector in the word embedding model. We use a pretrained model (Cardellino, 2016), which was generated by using the word2vec algorithm (Mikolov and Dean, 2013) from a collection of Spanish texts with approximately 1.5 billion words. The dimension of the word embedding is 300.”)
…
wherein the first vector representation and the second vector representation are associated with the desired feature dimensionality representing the reduction in the original feature dimensionality. (Pg. 30 left col second paragraph “While the well-known Bag-of-Words (BoW) model involves a very large number of features (as many as the number of nonstopwords words with at least a minimum number of occurrences in the training data), the word embedding representation allows a significant reduction in the feature set size (in our case, from million to just 300). The dimensionality reduction is a desirable goal, because it helps in avoiding overfitting and leads to a reduction of the training and classification times, without any performance loss.”)
Zhao, Liu and Quiros are analogous art because they are all directed to machine learning models.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao in view of Liu to incorporate the teaching of Quiros to include a word embedding model, which can yield significant dimensionality reduction.
One of ordinary skill in the art would have been motivated to make this modification in order to improve dimensionality reduction and “have several positive effects on our algorithms such as faster training, avoiding overfitting and better performance” as disclosed by Quiros (pg. 21 right col first paragraph). 
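For illustration only, the dimensionality contrast that Quiros describes (one feature per vocabulary item versus a fixed-width embedding) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any cited reference: the vocabulary, the helper names `one_hot` and `build_embedding_table`, and the randomly initialized vectors are hypothetical placeholders, and a real system would load trained word2vec vectors instead.

```python
import random

def one_hot(identifier, vocabulary):
    """Sparse representation: one feature per known identifier."""
    return [1.0 if identifier == v else 0.0 for v in vocabulary]

def build_embedding_table(vocabulary, dim, seed=0):
    """Placeholder embedding table (assumption): random vectors stand in
    for vectors that a trained embedding model would provide."""
    rng = random.Random(seed)
    return {v: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for v in vocabulary}

# Hypothetical vocabulary of 5,000 identifiers.
vocabulary = [f"identifier_{i}" for i in range(5_000)]
table = build_embedding_table(vocabulary, dim=300)

# Original dimensionality grows with the number of identifiers;
# the embedding dimensionality is fixed in advance.
original_dim = len(one_hot("identifier_42", vocabulary))
desired_dim = len(table["identifier_42"])
print(original_dim, desired_dim)
```

In this sketch the one-hot width equals the vocabulary size, while the embedding width stays at the chosen value (300, matching the figure quoted from Quiros) regardless of how many identifiers are added.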
Regarding claims 11 and 16
Referring to independent Claim 11 and Claim 16, they are rejected on the same basis as independent claim 1 since they are analogous claims.

Regarding claim 10
Zhao in view of Liu with Quiros teaches claim 1.
Zhao further teaches …and the machine learning model is trained (pg. 4014 section 2.2 “we utilize word2vec (Mikolov et al. 2013). Word2vec[corresponds to machine learning model] is an efficient implementation of the continuous bag-of-words and skipgram models for computing vector representations of words. These representations can be used in natural language processing applications as equivalent to words represented. The motivation is to combine a given surface form with corresponding neighboring (either semantically or syntactically) phrases to improve relevancy, under the assumption that related skills are likely to appear closely in a recruitment document…. In order to properly create relevant skill vectors, we experimented with three approaches to train the data. Original vectors: vectors trained using all the seed skill phrases extracted from input sources. This is the same data set used for Wikipedia API call for taxonomy building.”) to identify job candidates for an organization. (abstract “Such analytics help bridge the skills gap between employers and candidate workers by matching the right talent for the right job and identifying in-demand skills for workforce training programs. This can also work towards the social goal of providing more job opportunities to the community. In this paper we propose an automated approach for skill entity recognition and optimal normalization.”)
Liu further teaches wherein the at least one of the academic institution or the employer is reflected in resume data (FIG. 1 shows candidate resumes with employer and job position for Bob see “Figure 1: Illustration of employer name normalization in jobs and resumes”)
Zhao, Liu and Quiros are analogous art because they are all directed to machine learning models.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao in view of Quiros to incorporate the teaching of Liu to include entities with employer or academic institution information, enabling employer name normalization that links the employer names in job postings and resumes to the employer entities, as disclosed by Liu (pg. 521 right col).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“SKILL: A System for Skill Identification and Normalization”, hereinafter: Zhao) in view of Liu et al. (“CompanyDepot: Employer Name Normalization in the Online Recruitment Industry”, hereinafter: Liu) in view of Quiros et al. (“LABDA at the 2016 TASS challenge task: using word embeddings for the sentiment analysis task”, hereinafter: Quiros) and further in view of Argerich et al. (“Hash2Vec: Feature Hashing for Word Embeddings”, hereinafter: Argerich).
Regarding claim 3 
Zhao in view of Liu with Quiros teaches claim 1.
Zhao in view of Liu with Quiros does not teach wherein the desired feature dimensionality is less than the original feature dimensionality by a plurality of orders of magnitude.
Argerich teaches wherein the desired feature dimensionality (pg. 6 section 3.2 “To understand how the algorithm behaves and to further observe the resulting vector space, we applied t-SNE [14], a widely used dimensionality reduction algorithm.”) is less than the original feature dimensionality by a plurality of orders of magnitude. (Pg. 5 first paragraph “Both graphs in Figure 1 show that Hash2Vec approximates the full vectors when the vector dimension increases. A very good approximation can be obtained with dimensions in the order of the hundreds, orders of magnitude smaller than full vectors.”)
Zhao, Quiros, Liu and Argerich are analogous art because they are all directed to training on data sets using machine learning models.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao in view of Liu with Quiros to incorporate the teaching of Argerich to include dimensionality reduction that effectively reduces the embedding size of the feature vector, as disclosed by Argerich (pg. 7 section 4).
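For illustration only, a generic feature-hashing step of the kind that Argerich's title refers to can be sketched as follows. This is a minimal sketch of the general hashing trick under stated assumptions, not Hash2Vec's exact algorithm: the function name `hashed_vector`, the use of MD5 to pick a bucket and a sign, and the sample tokens are all hypothetical choices made for the example.

```python
import hashlib

def hashed_vector(tokens, dim):
    """Map an arbitrary number of tokens into a fixed-size vector.

    Each token is hashed to one of `dim` buckets and contributes +1 or -1
    there, so the output width does not depend on the vocabulary size.
    """
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim   # bucket choice
        sign = 1.0 if digest[4] % 2 == 0 else -1.0        # sign hash
        vec[index] += sign
    return vec

# Hypothetical usage: two context tokens folded into 16 dimensions.
example = hashed_vector(["careerbuilder", "llc"], dim=16)
print(len(example))
```

The output width is set by `dim` alone, which is how a hashed representation can be orders of magnitude smaller than a full vocabulary-sized vector.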

Claims 4, 13 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“SKILL: A System for Skill Identification and Normalization”, hereinafter: Zhao) in view of Liu et al. in view of Quiros et al. and further in view of Rui et al. (“Unsupervised Feature Selection for Text Classification via Word Embedding”, hereinafter: Rui).
Regarding claim 4
Zhao in view of Liu with Quiros teaches claim 1.
Zhao in view of Liu with Quiros does not teach the method further comprising: selecting the desired feature dimensionality based at least in part on an amount of available training-data for the machine learning model.
Rui teaches the method further comprising: selecting the desired feature dimensionality based at least in part on an amount of available training-data for the machine learning model. (Pg. 3 left col section B “The K-Means algorithm is the most commonly used clustering method. To use KMeans, the number of clusters k should be provided. Naturally, we take[corresponds to selecting] the dimensionality of the desired feature space as the number of clusters k. A larger k results in finer granularity clustering.”)
Zhao, Quiros, Liu and Rui are analogous art because they are all directed to data automation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao in view of Quiros with Liu to incorporate the teaching of Rui to include the K-means algorithm for selecting the dimensionality of the desired feature space, in which words are mapped to a vector space and compared by the distances between word vectors, as disclosed by Rui (pg. 3 section B second paragraph).

Regarding claims 13 and 18
Referring to dependent Claim 13 and Claim 18, they are rejected on the same basis as dependent claim 4 since they are analogous claims.

Claims 5, 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“SKILL: A System for Skill Identification and Normalization”, hereinafter: Zhao) in view of Liu et al. in view of Quiros et al. and further in view of Espinosa et al. (“Learning to recognise named entities in tweets by exploiting weakly labelled data”, hereinafter: Espinosa).
Regarding claim 5 
Zhao in view of Liu with Quiros teaches claim 1.
Zhao in view of Liu with Quiros does not teach the method further comprising: generating the first vector representation associated with the first identifier and the second vector representation associated with the second identifier based on the embedding model.
Espinosa teaches the method further comprising: generating the first vector representation associated with the first identifier and the second vector representation associated with the second identifier (Examiner notes that the first and second identifiers correspond to the entity types shown in Table 2; see pg. 157 first paragraph “In order to categorise these entities according to the shared task’s categories of interest, we defined the mappings shown in Table 2, informed by the annotation guidelines provided by the shared task organisers. For many entities, DBpedia Spotlight provides multiple hierarchical labels. For example, the entity “Justin Bieber” is labelled by the tool as both Person and MusicArtist, which map to W-NUT’s person and musicartist, respectively.” Also see pg. 154 section 3 “We cast the NER task as a sequence labelling problem: every tweet is a sequence of tokens, each of which is automatically assigned a label (or tag) that is indicative of its membership to a semantic type or category… According to the popular begin inside-outside (BIO) scheme, each token is tagged as any of ‘B’, ‘I’ or ‘O’ depending on whether it is at the beginning, inside or outside a named entity, respectively.”) based on the embedding model. (Pg. 154 second paragraph “we combined deep learning-based approaches with distant supervision methods for generating weakly labelled data, and further optimised performance by exploring the use of different tagging schemes and word embeddings.”)
Zhao, Quiros, Liu and Espinosa are analogous art because they are all directed to data automation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao in view of Quiros with Liu to incorporate the teaching of Espinosa to include a deep learning approach for generating labelled data, tagging schemes and word embeddings to alleviate data sparsity of some named entity types, resulting in increased performance for their respective entity types, as disclosed by Espinosa (pg. 161 section 5).
Regarding claims 14 and 19
Referring to dependent Claim 14 and Claim 19, they are rejected on the same basis as dependent claim 5 since they are analogous claims.
Claims 6-7, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“SKILL: A System for Skill Identification and Normalization”, hereinafter: Zhao) in view of Liu et al. in view of Quiros et al. and further in view of Bilenko et al. (“Adaptive Name Matching in Information Integration”, hereinafter: Bilenko). 
Regarding claim 6 
Zhao in view of Liu with Quiros teaches claim 1.
Zhao further teaches the method further comprising: associating the first identifier (Examiner notes that the texts, also known as “surface forms”, correspond to identifiers because Named Entity Recognition (NER) and Named Entity Normalization (NEN) map the words or texts to the known entities; see abstract “Named Entity Recognition (NER) and Named Entity Normalization (NEN) refer to the recognition and normalization of raw texts to known entities.” and also see pg. 4012 right col first paragraph “There are two main challenges in identifying and realizing a skill from resumes or job descriptions: 1) the same skill represented in different ways (e.g., C# and C sharp); 2) the same term representing different skills in different contexts (e.g., Java in Java coffee and Java programming language). This naturally escalates the need for a more comprehensive and accurate system of Named Entity Recognition (NER) and Named Entity Normalization (NEN) to identify and map similar references to entities of skills. NER refers to the recognition of phrases of interest from text (commonly referred to as surface forms) and NEN refers to appropriate associations of these surface forms with a formal entity.”) for the first entity with a first vector representation in the vector space; (pg. 4014 right col “Surface form trained vectors: vectors trained solely by surface forms. Instead of using all the seed skill phrases as for original vectors, we only employ surface forms for training. This approach significantly increases the relevancy of skill vectors, but reduces coverage and complicates taxonomy expansion. If any new surface form is added to the taxonomy, the entire vector space has to be retrained.”)
Zhao in view of Quiros with Liu does not teach and associating the second identifier for the first entity with a second vector representation in the vector space that is within a threshold distance from the first vector representation.
Bilenko teaches and associating the second identifier for the first entity with a second vector representation in the vector space that is within a threshold distance from the first vector representation. (p. 17, Section “Token-based and hybrid distances”, “For instance, the strings “Ray Mooney” and “Mooney, Ray” are likely to be duplicates, even if they aren’t close in edit distance. In such cases, we might convert the strings s and t to token multisets (where each token is a word) and consider similarity metrics on these multisets.”, middle col., para 1, and p. 18, “TF-IDF distance is attractive in that it weights agreement on rare terms more heavily than agreement on more common terms. This means that “Ray Mooney” and “Wray Mooney” will be considered more similar than, say, “Ray Mooney” and “Ray Charles.” Perhaps this is why TF-IDF often (but not always) performs better than methods that are insensitive to individual tokens’ frequency, such as the Jaccard similarity.”, middle col., para. 1, and p. 19, Section “Relative performance of the static similarity metrics”, “(For token-based distance metrics, the blocking algorithm generates all pairs that contain at least one common token. For other distance metrics, it generates all pairs containing at least one common character 4-gram.) In practice, we would examine this ranking and choose a suitable threshold θ. We could consider all pairs more similar than this threshold to be matches and all pairs less similar to be nonmatches.”, on p. 19, right col., para 1, line 5 to p. 20 left col. line 2).
Zhao, Quiros, Liu and Bilenko are analogous art because they are directed to data automation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao in view of Quiros with Liu to incorporate the teaching of Bilenko, including the ability to differentiate identifiers and determine that they refer to a single entity, among the different potential features to train the machine learning model. The motivation behind such an added feature is that the methods taught by Bilenko lead to improved accuracy in the name matching problem (p. 16, Abstract, “Our research explores approaches to the name matching problem that improve accuracy. Particularly, we employ methods that adapt to a specific domain by combining multiple string similarity methods that capture different notions of similarity.”, left col., entire para. 4).
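For illustration only, the token-based Jaccard similarity and threshold test quoted from Bilenko in the claim 6 mapping can be sketched as follows. This is a minimal sketch under stated assumptions: the tokenizer (lowercasing and treating commas as spaces), the helper names, the use of a greater-than-or-equal comparison, and the threshold value 0.5 are hypothetical choices and are not taken from the reference.

```python
def tokens(name):
    """Convert a name string to a set of word tokens (assumed tokenizer)."""
    return set(name.lower().replace(",", " ").split())

def jaccard(s, t):
    """Jaccard similarity |S ∩ T| / |S ∪ T| between two token sets."""
    a, b = tokens(s), tokens(t)
    if not a and not b:
        return 1.0  # two empty names are treated as identical
    return len(a & b) / len(a | b)

def is_match(s, t, threshold):
    """Treat a pair as referring to the same entity at or above the threshold."""
    return jaccard(s, t) >= threshold

# Word order does not matter for a token-set metric.
print(jaccard("Ray Mooney", "Mooney, Ray"))   # identical token sets
print(jaccard("Ray Mooney", "Ray Charles"))   # one shared token out of three
```

With a hypothetical threshold of 0.5, the pair “Ray Mooney” / “Mooney, Ray” would be treated as a match and “Ray Mooney” / “Ray Charles” would not.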

Regarding claim 7
Zhao in view of Liu with Quiros and Bilenko teaches claim 6.
Bilenko further teaches wherein the first entity is associated with a plurality of identifiers, including the first identifier and the second identifier, relating to at least one … (p. 17, Section “Token-based and hybrid distances”, “For instance, the strings “Ray Mooney” and “Mooney, Ray” are likely to be duplicates, even if they aren’t close in edit distance. In such cases, we might convert the strings s and t to token multisets (where each token is a word) and consider similarity metrics on these multisets.”, middle col., para 1: “Mooney, Ray” is the formal name of “Ray Mooney”, as one example)
Zhao, Quiros, Liu and Bilenko are analogous art because they are directed to data automation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao in view of Quiros with Liu to incorporate the teaching of Bilenko, including the ability to differentiate identifiers and determine that they refer to a single entity, among the different potential features to train the machine learning model. The motivation behind such an added feature is that the methods taught by Bilenko lead to improved accuracy in the name matching problem (p. 16, Abstract, “Our research explores approaches to the name matching problem that improve accuracy. Particularly, we employ methods that adapt to a specific domain by combining multiple string similarity methods that capture different notions of similarity.”, left col., entire para. 4).
Regarding claims 15 and 20
Referring to dependent Claim 15 and Claim 20, they are rejected on the same basis as dependent claim 6 since they are analogous claims.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“SKILL: A System for Skill Identification and Normalization”, hereinafter: Zhao) in view of Quiros et al. in view of Liu et al. in view of Bilenko et al. (“Adaptive Name Matching in Information Integration”, hereinafter: Bilenko) and further in view of Espinosa et al. (“Learning to recognise named entities in tweets by exploiting weakly labelled data”, hereinafter: Espinosa). 
Regarding claim 8
Zhao in view of Liu with Quiros teaches claim 1.
Zhao in view of Liu with Quiros does not teach the method further comprising: determining a third identifier for a second entity having a relationship with the first entity, wherein the relationship between the first entity and the second entity is indicated by training data and related contextual information in which the first entity and the second entity are reflected, associating the first identifier for the first entity with a first vector representation in the vector space; and associating the third identifier for the second entity with a third vector representation in the vector space that is within a threshold distance from the first vector representation.
Espinosa teaches the method further comprising: determining a third identifier for a second entity having a relationship with the first entity (pg. 157, “For example, the entity “Justin Bieber” is labelled by the tool as both Person and MusicArtist, which map to W-NUT’s person and musicartist, respectively. In such cases, we take the most specific entity type (e.g., musicartist) as the final label. Furthermore, if an entity match x is subsumed by another match y, we disregard x and retain the longer matching entity y with its corresponding type. For example, if a tweet contains three entity matches such as “New York USA”, “New York” and “USA”, [corresponds to third identifier] we disregard the two shorter entity matches (“New York”, “USA”) and choose the entity with the longer span of tokens (“New York USA”)”).
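For illustration only, the subsumption rule quoted from Espinosa above (discard a shorter entity match when a longer match covers it) can be sketched in Python as follows. This sketch is not taken from Espinosa or from any other reference of record; the function name `drop_subsumed` and the (start, end, text) token-span representation are hypothetical and are assumed solely for the example.

```python
def drop_subsumed(matches):
    """Keep only matches that are not covered by a strictly longer match.

    Each match is a hypothetical (start, end, text) tuple of token
    positions, where `end` is exclusive.
    """
    kept = []
    for start, end, text in matches:
        subsumed = any(
            o_start <= start and end <= o_end
            and (o_end - o_start) > (end - start)
            for o_start, o_end, _ in matches
        )
        if not subsumed:
            kept.append((start, end, text))
    return kept


# Token positions for the tweet fragment "New York USA".
example_matches = [
    (0, 3, "New York USA"),
    (0, 2, "New York"),
    (2, 3, "USA"),
]
```

Applied to `example_matches`, the sketch retains only the longest span, which mirrors the “New York USA” example in the quoted passage.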
Zhao, Quiros, Liu and Espinosa are analogous art because they are all directed to data automation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao in view of Quiros to incorporate the teaching of Espinosa to include a deep learning approach for generating labelled data, tagging schemes, and word embeddings, which alleviates data sparsity for some named entity types and results in increased performance for those entity types, as disclosed by Espinosa (pg. 161, section 5).
Zhao in view of Quiros with Liu and Espinosa does not teach wherein the relationship between the first entity and the second entity is indicated by training data and related contextual information in which the first entity and the second entity are reflected; associating the first identifier for the first entity with a first vector representation in the vector space; and associating the third identifier for the second entity with a third vector representation in the vector space that is within a threshold distance from the first vector representation.
Bilenko teaches wherein the relationship between the first entity and the second entity is indicated by training data and related contextual information in which the first entity and the second entity are reflected (p. 17, Section “Token-based and hybrid distances”, “One simple and often effective token-based metric is Jaccard similarity. Between the word sets S and T, Jaccard similarity is simply (|S ∩ T|)/(|S ∪ T|). We can define term frequency-inverse document frequency (TF-IDF) or cosine similarity—which the information retrieval community widely uses… where TF_{w,S} is the frequency of word w in S, IDF_w is the inverse of the fraction of names in the corpus that contain w, V′(w, S) = log(TF_{w,S} + 1) ⋅ log(IDF_w), and … For name matching, you might collect the statistics used to compute IDF_w from the complete corpus of names to be matched.”, from p. 17, right col., para 1, to p. 18 left col.: the similarity between any first entity and second entity is indicated by the training data, i.e. the corpus, and the related contextual information in which the first entity and second entity are reflected; p. 19, “Recently, researchers have proposed machine-learning methods that use supervision in the form of matched and unmatched record pairs to train classifiers to distinguish between them…. The main text describes our recent work using training data in the form of matched and unmatched record pairs to train an algorithm for classifying record pairs as duplicate and nonduplicate.”, left col., ¶ 3: With the training data including matched record pairs, their similarity will be reflected and indicated in the training data; both record pairs are represented in a vector form and would be similar as would be indicated when using a similarity metric).
Bilenko further teaches associating the first identifier for the first entity with a first vector representation in the vector space; and associating the third identifier for the second entity with a third vector representation in the vector space that is within a threshold distance from the first vector representation (p. 17, Section “Token-based and hybrid distances”, “For instance, the strings “Ray Mooney” and “Mooney, Ray” are likely to be duplicates, even if they aren’t close in edit distance. In such cases, we might convert the strings s and t to token multisets (where each token is a word) and consider similarity metrics on these multisets.”, middle col., para 1, and p. 18, “TF-IDF distance is attractive in that it weights agreement on rare terms more heavily than agreement on more common terms. This means that “Ray Mooney” and “Wray Mooney” will be considered more similar than, say, “Ray Mooney” and “Ray Charles.””, middle col., para 1, lines 1-6, and p. 19, Section “Relative performance of the static similarity metrics”, “(For token-based distance metrics, the blocking algorithm generates all pairs that contain at least one common token. For other distance metrics, it generates all pairs containing at least one common character 4-gram.) In practice, we would examine this ranking and choose a suitable threshold θ. We could consider all pairs more similar than this threshold to be matches and all pairs less similar to be nonmatches.”, on p. 19, right col., para 1, line 5 to p. 20 left col. line 2: “Ray Mooney” and “Wray Mooney” are similar entities in the sense that they are similar in spelling; however, they are distinct and separate entities).
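For illustration only, the token-based measures quoted from Bilenko above (Jaccard similarity, TF-IDF weighting, and a match threshold θ) can be sketched in Python as follows. This is a minimal sketch and not code from Bilenko or any other reference of record. The function names, the example corpus, and the threshold value are hypothetical. The unit-length normalization of the weight vectors is the standard cosine normalization and is assumed here, since the quoted passage elides that step.

```python
import math
from collections import Counter

# Hypothetical corpus of names: "ray" is common here, "mooney" is rarer.
CORPUS = ["Ray Mooney", "Wray Mooney", "Ray Charles",
          "Ray Davies", "Ray Liotta", "Dolly Parton"]


def tokens(name):
    """Split a name into lowercase word tokens, ignoring commas."""
    return name.lower().replace(",", " ").split()


def jaccard(s, t):
    """Jaccard similarity |S ∩ T| / |S ∪ T| between two word sets."""
    S, T = set(tokens(s)), set(tokens(t))
    if not S and not T:
        return 0.0
    return len(S & T) / len(S | T)


def tfidf_vector(name, corpus):
    """Weights V'(w, S) = log(TF + 1) * log(IDF), scaled to unit length."""
    n = len(corpus)
    doc_tokens = [set(tokens(doc)) for doc in corpus]
    weights = {}
    for w, count in Counter(tokens(name)).items():
        # IDF_w: inverse of the fraction of corpus names that contain w.
        df = max(1, sum(1 for d in doc_tokens if w in d))
        weights[w] = math.log(count + 1) * math.log(n / df)
    norm = math.sqrt(sum(v * v for v in weights.values()))
    return {w: v / norm for w, v in weights.items()} if norm else weights


def tfidf_similarity(s, t, corpus):
    """Cosine similarity: sum of weight products over shared tokens."""
    vs, vt = tfidf_vector(s, corpus), tfidf_vector(t, corpus)
    return sum(vs[w] * vt[w] for w in vs.keys() & vt.keys())


def is_match(s, t, corpus, theta):
    """Treat a pair as a match when its similarity exceeds threshold theta."""
    return tfidf_similarity(s, t, corpus) > theta
```

Under this hypothetical corpus, `tfidf_similarity("Ray Mooney", "Wray Mooney", CORPUS)` exceeds `tfidf_similarity("Ray Mooney", "Ray Charles", CORPUS)` because agreement on the rarer token “mooney” is weighted more heavily than agreement on the common token “ray”, which is consistent with the quoted observation. `jaccard("Ray Mooney", "Mooney, Ray")` treats the reordered name as identical because word order is discarded.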
Zhao, Quiros, Espinosa, Liu and Bilenko are analogous art because they are all directed to data automation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao in view of Quiros with Espinosa and Liu to incorporate the teaching of Bilenko, including the ability to differentiate entity names and determine whether they refer to a single entity, among the different potential features used to train the machine learning model. The motivation behind such an added feature is that the methods taught by Bilenko lead to improved accuracy in the name matching problem (p. 16, Abstract, “Our research explores approaches to the name matching problem that improve accuracy. Particularly, we employ methods that adapt to a specific domain by combining multiple string similarity methods that capture different notions of similarity.”, left col., entire para. 4).

Claims 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (“SKILL: A System for Skill Identification and Normalization”, hereinafter: Zhao) in view of Liu et al. in view of Quiros et al. and further in view of Mones et al. (US 2017/0154314 A1).
Regarding claim 21
Zhao in view of Liu with Quiros teaches claim 1.
Zhao in view of Liu with Quiros does not teach wherein training data to train the machine learning model includes positive samples associated with job candidates who have achieved a measure of success in a recruiting process associated with the employer.
Mones teaches wherein training data to train the machine learning model includes positive samples associated with job candidates who have achieved a measure of success in a recruiting process associated with the employer (Para [0063], “For example, in some embodiments, the system stores a record of successful (e.g. hired) and/or unsuccessful (e.g., not hired) job applicants for a particular job description or job code at the company of the user. As new applicants are screened, a machine learning engine such as the machine learning engine 326 of FIG. 3 may perform comparative analytics by comparing a new applicant with previous applicants to determine whether the new applicant has a profile most similar to previous successful applicants or previous unsuccessful applicants. The result of the comparison may be provided to the user in a report in a user interface, such as the report 500 of FIG. 5.”).
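For illustration only, the comparison described in the quoted paragraph of Mones (determining whether a new applicant is most similar to previous successful or previous unsuccessful applicants) can be sketched in Python as a nearest-neighbour lookup. Mones does not specify the comparison technique in the quoted passage, so the use of cosine similarity, the numeric feature vectors, and the names `cosine` and `most_similar_outcome` are all assumptions made solely for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_outcome(new_profile, past_applicants):
    """Return the outcome label of the most similar previous applicant.

    `past_applicants` is a hypothetical list of (feature_vector, outcome)
    pairs, for example outcome "hired" or "not hired".
    """
    _, outcome = max(past_applicants,
                     key=lambda record: cosine(new_profile, record[0]))
    return outcome


# Hypothetical records: positive samples are labelled "hired".
past = [
    ([1.0, 0.0, 1.0], "hired"),
    ([0.0, 1.0, 0.0], "not hired"),
]
```

With these hypothetical records, a new profile such as `[0.9, 0.1, 0.8]` lies closest to the first record, so `most_similar_outcome` returns the label of the previously successful applicant.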
Zhao, Liu, Quiros and Mones are analogous art because they are all directed to data automation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhao in view of Liu with Quiros to incorporate the teaching of Mones to include records of successful (e.g., hired) and unsuccessful (e.g., not hired) job applicants for a particular job description for the purpose of “comparative analytics by comparing a new applicant with previous applicants to determine whether the new applicant has a profile most similar to previous successful applicants or previous unsuccessful applicants” as disclosed by Mones (para [0063]).
Regarding claims 22 and 23
Dependent claims 22 and 23 are rejected on the same basis as dependent claim 21 because they are analogous claims.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 11 and 16 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Faliagka et al. (“Application of Machine Learning Algorithms to an online Recruitment System”) teaches evaluating job applicants in online recruitment systems using machine learning algorithms to solve the candidate ranking problem.
Singh et al. (“PROSPECT: A System for Screening Candidates for Recruitment”) teaches the PROSPECT decision tool, which helps screeners shortlist resumes efficiently.
Williams et al. (US Pat No. 6618734 B1) teaches an interviewing method and system for pre-employment screening and assessment.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598.  The examiner can normally be reached Mon - Fri, 8:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  




/V.M./Examiner, Art Unit 2126      
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126