DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/05/2021 has been entered.
Accordingly, claims 1-23 are pending in this application. Claims 1, 7-11, and 23 are currently amended.

Response to Arguments
Applicant’s arguments with respect to amended pending claims filed on 11/05/2021 have been fully considered. In view of the claim amendment filed, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made. 
Further, regarding the new limitations recited in claims 1, 7-11, and 23, it is submitted that they are properly addressed by the new ground of rejection.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-16, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Calapodescu (US 20170300565 A1, hereinafter Calapodescu) in view of Zeng et al (US Patent No. 7685201 B2, hereinafter Zeng).

Regarding Claim 1, Calapodescu discloses a system for identifying data in the anonymized resume corpus that pertains to the same individual, comprising: 
an anonymized resume corpus data identifier ([0051]: The entity clustering component 62… identifies the document-specific sequences of entities (and their respective classes) in each cluster, within the given resume or resume section) comprising 
one or more computing devices each of which comprises one or more hardware-based processors ([0039]: FIG. 2 illustrates a computer-implemented system 30 for entity extraction from resumes. The system includes memory 32 which stores software instructions 34 for performing the method illustrated in FIG. 3 and a processor device 36 in communication with the memory for executing the instructions), and 
an anonymized resume corpus data identification computer program having a plurality of sub-programs executable by said computing device or devices (Fig. 2; [0041]: The exemplary instructions 34 include a segmentation component 50, a first entity extraction component 52, a second entity extraction component 54, a machine learning component 56, and an output component 58), wherein the sub-programs configure said computing device or devices to, 
access an anonymized resume corpus (Fig. 3; [0065]: At S102, an input document 10 is received; Fig. 3; [0157]: Several data sets of resumes were obtained, each one in a specific industry domain: [0158]: CV200 (software engineers): this set was built manually from resumes found on the Internet, manually anonymized),
segment the corpus into resume snippets, wherein each snippet is associated with at least one item of summary-level information (Fig. 3, step S104; [0042]: The segmentation component 50 (optional) segments the document 10 into sections 18, e.g., based on document structure, such as lines between rows of text, section titles, paragraph breaks, combinations thereof, and the like… [0155]: For segmenting the resume in sections, four main section types are used: Work experience, Education, Skills and Others (from hobbies to publications) [The four main section types correspond to summary-level information]).
However, Calapodescu does not explicitly teach “cluster the resume snippets into groups, wherein within each group the resume snippets pertain to the same individual, said clustering grouping snippets having one or more of the summary-level information items in common and employing a textual similarity measuring technique to cluster the snippets that pertain to the same individual.”
On the other hand, in the same field of endeavor, Zeng teaches
cluster the resume snippets into groups (Fig. 1, step 108; [Col. 2, lines 14-17]-[Col. 3, line 14]; FIG. 1 shows an illustrative example block diagram of a person disambiguation mechanism that uses name entity extraction and clustering to distinguish persons' data located in a search… a disambiguation mechanism 110 processes the snippets 108)
wherein within each group the resume snippets pertain to the same individual, said clustering grouping snippets having one or more of the summary-level information items in common (Fig. 2, step 216; [Col. 5, lines 1-4]: The clusters that share common entities are then merged, as represented by step 216. This step ensures that clusters representing the same person are grouped together, referred to as coverage [The named entities correspond to the summary-level information items]) and 
employing a textual similarity measuring technique to cluster the snippets that pertain to the same individual (Fig. 1, step 116; [Abstract]: In one example, each person is represented as a vector, and similarity among vectors is calculated based on weighting that corresponds to nearness of the terms to a person, and/or the types of terms).
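Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Calapodescu with the teachings of Zeng to include “cluster the resume snippets into groups, wherein within each group the resume snippets pertain to the same individual, said clustering grouping snippets having one or more of the summary-level information items in common and employing a textual similarity measuring technique to cluster the snippets that pertain to the same individual”.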
The motivation for doing so would be to disambiguate data corresponding to persons that are located from search results, as recognized by Zeng ([0009] of Zeng: Briefly, various aspects of the subject matter described herein are directed towards disambiguating data corresponding to persons that are located from search results).
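For illustration only, the grouping portion of the claim 1 limitation (snippets sharing one or more summary-level items are placed in the same group, with groups that share common entities merged as in Zeng step 216) can be sketched in Python as a union-find over shared items. This is not asserted to be the applicant's, Calapodescu's, or Zeng's implementation; the Snippet record, the "type:value" item strings, and the function name group_by_shared_items are hypothetical, and the textual-similarity portion of the limitation is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    """Hypothetical snippet record: text plus its assigned summary-level items."""
    text: str
    items: frozenset = field(default_factory=frozenset)


def group_by_shared_items(snippets):
    """Merge snippets that share at least one summary-level item (transitively)."""
    parent = list(range(len(snippets)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # summary-level item -> index of the first snippet carrying it
    for idx, snip in enumerate(snippets):
        for item in snip.items:
            if item in first_seen:
                parent[find(idx)] = find(first_seen[item])
            else:
                first_seen[item] = idx

    groups = {}
    for idx, snip in enumerate(snippets):
        groups.setdefault(find(idx), []).append(snip)
    return list(groups.values())


snips = [
    Snippet("A", frozenset({"org:Acme"})),
    Snippet("B", frozenset({"org:Acme", "city:Austin"})),
    Snippet("C", frozenset({"city:Austin"})),
    Snippet("D", frozenset({"org:Globex"})),
]
groups = group_by_shared_items(snips)
```

Because the merging is transitive, snippet C joins A and B through the shared city even though it shares nothing with A directly; a single common item can therefore chain unrelated snippets, which is one reason the claim pairs this step with a textual similarity measure.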

Regarding Claim 2, the combined teachings of Calapodescu and Zeng teach the system of Claim 1, 
and Calapodescu further discloses wherein the sub-program for segmenting the corpus into resume snippets comprises, on a per-resume basis, sub-programs for:
extracting summary-level information from the resume (Fig. 3, [0067]: At S106, a first set of entities is extracted from one of the sections (or entire document) by component 52, e.g., in the form of a list 61); 
segmenting the resume into snippets (Fig. 3, [0066]: At S104, the document may be segmented into sections 18, using the segmentation component 50; [0155]: With the CV segmented in sections the entity extraction part (1st and 2nd passes) is performed [the entities are extracted from the sections. The entities are interpreted as the snippets]); and 
assigning the extracted summary-level information to each snippet segmented from the resume (Fig. 3, [0068]: At S108, for each extracted entity in the list 61 generated in the first pass, its offset, length, and type (class) is stored).

Regarding Claim 3, the combined teachings of Calapodescu and Zeng teach the system of Claim 2, 
and Calapodescu further discloses wherein the summary-level information comprises at least one of: an entity that the individual who is the subject of the resume works for or is associated with; or a business location of the entity that the individual who is the subject of the resume works for or is associated with, wherein the business location comprises at least one of a city, or state, or country; or at least one calendar date associated with the resume; or a current job title associated with the individual who is the subject of the resume ([0036]: FIG. 1 illustrates part of a loosely-structured document 10, such as a résumé. Entities 12, 14, 16, etc., to be extracted, e.g., company names (shown in italics), locations (shown underlined), dates (shown in bold, and job titles (shown in bold underlined), inside a given section 18 of the resume (here the work experience section)).
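As a non-authoritative illustration of the per-resume steps recited in claims 2 and 3 (extract summary-level information, segment the resume into snippets, assign the extracted information to each snippet), the following sketch assumes a hypothetical resume layout in which header lines such as "Employer: Acme" carry the entity, location, and job title, four-digit years serve as calendar dates, and blank lines separate snippets. The names Summary, extract_summary, segment, and snippets_with_summary are hypothetical and do not come from Calapodescu or the application.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed layout: "Employer:", "Location:", "Title:" header lines; blank lines split snippets.
FIELD_RE = re.compile(r"^(Employer|Location|Title):\s*(.+)$", re.MULTILINE)
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


@dataclass(frozen=True)
class Summary:
    """Summary-level items of the kinds listed in claim 3."""
    employer: Optional[str]
    location: Optional[str]
    job_title: Optional[str]
    dates: Tuple[str, ...]


def extract_summary(resume_text: str) -> Summary:
    fields = {key: value.strip() for key, value in FIELD_RE.findall(resume_text)}
    return Summary(
        employer=fields.get("Employer"),
        location=fields.get("Location"),
        job_title=fields.get("Title"),
        dates=tuple(YEAR_RE.findall(resume_text)),
    )


def segment(resume_text: str) -> List[str]:
    """Split the resume on blank lines; each non-empty block is one snippet."""
    return [b.strip() for b in re.split(r"\n\s*\n", resume_text) if b.strip()]


def snippets_with_summary(resume_text: str) -> List[Tuple[str, Summary]]:
    """Per resume: extract the summary, segment, then assign the summary to every snippet."""
    summary = extract_summary(resume_text)
    return [(snippet, summary) for snippet in segment(resume_text)]


resume = "Employer: Acme\nLocation: Austin, TX\nTitle: Engineer\n\nBuilt parsers 2015-2019.\n\nLed team in 2020."
pairs = snippets_with_summary(resume)
```

Every snippet carries the same resume-level summary, which is what later allows snippets from different resumes to be compared on those items.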

Regarding Claim 5, the combined teachings of Calapodescu and Zeng teach the system of Claim 3, 
and Calapodescu further discloses wherein each extracted calendar date associated with the resume comprises a date the author of the resume put on the resume itself, or a date the resume was published, or a date the resume was submitted, or a date the resume was received, or a date the resume was entered into a resume database ([0036]: Moreover, in each cluster, the entities generally follow a pattern chosen by the author (in this example “Organization, City, State/Country, Date range, new line, job title.”)).

Regarding Claim 6, the combined teachings of Calapodescu and Zeng teach the system of Claim 3, 
and Calapodescu further discloses wherein one of the extracted calendar dates is assigned as the baseline date of the resume ([0139]: To limit the potential loss in precision, the search may be limited to the entities in complete clusters. This assumes that an entity, for example a company name, closely related (by the clustering) to a date range… has a higher probability of being correctly extracted (i.e., not a false positive). This provides greater confidence that the observed surface form (as seen in the text) can be used to generate alternative surface forms for finding new instances of the same entity in the document).

Regarding Claim 7, the combined teachings of Calapodescu and Zeng teach the system of Claim 3, 
and Zeng further discloses wherein the sub-program for clustering the resume snippets into groups, comprises a sub-program for: 
employing the textual similarity measuring technique to eliminate snippets for each group not likely to be associated with the same individual as the other snippets in the group ([Col. 3, lines 47-53]: As also represented in FIG. 1 by the personal similarity calculation block 116, and in FIG. 2 at step 206, similarity is then calculated based on feature information. The feature information may be weighted, e.g., with the weight of each feature, such as organization, career, location, relationships to other persons and/or other features learned through linear regression or similar techniques).
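A minimal sketch of weighted feature similarity of the kind quoted from Zeng at Col. 3, lines 47-53, applied to the claim 7 step of eliminating snippets unlikely to belong with the rest of a group. The feature names, the fixed weights, the averaging rule, and the 0.4 threshold are assumptions for illustration only; Zeng describes learning the weights (e.g., by linear regression), which is not reproduced here.

```python
from typing import Dict, List

# Hypothetical fixed weights; Zeng describes learning such weights instead.
WEIGHTS = {"organization": 0.5, "career": 0.3, "location": 0.2}


def weighted_similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Sum the weights of the features on which two snippets agree."""
    return sum(w for feat, w in WEIGHTS.items()
               if a.get(feat) is not None and a.get(feat) == b.get(feat))


def prune_group(group: List[Dict[str, str]], threshold: float = 0.4) -> List[Dict[str, str]]:
    """Drop snippets whose average similarity to the other group members is below threshold."""
    if len(group) < 2:
        return list(group)
    kept = []
    for i, snip in enumerate(group):
        scores = [weighted_similarity(snip, other) for j, other in enumerate(group) if j != i]
        if sum(scores) / len(scores) >= threshold:
            kept.append(snip)
    return kept


group = [
    {"organization": "Acme", "career": "engineer", "location": "Austin"},
    {"organization": "Acme", "career": "engineer", "location": "Austin"},
    {"organization": "Globex", "career": "chef", "location": "Boston"},
]
kept = prune_group(group)
```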

Regarding Claim 8, the combined teachings of Calapodescu and Zeng teach the system of Claim 1, 
and Zeng further discloses wherein the sub-program for grouping snippets having one or more of the summary-level information items in common, comprises clustering snippets having the same entity assigned to them into the same group (Fig. 2, step 216; [Col. 5, lines 1-4]: The clusters that share common entities are then merged, as represented by step 216. This step ensures that clusters representing the same person are grouped together, referred to as coverage [The named entities correspond to the summary-level information items]).

Regarding Claim 9, the combined teachings of Calapodescu and Zeng teach the system of Claim 1, 
and Zeng further discloses wherein the sub-program for grouping snippets having one or more of the summary-level information items in common, comprises 
clustering snippets having at least one of a same city, or same state, or same country location assigned to them into the same group (Fig. 1; [Col. 3, lines 18-23]: More particularly, within the disambiguation mechanism 110, a name entity extraction (tagger) component 112 extracts data from the returned snippets 108, such as data related to each resultant person's organization, location (e.g., city, state country and so forth); Fig. 2, step 216; [Col. 5, lines 1-4]: The clusters that share common entities are then merged, as represented by step 216. This step ensures that clusters representing the same person are grouped together, referred to as coverage [The named entities correspond to the summary-level information items]).

Regarding Claim 10, the combined teachings of Calapodescu and Zeng teach the system of Claim 1, 
and Zeng further discloses wherein the sub-program for grouping snippets having one or more of the summary-level information items in common, comprises 
clustering snippets having both the same entity and the same location assigned to them into the same group (Fig. 1; [Col. 3, lines 18-23]: More particularly, within the disambiguation mechanism 110, a name entity extraction (tagger) component 112 extracts data from the returned snippets 108, such as data related to each resultant person's organization, location (e.g., city, state country and so forth); Fig. 2, step 216; [Col. 5, lines 1-4]: The clusters that share common entities are then merged, as represented by step 216. This step ensures that clusters representing the same person are grouped together, referred to as coverage [The named entities correspond to the summary-level information items]).
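The keyed grouping recited in claims 8 through 10 (same entity; same location; both the same entity and the same location) can be illustrated with one grouping routine and three key functions. This is a sketch under stated assumptions: snippets are modeled as hypothetical dictionaries with "entity" and "city" fields, the city alone stands in for the claim 9 "city, or state, or country" alternatives, and group_by and the key-function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional

Snippet = Dict[str, Optional[str]]


def group_by(snippets: List[Snippet],
             key: Callable[[Snippet], Optional[Hashable]]) -> Dict[Hashable, List[Snippet]]:
    """Place snippets with equal keys in the same group; snippets with a None key are skipped."""
    groups: Dict[Hashable, List[Snippet]] = defaultdict(list)
    for snip in snippets:
        k = key(snip)
        if k is not None:
            groups[k].append(snip)
    return dict(groups)


def by_entity(s: Snippet):
    """Claim 8: same entity."""
    return s.get("entity")


def by_location(s: Snippet):
    """Claim 9: same location (city used here as the example)."""
    return s.get("city")


def by_entity_and_location(s: Snippet):
    """Claim 10: both the same entity and the same location."""
    if s.get("entity") is None or s.get("city") is None:
        return None
    return (s["entity"], s["city"])


snips = [
    {"entity": "Acme", "city": "Austin"},
    {"entity": "Acme", "city": "Dallas"},
    {"entity": None, "city": "Austin"},
]
```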

Regarding Claim 11, the combined teachings of Calapodescu and Zeng teach the system of Claim 1, 
and Calapodescu further discloses wherein the sub-program for grouping snippets having one or more of the summary-level information items in common, comprises: 
first clustering snippets having the same entity assigned to them into the same group (Fig. 1, [0036]: Entities 12, 14, 16, etc., to be extracted, e.g., company names (shown in italics), locations (shown underlined), dates (shown in bold, and job titles (shown in bold underlined), inside a given section 18 of the resume (here the work experience section), tend to be grouped or clustered together (location wise)),
(Fig. 3, [0093]: To determine whether each cluster is complete an analysis of the numbers of entities of each type in a given section may be performed), and then 
assigning to the group, snippets lacking an entity designation (Fig. 3, step S114, Assign each cluster into a complete or incomplete category; [0097]: This approach provides a split of the list of clusters into a “complete clusters” list (having at least a threshold number of the specified entity types represented in the cluster, which may depend on the correlation coefficients obtained for the entity types) and an “incomplete clusters” list (having less than the threshold number of the specified entity types represented in the cluster)), but having a location assigned to them that matches the location of other snippets in the group ([0016]: At least a subset of the extracted entities in the first set is clustered into clusters, based on locations of the entities in the document).
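The two-stage grouping of claim 11 (first cluster snippets by entity, then assign entity-less snippets whose location matches the location of other snippets in a group) may be sketched as follows. The dictionary fields "entity" and "location" and the function name are hypothetical, and leaving an entity-less snippet unassigned when its location matches more than one group is an assumption made here to avoid ambiguous assignments, not something recited in the claim or in Calapodescu.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Snippet = Dict[str, Optional[str]]


def cluster_entity_then_location(snippets: List[Snippet]) -> Dict[str, List[Snippet]]:
    """Group by entity first; then attach each entity-less snippet to the single
    group that already contains a snippet with the same location."""
    groups: Dict[str, List[Snippet]] = defaultdict(list)
    orphans: List[Snippet] = []
    for snip in snippets:
        if snip.get("entity"):
            groups[snip["entity"]].append(snip)
        else:
            orphans.append(snip)

    for snip in orphans:
        loc = snip.get("location")
        if not loc:
            continue
        matches = [members for members in groups.values()
                   if any(m.get("location") == loc for m in members)]
        # Assumption: an orphan matching several groups is ambiguous and is left out.
        if len(matches) == 1:
            matches[0].append(snip)
    return dict(groups)


snips = [
    {"entity": "Acme", "location": "Austin"},
    {"entity": "Globex", "location": "Boston"},
    {"entity": None, "location": "Austin"},
    {"entity": None, "location": "Denver"},
]
groups = cluster_entity_then_location(snips)
```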

Regarding Claim 12, the combined teachings of Calapodescu and Zeng teach the system of Claim 7, 
and Calapodescu further discloses wherein the sub-program for employing a textual similarity measuring technique to eliminate snippets for each group, comprises sub-programs for: 
establishing a length of each snippet in a group under consideration ([0068]: At S108, for each extracted entity in the list 61 generated in the first pass, its offset, length, and type (class) is stored); 
for each snippet in the group, determining if the length of the snippet falls within a prescribed comparable length range based on the length of all the snippets in the group ([0088]: For each entity in turn, if the number of clusters is zero (i.e., this is the first entity) then a new cluster is generated using this entity. The centroid of this cluster is based on the location of this entity (e.g., based on its offset and length)); and 
whenever the length of a snippet in the group falls outside the prescribed comparable length range, removing it from the group ([0088]: For example, the cluster centroid is computed as the sum of the offsets of the entities in the cluster divided by the length of the entities in the cluster. If the distance to the current cluster centroid is greater than r, a new cluster will be initialized with this entity).
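The offset-based clustering quoted from Calapodescu [0088] (an entity starts a new cluster when its distance to the current cluster centroid exceeds r) can be sketched as a single pass over entities sorted by offset. The (offset, length, type) tuple layout is hypothetical, and taking the centroid as the mean start offset is an assumption for illustration rather than the exact centroid formula quoted above.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (offset, length, entity_type); hypothetical layout


def cluster_by_offset(entities: List[Entity], r: float) -> List[List[Entity]]:
    """Sequential clustering: an entity joins the current cluster if it lies within
    radius r of that cluster's centroid; otherwise it starts a new cluster."""
    clusters: List[List[Entity]] = []
    for ent in sorted(entities):  # tuples sort by offset first
        if not clusters:
            clusters.append([ent])
            continue
        current = clusters[-1]
        # Assumption: centroid = mean start offset of the entities already in the cluster.
        centroid = sum(e[0] for e in current) / len(current)
        if abs(ent[0] - centroid) > r:
            clusters.append([ent])
        else:
            current.append(ent)
    return clusters


entities = [(0, 4, "ORG"), (10, 6, "LOC"), (20, 9, "DATE"), (200, 8, "ORG"), (215, 5, "TITLE")]
clusters = cluster_by_offset(entities, r=30)
```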

Regarding Claim 13, the combined teachings of Calapodescu and Zeng teach the system of Claim 12, 
and Calapodescu further discloses wherein the sub-program for establishing a length of each snippet in a group, comprises: 
tokenizing each snippet in the group under consideration ([0043]: A suitable rule-based system 52 includes a parser which applies rules for tokenizing the text to form a sequence of tokens (generally, words)); and 
establishing the length of each tokenized snippet as its character count ([0048]: The first entity extraction component 52 also identifies a location of each of the extracted entities, e.g., with offset precision or other location indicator. For example, each character (including spaces between tokens) is indexed in sequence. Each entity can them be located by its first index and its length).
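The length-based elimination recited in claims 12 and 13 (tokenize each snippet, take its character count as its length, and remove snippets whose length falls outside a comparable range derived from all snippets in the group) can be illustrated as below. The claims do not specify the range; the median-based range with a 50% tolerance, whitespace tokenization, and the function names are assumptions for illustration only.

```python
import statistics
from typing import List


def tokenized_length(snippet: str) -> int:
    """Tokenize on whitespace and count the characters of the space-joined tokens."""
    return len(" ".join(snippet.split()))


def filter_by_length(group: List[str], tolerance: float = 0.5) -> List[str]:
    """Keep snippets whose length lies within median * (1 +/- tolerance).
    The median-based range is an assumed stand-in for the 'prescribed' range."""
    if not group:
        return []
    lengths = [tokenized_length(s) for s in group]
    median = statistics.median(lengths)
    low, high = median * (1 - tolerance), median * (1 + tolerance)
    return [s for s, n in zip(group, lengths) if low <= n <= high]


group = ["a" * 10, "b" * 12, "c" * 11, "d" * 40]
kept = filter_by_length(group)
```

A median is used rather than a mean so that a single very long or very short snippet does not shift the range it is measured against.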

Regarding Claim 14, the combined teachings of Calapodescu and Zeng teach the system of Claim 7, 
and Calapodescu further discloses wherein the anonymized resume corpus data identification computer program further comprises a sub-program for filtering each group after employing the textual similarity measuring technique to remove additional resume snippets ([0050]: chunker component 65), said filtering comprising for each group, 
identifying snippets in the group that do not have a specified part of the summary-level information assigned thereto, which was not employed to cluster the snippets, in common with the other snippets ([0114]: For example, as illustrated in FIG. 7, for each cluster window, a sequence of text chunks is identified using one or more of the chunking rules); and 
removing the snippets from the group that do not have the specified part of the summary-level information assigned thereto ([0129]: Then, clusters that became complete (following the same decision rules described earlier) are removed from the incomplete cluster list.).

Regarding Claim 15, the combined teachings of Calapodescu and Zeng teach the system of Claim 14, 
and Calapodescu further discloses wherein the specified part of the summary-level information which was not employed to cluster the snippets comprises a current job title associated with the individual who is the subject of the resume (Fig. 7, [0096]: For example, for each experience in the experience section in FIG. 7, it may be assumed that there must be at least one job title, one date range and one company name; [0104]: As an example, consider the two examples 94, 96 found in the same resume shown in FIG. 7, which have been annotated by the first pass CRF model 78. The first example 94 is the search window of a complete sample and the second one 96 is the search window of an incomplete one (missing organization)).
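The post-clustering filter of claims 14 and 15 (remove snippets that do not share a specified summary-level item, here the current job title, with the other snippets in the group) can be sketched as follows. Treating the group's most common job title as the value "in common with the other snippets" is one assumed reading, and the field name and function name are hypothetical; snippets lacking a job title are removed under this reading.

```python
from collections import Counter
from typing import Dict, List, Optional

Snippet = Dict[str, Optional[str]]


def filter_by_field(group: List[Snippet], field: str = "job_title") -> List[Snippet]:
    """Remove snippets whose value for `field` differs from the group's most common value."""
    values = [s.get(field) for s in group if s.get(field) is not None]
    if not values:
        return list(group)
    majority, _ = Counter(values).most_common(1)[0]
    return [s for s in group if s.get(field) == majority]


group = [
    {"job_title": "Engineer"},
    {"job_title": "Engineer"},
    {"job_title": "Chef"},
    {"job_title": None},
]
kept = filter_by_field(group)
```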

Regarding Claim 16, the combined teachings of Calapodescu and Zeng teach the system of Claim 2, 
and Calapodescu further discloses wherein the anonymized resume corpus data identification computer program further comprises a sub-program for
filtering each group to remove resume snippets not likely to be associated with the same individual as other resume snippets in the group ([0085]: To find these closely located entities, an agglomerative clustering algorithm, based on offsets of the entities, may be employed.), said filtering comprising for each group, executing sub-programs for: 
identifying all possible pairs of snippets in the group ([0085]: The algorithm aggregates entities inside a cluster if the distance from a dynamically computed representative location in the cluster, such as the cluster centroid or medoid, is below a threshold distance r (maximum radius around the representative location)); 
measuring the similarity of the snippets in each pair of snippets in the group ([0093]: For example, a correlation coefficient (e.g., Pearson's product-moment coefficient) may be computed for co-occurrence of entities in pairs of types (e.g., selected from number of job titles, number of company names, number of dates and so forth)); 
removing from the group, pairs of snippets found to have a measured degree of similarity below a prescribed minimum similarity threshold ([0097]: This approach provides a split of the list of clusters into a “complete clusters” list (having at least a threshold number of the specified entity types represented in the cluster, which may depend on the correlation coefficients obtained for the entity types) and an “incomplete clusters” list (having less than the threshold number of the specified entity types represented in the cluster)).
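For illustration, the pairwise filtering of claim 16 (identify all pairs in a group, measure each pair's similarity, and remove low-similarity pairs) can be sketched with token-set Jaccard similarity as an assumed stand-in for the claimed measure. Removing both members of every low-scoring pair would discard good snippets whenever one outlier is present, so this sketch adopts the assumed reading that a snippet is kept if it belongs to at least one pair at or above the threshold; the 0.3 threshold and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set Jaccard similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prune_by_pairs(group: List[str], min_sim: float = 0.3) -> List[str]:
    """Score every pair in the group; keep snippets that appear in at least one
    pair scoring at or above min_sim."""
    if len(group) < 2:
        return list(group)
    tokens = [set(s.lower().split()) for s in group]
    keep = set()
    for i, j in combinations(range(len(group)), 2):
        if jaccard(tokens[i], tokens[j]) >= min_sim:
            keep.update((i, j))
    return [s for k, s in enumerate(group) if k in keep]


group = ["Senior Engineer Acme Austin", "Engineer Acme Austin Texas", "Pastry Chef Boston"]
kept = prune_by_pairs(group)
```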

Regarding Claim 21, the combined teachings of Calapodescu and Zeng teach the system of Claim 16, 
and Calapodescu further discloses wherein the sub-program for measuring the similarity of the snippets in each pair of snippets in the group, comprises sub-programs for: 
tokenizing each snippet in each pair of snippets in the group that has not already been tokenized ([0107]: The chunker component 65 may employ tokenization and syntactic and semantic features provided by a syntactic parser 52, such as the Xerox Incremental Parser (XIP), in addition to the results of the first extraction component 52 to split the text into chunks);
applying a fuzzy string-matching technique to the tokenized snippets in each remaining snippet pair in the group to calculate the difference between the snippets in each snippet pair ([0140]: The entity extraction components 52, 54 may miss such acronyms. But after a first pass, these missed instances can be searched for by using exact string matching or approximate string matching techniques or surface variation generation) and
produce a score ([0165]: For scoring the performance, Precision, Recall, and F1 metrics are applied per entity type on each document).
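The tokenize-then-fuzzy-match steps of claim 21 can be illustrated with Python's standard-library difflib, whose SequenceMatcher.ratio() returns a similarity in [0, 1]. Using difflib (rather than an edit-distance library) and scaling the ratio to a 0-100 score are assumptions for illustration; neither is drawn from Calapodescu or the application, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def tokenize(snippet: str) -> List[str]:
    """Lower-case whitespace tokenization (an assumed, simple tokenizer)."""
    return snippet.lower().split()


def fuzzy_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score from difflib's ratio over the re-joined tokens."""
    ratio = SequenceMatcher(None, " ".join(tokenize(a)), " ".join(tokenize(b))).ratio()
    return round(100 * ratio)


def score_pairs(group: List[str]) -> List[Tuple[int, int, int]]:
    """Score every snippet pair (i, j) in a group."""
    return [(i, j, fuzzy_score(group[i], group[j]))
            for i in range(len(group)) for j in range(i + 1, len(group))]
```

Tokenizing before matching normalizes case and spacing, so two snippets that differ only in whitespace or capitalization score as identical.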

Regarding Claim 22, Calapodescu discloses a system for identifying data in the anonymized resume corpus that pertains to the same individual, comprising: an anonymized resume corpus data identifier ([0051]: The entity clustering component 62… identifies the document-specific sequences of entities (and their respective classes) in each cluster, within the given resume or resume section) comprising one or more computing devices each of which comprises one or more hardware-based processors ([0039]: FIG. 2 illustrates a computer-implemented system 30 for entity extraction from resumes), and an anonymized resume corpus data identification computer program having a plurality of sub-programs executable by said computing device or devices ([0041]: The exemplary instructions 34 include a segmentation component 50, a first entity extraction component 52, a second entity extraction component 54, a machine learning component 56, and an output component 58.), wherein the sub-programs configure said computing device or devices to, 
access an anonymized resume corpus ([0157]: Several data sets of resumes were obtained, each one in a specific industry domain: [0158]: CV200 (software engineers): this set was built manually from resumes found on the Internet, manually anonymized), 
for each resume in the corpus, extract summary-level information from the resume (A method for extracting entities from a text document includes, for at least a section of a text document, providing a first set of entities extracted from the at least a section), 
segment the resume into snippets (Fig. 2, [0042]: The segmentation component 50 (optional) segments the document 10 into sections 18; [0155]: With the CV segmented in sections the entity extraction part (1st and 2nd passes) is performed [the entities are extracted from the sections. The entities are interpreted as the snippets]), and 
assign the extracted summary-level information to each snippet segmented from the resume (Fig. 3, [0068]: At S108, for each extracted entity in the list 61 generated in the first pass, its offset, length, and type (class) is stored).
However, Calapodescu does not explicitly teach “cluster the resume snippets into groups, wherein within each group the resume snippets pertain to the same individual, said clustering comprising, first employing a textual similarity measuring technique to cluster the snippets into groups, and then, for each group using at least some of the summary-level information assigned to each snippet in the group to eliminate snippets not likely to be associated with the same individual as the other snippets in the group.”
On the other hand, in the same field of endeavor, Zeng teaches cluster the resume snippets into groups ([0034]: Referring now to FIG. 3, there is illustrated a flow chart of one methodology for snippet clustering in accordance with the present invention; [0036]: At 308, documents are assigned to relevant topics [relevant groups]), 
wherein within each group the resume snippets pertain to the same individual (Fig. 3; [0009]: A frequent itemset method is employed to identify topics inside the snippets first, then classify all the snippets into those topics… Documents related to the same topic usually share a common vocabulary. Keywords are chosen and then clustered by counting co-occurrences of pairs of keywords. Documents are assigned to relevant topics based on the feature vectors of the clusters. [The same topic corresponds to the same individual, and the co-occurrences of word pairs correspond to identifying word pairs that would not be common in resumes pertaining to a variety of individuals, such as same employers, business locations, job titles, dates of employment, specific job skills, education dates and institutions, awards, certifications, personal interests, and so on]), said clustering comprising, 
first employing a textual similarity measuring technique to cluster the snippets into groups ([0055]: Intuitively, if a phrase is a good representation of a single topic, the documents that contain the phrase will be similar to each other. At 504, Intra-Cluster Similarity (ICS) is used to measure the content compactness of documents that contain the phrase), and then, 
for each group using at least some of the summary-level information assigned to each snippet in the group to eliminate snippets not likely to be associated with the same individual as the other snippets in the group ([0035]: At 306, pairs of the keywords are sorted in descending order of their co-occurrences. Accordingly, starting from the first pair (t1, t2), if(t1 is in cluster C and t2 does not belong to any cluster), then put t2 into cluster C; if(t2 is in cluster C and t1 does not belong to any cluster), then put t1 into cluster C, if(neither t1 nor t2 belongs to any cluster), then t1 and t2 form a new topic cluster; afterwards, all the keywords of a topic form a feature vector of the cluster.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Calapodescu to incorporate the teachings of Zeng to include “cluster the resume snippets into groups, wherein within each group the resume snippets pertain to the same individual, said clustering comprising, first employing a textual similarity measuring technique to cluster the snippets into groups, and then, for each group using at least some of the summary-level information assigned to each snippet in the group to eliminate snippets not likely to be associated with the same individual as the other snippets in the group.”
The motivation for doing so would be to classify snippets according to topics, as recognized by Zeng ([0009] of Zeng: A frequent itemset method is employed to identify topics inside the snippets first, then classify all the snippets into those topics).
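The keyword-pair clustering rules quoted above from Zeng [0035] (sort keyword pairs by descending co-occurrence; add an unassigned keyword to its partner's cluster; start a new topic when neither keyword is assigned) can be sketched as follows. Counting co-occurrence as the number of snippets containing both keywords is an assumption, pairs whose keywords are both already assigned are skipped because the quoted rules say nothing about them, and the function names are hypothetical; assigning documents to topics by feature vector is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def cooccurrence_counts(docs: List[Set[str]]) -> Counter:
    """Count how many documents contain each unordered keyword pair."""
    counts: Counter = Counter()
    for keywords in docs:
        for t1, t2 in combinations(sorted(keywords), 2):
            counts[(t1, t2)] += 1
    return counts


def cluster_keywords(docs: List[Set[str]]) -> List[Set[str]]:
    """Greedy topic clustering over keyword pairs in descending co-occurrence order."""
    cluster_of: Dict[str, int] = {}
    clusters: List[Set[str]] = []
    for (t1, t2), _ in cooccurrence_counts(docs).most_common():
        if t1 in cluster_of and t2 not in cluster_of:
            cluster_of[t2] = cluster_of[t1]
            clusters[cluster_of[t1]].add(t2)
        elif t2 in cluster_of and t1 not in cluster_of:
            cluster_of[t1] = cluster_of[t2]
            clusters[cluster_of[t2]].add(t1)
        elif t1 not in cluster_of and t2 not in cluster_of:
            cluster_of[t1] = cluster_of[t2] = len(clusters)
            clusters.append({t1, t2})
        # Pairs whose keywords are both already assigned are skipped.
    return clusters


docs = [{"acme", "austin", "engineer"}, {"acme", "austin"}, {"chef", "boston"}, {"chef", "boston"}]
topics = cluster_keywords(docs)
```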
 
Regarding Claim 23, Calapodescu discloses a computer-implemented process for identifying data in the anonymized resume corpus that pertains to the same individual, the process comprising the actions of: 
using one or more computing devices each of which comprises one or more hardware-based processors to perform the following process actions ([0059]: The computer-implemented system 30 may include one or more computing devices 30), the computing devices being in communication with each other via a computer network whenever a plurality of computing devices is used ([0061]: The network interface 40, 42 allows the computer to communicate with other devices via a computer network): 
accessing an anonymized resume corpus (Fig. 3; [0065]: At S102, an input document 10 is received; Fig. 3; [0157]: Several data sets of resumes were obtained, each one in a specific industry domain: [0158]: CV200 (software engineers): this set was built manually from resumes found on the Internet, manually anonymized), 
segmenting the corpus into resume snippets wherein each snippet is associated with at least one item of summary-level information (Fig. 3, step S104; [0042]: The segmentation component 50 (optional) segments the document 10 into sections 18, e.g., based on document structure, such as lines between rows of text, section titles, paragraph breaks, combinations thereof, and the like… [0155]: For segmenting the resume in sections, four main section types are used: Work experience, Education, Skills and Others (from hobbies to publications) [The four main section types correspond to summary-level information]). 
However, Calapodescu does not explicitly teach “clustering the resume snippets into groups, wherein within each group the resume snippets pertain to the same individual, said clustering grouping snippets having one or more of the summary-level information items in common and employing a textual similarity measuring technique to cluster the snippets that pertain to the same individual.”
On the other hand, in the same field of endeavor, Zeng teaches
clustering the resume snippets into groups (Fig. 1, step 108; [Col. 2, lines 14-17]-[Col. 3, line 14]; FIG. 1 shows an illustrative example block diagram of a person disambiguation mechanism that uses name entity extraction and clustering to distinguish persons' data located in a search… a disambiguation mechanism 110 processes the snippets 108)
wherein within each group the resume snippets pertain to the same individual, said clustering grouping snippets having one or more of the summary-level information items in common (Fig. 2, step 216; [Col. 5, lines 1-4]: The clusters that share common entities are then merged, as represented by step 216. This step ensures that clusters representing the same person are grouped together, referred to as coverage [The named entities correspond to the summary-level information items]) and 
employing a textual similarity measuring technique to cluster the snippets that pertain to the same individual (Fig. 1, step 116; [Abstract]: In one example, each person is represented as a vector, and similarity among vectors is calculated based on weighting that corresponds to nearness of the terms to a person, and/or the types of terms).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Calapodescu with the teachings of Zeng to include “clustering the resume snippets into groups, wherein within each group the resume snippets pertain to the same individual, said clustering grouping snippets having one or more of the summary-level information items in common and employing a textual similarity measuring technique to cluster the snippets that pertain to the same individual”.
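The motivation for doing so would be to disambiguate data corresponding to persons that are located from search results, as recognized by Zeng ([0009] of Zeng: Briefly, various aspects of the subject matter described herein are directed towards disambiguating data corresponding to persons that are located from search results).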

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Calapodescu (US 20170300565 A1, hereinafter Calapodescu) in view of Zeng et al (US Patent No. 7685201 B2, hereinafter Zeng) and in further view of Dane et al (US Patent No. 10860737 B2, hereinafter Dane).

Regarding Claim 4, the combined teachings of Calapodescu and Zeng disclose the system of claim 3.
However, the combined teachings of Calapodescu and Zeng do not explicitly teach “wherein the entity that the individual who is the subject of the resume works for is associated with a business entity identified by its uniform resource locator.”
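On the other hand, in the same field of endeavor, Dane teaches wherein the entity that the individual who is the subject of the resume works for is associated with is a business entity identified by its uniform resource locator ([Col. 3, lines 3-5]: FIG. 13 shows the format of a Uniform Resource Locator (URL) pointing to resume data according to one embodiment of the example resume system; Fig. 15, [Col. 9, lines 59-62]: In one embodiment, the URLs 1515, 1520 are presented as hyperlinks that indicate the last name of the candidate, the current or last employer as well as the current or last position as is stored in the resume database).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Calapodescu and Zeng to incorporate the teachings of Dane to include a business entity identified by its uniform resource locator, as recited in claim 4.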
The motivation for doing so would be to allow the user to access the resume information by clicking on the URL, as recognized by Dane ([Col. 9, lines 64-67] of Dane: the Hiring Manager would then access the resume database for the resume information that has been forwarded to him or her by clicking on each of the URLs 1515, 1520).

Claims 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Calapodescu (US 20170300565 A1, hereinafter Calapodescu) in view of Zeng et al. (US Patent No. 7685201 B2, hereinafter Zeng) and in further view of Buhrmann (US 20160232160 A1, hereinafter Buhrmann).

Regarding Claim 17, the combined teachings of Calapodescu and Zeng teach the system of Claim 16, 
and Calapodescu further discloses wherein the sub-program for measuring the similarity of the snippets in each pair of snippets in the group, comprises sub-programs for: 
tokenizing each snippet in each pair of snippets in the group that has not already been tokenized; removing extraneous words and symbols from the tokenized snippets ([0107]: The chunker component 65 may employ tokenization and syntactic and semantic features provided by a syntactic parser 52, such as the Xerox Incremental Parser (XIP), in addition to the results of the first extraction component 52 to split the text into chunks);
vectorizing and normalizing each of the tokenized snippets ([0155]: The extracted text is saved in UTF-8 and some character normalization is applied (like quotes, new lines, etc.)).
However, the combined teachings of Calapodescu and Zeng do not explicitly teach “applying a cosine similarity measurement technique to the vectorized snippets in each snippet pair in the group.”
On the other hand, in the same field of endeavor, Buhrmann teaches applying a cosine similarity measurement technique to the vectorized snippets in each snippet pair in the group ([0137]: When comparing a document to a corpus, such as shown in FIG. 1B, both the Input 12 document and the Corpus 14 are converted into a concept vector representation or “concept map”. In this form, two concept maps may be compared using vector mathematical treatments. For example, cosine similarity may be applied between two concept vector representations to determine their similarity based on the distance between their concept vectors).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Calapodescu and Zeng to incorporate the teachings of Buhrmann to include “applying a cosine similarity measurement technique to the vectorized snippets in each snippet pair in the group.”
The motivation for doing so would be to compare natural language texts on the basis of their conceptual substance, as recognized by Buhrmann ([0137] of Buhrmann: In this way, natural language texts may be compared on the basis of their conceptual substance and not syntactical composition).
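For illustration only, the tokenizing, vectorizing/normalizing, and cosine similarity steps recited in claim 17 can be sketched as below. The sketch is not taken from Calapodescu, Zeng, or Buhrmann and forms no part of the rejection. It assumes simple term-frequency vectors and a small hypothetical stop-word list standing in for the "extraneous words"; all function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical stop-word list standing in for "extraneous words"; not from any reference.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "the", "to"}


def tokenize(snippet: str) -> List[str]:
    # Lower-case, keep alphanumeric runs (which drops punctuation and symbols),
    # then drop stop words.
    return [t for t in re.findall(r"[a-z0-9]+", snippet.lower()) if t not in STOP_WORDS]


def vectorize(tokens: List[str]) -> Dict[str, float]:
    # Term-frequency vector scaled to unit (L2) length.
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    # For unit-length vectors, cosine similarity reduces to the dot product.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def pairwise_cosine(group: List[str]) -> List[Tuple[int, int, float]]:
    # Score every snippet pair (i, j) in the group, with i < j.
    vectors = [vectorize(tokenize(s)) for s in group]
    return [(i, j, cosine(vectors[i], vectors[j]))
            for i, j in combinations(range(len(group)), 2)]
```

Because each vector is normalized to unit length, the dot product alone gives the cosine similarity. For example, "Senior Engineer at Acme Corp" and "Acme Corp - Senior Engineer" score 1.0, while either one compared with "Chef in Paris" scores 0.0.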


Regarding Claim 18, the modified system of Calapodescu, Zeng, and Buhrmann discloses the system of Claim 17,
and Calapodescu further discloses wherein the sub-program for measuring the similarity of the snippets in each pair of snippets in the group, further comprises additional sub-programs executed after applying the cosine similarity measurement technique and removing pairs of snippets found to have a measured degree of similarity below a prescribed minimum cosine similarity threshold, said additional sub-programs comprising:
applying a fuzzy string-matching technique to the tokenized snippets in each remaining snippet pair in the group to calculate the difference between the snippets in each snippet pair ([0140]: The entity extraction components 52, 54 may miss such acronyms. But after a first pass, these missed instances can be searched for by using exact string matching or approximate string matching techniques or surface variation generation.) and 
produce a score ([0165]: For scoring the performance, Precision, Recall, and F1 metrics are applied per entity type on each document); and 
removing pairs of snippets found to have a score below a prescribed minimum score threshold ([0097]: This approach provides a split of the list of clusters into a “complete clusters” list (having at least a threshold number of the specified entity types represented in the cluster, which may depend on the correlation coefficients obtained for the entity types) and an “incomplete clusters” list (having less than the threshold number of the specified entity types represented in the cluster)).
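Again for illustration only, the two-stage filtering recited in claim 18 (drop pairs below a minimum cosine similarity, then score the remaining pairs with fuzzy string matching and drop those below a minimum score) might look like the sketch below. Python's standard-library `difflib.SequenceMatcher` is used here as a stand-in fuzzy string-matching technique; it is not the technique of any cited reference. The tokens are sorted before comparison so word order does not affect the score, and the threshold defaults are hypothetical values.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def fuzzy_score(tokens_a: List[str], tokens_b: List[str]) -> float:
    # SequenceMatcher.ratio() returns 2*M/T in [0, 1], where M is the number of
    # matched characters and T the total length of both strings. Sorting the
    # tokens first makes the comparison insensitive to word order.
    a = " ".join(sorted(tokens_a))
    b = " ".join(sorted(tokens_b))
    return SequenceMatcher(None, a, b).ratio()


def filter_pairs(pairs: List[Tuple[int, int, float]],
                 tokens: List[List[str]],
                 min_cosine: float = 0.5,
                 min_score: float = 0.8) -> List[Tuple[int, int, float]]:
    """pairs: (i, j, cosine) tuples; tokens: one token list per snippet.

    Returns (i, j, fuzzy_score) for the pairs that pass both thresholds.
    """
    kept = []
    for i, j, cos in pairs:
        if cos < min_cosine:
            continue  # removed by the minimum cosine similarity threshold
        score = fuzzy_score(tokens[i], tokens[j])
        if score >= min_score:  # pairs below the minimum score are removed
            kept.append((i, j, score))
    return kept
```

Running the fuzzy comparison only on pairs that survive the cosine threshold keeps the more expensive character-level matching off pairs that share little vocabulary.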

Regarding Claim 19, the modified system of Calapodescu, Zeng, and Buhrmann discloses the system of Claim 18,
and Calapodescu further discloses wherein the sub-program for measuring the similarity of the snippets in each pair of snippets in the group, further comprises other sub-programs executed after applying a fuzzy string-matching technique and removing pairs of snippets found to have a score below a prescribed minimum score threshold, said additional sub-programs comprising:
identifying snippets in the group that do not have a specified part of the summary-level information assigned thereto, which was not employed to cluster the snippets, in common with the other snippets ([0114]: For example, as illustrated in FIG. 7, for each cluster window, a sequence of text chunks is identified using one or more of the chunking rules); and
removing the snippets from the group that do not have the specified part of the summary-level information assigned thereto ([0129]: Then, clusters that became complete (following the same decision rules described earlier) are removed from the incomplete cluster list).

Regarding Claim 20, the modified system of Calapodescu, Zeng, and Buhrmann discloses the system of Claim 19, 
and Calapodescu further discloses wherein the specified part of the summary-level information which was not employed to cluster the snippets comprises a current job title associated with the individual who is the subject of the resume (Fig. 7, [0096]: For example, for each experience in the experience section in FIG. 7, it may be assumed that there must be at least one job title, one date range and one company name; [0104]: As an example, consider the two examples 94, 96 found in the same resume shown in FIG. 7, which have been annotated by the first pass CRF model 78. The first example 94 is the search window of a complete sample and the second one 96 is the search window of an incomplete one (missing organization)).



Examiner Note
Examiner has cited particular columns/paragraphs and line numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claims, other passages and figures may apply as well. In preparing responses, the applicant is respectfully requested to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
When amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification that dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention. This will assist in expediting compact prosecution. MPEP 714.02 recites: "Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714." Amendments not pointing to specific support in the disclosure may be deemed as not complying with the provisions of 37 CFR 1.121(b), (c), (d), and (h) and therefore held not fully responsive. Generic statements such as "Applicants believe no new matter has been introduced" may be deemed insufficient.
 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIRLEY D. HICKS whose telephone number is (571)272-3304.  The examiner can normally be reached on Mon - Fri 7:30 - 4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya can be reached on (571) 272-4034.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.







/IRETE F EHICHIOYA/Supervisory Patent Examiner, Art Unit 2168