DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The action is responsive to the Applicant’s Amendment filed on 4/13/2022. Claims 1, 3-6, and 8-23 are pending in the application. Claims 1, 3, 8-19, and 21-23 are amended. Claims 2 and 7 are cancelled.

Response to Arguments
Applicant’s arguments with respect to the rejections of claims 1, 3-6, and 8-23 have been fully considered. In view of the claim amendment filed, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made. 
Further, regarding the new limitations recited in claims 1, 3, 8-19, and 21-23, it is submitted that they are properly addressed by the new ground of rejection.
Furthermore, it is also submitted that all limitations in pending claims, including those not specifically argued, are properly addressed. The reason is set forth in the rejections. See claim analysis below for detail.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 5-6, 8-16, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Calapodescu (US 20170300565 A1, hereinafter Calapodescu) in view of Martin et al. (US 20140081995 A1, hereinafter Martin).

Regarding Claim 1, Calapodescu discloses a system for identifying data in the anonymized resume corpus that pertains to the same individual, comprising: 
an anonymized resume corpus data identifier ([0051]: The entity clustering component 62… identifies the document-specific sequences of entities (and their respective classes) in each cluster, within the given resume or resume section) comprising one or more computing devices each of which comprises one or more hardware-based processors ([0039]: FIG. 2 illustrates a computer-implemented system 30 for entity extraction from resumes. The system includes memory 32 which stores software instructions 34 for performing the method illustrated in FIG. 3 and a processor device 36 in communication with the memory for executing the instructions), and 
an anonymized resume corpus data identification computer program having a plurality of sub-programs executable by said computing device or devices (Fig. 2; [0041]: The exemplary instructions 34 include a segmentation component 50, a first entity extraction component 52, a second entity extraction component 54, a machine learning component 56, and an output component 58), wherein the sub-programs configure said computing device or devices to: 
access an anonymized resume corpus made up of anonymized resumes in which the name of an individual an anonymized resume pertains to is not included (Fig. 3; [0065]: At S102, an input document 10 is received; Fig. 3; [0157]: Several data sets of resumes were obtained, each one in a specific industry domain: [0158]: CV200 (software engineers): this set was built manually from resumes found on the Internet, manually anonymized. See Fig. 1);
on a per-resume basis ([0040]: The system 30 receives as input a text document 10, such as a resume, for processing), 
extract summary-level information items from the resume ([0036]: FIG. 1 illustrates part of a loosely-structured document 10, such as a résumé. Entities 12, 14, 16, etc., to be extracted; [0043]: The first entity extraction component 52 performs a first pass of entity extraction; Fig. 3, step S106 – first pass extraction of entities. See also para [0016]-[0020], Fig. 3, step S108), 
segment the resume into snippets (Fig. 2; [0042]: The segmentation component 50 (optional) segments the document 10 into sections 18, e.g., based on document structure [the sections correspond to the snippets. See also Fig. 3, step S104]), and 
assign the extracted summary-level information items to each snippet segmented from the resume (Fig. 3, [0068]: At S108, for each extracted entity in the list 61 generated in the first pass, its offset, length, and type (class) is stored; [0155]: For segmenting the resume in sections, four main section types are used: Work experience, Education, Skills and Others (from hobbies to publications) [the four main sections correspond to the snippets to which the information is assigned]. See also Fig. 3, para [0008], [0019], [0068], [0041]-[0042]); 
However, Calapodescu does not explicitly teach “group resume snippets which have one or more summary-level information items in common into a same group to create one or more snippet groups; and for each created snippet group, employ a textual similarity measuring technique to cluster the snippets into sub-groups, and designate the snippets of each sub-group as pertaining to the same individual.”
On the other hand, in the same field of endeavor, Martin teaches 
group resume snippets which have one or more summary-level information items in common into a same group to create one or more snippet groups (Fig. 5, [0061]: Matching sets of text blocks sharing common characteristics are grouped into text block groups 508 [the text blocks correspond to the resume snippets. See also para [0022]]); and 
for each created snippet group, employ a textual similarity measuring technique to cluster the snippets into sub-groups ([0114]: The match scoring methodology compares any two text blocks and applies a number of different statistical measures to assess similarity and difference, each of which may generate a different score; [0230]: In addition formulas are applied to identify word clusters within a text block group to identify common groups of words within the text of each member of the text block group; See also [0233]: Identify Text Block Groups and Sub Text Block Groups,  para [0023], [0036], [0043], [0066], [0120]), and
designate the snippets of each sub-group as pertaining to the same individual ([0061]: In certain implementations, redundant or duplicate text block groups are identified; [0144]-[0150]:  (ii) Match Text Blocks Across Files… Where 1,000 resumes are processed, the results are shown in Table 6 showing the text blocks groups and the number of matching text blocks in each group [Matching text blocks in each text block group designate the text blocks as pertaining to the same individual. See para [0043]-[0049], [0133], [0233]-[0239]]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Calapodescu with the teachings of Martin to include “group resume snippets which have one or more summary-level information items in common into a same group to create one or more snippet groups; and for each created snippet group, employ a textual similarity measuring technique to cluster the snippets into sub-groups, and designate the snippets of each sub-group as pertaining to the same individual.”
The motivation for doing so would be to identify and match sections of files across multiple files, as recognized by Martin ([0002] of Martin: A data profile engine for identifying, classifying, analyzing, searching, comparing and cross-referencing entire files and sections of files, records and other forms of electronic media is provided).
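
For illustration only, the two-stage grouping recited in Claim 1 (group by a shared summary-level item, then cluster each group by textual similarity) may be sketched as follows. The sketch is not taken from Calapodescu or Martin. The function names, the snippet dictionary layout, the use of Python's difflib ratio as the similarity measure, the greedy seed-based clustering, and the 0.6 threshold are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def group_by_shared_item(snippets):
    """Stage 1: put snippets that carry the same summary-level item in one group.

    `snippets` maps a snippet id to {"items": set of str, "text": str}
    (a hypothetical layout). A snippet with several items joins several groups.
    """
    groups = defaultdict(list)
    for snippet_id, snippet in snippets.items():
        for item in snippet["items"]:
            groups[item].append(snippet_id)
    return dict(groups)


def cluster_by_similarity(snippet_ids, snippets, threshold=0.6):
    """Stage 2: greedy clustering inside one group.

    Each snippet joins the first sub-group whose seed (first member) is at
    least `threshold` similar; otherwise it starts a new sub-group.
    """
    subgroups = []
    for snippet_id in snippet_ids:
        text = snippets[snippet_id]["text"]
        for subgroup in subgroups:
            seed_text = snippets[subgroup[0]]["text"]
            if SequenceMatcher(None, seed_text, text).ratio() >= threshold:
                subgroup.append(snippet_id)
                break
        else:
            subgroups.append([snippet_id])
    return subgroups


# Hypothetical example data.
snippets = {
    "a": {"items": {"Acme Corp"}, "text": "Led the payments team at Acme Corp in Boston"},
    "b": {"items": {"Acme Corp"}, "text": "Led the payments team at Acme Corp, Boston"},
    "c": {"items": {"Acme Corp"}, "text": "Taught high school chemistry for six years"},
}
groups = group_by_shared_item(snippets)
subgroups = cluster_by_similarity(groups["Acme Corp"], snippets)
```

Under these assumptions, each resulting sub-group would be the set of snippets designated as pertaining to the same individual.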

Regarding Claim 3, the combined teachings of Calapodescu and Martin teach the system of Claim 1, 
and Calapodescu further discloses wherein the summary-level information comprises at least one of: an entity that the individual who is the subject of the resume works for or is associated with; or a business location of the entity that the individual who is the subject of the resume works for or is associated with, wherein the business location comprises at least one of a city, or state, or country; or at least one calendar date associated with the resume; or a current job title associated with the individual who is the subject of the resume ([0036]: FIG. 1 illustrates part of a loosely-structured document 10, such as a résumé. Entities 12, 14, 16, etc., to be extracted, e.g., company names (shown in italics), locations (shown underlined), dates (shown in bold), and job titles (shown in bold underlined), inside a given section 18 of the resume (here the work experience section)).

Regarding Claim 5, the combined teachings of Calapodescu and Martin teach the system of Claim 3, 
and Calapodescu further discloses wherein each extracted calendar date associated with the resume comprises a date the author of the resume put on the resume itself, or a date the resume was published, or a date the resume was submitted, or a date the resume was received, or a date the resume was entered into a resume database ([0036]: Moreover, in each cluster, the entities generally follow a pattern chosen by the author (in this example “Organization, City, State/Country, Date range, new line, job title”)).

Regarding Claim 6, the combined teachings of Calapodescu and Martin teach the system of Claim 3, 
and Calapodescu further discloses wherein one of the extracted calendar dates is assigned as the baseline date of the resume ([0139]: To limit the potential loss in precision, the search may be limited to the entities in complete clusters. This assumes that an entity, for example a company name, closely related (by the clustering) to a date range… has a higher probability of being correctly extracted (i.e., not a false positive). This provides greater confidence that the observed surface form (as seen in the text) can be used to generate alternative surface forms for finding new instances of the same entity in the document).

Regarding Claim 8, the combined teachings of Calapodescu and Martin teach the system of Claim 1, 
and Martin further discloses wherein the sub-program for grouping resume snippets having one or more of the summary-level information items in common into the same group, comprises grouping snippets having the same entity assigned to them into the same group ([0022]: The data profile engine analyzes multiple source files (e.g., electronic media including documents and records), identifies sections of text in each file (a "text block"), matches text blocks within a file and across all files by constructing collections of matched text blocks (a "text block group") resulting in multiple sets of text block groups (a "text block group set"), and… identifying their common and distinguishing attributes; Fig. 5, [0061]: Matching sets of text blocks sharing common characteristics are grouped into text block groups 508 [the text blocks correspond to the resume snippets]).

Regarding Claim 9, the combined teachings of Calapodescu and Martin teach the system of Claim 1, 
and Calapodescu further discloses wherein the sub-program for grouping resume snippets having one or more of the summary-level information items in common into the same group, comprises grouping snippets having at least one of a same city, or same state, or same country location assigned to them into the same group ([0036]: Moreover, in each cluster, the entities generally follow a pattern chosen by the author (in this example “Organization, City, State/Country, Date range, new line, job title”). This type of pattern is specific to each document and often also to each section in the document; [0051]: For example, for section 18 in FIG. 1, the first two clusters have an entity class sequence: ORG, CITY, STATE DATE).

Regarding Claim 10, the combined teachings of Calapodescu and Martin teach the system of Claim 1, 
and Calapodescu further discloses wherein the sub-program for grouping resume snippets having one or more of the summary-level information items in common into the same group, comprises grouping snippets having both the same entity and the same location assigned to them into the same group ([0036]: FIG. 1 illustrates part of a loosely-structured document 10, such as a résumé. Entities 12, 14, 16, etc., to be extracted, e.g., company names (shown in italics), locations (shown underlined), dates (shown in bold), and job titles (shown in bold underlined), inside a given section 18 of the resume (here the work experience section), tend to be grouped or clustered together (location wise). See also para [0048]).

Regarding Claim 11, the combined teachings of Calapodescu and Martin teach the system of Claim 1, 
and Calapodescu further discloses wherein the sub-program for grouping resume snippets having one or more of the summary-level information items in common into the same group, comprises: first grouping snippets having the same entity assigned to them into the same group (Fig. 1, [0036]: Entities 12, 14, 16, etc., to be extracted, e.g., company names (shown in italics), locations (shown underlined), dates (shown in bold), and job titles (shown in bold underlined), inside a given section 18 of the resume (here the work experience section), tend to be grouped or clustered together (location wise)),
determining if any resume snippet not already assigned to the group is lacking an entity designation (Fig. 3, [0093]: To determine whether each cluster is complete an analysis of the numbers of entities of each type in a given section may be performed), and then 
assigning to the group, resume snippets lacking an entity designation (Fig. 3, step S114, Assign each cluster into a complete or incomplete category; [0097]: This approach provides a split of the list of clusters into a “complete clusters” list (having at least a threshold number of the specified entity types represented in the cluster, which may depend on the correlation coefficients obtained for the entity types) and an “incomplete clusters” list (having less than the threshold number of the specified entity types represented in the cluster)), but having a location assigned to them that matches the location of other resume snippets in the group ([0016]: At least a subset of the extracted entities in the first set is clustered into clusters, based on locations of the entities in the document).
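
For illustration only, the ordering recited in Claim 11 (group by entity first, then fold in entity-less snippets whose location matches a group) may be sketched as follows. The sketch is not drawn from either reference. The field names `entity` and `location`, the function name, and the choice to add an entity-less snippet to every group with a matching location are assumptions made for the example.

```python
from collections import defaultdict


def group_with_location_fallback(snippets):
    """Group snippets by entity, then assign entity-less snippets by location.

    `snippets` maps a snippet id to {"entity": str or None, "location": str or None}
    (a hypothetical layout).
    """
    groups = defaultdict(list)
    group_locations = defaultdict(set)

    # First pass: snippets with an entity designation are grouped by that entity,
    # and the locations seen in each group are recorded.
    for snippet_id, snippet in snippets.items():
        entity = snippet.get("entity")
        if entity:
            groups[entity].append(snippet_id)
            if snippet.get("location"):
                group_locations[entity].add(snippet["location"])

    # Second pass: snippets lacking an entity join any group that already
    # contains a snippet with the same location.
    for snippet_id, snippet in snippets.items():
        if snippet.get("entity") or not snippet.get("location"):
            continue
        for entity, locations in group_locations.items():
            if snippet["location"] in locations:
                groups[entity].append(snippet_id)

    return dict(groups)


# Hypothetical example data.
snippets = {
    "a": {"entity": "Acme", "location": "Boston"},
    "b": {"entity": "Acme", "location": "Boston"},
    "c": {"entity": None, "location": "Boston"},
    "d": {"entity": None, "location": "Denver"},
}
grouped = group_with_location_fallback(snippets)
```

In this sketch, snippet "d" stays unassigned because no entity group shares its location.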

Regarding Claim 12, the combined teachings of Calapodescu and Martin teach the system of Claim 3, 
and Calapodescu further discloses wherein prior to executing the sub-program for employing a textual similarity measuring technique to cluster the snippets into sub-groups, a sub-program to eliminate snippets from each created snippet group is executed, said snippet elimination sub-program comprising: establishing a length of each snippet in a created snippet group under consideration ([0068]: At S108, for each extracted entity in the list 61 generated in the first pass, its offset, length, and type (class) is stored); 
for each snippet in the created snippet group, determining if the length of the snippet falls within a prescribed comparable length range based on the length of all the snippets in the created snippet group ([0088]: For each entity in turn, if the number of clusters is zero (i.e., this is the first entity) then a new cluster is generated using this entity. The centroid of this cluster is based on the location of this entity (e.g., based on its offset and length)); and 
whenever the length of a snippet in the created snippet group falls outside the prescribed comparable length range, removing it from the created snippet group ([0088]: For example, the cluster centroid is computed as the sum of the offsets of the entities in the cluster divided by the length of the entities in the cluster. If the distance to the current cluster centroid is greater than r, a new cluster will be initialized with this entity).

Regarding Claim 13, the combined teachings of Calapodescu and Martin teach the system of Claim 12, 
and Calapodescu further discloses wherein the sub-program for establishing a length of each snippet in a created snippet group, comprises: 
tokenizing each snippet in the created snippet group under consideration ([0043]: A suitable rule-based system 52 includes a parser which applies rules for tokenizing the text to form a sequence of tokens (generally, words)); and 
establishing the length of each tokenized snippet as its character count ([0048]: The first entity extraction component 52 also identifies a location of each of the extracted entities, e.g., with offset precision or other location indicator. For example, each character (including spaces between tokens) is indexed in sequence. Each entity can then be located by its first index and its length).
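
For illustration only, the snippet elimination recited in Claims 12 and 13 (tokenize, take the character count as the length, and drop snippets outside a comparable length range) may be sketched as follows. The claims do not fix how the range is derived, so the median-based range, the 50% tolerance, the whitespace tokenizer, and the function names are assumptions made for the example.

```python
from statistics import median


def snippet_length(text):
    """Tokenize on whitespace and return the total character count of the tokens."""
    tokens = text.split()
    return sum(len(token) for token in tokens)


def filter_by_length(group, tolerance=0.5):
    """Keep snippets whose length is within +/- `tolerance` of the group median.

    `group` maps a snippet id to its text (a hypothetical layout). The range is
    computed from the lengths of all snippets in the group.
    """
    lengths = {snippet_id: snippet_length(text) for snippet_id, text in group.items()}
    middle = median(lengths.values())
    low, high = middle * (1 - tolerance), middle * (1 + tolerance)
    return {
        snippet_id: text
        for snippet_id, text in group.items()
        if low <= lengths[snippet_id] <= high
    }


# Hypothetical example data.
group = {
    "a": "managed a team of five engineers",
    "b": "managed a team of six engineers",
    "c": "intern",
}
kept = filter_by_length(group)
```

A median is used here only because it is less sensitive to a single very long or very short snippet than a mean would be.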

Regarding Claim 14, the combined teachings of Calapodescu and Martin teach the system of Claim 3, 
and Calapodescu further discloses wherein the anonymized resume corpus data identification computer program further comprises executing a sub-program for filtering each sub-group after executing the sub-program for employing a textual similarity measuring technique to cluster the snippets into sub-groups ([0026]: FIG. 7 shows annotations of an example resume in a first pass and extracted chunk sequences; [0050]: chunker component 65. See para [0050]-[0054]), said filtering comprising for each sub-group, 
identifying resume snippets in the sub-group that do not have a specified part of the summary-level information assigned thereto, which was not employed to cluster the resume snippets, in common with the other resume snippets ([0114]: For example, as illustrated in FIG. 7, for each cluster window, a sequence of text chunks is identified using one or more of the chunking rules); and 
removing the resume snippets from the sub-group that do not have the specified part of the summary-level information assigned thereto ([0129]: Then, clusters that became complete (following the same decision rules described earlier) are removed from the incomplete cluster list.).

Regarding Claim 15, the combined teachings of Calapodescu and Martin teach the system of Claim 14, 
and Calapodescu further discloses wherein the specified part of the summary-level information which was not employed to cluster the resume snippets comprises a current job title associated with the individual who is the subject of the resume (Fig. 7, [0096]: For example, for each experience in the experience section in FIG. 7, it may be assumed that there must be at least one job title, one date range and one company name; [0104]: As an example, consider the two examples 94, 96 found in the same resume shown in FIG. 7, which have been annotated by the first pass CRF model 78. The first example 94 is the search window of a complete sample and the second one 96 is the search window of an incomplete one (missing organization)).

Regarding Claim 16, the combined teachings of Calapodescu and Martin teach the system of Claim 2, 
and Calapodescu further discloses wherein the anonymized resume corpus data identification computer program further comprises a sub-program for filtering each sub-group to remove resume snippets not likely to be associated with the same individual as other resume snippets in the group ([0085]: To find these closely located entities, an agglomerative clustering algorithm, based on offsets of the entities, may be employed.), said filtering comprising for each group, executing sub-programs for: 
identifying all possible pairs of snippets in the sub-group ([0085]: The algorithm aggregates entities inside a cluster if the distance from a dynamically computed representative location in the cluster, such as the cluster centroid or medoid, is below a threshold distance r (maximum radius around the representative location)); 
measuring the similarity of the snippets in each pair of snippets in the sub-group ([0093]: For example, a correlation coefficient (e.g., Pearson's product-moment coefficient) may be computed for co-occurrence of entities in pairs of types (e.g., selected from number of job titles, number of company names, number of dates and so forth)); 
removing from the sub-group, pairs of snippets found to have a measured degree of similarity below a prescribed minimum similarity threshold ([0097]: This approach provides a split of the list of clusters into a “complete clusters” list (having at least a threshold number of the specified entity types represented in the cluster, which may depend on the correlation coefficients obtained for the entity types) and an “incomplete clusters” list (having less than the threshold number of the specified entity types represented in the cluster)).

Regarding Claim 21, the combined teachings of Calapodescu and Martin teach the system of Claim 16, 
and Calapodescu further discloses wherein the sub-program for measuring the similarity of the snippets in each pair of snippets in the sub-group, comprises sub-programs for: tokenizing each snippet in each pair of snippets in the sub-group that has not already been tokenized ([0107]: The chunker component 65 may employ tokenization and syntactic and semantic features provided by a syntactic parser 52, such as the Xerox Incremental Parser (XIP), in addition to the results of the first extraction component 52 to split the text into chunks);
applying a fuzzy string-matching technique to the tokenized snippets in each remaining snippet pair in the sub-group to calculate the difference between the snippets in each snippet pair ([0140]: The entity extraction components 52, 54 may miss such acronyms. But after a first pass, these missed instances can be searched for by using exact string matching or approximate string matching techniques or surface variation generation) and
produce a score ([0165]: For scoring the performance, Precision, Recall, and F1 metrics are applied per entity type on each document).
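
For illustration only, the pairwise filtering recited in Claims 16 and 21 (enumerate all snippet pairs, tokenize, score each pair with a fuzzy matching technique, and flag pairs below a minimum similarity) may be sketched as follows. The sketch is not drawn from either reference. The use of Python's difflib over token sequences as the fuzzy matcher, the 0.5 minimum, and the function names are assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pair_scores(subgroup):
    """Score every pair of snippets in a sub-group.

    `subgroup` maps a snippet id to its text (a hypothetical layout). Each text
    is lowercased and tokenized on whitespace, then compared token by token.
    """
    scores = {}
    for (id_a, text_a), (id_b, text_b) in combinations(subgroup.items(), 2):
        tokens_a = text_a.lower().split()
        tokens_b = text_b.lower().split()
        scores[(id_a, id_b)] = SequenceMatcher(None, tokens_a, tokens_b).ratio()
    return scores


def pairs_below(scores, minimum=0.5):
    """Return the pairs whose similarity score falls below `minimum`."""
    return [pair for pair, score in scores.items() if score < minimum]


# Hypothetical example data.
subgroup = {
    "a": "senior data engineer at acme",
    "b": "senior data engineer at acme corp",
    "c": "volunteer soccer coach",
}
scores = pair_scores(subgroup)
rejected = pairs_below(scores)
```

Comparing token sequences rather than raw characters is one design choice among several; an edit-distance measure over characters would fit the same claim language.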

Regarding Claim 22, Calapodescu discloses a system for identifying data in the anonymized resume corpus that pertains to the same individual, comprising: an anonymized resume corpus data identifier ([0051]: The entity clustering component 62… identifies the document-specific sequences of entities (and their respective classes) in each cluster, within the given resume or resume section) comprising one or more computing devices each of which comprises one or more hardware-based processors ([0039]: FIG. 2 illustrates a computer-implemented system 30 for entity extraction from resumes), and 
an anonymized resume corpus data identification computer program having a plurality of sub-programs executable by said computing device or devices ([0041]: The exemplary instructions 34 include a segmentation component 50, a first entity extraction component 52, a second entity extraction component 54, a machine learning component 56, and an output component 58), wherein the sub-programs configure said computing device or devices to: 
access an anonymized resume corpus made up of anonymized resumes in which the name of an individual an anonymized resume pertains to is not included (Fig. 3; [0065]: At S102, an input document 10 is received; [0157]: Several data sets of resumes were obtained, each one in a specific industry domain: [0158]: CV200 (software engineers): this set was built manually from resumes found on the Internet, manually anonymized. See Fig. 1);
for each resume in the corpus ([0040]: The system 30 receives as input a text document 10, such as a resume, for processing), 
extract summary-level information from the resume ([0036]: FIG. 1 illustrates part of a loosely-structured document 10, such as a résumé. Entities 12, 14, 16, etc., to be extracted; [0043]: The first entity extraction component 52 performs a first pass of entity extraction; Fig. 3, step S106 – first pass extraction of entities. See also para [0016]-[0020], Fig. 3, step S108), 
segment the resume into snippets (Fig. 2, [0042]: The segmentation component 50 (optional) segments the document 10 into sections 18, e.g., based on document structure [the sections correspond to the snippets. See also Fig. 3, step S104]), and 
assign the extracted summary-level information to each snippet segmented from the resume (Fig. 3, [0068]: At S108, for each extracted entity in the list 61 generated in the first pass, its offset, length, and type (class) is stored; [0155]: For segmenting the resume in sections, four main section types are used: Work experience, Education, Skills and Others (from hobbies to publications) [the four main sections correspond to the snippets to which the information is assigned]. See also Fig. 3, para [0008], [0019], [0068], [0041]-[0042]); 
However, Calapodescu does not explicitly teach “employ a textual similarity measuring technique to cluster the resume snippets into initial groups; and for each initial group identify resume snippets in the initial group that do not have a specified part of the summary-level information assigned thereto in common with the other snippets, and remove the resume snippets from the initial group that do not have the specified part of the summary-level information assigned thereto to form a final group; and designate the resume snippets of each final group as pertaining to the same individual.”
On the other hand, in the same field of endeavor, Martin teaches 
employ a textual similarity measuring technique to cluster the resume snippets into initial groups ([0114]: The match scoring methodology compares any two text blocks and applies a number of different statistical measures to assess similarity and difference, each of which may generate a different score; [0230]: In addition formulas are applied to identify word clusters within a text block group to identify common groups of words within the text of each member of the text block group; See also [0233]: Identify Text Block Groups and Sub Text Block Groups,  para [0023], [0036], [0043], [0066], [0120]); and 
for each initial group identify resume snippets in the initial group that do not have a specified part of the summary-level information assigned thereto in common with the other snippets, and remove the resume snippets from the initial group that do not have the specified part of the summary-level information assigned thereto to form a final group (Fig. 5, [0061]: In some implementations, certain text blocks are initially unmatched to a text block group); and 
designate the resume snippets of each final group as pertaining to the same individual ([0061]: In certain implementations, redundant or duplicate text block groups are identified; [0144]-[0150]: (ii) Match Text Blocks Across Files… Where 1,000 resumes are processed, the results are shown in Table 6 showing the text blocks groups and the number of matching text blocks in each group [Matching text blocks in each text block group designate the text blocks as pertaining to the same individual. See para [0043]-[0049], [0133], [0233]-[0239]]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Calapodescu to incorporate the teachings of Martin to include “employ a textual similarity measuring technique to cluster the resume snippets into initial groups; and for each initial group identify resume snippets in the initial group that do not have a specified part of the summary-level information assigned thereto in common with the other snippets, and remove the resume snippets from the initial group that do not have the specified part of the summary-level information assigned thereto to form a final group; and designate the resume snippets of each final group as pertaining to the same individual.”
The motivation for doing so would be to identify and match sections of files across multiple files, as recognized by Martin ([0002] of Martin: A data profile engine for identifying, classifying, analyzing, searching, comparing and cross-referencing entire files and sections of files, records and other forms of electronic media is provided).
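
For illustration only, the filtering step recited in Claim 22 (within each initial group, remove snippets that do not share a specified part of the summary-level information with the other snippets) may be sketched as follows. The sketch is not drawn from either reference. The `job_title` field, the function name, and the rule that "in common" means "equal to the most frequent value in the group" are assumptions made for the example.

```python
from collections import Counter


def filter_initial_group(group, field="job_title"):
    """Form a final group by keeping snippets that share the dominant field value.

    `group` maps a snippet id to a dict of summary-level items (a hypothetical
    layout). Snippets missing the field, or carrying a different value, are
    removed. If no snippet carries the field, the group is returned unchanged
    (an assumption; the claim does not address this case).
    """
    values = [items[field] for items in group.values() if items.get(field)]
    if not values:
        return dict(group)
    dominant_value, _count = Counter(values).most_common(1)[0]
    return {
        snippet_id: items
        for snippet_id, items in group.items()
        if items.get(field) == dominant_value
    }


# Hypothetical example data.
initial_group = {
    "a": {"job_title": "Data Engineer"},
    "b": {"job_title": "Data Engineer"},
    "c": {"job_title": "Chef"},
    "d": {},
}
final_group = filter_initial_group(initial_group)
```

Under these assumptions, the snippets remaining in the final group would be the ones designated as pertaining to the same individual.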
 
Regarding Claim 23, Calapodescu discloses a computer-implemented process for identifying data in the anonymized resume corpus that pertains to the same individual, the process comprising the actions of: 
using one or more computing devices each of which comprises one or more hardware-based processors to perform the following process actions ([0059]: The computer-implemented system 30 may include one or more computing devices 30), the computing devices being in communication with each other via a computer network whenever a plurality of computing devices is used ([0061]: The network interface 40, 42 allows the computer to communicate with other devices via a computer network): 
accessing an anonymized resume corpus made up of anonymized resumes in which the name of an individual an anonymized resume pertains to is not included (Fig. 3; [0065]: At S102, an input document 10 is received; Fig. 3; [0157]: Several data sets of resumes were obtained, each one in a specific industry domain: [0158]: CV200 (software engineers): this set was built manually from resumes found on the Internet, manually anonymized. See Fig. 1); 
on a per-resume basis ([0040]: The system 30 receives as input a text document 10, such as a resume, for processing), 
extracting summary-level information items from the resume ([0036]: FIG. 1 illustrates part of a loosely-structured document 10, such as a résumé. Entities 12, 14, 16, etc., to be extracted; [0043]: The first entity extraction component 52 performs a first pass of entity extraction; Fig. 3, step S106 – first pass extraction of entities. See also para [0016]-[0020], Fig. 3, step S108),
segmenting the resume into snippets (Fig. 2; [0042]: The segmentation component 50 (optional) segments the document 10 into sections 18, e.g., based on document structure [the sections correspond to the snippets. See also Fig. 3, step S104]), and 
assigning the extracted summary-level information items to each snippet segmented from the resume (Fig. 3, [0068]: At S108, for each extracted entity in the list 61 generated in the first pass, its offset, length, and type (class) is stored; [0155]: For segmenting the resume in sections, four main section types are used: Work experience, Education, Skills and Others (from hobbies to publications). See also Fig. 3, para [0008], [0019], [0068], [0041]-[0042]); 
However, Calapodescu does not explicitly teach “grouping resume snippets which have one or more summary-level information items in common into a same group to create one or more snippet groups; and for each created snippet group, employing a textual similarity measuring technique to cluster the snippets into sub-groups, and designating the snippets of each sub-group as pertaining to the same individual.”
On the other hand, in the same field of endeavor, Martin teaches	
grouping resume snippets which have one or more summary-level information items in common into a same group to create one or more snippet groups ([0022]: The data profile engine analyzes multiple source files (e.g., electronic media including documents and records), identifies sections of text in each file (a "text block"), matches text blocks within a file and across all files by constructing collections of matched text blocks (a "text block group") resulting in multiple sets of text block groups (a "text block group set"), and… identifying their common and distinguishing attributes; Fig. 5, [0061]: Matching sets of text blocks sharing common characteristics are grouped into text block groups 508 [the text blocks correspond to the resume snippets]); and 
for each created snippet group, employing a textual similarity measuring technique to cluster the snippets into sub-groups ([0114]: The match scoring methodology compares any two text blocks and applies a number of different statistical measures to assess similarity and difference, each of which may generate a different score; [0230]: In addition formulas are applied to identify word clusters within a text block group to identify common groups of words within the text of each member of the text block group; See also [0233]: Identify Text Block Groups and Sub Text Block Groups,  para [0023], [0036], [0043], [0066], [0120]), and 
designating the snippets of each sub-group as pertaining to the same individual ([0061]: In certain implementations, redundant or duplicate text block groups are identified; [0144]-[0150]:  (ii) Match Text Blocks Across Files… Where 1,000 resumes are processed, the results are shown in Table 6 showing the text blocks groups and the number of matching text blocks in each group [Matching text blocks in each text block group designate the text blocks as pertaining to the same individual. See para [0043]-[0049], [0133], [0233]-[0239]]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Calapodescu with the teachings of Martin to include “grouping resume snippets which have one or more summary-level information items in common into a same group to create one or more snippet groups; and for each created snippet group, employing a textual similarity measuring technique to cluster the snippets into sub-groups, and designating the snippets of each sub-group as pertaining to the same individual.”
The motivation for doing so would be to identify and match sections of files across multiple files, as recognized by Martin ([0002] of Martin: A data profile engine for identifying, classifying, analyzing, searching, comparing and cross-referencing entire files and sections of files, records and other forms of electronic media is provided).
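For illustration only, the two limitations discussed above (grouping snippets that share a summary-level information item, then clustering each group by textual similarity) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Calapodescu, or Martin: the function names are hypothetical, grouping is assumed to be transitive, and Python's `difflib.SequenceMatcher` ratio stands in for whatever textual similarity measure is actually used.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def group_by_shared_items(snippets):
    """Group snippet ids that share at least one summary-level item.

    `snippets` maps a snippet id to (text, set_of_items). Grouping is
    transitive (connected components); that is an assumption of this sketch.
    """
    parent = {sid: sid for sid in snippets}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # item -> first snippet id that carried it
    for sid, (_, items) in snippets.items():
        for item in items:
            if item in first_seen:
                parent[find(sid)] = find(first_seen[item])
            else:
                first_seen[item] = sid

    groups = defaultdict(list)
    for sid in snippets:
        groups[find(sid)].append(sid)
    return [sorted(members) for members in groups.values()]


def cluster_group(group, snippets, threshold=0.7):
    """Greedily split one group into sub-groups of textually similar snippets.

    Each snippet joins the first sub-group whose first member is similar
    enough; otherwise it starts a new sub-group.
    """
    sub_groups = []
    for sid in group:
        text = snippets[sid][0]
        for sub in sub_groups:
            representative = snippets[sub[0]][0]
            if SequenceMatcher(None, text, representative).ratio() >= threshold:
                sub.append(sid)
                break
        else:
            sub_groups.append([sid])
    return sub_groups
```

Under this sketch, each resulting sub-group would be the set of snippets treated as pertaining to the same individual; the threshold value of 0.7 is arbitrary.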

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Calapodescu (US 20170300565 A1, hereinafter Calapodescu) in view of Martin et al. (US 20140081995 A1, hereinafter Martin) and in further view of Dane et al (US Patent No. 10860737 B2, hereinafter Dane).

Regarding Claim 4, the combined teachings of Calapodescu and Martin disclose the system of claim 3.
However, the combined teachings of Calapodescu and Martin do not explicitly teach “wherein the entity that the individual who is the subject of the resume works for is associated with a business entity identified by its uniform resource locator.”
On the other hand, in the same field of endeavor, Dane teaches wherein the entity that the individual who is the subject of the resume works for is associated with is a business entity identified by its uniform resource locator ([Col.3, lines 3-5]: FIG. 13 shows the format of a Uniform Resource Locator (URL) pointing to resume data according to one embodiment of the example resume system; Fig. 15, [Col. 9, lines 59-62]: In one embodiment, the URLs 1515, 1520 are presented as hyperlinks that indicate the last name of the candidate, the current or last employer as well as the current or last position as is stored in the resume database).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Calapodescu and Martin to incorporate the teachings of Dane to include “wherein the entity that the individual who is the subject of the resume works for is associated with is a business entity identified by its uniform resource locator.”
The motivation for doing so would be to allow the user to access the resume information by clicking on the URL, as recognized by Dane ([Col. 9, lines 64-67] of Dane: the Hiring Manager would then access the resume database for the resume information that has been forwarded to him or her by clicking on each of the URLs 1515, 1520).

Claims 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Calapodescu (US 20170300565 A1, hereinafter Calapodescu) in view of Martin et al. (US 20140081995 A1, hereinafter Martin) and in further view of Buhrmann (US 20160232160 A1).

Regarding Claim 17, the combined teachings of Calapodescu and Martin teach the system of Claim 16, 
and Calapodescu further discloses wherein the sub-program for measuring the similarity of the snippets in each pair of snippets in the sub-group, comprises sub-programs for: 
tokenizing each snippet in each pair of snippets in the sub-group that has not already been tokenized; removing extraneous words and symbols from the tokenized snippets ([0107]: The chunker component 65 may employ tokenization and syntactic and semantic features provided by a syntactic parser 52, such as the Xerox Incremental Parser (XIP), in addition to the results of the first extraction component 52 to split the text into chunks);
 vectorizing and normalizing each of the tokenized snippets ([0155]: The extracted text is saved in UTF-8 and some character normalization is applied (like quotes, new lines, etc.)).
However, the combined teachings of Calapodescu and Martin do not explicitly teach “applying a cosine similarity measurement technique to the vectorized snippets in each snippet pair in the sub-group.”
On the other hand, in the same field of endeavor, Buhrmann teaches applying a cosine similarity measurement technique to the vectorized snippets in each snippet pair in the sub-group ([0137]: When comparing a document to a corpus, such as shown in FIG. 1B, both the Input 12 document and the Corpus 14 are converted into a concept vector representation or “concept map”. In this form, two concept maps may be compared using vector mathematical treatments. For example, cosine similarity may be applied between two concept vector representations to determine their similarity based on the distance between their concept vectors).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Calapodescu and Martin to incorporate the teachings of Buhrmann to include “applying a cosine similarity measurement technique to the vectorized snippets in each snippet pair in the sub-group.”
The motivation for doing so would be to compare natural language texts on the basis of their conceptual substance, as recognized by Buhrmann ([0137] of Buhrmann: In this way, natural language texts may be compared on the basis of their conceptual substance and not syntactical composition).
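For illustration only, the sequence of steps recited in Claim 17 (tokenizing, removing extraneous words and symbols, vectorizing and normalizing, then applying cosine similarity) can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of the application or of any cited reference: the stop-word list, the term-frequency vector representation, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would likely use a larger one.
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text, keep alphanumeric runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS]


def vectorize(tokens):
    """Build an L2-normalized term-frequency vector as a sparse dict."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())


def snippet_similarity(text_a, text_b):
    """Run the full pipeline on one snippet pair."""
    return cosine_similarity(vectorize(tokenize(text_a)),
                             vectorize(tokenize(text_b)))
```

Because the vectors are normalized to unit length before comparison, the cosine similarity reduces to a dot product, and the result lies between 0 and 1 for these non-negative term counts.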


Regarding Claim 18, the combined teachings of Calapodescu, Martin, and Buhrmann disclose the system of Claim 17,
and Calapodescu further discloses wherein the sub-program for measuring the similarity of the snippets in each pair of snippets in the sub-group, further comprises additional sub-programs executed after applying the cosine similarity measurement technique and removing pairs of snippets found to have a measured degree of similarity below a prescribed minimum cosine similarity threshold, said additional sub-programs comprising:
applying a fuzzy string-matching technique to the tokenized snippets in each remaining snippet pair in the sub-group to calculate the difference between the snippets in each snippet pair ([0140]: The entity extraction components 52, 54 may miss such acronyms. But after a first pass, these missed instances can be searched for by using exact string matching or approximate string matching techniques or surface variation generation.) and 
produce a score ([0165]: For scoring the performance, Precision, Recall, and F1 metrics are applied per entity type on each document); and 
removing pairs of snippets found to have a score below a prescribed minimum score threshold ([0097]: This approach provides a split of the list of clusters into a “complete clusters” list (having at least a threshold number of the specified entity types represented in the cluster, which may depend on the correlation coefficients obtained for the entity types) and an “incomplete clusters” list (having less than the threshold number of the specified entity types represented in the cluster)).
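For illustration only, the additional steps recited in Claim 18 (applying a fuzzy string-matching technique to each remaining tokenized snippet pair, producing a score, and removing pairs below a minimum score threshold) can be sketched as follows. The claim does not name a particular technique; this sketch assumes Levenshtein edit distance over token sequences, normalized to a score between 0 and 1. The function names and the scoring formula are hypothetical.

```python
def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences (token lists or strings)."""
    previous = list(range(len(seq_b) + 1))
    for i, item_a in enumerate(seq_a, start=1):
        current = [i]
        for j, item_b in enumerate(seq_b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_score(tokens_a, tokens_b):
    """Map the edit distance to a score: 1.0 is identical, 0.0 is fully different."""
    longest = max(len(tokens_a), len(tokens_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(tokens_a, tokens_b) / longest


def filter_pairs(pairs, min_score):
    """Keep only the snippet pairs whose fuzzy score meets the threshold."""
    return [(a, b) for a, b in pairs if fuzzy_score(a, b) >= min_score]
```

In a pipeline of the kind claimed, `filter_pairs` would run only on pairs that already passed the cosine similarity threshold, so the more expensive pairwise edit-distance computation is applied to fewer candidates.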

Regarding Claim 19, the modified system of Calapodescu, Martin, and Buhrmann discloses the system of Claim 18, wherein the sub-program for measuring the similarity of the snippets in each pair of snippets in the sub-group, further comprises other sub-programs executed after applying a fuzzy string-matching technique and removing pairs of snippets found to have a score below a prescribed minimum score threshold, 
and Calapodescu further discloses said additional sub-programs comprising: 
identifying snippets in the sub-group that do not have a specified part of the summary-level information assigned thereto, which was not employed to cluster the snippets, in common with the other snippets ([0114]: For example, as illustrated in FIG. 7, for each cluster window, a sequence of text chunks is identified using one or more of the chunking rules); and 
removing the snippets from the sub-group that do not have the specified part of the summary-level information assigned thereto ([0129]: Then, clusters that became complete (following the same decision rules described earlier) are removed from the incomplete cluster list).

Regarding Claim 20, the modified system of Calapodescu, Martin, and Buhrmann discloses the system of Claim 19, 
and Calapodescu further discloses wherein the specified part of the summary-level information which was not employed to cluster the snippets comprises a current job title associated with the individual who is the subject of the resume (Fig. 7, [0096]: For example, for each experience in the experience section in FIG. 7, it may be assumed that there must be at least one job title, one date range and one company name; [0104]: As an example, consider the two examples 94, 96 found in the same resume shown in FIG. 7, which have been annotated by the first pass CRF model 78. The first example 94 is the search window of a complete sample and the second one 96 is the search window of an incomplete one (missing organization)).



Examiner Note
Examiner has cited particular column/paragraph and line numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. Applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention. This will assist in expediting compact prosecution. MPEP 714.02 recites: "Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714." Amendments not pointing to specific support in the disclosure may be deemed as not complying with the provisions of 37 CFR 1.121(b), (c), (d), and (h) and therefore held not fully responsive. Generic statements such as "Applicants believe no new matter has been introduced" may be deemed insufficient.
 


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIRLEY D. HICKS whose telephone number is (571)272-3304.  The examiner can normally be reached on Mon - Fri 7:30 - 4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya, can be reached at (571) 272-4034.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/S D H/Examiner, Art Unit 2168
/IRETE F EHICHIOYA/Supervisory Patent Examiner, Art Unit 2168