Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is responsive to the Amendments and Remarks filed in the U.S. on 8/16/2022.  Claims 1-11 and 14-15 are pending in the case. Claims 1 and 7 are written in independent form. Claims 12-13 are cancelled claims. Claims 14 and 15 are newly added claims.
Applicant’s amendments and remarks filed on 8/16/2022 have been fully considered but were not found to overcome the previously cited prior art. Accordingly, THIS ACTION IS MADE FINAL.

Claim Objections
Claim 15 is objected to because of the following informalities:
Claim 15 is objected to for lacking antecedent basis for “the group-based methodology” because a group-based methodology is not previously recited anywhere in the claim or in any of the claim(s) upon which Claim 15 depends.
Appropriate correction is required.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 and 7 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-patentable subject matter. The claims are directed to an abstract idea without significantly more.
At least Claims 1 and 7 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The judicial exception is not integrated into a practical application. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The eligibility analysis in support of these findings is provided below.

As per Claim 1,
STEP 1 (Yes): In accordance with Step 1 of the eligibility inquiry (as explained in MPEP 2106), it is first noted that the claimed system (claims 1-6 and 14-15) and method (claims 7-11) are directed to one of the eligible categories of subject matter and therefore satisfy Step 1.
STEP 2A Prong One (Yes): In accordance with Step 2A Prong One, it is noted that the claims recite an abstract idea by reciting concepts capable of being performed in the human mind (including an observation, evaluation, judgment, and opinion), which falls into the “Mental Processes” group within the enumerated groupings of abstract ideas. The independent claims recite the abstract idea of extracting personal data from a source to perform entity recognition, linking entities to one or more individuals, extracting context and record features from repositories, predicting a purpose of processing personal data, and labelling personal data from documents, which falls within the abstract idea of performing mental processes of observation, evaluation, judgment, and opinion. The recitation of generic computer components does not negate the abstractness of the given limitations.
The limitations include:
A method of data ingestion over a network, using one or more network computers that employ one or more processors to execute the method by performing actions, comprising:
an entity extraction module configured to extract personal data from one or more data repositories in a computer network or cloud infrastructure, wherein the entity extraction module is configured to perform entity recognition from structured, semi-structured and unstructured records in the one or more data repositories; (performing a data gathering step of extracting data to be used in a mental process of observation and evaluation for entity extraction)
a linkage module coupled to the entity extraction module and configured to use graph-based methodology to link the personal data to one or more individuals; and (performing a mental process of observing personal data and individuals and judgement to link personal data to one or more individuals)
a purpose prediction module comprising:
a feature extraction module, wherein the feature extraction module is configured to extract both context features and record’s features from records in the one or more data repositories, wherein the record’s features comprise metadata of the records and language of the records; and (performing a mental process of observing and evaluating features and judging which features to extract).
a purpose of processing prediction module configured to predict a unique or multiple purpose of processing of the personal data in response to the context features and the record’s features (performing a mental process of predicting a purpose of processing personal data based on observation, evaluation, and judgement).
unsupervised auto-labelling of personal data from documents in one or more data repositories in a computer network or cloud infrastructure, wherein the unsupervised auto-labelling reuses a text summarization methodology and includes key-phrase aggregation and linking techniques to predict a purpose of processing topic for the personal data (performing a mental process of applying labels to personal data from documents using a text summarization methodology and predicting a purpose of processing based on observation, evaluation, and judgement).

Step 2A Prong Two (No)
The additional elements are directed to the use of modules for performing the steps (Claim 1) and implementing the system of Claim 1 and method of Claim 7 using a computer. However, these elements fail to integrate the abstract idea into a practical application because they fail to provide an improvement to the functioning of a computer or to any other technology or technical field, fail to apply the exception with a particular machine, fail to apply the judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition, fail to effect a transformation of a particular article to a different state or thing, and fail to apply/use the abstract idea in a meaningful way beyond generally linking the use of the judicial exception to a particular technological environment. Furthermore, these elements have been fully considered; however, they are directed to the use of generic computing elements to perform the abstract idea, which is not sufficient to amount to a practical application.
Accordingly, because the Step 2A Prong One and Prong Two analysis resulted in the conclusion that the claims are directed to an abstract idea, additional analysis under Step 2B of the eligibility inquiry must be conducted in order to determine whether any claim element or combination of elements amount to significantly more than the judicial exception.

Step 2B (No):
It has been determined that the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional limitation(s) is/are directed to the use of modules for performing steps (Claim 1) and implementing the system of Claim 1 and method of Claim 7 using a computer, though at a very high level of generality and without imposing meaningful limitation on the scope of the claims. Such generic, high-level, and nominal involvement of a computer or computer-based elements for carrying out the invention merely serves to tie the abstract idea to a particular technological environment, which is not enough to render the claims patent-eligible, as noted at pg. 74624 of Federal Register/Vol. 79, No. 241, citing Alice, which in turn cites Mayo. See, e.g., Alice Corp. Pty. Ltd. v. CLS Bank Int'l, 134 S. Ct. 2347, 2359-60, 110 USPQ2d 1976, 1984 (2014). See also OIP Techs. v. Amazon.com, 788 F.3d 1359, 1364, 115 USPQ2d 1090, 1093-94 (Fed. Cir. 2015) ("Just as Diehr could not save the claims in Alice, which were directed to 'implement[ing] the abstract idea of intermediated settlement on a generic computer', it cannot save OIP's claims directed to implementing the abstract idea of price optimization on a generic computer.") (citations omitted). See also Affinity Labs of Texas LLC v. DirecTV LLC, 838 F.3d 1253, 1257-1258 (Fed. Cir. 2016) (mere recitation of a GUI does not make a claim patent-eligible); Intellectual Ventures I LLC v. Capital One Bank, 792 F.3d 1363, 1370 (Fed. Cir. 2015) ("the interactive interface limitation is a generic computer element").
The additional elements are broadly applied to the abstract idea(s) at a high level of generality ("similar to how the recitation of the computer in the claims in Alice amounted to mere instructions to apply the abstract idea of intermediated settlement on a generic computer," as explained in MPEP § 2106.05(f)) and they operate in well-understood, routine, and conventional manners. Furthermore, generally transmitting, analyzing, and outputting (e.g., displaying) data are examples of insignificant extra-solution activity. The recitation that routing, moving, and identifying are performed by an apparatus/device is the epitome of "mere instructions to implement an abstract idea on a computer".
MPEP § 2106.05(d)(II) sets forth the following:
The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity.
• Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec ... ; TLI Communications LLC v. AV Auto. LLC ... ; OIP Techs., Inc., v. Amazon.com, Inc ... ; buySAFE, Inc. v. Google, Inc ... ;
• Performing repetitive calculations, Flook ... ; Bancorp Services v. Sun Life ... ;
• Electronic recordkeeping, Alice Corp ... ; Ultramercial ... ;
• Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc ... ;
• Electronically scanning or extracting data from a physical document, Content Extraction and Transmission, LLC v. Wells Fargo Bank ... ; and
• A web browser's back and forward button functionality, Internet Patents Corp. v. Active Network, Inc ...
. . . Courts have held computer-implemented processes not to be significantly more than an abstract idea (and thus ineligible) where the claim as a whole amounts to nothing more than generic computer functions merely used to implement an abstract idea, such as an idea that could be done by a human analog (i.e., by hand or by merely thinking) ...

In addition, when taken as an ordered combination, the elements add nothing that is not already present when the elements are taken individually. There is no indication that the combination of elements integrates the abstract idea into a practical application. Their collective functions merely provide conventional computer implementation. Therefore, when viewed as a whole, these additional claim elements do not provide meaningful limitations to transform the abstract idea into a practical application of the abstract idea, nor does the ordered combination amount to significantly more than the abstract idea itself.


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
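For illustration only, the three-prong test of MPEP § 2181 set out above can be sketched as a simple conjunction of prongs (A), (B), and (C). This is a minimal sketch, not an official procedure: the `Limitation` record and its field names are hypothetical, and each field stands for a judgment made by the reader of the claim rather than something computed from the claim text.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record of findings about one claim limitation."""
    text: str
    uses_means_or_step: bool            # literal "means" or "step"
    uses_generic_placeholder: bool      # nonce term such as "module"
    has_functional_language: bool       # e.g. "configured to ..."
    recites_sufficient_structure: bool  # structure, material, or acts


def interpreted_under_112f(lim: Limitation) -> bool:
    """Return True only when all three prongs of the test are met."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.has_functional_language
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# Example findings (assumed for illustration, not taken from the record).
module_limitation = Limitation(
    text="an entity extraction module configured to extract ...",
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    has_functional_language=True,
    recites_sufficient_structure=False,
)
structural_limitation = Limitation(
    text="a processor coupled to a memory storing instructions ...",
    uses_means_or_step=False,
    uses_generic_placeholder=False,
    has_functional_language=True,
    recites_sufficient_structure=True,
)
```

Under these assumed findings, the first limitation satisfies all three prongs and the second fails prongs (A) and (C).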
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are:
“an entity extraction module configured to extract…” in Claim 1.
“the entity extraction module is configured to perform entity recognition…” in Claim 1.
“a linkage module… configured to use graph-based methodology…” in Claim 1.
“a purpose prediction module” in Claim 1.
“a feature extraction module…configured to extract…” in Claim 1.
“a purpose of processing prediction module configured to predict…” in Claim 1.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 4-8, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Hertz et al. (U.S. Pre-Grant Publication No. 2018/0082183, hereinafter referred to as Hertz), and further in view of Lagi et al. (U.S. Pre-Grant Publication No. 2020/0065857, hereinafter referred to as Lagi).


Regarding Claim 1:
Hertz teaches a system for personal data classification comprising:
an entity extraction module configured to extract personal data from one or more data repositories in a computer network or cloud infrastructure, wherein the entity extraction module is configured to perform entity recognition from structured, semi-structured and unstructured records in the one or more data repositories;
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]) thereby teaching an entity extraction module for extracting data about people, organizations, places, or other details from multiple data sources, wherein the entity extraction performs the entity recognition on unstructured information. Hertz further teaches extracting features from “structured and unstructured data across news, research, filings, transcripts, industry classifications, and economics” (para. [0029]) and data sets comprising legal content that “is mostly unstructured or semi-structured” (Page 13 - Table 1).
Hertz further teaches “the systems and techniques disclosed can be used to identify and quantify the significance of relationships among various entities including, but not limited to, organizations, people, products, industries, geographies, commodities, financial indicators, economic indicators, events, topics, subject codes, unique identifiers, social tags, industry terms, general terms, metadata elements, classification codes, and combinations thereof” (Para. [0026]) thereby teaching personal data being extracted to identify people and unique identifiers.
a linkage module coupled to the entity extraction module and configured to use graph-based methodology to link the personal data to one or more individuals; and
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]). Hertz further teaches “we mine useful information from the data by adopting a variety of techniques including Named Entity Recognition (NER) and Relation Extraction (RE)” where “such mined information is further integrated with existing structured data (e.g., via Entity Linking (EL) techniques) to obtain relatively comprehensive descriptions of the entities” (Para. [0023]) and linking entities to nodes in a knowledge graph (Para. [0142]).
a purpose prediction module comprising:
a feature extraction module, wherein the feature extraction module is configured to extract both context features and record’s features from records in the one or more data repositories;
Hertz teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]).  Therefore, Hertz teaches extracting both context and record features from the one or more data repositories.
wherein the record’s features comprise metadata of the records; and
Hertz teaches documents having tagged metadata (Para. [0008]). Hertz also teaches applying metadata elements to context between two entities (Para. [0082]) thereby teaching metadata as a record feature.
a purpose of processing prediction module configured to predict a unique or multiple purpose of processing of the personal data in response to the context features and the record’s features.
Hertz teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]). Therefore, Hertz teaches extracting both context and record features from the one or more data repositories.
Hertz further teaches “when trying to identify the relationship between two identified companies, the industry information (i.e., healthcare, finance, automobile, etc.) of each company is retrieved from the knowledge graph and is used as a feature” (Para. [0140]) and “the association module 26 determines a frequency of the first entity and the second entity occurring in a context of each document of the set of documents 36” where “the context may include, but is not limited to, organizations, people…[and] topics” (Para. [0082]).
Therefore, Hertz teaches predicting one or multiple purposes of processing the contents of a document using features relating to entities in the document being processed.
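For illustration only, the frequency determination quoted from Para. [0082] can be pictured as a document-level co-occurrence count over already-extracted entities. The sketch below is an assumption made for explanatory purposes and is not taken from Hertz; the function name `cooccurrence_counts` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for every unordered entity pair, the number of documents
    in which both entities appear.

    `documents` is an iterable of collections of entity names, one
    collection per document (entity extraction is assumed done upstream).
    """
    counts = Counter()
    for entities in documents:
        # Sort so each pair has one canonical ordering; set() removes
        # repeated mentions within the same document.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample: three documents with their extracted entities.
sample_docs = [
    {"Acme Corp", "Jane Doe"},
    {"Acme Corp", "Jane Doe", "Paris"},
    {"Paris"},
]
sample_counts = cooccurrence_counts(sample_docs)
```

A pair that co-occurs in more documents would, on this simplified view, be treated as more strongly associated.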

Hertz teaches all of the elements of the claimed invention as recited above except:
wherein the record’s features comprise language of the records;

However, in the related field of endeavor of feature extraction, Lagi teaches:
wherein the record’s features comprise language of the records;
Lagi teaches record features comprising language of the records by teaching “present concepts can be carried across languages insofar as an aspect hereof provides for manual or automated translation from a first language to a second language” (Para. [0082]) thereby teaching metadata about the records as being language of the records.  Lagi further teaches parsing news articles that “may use a wide range of language variations, including jargon, shorthand, and word play to describe a given type of the event” (Para. [0142]) thereby teaching record features being language variations including jargon, shorthand, and word play.


Thus, it would have been obvious to one of ordinary skill in the art, having the teachings of Lagi and Hertz at the time that the claimed invention was effectively filed, to have combined the feature extraction of language, document date, and document owner, as taught by Lagi, with the systems and techniques for determining relationships and association significance between entities, as taught by Hertz.
One would have been motivated to make such combination because Lagi teaches additional features for extraction, such as language (Paras. [0082], [0142] & [0163]), document date, and document owner (Para. [0127]) and it would have been obvious to a person having ordinary skill in the art that adding more features to extract in the system and method taught by Hertz would improve the teachings of Hertz by expanding the extracted information upon which entities can be related.


Regarding Claim 4:
Hertz further teaches:
wherein the entity extraction module is configured to perform entity recognition to extract more than fifty entity types with their further characterization.
Hertz teaches performing “named entity recognition (NER) on [a] document to extract various types of entities, including companies, people, locations, events, etc.” (Para. [0137]) thereby teaching a boundless list of types of entities that can be extracted for characterization.  Fifty entity types would be a reasonable number of entity types for a person having ordinary skill in the art to be able to extract, especially with Hertz teaching extending capabilities “to suit many different industries” (Para. [0102]) such as the automobile industry, technology sector, healthcare, finance, and law.

Regarding Claim 5:
Hertz and Lagi further teach:
wherein the record’s features further include:
user rights,
Hertz teaches providing “users with electronic access to a system of databases and research tools” (Para. [0008]) thereby teaching access rights to various databases and thus records.
document date (Lagi - Para. [0127]) and
document owner (Lagi - Para. [0127]).

Regarding Claim 6:
Hertz further teaches:
wherein the purpose prediction module is coupled to the linkage module and is further configured to use the link of the personal data to one or more individuals to predict the unique or multiple purpose of processing of the personal data.
Hertz teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]).
Hertz further teaches “when trying to identify the relationship between two identified companies, the industry information (i.e., healthcare, finance, automobile, etc.) of each company is retrieved from the knowledge graph and is used as a feature” (Para. [0140]) and “the association module 26 determines a frequency of the first entity and the second entity occurring in a context of each document of the set of documents 36” where “the context may include, but is not limited to, organizations, people…[and] topics” (Para. [0082]).
Therefore, Hertz teaches predicting one or multiple purposes of processing the contents of a document using features relating to entities in the document being processed where the entity can be a person.

Regarding Claim 7:
Hertz further teaches a method for personal data extraction comprising:
scanning one or more documents in one or more data repositories in a computer network or cloud infrastructure, the one or more documents comprising structured, semi-structured or unstructured documents;
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]) thereby teaching an entity extraction module for extracting data about people, organizations, places, or other details from multiple data sources, wherein the entity extraction performs the entity recognition on unstructured information. Hertz further teaches extracting features from “structured and unstructured data across news, research, filings, transcripts, industry classifications, and economics” (para. [0029]) and data sets comprising legal content that “is mostly unstructured or semi-structured” (Page 13 - Table 1).
Hertz further teaches “the systems and techniques disclosed can be used to identify and quantify the significance of relationships among various entities including, but not limited to, organizations, people, products, industries, geographies, commodities, financial indicators, economic indicators, events, topics, subject codes, unique identifiers, social tags, industry terms, general terms, metadata elements, classification codes, and combinations thereof” (Para. [0026]) thereby teaching personal data being extracted to identify people and unique identifiers.
performing entity recognition in the structured, semi-structured and unstructured documents; and
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]) thereby teaching an entity extraction module for extracting data about people, organizations, places, or other details from multiple data sources, wherein the entity extraction performs the entity recognition on unstructured information. Hertz further teaches extracting features from “structured and unstructured data across news, research, filings, transcripts, industry classifications, and economics” (para. [0029]) and data sets comprising legal content that “is mostly unstructured or semi-structured” (Page 13 - Table 1).
extracting features from the structured, semi-structured and unstructured documents using deep learning and deterministic learning methodologies,
Hertz teaches extracting features from “structured and unstructured data across news, research, filings, transcripts, industry classifications, and economics” (para. [0029]) and data sets comprising legal content that “is mostly unstructured or semi-structured” (Page 13 - Table 1).
Hertz further teaches using “machine learning and/or deep learning models to identify sentences mentioning or references or representing a supply chain connection between two companies” (Para. [0027]).
wherein the features extracted from the structured, semi-structured and unstructured documents comprise metadata and language of the structured, semi-structured and unstructured documents.
Hertz teaches documents having tagged metadata (Para. [0008]). Hertz also teaches applying metadata elements to context between two entities (Para. [0082]) thereby teaching metadata as a record feature.
Lagi teaches record features comprising language of the records by teaching “present concepts can be carried across languages insofar as an aspect hereof provides for manual or automated translation from a first language to a second language” (Para. [0082]) thereby teaching metadata about the records as being language of the records.  Lagi further teaches parsing news articles that “may use a wide range of language variations, including jargon, shorthand, and word play to describe a given type of the event” (Para. [0142]) thereby teaching record features being language variations including jargon, shorthand, and word play.


Regarding Claim 8:
Hertz further teaches:
wherein the extracting features comprises extracting more than fifty entity types and further categorization of personal data in the one or more documents.
Hertz teaches performing “named entity recognition (NER) on [a] document to extract various types of entities, including companies, people, locations, events, etc.” (Para. [0137]) thereby teaching a boundless list of types of entities that can be extracted for characterization.  Fifty entity types would be a reasonable number of entity types for a person having ordinary skill in the art to be able to extract, especially with Hertz teaching extending capabilities “to suit many different industries” (Para. [0102]) such as the automobile industry, technology sector, healthcare, finance, and law.

Regarding Claim 14:
Hertz and Lagi further teach:
an interactive user interface coupled to the linkage module, wherein the linkage module is further configured to generate one or more visualizations of linkage of the extracted personal data and the one or more individuals, and wherein the interactive user interface is configured to present one or more visualizations of the linkage of the extracted personal data and the one or more individuals.
Hertz teaches “a user interface 44 operated by a user at access device 43 may be used for querying or otherwise interrogating the Knowledge Graph via Natural Language Interface/Knowledge Graph Interface Module 27 for responsive information” and “responsive data outputs may be generated at the Server 12 and returned to the remote access device 43 and presented and displayed to the associated user” where “Fig. 7 illustrates several exemplary input/output scenarios” (Para. [0059]).  Hertz further teaches outputting to the user “Linked Entities and Extracted Relations” (Figure 7) thereby teaching generating an output visualization of linkage relationships of extracted personal data and one or more individuals.

Regarding Claim 15:
Hertz and Lagi further teach:
wherein the group-based methodology comprises vector representation of graphs, and wherein the entity extraction module is configured to learn vector representations of graphs to link the extracted personal data and the one or more individuals.
Hertz teaches a support vector machine (SVM) where “the features needed for the SVM model are extracted from all pairs of comparable attributes between the given entity and a candidate node” and “based on…calculated similarity scores, the given entity is linked to the candidate node that it has the highest similarity score with” (Paras. [0143] & [0145]).
A support vector machine is a supervised learning model with associated learning algorithms that analyze data for classification and regression analysis.  Therefore, Hertz teaches training the extraction module for learning vector representations in entity linking.
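For illustration only, the linking step described in the cited passages (features drawn from pairs of comparable attributes, with the entity linked to the highest-scoring candidate node) can be sketched as follows. This is a minimal sketch and not the implementation of Hertz: the attribute names, the string-similarity measure, and the fixed `WEIGHTS` (standing in for the coefficients of a trained linear SVM) are all assumptions introduced here.

```python
from difflib import SequenceMatcher

# Hypothetical weights standing in for the coefficients of a trained linear SVM.
WEIGHTS = {"name": 0.7, "city": 0.3}


def attribute_features(entity, candidate):
    """Return one similarity feature per attribute present in both records."""
    shared = entity.keys() & candidate.keys()
    return {
        attr: SequenceMatcher(
            None, entity[attr].lower(), candidate[attr].lower()
        ).ratio()
        for attr in shared
    }


def similarity_score(entity, candidate):
    """Weighted sum of the attribute-pair features (a linear decision value)."""
    feats = attribute_features(entity, candidate)
    return sum(WEIGHTS.get(attr, 0.0) * value for attr, value in feats.items())


def link_entity(entity, candidates):
    """Link the entity to the candidate node with the highest similarity score."""
    return max(candidates, key=lambda node: similarity_score(entity, node))


# Hypothetical example data.
entity = {"name": "Acme Corp", "city": "Boston"}
candidates = [
    {"id": "n1", "name": "Acme Corporation", "city": "Boston"},
    {"id": "n2", "name": "Apex Holdings", "city": "Denver"},
]
best = link_entity(entity, candidates)
```

In this sketch the `id` field is ignored during scoring because the entity record has no matching attribute, so only comparable attributes contribute to the score.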

Claims 2, 3, and 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Hertz and Lagi, and further in view of Jones, JR. et al. (U.S. Pre-Grant Publication No. 2019/0164015, hereinafter referred to as Jones).

Regarding Claim 2:
Hertz and Lagi teach all of the elements of the claimed invention as recited above except:
an anomaly detection configured to detect data breaches in response to the predicted purpose of processing the personal data.

However, in the related field of endeavor of machine learning techniques for evaluating entities, Jones teaches:
an anomaly detection configured to detect data breaches in response to the predicted purpose of processing the personal data.
Jones teaches “system 100 may also include a news classification model 125 that is configured to classify relevant data about entities…into different areas of risk. For example, the different areas of risk may include one or more of regulatory risk, reputational risk, financial crime risk, control risk, cybersecurity risk, governance risk, environmental risk, and/or geopolitical risk” (Para. [0024]) where a risk score is determined “using observable risk attributes or factors associated with the entity such as…cybersecurity” (Para. [0028]).

Thus, it would have been obvious to one of ordinary skill in the art, having the teachings of Jones, Lagi, and Hertz at the time that the claimed invention was effectively filed, to have combined the classification of entities into risk areas, as taught by Jones, with the feature extraction of language, document date, and document owner, as taught by Lagi, and the systems and techniques for determining relationships and association significance between entities, as taught by Hertz.
One would have been motivated to make such a combination because Jones teaches “predicting and rating the risk associated with certain entities” (Para. [0050]) including “credit and/or non-credit risk associated with companies or institutions” (Para. [0017]). It would have been obvious to a person having ordinary skill in the art that using the relationship discovery and association taught by Hertz to then determine the risks associated with the related entities, as taught by Jones, would provide greater insight into the entities being analyzed in the Hertz reference.

Regarding Claim 3:
Hertz, Lagi, and Jones further teach:
wherein the entity extraction module is configured to perform entity recognition using unsupervised deep learning and deterministic learning methodology.
Hertz teaches using “machine learning and/or deep learning models to identify sentences mentioning or references or representing a supply chain connection between two companies” (Para. [0027]).
Jones teaches “the language detection model 104 may be a deep learning method, for example, based on learning data representations” and “the deep learning method of the language detection model 104 may be supervised, semi-supervised, or unsupervised” (Para. [0020]).


Regarding Claim 9:
Hertz, Lagi, and Jones further teach:
personal data processing prediction comprising unsupervised auto-labeling of personal data from documents in one or more data repositories in a computer network or cloud infrastructure,
Hertz teaches using “machine learning and/or deep learning models to identify sentences mentioning or references or representing a supply chain connection between two companies” (Para. [0027]).
Jones teaches “the language detection model 104 may be a deep learning method, for example, based on learning data representations” and “the deep learning method of the language detection model 104 may be supervised, semi-supervised, or unsupervised” (Para. [0020]).
wherein the unsupervised auto-labeling reuses a text summarization methodology and includes a key-phase aggregation and linking techniques to predict a purpose of processing topic for the personal data.
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]). Hertz further teaches “we mine useful information from the data by adopting a variety of techniques including Named Entity Recognition (NER) and Relation Extraction (RE)” where “such mined information is further integrated with existing structured data (e.g., via Entity Linking (EL) techniques) to obtain relatively comprehensive descriptions of the entities” (Para. [0023]) and linking entities to nodes in a knowledge graph (Para. [0142]).
Hertz also teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]). Hertz further teaches “when trying to identify the relationship between two identified companies, the industry information (i.e., healthcare, finance, automobile, etc.) of each company is retrieved from the knowledge graph and is used as a feature” (Para. [0140]) and “the association module 26 determines a frequency of the first entity and the second entity occurring in a context of each document of the set of documents 36” where “the context may include, but is not limited to, organizations, people…[and] topics” (Para. [0082]).
Therefore, Hertz teaches predicting one or multiple purposes of processing the contents of a document using features relating to entities in the document being processed.

Regarding Claim 10:
Hertz, Lagi, and Jones further teach:
supervised and unsupervised machine learning model training for identifying relationship between personal data entities; and
Hertz teaches using “machine learning and/or deep learning models to identify sentences mentioning or references or representing a supply chain connection between two companies” (Para. [0027]).
Jones teaches “the language detection model 104 may be a deep learning method, for example, based on learning data representations” and “the deep learning method of the language detection model 104 may be supervised, semi-supervised, or unsupervised” (Para. [0020]).
calculating a probability that a linking of the personal data entities form the identified relationships.
Hertz teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]).
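For illustration only, predicting a probability of a relationship from a set of features extracted from a sentence can be sketched with a simple logistic model. This is a minimal sketch and not the model disclosed in Hertz: the feature names, the verb list, and the values of `WEIGHTS` and `BIAS` are hypothetical, and a real model would learn its parameters from labeled sentences.

```python
import math

# Hypothetical feature weights and bias; not taken from any cited reference.
WEIGHTS = {"has_supply_verb": 2.0, "same_industry": 1.0, "token_distance": -0.1}
BIAS = -1.5
SUPPLY_VERBS = {"supplies", "supplied", "provides", "ships"}


def extract_features(sentence, entity_a, entity_b, same_industry):
    """Extract simple context-based features for an entity pair in a sentence."""
    tokens = sentence.lower().replace(".", "").split()
    index_a = tokens.index(entity_a.lower())
    index_b = tokens.index(entity_b.lower())
    return {
        "has_supply_verb": 1.0 if SUPPLY_VERBS & set(tokens) else 0.0,
        "same_industry": 1.0 if same_industry else 0.0,
        "token_distance": float(abs(index_a - index_b)),
    }


def relationship_probability(features):
    """Map the weighted feature sum to a probability with the logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical example sentences.
p_related = relationship_probability(
    extract_features("Acme supplies batteries to Globex.", "Acme", "Globex", True)
)
p_unrelated = relationship_probability(
    extract_features("Acme and Globex attended a conference.", "Acme", "Globex", False)
)
```

Under these assumed weights, the first sentence scores above 0.5 and the second below it, which is the behavior a thresholded relationship classifier would rely on.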

Regarding Claim 11:
All of the elements herein are similar to some or all of the elements of Claim 9.




Response to Amendment
Applicant’s Amendments, filed on 8/16/2022, are acknowledged and accepted.
As stated above and restated here for convenience, Applicant’s amendments and remarks filed on 8/16/2022 have been fully considered but were not found to overcome the previously cited prior art. Accordingly, THIS ACTION IS MADE FINAL.


Response to Arguments
On page 8 of the remarks filed on 8/16/2022, Applicant argues that “extracting record features such as metadata and language from structured, semi-structured, and unstructured records is not ‘an abstract idea capable of being performed in the human mind’” because “as defined in the present application at Paragraph [0037], structured records, unstructured records, and semi-structure records are:
A structured table is a list of rows that share the same set of named columns such as an Excel spreadsheet. On the other hand, unstructured text is a list sentences without any specific structure. Articles are mostly composed of unstructured text. An unstructured record 110 is a document that only contains unstructured text. A structured record 106 is a document that only includes structured tables. A semi-structured record 108 contains both structured and unstructured elements.”
Applicant’s argument is not convincing because a human could very easily extract metadata, which is merely data about data, and language through observing, evaluating, judging, and opining on any of the structured, unstructured, and semi-structured examples recited in the present application at Paragraph [0037]. For example, a human could look at a structured table such as an Excel spreadsheet and observe data about the data and language associated with the spreadsheet. As another example, a human could look at unstructured text such as sentences and observe data about the sentences and the language associated with the sentences.

On pages 9-10 of the remarks filed on 8/16/2022, Applicant argues that “neither instance of ‘metadata’ mentioned in Hertz teaches or discloses extracting record’s features which include metadata to predict a unique or multiple purpose of processing personal data as called for in independent Claims 1 and 7, as amended”. Applicant’s argument is not convincing because Hertz teaches extracting record features, such as “extracted set of features from a sentence” for predicting a probability of a relationship (Para. [0031]) where extracted features of a record include metadata tagged to documents and used to determine context between two entities (Paras. [0008] & [0082]).

On pages 11-12 of the remarks filed on 8/16/2022, Applicant argues that “the use of the word ‘language’ in Lagi at the paragraphs indicated by the Examiner do not relate to extracting record’s features which include metadata and language to predict a unique or multiple purpose of processing personal data as called for in independent Claims 1 and 7, as amended”. Applicant’s argument is not convincing because Hertz is relied upon as teaching extracting record’s features to predict a unique or multiple purpose of processing personal data. Lagi is relied upon as teaching that the extracted record’s features comprise language of the records (Paras. [0082] & [0142]). Therefore, Lagi in combination with Hertz teaches extracting record’s features which include metadata and language to predict a unique or multiple purpose of processing personal data as called for in independent Claims 1 and 7.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Malabarba (U.S. Pre-Grant Publication No. 2020/0184210) teaches multi-modal document feature extraction for extracting features from a document comprising text information and image information.
Kasravi et al. (U.S. Pre-Grant Publication No. 2014/0201111) teaches confidentiality classification of files including vectorizing a file to reduce the file to a single structured representation and analyzing the representation with a machine learning engine that generates a confidentiality classification for the file wherein the vectorization includes feature extraction of the file.
Zhao et al. (U.S. Pre-Grant Publication No. 2018/0246973) teaches techniques for providing user interest modeling by determining a plurality of interests associated with a user account and generating a content feed for a user that includes one or more web documents based on the plurality of interests.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT F MAY whose telephone number is (571)272-3195. The examiner can normally be reached Monday-Friday 9:30am to 6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam, can be reached at 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT F MAY/Examiner, Art Unit 2154
9/21/2022

/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154