Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is responsive to the Application filed in the U.S. on 12/31/2019.  Claims 1-13 are pending in the case. Claims 1, 7, 12, and 13 are written in independent form.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


At least Claims 1, 7, 12, and 13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-patentable subject matter, namely an abstract idea without significantly more. The judicial exception is not integrated into a practical application. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The eligibility analysis in support of these findings is provided below.

As per Claim 1,
STEP 1 (Yes): In accordance with Step 1 of the eligibility inquiry (as explained in MPEP 2106), it is first noted that the claimed system (claims 1-6), method (claims 7-11), method (claim 12), and method (claim 13) are each directed to one of the eligible categories of subject matter and therefore satisfy Step 1.
STEP 2A Prong One (Yes):In accordance with Step 2A Prong one, it is noted that the claims recite an abstract idea by reciting concepts capable of being performed in the human mind (including an observation, evaluation, judgment, and opinion), which falls into the “Mental Processes” group within the enumerated groupings of abstract ideas. The independent claims recite the abstract idea of extracting personal data from a source to perform entity recognition, linking entities to one or more individuals, extracting context and record features from repositories, and predicting a purpose of processing personal data, and labelling personal data from documents, which falls within the abstract idea of performing mental processes of observation, evaluation, judgement, and opinion. The recitation of generic computer components does not negate the abstractness of the given limitations. 
The limitations include:
A method of data ingestion over a network, using one or more network computers that employ one or more processors to execute the method by performing actions, comprising:
an entity extraction module for extracting personal data from one or more data repositories in a computer network or cloud infrastructure, wherein the entity extraction module performs entity recognition from structured, semi-structured and unstructured records in the one or more data repositories; (performing a data gathering step of extracting data to be used in a mental process of observation and evaluation for entity extraction)
a linkage module coupled to the entity extraction module and using graph-based methodology to link the personal data to one or more individuals; and (performing a mental process of observing personal data and individuals and judgement to link personal data to one or more individuals)
a purpose prediction module comprising:
a feature extraction module, wherein the feature extraction module extracts both context features and record’s features from records in the one or more data repositories; and (performing a mental process of observing and evaluating features and judging which features to extract)
a purpose of processing prediction module for predicting a unique or multiple purpose of processing of the personal data; and (performing a mental process of predicting a purpose of processing personal data based on observation, evaluation, and judgement)
unsupervised auto-labelling of personal data from documents in one or more data repositories in a computer network or cloud infrastructure, wherein the unsupervised auto-labelling reuses a text summarization methodology and includes key-phrase aggregation and linking techniques to predict a purpose of processing topic for the personal data (performing a mental process of applying labels to personal data from documents using a text summarization methodology and predicting a purpose of processing based on observation, evaluation, and judgement).

Step 2A Prong Two (No):
The additional elements are directed to the use of modules for performing the steps (Claim 1) and using supervised and unsupervised machine learning (Claims 12 and 13). However, these elements fail to integrate the abstract idea into a practical application because they fail to provide an improvement to the functioning of a computer or to any other technology or technical field, fail to apply the exception with a particular machine, fail to apply the judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition, fail to effect a transformation of a particular article to a different state or thing, and fail to apply/use the abstract idea in a meaningful way beyond generally linking the use of the judicial exception to a particular technological environment. Furthermore, these elements have been fully considered; however, they are directed to the use of generic computing elements to perform the abstract idea, which is not sufficient to amount to a practical application.
Accordingly, because the Step 2A Prong One and Prong Two analysis resulted in the conclusion that the claims are directed to an abstract idea, additional analysis under Step 2B of the eligibility inquiry must be conducted in order to determine whether any claim element or combination of elements amounts to significantly more than the judicial exception.

Step 2B (No):
It has been determined that the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional limitation(s) is/are directed to the use of modules for performing steps (Claim 1) and using supervised and unsupervised machine learning (Claims 12 and 13), though at a very high level of generality and without imposing meaningful limitation on the scope of the claims. Such generic, high-level, and nominal involvement of a computer or computer-based elements for carrying out the invention merely serves to tie the abstract idea to a particular technological environment, which is not enough to render the claims patent-eligible, as noted at pg. 74624 of Federal Register/Vol. 79, No. 241, citing Alice, which in turn cites Mayo. See, e.g., Alice Corp. Pty. Ltd. v. CLS Bank Int'l, 134 S. Ct. 2347, 2359-60, 110 USPQ2d 1976, 1984 (2014). See also OIP Techs. v. Amazon.com, 788 F.3d 1359, 1364, 115 USPQ2d 1090, 1093-94 (Fed. Cir. 2015) ("Just as Diehr could not save the claims in Alice, which were directed to 'implement[ing] the abstract idea of intermediated settlement on a generic computer', it cannot save OIP's claims directed to implementing the abstract idea of price optimization on a generic computer.") (citations omitted). See also, Affinity Labs of Texas LLC v. DirecTV LLC, 838 F.3d 1253, 1257-1258 (Fed. Cir. 2016) (mere recitation of a GUI does not make a claim patent-eligible); Intellectual Ventures I LLC v. Capital One Bank, 792 F.3d 1363, 1370 (Fed. Cir. 2015) ("the interactive interface limitation is a generic computer element").
The additional elements are broadly applied to the abstract idea(s) at a high level of generality ("similar to how the recitation of the computer in the claims in Alice amounted to mere instructions to apply the abstract idea of intermediated settlement on a generic computer," as explained in MPEP §2106.05(f)) and they operate in well-understood, routine, and conventional manners. Furthermore, generally transmitting, analyzing, and outputting (e.g., displaying) data are examples of insignificant extra-solution activity. The recitation that routing, moving, and identifying are performed by an apparatus/device is the epitome of "mere instructions to implement an abstract idea on a computer".
MPEP § 2106.05(d)(II) sets forth the following:
The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity.
• Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec ... ; TLI Communications LLC v. AV Auto. LLC ... ; OIP Techs., Inc., v. Amazon.com, Inc ... ; buySAFE, Inc. v. Google, Inc ... ;
• Performing repetitive calculations, Flook ... ; Bancorp Services v. Sun Life ... ;
• Electronic recordkeeping, Alice Corp ... ; Ultramercial ... ;
• Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc ... ;
• Electronically scanning or extracting data from a physical document, Content Extraction and Transmission, LLC v. Wells Fargo Bank ... ; and
• A web browser's back and forward button functionality, Internet Patent Corp. v. Active Network, Inc ...
. . . Courts have held computer-implemented processes not to be significantly more than an abstract idea (and thus ineligible) where the claim as a whole amounts to nothing more than generic computer functions merely used to implement an abstract idea, such as an idea that could be done by a human analog (i.e., by hand or by merely thinking) ...

In addition, when taken as an ordered combination, the elements add nothing that is not already present when the elements are taken individually. There is no indication that the combination of elements integrates the abstract idea into a practical application. Their collective functions merely provide conventional computer implementation. Therefore, when viewed as a whole, these additional claim elements do not provide meaningful limitations that transform the abstract idea into a practical application, nor does the ordered combination amount to significantly more than the abstract idea itself.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 4, and 6-8 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hertz et al. (U.S. Pre-Grant Publication No. 2018/0082183, hereinafter referred to as Hertz).

Regarding Claim 1:
Hertz teaches a system for personal data classification comprising:
an entity extraction module for extracting personal data from one or more data repositories in a computer network or cloud infrastructure, wherein the entity extraction module performs entity recognition from structured, semi-structured and unstructured records in the one or more data repositories;
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]) thereby teaching an entity extraction module for extracting data about people, organizations, places, or other details from multiple data sources, wherein the entity extraction performs the entity recognition on unstructured information. Hertz further teaches extracting features from “structured and unstructured data across news, research, filings, transcripts, industry classifications, and economics” (para. [0029]) and data sets comprising legal content that “is mostly unstructured or semi-structured” (Page 13 - Table 1).
Hertz further teaches “the systems and techniques disclosed can be used to identify and quantify the significance of relationships among various entities including, but not limited to, organizations, people, products, industries, geographies, commodities, financial indicators, economic indicators, events, topics, subject codes, unique identifiers, social tags, industry terms, general terms, metadata elements, classification codes, and combinations thereof” (Para. [0026]) thereby teaching personal data being extracted to identify people and unique identifiers
a linkage module coupled to the entity extraction module and using graph-based methodology to link the personal data to one or more individuals; and
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]). Hertz further teaches “we mine useful information from the data by adopting a variety of techniques including Named Entity Recognition (NER) and Relation Extraction (RE)” where “such mined information is further integrated with existing structured data (e.g., via Entity Linking (EL) techniques) to obtain relatively comprehensive descriptions of the entities” (Para. [0023]) and linking entities to nodes in a knowledge graph (Para. [0142]).
a purpose prediction module comprising:
a feature extraction module, wherein the feature extraction module extracts both context features and record’s features from records in the one or more data repositories; and
Hertz teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]).  Therefore, Hertz teaches extracting both context and record features from the one or more data repositories.
a purpose of processing prediction module for predicting a unique or multiple purpose of processing of the personal data.
Hertz teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]).
Hertz further teaches “when trying to identify the relationship between two identified companies, the industry information (i.e., healthcare, finance, automobile, etc.) of each company is retrieved from the knowledge graph and is used as a feature” (Para. [0140]) and “the association module 26 determines a frequency of the first entity and the second entity occurring in a context of each document of the set of documents 36” where “the context may include, but is not limited to, organizations, people…[and] topics” (Para. [0082])
Therefore, Hertz teaches predicting one or multiple purposes of processing the contents of a document using features relating to entities in the document being processed.

Regarding Claim 4:
Hertz further teaches:
wherein the entity extraction module performs entity recognition to extract more than fifty entity types with their further characterization.
Hertz teaches performing “named entity recognition (NER) on [a] document to extract various types of entities, including companies, people, locations, events, etc.” (Para. [0137]) thereby teaching a boundless list of types of entities that can be extracted for characterization. Fifty entity types would be a reasonable number of entity types for a person having ordinary skill in the art to be able to extract, especially with Hertz teaching extending capabilities “to suit many different industries” (Para. [0102]) such as the automobile industry, technology sector, healthcare, finance, and law.

Regarding Claim 6:
Hertz further teaches:
wherein the purpose prediction module is coupled to the linkage module and further uses the link of the personal data to one or more individuals to predict the unique or multiple purpose of processing of the personal data.
Hertz teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]).
Hertz further teaches “when trying to identify the relationship between two identified companies, the industry information (i.e., healthcare, finance, automobile, etc.) of each company is retrieved from the knowledge graph and is used as a feature” (Para. [0140]) and “the association module 26 determines a frequency of the first entity and the second entity occurring in a context of each document of the set of documents 36” where “the context may include, but is not limited to, organizations, people…[and] topics” (Para. [0082]).
Therefore, Hertz teaches predicting one or multiple purposes of processing the contents of a document using features relating to entities in the document being processed where the entity can be a person.

Regarding Claim 7:
Hertz further teaches a method for personal data extraction comprising:
scanning one or more documents in one or more data repositories in a computer network or cloud infrastructure, the one or more documents comprising structured, semi-structured or unstructured documents;
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]) thereby teaching an entity extraction module for extracting data about people, organizations, places, or other details from multiple data sources, wherein the entity extraction performs the entity recognition on unstructured information. Hertz further teaches extracting features from “structured and unstructured data across news, research, filings, transcripts, industry classifications, and economics” (para. [0029]) and data sets comprising legal content that “is mostly unstructured or semi-structured” (Page 13 - Table 1).
Hertz further teaches “the systems and techniques disclosed can be used to identify and quantify the significance of relationships among various entities including, but not limited to, organizations, people, products, industries, geographies, commodities, financial indicators, economic indicators, events, topics, subject codes, unique identifiers, social tags, industry terms, general terms, metadata elements, classification codes, and combinations thereof” (Para. [0026]) thereby teaching personal data being extracted to identify people and unique identifiers
performing entity recognition in the structured, semi-structured and unstructured documents; and
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]) thereby teaching an entity extraction module for extracting data about people, organizations, places, or other details from multiple data sources, wherein the entity extraction performs the entity recognition on unstructured information. Hertz further teaches extracting features from “structured and unstructured data across news, research, filings, transcripts, industry classifications, and economics” (para. [0029]) and data sets comprising legal content that “is mostly unstructured or semi-structured” (Page 13 - Table 1).
extracting features from the structured, semi-structured and unstructured documents using deep learning and deterministic learning methodologies.
Hertz teaches extracting features from “structured and unstructured data across news, research, filings, transcripts, industry classifications, and economics” (para. [0029]) and data sets comprising legal content that “is mostly unstructured or semi-structured” (Page 13 - Table 1).
Hertz further teaches using “machine learning and/or deep learning models to identify sentences mentioning or references or representing a supply chain connection between two companies” (Para. [0027]).

Regarding Claim 8:
Hertz further teaches:
wherein the extracting features comprises extracting more than fifty entity types and further categorization of personal data in the one or more documents.
Hertz teaches performing “named entity recognition (NER) on [a] document to extract various types of entities, including companies, people, locations, events, etc.” (Para. [0137]) thereby teaching a boundless list of types of entities that can be extracted for characterization. Fifty entity types would be a reasonable number of entity types for a person having ordinary skill in the art to be able to extract, especially with Hertz teaching extending capabilities “to suit many different industries” (Para. [0102]) such as the automobile industry, technology sector, healthcare, finance, and law.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 3, and 9-13 are rejected under 35 U.S.C. 103 as being unpatentable over Hertz, and further in view of Jones, Jr. et al. (U.S. Pre-Grant Publication No. 2019/0164015, hereinafter referred to as Jones).

Regarding Claim 2:
Hertz teaches all of the elements of the claimed invention as recited above except:
an anomaly detection for detecting data breaches in response to the predicted purpose of processing the personal data.

However, in the related field of endeavor of machine learning techniques for evaluating entities, Jones teaches:
an anomaly detection for detecting data breaches in response to the predicted purpose of processing the personal data.
Jones teaches “system100 may also include a news classification model 125 that is configured to classify relevant data about entities…into different areas of risk. For example, the different areas of risk may include one or more of regulatory risk, reputational risk, financial crime risk, control risk, cybersecurity risk, governance risk, environmental risk, and/or geopolitical risk” (Para. [0024]) where a risk score is determined “using observable risk attributes or factors associated with the entity such as…cybersecurity” (Para. [0028]).

Thus, it would have been obvious to one of ordinary skill in the art, having the teachings of Jones and Hertz at the time that the claimed invention was effectively filed, to have combined the classification of entities into risk areas, as taught by Jones, with the systems and techniques for determining relationships and association significance between entities, as taught by Hertz.
One would have been motivated to make such combination because Jones teaches “predicting and rating the risk associated with certain entities” (Para. [0050]) including “credit and/or non-credit risk associated with companies or institutions” (Para. [0017]) and it would have been obvious to a person having ordinary skill in the art that using the relationship discovery and association taught by Hertz to then determine risks associated with the related entities, based on the entities, as taught by Jones, would provide greater insight into the entities being analyzed in the Hertz reference.

Regarding Claim 3:
Hertz and Jones further teach:
wherein the entity extraction module performs entity recognition using unsupervised deep learning and deterministic learning methodology.
Hertz teaches using “machine learning and/or deep learning models to identify sentences mentioning or references or representing a supply chain connection between two companies” (Para. [0027]).
Jones teaches “the language detection model 104 maybe a deep learning method, for example, based on learning data representations” and “the deep learning method of the language detection model 104 may be supervised, semi-supervised, or unsupervised” (Para. [0020]).

Regarding Claim 9:
Hertz and Jones further teach:
personal data processing prediction comprising unsupervised auto-labeling of personal data from documents in one or more data repositories in a computer network or cloud infrastructure,
Hertz teaches using “machine learning and/or deep learning models to identify sentences mentioning or references or representing a supply chain connection between two companies” (Para. [0027]).
Jones teaches “the language detection model 104 maybe a deep learning method, for example, based on learning data representations” and “the deep learning method of the language detection model 104 may be supervised, semi-supervised, or unsupervised” (Para. [0020]).
wherein the unsupervised auto-labeling reuses a text summarization methodology and includes a key-phase aggregation and linking techniques to predict a purpose of processing topic for the personal data.
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]). Hertz further teaches “we mine useful information from the data by adopting a variety of techniques including Named Entity Recognition (NER) and Relation Extraction (RE)” where “such mined information is further integrated with existing structured data (e.g., via Entity Linking (EL) techniques) to obtain relatively comprehensive descriptions of the entities” (Para. [0023]) and linking entities to nodes in a knowledge graph (Para. [0142]).
Hertz also teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]). Hertz further teaches “when trying to identify the relationship between two identified companies, the industry information (i.e., healthcare, finance, automobile, etc.) of each company is retrieved from the knowledge graph and is used as a feature” (Para. [0140]) and “the association module 26 determines a frequency of the first entity and the second entity occurring in a context of each document of the set of documents 36” where “the context may include, but is not limited to, organizations, people…[and] topics” (Para. [0082])
Therefore, Hertz teaches predicting one or multiple purposes of processing the contents of a document using features relating to entities in the document being processed.

Regarding Claim 10:
Hertz and Jones further teach:
supervised and unsupervised machine learning model training for identifying relationship between personal data entities; and
Hertz teaches using “machine learning and/or deep learning models to identify sentences mentioning or references or representing a supply chain connection between two companies” (Para. [0027]).
Jones teaches “the language detection model 104 maybe a deep learning method, for example, based on learning data representations” and “the deep learning method of the language detection model 104 may be supervised, semi-supervised, or unsupervised” (Para. [0020]).
calculating a probability that a linking of the personal data entities form the identified relationships.
Hertz teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]).

Regarding Claim 11:
All of the elements herein are similar to some or all of the elements of Claim 9.

Regarding Claim 12:
Hertz and Jones further teach a method for personal data linking comprising:
supervised and unsupervised machine learning model training for identifying relationships between personal data entities; and
Hertz teaches using “machine learning and/or deep learning models to identify sentences mentioning or references or representing a supply chain connection between two companies” (Para. [0027]).
Jones teaches “the language detection model 104 maybe a deep learning method, for example, based on learning data representations” and “the deep learning method of the language detection model 104 may be supervised, semi-supervised, or unsupervised” (Para. [0020]).
calculating a probability of a linking of the personal data entities form the identified relationships.
Hertz teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]).

Regarding Claim 13:
Hertz and Jones further teach a method for personal data processing prediction comprising:
unsupervised auto-labeling of personal data from documents in one or more data repositories in a computer network or cloud infrastructure,
Hertz teaches using “machine learning and/or deep learning models to identify sentences mentioning or references or representing a supply chain connection between two companies” (Para. [0027]).
Jones teaches “the language detection model 104 maybe a deep learning method, for example, based on learning data representations” and “the deep learning method of the language detection model 104 may be supervised, semi-supervised, or unsupervised” (Para. [0020]).
wherein the unsupervised auto-labeling reuses a text summarization methodology and includes a key-phase aggregation and linking techniques to predict a purpose of processing topic for the personal data.
Hertz teaches “Thomson Reuters' Text Metadata Services group ("TMS") formerly known as ClearForest prior to acquisition in 2007, is one exemplary IE-based solution provider offering text analytics software used to "tag," or categorize, unstructured information and to extract facts about people, organizations, places or other details from news articles, Web pages and other documents” (Para. [0012]). Hertz further teaches “we mine useful information from the data by adopting a variety of techniques including Named Entity Recognition (NER) and Relation Extraction (RE)” where “such mined information is further integrated with existing structured data (e.g., via Entity Linking (EL) techniques) to obtain relatively comprehensive descriptions of the entities” (Para. [0023]) and linking entities to nodes in a knowledge graph (Para. [0142]).
Hertz also teaches “the extracted set of features may include context-based features” (Para. [0031]) where “a context module 28 is provided for determining a context (e.g., a circumstance, background) in which an identified entity is typically referenced in or referred to” (Para. [0058]) and “[predicting] a probability of a relationship based on an extracted set of features from a sentence” (Para. [0031]). Hertz further teaches “when trying to identify the relationship between two identified companies, the industry information (i.e., healthcare, finance, automobile, etc.) of each company is retrieved from the knowledge graph and is used as a feature” (Para. [0140]) and “the association module 26 determines a frequency of the first entity and the second entity occurring in a context of each document of the set of documents 36” where “the context may include, but is not limited to, organizations, people…[and] topics” (Para. [0082]).
Therefore, Hertz teaches predicting one or multiple purposes of processing the contents of a document using features relating to entities in the document being processed.


Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Hertz, and further in view of Lagi et al. (U.S. Pre-Grant Publication No. 2020/0065857, hereinafter referred to as Lagi).

Regarding Claim 5:
Hertz further teaches:
wherein the record’s features include:
user rights,
Hertz teaches providing “users with electronic access to a system of databases and research tools” (Para. [0008]) thereby teaching access rights to various databases and thus records.
metadata,
Hertz teaches structured documents having tagged metadata (Para. [0008]). Hertz also teaches applying metadata elements to context between two entities (Para. [0082]) thereby teaching metadata as a record feature.

Hertz teaches all of the elements of the claimed invention as recited above except:
wherein the record’s features include language, document date and document owner.

However, in the related field of endeavor of feature extraction, Lagi teaches:
wherein the record’s features include:
language (Paras. [0082], [0142], & [0163]),
document date (Para. [0127]) and
document owner (Para. [0127]).

One would have been motivated to make such a combination because Lagi teaches additional features for extraction, such as language (Paras. [0082], [0142], & [0163]), document date, and document owner (Para. [0127]). It would have been obvious to a person having ordinary skill in the art that adding more features to extract in the system and method taught by Hertz would improve the teachings of Hertz by expanding the extracted information upon which entities can be related.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Malabarba (U.S. Pre-Grant Publication No. 2020/0184210) teaches multi-modal document feature extraction for extracting features from a document comprising text information and image information.
Kasravi et al. (U.S. Pre-Grant Publication No. 2014/0201111) teaches confidentiality classification of files including vectorizing a file to reduce the file to a single structured representation and analyzing the representation with a machine learning engine that generates a confidentiality classification for the file wherein the vectorization includes feature extraction of the file.
Zhao et al. (U.S. Pre-Grant Publication No. 2018/0246973) teaches techniques for providing user interest modeling by determining a plurality of interests associated with a user account and generating a content feed for a user that includes one or more web documents based on the plurality of interests.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT F MAY whose telephone number is (571)272-3195. The examiner can normally be reached Monday-Friday 9:30am to 6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam, can be reached at 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT F MAY/
Examiner, Art Unit 2154
2/12/2022

/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154