DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This action is in response to applicant’s arguments and amendments filed 5/25/2022, which are in response to USPTO Office Action mailed 4/19/2022. Applicant’s arguments have been considered with the results that follow: THIS ACTION IS MADE FINAL.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6, 12-13 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Kohafi et al. (US PGPUB No. 2010/0114911; Pub. Date: May 6, 2010) in view of Brailovskiy et al. (US Patent No. 10,523,170; Date of Patent: Dec. 31, 2019) and Goncharov (US Patent No. 8,449,355; Date of Patent: Jul. 30, 2013).
Regarding independent claim 1,
Al-Kohafi discloses a system for association of data elements within a document comprising: an input data receiving subsystem configured to receive an input data source of the document in one or more formats; See Paragraph [0054], (Disclosing a method for classifying text such as headnotes and/or documents. The method comprises modeling one or more input headnotes as a set of corresponding headnote-text vectors, i.e. receive an input data source (e.g. the input headnote).)
a feature generation subsystem operatively coupled to the input data receiving subsystem, wherein the feature extraction subsystem is configured to: obtain one or more lists of personal data extracted from the input data source upon scanning the input data source of the document using a data source scanning technique; See Paragraph [0094], (Classifiers model each input headnote as a feature vector of noun-word pairs and each class identifier as a feature vector of noun-word pairs extracted from headnotes assigned to it, i.e. obtain one or more lists of personal data extracted from the input data source upon scanning the input data source of the document (e.g. the feature vector modeling is a data scanning technique).)
and generate one or more personal data features representing a relationship between one or more personal data elements obtained from the one or more lists of the personal data; See Paragraph [0047], (To determine the most relevant headnotes, the system uses classifiers to compute similarity scores for each headnote based on the contents of said headnote, i.e. the nouns and/or noun-word pairs in a headnote are data features of a data element that represents a relationship between one or more data elements (e.g. similarity is a relationship).) Note Paragraph [0048], wherein headnotes are described as documents having text which can be parsed into individual sets of nouns and noun-noun, noun-verb and noun-adjective pairs.
The examiner notes that the claim does not establish a functional relationship between the system and personal data specifically; the computer system is described as merely a support for the information. Therefore, no functional relationship exists between the computer system and the personal data. If a new and unobvious functional relationship between the printed matter and the substrate does not exist, USPTO personnel need not give patentable weight to printed matter. See In re Lowry, 32 F.3d 1579, 1583-84, 32 USPQ2d 1031, 1035 (Fed. Cir. 1994).
an affinity computation subsystem operatively coupled to the feature generation subsystem, wherein the affinity computation subsystem is configured to: assess each of the one or more personal data features generated from the one or more personal data elements at a predetermined time interval based on consideration of one or more levels of affinity; See Paragraph [0047], (To determine the most relevant headnotes, the system uses classifiers to compute similarity scores for each headnote.) See Paragraph [0090]-[0091], (Headnote recommendations and analysis may be based on a total number of input headnotes per fixed time period. Headnote composite scores may also be used to re-order headnote-section pairs based on a cut-off threshold, i.e. a predetermined time interval based on consideration of one or more levels of affinity.)
compute an affinity score between the one or more personal data elements using at least one type of affinity function upon assessment of each of the one or more personal data features; See Paragraph [0047], (To determine the most relevant headnotes, the system uses classifiers to compute similarity scores for each headnote, i.e. compute an affinity score between the one or more personal data elements using at least one type of affinity function (similarity is an affinity function) upon assessment of each of the one or more personal data features (e.g. the feature vector of the headnote/document).)
and generate one or more affinities for quantification of the relationship between the one or more personal data elements based on the affinity score computed; See Paragraph [0067]-[0068], (A similarity score generator for an input headnote handles generation of scores according to mathematical relationships, i.e. the scores generated for each headnote represent a quantification of a relationship between data elements.)
and an identity filtration subsystem operatively coupled to the personal data identification subsystem, wherein the identity filtration subsystem is configured to: receive the one or more affinities; See Paragraph [0092], (Disclosing that headnote-section pairs may be reordered and subsequently filtered based on acceptance criteria.) 
The examiner notes that Al-Kohafi does not explicitly disclose "the set of identities corresponding to one or more personal data elements of the individual".
Al-Kohafi does not disclose a personal data relationship identification subsystem operatively coupled to the affinity computation subsystem, wherein the personal data relationship identification subsystem is configured to: assign the one or more personal data elements to corresponding one or more identification stages based on the one or more affinities generated; 
and derive a set of identities corresponding to the one or more personal data elements of an individual assigned on the one or more identification stages using an identity creation technique;
each identity represents a unique individual;
	determine a validation of the set of identities corresponding to the one or more data elements of the individual based on a utilisation of an identity filtration technique;
Brailovskiy discloses a personal data relationship identification subsystem operatively coupled to the affinity computation subsystem, wherein the personal data relationship identification subsystem is configured to: assign the one or more personal data elements to corresponding one or more identification stages based on the one or more affinities generated; See Col. 14, line 49 - Col. 15 line 5, (An initial calibration stage, i.e. one or more identification stages, may be used to populate the lookup table by determining a pitch period for spoken names and/or other words which may then be associated with an ID and/or may prompt a user to identify the pitch period using an additional application, i.e. assign the one or more personal data elements to corresponding one or more identification stages based on the one or more affinities generated.)
and derive a set of identities corresponding to the one or more personal data elements of an individual assigned on the one or more identification stages using an identity creation technique; See FIG. 4A, (FIG. 4A illustrates a recognized person filter lookup table which includes identifiers and pitch periods for all recognized persons, i.e. derive a set of identities corresponding to the one or more personal data elements of an individual assigned on the one or more identification stages using an identity creation technique (e.g. the lookup table entries correspond to recognized persons and their attributes, i.e. creation of a lookup table is an identity creation technique as each created entry represents an identity).)
	each identity represents a unique individual; See Col. 12, lines 25-43, (The system comprises a storage element including a recognized person filter configured to store pitch periods commonly associated with commonly used names within a particular environment, i.e. a set of identities corresponding to one or more personal data elements of the individual (e.g. a person's name and associated data are personal data elements), where each identity represents a unique individual (e.g. a recognized person is an individual). The recognized person filter is additionally configured for determining a recognized person in response to an audio input, i.e. receiving one or more affinities (e.g. audio information detected by a microphone).)
	determine a validation of the set of identities corresponding to the one or more data elements of the individual based on a utilisation of an identity filtration technique; See Col. 12, lines 29-43, (The recognized person filter is additionally configured for determining a recognized person in response to an audio input, i.e. determine a validation of the set of identities corresponding to the one or more data elements of the individual (e.g. determining a recognized person from recognized person data such as pitch periods associated with individuals) based on a utilisation of an identity filtration technique (e.g. the recognized person filter is an identity filtration technique as it is explicitly a filter for determining recognized individuals).)
	Al-Kohafi and Brailovskiy are analogous art because they are in the same field of endeavor, data filtering and identification. It would have been obvious to anyone having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi to include the method of determining recognized data elements via filtering techniques as disclosed by Brailovskiy. Doing so would allow the system to identify known entities, such as individual people in a family, based on associated metrics, and would additionally reduce incidences of false positives, as described in Col. 6, lines 20-24 of Brailovskiy.
Al-Kohafi-Brailovskiy does not disclose filter[ing] out the set of identities by eliminating one or more false positive identities based on the validation of the set of identities corresponding to the one or more data elements of the individual determined.
Goncharov discloses filter[ing] out the set of identities by eliminating one or more false positive identities based on the validation of the set of identities corresponding to the one or more data elements of the individual determined. See Col. 7, lines 15-20, (The process may eliminate the generation of a false positive result by comparing the received identification information with data from a master authentication list maintained in storage, i.e. filter out the set of identities by eliminating one or more false positive identities based on the validation of the set of identities corresponding to the one or more data elements of the individual determined.)
Al-Kohafi, Brailovskiy and Goncharov are analogous art because they are in the same field of endeavor, data feature extraction and processing. It would have been obvious to anyone having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi-Brailovskiy to include the data filtering steps disclosed by Goncharov. Doing so would allow the system to filter false positive data matches out of processing. The removal of false positives ensures that the data filtering operation is accurate, thereby improving data consistency.

Regarding dependent claim 2,
As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi further discloses the step wherein the input data source of the document comprises at least one of a structured data source, a semi-structured data source, an unstructured data source or a combination thereof. See Paragraphs [0038]-[0041], (Paragraphs [0038]-[0041] describe formatting and contents included in each headnote in a headnote database such as class identifiers, etc., i.e. the input data source is a structured data source.)
The examiner notes that the step "a structured data source, a semi-structured data source, an unstructured data source or a combination thereof" is optional due to the use of the terms “at least one of” and “or”. The claim requires selection of an element from a list of alternatives (e.g. a structured data source, a semi-structured data source, an unstructured data source or a combination thereof); the prior art teaches the element if one of the alternatives is taught by the prior art. See MPEP 2143.03.


Regarding dependent claim 3,
As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi further discloses the step wherein the one or more formats comprises at least one of a text format, an image format or a combination thereof. See Paragraphs [0038]-[0041], (Paragraphs [0038]-[0041] describe formatting and contents included in each headnote in a headnote database such as class identifiers, etc. Note [0050] where term frequencies of terms and/or noun-word pairs are determined for individual elements of a headnote, i.e. the one or more formats comprise at least text.)
The examiner notes that the step "a text format, an image format or a combination thereof" is optional due to the use of the terms “at least one of” and “or”. The claim requires selection of an element from a list of alternatives (e.g. a text format, an image format or a combination thereof); the prior art teaches the element if one of the alternatives is taught by the prior art. See MPEP 2143.03.

Regarding dependent claim 6,
As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi further discloses the step wherein the one or more personal data features comprises at least one of a node feature, an edge feature, a chord feature, a metadata feature, a personal data content-based feature or a combination thereof.  See Paragraph [0058], (Similarity scores are described as meta-data associated with the input headnote, i.e. a metadata feature.)
The examiner notes that the step "a node feature, an edge feature, a chord feature, a metadata feature, a personal data content-based feature or a combination thereof" is optional due to the use of the terms “at least one of” and “or”. The claim requires selection of an element from a list of alternatives (e.g. a node feature, an edge feature, a chord feature, a metadata feature, a personal data content-based feature or a combination thereof); the prior art teaches the element if one of the alternatives is taught by the prior art. See MPEP 2143.03.

Regarding dependent claim 12,
As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi further discloses the step wherein the affinity score comprises a probability, a logit, an unbounded score, a binary value, an energy value or a correlation value. See Paragraph [0061], (Similarity scores describe a probability that a headnote is associated with a given annotation from class-identifier or other meta-data statistics as defined by Equation #6, i.e. a probability.)
The examiner notes that the step "a probability, a logit, an unbounded score, a binary value, an energy value or a correlation value" is optional due to the use of the term “or”. The claim requires selection of an element from a list of alternatives (e.g. a probability, a logit, an unbounded score, a binary value, an energy value or a correlation value); the prior art teaches the element if one of the alternatives is taught by the prior art. See MPEP 2143.03.

Regarding dependent claim 13,
As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi further discloses the step wherein the at least one type of the affinity function for a structured input data source comprises at least one of a rule-based scheme for computation of affinity score, a feature based scheme for computation of affinity score, or a combination thereof. See Paragraph [0094], (The classifier component generates similarity scores based on a tf-idf product for feature vectors of noun-word pairs associated with each headnote assigned to each class identifier and generates similarity scores based on the probabilities of a class identifier given the input headnote, i.e. a feature-based scheme for computation of affinity score.)
The examiner notes that the step "a rule-based scheme for computation of affinity score, a feature based scheme for computation of affinity score, or a combination thereof" is optional due to the use of the terms “at least one of” and “or”. The claim requires selection of an element from a list of alternatives (e.g. a rule-based scheme for computation of affinity score, a feature based scheme for computation of affinity score, or a combination thereof); the prior art teaches the element if one of the alternatives is taught by the prior art. See MPEP 2143.03.
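
For illustration only, a feature-based similarity score of the general kind characterized above (a tf-idf weighting of feature vectors followed by a comparison of those vectors) may be sketched as follows. The function names, the smoothed idf formula, the use of cosine similarity and the sample noun-word pairs are assumptions made for this sketch; they are not taken from Al-Kohafi and do not represent the reference's actual equations.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one tf-idf weighted feature vector (a dict) per document.

    `docs` is a list of token lists; here each token stands in for a
    noun-word pair. The smoothed idf used below is an assumption of this
    sketch, not a formula taken from any cited reference.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical feature lists standing in for parsed headnotes.
headnotes = [
    ["contract_breach", "damages_award"],
    ["contract_breach", "damages_award"],
    ["patent_claim", "prior_art"],
]
vectors = tfidf_vectors(headnotes)
score_same = cosine_similarity(vectors[0], vectors[1])
score_unrelated = cosine_similarity(vectors[0], vectors[2])
```

In this sketch, two inputs with identical features score at or near 1, and inputs sharing no features score 0, which is the sense in which a similarity score quantifies a relationship between data elements.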


Regarding dependent claim 16,
As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi further discloses the step wherein the identity creation technique corresponding to the input data source comprises at least one of a rule-based identity creation technique, a model-based identity creation technique or a combination thereof. See Paragraph [0054], (The method comprises representing headnote annotations as text-based feature vectors and then subsequently modeling one or more input headnotes from a database as a set of corresponding headnote-text vectors, i.e. a model-based identity creation technique.)
	The examiner notes that the step "at least one of a rule-based identity creation technique, a model-based identity creation technique or a combination thereof" is optional due to the use of the terms “at least one of” and “or”. The claim requires selection of an element from a list of alternatives; the prior art teaches the element if one of the alternatives is taught by the prior art. See MPEP 2143.03.

Regarding dependent claim 17,
	As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Goncharov further discloses the step wherein the identity filtration technique comprises at least one of a rule-based identity and affinity filtration technique, a learning-based affinity and identity filtration technique or a combination thereof. See Col. 6, lines 62-, (Identification of a user comprises the risk engine device performing a comparison of identification information with a master authentication list stored in a database. A hashing scheme is used to determine whether identification information matches the Bloom filtered authentication list. The process may eliminate the generation of a false positive result by comparing the received identification information with data from a master authentication list maintained in storage, i.e. a rule-based identity and affinity filtration technique (e.g. the filtering and comparison processes represent rules that determine which data elements are valid).)
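
For illustration only, the general two-step check characterized above (a hashed Bloom filter lookup followed by confirmation against a master list to eliminate false positives) may be sketched as follows. The class name, function name, filter size, hash construction and sample identifiers are hypothetical and are not taken from Goncharov.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; sizes and hash scheme are assumptions of this sketch."""

    def __init__(self, size=64, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive several bit positions by salting a SHA-256 hash of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True may be a false positive; False is always a definite miss.
        return all(self.bits[pos] for pos in self._positions(item))


def is_valid_identity(identifier, bloom, master_list):
    """Return True only when the identifier passes both checks."""
    if not bloom.might_contain(identifier):
        return False  # definite miss, no master-list lookup needed
    # A Bloom filter hit can be a false positive, so confirm it exactly.
    return identifier in master_list


# Hypothetical identifiers standing in for identification information.
master_list = {"alice", "bob"}
bloom = BloomFilter()
for name in master_list:
    bloom.add(name)

alice_ok = is_valid_identity("alice", bloom, master_list)
mallory_ok = is_valid_identity("mallory", bloom, master_list)
```

The second, exact comparison is what removes false positives: the Bloom filter alone can report a match for an identifier that was never added, whereas membership in the master list cannot.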

Regarding independent claim 18,
	The claim is analogous to the subject matter of independent claim 1 directed to a method or process and is rejected under similar rationale.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Al-Kohafi in view of Brailovskiy and Goncharov as applied to claim 1 above, and further in view of Dintenfass et al. (US Patent No. 10,635,506; Date of Patent: Apr. 28, 2020).
Regarding dependent claim 4,
	As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi-Brailovskiy-Goncharov does not disclose the step wherein the one or more lists of the personal data comprises at least one of a static list of one or more personal data elements, a dynamic stream of one or more personal data elements or a combination thereof.  
	Dintenfass discloses the step wherein the one or more lists of the personal data comprises at least one of a static list of one or more personal data elements, a dynamic stream of one or more personal data elements or a combination thereof.  See Col. 13, lines 26-33, (Disclosing a system for passive scanning and evaluation of event execution of a user. The system may extract and pull data from a user in order to generate a dynamic list of user data in order to draw conclusions about user behavior, i.e. the list of personal data comprises a dynamic stream of one or more personal data elements.)
The examiner notes that the step "a static list of one or more personal data elements, a dynamic stream of one or more personal data elements or a combination thereof" is optional due to the use of the terms “at least one of” and “or”. The claim requires selection of an element from a list of alternatives (e.g. a static list of one or more personal data elements, a dynamic stream of one or more personal data elements or a combination thereof); the prior art teaches the element if one of the alternatives is taught by the prior art. See MPEP 2143.03.
Al-Kohafi, Brailovskiy, Goncharov and Dintenfass are analogous art because they are in the same field of endeavor, data feature extraction and processing. It would have been obvious to anyone having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi-Brailovskiy-Goncharov to include the functionality for processing dynamic lists of data as disclosed by Dintenfass. Doing so would allow the system to process incoming data in real-time, thereby allowing for a constant aggregation and processing of data.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Al-Kohafi in view of Brailovskiy and Goncharov as applied to claim 1 above, and further in view of Borodin (US PGPUB No. 2019/0384971; Pub. Date: Dec. 19, 2019).
Regarding dependent claim 5,
	As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
	Al-Kohafi-Brailovskiy-Goncharov does not disclose the step wherein the data source scanning technique comprises an optical character recognition technique for scanning the input data source statically or dynamically.
	Borodin discloses the step wherein the data source scanning technique comprises an optical character recognition technique for scanning the input data source statically or dynamically.  See Paragraph [0010], (Disclosing the use of a "Smart OCR" technology that uses algorithms to capture data from documents based on dynamic virtual templates that maintain the correct context of scanned data, i.e. an OCR technique for scanning the input data source dynamically (e.g. via dynamic virtual templates).)
Al-Kohafi, Brailovskiy, Goncharov and Borodin are analogous art because they are in the same field of endeavor, data feature extraction and processing. It would have been obvious to anyone having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi-Brailovskiy-Goncharov to include the use of a Smart OCR algorithm for processing incoming documents as disclosed by Borodin. Doing so would allow the system to parse incoming documents via dynamic virtual templates that preserve the correct context of scanned documents, which results in an efficient means for mapping documents to data structures that can be used for further processing.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Al-Kohafi in view of Brailovskiy and Goncharov as applied to claim 6 above, and further in view of Szcezepanik et al. (US PGPUB No. 2020/0174966; Pub. Date: Jun. 4, 2020).
Regarding dependent claim 7,
	As discussed above with claim 6, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
	Al-Kohafi-Brailovskiy-Goncharov does not disclose the step wherein the metadata feature comprises at least one of features associated with a type of input data source, features associated with a size of input data source or a combination thereof.
	Szcezepanik discloses the step wherein the metadata feature comprises at least one of features associated with a type of input data source, features associated with a size of input data source or a combination thereof. See Paragraph [0082], (Disclosing a system for implementing a cognitive data lake. The system comprising an analysis module configured to parse through metatags wherein each file type and/or associated metadata being analyzed may have a distinct, recognizable pattern of attributes that is useful for categorizing the type of data stored by each file, i.e. the metadata feature comprises at least a feature associated with a type of input data source (e.g. the file type represents a type of input document).)
The examiner notes that the step "features associated with a type of input data source, features associated with a size of input data source or a combination thereof" is optional due to the use of the terms “at least one of” and “or”. The claim requires selection of an element from a list of alternatives (e.g. features associated with a type of input data source, features associated with a size of input data source or a combination thereof); the prior art teaches the element if one of the alternatives is taught by the prior art. See MPEP 2143.03.
Al-Kohafi, Brailovskiy, Goncharov and Szcezepanik are analogous art because they are in the same field of endeavor, data feature processing. It would have been obvious to anyone having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi-Brailovskiy-Goncharov to include the method of determining data types for metadata as disclosed by Szcezepanik. Paragraph [0082] of Szcezepanik discloses that the process of identifying metadata features allows a machine learning module to categorize data being ingested in accordance with the discovered metadata. This process allows the system to identify types of documents according to file metadata.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Al-Kohafi in view of Brailovskiy and Goncharov as applied to claim 6 above, and further in view of Szcezepanik et al. (US PGPUB No. 2020/0174966; Pub. Date: Jun. 4, 2020).
Regarding dependent claim 8,
	As discussed above with claim 6, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
	Al-Kohafi-Brailovskiy-Goncharov does not disclose the step wherein the personal data content-based feature corresponding to structured input data source is analysed by utilising a plurality of structured data source specific parameters comprising at least one of number of rows in a table, similarity relationship between a table name and a column name, identification of a repetitive personal data in a column, a confidence value of a column comprising a specific personal data, number of levels in a tree, similarity of keys, continuous representation of personal data in a tree format or a combination thereof.
	MALABARBA discloses the step wherein the personal data content-based feature corresponding to structured input data source is analysed by utilising a plurality of structured data source specific parameters comprising at least one of number of rows in a table, similarity relationship between a table name and a column name, identification of a repetitive personal data in a column, a confidence value of a column comprising a specific personal data, number of levels in a tree, similarity of keys, continuous representation of personal data in a tree format or a combination thereof. See Paragraphs [0048]-[0049], (Disclosing a method for generating a machine learning model for multi-modal feature extraction. The method includes receiving a document in a digital format. A feature extraction component extracts text and image information from the submitted document to identify text and image features. The feature tree component takes the list of features and structures said data into a tree, i.e. continuous representation of data in a tree format.)
The examiner notes that the step "number of rows in a table, similarity relationship between a table name and a column name, identification of a repetitive personal data in a column, a confidence value of a column comprising a specific personal data, number of levels in a tree, similarity of keys, continuous representation of personal data in a tree format or a combination thereof" is optional due to the use of the terms “at least one of” and “or”. The claim requires selection of an element from a list of alternatives (e.g. number of rows in a table, similarity relationship between a table name and a column name, identification of a repetitive personal data in a column, a confidence value of a column comprising a specific personal data, number of levels in a tree, similarity of keys, continuous representation of personal data in a tree format or a combination thereof); the prior art teaches the element if one of the alternatives is taught by the prior art. See MPEP 2143.03.
Al-Kohafi, Brailovskiy, Goncharov and MALABARBA are analogous art because they are in the same field of endeavor, data feature extraction and processing. It would have been obvious to anyone having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi-Brailovskiy-Goncharov to include the method of generating a tree expression for incoming data features as disclosed by MALABARBA. Paragraph [0044] of MALABARBA discloses that the method of analyzing, classifying and extracting information from formatted documents is useful for a wide variety of data sources and types, such that expression trees can be generated to describe nearly any document using machine learning techniques to improve accuracy of results.

Claims 9 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Kohafi in view of Brailovskiy and Goncharov as applied to claim 6 above, and further in view of Mehta et al. (US PGPUB No. 2013/0031032; Pub. Date: Jan. 31, 2013).
Regarding dependent claim 9,
	As discussed above with claim 6, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi-Brailovskiy-Goncharov does not disclose the step wherein the personal data content-based feature corresponding to a semi-structured input data source is analysed by utilising a plurality of semi-structured data source specific parameters comprising at least one of one or more visuals analysis of the personal data for a chunk, a page and a document, text analysis of the personal data for a chunk, a page and a document or a combination thereof.
Mehta discloses the step wherein the personal data content-based feature corresponding to a semi-structured input data source is analysed by utilising a plurality of semi-structured data source specific parameters comprising at least one of one or more visuals analysis of the personal data for a chunk, a page and a document, text analysis of the personal data for a chunk, a page and a document or a combination thereof. See Paragraph [0052], (Disclosing a method for automatically extracting features from semi-structured web pages. The method includes a score learner component that can learn a function that assigns scores to features extracted from semi-structured web pages wherein the scores are indicative of the importance of the features to the positioning of semi-structured web pages in search results.) See Paragraph [0032], (Feature analysis for documents includes extracting text in the documents, information about fonts utilized to describe or emphasize text, positions of text on relevant web pages, etc., i.e. text analysis of the data for a page (e.g. the webpage).)
The examiner notes that the limitation “one or more visuals analysis of the personal data for a chunk, a page and a document, text analysis of the personal data for a chunk, a page and a document or a combination thereof” is recited in the alternative through the terms “at least one of” and “or”. The claim requires selection of only one element from the list of alternatives, so the prior art teaches the limitation if it teaches any one of the alternatives. See MPEP 2143.03.
Al-Kohafi, Brailovskiy, Goncharov and Mehta are analogous art because they are in the same field of endeavor, data feature extraction and processing. It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi-Brailovskiy-Goncharov to include the method of automatically extracting features from semi-structured web pages as disclosed by Mehta. Doing so would provide a means for extracting and verifying features obtained from web pages. Paragraph [0049] of Mehta discloses that verification data is used to determine whether the automatically identified features are appropriate, such that the model may be trained further in order to more accurately extract useful features from web pages. This results in a system that becomes more accurate over time, thereby improving system performance.

Regarding dependent claim 14,
	As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi-Brailovskiy-Goncharov does not disclose the step wherein the at least one type of the affinity function for a semi-structured data source comprises at least one of a feature based scheme for computation of affinity score, a distance based scheme for computation of affinity score or a combination thereof.
	Mehta discloses the step wherein the at least one type of the affinity function for a semi-structured data source comprises at least one of a feature based scheme for computation of affinity score, a distance based scheme for computation of affinity score or a combination thereof. See Paragraph [0052], (Disclosing a method for automatically extracting features from semi-structured web pages. The method includes a score learner component that can learn a function that assigns scores to features extracted from semi-structured web pages wherein the scores are indicative of the importance of the features to the positioning of semi-structured web pages in search results, i.e. an affinity function for a semi-structured data source comprising a feature-based scheme for computation of affinity score (e.g. the importance score reflects an "affinity" of a web page to a particular feature).)
The examiner notes that the limitation “a feature based scheme for computation of affinity score, a distance based scheme for computation of affinity score or a combination thereof” is recited in the alternative through the terms “at least one of” and “or”. The claim requires selection of only one element from the list of alternatives, so the prior art teaches the limitation if it teaches any one of the alternatives. See MPEP 2143.03.
Al-Kohafi, Brailovskiy, Goncharov and Mehta are analogous art because they are in the same field of endeavor, data feature extraction and processing. It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi-Brailovskiy-Goncharov to include the method of automatically extracting features from semi-structured web pages as disclosed by Mehta. Doing so would provide a means for extracting and verifying features obtained from web pages. Paragraph [0049] of Mehta discloses that verification data is used to determine whether the automatically identified features are appropriate, such that the model may be trained further in order to more accurately extract useful features from web pages. This results in a system that becomes more accurate over time, thereby improving system performance.
Claims 10 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Kohafi in view of Brailovskiy and Goncharov as applied to claim 6 above, and further in view of DELUCA et al. (US PGPUB No. 2018/0308128; Pub. Date: Oct. 25, 2018).
Regarding dependent claim 10,
	As discussed above with claim 6, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi-Brailovskiy-Goncharov does not disclose the step wherein the personal data content-based feature corresponding to an un-structured input data source is analysed by utilising a plurality of unstructured data source specific parameters comprising at least one of a continuous representation of personal data analysis for capturing a summary of a personal data content, a per token or a continuous sentence representation analysis of the personal data or a combination thereof.  
	DELUCA discloses the step wherein the personal data content-based feature corresponding to an un-structured input data source is analysed by utilising a plurality of unstructured data source specific parameters comprising at least one of a continuous representation of personal data analysis for capturing a summary of a personal data content, a per token or a continuous sentence representation analysis of the personal data or a combination thereof. See Paragraph [0025], (Disclosing a system configured to perform natural language processing on unstructured text. An NLP processing system handles unstructured data using a number of processes including (a) topic classification, (b) sentiment classification, (c) other NLP classifications. Sentiment analysis is able to classify the polarity (positive, negative or neutral sentiment) of a given text at document, sentence or feature/aspect level, i.e. topic classification, sentiment classification, etc. represent continuous sentence analysis of data.)
The examiner notes that the limitation “a continuous representation of personal data analysis for capturing a summary of a personal data content, a per token or a continuous sentence representation analysis of the personal data or a combination thereof” is recited in the alternative through the terms “at least one of” and “or”. The claim requires selection of only one element from the list of alternatives, so the prior art teaches the limitation if it teaches any one of the alternatives. See MPEP 2143.03.
Al-Kohafi, Brailovskiy, Goncharov and DELUCA are analogous art because they are in the same field of endeavor, data feature extraction and processing. It would have been obvious to anyone having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi-Brailovskiy-Goncharov to include the method of performing natural language processing on unstructured data as disclosed by DELUCA. Doing so would allow the system to receive any form of document and automatically parse and evaluate its contents via the plurality of NLP techniques described in Paragraph [0025] of DELUCA, thereby allowing the system to classify and characterize incoming documents.


Regarding dependent claim 15,
	As discussed above with claim 1, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi-Brailovskiy-Goncharov does not disclose the step wherein the at least one type of the affinity function for an un-structured data source comprises at least one of a feature based scheme for computation of affinity score, a language based scheme for computation of affinity score or a combination thereof.
DELUCA discloses the step wherein the at least one type of the affinity function for an un-structured data source comprises at least one of a feature based scheme for computation of affinity score, a language based scheme for computation of affinity score or a combination thereof.  See Paragraph [0025], (Disclosing a system configured to perform natural language processing on unstructured text. The method can assign a probable affinity to particular concepts or terms via statistical analyses including machine learning such as latent semantic analysis, support vector machines, etc., i.e. an affinity function for unstructured data comprising at least a language-based scheme for computation of an affinity score.)
The examiner notes that the limitation “a feature based scheme for computation of affinity score, a language based scheme for computation of affinity score or a combination thereof” is recited in the alternative through the terms “at least one of” and “or”. The claim requires selection of only one element from the list of alternatives, so the prior art teaches the limitation if it teaches any one of the alternatives. See MPEP 2143.03.
Al-Kohafi, Brailovskiy, Goncharov and DELUCA are analogous art because they are in the same field of endeavor, data feature extraction and processing. It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi-Brailovskiy-Goncharov to include the method of performing natural language processing on unstructured data as disclosed by DELUCA. Doing so would allow the system to receive any form of document and automatically parse and evaluate its contents via the plurality of NLP techniques described in Paragraph [0025] of DELUCA, thereby allowing the system to classify and characterize incoming documents.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Al-Kohafi in view of Brailovskiy and Goncharov as applied to claim 6 above, and further in view of Takahashi (US PGPUB No. 2015/0254462; Pub. Date: Sep. 10, 2015).
Regarding dependent claim 11,
	As discussed above with claim 6, Al-Kohafi-Brailovskiy-Goncharov discloses all of the limitations.
Al-Kohafi-Brailovskiy-Goncharov does not disclose the step wherein the one or more levels of the affinity comprises at least one of node-level affinity, an edge level affinity, a chord level affinity or a combination thereof.
Takahashi discloses the step wherein the one or more levels of the affinity comprises at least one of node-level affinity, an edge level affinity, a chord level affinity or a combination thereof.  See Paragraph [0101], (Disclosing a method for extracting data from records. The system comprises a record extraction unit configured to generate transition vectors which comprise attributes of an extracted record.) See FIG. 25 and Paragraph [0253], (FIG. 25 illustrates a graph structure having nodes and edges wherein transition vectors are represented as nodes and a level of similarity metric is expressed as an edge, i.e. edge-level affinity.)
The examiner notes that the limitation “node-level affinity, an edge level affinity, a chord level affinity or a combination thereof” is recited in the alternative through the terms “at least one of” and “or”. The claim requires selection of only one element from the list of alternatives, so the prior art teaches the limitation if it teaches any one of the alternatives. See MPEP 2143.03.
Al-Kohafi, Brailovskiy, Goncharov and Takahashi are analogous art because they are in the same field of endeavor, data feature extraction and processing. It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the system of Al-Kohafi-Brailovskiy-Goncharov to include the step of representing data vectors in a hierarchical graph format wherein nodes and edges represent vector characteristics and relationships as disclosed by Takahashi. Doing so would allow the system to generate correspondence relationships between vector data in order to represent whether or not two records are related.
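For illustration only, the graph structure relied upon in Takahashi (nodes representing transition vectors, edges carrying a level of similarity) can be sketched as follows. This is a minimal sketch and not Takahashi's implementation: the function names, the record identifiers, the cosine similarity measure and the 0.8 threshold are all assumptions chosen for the example and are not taken from the reference.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_affinity_graph(vectors, threshold=0.8):
    """Return (nodes, edges): one node per vector, one weighted edge per similar pair.

    `vectors` maps a hypothetical record id to its vector.
    """
    nodes = list(vectors)
    edges = {}
    for u, v in combinations(nodes, 2):
        score = cosine_similarity(vectors[u], vectors[v])
        if score >= threshold:
            edges[(u, v)] = score  # edge-level affinity between two nodes
    return nodes, edges


# Hypothetical records: r1 and r2 point in nearly the same direction, r3 does not.
records = {"r1": [1.0, 0.0], "r2": [1.0, 0.1], "r3": [0.0, 1.0]}
nodes, edges = build_affinity_graph(records)
```

In this sketch only the pair (r1, r2) receives an edge, which corresponds to expressing that those two records are related while r3 is not.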



Response to Arguments
Applicant's arguments regarding independent claim 1 have been fully considered but they are not persuasive.
Regarding independent claim 1,
	Applicant argues Al-Kohafi does not disclose the following limitations of independent claim 1:
1.	an input data receiving subsystem configured to receive an input data source of the document in one or more formats;
	The examiner respectfully disagrees.
Applicant argues that an "input data source" refers to "at least one of a structured data source, a semi-structured data source, an unstructured data source or a combination thereof"; however, these limitations are not explicitly present in the claims. If Applicant believes this to be a crucial feature of the claimed invention, the examiner suggests amending independent claim 1 to include language that further defines the input data source.
In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., at least one of a structured data source, a semi-structured data source, an unstructured data source or a combination thereof) are not recited in the rejected claim.  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Additionally, Paragraph [0024] of Al-Kofahi describes a headnote as "referring to an electronic textual summary or abstract concerning a point of law within a written judicial opinion". Paragraph [0038] further describes classes and/or class identifiers associated with individual headnotes. One of ordinary skill in the art would recognize a "structured document" to refer to an electronic document having labelled elements that have various meanings beyond their formatting, for example a survey having individual question elements. The headnotes of Al-Kofahi comprise class identifiers as well as textual summaries and are therefore structured documents.

2.	obtain one or more lists of personal data extracted from the input data source upon scanning the input data source of the document using a data source scanning technique;
The examiner respectfully disagrees.
Applicant argues that "personal data extracted in the present application has much wider scope and not merely feature-vector of noun-word pairs". Paragraph [0019] of Applicant's Specification describes the "one or more lists of personal data" as including "at least one of a static list of one or more personal data elements, a dynamic stream of one or more personal data elements, or a combination thereof". The broadest reasonable interpretation of a "list of personal data", in view of the specification's definition of "at least a static list", includes a listing or collection of data elements that describe or characterize a document.
The examiner notes that the set of headnote-text vectors represents a collection of data elements that characterize a headnote. Therefore, Al-Kohafi discloses the claim limitations as currently presented.

3.	and generate one or more personal data features representing a relationship between one or more personal data elements obtained from the one or more lists of the personal data;
The examiner respectfully disagrees.
Applicant argues that "the retrieved personal data is firstly converted into machine learnable representation". The limitation at issue neither explicitly recites nor requires that these steps be performed. If Applicant believes this to be a crucial feature of the claimed invention, the examiner suggests amending independent claim 1 to include language that describes processing personal data into a machine learnable representation.
In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., the retrieved personal data is firstly converted into machine learnable representation) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).


4.	receive the one or more affinities and the set of identities corresponding to the one or more personal data elements of the individual and each identity represents a unique individual;
	Applicant’s amendments required the new grounds of rejection presented above. See rejection above for further discussion regarding this limitation.

Applicant argues that Goncharov does not disclose the following limitation of independent claim 1:
5.	filter out the set of identities by eliminating one or more false positive identities based on the validation of the set of identities corresponding to the one or more data elements of the individual determined.
	The examiner respectfully disagrees.
	Applicant argues that the claimed invention does not comprise a “master authentication list” as in the cited portions of Goncharov and that the method instead uses matrices to accomplish the limitation at issue. The current claim language does not explicitly require matrices; it merely requires a filtering process that results in the elimination of one or more false positives. Under the broadest reasonable interpretation, the limitation does not preclude the use of a master authentication list to eliminate false positives, a use that is explicitly described in Col. 7, lines 15-20 of Goncharov.
	Therefore, Goncharov teaches filter[ing] out the set of identities by eliminating one or more false positive identities.
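To illustrate the breadth of the limitation as currently drafted, the following minimal sketch filters a set of candidate identities by validating each against a reference list and discarding those that fail validation. The names `filter_identities` and `master_list`, the sample identities and the membership-based validation rule are hypothetical and assumed for the example only; the sketch represents neither the implementation disclosed in Goncharov nor Applicant's implementation.

```python
def filter_identities(candidates, master_list):
    """Keep candidates validated against master_list; report the rest as false positives."""
    validated = [c for c in candidates if c in master_list]
    false_positives = [c for c in candidates if c not in master_list]
    return validated, false_positives


# Hypothetical data: "mallory" is not on the reference list and is eliminated.
candidates = ["alice", "bob", "mallory"]
master_list = {"alice", "bob"}
validated, false_positives = filter_identities(candidates, master_list)
```

No matrix appears anywhere in this sketch, yet it results in the elimination of a false positive identity based on validation, which is all that the claim language recites.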

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Fernando M Mari whose telephone number is (571)272-2498. The examiner can normally be reached Monday-Friday 6am-3pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes, can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/FMMV/
Examiner, Art Unit 2159

/Mariela Reyes/
Supervisory Patent Examiner, Art Unit 2159