DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This Final Office Action is in response to the remarks and amendment filed on 04/20/2021. Amended claims 1, 8, and 15, filed on 04/20/2021, are being considered on the merits.
In response to the last Office Action: 
Claims 1, 8 and 15 have been amended.
Claim 7 has been canceled.
Claims 1-6 and 8-20 remain pending in this application.

Response to Arguments
The applicant’s remarks and/or arguments, filed on 04/20/2021, have been fully considered.
The examiner is entitled to give claim limitations their broadest reasonable interpretation in light of the specification. See MPEP 2111 [R-1] Interpretation of Claims-Broadest Reasonable Interpretation. The applicant always has the opportunity to amend the claims during prosecution, and broad interpretation by the examiner reduces the possibility that the claim, once issued, will be interpreted more broadly than is justified. In re Prater, 162 USPQ 541,550-51 (CCPA 1969).

Applicant's arguments regarding independent claims 1, 8 and 15, found on pages 9-10 of the remarks filed on 10/16/2020, have been fully considered but are not persuasive.


Regarding the aforementioned claim limitations, the Examiner respectfully disagrees. The Examiner asserts that the aforementioned amended limitations of independent claims 1, 8 and 15, as drafted and given their broadest reasonable interpretation, are disclosed by the combination of the cited Parkinson and Sam prior art references. In particular, Parkinson discloses in Para. [0045]: “The vector of tokens is then subjected to featurization by statistical parsing engine 204. Featurization is indicated by block 218 in FIG. 3 and illustratively formulates a feature vector associated with each token, wherein the values in the feature vector indicate the presence or absence of features of interest found in that corresponding token. Of course, the particular features represented by the feature vector will vary widely based on the particular schema type that is to be populated by engine 204.” The Examiner notes that the tokenization algorithm, followed by the featurization (i.e., characterization) performed by the statistical parsing engine, which formulates a feature vector associated with each token whose values indicate the presence or absence of features of interest, corresponds to the metrics characterizing relationships between the features of each token and a structure of a corresponding segment. Furthermore, the reference to Sam discloses determining relationships between a token and other portions of 
Further details are provided in the below set forth 35 USC 103 rejection.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date 


Claims 1-5 and 8-20 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication (US 2006/0053133 A1) issued to Parkinson (hereinafter “PARKINSON”), in view of US Patent Application Publication (US 2014/0280352 A1) issued to Sam et al. (hereinafter “SAM”).
Regarding claim 1 (Currently Amended), PARKINSON teaches a system comprising: at least one processor and a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by the at least one processor to cause the at least one processor to perform operations (PARKINSON Para. [0022], lines (1-3): “The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.”; and Fig. 1, Para. [0024], lines (1-2): “Computer 110 typically includes a variety of computer readable media.”), comprising: 
receiving input data, wherein the input data includes a plurality of segments, and wherein the segments include a plurality of source fields (PARKINSON Fig. 2, Para. [0031], lines (1-4): “FIG. 2 is a block diagram of a system 200 used in accordance with one embodiment of the present invention for receiving an unstructured data input and generating a structured output, from that input.”; and 
Fig. 2/3, Para. [0039], lines (4-7): “…, file system 202 provides the unstructured data input 208, along with a Type indicator and an Integer value (collectively referred to as data 212) to statistical parsing engine 204.”; and 
Fig. 4, Para. [0041], lines (2-6): “In the embodiment shown in FIG. 4, the unstructured contact information entered by the user is "Mr. John Doe 123 Main Street, Seattle, Wash. 43678"”, 
the Examiner notes that the reference illustrates in Fig. 4 input data of contact information comprising multiple parts/segments that include a plurality of data source fields); 
parsing each of the segments into tokens, wherein each token is data that is contained in a particular source field of the plurality of source fields (PARKINSON Fig. 2, Para. [0042], lines (1-5): “Once statistical parsing engine 204 receives data 212, it tokenizes the data and creates a contact record data structure, because the object type has been set to "Contact". Tokens are illustratively objects and represent individual items in the unstructured textual input.”); 
determining contextual information associated with each token wherein the contextual information comprises one or more features associated with each token, and wherein determining the contextual information comprises determining one or more features of each token based on metrics  (PARKINSON Para. [0042], lines (5-15): “Any known tokenization algorithm can be used to identify tokens in the unstructured input. …, the tokenization algorithm simply breaks the unstructured data input into separate tokens by identifying substrings that are separated by white space (such as a space, a tab, etc.) of course, additional complexity can also be added, in any known manner, to handle parenthetical items, to separate content words from attached punctuation marks, etc. In any case, the tokenization algorithm breaks the input into a vector of tokens.”; and
Fig. 2/3, Para. [0045], lines (1-9): “The vector of tokens is then subjected to featurization by statistical parsing engine 204. Featurization is indicated by block 218 in FIG. 3 and illustratively formulates a feature vector associated with each token, wherein the values in the feature vector indicate the presence or absence of features of interest found in that corresponding token. Of course, the particular features represented by the feature vector will vary widely based on the particular schema type that is to be populated by engine 204.”; and 
Fig. 6, Para. [0046], lines (1-6): “The features shown in FIG. 6 indicate whether the token contains an initial capitalized letter, whether the token comprises all digits or all alpha characters, and whether the token contains a hyphen. Of course, a wide variety of other features could be used as well and those illustrated in FIG. 6 are shown for the sake of example only.”, 
the Examiner notes that the tokenization algorithm, followed by the featurization (i.e., characterization) performed by the statistical parsing engine, which formulates a feature vector associated with each token whose values indicate the presence or absence of features of interest, corresponds to the metrics characterizing relationships between the features of each token and a structure of a corresponding segment); 
mapping the tokens in the source fields of the segments to a plurality of target fields of a target schema based at least in part on the contextual information (PARKINSON Para. [0010], lines (1-2): “The present invention uses a machine-learned statistical model to map between unstructured data and structured data”; and 
Fig. 2, Para. [0032], lines (1-8): “Statistical parsing engine 204 receives an unstructured input and generates a structured output, from that input. …, the unstructured input is a text string and the structured output is a populated data schema that is defined by file system 202. In other words, statistical parsing engine 204 receives an unstructured textual input and maps components of the input into a structured data schema.”; and
Fig. 2, Para. [0033], lines (1-11): “In order to train statistical engine 204 to perform this mapping function, data is collected or generated that includes examples of how people represent the type of data on which engine 204 is being trained. For instance, the present example will proceed with respect to engine 204 being trained to map contact information (such as names, addresses, telephone numbers, electronic mail addresses, etc.) to a data schema used by a personal information manager. In that instance, training data is collected or generated that provides examples of how users represent contact information”); and
populating the target fields of the target schema with the tokens from the source fields based at least in part on the mapping, wherein the parsing of the segments into tokens, the determining of the contextual information associated the tokens, and the mapping of the tokens in the source fields to the target fields is performed substantially during the populating the target fields of the target schema with the tokens from the source fields (PARKINSON Fig. 2, Para. [0032], lines (1-8): “Statistical parsing engine 204 receives an unstructured input and generates a structured output, from that input. …, the unstructured input is a text string and the structured output is a populated data schema that is defined by file system 202. In other words, statistical parsing engine 204 receives an unstructured textual input and maps components of the input into a structured data schema.”; and
Fig. 2/3, Para. [0045], lines (1-9): “The vector of tokens is then subjected to featurization by statistical parsing engine 204. Featurization is indicated by block 218 in FIG. 3 and illustratively formulates a feature vector associated with each token, wherein the values in the feature vector indicate the presence or absence of features of interest found in that corresponding token. Of course, the particular features represented by the feature vector will vary widely based on the particular schema type that is to be populated by engine 204.”; and
Para. [0048], lines (1-9): “…, during the training phase, the training data is tokenized and feature vectors are associated with each token as well. It will further be appreciated that the featurization algorithm used at runtime to generate the feature vectors for the tokens will illustratively be identical to that used in the training phase. This will produce a more accurate estimate of the probabilities that the various tokens belong to the various portions of the data schema being populated.”; and
Figs. 2, 6, and 7, Para. [0052], lines (1-19): “Once the ordered array of feature vectors 220 is generated, statistical parsing engine 204 maps those feature vectors to slots in the structured data schema being populated. FIG. 7 shows a results lattice 224 which represents each of the tokens shown in FIG. 6 mapped to slots in a data schema. The data schema is represented by the entries in the left-most column of entries. The first entry "F. Name" indicates that the value associated with that slot in the schema in the schema is a first name. The second entry "L. Name" indicates that the values associated with that slot in the schema comprises a last name., ...”; and
Fig. 8, Para. [0061], lines (1-5): “FIG. 8 illustrates a solution 228 for the example discussed herein in greater detail. It can be seen that solution 228 includes the data schema on the left half thereof and the values associated with each slot in the data schema on the right half thereof.”,
the Examiner notes that the referenced system receives unstructured textual input and maps components of the input into a structured data schema wherein the populating takes place as mapping of those feature vectors to slots in the structured data schema).
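
For illustration only, the white-space tokenization and featurization described in the PARKINSON passages quoted above (Paras. [0042], [0045] and [0046]) may be sketched as follows. The sketch is not taken from PARKINSON; the function names `tokenize` and `featurize` and the feature labels are hypothetical, and only the four example features named in Para. [0046] are modeled.

```python
# Illustrative sketch only; not code from PARKINSON.
# tokenize(), featurize() and the feature labels are hypothetical names.

def tokenize(text):
    """Split unstructured input into tokens on white space (cf. Para. [0042])."""
    return text.split()

def featurize(token):
    """Presence/absence features of the kind listed in Para. [0046]."""
    return {
        "initial_cap": token[:1].isupper(),  # begins with a capital letter
        "all_digits": token.isdigit(),       # consists only of digits
        "all_alpha": token.isalpha(),        # consists only of letters
        "has_hyphen": "-" in token,          # contains a hyphen
    }

# Example input taken from the passage of Para. [0041] quoted above.
tokens = tokenize("Mr. John Doe 123 Main Street, Seattle, Wash. 43678")
feature_vectors = [featurize(t) for t in tokens]
```

Each entry of `feature_vectors` corresponds to one token, in input order, which is the form of ordered array that the quoted Para. [0052] describes as being mapped to slots in the data schema.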

However, PARKINSON does not explicitly teach determining one or more relationships between a token and other portions of the received input data including determining a position of the token in the received input data.
But, SAM teaches determining one or more relationships between a token and other portions of the received input data including determining a position of the token in the received input data (SAM Para. [0008], lines (5-9): “…, and mapping the key to the attribute based on the additional data includes at least one of (i) determining that the value of the semi-structured data is associated with the location and the location is associated with the attribute, ...”).


Regarding claim 2 (Original), the combination of PARKINSON and SAM teaches the limitations of claim 1. Further, PARKINSON teaches wherein the input data is semi-structured data (PARKINSON Fig. 2, Para. [0031], lines (1-4): “FIG. 2 is a block diagram of a system 200 used in accordance with one embodiment of the present invention for receiving an unstructured data input and generating a structured output, from that input.”; and Fig. 2/3, Para. [0039], lines (4-7): “…, file system 202 provides the unstructured data input 208, along with a Type indicator and an Integer value (collectively referred to as data 212) to statistical parsing engine 204.”, 
the Examiner notes that the reference illustrates in Fig. 4 input data of contact information along with field type indicators, which corresponds to the input data being semi-structured).

Regarding claim 3 (Original), the combination of PARKINSON and SAM teaches the limitations of claim 1. Further, PARKINSON teaches wherein the target schema is a structured schema (PARKINSON Fig. 2, Para. [0031], lines (1-4): “FIG. 2 is a block diagram of a system 200 used in accordance with one embodiment of the present invention for receiving an unstructured data input and generating a structured output, from that input.”).

Regarding claim 4 (Previously Presented), the combination of PARKINSON and SAM teaches the limitations of claim 1. Further, PARKINSON teaches wherein the structure of each token comprises a relationship between one or more structural features of each token and at least one other token in a same segment (PARKINSON Fig. 4, Para. [0041], lines (2-6): “In the embodiment shown in FIG. 4, the unstructured contact information entered by the user is "Mr. John Doe 123 Main Street, Seattle, Wash. 43678". The Type indicator illustrates a Contact type data schema, and the Integer value is set to three.”, the Examiner notes that the reference illustrates in Fig. 4 that the input data lines comprise information that is related, i.e., a record-related relationship, of contact information of a person).

Regarding claim 5 (Previously Presented), the combination of PARKINSON and SAM teaches the limitations of claim 1. Further, SAM teaches wherein to map the tokens in the source fields of the segments to the target fields of a target schema, the at least one processor further performs operations comprising:
comparing each target field to contextual information associated with each token (SAM Para. [0044], lines (8-10): “The values of the semi-structured data may be compared to the entity names in the name catalogs to find a match”); and
matching the token in each source field to one of the target fields of the target schema based at least in part on the comparing of each target field to the contextual information (SAM Para. [0044], lines (10-13): “A value that matches an entity name in a name catalog may be mapped to a cell referenced by the attribute corresponding to the name catalog, and the corresponding key may be mapped to the attribute”).
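
For illustration only, the name-catalog comparison described in the SAM passages quoted above (Para. [0044]) may be sketched as follows. The sketch is not taken from SAM; the catalog contents, the attribute names, and the function `match_attributes` are hypothetical and are chosen solely to show a value being compared against the entity names of each catalog.

```python
# Illustrative sketch only; not code from SAM.
# NAME_CATALOGS and match_attributes() are hypothetical.

NAME_CATALOGS = {
    "city": {"seattle", "portland"},
    "state": {"washington", "oregon"},
}

def match_attributes(value, catalogs):
    """Return the attributes whose name catalog contains the value."""
    key = value.strip().lower()  # case-insensitive comparison (an assumption)
    return sorted(attr for attr, names in catalogs.items() if key in names)

matched = match_attributes("Seattle", NAME_CATALOGS)
unmatched = match_attributes("Boise", NAME_CATALOGS)
```

A value found in exactly one catalog yields a single attribute; a value found in none yields an empty list, in which case no mapping is made by this sketch.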

Regarding claim 8 (Currently Amended), PARKINSON teaches a system comprising:
at least one processor and a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by the at least one processor to cause the at least one processor to perform operations (PARKINSON Para. [0022], lines (1-3): “The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.”; and Fig. 1, Para. [0024], lines (1-2): “Computer 110 typically includes a variety of computer readable media.”), comprising:
receiving input data, wherein the input data includes a plurality of segments, and wherein the segments include a plurality of source fields (PARKINSON Fig. 2, Para. [0031], lines (1-4): “FIG. 2 is a block diagram of a system 200 used in accordance with one embodiment of the present invention for receiving an unstructured data input and generating a structured output, from that input.”; and 
Fig. 2/3, Para. [0039], lines (4-7): “…, file system 202 provides the unstructured data input 208, along with a Type indicator and an Integer value (collectively referred to as data 212) to statistical parsing engine 204.”; and 
Fig. 4, Para. [0041], lines (2-6): “In the embodiment shown in FIG. 4, the unstructured contact information entered by the user is "Mr. John Doe 123 Main Street, Seattle, Wash. 43678"”, 
the Examiner notes that the reference illustrates in Fig. 4 input data of contact information comprising multiple parts/segments that include a plurality of data source fields);
parsing each of the segments into tokens, wherein each token is data that is contained in a particular source field of the plurality of source fields (PARKINSON Fig. 2, Para. [0042], lines (1-5): “Once statistical parsing engine 204 receives data 212, it tokenizes the data and creates a contact record data structure, because the object type has been set to "Contact". Tokens are illustratively objects and represent individual items in the unstructured textual input.”);  
determining contextual information associated with each token wherein the contextual information comprises one or more features associated with each token, and wherein determining the contextual information comprises determining one or more features of each token based on metrics  (PARKINSON Para. [0042], lines (5-15): “Any known tokenization algorithm can be used to identify tokens in the unstructured input. …, the tokenization algorithm simply breaks the unstructured data input into separate tokens by identifying substrings that are separated by white space (such as a space, a tab, etc.) of course, additional complexity can also be added, in any known manner, to handle parenthetical items, to separate content words from attached punctuation marks, etc. In any case, the tokenization algorithm breaks the input into a vector of tokens.”; and
Fig. 2/3, Para. [0045], lines (1-9): “The vector of tokens is then subjected to featurization by statistical parsing engine 204. Featurization is indicated by block 218 in FIG. 3 and illustratively formulates a feature vector associated with each token, wherein the values in the feature vector indicate the presence or absence of features of interest found in that corresponding token. Of course, the particular features represented by the feature vector will vary widely based on the particular schema type that is to be populated by engine 204.”; and 
Fig. 6, Para. [0046], lines (1-6): “The features shown in FIG. 6 indicate whether the token contains an initial capitalized letter, whether the token comprises all digits or all alpha characters, and whether the token contains a hyphen. Of course, a wide variety of other features could be used as well and those illustrated in FIG. 6 are shown for the sake of example only.”, 
the Examiner notes that the tokenization algorithm, followed by the featurization (i.e., characterization) performed by the statistical parsing engine, which formulates a feature vector associated with each token whose values indicate the presence or absence of features of interest, corresponds to the metrics characterizing relationships between the features of each token and a structure of a corresponding segment);
mapping the tokens in the source fields of the segments to a plurality of target fields of a target schema based at least in part on the contextual information (PARKINSON Para. [0010], lines (1-2): “The present invention uses a machine-learned statistical model to map between unstructured data and structured data”; and 
Fig. 2, Para. [0032], lines (1-8): “Statistical parsing engine 204 receives an unstructured input and generates a structured output, from that input. …, the unstructured input is a text string and the structured output is a populated data schema that is defined by file system 202. In other words, statistical parsing engine 204 receives an unstructured textual input and maps components of the input into a structured data schema.”; and
Fig. 2, Para. [0033], lines (1-11): “In order to train statistical engine 204 to perform this mapping function, data is collected or generated that includes examples of how people represent the type of data on which engine 204 is being trained. For instance, the present example will proceed with respect to engine 204 being trained to map contact information (such as names, addresses, telephone numbers, electronic mail addresses, etc.) to a data schema used by a personal information manager. In that instance, training data is collected or generated that provides examples of how users represent contact information”); and
populating the target fields of the target schema with the tokens from the source fields based at least in part on the mapping, wherein the parsing of the segments into tokens, the determining of the contextual information associated the tokens, and the mapping of the tokens in the source fields (PARKINSON Fig. 2, Para. [0032], lines (1-8): “Statistical parsing engine 204 receives an unstructured input and generates a structured output, from that input. …, the unstructured input is a text string and the structured output is a populated data schema that is defined by file system 202. In other words, statistical parsing engine 204 receives an unstructured textual input and maps components of the input into a structured data schema.”; and
Fig. 2/3, Para. [0045], lines (1-9): “The vector of tokens is then subjected to featurization by statistical parsing engine 204. Featurization is indicated by block 218 in FIG. 3 and illustratively formulates a feature vector associated with each token, wherein the values in the feature vector indicate the presence or absence of features of interest found in that corresponding token. Of course, the particular features represented by the feature vector will vary widely based on the particular schema type that is to be populated by engine 204.”; and
Para. [0048], lines (1-9): “…, during the training phase, the training data is tokenized and feature vectors are associated with each token as well. It will further be appreciated that the featurization algorithm used at runtime to generate the feature vectors for the tokens will illustratively be identical to that used in the training phase. This will produce a more accurate estimate of the probabilities that the various tokens belong to the various portions of the data schema being populated.”; and
Figs. 2, 6, and 7, Para. [0052], lines (1-19): “Once the ordered array of feature vectors 220 is generated, statistical parsing engine 204 maps those feature vectors to slots in the structured data schema being populated. FIG. 7 shows a results lattice 224 which represents each of the tokens shown in FIG. 6 mapped to slots in a data schema. The data schema is represented by the entries in the left-most column of entries. The first entry "F. Name" indicates that the value associated with that slot in the schema in the schema is a first name. The second entry "L. Name" indicates that the values associated with that slot in the schema comprises a last name., ...”; and
Fig. 8, Para. [0061], lines (1-5): “FIG. 8 illustrates a solution 228 for the example discussed herein in greater detail. It can be seen that solution 228 includes the data schema on the left half thereof and the values associated with each slot in the data schema on the right half thereof.”,
the Examiner notes that the referenced system receives unstructured textual input and maps components of the input into a structured data schema wherein the populating takes place as mapping of those feature vectors to slots in the structured data schema).

However, PARKINSON does not explicitly teach determining one or more relationships between a token and other portions of the received input data including determining a position of the token in the received input data.
But, SAM teaches determining one or more relationships between a token and other portions of the received input data including determining a position of the token in the received input data (SAM Para. [0008], lines (5-9): “…, and mapping the key to the attribute based on the additional data includes at least one of (i) determining that the value of the semi-structured data is associated with the location and the location is associated with the attribute, ...”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of PARKINSON (disclosing a method for parsing unstructured data into structured data) to include the teachings of SAM (disclosing methods for processing semi-structured data) and arrive at a method to determine relationships between tokens of the processed input data and the received input data. One of ordinary skill in the art would have been motivated to make this combination because by implementing such techniques for processing 

Regarding claim 9 (Original), the combination of PARKINSON and SAM teaches the limitations of claim 8. Further, PARKINSON teaches wherein the input data is semi-structured data (PARKINSON Fig. 2, Para. [0031], lines (1-4): “FIG. 2 is a block diagram of a system 200 used in accordance with one embodiment of the present invention for receiving an unstructured data input and generating a structured output, from that input.”; and Fig. 2/3, Para. [0039], lines (4-7): “…, file system 202 provides the unstructured data input 208, along with a Type indicator and an Integer value (collectively referred to as data 212) to statistical parsing engine 204.”, 
the Examiner notes that the reference illustrates in Fig. 4 input data of contact information along with field type indicators, which corresponds to the input data being semi-structured).

Regarding claim 10 (Original), the combination of PARKINSON and SAM teaches the limitations of claim 8. Further, PARKINSON teaches wherein the target schema is a structured schema (PARKINSON Fig. 2, Para. [0031], lines (1-4): “FIG. 2 is a block diagram of a system 200 used in accordance with one embodiment of the present invention for receiving an unstructured data input and generating a structured output, from that input.”).

Regarding claim 11 (Previously Presented), the combination of PARKINSON and SAM teaches the limitations of claim 8. Further, PARKINSON teaches wherein the structure of each token comprises a relationship between one or more structural features of each token and at least one other token in a same segment (PARKINSON Fig. 4, Para. [0041], lines (2-6): “In the embodiment shown in FIG. 4, the unstructured contact information entered by the user is "Mr. John Doe 123 Main Street, Seattle, Wash. 43678". The Type indicator illustrates a Contact type data schema, and the Integer value is set to three.”, the Examiner notes that the reference illustrates in Fig. 4 that the input data lines comprise information that is related, i.e., a record-related relationship, of contact information of a person).

Regarding claim 12 (Previously Presented), the combination of PARKINSON and SAM teaches the limitations of claim 8. Further, SAM teaches wherein to map the token in the source fields of the segments to the target fields of a target schema, the at least one processor further performs operations comprising:
comparing each target field to contextual information associated with each token (SAM Para. [0044], lines (8-10): “The values of the semi-structured data may be compared to the entity names in the name catalogs to find a match”); and
matching the token in each source field to one of the target fields of the target schema based at least in part on the comparing of each target field to the contextual information (SAM Para. [0044], lines (10-13): “A value that matches an entity name in a name catalog may be mapped to a cell referenced by the attribute corresponding to the name catalog, and the corresponding key may be mapped to the attribute”).

Regarding claim 13 (Previously Presented), the combination of PARKINSON and SAM teaches the limitations of claim 8. Further, SAM teaches wherein the mapping of the token in the source fields of (SAM Para. [0045], lines (1-9): “The name catalogs may be used to resolve ambiguities associated with mapping values and keys to attributes. For example, a value that is mapped to two or more attributes using each attribute's rules may be compared with values in each attribute's name catalog. If the value mapped to the attributes using the rules matches a value in one attribute's name catalog but not the other attribute's name catalog, the value is more likely associated with the attribute in which the matching value was found in the corresponding name catalog”; and Para. [0052], lines (1-6): “When using more than one type of additional data to resolve ambiguities, the additional data may be associated with an accuracy modifier based on the type of data. For example, language use history matches could be considered 1.2 times more likely to be correct than social media history matches”, the Examiner asserts that the accuracy modifier is used as a likelihood indicator for matching when mapping values and keys to attributes, which corresponds to using confidence values to indicate degrees of matching between the target data and target fields of the instant application).

Regarding claim 14 (Previously Presented), the combination of PARKINSON and SAM teaches the limitations of claim 8. Further, SAM teaches wherein, to map the tokens in the source fields of the segments to the target fields of a target schema, the at least one processor further performs operations comprising:
determining confidence values, wherein the confidence values indicate degrees of matching between each token and one or more target fields (SAM Para. [0052], lines (1-6): “When using more than one type of additional data to resolve ambiguities, the additional data may be associated with an accuracy modifier based on the type of data. For example, language use history matches could be considered 1.2 times more likely to be correct than social media history matches”); and
(SAM Para. [0045], lines (1-9): “The name catalogs may be used to resolve ambiguities associated with mapping values and keys to attributes. For example, a value that is mapped to two or more attributes using each attribute's rules may be compared with values in each attribute's name catalog. If the value mapped to the attributes using the rules matches a value in one attribute's name catalog but not the other attribute's name catalog, the value is more likely associated with the attribute in which the matching value was found in the corresponding name catalog”).
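
For illustration only, the name-catalog comparison quoted from SAM Para. [0045] and the accuracy modifier quoted from Para. [0052] can be sketched as below. This is a minimal sketch and not SAM's implementation: the function name, the catalog contents, the base weight of 1.0 for a catalog match, and the layout of the `evidence` argument are all assumptions made for the example. Only the 1.2 weighting comes from the quoted passage.

```python
# Hypothetical sketch; names and data layout are illustrative, not taken from SAM.

# Accuracy modifier per type of additional data. The 1.2 value follows the
# example quoted from SAM Para. [0052]; the 1.0 baseline is an assumption.
ACCURACY_MODIFIER = {"language_use_history": 1.2, "social_media_history": 1.0}


def resolve_attribute(value, candidates, name_catalogs, evidence=None):
    """Choose one attribute for a value that the rules mapped to several.

    candidates: attribute names the value was mapped to by the rules.
    name_catalogs: dict mapping attribute -> set of known values.
    evidence: dict mapping data type -> attribute it supports (assumed layout).
    Returns (attribute, score), or (None, 0.0) when nothing supports a candidate.
    """
    if not candidates:
        return (None, 0.0)
    scores = {attr: 0.0 for attr in candidates}
    # Name-catalog match (Para. [0045]); assumed base weight of 1.0.
    for attr in candidates:
        if value in name_catalogs.get(attr, set()):
            scores[attr] += 1.0
    # Additional data, weighted by its accuracy modifier (Para. [0052]).
    for data_type, attr in (evidence or {}).items():
        if attr in scores:
            scores[attr] += ACCURACY_MODIFIER.get(data_type, 1.0)
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0.0)
```

In this sketch the returned score plays the role of a likelihood indicator: a higher score means more support for mapping the value to that attribute.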

Regarding claim 15 (Currently Amended), PARKINSON teaches a computer-implemented method for transforming data for a target schema (PARKINSON Title: “System And Method For Parsing Unstructured Data Into Structured Data”), the method comprising:
receiving input data, wherein the input data includes a plurality of segments, and wherein the segments include a plurality of source fields (PARKINSON Fig. 2, Para. [0031], lines (1-4): “FIG. 2 is a block diagram of a system 200 used in accordance with one embodiment of the present invention for receiving an unstructured data input and generating a structured output, from that input.”; and 
Fig. 2/3, Para. [0039], lines (4-7): “…, file system 202 provides the unstructured data input 208, along with a Type indicator and an Integer value (collectively referred to as data 212) to statistical parsing engine 204.”; and 
Fig. 4, Para. [0041], lines (2-6): “In the embodiment shown in FIG. 4, the unstructured contact information entered by the user is "Mr. John Doe 123 Main Street, Seattle, Wash. 43678"”, 
the Examiner notes that the reference illustrates in Fig. 4 input data consisting of contact information in multiple parts/segments that include a plurality of source fields);
parsing each of the segments into tokens, wherein each token is data that is contained in a particular source field of the plurality of source fields (PARKINSON Fig. 2, Para. [0042], lines (1-5): “Once statistical parsing engine 204 receives data 212, it tokenizes the data and creates a contact record data structure, because the object type has been set to "Contact". Tokens are illustratively objects and represent individual items in the unstructured textual input.”);  
determining contextual information associated with each token wherein the contextual information comprises one or more features associated with each token, and wherein determining the contextual information comprises determining one or more features of each token based on metrics (PARKINSON Para. [0042], lines (5-15): “Any known tokenization algorithm can be used to identify tokens in the unstructured input. …, the tokenization algorithm simply breaks the unstructured data input into separate tokens by identifying substrings that are separated by white space (such as a space, a tab, etc.) of course, additional complexity can also be added, in any known manner, to handle parenthetical items, to separate content words from attached punctuation marks, etc. In any case, the tokenization algorithm breaks the input into a vector of tokens.”; and
Fig. 2/3, Para. [0045], lines (1-9): “The vector of tokens is then subjected to featurization by statistical parsing engine 204. Featurization is indicated by block 218 in FIG. 3 and illustratively formulates a feature vector associated with each token, wherein the values in the feature vector indicate the presence or absence of features of interest found in that corresponding token. Of course, the particular features represented by the feature vector will vary widely based on the particular schema type that is to be populated by engine 204.”; and 
Fig. 6, Para. [0046], lines (1-6): “The features shown in FIG. 6 indicate whether the token contains an initial capitalized letter, whether the token comprises all digits or all alpha characters, and whether the token contains a hyphen. Of course, a wide variety of other features could be used as well and those illustrated in FIG. 6 are shown for the sake of example only.”, 
the Examiner notes that the tokenization algorithm, followed by the featurization (i.e., characterization) by the statistical parsing engine, which formulates a feature vector associated with each token wherein the values in the feature vector indicate the presence or absence of features of interest, corresponds to the metrics characterizing relationships between the features of each token and a structure of a corresponding segment);
mapping the tokens  in the source fields of the segments to a plurality of target fields of a target schema based at least in part on the contextual information  (PARKINSON Para. [0010], lines (1-2): “The present invention uses a machine-learned statistical model to map between unstructured data and structured data”; and 
Fig. 2, Para. [0032], lines (1-8): “Statistical parsing engine 204 receives an unstructured input and generates a structured output, from that input. …, the unstructured input is a text string and the structured output is a populated data schema that is defined by file system 202. In other words, statistical parsing engine 204 receives an unstructured textual input and maps components of the input into a structured data schema.”; and
Fig. 2, Para. [0033], lines (1-11): “In order to train statistical engine 204 to perform this mapping function, data is collected or generated that includes examples of how people represent the type of data on which engine 204 is being trained. For instance, the present example will proceed with respect to engine 204 being trained to map contact information (such as names, addresses, telephone numbers, electronic mail addresses, etc.) to a data schema used by a personal information manager. In that instance, training data is collected or generated that provides examples of how users represent contact information”); and
populating the target fields of the target schema with the tokens from the source fields based at least in part on the mapping, wherein the parsing of the segments into tokens, the determining of the contextual information associated with the tokens, and the mapping of the tokens in the source fields (PARKINSON Fig. 2, Para. [0032], lines (1-8): “Statistical parsing engine 204 receives an unstructured input and generates a structured output, from that input. …, the unstructured input is a text string and the structured output is a populated data schema that is defined by file system 202. In other words, statistical parsing engine 204 receives an unstructured textual input and maps components of the input into a structured data schema.”; and
Fig. 2/3, Para. [0045], lines (1-9): “The vector of tokens is then subjected to featurization by statistical parsing engine 204. Featurization is indicated by block 218 in FIG. 3 and illustratively formulates a feature vector associated with each token, wherein the values in the feature vector indicate the presence or absence of features of interest found in that corresponding token. Of course, the particular features represented by the feature vector will vary widely based on the particular schema type that is to be populated by engine 204.”; and
Para. [0048], lines (1-9): “…, during the training phase, the training data is tokenized and feature vectors are associated with each token as well. It will further be appreciated that the featurization algorithm used at runtime to generate the feature vectors for the tokens will illustratively be identical to that used in the training phase. This will produce a more accurate estimate of the probabilities that the various tokens belong to the various portions of the data schema being populated.”; and
Figs. 2, 6, and 7, Para. [0052], lines (1-19): “Once the ordered array of feature vectors 220 is generated, statistical parsing engine 204 maps those feature vectors to slots in the structured data schema being populated. FIG. 7 shows a results lattice 224 which represents each of the tokens shown in FIG. 6 mapped to slots in a data schema. The data schema is represented by the entries in the left-most column of entries. The first entry "F. Name" indicates that the value associated with that slot in the schema in the schema is a first name. The second entry "L. Name" indicates that the values associated with that slot in the schema comprises a last name., ...”; and
Fig. 8, Para. [0061], lines (1-5): “FIG. 8 illustrates a solution 228 for the example discussed herein in greater detail. It can be seen that solution 228 includes the data schema on the left half thereof and the values associated with each slot in the data schema on the right half thereof.”,
the Examiner notes that the referenced system receives unstructured textual input and maps components of the input into a structured data schema wherein the populating takes place as mapping of those feature vectors to slots in the structured data schema).
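
For illustration only, the whitespace tokenization quoted from PARKINSON Para. [0042] and the presence/absence features quoted from Para. [0046] can be sketched as below. This is a minimal sketch under stated assumptions, not PARKINSON's implementation: the function names and the dictionary representation of a feature vector are hypothetical, and the statistical model that maps feature vectors to schema slots is not shown.

```python
# Hypothetical sketch; function names and data layout are not from PARKINSON.

def tokenize(text):
    """Break the input into tokens at runs of white space (spaces, tabs, etc.)."""
    return text.split()


def featurize(token):
    """Return presence/absence features for one token.

    The four features mirror those named in the quoted Para. [0046]:
    initial capital letter, all digits, all alpha characters, contains a hyphen.
    """
    return {
        "initial_cap": token[:1].isupper(),
        "all_digits": token.isdigit(),
        "all_alpha": token.isalpha(),
        "has_hyphen": "-" in token,
    }


def feature_vectors(text):
    """Pair each token, in input order, with its feature vector."""
    return [(token, featurize(token)) for token in tokenize(text)]


if __name__ == "__main__":
    sample = "Mr. John Doe 123 Main Street, Seattle, Wash. 43678"
    for token, features in feature_vectors(sample):
        print(token, features)
```

Because this sketch splits on white space only, attached punctuation stays with its token (for example, "Street," keeps its comma), which the quoted passage identifies as a case that additional tokenization logic could handle.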

However, PARKINSON does not explicitly teach determining one or more relationships between a token and other portions of the received input data including determining a position of the token in the received input data.
But, SAM teaches determining one or more relationships between a token and other portions of the received input data including determining a position of the token in the received input data (SAM Para. [0008], lines (5-9): “…, and mapping the key to the attribute based on the additional data includes at least one of (i) determining that the value of the semi-structured data is associated with the location and the location is associated with the attribute, ...”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of PARKINSON (disclosing a method for parsing unstructured data into structured data) to include the teachings of SAM (disclosing methods for processing semi-structured data) and arrive at a method to determine relationships between tokens of the processed input data and the received input data.  One of ordinary skill in the art would have been motivated to make this combination because by implementing such techniques for processing 

Regarding claim 16 (Original), the combination of PARKINSON and SAM teaches the limitations of claim 15.  Further, PARKINSON teaches wherein the input data is semi-structured data (PARKINSON Fig. 2, Para. [0031], lines (1-4): “FIG. 2 is a block diagram of a system 200 used in accordance with one embodiment of the present invention for receiving an unstructured data input and generating a structured output, from that input.”; and Fig. 2/3, Para. [0039], lines (4-7): “…, file system 202 provides the unstructured data input 208, along with a Type indicator and an Integer value (collectively referred to as data 212) to statistical parsing engine 204.”, 
the Examiner notes that the reference illustrates in Fig. 4 input data of contact information along with field type indicators, which corresponds to the input data being semi-structured).

Regarding claim 17 (Original), the combination of PARKINSON and SAM teaches the limitations of claim 15.  Further, PARKINSON teaches wherein the target schema is a structured schema (PARKINSON Fig. 2, Para. [0031], lines (1-4): “FIG. 2 is a block diagram of a system 200 used in accordance with one embodiment of the present invention for receiving an unstructured data input and generating a structured output, from that input.”).

Regarding claim 18 (Previously Presented), the combination of PARKINSON and SAM teaches the limitations of claim 15.  Further, PARKINSON teaches wherein the structure of each token comprises a relationship between one or more structural features of each token and at least one other token in a same segment (PARKINSON Fig. 4, Para. [0041], lines (2-6): “In the embodiment shown in FIG. 4, the unstructured contact information entered by the user is "Mr. John Doe 123 Main Street, Seattle, Wash. 43678". The Type indicator illustrates a Contact type data schema, and the Integer value is set to three.”, the Examiner notes that the reference illustrates in Fig. 4 that the input data comprises related information, i.e., a record-level relationship among the items of contact information of a person).

Regarding claim 19 (Previously Presented), the combination of PARKINSON and SAM teaches the limitations of claim 15.  Further, SAM teaches wherein, to map the token in the source fields of the segments to the target fields of a target schema, the at least one processor further performs operations comprising:
comparing each target field to contextual information associated with each token (SAM Para. [0044], lines (8-10): “The values of the semi-structured data may be compared to the entity names in the name catalogs to find a match”); and
matching the token in each source field to one of the target fields of the target schema based at least in part on the comparing of each target field to the contextual information (SAM Para. [0044], lines (10-13): “A value that matches an entity name in a name catalog may be mapped to a cell referenced by the attribute corresponding to the name catalog, and the corresponding key may be mapped to the attribute”).

Regarding claim 20 (Previously Presented), the combination of PARKINSON and SAM teaches the limitations of claim 15.  Further, SAM teaches wherein the mapping of the token in the source fields of (SAM Para. [0045], lines (1-9): “The name catalogs may be used to resolve ambiguities associated with mapping values and keys to attributes. For example, a value that is mapped to two or more attributes using each attribute's rules may be compared with values in each attribute's name catalog. If the value mapped to the attributes using the rules matches a value in one attribute's name catalog but not the other attribute's name catalog, the value is more likely associated with the attribute in which the matching value was found in the corresponding name catalog”; and Para. [0052], lines (1-6): “When using more than one type of additional data to resolve ambiguities, the additional data may be associated with an accuracy modifier based on the type of data. For example, language use history matches could be considered 1.2 times more likely to be correct than social media history matches”, the Examiner asserts that the accuracy modifier is used as a likelihood indicator for matching when mapping values and keys to attributes, which corresponds to using confidence values to indicate degrees of matching between the target data and target fields of the instant application).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication (US 2006/0053133 A1) issued to Parkinson (hereinafter “PARKINSON”), in view of US Patent Application Publication (US 2014/0280352 A1) issued to Sam et al. (hereinafter “SAM”), and further in view of US Patent Application Publication (US 2008/0021912 A1) issued to Seligman et al. (hereinafter “SELIGMAN”).
Regarding claim 6 (Previously Presented), the combination of PARKINSON and SAM teaches the limitations of claim 1.  However, the combination of PARKINSON and SAM does not explicitly teach wherein the mapping of the tokens in the source fields of the segments to the target fields of a target schema is based at least in part on confidence values, and wherein confidence values that meet one or 
But SELIGMAN teaches wherein the mapping of the tokens in the source fields of the segments to the target fields of a target schema is based at least in part on confidence values, and wherein confidence values that meet one or more confidence value thresholds indicate degrees of matching between each token and one or more target fields (SELIGMAN  Fig. 4, Para. [0038], lines (1-8): “The final match matrix 432 is presented to the user as a collection of lines connecting the source elements to the target elements within the GUI 406. The GUI includes a number filters that limit which potential matches are shown onscreen. For example, one filter hides any potential match whose confidence score falls below some threshold value. Another filter displays only those potential matches pertaining to a given subset of the schema graph.”, 
the Examiner asserts that connecting the source elements to the target elements based on filters that hide any potential match whose confidence score falls below some threshold value corresponds to using confidence values that meet one or more confidence value thresholds to indicate degrees of matching).
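
For illustration only, the threshold filter quoted from SELIGMAN Para. [0038] can be sketched as below. This is a minimal sketch, not SELIGMAN's implementation: the function name, the dictionary representation of the match matrix, and the choice to keep scores equal to the threshold are assumptions made for the example.

```python
# Hypothetical sketch; the function name and data layout are not from SELIGMAN.

def visible_matches(match_scores, threshold):
    """Hide potential matches whose confidence score falls below the threshold.

    match_scores: dict mapping (source_element, target_element) -> confidence.
    Returns only the pairs whose score is at or above the threshold.
    """
    return {
        pair: score
        for pair, score in match_scores.items()
        if score >= threshold
    }


if __name__ == "__main__":
    scores = {
        ("fname", "F. Name"): 0.92,
        ("fname", "L. Name"): 0.31,
        ("zip", "Postal Code"): 0.75,
    }
    print(visible_matches(scores, 0.5))
```

The element names in the usage example are placeholders chosen for readability and do not appear in the cited reference.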
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of PARKINSON (disclosing a method for parsing unstructured data into structured data) and SAM (disclosing methods for processing semi-structured data) to include the teachings of SELIGMAN (disclosing schema matching techniques) and arrive at a method to map tokens/elements from source fields to target fields based on confidence values.  One of ordinary skill in the art would have been motivated to make this combination because by implementing such techniques for processing semi-structured data, a user can improve the accuracy of schema-matching techniques, thereby achieving quality matches that more accurately reflect the semantic correspondences between the source and target schemas, as recognized by (SELIGMAN, Para. .

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Scott et al.; (US 2007/0073745 A1); “Similarity metric for semantic profiling”.
Goris et al.; (US 2017/0242907 A1); “Techniques for processing unstructured data set”.
Subrahmanyam et al.; (US 2011/0066585 A1); “Extracting information from unstructured data and mapping the information to a structured schema”.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Zuheir Mheir whose telephone number is (571)272-4151.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Vital, can be reached on (571)272-4215.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
08/10/2021

/ZUHEIR A MHEIR/Patent Examiner, Art Unit 2162      

/PIERRE M VITAL/Supervisory Patent Examiner, Art Unit 2162