DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 5, 9, 12, 20 are objected to because of the following informalities:
"wherein determining" should be "wherein the determining" [Claim 5, line 1]. That is, the claims only include a singular determining step which these limitations intend to further limit;
"wherein generating" should be "wherein the generating" [Claim 9, line 1]. That is, the claims only include a singular “generating the confidence score” step which these limitations intend to further limit;
"wherein generating" should be "wherein the generating" [Claim 12, line 1]. That is, the claims only include a singular “generating a trained” step which these limitations intend to further limit;
"wherein generating" should be "wherein the program instructions programmed to generate " [Claim 20, line 1]. That is, the claims only include a singular “program instructions programmed to generate a confidence score” step which these limitations intend to further limit.
Appropriate correction is required. Further, in an effort to practice compact prosecution, each of these limitations has been interpreted similarly as in the provided recommendation for each limitation, above.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-6, 8-9, 14-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claim 1 recites a computer-implemented method comprising: obtaining source schema metadata, wherein the source schema metadata is associated with fields of a source schema; obtaining target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema; determining, for each field of the source schema and each field of the target schema, a representation for each field based, at least in part, on the source schema metadata or the target schema metadata associated with each field; and providing the representation for each field of the source schema and each field of the target schema for use in generating data mappings between the source schema and the target schema.
The limitation of determining, for each field of the source schema and each field of the target schema, a representation for each field based, at least in part, on the source schema metadata or the target schema metadata associated with each field, as drafted, is a process that, under its broadest reasonable interpretation, covers a mental process but for the recitation of implementing it on generic computer components. That is, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, “determining” in the context of this claim encompasses the user observing/analyzing field data and judging representations of fields based on observed/analyzed source or target metadata about each field. The examiner has interpreted “a representation”, under its broadest reasonable interpretation and in light of [0047] of the specification, to be at least any field or identifier of a field. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, claim 1 recites an abstract idea (Step 2A, Prong 1).
This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of – a computer-implemented method comprising: obtaining source schema metadata, wherein the source schema metadata is associated with fields of a source schema; obtaining target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema; and providing the representation for each field of the source schema and each field of the target schema for use in generating data mappings between the source schema and the target schema. The computer is recited at a high level of generality (i.e., as generic computer devices performing generic computer functions) and does not meaningfully limit the claim. The additional elements of obtaining source schema metadata, wherein the source schema metadata is associated with fields of a source schema; obtaining target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema; and providing the representation for each field of the source schema and each field of the target schema for use in generating data mappings between the source schema and the target schema represent insignificant extra-solution activities to the judicial exception. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (Step 2A, Prong 2).
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of obtaining source schema metadata, wherein the source schema metadata is associated with fields of a source schema; obtaining target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema; and providing the representation for each field of the source schema and each field of the target schema for use in generating data mappings between the source schema and the target schema represent insignificant extra-solution activities that are well-understood, routine, and conventional activities previously known to the industry. That is, these limitations represent well-understood, routine, conventional activities in the fields of data processing and/or data storage and retrieval and are merely directed to the well-understood, routine, conventional activity of storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015). Therefore, these additional elements do not cause the claim to amount to significantly more than the judicial exception (Step 2B). Accordingly, claim 1 is not patent eligible.
Independent claim 14 recites a computer program product comprising a computer readable storage medium having stored thereon: program instructions programmed to obtain source schema metadata, wherein the source schema metadata is associated with fields of a source schema; program instructions programmed to obtain target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema; program instructions programmed to determine, for each field of the source schema and each field of the target schema, a representation for each field based, at least in part, on the source schema metadata or the target schema metadata associated with each field; and program instructions programmed to provide the representation for each field of the source schema and each field of the target schema for use in generating data mappings between the source schema and the target schema.
The limitation of … determine, for each field of the source schema and each field of the target schema, a representation for each field based, at least in part, on the source schema metadata or the target schema metadata associated with each field, as drafted, is a process that, under its broadest reasonable interpretation, covers a mental process but for the recitation of implementing it on generic computer components. That is, but for the “program instructions programmed to” language, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, “…determine” in the context of this claim encompasses the user observing/analyzing field data and judging representations of fields based on observed/analyzed source or target metadata about each field. The examiner has interpreted “a representation”, under its broadest reasonable interpretation and in light of [0047] of the specification, to be at least any field or identifier of a field. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, claim 14 recites an abstract idea (Step 2A, Prong 1).
This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of – a computer program product comprising a computer readable storage medium having stored thereon: program instructions programmed to obtain source schema metadata, wherein the source schema metadata is associated with fields of a source schema; program instructions programmed to obtain target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema; and program instructions programmed to provide the representation for each field of the source schema and each field of the target schema for use in generating data mappings between the source schema and the target schema. The computer program product and computer readable storage medium having program instructions are recited at a high level of generality (i.e., as generic computer devices performing generic computer functions) and do not meaningfully limit the claim. The additional elements of program instructions programmed to obtain source schema metadata, wherein the source schema metadata is associated with fields of a source schema; program instructions programmed to obtain target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema; and program instructions programmed to provide the representation for each field of the source schema and each field of the target schema for use in generating data mappings between the source schema and the target schema represent insignificant extra-solution activities to the judicial exception. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (Step 2A, Prong 2).
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of program instructions programmed to obtain source schema metadata, wherein the source schema metadata is associated with fields of a source schema; program instructions programmed to obtain target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema; and program instructions programmed to provide the representation for each field of the source schema and each field of the target schema for use in generating data mappings between the source schema and the target schema represent insignificant extra-solution activities that are well-understood, routine, and conventional activities previously known to the industry. That is, these limitations represent well-understood, routine, conventional activities in the fields of data processing and/or data storage and retrieval and are merely directed to the well-understood, routine, conventional activity of storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015). Therefore, these additional elements do not cause the claim to amount to significantly more than the judicial exception (Step 2B). Accordingly, claim 14 is not patent eligible.
Claims 2-6, 8-9, 15-20 depend on claims 1, 14 and include all the limitations of these claims. Therefore, claims 2-6, 8-9, 15-20 recite the same abstract idea and the analysis must proceed to Step 2A, Prong 2.
Claims 2, 15 similarly recite additional limitations of wherein the representation for each field of the source schema and each field of the target schema comprises a fixed size vector embedding that describes a combination of metadata associated with a particular field of the source schema or the target schema. This judicial exception is not integrated into a practical application. The additional elements represent further mental process steps of judging a representation, such as a fixed size vector, for fields of source and target schema based on the source or target metadata. That is, one could observe and analyze the data, judge components for the vector based on the analysis of the data, and perhaps, write down the data in vector form on a piece of paper.  If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. This additional step is considered an abstract idea (mental process step) and does not integrate the judicial exception into a practical application. 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element represents a further mental process step. Therefore, these additional limitations are not sufficient to amount to significantly more than the judicial exception. Claims 2, 15 are not patent eligible.
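For illustration only, the "fixed size vector embedding that describes a combination of metadata" recited in claims 2, 15 might be sketched as follows. This is a hypothetical feature-hashing scheme chosen for brevity; it is not asserted to be the applicant's disclosed technique or the attribute vector of Sassin. The function name embed_field, the metadata keys, and the vector size of eight are all assumptions.

```python
import hashlib


def embed_field(metadata: dict[str, str], size: int = 8) -> list[float]:
    """Combine all metadata of one field into a vector of fixed length `size`.

    Assumption: simple feature hashing; each token increments one bucket.
    """
    vec = [0.0] * size
    for key, value in sorted(metadata.items()):
        for token in f"{key} {value}".lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[digest[0] % size] += 1.0  # first hash byte selects the bucket
    return vec


# Hypothetical metadata for a single schema field.
field_vec = embed_field({"name": "cust_id", "type": "integer"})
```

Whatever the amount of metadata supplied, the resulting vector has the same length, which is the property the claim language describes.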
Claims 3, 16 similarly recite additional limitations pertaining to calculating field-to-field matches between the source schema and the target schema using the representations and generating data mapping suggestions between fields of the source schema and the target schema. This judicial exception is not integrated into a practical application. The additional elements represent further mental process steps of judging matches between representations of the data and also judging suggestions for mapping of fields based on the matching judgments. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. This additional step is considered an abstract idea (mental process step) and does not integrate the judicial exception into a practical application. 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements represent further mental process steps. Therefore, these additional limitations are not sufficient to amount to significantly more than the judicial exception. Claims 3, 16 are not patent eligible.
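For illustration only, the field-to-field matching and mapping suggestions recited in claims 3, 16 might be sketched as follows. The sketch assumes each field already has a numeric vector representation and scores pairs with a distance-based similarity; the function names, the example field names, and the choice of scoring function are hypothetical and are not taken from the application or from Sassin.

```python
import math


def similarity(a, b):
    """Map Euclidean distance to a score in (0, 1]; 1.0 means identical vectors."""
    return 1.0 / (1.0 + math.dist(a, b))


def suggest_mappings(source, target):
    """For each target field, suggest the source field with the closest vector."""
    suggestions = {}
    for t_name, t_vec in target.items():
        scores = {s_name: similarity(s_vec, t_vec) for s_name, s_vec in source.items()}
        best = max(scores, key=scores.get)
        suggestions[t_name] = (best, scores[best])
    return suggestions


# Hypothetical representations keyed by field name.
source = {"cust_id": [1.0, 0.0, 0.0], "order_dt": [0.0, 1.0, 0.0]}
target = {"customer_id": [0.9, 0.1, 0.0], "order_date": [0.0, 0.8, 0.2]}
result = suggest_mappings(source, target)
```

Each entry of result pairs a target field with its best-matching source field and the score that produced the match.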
Claim 4 recites an additional limitation of wherein the representations are determined for fields of a dynamic object schema. This judicial exception is not integrated into a practical application. The additional element represents a further mental process step of judging a representation for fields of a dynamic object schema. That is, one could observe and analyze the data, judge representations based on the analysis of the data, and perhaps, write them down on a piece of paper. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. This additional step is considered an abstract idea (mental process step) and does not integrate the judicial exception into a practical application.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element represents a further mental process step. Therefore, these additional limitations are not sufficient to amount to significantly more than the judicial exception. Claim 4 is not patent eligible.
Claims 5, 17 similarly recite the additional limitations pertaining to determining the representation for each field of the source schema and each field of the target schema comprises: providing the source schema metadata or the target schema metadata associated with each field individually as input to a machine learning model, and receiving, for each field as output from the machine learning model, a vector representation. The “determining” recites an abstract idea (mental process step) that involves a mental judgment by the user, as aforementioned. However, the additional limitations pertaining to the “…providing” and “receiving” do not integrate the abstract idea into a practical application and merely represent insignificant extra-solution activities to the judicial exception and are mere data gathering steps.  Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements pertaining to the “…providing” and “receiving” represent well-understood, routine, conventional activity previously known to the industry. That is, these limitations represent well-understood, routine, conventional activity in the fields of data processing and/or data storage and retrieval and are merely directed to the well-understood, routine, conventional activity of storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015). Therefore, these additional elements do not cause the claim to amount to significantly more than the judicial exception.
Claims 6, 18 similarly recite additional limitations pertaining to the machine learning model being trained to identify relevant information from multiple metadata columns associated with a schema field to generate a single vector representation for the schema field. This judicial exception is not integrated into a practical application. The additional element of “…identify relevant information from multiple metadata columns associated with a schema field to generate a single vector representation for the schema field” represents a further mental process step of judging relevant information from multiple metadata columns to generate a vector representation. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. This additional step is considered an abstract idea (mental process step) and does not integrate the judicial exception into a practical application. The claims also recite “wherein the machine learning model is trained to…”. This portion of the limitation does not integrate the abstract idea into a practical application because it generally links the use of a judicial exception to a particular technological environment or field of use. That is, “wherein the machine learning model is trained to…” links the abstract idea (mental process steps) to the field of machine learning and artificial intelligence. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “…identify relevant information from multiple metadata columns associated with a schema field to generate a single vector representation for the schema field” represents a further mental process step, and the additional element of “wherein the machine learning model is trained to…” generally links the use of a judicial exception to a particular technological environment or field of use. Therefore, these additional limitations are not sufficient to amount to significantly more than the judicial exception. Claims 6, 18 are not patent eligible.
Claims 8, 19 similarly recite additional limitations pertaining to generating a confidence score for each field of the target schema as compared to each field of the source schema, and generating data mapping suggestions between the source schema and the target schema for the fields of the target schema based, at least in part, on the confidence scores. This judicial exception is not integrated into a practical application. The additional elements represent further mental process steps of judging confidence scores based on comparing representations of the fields of the schemas, and further judging a mapping of suggestions between the schemas for the fields based on the scores. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. This additional step is considered an abstract idea (mental process step) and does not integrate the judicial exception into a practical application. 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements represent further mental process steps. Therefore, these additional limitations are not sufficient to amount to significantly more than the judicial exception. Claims 8, 19 are not patent eligible.
Claims 9, 20 similarly recite additional limitations of generating the confidence score for each field of the target schema comprises calculating an overall confidence score between two fields using cosine similarity. This judicial exception is not integrated into a practical application. These additional elements merely recite mathematical calculations or relationships. If a claim limitation, under its broadest reasonable interpretation, covers mathematical calculations or relationships, then it falls within the “mathematical concepts” grouping of abstract ideas. This additional step is considered an abstract idea (mathematical concept) and does not integrate the judicial exception into a practical application. 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements represent mathematical concepts. Therefore, these additional limitations are not sufficient to amount to significantly more than the judicial exception. Claims 9, 20 are not patent eligible.
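For illustration only, the cosine similarity calculation referenced in claims 9, 20 is the standard formula cos(a, b) = (a · b) / (|a| |b|), and might be sketched as follows. The function name and the handling of zero vectors are assumptions made for this sketch and are not drawn from the application.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as having no similarity
    return dot / (norm_a * norm_b)


# Hypothetical field vectors; parallel vectors score 1, orthogonal vectors score 0.
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

The score depends only on the direction of the two vectors and not on their magnitudes, so a value near 1 could serve as a high confidence score between two fields.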

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.	

Claims 1-3, 14-16 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sassin (US 2019/0318272).	
Regarding claim 1, Sassin discloses:
A computer-implemented method comprising: obtaining source schema metadata, wherein the source schema metadata is associated with fields of a source schema at least by ([0021] “feature extraction 106 is performed on source schema 102 and target schema 104. For example, feature extraction 106 can be used to provide machine learning component 110 the fine grain detail that enables ETL rules/mappings generation. In turn, learning component 110 can analyze at this fine granularity to predict the schema specific rules/mappings for source schema 102 and target schema 104. The features extracted for each schema can include table names, column names, table type, foreign keys between tables, self-referential tables, and a number of other features.” [0038]-[0039] disclose features (source schema metadata) that are extracted by the feature extraction module from source schema 102, as shown in Fig. 1, such as at least table or column names (fields of a source schema));
obtaining target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema at least by ([0021] “feature extraction 106 is performed on source schema 102 and target schema 104. For example, feature extraction 106 can be used to provide machine learning component 110 the fine grain detail that enables ETL rules/mappings generation. In turn, learning component 110 can analyze at this fine granularity to predict the schema specific rules/mappings for source schema 102 and target schema 104. The features extracted for each schema can include table names, column names, table type, foreign keys between tables, self-referential tables, and a number of other features.” [0038]-[0039] disclose features (target schema metadata) that are extracted by the feature extraction module from target schema 104, as shown in Fig. 1, such as at least table or column names (fields of a target schema));
determining, for each field of the source schema and each field of the target schema, a representation for each field based, at least in part, on the source schema metadata or the target schema metadata associated with each field at least by ([0169] “FIG. 8 illustrates a sample attribute vector according to an example embodiment. For example, attribute vector 800 can include source schema, target schema, and extracted features data relevant to the considered use case. Columns 802 depict source column attributes while columns 804 depict column target attributes. Features 806 depict extracted features from the source schema, such as a pre-fix and a post-fix for the relevant column names”) and the representation for each field is at least a portion, or field, of the attribute vector, as shown in Fig. 8, which includes source schema, target schema, and the extracted features (source schema metadata and target schema metadata);
and providing the representation for each field of the source schema and each field of the target schema for use in generating data mappings between the source schema and the target schema at least by ([0022] “Embodiments also provide the features extracted by feature extraction 106 to rules interpreter 114. For example, rules interpreter 114 can leverage the extracted features to interpret the predicted rules to generate output ETL mappings 116” [0169] “Mapping expressions 808 depict the mapping expression to be learned, for example by the learning component.” [0170] “In an embodiment, a plurality of example mapping expressions can be provided to the learning component in the use case.”) and the mapping expressions, as specified in the attribute vector 800, are provided to the learning component as an example for use in generating mappings.
As per claim 2, claim 1 is incorporated, Sassin further discloses:
wherein the representation for each field of the source schema and each field of the target schema comprises a fixed size vector embedding that describes a combination of metadata associated with a particular field of the source schema or the target schema at least by ([0169] “FIG. 8 illustrates a sample attribute vector according to an example embodiment. For example, attribute vector 800 can include source schema, target schema, and extracted features data relevant to the considered use case. Columns 802 depict source column attributes while columns 804 depict column target attributes. Features 806 depict extracted features from the source schema, such as a pre-fix and a post-fix for the relevant column names. Mapping expressions 808 depict the mapping expression to be learned, for example by the learning component.”) and Fig. 8 shows that the vector is of fixed size, that is, each vector contains eight fields in the example.
As per claim 3, claim 1 is incorporated, Sassin further discloses:
further comprising: calculating field-to-field matches between the source schema and the target schema using the representations for the fields of the source schema and the target schema, wherein the representations are generated based on multiple metadata columns associated with a schema field at least by ([0164] “FIG. 7 illustrates mapping expressions according to an embodiment. For example, structure 700 includes mappings expressions 702, 704, 706, and 708. Mappings expressions can map a source data (e.g., from a data table) to a target attribute and data table. In some instances, a source column can be mapped to a target column with a similar name, as illustrated by mapping expression 704. In some instances, a system function, such as sysdata, can be used in a mapping expression, as illustrated in mapping expression 708” [0185] “At 908, the predicted ETL rules can be interpreted. For example, a rules interpreter can interpret the logic defined in the ETL rules to generate mapping expressions between the source schema and target schema. The interpretation can be based on the source schema, target schema, and extracted features. In some embodiments, an ETL specification is implemented that defines details of the ETL solution, and the interpretation can be based on the ETL specification”);
and generating data mapping suggestions between fields of the source schema and the target schema based on the field-to-field matches at least by ([0021] “For example, feature extraction 106 can be used to provide machine learning component 110 the fine grain detail that enables ETL rules/mappings generation. In turn, learning component 110 can analyze at this fine granularity to predict the schema specific rules/mappings for source schema 102 and target schema 104. The features extracted for each schema can include table names, column names, table type, foreign keys between tables, self-referential tables, and a number of other features. These extracted features can express generic, commonly used structures or patterns in the schemas, can enhance the schema information, and can simplify the work for machine learning component 110.” [0186] “At 908, additional ETL mappings can be generated based on the predicted ETL rules, the source schema, the target schema, and the extracted features, the additional ETL mappings providing additional definitions for extracting data from one or more tables of the source schema and loading the extracted data into one or more tables of the target schema. For example, the rules interpreter can be used to generate the additional ETL mappings. In some embodiments, the additional ETL mappings can be mapping expressions that define a relationship between one or more source columns of a source table and a target column of a target table”) and the data mapping suggestions are the predicted or additional mappings; Fig. 9 shows the generating of predicted ETL mappings (data mapping suggestions) in step 910.
Regarding claim 14, Sassin discloses:
A computer program product comprising a computer readable storage medium having stored thereon: program instructions programmed to obtain source schema metadata, wherein the source schema metadata is associated with fields of a source schema at least by ([0021] “feature extraction 106 is performed on source schema 102 and target schema 104. For example, feature extraction 106 can be used to provide machine learning component 110 the fine grain detail that enables ETL rules/mappings generation. In turn, learning component 110 can analyze at this fine granularity to predict the schema specific rules/mappings for source schema 102 and target schema 104. The features extracted for each schema can include table names, column names, table type, foreign keys between tables, self-referential tables, and a number of other features.” [0027] “FIG. 2 is a block diagram of a computer server/system 210 in accordance with embodiments.” [0030] “System 210 may include memory 214 for storing information and instructions for execution by processor 222.” [0038]-[0039] disclose features (source schema metadata) that are extracted by the feature extraction module from source schema 102, as shown in Fig. 1, such as at least table or column names (fields of a source schema));
program instructions programmed to obtain target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema at least by ([0021] “feature extraction 106 is performed on source schema 102 and target schema 104. For example, feature extraction 106 can be used to provide machine learning component 110 the fine grain detail that enables ETL rules/mappings generation. In turn, learning component 110 can analyze at this fine granularity to predict the schema specific rules/mappings for source schema 102 and target schema 104. The features extracted for each schema can include table names, column names, table type, foreign keys between tables, self-referential tables, and a number of other features.” [0030] “System 210 may include memory 214 for storing information and instructions for execution by processor 222.” [0038]-[0039] disclose features (target schema metadata) that are extracted by the feature extraction module from target schema 104, as shown in Fig. 1, such as at least table or column names (fields of a target schema));
program instructions programmed to determine, for each field of the source schema and each field of the target schema, a representation for each field based, at least in part, on the source schema metadata or the target schema metadata associated with each field at least by ([0030] “System 210 may include memory 214 for storing information and instructions for execution by processor 222.” [0169] “FIG. 8 illustrates a sample attribute vector according to an example embodiment. For example, attribute vector 800 can include source schema, target schema, and extracted features data relevant to the considered use case. Columns 802 depict source column attributes while columns 804 depict column target attributes. Features 806 depict extracted features from the source schema, such as a pre-fix and a post-fix for the relevant column names”) and the representation for each field is at least a portion, or field, of the attribute vector, as shown in Fig. 8, which includes the source schema, the target schema, and the extracted features (source schema metadata and target schema metadata);
and program instructions programmed to provide the representation for each field of the source schema and each field of the target schema for use in generating data mappings between the source schema and the target schema at least by ([0022] “Embodiments also provide the features extracted by feature extraction 106 to rules interpreter 114. For example, rules interpreter 114 can leverage the extracted features to interpret the predicted rules to generate output ETL mappings 116” [0030] “System 210 may include memory 214 for storing information and instructions for execution by processor 222.” [0169] “Mapping expressions 808 depict the mapping expression to be learned, for example by the learning component.” [0170] “In an embodiment, a plurality of example mapping expressions can be provided to the learning component in the use case.”) and the mapping expressions, as specified in the attribute vector 800, are provided to the learning component as an example for use in generating mappings.
As per claim 15, claim 14 is incorporated, Sassin further discloses:
wherein the representation for each field of the source schema and each field of the target schema comprises a fixed size vector embedding that describes a combination of metadata associated with a particular field of the source schema or the target schema at least by ([0030] “System 210 may include memory 214 for storing information and instructions for execution by processor 222.” [0169] “FIG. 8 illustrates a sample attribute vector according to an example embodiment. For example, attribute vector 800 can include source schema, target schema, and extracted features data relevant to the considered use case. Columns 802 depict source column attributes while columns 804 depict column target attributes. Features 806 depict extracted features from the source schema, such as a pre-fix and a post-fix for the relevant column names. Mapping expressions 808 depict the mapping expression to be learned, for example by the learning component.”) and Fig. 8 shows that the vector is of fixed size, that is, each vector contains eight fields in the example.
As per claim 16, claim 14 is incorporated, Sassin further discloses:
the computer readable storage medium having further stored thereon: program instructions programmed to calculate field-to-field matches between the source schema and the target schema using the representations for the fields of the source schema and the target schema, wherein the representations are generated based on multiple metadata columns associated with a schema field; and program instructions programmed to generate data mapping suggestions between fields of the source schema and the target schema based on the field-to-field matches at least by ([0030] “System 210 may include memory 214 for storing information and instructions for execution by processor 222.” [0164] “FIG. 7 illustrates mapping expressions according to an embodiment. For example, structure 700 includes mappings expressions 702, 704, 706, and 708. Mappings expressions can map a source data (e.g., from a data table) to a target attribute and data table. In some instances, a source column can be mapped to a target column with a similar name, as illustrated by mapping expression 704. In some instances, a system function, such as sysdata, can be used in a mapping expression, as illustrated in mapping expression 708” [0185] “At 908, the predicted ETL rules can be interpreted. For example, a rules interpreter can interpret the logic defined in the ETL rules to generate mapping expressions between the source schema and target schema. The interpretation can be based on the source schema, target schema, and extracted features. In some embodiments, an ETL specification is implemented that defines details of the ETL solution, and the interpretation can be based on the ETL specification”);
and generating data mapping suggestions between fields of the source schema and the target schema based on the field-to-field matches at least by ([0021] “For example, feature extraction 106 can be used to provide machine learning component 110 the fine grain detail that enables ETL rules/mappings generation. In turn, learning component 110 can analyze at this fine granularity to predict the schema specific rules/mappings for source schema 102 and target schema 104. The features extracted for each schema can include table names, column names, table type, foreign keys between tables, self-referential tables, and a number of other features. These extracted features can express generic, commonly used structures or patterns in the schemas, can enhance the schema information, and can simplify the work for machine learning component 110.” [0186] “At 908, additional ETL mappings can be generated based on the predicted ETL rules, the source schema, the target schema, and the extracted features, the additional ETL mappings providing additional definitions for extracting data from one or more tables of the source schema and loading the extracted data into one or more tables of the target schema. For example, the rules interpreter can be used to generate the additional ETL mappings. In some embodiments, the additional ETL mappings can be mapping expressions that define a relationship between one or more source columns of a source table and a target column of a target table”) and the data mapping suggestions are the predicted or additional mappings; Fig. 9 shows the generating of predicted ETL mappings (data mapping suggestions) in step 910.

Claims 10-11, 13 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Xian (US 2021/0232908).
Regarding claim 10, Xian discloses:
A computer-implemented method comprising: obtaining a first training dataset, wherein the first training dataset is a sentence related dataset; generating a first stage machine learning model based on the first training dataset, wherein the first stage machine learning model is trained to encode a sentence included in a dataset into a vector representation at least by ([0089] “FIG. 5 illustrates the dynamic schema determination system 106 training neural network encoder models to determine schema labels for columns. As shown in FIG. 5, the dynamic schema determination system 106 utilizes training data 502 and ground truth data 516 (e.g., from correctly labeled schema-column pairs) to train neural network encoder models 504 (which include a header neural network encoder 506 and a cell neural network encoder 508). In particular, as shown in FIG. 5, the dynamic schema determination system 106 provides the training data 502 (e.g., training columns and candidate schema labels) to the neural network encoder models 504 to generate column vector embeddings and schema vector embeddings (in accordance with one or more embodiments).” [0072]-[0087] disclose, in detail, the generating of column vector embeddings based on received columns which contain words or terms) and Fig. 5 shows a first stage in training of the neural network encoder models which generates column vector embeddings based on received training columns (first training dataset); Examiner notes that, under its BRI and in light of the specification, “sentence” is being interpreted as referring to at least a singular word or term. That is, the specification is silent regarding any explicit, or even suggested, definitions of a sentence;
obtaining a second training dataset, wherein the second training dataset includes schema metadata from various schema objects at least by ([0089] “FIG. 5 illustrates the dynamic schema determination system 106 training neural network encoder models to determine schema labels for columns. As shown in FIG. 5, the dynamic schema determination system 106 utilizes training data 502 and ground truth data 516 (e.g., from correctly labeled schema-column pairs) to train neural network encoder models 504 (which include a header neural network encoder 506 and a cell neural network encoder 508). In particular, as shown in FIG. 5, the dynamic schema determination system 106 provides the training data 502 (e.g., training columns and candidate schema labels) to the neural network encoder models 504 to generate column vector embeddings and schema vector embeddings (in accordance with one or more embodiments).”) and Fig. 5 shows the training of the neural network encoder models by obtaining schema vector embeddings generated based on received candidate schema labels (second training dataset); further the schema labels are the schema metadata from various schema objects;
generating a trained machine learning model based on the second training dataset and the first stage machine learning model at least by ([0090] “Using the cosine similarities (from the act 510), the dynamic schema determination system 106 determines similarity scores 512 as shown in FIG. 5. Moreover, as illustrated in FIG. 5, the dynamic schema determination system 106 utilizes the ground truth data 516 with the similarity scores 512 to calculate a ranking loss 514. The ranking loss 514 can describe the accuracy of the neural network encoder models 504 by comparing the similarity scores corresponding to correctly determined schema-column pairs (e.g., from the column vector embeddings and schema vector embeddings) and the similarity scores corresponding to incorrectly determined schema-column pairs. Then, the dynamic schema determination system 106 provides the ranking loss 514 to the neural network encoder models 504 to iteratively optimize parameters of the neural network encoder models 504 and generate updated similarity scores between schema-column pairs determined from the training data 502. Indeed, as mentioned above, the dynamic schema determination system 106 can iteratively determine a ranking loss with updated parameters for the neural network encoder models 504 to minimize the ranking loss 514 (e.g., to train the neural network encoder models 504 to map columns to schema labels).”),
wherein the trained machine learning model is trained to encode multiple metadata columns associated with a schema field into a single vector representation at least by ([0065] “the dynamic schema determination system 106 concatenates the vector embedding for the header label and the vector embedding for the at least one populated column cell to generate a column vector embedding for the column. Indeed, selecting and applying both a header neural network and a cell neural network to a column that includes a header column type and a cell column type is described in detail below (e.g., in relation to FIG. 4C)” [0087] “Then, as shown in FIG. 4C, the dynamic schema determination system 106 generates a column vector embedding in the act 422 by using the header label vector embedding (from the act 418) and the column cell vector embedding (from the act 420). In some embodiments, the dynamic schema determination system 106 can generate the column vector embedding in the act 422 by concatenating the header label vector embedding (from the act 418) and the column cell vector embedding (from the act 420). Specifically, the dynamic schema determination system 106 can concatenate the header label vector embedding (i.e., gsum(hc) or ggru(hc)) and the column cell vector embedding (i.e., (gcnn(xc)).” [0051]-[0058] disclose the identifying of columns within a digital dataset, determining of column input types, and selecting of a neural network encoder model based on column input types) and the columns from multiple different schema fields, as shown at least by Fig. 3A, are the different columns within a digital dataset, which are each of different column input types, and are each inputted into different encoders based on the column input type. Further, Fig. 4C discloses the generating of a single vector embedding 422 based on a concatenation of the header label vector embedding and column cell vector embedding;
and providing the trained machine learning model for use in generating data mapping suggestions between a source schema and a target schema at least by ([0029] “For example, as outlined in greater detail below, the dynamic schema determination system can provide schema mapping user interfaces with suggested schema label elements together with digital data columns of digital datasets.” [0051] “the dynamic schema determination system 106 can generate schema labels for columns regardless of information availability within the columns. For instance, FIGS. 3A and 3B illustrate a flowchart of the dynamic schema determination system 106 determining a schema label for a column regardless of the column input type. Indeed, FIGS. 3A and 3B illustrates a flowchart of the dynamic schema determination system 106 identifying columns, determining column input types, selecting neural network encoder models, applying the neural network encoder models to identified columns to generate column vector embeddings, and comparing the column vector embeddings to schema vector embeddings to determine schema labels for the columns” [0109] “Additionally, as described above, the dynamic schema determination system 106 can determine schema labels for a dataset and display the schema labels, header labels (from the input column), and similarity scores in a graphical user interface. For example, FIG. 7C illustrates the dynamic schema determination system 106 providing determined schema labels, header labels (from the dataset 708), and similarity scores in a graphical user interface 713. In particular, as shown in FIG. 7C, the dynamic schema determination system 106 displays header labels 714, schema labels 716, and similarity scores 718 for each column from the dataset 708. Indeed, as shown in FIG. 7C, the dynamic schema determination system 106 displays the schema labels 716 after determining the schema labels in accordance with one or more embodiments herein. Furthermore, as illustrated in FIG. 7C, the dynamic schema determination system 106 also displays the similarity scores 718 (between the schema-column pairs) after determining the similarity scores in accordance with one or more embodiments herein (e.g., as a confidence score between 0 and 1).”).
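For illustration only, the training signal described in Xian [0090] (similarity scores for correctly determined schema-column pairs compared against those for incorrectly determined pairs) can be sketched as a pairwise ranking loss. The hinge form, the `margin` value, the function names, and the toy two-dimensional vectors below are assumptions made for this sketch; the quoted passage states only that a ranking loss is calculated from cosine-based similarity scores and ground truth data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_ranking_loss(column_vec, correct_label_vec, incorrect_label_vecs, margin=0.1):
    """Hinge-style ranking loss (assumed form, not taken from the reference).

    The loss is zero once the correct schema-column pair outscores every
    incorrect pair by at least `margin`; otherwise each violation adds to it.
    """
    s_pos = cosine_similarity(column_vec, correct_label_vec)
    return sum(
        max(0.0, margin - s_pos + cosine_similarity(column_vec, neg))
        for neg in incorrect_label_vecs
    )


# Hypothetical embeddings: the column matches the first label exactly.
loss_good = pairwise_ranking_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# Same column, but the ground truth label is the orthogonal one.
loss_bad = pairwise_ranking_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this sketch a lower loss corresponds to encoder parameters that rank the ground truth schema label above the incorrect candidates, which is the direction in which the quoted passage describes the parameters being iteratively optimized.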
As per claim 11, claim 10 is incorporated, Xian further discloses:
wherein the trained machine learning model is trained to learn domain specific semantic representations relating to a schema domain at least by ([0034] “As used herein, the term “schema label” refers to a classification, descriptor, label, or identifier. For instance, a schema label can include a descriptor or label that describes a collection of digital data (e.g., a column or other data construct). In particular, the term “schema label” can refer to a classification, descriptor, or identifier that classifies content within a list or set of data (e.g., a semantically closed schema). For example, for a data column comprising a plurality of dates in different cells, the dynamic schema determination system can determine and apply a schema label of “birthdates” to the data column (e.g., as a new classifier or label for the column). In some embodiments, the dynamic schema determination system utilizes a plurality of schema labels in analyzing data, and automatically aligns imported data columns to the corresponding schema labels. A more detailed description of schema labels and corresponding examples are provided below in relation to the illustrative figures” [0089] “FIG. 5 illustrates the dynamic schema determination system 106 training neural network encoder models to determine schema labels for columns.”) and the domain specific semantic representations relating to a schema domain are the schema labels which can refer to schema classifications.
As per claim 13, claim 10 is incorporated, Xian further discloses:
wherein the second training dataset includes schema metadata that describes fields included in a schema at least by ([0034] “As used herein, the term “schema label” refers to a classification, descriptor, label, or identifier. For instance, a schema label can include a descriptor or label that describes a collection of digital data (e.g., a column or other data construct). In particular, the term “schema label” can refer to a classification, descriptor, or identifier that classifies content within a list or set of data (e.g., a semantically closed schema). For example, for a data column comprising a plurality of dates in different cells, the dynamic schema determination system can determine and apply a schema label of “birthdates” to the data column (e.g., as a new classifier or label for the column). In some embodiments, the dynamic schema determination system utilizes a plurality of schema labels in analyzing data, and automatically aligns imported data columns to the corresponding schema labels. A more detailed description of schema labels and corresponding examples are provided below in relation to the illustrative figures” [0089] “FIG. 5 illustrates the dynamic schema determination system 106 training neural network encoder models to determine schema labels for columns. As shown in FIG. 5, the dynamic schema determination system 106 utilizes training data 502 and ground truth data 516 (e.g., from correctly labeled schema-column pairs) to train neural network encoder models 504 (which include a header neural network encoder 506 and a cell neural network encoder 508). In particular, as shown in FIG. 5, the dynamic schema determination system 106 provides the training data 502 (e.g., training columns and candidate schema labels) to the neural network encoder models 504 to generate column vector embeddings and schema vector embeddings (in accordance with one or more embodiments)”).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 4-9, 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sassin (US 2019/0318272) in view of Xian (US 2021/0232908).
As per claim 4, claim 1 is incorporated, Sassin fails to disclose “wherein the representations are determined for fields of a dynamic object schema”
However, Xian teaches the above limitation at least by ([0022] “the dynamic schema determination system can generate schema vector embeddings for candidate schema labels using a header neural network encoder. Subsequently, the dynamic schema determination system can determine a schema label for the column by comparing the column vector embedding to schema vector embeddings (e.g., using cosine similarities)” [0027] “the dynamic schema determination system can determine schema labels for columns regardless of the availability of data within cells of the column (e.g., for any column input type). In addition, as discussed above, the dynamic schema determination system can train neural network encoder models using a pair-wise ranking loss to generate vector embedding of the column and candidate schema labels in the same latent space.”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Xian into the teaching of Sassin because the references similarly disclose data mapping and/or vectorization. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Sassin to further include the dynamic schema determination system as in Xian in order to ultimately “improve efficiency” (Xian, [0028]).
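For illustration only, the comparison described in Xian [0022] (a column vector embedding compared to schema vector embeddings using cosine similarities to determine a schema label) can be sketched as follows. The function names, the candidate labels, and the toy three-dimensional vectors are hypothetical and are not taken from either reference; in Xian the embeddings are produced by trained neural network encoders, which this sketch does not model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_schema_label(column_embedding, schema_embeddings):
    """Return (label, score) for the candidate label most similar to the column."""
    scored = [
        (label, cosine_similarity(column_embedding, vec))
        for label, vec in schema_embeddings.items()
    ]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical candidate schema labels and their embeddings.
candidates = {
    "birthdate": [0.9, 0.1, 0.0],
    "email": [0.0, 0.2, 0.9],
}
label, score = best_schema_label([0.8, 0.2, 0.1], candidates)
```

Here `label` is the candidate whose embedding lies closest in direction to the column embedding, and `score` is the corresponding similarity.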
As per claim 5, claim 1 is incorporated, Sassin further discloses:
wherein determining the representation for each field of the source schema and each field of the target schema comprises: providing the source schema metadata or the target schema metadata associated with each field individually as input to a machine learning model at least by ([0021] “feature extraction 106 is performed on source schema 102 and target schema 104. For example, feature extraction 106 can be used to provide machine learning component 110 the fine grain detail that enables ETL rules/mappings generation. In turn, learning component 110 can analyze at this fine granularity to predict the schema specific rules/mappings for source schema 102 and target schema 104. The features extracted for each schema can include table names, column names, table type, foreign keys between tables, self-referential tables, and a number of other features.” [0038]-[0039] disclose features (source and target schema metadata) that are extracted by the feature extraction module from source schema 102 and target schema 104, as shown in Fig. 1, which can include at least table or column names (fields of a source schema or a target schema)); further, the features are input to the machine learning component (machine learning model);
Sassin fails to disclose “the machine learning model having been trained to generate the representation for each field based on one or more metadata columns associated with each field; and receiving, for each field as output from the machine learning model, a vector representation that describes a combination of the one or more metadata columns associated with each field as the representation for each field”
However, Xian teaches the following limitations, the machine learning model having been trained to generate the representation for each field based on one or more metadata columns associated with each field at least by ([0035] “As used herein, the term “neural network encoder model” (sometimes referred to as “neural network” or “neural network encoder”) refers to a machine learning model that can be tuned (e.g., trained) based on inputs to approximate unknown functions. In particular, the term “neural network encoder model” can refer to a model of interconnected layers that communicate and analyze attributes at varying degrees of abstraction to learn to approximate functions and generate outputs based on a plurality of inputs provided to the model.” [0136] “As illustrated in FIG. 9, the series of acts 900 includes an act 930 of generating a column vector embedding. In particular, the act 930 can include generating a column vector embedding for a column utilizing a selected neural network encoder model.”) and the machine learning model is the neural network encoder model which generates the column vector embeddings for columns (representations for each field);
and receiving, for each field as output from the machine learning model, a vector representation that describes a combination of the one or more metadata columns associated with each field as the representation for each field at least by ([0035] “a neural network encoder model can analyze attributes of a column (e.g., a header and/or populated column cell) and output a vector embedding (or latent vector) for the column in a latent space” [0136] “Moreover, the act 930 can include generating a column vector embedding for a column by: generating a vector embedding for a header label utilizing a header neural network encoder, generating a vector embedding for a populated column cell utilizing a cell neural network encoder, and concatenating the vector embedding for the header label and the vector embedding for the populated column cell. Furthermore, the act 930 can include generating an additional schema vector embedding of an additional schema label utilizing a header neural network encoder”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Xian into the teaching of Sassin because the references similarly disclose data mapping/similarity and/or vectorization. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Sassin to further include the training of the dynamic schema determination system as in Xian in order to ultimately “improve efficiency” (Xian, [0028]).
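For illustration only, the concatenation described in Xian [0136] (a header label embedding and a populated column cell embedding concatenated into one column vector embedding) can be sketched as follows. Both encoder functions below are hypothetical stand-ins based on bucketed character counts; they are not the header neural network encoder or cell neural network encoder of the reference, and the dimension of 4 is an arbitrary assumption.

```python
def encode_header(header_label, dim=4):
    """Hypothetical stand-in for a header encoder: bucketed character counts."""
    vec = [0.0] * dim
    for ch in header_label.lower():
        vec[ord(ch) % dim] += 1.0
    return vec


def encode_cells(cells, dim=4):
    """Hypothetical stand-in for a cell encoder: per-cell character counts, averaged."""
    total = [0.0] * dim
    for cell in cells:
        for ch in str(cell):
            total[ord(ch) % dim] += 1.0
    count = max(len(cells), 1)  # avoid division by zero for an empty column
    return [v / count for v in total]


def column_embedding(header_label, cells):
    """Concatenate the header embedding and the cell embedding into a single vector."""
    return encode_header(header_label) + encode_cells(cells)


embedding = column_embedding("dob", ["1990-01-01", "1985-12-31"])
```

Because each stand-in encoder emits a vector of fixed length, the concatenated result has the same length (8 in this sketch) for every field, regardless of how many cells the column contains.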
As per claim 6, claim 5 is incorporated, Sassin fails to disclose “wherein the machine learning model is trained to identify relevant information from multiple metadata columns associated with a schema field to generate a single vector representation for the schema field”
However, Xian teaches the above limitations at least by ([0065] “the dynamic schema determination system 106 concatenates the vector embedding for the header label and the vector embedding for the at least one populated column cell to generate a column vector embedding for the column. Indeed, selecting and applying both a header neural network and a cell neural network to a column that includes a header column type and a cell column type is described in detail below (e.g., in relation to FIG. 4C)” [0087] “Then, as shown in FIG. 4C, the dynamic schema determination system 106 generates a column vector embedding in the act 422 by using the header label vector embedding (from the act 418) and the column cell vector embedding (from the act 420). In some embodiments, the dynamic schema determination system 106 can generate the column vector embedding in the act 422 by concatenating the header label vector embedding (from the act 418) and the column cell vector embedding (from the act 420). Specifically, the dynamic schema determination system 106 can concatenate the header label vector embedding (i.e., gsum(hc) or ggru(hc)) and the column cell vector embedding (i.e., (gcnn(xc)).” [0051]-[0058] disclose the identifying of columns within a digital dataset, determining of column input types, and selecting of a neural network encoder model based on column input types) and the columns from multiple different schema fields, as shown at least by Fig. 3A, are the different columns within a digital dataset, which are each of different column input types, and are each inputted into different encoders based on the column input type. Further, Fig. 4C discloses the generating of a single vector embedding 422 based on a concatenation of the header label vector embedding and column cell vector embedding.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Xian into the teaching of Sassin because the references similarly disclose data mapping/similarity and/or vectorization. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Sassin to further include the training of the dynamic schema determination system as in Xian in order to ultimately “improve efficiency” (Xian, [0028]).
As per claim 7, claim 5 is incorporated, Sassin fails to disclose “wherein training of the machine learning model comprises: obtaining a first training dataset; generating a first stage machine learning model based on the first training dataset, wherein the first stage machine learning model is trained to encode a sentence included in a dataset into a vector representation; obtaining a second training dataset, wherein the second training dataset includes schema metadata from various schema objects; and generating a trained machine learning model based on the second training dataset and the first stage machine learning model, wherein the trained machine learning model is trained to encode sentences associated with multiple metadata columns of a schema field into a single vector representation”
However, Xian teaches the following limitations, wherein training of the machine learning model comprises: obtaining a first training dataset; generating a first stage machine learning model based on the first training dataset, wherein the first stage machine learning model is trained to encode a sentence included in a dataset into a vector representation at least by ([0089] “FIG. 5 illustrates the dynamic schema determination system 106 training neural network encoder models to determine schema labels for columns. As shown in FIG. 5, the dynamic schema determination system 106 utilizes training data 502 and ground truth data 516 (e.g., from correctly labeled schema-column pairs) to train neural network encoder models 504 (which include a header neural network encoder 506 and a cell neural network encoder 508). In particular, as shown in FIG. 5, the dynamic schema determination system 106 provides the training data 502 (e.g., training columns and candidate schema labels) to the neural network encoder models 504 to generate column vector embeddings and schema vector embeddings (in accordance with one or more embodiments).” [0072]-[0087] disclose, in detail, the generating of column vector embeddings based on received columns which contain words or terms) and Fig. 5 shows a first stage in training of the neural network encoder models which generates column vector embeddings based on received training columns (first training dataset); Examiner notes that, under its BRI and in light of the specification, “sentence” is being interpreted as referring to at least a singular word or term. That is, the specification is silent regarding any explicit, or even suggested, definitions of a sentence;
obtaining a second training dataset, wherein the second training dataset includes schema metadata from various schema objects at least by ([0089] “FIG. 5 illustrates the dynamic schema determination system 106 training neural network encoder models to determine schema labels for columns. As shown in FIG. 5, the dynamic schema determination system 106 utilizes training data 502 and ground truth data 516 (e.g., from correctly labeled schema-column pairs) to train neural network encoder models 504 (which include a header neural network encoder 506 and a cell neural network encoder 508). In particular, as shown in FIG. 5, the dynamic schema determination system 106 provides the training data 502 (e.g., training columns and candidate schema labels) to the neural network encoder models 504 to generate column vector embeddings and schema vector embeddings (in accordance with one or more embodiments).” [0034] discloses that the schema label can be for a collection or list of data, such as a plurality of data within a column) and Fig. 5 shows the training of the neural network encoder models by obtaining schema vector embeddings generated based on received candidate schema labels (second training dataset); further, the schema labels are the schema metadata from various schema objects;
and generating a trained machine learning model based on the second training dataset and the first stage machine learning model at least by ([0090] “Using the cosine similarities (from the act 510), the dynamic schema determination system 106 determines similarity scores 512 as shown in FIG. 5. Moreover, as illustrated in FIG. 5, the dynamic schema determination system 106 utilizes the ground truth data 516 with the similarity scores 512 to calculate a ranking loss 514. The ranking loss 514 can describe the accuracy of the neural network encoder models 504 by comparing the similarity scores corresponding to correctly determined schema-column pairs (e.g., from the column vector embeddings and schema vector embeddings) and the similarity scores corresponding to incorrectly determined schema-column pairs. Then, the dynamic schema determination system 106 provides the ranking loss 514 to the neural network encoder models 504 to iteratively optimize parameters of the neural network encoder models 504 and generate updated similarity scores between schema-column pairs determined from the training data 502. Indeed, as mentioned above, the dynamic schema determination system 106 can iteratively determine a ranking loss with updated parameters for the neural network encoder models 504 to minimize the ranking loss 514 (e.g., to train the neural network encoder models 504 to map columns to schema labels).”),
wherein the trained machine learning model is trained to encode sentences associated with multiple metadata columns of a schema field into a single vector representation at least by ([0065] “the dynamic schema determination system 106 concatenates the vector embedding for the header label and the vector embedding for the at least one populated column cell to generate a column vector embedding for the column. Indeed, selecting and applying both a header neural network and a cell neural network to a column that includes a header column type and a cell column type is described in detail below (e.g., in relation to FIG. 4C)” [0087] “Then, as shown in FIG. 4C, the dynamic schema determination system 106 generates a column vector embedding in the act 422 by using the header label vector embedding (from the act 418) and the column cell vector embedding (from the act 420). In some embodiments, the dynamic schema determination system 106 can generate the column vector embedding in the act 422 by concatenating the header label vector embedding (from the act 418) and the column cell vector embedding (from the act 420). Specifically, the dynamic schema determination system 106 can concatenate the header label vector embedding (i.e., gsum(hc) or ggru(hc)) and the column cell vector embedding (i.e., (gcnn(xc)).” [0051]-[0058] disclose the identifying of columns within a digital dataset, determining of column input types, and selecting of a neural network encoder model based on column input types) and the columns from multiple different schema fields, as shown at least by Fig. 3A, are the different columns within a digital dataset, which are each of different column input types, and are each inputted into different encoders based on the column input type. Further, Fig. 4C discloses the generating of a single vector embedding 422 based on a concatenation of the header label vector embedding and column cell vector embedding.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Xian into the teaching of Sassin because the references similarly disclose data mapping/similarity and/or vectorization. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Sassin to further include the training of the dynamic schema determination system as in Xian in order to ultimately “improve efficiency” (Xian, [0028]).
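For illustration only, the training step quoted from Xian [0090] (cosine similarities between column and schema vector embeddings, compared for correct versus incorrect schema-column pairs to yield a ranking loss) can be sketched as follows. The quoted passage does not give the formula for the ranking loss, so the hinge-style margin form, the function names, and the margin value of 0.2 are assumptions made for this sketch and are not taken from Xian or from the claims.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ranking_loss(column_vec, correct_label_vec, wrong_label_vecs, margin=0.2):
    """Assumed hinge-style ranking loss for one training column.

    The loss is zero when the correct schema label scores at least
    `margin` higher than every incorrect schema label.
    """
    positive = cosine(column_vec, correct_label_vec)
    return sum(
        max(0.0, margin - positive + cosine(column_vec, wrong))
        for wrong in wrong_label_vecs
    )
```

Under this assumed form, a column embedding that already matches its ground-truth label embedding and is orthogonal to the incorrect labels contributes no loss, while a column that is closer to an incorrect label contributes a positive loss that training would reduce.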
As per claim 8, claim 1 is incorporated, Sassin fails to disclose “further comprising: generating a confidence score for each field of the target schema as compared to each field of the source schema, wherein the confidence score for each field of the target schema is based on comparing the representations for each field of the source schema to the representation for the field of the target schema; and generating data mapping suggestions between the source schema and the target schema for the fields of the target schema based, at least in part, on the confidence scores”
However, Xian teaches the following limitations, further comprising: generating a confidence score for each field of the target schema as compared to each field of the source schema, wherein the confidence score for each field of the target schema is based on comparing the representations for each field of the source schema to the representation for the field of the target schema at least by ([0039] “As used herein, the term “similarity score” (sometimes referred to as a “confidence score”) refers to one or more values that quantify a measure of similarity between two objects. In particular, the term “similarity score” can refer to a value that quantifies a measure of similarity between a column (or column header) and a schema label using a cosine similarity between vector embeddings of the column and the schema label. For example, a similarity score can include a value between 0 and 1 that represents how similar a column is to a particular schema label (where a higher value represents a greater similarity between the column and schema label).”);
and generating data mapping suggestions between the source schema and the target schema for the fields of the target schema based, at least in part, on the confidence scores at least by ([0109] “the dynamic schema determination system 106 can determine schema labels for a dataset and display the schema labels, header labels (from the input column), and similarity scores in a graphical user interface. For example, FIG. 7C illustrates the dynamic schema determination system 106 providing determined schema labels, header labels (from the dataset 708), and similarity scores in a graphical user interface 713. In particular, as shown in FIG. 7C, the dynamic schema determination system 106 displays header labels 714, schema labels 716, and similarity scores 718 for each column from the dataset 708. Indeed, as shown in FIG. 7C, the dynamic schema determination system 106 displays the schema labels 716 after determining the schema labels in accordance with one or more embodiments herein. Furthermore, as illustrated in FIG. 718, the dynamic schema determination system 106 also displays the similarity scores 718 (between the schema-column pairs) after determining the similarity scores in accordance with one or more embodiments herein (e.g., as a confidence score between 0 and 1).”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Xian into the teaching of Sassin because the references similarly disclose data mapping/similarity and/or vectorization. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Sassin to further include the training of the dynamic schema determination system as in Xian in order to ultimately “improve efficiency” (Xian, [0028]).
As per claim 9, claim 8 is incorporated, Xian further discloses:
wherein generating the confidence score for each field of the target schema comprises calculating an overall confidence score between two fields using cosine similarity at least by ([0025] “the dynamic schema determination system can utilize cosine similarities between the column vector embedding and the schema vector embeddings to determine similarity (or confidence) scores between the column and particular schema label pairs.” [0039] “As used herein, the term “similarity score” (sometimes referred to as a “confidence score”) refers to one or more values that quantify a measure of similarity between two objects. In particular, the term “similarity score” can refer to a value that quantifies a measure of similarity between a column (or column header) and a schema label using a cosine similarity between vector embeddings of the column and the schema label. For example, a similarity score can include a value between 0 and 1 that represents how similar a column is to a particular schema label (where a higher value represents a greater similarity between the column and schema label).” [0109] “the dynamic schema determination system 106 can determine schema labels for a dataset and display the schema labels, header labels (from the input column), and similarity scores in a graphical user interface. For example, FIG. 7C illustrates the dynamic schema determination system 106 providing determined schema labels, header labels (from the dataset 708), and similarity scores in a graphical user interface 713. In particular, as shown in FIG. 7C, the dynamic schema determination system 106 displays header labels 714, schema labels 716, and similarity scores 718 for each column from the dataset 708. Indeed, as shown in FIG. 7C, the dynamic schema determination system 106 displays the schema labels 716 after determining the schema labels in accordance with one or more embodiments herein. Furthermore, as illustrated in FIG. 718, the dynamic schema determination system 106 also displays the similarity scores 718 (between the schema-column pairs) after determining the similarity scores in accordance with one or more embodiments herein (e.g., as a confidence score between 0 and 1).”).
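For illustration only, the cosine-similarity confidence score relied upon for claims 8 and 9 can be sketched as below: each target field is compared against every source field, and the highest-scoring source field is offered as the mapping suggestion. The field names, the toy two-dimensional vectors, and the function names are hypothetical and are not drawn from Sassin, Xian, or the claims. Raw cosine similarity can be negative for arbitrary vectors; the 0-to-1 range described in Xian [0039] would depend on the embeddings used.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest_mappings(source_vecs, target_vecs):
    """For each target field, return (best source field, confidence score).

    Both arguments map a field name to its vector representation;
    `source_vecs` is assumed to be non-empty.
    """
    suggestions = {}
    for target_name, target_vec in target_vecs.items():
        scored = [
            (cosine_similarity(source_vec, target_vec), source_name)
            for source_name, source_vec in source_vecs.items()
        ]
        best_score, best_source = max(scored)
        suggestions[target_name] = (best_source, best_score)
    return suggestions


# Hypothetical example: two source fields and one target field.
source = {"cust_name": [1.0, 0.0], "zip": [0.0, 1.0]}
target = {"customer": [0.9, 0.1]}
result = suggest_mappings(source, target)
```

In this toy example the target field "customer" is paired with the source field "cust_name", since their vectors point in nearly the same direction.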
As per claim 17, claim 14 is incorporated, Sassin further discloses:
wherein the program instructions programmed to determine, for each field of the source schema and each field of the target schema, the representation for each field, further comprise: program instructions programmed to provide the source schema metadata or the target schema metadata associated with each field individually as input to a machine learning model at least by ([0021] “feature extraction 106 is performed on source schema 102 and target schema 104. For example, feature extraction 106 can be used to provide machine learning component 110 the fine grain detail that enables ETL rules/mappings generation. In turn, learning component 110 can analyze at this fine granularity to predict the schema specific rules/mappings for source schema 102 and target schema 104. The features extracted for each schema can include table names, column names, table type, foreign keys between tables, self-referential tables, and a number of other features.” [0030] “System 210 may include memory 214 for storing information and instructions for execution by processor 222.” [0038]-[0039] disclose features (source and target schema metadata) that are extracted by the feature extraction module from source schema 102 and target schema 104, as shown in Fig. 1, which can include at least table or column names (fields of a target schema)); further, the features are input to the machine learning component (machine learning model).
Sassin fails to disclose “the machine learning model having been trained to generate the representation for each field based on one or more metadata columns associated with each field; and program instructions programmed to receive, for each field as output from the machine learning model, a vector representation that describes a combination of the one or more metadata columns associated with each field as the representation for each field”
However, Xian teaches the following limitations, the machine learning model having been trained to generate the representation for each field based on one or more metadata columns associated with each field at least by ([0035] “As used herein, the term “neural network encoder model” (sometimes referred to as “neural network” or “neural network encoder”) refers to a machine learning model that can be tuned (e.g., trained) based on inputs to approximate unknown functions. In particular, the term “neural network encoder model” can refer to a model of interconnected layers that communicate and analyze attributes at varying degrees of abstraction to learn to approximate functions and generate outputs based on a plurality of inputs provided to the model.” [0136] “As illustrated in FIG. 9, the series of acts 900 includes an act 930 of generating a column vector embedding. In particular, the act 930 can include generating a column vector embedding for a column utilizing a selected neural network encoder model.”) and the machine learning model is the neural network encoder model which generates the column vector embeddings for columns (representations for each field);
and program instructions programmed to receive, for each field as output from the machine learning model, a vector representation that describes a combination of the one or more metadata columns associated with each field as the representation for each field at least by ([0035] “a neural network encoder model can analyze attributes of a column (e.g., a header and/or populated column cell) and output a vector embedding (or latent vector) for the column in a latent space” [0136] “Moreover, the act 930 can include generating a column vector embedding for a column by: generating a vector embedding for a header label utilizing a header neural network encoder, generating a vector embedding for a populated column cell utilizing a cell neural network encoder, and concatenating the vector embedding for the header label and the vector embedding for the populated column cell. Furthermore, the act 930 can include generating an additional schema vector embedding of an additional schema label utilizing a header neural network encoder”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Xian into the teaching of Sassin because the references similarly disclose data mapping/similarity and/or vectorization. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Sassin to further include the training of the dynamic schema determination system as in Xian in order to ultimately “improve efficiency” (Xian, [0028]).
As per claim 18, claim 17 is incorporated, Sassin fails to disclose “wherein the machine learning model is trained to identify relevant information from multiple metadata columns associated with a schema field to generate a single vector representation for the schema field”
However, Xian teaches the above limitations at least by ([0065] “the dynamic schema determination system 106 concatenates the vector embedding for the header label and the vector embedding for the at least one populated column cell to generate a column vector embedding for the column. Indeed, selecting and applying both a header neural network and a cell neural network to a column that includes a header column type and a cell column type is described in detail below (e.g., in relation to FIG. 4C)” [0087] “Then, as shown in FIG. 4C, the dynamic schema determination system 106 generates a column vector embedding in the act 422 by using the header label vector embedding (from the act 418) and the column cell vector embedding (from the act 420). In some embodiments, the dynamic schema determination system 106 can generate the column vector embedding in the act 422 by concatenating the header label vector embedding (from the act 418) and the column cell vector embedding (from the act 420). Specifically, the dynamic schema determination system 106 can concatenate the header label vector embedding (i.e., gsum(hc) or ggru(hc)) and the column cell vector embedding (i.e., (gcnn(xc)).” [0051]-[0058] disclose the identifying of columns within a digital dataset, determining of column input types, and selecting of a neural network encoder model based on column input types) and the columns from multiple different schema fields, as shown at least by Fig. 3A, are the different columns within a digital dataset, which are each of different column input types, and are each inputted into different encoders based on the column input type. Further, Fig. 4C discloses the generating of a single vector embedding 422 based on a concatenation of the header label vector embedding and column cell vector embedding.
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Xian into the teaching of Sassin because the references similarly disclose data mapping/similarity and/or vectorization. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Sassin to further include the training of the dynamic schema determination system as in Xian in order to ultimately “improve efficiency” (Xian, [0028]).
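For illustration only, the concatenation described in Xian [0065] and [0087] (a header label vector embedding joined with a column cell vector embedding to form a single column vector embedding) can be sketched as follows. The header and cell encoders themselves are not reproduced here; the input vectors are toy values and the function name is hypothetical.

```python
def concat_column_embedding(header_vec, cell_vec):
    """Join a header-label embedding and a column-cell embedding end to end.

    The result is a single vector whose length is the sum of the two
    input lengths, with the header components first.
    """
    return list(header_vec) + list(cell_vec)


# Hypothetical example: a 2-dimensional header embedding and a
# 3-dimensional cell embedding yield one 5-dimensional embedding.
header_embedding = [0.1, 0.2]
cell_embedding = [0.3, 0.4, 0.5]
column_embedding = concat_column_embedding(header_embedding, cell_embedding)
```

The single resulting vector is what would then be compared against schema vector embeddings in a later similarity step.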
As per claim 19, claim 14 is incorporated, Sassin fails to disclose “the computer readable storage medium having further stored thereon: program instructions programmed to generate a confidence score for each field of the target schema as compared to each field of the source schema, wherein the confidence score for each field of the target schema is based on comparing the representations for each field of the source schema to the representation for the field of the target schema; and program instructions programmed to generate data mapping suggestions between the source schema and the target schema for the fields of the target schema based, at least in part, on the confidence scores”
However, Xian teaches the following limitations, the computer readable storage medium having further stored thereon: program instructions programmed to generate a confidence score for each field of the target schema as compared to each field of the source schema, wherein the confidence score for each field of the target schema is based on comparing the representations for each field of the source schema to the representation for the field of the target schema at least by ([0039] “As used herein, the term “similarity score” (sometimes referred to as a “confidence score”) refers to one or more values that quantify a measure of similarity between two objects. In particular, the term “similarity score” can refer to a value that quantifies a measure of similarity between a column (or column header) and a schema label using a cosine similarity between vector embeddings of the column and the schema label. For example, a similarity score can include a value between 0 and 1 that represents how similar a column is to a particular schema label (where a higher value represents a greater similarity between the column and schema label).”);
and program instructions programmed to generate data mapping suggestions between the source schema and the target schema for the fields of the target schema based, at least in part, on the confidence scores at least by ([0109] “the dynamic schema determination system 106 can determine schema labels for a dataset and display the schema labels, header labels (from the input column), and similarity scores in a graphical user interface. For example, FIG. 7C illustrates the dynamic schema determination system 106 providing determined schema labels, header labels (from the dataset 708), and similarity scores in a graphical user interface 713. In particular, as shown in FIG. 7C, the dynamic schema determination system 106 displays header labels 714, schema labels 716, and similarity scores 718 for each column from the dataset 708. Indeed, as shown in FIG. 7C, the dynamic schema determination system 106 displays the schema labels 716 after determining the schema labels in accordance with one or more embodiments herein. Furthermore, as illustrated in FIG. 718, the dynamic schema determination system 106 also displays the similarity scores 718 (between the schema-column pairs) after determining the similarity scores in accordance with one or more embodiments herein (e.g., as a confidence score between 0 and 1).”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Xian into the teaching of Sassin because the references similarly disclose data mapping/similarity and/or vectorization. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Sassin to further include the training of the dynamic schema determination system as in Xian in order to ultimately “improve efficiency” (Xian, [0028]).
As per claim 20, claim 19 is incorporated, Xian further discloses:
wherein generating the confidence score for each field of the target schema comprises calculating an overall confidence score between two fields using cosine similarity at least by ([0025] “the dynamic schema determination system can utilize cosine similarities between the column vector embedding and the schema vector embeddings to determine similarity (or confidence) scores between the column and particular schema label pairs.” [0039] “As used herein, the term “similarity score” (sometimes referred to as a “confidence score”) refers to one or more values that quantify a measure of similarity between two objects. In particular, the term “similarity score” can refer to a value that quantifies a measure of similarity between a column (or column header) and a schema label using a cosine similarity between vector embeddings of the column and the schema label. For example, a similarity score can include a value between 0 and 1 that represents how similar a column is to a particular schema label (where a higher value represents a greater similarity between the column and schema label).” [0109] “the dynamic schema determination system 106 can determine schema labels for a dataset and display the schema labels, header labels (from the input column), and similarity scores in a graphical user interface. For example, FIG. 7C illustrates the dynamic schema determination system 106 providing determined schema labels, header labels (from the dataset 708), and similarity scores in a graphical user interface 713. In particular, as shown in FIG. 7C, the dynamic schema determination system 106 displays header labels 714, schema labels 716, and similarity scores 718 for each column from the dataset 708. Indeed, as shown in FIG. 7C, the dynamic schema determination system 106 displays the schema labels 716 after determining the schema labels in accordance with one or more embodiments herein. Furthermore, as illustrated in FIG. 718, the dynamic schema determination system 106 also displays the similarity scores 718 (between the schema-column pairs) after determining the similarity scores in accordance with one or more embodiments herein (e.g., as a confidence score between 0 and 1).”).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Xian (US 2021/0232908) in view of Xu (CN 110532399).
As per claim 12, claim 10 is incorporated, Xian further discloses:
…schema mapping… at least by ([0006] “the disclosed systems can determine a schema label for the column using a hybrid neural network encoder model trained using a ranking loss and historical matching records to map a column to a schema label.”).
Xian fails to disclose “wherein generating the trained machine learning model for use in generating data mapping does not require any prior … data”
However, Xu teaches the above limitation at least by ([Abstract] “The method can identify newly appearing entity types, … does not need to label training data in advance, and also meets the requirement of knowledge graph automatic updating at the same time”).
Therefore, it would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate the teaching of Xu into the teaching of Xian because the references similarly disclose data mapping. Consequently, one of ordinary skill in the art would be motivated to further modify the system as in Xian to further include the unsupervised knowledge entity classification method and training as in Xu in order to realize the “beneficial effect of the present invention” which “does not need to pre-label training data” (Xu, [0045]).

Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Kadarundalagi Raghura (US 2022/0100772) discloses context-sensitive linking of entities to private databases;
Baker (US 2020/0051550) discloses a multi-stage machine learning and recognition system comprising multiple individual machine learning systems arranged in multiple stages.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM P BARTLETT whose telephone number is (469) 295-9085. The examiner can normally be reached on M-Th 11:30-8:30, F 11-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM P BARTLETT/
Examiner, Art Unit 2169