DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
Examiner acknowledges applicants’ reply dated February 12, 2021, including arguments and amendments.

Examiner acknowledges applicants’ amendments to claims 6 and 13, overcoming the objections to those claims for reciting informalities. The objections are hereby withdrawn.

Claims 1 – 18 remain pending.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on February 12, 2021. The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 4, 8 – 11, and 15 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Hudis, et al., U.S. PG-Pub. No. 2013/0110792 (hereafter, “Hudis”), in view of Liu, et al., U.S. Pat. No. 6,216,131 (hereafter, “Liu”).

As to Claim 1, Hudis discloses: a method comprising:
storing, by one or more input data sources, one or more reference data sets ([0080], referring to the system obtaining a structured dataset; and [0122], referring to the system’s access to curated datasets);
extracting, by an ingest engine of a cloud computing infrastructure system, a first data set from the one or more reference data sets stored in the one or more input data sources ([0081], referring to the system casting a dataset into a samplex, with Hudis’ dataset corresponding to the claimed reference data set and Hudis’ samplex corresponding to the claimed extracted data set);
receiving, by an input unit, a second data set from a user ([0004], referring to the system obtaining a structured dataset in a user work context);
calculating, by a similarity metric module, a similarity metric value between a second subset of data of the second data set and a first subset of data of the first data set ([0082], “Step 306 may be accomplished by comparing simplexes of candidate datasets element-by-element with the samplex of an original dataset of interest.”);
in response to the similarity metric value being a predetermined ratio, determining a category name of the first subset of data of the first data set ([0082], “During an identifying step 306, an embodiment identifies one or more matching datasets 214 based on a samplex 210;” and [0058], “Depending on the embodiment, a samplex 210 may be cast from one or more of the following characteristics: individual attributes (a.k.a. labels) and their data types…”);
obtaining, by a recommendation engine from the similarity metric module, the determined category name of the first subset of data of the first data set ([0058], “Depending on the embodiment, a samplex 210 may be cast from one or more of the following characteristics: individual attributes (a.k.a. labels) and their data types…”); and
recommending, by the recommendation engine, the determined category name to the user on a user interface for the second subset of data of the second data set ([0073], referring to the generation of a graphical user interface, displaying details of the matching dataset).

Hudis does not appear to explicitly disclose: wherein the category name identifies a category type of the first subset of data.

Liu discloses: wherein the category name identifies a category type of the first subset of data (col. 7, lines 32 - 42, “If a field cannot be matched based on name alone (e.g., an identical match), the methodology of the present invention employs rules to determine a type for the field, based on the field's name.”).
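
For illustration only, the name-based type determination quoted from Liu above may be sketched as follows. This is a minimal sketch under stated assumptions, not a reproduction of Liu's method: the function `infer_field_type`, the `NAME_RULES` table, and the type labels are hypothetical and do not appear in Liu.

```python
# Hypothetical keyword rules mapping part of a field name to a field type.
# These rules are illustrative assumptions; they are not taken from Liu.
NAME_RULES = [
    ("phone", "phone_number"),
    ("mail", "email_address"),
    ("zip", "postal_code"),
    ("date", "date"),
]


def infer_field_type(field_name, known_fields):
    """Return a type for field_name, or None if no rule applies.

    known_fields maps already-known field names to their types.
    """
    # Step 1: try to match on name alone (an identical match).
    if field_name in known_fields:
        return known_fields[field_name]

    # Step 2: otherwise fall back to rules based on the field's name.
    lowered = field_name.lower()
    for keyword, field_type in NAME_RULES:
        if keyword in lowered:
            return field_type

    return None


known = {"email": "email_address"}
print(infer_field_type("email", known))       # identical name match
print(infer_field_type("Home_Phone", known))  # resolved by a name rule
```

In this sketch, the rules are consulted only after the identical-name match fails, mirroring the order described in the quoted passage.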

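It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Hudis and Liu before him/her, to have modified the category name determination from Hudis with the field type from Liu, as suggested by Hudis at [0056], “Each typed attribute 130 has a name 202, such as a column name, and a data type 204, such as string, real, integer…”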

As to Claim 2, Hudis, as modified, discloses: wherein the category name comprises one from a plurality of predetermined category names identified in the one or more reference data sets stored in the input data sources (Hudis, [0058], referring to a list of several example predetermined categories).

As to Claim 3, Hudis, as modified, discloses: wherein the first subset of data comprises a column of data in a table in the first data set (Hudis, [0058], showing the characteristics of the dataset from which a samplex may be cast).

As to Claim 4, Hudis, as modified, discloses: the category name comprises a category identifying a type of the column of data in the table in the first data set (Hudis, [0066], “In some embodiments, each typed attribute includes a column name 202 and an associated data type 204.”).

As to Claim 8, Hudis discloses: a data enrichment system comprising:
one or more input data sources configured to store one or more reference data sets ([0080], referring to the system obtaining a structured dataset; and [0122], referring to the system’s access to curated datasets); and
a cloud computing infrastructure system comprising:
an ingest engine that extracts a first data set from the one or more reference data sets stored in the one or more input data sources ([0081], referring to the system casting a dataset into a samplex, with Hudis’ dataset corresponding to the claimed reference data set and Hudis’ samplex corresponding to the claimed extracted data set);
an input unit that receives a second data set from a user ([0004], referring to the system obtaining a structured dataset in a user work context);
a similarity metric module, comprising a processor and a memory (Fig. 1, items 110 and 112), configured to: calculate a similarity metric value between a second subset of data of the second data set and a first subset of data of the first data set ([0082], “Step 306 may be accomplished by comparing simplexes of candidate datasets element-by-element with the samplex of an original dataset of interest.”); and
in response to the similarity metric value being a predetermined ratio, determining a category name of the first subset of data of the first data set ([0082], “During an identifying step 306, an embodiment identifies one or more matching datasets 214 based on a samplex 210;” and [0058], “Depending on the embodiment, a samplex 210 may be cast from one or more of the following characteristics: individual attributes (a.k.a. labels) and their data types…”); and
a recommendation engine configured to obtain, from the similarity metric module, the determined category name of the first subset of data of the first data set ([0058], “Depending on the embodiment, a samplex 210 may be cast from one or more of the following characteristics: individual attributes (a.k.a. labels) and their data types…”) and recommending the determined category name to the user on a user interface for the second subset of data of the second data set ([0073], referring to the generation of a graphical user interface, displaying details of the matching dataset).

Hudis does not appear to explicitly disclose: wherein the category name identifies a category type of the first subset of data.

Liu discloses: wherein the category name identifies a category type of the first subset of data (col. 7, lines 32 - 42, “If a field cannot be matched based on name alone (e.g., an identical match), the methodology of the present invention employs rules to determine a type for the field, based on the field's name.”).

It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Hudis and Liu before him/her, to have modified the category name determination from Hudis with the field type from Liu, as suggested by Hudis at [0056], “Each typed attribute 130 has a name 202, such as a column name, and a data type 204, such as string, real, integer…” 

As to Claim 9, Hudis, as modified, discloses: wherein the category name comprises one from a plurality of predetermined category names identified in the one or more reference data sets stored in the input data sources (Hudis, [0058], referring to a list of several example predetermined categories).

As to Claim 10, Hudis, as modified, discloses: wherein the first subset of data comprises a column of data in a table in the first data set (Hudis, [0058], showing the characteristics of the dataset from which a samplex may be cast).

As to Claim 11, Hudis, as modified, discloses: wherein the category name comprises a category identifying a type of the column of data in the table in the first data set (Hudis, [0066], “In some embodiments, each typed attribute includes a column name 202 and an associated data type 204.”).

As to Claim 15, Hudis discloses: a non-transitory computer-readable medium comprising instructions (Fig. 1, item 114) which, when executed by one or more processors, causes the one or more processors to:
store, by one or more input data sources, one or more reference data sets ([0080], referring to the system obtaining a structured dataset; and [0122], referring to the system’s access to curated datasets);
extract, by an ingest engine of a cloud computing infrastructure system, a first data set from the one or more reference data sets stored in the one or more input data sources ([0081], referring to the system casting a dataset into a samplex, with Hudis’ dataset corresponding to the claimed reference data set and Hudis’ samplex corresponding to the claimed extracted data set);
receive, by an input unit, a second data set from a user ([0004], referring to the system obtaining a structured dataset in a user work context);
calculate, by a similarity metric module, a similarity metric value between a second subset of data of the second data set and a first subset of data of the first data set ([0082], “Step 306 may be accomplished by comparing simplexes of candidate datasets element-by-element with the samplex of an original dataset of interest.”);
in response to the similarity metric value being a predetermined ratio, determine a category name of the first subset of data of the first data set ([0082], “During an identifying step 306, an embodiment identifies one or more matching datasets 214 based on a samplex 210;” and [0058], “Depending on the embodiment, a samplex 210 may be cast from one or more of the following characteristics: individual attributes (a.k.a. labels) and their data types…”);
obtain, by a recommendation engine from the similarity metric module, the determined category name of the first subset of data of the first data set ([0058], “Depending on the embodiment, a samplex 210 may be cast from one or more of the following characteristics: individual attributes (a.k.a. labels) and their data types…”); and
recommend, by the recommendation engine, the determined category name to the user on a user interface for the second subset of data of the second data set ([0073], referring to the generation of a graphical user interface, displaying details of the matching dataset).

Hudis does not appear to explicitly disclose: wherein the category name identifies a category type of the first subset of data.

Liu discloses: wherein the category name identifies a category type of the first subset of data (col. 7, lines 32 - 42, “If a field cannot be matched based on name alone (e.g., an identical match), the methodology of the present invention employs rules to determine a type for the field, based on the field's name.”).

It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Hudis and Liu before him/her, to have modified the category name determination from Hudis with the field type from Liu, as suggested by Hudis at [0056], “Each typed attribute 130 has a name 202, such as a column name, and a data type 204, such as string, real, integer…” 

As to Claim 16, Hudis, as modified, discloses: wherein the category name comprises one from a plurality of predetermined category names identified in the one or more reference data sets stored in the input data sources (Hudis, [0058], referring to a list of several example predetermined categories).

As to Claim 17, Hudis, as modified, discloses: wherein the first subset of data comprises a column of data in a table in the first data set (Hudis, [0058], showing the characteristics of the dataset from which a samplex may be cast).

As to Claim 18, Hudis, as modified, discloses: wherein the category name comprises a category identifying a type of the column of data in the table in the first data set (Hudis, [0066], “In some embodiments, each typed attribute includes a column name 202 and an associated data type 204.”).

Claims 5, 6, 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Hudis, as modified by Liu and applied to claims 1 and 8, further in view of Austermann, U.S. PG-Pub. No. 2012/0117076 (hereafter, “Austermann”).

As to Claim 5, Hudis, as modified by Liu, does not appear to explicitly disclose: the similarity metric value comprises a numerical value between 0.0-1.0.

Austermann discloses: the similarity metric value comprises a numerical value between 0.0-1.0 ([0049], “For example, if the full length of the given query value exactly matches a record’s data field value, a similarity score of 1.0 may be assigned to that record; if no unigram of the given query value matches a record’s data field value, a similarity score of 0.0 may be assigned to that record. Partial matches may fall in the range between 0.0 and 1.0.”).

It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Hudis, Liu, and Austermann before him/her, to have further modified the similarity metric value of Hudis with the range 0.0 – 1.0 from Austermann, in order to create a convenient metric by which data can be compared for similarity, as suggested by Hudis at [0069], referring to identifying step 306 including element-by-element comparison of datasets within a specified tolerance.

As to Claim 6, Hudis, as further modified by Austermann, discloses: wherein the similarity metric value having the numerical value of 1.0 indicates a higher degree of similarity between the second subset of data of the second data set and the first subset of data of the first data set, than the similarity metric value having the numerical value of 0.0 (Austermann, [0049], “For example, if the full length of the given query value exactly matches a record’s data field value, a similarity score of 1.0 may be assigned to that record; if no unigram of the given query value matches a record’s data field value, a similarity score of 0.0 may be assigned to that record. Partial matches may fall in the range between 0.0 and 1.0.”).
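
For illustration only, a similarity score bounded between 0.0 and 1.0, of the kind quoted from Austermann above, may be sketched as follows. This is a sketch under stated assumptions and not Austermann's scoring method: the function name `unigram_similarity` is hypothetical, a "unigram" is assumed to be a whitespace-separated word, and partial matches are scored here by the ratio of shared unigrams to all distinct unigrams.

```python
def unigram_similarity(query, field_value):
    """Return a score in [0.0, 1.0] comparing a query to a field value."""
    # An exact match of the full query against the field value scores 1.0.
    if query == field_value:
        return 1.0

    # Assumption: unigrams are lowercase, whitespace-separated words.
    query_unigrams = set(query.lower().split())
    field_unigrams = set(field_value.lower().split())
    if not query_unigrams or not field_unigrams:
        return 0.0

    # No shared unigram yields 0.0; partial overlap falls between 0.0 and 1.0.
    shared = query_unigrams & field_unigrams
    combined = query_unigrams | field_unigrams
    return len(shared) / len(combined)


print(unigram_similarity("john smith", "john smith"))  # exact match
print(unigram_similarity("john smith", "john jones"))  # partial match
print(unigram_similarity("john smith", "mary jones"))  # no shared unigram
```

Under these assumptions, a higher value corresponds to a higher degree of similarity, consistent with the 1.0 versus 0.0 comparison recited in the claim.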

As to Claim 12, Hudis, as modified by Liu, does not appear to explicitly disclose: the similarity metric value comprises a numerical value between 0.0-1.0.

Austermann discloses: the similarity metric value comprises a numerical value between 0.0-1.0 ([0049], “For example, if the full length of the given query value exactly matches a record’s data field value, a similarity score of 1.0 may be assigned to that record; if no unigram of the given query value matches a record’s data field value, a similarity score of 0.0 may be assigned to that record. Partial matches may fall in the range between 0.0 and 1.0.”).

It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Hudis, Liu, and Austermann before him/her, to have further modified the similarity metric value of Hudis with the range 0.0 – 1.0 from Austermann, in order to create a convenient metric by which data can be compared for similarity, as suggested by Hudis at [0069], referring to identifying step 306 including element-by-element comparison of datasets within a specified tolerance.

As to Claim 13, Hudis, as further modified by Austermann, discloses: wherein the similarity metric value having the numerical value of 1.0 indicates a higher degree of similarity between the second subset of data of the second data set and the first subset of data of the first data set, than the similarity metric value having the numerical value of 0.0 (Austermann, [0049], “For example, if the full length of the given query value exactly matches a record’s data field value, a similarity score of 1.0 may be assigned to that record; if no unigram of the given query value matches a record’s data field value, a similarity score of 0.0 may be assigned to that record. Partial matches may fall in the range between 0.0 and 1.0.”).

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Hudis, as modified by Liu and applied to claims 1 and 8, further in view of Brill, U.S. PG-Pub. No. 2004/0260695 (hereafter, “Brill”).

As to Claim 7, Hudis, as modified by Liu, does not appear to explicitly disclose: wherein the similarity metric value is calculated using at least one of a Jaccard index, a Dice-Sorensen index, a Tversky index, a Tanimoto metric, and a cosine similarity metric.

Brill discloses: wherein the similarity metric value is calculated using at least one of a Jaccard index ([0057], referring to the use of a Jaccard coefficient as a measurement of similarity), a Dice-Sorensen index, a Tversky index, a Tanimoto metric, and a cosine similarity metric ([0057], referring to the use of a cosine distance as a similarity measurement).

It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Hudis, Liu, and Brill before him/her, to have further modified the similarity metric of Hudis with the Jaccard index or cosine similarity of Brill, in order to quantify the similarities between sets of data, as suggested by Hudis at [0069], referring to identifying step 306 including element-by-element comparison of datasets within a specified tolerance.



As to Claim 14, Hudis, as modified by Liu, does not appear to explicitly disclose: wherein the similarity metric value is calculated using at least one of a Jaccard index, a Dice-Sorensen index, a Tversky index, a Tanimoto metric, and a cosine similarity metric.

Brill discloses: wherein the similarity metric value is calculated using at least one of a Jaccard index ([0057], referring to the use of a Jaccard coefficient as a measurement of similarity), a Dice-Sorensen index, a Tversky index, a Tanimoto metric, and a cosine similarity metric ([0057], referring to the use of a cosine distance as a similarity measurement).

It would have been obvious to one having ordinary skill in this art before the effective filing date of the invention, having the teachings of Hudis, Liu, and Brill before him/her, to have further modified the similarity metric of Hudis with the Jaccard index or cosine similarity of Brill, in order to quantify the similarities between sets of data, as suggested by Hudis at [0069], referring to identifying step 306 including element-by-element comparison of datasets within a specified tolerance.
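
For illustration only, the two measures relied upon from Brill, the Jaccard index and cosine similarity, may be sketched as follows using their standard textbook definitions. This sketch is not taken from Brill or from the application: the function names are hypothetical, and each "subset of data" is assumed to be a list of column values, with cosine similarity computed over value-count vectors.

```python
import math
from collections import Counter


def jaccard_index(values_a, values_b):
    """Size of the intersection divided by size of the union of two value sets."""
    set_a, set_b = set(values_a), set(values_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def cosine_similarity(values_a, values_b):
    """Cosine of the angle between the value-count vectors of two columns."""
    counts_a, counts_b = Counter(values_a), Counter(values_b)
    # Dot product over values that occur in both columns.
    dot = sum(counts_a[v] * counts_b[v] for v in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical column values standing in for the first and second subsets of data.
reference_column = ["NY", "CA", "TX"]
user_column = ["CA", "TX", "WA"]
print(jaccard_index(reference_column, user_column))
print(cosine_similarity(reference_column, user_column))
```

Both measures return values from 0.0 to 1.0 for inputs of this kind, which is why either could serve as the numerical similarity metric discussed with respect to claims 5 and 12.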

Response to Arguments
Applicant's arguments filed February 12, 2021, have been fully considered but they are not persuasive. Accordingly, THIS ACTION IS MADE FINAL.

Applicants’ arguments regarding the applicability of Hudis to the amended feature of the independent claims reciting, “wherein the category name identifies a category type of the first subset of the data” are found to be persuasive as to Hudis alone. However, that limitation is rendered obvious by the combination of Hudis and Liu, as explained in the detailed rejection above.

Applicants argue that Hudis fails to adequately disclose the “storing” step of the independent claims. Examiner respectfully disagrees. The claimed “reference data sets” are equivalent to the “structured data sets” of Hudis. These structured data sets are disclosed at [0066] to reside in local memory.

Applicants further argue that Hudis fails to adequately disclose the “extracting” step of the independent claims. Examiner again respectfully disagrees. The “structured data sets” of Hudis correspond to the “reference data sets” of the claims; the “samplexes” of Hudis correspond to the “extracted data sets” of the claims, and the various elements of the samplexes of Hudis correspond to the “subsets” of the claims. Hudis’ casting step 304 (described at [0081]) creates, or “casts”, a samplex from selected information of the structured data set, just as the current invention extracts a first data set from the reference data sets.

Applicants further argue that Hudis fails to adequately disclose the “calculating” step of the independent claims. Examiner again respectfully disagrees. As correctly indicated by applicants, [0082] of Hudis discloses the feature of element-by-element comparison 

Applicants argue that Hudis fails to adequately disclose the “determining” step of the independent claims. Examiner respectfully disagrees. At [0004] – [0005], Hudis generally discusses the purpose of data enrichment through data set matching. In particular, this section indicates that once a matching set is found, data management may enrich the structured dataset by adding at least one typed attribute of the matching data set, which may include names and data types.
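
For illustration only, the cast, match, and enrich sequence discussed in the preceding paragraphs may be sketched as follows. This is a simplified sketch under stated assumptions, not Hudis' implementation: the function names, the representation of a samplex as a tuple of (column name, data type) pairs, and the `tolerance` parameter are all hypothetical.

```python
def cast_samplex(dataset):
    """Cast a samplex from a dataset given as {column name: list of values}.

    Assumption: the samplex is the sorted tuple of (name, data type) pairs.
    """
    return tuple(sorted(
        (name, type(values[0]).__name__ if values else "unknown")
        for name, values in dataset.items()
    ))


def match_fraction(samplex_a, samplex_b):
    """Fraction of elements of samplex_a that also appear in samplex_b."""
    elements_a, elements_b = set(samplex_a), set(samplex_b)
    if not elements_a:
        return 0.0
    return len(elements_a & elements_b) / len(elements_a)


def suggest_enrichment(user_dataset, candidate_datasets, tolerance=0.5):
    """Return typed attributes of matching candidates that the user dataset lacks."""
    user_samplex = cast_samplex(user_dataset)
    suggestions = []
    for candidate in candidate_datasets:
        candidate_samplex = cast_samplex(candidate)
        # A candidate matches when enough samplex elements agree.
        if match_fraction(user_samplex, candidate_samplex) >= tolerance:
            suggestions.extend(
                attr for attr in candidate_samplex if attr not in user_samplex
            )
    return suggestions


user = {"city": ["Austin"], "zip": ["78701"]}
curated = {"city": ["Dallas"], "zip": ["75001"], "population": [1300000]}
print(suggest_enrichment(user, [curated]))
```

In this sketch the suggested attribute carries both a name and a data type, corresponding to the typed attribute that Hudis describes adding to the structured dataset once a matching dataset is found.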

For these reasons, the rejection of the claims based on the disclosure of Hudis is maintained.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NIRAV K KHAKHAR whose telephone number is (571)270-1004.  The examiner can normally be reached on Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Robert W Beausoliel, Jr. can be reached on 571-272-3645.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/NIRAV K KHAKHAR/Examiner, Art Unit 2167     

/ROBERT W BEAUSOLIEL JR/Supervisory Patent Examiner, Art Unit 2167