DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to the amendment, arguments, and remarks filed on 11/15/2021, in which claims 1-20 are presented for further examination.
Claims 1, 4, 6, 16, 19 and 20 have been amended.
Claims 1-12 and 14-20 are allowed (renumbered 1-19).

Response to Amendments
Applicant’s amendment to claim 20 has been accepted.  The objection to the claim for informalities has been withdrawn.
Applicant’s amendments to claims 1, 19 and 20 have been accepted.  Support was found in at least [0093], [0095], [0096] and [0101] of the specification.
Applicant’s amendment to claim 4 has been accepted.  Support was found in at least [0093] of the specification.
Applicant’s amendment to claim 6 has been accepted.  Support was found in at least [0095] of the specification.
Applicant’s amendment to claim 16 has been accepted.  Support was found in at least [0098] of the specification.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below.  Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312.  To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
On 1/24/2022, the examiner had a telephone interview with applicant’s representative at (202) 860-5570 to discuss clarifying amendments to the claims to obviate any potential 35 U.S.C. 101 subject matter eligibility rejections.
Authorization for this examiner’s amendment was given by Matthew Karas, Registration No. 74,279, on 1/25/2022.
Please amend the claims, filed on 11/15/2021, as follows:
1.	(Currently Amended) A system for searching data, comprising:
at least one memory storing instructions; and
one or more processors that execute the instructions to perform operations comprising:
receiving a search request comprising a sample dataset and a vector similarity threshold of similarity between vectors;
in response to the received search request, performing:
identifying, using a data-profiling model comprising a machine learning model configured to compute statistical metrics descriptive of the sample dataset, a data schema of the sample dataset;
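computing, using the data-profiling model, statistical metrics describing at least one statistical attribute of data within the sample dataset;
generating a sample data vector comprising the computed statistical metrics of the sample dataset and information describing the data schema of the sample dataset;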
searching a data index comprising a plurality of stored data vectors corresponding to a plurality of reference datasets, the stored data vectors comprising statistical metrics of data within the reference datasets and information describing corresponding data schema of the reference datasets, wherein searching the data index comprises:
performing data schema comparisons between the data schema of the sample data vector and data schemas of the stored vectors; and
performing statistical metric comparisons between the computed statistical metrics of the sample data vector and statistical metrics of the stored vectors;
generating, based on both the data schema comparisons and the statistical metric comparisons, one or more similarity metrics of the sample dataset to individual ones of the reference datasets;
determining, based on the one or more similarity metrics, at least a portion of the reference datasets having at least one data vector satisfying the vector similarity threshold; and
returning, as a result of the received search request, the at least a portion of the reference datasets.
2.	(Currently Amended) The system of claim 1, the operations further comprising:
receiving a new reference dataset;

generating a new reference data vector comprising statistical measures of the new reference dataset; and
updating the data index based on the new reference data vector.
13.	(Canceled).
19.	(Currently Amended) A method for searching data, the method comprising the following operations performed by one or more processors:
receiving a search request comprising a sample dataset and a vector similarity threshold of similarity between vectors;
in response to the received search request, performing:
identifying, using a data-profiling model comprising a machine learning model configured to compute statistical metrics descriptive of the sample dataset, a data schema of data within the sample dataset;
computing, using the data-profiling model, statistical metrics describing at least one statistical attribute of the sample dataset;
generating a sample data vector comprising the computed statistical metrics of the sample dataset and information describing the data schema of the sample dataset;
searching a data index comprising a plurality of stored data vectors corresponding to a plurality of reference datasets, the stored data vectors comprising statistical metrics of data within the corresponding reference datasets and information describing corresponding data schema of the reference datasets, wherein searching the data index comprises:

performing statistical metric comparisons between the computed statistical metrics of the sample data vector and statistical metrics of the stored vectors;
generating, based on both the data schema comparisons and the statistical metric comparisons, one or more similarity metrics of the sample dataset to individual ones of the reference datasets;
determining, based on the one or more similarity metrics, at least a portion of the reference datasets having at least one data vector satisfying the vector similarity threshold 
returning, as a result of the received search request, 
20.	(Currently Amended) A system for searching data, comprising:
at least one memory storing instructions; and
one or more processors that execute the instructions to perform operations comprising:
receiving, by an aggregator, a plurality of reference datasets;
identifying, by the aggregator, data schema corresponding to the reference datasets;
generating, by the aggregator, a plurality of stored data vectors corresponding to the reference datasets, the stored data vectors comprising statistical metrics of data within the reference datasets and information describing the corresponding data schema of the reference datasets;
generating, by the aggregator, a data index comprising the stored data vectors;
receiving, via an interface, a search request comprising a sample dataset and a vector similarity threshold of similarity between vectors;
in response to the received search request, performing:
identifying, using a data-profiling model comprising a machine learning model configured to compute statistical metrics descriptive of the sample dataset, a data schema of the sample dataset;
computing, using the data-profiling model, statistical metrics describing at least one statistical attribute of the sample dataset;
generating, using the data-profiling model, a sample data vector comprising the computed statistical metrics of the sample dataset and information describing the data schema of the sample dataset;
searching the data index based on the sample data vector, the searching comprising:
performing data schema comparisons between the data schema of the sample data vector and data schemas of the stored vectors; and
performing statistical metric comparisons between the computed statistical metrics of the sample data vector and statistical metrics of the stored vectors;
generating, based on both the data schema comparisons and the statistical metric comparisons, one or more similarity metrics of the sample dataset to individual ones of the reference datasets;

returning, as a result of the received search request, 
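For illustration only, the aggregator steps recited in claim 20 (receiving reference datasets, identifying their data schema, generating stored data vectors, and generating a data index) and the index update of claim 2 might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names (infer_schema, profile, build_index, add_reference), the choice of mean and standard deviation as the statistical metrics, and the rule-based type check standing in for the claimed machine learning model are all hypothetical.

```python
from statistics import mean, pstdev


def infer_schema(rows):
    """Map each column name to a coarse type label ("numeric" or "text").

    Assumption: a simple type check stands in for the claimed
    data-profiling model, which the claims describe as a machine
    learning model.
    """
    schema = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        schema[column] = "numeric" if is_numeric else "text"
    return schema


def profile(rows):
    """Build a data vector: schema information plus per-column statistics."""
    schema = infer_schema(rows)
    metrics = {}
    for column, kind in schema.items():
        if kind == "numeric":
            values = [row[column] for row in rows]
            # Hypothetical choice of metrics: (mean, population std dev).
            metrics[column] = (mean(values), pstdev(values))
    return {"schema": schema, "metrics": metrics}


def build_index(reference_datasets):
    """Generate one stored data vector per reference dataset."""
    return {name: profile(rows) for name, rows in reference_datasets.items()}


def add_reference(index, name, rows):
    """Update the index with a vector for a new reference dataset (claim 2)."""
    index[name] = profile(rows)
```

In this sketch the index is a plain dictionary keyed by a dataset name, so adding a new reference dataset is a single assignment; a production index would likely use a structure suited to nearest-neighbor lookup.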

Allowable Subject Matter
The following is an examiner’s statement of reasons for allowance:
The prior art of record does not teach the limitations of claim 1.  An updated search did not reveal any prior art that would anticipate or render obvious the invention as presented in the claim.  Specifically, the prior art does not teach:
“receiving a search request comprising a sample dataset and a vector similarity threshold of similarity between vectors;
in response to the received search request, performing:
identifying, using a data-profiling model comprising a machine learning model configured to compute statistical metrics descriptive of the sample dataset, a data schema of the sample dataset;
computing, using the data-profiling model, statistical metrics describing at least one statistical attribute of data within the sample dataset;
generating a sample data vector comprising the computed statistical metrics of the sample dataset and information describing the data schema of the sample dataset;
wherein searching the data index comprises:
performing data schema comparisons between the data schema of the sample data vector and data schemas of the stored vectors; and
performing statistical metric comparisons between the computed statistical metrics of the sample data vector and statistical metrics of the stored vectors;
generating, based on both the data schema comparisons and the statistical metric comparisons, one or more similarity metrics of the sample dataset to individual ones of the reference datasets;
determining, based on the one or more similarity metrics, at least a portion of the reference datasets having at least one data vector satisfying the vector similarity threshold; and
returning, as a result of the received search request, the at least a portion of the reference datasets.”
Claims 19 and 20 recite features similar to those of claim 1 and are allowed for at least the same reasons.
The dependent claims, which depend directly or indirectly upon claim 1, are also distinct from the prior art for at least the same reasons.
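For illustration only, the search limitations quoted above (schema comparisons, statistical metric comparisons, a similarity metric based on both, and filtering by the vector similarity threshold) might be sketched as follows. This is a sketch under stated assumptions rather than the claimed or disclosed implementation: the Jaccard overlap, the 1 / (1 + gap) mapping, the schema_weight parameter, and all function names are hypothetical choices that the record does not specify.

```python
def schema_similarity(schema_a, schema_b):
    """Jaccard overlap of (column, type) pairs from two schemas."""
    pairs_a, pairs_b = set(schema_a.items()), set(schema_b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


def metric_similarity(metrics_a, metrics_b):
    """Map the mean absolute gap over shared metrics into (0, 1]."""
    shared = sorted(set(metrics_a) & set(metrics_b))
    if not shared:
        return 0.0  # nothing comparable
    gap = sum(abs(metrics_a[k] - metrics_b[k]) for k in shared) / len(shared)
    return 1.0 / (1.0 + gap)


def search(sample_vector, index, threshold, schema_weight=0.5):
    """Return (name, score) pairs whose combined similarity meets the threshold."""
    results = []
    for name, stored in index.items():
        # The similarity metric is based on both comparisons.
        score = (
            schema_weight
            * schema_similarity(sample_vector["schema"], stored["schema"])
            + (1.0 - schema_weight)
            * metric_similarity(sample_vector["metrics"], stored["metrics"])
        )
        if score >= threshold:
            results.append((name, score))
    # Most similar reference datasets first.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Here the threshold arrives with the search request, as in the quoted limitation, and only reference datasets whose combined score meets it are returned.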

An updated search for prior art was conducted.  The prior art searched and examined does not fairly teach or suggest the limitations of the claimed subject matter.
The prior art of record neither anticipates nor renders obvious the recited combination.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Prior Art
Pertinent prior art was discovered, but it neither anticipates nor renders obvious the claimed subject matter.
Llaves et al. discloses data property recognition.  However, Llaves does not disclose the described and highlighted major features in independent claim 1.
Sodhani et al. discloses classification training techniques to map datasets to a standardized data model.  However, Sodhani does not disclose the described and highlighted major features in independent claim 1.

Point of Contact

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached at (571) 270-0474. The fax number for the organization to which this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



Examiner: Hubert Cheung
/Hubert Cheung/
Assistant Examiner, Art Unit 2152
Date: January 26, 2022

/NEVEEN ABEL JALIL/
Supervisory Patent Examiner, Art Unit 2152