DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to the RCE/IDS, filed on 5/5/2022, in which claim(s) 1-20 is/are presented for further examination.
Claim(s) 1-12 and 14-20 is/are allowed (renumbered 1-19).

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission, filed on 5/5/2022, has been entered.

Information Disclosure Statement
The information disclosure statement(s) (IDS), submitted on 5/5/2022, is/are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement(s) is/are being considered by the examiner.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below.  Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312.  To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
On 1/24/2022, the examiner had a telephone interview with applicant’s representative at (202) 860-5570 to discuss clarifying amendments to the claims to obviate any potential 35 U.S.C. 101 subject matter eligibility rejections.
Authorization for this examiner’s amendment was given by Matthew Karas, Registration No. 74,279, on 1/25/2022.
Please amend the claims, filed on 11/15/2021, as follows:
1.	(Currently Amended) A system for searching data, comprising:
at least one memory storing instructions; and
one or more processors that execute the instructions to perform operations comprising:
receiving a search request comprising a sample dataset and a vector similarity threshold of similarity between vectors;
in response to the received search request, performing:
identifying, using a data-profiling model comprising a machine learning model configured to compute statistical metrics descriptive of the sample dataset, a data schema of the sample dataset;
computing, using the data-profiling model, statistical metrics describing at least one statistical attribute of data within the sample dataset;
generating a sample data vector comprising the computed statistical metrics of the sample dataset and information describing the data schema of the sample dataset;
searching a data index comprising a plurality of stored data vectors corresponding to a plurality of reference datasets, the stored data vectors comprising statistical metrics of data within the reference datasets and information describing corresponding data schema of the reference datasets, wherein searching the data index comprises:
performing data schema comparisons between the data schema of the sample data vector and data schemas of the stored vectors; and
performing statistical metric comparisons between the computed statistical metrics of the sample data vector and statistical metrics of the stored vectors;
generating, based on both the data schema comparisons and the statistical metric comparisons, one or more similarity metrics of the sample dataset to individual ones of the reference datasets;
determining, based on the one or more similarity metrics, at least a portion of the reference datasets having at least one data vector satisfying the vector similarity threshold; and
returning, as a result of the received search request, the at least a portion of the reference datasets.
2.	(Currently Amended) The system of claim 1, the operations further comprising:
receiving a new reference dataset;
identifying a data schema of the new reference dataset;
generating a new reference data vector comprising statistical measures of the new reference dataset; and
updating the data index based on the new reference data vector.
13.	(Canceled).
19.	(Currently Amended) A method for searching data, the method comprising the following operations performed by one or more processors:
receiving a search request comprising a sample dataset and a vector similarity threshold of similarity between vectors;
in response to the received search request, performing:
identifying, using a data-profiling model comprising a machine learning model configured to compute statistical metrics descriptive of the sample dataset, a data schema of data within the sample dataset;
computing, using the data-profiling model, statistical metrics describing at least one statistical attribute of the sample dataset;
generating a sample data vector comprising the computed statistical metrics of the sample dataset and information describing the data schema of the sample dataset;
searching a data index comprising a plurality of stored data vectors corresponding to a plurality of reference datasets, the stored data vectors comprising statistical metrics of data within the corresponding reference datasets and information describing corresponding data schema of the reference datasets, wherein searching the data index comprises:
performing data schema comparisons between the data schema of the sample data vector and data schemas of the stored vectors; and
performing statistical metric comparisons between the computed statistical metrics of the sample data vector and statistical metrics of the stored vectors;
generating, based on both the data schema comparisons and the statistical metric comparisons, one or more similarity metrics of the sample dataset to individual ones of the reference datasets;
determining, based on the one or more similarity metrics, at least a portion of the reference datasets having at least one data vector satisfying the vector similarity threshold 
returning, as a result of the received search request, 
20.	(Currently Amended) A system for searching data, comprising:
at least one memory storing instructions; and
one or more processors that execute the instructions to perform operations comprising:
receiving, by an aggregator, a plurality of reference datasets;
identifying, by the aggregator, data schema corresponding to the reference datasets;
generating, by the aggregator, a plurality of stored data vectors corresponding to the reference datasets, the stored data vectors comprising statistical metrics of data within the reference datasets and information describing the corresponding data schema of the reference datasets;
generating, by the aggregator, a data index comprising the stored data vectors;
storing the data index in an aggregation database;
receiving, via an interface, a search request comprising a sample dataset and a vector similarity threshold of similarity between vectors;
in response to the received search request, performing:
identifying, using a data-profiling model comprising a machine learning model configured to compute statistical metrics descriptive of the sample dataset, a data schema of the sample dataset;
computing, using the data-profiling model, statistical metrics describing at least one statistical attribute of the sample dataset;
generating, using the data-profiling model, a sample data vector comprising the computed statistical metrics of the sample dataset and information describing the data schema of the sample dataset;
searching the data index based on the sample data vector, the searching comprising:
performing data schema comparisons between the data schema of the sample data vector and data schemas of the stored vectors; and
performing statistical metric comparisons between the computed statistical metrics of the sample data vector and statistical metrics of the stored vectors;
generating, based on both the data schema comparisons and the statistical metric comparisons, one or more similarity metrics of the sample dataset to individual ones of the reference datasets;
determining, based on the one or more similarity metrics, at least a portion of the reference datasets having at least one data vector satisfying the vector similarity threshold of the user input; and
returning, as a result of the received search request, 

Allowable Subject Matter
The following is an examiner’s statement of reasons for allowance:
The prior art of record does not teach the limitations of claim 1.  An updated search did not reveal any prior art that would anticipate or render obvious the invention as presented in the claim.  Specifically, the prior art does not teach:
“receiving a search request comprising a sample dataset and a vector similarity threshold of similarity between vectors;
in response to the received search request, performing:
identifying, using a data-profiling model comprising a machine learning model configured to compute statistical metrics descriptive of the sample dataset, a data schema of the sample dataset;
computing, using the data-profiling model, statistical metrics describing at least one statistical attribute of data within the sample dataset;
generating a sample data vector comprising the computed statistical metrics of the sample dataset and information describing the data schema of the sample dataset;
searching a data index comprising a plurality of stored data vectors corresponding to a plurality of reference datasets, the stored data vectors comprising statistical metrics of data within the reference datasets and information describing corresponding data schema of the reference datasets, wherein searching the data index comprises:
performing data schema comparisons between the data schema of the sample data vector and data schemas of the stored vectors; and
performing statistical metric comparisons between the computed statistical metrics of the sample data vector and statistical metrics of the stored vectors;
generating, based on both the data schema comparisons and the statistical metric comparisons, one or more similarity metrics of the sample dataset to individual ones of the reference datasets;
determining, based on the one or more similarity metrics, at least a portion of the reference datasets having at least one data vector satisfying the vector similarity threshold; and
returning, as a result of the received search request, the at least a portion of the reference datasets.”
Claim(s) 19 and 20 recite(s) features similar to those of claim 1 and is/are allowed for at least the same reasons.
The dependent claim(s), which depend directly or indirectly upon claim(s) 1, is/are also distinct from the prior art for at least the same reasons.
After further review of the results of the searches conducted and the claims most currently amended, the examiner is persuaded that the prior art does not teach the above described and highlighted major features in independent claim(s) 1, 19 and 20 and other recited features.
An updated search for prior art was conducted.  The prior art searched and examined does not fairly teach or suggest the limitations of the claimed subject matter.
The prior art of record neither anticipates nor renders obvious the recited combination.
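
For illustration, the sequence of operations recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not the applicant's implementation. Plain descriptive statistics (mean and population standard deviation) stand in for the claimed machine-learning data-profiling model, a dataset is assumed to be a mapping of column names to numeric values, and every name used (DataVector, profile, schema_similarity, metric_similarity, similarity, search) as well as the equal weighting of the two comparisons is hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, pstdev


@dataclass
class DataVector:
    """Hypothetical data vector: schema information plus statistical metrics."""
    schema: tuple   # sorted column names (stand-in for the data schema)
    metrics: dict   # column name -> (mean, population standard deviation)


def profile(dataset):
    """Stand-in for the data-profiling model: identify the schema, compute metrics."""
    schema = tuple(sorted(dataset))
    metrics = {col: (mean(dataset[col]), pstdev(dataset[col])) for col in schema}
    return DataVector(schema, metrics)


def schema_similarity(a, b):
    """Data schema comparison: Jaccard overlap of column names (assumed measure)."""
    sa, sb = set(a.schema), set(b.schema)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0


def metric_similarity(a, b):
    """Statistical metric comparison over shared columns (assumed measure)."""
    shared = set(a.schema) & set(b.schema)
    if not shared:
        return 0.0
    dist = sqrt(sum((x - y) ** 2
                    for col in shared
                    for x, y in zip(a.metrics[col], b.metrics[col])))
    return 1.0 / (1.0 + dist)


def similarity(a, b):
    """Similarity metric based on both comparisons (equal weights are an assumption)."""
    return 0.5 * schema_similarity(a, b) + 0.5 * metric_similarity(a, b)


def search(sample_dataset, data_index, threshold):
    """Return names of reference datasets whose stored vector satisfies the threshold."""
    sample_vector = profile(sample_dataset)
    return [name for name, stored in data_index.items()
            if similarity(sample_vector, stored) >= threshold]


# Build a small data index of stored data vectors from reference datasets.
reference_datasets = {
    "identical": {"price": [1, 2, 3], "qty": [10, 20, 30]},
    "partial": {"price": [1, 2, 3]},
    "unrelated": {"temp": [70, 71, 72]},
}
data_index = {name: profile(ds) for name, ds in reference_datasets.items()}

sample = {"price": [1, 2, 3], "qty": [10, 20, 30]}
strict_matches = search(sample, data_index, threshold=0.9)
loose_matches = search(sample, data_index, threshold=0.7)
```

In this sketch the search request carries both the sample dataset and the similarity threshold, and a reference dataset is returned only when its combined schema and metric similarity meets that threshold. Lowering the threshold from 0.9 to 0.7 admits the reference dataset that shares only one column with the sample.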

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Prior Art
Pertinent prior art was discovered, but it neither anticipates nor renders obvious the claimed subject matter.
Drevo et al., 2016/0132787, discloses generating vectors for data sets, but does not disclose generating and comparing data schema vectors.  Additionally, Drevo does not disclose the described and highlighted major features in independent claim 1.
Birdwell et al., 2010/0332474, discloses generating vectors for data schemas, but does not disclose generating and comparing data set vectors.  Additionally, Birdwell does not disclose the described and highlighted major features in independent claim 1.
Herz et al., 2009/0254971, discloses generating vectors for data sets, but does not disclose generating and comparing data schema vectors.  Additionally, Herz does not disclose the described and highlighted major features in independent claim 1.
Ellingsworth, 2007/0282824, discloses a schema to classify calculating a score and a vector, but does not disclose comparing data schema vectors or data set vectors.  Additionally, Ellingsworth does not disclose the described and highlighted major features in independent claim 1.
Rosengard, 2006/0155697, discloses a vector space created using a database schema, but does not disclose comparing data schema vectors or data set vectors.  Additionally, Rosengard does not disclose the described and highlighted major features in independent claim 1.
Parkinson, 2006/0053133, discloses creating feature vectors for tokens, but does not disclose comparing data schema vectors or data set vectors.  Additionally, Parkinson does not disclose the described and highlighted major features in independent claim 1.
Glaenzer et al., 2005/0278139, discloses using a vector to produce combined degrees of similarity between schemas, but does not disclose comparing data set vectors.  Additionally, Glaenzer does not disclose the described and highlighted major features in independent claim 1.
Ma et al., CA 2507309, discloses statistically identifying a result schema and calculating an estimated vector similarity, but does not disclose comparing data set vectors.  Additionally, Ma does not disclose the described and highlighted major features in independent claim 1.

Point of Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUBERT G CHEUNG whose telephone number is (571) 270-1396. The examiner can normally be reached M-R 8:00A-5:00P EST; alt. F 8:00A-4:00P EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached on (571) 270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



Examiner: Hubert Cheung
/Hubert Cheung/Assistant Examiner, Art Unit 2152
Date: July 5, 2022

/NEVEEN ABEL JALIL/Supervisory Patent Examiner, Art Unit 2152