Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Status of Claims
This office action is in response to communication filed on 02-13-2019. This application was filed 02-13-2019.
Claims 1, 9, and 16 have been amended by Examiner’s AMENDMENT; Claims 5-6, 12-13, and 19 have been cancelled by EXAMINER’S AMENDMENT.

Information Disclosure Statement
The information disclosure statement (IDS) filed on 03-14-2019 has been acknowledged. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

35 USC § 101 Analysis
Independent claims 1, 9, and 16, and dependent claims 2-4, 7-8, 10-11, 14-15, 17-18 and 20, are directed to a technical solution to a technical problem associated with employing a means for receiving data by a computing system (Applicant specification, paragraph 21). The solution applies a machine learning technique, such as spectral regularization, instantiated through the computing system’s processors and memory (Applicant specification, paragraphs 21, 69-70). The technique effects normalization of a generated matrix representing clusters of entities and features associated with the entities, including information in a reference dataset and a candidate dataset. The normalization includes the creation of a sparse matrix, as represented by an aggregated matrix which describes entities that are absent from the reference dataset (Applicant specification, paragraph 44). The claims further identify, from a subset of entities associated with a threshold distance from the cluster, a merit attribute of the candidate dataset, to reduce the training time for training a machine learning model, minimizing cost and time and thus improving the efficiency for achieving a desired result for a target application (Applicant specification, paragraph 22), specifically:
“receiving (i) a reference dataset identifying first entities associated with first features that include a baseline feature of a target population and (ii) a candidate dataset identifying second entities associated with second features;
identifying, in the candidate dataset, first unique candidate entities that are absent from the reference dataset and that are associated with the baseline feature in the candidate dataset; 
forming, in a multi-dimensional space and based on a subset of the first features lacking the baseline feature, a cluster of data points representing the first entities, wherein forming the cluster of data points representing the first entities comprises:
generating an aggregated matrix that identifies the first entities, the first features, the second entities, and the second features;
estimating missing values in the aggregated matrix using spectral regularization; and
mapping, based on the aggregated matrix, the first entities to the data points representing the first entities, wherein coordinates of each data point in the data points representing the first entities are determined based on corresponding values in the aggregated matrix; 
mapping a subset of the second entities that are absent from the reference dataset and that are not in the first unique candidate entities to additional data points, respectively in the multi-dimensional space, wherein mapping the subset of the second entities to the additional data points in the multi-dimensional space comprises mapping each second entity in the subset of the second entities to a respective additional data point in the multi-dimensional space based on corresponding values in the aggregated matrix; 
identifying, from the subset of the second entities, second unique candidate entities corresponding to a subset of the additional data points within a threshold distance of the cluster; 
determining a merit attribute of the candidate dataset based on a first weight for each first unique candidate entity, a second weight for each second unique candidate entity, a number of the first unique candidate entities in the candidate dataset, and a number of the second unique candidate entities in the candidate dataset; and
selecting the candidate dataset as input data for a target software application based on the merit attribute of the candidate dataset being greater than a threshold value”.
Thus, based on the aforementioned analysis, the claims are patent eligible.
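
The step quoted above, "estimating missing values in the aggregated matrix using spectral regularization", can be illustrated with a minimal sketch. The record does not state which spectral regularization algorithm is used. The sketch below assumes one common variant, iterative soft-thresholding of singular values (in the style of Soft-Impute), and uses NumPy. The function name soft_impute and the parameters lam, n_iters, and tol are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_impute(X, lam=0.1, n_iters=200, tol=1e-6):
    """Fill NaN entries of X by iterative singular-value soft-thresholding.

    Assumption: this is one possible spectral regularization scheme, not
    necessarily the one used in the application.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)          # mask of known entries
    Z = np.zeros_like(X)             # current low-rank estimate
    for _ in range(n_iters):
        # Keep observed entries; use the current estimate for missing ones.
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # shrink singular values toward zero
        Z_new = (U * s) @ Vt
        # Stop when the estimate changes very little between iterations.
        change = np.linalg.norm(Z_new - Z)
        Z = Z_new
        if change <= tol * max(np.linalg.norm(Z), 1.0):
            break
    # Observed values are returned unchanged; only the gaps are estimated.
    return np.where(observed, X, Z)


if __name__ == "__main__":
    # Rows stand in for entities, columns for features; NaN marks a gap.
    aggregated = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan]])
    print(soft_impute(aggregated, lam=0.01))
```

Each row of the completed matrix can then serve as the coordinates of one data point in the multi-dimensional space, consistent with the mapping steps quoted above.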

35 USC § 103 
Closest prior art of record, Modarresi (US 2017/0140023), Hohwald (US 2017/0286522) and Non-Patent Literature, NPL, Dinuzzo, Elsevier, 2013, pages 119–126, are withdrawn from consideration pursuant to Allowable Subject Matter.

Examiner’s Amendment
Authorization for this examiner’s amendment was given in an Examiner-Initiated Interview with Applicant’s representative, Bryan Gordy, initiated on 9 May 2022 and culminating in authorization of the Examiner’s Amendments provided on 14 June 2022.
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

--- Claims 1, 9, and 16 have been amended by Examiner’s AMENDMENT;
Claims 5-6, 12-13, and 19 have been cancelled by EXAMINER’S AMENDMENT as follows ---

AMENDMENT TO CLAIMS
1.	(Currently Amended) A method for applying machine-learning techniques to evaluate candidate datasets for use by software applications, the method comprising performing, by one or more processing devices, operations including:
receiving (i) a reference dataset identifying first entities associated with first features that include a baseline feature of a target population and (ii) a candidate dataset identifying second entities associated with second features;
identifying, in the candidate dataset, first unique candidate entities that are absent from the reference dataset and that are associated with the baseline feature in the candidate dataset; 
forming, in a multi-dimensional space and based on a subset of the first features lacking the baseline feature, a cluster of data points representing the first entities, wherein forming the cluster of data points representing the first entities comprises:
generating an aggregated matrix that identifies the first entities, the first features, the second entities, and the second features;
estimating missing values in the aggregated matrix using spectral regularization; and
mapping, based on the aggregated matrix, the first entities to the data points representing the first entities, wherein coordinates of each data point in the data points representing the first entities are determined based on corresponding values in the aggregated matrix; 
mapping a subset of the second entities that are absent from the reference dataset and that are not in the first unique candidate entities to additional data points, respectively in the multi-dimensional space, wherein mapping the subset of the second entities to the additional data points in the multi-dimensional space comprises mapping each second entity in the subset of the second entities to a respective additional data point in the multi-dimensional space based on corresponding values in the aggregated matrix; 
identifying, from the subset of the second entities, second unique candidate entities corresponding to a subset of the additional data points within a threshold distance of the cluster; 
determining a merit attribute of the candidate dataset based on a first weight for each first unique candidate entity, a second weight for each second unique candidate entity, a number of the first unique candidate entities in the candidate dataset, and a number of the second unique candidate entities in the candidate dataset; and
selecting the candidate dataset as input data for a target software application based on the merit attribute of the candidate dataset being greater than a threshold value.  

2.	(Original)  The method of claim 1, wherein identifying the second unique candidate entities comprises:
determining, in the multi-dimensional space, a centroid of the cluster of the data points representing the first entities;
determining, in the multi-dimensional space, an average reference distance between each data point in the data points representing the first entities and the centroid of the cluster; and
determining the threshold distance based on the average reference distance.  

3.	(Original)  The method of claim 2, wherein identifying the second unique candidate entities further comprises: 
determining, in the multi-dimensional space, a distance between a respective additional data point and the centroid of the cluster.  

4.	(Original)  The method of claim 3, wherein the distance between the respective additional data point and the centroid of the cluster includes a Pearson correlation distance, Euclidean distance, cosine distance, or Jaccard distance.  

5.	(Canceled) 
6.	(Canceled) 

7.	(Original)  The method of claim 1, wherein the operations further comprise determining, based on merit attributes of training datasets, the first weight and the second weight using linear regression.  

8.	(Original)  The method of claim 1, wherein determining the merit attribute of the candidate dataset comprises:
determining a weighted sum of the number of the first unique candidate entities in the candidate dataset and the number of the second unique candidate entities in the candidate dataset,
wherein each of the first unique candidate entities is associated with the first weight; and 
wherein each of the second unique candidate entities is associated with the second weight.  

9.	(Currently Amended)  A system comprising:
a processing device; and
a non-transitory computer-readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer-readable medium and thereby perform operations comprising:
receiving (i) a reference dataset identifying first entities associated with first features that include a baseline feature of a target population and (ii) a candidate dataset identifying second entities associated with second features;
identifying, in the candidate dataset, first unique candidate entities that are absent from the reference dataset and that are associated with the baseline feature in the candidate dataset; 
forming, in a multi-dimensional space and based on a subset of the first features lacking the baseline feature, a cluster of data points representing the first entities, wherein forming the cluster of data points representing the first entities comprises:
generating an aggregated matrix that identifies the first entities, the first features, the second entities, and the second features;
estimating missing values in the aggregated matrix using spectral regularization; and
mapping, based on the aggregated matrix, the first entities to the data points representing the first entities, wherein coordinates of each data point in the data points representing the first entities are determined based on corresponding values in the aggregated matrix; 
mapping a subset of the second entities that are absent from the reference dataset and that are not in the first unique candidate entities to additional data points, respectively in the multi-dimensional space, wherein mapping the subset of the second entities to the additional data points in the multi-dimensional space comprises mapping each second entity in the subset of the second entities to a respective additional data point in the multi-dimensional space based on corresponding values in the aggregated matrix; 
identifying, from the subset of the second entities, second unique candidate entities corresponding to a subset of the additional data points within a threshold distance of the cluster; 
determining a merit attribute of the candidate dataset based on a first weight for each first unique candidate entity, a second weight for each second unique candidate entity, a number of the first unique candidate entities in the candidate dataset, and a number of the second unique candidate entities in the candidate dataset; and
selecting the candidate dataset as input data for a target software application based on the merit attribute of the candidate dataset being greater than a threshold value.

10.	(Original)  The system of claim 9, wherein identifying the second unique candidate entities comprises:
determining, in the multi-dimensional space, a centroid of the cluster of the data points representing the first entities;
determining, in the multi-dimensional space, an average reference distance between each data point in the data points representing the first entities and the centroid of the cluster; 
determining the threshold distance based on the average reference distance; and
determining, in the multi-dimensional space, a distance between a respective additional data point and the centroid of the cluster.

11.	(Original)  The system of claim 10, wherein the distance between the respective additional data point and the centroid of the cluster includes a Pearson correlation distance, Euclidean distance, cosine distance, or Jaccard distance.

12.	(Canceled)  
13.	(Canceled) 
14.	(Original)  The system of claim 9, wherein the operations further comprise determining, based on merit attributes of training datasets, the first weight and the second weight using linear regression.

15.	(Original)  The system of claim 9, wherein determining the merit attribute of the candidate dataset comprises:
determining a weighted sum of the number of the first unique candidate entities in the candidate dataset and the number of the second unique candidate entities in the candidate dataset,
wherein each of the first unique candidate entities is associated with the first weight; and 
wherein each of the second unique candidate entities is associated with the second weight.  

16.	(Currently Amended)  A system comprising:
means for receiving (i) a reference dataset identifying first entities associated with first features that include a baseline feature of a target population and (ii) a candidate dataset identifying second entities associated with second features;
means for identifying, in the candidate dataset, first unique candidate entities that are absent from the reference dataset and that are associated with the baseline feature in the candidate dataset; 
means for forming, in a multi-dimensional space and based on a subset of the first features lacking the baseline feature, a cluster of data points representing the first entities, wherein forming the cluster of data points representing the first entities comprises:
generating an aggregated matrix that identifies the first entities, the first features, the second entities, and the second features;
estimating missing values in the aggregated matrix using spectral regularization; and
mapping, based on the aggregated matrix, the first entities to the data points representing the first entities, wherein coordinates of each data point in the data points representing the first entities are determined based on corresponding values in the aggregated matrix; 
means for mapping a subset of the second entities that are absent from the reference dataset and that are not in the first unique candidate entities to additional data points, respectively in the multi-dimensional space, wherein mapping the subset of the second entities to the additional data points in the multi-dimensional space comprises mapping each second entity in the subset of the second entities to a respective additional data point in the multi-dimensional space based on corresponding values in the aggregated matrix; 
means for identifying, from the subset of the second entities, second unique candidate entities corresponding to a subset of the additional data points within a threshold distance of the cluster; 
means for determining a merit attribute of the candidate dataset based on a first weight for each first unique candidate entity, a second weight for each second unique candidate entity, a number of the first unique candidate entities in the candidate dataset, and a number of the second unique candidate entities in the candidate dataset; and
means for selecting the candidate dataset as input data for a target software application based on the merit attribute of the candidate dataset being greater than a threshold value.

17.	(Original)  The system of claim 16, wherein the means for identifying the second unique candidate entities comprise:
means for determining, in the multi-dimensional space, a centroid of the cluster of the data points representing the first entities;
means for determining, in the multi-dimensional space, an average reference distance between each data point in the data points representing the first entities and the centroid of the cluster; 
means for determining the threshold distance based on the average reference distance; and
means for determining, in the multi-dimensional space, a distance between a respective additional data point and the centroid of the cluster.

18.	(Original)  The system of claim 17, wherein the distance between the respective additional data point and the centroid of the cluster includes a Pearson correlation distance, Euclidean distance, cosine distance, or Jaccard distance.

19.	(Canceled) 


20.	(Original)  The system of claim 16, wherein the means for determining the merit attribute of the candidate dataset comprise:
means for determining a weighted sum of the number of the first unique candidate entities in the candidate dataset and the number of the second unique candidate entities in the candidate dataset,
wherein each of the first unique candidate entities is associated with the first weight; and 
wherein each of the second unique candidate entities is associated with the second weight.
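
For illustration only, and not as part of the amended claims, the centroid-based threshold of claims 2-3 and the weighted-sum merit attribute of claim 8 can be sketched as follows. The sketch assumes Euclidean distance (one of the options listed in claim 4) and assumes the threshold equals a scale factor times the average reference distance; the claims say only that the threshold is "based on" that average. All function names, the scale parameter, and the example numbers are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of the reference data points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_distance(reference_points, scale=1.0):
    """Return (centroid, threshold); scale is an assumed tuning factor."""
    c = centroid(reference_points)
    avg = sum(euclidean(p, c) for p in reference_points) / len(reference_points)
    return c, scale * avg


def second_unique_candidates(candidate_points, reference_points, scale=1.0):
    """Names of candidate entities within the threshold distance of the cluster."""
    c, thr = threshold_distance(reference_points, scale)
    return [name for name, p in candidate_points.items() if euclidean(p, c) <= thr]


def merit_attribute(n_first, n_second, w_first, w_second):
    """Weighted sum of the two unique-candidate counts."""
    return w_first * n_first + w_second * n_second


def select_dataset(merit, threshold):
    """Select the candidate dataset when its merit exceeds the threshold value."""
    return merit > threshold


if __name__ == "__main__":
    reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    candidates = {"a": (1.0, 1.5), "b": (5.0, 5.0)}
    near = second_unique_candidates(candidates, reference)
    merit = merit_attribute(n_first=3, n_second=len(near), w_first=1.0, w_second=0.5)
    print(near, merit, select_dataset(merit, threshold=3.0))
```

In this sketch the weights are supplied directly; claim 7 recites that they may instead be determined by linear regression over the merit attributes of training datasets.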

Allowable Subject Matter
Claims 1-4, 7-11, 14-18 and 20 are allowed.

The following is an examiner’s statement of reasons for allowance:
While the closest prior art of record, Modarresi (US 2017/0140023), Hohwald (US 2017/0286522) and Non-Patent Literature, NPL, Dinuzzo, Elsevier, 2013, pages 119–126, discloses the handling of missing data and the grouping of documents through spectral clustering, it does not teach: 
“receiving (i) a reference dataset identifying first entities associated with first features that include a baseline feature of a target population and (ii) a candidate dataset identifying second entities associated with second features;
identifying, in the candidate dataset, first unique candidate entities that are absent from the reference dataset and that are associated with the baseline feature in the candidate dataset; 
forming, in a multi-dimensional space and based on a subset of the first features lacking the baseline feature, a cluster of data points representing the first entities, wherein forming the cluster of data points representing the first entities comprises:
generating an aggregated matrix that identifies the first entities, the first features, the second entities, and the second features;
estimating missing values in the aggregated matrix using spectral regularization; and
mapping, based on the aggregated matrix, the first entities to the data points representing the first entities, wherein coordinates of each data point in the data points representing the first entities are determined based on corresponding values in the aggregated matrix; 
mapping a subset of the second entities that are absent from the reference dataset and that are not in the first unique candidate entities to additional data points, respectively in the multi-dimensional space, wherein mapping the subset of the second entities to the additional data points in the multi-dimensional space comprises mapping each second entity in the subset of the second entities to a respective additional data point in the multi-dimensional space based on corresponding values in the aggregated matrix; 
identifying, from the subset of the second entities, second unique candidate entities corresponding to a subset of the additional data points within a threshold distance of the cluster; 
determining a merit attribute of the candidate dataset based on a first weight for each first unique candidate entity, a second weight for each second unique candidate entity, a number of the first unique candidate entities in the candidate dataset, and a number of the second unique candidate entities in the candidate dataset; and
selecting the candidate dataset as input data for a target software application based on the merit attribute of the candidate dataset being greater than a threshold value”, in the context of the claim when considered as a whole.  These uniquely distinct features render claim 1 allowable. 
Independent claims 9 and 16 recite corresponding features and are allowable based on the same rationale as claim 1. Dependent claims 2-4, 7-8, 10-11, 14-15, 17-18 and 20 are allowable based on the same rationale as the claims from which they depend.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”


Conclusion
The prior art made of record and NOT relied upon is considered pertinent to applicant's disclosure including information well-known to one of ordinary skill in the art:

[Image: prior art made of record; not reproduced here]

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL EZEWOKO whose telephone number is (571)272-7850.  The examiner can normally be reached on Monday - Thursday.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Waseem Ashraf, can be reached on (571) 270-3948.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHAEL I EZEWOKO/Examiner, Art Unit 3682