DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This office action is in response to the amendment filed on 07/21/2021. After the examiner’s amendment shown below, claims 1, 8, 13 and 18 are independent. Claims 2, 9, 16-17 and 19 are cancelled. Claims 1, 3, 8, 10, 13 and 18 are amended. Thus, claims 1, 3-8, 10-15, 18 and 20-23 are pending and being considered. Furthermore, the
claim rejection(s) under 35 U.S.C. § 101 has been waived and/or withdrawn.
claim rejection(s) under 35 U.S.C. § 112(b) has been waived and/or withdrawn.
substitute (clean and marked-up version) specification, filed on 7/21/2021, has been reviewed and accepted.

Examiner’s Amendment
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in a telephone interview with the applicant’s representative, Mr. J. Charles Dougherty (Reg. No. 41,715), on 09/03/2021. The summary of the interview is attached.

Amendments to the Claims
The application has been amended as follows:
1. (Currently Amended) An apparatus for determining data leakage in data files with mixed data, the apparatus comprising: 
a server configured to receive a wild file, wherein the wild file comprises a plurality of wild file records each comprising a plurality of wild file fields containing mixed data representing multiple measurement scales comprising at least one categorical scale and at least one numeric scale; 
at least one data store cache in communication with the server, wherein the data store cache comprises a data owner dataset comprising a plurality of data owner dataset records each comprising a plurality of data owner dataset fields containing mixed data representing multiple measurement scales comprising at least one categorical scale and at least one numeric scale; 
a subset selection subroutine implemented on the server, wherein the subset selection subroutine is configured to receive a data owner dataset and a date-adjusted wild file and create a data owner subset and a wild file subset, wherein the data owner subset and the wild file subset contain a reduced number of records compared to the data owner dataset and the wild file, respectively; 
a PCAmix data analysis subroutine implemented on the server, wherein the PCAmix data analysis subroutine is configured to receive the data owner subset and the wild file subset, divide the wild file dataset into a horizontally partitioned dataset comprising a numeric matrix and a categorical matrix, and produce a set of data owner 
a score generation subroutine implemented on the server, wherein the score generation subroutine is configured to receive the data owner subset eigenvalues, the data owner subset eigenvectors, the wild file subset eigenvalues, and the wild file subset eigenvectors to produce a similarity score indicative of the likelihood that the wild file was derived from the data owner dataset; and
a file date determination subroutine implemented on the server, wherein the file date determination subroutine is configured to analyze the wild file and adjust dates in the wild file to account for a passage of time since the wild file was leaked to produce the date-adjusted wild file. 

2. (Cancelled) 

3. (Currently Amended) The apparatus of claim [[2]]1, further comprising a record matching subroutine implemented on the server, the record matching subroutine configured to receive the date-adjusted wild file and the data owner dataset and produce a set of matched records between the date-adjusted wild file and the data owner dataset. 

4. (Previously Presented) The apparatus of claim 1, wherein the PCAmix data analysis subroutine is further configured to build a first diagonal matrix constructed from row 

5. (Original) The apparatus of claim 4, wherein the PCAmix data analysis subroutine is further configured to perform a generalized singular value decomposition on the numerical data matrix using metrics from the first and second diagonal matrices to produce the data set owner subset eigenvectors and the wild file subset eigenvectors. 

6. (Original) The apparatus of claim 5, wherein the PCAmix data analysis subroutine is further configured to eliminate all of the data set owner subset eigenvectors and wild file subset eigenvectors other than those that account for a significant portion of variance. 

7. (Original) The apparatus of claim 6, wherein the PCAmix data analysis subroutine is further configured to eliminate all of the data set owner subset eigenvectors and wild file subset eigenvectors other than those that account for at least ten percent of variance. 

8. (Currently Amended) An apparatus for creating a fingerprint for a data file, the apparatus comprising: 
a server; 
a data store cache in communication with the server, wherein the data store cache comprises a data owner dataset comprising a plurality of data owner dataset records each comprising a plurality of data owner dataset fields containing mixed data representing multiple measurement scales comprising at least one categorical scale and at least one numeric scale; 

a PCAmix data analysis subroutine implemented on the server, wherein the PCAmix data analysis subroutine is configured to receive the data owner subset, divide the data owner subset into a horizontally partitioned dataset comprising a numerical data matrix and a categorical matrix, and produce from the horizontally partitioned dataset a set of data owner subset eigenvalues and a set of data owner subset eigenvectors, wherein the PCAmix data analysis subroutine is further configured to build a first diagonal matrix constructed from a set of row weights from the numerical data matrix, and a second diagonal matrix constructed from a set of column weights from the numerical data matrix.

9. (Cancelled) 

10. (Currently Amended) The apparatus of claim [[9]]8, wherein the PCAmix data analysis subroutine is further configured to perform a generalized singular value decomposition on the numerical data matrix using a set of metrics from the first and second diagonal matrices to produce the data set owner subset eigenvectors. 

11. (Original) The apparatus of claim 10, wherein the PCAmix data analysis subroutine is further configured to eliminate all of the data set owner subset eigenvectors other than those that account for a significant portion of variance. 

12. (Original) The apparatus of claim 11, wherein the PCAmix data analysis subroutine is further configured to eliminate all of the data set owner subset eigenvectors other than those that account for at least ten percent of variance. 

13. (Currently Amended) A method for fingerprinting a data owner dataset using a server, wherein the dataset is stored on a data store cache in communication with the server and the dataset comprises a plurality of records each comprising a plurality of fields, the method comprising the steps of:
Selecting, at the server, a subset of the records from the dataset on the data store cache to produce a data owner subset comprising a plurality of data owner dataset records each comprising a plurality of data owner dataset fields containing mixed data representing multiple measurement scales comprising at least one categorical scale and at least one numeric scale; 
Applying, at the server, principal components analysis to the data owner subset to divide the data owner dataset into a horizontally partitioned dataset comprising a numeric matrix and a categorical matrix to produce a matrix of data owner subset eigenvalues and a matrix of data owner subset eigenvectors, wherein the step of applying principal components analysis to the data owner subset further comprises the step(s) of 
removing, at the server, all of the data owner subset eigenvectors from the matrix of data owner subset eigenvectors other than those that account for a significant portion of variance, and/or 
removing, at the server, all of the data owner subset eigenvectors from the matrix of data owner subset eigenvectors other than those that account for at least ten percent of variance; and
analyzing, at the server, the matrix of data owner subset eigenvectors to produce a set of scores that define observational values on the data owner subset; and 
storing the set of scores at the server. 

14. (Previously Presented) The method of claim 13, wherein the step of applying principal components analysis to the data owner subset further comprises the step of building at the server a numerical data matrix, a first diagonal matrix constructed from a set of row weights from the numerical data matrix, and a second diagonal matrix constructed from a set of column weights from the numerical data matrix. 

15. (Previously Presented) The method of claim 14, wherein the step of applying principal components analysis to the data owner subset further comprises the step of performing at the server a generalized singular value decomposition on the numerical data matrix using a set of metrics from the first and second diagonal matrices to produce the matrix of data owner subset eigenvectors. 

16. (Cancelled) 

17. (Cancelled) 

18. (Currently Amended) A method for determining if a wild file is derived from a data owner dataset, wherein the data owner dataset is stored on a data store cache in communication with a server, and wherein each of the data owner dataset and the wild file comprise a plurality of records each comprising a plurality of fields containing mixed data representing multiple measurement scales comprising at least one categorical scale and at least one numeric scale, the method comprising the steps of, at the server: 
extracting a subset of the records from the data owner dataset to produce a data owner subset; 
extracting a subset of the records from the wild file corresponding to the records in the data owner subset to produce a wild file subset, wherein
determining a file date for the wild file prior to the step of extracting the subset of the records from the wild file corresponding to the records in the data owner subset;
dividing the data owner dataset into a horizontally partitioned data owner dataset comprising a data owner numeric matrix and a data owner categorical matrix; 

dividing the wild file subset into a wild file horizontally partitioned subset comprising a wild file numeric matrix and a wild file categorical matrix; 
applying principal components analysis to the wild file horizontally partitioned subset to produce a set of wild file subset eigenvalues and a matrix of wild file subset eigenvectors; and 
analyzing each data owner subset eigenvector relative to a corresponding wild file subset eigenvector to produce a similarity score; and 
storing the similarity score at the server. 

19. (Cancelled) 

20. (Previously Presented) The method of claim 18, wherein the step of applying principal components analysis to the data owner horizontally partitioned subset further comprises the step of, at the server, building a first diagonal matrix constructed from row weights from the numerical data matrix, and a second diagonal matrix constructed from column weights from the numerical data matrix. 

21. (Previously Presented) The method of claim 20, wherein the step of applying principal components analysis to the data owner horizontally partitioned subset further comprises the step of, at the server, performing a generalized singular value 

22. (Previously Presented) The method of claim 21, wherein the step of applying principal components analysis to the data owner horizontally partitioned subset further comprises the step of, at the server, eliminating all data owner subset eigenvectors from the matrix of data owner subset eigenvectors and all wild file subset eigenvectors from the matrix of wild file subset eigenvectors other than those that account for a significant portion of variance. 

23. (Previously Presented) The method of claim 21, wherein the step of applying principal components analysis to the data owner subset further comprises the step of, at the server, eliminating all data owner subset eigenvectors from the matrix of data owner subset eigenvectors and all wild file subset eigenvectors from the matrix of wild file subset eigenvectors other than those that account for at least ten percent of variance. 

Allowable Subject Matter
The following is an examiner’s statement of reasons for allowance: 
After further search and consideration, claims 1, 3-8, 10-15, 18 and 20-23 are allowed over the cited prior art of record. 
The following prior art references disclose the general subject matter recited in independent claims 1, 8, 13 and 18, both before and after the current amendment.
A.	(Sparse Signal Decomposition Techniques for Multimedia Fingerprinting; Dated: 2011), this paper presents watermark fingerprinting as a technique for digital media copyright control that aims at traitor tracing to prevent resource leakage. Specifically, the paper presents a digital watermark fingerprinting technique for an image/video file based on PCA and wavelets. PCA is a way of identifying patterns in data and expressing the data so as to highlight their similarities and differences, in other words, to remove redundancy from the data. The patterns in the data can be considered components of the data. For instance, an eigenvector/eigenvalue pair derived by PCA represents one component of the data. Thus, the number of eigenvector/eigenvalue pairs determines the number of components (or patterns) the data has. Since patterns can be hard to find in high-dimensional data, where the luxury of graphical representation is not available, PCA is a powerful tool for analyzing data. The other main advantage of PCA is that once these patterns are found, the data can be compressed by reducing the number of dimensions (in other words, by eliminating the negligible eigenvectors/eigenvalues) without much loss of information. From the linear algebra point of view, the procedure of PCA is to find a matrix so that the original data can be rotated to a new set of coordinates, and these coordinates highlight the correlations among the data as much as possible. The row vectors in the found matrix are called eigenvectors.

B.	Sharma; Ravi K. (US 20130064419 A1), discloses that the content fingerprints and watermarks are combined in various ways for content 

C.	Miles et al. (US 2014/0258352 A1), discloses a technique for identifying a set of parameters representative of a data set. An eigen decomposition of a covariance matrix is calculated to form a decomposed matrix and an eigenvalue vector. The covariance matrix is calculated for a matrix of data including a plurality of data values for each of a plurality of parameters. The decomposed matrix includes a number of eigenvectors equal to a number of the plurality of parameters with each eigenvector including a coefficient for each parameter. The eigenvalue vector includes an eigenvalue defined for each eigenvector. A first matrix is created by rank ordering the coefficient within each parameter of the plurality of parameters for each of the plurality of parameters. A score is determined for each parameter using the created first matrix and the 

D.	Ernande F. Melo (A Fingerprint-based access controlling using principal component analysis and edge detection; Dated: 2015), this paper presents a novel approach for deciding whether or not an acquired fingerprint image belongs in a given database. The process begins with the assembly of a training base in an image space constructed by combining Principal Component Analysis (PCA) and edge detection. The paper then introduces the parameter value (H), a new feature that helps in deciding the relevance of a fingerprint image to a database. It also introduces a basic version of an algorithm for fingerprint classification (FPC), which takes as preliminary input a database of fingerprint images (Db_img). A test fingerprint image (IX) is then entered, and the algorithm returns whether or not the test image is in the stored fingerprint bank. The algorithm builds the image space using PCA, a technique that, by extracting the eigenvectors and eigenvalues of the covariance matrix of the images, creates a reduced image space containing the “main components” of the fingerprint images for subsequent recognition.

E.	Chae, Seung Hoon (WO 2012/138004 A1), the present invention relates to […] a principal component analysis (PCA)-based fingerprint-matching unit for performing a fingerprint-matching operation through a PCA method on the basis of the feature extracted from the image of the fingerprint of the user and the registered fingerprint image, such as, the fingerprint matching method based on 

F.	Nandy et al. (US 20150341376 A1), discloses performing an iterative principal component analysis on the collected network data flow 116 to detect an anomaly associated with the collected network data flow 116. The various steps (operations) for the PCA operation are: Operation [1] includes directing a server (not depicted) to generate (create) a zero-mean traffic matrix (with mean zero for all the columns) from the [m×n] input network traffic matrix. Operational control is passed over to operation [2] that includes directing the server to generate (create) a covariance matrix of the zero-mean traffic matrix that was generated in operation [1]. Operational control is passed over to operation [3] that includes directing the server to calculate the eigenvalues and eigenvectors of the covariance matrix that was generated in operation [2]. Operational control is passed over to operation [4] that includes directing the server to sort the eigenvalues and select the first [k] largest eigenvalues and consider the corresponding eigenvectors to be principal components. Operational control is 

G.	Aguayo Gonzalez; Carlos R. et al. (US 20150317475 A1), discloses that a well-known approach to determine the appropriate W that optimizes the entropy (or information) in the traces is Principal Component Analysis (PCA). We assume that the covariance matrices of the different classes, C_i, are normally distributed and identical, C_i = C. Hence, the eigenvectors can be considered as the information bearers for the traces under consideration. Some of these vectors carry more discriminatory information in the classification sense than others, which can be safely eliminated without much performance penalty. It 

H.	Ochs, Michael F. et al. (US 20040111220 A1), discloses the process of decomposing complex data. Such as the inputted/received data is converted into the data format used by the computing system, for example, unformatted data on a Digital UNIX workstation or an ASCII file. Principal component analysis (PCA) is then applied to the dataset (step 200). FIG. 3 shows the application of PCA to a dataset representing multiple measurements. The input data 210 is identical to the original data 100. PCA calculates by standard mathematical methods the covariance matrix and determines its eigenvalues (step 220). It then orders these eigenvalues by how much of the total variance in the data they explain, from greatest proportion to smallest (step 230). The eigenvectors corresponding to these eigenvalues are determined by standard mathematical methods and their scores (the percentage of each data series, or row of the data matrix, which they explain) are determined by projection onto the data (step 240). Any data series which shows artifacts or is an outlier in the view of the operator is removed from the dataset (step 250). In addition, if insignificant data is discovered, (i.e., data series which only add noise to the data) such data can be removed if the operator so desires (steps 260, 270). The data without the artifacts, outliers, and 

I.	See the other cited prior arts.
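
For illustration only, the generic PCA procedure described in several of the references above (mean-centering the data, forming the covariance matrix, computing its eigenvalues and eigenvectors, ordering them by explained variance, and discarding negligible components) can be sketched as follows. This is a minimal sketch using NumPy and is not taken from any cited reference or from the instant application; the function name pca_reduce, the parameter min_variance_share, and its default of 0.10 are assumptions chosen for the example.

```python
import numpy as np


def pca_reduce(data, min_variance_share=0.10):
    """Illustrative PCA: keep components explaining at least a given share of variance.

    `pca_reduce` and `min_variance_share` are hypothetical names for this sketch.
    Rows of `data` are observations; columns are numeric variables.
    """
    data = np.asarray(data, dtype=float)

    # Step 1: zero-mean every column.
    centered = data - data.mean(axis=0)

    # Step 2: covariance matrix of the centered data (variables in columns).
    cov = np.cov(centered, rowvar=False)

    # Step 3: eigen-decomposition; eigh is appropriate for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Step 4: order components from largest to smallest eigenvalue.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Step 5: drop components whose share of total variance is below the threshold.
    share = eigenvalues / eigenvalues.sum()
    keep = share >= min_variance_share

    # Step 6: scores are the projections of the centered data onto the kept eigenvectors.
    scores = centered @ eigenvectors[:, keep]
    return eigenvalues[keep], eigenvectors[:, keep], scores


# Small example: the third column is constant, so it carries no variance.
sample = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [-2.0, 0.0, 1.0], [0.0, -2.0, 1.0]]
values, vectors, scores = pca_reduce(sample)
```

In this example the constant third column contributes no variance, so only two components are retained. The sketch handles numeric columns only; it does not address mixed categorical and numeric data.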

However, the above prior art of record, including the rest of the cited prior art, whether taken alone or in combination, neither anticipates nor renders obvious the claimed subject matter of the instant application, taken as a whole, as recited in independent claims 1, 8, 13 and 18. 

For this reason, the specific claim limitations recited in independent claims 1, 8, 13 and 18, taken as a whole, are allowed. 

The dependent claims 3-7, 10-12, 14-15 and 20-23 which are dependent on the above independent claim(s) being further limiting to the independent claims, definite and enabled by the specification are also allowed.

Furthermore, the applicant’s replies make evident the reasons for allowance, satisfying the “record as a whole” proviso of 37 CFR 1.104(e). The grounds of claim rejection were reconsidered and withdrawn based on the substance of applicant’s amendments, remarks and arguments (see arguments/remarks, filed on 07/21/2021, pages 12-19); as such, the reasons for allowance are in all probability evident from the record.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALI CHEEMA, whose contact number is 571-272-1239. The examiner can normally be reached on Mon-Fri: 8AM – 4PM. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jeffrey Pwu, can be reached on 571-272-6798. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO 

/ALI CHEEMA/
Examiner, Art Unit 2433	

/SAMSON B LEMMA/
Primary Examiner, Art Unit 2498