DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is a Non-Final Office Action in response to the communication filed on November 27, 2019.
Claims 1-20 have been examined.


Drawings
The drawings filed on November 27, 2019 are acceptable for examination proceedings.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2 and 4-20 are rejected under 35 U.S.C. 103 as being unpatentable over Truong et al. (U.S. Patent No. 10,460,235 B1, hereinafter “Truong”) in view of Unger et al. (Canadian Patent Application Publication No. CA 3033438 A1, hereinafter “Unger”).
	
Regarding claim 1, Truong discloses “A method for generating a synthetic dataset, the method comprising” (Col 2: 52-53: Systems and Methods for synthetic data generation):
“generating discretized synthetic data based on driving a model of a [cumulative distribution function (CDF)] with random numbers, wherein the [CDF] is based on a source dataset” (Col 11:34-41, generates data model);
“generating the synthetic dataset from the discretized synthetic data, comprising [i.e., use dictionary to replace sensitive data, see Spec, Para 0039]” (Col 17: 47-60, replace sensitive data with synthetic portion):
“selecting, for inclusion into the synthetic dataset, values from a plurality of entries of the source dataset, based on the discretized synthetic data” (Col 13:16-21, obtains data from random number generator); 
“and providing the synthetic dataset to a downstream application that is configured to operate on the source dataset” (Col 19:32-40).
Truong teaches generation of a data model but does not explicitly teach generation of a cumulative distribution function (CDF) model.
However, generation of a CDF model is taught by Unger (see Unger, paragraph 8).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to apply Unger's teaching of generating a cumulative distribution function (CDF) model to the system of Truong, yielding a system in which “mapping the cumulative distribution function into the median-relative space” can be achieved; a person of ordinary skill in the art would have been motivated to make this combination in order to “minimize mean-squared error of the function” (Unger, paragraph 8).

Regarding claim 2, in view of claim 1, Truong in view of Unger disclose “wherein the model of the CDF is a regression model implemented by a neural network” (Truong, Col 31: 53-63, the models are regression models; and Unger, Para 0008, discloses a CDF).

Regarding claim 4, in view of claim 2, Truong in view of Unger disclose “wherein the discretized synthetic data comprise indices that identify bins of the CDF, and wherein the regression model outputs continuous numbers and the continuous numbers are binned to the indices that identify the bins” (Unger, Para 0072).

Regarding claim 5, in view of claim 1, Truong in view of Unger disclose “wherein generating the synthetic dataset from the discretized synthetic data further comprises resampling the values of the source dataset” (Truong, Col 20: 40-66).

Regarding claim 6, in view of claim 1, Truong in view of Unger disclose “wherein identifying the values from the plurality of entries in the source dataset comprises translating, based on a dictionary, values in bins of the [CDF] to the values in the source dataset” (Unger, Para 0072).

Regarding claim 7, in view of claim 1, Truong discloses “wherein the values of the source dataset are of a plurality of data fields, the plurality of data fields each corresponding to one of user identification information, user financial information, and user demographics” (Truong, Col 9: 1-20).

Regarding claim 8, in view of claim 1, Truong discloses “wherein the values of the source dataset comprise at least one selected from the group consisting of continuous data and categorical data” (Truong, Col 21: 52-60, categorical data).

Regarding claim 9, in view of claim 1, Truong in view of Unger disclose “further comprising, prior to generating the discretized synthetic data:”
“obtaining the source dataset, the source dataset comprising the plurality of entries for a plurality of data fields” (Col 16: 55-65, obtains dataset from database 105);  
“generating a dictionary for each of the plurality of data fields, wherein each dictionary establishes a mapping between the entries of the data field associated with the dictionary and corresponding dictionary values” (Col 17: 47-60, replace sensitive data with synthetic portion); 
“generating the discretized dataset from the source dataset by replacing the plurality of entries for the plurality of data fields by the corresponding dictionary values” (Col 17: 47-60, replace sensitive data with synthetic portion);
“generating the [CDF] based on the discretized dataset; and training the model of the [CDF]” (Col 33: 11-37, uses neural network to predict and generate synthetic data and initializes with random sequence of characters; and Col 23: 57-67 to Col 24: 1-17, discloses use of bins).
Truong teaches generation of a data model but does not explicitly teach generation of a cumulative distribution function (CDF) model.
However, generation of a CDF model is taught by Unger (see Unger, paragraph 8).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to apply Unger's teaching of generating a cumulative distribution function (CDF) model to the system of Truong, yielding a system in which “mapping the cumulative distribution function into the median-relative space” can be achieved; a person of ordinary skill in the art would have been motivated to make this combination in order to “minimize mean-squared error of the function” (Unger, paragraph 8).

Regarding claim 10, in view of claim 9, Truong in view of Unger disclose “wherein generating the CDF comprises establishing bins of the CDF based on a combination of the plurality of data fields” (Unger, Para 0072).

Regarding claim 11, in view of claim 9, Truong in view of Unger disclose “wherein generating the model of the CDF comprises training a neural network that approximates the CDF when provided with the random numbers, wherein the random numbers are distributed uniformly” (Unger, Paras 0008 and 0023).

Regarding claim 12, Truong discloses “A method for securely driving a downstream application, the method comprising” (Col 2: 52-53: Systems and Methods for synthetic data generation):  
“obtaining a source dataset for driving the downstream application” (Col 16: 55-65, obtains dataset from database 105);
“generating a discretized dataset from the source dataset [i.e., use dictionary to replace sensitive data, see Spec, Para 0039]” (Col 17: 47-60, replace sensitive data with synthetic portion);  
[generating a cumulative distribution function (CDF) for the discretized dataset; establishing a model of the CDF]; 
“obtaining random numbers” (Col 13:16-21, obtains data from random number generator);
“generating discretized synthetic data by driving the model of the [CDF] with the random numbers, wherein the discretized synthetic data comprises indices identifying bins of the [CDF]” (Col 33: 11-37, uses neural network to predict and generate synthetic data and initializes with random sequence of characters; and Col 23: 57-67 to Col 24: 1-17, discloses use of bins);  
“generating a synthetic dataset by selecting, for the synthetic dataset, values from a plurality of entries in the source dataset, based on the bins” (Col 33: 11-37, uses neural network to predict and generate synthetic data; and Col 23: 57-67 to Col 24: 1-17, discloses use of bins);
“and providing the synthetic dataset to the downstream application as a substitute for the source dataset” (Col 19:32-40).
Truong teaches generation of a data model but does not explicitly teach generation of a cumulative distribution function (CDF) model.
However, generation of a CDF model is taught by Unger (see Unger, paragraph 8).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to apply Unger's teaching of generating a cumulative distribution function (CDF) model to the system of Truong, yielding a system in which “mapping the cumulative distribution function into the median-relative space” can be achieved; a person of ordinary skill in the art would have been motivated to make this combination in order to “minimize mean-squared error of the function” (Unger, paragraph 8).

Regarding claim 13, in view of claim 12, Truong discloses “wherein the downstream application comprises an algorithm being trained, using the synthetic dataset, and wherein, after the training, the downstream application operates on non-synthetic data” (Truong, Col 30: 8-20, actual data is used).

Regarding claim 14, in view of claim 12, Truong discloses “wherein the downstream application is a financial software application” (Truong, Col 36: 9-31).

Regarding claim 15, in view of claim 12, Truong discloses “wherein the source dataset comprises at least one selected from a group consisting of user identification information, user financial information, and user demographics” (Truong, Col 9: 1-20).

Regarding claim 16, Truong discloses “A system for generating a synthetic dataset, the system comprising” (Col 2: 52-53: Systems and Methods for synthetic data generation): 
“a random number source configured to generate random numbers with a uniform distribution” (Col 11:34-41, a random source is used to map a data model); 
“a data repository storing a source dataset” (Col 10: 16-19, database 105 stores data);
“and a computer processor configured to execute instructions to perform” (Col 4: 10-15, a processor generates synthetic data):
“obtaining the source dataset” (Col 16: 55-65, obtains dataset from database 105); 
“generating a discretized dataset from the source dataset [i.e., use dictionary to replace sensitive data, see Spec, Para 0039]” (Col 17: 47-60, replace sensitive data with synthetic portion); 
[generating a cumulative distribution function (CDF) for the discretized dataset]; 
“establishing a model of the [CDF]” (Col 11:34-41, generates data model); 
“obtaining the random numbers from the random number generator” (Col 13:16-21, obtains data from random number generator); 
“generating discretized synthetic data by driving the model of the [CDF] with the random numbers, wherein the discretized synthetic data comprises indices identifying bins of the [CDF]” (Col 33: 11-37, uses neural network to predict and generate synthetic data and initializes with random sequence of characters; and Col 23: 57-67 to Col 24: 1-17, discloses use of bins); 
“and generating the synthetic dataset by selecting, for the synthetic dataset, values from a plurality of entries in the source dataset, based on the bins” (Col 33: 11-37, uses neural network to predict and generate synthetic data; and Col 23: 57-67 to Col 24: 1-17, discloses use of bins).
Truong teaches generation of a data model but does not explicitly teach generation of a cumulative distribution function (CDF) model.
However, generation of a CDF model is taught by Unger (see Unger, paragraph 8).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to apply Unger's teaching of generating a cumulative distribution function (CDF) model to the system of Truong, yielding a system in which “mapping the cumulative distribution function into the median-relative space” can be achieved; a person of ordinary skill in the art would have been motivated to make this combination in order to “minimize mean-squared error of the function” (Unger, paragraph 8).

Regarding claim 17, in view of claim 16, Truong discloses “further comprising a downstream application configured to operate on the source dataset, wherein the processor is further configured to providing the synthetic dataset to the downstream application, and wherein the downstream application operates on the synthetic dataset providing a substitute for the source dataset” (Col 20: 40-50, determines whether the current sample is sensitive data).

Regarding claim 18, in view of claim 16, Truong in view of Unger disclose “wherein the model of the CDF is a regression model” (Truong, Col 31: 53-63, the models are regression models; and Unger, Para 0008, discloses a CDF).

Regarding claim 19, in view of claim 18, Truong discloses “wherein the regression model is implemented by a neural network” (Col 21: 52-60, categorical data).

Regarding claim 20, in view of claim 16, Truong discloses “wherein the source dataset comprises at least one selected from a group consisting of continuous data and categorical data” (Col 21: 52-60, categorical data).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Truong in view of Unger and further in view of Marshall et al. (US 2021/0000442 A1, hereinafter “Marshall”).
Regarding claim 3, in view of claim 2, Truong discloses generation of a data model.
Unger discloses a CDF (Para 0008).
Neither Truong nor Unger explicitly teaches that the “regression model operates on the random numbers that are uniformly distributed in the range between 0 and 1”.
However, Marshall discloses “wherein the regression model operates on the random numbers that are uniformly distributed in the range between 0 and 1” (Marshall, Para 0171).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to apply Marshall's teaching that “the regression model operates on the random numbers that are uniformly distributed in the range between 0 and 1” to the system of Truong and Unger, yielding a system in which “oversampling was used to increase the sample size”; a person of ordinary skill in the art would have been motivated to make this combination in order to “correct for this bias and corroborate the accuracy of the model” (Marshall, Para 0171).

Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Szeto et al. (US 2018/0018590 A1) discloses “… systems include many private data servers, each having local private data. Researchers can request that relevant private data servers train implementations of machine learning algorithms on their local private data without requiring de-identification of the private data or without exposing the private data to unauthorized computing systems. The private data servers also generate synthetic or proxy data according to the data distributions of the actual data. The servers then use the proxy data to train proxy Models” (Abstract).

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDULLAH ALMAMUN whose telephone number is (571) 270-3392. The examiner can normally be reached on 8 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lynn Feild can be reached on (571) 272-2092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ABDULLAH ALMAMUN/Examiner, Art Unit 2431
/LYNN D FEILD/Supervisory Patent Examiner, Art Unit 2431