Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 7-9, 11, 13-15, 19 and 21 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US Pat. Pub. No. 2018/0018590 to Szeto et al. (hereinafter Szeto).
Per claim 1, Szeto discloses a process (fig. 5…generation of proxy data to preserve privacy of individual, e.g., patient, data) for evolving candidate individuals (fig. 2:260 and ¶74…proxy data is synthetic data that has been ‘evolved’ from actual individual/patient data: “Proxy data 260 can be considered synthetic data randomly generated, in some cases deterministically generated, that retains the learnable salient features (i.e., knowledge) of the training data while eliminating the references to real information stored in private data 222”; fig. 1:126A-126N, fig. 2:226 and ¶74, 90…proxy data is generated by a modeling engine, “the modeling engine can use a genetic algorithm to alter the values of proxy data 360 until a suitable similar trained proxy model emerges using the similarity score as a fitness function“; fig. 5:540-560 and ¶102-104… proxy data is equivalent to the actual individual people data it corresponds to, both in the nature of the data as well as the same number of samples) for optimization against a secure third-party data set (fig. 1:122A-N; fig. 2:222…private data residing/owned by a particular entity 112N/220, the private data being third-party data sets, for example: private individual patient data residing/owned locally at Hospital 122A, Clinic 122B or Laboratory 122N; fig. 5:540-560 and ¶104…proxy data is optimized against the actual private individual/patient data based on a similarity score, where the proxy data is iteratively generated until a sufficiently accurate, e.g., within 1% or closer, proxy model is able to be generated from the proxy data relative to an actual model generated from the actual private individual/patient data) comprising:
receiving at a first server (fig. 1:124A-N; fig. 2:224…private data server for an entity receives model instructions 230) of a receiving party (fig. 1:120N; fig. 2:220…entity that houses the private data server is the receiving party of the model instructions; ¶53…entity 220 is an institution having private local raw data stored locally on a private data server) a first secure request for evolution (fig. 2:230; fig. 5:510 and ¶45…model instructions initiated/sent by user/researcher are construed as a first secure request for running the modeling engine to generate proxy data…”researcher may interface with system 100 through the non-private computing device 130…The programmatic model instructions on how to create the desired model are then submitted to each relevant private data server 124, which also has a corresponding modeling engine 126…Each local modeling engine 126 accesses its own local private data 122 and creates local trained models according to model instructions created by the researcher”; ¶18…”the modeling engine is able to receive model instructions from one or more remote computing devices over a network…”; ¶59…communications over the network are encrypted) of a first population of candidate individuals (fig. 5:520-560…model instructions initiated by researcher/user at 510 cause “at least some of the local private data”, e.g., a first population of candidate individuals/patients, to be used to generate corresponding proxy data and a proxy model for that first population of candidate individuals; ¶102-104…proxy data is equivalent to the actual individual people data it corresponds to, both in the nature of the data as well as the same number of samples) in accordance with a set of domain factors (¶62…model instructions include a set of criteria construed as domain factors: ”Model instructions 230 represent many possible mechanisms by which modeling engine 226 can be configured to gain knowledge from private data 222 and can comprise…a remote command sourced over network 215…model instructions 230 can include stream-lined instructions that inform modeling engine 226 on how to create the desired trained models…can include data filters or data selection criteria that define requirements for desired result sets created from private data 222 as well as which machine learning algorithm 295 is to be used…”) established by a requesting party (¶45…model instructions established and initiated/sent by user/researcher on remote non-private computing device 130, e.g., the requesting party: ”researcher may interface with system 100 through the non-private computing device 130…The programmatic model instructions on how to create the desired model are then submitted to each relevant private data server 124…Each local modeling engine 126 accesses its own local private data 122 and creates local trained models according to model instructions created by the researcher”; ¶18…”the modeling engine is able to receive model instructions from one or more remote computing devices over a network…”);
creating by the receiving party a first population of candidate individuals (fig. 5:520 and ¶100…private data server 124A-124N creates a first population of candidate individuals to use as actual/proxy model training data, e.g., some subset of the local private data of individuals/patients, after receiving model instructions: “Operation 520 includes the model engine creating the trained actual model according to the model instructions as a function of at least some of the local private data by training the implementation of the machine learning algorithm on the local private data. The modeling engine is able to construct a training data sample based on the private data selection criteria provided within the model instructions. The modeling engine submits the data selection criteria to the local private database…results set becomes the training set for the target machine learning algorithm”; ¶82…private data can comprise various types of data associated with each individual patient: “a single sample within private data 322 could represent a single patient and the patient’s specific set of attributes or information”) and assigning a unique candidate identifier to each of the candidate individuals in the first population (¶67…the subset of the local private individual/patient data used includes unique identifiers for each individual/patient, where the unique identifiers can be anything associated with each patient: ”results set becomes the training data for trained actual model 240. That is, the results set may be used to train actual model 240. Within the context of health care, the results set includes patient data that could also include one or more of the following patient specific information: symptoms, tests, test results, provider names, patient name, age, address, diagnosis, CPT codes, ICD codes, DSM codes, relationships, or other information that can be leveraged to describe the patients”; ¶53…local private individual/patient data 222 contains a unique individual/patient identifier for each individual/patient: “patient-specific data (e.g., name, SSN, normal WGS, tumor WGS, genomic diff objects, a patient identifier, etc.)”; ¶82…unique candidate identifiers can be any one of many types of data associated with each sample of the private data used); and
transmitting a first secure response (fig. 5:570…proxy data, e.g., synthetic version of the private patient data stored locally on private data servers, is transferred to user/researcher; ¶59…transmission of proxy data is encrypted; fig. 2:260,270,275,280…proxy data, proxy model and other proxy related information are transmitted back to the remote/non-private computing device 130), including the first population of candidate individuals with assigned candidate identifiers (¶46…”proxy data may be considered as a transformation of raw data into data of a different form that retains the characteristics of the raw data”; ¶88…transformation of the raw data into proxy data can be combinations of eigenvectors as derived from the private data: “The "eigenvectors" can be used to represent the training data set. Thus, proxy data 360 can be considered as comprising combinations of the eigenvectors as derived from private data 322, private data distributions 350, actual model parameters, or other information related to private data 322… Such combinations can be considered to include an eigenpatient, an eigenprofile, an eigendrug, an eigenhealth record, an eigengenome, an eigenproteome, an eigenRNA profile, an eigenpathway, or other type of vector depending on the nature of the data within private data 322”; ¶91…”The proxy data, which is synthesized, may be mapped to "fake" patients with fake records, having characteristics similar to real patients, and may be compared to patient data to ensure that it is a suitable representation of patient data”; ¶102…”The proxy data could include the same number of samples as the provided data training set”), to a second server (fig. 5:570; fig. 1:130…proxy data sent to non-private computer device; ¶42…”Non-private computing device 130 can comprise one or more global model servers (e.g., cloud, SaaS, PaaS, IaaS, LaaS, farm, etc.) that offer distributed machine learning services to the researcher”) of the requesting party (¶45…user/researcher on remote non-private computing device 130 that initiated/sent the request with the modeling instructions), wherein the first server and the second server are separate by a firewall (fig. 1…remote non-private computing device 130 networked with private data servers 120A-120N are separated by a firewall; ¶53…”Private data server 224 represents a local server, typically located behind a firewall of entity 220”).
Per claim 2, Szeto discloses claim 1, further disclosing the set of domain factors includes one or more of: domain constraints, known domain parameters (¶62…model instructions contain domain constraints and parameters for the problem/inquiry the user/researcher wishes to discover: “model instructions can include data filters or data selection criteria that define requirements for desired results sets created from private data 222 as well as which machine learning algorithm 295 is to be used. Consider a scenario where a researcher wishes to research which patients are responders or non-responders to various drugs based on a support vector machine (SVM) in view of a specific genome difference between the patient's tumor sequence and the patient's matched normal sequence. Model instructions 230 for such a case can include…the requirements for the data to be selected from private data 222, identified drug, reference to specific genomic diff object(s), indication of response vs. non-response, etc.”) and formatting rules (¶62…modeling instructions can be packaged via XML or HDF5 construed as file formatting rules; ¶66-67…model instructions can include query engine allowing it to query the database storing the private data, e.g., understand/interpret formatting rules of database) for a specific representation of each of the candidate individuals (modelling instruction applied to subset of individual/patients in local private data).
Per claim 3, Szeto discloses claim 2, further disclosing translating by a translator of the receiving party each of the candidate individuals from a first format to a second format in accordance with the formatting rules (fig. 2:222 and ¶67…private data stored in local database in one format, e.g., SQL database, of private data server 224, where the database is the translator translating queries from model instruction and supplying results for the query in another format: “the query could include a SQL query properly formatted from the requirements in the model instructions 230 to access or received attributes or tables stored in private data 222”).
Per claim 4, Szeto discloses claim 2, further disclosing the first secure response further includes a first checkpoint key (fig. 2:280 and ¶79-80…similarity score checks the accuracy between the proxy model and the actual model, construed to be a checkpoint key, the similarity score can be sent to the user/researcher on the remote non-private computer device 130: “modeling engine 226 can transmit, e.g., according to model instructions 230, one or more of proxy data 260, proxy model parameters 275, similarity score 280, or other information to a non-private computing device located over network 215”). 
Per claim 7, Szeto discloses claim 1, further disclosing the candidate individuals are neural networks (fig. 2:295 and ¶69…neural networks can be used).
Per claim 8, Szeto discloses claim 1, further disclosing the first secure request and the first secure response are encrypted (¶59…communication on network between devices can be encrypted: “Still, the communication link can be secured through encryption (e.g., HTTPS, SSL, SSH, AES, etc.)”).
Per claim 9, Szeto discloses claim 3, further disclosing the first format is a coded genome format (fig. 2:222 and ¶82…private data includes coded genomic information stored in a database, construed as a coded genome format).
Per claim 11, Szeto discloses claim 3, further disclosing the second format is in a JSON code (¶58…”the various data elements exchanged in the system (e.g., model instructions 230, proxy data 260, etc.) can be packaged via one or more markup languages (e.g., XML, YAML, JSON, etc.) or other file formats (e.g., HDF5, etc.)”).
Per claim 13, Szeto discloses a process (fig. 5…generation of proxy data to preserve privacy of individual, e.g., patient, data) for evolving candidate individuals (fig. 2:260 and ¶74…proxy data is synthetic data that has been ‘evolved’ from actual individual/patient data: “Proxy data 260 can be considered synthetic data randomly generated, in some cases deterministically generated, that retains the learnable salient features (i.e., knowledge) of the training data while eliminating the references to real information stored in private data 222”; fig. 1:126A-126N, fig. 2:226 and ¶74, 90…proxy data is generated by a modeling engine, “the modeling engine can use a genetic algorithm to alter the values of proxy data 360 until a suitable similar trained proxy model emerges using the similarity score as a fitness function“; fig. 5:540-560 and ¶102-104… proxy data is equivalent to the actual individual people data it corresponds to, both in the nature of the data as well as the same number of samples) for optimization against a secure data set (fig. 1:122A-N; fig. 2:222…private data residing/owned by a particular entity 112N/220, the private data being third-party data sets, for example: private individual patient data residing/owned locally at Hospital 122A, Clinic 122B or Laboratory 122N; fig. 5:540-560 and ¶104…proxy data is optimized against the actual private individual/patient data based on a similarity score, where the proxy data is iteratively generated until a sufficiently accurate, e.g., within 1% or closer, proxy model is able to be generated from the proxy data relative to an actual model generated from the actual private individual/patient data) comprising: 
transmitting a first secure request (fig. 2:230; fig. 5:510 and ¶45…model instructions initiated/sent by user/researcher are construed as a first secure request, for running the modeling engine to generate proxy data, that is transmitted from non-private computer device 130 to private data servers 124A-N…”researcher may interface with system 100 through the non-private computing device 130…The programmatic model instructions on how to create the desired model are then submitted to each relevant private data server 124, which also has a corresponding modeling engine 126…Each local modeling engine 126 accesses its own local private data 122 and creates local trained models according to model instructions created by the researcher”; ¶18…”the modeling engine is able to receive model instructions from one or more remote computing devices over a network…”; ¶59…communications over the network are encrypted) from a first server (fig. 1:130…non-private computing device) for evolution of a first population of candidate individuals (fig. 5:520-560…model instructions initiated by researcher/user at 510 cause “at least some of the local private data”, e.g., a first population of candidate individuals/patients, to be used to generate corresponding proxy data and a proxy model for that first population of candidate individuals, the transformation being evolution; ¶102-104…proxy data is equivalent to the actual individual people data it corresponds to, both in the nature of the data as well as the same number of samples) in accordance with a set of domain factors (¶62…model instructions include a set of criteria construed as domain factors: ”Model instructions 230 represent many possible mechanisms by which modeling engine 226 can be configured to gain knowledge from private data 222 and can comprise…a remote command sourced over network 215…model instructions 230 can include stream-lined instructions that inform modeling engine 226 on how to create the desired trained models…can include data filters or data selection criteria that define requirements for desired result sets created from private data 222 as well as which machine learning algorithm 295 is to be used…”) to a second server (fig. 1:124A-N; fig. 2:224…private data server is second server receiving the model instructions);
receiving a first secure response (fig. 5:570…proxy data and associated proxy information, e.g., synthetic version of the private patient data stored locally on private data servers, is transferred to user/researcher in response to the model instructions transmitted; ¶59…transmission of proxy data is encrypted; fig. 2:260,270,275,280…proxy data, proxy model and other proxy related information are transmitted from the private data server 224 and received at the remote/non-private computing device 130), including the first population of candidate individuals with assigned candidate identifiers (¶46…”proxy data may be considered as a transformation of raw data into data of a different form that retains the characteristics of the raw data”; ¶88…transformation of the raw data into proxy data can be combinations of eigenvectors as derived from the private data: “The "eigenvectors" can be used to represent the training data set. Thus, proxy data 360 can be considered as comprising combinations of the eigenvectors as derived from private data 322, private data distributions 350, actual model parameters, or other information related to private data 322… Such combinations can be considered to include an eigenpatient, an eigenprofile, an eigendrug, an eigenhealth record, an eigengenome, an eigenproteome, an eigenRNA profile, an eigenpathway, or other type of vector depending on the nature of the data within private data 322”; ¶91…”The proxy data, which is synthesized, may be mapped to "fake" patients with fake records, having characteristics similar to real patients, and may be compared to patient data to ensure that it is a suitable representation of patient data”; ¶102…”The proxy data could include the same number of samples as the provided data training set”), at the first server (fig. 2:260,270,275,280…proxy data, proxy model and other proxy related information are transmitted from the private data server 224 and received at the remote/non-private computing device 130), wherein the first server and the second server are separate by a firewall (fig. 1…remote non-private computing device 130 networked with private data servers 120A-120N are separated by a firewall; ¶53…”Private data server 224 represents a local server, typically located behind a firewall of entity 220”); and
evaluating one or more of the candidate individuals against the secure data set to determine measurements indicative of a fitness of each of the candidate individuals (fig. 5:540-560 and ¶104…proxy data is optimized against the actual private individual/patient data based on similarity scores, e.g., measurements, where the proxy data is iteratively generated until a sufficiently accurate, e.g., within 1% or closer, proxy model is able to be generated from the proxy data relative to an actual model generated from the actual private individual/patient data, such that the iterative process is based on a fitness function associated with the similarity score; ¶74, 90…proxy data is generated by a modeling engine, “the modeling engine can use a genetic algorithm to alter the values of proxy data 360 until a suitable similar trained proxy model emerges using the similarity score as a fitness function“) for a predetermined use (¶62,99,106…proxy data/model can be used by user/researcher for pre-determined research projects/inquiries).
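For illustration only, and not as part of the grounds of rejection, the iterative process cited above (a genetic algorithm altering proxy data until a proxy model is sufficiently similar to the actual model, with the similarity score serving as the fitness function) might be sketched as follows. This is a minimal sketch under stated assumptions and is not Szeto's implementation: the stand-in "model" (a mean and standard deviation), the `similarity_score` formula, the mutation scheme, and every function name and parameter value are hypothetical.

```python
import random
import statistics


def train_model(samples):
    """Stand-in 'trained model' (assumption): mean and std. deviation of the samples."""
    return (statistics.fmean(samples), statistics.pstdev(samples))


def similarity_score(model_a, model_b):
    """Hypothetical similarity in (0, 1]; 1.0 means identical model parameters."""
    distance = sum(abs(a - b) for a, b in zip(model_a, model_b))
    return 1.0 / (1.0 + distance)


def mutate(samples, rng, scale=0.5):
    """Return a copy of a candidate proxy data set with one value perturbed."""
    child = list(samples)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, scale)
    return child


def evolve_proxy_data(private_data, generations=200, population_size=20, seed=0):
    """Evolve synthetic proxy data whose model resembles the actual model."""
    rng = random.Random(seed)
    actual_model = train_model(private_data)  # trained on the private data
    n = len(private_data)  # proxy data keeps the same number of samples

    # Initial candidates are random; no private value is copied into them.
    population = [
        [rng.uniform(0.0, 100.0) for _ in range(n)]
        for _ in range(population_size)
    ]

    def fitness(candidate):
        # Fitness is the similarity between the proxy model and the actual model.
        return similarity_score(train_model(candidate), actual_model)

    history = []  # best fitness observed at each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: population_size // 2]  # keep the fittest half
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    # Hypothetical private measurements, used only to exercise the sketch.
    private = [52.0, 61.5, 47.0, 70.2, 58.3, 66.1]
    proxy, scores = evolve_proxy_data(private)
    print("first and last best similarity:", scores[0], scores[-1])
```

Because the fittest half of each generation is carried forward unchanged, the best similarity score in this sketch never decreases from one generation to the next; a stopping threshold such as the "within 1%" criterion cited above could replace the fixed generation count.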
Per claim 14, Szeto discloses claim 13, further disclosing translating by a translator of the first server each of the first candidate individuals from a first format to a second format in accordance with the formatting rules (fig. 2:222 and ¶67…private data stored in local database in one format, e.g., SQL database, of private data server 224; ¶66…model instructions 230 could be self-contained to include a complete modeling packaging including a query engine, e.g., for SQL, thus having a translator for translating queries and supplying results for the query in another format).
Per claim 15, Szeto discloses claim 13, further disclosing the first secure response further includes a first checkpoint key (fig. 2:280 and ¶79-80…similarity score checks the accuracy between the proxy model and the actual model, construed to be a checkpoint key, the similarity score can be sent to the user/researcher on the remote non-private computer device 130: “modeling engine 226 can transmit, e.g., according to model instructions 230, one or more of proxy data 260, proxy model parameters 275, similarity score 280, or other information to a non-private computing device located over network 215”).
Per claim 19, Szeto discloses claim 14, further disclosing the first format is a coded genome format (fig. 2:222 and ¶82…private data includes coded genomic information stored in a database, construed as a coded genome format).
Per claim 21, Szeto discloses claim 17, further disclosing the second format is in JSON code (¶58…”the various data elements exchanged in the system (e.g., model instructions 230, proxy data 260, etc.) can be packaged via one or more markup languages (e.g., XML, YAML, JSON, etc.) or other file formats (e.g., HDF5, etc.)”).
Allowable Subject Matter
Claims 5, 6, 10 and 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is the statement of reasons for the indication of allowable subject matter: The prior art disclosed by the applicant and cited by the Examiner fails to teach or suggest, alone or in combination, all the limitations of the independent and intervening claims (claims 1, 2 and 4), further including the particular notable limitations provided below:
receiving at the first server, a second secure request for evolution of a second population of candidate individuals, where the second secure request includes the first checkpoint key and results of evaluation by the second server of one or more of the candidate individuals from the first population against the secure third-party data set; creating by the receiving party the second population of candidate individuals and assigning a unique candidate identifier to each of the candidate individuals in the second population; and transmitting a second secure response, including the second population of candidate individuals with assigned candidate identifiers, to the second server of the requesting party, wherein the second secure response includes a second checkpoint key.

Claims 16-18, 20 and 22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is the statement of reasons for the indication of allowable subject matter: The prior art disclosed by the applicant and cited by the Examiner fails to teach or suggest, alone or in combination, all the limitations of the independent and intervening claims (claims 13 and 15), further including the particular notable limitations provided below:
transmitting by the first server, a second secure request for evolution of a second population of candidate individuals, where the second secure request includes the first checkpoint key and results of the evaluation by the first server of the one or more candidate individuals from the first population against the secure data set; receiving a second secure response, including the second population of candidate individuals with assigned candidate identifiers, at the first server, wherein the second secure response includes a second checkpoint key.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Patents and/or related publications are cited in the Notice of References Cited (Form PTO-892) attached to this action to further show the state of the art with respect to protecting third party datasets using machine learning processes to maintain privacy and security.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN CHEN whose telephone number is (571) 272-4143. The examiner can normally be reached M-F 10-7.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ALAN CHEN/Primary Examiner, Art Unit 2125