DETAILED ACTION
Claims 1-30 have been examined.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement filed 6/24/2020 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed.  It has been placed in the application file, but the information referred to therein has not been considered.  A copy of the International Search Report and Written Opinion for PCT/IL18/51345 was not provided.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-30 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-28 of US Patent No. 10,699,194 (reference patent), in view of Zheng et al. (US 2021/0374279, hereinafter Zheng).
Although the conflicting claims are not identical, they are not patentably distinct from each other. Below is an explanation of claim 1 of the instant application as being met by claim 1 of US Patent No. 10,699,194, in view of Zheng.

Claim 1 of instant application | Claim 1 of 10,699,194
--- | ---
1. A method to mimic a pre-trained target model at a device without access to the pre-trained target model or its original training dataset, the method comprising, at the device: | 1. A method to mimic a pre-trained target model at a device without access to the pre-trained target model or its original training dataset, the method comprising, at the device:
(none) | requesting a remote device perform an initial probe of the pre-trained target model with multiple input samples of each of a plurality of data types or distributions to generate corresponding multiple target model outputs associated with each data type or distribution, wherein the multiple input samples of each data type or distribution are different from each other in an input space, and selecting the data type or distribution for the random probe training dataset associated with the corresponding multiple target model outputs with the smallest difference from each other in an output space;
sending a set of random or semi-random input data to a remote device to randomly probe the pre-trained target model remotely by inputting the set of random or semi-random input data into the pre-trained target model; | sending a set of random or semi-random input data of the selected data type or distribution to the remote device to randomly probe the pre-trained target model remotely by inputting the set of random or semi-random input data into the pre-trained target model;
receiving from the remote device a set of corresponding output data generated by applying the pre-trained target model to the set of random or semi-random input data; | receiving from the remote device a set of corresponding output data generated by applying the pre-trained target model to the set of random or semi-random input data;
generating a random probe training dataset comprising the set of random or semi-random input data and corresponding output data generated by randomly probing the pre-trained target model; | generating a random probe training dataset comprising the set of random or semi-random input data and corresponding output data generated by randomly probing the pre-trained target model; and
training a new model with the random probe training dataset so that the new model generates substantially the same corresponding output data in response to said input data to mimic the pre-trained target model; and | training a new model with the random probe training dataset so that the new model generates substantially the same corresponding output data in response to said input data to mimic the pre-trained target model.
removing a correlation in the new model based on training data linking an input to an output, without accessing at least one of the input or output, by adding to the random probe training dataset a plurality of random correlations to the output or input, respectively, to weaken or eliminate the correlation between the input and output. | (none)



As shown by the table above, claim 1 of the instant application differs from claim 1 of the ’194 patent in that the ’194 patent does not recite removing a correlation in the new model based on training data linking an input to an output, without accessing at least one of the input or output, by adding to the random probe training dataset a plurality of random correlations to the output or input, respectively, to weaken or eliminate the correlation between the input and output.  However, Zheng teaches this feature (see at least Fig. 1C, Fig. 1D, [0012], [0022], [0041], [0042]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify claim 1 of the ’194 patent to also include this feature, as doing so would protect sensitive data (see at least [0012], [0042] of Zheng).

Independent claim 16 of the instant application is not patentably distinct from claim 15 of the ’194 patent, and independent claim 30 of the instant application is not patentably distinct from claim 28 of the ’194 patent, using the same analysis as above. The features recited in claims 2-15 and 17-29 of the instant application are found in claims 1-28 of the ’194 patent.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 14 and 23 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Claim 14 recites “requesting the remote device perform an initial probe of the pre-trained target model with multiple input samples of each of a plurality of data types or distributions that are slightly different from each other in an input space; and selecting the data type or distribution for the random probe training dataset associated with corresponding multiple target model outputs with the smallest difference in the output space.”  Claim 23 recites similar claim language. This is not clearly understood.  First, the term “slightly” is a term of relative degree, and it is not clear how different is considered as “slightly different.”  Second, the phrase “selecting the data type or distribution for the random probe training dataset associated with corresponding multiple target model outputs with the smallest difference in the output space” is not clearly understood because it is not clear which elements are compared to determine the difference, i.e., whether the difference refers to differences among outputs within each data type or distribution, or to differences between outputs from one data type or distribution and outputs from another data type or distribution.
Appropriate corrections are required. 
Any claim not specifically addressed, above, is being rejected as incorporating the deficiencies of a claim upon which it depends.
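
For illustration only, the two readings of “smallest difference in the output space” noted above for claims 14 and 23 can be contrasted with a minimal sketch. The data-type names, the output values, and the helper functions `within_spread` and `between_gap` are hypothetical and are not taken from the claims or the specification; the sketch only shows that the two readings can lead to different selections.

```python
from itertools import combinations

# Hypothetical target-model outputs (scalars) for three probed data types.
outputs = {
    "type_a": [0.10, 0.90, 0.50],
    "type_b": [0.40, 0.42, 0.41],
    "type_c": [0.80, 0.60, 0.70],
}

def within_spread(values):
    """Reading 1: largest pairwise difference among outputs of ONE data type."""
    return max(abs(x - y) for x, y in combinations(values, 2))

def between_gap(values_1, values_2):
    """Reading 2: difference between the mean outputs of TWO data types."""
    mean_1 = sum(values_1) / len(values_1)
    mean_2 = sum(values_2) / len(values_2)
    return abs(mean_1 - mean_2)

# Reading 1 selects the single data type whose own outputs are closest together.
reading_1 = min(outputs, key=lambda name: within_spread(outputs[name]))

# Reading 2 selects the pair of data types whose outputs are closest to each other.
reading_2 = min(
    combinations(outputs, 2),
    key=lambda pair: between_gap(outputs[pair[0]], outputs[pair[1]]),
)
```

With these assumed values, the first reading yields a single data type and the second yields a pair of data types, so the two readings do not select the same thing.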

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8, 11, 13, 15-17, 24, 27, 29, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Papernot et al. “Practical Black-Box Attacks against Machine Learning” (hereinafter Papernot), in view of Zheng et al. (US 2021/0374279, hereinafter Zheng).

As per claim 1, Papernot teaches the invention as claimed, including a method to mimic a pre-trained target model at a device without access to the pre-trained target model or its original training dataset (i.e., train a substitute model for a target model without knowledge of the model internals or its training data, see at least page 506, abstract, right column, paragraphs 2, 3), the method comprising, at the device:
sending a set of random or semi-random input data to a remote device to randomly probe the pre-trained target model remotely by inputting the set of random or semi-random input data into the pre-trained target model (i.e., querying an oracle, which is a targeted DNN hosted remotely, we apply reservoir sampling to reduce the number of queries made to the oracle, reservoir sampling is a technique that randomly selects k samples from a list of samples, see at least page 507, left column, paragraphs 2, 3, page 508, right column, paragraph 1, pages 508-509, section 4.1, page 510, Figure 3, page 513, right column, paragraph 3); 
receiving from the remote device a set of corresponding output data generated by applying the pre-trained target model to the set of random or semi-random input data (i.e., querying for a label, see at least page 507, left column, paragraphs 2, 3, page 508, right column, paragraph 1, pages 508-509, section 4.1, page 510, Figure 3, page 513, right column, paragraph 3); 
generating a random probe training dataset comprising the set of random or semi-random input data and corresponding output data generated by randomly probing the pre-trained target model (i.e., label each sample in initial substitute training set by querying for labels output by oracle, see at least page 507, left column, paragraphs 2, 3, page 508, right column, paragraph 1, pages 508-509, section 4.1, page 510, Figure 3, page 513, right column, paragraph 3); and
training a new model with the random probe training dataset so that the new model generates substantially the same corresponding output data in response to said input data to mimic the pre-trained target model (i.e., train substitute model using substitute training set, see at least page 507, left column, paragraphs 2, 3, page 508, right column, paragraph 1, pages 508-509, section 4.1, page 510, Figure 3, page 513, right column, paragraph 3).
Papernot does not explicitly teach removing a correlation in the new model based on training data linking an input to an output, without accessing at least one of the input or output, by adding to the random probe training dataset a plurality of random correlations to the output or input, respectively, to weaken or eliminate the correlation between the input and output.
Zheng teaches removing a correlation in the new model based on training data linking an input to an output (i.e., a differentially private student model, algorithm may be referred to as a differentially private algorithm if an observer seeing output of the algorithm cannot tell if a particular individual's information was used to compute the output, see at least Fig. 1C, Fig. 1D, [0012], [0041], [0042]), without accessing at least one of the input or output (i.e., synthetic data is provided to a student model, see at least Fig. 1C, Fig. 1D, [0042]), by adding to the random probe training dataset a plurality of random correlations to the output or input, respectively, to weaken or eliminate the correlation between the input and output (i.e., noise may be applied to the synthetic data, the application of noise to the input provided to the student model enhances differential privacy, see at least Fig. 1D, [0022], [0042]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to remove a correlation in the new model based on training data linking an input to an output, without accessing at least one of the input or output, by adding to the random probe training dataset a plurality of random correlations to the output or input, respectively, to weaken or eliminate the correlation between the input and output as similarly taught by Zheng in order to enhance differential privacy, to protect sensitive data (see at least [0012], [0042] of Zheng).
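
As an illustration only of the combination discussed above, the following sketch walks through the claimed steps: randomly probing a remote model, collecting its outputs into a random probe training dataset, adding noise to that dataset, and training a new model on it. It is not the algorithm of Papernot or of Zheng. The `remote_target` function, the logistic-regression learner, the noise scale, and all other names and values are assumptions chosen to keep the example self-contained.

```python
import math
import random

rng = random.Random(0)

def remote_target(x):
    """Hypothetical stand-in for the remote pre-trained model (a hidden rule)."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] > 0 else 0

# Send random inputs to the remote model and receive its outputs.
probe_inputs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
probe_outputs = [remote_target(x) for x in probe_inputs]

# Build the random probe training dataset. Small Gaussian noise is added to
# each input (an assumed, simplified reading of the noise step).
dataset = [
    ((x[0] + rng.gauss(0, 0.05), x[1] + rng.gauss(0, 0.05)), y)
    for x, y in zip(probe_inputs, probe_outputs)
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Train a new model (logistic regression, batch gradient descent) so that it
# reproduces the remote model's outputs on the probe data.
w = [0.0, 0.0]
b = 0.0
learning_rate = 1.0
for _ in range(300):
    grad_w0 = grad_w1 = grad_b = 0.0
    for x, y in dataset:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        grad_w0 += (p - y) * x[0]
        grad_w1 += (p - y) * x[1]
        grad_b += p - y
    n = len(dataset)
    w[0] -= learning_rate * grad_w0 / n
    w[1] -= learning_rate * grad_w1 / n
    b -= learning_rate * grad_b / n

def new_model(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Fraction of fresh random inputs on which the new model matches the remote one.
fresh = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
agreement = sum(new_model(x) == remote_target(x) for x in fresh) / len(fresh)
```

The new model never sees the remote model's parameters or original training data; it is fit only to the probe inputs and the outputs returned for them.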

As per claim 8, Papernot teaches training the new model over multiple epochs with a different random probe training dataset in each of the multiple epochs (i.e., training substitute DNN for several substitute epochs, see at least pages 508-509, section 4.1, page 510, Figure 3, page 513, right column, paragraph 3).

As per claim 11, Papernot teaches wherein the models are neural networks (see at least pages 506-507, sections 1, 2), and comprising generating synthetic training samples by backpropagating error at an output layer backwards through the target model neural network to adjust training samples in the input layer to reduce the error (i.e., back propagation algorithm in training phase, see at least page 507, section 2, pages 508-509, section 4.1).

As per claim 13, Papernot does not explicitly teach measuring statistical properties of one or more sample inputs of the same type as the original training dataset or an accessible subset thereof; and semi-randomly selecting the set of input data according to those statistical properties.
Zheng teaches measuring statistical properties of one or more sample inputs of the same type as the original training dataset or an accessible subset thereof (i.e., teacher models may be trained to determine statistical information of the true knowledge graph, see at least [0034]); and
semi-randomly selecting the set of input data according to those statistical properties (i.e., synthetic knowledge graph may include randomly generated data or pseudo randomly generated data, generate a synthetic data set that has the same or similar statistical properties as the true data set, see at least [0022], [0027], [0047]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to measure statistical properties of one or more sample inputs of the same type as the original training dataset or an accessible subset thereof; and semi-randomly select the set of input data according to those statistical properties as similarly taught by Zheng to provide differentially private synthetic data that share the same statistical characteristics of true data while maintaining privacy of true data (see at least [0023] of Zheng).
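
The two steps discussed above for claim 13, measuring statistical properties of sample inputs and semi-randomly selecting input data according to those properties, can be sketched as follows. This is a minimal illustration under assumed conditions: scalar sample values, mean and standard deviation as the “statistical properties,” and Gaussian sampling are all assumptions, and neither Papernot nor Zheng is asserted to operate this way.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical accessible sample inputs of the same type as the original data.
sample_inputs = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

# Measure statistical properties of the sample inputs.
mean = statistics.mean(sample_inputs)
stdev = statistics.stdev(sample_inputs)

# Semi-randomly select probe inputs: random draws constrained by those statistics.
probe_inputs = [rng.gauss(mean, stdev) for _ in range(1000)]

# The probe inputs should share the measured statistics approximately.
probe_mean = statistics.mean(probe_inputs)
```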

As per claim 15, Papernot teaches after training the new model, executing the new model in a run-time phase by inputting new data into the new model and generating corresponding data output by the new model (i.e., classify inputs by the substitute DNN, see at least page 506, right column, paragraph 4, page 511, left column, paragraphs 2, 5).

As per claims 16, 24, 27 and 29, these are the system claims of claims 1, 8, 11 and 15.  Therefore, claims 16, 24, 27 and 29 are rejected using the same reasons as claims 1, 8, 11 and 15. 

As per claim 17, Papernot teaches one or more memories to store one or more samples of the random probe training dataset (see at least page 511, section “Initial Substitute Training Sets”, page 513, right column, paragraph 3).

As per claim 30, this is the non-transitory computer-readable medium claim of claim 1.  Therefore, claim 30 is rejected using the same reasons as claim 1.




Claims 2, 3, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Papernot, in view of Zheng, further in view of Bowers et al. (US 2016/0300156, hereinafter Bowers).

As per claim 2, Papernot does not explicitly teach adding new data to the random probe training dataset to incorporate new knowledge not present in the pre-trained target model.
Bowers teaches adding new data to a training dataset to incorporate new knowledge not present in a pre-trained target model (i.e., training latent model using new training dataset, see at least [0012], [0013], [0019], [0020]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to add new data to the random probe training dataset to incorporate new knowledge not present in the pre-trained target model as similarly taught by Bowers in order to keep models current (see at least [0003] of Bowers).

As per claim 3, Papernot does not explicitly teach defining data to be omitted from the random probe training dataset to eliminate knowledge present in the pre-trained target model.
Bowers teaches defining data to be omitted from the random probe training dataset to eliminate knowledge present in the pre-trained target model (i.e., edit configurations by removing one or more training sets, training the latent model based on the specified configurations, see at least [0012], [0013], [0019], [0020]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to define data to be omitted from the random probe training dataset to eliminate knowledge present in the pre-trained target model as similarly taught by Bowers in order to keep models current (see at least [0003] of Bowers).

As per claims 19 and 20, these are the system claims of claims 2 and 3.  Therefore, claims 19 and 20 are rejected using the same reasons as claims 2 and 3. 

Claims 4 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Papernot, in view of Zheng, further in view of Heaton et al. (US 2018/0000385, hereinafter Heaton).

As per claim 4, Papernot does not explicitly teach re-training the new model using the random probe training dataset to mimic re-training the target pre-trained model.
Heaton teaches re-training a new model using a training dataset to mimic re-training a target pre-trained model (i.e., retrain a compressed and complete model based on appended training set, see at least [0014], [0071]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to re-train the new model using the random probe training dataset to mimic re-training the target pre-trained model as similarly taught by Heaton in order to update the model (see at least [0071] of Heaton).

As per claim 21, this is the system claim of claim 4. Therefore, claim 21 is rejected using the same reasons as claim 4.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Papernot, in view of Zheng, further in view of Heaton, further in view of Darvish Rouhani et al. (US 2019/0197406, hereinafter Darvish).

As per claim 5, Papernot does not explicitly teach sparsifying the new model to mimic the pre-trained target model to generate a sparse new model.
Darvish teaches sparsifying a model to generate a sparse model (i.e., prune neurons to create a sparse DNN, see at least Fig. 14, [0065]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to sparsify the new model to mimic the pre-trained target model to generate a sparse new model as similarly taught by Darvish in order to reduce computational burdens of a neural network (see at least [0001] of Darvish).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Papernot, in view of Zheng, further in view of Heaton, further in view of Susskind et al. (US 2018/0157992, hereinafter Susskind).

As per claim 6, Papernot does not explicitly teach evolving the new model by applying evolutionary algorithms to mimic the pre-trained target model.
Susskind teaches evolving a new model by applying evolutionary algorithms to mimic a pre-trained target model (i.e., model training unit configured to compare matrices using a function that may be optimized by evolutionary search algorithms, based on the comparison, the model training unit configures a model, such as the student model to mimic behavior of another model, such as the teacher model, see at least [0185]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to evolve the new model by applying evolutionary algorithms to mimic the pre-trained target model as similarly taught by Susskind in order to use a known technique in the field to mimic a pre-trained model (see at least [0185] of Susskind).

Claims 7 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Papernot, in view of Zheng, further in view of Ferguson et al. (US 2003/0130899, hereinafter Ferguson).

As per claim 7, Papernot does not explicitly teach generating or re-training the new model after all copies of the original training dataset are deleted at the remote device.
Ferguson teaches generating or re-training a model after all copies of original training dataset are deleted at a remote device (i.e., updating the training set to generate new training sets by removing old data and adding new data, and training the non-linear model with each training set, see at least [0038]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to generate or re-train the new model after all copies of the original training dataset are deleted at the remote device as similarly taught by Ferguson in order to update the model when new training data is available (see at least [0038], [0154] of Ferguson).

As per claim 22, this is the system claim of claim 7.  Therefore, claim 22 is rejected using the same reasons as claim 7. 

Claims 9, 10, 25, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Papernot, in view of Zheng, further in view of Ray et al. (US 2019/0206090, hereinafter Ray).

As per claim 9, Papernot does not explicitly teach setting the structure of the new model to be simpler than the structure of the pre-trained target model.
Ray teaches setting structure of a new model to be simpler than the structure of a pre-trained target model (i.e., transfer knowledge to a smaller model, see at least [0272]-[0283]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to set the structure of the new model to be simpler than the structure of the pre-trained target model as similarly taught by Ray in order to compress models for deployment on devices with lower amount of memory and compute capabilities (see at least [0272] of Ray).

As per claim 10, Papernot teaches wherein the models are neural networks (see at least page 507, section 2).
Papernot does not explicitly teach comprising setting the new model to have a number of neurons, synapses, or layers, to be less than that of the pre-trained target model.
Ray teaches setting a new model to have a number of neurons, synapses, or layers, to be less than that of a pre-trained target model (i.e., transfer knowledge to a smaller model, the compressed model may have fewer number of layers, see at least [0272]-[0283]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to set the new model to have a number of neurons, synapses, or layers, to be less than that of the pre-trained target model as similarly taught by Ray in order to compress models for deployment on devices with lower amount of memory and compute capabilities (see at least [0272] of Ray).

As per claims 25 and 26, these are the system claims of claims 9 and 10.  Therefore, claims 25 and 26 are rejected using the same reasons as claims 9 and 10. 

Claims 12 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Papernot, in view of Zheng, further in view of Courville et al. (US 2017/0308324, hereinafter Courville).

As per claim 12, Papernot teaches wherein the models are neural networks each comprising a plurality of layers (see at least pages 506-507, sections 1, 2).
Papernot does not explicitly teach training the new model layer-by-layer in a plurality of sequential stages, each stage training a respective sequential layer of the new model neural network.
Courville teaches training a model layer-by-layer in a plurality of sequential stages, each stage training a respective sequential layer of the model neural network (i.e., sequential neural network training, one or more stages of the multistage sequential data process are compute layers in a neural network training process, see at least Fig. 1, [0021]-[0029], claim 4).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot to train the new model layer-by-layer in a plurality of sequential stages, each stage training a respective sequential layer of the new model neural network as similarly taught by Courville because it would have been obvious to train a neural network using known training techniques in the art.

As per claim 28, this is the system claim of claim 12.  Therefore, claim 28 is rejected using the same reasons as claim 12. 

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Papernot, in view of Zheng, further in view of Bendre et al. (US 2018/0322417, hereinafter Bendre).

As per claim 18, Papernot does not explicitly teach wherein the one or more memories are temporary memories that store samples of the random probe training dataset on-the-fly and delete the samples on-the-fly after the samples are used to train the new model.
Bendre teaches one or more memories are temporary memories that store samples of training dataset on-the-fly and delete the samples on-the-fly after the samples are used to train a new model (i.e., temporarily store training data, delete training data from temporary storage after ML trainer process completed the serving of the corresponding ML training request, see at least [0132], [0196]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Papernot such that the one or more memories are temporary memories that store samples of the random probe training dataset on-the-fly and delete the samples on-the-fly after the samples are used to train the new model as similarly taught by Bendre in order to secure the training data against unauthorized access (see at least [0132] of Bendre).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Sun (US 2020/0364542) is cited to teach sensitive data is perturbed by random noise during training the student model.
Zhang et al. “Privacy-preserving Machine Learning through Data Obfuscation”, 2018, arXiv:1807.01860v2. This document teaches adding noise to obfuscate data.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jue Louie whose telephone number is 571-270-1655.  The examiner can normally be reached on M-F 9:30 am - 5:00pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at 571-272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Jue Louie/
Primary Examiner
Art Unit 2121