Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 3/15/2022.
Applicant arguments/remarks made in amendment filed 3/15/2022.
Examiner notes amendments to claims 18 and 19.  Rejections under 35 USC § 112(b) are withdrawn.
Claims 1, 14, and 18-20 are amended.
Claims 1-20 are presented for examination.
Response to Arguments
Applicant presents several arguments.  Each is addressed.
Applicant argues that amended independent claims 1, 14, and 20 are not directed to non-statutory subject matter. (Remarks, page 8, paragraph 2.)  In particular, Applicant argues that “claim 1 as amended recites the following limitations that do not recite an abstract idea under 2AP1:
training a first neural network using the training data resulting in a first neural network model, the first neural network model having a first numerical precision level;
training a second neural network using the training data resulting in a second neural network model, the second neural network model having a second numerical precision level different from the first numerical precession level (sic):” (Remarks, page 9, paragraph 2.)

The argument is persuasive. Rejection of claims under 35 USC § 101 is withdrawn.
Applicant argues that “claims 14-19 have been amended to recite storage ‘media’ rather than storage ‘devices.’ Thus, the disavowal of transitory signals is more clearly applicable to claims 14-19.” (Remarks, page 12, paragraph 1.) Examiner concurs.  The 35 USC § 101 rejections of claims 14-19 for covering both statutory and non-statutory matter regarding transitory signals are withdrawn.
Applicant argues that the 35 USC § 102 rejections of claims 1-9 and 11-13 are incorrect because “each and every feature of claim 1 is not disclosed by Menet.” The argument is moot in view of the new grounds of rejection. See detailed rejection below.
Applicant argues that the combination of Tople and Menet fails to teach the limitations of claims 10 and 14-20 because “Menet fails to overcome the above-described distinctions of claims 1, 14, and 20”. (Remarks, page 17, paragraph 2.)  The argument is moot in view of the new grounds of rejection.  See detailed rejection below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9, 11-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al (Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks, herein Xu), and Mishra et al (Apprentice: Using Knowledge Distillation Techniques to Improve Low-Precision Network Accuracy, herein Mishra).
Regarding claim 1,
	Xu teaches a method comprising: receiving, by a processor, training data; (Xu, page 1, column 1, paragraph, line 12 “By comparing a DNN model’s prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives.” And, page 2, column 1, paragraph 5 “Deep Neural Networks (DNNs) can efficiently learn highly accurate models from large corpora of training samples in many domains [18], [26].” In other words, the DNN model corresponds to a method that runs on a processor, taking an input corresponds to receiving, and the training samples correspond to the training data.)
	[training a first neural network using the training data resulting in a first neural network model, the first neural network model having a first numerical precision level; training a second neural network using the training data resulting in a second neural network model, the second neural network model having a second numerical precision level different from the first numerical precision level;]
	generating a first feature vector from input data using the first neural network model; generating a second feature vector from the input data using the second neural network model;  and computing a difference metric between the first feature vector and the second feature vector, the difference metric indicative of whether the input data includes adversarial data.  (Xu, Fig. 1, “If the difference between the model’s prediction on a squeezed input and its prediction on the original input exceeds a threshold level, the input is identified to be adversarial”

[Figure omitted: Xu, Fig. 1 (media_image1.png)]

In other words, the input corresponds to the input data, the difference corresponds to computing a difference metric, prediction1 and prediction2 correspond to the first feature vector and the second feature vector, and the difference exceeding a threshold corresponds to the difference metric being indicative of whether the input data includes adversarial data.)
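For illustration only, the comparison shown in Xu, Fig. 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not Xu's implementation: the single-layer softmax model, the random weights, the 1-bit squeezing depth, the L1 distance, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def reduce_bit_depth(x, bits):
    """Squeeze inputs in [0, 1] onto 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(weights, x):
    """Hypothetical stand-in model: one linear layer followed by softmax."""
    return softmax(weights @ x)

def squeezing_score(weights, x, bits=1):
    """L1 distance between predictions on the original and the squeezed input."""
    p_original = predict(weights, x)
    p_squeezed = predict(weights, reduce_bit_depth(x, bits))
    return float(np.abs(p_original - p_squeezed).sum())

def is_adversarial(weights, x, threshold, bits=1):
    """Flag the input when the prediction difference exceeds the threshold."""
    return squeezing_score(weights, x, bits) > threshold

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 8))  # assumed, untrained weights
clean = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0])  # already at 1-bit depth
perturbed = np.clip(clean + rng.uniform(-0.3, 0.3, size=8), 0.0, 1.0)
```

An input that is already at the squeezed bit depth produces a score of zero, while a perturbed input is mapped back onto the clean levels by the squeezer, so any score it produces comes from the perturbation alone.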
	Thus far, Xu does not explicitly teach training a first neural network using the training data resulting in a first neural network model, the first neural network model having a first numerical precision level; training a second neural network using the training data resulting in a second neural network model, the second neural network model having a second numerical precision level different from the first numerical precision level;
	Mishra teaches training a first neural network using the training data resulting in a first neural network model, the first neural network model having a first numerical precision level; training a second neural network using the training data resulting in a second neural network model, the second neural network model having a second numerical precision level different from the first numerical precision level; (Mishra, Figure 2, page 2, paragraph 2, line 8 “In our work, the student network has similar topology as that of the teacher network which has neurons operating at full-precision.” And, page 5, paragraph 2, line 5 “The first scheme (scheme-A) jointly trains both the networks – full-precision teacher and low-precision student network.”

[Figure omitted: Mishra, Figure 2 (media_image2.png)]

In other words, trains both networks is training a first neural network and training a second neural network, full-precision teacher network is first network having a first precision level, and low-precision student network is second network having a second numerical precision level different from the first.)
	Both Mishra and Xu are directed to improving the performance of neural networks.  Xu teaches evaluating the difference of the output of two levels of precision to detect adversarial input but does not explicitly teach two neural networks having different levels of precision.  Mishra teaches two neural networks that have different levels of precision but does not teach evaluating the difference of the output of the two levels of precision to detect adversarial input.  In view of the teaching of Xu, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Mishra into Xu.  This would result in being able to evaluate the difference of the output of two neural networks using two levels of precision in order to detect adversarial input.
	One of ordinary skill in the art would be motivated to do this because, instead of using ever larger and more complex methods to detect adversarial input, detecting adversarial input by simply reducing the precision and comparing the resulting output to the output of the original neural network reduces overall computation and space requirements.  (Mishra, page 1, paragraph 4, line 5 “With quantization, a low-precision version of the network model is generated and deployed on the device.  Operating in lower precision mode reduces compute as well as data movement and storage requirements.”)
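For illustration only, comparing the feature vectors of a full-precision model and a reduced-precision model can be sketched as follows. This is a rough sketch under stated assumptions: it uses post-training uniform weight quantization as a stand-in and does not reproduce Mishra's joint teacher-student training, no training step is shown, and the single hidden layer, the random weights, the 2-bit setting, the L2 distance, and all names are hypothetical.

```python
import numpy as np

def quantize(w, bits):
    """Uniformly quantize an array to 2**bits levels over its own value range."""
    lo, hi = w.min(), w.max()
    if hi == lo:
        return w.copy()
    levels = 2 ** bits - 1
    return np.round((w - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

def features(w_hidden, x):
    """Feature vector: ReLU activations of one hidden layer."""
    return np.maximum(w_hidden @ x, 0.0)

def difference_metric(f_first, f_second):
    """L2 distance between the two feature vectors."""
    return float(np.linalg.norm(f_first - f_second))

rng = np.random.default_rng(1)
w_full = rng.normal(size=(4, 6)).astype(np.float32)  # first model: full precision (assumed weights)
w_low = quantize(w_full, bits=2)                      # second model: 2-bit weights
x = rng.uniform(size=6).astype(np.float32)            # input data

metric = difference_metric(features(w_full, x), features(w_low, x))
```

The resulting `metric` would then be compared against a threshold in the manner of Xu, Fig. 1.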
Regarding claim 2,
	The combination of Xu and Mishra teaches the method of claim 1, further comprising:
	comparing the difference metric to a predetermined threshold value. (Xu, Fig. 1, shows the difference being compared to a predetermined threshold T.)
Regarding claim 3,
	The combination of Xu and Mishra teaches the method of claim 1, further comprising: 
	determining that the difference metric exceeds the predetermined threshold value; and determining that the input data includes the adversarial data responsive to the determining that the difference metric exceeds the predetermined threshold value. (Xu, Fig. 1, shows that when the difference metric exceeds a predetermined threshold value T, a determination that the input includes adversarial data is made.)
Regarding claim 4,
	The combination of Xu and Mishra teaches the method of claim 3, further comprising: 
	discarding the input data. (Xu, Fig. 1, and page 1, column 2, paragraph 3, line 11 “By comparing the difference between predictions with a selected threshold value, our system outputs the correct prediction for legitimate examples and rejects adversarial inputs.” In other words, rejecting adversarial inputs corresponds to discarding the input data.)
Regarding claim 5,
	The combination of Xu and Mishra teaches the method of claim 2, further comprising: 
	determining that the difference metric does not exceed the predetermined threshold value; and determining a classification of the input data responsive to the determining that the difference metric does not exceed the predetermined threshold value. (Xu, Fig. 1, and page 1, column 2, paragraph 3, line 11 “By comparing the difference between predictions with a selected threshold value, our system outputs the correct prediction for legitimate examples and rejects adversarial inputs.” In other words, comparing the difference with a selected threshold value corresponds to determining that the difference metric does not exceed the predetermined threshold value, and outputting the correct prediction for legitimate examples corresponds to determining a classification of the input data responsive to the determining that the difference metric does not exceed the predetermined threshold value.)
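For illustration only, the threshold logic addressed in claims 2-5 can be sketched as follows. This is a minimal sketch: the function name `screen_input`, its parameters, and the use of `None` to represent a discarded input are hypothetical and are not taken from Xu or from the claims.

```python
from typing import Optional, Sequence

def screen_input(difference: float, threshold: float,
                 class_scores: Sequence[float]) -> Optional[int]:
    """Return None to discard the input, else the index of the top-scoring class."""
    if difference > threshold:
        # Difference metric exceeds the threshold: treat as adversarial and discard.
        return None
    # Difference metric does not exceed the threshold: classify the input.
    return max(range(len(class_scores)), key=lambda i: class_scores[i])
```

For example, `screen_input(0.9, 0.5, [0.1, 0.7, 0.2])` discards the input, while `screen_input(0.1, 0.5, [0.1, 0.7, 0.2])` returns class index 1.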
Regarding claim 6,
	The combination of Xu and Mishra teaches the method of claim 1, wherein 
	the first numerical precision level is greater than the second numerical precision level.  (Xu, Fig. 1 “The model is evaluated on both the original input and the input after being pre-processed by feature squeezers.” In other words, the original input, before pre-processing by feature squeezing, corresponds to the first numerical precision level being greater than the second numerical precision level.)

Regarding claim 7,
	The combination of Xu and Mishra teaches the method of claim 1, wherein 
	the first numerical precision level is a full numerical precision level.  (Xu, Fig. 1 “The model is evaluated on both the original input and the input after being pre-processed by feature squeezers.” In other words, the original input, before pre-processing by feature squeezing, corresponds to the first numerical precision level being a full numerical precision level.)
Regarding claim 8,
	The combination of Xu and Mishra teaches the method of claim 1, wherein 
	the first neural network model is a published neural network model with a known numerical precision level.  (Xu, TABLE 1,  

[Table omitted: Xu, Table 1 (media_image3.png)]

In other words, DenseNet and MobileNet are published neural network models.)
Regarding claim 9,
	The combination of Xu and Mishra teaches the method of claim 1, wherein 
	the second neural network model is a reduced precision neural network model.  (Mishra, Figure 2, page 2, paragraph 2, line 8 “In our work, the student network has similar topology as that of the teacher network which has neurons operating at full-precision.” And, page 5, paragraph 2, line 5 “The first scheme (scheme-A) jointly trains both the networks – full-precision teacher and low-precision student network.” In other words, the low-precision student network is the second neural network model is a reduced precision neural network model.)
Regarding claim 11,
	The combination of Xu and Mishra teaches the method of claim 1, wherein 
	one or more layers of the second neural network model include different numerical precision levels. (Mishra, page 2, paragraph 2, line 8 “In our work, the student network has similar topology as that of the teacher network, except that the student network has low-precision neurons compared to the teacher network which has neurons operating at full-precision.”  In other words, lower precision neurons is one or more layers of the second neural network model include different numerical precision levels.)
Regarding claim 12,
	The combination of Xu and Mishra teaches the method of claim 1, wherein 
	one or more of the first neural network or the second neural network includes a deep neural network (DNN).  (Mishra, page 4, paragraph 2, line 1 “Figure 2 shows the schematic of the knowledge distillation setup. Given an input image x, a teacher DNN maps this image to predictions pT.”  In other words, teacher DNN is one or more of the first neural network or the second neural network includes a deep neural network.)
Regarding claim 13,
	The combination of Xu and Mishra teaches the method of claim 1, wherein 
	the input data includes image data.  (Xu, page 2, column 1, paragraph 3, line 3 “Our experiments show that joint-detection can successfully detect adversarial examples from eleven static attacks at the detection rates of 98% on MNIST and 85% on CIFAR-10 and ImageNet, with low (around 5%) false positive rates.”  In other words, ImageNet is image data.)
Claims 14-17 are computer usable program product claims corresponding to method claims 1-4, respectively. Otherwise they are the same.  It is implicit that a computer implemented method requires computer usable program products in order to execute.  Therefore, claims 14-17 are rejected for the same reasons as claims 1-4, respectively.
Claim 20 is a computer system claim corresponding to method claim 1.  Otherwise, they are the same.  It is implicit that a computer implemented method requires a computer system in order to execute.  Therefore, claim 20 is rejected for the same reasons as claim 1.
Claims 10, 18, and 19 are rejected under 35 USC 103 as being unpatentable over Xu, Mishra, and Tople et al (PRIVADO: Practical and Secure DNN Inference, herein Tople).
Regarding claim 10,
	The combination of Xu and Mishra teaches the method of claim 1, wherein
	Thus far, the combination of Xu and Mishra does not explicitly teach the second neural network model is an encrypted neural network model.
	Tople teaches the second neural network model is an encrypted neural network model. (Tople, Figure 3, and page 6, paragraph 4, line 2, “PRIVADO-Generator takes as input the ONNX representation of the model, the input-oblivious DNN framework that Step 1 generated, required SGX libraries, and an encryption key.  It outputs an enclave-executable model binary and the encrypted model parameters.”  

[Figure omitted: Tople, Figure 3 (media_image4.png)]

In other words, Tople creates an encrypted neural network model.) 
	Both Tople and the combination of Xu and Mishra are directed to deep neural networks (DNN).  In view of the teaching of Xu and Mishra, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Tople into the combination of Xu and Mishra.  This would result in being able to create an encrypted neural network model for inference from an unencrypted one, thus being able to use the adversarial detection method with an encrypted neural network model. 
	One of ordinary skill in the art would be motivated to do this in order to perform adversarial input detection over the cloud as a service, thus saving clients the cost and effort required to generate a custom method. (Tople, page 1, paragraph 1, line 1 “Recently, cloud providers have extended support for trusted hardware primitives such as Intel SGX.  Simultaneously, the field of deep learning is seeing enormous innovation and increase in adoption…. We first demonstrate that side-channel based attacks on DNN models are indeed possible.  We show that by observing access patterns, we can recover inputs to the DNN model.  This motivates the need for PRIVADO, a system we have designed for secure inference-as-a-service.”)
Regarding claim 18,
	The combination of Xu, Mishra, and Tople teaches the computer usable program product of claim 14, wherein
	the program instructions are stored in a computer readable storage medium in a data processing system, and wherein the program instructions are transferred over a network from a remote data processing system. (Tople, Figure 1, and page 2, column 2, paragraph 4, line 1 “Figure 1 shows the entities involved in such an inference service: the cloud provider, the model owner and multiple model users.  The cloud provider supports trusted hardware primitives such as Intel SGX. SGX-enabled CPUs create isolated execution environments called enclaves in which all data are encrypted.  SGX-enabled CPUs also remotely attest the code executing within the enclaves to ensure its integrity [20].”

[Figure omitted: Tople, Figure 1 (media_image5.png)]

	In other words, the CPUs remotely attesting the code corresponds to computer usable code being transferred over a network.)
Regarding claim 19,
	The combination of Xu, Mishra, and Tople teaches the computer usable program product of claim 14, wherein
	the program instructions are downloaded over a network to a remote data processing system for use in a computer readable storage medium associated with the remote data processing system. (Tople, Figure 1, and page 2, column 1, paragraph 3, line 1 “To address ease-of-use challenge, PRIVADO used the PRIVADO-Generator which takes as input models represented in the popular ONNX format [17], and automatically generates a minimal set of enclave-specific code and encrypted parameters for the model.”  In other words, Figure 1 shows a server data processing system together with the code generation process and remote processing.)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124