DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the application filed December 15, 2017. Claims 1-20 are pending and have been considered. 

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/15/2017 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Specification
The disclosure is objected to because of the following informalities: 
In paragraph [0023], line 8, “with the a function” should read “with a function”. 
In paragraph [0049], line 9, "neural network model 160" should read "neural network model 130".  
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 8 and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

The term "small" in claims 8 and 18 is a relative term which renders the claims indefinite.  The term "small" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  It is not clear to what degree a perturbation is considered to be small. Clarification is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-4, 6, 7, 9-14, 16, 17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Papernot et al. (The Limitations of Deep Learning in Adversarial Settings, hereinafter “Papernot”) in view of Mathieu et al. (Deep Multi-Scale Video Prediction Beyond Mean Square Error, hereinafter “Mathieu”).

Regarding claim 1, Papernot teaches A method, in a data processing system comprising a processor and a memory, the memory comprising instructions which are executed by the processor to specifically configure the processor to implement a hardened neural network, the method comprising (“Indeed, all our experiments are facilitated using GPU acceleration on a machine equipped with a Xeon E5-2680 v3 processor and a Nvidia Tesla K5200 graphics processor.” [pg. 387, Appendix, A. Validation setup details, ¶1, lines 7-10; Papernot discloses processors and a compiler (Theano) to train the DNN. A compiler would allocate memory for the program.]):
a hardened neural network engine [see pg. 387, A. Validation setup details; Examiner is interpreting neural network engine to be equivalent to a processor to train the neural network.] that operates on a neural network to harden the neural network against evasion attacks and generates a hardened neural network (“The second class of solutions seeks to improve training to increase the robustness of DNNs. Interestingly, the problem of adversarial samples is closely linked to training.” [pg. 385, § 6. Discussion, ¶5, lines 1-3; Training to increase the robustness of DNNs would be equivalent to generating a hardened neural network.]);
generating, by the hardened neural network engine, a reference training data set based on an original training data set (“This adversary is able to collect a surrogate dataset, sampled from the same distribution as the original dataset used to train the DNN.” [pg. 375, § 2.3 Adversarial Capabilities, Training Data, lines 1-3; Examiner is interpreting surrogate dataset to be equivalent to a reference dataset.]);
processing, by the neural network, the original training data set and the reference training data set to generate first and second output data sets (“
[image media_image1.png: excerpt from the cited reference, not reproduced]
” [pg. 375, § 3.1 Studying a Simple Neural Network, ¶2; Examiner is interpreting X to be the original training data set and F(X) = Y to be the first output of original training data set and X* to be the reference training data set and Y* to be the second output data set.]);
However, Papernot fails to explicitly teach calculating, by the hardened neural network engine, a modified loss function of the neural network, wherein the modified loss function is a combination of an original loss function associated with the neural network, and a function of the first and second output data sets;
and training, by the hardened neural network engine, the neural network based on the modified loss function to generate the hardened neural network.
Mathieu teaches calculating, by the hardened neural network engine, a modified loss function of the neural network (“
[image media_image2.png: excerpt from the cited reference, not reproduced]
” [pg. 4, § Training D, Equation 4 would be the modified loss function and is modifying the original loss in Equation 3.]), wherein the modified loss function is a combination of an original loss function associated with the neural network (“
[image media_image3.png: excerpt from the cited reference, not reproduced]
” [pg. 4, § Training D, Equation 3 would correspond to the original loss function of the neural network]), and a function of the first and second output data sets (See Equation (4), Lbce(Y, Yhat) where Y would correspond to the first output data set and Yhat would correspond to a second output data set [pg. 4, § Training D]);
and training, by the hardened neural network engine, the neural network based on the modified loss function to generate the hardened neural network (“
[image media_image4.png: excerpt from the cited reference, not reproduced]
” [pg. 4, § Training D; Mathieu discloses training the neural network model by using the modified loss functions in equation 4, this would generate a hardened neural network as a result]).
Papernot and Mathieu are both in the same field of endeavor of training deep neural networks against adversarial attacks. Papernot discloses a method of crafting adversarial samples to train the deep neural network. Mathieu discloses a method for frame prediction using loss functions to train the deep neural network model. It would have been obvious for a person of ordinary skill in the art before the effective filing date to combine the teachings of Papernot with Mathieu to include a modified loss function of the training data sets. One would have been motivated to perform this modification in order to improve classification results of images. [Mathieu, Introduction, ¶1]
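
For illustration of the claim 1 limitation discussed above, a modified loss function that combines an original loss with a function of the first and second output data sets might be sketched as follows. This is a minimal sketch under stated assumptions, not the formulation of Papernot, Mathieu, or the application: the cross-entropy original loss, the squared-distance term, the `weight` parameter, and all function names are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Assumed original loss: negative log-probability of the true class."""
    return -math.log(max(probs[label], 1e-12))

def output_distance(first_out, second_out):
    """Assumed function of the first and second outputs: squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(first_out, second_out))

def modified_loss(first_out, second_out, label, weight=0.5):
    """Hypothetical combination: original loss plus a weighted output-difference term."""
    return cross_entropy(first_out, label) + weight * output_distance(first_out, second_out)

# First output: network output for an original sample.
# Second output: network output for the corresponding reference sample.
first_output = [0.7, 0.2, 0.1]
second_output = [0.6, 0.3, 0.1]
loss = modified_loss(first_output, second_output, label=0)
```

When the two outputs are identical, the added term is zero and the modified loss reduces to the original loss.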

Regarding claim 2, the combination of Papernot and Mathieu teaches the method of claim 1, where Mathieu further teaches wherein the modified loss function [See pg. 4; equation 4] causes differences between data in the original training data set and data in the reference training data set result to be commensurate with differences between the corresponding outputs of the hardened neural network (See Figure 4 [pg. 8; note: Examiner is interpreting that the modified loss function doesn’t statistically change the differences between the datasets. Figure 4 discloses image frames results where both the input and adversarial images are recognized and the resulting differences would be commensurate with each other.]).
Papernot and Mathieu are both in the same field of endeavor of training deep neural networks against adversarial attacks. Papernot discloses a method of crafting adversarial samples to train the deep neural network. Mathieu discloses a method for frame prediction using loss functions to train the deep neural network model. It would have been obvious for a person of ordinary skill in the art before the effective filing date to combine the teachings of Papernot with Mathieu to include a modified loss function of the training data sets. One would have been motivated to perform this modification in order to improve classification results of images. [Mathieu, Introduction, ¶1]

Regarding claim 3, the combination of Papernot and Mathieu teaches the method of claim 1, where Papernot further teaches wherein generating the reference training data set comprises at least one of performing a perturbation operation on the original training data set to introduce perturbations in the original training data set and generate the reference training data set comprising the original training data set with the introduced perturbations, or performing a sampling operation to sample data from the original training data set to generate the reference training data set (“
[image media_image5.png: excerpt from the cited reference, not reproduced]
” [pg. 373, left column, ¶3; Adding a perturbation vector to X would correspond to performing a perturbation operation on the original training dataset. Furthermore, X* would be the reference training data set generated as a result of adding the perturbation vector]). 
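
The two alternatives recited in claim 3, perturbing the original training data set or sampling from it, might be sketched as follows. This is an illustrative sketch only: the bounded uniform noise, the `epsilon` bound, the clipping to [0, 1], and the function names are assumptions, and the random noise stands in for, and is not, the crafted perturbation vector described in Papernot.

```python
import random

def perturb(dataset, epsilon, rng):
    """Add bounded uniform noise to every feature, clipping to the range [0, 1]."""
    return [
        [min(1.0, max(0.0, x + rng.uniform(-epsilon, epsilon))) for x in sample]
        for sample in dataset
    ]

def sample_subset(dataset, k, rng):
    """Draw k samples from the original data set without replacement."""
    return rng.sample(dataset, k)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]]
reference_by_perturbation = perturb(original, epsilon=0.05, rng=rng)
reference_by_sampling = sample_subset(original, k=1, rng=rng)
```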

Regarding claim 4, the combination of Papernot and Mathieu teaches the method of claim 1, where Papernot further teaches wherein the function of the first and second output data sets is further a function of the original training data set and the reference training data set (“Most importantly, the N-dimensional function F learned by the DNN during training assigns an output Y = F(X) when given an M-dimensional input X.” [pg. 376, § 3.2 generalizing to deep neural network, ¶1; Y would be a function of the original training data set X.] “As input, the algorithm takes a benign sample X, a target adversarial output Y*, an acyclic feed forward DNN F, a maximum distortion parameter Υ, and a feature variation parameter θ. It returns new adversarial sample X* such that F(X*) = Y*” [pg. 376, § 3.2 generalizing to deep neural network, ¶2; F(X*) = Y* would be the function of the reference data set X*]).

Regarding claim 6, the combination of Papernot and Mathieu teaches the method of claim 1, where Papernot further teaches performing, by the hardened neural network engine, a classification operation to classify an input data into one of a plurality of predefined classes of input data (“Our goal is to report whether we can reach any adversarial target class for a given source class. For instance, if we are given a handwritten 0, we increase some of the pixel intensities to produce 9 adversarial samples respectively classified in each of the classes 1 to 9.” [pg. 380, § 4.2 crafting by increasing pixel intensities, ¶1; see Fig.9; predefined classes of input data would be a pixel image of 1-9]);
and performing, by a cognitive computing system [Abstract; computer vision], a reasoning operation based on results of the classification operation (“To verify the validity of our algorithms, and of our adversarial saliency maps, we run a simple experiment. We run the crafting algorithm on an empty input (all pixel intensities initially set to 0) and craft one adversarial sample for each class from 0 to 9. The different samples shown in Figure 9 demonstrate how adversarial saliency maps are able to identify input features relevant to classification in a class.” [pg. 381, left column, ¶2; Papernot discloses using adversarial saliency maps to identify features based off a classification of an input. This would be equivalent to using a reasoning operation as it is using a crafting algorithm to look into pixel intensities.]).

Regarding claim 7, the combination of Papernot and Mathieu teaches the method of claim 6, where Papernot further teaches wherein performing the reasoning operation based on the results of the classification operation comprises at least annotating the input data to include a class corresponding to a predefined class having a highest probability value associated with the class as determined by the hardened neural network engine (“Adversarial saliency maps are defined to suit problem specific adversarial goals. For instance, we later study a network used as a classifier, its output is a probability vector across classes, where the final predicted class value corresponds to the component with the highest probability” [pg. 378, adversarial saliency maps, ¶2; annotating would be equivalent to classifying the input to a predefined class.]).

	Regarding claim 9, the combination of Papernot and Mathieu teaches the method of claim 1, where Papernot further teaches further comprising: 
receiving, by the hardened neural network, input data to be processed (“The network input is black and white images (28x28 pixels) of handwritten digits, which are flattened as vectors of 784 features, where each feature corresponds to a pixel intensity taking normalized values between 0 and 1.” [pg. 379, § 4. Application of the Approach, ¶2, lines 8-12]);
generating, by the hardened neural network, an output vector comprising a plurality of probability values stored in vector slots, wherein each vector slot is associated with a different class in a plurality of predefined classes such that the probability value stored in a vector slot indicates a probability that the input data is properly classified into a class corresponding to the vector slot (“The output is a 10 class probability vector, where each class corresponds to a digit from 0 to 9 (i.e. plurality of defined classes), as shown in Figure 8. The deep neural network then labels the input image with the class assigned the maximum probability, as shown in Equation 7.” [pg. 379, § 4. Application of the Approach, ¶2, lines 16-21; Fig. 8 shows that the probability value stored in a vector slot corresponds to an image being correctly classified.]);
providing, by the hardened neural network, the output vector to a cognitive system (“Adversarial saliency maps are defined to suit problem specific adversarial goals. For instance, we later study a network used as a classifier, its output is a probability vector across classes, where the final predicted class value corresponds to the component with the highest probability” [pg. 378, § 3.2.2. Adversarial saliency maps, ¶2; the saliency map would be a cognitive system since it is a visualization tool for image analysis]);
executing, by the cognitive system, a cognitive operation based on the probability values stored in the output vector (“The deep neural network then labels the input image with the class assigned the maximum probability, as shown in Equation 7.” [pg. 379, § 4. Application of the approach. ¶2, lines 19-21, analyzing the image and classifying the image with a class is considered a cognitive operation. Output vector is noted above as a 10 class probability vector.]). 
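
The claim 9 mapping above relies on a 10-class probability vector whose maximum component determines the label. A minimal sketch of that labeling step follows. The softmax normalization, the example scores, and the function names are assumptions made for illustration and are not taken from Papernot or the application.

```python
import math

def softmax(scores):
    """Convert raw scores into a probability vector that sums to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_from_vector(prob_vector):
    """Return the index of the vector slot holding the maximum probability."""
    return max(range(len(prob_vector)), key=lambda j: prob_vector[j])

# Hypothetical raw scores for digits 0 through 9; slot j corresponds to digit j.
scores = [0.1, 0.3, 0.2, 2.5, 0.0, 0.4, 0.1, 0.9, 0.2, 0.3]
output_vector = softmax(scores)
predicted_class = label_from_vector(output_vector)
```

Each slot of `output_vector` corresponds to one predefined class, and the label is the slot with the highest probability.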

Regarding claim 10, the combination of Papernot and Mathieu teaches the method of claim 9, where Papernot further teaches wherein the input data is a data sample of at least one of image data, audio data, or textual data (“The network input is black and white images (28x28 pixels) of handwritten digits, which are flattened as vectors of 784 features, where each feature corresponds to a pixel intensity taking normalized values between 0 and 1.” [pg. 379, § 4. Application of the Approach, ¶2, lines 8-12; this would correspond to the input data being an image data.]), wherein the cognitive model logic operates on the data sample to classify the data sample according to the cognitive operation of the cognitive system (“The algorithm is able to craft successful adversarial samples for all 90 source-target class pairs. Figure 1 shows the 90 adversarial samples obtained as well as the 10 original samples used to craft them. The original samples are found on the diagonal. A sample on row i and column j, when i ≠ j, is a sample crafted from an image originally classified as source class i to be misclassified as target class j. To verify the validity of our algorithms, and of our adversarial saliency maps, we run a simple experiment. We run the crafting algorithm on an empty input (all pixel intensities initially set to 0) and craft one adversarial sample for each class from 0 to 9. The different samples shown in Figure 9 demonstrate how adversarial saliency maps are able to identify input features relevant to classification in a class.” [pg. 381, left column, ¶1-2; note: The algorithm would correspond to a cognitive model logic and is used to classify a data sample (i.e. image)]), and wherein the cognitive operation is one of an image analysis operation, a speech recognition operation, an audio recognition operation, a social network filtering operation, a machine translation operation, a natural language processing operation, a patient treatment recommendation operation, a medical imaging analysis operation, or a bioinformatics operation. (“We now apply these tools to a DNN used for a computer vision classification task: handwritten digit recognition.” [pg. 379, § 4. Application of the Approach, ¶1, lines 4-6; computer vision is a form of automated image analysis which corresponds to an image analysis operation]).

Regarding claim 11, Papernot teaches a computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a data processing system, causes the data processing system to (“To train and use DNNs, we use Theano, a Python package designed to simplify large-scale scientific computing.” [pg. 387, Appendix, A. Validation setup details, lines 1-3; Papernot discloses a computer program to perform the training of the DNN, this program would inherently be stored on a computer storage medium]): 
configure the data processing system to implement a hardened neural network engine [see pg. 387, A. Validation setup details; Examiner is interpreting neural network engine to be equivalent to a processor to train the neural network.] that operates on a neural network to harden the neural network against evasion attacks and generates a hardened neural network (“The second class of solutions seeks to improve training to increase the robustness of DNNs. Interestingly, the problem of adversarial samples is closely linked to training.” [pg. 385, § 6. Discussion, ¶5, lines 1-3; Training to increase the robustness of DNNs would be equivalent to generating a hardened neural network.]);
generate, by the hardened neural network engine, a reference training data set based on an original training data set (“This adversary is able to collect a surrogate dataset, sampled from the same distribution as the original dataset used to train the DNN.” [pg. 375, § 2.3 Adversarial Capabilities, Training Data, lines 1-3; Examiner is interpreting surrogate dataset to be equivalent to a reference dataset.]);
process, by the neural network, the original training data set and the reference training data set to generate first and second output data sets (“
[image media_image1.png: excerpt from the cited reference, not reproduced]
” [pg. 375, § 3.1 Studying a Simple Neural Network, ¶2; Examiner is interpreting X to be the original training data set and F(X) = Y to be the first output of original training data set and X* to be the reference training data set and Y* to be the second output data set.]);
However, Papernot fails to explicitly teach calculate, by the hardened neural network engine, a modified loss function of the neural network, wherein the modified loss function is a combination of an original loss function associated with the neural network, and a function of the first and second output data sets;
and train, by the hardened neural network engine, the neural network based on the modified loss function to generate the hardened neural network.
Mathieu teaches calculate, by the hardened neural network engine, a modified loss function of the neural network (“
[image media_image2.png: excerpt from the cited reference, not reproduced]
” [pg. 4, § Training D, Equation 4 would be the modified loss function and is modifying the original loss in Equation 3.]), wherein the modified loss function is a combination of an original loss function associated with the neural network (“
[image media_image3.png: excerpt from the cited reference, not reproduced]
” [pg. 4, § Training D, Equation 3 would correspond to the original loss function of the neural network]), and a function of the first and second output data sets (See Equation (4), Lbce(Y, Yhat) where Y would correspond to the first output data set and Yhat would correspond to a second output data set [pg. 4, § Training D]);
and train, by the hardened neural network engine, the neural network based on the modified loss function to generate the hardened neural network (“
[image media_image4.png: excerpt from the cited reference, not reproduced]
” [pg. 4, § Training D; Mathieu discloses training the neural network model by using the modified loss functions in equation 4, this would generate a hardened neural network as a result]).
Papernot and Mathieu are both in the same field of endeavor of training deep neural networks against adversarial attacks. Papernot discloses a method of crafting adversarial samples to train the deep neural network. Mathieu discloses a method for frame prediction using loss functions to train the deep neural network model. It would have been obvious for a person of ordinary skill in the art before the effective filing date to combine the teachings of Papernot with Mathieu to include a modified loss function of the training data sets. One would have been motivated to perform this modification in order to improve classification results of images. [Mathieu, Introduction, ¶1]

Regarding claim 12, the combination of Papernot and Mathieu teaches the computer program product of claim 11, where Mathieu further teaches wherein the modified loss function [See pg. 4; equation 4] causes differences between data in the original training data set and data in the reference training data set result to be commensurate with differences between the corresponding outputs of the hardened neural network (See Figure 4 [pg. 8; note: Examiner is interpreting that the modified loss function doesn’t statistically change the differences between the datasets. Figure 4 discloses image frames results where both the input and adversarial images are recognized and the resulting differences would be commensurate with each other.]).
Papernot and Mathieu are both in the same field of endeavor of training deep neural networks against adversarial attacks. Papernot discloses a method of crafting adversarial samples to train the deep neural network. Mathieu discloses a method for frame prediction using loss functions to train the deep neural network model. It would have been obvious for a person of ordinary skill in the art before the effective filing date to combine the teachings of Papernot with Mathieu to include a modified loss function of the training data sets. One would have been motivated to perform this modification in order to improve classification results of images. [Mathieu, Introduction, ¶1]

Regarding claim 13, the combination of Papernot and Mathieu teaches the computer program product of claim 11, where Papernot further teaches wherein the computer readable program further causes the data processing system to generate the reference training data set at least one of performing a perturbation operation on the original training data set to introduce perturbations in the original training data set and generate the reference training data set comprising the original training data set with the introduced perturbations, or performing a sampling operation to sample data from the original training data set to generate the reference training data set (“
[image media_image5.png: excerpt from the cited reference, not reproduced]
” [pg. 373, left column, ¶3; Adding a perturbation vector to X would correspond to performing a perturbation operation on the original training dataset. Furthermore, X* would be the reference training data set generated as a result of adding the perturbation vector]). 

Regarding claim 14, the combination of Papernot and Mathieu teaches the computer program product of claim 11, where Papernot further teaches wherein the function of the first and second output data sets is further a function of the original training data set and the reference training data set (“Most importantly, the N-dimensional function F learned by the DNN during training assigns an output Y = F(X) when given an M-dimensional input X.” [pg. 376, § 3.2 generalizing to deep neural network, ¶1; Y would be a function of the original training data set X.] “As input, the algorithm takes a benign sample X, a target adversarial output Y*, an acyclic feed forward DNN F, a maximum distortion parameter Υ, and a feature variation parameter θ. It returns new adversarial sample X* such that F(X*) = Y*” [pg. 376, § 3.2 generalizing to deep neural network, ¶2; F(X*) = Y* would be the function of the reference data set X*]).

Regarding claim 16, the combination of Papernot and Mathieu teaches the computer program product of claim 11, where Papernot further teaches wherein the computer readable program further causes the data processing system to:
perform, by the hardened neural network engine, a classification operation to classify an input data into one of a plurality of predefined classes of input data (“Our goal is to report whether we can reach any adversarial target class for a given source class. For instance, if we are given a handwritten 0, we increase some of the pixel intensities to produce 9 adversarial samples respectively classified in each of the classes 1 to 9.” [pg. 380, § 4.2 crafting by increasing pixel intensities, ¶1; see Fig.9; predefined classes of input data would be a pixel image of 1-9]);
and perform, by a cognitive computing system [Abstract; computer vision], a reasoning operation based on results of the classification operation (“To verify the validity of our algorithms, and of our adversarial saliency maps, we run a simple experiment. We run the crafting algorithm on an empty input (all pixel intensities initially set to 0) and craft one adversarial sample for each class from 0 to 9. The different samples shown in Figure 9 demonstrate how adversarial saliency maps are able to identify input features relevant to classification in a class.” [pg. 381, left column, ¶2; Papernot discloses using adversarial saliency maps to identify features based off a classification of an input. This would be equivalent to using a reasoning operation as it is using a crafting algorithm to look into pixel intensities.]).

	Regarding claim 17, the combination of Papernot and Mathieu teaches the computer program product of claim 16, where Papernot further teaches wherein the computer readable program further causes the data processing system to perform the reasoning operation based on the results of the classification operation at least by annotating the input data to include a class corresponding to a predefined class having a highest probability value associated with the class as determined by the hardened neural network engine (“Adversarial saliency maps are defined to suit problem specific adversarial goals. For instance, we later study a network used as a classifier, its output is a probability vector across classes, where the final predicted class value corresponds to the component with the highest probability” [pg. 378, adversarial saliency maps, ¶2; annotating would be equivalent to classifying the input to a predefined class.]).

Regarding claim 19, the combination of Papernot and Mathieu teaches the computer program product of claim 11, where Papernot further teaches wherein the computer readable program further causes the data processing system to:
receive, by the hardened neural network, input data to be processed (“The network input is black and white images (28x28 pixels) of handwritten digits, which are flattened as vectors of 784 features, where each feature corresponds to a pixel intensity taking normalized values between 0 and 1.” [pg. 379, § 4. Application of the Approach, ¶2, lines 8-12]);
generate, by the hardened neural network, an output vector comprising a plurality of probability values stored in vector slots, wherein each vector slot is associated with a different class in a plurality of predefined classes such that the probability value stored in a vector slot indicates a probability that the input data is properly classified into a class corresponding to the vector slot (“The output is a 10 class probability vector, where each class corresponds to a digit from 0 to 9 (i.e. plurality of defined classes), as shown in Figure 8. The deep neural network then labels the input image with the class assigned the maximum probability, as shown in Equation 7.” [pg. 379, § 4. Application of the Approach, ¶2, lines 16-21; Fig. 8 shows that the probability value stored in a vector slot corresponds to an image being correctly classified.]);
provide, by the hardened neural network, the output vector to a cognitive system (“Adversarial saliency maps are defined to suit problem specific adversarial goals. For instance, we later study a network used as a classifier, its output is a probability vector across classes, where the final predicted class value corresponds to the component with the highest probability” [pg. 378, § 3.2.2. Adversarial saliency maps, ¶2; the saliency map would be a cognitive system since it is a visualization tool for image analysis]);
execute, by the cognitive system, a cognitive operation based on the probability values stored in the output vector (“The deep neural network then labels the input image with the class assigned the maximum probability, as shown in Equation 7.” [pg. 379, § 4. Application of the approach. ¶2, lines 19-21, analyzing the image and classifying the image with a class is considered a cognitive operation. Output vector is noted above as a 10 class probability vector.]). 

Regarding claim 20, Papernot teaches a data processing system [pg. 387, Appendix; § A. Validation setup details] comprising:
at least one processor (“Indeed, all our experiments are facilitated using GPU acceleration on a machine equipped with a Xeon E5-2680 v3 processor and a Nvidia Tesla K5200 graphics processor.” [pg. 387, Appendix; § A. Validation setup details; ¶1, lines 7-10]); and
at least one memory coupled to the at least one processor, wherein the at least one memory comprises instructions which, when executed by the at least one processor, cause the at least one processor to (“To train and use DNNs, we use Theano, a Python package designed to simplify large-scale scientific computing.” [pg. 387, Appendix, A. Validation setup details, lines 1-3; Papernot discloses a computer program to perform the training of the DNN, this program would inherently be stored in memory and executed by a processor]):
[see pg. 387, A. Validation setup details; Examiner is interpreting neural network engine to be equivalent to a processor to train the neural network.] that operates on a neural network to harden the neural network against evasion attacks and generates a hardened neural network (“The second class of solutions seeks to improve training to increase the robustness of DNNs. Interestingly, the problem of adversarial samples is closely linked to training.” [pg. 385, § 6. Discussion, ¶5, lines 1-3; Training to increase the robustness of DNNs would be equivalent to generating a hardened neural network.]);
generate a reference training data set based on an original training data set (“This adversary is able to collect a surrogate dataset, sampled from the same distribution as the original dataset used to train the DNN.” [pg. 375, § 2.3 Adversarial Capabilities, Training Data, lines 1-3; Examiner is interpreting surrogate dataset to be equivalent to a reference dataset]);
process, by the neural network, the original training data set and the reference training data set to generate first and second output data sets (“[image: media_image1.png]” [pg. 375, § 3.1 Studying a Simple Neural Network, ¶2; Examiner is interpreting X to be the original training data set and F(X) = Y to be the first output of original training data set and X* to be the reference training data set and Y* to be the second output data set.]);

and train, by the hardened neural network engine, the neural network based on the modified loss function to generate the hardened neural network.
Mathieu teaches calculate, by the hardened neural network engine, a modified loss function of the neural network (“[image: media_image2.png]” [pg. 4, § Training D, Equation 4 would be the modified loss function and is modifying the original loss in Equation 3.]), wherein the modified loss function is a combination of an original loss function associated with the neural network (“[image: media_image3.png]” [pg. 4, § Training D, Equation 3 would correspond to the original loss function of the neural network]), and a function of the first and second output data sets (See Equation (4), Lbce(Y, Yhat) where Y would correspond to the first output data set and Yhat would correspond to a second output data set [pg. 4, § Training D]);
and train, by the hardened neural network engine, the neural network based on the modified loss function to generate the hardened neural network (“[image: media_image4.png]” [pg. 4, § Training D; Mathieu discloses training the neural network model by using the modified loss functions in equation 4, this would generate a hardened neural network as a result]).
Papernot and Mathieu are both in the same field of endeavor of training deep neural networks against adversarial attacks. Papernot discloses a method of crafting adversarial samples to train the deep neural network. Mathieu discloses a method for frame prediction using loss functions to train the deep neural network model. It would have been obvious for a person of ordinary skill in the art before the effective filing date to combine the teachings of Papernot with Mathieu to include a modified loss function of the training data sets. One would have been motivated to perform this modification in order to improve classification results of images. [Mathieu, Introduction, ¶1]
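For illustration only, the claimed structure of the modified loss function (a combination of an original loss function and a function of the first and second output data sets) can be sketched as follows. This is not Mathieu's Equation 4. The choice of cross-entropy as the original loss, squared Euclidean distance as the function of the two outputs, and the `weight` parameter are all assumptions made for this sketch.

```python
import math


def cross_entropy(probs, true_class):
    """Original loss (assumed here): negative log-probability of the true class."""
    return -math.log(probs[true_class])


def output_distance(first_output, second_output):
    """Function of the two outputs (assumed here): squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(first_output, second_output))


def modified_loss(first_output, second_output, true_class, weight=1.0):
    """Combine the original loss with a weighted function of both outputs."""
    original = cross_entropy(first_output, true_class)
    penalty = output_distance(first_output, second_output)
    return original + weight * penalty
```

When the first and second outputs are identical, the added term is zero and the modified loss reduces to the original loss.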

Claims 5, 8, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Papernot in view of Mathieu and further in view of Zheng et al. (Improving the Robustness of Deep Neural Networks via Stability Training, hereinafter “Zheng”). 

Regarding claim 5, the combination of Papernot and Mathieu teaches the method of claim 1, where Mathieu further teaches wherein training the neural network based on the modified loss function to generate the hardened neural network comprises training the neural network to minimize both the function of the first and second output data sets (“[image: media_image2.png]” [pg. 4, § Training D, Equation 4 would be the modified loss function. Mathieu further discloses training the neural network model to minimize the loss function.])
However, the combination of Papernot and Mathieu fails to explicitly teach training the neural network to minimize the original loss function associated with the neural network.
Zheng teaches training the neural network to minimize the original loss function associated with the neural network (“[image: media_image6.png]” [pg. 4483, § 3.4 Stability for classification; note: Zheng discloses a loss function of an original training data set and states the training objective (i.e. training the neural network) is to minimize the loss.]).
Papernot, Mathieu and Zheng are all in the same field of endeavor of training deep neural networks against adversarial attacks. Papernot discloses a method of crafting adversarial samples to train the deep neural network. Mathieu discloses a method for frame prediction using loss functions to train the deep neural network model.  [Zheng, pg. 4483, § 4.2. Distortion types, ¶1] 

Regarding claim 8, the combination of Papernot and Mathieu teaches the method of claim 1; however, the combination of Papernot and Mathieu fails to explicitly teach wherein the hardened neural network is hardened against gradient based attacks such that small perturbations in input data to the hardened neural network do not cause misclassification by the hardened neural network.
Zheng teaches wherein the hardened neural network is hardened against gradient based attacks such that small perturbations in input data to the hardened neural network do not cause misclassification by the hardened neural network (“Our goal is to stabilize the output f(x) ∈ R^m of a neural network N against small natural perturbations to a natural image x ∈ [0, 1]^(w×h) of size w × h, where we normalize all pixel values. Intuitively, this means that we want to formulate a training objective that flattens f in a small neighborhood of any natural image x: if a perturbed copy x′ is close to x, we want f(x) to be close to f(x′)” [pg. 4481 – 4482, § 3.1. stability objective; Zheng discloses stability training which trains a neural network to take in small perturbations of the input data and causes the neural network to not misclassify an image by keeping the output of f(x) and f(x’) to be close.]).
Papernot, Mathieu and Zheng are all in the same field of endeavor of training deep neural networks against adversarial attacks. Papernot discloses a method of crafting adversarial samples to train the deep neural network. Mathieu discloses a method for frame prediction using loss functions to train the deep neural network model. Zheng discloses a method of improving robustness of neural networks by analyzing output instability. It would have been obvious for a person of ordinary skill in the art before the effective filing date to combine the teachings of Papernot and Mathieu with the teachings of Zheng to add small perturbations to the input dataset and train the neural network not to misclassify the output. One would have been motivated to include this modification in order to improve the robustness of the neural network and improve the classification of output data. [Zheng, pg. 4482, § 3.3. Stability for feature embeddings, ¶1] 
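For illustration only, the property recited in claim 8 (a perturbation of the input does not change the classification) can be sketched as a simple check. The two-class `toy_classifier`, the helper names, and the perturbation values are hypothetical and are not taken from Zheng. The sketch does not define what magnitude of perturbation counts as small.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (numerically stabilized)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_classifier(x):
    """Hypothetical two-class model: scores are fixed linear functions of x."""
    return softmax([x[0] - x[1], x[1] - x[0]])


def predicted_class(probs):
    """Index of the class with the highest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


def label_unchanged(model, x, delta):
    """True if adding perturbation delta to x leaves the predicted class unchanged."""
    perturbed = [xi + di for xi, di in zip(x, delta)]
    return predicted_class(model(x)) == predicted_class(model(perturbed))
```

A check of this kind only tests individual perturbations; it does not establish robustness over an entire neighborhood of the input.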

Regarding claim 15, the combination of Papernot and Mathieu teaches the computer program product of claim 11, where Mathieu further teaches wherein the computer readable program further causes the data processing system to train the neural network based on the modified loss function to generate the hardened neural network comprises training the neural network to minimize both the function of the first and second output data sets (“[image: media_image2.png]” [pg. 4, § Training D, Equation 4 would be the modified loss function. Mathieu further discloses training the neural network model to minimize the loss function.])
However, the combination of Papernot and Mathieu fails to explicitly teach train the neural network to minimize the original loss function associated with the neural network.
Zheng teaches train the neural network to minimize the original loss function associated with the neural network (“[image: media_image6.png]” [pg. 4483, § 3.4 Stability for classification; note: Zheng discloses a loss function of an original training data set and states the training objective (i.e. training the neural network) is to minimize the loss.]).
Papernot, Mathieu and Zheng are all in the same field of endeavor of training deep neural networks against adversarial attacks. Papernot discloses a method of crafting adversarial samples to train the deep neural network. Mathieu discloses a method for frame prediction using loss functions to train the deep neural network model. Zheng discloses a method of improving robustness of neural networks by analyzing output instability. It would have been obvious for a person of ordinary skill in the art before the effective filing date to combine the teachings of Papernot and Mathieu with the teachings of Zheng to include a neural network that minimizes the original and modified loss function. One would have been motivated to train a neural network to  [Zheng, pg. 4483, § 4.2. Distortion types, ¶1] 

Regarding claim 18, the combination of Papernot and Mathieu teaches the computer program product of claim 11; however, the combination of Papernot and Mathieu fails to explicitly teach wherein the hardened neural network is hardened against gradient based attacks such that small perturbations in input data to the hardened neural network do not cause misclassification by the hardened neural network.
Zheng teaches wherein the hardened neural network is hardened against gradient based attacks such that small perturbations in input data to the hardened neural network do not cause misclassification by the hardened neural network (“Our goal is to stabilize the output f(x) ∈ R^m of a neural network N against small natural perturbations to a natural image x ∈ [0, 1]^(w×h) of size w × h, where we normalize all pixel values. Intuitively, this means that we want to formulate a training objective that flattens f in a small neighborhood of any natural image x: if a perturbed copy x′ is close to x, we want f(x) to be close to f(x′)” [pg. 4481 – 4482, § 3.1. stability objective; Zheng discloses stability training which trains a neural network to take in small perturbations of the input data and causes the neural network to not misclassify an image by keeping the output of f(x) and f(x’) to be close.]).
Papernot, Mathieu and Zheng are all in the same field of endeavor of training deep neural networks against adversarial attacks. Papernot discloses a method of  [Zheng, pg. 4482, § 3.3.Stability for feature embeddings, ¶1] 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Xu et al. (Multi-loss Regularized Deep Neural Network) discloses DNN learning using multiple loss functions. Nguyen et al. (Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images) discloses training the neural network to correctly classify images.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491.  The examiner can normally be reached on Mon-Fri 8:00AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/M.H.H./Examiner, Art Unit 2122                                                                                                                                                                                                        

/ERIC NILSSON/           Primary Examiner, Art Unit 2122