Detailed Action
This action is in response to Applicant's communications filed 18 October 2018.  
Claims 1-20 are pending in this Application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement(s) (IDS) submitted on 18 October 2018 is/are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement(s) is/are being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1 and 11-13 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Grosse et al. (Adversarial Examples for Malware Detection, hereinafter "Grosse").

Regarding Claim 1,
Grosse teaches a method for detecting security vulnerabilities, comprising:
generating a corpus of input samples each labeled to indicate a threat level when executed by an input processing code (Fig. 1, malware, benign, p. 3; "our training set" sec. 5.2, p. 12; "benign samples" sec. 5.2, p. 13; "malware samples" sec. 5.2, p. 13; "neural network malware detection system", sec. 3.2, p. 5; this teaches that the input samples are labeled as benign or malware for the malware detection classifier, with benign and malware indicating threat levels);
training a neural network (NN) using the plurality of input samples to classify inputs according to a plurality of labels of the plurality of input samples ("We will thus train our own neural network malware detection system. This also enables us to consider a worst case attacker having full knowledge about model and training data. Since the binary indicator vector X we use to represent an application does not possess any particular structural properties or interdependencies, like for example images, we apply a regular, feed-forward neural network as described in Section 2 to solve our malware classification task." sec. 3.2, p. 5);
for each input sample:
iteratively altering the input sample ("Crafting an adversarial example x∗—misclassified by model F—from a legitimate sample x can be formalized as the following problem [36]: where δx is the minimal perturbation z yielding misclassification, according to a norm || · || appropriate for the input domain." sec. 2.3, p. 4) to correspond to a process of gradient change of the NN, until the NN classifies the altered input sample to a different label than a respective label of the input sample ("To craft an adversarial example, we take mainly two steps. In the first, we compute the gradient of F with respect to X to estimate the direction in which a perturbation in X would change F’s output. In the second step, we choose a perturbation δ of X with maximal positive gradient into our target class y'. For malware misclassification, this means that we choose the index i = arg max_{j∈[1,m], X_j=0} F_0(X_j) that maximizes the change into our target class 0 by changing X_i." sec. 3.3, p. 7);
assigning the different label to the altered input sample ("2. Craft adversarial examples A for F using the forward gradient method described in Section 3.3" sec. 5.2, p. 12;  Fig. 1, malware, benign, p. 3; "We combined the adversarial examples to create training batches by mixing them with benign samples at each network's malware ratio." sec. 5.2, p. 13);
using the plurality of relabeled altered input samples to further train the NN and augment the corpus of input samples ("3. Iterate additional training epochs on F with the adversarial examples from the last step as additional, malicious samples." sec. 5.2, p. 12; "We then trained the network for one more epoch on one training batch and re-evaluated their susceptibility against adversarial examples." sec. 5.2, p. 13).
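The gradient-guided alteration cited above from Grosse sec. 3.3 can be sketched as follows. This is a minimal illustration only, assuming a toy logistic model in place of Grosse's feed-forward network; the names `benign_score` and `craft_adversarial`, the weights, and the threshold of 0.5 are hypothetical and do not appear in Grosse. The budget of 20 changes mirrors the bound quoted from Grosse.

```python
import math


def benign_score(weights, bias, x):
    """Probability of the benign class under a toy logistic model (assumption)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def craft_adversarial(weights, bias, x, max_changes=20):
    """Set absent features (0 -> 1) in order of largest positive gradient toward
    the benign class until the label changes or the budget is used up."""
    x = list(x)
    changes = 0
    while changes < max_changes:
        p = benign_score(weights, bias, x)
        if p >= 0.5:
            break  # the model now assigns the other label
        # For a logistic output, d(score)/d(x_j) = p * (1 - p) * w_j.
        candidates = [
            (p * (1.0 - p) * w, j)
            for j, (w, xj) in enumerate(zip(weights, x))
            if xj == 0
        ]
        if not candidates:
            break
        grad, j = max(candidates)
        if grad <= 0:
            break  # no remaining feature moves the score toward benign
        x[j] = 1
        changes += 1
    label = "benign" if benign_score(weights, bias, x) >= 0.5 else "malware"
    return x, label, changes


# Hypothetical malware sample with one feature present.
altered, label, changes = craft_adversarial([-2.0, 2.0, 1.5, 0.5], -1.0, [1, 0, 0, 0])
print(altered, label, changes)
```

In an adversarial-training loop of the kind quoted from Grosse sec. 5.2, the altered samples would then be added to the training batches for further epochs.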

Regarding Claim 11,
Grosse teaches the method of claim 1.  Grosse further teaches performing a plurality of intermediate alterations of the input sample; and incorporating the plurality of intermediate alterations into the corpus of input samples ("Next, we apply the adversarial example crafting algorithm described in Section 3 and observe how often the adversarial inputs are able to successfully mislead our neural network based classifiers. As mentioned previously, we quantify the performance of our algorithm through the achieved misclassification rate, which measures the amount of previously correctly classified malware that is misclassified after the adversarial example crafting. In addition, we also measure the average number of modifications required to achieve misclassification to assess which architecture provided a harder time being mislead. As discussed above, we allow at most 20 modification to any of the malware applications." sec. 4.2, p. 9; this teaches that the adversarial examples include examples that are not misclassified (intermediate alterations) and examples that are misclassified (complete alterations)).

Regarding Claim 12,
Grosse teaches the method of claim 1. Grosse further teaches wherein the NN alters the plurality of input samples ("We combined the adversarial examples to create training batches by mixing them with benign samples at each network’s malware ratio." sec. 5.2, p. 13) to allow the NN to classify each input sample of the plurality of input samples to a respective label ("For the network trained with malware ratio 0.3 and 0.4, we observe a reduction of the misclassification rate, and an increase of the required average distortion for n1 and n2 additional training samples. For instance, we achieve a misclassification rate of 67% for the network trained with 100 additional samples at 0.3 malware ratio, from 73% for the original network." sec. 5.2, p. 13) which corresponds to a predetermined mapping between labels of the plurality of labels ("Since the DREBIN data set has a fairly unbalanced ratio between malware and benign applications, we experiment with different ratios of malware in each training batch to compare the achieved performance values. The number of training iterations is then set in such a way that all malware samples are at least used once. We evaluate the classification performance of each of these networks using accuracy, false negative and false positive rates as performance measures." sec. 4.1, p. 8-9; "We combined the adversarial examples to create training batches by mixing them with benign samples at each network’s malware ratio." sec. 5.2, p. 13; the malware ratio teaches the mapping between labels of the plurality of labels).

Regarding Claim 13,
Claim 13 recites a system including a processor (Grosse: "computer" p. 1) to execute code for performing functions corresponding to the method steps recited in claim 1. Grosse teaches the limitations of claim 13 as set forth above in connection with claim 1. Therefore, claim 13 is rejected under the same rationale as claim 1.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 2-8 and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Grosse et al. (Adversarial Examples for Malware Detection, hereinafter "Grosse") in view of Stevens et al. (Summoning Demons, The Pursuit of Exploitable Bugs in Machine Learning; hereinafter "Stevens").

Regarding Claim 2,
Grosse teaches the method of claim 1.  Grosse further teaches wherein each of the plurality of labels is a benign label (Fig. 1, benign; "benign samples" sec. 5.2, p. 13) or a malicious label (Fig. 1, malware; "malware samples" sec. 5.2, p. 13).  
Grosse does not explicitly teach wherein the plurality of labels is an error label.
Stevens teaches wherein the plurality of labels is an error label ("Fuzzing [20] is a popular method for bug discovery. A fuzzing tool tests a program using randomly generated inputs, which are often invalid or unexpected by the implementation, and records program exceptions or failures. In security, fuzzing has been employed to identify crashes that are indicative of memory safety errors in application. This technique has obvious applications to discovering one class of bug in machine learning systems—crashes" sec. 3.3, p. 4).
Grosse and Stevens are analogous art because both are directed to identifying malicious code. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the malware detection classifier of Grosse with the malware detection using fuzzing of Stevens. The modification would have been obvious because one of ordinary skill in the art would be motivated to mitigate the threat of security vulnerabilities, as suggested by Stevens ("As a result of our work, we responsibly disclosed five vulnerabilities, established three new CVE-IDs, and illuminated a common insecure practice across many machine learning systems. Finally, we outline several research directions for further understanding and mitigating this threat." sec. Abs, p. 1).

Regarding Claim 3,
The Grosse/Stevens combination teaches the method of claim 2.  
Grosse further teaches wherein the threat level indicated by the benign label corresponds to a benign effect produced by processing a respective input sample by the input processing code (Fig. 1, benign; "benign samples" sec. 5.2, p. 13; benign samples in contrast to malware samples teaches that the samples would have a benign effect), and
wherein the threat level indicated by the malicious label corresponds to a malicious effect produced by processing a respective input sample by the input processing code (Fig. 1, malware; "malware samples" sec. 5.2, p. 13; "detect network intrusions or other instances of malicious activities [35]", p. 1).

Grosse does not teach wherein the threat level indicated by the error label corresponds to an error effect produced by processing a respective input sample by the input processing code.
Stevens further teaches wherein the threat level indicated by the error label corresponds to an error effect produced by processing a respective input sample by the input processing code ("Fuzzing [20] is a popular method for bug discovery. A fuzzing tool tests a program using randomly generated inputs, which are often invalid or unexpected by the implementation, and records program exceptions or failures. In security, fuzzing has been employed to identify crashes that are indicative of memory safety errors in application. This technique has obvious applications to discovering one class of bug in machine learning systems—crashes" sec. 3.3, p. 4).
Grosse and Stevens are analogous art because both are directed to identifying malicious code. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the malware detection classifier of Grosse with the malware detection using fuzzing of Stevens. The modification would have been obvious because one of ordinary skill in the art would be motivated to mitigate the threat of security vulnerabilities, as suggested by Stevens ("As a result of our work, we responsibly disclosed five vulnerabilities, established three new CVE-IDs, and illuminated a common insecure practice across many machine learning systems. Finally, we outline several research directions for further understanding and mitigating this threat." sec. Abs, p. 1).

Regarding Claim 4,
Grosse teaches the method of claim 1. Grosse does not explicitly teach wherein generating the corpus of input samples is based on a fuzzing algorithm.
Stevens teaches wherein generating the corpus of input samples is based on a fuzzing algorithm ("Fuzzing [20] is a popular method for bug discovery. A fuzzing tool tests a program using randomly generated inputs, which are often invalid or unexpected by the implementation, and records program exceptions or failures. In security, fuzzing has been employed to identify crashes that are indicative of memory safety errors in application. This technique has obvious applications to discovering one class of bug in machine learning systems—crashes—but can we use fuzzing to find bugs that silently corrupt the system’s outputs? In this section, we use OpenCV as a running example while describing our bug discovery methodology." sec. 3.3, p. 4).
Grosse and Stevens are analogous art because both are directed to identifying malicious code. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the malware detection classifier of Grosse with the malware detection using fuzzing of Stevens. The modification would have been obvious because one of ordinary skill in the art would be motivated to mitigate the threat of security vulnerabilities, as suggested by Stevens ("As a result of our work, we responsibly disclosed five vulnerabilities, established three new CVE-IDs, and illuminated a common insecure practice across many machine learning systems. Finally, we outline several research directions for further understanding and mitigating this threat." sec. Abs, p. 1).
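The fuzzing behavior quoted from Stevens sec. 3.3 (running a program on randomly generated inputs and recording exceptions or failures) can be sketched as follows. This is a minimal illustration under stated assumptions: a single-byte random mutation stands in for a real fuzzer such as AFL, `toy_parser` is a hypothetical target, and the "benign"/"error" label names are chosen here for illustration rather than taken from either reference.

```python
import random


def toy_parser(data: bytes) -> int:
    """Hypothetical input processing code: rejects inputs lacking an 'A' header."""
    if data[0] != ord("A"):
        raise ValueError("bad header")
    return len(data)


def fuzz_corpus(target, seed: bytes, n: int, rng: random.Random):
    """Mutate one random byte of the seed per sample, run the target on it,
    and label the sample by whether the target raised an exception."""
    corpus = []
    for _ in range(n):
        data = bytearray(seed)
        pos = rng.randrange(len(data))
        data[pos] = rng.randrange(256)
        sample = bytes(data)
        try:
            target(sample)
            label = "benign"
        except Exception:
            label = "error"
        corpus.append((sample, label))
    return corpus


corpus = fuzz_corpus(toy_parser, b"ABCD", 50, random.Random(0))
print(sum(1 for _, label in corpus if label == "error"), "of", len(corpus), "samples raised")
```

A labeled corpus of this shape is the kind of input a classifier such as Grosse's could be trained on in the proposed combination.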

Regarding Claim 5,
The Grosse/Stevens combination teaches the method of claim 4.  Grosse further teaches labeling each of the input samples according to the threat level (Fig. 1, malware, benign, p. 3; "our training set" sec. 5.2, p. 12; "benign samples" sec. 5.2, p. 13; "malware samples" sec. 5.2, p. 13; "neural network malware detection system", sec. 3.2, p. 5; this teaches that the input samples are labeled as benign or malware for the malware detection classifier, with benign and malware indicating threat levels).

Grosse does not explicitly teach wherein the fuzzing algorithm is from a group comprising: generation based fuzzing of the input samples; genetic fuzzing of the input samples; concolic testing of the input samples.
Stevens further teaches wherein the fuzzing algorithm is from a group comprising:
generation based fuzzing of the input samples ("Research from Cha et al. [5] further explores automated generation of such perturbations. Utilizing a well-formed seed input, a mutational fuzzer iteratively manipulates the seed to achieve maximum path traversal in a target program. This technique can isolate particular sets of input that cause the program to enter a state that might be of interest for an attacker." sec. 5, p. 8; it is noted that the claims only require one algorithm, but Stevens teaches more than one);
genetic fuzzing of the input samples ("We use American Fuzzy Lop (AFL) [36] to instrument and fuzz-test machine learning programs. AFL was designed and is commonly used for finding crashes due to parsing failures, so the AFL loop involves running an application on multiple inputs and creating a report if an input causes a crash. AFL utilizes a genetic algorithm to generate inputs while maximizing the code coverage and has heuristics to discriminate between unique crashes and duplicates. We want to capitalize on AFL’s ability to maximize code coverage while also finding crashing inputs." sec. 3.3, p. 4).
Grosse and Stevens are analogous art because both are directed to identifying malicious code. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the malware detection classifier of Grosse with the malware detection using fuzzing of Stevens. The modification would have been obvious because one of ordinary skill in the art would be motivated to mitigate the threat of security vulnerabilities, as suggested by Stevens ("As a result of our work, we responsibly disclosed five vulnerabilities, established three new CVE-IDs, and illuminated a common insecure practice across many machine learning systems. Finally, we outline several research directions for further understanding and mitigating this threat." sec. Abs, p. 1).

Regarding Claim 6,
The Grosse/Stevens combination teaches the method of claim 5. Grosse further teaches extracting a plurality of features from internal layers of the NN (Fig. 1, n = number features, p. 3; "As an example of an approach combining static and dynamic analysis we mention Marvin [20], which extracts features from an application while running it in an analysis sandbox and observing data flow, network behavior and other operations." sec. 2.1, p. 3; "Starting with the model input, each network layer produces an output used as input by the next layer. Networks with a single intermediate—hidden—layer are qualified as shallow neural networks whereas models with multiple hidden layers are deep neural networks. Using multiple hidden layers is interpreted as hierarchically extracting representations from the input [8], eventually producing a representation relevant to solve the machine learning task and output a prediction." sec. 2.2, p. 3),
creating a dictionary based on the plurality of features ("The forward derivative based approach introduced by Papernot et al. [25] evaluates the model’s output sensitivity to each input component using its Jacobian matrix. From this, we derive a saliency map ranking the individual features by their influence for a particular class." sec. 2.3, p. 4; the saliency map ranking the features teaches a dictionary based on the plurality of features; "each single application only exhibits very few features relatively to the entire feature set." sec. 3.1, p. 5; the feature set also teaches the dictionary), and
using the dictionary for embedding dictionary entries within the corpus of input samples ("To make sure that modifications caused by the above algorithms do not change the application too much, we bound the maximum distortion δ applied to the original sample... In our case, each modification to an entry will always change its value by exactly 1, and we thus use the L1 norm to bound the overall number of features modified. We further bound the number of features to k = 20 (see Appendix B for details)." sec. 3.4, p. 7; features teach dictionary entries, the feature set teaches the dictionary, and crafting examples by modification of features teaches embedding dictionary entries within the corpus of input samples).

Regarding Claim 7,
The Grosse/Stevens combination teaches the method of claim 5. Grosse further teaches augmenting the corpus of input samples by altering the input sample by the process of gradient change of the NN ("To craft an adversarial example, we take mainly two steps. In the first, we compute the gradient of F with respect to X to estimate the direction in which a perturbation in X would change F’s output. In the second step, we choose a perturbation δ of X with maximal positive gradient into our target class y'. For malware misclassification, this means that we choose the index i = arg max_{j∈[1,m], X_j=0} F_0(X_j) that maximizes the change into our target class 0 by changing X_i." sec. 3.3, p. 7).

Grosse does not explicitly teach augmenting the corpus of input samples by altering the input sample by the fuzzing algorithm.
Stevens teaches augmenting the corpus of input samples by altering the input sample by the fuzzing algorithm ("We use American Fuzzy Lop (AFL) [36] to instrument and fuzz-test machine learning programs. AFL was designed and is commonly used for finding crashes due to parsing failures, so the AFL loop involves running an application on multiple inputs and creating a report if an input causes a crash. AFL utilizes a genetic algorithm to generate inputs while maximizing the code coverage and has heuristics to discriminate between unique crashes and duplicates. We want to capitalize on AFL’s ability to maximize code coverage while also finding crashing inputs." sec. 3.3, p. 4).
Grosse and Stevens are analogous art because both are directed to identifying malicious code. It would have been obvious to one of ordinary skill in the art before the effective filing date to alternate the adversarial example generator of Grosse with the fuzzing algorithm of Stevens. The modification would have been obvious because one of ordinary skill in the art would be motivated to mitigate the threat of security vulnerabilities, as suggested by Stevens ("As a result of our work, we responsibly disclosed five vulnerabilities, established three new CVE-IDs, and illuminated a common insecure practice across many machine learning systems. Finally, we outline several research directions for further understanding and mitigating this threat." sec. Abs, p. 1).

Regarding Claim 8,
The Grosse/Stevens combination teaches the method of claim 5. Stevens further teaches training the NN using the plurality of input samples and respective feature extraction data obtained during processing the plurality of input samples ("A common, but optional, practice is to normalize the features prior to feeding them to the algorithm. This involves feature scaling and standardization. In the training phase, the (normalized) features are applied onto the current model in order to obtain the perceived prediction. The predictions are compared to the actual class labels using a cost function. The cost function output quantifies the distance between the current model and the ground truth. The model is then updated to reduce the cost through a minimization algorithm. This iterative process is repeated until the model becomes a sufficiently accurate representation of the ground truth. Upon convergence, the model is used to predict new class labels." sec. 3.1, p. 3) by the fuzzing algorithm ("We use American Fuzzy Lop (AFL) [36] to instrument and fuzz-test machine learning programs. AFL was designed and is commonly used for finding crashes due to parsing failures, so the AFL loop involves running an application on multiple inputs and creating a report if an input causes a crash. AFL utilizes a genetic algorithm to generate inputs while maximizing the code coverage and has heuristics to discriminate between unique crashes and duplicates. We want to capitalize on AFL’s ability to maximize code coverage while also finding crashing inputs." sec. 3.3, p. 4).
Grosse and Stevens are analogous art because both are directed to identifying malicious code. It would have been obvious to one of ordinary skill in the art before the effective filing date to alternate the adversarial example generator of Grosse with the fuzzing algorithm of Stevens. The modification would have been obvious because one of ordinary skill in the art would be motivated to mitigate the threat of security vulnerabilities, as suggested by Stevens ("As a result of our work, we responsibly disclosed five vulnerabilities, established three new CVE-IDs, and illuminated a common insecure practice across many machine learning systems. Finally, we outline several research directions for further understanding and mitigating this threat." sec. Abs, p. 1).

Regarding Claim 10,
Grosse teaches the method of claim 1.  Grosse does not explicitly teach wherein an amount of iterations applied for the gradient change of the NN on altered input samples of a plurality of altered input samples is equal to an amount of iterations needed until a respective internal layer of the NN reaches a state of maximal activation.
Stevens teaches wherein an amount of iterations applied for the gradient change of the NN on altered input samples of a plurality of altered input samples is equal to an amount of iterations needed until a respective internal layer of the NN reaches a state of maximal activation ("In the training phase, the (normalized) features are applied onto the current model in order to obtain the perceived prediction. The predictions are compared to the actual class labels using a cost function. The cost function output quantifies the distance between the current model and the ground truth. The model is then updated to reduce the cost through a minimization algorithm. This iterative process is repeated until the model becomes a sufficiently accurate representation of the ground truth. Upon convergence, the model is used to predict new class labels." sec. 3.1, p. 3; minimizing the cost function that quantifies the distance between the current model and the ground truth teaches the maximal activation).
Grosse and Stevens are analogous art because both are directed to identifying malicious code. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the malware detection classifier of Grosse with the malware detection using fuzzing of Stevens. The modification would have been obvious because one of ordinary skill in the art would be motivated to mitigate the threat of security vulnerabilities, as suggested by Stevens ("As a result of our work, we responsibly disclosed five vulnerabilities, established three new CVE-IDs, and illuminated a common insecure practice across many machine learning systems. Finally, we outline several research directions for further understanding and mitigating this threat." sec. Abs, p. 1).

Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Grosse et al. (Adversarial Examples for Malware Detection, hereinafter "Grosse") in view of Nichols et al. (Faster Fuzzing: Reinitialization with Deep Neural Models, hereinafter "Nichols").

Regarding Claim 9,
Grosse teaches the method of claim 1.  Grosse does not explicitly teach discarding a plurality of relabeled altered input samples which do not trigger unique execution paths when processed by the input processing code.
Nichols teaches discarding a plurality of relabeled altered input samples ("American Fuzzy Lop (AFL) is an advanced fuzzing framework that has been used to discover a number of novel software vulnerabilities (https://github.com/mrash/afl-cve). AFL uses random mutations of byte strings to identify unique code paths and discover defects in target programs. The inputs that successfully generated unique code paths are then documented as "seed files". We propose to use these native seed files as training data for deep generative models to create augmented seed files. Our proposed reinitialization methods are a scalable process that can improve the time to discovery of software defects." p. 1-2) which do not trigger unique execution paths when processed by the input processing code ("After removing identical seed files from across the nodes, and seed files that resulted in the same code path length, we estimate 802 of the initial files were duplicates from the independent worker nodes. Removing those duplicates resulted in a total of 38,384 unique files." p. 3).
Grosse and Nichols are analogous art because both are directed to software vulnerabilities. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the malware detector of Grosse with the software defect detector of Nichols.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the time to discovery of software defects, as suggested by Nichols ("Our proposed reinitialization methods are a scalable process that can improve the time to discovery of software defects." sec. 1, p. 2).
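The discarding step relied upon above can be sketched as a filter that keeps only samples whose execution path has not been seen before. This is a minimal illustration, assuming a hypothetical `toy_trace` function that stands in for real coverage instrumentation; the function names and the branch labels are illustrative and are not drawn from Nichols or AFL.

```python
def toy_trace(data: bytes):
    """Hypothetical stand-in for instrumentation: lists the branches taken."""
    path = ["empty" if not data else "nonempty"]
    if data[:1] == b"{":
        path.append("json-like")
    if len(data) > 4:
        path.append("long")
    return path


def keep_unique_paths(samples, trace):
    """Keep the first sample seen for each distinct execution path and
    discard later samples that follow an already-recorded path."""
    seen = set()
    kept = []
    for sample in samples:
        path = tuple(trace(sample))
        if path not in seen:
            seen.add(path)
            kept.append(sample)
    return kept


samples = [b"", b"{a", b"{b", b"hello world", b"abc"]
print(keep_unique_paths(samples, toy_trace))
```

Here `b"{b"` is discarded because it follows the same path as `b"{a"`.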

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571) 270-7477. The examiner can normally be reached M-F: 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHARLES C KUO/Examiner, Art Unit 2126    
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126