DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
2. Applicant's arguments filed 06/15/2022 are hereby acknowledged; claims 1-3, 5-6, 9, 11, 13-15 and 18-20 have been amended.

3. Applicant's amendments have resulted in the withdrawal of the 112(b) rejection of claims 3, 6, 7, 13 and 18-19.

4. Applicant's arguments regarding independent claims 1, 11 and 20 have been fully considered but they are not persuasive.

5. Applicant argues that Zhang does not disclose: “the computer system determines whether the input data is an adversarial attack based on the input data and the output of the DNN after processing the input data”, as recited by the newly amended independent claims.

6. Examiner would like to point out that Zhang teaches this limitation. Para:0026-0027 teaches that adversarial attacks intentionally inject small perturbations (also known as adversarial examples) to a DNN's input data to cause misclassifications. Fig.1 is an example of an adversarial attack 10 on an original DNN (deep neural network) using an image to create a misclassification by the original DNN. An original image 50 of a panda. The original image 50 has a 66 percent (%) probability of a DNN's selecting the class “panda” for the image. An adversarial attack 10 injects perturbations c, illustrated by image 70, into a data stream with the original image 50 to create the final image 90. The final image 90, which has been perturbed by the adversarial attack 10, causes the DNN to select the class “dog” with a 99.6% confidence. Thus, the adversarial attack caused a high probability of error in image detection for this example.
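
For illustration only, the effect described in Para:0026-0027 (a small perturbation of the input changing the selected class) can be sketched as follows. This is a minimal sketch and not Zhang's implementation: the two-class linear model, its weights, the feature values, the step size eps and the function names are all hypothetical and chosen only to make the effect visible.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into class probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """Class probabilities of a hypothetical linear model: softmax(W x)."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)

def top_class(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)

def perturb(weights, x, source, target, eps):
    """Move every feature by at most eps in the direction favoring target over source."""
    direction = [wt - ws for wt, ws in zip(weights[target], weights[source])]
    return [v + eps * ((d > 0) - (d < 0)) for v, d in zip(x, direction)]

# Hypothetical model: row 0 scores "panda", row 1 scores "dog".
WEIGHTS = [[1.0, -1.0, 0.5, -0.5], [-1.0, 1.0, -0.5, 0.5]]
original = [0.2, 0.1, 0.1, 0.0]
adversarial = perturb(WEIGHTS, original, source=0, target=1, eps=0.2)

print(top_class(predict(WEIGHTS, original)))     # class of the unperturbed input
print(top_class(predict(WEIGHTS, adversarial)))  # class after the perturbation
```

In this toy setting no feature moves by more than eps, yet the selected class changes, which is the behavior the cited paragraphs attribute to an adversarial attack.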

Para:0036 also teaches detecting adversarial attacks through decoy training: generating decoy data from regular data; training a deep neural network, which has been trained with the regular data, with the decoy data; responsive to a client request comprising input data, operating the trained deep neural network on the input data; performing post-processing using at least an output of the operated trained deep neural network to determine whether the input data is regular data or decoy data; and performing one or more actions based on a result of the performed post-processing. Para:0057-0060 teaches, as an example, generating decoy data (“cat” + “0”) in class “cat”, assigning a “dog” label to such a decoy sample, and training a DNN (deep neural network) model (e.g., g(x)) with both regular and decoy data to detect “dogs” and “cats”. If an attacker attempts to generate adversarial examples from “cat” to “dog”, the examples generated will be similar to (“cat” + “0”). However, while “cat” + “0” is classified as “dog” and a regular dog is also classified as “dog”, there are significant differences between them in terms of the final distribution/output. Therefore, the techniques described herein can either check the distribution of the logits layer or train a new DNN model (e.g., g(x)) to distinguish them, as possible examples of implementation.

Therefore, Zhang teaches determining whether the input data is an adversarial attack based on the input data and the output of the DNN after processing the input data.

Moreover, output data can only be identified as resulting from an adversarial attack based on the input data, because the input data is processed to generate the output data; as such, the output data is based on the input data. Zhang therefore meets the above claimed limitation.
Claim Rejections - 35 USC § 102
7. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

8. Claims 1, 8-11, 13 and 19-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Zhang (US Pub. No. 2020/0005133).

9. Regarding claim 1 Zhang teaches a computer-implemented method for detecting adversarial attacks on a machine-learning (ML) system, the method comprising: processing data via a ML model included in the ML system to generate output data; and processing, via an adversarial detection module included in the ML system, the data input into the ML model and the output data to determine whether the data input into the ML model is adversarial (Para:0026-0027 teaches adversarial attacks intentionally inject small perturbations (also known as adversarial examples) to a DNN's input data to cause misclassifications. Fig.1 is an example of an adversarial attack 10 on an original DNN (deep neural network) using an image to create a misclassification by the original DNN. An original image 50 of a panda. The original image 50 has a 66 percent (%) probability of a DNN's selecting the class “panda” for the image. An adversarial attack 10 injects perturbations c, illustrated by image 70, into a data stream with the original image 50 to create the final image 90. The final image 90, which has been perturbed by the adversarial attack 10, causes the DNN to select the class “dog” with a 99.6% confidence. Thus, the adversarial attack caused a high probability of error in image detection for this example. 
Para:0036 teaches detecting adversarial attacks through decoy training: generating decoy data from regular data; training a deep neural network, which has been trained with the regular data, with the decoy data; responsive to a client request comprising input data, operating the trained deep neural network on the input data; performing post-processing using at least an output of the operated trained deep neural network to determine whether the input data is regular data or decoy data; and performing one or more actions based on a result of the performed post-processing. Para:0057-0062 teaches, as an example, generating decoy data (“cat” + “0”) in class “cat”, assigning a “dog” label to such a decoy sample, and training a DNN (deep neural network) model (e.g., g(x)) with both regular and decoy data to detect “dogs” and “cats”. If an attacker attempts to generate adversarial examples from “cat” to “dog”, the examples generated will be similar to (“cat” + “0”). However, while “cat” + “0” is classified as “dog” and a regular dog is also classified as “dog”, there are significant differences between them in terms of the final distribution/output. Therefore, the techniques described herein can either check the distribution of the logits layer or train a new DNN model (e.g., g(x)) to distinguish them, as possible examples of implementation.
Moreover, output data can only be identified as resulting from an adversarial attack based on the input data, because the input data is processed to generate the output data; as such, the output data is based on the input data);

and performing one or more remedial actions if the data input into the ML model is determined to be adversarial (Para:0070-0074 and Para:0079 teaches performing one or more remedial actions, such as blocking the request, returning the correct labels or returning random labels, if the input is determined to be adversarial).
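
For illustration only, the three remedial actions named in the citation above (blocking the request, returning the correct label, returning a random label) can be sketched as a simple dispatch. This is a sketch under stated assumptions, not Zhang's implementation: the function name respond, the action strings, the response dictionary and the correct_label parameter are hypothetical, and the sketch assumes the detection result is already available.

```python
import random

def respond(prediction, adversarial, labels, action="block",
            correct_label=None, rng=random):
    """Return a response for one request, applying a remedial action if needed.

    All names here are hypothetical; `adversarial` is the result of a
    separate detection step that this sketch does not implement.
    """
    if not adversarial:
        # Regular input: pass the model's prediction through unchanged.
        return {"status": "ok", "label": prediction}
    if action == "block":
        return {"status": "blocked", "label": None}
    if action == "correct":
        if correct_label is None:
            raise ValueError("action 'correct' needs a correct_label")
        return {"status": "ok", "label": correct_label}
    if action == "random":
        return {"status": "ok", "label": rng.choice(labels)}
    raise ValueError(f"unknown action: {action!r}")

LABELS = ["cat", "dog"]
print(respond("dog", adversarial=False, labels=LABELS))
print(respond("dog", adversarial=True, labels=LABELS, action="block"))
```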

10. Regarding claim 8 Zhang teaches the computer-implemented method, wherein the adversarial detection module comprises a software module capable of being implemented in multiple different ML systems without modification (Para:0026-0028 teaches the adversarial detection software module can be implemented on different ML systems, such as a DNN).

11. Regarding claim 9 Zhang teaches the computer-implemented method, wherein the input data includes at least one of an image, a microphone recording, a thermal camera image, LIDAR (Light Detection and Ranging) data, or RADAR data (Para:0026-0027 and Para:0050-0051 teaches the data input includes a camera image or RADAR data).

12. Regarding claim 10 Zhang teaches the computer-implemented method, wherein the ML model comprises one of a deep learning model, a support vector machine, a boosted tree, a random forest, a logistic regression model, or a linear regression model (Zhang: Para:0003, Para:0020-0024 and Para:0042 teaches the ML model comprises a deep learning model in the form of a deep neural network).

13. Regarding claim 11 Zhang teaches a non-transitory computer-readable storage medium including instructions that, when executed by a processor, cause the processor to perform steps for detecting adversarial attacks on a machine-learning (ML) system, the steps comprising: receiving data via a ML model included in the ML system, input for the ML model; processing, via the ML model, the input data to generate output data; receiving, via an adversarial detection module that is included in the ML system, the input data and the output data; determining, via the adversarial detection module, whether the input data is adversarial based on the input data and the output data (Para:0026-0027 teaches adversarial attacks intentionally inject small perturbations (also known as adversarial examples) to a DNN's input data to cause misclassifications. Fig.1 is an example of an adversarial attack 10 on an original DNN (deep neural network) using an image to create a misclassification by the original DNN. An original image 50 of a panda. The original image 50 has a 66 percent (%) probability of a DNN's selecting the class “panda” for the image. An adversarial attack 10 injects perturbations c, illustrated by image 70, into a data stream with the original image 50 to create the final image 90. The final image 90, which has been perturbed by the adversarial attack 10, causes the DNN to select the class “dog” with a 99.6% confidence. Thus, the adversarial attack caused a high probability of error in image detection for this example.
Para:0036 teaches detecting adversarial attacks through decoy training: generating decoy data from regular data; training a deep neural network, which has been trained with the regular data, with the decoy data; responsive to a client request comprising input data, operating the trained deep neural network on the input data; performing post-processing using at least an output of the operated trained deep neural network to determine whether the input data is regular data or decoy data; and performing one or more actions based on a result of the performed post-processing. Para:0057-0062 teaches, as an example, generating decoy data (“cat” + “0”) in class “cat”, assigning a “dog” label to such a decoy sample, and training a DNN (deep neural network) model (e.g., g(x)) with both regular and decoy data to detect “dogs” and “cats”. If an attacker attempts to generate adversarial examples from “cat” to “dog”, the examples generated will be similar to (“cat” + “0”). However, while “cat” + “0” is classified as “dog” and a regular dog is also classified as “dog”, there are significant differences between them in terms of the final distribution/output. Therefore, the techniques described herein can either check the distribution of the logits layer or train a new DNN model (e.g., g(x)) to distinguish them, as possible examples of implementation.
Moreover, output data can only be identified as resulting from an adversarial attack based on the input data, because the input data is processed to generate the output data; as such, the output data is based on the input data);

and performing one or more remedial actions if the data input into the ML model is determined to be adversarial (Para:0070-0074 and Para:0079 teaches performing one or more remedial actions, such as blocking the request, returning the correct labels or returning random labels, if the input is determined to be adversarial).

14. Regarding claim 13 Zhang teaches the computer-readable storage medium wherein the one or more remedial actions include accessing an alternative source of information to classify the input data (Figs.4a-b, Para:0054-0056 and Para:0078-0080 teaches the client 101 (in this example, a human being) in step 4, operation 441, sends a request 460 including input data 461. The DNN f(x) 280 is executed using the input data 461. The post-processing 435 that is performed in step 4, operation 446, is performed on the output 450 of the DNN f(x) 280. In block 480, the server computer system 170, using previously recorded results of the logits layer from the DNN f(x) 280, compares similarity between the input data 461, decoy data 415, and regular training data 405. In block 482, given input data, the DNN f(x) 280 (e.g., under control of the server computer system 170) can determine its output class “a”. The server computer system 170 then compares the logits of the input data to the logits of all (e.g., or a random sampling of) the regular data in class “a” and all the decoy data in class “a”. One technique for this comparison is similarity, and one way to determine similarity is to determine a similarity score against the regular data and decoy data. Typically, the results (i.e., output) of the logits layer 481 are just vectors, and one can use, e.g., the general cosine similarity or Euclidean distance to calculate their similarity. In block 485, the server computer system 170, in response to the logits result of the input image being much more similar to decoy data than regular training data, detects the input data as an adversarial attack and takes some predetermined protective action 470. For instance, similarity may be determined using general cosine similarity or Euclidean distance, for (1) logits output of the input data 461 and the logits output of the regular data and (2) logits output of the input data and the logits output of the decoy data. 
Whichever of these has the best value based on the particular metric being used would be selected. If that selection is the decoy data, then this is detected as an adversarial attack. Otherwise, the output 450 is returned. The predetermined protective action 470 or return of the output 450 would occur using the output 456).
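
For illustration only, the logits comparison described in the citation above (blocks 480-485) can be sketched as follows. This is a minimal sketch, not Zhang's implementation: the function names, the use of a mean over each reference set, and the example logits vectors are hypothetical. Cosine similarity is used here; Euclidean distance (for example, Python's math.dist) could be substituted with the comparison direction reversed.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two logits vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_similarity(logits, reference_set):
    """Average similarity between the input logits and a set of recorded logits."""
    return sum(cosine_similarity(logits, r) for r in reference_set) / len(reference_set)

def is_adversarial(input_logits, regular_logits, decoy_logits):
    """Flag the input when its logits resemble decoy data more than regular data."""
    return (mean_similarity(input_logits, decoy_logits)
            > mean_similarity(input_logits, regular_logits))

# Hypothetical recorded logits for one output class "a".
REGULAR = [[0.1, 4.0, 0.2], [0.0, 3.5, 0.3]]
DECOY = [[2.0, 4.0, 0.1], [2.2, 3.8, 0.0]]

print(is_adversarial([2.1, 3.9, 0.1], REGULAR, DECOY))   # resembles the decoy logits
print(is_adversarial([0.05, 3.8, 0.2], REGULAR, DECOY))  # resembles the regular logits
```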

15. Regarding claim 19 Zhang teaches the computer-readable storage medium, wherein the adversarial detection module determines whether the input data is adversarial independently of a type of the ML model (Para:0026-0028 teaches determines whether the input data is adversarial independently of a type of the ML model).

16. Regarding claim 20 Zhang teaches a system, comprising: a memory storing a machine learning (ML) system comprising a ML model and an adversarial detection module, and
a processor that is coupled to the memory and configured to: receive, via
the adversarial detection module, the input data and the output data; determine, via the adversarial detection module, whether the input data is adversarial based on the input data and the output data (Para:0026-0027 teaches adversarial attacks intentionally inject small perturbations (also known as adversarial examples) to a DNN's input data to cause misclassifications. Fig.1 is an example of an adversarial attack 10 on an original DNN (deep neural network) using an image to create a misclassification by the original DNN. An original image 50 of a panda. The original image 50 has a 66 percent (%) probability of a DNN's selecting the class “panda” for the image. An adversarial attack 10 injects perturbations c, illustrated by image 70, into a data stream with the original image 50 to create the final image 90. The final image 90, which has been perturbed by the adversarial attack 10, causes the DNN to select the class “dog” with a 99.6% confidence. Thus, the adversarial attack caused a high probability of error in image detection for this example. 
Para:0036 teaches detecting adversarial attacks through decoy training: generating decoy data from regular data; training a deep neural network, which has been trained with the regular data, with the decoy data; responsive to a client request comprising input data, operating the trained deep neural network on the input data; performing post-processing using at least an output of the operated trained deep neural network to determine whether the input data is regular data or decoy data; and performing one or more actions based on a result of the performed post-processing. Para:0057-0062 teaches, as an example, generating decoy data (“cat” + “0”) in class “cat”, assigning a “dog” label to such a decoy sample, and training a DNN (deep neural network) model (e.g., g(x)) with both regular and decoy data to detect “dogs” and “cats”. If an attacker attempts to generate adversarial examples from “cat” to “dog”, the examples generated will be similar to (“cat” + “0”). However, while “cat” + “0” is classified as “dog” and a regular dog is also classified as “dog”, there are significant differences between them in terms of the final distribution/output. Therefore, the techniques described herein can either check the distribution of the logits layer or train a new DNN model (e.g., g(x)) to distinguish them, as possible examples of implementation.
Moreover, output data can only be identified as resulting from an adversarial attack based on the input data, because the input data is processed to generate the output data; as such, the output data is based on the input data);

and performing one or more remedial actions if the data input into the ML model is determined to be adversarial (Para:0070-0074 and Para:0079 teaches performing one or more remedial actions, such as blocking the request, returning the correct labels or returning random labels, if the input is determined to be adversarial).

Claim Rejections - 35 USC § 103
17. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

18. Claims 2-4, 12 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (US Pub. No. 2020/0005133) as applied to claims 1 and 11 above, and in view of Araujo (US Pat. No. 10,733,292).

19. Regarding claim 2 Zhang teaches the computer-implemented method, wherein processing the input data and the output data via the adversarial detection module comprises: perturbing the input data using a set of predefined random perturbations; inputting the perturbed data into the model included in the adversarial detection module which generates output perturbations; and determining a difference between the output perturbations and a set of expected output perturbations (Para:0054-0056 teaches a first adversarial attack 10 (from FIG. 1) on an original DNN using an image to create a misclassification by an original neural network and a second adversarial attack 300 using the same image to create a misclassification by a neural network with decoy training. The adversarial attack 10 is only for comparison, as a currently existing reference system. For the second adversarial attack 300, the original image 50 has a 66 percent (%) probability of the DNN's selecting the class “panda” for the image. The second adversarial attack 300 is performed on the same DNN but that has undergone decoy training in accordance with an exemplary embodiment herein. The second adversarial attack 300 injects perturbations ε′, illustrated by image 370, into a data stream with the original image 50 to create the final image 390. The final image 390, which has been perturbed by the adversarial attack 300, causes the DNN to select the class “dog” with a 99.8% confidence. To implement decoy training and make adversarial examples more detectable, one possible exemplary method first generates training decoy samples for each DNN class, where the decoy data is similar to the regular training samples (i.e., data) of each class but may implement specially crafted patterns (e.g., watermarks). Then the method assigns counterfeit labels to the training decoy data (e.g., a decoy resembling the image of a cat is labeled as class “dog”). Next, the DNN is trained on both regular and decoy data. 
As a result, the regular data will still be classified as their original classes but the adversarial data, which are generated through, e.g., the gradient descent algorithm will resemble decoy data, and will be misclassified to incorrect classes. In this way, this exemplary approach can easily detect such adversarial examples. Para:0047-0048 teaches the data sample comprising fingerprints). 
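
For illustration only, the decoy-generation steps described in the citation above (crafting a decoy that resembles a regular sample but carries a pattern such as a watermark, assigning it a counterfeit label, and combining regular and decoy data for training) can be sketched as follows. This is a sketch under stated assumptions, not Zhang's implementation: the additive watermark, the clipping to 1.0, the four-value "image", the label mapping and the function names are hypothetical, and the training step itself is not shown.

```python
def make_decoy(sample, watermark, counterfeit_label):
    """Add a watermark pattern to a sample and attach a counterfeit label.

    Pixel values are assumed to lie in [0, 1]; the sum is clipped at 1.0.
    """
    pixels = [min(1.0, p + w) for p, w in zip(sample, watermark)]
    return pixels, counterfeit_label

def build_training_set(regular, watermark, counterfeit_for):
    """Return the regular (sample, label) pairs followed by one decoy per pair."""
    decoys = [make_decoy(x, watermark, counterfeit_for[y]) for x, y in regular]
    return list(regular) + decoys

# Hypothetical data: a 4-pixel "image" of a cat and a watermark on one pixel.
REGULAR_DATA = [([0.2, 0.4, 0.9, 0.1], "cat")]
WATERMARK = [0.0, 0.0, 0.2, 0.0]
COUNTERFEIT = {"cat": "dog", "dog": "cat"}

training_set = build_training_set(REGULAR_DATA, WATERMARK, COUNTERFEIT)
for pixels, label in training_set:
    print(label, pixels)
```

A DNN trained on such a combined set would still see the unmodified cat labeled “cat”, while the watermarked near-copy is labeled “dog”.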

Zhang teaches all the above claimed limitations but does not expressly teach inputting the perturbed data into the neural fingerprint model included in the adversarial detection module which generates output perturbations.

Araujo teaches inputting the perturbed data into the neural fingerprint model included in the adversarial detection module which generates output perturbations (Col.3, lines.8-38; Col.12, lines.56-67 and Col.13, lines.1-51 teaches inputting the perturbed data into the neural fingerprint model).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to include inputting the perturbed data into the neural fingerprint model included in the adversarial detection module which generates output perturbations, as taught by Araujo. In such a setup, the tracing mechanism uses the data input as fingerprints to detect an adversarial attack.

20. Regarding claim 3 Zhang teaches the computer-implemented method, wherein the difference between the output perturbations and the set of expected output perturbations comprises a distance within a feature space between each output perturbations and a corresponding expected output perturbation (Para:0054-0056 and Para:0078-0080 teaches the client 101 (in this example, a human being) in step 4, operation 441, sends a request 460 including input data 461. The DNN f(x) 280 is executed using the input data 461. The post-processing 435 that is performed in step 4, operation 446, is performed on the output 450 of the DNN f(x) 280. In block 480, the server computer system 170, using previously recorded results of the logits layer from the DNN f(x) 280, compares similarity between the input data 461, decoy data 415, and regular training data 405. In block 482, given input data, the DNN f(x) 280 (e.g., under control of the server computer system 170) can determine its output class “a”. The server computer system 170 then compares the logits of the input data to the logits of all (e.g., or a random sampling of) the regular data in class “a” and all the decoy data in class “a”. One technique for this comparison is similarity, and one way to determine similarity is to determine a similarity score against the regular data and decoy data. Typically, the results (i.e., output) of the logits layer 481 are just vectors, and one can use, e.g., the general cosine similarity or Euclidean distance to calculate their similarity. In block 485, the server computer system 170, in response to the logits result of the input image being much more similar to decoy data than regular training data, detects the input data as an adversarial attack and takes some predetermined protective action 470. 
For instance, similarity may be determined using general cosine similarity or Euclidean distance, for (1) logits output of the input data 461 and the logits output of the regular data and (2) logits output of the input data and the logits output of the decoy data. Whichever of these has the best value based on the particular metric being used would be selected. If that selection is the decoy data, then this is detected as an adversarial attack. Otherwise, the output 450 is returned. The predetermined protective action 470 or return of the output 450 would occur using the output 456).

21. Regarding claim 4 Zhang teaches the computer-implemented method, further comprising performing one or more remedial actions if the difference between the output perturbations and the set of expected output perturbations satisfies a predefined threshold (Para:0054-0056 and Para:0078-0080 teaches the client 101 (in this example, a human being) in step 4, operation 441, sends a request 460 including input data 461. The DNN f(x) 280 is executed using the input data 461. The post-processing 435 that is performed in step 4, operation 446, is performed on the output 450 of the DNN f(x) 280. In block 480, the server computer system 170, using previously recorded results of the logits layer from the DNN f(x) 280, compares similarity between the input data 461, decoy data 415, and regular training data 405. In block 482, given input data, the DNN f(x) 280 (e.g., under control of the server computer system 170) can determine its output class “a”. The server computer system 170 then compares the logits of the input data to the logits of all (e.g., or a random sampling of) the regular data in class “a” and all the decoy data in class “a”. One technique for this comparison is similarity, and one way to determine similarity is to determine a similarity score against the regular data and decoy data. Typically, the results (i.e., output) of the logits layer 481 are just vectors, and one can use, e.g., the general cosine similarity or Euclidean distance to calculate their similarity. In block 485, the server computer system 170, in response to the logits result of the input image being much more similar to decoy data than regular training data, detects the input data as an adversarial attack and takes some predetermined protective action 470. 
For instance, similarity may be determined using general cosine similarity or Euclidean distance, for (1) logits output of the input data 461 and the logits output of the regular data and (2) logits output of the input data and the logits output of the decoy data. Whichever of these has the best value based on the particular metric being used would be selected. If that selection is the decoy data, then this is detected as an adversarial attack. Otherwise, the output 450 is returned. The predetermined protective action 470 or return of the output 450 would occur using the output 456.
 An option (see block 486) for block 485 is to use the labels of a top k closest (based on the similarity) regular or decoy data to determine the type of input data. Consider an example. Assume k=10, and there is some mixture of regular and decoy data in the top k closest regular or decoy data. In order, to decide whether the input data is regular data or decoy data, one may set a threshold t (e.g., t=50%) here. In this case, if more than five are decoy data, the input is assumed to be decoy data. Similarly, if more than five are regular data, the input is assumed to be regular data. If there are five of each regular and decoy data, then an error could be generated or additional metrics might be used to make this decision).
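
For illustration only, the top-k option described above (block 486, with k=10 and threshold t=50%) can be sketched as follows. This is a minimal sketch, not Zhang's implementation: the function name, the (logits, kind) pair representation, the "undecided" return value and the use of Euclidean distance to rank neighbors are hypothetical choices.

```python
import math

def classify_by_top_k(input_logits, labeled_logits, k=10, threshold=0.5):
    """Vote among the k recorded logits closest to the input.

    labeled_logits is a list of (logits, kind) pairs, where kind is
    "regular" or "decoy". Returns "decoy", "regular" or "undecided".
    """
    ranked = sorted(labeled_logits,
                    key=lambda item: math.dist(input_logits, item[0]))
    top = ranked[:k]
    decoys = sum(1 for _, kind in top if kind == "decoy")
    regulars = len(top) - decoys
    cutoff = threshold * len(top)   # 5 when k=10 and threshold=0.5
    if decoys > cutoff:
        return "decoy"
    if regulars > cutoff:
        return "regular"
    return "undecided"              # e.g. five of each: defer to other metrics

# Hypothetical one-dimensional logits: six decoys and four regular samples
# near the input, plus distant regular samples that never reach the top 10.
FAR = [([100.0 + i], "regular") for i in range(10)]
NEAR = ([([float(i)], "decoy") for i in range(6)]
        + [([i + 0.5], "regular") for i in range(4)])

print(classify_by_top_k([0.0], NEAR + FAR))
```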

22. Regarding claim 12 Zhang teaches all the above claimed limitations but does not expressly teach the computer-readable storage medium, wherein the one or more remedial actions include notifying a user.

Araujo teaches the computer-readable storage medium, wherein the one or more remedial actions include notifying a user (Col.14, lines.4-45 teaches remedial actions include notifying a user).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to include the one or more remedial actions including notifying a user, as taught by Araujo. Such a setup would yield the predictable result of notifying authorized personnel and taking further action to identify the attacker.

23. Regarding claim 14 Zhang teaches the computer-readable storage medium, wherein processing the input data and the output data via the adversarial detection module comprises: perturbing the input data using a set of predefined random perturbations; inputting the perturbed data into a neural model included in the adversarial detection module which generates output perturbations; and determining a difference between the output perturbations and a set of expected output perturbations (Para:0054-0056 teaches a first adversarial attack 10 (from FIG. 1) on an original DNN using an image to create a misclassification by an original neural network and a second adversarial attack 300 using the same image to create a misclassification by a neural network with decoy training. The adversarial attack 10 is only for comparison, as a currently existing reference system. For the second adversarial attack 300, the original image 50 has a 66 percent (%) probability of the DNN's selecting the class “panda” for the image. The second adversarial attack 300 is performed on the same DNN but that has undergone decoy training in accordance with an exemplary embodiment herein. The second adversarial attack 300 injects perturbations ε′, illustrated by image 370, into a data stream with the original image 50 to create the final image 390. The final image 390, which has been perturbed by the adversarial attack 300, causes the DNN to select the class “dog” with a 99.8% confidence. To implement decoy training and make adversarial examples more detectable, one possible exemplary method first generates training decoy samples for each DNN class, where the decoy data is similar to the regular training samples (i.e., data) of each class but may implement specially crafted patterns (e.g., watermarks). Then the method assigns counterfeit labels to the training decoy data (e.g., a decoy resembling the image of a cat is labeled as class “dog”). Next, the DNN is trained on both regular and decoy data. 
As a result, the regular data will still be classified as their original classes but the adversarial data, which are generated through, e.g., the gradient descent algorithm will resemble decoy data, and will be misclassified to incorrect classes. In this way, this exemplary approach can easily detect such adversarial examples). 

Zhang teaches all the above claimed limitations but does not expressly teach inputting the perturbed data into the neural fingerprint model included in the adversarial detection module which generates output perturbations.

Araujo teaches inputting the perturbed data into the neural fingerprint model included in the adversarial detection module which generates output perturbations (Col.3, lines.8-38; Col.12, lines.56-67 and Col.13, lines.1-51 teaches inputting the perturbed data into the neural fingerprint model).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to include inputting the perturbed data into the neural fingerprint model included in the adversarial detection module which generates output perturbations, as taught by Araujo. In such a setup, the tracing mechanism uses the data input as fingerprints to detect an adversarial attack.

24. Claims 5-7 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (US Pub. No. 2020/0005133) as applied to claims 1 and 11 above, and further in view of Wang (US Pub. No. 2020/0410228).

25. Regarding claims 5 and 15 Zhang teaches all the above claimed limitations but does not expressly teach the computer-implemented method and the computer-readable storage medium, wherein processing the input data and the output data via the adversarial detection module comprises: extracting, via a surrogate ML model included in the adversarial detection module, features from the input data; and determining whether the input data is adversarial based on a comparison of the features extracted via the surrogate ML model with an expected feature distribution associated with the output data.

Wang teaches the computer-implemented method, wherein processing the data input into the ML model and the output data via the adversarial detection module comprises: extracting, via a surrogate ML model included in the adversarial detection module, features from the data input into the ML model; and determining whether the data input into the ML model is adversarial based on a comparison of the features extracted via the surrogate ML model with an expected feature distribution associated with the output data (Para:0023 and Para:0029 teach that the surrogate ML model extracts the features from the data input into the ML model and determines whether the data input into the ML model is adversarial).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to include extracting, via a surrogate ML model included in the adversarial detection module, features from the data input into the ML model; and determining whether the data input into the ML model is adversarial based on a comparison of the features extracted via the surrogate ML model with an expected feature distribution associated with the output data, as taught by Wang, because such a setup would result in fast training of adversarially robust models against adversarial attacks.

26. Regarding claim 6, Zhang in view of Wang teaches the computer-implemented method, wherein the comparison determines an energy distance between the extracted features and the expected feature distribution or a maximum mean discrepancy between the extracted features and the expected feature distribution (Zhang: Para:0078-0080 teaches the server computer system 170, in response to the logits result of the input image being much more similar to decoy data than regular training data, detects the input data as an adversarial attack and takes some predetermined protective action 470. For instance, similarity may be determined using general cosine similarity or Euclidean distance, for (1) logits output of the input data 461 and the logits output of the regular data and (2) logits output of the input data and the logits output of the decoy data. Whichever of these has the best value based on the particular metric being used would be selected. If that selection is the decoy data, then this is detected as an adversarial attack. Otherwise, the output 450 is returned. The predetermined protective action 470 or return of the output 450 would occur using the output 456. An option (see block 486) for block 485 is to use the labels of a top k closest (based on the similarity) regular or decoy data to determine the type of input data. Consider an example. Assume k=10, and there is some mixture of regular and decoy data in the top k closest regular or decoy data. In order to decide whether the input data is regular data or decoy data, one may set a threshold t (e.g., t=50%) here. In this case, if more than five are decoy data, the input is assumed to be decoy data. Similarly, if more than five are regular data, the input is assumed to be regular data. If there are five of each regular and decoy data, then an error could be generated or additional metrics might be used to make this decision.

Wang: Para:0023 and Para:0029 teach extracting the features from the data input into the ML model and determining whether the data input into the ML model is adversarial).
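
For illustration only, the "energy distance" recited in claim 6 can be sketched as follows. The sketch assumes the standard statistical definition of the sample energy distance between two sets of feature vectors; it is not taken from Zhang, Wang, or the application, and the function names and example values are hypothetical.

```python
import math


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over every pair (p, q) with p in a, q in b."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


def energy_distance(extracted, expected):
    """Sample energy distance between two sets of feature vectors.

    Assumed definition: sqrt(2*E|X-Y| - E|X-X'| - E|Y-Y'|),
    with each expectation estimated by an average over all pairs.
    """
    squared = (
        2.0 * mean_pairwise_distance(extracted, expected)
        - mean_pairwise_distance(extracted, extracted)
        - mean_pairwise_distance(expected, expected)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


# Hypothetical feature sets; real features would come from a surrogate model.
extracted_features = [[0.0, 0.0], [1.0, 0.0]]
expected_features = [[10.0, 0.0], [11.0, 0.0]]
distance = energy_distance(extracted_features, expected_features)
```

A distance of zero indicates identical samples, and larger values indicate that the extracted features sit farther from the expected feature distribution.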

27. Regarding claim 7, Zhang teaches the computer-implemented method, further comprising performing one or more remedial actions if the energy distance or maximum mean discrepancy satisfies a predefined threshold (Para:0074 and Para:0078-0080 teach the server computer system 170, in response to the logits result of the input image being much more similar to decoy data than regular training data, detects the input data as an adversarial attack and takes some predetermined protective action 470. For instance, similarity may be determined using general cosine similarity or Euclidean distance, for (1) logits output of the input data 461 and the logits output of the regular data and (2) logits output of the input data and the logits output of the decoy data. Whichever of these has the best value based on the particular metric being used would be selected. If that selection is the decoy data, then this is detected as an adversarial attack. Otherwise, the output 450 is returned. The predetermined protective action 470 or return of the output 450 would occur using the output 456. An option (see block 486) for block 485 is to use the labels of a top k closest (based on the similarity) regular or decoy data to determine the type of input data. Consider an example. Assume k=10, and there is some mixture of regular and decoy data in the top k closest regular or decoy data. In order to decide whether the input data is regular data or decoy data, one may set a threshold t (e.g., t=50%) here. In this case, if more than five are decoy data, the input is assumed to be decoy data. Similarly, if more than five are regular data, the input is assumed to be regular data. If there are five of each regular and decoy data, then an error could be generated or additional metrics might be used to make this decision).
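
For illustration only, the top-k option quoted above (k=10, t=50%) can be sketched as follows. The sketch assumes Euclidean distance between logits vectors as the similarity metric, one of the two metrics Zhang names; the function name, the data layout, and the "undecided" return value are hypothetical and are not Zhang's implementation.

```python
import math


def classify_by_top_k(input_logits, references, k=10, threshold=0.5):
    """Label an input as 'regular' or 'decoy' by voting among its k nearest references.

    references: list of (logits_vector, kind) pairs, kind being 'regular' or 'decoy'.
    Returns 'undecided' when neither kind exceeds the threshold share,
    e.g. five of each with k=10 and threshold=0.5.
    """
    if not references:
        raise ValueError("at least one reference sample is required")
    # Rank references from closest to farthest (smaller distance = more similar).
    ranked = sorted(references, key=lambda ref: math.dist(input_logits, ref[0]))
    top = ranked[:k]
    decoy_count = sum(1 for _, kind in top if kind == "decoy")
    regular_count = len(top) - decoy_count
    cutoff = threshold * len(top)
    if decoy_count > cutoff:
        return "decoy"
    if regular_count > cutoff:
        return "regular"
    return "undecided"
```

A "decoy" result corresponds to the case Zhang treats as a detected adversarial attack; the "undecided" case is where Zhang suggests raising an error or consulting additional metrics.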

28. Regarding claim 16, Zhang teaches all the above claimed limitations but does not expressly teach the computer-readable storage medium, wherein the surrogate ML model is trained on a smaller set of training data than the ML model is trained.

Wang teaches the computer-readable storage medium, wherein the surrogate ML model is trained on a smaller set of training data than the ML model is trained (Para:0023 and Para:0029 teach the surrogate ML model is trained on a smaller set of training data).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to include the surrogate ML model is trained on a smaller set of training data than the ML model is trained, as taught by Wang, because such a setup would result in fast training of adversarially robust models against adversarial attacks.

29. Regarding claim 17, Zhang teaches all the above claimed limitations but does not expressly teach the computer-readable storage medium, wherein an architecture of the surrogate ML model is less complex than an architecture of the ML model.

Wang teaches the computer-readable storage medium, wherein an architecture of the surrogate ML model is less complex than an architecture of the ML model (Para:0023 and Para:0029 teach the surrogate ML model is less complex since the surrogate ML model is trained on a smaller set of training data).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to include an architecture of the surrogate ML model is less complex than an architecture of the ML model, as taught by Wang, because such a setup would result in fast training of adversarially robust models against adversarial attacks.

30. Regarding claim 18, Zhang teaches the computer-readable storage medium, wherein: the comparison determines an energy distance between the extracted features and the expected feature distribution or a maximum mean discrepancy between the extracted features and the expected feature distribution; and the processor is further configured to perform one or more remedial actions if the energy distance or maximum mean discrepancy satisfies a predefined threshold (Para:0054-0056 and Para:0078-0080 teach the client 101 (in this example, a human being) in step 4, operation 441, sends a request 460 including input data 461. The DNN f(x) 280 is executed using the input data 461. The post-processing 435 that is performed in step 4, operation 446, is performed on the output 450 of the DNN f(x) 280. In block 480, the server computer system 170, using previously recorded results of the logits layer from the DNN f(x) 280, compares similarity between the input data 461, decoy data 415, and regular training data 405. In block 482, given input data, the DNN f(x) 280 (e.g., under control of the server computer system 170) can determine its output class “a”. The server computer system 170 then compares the logits of the input data to the logits of all (e.g., or a random sampling of) the regular data in class “a” and all the decoy data in class “a”. One technique for this comparison is similarity, and one way to determine similarity is to determine a similarity score against the regular data and decoy data. Typically, the results (i.e., output) of the logits layer 481 are just vectors, and one can use, e.g., the general cosine similarity or Euclidean distance to calculate their similarity. In block 485, the server computer system 170, in response to the logits result of the input image being much more similar to decoy data than regular training data, detects the input data as an adversarial attack and takes some predetermined protective action 470. For instance, similarity may be determined using general cosine similarity or Euclidean distance, for (1) logits output of the input data 461 and the logits output of the regular data and (2) logits output of the input data and the logits output of the decoy data. Whichever of these has the best value based on the particular metric being used would be selected. If that selection is the decoy data, then this is detected as an adversarial attack. Otherwise, the output 450 is returned. The predetermined protective action 470 or return of the output 450 would occur using the output 456. An option (see block 486) for block 485 is to use the labels of a top k closest (based on the similarity) regular or decoy data to determine the type of input data. Consider an example. Assume k=10, and there is some mixture of regular and decoy data in the top k closest regular or decoy data. In order to decide whether the input data is regular data or decoy data, one may set a threshold t (e.g., t=50%) here. In this case, if more than five are decoy data, the input is assumed to be decoy data. Similarly, if more than five are regular data, the input is assumed to be regular data. If there are five of each regular and decoy data, then an error could be generated or additional metrics might be used to make this decision).
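
For illustration only, the logits comparison quoted above for blocks 480-485 can be sketched as follows. The sketch assumes cosine similarity averaged over each reference set; the function names and the `margin` parameter (a stand-in for "much more similar") are hypothetical and are not Zhang's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two logits vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(input_logits, reference_logits):
    """Average cosine similarity between the input and every reference vector."""
    scores = [cosine_similarity(input_logits, ref) for ref in reference_logits]
    return sum(scores) / len(scores)


def looks_adversarial(input_logits, regular_logits, decoy_logits, margin=0.0):
    """Flag the input when it is more similar to decoy logits than to regular logits.

    regular_logits and decoy_logits hold previously recorded logits for the
    predicted class; margin is an assumed knob, not a value from the reference.
    """
    decoy_score = mean_similarity(input_logits, decoy_logits)
    regular_score = mean_similarity(input_logits, regular_logits)
    return decoy_score > regular_score + margin
```

A True result corresponds to the branch in which a protective action would be taken; otherwise the model output would be returned as usual.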

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DEREENA T CATTUNGAL whose telephone number is (571)270-0506. The examiner can normally be reached Mon-Fri: 7:30 AM-5 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lynn Feild, can be reached on 571-272-2092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DEREENA T CATTUNGAL/ Primary Examiner, Art Unit 2431