DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 18 April 2022 has been entered.

Response to Amendment
Applicant’s response to the last Office action, filed 18 April 2022, has been entered and made of record.
The amendments to the claims are acknowledged; they are supported by the original disclosure and add no new matter.
With respect to the objection to claim 13 in the previous Office action, the amended language only partially addresses the objection; accordingly, the objection is maintained and updated below.
Amendments to independent claims 10 and 15 have necessitated updated grounds of rejection over the applied prior art. Please see below for the updated interpretations and rejections.
Response to Arguments
Applicant's arguments filed 18 April 2022 have been fully considered but they are not persuasive. 
Examiner notes the claims are treated with their broadest reasonable interpretations consistent with the specification. See MPEP 2111. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Furthermore, the test for obviousness is what the combined teachings of the references would have suggested to those of ordinary skill in the art. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981).
	
In response to Applicant’s remarks on p. 11 of Applicant’s reply that Jang’s teachings on adversarial attacks against machine-learning algorithms are not pertinent or applicable to training, the Examiner respectfully disagrees.
Ioffe, Szegedy, and Chang are relied upon to suggest a method for regularizing training data for a neural network by modifying the training data, where normalized label distributions associated with a training image are modified by a smoothing distribution, the smoothing label distribution may be a non-uniform distribution that includes one or more smoothing scores that are capable of being different from one or more other smoothing scores in the same smoothing label distribution, and the training data are augmented with adversarial samples to increase the robustness of the model against adversarial attacks (see Ioffe [0051]-[0052], [0058], and [0062]-[0063]; see Chang sect. 2.1 White Box Attacks and sect. 3. Methods).
Jang is relied upon to teach a technique for finding adversarial samples, in which the determined adversarial perturbation, “d_i*”, which corresponds to a decrease in probability used to “fool” the classifier into a different class, is the minimal-norm solution to a gradient linear system function, and the corresponding decrease in probability is determined based on a gradient of the probability of a sample belonging to class “l” (see Jang sect. 3. Our Algorithms; and see Jang sect. A.2 Generalizing the Solution of 3).
Because Ioffe, Szegedy, and Chang suggest performing adversarial training by incorporating adversarial samples into the training data and performing label smoothing on the training label distribution, Jang’s teachings for finding adversarial samples, based on adversarial perturbations determined as the minimal-norm solution to a gradient linear system function in which the corresponding decrease in probability is determined based on a gradient of the probability of a sample belonging to a particular class, are applicable to generating the adversarial samples used in such training.
As the teachings of Jang are reasonably pertinent to the particular problem of performing adversarial training using adversarial samples, Jang’s teachings are properly relied upon as a basis for rejection of the claimed invention.

In response to Applicant’s remarks on pp. 11-12 of Applicant’s reply that Ioffe and Jang do not teach or suggest distributing a perturbation amount among at least one of the one or more non-ground-truth classes, in which a non-ground-truth class receives a portion of the perturbation amount based on the gradient of a classification loss with respect to that non-ground-truth class, the Examiner respectfully disagrees.
Ioffe is relied upon to teach smoothing a label distribution to generate a modified target label distribution based on a weighted sum of the initial target label distribution and a smoothing label distribution, where the modified distribution suggests decreasing the probability for the ground-truth class by a predetermined amount and increasing the probability for a non-ground-truth class by the predetermined amount (see Ioffe [0051]-[0052] and [0058]).
In the combined teachings of the cited prior art, Jang is relied upon to suggest that the amount by which the probability values of a label distribution are modified to “fool” the classifier into a different class, in which the highest label score in the training label distribution is reduced and the lowest scores are increased, is determined based on the gradient of the classifier’s probability function that an input sample belongs to a particular class (see Jang sect. 3. Our Algorithm).
The combined teachings of the cited references thus suggest that the determined amount by which the label distribution is modified, with the highest label score in the training label distribution reduced and the lowest scores increased by the predetermined amount, suggests, under the broadest reasonable interpretation, that a portion of the determined amount used to modify the label distribution is distributed to a non-ground-truth class.

In response to Applicant’s remarks on p. 12 that Jang does not teach or suggest replacing the ground-truth label with a targeted label of a selected non-ground-truth class, which is a most confusing class having the minimum gradient of a classification loss among the one or more non-ground-truth classes, the Examiner respectfully disagrees.
Ioffe is relied upon to teach that a modified target label distribution is generated based on the initial training distribution and a smoothing label distribution (see Ioffe [0051]-[0052], [0058], and [0063]). Ioffe’s teachings thus suggest that the initial training distribution is replaced with the modified target label distribution.
Jang teaches that the determined adversarial perturbation, “d_i*”, which corresponds to the chosen decrease in probability used to “fool” the classifier into a different class, is the minimal-norm solution to a gradient linear system function (see Jang sect. 3. Our Algorithm; see also Jang sect. A.2 Generalizing the Solution of 3).
Because Jang teaches that the determined amount by which the probability of a sample is modified corresponds to the adversarial perturbation, “d_i*”, which is the minimal-norm solution to a gradient linear system function, the combined teachings of the cited prior art references suggest that the predetermined amount used to generate a modified target label distribution corresponds to a minimal-norm solution to a gradient linear system function and to a minimum change in probability sufficient to “fool” the classifier into a different class.
Thus, the combined teachings of the cited references suggest the broadest reasonable interpretation that the ground-truth label is replaced with a targeted label of a selected non-ground-truth class, which is a most confusing class having the minimum gradient of a classification loss among the one or more non-ground truth classes. 

In response to Applicant’s remarks on p. 12, that the teachings of Ioffe and Chang do not teach or suggest the concept of training the image classification model using the perturbed images and perturbed labels, the Examiner respectfully disagrees. 
Chang is relied upon to suggest that generated adversarial images are used with label smoothing to perform adversarial training for deep neural network models, where adversarial training augments the training dataset with adversarial examples, corresponding to the claimed perturbed images, and label smoothing is also used in the training process (see Chang sect. 3.1. Adversarial Training, sect. 3.2.3 The final loss function, and sect. 3.3. The training process).
The combined teachings of the cited prior art references thus suggest, under the broadest reasonable interpretation, training an image classification model using the perturbed images and perturbed labels.

Applicant’s remaining arguments with respect to amended independent claim 1 (see p. 11 of Applicant’s reply, filed 18 April 2022) have been fully considered and are persuasive. The corresponding rejections of claims 1-9 set forth in the Office action of 18 January 2022 have been withdrawn.
Claim Objections
Claims 13 and 19 are objected to because of the following informalities:
Amended claim 13 contains a typographical error in the body of the claim; the Examiner presumes that “… and distributing the perturbation amount among at least one [[one]] of the one or more non-ground-truth classes” (emphasis added) is intended. Appropriate correction is required.
Amended claim 19 contains a typographical error in the body of the claim; the Examiner presumes that “… distributing, for each of the one or more non-ground-truth classes other than the most confusing class, a portion of the perturbation amount that is related to [[to]] the gradient of the non-ground-truth class.” (emphasis added) is intended. Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 10-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ioffe (US 20170132512), in view of Szegedy et al. (“Rethinking the Inception Architecture for Computer Vision”), herein Szegedy, Chang et al. (“Efficient Two-Step Adversarial Defense for Deep Neural Networks”), herein Chang, and Jang et al. (“Objective Metrics and Gradient Descent Algorithms for Adversarial Examples in Machine Learning”), herein Jang.
Regarding claim 10, Ioffe discloses a computer-implemented method for training an image classification model to improve robustness, the method comprising:
receiving a clean image and a corresponding ground-truth label (see Ioffe [0061], where a set of training data is obtained; see Ioffe [0038], where the set of training data includes training items with associated labels; and see Ioffe [0042]-[0043], where the training data items are training images), the ground-truth label is represented as a ground-truth label representation comprising a probability distribution among a ground-truth class and one or more non-ground-truth classes (see Ioffe [0043]-[0045], where the training images are associated with training label distributions for a set of labels indicating a correct label among other labels);
replacing the ground-truth label with a targeted label of a selected non-ground-truth class (see Ioffe [0063], where a modified target label distribution is generated based on the initial training distribution and a smoothing label distribution; suggesting that the initial training distribution is replaced with the modified target label distribution and reading upon the broadest reasonable interpretation of the claimed subject matter).
Although Ioffe teaches that the training label distribution may be a one-hot distribution in which the correct label is assigned a positive value of “1” and the other labels are assigned a value of “0” (see Ioffe [0045]), Ioffe does not explicitly disclose that the probability distribution is in a probability simplex.
Szegedy teaches, in the related and pertinent field of model regularization using label smoothing of training examples for neural network models (see Szegedy sect. 7. Model Regularization via Label Smoothing), that the ground-truth distribution, q(k|x), for training examples is normalized so that the distribution sums to 1 over the labels (see Szegedy sect. 7. Model Regularization via Label Smoothing).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Szegedy to the teachings of Ioffe to normalize the label / ground-truth distributions so that each distribution sums to 1, which is equivalent to being in the claimed “probability simplex”, defined in paragraph [0042] of the instant specification as “a subset of the unit simplex in which each element in the vector is non-negative and the sum of the elements of the vector is one”. This modification is rationalized as an application of a known technique to a known device ready for improvement to yield predictable results. In this instance, Ioffe discloses a base method for regularizing training data for a neural network by modifying the training data, where the label distribution associated with a training image is modified by a smoothing distribution and the training label distribution may be a one-hot distribution vector. Szegedy teaches a known technique for performing label smoothing in which the ground-truth distribution for training examples is normalized so that it sums to 1 over the labels. One of ordinary skill in the art would have recognized that applying Szegedy’s technique would allow Ioffe’s label distributions to be normalized such that each sums to one, leading to an improved representation of the label distribution for subsequent computations.
Ioffe and Szegedy do not explicitly disclose generating an initial image by altering the clean image within a predetermined image perturbation budget; generating a perturbed image by applying at least one step projected gradient descent using a gradient of a classification loss function with respect to the initial image, the classification loss function being a function of an image, the targeted label and parameters of the image classification model; and training the image classification model using at least the perturbed image.
Chang teaches, in a related and pertinent method for adversarial attacks and defenses for deep neural networks (see Chang Abstract), that adversarial examples are generated for each clean input by an adversarial image attack that adds a noise perturbation to the clean input (see Chang sect. 3.2. Proposed Efficient Two-Step Adversarial Defense, and sect. 2.1 White Box Attacks), where projected gradient descent (PGD) is a suggested method for adversarial attack, in which the loss function is a function of the input image, the label, and the model parameters, and an iterative fast gradient sign method is then applied to generate adversarial examples (see Chang sect. 2.1 White Box attacks, Eq. (2), and sect. 3.1. Adversarial Training), and that the generated adversarial images are used with label smoothing to train the deep neural network model (see Chang sect. 3.1. Adversarial Training, sect. 3.2.3 The final loss function, and sect. 3.3. The training process).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Chang to the teachings of Ioffe and Szegedy to generate adversarial training images from the acquired clean training image dataset based on adversarial attacks, where projected gradient descent (PGD) is a suggested method for adversarial attack with a loss function that is a function of the input image, the label, and the model parameters, and to train the neural network with the training dataset augmented with the adversarial image examples. This modification is rationalized as an application of a known technique to a known device ready for improvement to yield predictable results. In this instance, Ioffe and Szegedy disclose a base method for regularizing training data for a neural network by modifying the training data, where normalized label distributions associated with a training image are modified by a smoothing distribution and the training label distribution may be a one-hot distribution vector. Chang teaches a known technique for performing adversarial training for deep neural networks, where adversarial examples are generated from clean training inputs based on adversarial image attacks, such as the projected gradient descent method, and the neural network is trained using training data augmented with adversarial examples and with label smoothing applied. One of ordinary skill in the art would have recognized that applying Chang’s technique would allow for improved adversarial training using a combination of generated adversarial images in the training dataset and label smoothing of Ioffe and Szegedy’s modified label distributions.
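For illustration only, the weighted-sum smoothing and the probability-simplex condition discussed above can be sketched in Python as follows. The sketch assumes a Szegedy-style form (1 - ε)·y + ε·u with a hypothetical four-class one-hot label y, a uniform smoothing distribution u, and ε = 0.1; none of these values is taken from Ioffe, Szegedy, or the instant specification, and the function names are hypothetical.

```python
import math

def smooth_labels(target, smoothing, weight):
    """Return the weighted sum (1 - weight) * target + weight * smoothing."""
    if len(target) != len(smoothing):
        raise ValueError("distributions must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [(1.0 - weight) * t + weight * s for t, s in zip(target, smoothing)]

def in_probability_simplex(vec, tol=1e-9):
    """True if every entry is non-negative and the entries sum to one."""
    return all(v >= -tol for v in vec) and math.isclose(sum(vec), 1.0, abs_tol=tol)

# Hypothetical 4-class example: one-hot label for class 1, uniform smoothing.
one_hot = [0.0, 1.0, 0.0, 0.0]
uniform = [0.25] * 4
smoothed = smooth_labels(one_hot, uniform, 0.1)
print(smoothed)                          # approximately [0.025, 0.925, 0.025, 0.025]
print(in_probability_simplex(smoothed))  # True
```

Because the result is a convex combination of two vectors that each lie in the probability simplex, it also lies in the simplex for any weight in [0, 1].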
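For illustration only, a single projected-gradient step of the kind described in Chang sect. 2.1 can be sketched as follows, using a hypothetical toy linear softmax classifier (logits W·x + b) so that the input gradient of the cross-entropy loss has the closed form W^T(p - y). The random start within an L-infinity budget eps, the step size alpha, the [0, 1] pixel range, and the targeted option (descending the loss for a chosen target label, as recited in claim 10) are assumptions made for this sketch and are not quoted from Chang.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def input_gradient(W, b, x, label):
    """Gradient w.r.t. x of cross-entropy(softmax(W x + b), one_hot(label))."""
    z = [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    residual = softmax(z)
    residual[label] -= 1.0  # p - one_hot(label)
    return [sum(W[k][j] * residual[k] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return min(max(v, lo), hi)

def one_step_pgd(W, b, clean, label, eps, alpha, rng, targeted=False):
    # Random start: uniform noise within the L-infinity budget eps.
    x0 = [c + rng.uniform(-eps, eps) for c in clean]
    g = input_gradient(W, b, x0, label)
    # Untargeted: ascend the loss for the true label.
    # Targeted: descend the loss for the target label.
    direction = -1.0 if targeted else 1.0
    x1 = [xj + direction * alpha * sign(gj) for xj, gj in zip(x0, g)]
    # Project onto the eps-ball around the clean input, then onto [0, 1].
    return [clip(clip(xj, c - eps, c + eps), 0.0, 1.0) for xj, c in zip(x1, clean)]

# Hypothetical 3-class, 3-pixel example.
W = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.1], [0.4, 0.1, -0.7]]
b = [0.0, 0.0, 0.0]
clean = [0.2, 0.5, 0.9]
rng = random.Random(0)
adv = one_step_pgd(W, b, clean, label=0, eps=0.03, alpha=0.03, rng=rng)
targeted_adv = one_step_pgd(W, b, clean, label=2, eps=0.03, alpha=0.03, rng=rng, targeted=True)
print(adv, targeted_adv)
```

Because the clean input already lies in [0, 1], clipping to the pixel range after projecting onto the eps-ball keeps the result inside both sets.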
Although Ioffe teaches that the smoothing label distribution may be a non-uniform distribution that includes one or more smoothing scores that are capable of being different from one or more other smoothing scores in the same smoothing label distribution (see Ioffe [0062]); Ioffe, Szegedy, and Chang do not explicitly disclose that the selected non-ground truth class is a most confusing class having the minimum gradient of a classification loss among the one or more non-ground-truth classes.
Jang teaches, in a related and pertinent gradient-descent-based algorithm for finding adversarial samples (see Jang Abstract), that the decrease “δ_i = p_i - p_{i+1}” from the probability “p_i” that a sample “x_i” belongs to class “l” to a chosen target probability “p_{i+1}” is determined as a minimum of a function of the gradient “g_i” of the probability that the sample belongs to class “l”, and corresponds to the minimal-norm solution to a gradient linear system function described at equation (4), where the determined adversarial perturbation, “d_i*”, which corresponds to the chosen decrease in probability used to “fool” the classifier into a different class, is the minimal-norm solution to a gradient linear system function (see Jang sect. 3. Our Algorithm; see also Jang sect. A.2 Generalizing the Solution of 3).
At the time of filing, one of ordinary skill in the art would have found it obvious to apply the teachings of Jang to the teachings of Ioffe, Szegedy, and Chang to use a modified target label distribution that alters the initial label distribution by a smoothing amount, where the determined adversarial perturbation, “d_i*”, corresponds to the chosen decrease in probability used to “fool” the classifier into a different class and is the minimal-norm solution to a gradient linear system function, suggesting, under the broadest reasonable interpretation, that the ground-truth class is replaced with a selected non-ground-truth class that is a most confusing class having the minimum gradient of a classification loss among the one or more non-ground-truth classes. This modification is rationalized as an application of a known technique to a known device ready for improvement to yield predictable results. In this instance, Ioffe, Szegedy, and Chang disclose a base method for regularizing training data for a neural network by modifying the training data, where normalized label distributions associated with a training image are modified by a smoothing distribution, and the smoothing label distribution may be a non-uniform distribution that includes one or more smoothing scores that are capable of being different from one or more other smoothing scores in the same smoothing label distribution. Jang teaches a known technique for finding adversarial samples, where the decrease in probability is determined based on a gradient of the probability of a sample belonging to class “l”, and the determined adversarial perturbation, “d_i*”, which corresponds to the chosen decrease in probability used to “fool” the classifier into a different class, is the minimal-norm solution to a gradient linear system function.
One of ordinary skill in the art would have recognized that applying Jang’s technique to the teachings of Ioffe, Szegedy, and Chang would allow for determining an amount by which to modify the probability values of a label distribution, in which the highest label score in the training label distribution is reduced and the lowest scores are increased by a predetermined amount, where the determined adversarial perturbation, “d_i*”, which corresponds to the chosen decrease in probability, is the minimal-norm solution to a gradient linear system function, suggesting, under the broadest reasonable interpretation, that the ground-truth class is replaced with a selected non-ground-truth class that is a most confusing class having the minimum gradient of a classification loss among the one or more non-ground-truth classes.
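For illustration only, the minimal-norm step attributed to Jang above can be sketched as follows. For a linearized probability change g_i · d = -δ_i, the minimum-norm solution is d_i* = -δ_i g_i / ||g_i||^2, and δ_i is taken as the smaller of two quantities, the second being p_i - 1/|C|. The form of the first quantity, η·||x_0||·||g_i||, the parameter names, and the example numbers are assumptions made for this sketch and should not be read as a quotation of Jang’s Eq. (6).

```python
import math

def minimal_norm_step(p, grad, x0_norm, eta, num_classes):
    """Return (delta, d) for the assumed step rule described in the lead-in.

    delta = min(eta * ||x0|| * ||g||, p - 1/|C|)   (assumes p > 1/|C|)
    d     = -delta * g / ||g||^2, the minimum-norm solution of g . d = -delta
    """
    g_norm_sq = sum(g * g for g in grad)
    if g_norm_sq == 0.0:
        return 0.0, [0.0] * len(grad)  # no gradient signal, so no step
    delta = min(eta * x0_norm * math.sqrt(g_norm_sq), p - 1.0 / num_classes)
    d = [-delta * g / g_norm_sq for g in grad]
    return delta, d

# Hypothetical example: p = 0.9, four classes, gradient (0.3, -0.4).
delta, d = minimal_norm_step(p=0.9, grad=[0.3, -0.4], x0_norm=2.0, eta=0.1, num_classes=4)
print(delta)  # approximately 0.1 (smaller than p - 1/|C| = 0.65)
print(d)      # approximately [-0.12, 0.16]
```

Scaling the gradient direction by δ_i / ||g_i||^2 gives the shortest perturbation whose linearized effect lowers the class probability by exactly δ_i.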

Regarding claim 11, please see the above rejection of claim 10. Ioffe, Szegedy, Chang, and Jang disclose the computer-implemented method of Claim 10, wherein the at least one step projected gradient descent is one step projected gradient descent (see Chang sect. 2.1 White Box attacks and sect. 3.1. Adversarial Training, where projected gradient descent (PGD) is a suggested method for adversarial attack, which first randomly picks a point within a confined small ball around each clean input).

Regarding claim 12, please see the above rejection of claim 10. Ioffe, Szegedy, Chang, and Jang disclose the computer-implemented method of Claim 10, wherein the probability of the ground-truth class is at least a certain factor times larger than a maximal probability over non-ground-truth classes (see Ioffe [0045], where the training label distribution may be a one-hot distribution in which the correct label is assigned a positive value of “1” and the other labels are assigned a value of “0”; Ioffe’s teachings suggest that the probability of “1” for the correct label is at least a certain factor larger than the probability of “0” for the other labels).

Regarding claim 13, please see the above rejection of claim 10. Ioffe, Szegedy, Chang, and Jang disclose the computer-implemented method of Claim 10, further comprising:
generating a perturbed label from the ground-truth label by decreasing a probability for the ground-truth class by a perturbation amount no more than a predetermined perturbation budget (see Ioffe [0062]-[0063], where a smoothing label distribution is determined and used to regularize and generate a modified target label distribution based on a calculated weighted sum of the initial target label distribution and the smoothing label distribution; see also Ioffe [0051]-[0052], where the modified distribution determined from the weighted sum of the initial target and smoothing label distributions suggests decreasing the probability for a ground-truth class by a predetermined amount; see Jang sect. 3. Our Algorithm, where the decrease in probability is limited to a maximum of “p_i - 1/|C|”) and distributing the perturbation amount among at least one of the one or more non-ground-truth classes, in which a non-ground-truth class receives a portion of the perturbation amount based on the gradient of a classification loss with respect to that non-ground-truth class (see Jang sect. 3. Our Algorithm, where the decrease “δ_i = p_i - p_{i+1}” from the probability “p_i” that a sample “x_i” belongs to class “l” to a chosen target probability “p_{i+1}” is determined as a minimum of a function of the gradient “g_i” of the probability that the sample belongs to class “l”, and corresponds to the minimal-norm solution to a gradient linear system function described at equation (4); the combined teachings of the cited references suggest that, where the determined amount is used to modify the probability values of a label distribution such that the highest label score in the training label distribution is reduced and the lowest scores are increased by the predetermined amount, a portion of the determined amount is distributed to a non-ground-truth class, under the broadest reasonable interpretation); and
training the image classification model using both the perturbed image and the perturbed label (see Ioffe [0064], where the regularized training data is used to train the neural network model; see Chang sect. 3.1. Adversarial Training, sect. 3.2.3 The final loss function, and sect. 3.3. The training process, where the generated adversarial images are used with label smoothing to train the deep neural network model).
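For illustration only, training on a perturbed (soft) label in place of a one-hot label can be expressed as minimizing a cross-entropy against a soft target distribution. The sketch below assumes one common form of that loss, -Σ_k t_k log softmax(z)_k, whose gradient with respect to the logits is softmax(z) - t when t sums to one; it is not asserted to be Chang’s final loss (sect. 3.2.3) or Ioffe’s training procedure, and the example logits and label values are hypothetical.

```python
import math

def log_softmax(z):
    m = max(z)
    log_sum = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_sum for v in z]

def soft_cross_entropy(logits, target):
    """Cross-entropy -sum_k t_k * log p_k against a soft (e.g., perturbed) label t."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))

def soft_ce_grad_logits(logits, target):
    """Gradient of soft_cross_entropy w.r.t. the logits: softmax(z) - t (t sums to 1)."""
    return [math.exp(lp) - t for lp, t in zip(log_softmax(logits), target)]

# Hypothetical logits and a hypothetical perturbed label that sums to one.
logits = [2.0, 0.5, -1.0, 0.0]
perturbed_label = [0.8, 0.09375, 0.05, 0.05625]
loss = soft_cross_entropy(logits, perturbed_label)
grad = soft_ce_grad_logits(logits, perturbed_label)
print(loss, grad)
```

With a one-hot target the loss reduces to the ordinary cross-entropy, so the same routine covers training on clean labels and on perturbed labels.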

Regarding claim 14, please see the above rejection of claim 13. Ioffe, Szegedy, Chang, and Jang disclose the computer-implemented method of Claim 13, wherein distributing the predetermined perturbation budget among at least one of the one or more non-ground-truth classes comprises setting a share for the most confusing class as a minimal value (see Jang sect. 3. Our Algorithm, where the decrease “δ_i = p_i - p_{i+1}” is determined as a minimum of a function of the gradient “g_i” of the probability that a sample belongs to class “l” and corresponds to the minimal-norm solution to a gradient linear system function described at equation (4), and where the corresponding decrease in probability is chosen as the minimum of two condition values, see Eq. (6)).

Regarding claim 15, Ioffe, Szegedy, Chang, and Jang disclose a computer-implemented method for training a deep learning neural network model to improve robustness, the method comprising: 
receiving a dataset comprising a plurality of elements and corresponding ground-truth label representations for the elements (see Ioffe [0061], where a set of training data is obtained; see Ioffe [0038], where the set of training data includes training items with associated labels; and see Ioffe [0042]-[0043], where the training data items are training images, and each label may be a score distribution), each ground-truth label representation represents a probability distribution across a ground-truth class and one or more non-ground-truth classes (see Ioffe [0043]-[0045], where the training images are associated with training label distributions for a set of labels indicating a correct label among other labels; see Szegedy sect. 7. Model Regularization via Label Smoothing, where the ground-truth distribution, q(k|x), for training examples is normalized so that the distribution sums to 1 over the labels);
generating, for each ground-truth label representation, a perturbed label representation based on the ground-truth label representation by altering the probability distribution for the ground-truth label representation (see Ioffe [0063], where a modified target label distribution is generated based on the initial training distribution and a smoothing label distribution) based on at least a gradient of a classification loss with respect to at least one of the one or more non-ground-truth class (see Jang sect. 3. Our Algorithm, where the decrease “δ_i = p_i - p_{i+1}” from the probability “p_i” that a sample “x_i” belongs to class “l” to a chosen target probability “p_{i+1}” is determined as a minimum of a function of the gradient “g_i” of the probability that the sample belongs to class “l”, and corresponds to the minimal-norm solution to a gradient linear system function described at equation (4)); and
training the neural network model using at least the perturbed label representations (see Ioffe [0064], where the regularized training data is used to train the neural network model; see Chang sect. 3.3. The training process, where the generated adversarial images are used with label smoothing to train the deep neural network model).
Please see the above rejection of claim 10; the rationale to combine the teachings of Ioffe, Szegedy, Chang, and Jang is similar, mutatis mutandis.

Regarding claim 16, please see the above rejection of claim 15. Ioffe, Szegedy, Chang, and Jang disclose the computer-implemented method of Claim 15, wherein the neural network model is an image classification model (see Ioffe [0032], where the neural network generates an estimated likelihood that an input image contains an image of an object belonging to a category), and the plurality of elements are clean images (see Ioffe [0042]-[0043], where the training data items are training images).

Regarding claim 17, please see the above rejection of claim 15. Ioffe, Szegedy, Chang, and Jang disclose the computer-implemented method of Claim 15, wherein each of the ground-truth label representations is a vector representing each ground-truth label representation as a one-hot vector with the probability, at least initially, corresponding to the ground-truth class as 1 and probabilities for the one or more non-ground-truth classes as 0 (see Ioffe [0045], where the training label distribution may be a one-hot distribution).

Regarding claim 18, please see the above rejection of claim 16. Ioffe, Szegedy, Chang, and Jang disclose the computer-implemented method of Claim 16, further comprising:
for each clean image: 
generating an initial image by adding random noise within a predetermined image perturbation budget to the clean image (see Chang sect. 3.2. Proposed Efficient Two-Step Adversarial Defense, and sect. 2.1 White Box Attacks, where adversarial examples are generated per each clean input based on an adversarial image attack adding a noise perturbation amount to the clean input); 
generating a perturbed image by applying at least one-step projected gradient descent using a gradient of a classification loss function with respect to the clean image (see Chang sect. 2.1 White Box attacks and sect. 3.1. Adversarial Training, where projected gradient descent (PGD) is a suggested method for adversarial attack; see Chang Eq. (2), where the gradient of the loss function, which is a function of the input image, the label, and the model parameters, is taken with respect to the input image); and
wherein the step of training the neural network model using at least the perturbed label representations comprises training the image classification model using both the perturbed images and the perturbed label representations (see Ioffe [0064], where the regularized training data is used to train the neural network model; see Chang sect. 3.1. Adversarial Training, sect. 3.2.3 The final loss function, and sect. 3.3. The training process, where the generated adversarial images are used with label smoothing to train the deep neural network model).

Regarding claim 19, please see the above rejection of claim 15. Ioffe, Szegedy, Chang, and Jang disclose the computer-implemented method of Claim 15, wherein altering the probability distribution for the ground-truth label representation based on a gradient of a classification loss with respect to at least one of the one or more non-ground-truth class comprising:
decreasing a probability for the ground-truth class by a perturbation amount no more than a predetermined perturbation budget (see Jang sect. 3. Our Algorithm, where the decrease in probability is limited to a maximum of “p_i - 1/|C|”);
choosing a most confusing class among the one or more non-ground-truth classes, the most confusing class having the minimum gradient of the classification loss among the one or more non-ground-truth classes (see Jang sect. 3. Our Algorithm, where the determined adversarial perturbation, “di*”, which corresponds to the chosen decrease of probability, is the minimal-norm solution to a gradient linear system function, which suggests that the determined adversarial perturbation corresponds to a class label);
setting, for the most confusing class, a minimal value that is smaller than the perturbation amount (see Jang sect. 3. Our Algorithm, where the decrease of probability is the minimum of two condition values and is limited to a maximum of “pi – 1/|C|”, see Eq. (6), and where the determined adversarial perturbation, “di*”, which corresponds to the chosen decrease of probability, is the minimal-norm solution to a gradient linear system function, which suggests that the determined adversarial perturbation corresponds to a class label); and 
distributing, for each of the one or more non-ground-truth classes other than the most confusing class, a portion of the perturbation amount that is related to the gradient of the non-ground-truth class (see Ioffe [0062], where the smoothing label distribution may be a non-uniform distribution that includes one or more smoothing scores that are capable of being different from one or more other smoothing scores in the same smoothing label distribution; see Jang sect. 3. Our Algorithm, where the decrease, denoted “δi = pi – pi+1”, is determined as the minimum of a function of a gradient of the probability of a sample belonging to class “l”). The combined teachings of the cited references suggest modifying the probability values of a label distribution by a determined amount, in which the highest label score in the training label distribution is reduced and the lowest scores in the training label distribution are increased by the predetermined amount. Under the broadest reasonable interpretation, this suggests that a portion of the determined amount used to modify the label distribution is distributed to a non-ground-truth class in relation to the gradient of that non-ground-truth class.
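For illustration only, the four label-perturbation steps recited in claim 19 may be sketched as below. This is a hypothetical reading, not Jang's algorithm or Applicant's implementation. The sketch assumes a cross-entropy classification loss, so that the gradient with respect to label entry j is −log p_j, where p_j is the model's predicted probability for class j; under that assumption the minimum-gradient ("most confusing") class is the non-ground-truth class with the highest predicted probability. It also assumes that "related to the gradient" means a share proportional to the gradient. The function name `perturb_label` and its parameters are assumptions.

```python
import numpy as np

def perturb_label(probs_pred, gt, delta, budget, min_share):
    """Hypothetical sketch of the recited label-perturbation steps.

    probs_pred: model's predicted class probabilities (assumed input).
    gt:         index of the ground-truth class.
    delta:      perturbation amount, required to be <= budget.
    min_share:  minimal value assigned to the most confusing class (< delta).
    """
    assert len(probs_pred) >= 3, "needs at least two non-ground-truth classes"
    assert 0 < delta <= budget and 0 <= min_share < delta
    k = len(probs_pred)
    label = np.zeros(k)
    # Step 1: decrease the ground-truth probability by the perturbation amount.
    label[gt] = 1.0 - delta
    # Assumed gradient of cross-entropy sum_j -q_j log p_j w.r.t. label entry q_j.
    grad = -np.log(np.clip(probs_pred, 1e-12, 1.0))
    others = [j for j in range(k) if j != gt]
    # Step 2: the most confusing class has the minimum gradient.
    confusing = min(others, key=lambda j: grad[j])
    # Step 3: give the most confusing class the minimal value.
    label[confusing] = min_share
    # Step 4: split the remainder among the other classes in proportion
    # to their gradients (proportionality is an assumption).
    rest = [j for j in others if j != confusing]
    weights = grad[rest] / grad[rest].sum()
    label[rest] = (delta - min_share) * weights
    return label
```

For example, `perturb_label(np.array([0.6, 0.25, 0.1, 0.05]), gt=0, delta=0.2, budget=0.3, min_share=0.02)` keeps 0.8 on the ground-truth class and assigns 0.02 to class 1. It then splits the remaining 0.18 between classes 2 and 3, with class 3 (which has the larger gradient) receiving the larger share. The resulting label still sums to 1.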

Regarding claim 20, please see the above rejection of claim 15. Ioffe, Szegedy, Chang, and Jang disclose the computer-implemented method of claim 15, wherein the neural network model, once trained, is robust to the adversarial image attacks, and the adversarial image attacks are white-box attacks (see Chang sect. 2. Background and sect. 2.1 White Box Attacks, where the suggested FGSM, IFGSM, and PGD adversarial attacks are described as white-box attacks; see Chang sect. 3. Methods, where adversarial training augments the training dataset with adversarial examples during the training process to increase the robustness of the model against white-box attacks).

Allowable Subject Matter
Claims 1-9 are allowed.
The following is an examiner’s statement of reasons for allowance: 
Regarding the subject matter of the amended independent claim 1, the prior art of record, alone or in combination, fails to fairly teach or suggest, combined with the other recited claimed subject matter, the following limitations:
“generating a perturbed label based on the ground-truth label by altering the probability distribution in a probability simplex by performing steps comprising: 
decreasing a probability for the ground-truth class by a perturbation amount; and
	dividing the perturbation amount among at least one of the one or more non-ground-truth classes, in which a non-ground-truth class receives a portion of the perturbation amount based on the gradient of a classification loss with respect to each that non-ground-truth class”.

Ioffe, previously cited, was relied upon to teach a method for regularizing training data for a neural network by modifying the training data, where the normalized label distribution associated with training images is modified by a smoothing distribution, and where the modified distribution, determined as a weighted sum of the initial target label distribution and the smoothing label distribution, suggests decreasing the probability for the ground-truth class by a predetermined amount and increasing the non-ground-truth scores by a predetermined amount (see Ioffe [0051]-[0052]; see also Ioffe [0058]).
Jang, previously cited, was relied upon to teach, in finding adversarial samples, the technique of determining the decrease in belief probability based on the minimum of a function of a gradient of the probability of a sample belonging to a class, which corresponds to the minimal-norm solution to a gradient linear system function (see Jang sect. 3. Our Algorithm; see also Jang sect. A.2 Generalizing the Solution of 3).
The combined teachings of Ioffe, Jang, and the other cited prior art references suggest modifying the label distribution by a non-uniform smoothing distribution, where the highest label score is reduced by a predetermined amount and the lowest scores are increased by a predetermined amount, the predetermined amount being determined based on a gradient of the probability of a sample belonging to the class. Because the predetermined amount applied to modify the label distribution corresponds to a portion of the predetermined amount, the combined teachings suggest, under the broadest reasonable interpretation, distributing a portion of the perturbation amount to at least one of the one or more non-ground-truth classes based on the gradient of a classification loss with respect to that non-ground-truth class. 
However, the cited prior art references, alone or in combination, fail to fairly teach that the perturbation amount is divided among at least one of the one or more non-ground-truth classes, in which a non-ground-truth class receives a portion of the perturbation amount based on the gradient of a classification loss with respect to that non-ground-truth class.
 
Further search and consideration of the prior art failed to yield a fair teaching, alone or in combination, of the noted combination of claimed subject matter. 

Nicolae et al. (“Adversarial Robustness Toolbox v0.4.0”) is pertinent in teaching various adversarial sample attacks and defenses for supporting machine learning models and improving the adversarial robustness of visual recognition systems (see Nicolae Abstract, sect. 5 Attacks, and sect. 6 Defenses). However, Nicolae fails to further teach, alone or in combination, generating a perturbed label based on the ground-truth label by altering the probability distribution by decreasing a probability for the ground-truth class by a perturbation amount, and dividing the perturbation amount among at least one of the one or more non-ground-truth classes, in which a non-ground-truth class receives a portion of the perturbation amount based on the gradient of a classification loss with respect to that non-ground-truth class. 
Zantedeschi et al. (“Efficient Defenses Against Adversarial Attacks”) is pertinent in teaching defense techniques that reinforce the structure of a deep neural network so that it makes more stable predictions and is less likely to be fooled by adversarial samples, where classifiers robust to adversarial examples are learned by considering all possible local perturbations, weighted with respect to their magnitudes (see Zantedeschi Abstract, sect. 3. Efficient Defenses, and sect. 3.2. Gaussian Data Augmentation). However, Zantedeschi fails to further teach, alone or in combination, generating a perturbed label based on the ground-truth label by altering the probability distribution by decreasing a probability for the ground-truth class by a perturbation amount, and dividing the perturbation amount among at least one of the one or more non-ground-truth classes, in which a non-ground-truth class receives a portion of the perturbation amount based on the gradient of a classification loss with respect to that non-ground-truth class.

Regarding claims 2-9, these claims depend from independent claim 1, incorporate the allowable subject matter of independent claim 1, and are therefore allowed.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIMOTHY WING HO CHOI whose telephone number is (571)270-3814. The examiner can normally be reached 9:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT RUDOLPH can be reached on (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/TIMOTHY CHOI/Examiner, Art Unit 2661

/VINCENT RUDOLPH/Supervisory Patent Examiner, Art Unit 2661