Notice of Pre-AIA  or AIA  Status
         The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
 
Status of Claims
            The amendment filed on January 10, 2022 in response to the October 08, 2021 non-final Office action has been entered. The status of the claims is as follows:
Claims 1-2, 4-12, and 14-20 remain pending in the application.
Claims 1-2, 4-12, 14-17, and 20 have been amended.
Claims 18-19 remain original as filed.
The originally filed claims 3 and 13 are canceled in the January 10, 2022 amendment.
 
Response to Arguments
            The amendment and arguments filed on January 10, 2022 have been fully considered. The examiner’s response is delineated as follows.
(a)       Response to Arguments Regarding Objections to the claims: The objection to claims 2, 5, 7-10, 12, 15, and 17 is hereby withdrawn in view of Applicant’s amendment to the claims.
(b)       Response to Arguments Regarding Rejection of Claims under 35 U.S.C. § 101: The rejection of claims 11-19 under 35 U.S.C. § 101 is maintained because Applicant’s arguments are not persuasive.  More specifically, Applicant argued that Applicant can be his own lexicographer to rebut the presumption of plain meaning, that “the Specification computer readable storage medium, including but not limited to computer-readable storage devices as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.”  That is, ¶ [0096] explicitly describes that it is the “computer readable storage medium” that “is not to be construed as being transitory signals per se”.  ¶ [0096] does not explicitly describe that “a ‘computer readable storage medium’ and a ‘computer-readable storage device’ are limited to non-transitory signals” as argued by the Applicant.  Applicant’s arguments are inconsistent with the explicit description of the present disclosure and are thus not persuasive.  Further, claims 11-19 are NOT directed to a “computer-readable storage medium”.  Instead, claim 11 explicitly recites “[a] computer usable program product comprising a computer-readable storage device, and computer usable code stored on the storage device”.  Although Applicant can be his own lexicographer, the present disclosure nevertheless fails to disavow the claimed “computer-readable storage device” from being 
(c)        Response to Arguments Regarding Rejection of Claims under 35 U.S.C. § 112(a), Written Description: The rejection of claims 2, 9-10, and 12 under 35 USC § 112(a) is withdrawn in view of Applicant’s amendment to these claims.
(d)       Response to Arguments Regarding Rejection of Claims under 35 U.S.C. § 112(b): The rejections of claims 1-20 under 35 U.S.C. § 112(b) in the previous non-final Office action are withdrawn in view of Applicant’s cancellation of claims 3 and 13 as well as amendment to claims 1-2, 4-12, and 14-20.
(e)	Response to Arguments Concerning Rejections of claims under 35 U.S.C. § 103: Applicant’s arguments are regarding newly amended claim language which is addressed in the rejections of claims under 35 U.S.C. § 103 below. 

Claim Objections
Claims 1, 5, and 15 stand objected to because of the following informalities:  
(a)	Claim 1: The limitation “a mitigation-nodes hyperparameter” in “wherein the number of nodes is based on a mitigation-nodes hyperparameter value” contains a clerical informality because the noun “mitigation-nodes” is used to modify “hyperparameter value” in the compound noun “mitigation-nodes hyperparameter value”, and thus a singular term “mitigation-node” should be used. The examiner suggests amending the aforementioned limitation to recite “a mitigation-node hyperparameter”.

(c)	Claim 15: Claim 15 contains a minor grammatical informality of missing an indefinite article for the countable noun “mitigation layer” in “program instructions to select a layer of the neural network as mitigation layer”.  The examiner suggests amending the above limitation to recite “program instructions to select a layer of the neural network as a mitigation layer”.
Appropriate correction is required.
 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 11-19 stand rejected under 35 U.S.C. 101 because they are directed to software per se or a computer program per se, which does not fall within any of the four statutory categories of patent eligible subject matter.  
Claims 11-19 recite “computer usable program product comprising a computer-readable storage device”.  Nonetheless, the broadest reasonable interpretation (BRI) of computer usable program product encompass products that do not have a physical or 
Claims 11-19 therefore claim a product without a physical or tangible form or structural recitations and are interpreted as software per se or a computer program per se, which does not fall within any of the four categories of patent eligible subject matter.  MPEP 2106.03. 
More specifically, with respect to claim 11: 
Claim 11 merely broadly encompasses a transitory form of signal transmission such as a propagating electrical or electromagnetic signal or carrier wave and is thus interpreted as signal per se.  Moreover, claim 11 does not recite physical elements and thus generally encompasses a product that does not have a physical or tangible form or structural recitations and is thus interpreted as software per se or computer program per se.  Therefore, claim 11 does not fall within any of the four recognized statutory categories of patent eligible subject matter.  
Further, Applicant argued that Applicant can be his own lexicographer to rebut the presumption of plain meaning, that “the Specification as filed, defines the phrases ‘computer readable storage medium’ and ‘computer-readable storage devices’ as explicitly excluding transitory signals” as shown in ¶ [0096] of the present disclosure, and that “a ‘computer readable storage medium’ and a ‘computer-readable storage device’ are limited to non-transitory signals because the Specification as filed disavowed 
The examiner disagrees. ¶ [0096] describes, inter alia, “[a] computer readable storage medium, including but not limited to computer-readable storage devices as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.”  That is, ¶ [0096] explicitly describes that it is the “computer readable storage medium” that “is not to be construed as being transitory signals per se”.  ¶ [0096] does not explicitly describe that “a ‘computer readable storage medium’ and a ‘computer-readable storage device’ are limited to non-transitory signals” as argued by the Applicant.  Therefore, Applicant’s arguments are inconsistent with the explicit description of the present disclosure and are thus not persuasive.  
Further, claims 11-19 are NOT directed to a “computer-readable storage medium”.  Instead, claim 11 explicitly recites “[a] computer usable program product comprising a computer-readable storage device, and computer usable code stored on the storage device”.  Although Applicant can be his own lexicographer, the present disclosure nevertheless fails to disavow the claimed “computer-readable storage device” from being transitory signals. Therefore, Applicant’s argument is not persuasive, and the rejection of claims 11-19 under 35 USC § 101 is thus maintained for at least the foregoing reasons.
Claims 12-19 similarly recite the same computer usable program product as claim 11 and are thus similarly rejected. 


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
 
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
 
            Claims 1-2, 4-12, and 15-20 stand rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
(a)       Independent claims 1, 11, and 20: (1) The limitation “the nodes” in “generating an ordered list of the nodes according to respective activation differences” is indefinite because claim 1 does not recite “nodes” to provide proper antecedent basis. For the purpose of examination, this limitation is interpreted as “generating an ordered list of a plurality of nodes according to respective activation differences”. (2) The two instances of “object recognition” in the limitations “inputting, to a neural network that is pre-configured to recognize objects from inputs when operating using a processor and a memory, a valid training input for object recognition” and “inputting, to the neural network, an altered training input for object recognition” are indefinite because it is unclear whether these two instances of “object recognition” refer to the same or different “object recognition” 
(b)       Dependent claims 2, 4-10, 12, and 15-20 inherit the aforementioned deficiencies from independent claims 1 and 11, respectively.  Therefore, claims 2, 4-10, 12, and 15-20 are also rejected under 35 U.S.C. 112(b), the same rationale applying.
(c)      Dependent claims 2 and 12: The limitation “object recognition” in the limitation “sending an actual input to the neural network during the operational phase for object recognition, wherein the actual input is a valid input; and” is indefinite because it is unclear whether the “object recognition” recited in claim 2 is the same as or different from the “object recognition” recited in the base claim 1. For the purpose of examination, this limitation of claim 2 is interpreted as “sending an actual input to the neural network during the operational phase for the object recognition, wherein the actual input is a valid input; and”.
(d)      Dependent claims 4 and 14: The limitation “the activation differences” in “wherein the activation differences are respective differences between the altered output and the valid output for each node” is indefinite because neither claim 4 nor its base claim 1 provides a proper antecedent basis for “the activation differences”.  For the purpose of examination, this limitation is interpreted as “wherein the respective activation differences are respective differences between the altered output and the valid output for each node.”
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
 
Claims 1-2, 4, 7-12, 14, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gao et al., DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples (April 17, 2017) (hereinafter Gao) in view of Baker et al., US PGPub 20200285939, effectively filed on September 28, 2017 (hereinafter Baker) and further in view of Xiao et al., Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification (Apr. 26, 2016) (hereinafter Xiao).
With respect to claim 1, Gao teaches: 
inputting, to a neural network that is pre-configured to recognize objects from inputs when operating using a processor and a memory, a valid training input for object recognition; (Gao, p. 2, § 3, ¶ 2: “To remove those unnecessary features, we insert a mask layer in a DNN model right before the linear layer handling classification.” p. 3, § 3, ¶ 2: “The process is summarized in Algorithm 1. X can be a subset of the full training set, so the process of learning the mask can be fast.” “Algorithm 1 DeepCloak algorithm”: “Input: Training set X = {x1, x2 … xN}, DNN classifier F(), adversarial power ϵ. g() represents feature extraction layers of F()”. P. 3, § 4.1, ¶ 1: “Dataset: We choose CIFAR-10 (Krizhevsky & Hinton, 2009), an image dataset with 50,000 32x32 training images and 10,000 testing images.” 
The examiner notes that Gao’s training set X = {x1, x2 … xN} teaches a valid input, that Gao’s image classification of the training set X (e.g., the CIFAR-10 cited above) teaches recognizing objects from inputs, and that Gao’s DeepCloak teaches a neural network that is pre-configured and the remaining limitation.)

inputting, to the neural network, an altered training input for object recognition; (Gao, p. 3, § 3, ¶ 2: “The process is summarized in Algorithm 1. X can be a subset of the full training set, so the process of learning the mask can be fast.” p. 3, § 3, Algorithm 1, esp. Step 5: “5: Forward xi’ into the network, get the output feature vector g(xi’)”. The examiner notes that Gao’s forwarding adversarial samples, xi’, into the neural network during the training process for classification of xi’ (e.g., recognition of xi’) teaches the above limitation.) 

determining an activation difference from the valid training input to the altered training input for each node of the neural network; (Gao, p. 3, Algorithm 1, esp. Steps 4-6: “4: Forward xi into the network, get the output feature vector g(xi)  5: Forward xi’ into the network, get the output feature vector g(xi’)  6: Add |g(xi) − g(xi’)| into v.” p. 3, § 4.2 “Experiment Result”, Table 1. The examiner notes that Gao’s computing the output feature vectors g(xi) and g(xi’) respectively teaches activation for the valid training input and the altered training input, and that Gao’s computing the difference between these two output feature vectors to determine which nodes are to be masked (e.g., 0%, 1%, 2%, …, 6% of “Nodes masked” in Table 1) teaches the above limitation.)
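For illustration only, the accumulation quoted from Steps 4-6 of Gao’s Algorithm 1 can be sketched as follows. This is a minimal sketch, not Gao’s implementation: the function name, the list-of-lists representation of the feature vectors g(xi) and g(xi’), and the sample values are all assumptions made for the example.

```python
from typing import List, Sequence


def accumulate_activation_differences(
    valid_features: Sequence[Sequence[float]],
    altered_features: Sequence[Sequence[float]],
) -> List[float]:
    """Return v, where v[j] = sum over i of |g(x_i)[j] - g(x_i')[j]|.

    valid_features[i] stands in for g(x_i) and altered_features[i] for
    g(x_i'); both are hypothetical, precomputed feature vectors.
    """
    if len(valid_features) != len(altered_features):
        raise ValueError("each valid sample needs one altered counterpart")
    if not valid_features:
        return []
    v = [0.0] * len(valid_features[0])
    for g_valid, g_altered in zip(valid_features, altered_features):
        if len(g_valid) != len(v) or len(g_altered) != len(v):
            raise ValueError("all feature vectors must have the same length")
        # Step 6 as quoted: add |g(x_i) - g(x_i')| into v, element-wise.
        for j, (a, b) in enumerate(zip(g_valid, g_altered)):
            v[j] += abs(a - b)
    return v


# Hypothetical feature vectors for two valid samples and their altered pairs.
valid = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.5]]
altered = [[1.0, 4.0, 2.0], [0.5, 1.0, 0.5]]
v = accumulate_activation_differences(valid, altered)
```

In this toy input the second feature accumulates the largest difference, so it would be the most sensitive feature under the quoted criterion.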

Gao does not appear to explicitly teach: 
when operating using a processor and a memory,
generating an ordered list of the nodes according to respective activation differences; and 

wherein the number of nodes is based on a mitigation-nodes hyperparameter value. 
Baker does, however, teach: 
when operating using a processor and a memory, (Baker, ¶ [0650]: “The illustrated computer system 4100 comprises multiple processor units”, “onboard memory”, and “off-board memory 4106”; and ¶ [0657]: “aspects of the present invention can improve recommender systems, speech recognition systems, and classification systems, including image and diagnostic classification systems”. The examiner notes that Baker’s recommender systems, speech recognition systems, and classification systems, including image and diagnostic classification systems teach recognizing objects from inputs.)
wherein the number of nodes is based on a mitigation-nodes hyperparameter value. (Baker, ¶ [0097]: “Dropout is a technique that randomly selects nodes in a neural network and temporarily sets the activation values of those nodes to zero”; and “Some embodiments of this invention expand the number of hyperparameters to customize the control of dropout. Some embodiments implement nonrandom, controlled dropout.” The examiner further notes that Baker’s customizing the control of dropout with hyperparameters to select nodes for dropout thus teaches the number of nodes is based on a hyperparameter that further teaches a mitigation-node hyperparameter.)

Gao and Baker are analogous art because both pertain to removing features from a neural network to improve the robustness of a neural network. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao’s inputting valid training input and altered training input and determining the activation difference from these two inputs (Gao, supra) to incorporate Baker’s processor and memory as well as a mitigation node hyperparameter value (Baker, supra). The modification provides not only a computer system having at least one processor and memory for implementing computerized dropout but also the ability to customize the control of dropout rather than randomly selecting nodes for dropout (Baker, ¶ [0097]: “10. Dropout: Dropout is a technique that randomly selects nodes in a neural network and temporarily sets the activation values of those nodes to zero.” “Some embodiments of this invention expand the number of hyperparameters to customize the control of dropout. Some embodiments implement nonrandom, controlled dropout.” ¶ [0650]: “FIG. 41 is a diagram of a computer system 4100 that could be used to implement the embodiments described above.”)

Gao modified by Baker does not appear to explicitly teach: 
generating an ordered list of the nodes according to respective activation differences; and 
causing suppressing of outputs of a number of nodes on the ordered list during an operational phase of the neural network, 

Xiao does, however, teach: 
generating an ordered list of the nodes according to respective activation differences; and (Xiao, p. 1252, § 3.3, ¶ 1: “Given the CNN model pretrained by using the mixed dataset, we identify for each domain which neurons are effective. For each domain sample, we define the impact of a particular neuron on this sample as the gain of the loss function when we remove the neuron. Specifically, let g(x) ∈ R^d denote the d-dimensional CNN feature vector of an image x. The impact score of the i-th (i ∈ {1, 2, . . . , d}) neuron on this image sample is defined as si = L(g(x)\i) − L(g(x)), (3) where g(x)\i is the feature vector after we setting the i-th neuron response to zero.” P. 1253, § 3.3, ¶ 4: “After obtaining all the s̄i, we continue to train the CNN model, but with these impact scores as guidance to dropout different neurons for different domains during the training process.”  
The examiner notes that Xiao’s impact score is computed as the difference between the cross-entropy loss (L in Eq. (3) above) with a neuron removed from the neural network and the cross-entropy loss with the neuron remaining in the network and thus teaches respective activation differences.  The examiner further notes that Xiao’s computing the impact score of each of the neurons according to Eq. (3) and subsequently dropping out different neurons by using such impact scores as guidance teaches generating an ordered list of the nodes according to respective activation differences as claimed.)
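For illustration only, the impact score quoted from Xiao’s Eq. (3), si = L(g(x)\i) − L(g(x)), can be sketched as follows. This is a minimal sketch under stated assumptions, not Xiao’s implementation: the linear classifier, the softmax cross-entropy form of L, every function name, and the sample values are hypothetical, and the final ranking step is shown only to make an ordering by score concrete.

```python
import math
from typing import List, Sequence


def cross_entropy(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    label: int,
) -> float:
    """Softmax cross-entropy of an assumed linear classifier on `features`."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def impact_scores(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    label: int,
) -> List[float]:
    """s_i = L(g(x) with neuron i zeroed) - L(g(x)), one score per neuron."""
    base_loss = cross_entropy(features, weights, label)
    scores = []
    for i in range(len(features)):
        ablated = list(features)
        ablated[i] = 0.0  # set the i-th neuron response to zero
        scores.append(cross_entropy(ablated, weights, label) - base_loss)
    return scores


def rank_by_score(scores: Sequence[float]) -> List[int]:
    """Neuron indices ordered from highest to lowest score (illustrative)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical three-neuron feature vector and two-class classifier.
features = [1.0, 1.0, 0.0]
weights = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
scores = impact_scores(features, weights, label=0)
order = rank_by_score(scores)
```

In this toy case, removing the first neuron raises the loss for the true class (positive score), removing the second lowers it (negative score), and the third neuron is already zero, so its score is zero.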

causing suppressing of outputs of a number of nodes on the ordered list during an operational phase of the neural network, (Xiao, p. 1252, § 3.3, ¶ 1: “Given the CNN model pretrained by using the mixed dataset, we identify for each domain which neurons are effective. For each domain sample, we define the impact of a particular neuron on this sample as the gain of the loss function when we remove the neuron. Specifically, let g(x) ∈ R^d denote the d-dimensional CNN feature vector of an image x. The impact score of the i-th (i ∈ {1, 2, . . . , d}) neuron on this image sample is defined as si = L(g(x)\i) − L(g(x)), (3) where g(x)\i is the feature vector after we setting the i-th neuron response to zero.” P. 1253, § 3.3, ¶ 4: “After obtaining all the s̄i, we continue to train the CNN model, but with these impact scores as guidance to dropout different neurons for different domains during the training process.”  
The examiner notes that Xiao’s using a pretrained CNN model in subsequent computation of impact scores and dropping out neurons teaches an operational phase of the neural network.  The examiner further notes that Xiao’s dropping out neurons by using the respective impact scores of the neurons as guidance teaches causing suppressing of outputs of a number of nodes on the ordered list during an operational phase of the neural network as claimed.)

Gao, Baker, and Xiao are analogous art because all three references pertain to removing features from a neural network to improve the robustness of a neural network. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker to incorporate Xiao’s generating an ordered list of the nodes and suppressing the output of a number of nodes on the ordered list during an operational phase of the neural network (Xiao, supra).  The modification provides the ability to quantify and visualize effective neurons (Xiao, p. 1252, § 3.3, ¶ 1: “We visualize the neuron impact scores between several pairs of domains in Figure 3. It clearly shows that the two sets of impact scores have little correlation, indicating that the effective neurons for different domains are not the same.”  p. 1256, § 4.3, last paragraph: “We further investigate how does the deterministic Domain Guided Dropout change the network behavior by evaluating the relative performance gain on each domain with respect to the number of neurons having negative impact scores on that domain.”) 

With respect to claim 2, Gao modified by Baker teaches the method of claim 1 from which claim 2 depends, and Gao further teaches: 
sending an actual input to the neural network during the operational phase for object recognition, wherein the actual input is a valid input; and (Gao, P. 3, § 4.1, ¶ 1: “Dataset: We choose CIFAR-10 (Krizhevsky & Hinton, 2009), an image dataset with 50,000 32x32 training images and 10,000 testing images.”  P. 3, § 4.1, ¶ 3: “Metric: We generate adversarial samples for every sample in the test set and test all adversarial samples on each DNN model. The accuracy on the adversarial sample set is reported as “adversarial accuracy” to measure the adversarial robustness of a DNN model.” P. 3, § 4.2, Table 1 (reproduction omitted). The examiner notes that any of the CIFAR-10 images for testing Gao’s neural network teaches an actual input that is also a valid input.  The examiner further notes that Gao’s neural network classifying the CIFAR-10 testing images with different percentages of nodes masked teaches an operational phase, and that Gao’s obtaining the corresponding accuracies for image classification of the CIFAR-10 images with respect to the adversarial examples and their corresponding actual inputs teaches sending an actual input to the neural network during the operational phase for object recognition as claimed.)
 
causing the neural network to generate, responsive to the actual input, an object recognition output while suppressing the outputs of the number of nodes on the ordered list. (Gao, p. 3, Figure 1 Caption “A mask layer with weights either 0 or 1 is added right before the classification layers”.  P. 3, § 4.2, Table 1 (reproduction omitted). p. 5, § 6.3.1, ¶ 1: “We add experiments to show our method can be applied to a wide range of DNN models. The results are displayed in Table 2, Table 3 and Table 4.” P. 5, § 6.3.1, ¶ 2: “More specifically, we train a small CNN on the MNIST dataset (LeCun et al., 1998) , and also VGG (Simonyan & Zisserman, 2014) model and Wide Residual Network model (Zagoruyko & Komodakis, 2016) on the CIFAR-10 dataset.” P. 6, Tables 2-4.  
The examiner notes that Gao’s masking nodes with the aforementioned mask layer teaches suppressing the outputs of the number of nodes. The examiner also notes that Gao’s obtaining accuracies that measure the robustness of its DNN model for image classification (see citation and rationale for claim 1, supra) teaches generating an object recognition output responsive to the actual input because Gao measures the accuracy based on the object recognition output of the actual images in the aforementioned image datasets and their corresponding adversarial examples.  The examiner thus notes that Gao’s performing image classification on the aforementioned datasets (e.g., MNIST, CIFAR-10, etc.) and obtaining the respective accuracies listed in Tables 2-4 while the corresponding sets (e.g., percentages) of nodes are masked teaches the above limitation.)
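For illustration only, a mask layer of the kind quoted from Gao (weights of either 0 or 1 applied before the classification layers) can be sketched as follows. This is a minimal sketch, not Gao’s implementation: the function names, the `fraction` parameter standing in for the percentage of nodes masked, and the sample values are assumptions made for the example.

```python
import math
from typing import List, Sequence


def build_mask(sensitivity: Sequence[float], fraction: float) -> List[float]:
    """Return a 0/1 mask that zeroes the `fraction` most sensitive features.

    `fraction` is a hypothetical parameter in [0, 1] controlling how many
    features are suppressed.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    count = math.floor(fraction * len(sensitivity))
    # Order feature indices from most to least sensitive, keep the top `count`.
    ordered = sorted(
        range(len(sensitivity)), key=lambda j: sensitivity[j], reverse=True
    )
    mask = [1.0] * len(sensitivity)
    for j in ordered[:count]:
        mask[j] = 0.0
    return mask


def apply_mask(features: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Element-wise multiplication: each output is the input feature or 0."""
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same length")
    return [f * m for f, m in zip(features, mask)]


# Hypothetical sensitivity values and one feature vector seen at inference.
sensitivity = [0.5, 2.0, 1.0, 0.1]
mask = build_mask(sensitivity, fraction=0.5)
masked = apply_mask([3.0, 4.0, 5.0, 6.0], mask)
```

The masked feature vector, rather than the raw one, would then be passed to the classification layers, so the suppressed features contribute nothing to the output.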

With respect to claim 4, Gao modified by Baker teaches the method of claim 1 from which claim 4 depends, and Gao further teaches:
recording a valid output of each node of the neural network corresponding to the valid training input; (Gao, ¶ 1, § 3, p. 2: “The basic idea of DeepCloak is to remove unnecessary features that can be used for generating adversarial samples. To identify which feature is unnecessary, we test pairs of adversarial samples x’ and its normal seed x, and compare the difference between the extracted features in DNN.” Algorithm 1, § 3, p. 3: “Input: Training set X = { x1, x2 … xN }, DNN classifier F(), adversarial power ϵ. g() represents feature extraction layers of F()” and “Forward xi into the network, get the output feature vector g(xi)”. 
The examiner notes that Gao’s normal seed xi teaches a normal input, that Gao’s interchangeable use of features and nodes teaches nodes in a neural network, that an output feature vector g(xi) for the normal seed xi teaches a valid output of each node of the neural network, and that Gao’s generating the output feature vector for a normal sample/seed (xi) teaches this limitation.)
 
recording an altered output of each node of the neural network corresponding to the altered training input; and (Gao, p. 3, Algorithm 1, esp. Step 5: “5: Forward xi’ into the network, get the output feature vector g(xi’)”.  p. 3, § 4.2 “Experiment Result”, Table 1. The examiner notes that Gao’s computing the output feature vector g(xi’) for the altered training input (e.g., Gao’s adversarial input xi’) uses all the nodes in the neural network to generate the resulting feature vector and thus teaches recording an altered output for each node in the neural network. Gao’s sending the adversarial inputs to generate the aforementioned feature vector teaches that the neural network and the thus generated altered output correspond to the altered training input.)
 
associating an array with the neural network, the array comprising the ordered list of the nodes; (Gao, p. 3, § 3, ¶ 1: “The input of the mask layer is the feature vector extracted by previous layers of the DNN model. The weight of the mask layer is either 0 or 1. An element-wise multiplication is done by the mask layer. Therefore, the output is either the input feature or 0. We remove the top n% of features with highest sensitivity to adversarial samples.” P. 3, § 3, Algorithm 1: “3: Generate an adversarial sample xi’ using sample xi with power ϵ  4: Forward xi into the network, get the output feature vector g(xi)  5: Forward xi’ into the network, get the output feature vector g(xi’) 6: Add | g(xi) − g(xi’) | into v”; and “the sensitivity of are accumulated into the v vector as $v = \sum_{i=1}^{N} | g(x_i) - g(x_i') |$ after step 7 of Algorithm 1.” 
The examiner notes that Gao’s removing top n% of features (e.g., nodes) with the highest sensitivity to adversarial samples teaches ranking and storing nodes according to their respective sensitivities, and that the data structure in which the nodes and their corresponding sensitivities are stored teaches an array that comprises the ordered list of the nodes. The examiner further notes that Gao’s generating a vector v for accumulating, for each feature and each sample, the sensitivity indicative of the difference between a normal sample and the corresponding adversarial sample teaches associating an array with the neural network.)
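For illustration only, the accumulation and masking steps quoted from Gao’s Algorithm 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not Gao’s implementation: the names accumulate_sensitivity, build_mask, and toy_extract are hypothetical, the toy feature extractor merely stands in for g(), and the adversarial samples are hand-made rather than generated with power ϵ.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def accumulate_sensitivity(
    extract: Callable[[Sequence[float]], Vector],
    normal: Sequence[Sequence[float]],
    adversarial: Sequence[Sequence[float]],
) -> Vector:
    """Return v with v[j] = sum over i of |g(x_i)[j] - g(x_i')[j]|."""
    v: Vector = []
    for x, x_adv in zip(normal, adversarial):
        g_x, g_adv = extract(x), extract(x_adv)
        if not v:  # size v on the first sample
            v = [0.0] * len(g_x)
        for j, (a, b) in enumerate(zip(g_x, g_adv)):
            v[j] += abs(a - b)
    return v


def build_mask(v: Sequence[float], fraction: float) -> List[int]:
    """Return a 0/1 mask that zeroes the given fraction of features with the largest v."""
    n_remove = int(len(v) * fraction)
    ranked = sorted(range(len(v)), key=lambda j: v[j], reverse=True)
    removed = set(ranked[:n_remove])
    return [0 if j in removed else 1 for j in range(len(v))]


def toy_extract(x: Sequence[float]) -> Vector:
    # Hypothetical stand-in for the feature extraction layers g().
    return [x[0] + x[1], 3.0 * x[1], 2.0 * x[0]]


normal = [[1.0, 2.0], [0.5, 0.5]]
adversarial = [[1.0, 2.5], [0.5, 0.0]]  # hand-made, not produced by an attack
v = accumulate_sensitivity(toy_extract, normal, adversarial)
mask = build_mask(v, fraction=0.34)
print(v, mask)
```

In this sketch the ranked index list plays the role of the ordered list of nodes, and the vector v plays the role of the array associated with the network.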
 
Gao modified by Baker does not appear to explicitly teach: 
wherein the activation differences are respective differences between the altered output and the valid output for each node.
          Xiao does, however, teach: 
wherein the activation differences are respective differences between the altered output and the valid output for each node. (Xiao, p. 1252, § 3.3, ¶ 1: “Given the CNN model pretrained by using the mixed dataset, we identify for each domain which neurons are effective. For each domain sample, we define the impact of a particular neuron on this sample as the gain of the loss function when we remove the neuron. Specifically, let g(x) ∈ Rd denote the d-dimensional CNN feature vector of an image x. The impact score of the i-th (i ∈ {1, 2, . . . , d}) neuron on this image sample is defined as si = L(g(x)\i) − L(g(x)), (3) where g(x)\i is the feature vector after we setting the i-th neuron response to zero.” P. 1253, § 3.3, ¶ 4: “After obtaining all the $\bar{s}_i$, we continue to train the CNN model, but with these impact scores as guidance to dropout different neurons for different domains during the training process.”  
The examiner notes that Xiao’s impact score is computed as the difference between the cross-entropy loss (L in Eq. (3) above) with a neuron removed from a neural network and the cross-entropy loss with the neuron remaining in the network and thus teaches a respective activation difference between an altered output (e.g., g(xi’) above) and the valid output (e.g., g(xi) above) for each i-th node as claimed.) 
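For illustration only, the impact score of Eq. (3) quoted above can be sketched as follows. The sketch assumes a linear classification layer followed by a softmax cross-entropy loss; the names cross_entropy and impact_scores, the weights, and the feature values are hypothetical and are not taken from Xiao.

```python
import math
from typing import List, Sequence


def cross_entropy(features: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  label: int) -> float:
    """Softmax cross-entropy loss L of a linear classifier on a feature vector."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    m = max(logits)  # subtract the maximum for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def impact_scores(features: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  label: int) -> List[float]:
    """Return s_i = L(g(x) with response i set to zero) - L(g(x)) for every i."""
    base = cross_entropy(features, weights, label)
    scores = []
    for i in range(len(features)):
        ablated = list(features)
        ablated[i] = 0.0  # set the i-th neuron response to zero
        scores.append(cross_entropy(ablated, weights, label) - base)
    return scores


# Hypothetical two-class example with three feature neurons.
features = [2.0, 1.0, 0.0]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
scores = impact_scores(features, weights, label=0)
print(scores)
```

In this toy setting, removing a neuron that supports the correct class raises the loss (positive score), and removing a neuron that supports the wrong class lowers it (negative score).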
In addition, Xiao also teaches:  
associating an array with the neural network, the array comprising the ordered list of the nodes; (Xiao, p. 1252, § 3.3, ¶ 1: “The impact score of the i-th (i ∈ {1, 2, . . . , d}) neuron on this image sample is defined as si = L(g(x)\i) − L(g(x)), (3) where g(x)\i is the feature vector after we setting the i-th neuron response to zero.”  The examiner notes that Xiao’s storage of the impact scores in si in a data structure for the i-th neuron where i ∈ {1, 2, . . . , d} teaches associating an array of si for a domain D with a neural network, and the impact scores of corresponding neurons thus stored teach an ordered list of the nodes.)
Gao, Baker, and Xiao are analogous art because all three references pertain to removing features from a neural network to improve the robustness of a neural network. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker to incorporate Xiao’s teaching of activation differences that are respective differences between the altered output and the valid output for each node (Xiao, supra).  The modification uses the activation differences as guidance to control dropout of different neurons for different domains and improves the performance of standard dropout techniques even after model convergence (Xiao, p. 1253, left-hand column, last paragraph: “After obtaining all the $\bar{s}_i$, we continue to train the CNN model, but with these impact scores as guidance to dropout different neurons for different domains during the training process.”  p. 1256, left-hand column, ¶ 2 “Standard Dropout vs. Domain Guided Dropout”: “From Figure 7(a) we can see that since the model is already converged, continue to use standard Dropout scheme cannot further improve the performance. The performance would rather jitter insignificantly or decrease on particular domains due to overfitting. However, by using the deterministic Domain Guided Dropout scheme, the performance improves consistently on all the domains, especially for the small-scale ones.”)
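For illustration only, using averaged impact scores as guidance for a deterministic dropout mask can be sketched as follows. The zero threshold (keep a neuron only when its averaged score is positive) is an assumption made for this sketch and is not stated in the passages quoted above; the names mean_impact, guided_mask, and apply_mask are hypothetical.

```python
from typing import List, Sequence


def mean_impact(per_sample: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-sample impact scores of each neuron over one domain."""
    n = len(per_sample)
    return [sum(column) / n for column in zip(*per_sample)]


def guided_mask(avg_scores: Sequence[float]) -> List[int]:
    """Keep a neuron (1) only if its averaged score is positive (assumed rule)."""
    return [1 if s > 0 else 0 for s in avg_scores]


def apply_mask(features: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Element-wise multiplication of the feature vector by the 0/1 mask."""
    return [f * m for f, m in zip(features, mask)]


# Hypothetical impact scores of three neurons on two samples of one domain.
per_sample = [[0.5, -0.25, 0.0], [1.5, -0.75, 0.0]]
avg = mean_impact(per_sample)
mask = guided_mask(avg)
masked = apply_mask([3.0, 4.0, 5.0], mask)
print(avg, mask, masked)
```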

With respect to claim 7, Gao modified by Baker and Xiao teaches the method of claim 1 from which claim 7 depends, and Baker further teaches:
wherein the altered input comprises an image representing an altered facial feature of a human. (Baker, ¶ [0260]: “Block 505 may also degrade the pattern in other ways than just adding noise. For example, if the pattern is an image, it may blur the image or it may sample the image at lower resolution. It may distort the image or move parts of the image around. If the pattern is text, it may change the order of the words or substitute one word for another.” ¶ [0280]: “A system can learn that, in general, an image of a face will have a nose and can soft tie nodes that represent noses in different images of faces.” The examiner notes that moving parts representing an identifying characteristic (e.g., a nose of a user) around in an image teaches an altered input comprising an image representing an altered facial feature of a human.)
Gao, Baker, and Xiao are analogous art because all three references pertain to removing features from a neural network to improve the robustness of a neural network.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker and Xiao to further incorporate Baker’s teaching of an altered input that comprises an image representing an altered facial feature of a human (Baker, supra). The modification provides the ability to control the amount of alteration to images, not only to prevent mode collapse but also to directly optimize the degree to which a model generates realistic output that generalizes to patterns that are not in the training data (Baker, ¶ [0261]: “The learning coach can control the amount of noise in the network, not only to prevent mode collapse, but directly optimizing the degree to which the network generates realistic output that generalizes to patterns not in the training data.”)

With respect to claim 8, Gao modified by Baker and Xiao teaches the method of claim 1 from which claim 8 depends, and Baker further teaches:
wherein the altered input comprises an image representing an altered identifying characteristic of a human. (Baker, ¶ [0260]: “Block 505 may also degrade the pattern in other ways than just adding noise. For example, if the pattern is an image, it may blur the image or it may sample the image at lower resolution. It may distort the image or move parts of the image around. If the pattern is text, it may change the order of the words or substitute one word for another.” ¶ [0274]: “A system can learn that, in general, an image of a face will have a nose and can soft tie nodes that represent noses in different images of faces.” The examiner notes that moving parts representing an identifying characteristic (e.g., a nose of a user) around teaches an altered input comprising an image representing an altered identifying characteristic of a human.)
Gao, Baker, and Xiao are analogous art because all three references pertain to removing features from a neural network to improve the robustness of a neural network.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker and Xiao to further incorporate Baker’s teaching of an altered input that comprises an image representing an altered identifying characteristic of a human (Baker, supra). The modification provides the ability to control the amount of alteration to images, not only to prevent mode collapse but also to directly optimize the degree to which a model generates realistic output that generalizes to patterns that are not in the training data (Baker, ¶ [0261]: “The learning coach can control the amount of noise in the network, not only to prevent mode collapse, but directly optimizing the degree to which the network generates realistic output that generalizes to patterns not in the training data.”)

With respect to claim 9, Gao modified by Baker and Xiao teaches the method of claim 1 from which claim 9 depends, and Baker further teaches:
wherein data of the altered training input has been modified by adding a feature to the data. (Baker, ¶ [0260]: “Block 505 may also degrade the pattern in other ways than just adding noise. For example, if the pattern is an image, it may blur the image or it may sample the image at lower resolution. It may distort the image or move parts of the image around. If the pattern is text, it may change the order of the words or substitute one word for another.” ¶ [0435]: “the computer system 4100 generates random perturbations of the examples selected” and “if the pattern is an image or a speech or audio signal, the perturbations could be generated simply by adding random noise to the signal or image.”)
Gao, Baker, and Xiao are analogous art because all three references pertain to removing features from a neural network to improve the robustness of a neural network.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker and Xiao to further incorporate Baker’s adding a feature to training input data to generate altered training input data (Baker, supra). The modification generates additional training data for Baker’s transformation to generalize to new data to fill the gaps among training data in order to correct difficult classification errors (Baker, ¶ [0431]: “FIG. 17A is a flowchart of an illustrative embodiment of a process that is used to correct difficult classification errors in various embodiments of this invention.” ¶ [0436]: “There need to be enough examples selected in 1703 or generated in 1704 so that the transform in block 1705 (described below) learns to make a transformation that will generalize to new data. If there are gaps among the examples in block 1704, the transform may merely learn to transform the data example into one of those gaps.”)
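For illustration only, two of the alterations quoted from Baker (adding random noise to an image and sampling the image at lower resolution) can be sketched as follows. This is a minimal sketch, not Baker’s Block 505: the names add_noise and downsample, the Gaussian noise model, and the 4x4 test image are assumptions made for the example.

```python
import random
from typing import List

Image = List[List[float]]


def add_noise(image: Image, sigma: float, rng: random.Random) -> Image:
    """Return a copy of the image with zero-mean Gaussian noise added to each pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]


def downsample(image: Image, factor: int) -> Image:
    """Sample the image at lower resolution by keeping every factor-th pixel."""
    return [row[::factor] for row in image[::factor]]


# Hypothetical 4x4 grayscale image with pixel values 0.0 to 15.0.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
rng = random.Random(0)  # fixed seed so the perturbation is repeatable
noisy = add_noise(image, sigma=0.1, rng=rng)
small = downsample(image, factor=2)
print(small)
```

Each function returns a new altered input and leaves the valid input unchanged, so the valid and altered versions can both be forwarded through the network.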

With respect to claim 10, Gao modified by Baker and Xiao teaches the method of claim 1 from which claim 10 depends, and Baker further teaches: 
wherein data of the altered training input has been modified by deleting a feature from the data. (Baker, ¶ [0260]: “Block 505 may also degrade the pattern in other ways than just adding noise. For example, if the pattern is an image, it may blur the image or it may sample the image at lower resolution. It may distort the image or move parts of the image around. If the pattern is text, it may change the order of the words or substitute one word for another.” ¶ [0552]: “With each round of feedback, the associative memory (2904) refines its prediction (2905) of the full, undegraded pattern. In this recursion, the associative memory may, for example, recover part of the missing parts and remove part of the noise and distortion in the first round of the recursion.” 
The examiner notes that Baker’s removing noise or distortion from the training data teaches deleting a feature (e.g., noise or the distorted portion) from the training input to generate an altered training input. The examiner further notes that Baker’s substituting a first word with a second word effectively deletes the first word and thus renders the above limitation obvious.)
Gao, Baker, and Xiao are analogous art because all three references pertain to removing features from a neural network to improve the robustness of a neural network.
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker and Xiao to further incorporate Baker’s deleting a feature from training input data to generate altered training input data (Baker, supra). The modification generates a more complete and cleaner input for a neural network to recover more data to aid recognition (Baker, ¶ [0552]: “With that more complete, somewhat cleaner input, it then recovers more in the next round, and so on.”)

Gao teaches: 
program instructions to input, to a neural network that is pre-configured to recognize objects from inputs when operating using a processor and a memory, a valid training input for object recognition; (Gao, p. 2, § 3, ¶ 2: “To remove those unnecessary features, we insert a mask layer in a DNN model right before the linear layer handling classification.” p. 3, § 3, ¶ 2: “The process is summarized in Algorithm 1. X can be a subset of the full training set, so the process of learning the mask can be fast.” “Algorithm 1 DeepCloak algorithm”: “Input: Training set X = {x1, x2 … xN}, DNN classifier F(), adversarial power ϵ. g() represents feature extraction layers of F()”. P. 3, § 4.1, ¶ 1: “Dataset: We choose CIFAR-10 (Krizhevsky & Hinton, 2009), an image dataset with 50,000 32x32 training images and 10,000 testing images.” 
The examiner notes that Gao’s training set X = {x1, x2 … xN} teaches a valid input, that Gao’s image classification of the training set X (e.g., the CIFAR-10 cited above) teaches recognizing objects from inputs, and that Gao’s DeepCloak teaches a neural network that is pre-configured and the remaining limitation.)

program instructions to input, to the neural network, an altered training input for object recognition; (Gao, p. 3, § 3, ¶ 2: “The process is summarized in Algorithm 1. X can be a subset of the full training set, so the process of learning the mask can be fast.” p. 3, § 3, Algorithm 1, esp. Step 5: “5: Forward xi’ into the network, get the output feature vector g(xi’)”. The examiner notes that Gao’s forwarding adversarial samples, xi’, into the neural network during the training process for classification of xi’ (e.g., recognition of xi’) teaches the above limitation.)

program instructions to determine an activation difference from the valid training input to the altered training input for each node of the neural network; (Gao, p. 3, Algorithm 1, esp. Steps 4-6: “4: Forward xi into the network, get the output feature vector g(xi)  5: Forward xi’ into the network, get the output feature vector g(xi’)  6: Add |g(xi) − g(xi’)| into v.” p. 3, § 4.2 “Experiment Result”, Table 1. The examiner notes that Gao’s computing the output feature vectors g(xi) and g(xi’) respectively teaches activation for the valid training input and the altered training input, and that Gao’s computing the difference between these two output feature vectors to determine which nodes are to be masked (e.g., 0%, 1%, 2%, …, 6% of “Nodes masked” in Table 1) teaches the above limitation.)
Gao does not appear to explicitly teach: 
A computer usable program product comprising a computer-readable storage device, and computer usable code stored on the storage device, the stored computer usable code comprising: program instructions 
when operating using a processor and a memory,
generating an ordered list of the nodes according to respective activation differences; and 
causing suppressing of outputs of a number of nodes on the ordered list during an operational phase of the neural network, 


Baker does, however, teach: 
A computer usable program product comprising a computer-readable storage device, and computer usable code stored on the storage device, the stored computer usable code comprising: program instructions (Baker, ¶ [0650]: “onboard memory 4106 may comprise primary, volatile, and/or non-volatile storage”; and ¶ [0826]: “the one or more memories storing” “instructions”.)

when operating using a processor and a memory, (Baker, ¶ [0650]: “The illustrated computer system 4100 comprises multiple processor units”, “onboard memory”, and “off-board memory 4106”; and ¶ [0657]: “aspects of the present invention can improve recommender systems, speech recognition systems, and classification systems, including image and diagnostic classification systems”. The examiner notes that Baker’s recommender systems, speech recognition systems, and classification systems, including image and diagnostic classification systems teach recognizing objects from inputs.)

wherein the number of nodes is based on a mitigation-nodes hyperparameter value. (Baker, ¶ [0097]: “Dropout is a technique that randomly selects nodes in a neural network and temporarily sets the activation values of those nodes to zero”; and “Some embodiments of this invention expand the number of hyperparameters to customize the control of dropout. Some embodiments implement nonrandom, controlled dropout.” The examiner notes that Baker’s customizing the control of dropout with hyperparameters to select nodes for dropout teaches that the number of nodes is based on a hyperparameter, which in turn teaches a mitigation-nodes hyperparameter value.)

Gao and Baker are analogous art because both pertain to removing features from a neural network to improve the robustness of a neural network. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao’s inputting valid training input and altered training input and determining the activation difference from these two inputs (Gao, supra) to incorporate Baker’s processor and memory as well as a mitigation node hyperparameter value (Baker, supra). The modification provides not only a computer system having at least one processor and memory for implementing computerized dropout but also the ability to customize the control of dropout rather than randomly selecting nodes for dropout (Baker, ¶ [0097]: “10. Dropout: Dropout is a technique that randomly selects nodes in a neural network and temporarily sets the activation values of those nodes to zero.” “Some embodiments of this invention expand the number of hyperparameters to customize the control of dropout. Some embodiments implement nonrandom, controlled dropout.” ¶ [0650]: “FIG. 41 is a diagram of a computer system 4100 that could be used to implement the embodiments described above.”)
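For illustration only, the contrast between random dropout and nonrandom, controlled dropout whose node count is set by a hyperparameter can be sketched as follows. The names random_dropout, controlled_dropout, ranked_nodes, and mitigation_nodes are hypothetical; the sketch assumes the ranking of nodes is supplied from elsewhere and does not reproduce any embodiment of Baker.

```python
import random
from typing import List, Sequence


def random_dropout(activations: Sequence[float], rate: float,
                   rng: random.Random) -> List[float]:
    """Standard dropout: zero each activation independently with probability rate."""
    return [0.0 if rng.random() < rate else a for a in activations]


def controlled_dropout(activations: Sequence[float],
                       ranked_nodes: Sequence[int],
                       mitigation_nodes: int) -> List[float]:
    """Nonrandom dropout: zero exactly the first mitigation_nodes nodes of the ranking."""
    dropped = set(ranked_nodes[:mitigation_nodes])
    return [0.0 if j in dropped else a for j, a in enumerate(activations)]


acts = [1.0, 2.0, 3.0, 4.0]
# Hypothetical ranking: node 2 first, then nodes 0, 3, and 1.
controlled = controlled_dropout(acts, ranked_nodes=[2, 0, 3, 1], mitigation_nodes=2)
randomized = random_dropout(acts, rate=0.5, rng=random.Random(0))
print(controlled, randomized)
```

With the hyperparameter set to 2, exactly two nodes are suppressed on every call, whereas the random variant suppresses a varying number of nodes from call to call.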

Gao modified by Baker does not appear to explicitly teach: 
program instructions to generate an ordered list of the nodes according to respective activation differences; and
program instructions to cause suppressing of outputs of a number of nodes on the ordered list during an operational phase of the neural network, 

Xiao does, however, teach: 
program instructions to generate an ordered list of the nodes according to respective activation differences; and (Xiao, p. 1252, § 3.3, ¶ 1: “Given the CNN model pretrained by using the mixed dataset, we identify for each domain which neurons are effective. For each domain sample, we define the impact of a particular neuron on this sample as the gain of the loss function when we remove the neuron. Specifically, let g(x) ∈ Rd denote the d-dimensional CNN feature vector of an image x. The impact score of the i-th (i ∈ {1, 2, . . . , d}) neuron on this image sample is defined as si = L(g(x)\i) − L(g(x)), (3) where g(x)\i is the feature vector after we setting the i-th neuron response to zero.” P. 1253, § 3.3, ¶ 4: “After obtaining all the $\bar{s}_i$, we continue to train the CNN model, but with these impact scores as guidance to dropout different neurons for different domains during the training process.”  
The examiner notes that Xiao’s impact score is computed as the difference between the cross-entropy loss (L in Eq. (3) above) with a neuron removed from a neural network and the cross-entropy loss with the neuron remaining in the network and thus teaches respective activation differences.  The examiner further notes that Xiao’s computing the impact score of each of the neurons according to Eq. (3) and subsequently dropping out different neurons by using such impact scores as guidance teaches generating an ordered list of the nodes according to respective activation differences as claimed.)
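For illustration only, generating an ordered list of nodes according to respective activation differences can be sketched as follows. The sketch follows the claim language (a per-node difference between the altered output and the valid output) rather than Xiao’s loss-based score; the names activation_differences and ordered_nodes and the sample values are hypothetical.

```python
from typing import List, Sequence


def activation_differences(valid: Sequence[float],
                           altered: Sequence[float]) -> List[float]:
    """Per-node absolute difference between the altered and the valid output."""
    return [abs(a - v) for v, a in zip(valid, altered)]


def ordered_nodes(diffs: Sequence[float]) -> List[int]:
    """Node indices sorted from the largest activation difference to the smallest."""
    return sorted(range(len(diffs)), key=lambda j: diffs[j], reverse=True)


# Hypothetical recorded outputs of three nodes.
valid_output = [0.25, 1.0, 0.5]
altered_output = [0.125, 0.25, 0.5]
diffs = activation_differences(valid_output, altered_output)
order = ordered_nodes(diffs)
print(diffs, order)
```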

program instructions to cause suppressing of outputs of a number of nodes on the ordered list during an operational phase of the neural network, (Xiao, p. 1252, § 3.3, ¶ 1: “Given the CNN model pretrained by using the mixed dataset, we identify for each domain which neurons are effective. For each domain sample, we define the impact of a particular neuron on this sample as the gain of the loss function when we remove the neuron. Specifically, let g(x) ∈ Rd denote the d-dimensional CNN feature vector of an image x. The impact score of the i-th (i ∈ {1, 2, . . . , d}) neuron on this image sample is defined as si = L(g(x)\i) − L(g(x)), (3) where g(x)\i is the feature vector after we setting the i-th neuron response to zero.” P. 1253, § 3.3, ¶ 4: “After obtaining all the $\bar{s}_i$, we continue to train the CNN model, but with these impact scores as guidance to dropout different neurons for different domains during the training process.”  
The examiner notes that Xiao’s using a pretrained CNN model in subsequent computation of impact scores and dropping out neurons teaches an operational phase of the neural network.  The examiner further notes that Xiao’s dropping out neurons by using the respective impact scores of the neurons as guidance teaches causing suppressing of outputs of a number of nodes on the ordered list during an operational phase of the neural network as claimed.)
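For illustration only, suppressing the outputs of selected nodes while a network is used for prediction can be sketched as follows. This is a minimal sketch under stated assumptions: the class SuppressedModel, the functions toy_hidden and toy_head, and the choice of suppressed node are hypothetical and are not drawn from Xiao, Gao, or Baker.

```python
from typing import Callable, List, Sequence, Set


class SuppressedModel:
    """Wraps a model so that chosen hidden-node outputs are zeroed at prediction time."""

    def __init__(self,
                 hidden: Callable[[Sequence[float]], List[float]],
                 head: Callable[[Sequence[float]], float],
                 suppressed: Set[int]) -> None:
        self.hidden = hidden
        self.head = head
        self.suppressed = suppressed

    def predict(self, x: Sequence[float]) -> float:
        h = self.hidden(x)
        # Zero the outputs of the suppressed nodes before the final layer.
        h = [0.0 if j in self.suppressed else a for j, a in enumerate(h)]
        return self.head(h)


def toy_hidden(x: Sequence[float]) -> List[float]:
    # Hypothetical hidden layer with three nodes.
    return [x[0], x[1], x[0] + x[1]]


def toy_head(h: Sequence[float]) -> float:
    # Hypothetical output layer: sum of the hidden activations.
    return sum(h)


baseline = SuppressedModel(toy_hidden, toy_head, suppressed=set()).predict([1.0, 2.0])
mitigated = SuppressedModel(toy_hidden, toy_head, suppressed={2}).predict([1.0, 2.0])
print(baseline, mitigated)
```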

Gao, Baker, and Xiao are analogous art because all three references pertain to removing features from a neural network to improve the robustness of a neural network. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker to incorporate Xiao’s generating an ordered list of the nodes and suppressing the output of a number of nodes on the ordered list during an operational phase of the neural network (Xiao, supra).  The modification provides not only the ability to quantify and visualize effective neurons and less related neurons but also the ability to evaluate relative performance gains on multiple domains (Xiao, p. 1252, § 3.3, ¶ 1: “We visualize the neuron impact scores between several pairs of domains in Figure 3. It clearly shows that the two sets of impact scores have little correlation, indicating that the effective neurons for different domains are not the same.”  p. 1256, § 4.3, last paragraph: “We further investigate how does the deterministic Domain Guided Dropout change the network behavior by evaluating the relative performance gain on each domain with respect to the number of neurons having negative impact scores on that domain.”) 

With respect to claim 12 depending upon claim 11, it is substantially similar to claim 2 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 14 depending upon claim 11, it is substantially similar to claim 4 and is rejected in the same manner, the same art and reasoning applying.



With respect to claim 20, it is substantially similar to claim 1 and is rejected in the same manner, the same art and reasoning applying.  Further, Gao teaches:
program instructions to input, to a neural network that is pre-configured to recognize objects from inputs when operating using a processor and a memory, a valid training input for object recognition; (Gao, p. 2, § 3, ¶ 2: “To remove those unnecessary features, we insert a mask layer in a DNN model right before the linear layer handling classification.” p. 3, § 3, ¶ 2: “The process is summarized in Algorithm 1. X can be a subset of the full training set, so the process of learning the mask can be fast.” “Algorithm 1 DeepCloak algorithm”: “Input: Training set X = {x1, x2 … xN}, DNN classifier F(), adversarial power ϵ. g() represents feature extraction layers of F()”. P. 3, § 4.1, ¶ 1: “Dataset: We choose CIFAR-10 (Krizhevsky & Hinton, 2009), an image dataset with 50,000 32x32 training images and 10,000 testing images.” 
The examiner notes that Gao’s training set X = {x1, x2 … xN} teaches a valid input, that Gao’s image classification of the training set X (e.g., the CIFAR-10 cited above) teaches recognizing objects from inputs, and that Gao’s DeepCloak teaches a neural network that is pre-configured and the remaining limitation.)

program instructions to input, to the neural network, an altered training input for object recognition; (Gao, p. 3, § 3, ¶ 2: “The process is summarized in Algorithm 1. X can be a subset of the full training set, so the process of learning the mask can be fast.” p. 3, § 3, Algorithm 1, esp. Step 5: “5: Forward xi’ into the network, get the output feature vector g(xi’)”. The examiner notes that Gao’s forwarding adversarial samples, xi’, into the neural network during the training process for classification of xi’ (e.g., recognition of xi’) teaches the above limitation.)

program instructions to determine an activation difference from the valid training input to the altered training input for each node of the neural network; (Gao, p. 3, Algorithm 1, esp. Steps 4-6: “4: Forward xi into the network, get the output feature vector g(xi)  5: Forward xi’ into the network, get the output feature vector g(xi’)  6: Add |g(xi) − g(xi’)| into v.” p. 3, § 4.2 “Experiment Result”, Table 1. The examiner notes that Gao’s computing the output feature vectors g(xi) and g(xi’) respectively teaches activation for the valid training input and the altered training input, and that Gao’s computing the difference between these two output feature vectors to determine which nodes are to be masked (e.g., 0%, 1%, 2%, …, 6% of “Nodes masked” in Table 1) teaches the above limitation.)
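
For illustration only, the accumulation recited in Steps 4-6 of Gao’s Algorithm 1 can be sketched as follows. This is a minimal sketch and not Gao’s implementation: the function names `accumulate_activation_difference` and `select_masked_nodes`, the list-based feature vectors, and the `mask_fraction` parameter are assumptions introduced here, and the sketch further assumes that the feature vectors g(xi) and g(xi’) have already been computed and that the nodes with the largest accumulated difference are the ones selected for masking.

```python
from typing import List, Sequence


def accumulate_activation_difference(
    clean_features: Sequence[Sequence[float]],
    adversarial_features: Sequence[Sequence[float]],
) -> List[float]:
    """Sum |g(x_i) - g(x_i')| per feature over all sample pairs (the vector v)."""
    if len(clean_features) != len(adversarial_features):
        raise ValueError("each clean sample needs one adversarial counterpart")
    if not clean_features:
        return []
    width = len(clean_features[0])
    v = [0.0] * width
    for g_clean, g_adv in zip(clean_features, adversarial_features):
        for j in range(width):
            v[j] += abs(g_clean[j] - g_adv[j])
    return v


def select_masked_nodes(v: Sequence[float], mask_fraction: float) -> List[int]:
    """Return the indices of the features with the largest accumulated difference.

    mask_fraction is a hypothetical parameter standing in for the percentage
    of nodes masked (e.g., 0.01 for 1%).
    """
    count = int(len(v) * mask_fraction)
    ranked = sorted(range(len(v)), key=lambda j: v[j], reverse=True)
    return sorted(ranked[:count])
```

In this sketch, each entry of `v` corresponds to one node of the feature extraction output, so a per-node activation difference is available before any node is selected.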

Gao does not appear to explicitly teach: 
A computer system comprising a processor, a computer-readable memory, and a computer-readable storage device, and computer usable code stored on the storage device for execution by the processor via the memory, the stored computer usable code comprising:
when operating using a processor and a memory,


Baker does, however, teach: 
A computer system comprising a processor, a computer-readable memory, and a computer-readable storage device, and computer usable code stored on the storage device for execution by the processor via the memory, the stored computer usable code comprising: (Baker, ¶ [0062]: “FIG. 41 illustrates a diagram of a computer system that may be used to implement various aspects of the present disclosure”. ¶ [0650]: “onboard memory may comprise primary, volatile, and/or non-volatile storage,” “microprocessor,” “co-processor”; ¶ [0826]: “memories storing the machine learning system(s) and instructions”.)

when operating using a processor and a memory, (Baker, ¶ [0650]: “The illustrated computer system 4100 comprises multiple processor units”, “onboard memory”, and “off-board memory 4106”; and ¶ [0657]: “aspects of the present invention can improve recommender systems, speech recognition systems, and classification systems, including image and diagnostic classification systems”. The examiner notes that Baker’s recommender systems, speech recognition systems, and classification systems, including image and diagnostic classification systems teach recognizing objects from inputs.)

(Baker, ¶ [0097]: “Dropout is a technique that randomly selects nodes in a neural network and temporarily sets the activation values of those nodes to zero”; and “Some embodiments of this invention expand the number of hyperparameters to customize the control of dropout. Some embodiments implement nonrandom, controlled dropout.” The examiner further notes that Baker’s customizing the control of dropout with hyperparameters to select nodes for dropout thus teaches the number of nodes is based on a hyperparameter that further teaches a mitigation-node hyperparameter.)

Gao and Baker are analogous art because both pertain to removing features from a neural network to improve the robustness of a neural network. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao’s inputting valid training input and altered training input and determining the activation difference from these two inputs (Gao, supra) to incorporate Baker’s processor and memory as well as a mitigation node hyperparameter value (Baker, supra). The modification provides not only a computer system having at least one processor and memory for implementing computerized dropout but also the ability to customize the control of dropout rather than randomly selecting nodes for dropout (Baker, ¶ [0097]: “10. Dropout: Dropout is a technique that randomly selects nodes in a neural network and temporarily sets the activation values of those nodes to zero.” “Some embodiments of this invention expand the number of hyperparameters to customize the control of dropout. Some embodiments implement nonrandom, controlled dropout.” ¶ [0650]: “FIG. 41 is a diagram of a computer system 4100 that could be used to implement the embodiments described above.”)

Gao modified by Baker does not appear to explicitly teach: 
program instructions to generate an ordered list of the nodes according to the respective activation differences; and 
program instructions to cause suppressing of outputs of a number of nodes on the ordered list during an operational phase of the neural network, 

Xiao does, however, teach: 
program instructions to generate an ordered list of the nodes according to the respective activation differences; and (Xiao, p. 1252, § 3.3, ¶ 1: “Given the CNN model pretrained by using the mixed dataset, we identify for each domain which neurons are effective. For each domain sample, we define the impact of a particular neuron on this sample as the gain of the loss function when we remove the neuron. Specifically, let g(x) ∈ Rd denote the d-dimensional CNN feature vector of an image x. The impact score of the i-th (i ∈ {1, 2, . . . , d}) neuron on this image sample is defined as si = L(g(x)\i) − L(g(x)), (3) where g(x)\i is the feature vector after we setting the i-th neuron response to zero.” P. 1253, § 3.3, ¶ 4: “After obtaining all the s̄i, we continue to train the CNN model, but with these impact scores as guidance to dropout different neurons for different domains during the training process.”  
The examiner notes that Xiao’s impact score is computed as the difference between the cross-entropy loss (L in Eq. (3) above) with a neuron removed from a neural network and the cross-entropy loss with the neuron remaining in the network and thus teaches respective activation differences.  The examiner further notes that Xiao’s computing the impact score of each of the neurons according to Eq. (3) and subsequently dropping out different neurons by using such impact scores as guidance teaches generating an ordered list of the nodes according to respective activation differences as claimed.)
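
For illustration only, the impact score of Eq. (3) quoted above, si = L(g(x)\i) − L(g(x)), and an ordering of neurons by that score can be sketched as follows. This is a minimal sketch under stated assumptions and not Xiao’s implementation: the linear softmax classifier given by `weights`, the function names, and the descending sort order are hypothetical choices introduced here.

```python
import math
from typing import List, Sequence


def cross_entropy(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    label: int,
) -> float:
    """Cross-entropy loss of a linear softmax classifier (assumed for illustration)."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    peak = max(logits)
    # Log-sum-exp computed stably by subtracting the largest logit.
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def impact_scores(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    label: int,
) -> List[float]:
    """s_i = L(g(x) with neuron i set to zero) - L(g(x)) for every neuron i."""
    base = cross_entropy(features, weights, label)
    scores = []
    for i in range(len(features)):
        ablated = list(features)
        ablated[i] = 0.0  # set the i-th neuron response to zero
        scores.append(cross_entropy(ablated, weights, label) - base)
    return scores


def ordered_nodes(scores: Sequence[float]) -> List[int]:
    """Order neuron indices from highest to lowest impact score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

A neuron whose removal increases the loss receives a positive score and appears early in the list, while a neuron whose removal decreases the loss receives a negative score and appears last.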

program instructions to cause suppressing of outputs of a number of nodes on the ordered list during an operational phase of the neural network, (Xiao, p. 1252, § 3.3, ¶ 1: “Given the CNN model pretrained by using the mixed dataset, we identify for each domain which neurons are effective. For each domain sample, we define the impact of a particular neuron on this sample as the gain of the loss function when we remove the neuron. Specifically, let g(x) ∈ Rd denote the d-dimensional CNN feature vector of an image x. The impact score of the i-th (i ∈ {1, 2, …, d}) neuron on this image sample is defined as si = L(g(x)\i) − L(g(x)), (3) where g(x)\i is the feature vector after we setting the i-th neuron response to zero.” P. 1253, § 3.3, ¶ 4: “After obtaining all the s̄i, we continue to train the CNN model, but with these impact scores as guidance to dropout different neurons for different domains during the training process.”  
The examiner notes that Xiao’s using a pretrained CNN model in subsequent computation of impact scores and dropping out neurons teaches an operational phase of the neural network.  The examiner further notes that Xiao’s dropping out neurons by using the respective impact scores of the neurons as guidance teaches causing suppressing of outputs of a number of nodes on the ordered list during an operational phase of the neural network as claimed.)
Gao, Baker, and Xiao are analogous art because all three references pertain to removing features from a neural network to improve the robustness of a neural network. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker to incorporate Xiao’s generating an ordered list of the nodes and suppressing the output of a number of nodes on the ordered list during an operational phase of the neural network (Xiao, supra).  The modification provides not only the ability to quantify and visualize effective neurons and less related neurons but also the ability to evaluate relative performance gains on multiple domains (Xiao, p. 1252, § 3.3, ¶ 1: “We visualize the neuron impact scores between several pairs of domains in Figure 3. It clearly shows that the two sets of impact scores have little correlation, indicating that the effective neurons for different domains are not the same.”  p. 1256, § 4.3, last paragraph: “We further investigate how does the deterministic Domain Guided Dropout change the network behavior by evaluating the relative performance gain on each domain with respect to the number of neurons having negative impact scores on that domain.”) 

Claims 5 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gao et al. DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples (April 17, 2017) (hereinafter Gao) in view of Baker et al. US PGPub 20200285939 effectively filed on September 28, 2017 (hereinafter Baker) and Xiao et al. Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification (Apr. 26, 2016) (hereinafter Xiao) and further in view of Pham, V. et al. “Dropout improves Recurrent Neural Networks for Handwriting Recognition” (Mar. 10, 2014) (hereinafter Pham).

With respect to claim 5, Gao modified by Baker and Xiao teaches the method of claim 1 from which claim 5 depends but does not appear to explicitly teach:
selecting a layer of the neural network as a mitigation layer, 
wherein the mitigation layer includes at least one of the number of nodes, and 
wherein the mitigation layer is one of a number of one or more mitigation layers selected based on a value of a mitigation-layer hyperparameter. 

Pham does, however, teach: 
selecting a layer of the neural network as a mitigation layer, (Pham, FIG. 1 where Pham’s neural network repeats the configuration having “four LSTM layers” and “4 convolutional layers” twice and further includes four LSTM layers respectively connected to four fully connected layers.  § II, ¶ 3: “LSTM cells are carefully designed recurrent neurons with multiplicative gates to store information over long periods and forget when needed.” § IV-B-1): “Dropout at the topmost LSTM layer”, where Pham applies dropout to “the topmost LSTM layer”; § IV-B-2): “Dropout at multiple layers”, where Pham applies dropout to “the topmost, the top two and all the three LSTM layers” or “any layer”.  
The examiner notes that a layer to which Pham’s dropout applies teaches a mitigation layer, that a layer in Pham includes one or more cells (neurons), and that Pham’s selecting a layer having one or more cells (neurons) for dropout teaches selecting a layer as a mitigation layer.)

wherein the mitigation layer includes at least one of the number of nodes, and (Pham, FIG. 1 where Pham’s neural network repeats the configuration having “four LSTM layers” and “4 convolutional layers” twice and further includes four LSTM layers respectively connected to four fully connected layers.  § II, ¶ 3: “LSTM cells are carefully designed recurrent neurons with multiplicative gates to store information over long periods and forget when needed.” § IV-B-1): “Dropout at the topmost LSTM layer”, where Pham applies dropout to “the topmost LSTM layer”; § IV-B-2): “Dropout at multiple layers”, where Pham applies dropout to “the topmost, the top two and all the three LSTM layers” or “any layer”.  
The examiner notes that a layer to which Pham’s dropout applies teaches a mitigation layer, that a layer in Pham includes one or more cells (neurons), and that one or more cells (neurons) in a layer for Pham’s dropout teaches at least one of the number of nodes whose outputs are to be suppressed. Therefore, the examiner asserts that Pham teaches the above limitation.)

wherein the mitigation layer is one of a number of one or more mitigation layers selected based on a value of a mitigation-layer hyperparameter. (Pham, FIG. 1 where Pham’s neural network repeats the configuration having “four LSTM layers” and “4 convolutional layers” twice and further includes four LSTM layers respectively connected to four fully connected layers. § IV-B-1): “Dropout at the topmost LSTM layer”, where Pham applies dropout to “the topmost LSTM layer”; § IV-B-2): “Dropout at multiple layers”, where Pham applies dropout to “the topmost, the top two and all the three LSTM layers” or “any layer”.  § III, ¶ 1: “Dropout involves a hyper-parameter p, for which a common value is p = 0.5.” 
The examiner notes that each configuration in Pham’s FIG. 1 corresponds to a total number of layers to which Pham applies dropout.  The examiner further notes that a total number of layers (e.g., the topmost, the top two, all LSTM layers, etc.) to which dropout is to be applied pertains to a configuration of Pham’s neural network for applying dropout and thus teaches a value of a hyperparameter, and that Pham’s specifying the “topmost,” “the top two,” “all the three,” and/or “any layer” to which dropout is to be applied teaches a number of one or more mitigation layers selected based on a value of a mitigation-layer hyperparameter.)
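
For illustration only, selecting a number of topmost layers for dropout based on a single configuration value can be sketched as follows. This is a minimal sketch and not Pham’s implementation: the function names, the layer labels, the `num_layers` parameter, and the use of Python’s `random.Random` are assumptions introduced here, and the sketch omits any rescaling of activations.

```python
import random
from typing import List, Sequence


def select_dropout_layers(
    layers_bottom_to_top: Sequence[str], num_layers: int
) -> List[str]:
    """Pick the topmost num_layers layers as the layers that receive dropout.

    num_layers is a hypothetical hyperparameter: 1 selects the topmost layer,
    2 the top two, and len(layers_bottom_to_top) selects all layers.
    """
    total = len(layers_bottom_to_top)
    if not 0 <= num_layers <= total:
        raise ValueError("num_layers must be between 0 and the layer count")
    return list(layers_bottom_to_top[total - num_layers:])


def apply_dropout(
    activations: Sequence[float], p: float, rng: random.Random
) -> List[float]:
    """Set each activation to zero with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]
```

For example, `select_dropout_layers(["lstm1", "lstm2", "lstm3"], 2)` returns the top two layers in this sketch, and `apply_dropout` would then be applied only to the activations of those layers.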

Gao, Baker, Xiao, and Pham are analogous art because all four references pertain to improving neural networks for object recognition with dropout. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker and Xiao to incorporate Pham’s selecting a layer as a mitigation layer having one or more nodes based on a value of a mitigation-layer hyperparameter (Pham, supra).  The modification enables the performance of dropout on the aforementioned mitigation layer to effectively prevent neural network parameters from overfitting without incurring too much time for training when compared to conventional approaches (Pham, § I, right-hand column, ¶ 2: “Meanwhile, in the emerging deep learning movement, dropout was used to effectively prevent deep neural networks with lots of parameters from overfitting.” § IV, right-hand column, ¶ 1: “Moreover, since the inputs of this layer have smaller sizes than those of lower layers due to subsampling, dropout at this layer will not take too much time during training.” § IV, right-hand column, ¶ 6: “Normally when dropout is applied at any layer, we double the number of LSTM units at that layer. This is to keep the same number of active hidden units (on average) when using dropout with p = 0.5 as in the baseline where all hidden units are active.”) 

With respect to claim 15 depending upon claim 11, it is substantially similar to claim 5 and is rejected in the same manner, the same art and reasoning applying.

Claims 6 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over   Gao et al. DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples (April 17, 2017) (hereinafter Gao) in view of Baker et al. US PGPub 20200285939 effectively filed on September 28, 2017 (hereinafter Baker), Xiao et al. Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification (Apr. 26, 2016) (hereinafter Xiao), and Pham, V. et al. “Dropout improves Recurrent Neural Networks for Handwriting Recognition” (Mar. 10, 2014) (hereinafter Pham) and further in view of Xu et al. US PGPub 20180150783 published on May 31, 2018 (hereinafter Xu) and Wen et al. Learning structured sparsity in deep neural networks. In Advances in Neural Information Processing Systems, pages 2074–2082, 2016 (hereinafter Wen).

With respect to claim 6, Gao modified by Baker, Xiao, and Pham teaches the method of claim 5 from which claim 6 depends, and Pham further teaches: 
associating an array with the neural network, the array comprising the one or more mitigation layers; and (Pham, FIG. 1 and Caption: “The Recurrent Neural Network considered in this paper, with the places where dropout can be applied.” § IV-B, right-hand column, ¶ 5: “In our architecture, there are 3 LSTM layers, hence we tried applying dropout at the topmost, the top two and all the three LSTM layers.” The examiner notes that Pham’s storing the complex layer structure of its neural network architecture (e.g., in a data structure) and ordering the LSTM layers in such a data structure to distinguish the topmost and the top two layers renders obvious that Pham stores its neural network’s layers in a data structure, which teaches an array.  The examiner further notes that Pham’s storing and ordering the LSTM layers of its neural network in its array in order to distinguish the topmost and the top two layers teaches associating an array with the neural network, the array comprising the one or more mitigation layers (e.g., Pham’s topmost, top two, or all LSTM layers).)

(Pham, FIG. 1 and Caption: “The Recurrent Neural Network considered in this paper, with the places where dropout can be applied.” § IV-B, right-hand column, ¶ 5: “In our architecture, there are 3 LSTM layers, hence we tried applying dropout at the topmost, the top two and all the three LSTM layers.” The examiner notes that Pham’s ordering the LSTM layers in order to distinguish the “topmost” and “top two” layers from all layers renders obvious the limitation positioning the mitigation layer in the array in an order relative to other mitigation layers.)
Gao, Baker, Xiao, and Pham are analogous art because all four references pertain to improving neural networks for object recognition with dropout. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker, Xiao, and Pham to further incorporate Pham’s array that comprises one or more mitigation layers where a mitigation layer is positioned in an order relative to other mitigation layers (Pham, supra).  The modification enables sorting the layers of a neural network stored in such an array to distinguish the topmost and the top two layers to which dropout is applied, which effectively prevents neural network parameters from overfitting without incurring too much time for training when compared to conventional approaches (Pham, § I, right-hand column, ¶ 2: “Meanwhile, in the emerging deep learning movement, dropout was used to effectively prevent deep neural networks with lots of parameters from overfitting.” § IV, right-hand column, ¶ 1: “Moreover, since the inputs of this layer have smaller sizes than those of lower layers due to subsampling, dropout at this layer will not take too much time during training.” § IV, right-hand column, ¶ 6: “Normally when dropout is applied at any layer, we double the number of LSTM units at that layer. This is to keep the same number of active hidden units (on average) when using dropout with p = 0.5 as in the baseline where all hidden units are active.”)

Gao modified by Baker, Xiao, and Pham does not appear to explicitly teach: 
the order using an amount of the number of nodes that are on the mitigation layer in determining a position of the mitigation layer in the array.
Wen does, however, teach: 
the order using an amount of the number of nodes that are on the mitigation layer in determining a position of the mitigation layer in the array. (Wen, Abstract: “propose a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs”. § 3.2, p. 4: “our approach tends to remove less important filters and channels”; “SSL removes an entire unimportant layers [sic]”; § 4.1, p. 5: “a neuron can degenerate to a removable dummy neuron if all of its output connections are zeroed out” and “[t]he results show that SSL can not only remove hidden neurons but also discover the sparsity of images.” FIG. 4(a) on p. 6: arrangement in descending error rates with descending, corresponding numbers of neurons in each layer of an MLP network of 3 MLP networks. FIG. 6 on p. 7: depth regularization by Wen’s SSL and graphical positioning of each depth-regularized ResNet according to the layers removed (e.g., SSL-ResNet-20 represents the original ResNet-20, SSL-ResNet-18 represents two layers removed from the original ResNet-20, etc.) 
The examiner notes that Wen teaches removing layers for depth-regularization of a neural network and further arranging the descending error rates with corresponding numbers of neurons in each layer of an MLP network of 3 MLP networks in FIG. 4(a) and/or removing different numbers of less important layers in FIG. 6 teaches this limitation in its entirety.)
Gao, Baker, Xiao, Pham, and Wen are analogous art because all five references pertain to improving neural networks by removing elements of neural networks with dropout and/or regularization techniques.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker, Xiao, and Pham to further incorporate Wen’s ordering a layer using an amount of the number of nodes that are on the layer in determining a position of the layer in the array.  The modification of arranging layers according to the respective numbers of neurons in the layers after removing neurons corresponding to zero connections provides the capability of discovering the sparsity of images (e.g., input neurons having zero connections) as well as where the neurons having zero connections are located in the data (Wen, p. 5, last paragraph: “We enforce the group Lasso regularization on all the input (or output) connections of each neuron. A neuron whose input connections are all zeroed out can degenerate to a bias neuron in the next layer; similarly, a neuron can degenerate to a removable dummy neuron if all of its output connections are zeroed out. Figure 4(a) summarizes the learned structure and FLOP of different MLP networks. The results show that SSL can not only remove hidden neurons but also discover the sparsity of images. For example, Figure 4(b) depicts the number of connections of each input neuron in MLP 2, where 40.18% of input neurons have zero connections and they concentrate at the boundary of the image.”) 

With respect to claim 16 depending upon claim 15, it is substantially similar to claim 6 and is rejected in the same manner, the same art and reasoning applying.

Claims 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over   Gao et al. DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples (April 17, 2017) (hereinafter Gao) in view of Baker et al. US PGPub 20200285939 effectively filed on September 28, 2017 (hereinafter Baker) and Xiao et al. Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification (Apr. 26, 2016) (hereinafter Xiao) and further in view of von Kaenel et al. US PGPub 2015/0193630 with the effective filing date of Mar. 16, 2002 (hereinafter von Kaenel).

With respect to claim 18 depending upon claim 11, Gao modified by Baker and Xiao teaches the computer usable program product of claim 11 but does not appear to explicitly teach: 
wherein the computer usable code is stored in a computer readable storage device in a data processing system, and wherein the computer usable code is transferred over a network from a remote data processing system.  
von Kaenel does, however, teach: 
(¶ [1178]: “In certain implementations, maintenance of all the software/hardware systems deployed at the Operations Center will be the responsibility of the Operations Group.” “The Operations staff will also be responsible for the uploading and deploying of data, imagery and application releases to the Web Center.” ¶ [1216]: “The enterprise spatial system scales seamlessly from a prototype system deployed on a single machine for development testing at the Operations Center to a large-scale web deployment at the Web Center.” The examiner notes that von Kaenel’s deploying software from one system to another system or center teaches this limitation.)
Gao, Baker, Xiao, and von Kaenel are analogous art because all four references pertain to improving performance and robustness of computing systems.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker and Xiao to incorporate von Kaenel’s transferring computer code from a remote data processing system over a network (von Kaenel, supra).  The modification provides clients with software maintenance functions that are not supported by the clients’ environments (von Kaenel, “The Operations Group will do all system maintenance functions not supported by the staff at the web-hosting environment.”)

With respect to claim 19 depending upon claim 11, Gao modified by Baker and Xiao teaches the computer usable program product of claim 11 but does not appear to explicitly teach: 
von Kaenel does, however, teach: 
wherein the computer usable code is stored in a computer readable storage device in a server data processing system, and wherein the computer usable code is downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system.  (¶ [0523]: “the enterprise spatial system is designed in a way that supports a “0 to fat” client download, where a ‘fat’ client download is defined as augmenting approximately 75% or more of the functionality through some level of download”; ¶ [1178]: “the Operations Group” “will also be responsible for the uploading and deploying of data, imagery and application releases to the Web Center”; ¶ [1216]: “The enterprise spatial system scales seamlessly from a prototype system deployed on a single machine for development testing at the Operations Center to a large-scale web deployment at the Web Center.” The examiner notes that the 0 to fat client download of von Kaenel’s enterprise spatial system teaches this limitation.)
Gao, Baker, Xiao, and von Kaenel are analogous art because all four references pertain to improving performance and robustness of computing systems.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Gao in view of Baker and Xiao to incorporate von Kaenel’s downloading computer usable code over a network to a remote data processing system (von Kaenel, supra).  The modification allows clients the flexibility to download various downloads depending on their respective importance and to perform the download by an enterprise spatial system on behalf of a client (von Kaenel, ¶ [0523]: “This approach allows users the flexibility to download various client downloads or not, depending upon what is most important to them (e.g., performance vs flexibility). Specifically, with this approach, when a specific function is called, it is determined whether the client download is stored locally at the client system. If a client download is present at the client system, the download is used to perform the functionality. If the client download is not present at the client system, the enterprise spatial system goes to the server system to perform the functionality.”)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Xu et al. US PGPub 20180150783 published on May 31, 2018 teaches maintaining a set of predictive models as a library of predictive models as well as maintaining any data such as layers of a deep neural network in a data store in a variety of data structures, such as tables or databases. 
THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERICH C. TZOU whose telephone number is (571)272-9852. The examiner can normally be reached Monday-Friday 5:30AM-5:30PM PST with alternate Fridays off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached at 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.



/E.C.T./Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126