DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114 was filed in this application after appeal to the Patent Trial and Appeal Board, but prior to a decision on the appeal. Since this application is eligible for continued examination under 37 CFR 1.114 and the fee set forth in 37 CFR 1.17(e) has been timely paid, the appeal has been withdrawn pursuant to 37 CFR 1.114 and prosecution in this application has been reopened pursuant to 37 CFR 1.114. Applicant’s submission filed on 09/28/2020 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 07/20/2015 and claims benefit of Provisional Application No. 62/026,359 (filed on 07/18/2014).
This action is in response to amendments and remarks filed with the RCE submitted on 09/28/2020. In the current amendments, claims 14-15 and 29-30 are amended. Claims 1-30 are pending and have been examined.
In response to the Terminal Disclaimer filed on 09/28/2020, the nonstatutory Double Patenting rejection made in the previous Office Action has been withdrawn.
In response to amendments and remarks filed on 09/28/2020, the 35 U.S.C. 112(b) rejection of claims 14-15 and 29-30 made in the previous Office Action has been withdrawn.



Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/16/2020 was filed after the mailing date of the Non-Final Rejection Office Action on 02/26/2020.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 5, 7-9, 11-13, 15-17, 19, 20, 22-24, 26-28, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Poole et al. (“Analyzing noise in autoencoders and deep networks”) in view of Sukhbaatar et al. (“Learning from Noisy Labels with Deep Neural Networks”).
Claim limitations reciting “or” have been interpreted as requiring at least one of the alternatives and not requiring all of the alternatives listed in the claim. 


Regarding Claim 1,
Poole et al. teaches a learning computer system that estimates parameters and states of a stochastic or uncertain system comprising a data processing system that includes a hardware processor that has a configuration that (pg. 5 Section 4: “All experiments were run in Python using the Pylearn2 framework on a single Intel Xeon machine with an NVIDIA GTX 660 GPU” teaches a learning computer system that includes a GPU (hardware processor); pg. 2 Section 2.2: “We call these models noisy autoencoders (NAEs) as their hidden representations are stochastic, and no longer a deterministic function of the input...When using the NAE to extract features or perform denoising on testing data we can compute the expectation of the noisy hidden activation or reconstruction by sampling from the NAE...we can approximate the expectation by scaling each of the corrupted variables by their expectation as in dropout” teaches approximating the expectation of the noisy hidden activations (estimating parameters and states) of the noisy autoencoders (stochastic system)):
receives data from a user or other source (pg. 5 Section 4.1: “In our first experiment, we evaluated the effect of dropout noise on the generalization performance of a noisy autoencoder. We trained two NAEs on 12x12 patches drawn from the van Hateren natural image dataset” teaches receiving input data from an image dataset (other source));
processes the received data through layers of processing units, thereby generating processed data (Fig. 1 teaches processing received input data through a noisy autoencoder model containing several processing units, thus generating processed data); 
...compares the output signals with reference signals to generate error signals (pg. 2 Section 2.1: “An autoencoder is a type of one layer neural network that is trained to reconstruct its inputs...The composition of the encoder and decoder yield the reconstruction function: r(x) = g(f(x)). The typical training criterion for autoencoders is minimizing the reconstruction error, $\sum_{x \in X} L(x, r(x))$ with respect to some loss L, typically either squared error or the binary cross-entropy” teaches the reconstruction error of an autoencoder, which is a type of neural network. The calculation of the error is with respect to the loss (L) between x (reference) and r(x) (reconstruction output), which corresponds to comparing reference signal with output signal. Please see reconstruction error $\sum_{x \in X} L(x, r(x))$. pg. 5 Section 4: “All experiments were run in Python using the Pylearn2 framework on a single Intel Xeon machine with an NVIDIA GTX 660 GPU” teaches a learning computer system that includes a GPU (hardware processor));
sends and processes the error signals back through the layers of processing units (pg. 5 Section 4.3: “To better evaluate the impact of hidden unit input and activation noise on NAE classification performance, we trained larger models...These models were used as initialization for a MLP that was trained with standard backpropagation” teaches backpropagation, which corresponds to sending the error signals back through layers of processing units); 
generates random, chaotic, fuzzy, or other numerical perturbations of the received data, the processed data, or the output signals (pg. 2 Section 2.2: “We parameterize the noise in the NAE as a tuple $(\epsilon_I, \epsilon_H, \epsilon_Z)$ that characterizes the distribution of the noises corrupting the input, hidden unit inputs, and hidden activations respectively (see Figure 1)” teaches generating noise (numerical perturbations) of the input data (received data) and data processed through hidden units (processed data));
estimates the parameters and states of the stochastic or uncertain system using the received data, the numerical perturbations, and previous parameters and states of the stochastic or uncertain system (Fig. 1 and pg. 2 Section 2.2: “We call these models noisy autoencoders (NAEs) as their hidden representations are stochastic, and no longer a deterministic function of the input...When using the NAE to extract features or perform denoising on testing data we can compute the expectation of the noisy hidden activation or reconstruction by sampling from the NAE...we can approximate the expectation by scaling each of the corrupted variables by their expectation as in dropout” teach approximating the expectation of the noisy hidden activations (estimating parameters and states) of the noisy autoencoders (stochastic system); pg. 5 Section 4.3 teaches backpropagation);
determines whether the generated numerical perturbations satisfy a condition; and if the numerical perturbations satisfy the condition, injects the numerical perturbations into the estimated parameters or states, the received data, the processed data, the masked or filtered data, or the processing units (pg. 4 Section 3.1: “We cannot directly relate this penalty to a form of noise, but we can recover a penalty that encourages sparsity on hidden unit activations. If we inject additive Gaussian noise on the activations of the hidden units with variance equal to the uncorrupted hidden unit activation then the marginalized noise penalty becomes...” teaches injecting additive Gaussian noise (numerical perturbation) to the activations of the hidden units (processing units) to encourage sparsity after determining the variance of the Gaussian noise on the activations of the hidden units equal to the uncorrupted hidden unit activation (satisfies a criterion)).
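For illustration only, the noise parameterization and reconstruction error cited above from Poole et al. can be sketched in Python as follows. This sketch is not taken from the reference. The function names (`noisy_reconstruction`, `reconstruction_error`), the tied decoder weights, the sigmoid nonlinearity, the squared-error loss, and the use of additive Gaussian noise with standard deviations `sigma_i`, `sigma_h`, and `sigma_z` are assumptions made for the example.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def noisy_reconstruction(x, W, b, c, rng, sigma_i=0.0, sigma_h=0.0, sigma_z=0.0):
    """Reconstruct x with additive Gaussian noise at three sites.

    sigma_i, sigma_h, sigma_z are hypothetical standard deviations standing in
    for the noise tuple (eps_I, eps_H, eps_Z); zero disables a noise source.
    """
    # Corrupt the input (eps_I).
    x_tilde = [xi + rng.gauss(0.0, sigma_i) for xi in x]
    # Compute hidden unit inputs and corrupt them (eps_H).
    h_in = [
        sum(w * xi for w, xi in zip(row, x_tilde)) + bj + rng.gauss(0.0, sigma_h)
        for row, bj in zip(W, b)
    ]
    # Compute hidden activations and corrupt them (eps_Z).
    h = [sigmoid(a) + rng.gauss(0.0, sigma_z) for a in h_in]
    # Decode with tied weights (an assumption for this sketch).
    return [sum(W[j][i] * h[j] for j in range(len(W))) + c[i] for i in range(len(x))]


def reconstruction_error(X, reconstruct):
    """Sum over x in X of the squared-error loss L(x, r(x))."""
    return sum(
        sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruct(x))) for x in X
    )


# Hypothetical toy data: two inputs, two hidden units.
rng = random.Random(0)
W = [[0.5, -0.5], [0.25, 0.75]]
b = [0.0, 0.0]
c = [0.0, 0.0]
X = [[1.0, 0.0], [0.0, 1.0]]
clean_error = reconstruction_error(X, lambda x: noisy_reconstruction(x, W, b, c, rng))
noisy_error = reconstruction_error(
    X, lambda x: noisy_reconstruction(x, W, b, c, rng, 0.1, 0.1, 0.1)
)
```

In this sketch x serves as the reference signal and r(x) as the output signal, so the summed loss plays the role of the error signal discussed above.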
Poole et al. teaches noise injection in the internal representations of a deep network (pg. 7-8 Section 5), but Poole et al. does not appear to explicitly teach applies masks or filters to the processed data using convolutional processing; processes the masked or filtered data to produce one or more intermediate and output signals.
However, Sukhbaatar et al. teaches applies masks or filters to the processed data using convolutional processing; processes the masked or filtered data to produce one or more intermediate and output signals (pg. 5-6 Section 4: “In this section, we empirically examine the robustness of deep networks with and without noise modeling. We experiment on several different image classification datasets with label noise. As the base model, we use convolutional deep networks because they produce state-of-art performance on many image classification tasks” and pg. 6 Section 4.1: “We use a publicly available fast GPU code for training deep networks. As the base model, we use their “18% model” with three convolutional layers (layers-18pct.cfg) for both SVHN and CIFAR-10 experiments” teach convolutional neural network with noise modeling that contains three convolutional layers, which corresponds to applying filters to the processed data using convolutional processing since a convolutional neural network contains convolutional filters; each of the three convolutional layers produces intermediate and output signals to the next layer).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate applies masks or filters to the processed data using convolutional processing; processes the masked or filtered data to produce one or more intermediate and output signals as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).
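For illustration only, applying a filter by convolutional processing and then producing intermediate and output signals can be sketched in Python as follows. This sketch is not taken from Sukhbaatar et al. The function names (`conv2d_valid`, `relu`), the 2x2 kernel, the stride of 1, and the absence of padding are assumptions made for the example.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the filtered map.

    As in most deep learning libraries, this is cross-correlation: the kernel
    is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Elementwise rectification of a filtered map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical processed data and filter.
processed = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
filtered = conv2d_valid(processed, kernel)      # filtered data
intermediate = relu(filtered)                   # intermediate signal
output = sum(sum(row) for row in intermediate)  # output signal
```

Stacking several such layers, each feeding its result to the next, corresponds to the multi-layer convolutional base model relied upon above.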
Regarding Claim 2,
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 1.
Sukhbaatar et al. further teaches wherein the learning computer system unconditionally injects noise or chaotic or other perturbations into the estimated parameters or states, the received data, the processed data, the masked or filtered data, or the processing units (pg. 6 Section 4.1: “We use a publicly available fast GPU code for training deep networks. As the base model, we use their “18% model” with three convolutional layers (layers-18pct.cfg) for both SVHN and CIFAR-10 experiments” teaches learning computer system; pg. 6 Fig. 4.1: “We synthesize noisy data from clean data by deliberately changing some of the labels. Original label i is randomly changed to j with fixed probability” teaches randomly injecting noise (unconditionally injecting noise) into the received data).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the learning computer system unconditionally injects noise or chaotic or other perturbations into the estimated parameters or states, the received data, the processed data, the masked or filtered data, or the processing units as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).
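For illustration only, the label corruption quoted above (“Original label i is randomly changed to j with fixed probability”) can be sketched in Python as follows. This sketch is not taken from Sukhbaatar et al. The function name `flip_labels`, the parameter `flip_prob`, and the uniform choice among the other classes are assumptions made for the example.

```python
import random


def flip_labels(labels, num_classes, flip_prob, rng):
    """Replace each label with a different class with probability flip_prob."""
    noisy = []
    for y in labels:
        if rng.random() < flip_prob:
            # Choose uniformly among the other classes (an assumption; the
            # reference only states that label i changes to j with a fixed
            # probability).
            others = [k for k in range(num_classes) if k != y]
            noisy.append(rng.choice(others))
        else:
            noisy.append(y)
    return noisy


# Hypothetical clean labels for a five-class problem.
rng = random.Random(1)
clean_labels = [0, 1, 2, 3, 4] * 4
noisy_labels = flip_labels(clean_labels, num_classes=5, flip_prob=0.3, rng=rng)
```

The corruption is applied to every label without first testing any property of the noise itself, which is the sense in which the injection is treated as unconditional above.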
Regarding Claim 4,
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 2.
Sukhbaatar et al. further teaches wherein the unconditional injection improves the accuracy of the learning computer system (pg. 8 Section 4.3: “the absolute performance achieved with our techniques and the 1.4M additional noisy data equals the performance when training on an additional 15M clean images from ImageNet 2011 (row 3). This demonstrates that noisy data can be very beneficial for training”).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the unconditional injection improves the accuracy of the learning computer system as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
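One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).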
Regarding Claim 5,
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 1.
Poole et al. further teaches wherein the received data represents an image (pg. 5 Section 4: “we evaluate the effectiveness of noisy autoencoders at learning representations through a variety of experiments on natural images, MNIST, and CIFAR-10”).
Regarding Claim 7,
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 1.
Poole et al. further teaches wherein the injection improves the accuracy of the learning computer system (pg. 7 Section 4.5: “We have shown that different types of noise can be used to regularize hidden representations and improve classification performance on MNIST and CIFAR-10”).
Regarding Claim 8,
Poole et al. teaches a learning computer system that estimates parameters and states of a stochastic or uncertain system comprising a data processing system that includes a hardware processor that has a configuration that (pg. 5 Section 4: “All experiments were run in Python using the Pylearn2 framework on a single Intel Xeon machine with an NVIDIA GTX 660 GPU” teaches a learning computer system that includes a GPU (hardware processor) and Python instructions; pg. 2 Section 2.2: “We call these models noisy autoencoders (NAEs) as their hidden representations are stochastic, and no longer a deterministic function of the input...When using the NAE to extract features or perform denoising on testing data we can compute the expectation of the noisy hidden activation or reconstruction by sampling from the NAE...we can approximate the expectation by scaling each of the corrupted variables by their expectation as in dropout” teaches approximating the expectation of the noisy hidden activations (estimating parameters and states) of the noisy autoencoders (stochastic system)):
receives data from a user or other source (pg. 5 Section 4.1: “In our first experiment, we evaluated the effect of dropout noise on the generalization performance of a noisy autoencoder. We trained two NAEs on 12x12 patches drawn from the van Hateren natural image dataset” teaches receiving input data from an image dataset (other source)); 
processes only a portion of the received data through layers of processing units, thereby generating processed data (Fig. 1 teaches processing received input data through a noisy autoencoder model containing several processing units, thus generating processed data; pg. 5 Section 4.1: “In our first experiment, we evaluated the effect of dropout noise on the generalization performance of a noisy autoencoder. We trained two NAEs on 12x12 patches drawn from the van Hateren natural image dataset” teaches receiving only a portion of input data from an image dataset (other source) in which the portion contains the “12x12 patches drawn from the van Hateren natural image dataset”); 
...compares the output signals with reference signals to generate error signals (pg. 2 Section 2.1: “An autoencoder is a type of one layer neural network that is trained to reconstruct its inputs...The composition of the encoder and decoder yield the reconstruction function: r(x) = g(f(x)). The typical training criterion for autoencoders is minimizing the reconstruction error, $\sum_{x \in X} L(x, r(x))$ with respect to some loss L, typically either squared error or the binary cross-entropy” teaches the reconstruction error of an autoencoder, which is a type of neural network. The calculation of the error is with respect to the loss (L) between x (reference) and r(x) (reconstruction output), which corresponds to comparing reference signal with output signal. Please see reconstruction error $\sum_{x \in X} L(x, r(x))$. pg. 5 Section 4: “All experiments were run in Python using the Pylearn2 framework on a single Intel Xeon machine with an NVIDIA GTX 660 GPU” teaches a learning computer system that includes a GPU (hardware processor));
sends and processes the error signals back through the layers of processing units  (pg. 5 Section 4.3: “To better evaluate the impact of hidden unit input and activation noise on NAE classification performance, we trained larger models...These models were used as initialization for a MLP that was trained with standard backpropagation” teaches backpropagation, which corresponds to sending the error signals back through layers of processing units);  
generates random, chaotic, fuzzy, or other numerical perturbations of the portion of the received data, the processed data, or the output signals (pg. 2 Section 2.2: “We parameterize the noise in the NAE as a tuple $(\epsilon_I, \epsilon_H, \epsilon_Z)$ that characterizes the distribution of the noises corrupting the input, hidden unit inputs, and hidden activations respectively (see Figure 1)” teaches generating noise (numerical perturbations) of the input data (received data) and data processed through hidden units (processed data));
estimates the parameters and states of the stochastic or uncertain system using the portion of the received data, the numerical perturbations, and previous parameters and states of the stochastic or uncertain system (Fig. 1 and pg. 2 Section 2.2: “We call these models noisy autoencoders (NAEs) as their hidden representations are stochastic, and no longer a deterministic function of the input...When using the NAE to extract features or perform denoising on testing data we can compute the expectation of the noisy hidden activation or reconstruction by sampling from the NAE...we can approximate the expectation by scaling each of the corrupted variables by their expectation as in dropout” teach approximating the expectation of the noisy hidden activations (estimating parameters and states) of the noisy autoencoders (stochastic system); pg. 5 Section 4.3 teaches backpropagation);
determines whether the generated numerical perturbations satisfy a condition; and if the numerical perturbations satisfy the condition, injects the numerical perturbations into the estimated parameters or states, the portion of the received data, the processed data, the masked or filtered data, or the processing units (pg. 4 Section 3.1: “We cannot directly relate this penalty to a form of noise, but we can recover a penalty that encourages sparsity on hidden unit activations. If we inject additive Gaussian noise on the activations of the hidden units with variance equal to the uncorrupted hidden unit activation then the marginalized noise penalty becomes...” teaches injecting additive Gaussian noise (numerical perturbation) to the activations of the hidden units (processing units) to encourage sparsity after determining the variance of the Gaussian noise on the activations of the hidden units equal to the uncorrupted hidden unit activation (satisfies a criterion)).
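For illustration only, processing only a portion of the received data by drawing fixed-size patches from a larger image can be sketched in Python as follows. This sketch is not taken from Poole et al. The function name `sample_patches`, the uniform sampling of patch locations, and the 32x32 toy image are assumptions made for the example; only the 12x12 patch size comes from the quoted passage.

```python
import random


def sample_patches(image, patch_size, count, rng):
    """Draw `count` square patches at uniformly random positions in `image`."""
    height, width = len(image), len(image[0])
    patches = []
    for _ in range(count):
        top = rng.randrange(height - patch_size + 1)
        left = rng.randrange(width - patch_size + 1)
        # Slice rows first, then columns, to copy out the patch.
        patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
        patches.append(patch)
    return patches


# Hypothetical 32x32 image whose pixel value encodes its position.
rng = random.Random(2)
image = [[r * 32 + c for c in range(32)] for r in range(32)]
patches = sample_patches(image, patch_size=12, count=5, rng=rng)
```

Each patch covers 144 of the 1024 pixels in the toy image, so the layers that consume the patches see only a portion of the received data.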
Poole et al. teaches noise injection in the internal representations of a deep network (pg. 7-8 Section 5), but Poole et al. does not appear to explicitly teach processes the masked or filtered data to produce one or more intermediate and output signals.
However, Sukhbaatar et al. teaches processes the masked or filtered data to produce one or more intermediate and output signals (pg. 5-6 Section 4: “In this section, we empirically examine the robustness of deep networks with and without noise modeling. We experiment on several different image classification datasets with label noise. As the base model, we use convolutional deep networks because they produce state-of-art performance on many image classification tasks” and pg. 6 Section 4.1: “We use a publicly available fast GPU code for training deep networks. As the base model, we use their “18% model” with three convolutional layers (layers-18pct.cfg) for both SVHN and CIFAR-10 experiments” teach convolutional neural network with noise modeling that contains three convolutional layers, which corresponds to applying filters to the processed data using convolutional processing since a convolutional neural network contains convolutional filters; each of the three convolutional layers produces intermediate and output signals to the next layer).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate processes the masked or filtered data to produce one or more intermediate and output signals as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).
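For illustration only, the passage of Poole et al. quoted above for claim 8 (pg. 4 Section 3.1), additive Gaussian noise on hidden unit activations with variance equal to the uncorrupted activation, may be sketched as follows. This is a minimal sketch and not code from the reference; the function name `inject_activation_noise`, the sample values, and the assumption that activations are non-negative (e.g., sigmoid outputs) are hypothetical.

```python
import random


def inject_activation_noise(activations, rng):
    """Return activations with zero-mean Gaussian noise added.

    Assumption (not from the reference's code): each activation h is
    non-negative, so it can serve directly as the noise variance.
    """
    noisy = []
    for h in activations:
        if h < 0:
            raise ValueError("activation must be non-negative to act as a variance")
        sigma = h ** 0.5  # standard deviation = sqrt(variance) = sqrt(h)
        noisy.append(h + rng.gauss(0.0, sigma))
    return noisy


# Hypothetical uncorrupted hidden unit activations.
rng = random.Random(0)
clean = [0.0, 0.25, 1.0, 4.0]
noisy = inject_activation_noise(clean, rng)
```

Under this sketch a unit with zero activation receives no noise, while more active units receive proportionally more noise.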
Regarding Claim 9,
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 8.
Sukhbaatar et al. further teaches wherein the system applies masks or filters to the processed data using convolutional processing (pg. 5-6 Section 4: “In this section, we empirically examine the robustness of deep networks with and without noise modeling. We experiment on several different image classification datasets with label noise. As the base model, we use convolutional deep networks because they produce state-of-art performance on many image classification tasks” and pg. 6 Section 4.1: “We use a publicly available fast GPU code for training deep networks. As the base model, we use their “18% model” with three convolutional layers (layers-18pct.cfg) for both SVHN and CIFAR-10 experiments” teach a convolutional neural network with noise modeling that contains three convolutional layers, which corresponds to applying filters to the processed data using convolutional processing since a convolutional neural network contains convolutional filters; each of the three convolutional layers produces intermediate and output signals to the next layer).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the system applies masks or filters to the processed data using convolutional processing as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).
Regarding Claim 11,
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 8.
Poole et al. further teaches wherein the injection improves the accuracy of the learning computer system (pg. 7 Section 4.5: “We have shown that different types of noise can be used to regularize hidden representations and improve classification performance on MNIST and CIFAR-10”).
Regarding Claim 12,
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 8.
Poole et al. further teaches wherein the system injects the random, chaotic, fuzzy, or other numerical perturbations into the portion of the received data (pg. 4 Section 3.1: “We cannot directly relate this penalty to a form of noise, but we can recover a penalty that encourages sparsity on hidden unit activations. If we inject additive Gaussian noise on the activations of the hidden units with variance equal to the uncorrupted hidden unit activation then the marginalized noise penalty becomes...” teaches injecting additive Gaussian noise (numerical perturbation) to the activations of the hidden units (processing units)).
Regarding Claim 13,
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 8.
Sukhbaatar et al. further teaches wherein the learning computer system unconditionally injects noise or chaotic or other perturbations into the estimated parameters or states, the received data, the processed data, the masked or filtered data, or the processing units (pg. 6 Section 4.1: “We use a publicly available fast GPU code1 for training deep networks. As the base model, we use their “18% model” with three convolutional layers (layers-18pct.cfg) for both SVHN and CIFAR-10 experiments” teaches learning computer system; pg. 6 Fig. 4.1: “We synthesize noisy data from clean data by deliberately changing some of the labels. Original label i is randomly changed to j with fixed probability” teaches randomly injecting noise (unconditionally injecting noise) into the received data).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the learning computer system unconditionally injects noise or chaotic or other perturbations into the estimated parameters or states, the received data, the processed data, the masked or filtered data, or the processing units as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).
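For illustration only, the label corruption described in the passage of Sukhbaatar et al. quoted above for claim 13 (“Original label i is randomly changed to j with fixed probability”) may be sketched as follows. This is a minimal sketch and not code from the reference; the function name `corrupt_labels`, the parameter `flip_prob`, and the choice of a uniformly random replacement class are assumptions made for the example.

```python
import random


def corrupt_labels(labels, num_classes, flip_prob, rng):
    """Return a copy of labels in which each label is flipped with flip_prob.

    Assumption (hypothetical): a flipped label i is replaced by a class j
    drawn uniformly from the other classes. The corruption is applied to
    every label without testing any condition on the noise itself.
    """
    noisy = []
    for y in labels:
        if rng.random() < flip_prob:
            others = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(others))
        else:
            noisy.append(y)
    return noisy


# Hypothetical clean labels for a 10-class problem.
rng = random.Random(0)
clean_labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
noisy_labels = corrupt_labels(clean_labels, num_classes=10, flip_prob=0.3, rng=rng)
```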
Regarding Claim 15,
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 13.
Sukhbaatar et al. further teaches wherein the unconditional injection improves the accuracy of the learning computer system (pg. 8 Section 4.3: “the absolute performance achieved with our techniques and the 1.4M additional noisy data equals the performance when training on an additional 15M clean images from ImageNet 2011 (row 3). This demonstrates that noisy data can be very beneficial for training”).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the unconditional injection improves the accuracy of the learning computer system as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).
Regarding Claim 16,
Claim 16 is directed to the storage medium of the steps recited in Claim 1. Therefore, Claim 16 is rejected under the same rationale as claim 1.
Poole et al. further teaches a non-transitory, tangible, computer-readable storage medium containing a program of instructions that causes a computer learning system comprising a data processing system that includes a hardware processor running the program of instructions to estimate parameters and states of a stochastic or uncertain system by (pg. 5 Section 4: “All experiments were run in Python using the Pylearn21 framework on a single Intel Xeon machine with an NVIDIA GTX 660 GPU” teaches a learning computer system that includes a GPU (hardware processor) and Python instructions; pg. 2 Section 2.2: “We call these models noisy autoencoders (NAEs) as their hidden representations are stochastic, and no longer a deterministic function of the input...When using the NAE to extract features or perform denoising on testing data we can compute the expectation of the noisy hidden activation or reconstruction by sampling from the NAE...we can approximate the expectation by scaling each of the corrupted variables by their expectation as in dropout” teaches approximating the expectation of the noisy hidden activations (estimating parameters and states) of the noisy autoencoders (stochastic system)).
Regarding Claim 17,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 16. Claim 17 is directed to the storage medium of the steps recited in Claim 2. Therefore, Claim 17 is rejected under the same rationale as claim 2.
Regarding Claim 19,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 17. Claim 19 is directed to the storage medium of the steps recited in Claim 4. Therefore, Claim 19 is rejected under the same rationale as claim 4.
Regarding Claim 20,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 16. Claim 20 is directed to the storage medium of the steps recited in Claim 5. Therefore, Claim 20 is rejected under the same rationale as claim 5.
Regarding Claim 22,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 16. Claim 22 is directed to the storage medium of the steps recited in Claim 7. Therefore, Claim 22 is rejected under the same rationale as claim 7.
Regarding Claim 23,
Poole et al. teaches a non-transitory, tangible, computer-readable storage medium containing a program of instructions that causes a computer learning system comprising a data processing system (pg. 5 Section 4: “All experiments were run in Python using the Pylearn2 framework on a single Intel Xeon machine with an NVIDIA GTX 660 GPU” teaches a learning computer system that includes a GPU (hardware processor) and Python instructions; pg. 2 Section 2.2: “We call these models noisy autoencoders (NAEs) as their hidden representations are stochastic, and no longer a deterministic function of the input...When using the NAE to extract features or perform denoising on testing data we can compute the expectation of the noisy hidden activation or reconstruction by sampling from the NAE...we can approximate the expectation by scaling each of the corrupted variables by their expectation as in dropout” teaches approximating the expectation of the noisy hidden activations (estimating parameters and states) of the noisy autoencoders (stochastic system)):
receiving data from a user or other source (pg. 5 Section 4.1: “In our first experiment, we evaluated the effect of dropout noise on the generalization performance of a noisy autoencoder. We trained two NAEs on 12x12 patches drawn from the van Hateren natural image dataset” teaches receiving input data from an image dataset (other source)); 
processing only a portion of the received data through layers of processing units, thereby generating processed data (Fig. 1 teaches processing received input data through a noisy autoencoder model containing several processing units, thus generating processed data; pg. 5 Section 4.1: “In our first experiment, we evaluated the effect of dropout noise on the generalization performance of a noisy autoencoder. We trained two NAEs on 12x12 patches drawn from the van Hateren natural image dataset” teaches receiving only a portion of input data from an image dataset (other source) in which the portion contains the “12x12 patches drawn from the van Hateren natural image dataset”); 
...comparing the output signals with reference signals to generate error signals (pg. 2 Section 2.1: “An autoencoder is a type of one layer neural network that is trained to reconstruct its inputs...The composition of the encoder and decoder yield the reconstruction function: r(x) = g(f(x)). The typical training criterion for autoencoders is minimizing the reconstruction error, $\sum_{x \in X} L(x, r(x))$ with respect to some loss L, typically either squared error or the binary cross-entropy” teaches the reconstruction error of an autoencoder, which is a type of neural network. The calculation of the error is with respect to the loss (L) between x (reference) and r(x) (reconstruction output), which corresponds to comparing reference signal with output signal. Please see reconstruction error $\sum_{x \in X} L(x, r(x))$. pg. 5 Section 4: “All experiments were run in Python using the Pylearn2 framework on a single Intel Xeon machine with an NVIDIA GTX 660 GPU” teaches a learning computer system that includes a GPU (hardware processor));
 sending and processing the error signals back through the layers of processing units (pg. 5 Section 4.3: “To better evaluate the impact of hidden unit input and activation noise on NAE classification performance, we trained larger models...These models were used as initialization for a MLP that was trained with standard backpropagation” teaches backpropagation, which corresponds to sending the error signals back through layers of processing units);  
generating random, chaotic, fuzzy, or other numerical perturbations of the portion of the received data, the processed data, or the output signals (pg. 2 Section 2.2: “We parameterize the noise in the NAE as a tuple ($\epsilon_I, \epsilon_H, \epsilon_Z$) that characterizes the distribution of the noises corrupting the input, hidden unit inputs, and hidden activations respectively (see Figure 1)” teaches generating noise (numerical perturbations) of the input data (received data) and data processed through hidden units (processed data)); 
estimating the parameters and states of the stochastic or uncertain system using the portion of the received data, the numerical perturbations, and previous parameters and states of the stochastic or uncertain system (Fig. 1 and pg. 2 Section 2.2: “We call these models noisy autoencoders (NAEs) as their hidden representations are stochastic, and no longer a deterministic function of the input...When using the NAE to extract features or perform denoising on testing data we can compute the expectation of the noisy hidden activation or reconstruction by sampling from the NAE...we can approximate the expectation by scaling each of the corrupted variables by their expectation as in dropout” teach approximating the expectation of the noisy hidden activations (estimating parameters and states) of the noisy autoencoders (stochastic system) using input data (received data), noises (numerical perturbations) corrupting the inputs, hidden inputs, and hidden activations, and previous parameters and states (pg. 5 Section 4.3 teaches backpropagation));
and determining whether the generated numerical perturbations satisfy a condition; if the numerical perturbations satisfy the condition, injecting the numerical perturbations into the estimated parameters or states, the portion of the received data, the processed data, the masked or filtered data, or the processing units (pg. 4 Section 3.1: “We cannot directly relate this penalty to a form of noise, but we can recover a penalty that encourages sparsity on hidden unit activations. If we inject additive Gaussian noise on the activations of the hidden units with variance equal to the uncorrupted hidden unit activation then the marginalized noise penalty becomes...” teaches injecting additive Gaussian noise (numerical perturbation) to the activations of the hidden units (processing units) to encourage sparsity after determining the variance of the Gaussian noise on the activations of the hidden units equal to the uncorrupted hidden unit activation (satisfies a criterion)).
Poole et al. teaches noise injection in the internal representations of a deep network (pg. 7-8 Section 5), but Poole et al. does not appear to explicitly teach processing the masked or filtered data to produce one or more intermediate and output signals.
However, Sukhbaatar et al. teaches processing the masked or filtered data to produce one or more intermediate and output signals (pg. 5-6 Section 4: “In this section, we empirically examine the robustness of deep networks with and without noise modeling. We experiment on several different image classification datasets with label noise. As the base model, we use convolutional deep networks because they produce state-of-art performance on many image classification tasks” and pg. 6 Section 4.1: “We use a publicly available fast GPU code1 for training deep networks. As the base model, we use their “18% model” with three convolutional layers (layers-18pct.cfg) for both SVHN and CIFAR-10 experiments” teach convolutional neural network with noise modeling that contains three convolutional layers, which corresponds to applying filters to the processed data using convolutional processing since a convolutional neural network contains convolutional filters; each of the three convolutional layers produces intermediate and output signals to the next layer).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate processing the masked or filtered data to produce one or more intermediate and output signals as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).
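For illustration only, the noisy autoencoder passages of Poole et al. quoted above for claim 23 (noise corrupting the input, hidden unit inputs, and hidden activations, with training on the reconstruction error summed over x in X of L(x, r(x))) may be sketched as follows. This is a minimal sketch and not code from the reference; the function names, the tied-weight decoder, the sigmoid encoder, the squared-error loss, and the use of additive Gaussian noise at all three sites are assumptions made for the example.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def noisy_reconstruct(x, W, b, c, sigmas, rng):
    """One noisy forward pass r(x) = g(f(x)).

    sigmas = (s_I, s_H, s_Z) are hypothetical standard deviations of the
    additive Gaussian noise on the input, the hidden unit inputs, and the
    hidden activations, respectively.
    """
    s_i, s_h, s_z = sigmas
    # Corrupt the input.
    x_t = [xi + rng.gauss(0.0, s_i) for xi in x]
    # Hidden unit inputs (pre-activations), corrupted before the nonlinearity.
    pre = [
        sum(w * xi for w, xi in zip(row, x_t)) + bj + rng.gauss(0.0, s_h)
        for row, bj in zip(W, b)
    ]
    # Hidden activations, corrupted after the nonlinearity.
    h = [sigmoid(a) + rng.gauss(0.0, s_z) for a in pre]
    # Linear decoder with tied weights (an assumption): r_k = sum_j W[j][k] * h[j] + c[k].
    return [sum(W[j][k] * h[j] for j in range(len(h))) + c[k] for k in range(len(x))]


def reconstruction_error(X, W, b, c, sigmas, rng):
    """Sum over x in X of the squared-error loss L(x, r(x))."""
    total = 0.0
    for x in X:
        r = noisy_reconstruct(x, W, b, c, sigmas, rng)
        total += sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    return total


# Hypothetical toy data: two 2-dimensional inputs and one hidden unit.
rng = random.Random(0)
X = [[0.2, 0.8], [0.6, 0.4]]
W = [[0.5, -0.5]]
b = [0.0]
c = [0.1, 0.1]
err = reconstruction_error(X, W, b, c, sigmas=(0.1, 0.1, 0.1), rng=rng)
```

Setting all three standard deviations to zero reduces the sketch to an ordinary deterministic autoencoder, which makes the comparison of x (reference) with r(x) (output) easy to inspect.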
Regarding Claim 24,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 23.
Sukhbaatar et al. further teaches wherein the program of instructions causes the computer learning system to apply masks or filters to the processed data using convolutional processing (pg. 5-6 Section 4: “In this section, we empirically examine the robustness of deep networks with and without noise modeling. We experiment on several different image classification datasets with label noise. As the base model, we use convolutional deep networks because they produce state-of-art performance on many image classification tasks” and pg. 6 Section 4.1: “We use a publicly available fast GPU code1 for training deep networks. As the base model, we use their “18% model” with three convolutional layers (layers-18pct.cfg) for both SVHN and CIFAR-10 experiments” teach convolutional neural network with noise modeling that contains three convolutional layers, which corresponds to applying filters to the processed data using convolutional processing since a convolutional neural network contains convolutional filters; each of the three convolutional layers produces intermediate and output signals to the next layer).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the system applies masks or filters to the processed data using convolutional processing as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).
Regarding Claim 26,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 23.
Poole et al. further teaches wherein the injection improves the accuracy of the learning computer system (pg. 7 Section 4.5: “We have shown that different types of noise can be used to regularize hidden representations and improve classification performance on MNIST and CIFAR-10”).
Regarding Claim 27,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 23.
Poole et al. further teaches wherein the program of instructions causes the computer learning system to inject the random, chaotic, fuzzy, or other numerical perturbations into the portion of the received data (pg. 4 Section 3.1: “We cannot directly relate this penalty to a form of noise, but we can recover a penalty that encourages sparsity on hidden unit activations. If we inject additive Gaussian noise on the activations of the hidden units with variance equal to the uncorrupted hidden unit activation then the marginalized noise penalty becomes...” teaches injecting additive Gaussian noise (numerical perturbation) to the activations of the hidden units (processing units)).
Regarding Claim 28,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 23.
Sukhbaatar et al. further teaches wherein the program of instructions causes the computer learning system to unconditionally inject noise or chaotic or other perturbations into the estimated parameters or states, the portion of the received data, the processed data, the masked or filtered data, or the processing units (pg. 6 Section 4.1: “We use a publicly available fast GPU code for training deep networks. As the base model, we use their “18% model” with three convolutional layers (layers-18pct.cfg) for both SVHN and CIFAR-10 experiments” teaches learning computer system; pg. 6 Fig. 4.1: “We synthesize noisy data from clean data by deliberately changing some of the labels. Original label i is randomly changed to j with fixed probability” teaches randomly injecting noise (unconditionally injecting noise) into the received data).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the learning computer system unconditionally injects noise or chaotic or other perturbations into the estimated parameters or states, the received data, the processed data, the masked or filtered data, or the processing units as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).
Regarding Claim 30,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 28.
Sukhbaatar et al. further teaches wherein the unconditional injection improves the accuracy of the learning computer system (pg. 8 Section 4.3: “the absolute performance achieved with our techniques and the 1.4M additional noisy data equals the performance when training on an additional 15M clean images from ImageNet 2011 (row 3). This demonstrates that noisy data can be very beneficial for training”).
Poole et al. and Sukhbaatar et al. are analogous art because they are directed to analysis of noise injection in neural network modeling.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the unconditional injection improves the accuracy of the learning computer system as taught by Sukhbaatar et al. to the disclosed system of Poole et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to modify deep learning models so they can be effectively trained on data with high level of label noise by adding a noise layer [corresponds to injecting noise] (Sukhbaatar et al. Fig. 1 and pg. 1 Section 1).

Claims 3, 14, 18, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Poole et al. (“Analyzing noise in autoencoders and deep networks”) in view of Sukhbaatar et al. (“Learning from Noisy Labels with Deep Neural Networks”) and further in view of Hollis et al. (“A Neural Network Learning Algorithm Tailored for VLSI Implementation”).
Regarding Claim 3, 
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 2.
Poole et al. in view of Sukhbaatar et al. does not appear to explicitly teach wherein the unconditional injection speeds up learning by the learning computer system.
However, Hollis et al. teaches wherein the unconditional injection speeds up learning by the learning computer system (pg. 786 ¶-1: “Then the weights feeding forward from a given input to each of the hidden layer neurons are simultaneously perturbed (↑wji)3” teaches the unconditional injection of perturbation in the context of the “Chain-rule” perturbation algorithm (CHRP) because the weight values are perturbed, which corresponds to injection of perturbation into estimated parameters; as no condition is identified for the perturbation to take place and all of the hidden-layer neurons' weights are perturbed (so that no special condition needs to be fulfilled), the injection of perturbation in this context is considered unconditional; pg. 786 ¶-2: “In large networks, CHRP can speed up hidden-layer weight processing over the strictly serial algorithm by a factor approaching the number of hidden layer neurons” teaches that the CHRP algorithm, which includes the unconditional injection of perturbation, speeds up hidden-layer weight processing, which corresponds to a process of learning in the context of a neural network; pg. 784 Fig. 1 shows an implementation of the neural network on an analog computer architecture, which reasonably corresponds to a learning computer system).
Poole et al., Sukhbaatar et al., and Hollis et al. are analogous art because they are directed to analysis of adding noise to modeling techniques.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the unconditional injection speeds up learning by the learning computer system as taught by Hollis et al. to the disclosed system of Poole et al. in view of Sukhbaatar et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to provide a method that speeds up hidden-layer weight processing (Hollis et al. pg. 786 ¶-2).
Regarding Claim 14, 
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 13.
Poole et al. in view of Sukhbaatar et al. does not appear to explicitly teach wherein the unconditional injection speeds up learning by the learning computer system.
However, Hollis et al. teaches wherein the unconditional injection speeds up learning by the learning computer system (pg. 786 ¶-1: “Then the weights feeding forward from a given input to each of the hidden layer neurons are simultaneously perturbed (↑wji)3” teaches the unconditional injection of perturbation in the context of the “Chain-rule” perturbation algorithm (CHRP) because the weight values are perturbed, which corresponds to injection of perturbation into estimated parameters; as no condition is identified for the perturbation to take place and all of the hidden-layer neurons' weights are perturbed (so that no special condition needs to be fulfilled), the injection of perturbation in this context is considered unconditional; pg. 786 ¶-2: “In large networks, CHRP can speed up hidden-layer weight processing over the strictly serial algorithm by a factor approaching the number of hidden layer neurons” teaches that the CHRP algorithm, which includes the unconditional injection of perturbation, speeds up hidden-layer weight processing, which corresponds to a process of learning in the context of a neural network; pg. 784 Fig. 1 shows an implementation of the neural network on an analog computer architecture, which reasonably corresponds to a learning computer system).
Poole et al., Sukhbaatar et al., and Hollis et al. are analogous art because they are directed to analysis of adding noise to modeling techniques.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the unconditional injection speeds up learning by the learning computer system as taught by Hollis et al. to the disclosed system of Poole et al. in view of Sukhbaatar et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to provide a method that speeds up hidden-layer weight processing (Hollis et al. pg. 786 ¶-2).
Regarding Claim 18,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 17. Claim 18 is directed to the storage medium of the steps recited in Claim 3. Therefore, Claim 18 is rejected under the same rationale as claim 3.
Regarding Claim 29, 
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 28.
Poole et al. in view of Sukhbaatar et al. does not appear to explicitly teach wherein the unconditional injection speeds up learning by the learning computer system.
However, Hollis et al. teaches wherein the unconditional injection speeds up learning by the learning computer system (pg. 786 ¶-1: “Then the weights feeding forward from a given input to each of the hidden layer neurons are simultaneously perturbed (↑wji)3” teaches the unconditional injection of perturbation in the context of the “Chain-rule” perturbation algorithm (CHRP) because the weight values are perturbed, which corresponds to injection of perturbation into estimated parameters; as no condition is identified for the perturbation to take place and all of the hidden-layer neurons' weights are perturbed (so that no special condition needs to be fulfilled), the injection of perturbation in this context is considered unconditional; pg. 786 ¶-2: “In large networks, CHRP can speed up hidden-layer weight processing over the strictly serial algorithm by a factor approaching the number of hidden layer neurons” teaches that the CHRP algorithm, which includes the unconditional injection of perturbation, speeds up hidden-layer weight processing, which corresponds to a process of learning in the context of a neural network; pg. 784 Fig. 1 shows an implementation of the neural network on an analog computer architecture, which reasonably corresponds to a learning computer system).
Poole et al., Sukhbaatar et al., and Hollis et al. are analogous art because they are directed to analysis of adding noise to modeling techniques.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the unconditional injection speeds up learning by the learning computer system as taught by Hollis et al. to the disclosed storage medium of Poole et al. in view of Sukhbaatar et al.

One of ordinary skill in the arts would have been motivated to make this modification in order to provide a method that speeds up hidden-layer weight processing (Hollis et al. pg. 786 ¶-2).

Claims 6, 10, 21, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Poole et al. (“Analyzing noise in autoencoders and deep networks”) in view of Sukhbaatar et al. (“Learning from Noisy Labels with Deep Neural Networks”) and further in view of Osoba et al. (“Noise Benefits in the Expectation-Maximization Algorithm: NEM Theorems and Models”; Item 10 under “Non-Patent Literature Documents” of IDS filed on 04/12/2016).
Regarding Claim 6, 
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 1.
Poole et al. in view of Sukhbaatar et al. does not appear to explicitly teach wherein the injection speeds up learning by the learning computer system.
However, Osoba et al. teaches wherein the injection speeds up learning by the learning computer system (Abstract: “Additive noise speeds the average convergence of the EM algorithm to a local maximum of the likelihood surface when the noise condition holds” teaches the addition of additive noise (corresponds to injection) speeds up the convergence of the EM algorithm (corresponds to learning)).
Poole et al., Sukhbaatar et al., and Osoba et al. are analogous art because they are directed to analysis of adding noise to modeling techniques.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the injection speeds up learning by the learning computer system as taught by Osoba et al. to the disclosed system of Poole et al. in view of Sukhbaatar et al.

One of ordinary skill in the arts would have been motivated to make this modification in order to speed up the average convergence of a modeling algorithm with the addition of noise (Osoba et al. Abstract).
Regarding Claim 10, 
Poole et al. in view of Sukhbaatar et al. teaches the learning computer system of claim 8.
Poole et al. in view of Sukhbaatar et al. does not appear to explicitly teach wherein the injection speeds up learning by the learning computer system.
However, Osoba et al. teaches wherein the injection speeds up learning by the learning computer system (Abstract: “Additive noise speeds the average convergence of the EM algorithm to a local maximum of the likelihood surface when the noise condition holds” teaches the addition of additive noise (corresponds to injection) speeds up the convergence of the EM algorithm (corresponds to learning)).
Poole et al., Sukhbaatar et al., and Osoba et al. are analogous art because they are directed to analysis of adding noise to modeling techniques.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the injection speeds up learning by the learning computer system as taught by Osoba et al. to the disclosed system of Poole et al. in view of Sukhbaatar et al. 
One of ordinary skill in the arts would have been motivated to make this modification in order to speed up the average convergence of a modeling algorithm with the addition of noise (Osoba et al. Abstract).

Regarding Claim 21,
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 16. Claim 21 is directed to the storage medium of the steps recited in claim 6. Therefore, claim 21 is rejected under the same rationale as claim 6.
Regarding Claim 25, 
Poole et al. in view of Sukhbaatar et al. teaches the storage medium of claim 23.
Poole et al. in view of Sukhbaatar et al. does not appear to explicitly teach wherein the injection speeds up learning by the learning computer system.
However, Osoba et al. teaches wherein the injection speeds up learning by the learning computer system (Abstract: “Additive noise speeds the average convergence of the EM algorithm to a local maximum of the likelihood surface when the noise condition holds” teaches the addition of additive noise (corresponds to injection) speeds up the convergence of the EM algorithm (corresponds to learning)).
Poole et al., Sukhbaatar et al., and Osoba et al. are analogous art because they are directed to analysis of adding noise to modeling techniques.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the injection speeds up learning by the learning computer system as taught by Osoba et al. to the disclosed system of Poole et al. in view of Sukhbaatar et al. 
One of ordinary skill in the arts would have been motivated to make this modification in order to speed up the average convergence of a modeling algorithm with the addition of noise (Osoba et al. Abstract).



Response to Arguments
Applicant's arguments filed on 09/28/2020 with respect to the 35 U.S.C. 103 rejection of claims 1, 2, 4, 5, 7-9, 11-13, 15-17, 19, 20, 22-24, 26-28, and 30 have been fully considered but they are not persuasive.
Applicant asserts that “[t]he Office Action incorrectly identifies a different criterion as teaching the condition of the present claims. The Office Action equates the alleged condition of Poole to the condition recited in the claims and described in the specification. Simplistically, the Office Action argues condition A is equivalent to condition B. But that's just not true (i.e., condition A is equivalent to condition A)” (Remarks, pg. 11).
Examiner’s Response:
The Examiner respectfully disagrees. The limitations of claim 1 in question are the following: “determines whether the generated numerical perturbations satisfy a condition; and if the numerical perturbations satisfy the condition, injects the numerical perturbations...” According to MPEP 2111, “The broadest reasonable interpretation does not mean the broadest possible interpretation. Rather, the meaning given to a claim term must be consistent with the ordinary and customary meaning of the term (unless the term has been given a special definition in the specification), and must be consistent with the use of the claim term in the specification and drawings. Further, the broadest reasonable interpretation of the claims must be consistent with the interpretation that those skilled in the art would reach.” 
Specification [0039] recites the following, “The second theoretical result is that carefully chosen and injected noise may speed up the EM algorithm on average as the algorithm iteratively climbs the nearest hill of likelihood. This result is stated below as Theorem 1. Below also shows that this guaranteed EM noise-boost may give rise to a simple noise-space hyperplane condition for training CNNs with backpropagation: Noise chosen from above the 

Applicant asserts that “[t]he assertion in the Office Action that the algorithmic structure of Poole using a different condition and combined with the mask from the different algorithmic structure of Sukhbaatar render the current invention obvious is like stating that a handle renders all other handles obvious, despite the context or features of the actual handle. Clearly, that is not true. Any algorithmic structure cannot be injected with noise that satisfies any condition to improve the speed and accuracy of a learning computer” (Remarks, pg. 11); and “A condition not carefully chosen for the algorithmic structure of the current invention will not produce the noise injection benefits witnessed in the specification. A condition not carefully chosen is effectively no different than blind noise” (Remarks, pg. 12). 
Examiner’s Response:
The Examiner respectfully disagrees. Regarding claim interpretation, MPEP 2173.01 provides the following:
“The presumption that a term is given its ordinary and customary meaning may be rebutted by the applicant by clearly setting forth a different definition of the term in the specification. In re Morris, 127 F.3d 1048, 1054, 44 USPQ2d 1023, 1028 (Fed. Cir. 1997) (the USPTO looks to the ordinary use of the claim terms taking into account definitions or other "enlightenment" 

As discussed above, the Specification does not set forth a special definition for the term “condition” recited in claim 1. As set forth in MPEP 2173.01, the Examiner is cautioned against reading limitations into a claim from the description in the Specification. Applicant’s argument that the condition “carefully chosen,” as described in the Specification, would be the only condition that would read on the claim language requires the importation of a feature from the Specification, which is specifically cautioned against in MPEP 2173.01. Applicant’s analogy of “stating that a handle renders all other handles obvious, despite the context or features of the actual handle” is not applicable in view of present claim 1. Claim 1 requires a “condition” and determining whether the “condition” is satisfied. Claim 1 does not recite specific “context or features” of the “condition.” The Office Action does not suggest that the condition in the Poole et al. reference renders “all other” conditions obvious; it merely points out that the reference teaches a “condition,” which reads on the claim recitation of “condition” in claim 1.
Further, as put forth in the prior art rejection to claim 1, Poole et al. is relied upon in teaching the features associated with injecting perturbation in the following limitations: “determines whether the generated numerical perturbations satisfy a condition; and if the numerical perturbations satisfy the condition, injects the numerical perturbations...”. The Sukhbaatar et al. reference teaches “applies masks or filters to the processed data using convolutional processing; processes the masked or filtered data to produce one or more intermediate and output signals.”
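For illustration only, the claim language quoted above ("determines whether the generated numerical perturbations satisfy a condition; and if the numerical perturbations satisfy the condition, injects the numerical perturbations...") can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names and the `bounded` predicate are hypothetical placeholders. They are not the condition described in the Specification and not the condition taught by Poole et al.; the sketch only shows the generic determine-then-inject structure with the condition left as an interchangeable parameter.

```python
import random


def inject_if_condition(values, perturbations, condition):
    """Add perturbations to values only when condition(perturbations) holds.

    `condition` is any caller-supplied predicate; the structure does not
    depend on which particular condition is used.
    """
    if condition(perturbations):
        return [v + n for v, n in zip(values, perturbations)]
    # Condition not satisfied: return the data unchanged (no injection).
    return list(values)


def bounded(perturbations, limit=0.1):
    """Hypothetical placeholder condition: every perturbation is small."""
    return all(abs(n) <= limit for n in perturbations)


# Generate perturbations, test them against the condition, then inject.
rng = random.Random(0)
values = [0.2, 0.4, 0.6]
noise = [rng.gauss(0.0, 0.05) for _ in values]
result = inject_if_condition(values, noise, bounded)
```

Because the predicate is passed in as an argument, swapping `bounded` for a different predicate changes which perturbations are injected without changing the surrounding steps.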

Applicant asserts that “the Office Action asserts "minimizing error" includes comparing an output signal to a reference signal. However, this is an improper inherency argument. MPEP 2112. Minimizing error does not necessarily include comparing an output signal to a reference signal. Error can be minimized in numerous ways and not all of them include comparing an output signal with a reference signal. The Office Action's over generalization may be based on the use of the term "with respect to some loss L." However, this again does not disclose, teach or suggest a learning computer system with a hardware processor configured to compare an output signal with an error signal.” (Remarks, pg. 13).
Examiner’s Response:
The Examiner respectfully disagrees. Poole et al. in pg. 2 Section 2.1: “An autoencoder is a type of one layer neural network that is trained to reconstruct its inputs...The composition of the encoder and decoder yield the reconstruction function: r(x) = g(f (x)). The typical training criterion for autoencoders is minimizing the reconstruction error, $\sum_{x \in X} L(x, r(x))$ with respect to some loss L, typically either squared error or the binary cross-entropy” teaches the reconstruction error of an autoencoder, which is a type of neural network. The calculation of the error is with respect to the loss (L) between x (reference) and r(x) (reconstruction output), which corresponds to comparing reference signal with output signal. Please see reconstruction error $\sum_{x \in X} L(x, r(x))$. Poole et al. in pg. 5 Section 4: “All experiments were run in Python using the Pylearn2 framework on a single Intel Xeon machine with an NVIDIA GTX 660 GPU” teaches a learning computer system that includes a GPU (hardware processor).
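As an illustrative sketch only, the reconstruction error quoted from Poole et al., i.e., the sum over x in X of L(x, r(x)) with r(x) = g(f(x)), can be written in Python as shown below. The toy encoder and decoder and all function names are hypothetical and are not taken from Poole et al. The sketch is intended only to show that evaluating the error requires, for each input, passing the reference x and the output r(x) to the loss L, whether L is squared error or binary cross-entropy.

```python
import math


def squared_error(x, r):
    """Squared-error loss between input x (reference) and reconstruction r."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r))


def binary_cross_entropy(x, r, eps=1e-12):
    """Binary cross-entropy between targets x in [0, 1] and outputs r."""
    total = 0.0
    for xi, ri in zip(x, r):
        ri = min(max(ri, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(ri) + (1.0 - xi) * math.log(1.0 - ri)
    return total


def reconstruction_error(dataset, encode, decode, loss):
    """Sum of loss(x, r(x)) over the dataset, with r(x) = decode(encode(x))."""
    return sum(loss(x, decode(encode(x))) for x in dataset)


# Hypothetical toy encoder/decoder used only to exercise the formula.
def toy_encode(x):
    return [x[0]]          # keep the first coordinate as the hidden code


def toy_decode(h):
    return [h[0], 0.5]     # reconstruct; second coordinate fixed at 0.5


data = [[1.0, 0.0], [0.0, 1.0]]
total_sq = reconstruction_error(data, toy_encode, toy_decode, squared_error)
total_bce = reconstruction_error(data, toy_encode, toy_decode, binary_cross_entropy)
```

In both calls the same `reconstruction_error` routine is used and only the loss argument differs, which mirrors the quoted statement that the loss is "typically either squared error or the binary cross-entropy."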

Applicant asserts that “The Office Action proposes combining the autoencoders of Poole with the convolutional deep network of Sukhbaatar. (Poole, page 2; Sukhbaatar, page 5-6). A person of ordinary skill in the art would understand that neural networks are not plug and play devices with interchangeable components but complex systems of integrated models and algorithms. Accordingly, the autoencoders of Poole could not predictably be modified to incorporate the convolutional processing (i.e., mask and filters) of Sukhbaatar” (Remarks, pg. 13-14).
Examiner’s Response:
The Examiner respectfully disagrees. According to MPEP 2144, 
“The strongest rationale for combining references is a recognition, expressly or impliedly in the prior art or drawn from a convincing line of reasoning based on established scientific principles or legal precedent, that some advantage or expected beneficial result would have been produced by their combination. In re Sernaker, 702 F.2d 989, 994-95, 217 USPQ 1, 5-6 (Fed. Cir. 1983). See also Dystar Textilfarben GmbH & Co. Deutschland KG v. C.H. Patrick, 464 F.3d 1356, 1368, 80 USPQ2d 1641, 1651 (Fed. Cir. 2006) ("Indeed, we have repeatedly held that an implicit motivation to combine exists not only when a suggestion may be gleaned from the prior art as a whole, but when the ‘improvement’ is technology-independent and the combination of references results in a product or process that is more desirable, for example because it is stronger, cheaper, cleaner, faster, lighter, smaller, more durable, or more efficient. Because the desire to enhance commercial opportunities by improving a product or process is universal—and even common-sensical—we have held that there exists in these situations a motivation to combine prior art references even absent any hint of suggestion in the references themselves.").” (emphasis added).

In Poole et al., the Noisy autoencoder (NAE) model employs a multi-layer neural network model (see Fig. 1). Poole et al. further discusses using the neural network to produce classifications in pg. 5 Section 4.2: “We also evaluated the classification error for these different models by using them to initialize a multilayer perceptron with a softmax classifier on top of the learned hidden representation.” Moreover, Sukhbaatar et al. in pg. 2 Section 3: “In this paper, we consider two approaches to make an existing classification model, which we call the base model, robust against noisy labels...we add an additional layer to the model” teaches making an existing classification model more robust against noisy labels by adding an additional layer. Therefore, it would be reasonable to view the teachings of Sukhbaatar et al. as improving the base neural network model of Poole et al. such that the model would be robust against noisy labels. Moreover, Poole et al. teaches noise injection in the internal representations of a deep network (pg. 7-8 Section 5); therefore, it would be reasonable to incorporate the convolutional processing of Sukhbaatar et al. See MPEP 2144.

Applicant asserts that “For example, Sukhbaatar suggest using its teachings with models using cross entropy cost... But the Office Action, contrary to teaching of Sukhbaatar, proposes combining the models of Sukhbaatar with autoencoders using a squared error loss-not a cross entropy cost... Therefore, the Office Action fails to provide an adequate reason to combine and explain why, contrary to the teachings of Sukhbaatar, a person of skill would combine the references and thus, the claims are allowable over Poole in view of Sukhbaatar” (Remarks, pg. 14-15).
Examiner’s Response:
The Examiner respectfully disagrees. Poole et al. in pg. 2 Section 2.1: “An autoencoder is a type of one layer neural network that is trained to reconstruct its inputs...The composition of the encoder and decoder yield the reconstruction function: r(x) = g(f (x)). The typical training criterion for autoencoders is minimizing the reconstruction error, $\sum_{x \in X} L(x, r(x))$ with respect to some loss L, typically either squared error or the binary cross-entropy” specifically identifies that the loss could be either a squared-error loss or a cross-entropy loss. The fact that in one experiment Poole et al. specifically uses squared error does not conflict with Sukhbaatar et al.’s use of a cross-entropy cost, since Poole et al. identifies that either type of loss measure may be used.




Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484.  The examiner can normally be reached on Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Y.C./Examiner, Art Unit 2125

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125