Detailed Action

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


	Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

	Regarding claim 1, 
	2A Prong 1: The limitation of dumping parameters used to train the trained classifier to obtain dumped parameters is a mental process, as it merely recites a process of updating the parameters of a classifier while training it. The limitation of recording change rates of each of the dumped parameters is a mathematical concept, because it merely recites calculating the difference between the weight values before training and the weight values after training.
The limitation of creating, without training, a new classifier from at least one other classifier in the ensemble by calculating the dumped parameters plus change rates to obtain sums multiplied by random numbers for each local prediction by the trained classifier is a mathematical concept, as it merely recites a process of making more classifiers by multiplying the parameters of classifiers by random numbers.
	2A Prong 2: The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of performing the method by a processor, which is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component.
	2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitations of training a classifier from among the ensemble to obtain a trained classifier, and of the given machine-learning-based classifier, merely indicate the particular technological field or environment in which the abstract idea is performed (MPEP 2106.05(h)).
Regarding claim 11, the limitation of the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method merely recites executing a generic computer program on a generic computer component. Claim 11 is a computer program product claim having limitations similar to those of method claim 1 above. Therefore, it is rejected under the same rationale as claim 1 above.
Regarding claim 20, the limitation of the system comprising: a memory for storing program code; and a hardware processor for running the program code merely recites generic computer components. Claim 20 is a computer processing system claim having limitations similar to those of method claim 1 above. Therefore, it is rejected under the same rationale as claim 1 above.
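The arithmetic that the Prong 1 analysis of claim 1 characterizes as a mathematical concept can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the change rate is taken as the after-minus-before weight difference described above, and the function names, the flat weight lists, and the 0.9 to 1.1 range for the random multipliers are hypothetical.

```python
import random

def record_change_rates(before, after):
    """Change rate of each dumped parameter: value after training minus value before."""
    return [a - b for b, a in zip(before, after)]

def create_without_training(dumped, rates, rng, low=0.9, high=1.1):
    """Build new parameters as (dumped parameter + change rate) * random number.

    low/high bound the random multipliers; the 0.9-1.1 default is an assumed,
    illustrative range, not a value taken from the application.
    """
    return [(w + d) * rng.uniform(low, high) for w, d in zip(dumped, rates)]

# Hypothetical weights before and after one training pass.
before = [0.25, -0.5, 1.0]
after = [0.5, -0.25, 1.5]
rates = record_change_rates(before, after)  # [0.25, 0.25, 0.5]
new_params = create_without_training(after, rates, random.Random(0))
```

No gradient step or loss evaluation appears in `create_without_training`; the new parameter set is obtained by addition and multiplication alone.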

Regarding claim 2, the limitation of wherein the dumped parameters used to train the trained classifier are connection weights in the given machine-learning-based classifier merely indicates the particular technological field or environment in which the abstract idea is performed (MPEP 2106.05(h)).
The judicial exception is not integrated into a practical application. In particular, the claim does not include any additional element that amounts to significantly more than the abstract idea.
Claim 12 is a computer program product claim having limitations similar to those of method claim 2 above. Therefore, it is rejected under the same rationale as claim 2 above.

Regarding claim 3, the limitation of wherein said recording step is performed responsive to said dumping step is a mental process, as it recites performing a calculation step following another step.
Claim 13 is a computer program product claim having limitations similar to those of method claim 3 above. Therefore, it is rejected under the same rationale as claim 3 above.

Regarding claim 4, the limitation of wherein the new classifier is created for inference use in an absence of training the new classifier is a mental process, because the limitation merely recites creating a new classifier without training.
The judicial exception is not integrated into a practical application. In particular, the claim does not include any additional element that amounts to significantly more than the abstract idea.
Claim 14 is a computer program product claim having limitations similar to those of method claim 4 above. Therefore, it is rejected under the same rationale as claim 4 above.

Regarding claim 5, the limitation of wherein said creating step mutates the dumped parameters based on the recorded change rate to avoid prediction accuracy degradation by the new classifier is a mathematical concept, as it merely recites multiplying and adding change rates to create a new classifier.
The judicial exception is not integrated into a practical application. In particular, the claim does not include any additional element that amounts to significantly more than the abstract idea.
Claim 15 is a computer program product claim having limitations similar to those of method claim 5 above. Therefore, it is rejected under the same rationale as claim 5 above.

Regarding claim 6, the limitation of wherein the random numbers are taken from a limited range of random numbers is a mathematical process.
The judicial exception is not integrated into a practical application. In particular, the claim does not include any additional element that amounts to significantly more than the abstract idea.
Claim 16 is a computer program product claim having limitations similar to those of method claim 6 above. Therefore, it is rejected under the same rationale as claim 6 above.

Regarding claim 7, the limitation of wherein the change rates are from a start time to an end time of a final training epoch merely specifies the time range over which the change rates are extracted, which indicates the particular technological field or environment in which the abstract idea is performed (MPEP 2106.05(h)).
The judicial exception is not integrated into a practical application. In particular, the claim does not include any additional element that amounts to significantly more than the abstract idea.
Claim 17 is a computer program product claim having limitations similar to those of method claim 7 above. Therefore, it is rejected under the same rationale as claim 7 above.

Regarding claim 8,
2A Prong 1: The limitation of wherein the intermediate training epoch immediately precedes the final training epoch in a sequence of training epochs including the intermediate training epoch and the final training epoch is a mental process, as it merely recites that one training epoch comes before the final training epoch.
2A Prong 2: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
2B: The limitation of wherein the change rates are from an intermediate training epoch to a final training epoch merely indicates the particular technological field or environment in which the abstract idea is performed (MPEP 2106.05(h)).
Claim 18 is a computer program product claim having limitations similar to those of method claim 8 above. Therefore, it is rejected under the same rationale as claim 8 above.
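The time windows recited in claims 7 and 8 can be pictured as a choice of which two recorded weight states are subtracted. The sketch below is illustrative only. It assumes that weights are recorded once at the end of every epoch, an assumption not taken from the application, and the names `change_rates_between` and `snapshots` are hypothetical. Which pair of snapshots corresponds to each claim depends on when the recordings are actually made.

```python
def change_rates_between(snapshots, start_index, end_index):
    """Per-parameter change between two recorded weight states.

    snapshots[k] holds the weights recorded at the end of epoch k;
    snapshots[0] is the state before any training (assumed layout).
    """
    start, end = snapshots[start_index], snapshots[end_index]
    return [e - s for s, e in zip(start, end)]

# Hypothetical recordings: initial state, then the end of epochs 1, 2 and 3.
snapshots = [
    [0.0, 0.0],
    [0.5, -0.25],
    [0.75, -0.5],
    [1.0, -0.5],
]
final = len(snapshots) - 1

# Window covering only the final epoch (its start state to its end state).
rates_final_epoch = change_rates_between(snapshots, final - 1, final)
# A wider window that starts one recording earlier, shown for contrast.
rates_wider = change_rates_between(snapshots, final - 2, final)
```

In both cases the computation is a subtraction of stored values; only the selected indices change.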

Regarding claim 9, the limitation of wherein multiple random numbers are used to create the new classifier, each of the multiple random numbers corresponding to a respective different one of a plurality of machine-learning-based classifier layers in the given machine-learning-based classifier is a mathematical concept, as it merely recites a process of multiplying each of the layers by a random number.
The judicial exception is not integrated into a practical application. In particular, the claim does not include any additional element that amounts to significantly more than the abstract idea.
Claim 19 is a computer program product claim having limitations similar to those of method claim 9 above. Therefore, it is rejected under the same rationale as claim 9 above.

Regarding claim 10, the limitation of wherein multiple random numbers are used to create the new classifiers, each of the multiple random numbers corresponding to a different one of the dumped parameters is a mathematical concept, as it merely recites using random numbers to modify each of the dumped parameters.
The judicial exception is not integrated into a practical application. In particular, the claim does not include any additional element that amounts to significantly more than the abstract idea.
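Claims 9 and 10 differ only in the granularity at which random numbers are assigned: one per layer versus one per dumped parameter. The following sketch contrasts the two under stated assumptions. The nested-list representation of layers, the function names, and the 0.9 to 1.1 multiplier range are hypothetical and are not drawn from the application.

```python
import random

def scale_per_layer(layers, rng, low=0.9, high=1.1):
    """One random multiplier per layer, shared by every weight in that layer."""
    scaled = []
    for weights in layers:
        r = rng.uniform(low, high)
        scaled.append([w * r for w in weights])
    return scaled

def scale_per_parameter(layers, rng, low=0.9, high=1.1):
    """A separate random multiplier for every individual weight."""
    return [[w * rng.uniform(low, high) for w in weights] for weights in layers]

# Hypothetical two-layer parameter set.
layers = [[1.0, 2.0], [4.0, 8.0, 16.0]]
per_layer = scale_per_layer(layers, random.Random(0))
per_param = scale_per_parameter(layers, random.Random(0))
```

With the per-layer variant, every weight in a layer is scaled by the same factor; with the per-parameter variant, each weight receives its own factor.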

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	Claims 1-20 are rejected under 35 U.S.C. 103 over Andoni (US 11,106,978 B2) in view of Maruyama (US 6067536 A), and further in view of Palmes (Palmes et al, 2005, “Mutation-Based Genetic Neural Network”).
Regarding claim 1, Andoni teaches a computer-implemented method for reducing training costs for an ensemble of machine-learning-based classifiers ([Andoni, column 1, SUMMARY, line 55-59] “The present application describes automated model building systems and methods that utilize a genetic algorithm having variable topological parameters to generate and train a neural network in a manner that is applicable to multiple types of machine-learning problems”, [Andoni, column 5, line 34-43] “The input set 120 and the output set 130 may each include a plurality of models, where each model includes data representative of a neural network. For example, each model may specify a neural network by at least a neural network topology, a series of activation functions, and connection weights. The topology of a neural network may include a configuration of nodes of the neural network and connections between such nodes. The models may also be specified to include other parameters, including but not limited to bias values/functions and aggregation functions”, discloses ensemble of models), the method comprising: 
training, by a processor, a given machine-learning-based classifier from among the ensemble to obtain a trained classifier ([Andoni, column 14, line 54-59] “In the example of FIGS. 5 and 7, a single trainable model 122 is provided to the backpropagation trainer 180 and a single trained model 182 is received from the backpropagation trainer 180”, [Andoni, column 12, line 46-58; Fig 5] “The backpropagation trainer 180 may train connection weights of the trainable model 122 based on a portion of the input data set 102. When training is complete, the resulting trained model 182 may be received from the backpropagation trainer 180 and may be input into a subsequent epoch of the genetic algorithm”); 
dumping, by the processor, parameters used to train the trained classifier to obtain dumped parameters ([Andoni, column 8, line 23-29] “For example, the portion of the input data set 102 may be input into the trainable model 122, which may in turn generate output data. The input data set 102 and the output data may be used to determine an error value, and the error value may be used to modify connection weights of the model, such as by using gradient descent or another function”, dumping corresponds to the process of updating the weight); 
creating, by the processor without training, a new classifier from at least one other machine- learning-based classifier in the ensemble ([Andoni, column 18, line 16-27] “After the evolutionary weights 174 have been modified (to generate modified evolutionary weights 834), mutation operations 836 may be performed based on the modified evolutionary weights 834. For example, the mutation operations 836 may have a higher probability of increasing the number of nodes of a model by 2 and a lower probability of decreasing the number connections by 1, as compared to the mutation operations 830 based on the initial evolutionary weights 832. Performing the mutation operations 836 on the second plurality of models 810 may generate an output set of models for the second epoch. The output set includes a third plurality of models 820”, shows Andoni performing genetic operations (mutation) without training to generate the ensemble of models. [Andoni, Fig. 6] shows the diagram of Generated output set of N model, [Andoni, column 14, line 11-25] “The mutation operation 170 may thus be a random or pseudo-random biological operator or variable-probability biological operator that generates or contributes to a model of the output set 130 by mutating any aspect of a model of the input set 120 ... As another example, the mutation operation 170 may change the value(s) of one or more topological parameters to cause one or more activation functions, aggregation functions, bias values/functions, and/or or connection weights to be modified”, discloses the mutation can modify the connection weights of the network).
Andoni does not teach recording, by the processor, change rates of each of the dumped parameters, or updating weights by calculating the dumped parameters plus change rates to obtain sums multiplied by random numbers for each local prediction by the trained classifier.
Maruyama teaches recording, by the processor, change rates of each of the dumped parameters ([Maruyama, column 14, line 27-40] “According to the neural network circuit of the present invention described above, even if the weight values of the whole neural network (initially learned weight values) are stored in the ROM, difference values between the initial weight values and the additionally learned weight values are stored in the difference value memory such as a RAM having a smaller size than that of the ROM”, [Maruyama, column 7, line 48-51] “wherein each numeral in parentheses represents the number of times. Assuming that the weight value Wi(n) is obtained by addition of an initial weight value and a difference value, the following equations are obtained: Wi(n)=Wi(0)+dWi(n) (4)”, [Maruyama, column 5, line 11-16] “FIG. 7(b) is a chart limiting a difference value dWij (n) between the initial weight value Wij (o) and the weight value obtained after additional learning by using the limiter according to the second embodiment of the present invention”, shows that the dWi is the difference value between initial weight and weight after training); 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention and having the teachings of both Andoni and Maruyama, to use Maruyama's process of recording change rates of parameters in implementing the ensemble machine-learning-based classifier method of Andoni. The suggestion and/or motivation for doing so is to improve the performance of the entire model, since recording the change rates makes it possible to track how the parameters change.
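The relationship cited from Maruyama, Wi(n)=Wi(0)+dWi(n), can be sketched as a store that keeps the initial weights fixed and records only the differences. This is a minimal software analogy under stated assumptions, not a description of Maruyama's circuit: the class name `DifferenceMemory` and its methods are hypothetical, with a tuple standing in for the ROM and a list for the difference value memory.

```python
class DifferenceMemory:
    """Initial weights are fixed; only per-weight differences are rewritten."""

    def __init__(self, initial_weights):
        self._initial = tuple(initial_weights)    # read-only store, W_i(0)
        self._diff = [0.0] * len(self._initial)   # difference store, dW_i(n)

    def record(self, learned_weights):
        """Store dW_i(n) = learned weight - initial weight."""
        self._diff = [w - w0 for w0, w in zip(self._initial, learned_weights)]

    def differences(self):
        """Return the recorded difference values dW_i(n)."""
        return list(self._diff)

    def weights(self):
        """Reconstruct W_i(n) = W_i(0) + dW_i(n)."""
        return [w0 + d for w0, d in zip(self._initial, self._diff)]

# Hypothetical initial weights and weights after additional learning.
memory = DifferenceMemory([0.5, -1.0, 2.0])
memory.record([0.75, -1.0, 1.5])
```

Here `memory.differences()` plays the role of the recorded difference values, and `memory.weights()` recovers the learned weights from the fixed initial values.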
Andoni in view of Maruyama does not teach updating weights by calculating the dumped parameters plus a user-defined parameter to obtain sums multiplied by random numbers for each local prediction by the trained classifier.
Palmes teaches updating weight by calculating the dumped parameters plus user defined parameter to obtain sums multiplied by random numbers for each local prediction by the trained classifier ([Palmes, page 590 bottom of right column -591 upper left column, V. MGNN ALGORITHM] “Algorithm 1 summarizes MGNN training algorithm. Initially (at iteration 0), MGNN generates a population of ANNs with all the vectors and matrices initialized to zero $P(t) = \{net_1^t, \ldots, net_\mu^t\}$ Next, each undergoes transformation through the mutation operator p(9)–(11) operating on the two threshold vectors $(\theta_{\omega 1}, \theta_{\omega 2})$ and two connections weight matrices (W1 and W2). Then, MGNN evaluates each $net_i$ performance using the fitness function (12). A new population P(t+1) is formed by selecting individuals with better fitness from P(t). From this new population, the processes of mutation, fitness evaluation, and selection are repeated until the stopping criterion is satisfied or the maximum number of generations is reached”, Palmes teaches generation of new classifiers without training. [Palmes, page 588, right column, first paragraph and equation (1)] “Common to these approaches is the use of Gaussian perturbation in the mutation operation to bring about changes in structure or weights of ANN $\omega = \omega + N(0, \alpha\epsilon(\mu)) \;\; \forall \omega \in \mu$ (1) where $N(0, \alpha\epsilon(\mu))$ is the Gaussian perturbation with mean 0 and standard deviation $\alpha\epsilon(\mu)$, $\omega$ is a weight, and $\epsilon(\mu)$ is an error function [e.g., mean-squared error (MSE)] which is scaled by the user-defined constant”, teaches the mutation operation with Gaussian perturbation. The Gaussian perturbation is the random numbers, [Palmes, page 588, 3rd paragraph] “Parametric mutation is carried out by perturbing ANN’s weights using Gaussian noise with the severity of mutation controlled by the annealing temperature … where U and N are uniform and Gaussian random variables, respectively”, the $\alpha$ determines the rate of change of the random variable, so it is a change rate).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention and having the teachings of Andoni, Maruyama, and Palmes, to use Palmes's process of updating weights by calculating the dumped parameters plus change rates to obtain sums multiplied by random numbers for each local prediction by the trained classifier in implementing the ensemble machine-learning-based classifier method of Andoni and Maruyama. The suggestion and/or motivation for doing so is to improve the performance of the entire model, since multiplying the weights by random numbers enables new classifiers to be added without training.
None of Andoni, Maruyama, and Palmes explicitly teaches using the change rate to update the weights of a classifier, but it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to replace the user-defined change rate $\alpha$ of Palmes with the change rate of weights of Maruyama. The modification would have been obvious because using the difference between the initial weight and the weight after training to modify the next weight is a common practice in the art of updating neural networks, as shown in Maruyama. The suggestion and/or motivation to do so is to improve the accuracy of the classifier after mutation.
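Equation (1) of Palmes, as quoted above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from Palmes: the function name `gaussian_mutation` and its parameters are hypothetical, `scale` stands in for the user-defined constant, and `error` stands in for the error function value. Under the substitution proposed above, a recorded change rate would be supplied as `scale`.

```python
import random

def gaussian_mutation(weights, scale, error, rng):
    """Apply w <- w + N(0, scale * error) to every weight.

    scale * error is used as the standard deviation of the zero-mean
    Gaussian noise, following equation (1) as quoted from Palmes.
    """
    sigma = scale * error
    return [w + rng.gauss(0.0, sigma) for w in weights]

# Hypothetical weights, scale and error value.
weights = [0.5, -0.25, 1.0]
mutated = gaussian_mutation(weights, scale=0.1, error=0.5, rng=random.Random(0))
# A zero scale gives zero-width noise, so the weights come back unchanged.
unchanged = gaussian_mutation(weights, scale=0.0, error=0.5, rng=random.Random(0))
```

The mutated weights are produced by sampling and addition only; no training data is consulted in the mutation step itself.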

	Regarding claim 11, Andoni in view of Maruyama, and further in view of Palmes teaches a computer program product for reducing training costs for an ensemble of machine-learning-based classifiers, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method ([Andoni, column 34, line 31-41] “Furthermore, the system may take the form of a computer program product on a computer-readable storage medium or device having computer-readable program code (e.g., instructions) embodied or stored in the storage medium or device. Any suitable computer-readable storage medium or device may be utilized, including hard disks, CD-ROM, optical storage devices, magnetic storage devices, and/or other storage media. As used herein, a “computer-readable storage medium” or “computer-readable storage device” is not a signal”). Claim 11 is a computer program product claim having limitations similar to those of method claim 1 above. Therefore, it is rejected under the same rationale as claim 1 above.
	Regarding claim 20, Andoni in view of Maruyama, and further in view of Palmes teaches a computer processing system for reducing training costs for an ensemble of machine-learning-based classifiers, the system comprising: a memory for storing program code ([Andoni, column 3, line 29-33] “A computer system in accordance with the present disclosure may include a memory that stores an input data set and a first plurality of data structures. For example, each data structure may be a model of a neural network that models the input data set”); 
and a hardware processor for running the program code ([Andoni, column 3, line 33-37] “The computer system may also include at least one processor that is configured to execute a recursive search. For example, the recursive search may be a genetic algorithm to generate a neural network that best models the input data set”). Claim 20 is a computer processing system claim having limitations similar to those of method claim 1 above. Therefore, it is rejected under the same rationale as claim 1 above.

	Regarding claim 2, Andoni in view of Maruyama, and further in view of Palmes teaches the computer-implemented method of claim 1, wherein the dumped parameters used to train the trained classifier are connection weights in the given machine-learning-based classifier ([Andoni, column 8, line 23-29] “For example, the portion of the input data set 102 may be input into the trainable model 122, which may in turn generate output data. The input data set 102 and the output data may be used to determine an error value, and the error value may be used to modify connection weights of the model, such as by using gradient descent or another function”, dumping corresponds to the process of updating the weight).
Claim 12 is a computer program product claim having limitations similar to those of method claim 2 above. Therefore, it is rejected under the same rationale as claim 2 above.

Regarding claim 3, Andoni in view of Maruyama, and further in view of Palmes teaches the computer-implemented method of claim 1, wherein said recording step is performed responsive to said dumping step ([Maruyama, column 14, line 27-40] “According to the neural network circuit of the present invention described above, even if the weight values of the whole neural network (initially learned weight values) are stored in the ROM, difference values between the initial weight values and the additionally learned weight values are stored in the difference value memory such as a RAM having a smaller size than that of the ROM”, [Maruyama, column 7, line 48-51] “wherein each numeral in parentheses represents the number of times. Assuming that the weight value Wi(n) is obtained by addition of an initial weight value and a difference value, the following equations are obtained: Wi(n)=Wi(0)+dWi(n) (4)”, [Maruyama, column 5, line 11-16] “FIG. 7(b) is a chart limiting a difference value dWij (n) between the initial weight value Wij (o) and the weight value obtained after additional learning by using the limiter according to the second embodiment of the present invention”, shows that the dWi is the difference value between initial weight and weight after training. It is inherent that the recording step is performed responsive to dumping step, as the difference between initial weight and weight after training process must be calculated before computing the updated parameter of Maruyama Wi(n)).
Claim 13 is a computer program product claim having limitations similar to those of method claim 3 above. Therefore, it is rejected under the same rationale as claim 3 above.

Regarding claim 4, Andoni in view of Maruyama, and further in view of Palmes teaches the computer-implemented method of claim 1, wherein the new classifier is created for inference use in an absence of training the new classifier ([Andoni, column 18, line 16-27] “After the evolutionary weights 174 have been modified (to generate modified evolutionary weights 834), mutation operations 836 may be performed based on the modified evolutionary weights 834. For example, the mutation operations 836 may have a higher probability of increasing the number of nodes of a model by 2 and a lower probability of decreasing the number connections by 1, as compared to the mutation operations 830 based on the initial evolutionary weights 832. Performing the mutation operations 836 on the second plurality of models 810 may generate an output set of models for the second epoch. The output set includes a third plurality of models 820”, shows Andoni performing genetic operations (mutation) without training to generate the ensemble of models. [Andoni, Fig. 6] shows the diagram of Generated output set of N model).
Claim 14 is a computer program product claim having limitations similar to those of method claim 4 above. Therefore, it is rejected under the same rationale as claim 4 above.

Regarding claim 5, Andoni in view of Maruyama, and further in view of Palmes teaches wherein said creating step mutates the dumped parameters based on the recorded change rate to avoid prediction accuracy degradation by the new classifier ([Palmes, page 588, 3rd paragraph] “Parametric mutation is carried out by perturbing ANN’s weights using Gaussian noise with the severity of mutation controlled by the annealing temperature … where U and N are uniform and Gaussian random variables, respectively”, the $\alpha$ determines the rate of change of the random variable, so it is a change rate, [Palmes, page 591, left column, 5th paragraph, line 6-15] “On the other hand, the scheduled implementation assigns higher probabilities to those with low fitness and lower probabilities to those with high fitness within the range of 0.01–0.05 using their fitness rank score. This is a similar idea to the annealing implementation of GNARL except that the assignment of probability in MGNN is purely deterministic. It works under the principle that individuals located away from the best solution need drastic changes to improve their fitness than those located near the optimal solution”, shows change rate of probability to avoid accuracy degradation as giving higher probabilities to lower fitness will improve the classifier accuracy).
Claim 15 is a computer program product claim having limitations similar to those of method claim 5 above. Therefore, it is rejected under the same rationale as claim 5 above.

Regarding claim 6, Andoni in view of Maruyama, and further in view of Palmes teaches the computer-implemented method of claim 1, wherein the random numbers are taken from a limited range of random numbers ([Palmes, page 588, right column, first paragraph and equation (1)] “Common to these approaches is the use of Gaussian perturbation in the mutation operation to bring about changes in structure or weights of ANN             
                ω
                =
                ω
                +
                N
                
                    
                        0
                        ,
                        α
                        ϵ
                        
                            
                                μ
                            
                        
                    
                
                 
                ∀
                ω
                ∈
                μ
            
           (1)   where             
                N
                
                    
                        0
                        ,
                        α
                        ϵ
                        
                            
                                μ
                            
                        
                    
                
                 
            
        is the Gaussian perturbation with mean 0 and standard deviation             
                α
                ϵ
                
                    
                        μ
                    
                
            
        ,             
                ω
            
         is a weight, and             
                ϵ
                
                    
                        μ
                    
                
                 
            
        is an error function [e.g., mean-squared error (MSE)] which is scaled by the user-defined constant             
                α
            
        . Selection of weights for mutation is random with probability based on certain criterion such as the variance of incident nodes [50] or using equal probability [32].”, teaches the mutation operation with Gaussian perturbation. The Gaussian perturbation is the random numbers, [Palmes, page 588, 3rd paragraph] “Parametric mutation is carried out by perturbing ANN’s weights using Gaussian noise with the severity of mutation controlled by the annealing temperature …             
ω = ω + N(0, αT(μ)) where U and N are uniform and Gaussian random variables, respectively”. Here α determines the rate of change of the random variable, so it is a change rate. The random numbers are taken from a Gaussian distribution, which is a limited range. The range is [Palmes, page 588, right column, equation (5) and 4th paragraph] “Structural mutation involves addition or deletion of nodes or links. Node and links to be mutated are selected uniformly with the number of units to be modified determined by Δ_min + [U[0,1] T(μ)(Δ_max − Δ_min)], where (Δ_max − Δ_min) is a user-defined interval”).
Claim 16 is a computer program product claim having limitations similar to those of method claim 6 above. Therefore, it is rejected under the same rationale as claim 6 above.
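For illustration only, the two Palmes passages quoted above can be sketched in Python. This is a minimal sketch under stated assumptions, not a reproduction of Palmes's implementation: the function names, the example values, and the reading of the square brackets in the structural-mutation expression as a floor operation are all hypothetical.

```python
import math
import random


def parametric_mutation(weights, alpha, error, rng):
    """Apply w <- w + N(0, alpha * error) to every weight.

    `error` stands in for eps(mu) (or T(mu) in the second quoted form);
    both names are placeholders chosen for this sketch.
    """
    sigma = alpha * error  # standard deviation of the Gaussian perturbation
    return [w + rng.gauss(0.0, sigma) for w in weights]


def structural_mutation_count(delta_min, delta_max, temperature, rng):
    """Delta_min + [U[0,1] * T(mu) * (Delta_max - Delta_min)].

    Assumption: the brackets denote rounding down to an integer.
    """
    u = rng.random()  # uniform draw in [0, 1)
    return delta_min + math.floor(u * temperature * (delta_max - delta_min))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mutated = parametric_mutation([0.5, -0.25, 1.0], alpha=0.1, error=0.2, rng=rng)
count = structural_mutation_count(delta_min=1, delta_max=5, temperature=0.8, rng=rng)
```

With `alpha` set to 0 the perturbation has zero standard deviation and the weights are returned unchanged, which is one way to see that α scales the severity of the mutation.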

Regarding claim 7, Andoni in view of Maruyama, and further in view of Palmes teaches wherein the change rates are from a start time to an end time of a final training epoch ([Maruyama, column 14, lines 27-40] “According to the neural network circuit of the present invention described above, even if the weight values of the whole neural network (initially learned weight values) are stored in the ROM, difference values between the initial weight values and the additionally learned weight values are stored in the difference value memory such as a RAM having a smaller size than that of the ROM”, [Maruyama, column 7, lines 48-51] “wherein each numeral in parentheses represents the number of times. Assuming that the weight value Wi(n) is obtained by addition of an initial weight value and a difference value, the following equations are obtained: Wi(n)=Wi(0)+dWi(n) (4)”, [Maruyama, column 5, lines 11-16] “FIG. 7(b) is a chart limiting a difference value dWij (n) between the initial weight value Wij (o) and the weight value obtained after additional learning by using the limiter according to the second embodiment of the present invention”, which shows that dWi is the difference value between the initial weight and the weight after training. It is inherent that the recording step is performed responsive to the dumping step, because the difference between the initial weight and the weight after training must be calculated before computing the updated parameter Wi(n) of Maruyama. The initial weight must be obtained when the training begins (start time), and the weight after training must be obtained when the training ends (end time)).
Claim 17 is a computer program product claim having limitations similar to those of method claim 7 above. Therefore, it is rejected under the same rationale as claim 7 above.
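As an illustration of Maruyama's equation (4), Wi(n) = Wi(0) + dWi(n), the following minimal Python sketch stores only the difference values and rebuilds the learned weights from them. The function names and the sample weights are hypothetical and are not taken from Maruyama.

```python
def record_differences(initial_weights, trained_weights):
    """Return dWi(n) = Wi(n) - Wi(0) for every weight."""
    return [w_n - w_0 for w_0, w_n in zip(initial_weights, trained_weights)]


def reconstruct_weights(initial_weights, differences):
    """Return Wi(n) = Wi(0) + dWi(n), as in the quoted equation (4)."""
    return [w_0 + d for w_0, d in zip(initial_weights, differences)]


initial = [0.5, -0.25, 1.0]   # Wi(0): weights before additional learning
trained = [0.75, -0.5, 1.0]   # Wi(n): weights after additional learning
differences = record_differences(initial, trained)
restored = reconstruct_weights(initial, differences)
```

The sketch reflects the ordering relied on in the rejection: the difference values can only be computed once both the initial and the learned weights are available.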

Regarding claim 8, Andoni in view of Maruyama, and further in view of Palmes teaches wherein the intermediate training epoch immediately precedes the final training epoch in a sequence of training epochs including the intermediate training epoch and the final training epoch ([Andoni, column 14, lines 29-32] “For example, the crossover operation 160 may combine aspects of two models of the input set 120 to generate an intermediate model and the mutation operation 170 may be performed on the intermediate model to generate a model of the output set 130”, teaches that the intermediate training epoch (generating an intermediate model) precedes the final training epoch (generating a model of the output set), [Andoni, column 15, lines 11-23; Figure 8] “FIG. 8 illustrates operation of the genetic algorithm 110 during a first epoch, a second epoch, and a third epoch. In a particular implementation, the first epoch is an initial epoch of the genetic algorithm 110, and the second epoch and the third epoch are the next two consecutive epochs. In an alternate implementation, the first epoch is a non-initial epoch, and the second and third epochs are subsequent to the first epoch. In some implementations, the first epoch, the second epoch, and the third epoch are consecutive epochs. In other implementations, the first epoch and the second epoch are separated by at least one epoch, the second epoch and the third epoch are separated by at least one epoch, or both”, which shows the first training epoch, the second epoch, and the final epoch output).
Andoni does not specifically teach wherein the change rates are from an intermediate training epoch to a final training epoch. 
Maruyama teaches wherein the change rates are from an intermediate training epoch to a final training epoch ([Maruyama, column 14, lines 27-40] “According to the neural network circuit of the present invention described above, even if the weight values of the whole neural network (initially learned weight values) are stored in the ROM, difference values between the initial weight values and the additionally learned weight values are stored in the difference value memory such as a RAM having a smaller size than that of the ROM”, [Maruyama, column 7, lines 48-51] “wherein each numeral in parentheses represents the number of times. Assuming that the weight value Wi(n) is obtained by addition of an initial weight value and a difference value, the following equations are obtained: Wi(n)=Wi(0)+dWi(n) (4)”, [Maruyama, column 5, lines 11-16] “FIG. 7(b) is a chart limiting a difference value dWij (n) between the initial weight value Wij (o) and the weight value obtained after additional learning by using the limiter according to the second embodiment of the present invention”, which shows that dWi is the difference value between the initial weight and the weight after training. It is inherent that the recording step is performed responsive to the dumping step, because the difference between the initial weight and the weight after training must be calculated before computing the updated parameter Wi(n) of Maruyama. The initial weight must be obtained when the training begins (start time), and the weight after training must be obtained when the training ends (end time)).
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Andoni and Maruyama, to use the change rates from an intermediate training epoch to a final training epoch of Maruyama to implement the ensemble machine-learning-based classifier method of Andoni. The suggestion and/or motivation for doing so is to improve the accuracy of the final output, as the intermediate weight data contains more information about the entire process than the beginning data.

Claim 18 is a computer program product claim having limitations similar to those of method claim 8 above. Therefore, it is rejected under the same rationale as claim 8 above.
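The difference between measuring change from the start of training and measuring it from an intermediate epoch can be sketched as follows. This is a hypothetical Python illustration only: it assumes a change rate is a simple weight difference between two recorded snapshots, and the function name and snapshot values are invented for the example. How the claims define the measurement points is a matter of claim construction that the sketch does not address.

```python
def change_rates(snapshots, start_index, end_index=-1):
    """Difference between two recorded weight snapshots (end minus start)."""
    start = snapshots[start_index]
    end = snapshots[end_index]
    return [e - s for s, e in zip(start, end)]


# One snapshot of the weights per checkpoint (hypothetical values).
snapshots = [
    [0.0, 1.0],    # before training begins
    [0.5, 0.5],    # after the intermediate epoch
    [0.75, 0.25],  # after the final epoch
]

from_start = change_rates(snapshots, 0)          # start of training to final epoch
from_intermediate = change_rates(snapshots, -2)  # intermediate epoch to final epoch
```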

Regarding claim 9, Andoni in view of Maruyama, and further in view of Palmes teaches wherein multiple random numbers are used to create the new classifier, each of the multiple random numbers corresponding to a respective different one of a plurality of machine-learning-based classifier layers in the given machine-learning-based classifier ([Palmes, page 591, left column, 5th paragraph, lines 4-6] “MGNN implements two types of mutation, namely: stochastic mutation (GA-inspired) and scheduled stochastic mutation (EP-inspired). Algorithm 2 describes the former while Algorithm 3 describes the latter. In the stochastic implementation, each weight in matrices W1, W2, and vectors θ_ω1, θ_ω2 has the same 0.01 probability of perturbation”, so each mutation applies to a different one of the weights. [Palmes, page 591, right column, Algorithm 2] the code after the comment /*mutate weight matrices */ shows the process of applying mutation to each of the machine-learning-based classifier layers).
Claim 19 is a computer program product claim having limitations similar to those of method claim 9 above. Therefore, it is rejected under the same rationale as claim 9 above.
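The quoted stochastic mutation, in which every weight has the same 0.01 probability of perturbation, can be illustrated with a short Python sketch. It is not a reproduction of Palmes's Algorithm 2: the function name, the flat-list representation of each layer, and the Gaussian noise with standard deviation `sigma` are assumptions made for the example.

```python
import random


def stochastic_mutation(layers, sigma, rng, probability=0.01):
    """Perturb each weight independently with the given probability."""
    mutated = []
    for layer in layers:
        new_layer = [
            # The probability check runs first; noise is drawn only on a hit.
            w + rng.gauss(0.0, sigma) if rng.random() < probability else w
            for w in layer
        ]
        mutated.append(new_layer)
    return mutated


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layers = [[0.1, 0.2, 0.3], [0.4, 0.5]]  # stand-ins for W1 and W2
child = stochastic_mutation(layers, sigma=0.05, rng=rng)
```

Because each weight is tested separately inside each layer, the mutation visits every layer and draws a separate random number for every weight it considers.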

Regarding claim 10, Andoni in view of Maruyama, and further in view of Palmes teaches wherein multiple random numbers are used to create the new classifiers, each of the multiple random numbers corresponding to a different one of the dumped parameters ([Palmes, page 591, left column, 5th paragraph, lines 4-6] “MGNN implements two types of mutation, namely: stochastic mutation (GA-inspired) and scheduled stochastic mutation (EP-inspired). Algorithm 2 describes the former while Algorithm 3 describes the latter. In the stochastic implementation, each weight in matrices W1, W2, and vectors θ_ω1, θ_ω2 has the same 0.01 probability of perturbation”, so each mutation applies to a different one of the weights. [Palmes, page 591, right column, Algorithm 2] the code after the comment /*mutate weight matrices */ shows the process of applying mutation to each of the layers).
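To make the distinction between one random number per layer (claim 9) and one random number per dumped parameter (claim 10) concrete, the following Python sketch applies the recited calculation, dumped parameter plus change rate, multiplied by a random number, at both granularities. Everything in it is hypothetical: the function name, the uniform range of 0.9 to 1.1, and the sample values are assumptions for illustration and are not drawn from the application or from the cited references.

```python
import random


def create_without_training(params, rates, rng, per_parameter=False,
                            low=0.9, high=1.1):
    """Compute (parameter + change rate) * random number for every parameter.

    per_parameter=False: one random number shared by a whole layer.
    per_parameter=True:  a separate random number for each parameter.
    """
    new_layers = []
    for layer_params, layer_rates in zip(params, rates):
        layer_factor = rng.uniform(low, high)  # one draw per layer
        new_layer = []
        for p, r in zip(layer_params, layer_rates):
            factor = rng.uniform(low, high) if per_parameter else layer_factor
            new_layer.append((p + r) * factor)
        new_layers.append(new_layer)
    return new_layers


dumped = [[0.5, 1.0], [2.0]]              # dumped parameters, grouped by layer
change_rates = [[0.25, -0.5], [0.0]]      # recorded change rates, same shape
per_layer = create_without_training(dumped, change_rates, random.Random(0))
per_param = create_without_training(dumped, change_rates, random.Random(0),
                                    per_parameter=True)
```

No gradient step or loss evaluation appears anywhere in the function, which is the sense in which the new parameters are produced without training in this sketch.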

Responses to Arguments
	Applicant’s arguments filed 06/22/2022 have been fully considered but they are not persuasive.
The applicant respectfully argues that the 35 U.S.C. 101 rejection fails to show that the invention is an abstract idea and that, even assuming arguendo that the claims recite an abstract idea, the claims nonetheless recite significantly more than an abstract idea. Namely, the applicant argues that the claims are directed to neural network training, a concept found to be patent eligible under MPEP 2106.04(a)(1) with regard to 35 U.S.C. 101.
The examiner respectfully disagrees. The limitation of training, by a processor, a given machine-learning-based classifier from among the ensemble to obtain a trained classifier broadly recites a process of training a classifier. The claim language does not recite any detailed functions of the neural network being performed. As explained in the 101 rejection, the claim limitations as recited generally link the claimed elements to a field of use and technological environment. Therefore, the claims are not eligible under 35 U.S.C. 101.

Regarding the 35 U.S.C. 103 rejection of claims 1, 15, and 18, the applicant respectfully argues that the combination of Andoni, Maruyama, and Palmes fails to disclose or suggest ‘creating, by the processor without training, a new classifier from at least one other machine-learning-based classifier in the ensemble by calculating the dumped parameters plus change rates to obtain sums multiplied by random numbers for each local prediction by the trained classifier’.
The examiner respectfully disagrees. The claim limitation requires performing a mathematical calculation and creating a new classifier without a training process. The claim limitation is broad and does not recite how a classifier is created by performing a mathematical function. The claim only recites creating a classifier without training, which is merely the mathematical generation of a new classifier. The claim does not preclude training the classifier after it has been created. Under the broadest reasonable interpretation, the prior art Andoni discloses the process of generating new classifiers without a training process.
Andoni discloses the process of creating, by the processor without training, a new classifier from at least one other machine-learning-based classifier in the ensemble. According to Figure 8 of Andoni, each of the machine learning models 804, 806, and 808 is mutated in the mutation operations 830 and 836. Andoni does not recite any traditional training process while creating the classifiers, and the mutation operations 830 and 836 do not involve any traditional training process.
Even though none of Andoni, Maruyama, and Palmes explicitly teaches ‘calculating the dumped parameters plus change rates to obtain sums multiplied by random numbers’, it would still have been obvious to combine Andoni, Maruyama, and Palmes to use change rates to make changes to the weights of a classifier, because the claim limitation merely recites a mathematical calculation and does not recite how it creates a new neural network. Multiplying the sum of the dumped parameters and the change rates by random numbers merely recites a mathematical calculation and does not provide any inventive concept, and the combination of Andoni, Maruyama, and Palmes meets the limitation of the claim.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US-20070043513-A1 (regarding mutating neural network parameters).
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached on M-F 7:30AM – 4:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JUN KWON/
Patent Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127