DETAILED ACTION
This action is in response to the claims filed 03/21/2022 for application #16/696,061. Claims 1, 3-6, 8-11, 13-16, and 18-20 have been amended, and claims 7 and 17 have been canceled. Thus, claims 1-6, 8-16, and 18-20 are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 15 is objected to because of the following informalities: "performing the architecture on the plurality of candidate solutions..." appears to be grammatically incorrect and should read "performing the architecture variation..." as similarly recited in the independent claim. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-6, 8-16, and 18-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term “relatively high importance” in claims 1 and 11 is a relative term which renders the claim indefinite. The term “relatively high importance” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. It is not clear as to what degree a layer would have relatively high importance since there is no frame of reference. Thus, the claim is indefinite. 

Claims 2-6, 8-10, 12-16, and 18-20 are rejected as being dependent on a rejected base claim without curing any of the deficiencies.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6, 10-16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Husain ("US 20190122119 A1", hereinafter "Husain") in view of Tomkins et al. ("US 20070288410 A1", hereinafter "Tomkins") further in view of Kim et al. ("Evolutionary model selection in unsupervised learning", hereinafter "Kim") and further in view of Teerapittayanon et al. ("BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks", hereinafter "Teera").

Regarding claim 1, Husain teaches A method of training a neural network, the method comprising: 
generating, by a processor (¶0012-¶0013), a candidate solution set by modifying a candidate solution (“To illustrate, each epoch of the genetic algorithm may produce a particular number of candidate neural networks based on crossover and mutation operations that are performed on the candidate neural networks of a preceding epoch.” [¶0005]) which represents a basic neural network model in a variable-length string form (“In some examples, the normalized vectors are binary (or Boolean) strings. For example, a first neural network may be represented by a first binary string normalized vector 401 and a second neural network may be represented by a second binary string normalized vector 402. When binary strings are used, there are only two possible values for each field of a normalized vector—zero or one.” [¶0086]); 
acquiring, by the processor, first candidate solutions by performing architecture variation on a plurality of candidate solutions selected from the candidate solution set (“In some examples, the models may be clustered into species based on genetic distance. One illustrative non-limiting method of determining similarity/genetic distance between models is using a binned hamming distance, as further described with reference to FIG. 4. In a particular aspect, a species ID of each of the models may be set to a value corresponding to the species that the model has been clustered into. Next, a species fitness may be determined for each of the species. The species fitness of a species may be a function of the fitness of one or more of the individual models in the species. As a simple illustrative example, the species fitness of a species may be the average of the fitness of the individual models in the species. As another example, the species fitness of a species may be equal to the fitness of the fittest or least fit individual model in the species. In alternative examples, other mathematical functions may be used to determine species fitness. The genetic algorithm 110 may maintain a data structure that tracks the fitness of each species across multiple epochs. Based on the species fitness, the genetic algorithm 110 may identify the “fittest” species, which may also be referred to as “elite species.” Different numbers of elite species may be identified in different embodiments.” [¶0039; Examiner is interpreting clustering candidate neural networks into species to be equivalent to acquiring “first candidate solutions”. The genetic algorithm would be performing operations equivalent to “architecture variation”. See further [¶0022, ¶0026]]);
selecting, by the processor, a neural network model satisfying a targeted effective performance, as a first neural network model (“The fittest models of each “elite species” may be identified. The fittest models overall may also be identified. An “overall elite” need not be an “elite member,” e.g., may come from a non-elite species. Different numbers of “elite members” per species and “overall elites” may be identified in different embodiments. In some embodiments, the expected reliability or performance 105 of a neural network is also considered in determining whether the corresponding model is an “elite species,” a “elite member,” or an “overall elite.”” [¶0041; selecting the fittest model would be equivalent to satisfying a targeted effective performance.]); 
acquiring, by the processor, a second candidate solution by performing selective error propagation-based supervised learning on the first neural network model (“In some examples, the system 100 includes an optimization trainer, such as a backpropagation trainer to train selected models generated by the genetic algorithm 110 and feed the trained models back into the genetic algorithm… Rather, the trainable model may represent an advancement with respect to the fittest models of the input set 120. The trainable model may be sent to the backpropagation trainer, which may train connection weights of the trainable model based on a portion of the input data set 102. When training is complete, the resulting trained model may be received from the backpropagation trainer and may be input into a subsequent epoch of the genetic algorithm 110.” [¶0042; note: The trained model corresponds to the selected first neural network model. The backpropagation trainer would correspond to performing selective error propagation based supervised learning. See further: ¶0044]);
selecting, by the processor, a neural network model represented by the second candidate solution, which satisfies the targeted effective performance, as a final neural network model (“Operation at the system 100 may continue iteratively until specified a termination criterion, such as a time limit, a number of epochs, or a threshold fitness value (of an overall fittest model) is satisfied. When the termination criterion is satisfied, an overall fittest model of the last executed epoch may be selected and output as representing a neural network that best models the input data set 102. In some examples, the overall fittest model may undergo a final training operation (e.g., by the backpropagation trainer) before being output.” [¶0054])
wherein the acquiring, by the processor, the second candidate solution by performing the selective error propagation-based supervised learning on the first neural network model comprises: (“The backpropagation trainer may utilize a portion, but not all of the input data set 102 to train the connection weights of the trainable model, thereby generating the trained model. For example, the portion of the input data set 102 may be input into the trainable model, which may in turn generate output data. The input data set 102 and the output data may be used to determine an error value, and the error value may be used to modify connection weights of the model, such as by using gradient descent or another function.” [¶0044; the trained model would correspond to a first neural network model])
However, Husain fails to explicitly teach performing, by the processor, unsupervised learning on each of neural network models represented by respective first candidate solutions and calculating effective performances of the neural network models;
wherein the acquiring, by the processor, the second candidate solution by performing the selective error propagation-based supervised learning on the first neural network model comprises: 
analyzing, by the processor, weight matrix densities of the first neural network model; 
identifying, by the processor, a layer having relatively high importance based on the weight matrix densities; and
setting, by the processor, an identified layer as a branch point and setting a path of the selective error propagation-based supervised learning from the branch point.
Tomkins teaches analyzing, by the processor, weight matrix densities of the first neural network model (“Note: for the convolution weight function, there are no "real" synaptic link connections. Instead a matrix operation is performed between the activation matrix and the weight matrix. Here, the weight matrix is dependent upon the numbers of layers connecting to layer(i) and the number of nodes in layer(i)” [¶0162]);
Husain and Tomkins are both in the same field of endeavor of using genetic algorithms to find the best model. Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s teachings by using weight matrices and matrices representing neural network structures as taught by Tomkins. One would have been motivated to make this modification in order to generate weight matrices to find the best performing neural network model. [¶0007, Tomkins]
However, Husain/Tomkins fails to explicitly teach performing, by the processor, unsupervised learning on each of neural network models represented by respective first candidate solutions and calculating effective performances of the neural network models;
Kim teaches performing, by the processor, unsupervised learning on each of neural network models represented by respective first candidate solutions (“We outline the ELSA algorithm in Fig. 4. Each agent (candidate solution) in the population is first initialized with some random solution and an initial reservoir of energy” [pg. 540, ¶2; See pg. 532, Kim discloses unsupervised learning to evaluate solutions: “When we do not have prior information to evaluate candidate solutions, we instead wish to find natural grouping of the examples in the feature space via clustering or unsupervised learning and utilize the clustering results to evaluate solutions.”]) and calculating effective performances of the neural network models (“The net energy intake of an agent is determined by its offspring’s fitness. This is a function of how well the candidate solution performs with respect to the criteria being optimized. But the energy also depends on the state of the environment. The environment corresponds to the set of possible values for each of the criteria being optimized” [pg. 541, para under Fig. 4]);
Husain, Tomkins, and Kim are all in the same field of endeavor of using genetic algorithms to find the best model. Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. Kim discloses a genetic algorithm using unsupervised learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s/Tomkins’ teachings by performing unsupervised learning in a genetic algorithm as taught by Kim. One would have been motivated to make this modification in order to evaluate candidate solutions based on performance. [pg. 532, ¶2, Kim]
However, Husain/Tomkins/Kim fails to explicitly teach identifying, by the processor, a layer having relatively high importance based on the weight matrix densities; and
setting, by the processor, an identified layer as a branch point and setting a path of the selective error propagation-based supervised learning from the branch point.
Teera teaches identifying, by the processor, a layer having relatively high importance based on the weight matrix densities (“On a simplified version of BranchyAlexNet with only the first and last branch, weighting the first branch with 1.0 and the last branch with 0.3 provides a 1% increase in classification accuracy over weighting each branch equally. Giving more weight to earlier exit branches encourages more discriminative feature learning in early layers of the network and allows more samples to exit early with high confidence.” [pg. 2468, left col, top para; Examiner is interpreting layer given more weight to be equivalent with a layer having “high importance.”]); and
setting, by the processor, an identified layer as a branch point and setting a path of the selective error propagation-based supervised learning from the branch point (“BranchyNet modifies the standard deep network structure by adding exit branches (also called side branches or simply branches for brevity), at certain locations throughout the network. These early exit branches allow samples which can be accurately classified in early stages of the network to exit at that stage. In training the classifiers at these exit branches, we also consider network regularization and mitigation of vanishing gradients in backpropagation. For the former, branches will provide regularization on the main branch (baseline network), and vice versa. For the latter, a relatively shallower branch at a lower layer will provide more immediate gradient signal in backpropagation, resulting in discriminative features in lower layers of the main branch, thus improving its accuracy.” [pg. 2465-2466, § III. BranchyNet, ¶1]).
Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s/Tomkins’/Kim’s teachings to identify a layer as a branch point and perform error propagation-based supervised learning from that branch point as taught by Teera. One would have been motivated to make this modification in order to reduce runtime and improve accuracy. [pg. 2465, left col, 3 bullets, Teera]
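For illustration only, the branch weighting quoted from Teera above (weighting the first branch with 1.0 and the last branch with 0.3) can be read as a weighted sum of per-branch losses. The following is a minimal sketch under that reading. The function name `joint_loss` and the per-branch loss values are hypothetical and do not come from any cited reference.

```python
def joint_loss(branch_losses, branch_weights):
    """Weighted sum of per-exit losses (one loss value per exit branch)."""
    if len(branch_losses) != len(branch_weights):
        raise ValueError("one weight is needed per branch loss")
    return sum(w * l for w, l in zip(branch_weights, branch_losses))


# Hypothetical losses for one batch: [first exit branch, last exit branch].
losses = [0.8, 0.5]

equal = joint_loss(losses, [1.0, 1.0])        # every branch weighted equally
early_heavy = joint_loss(losses, [1.0, 0.3])  # weights quoted from Teera
```

With the heavier first-branch weight, the early exit contributes proportionally more to the gradient signal, which is consistent with the quoted statement that more weight on earlier exit branches encourages more discriminative feature learning in early layers.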

Regarding claim 2, Husain/Tomkins/Kim/Teera teaches The method of claim 1, where Husain teaches wherein the candidate solution, which represents the basic neural network model in a variable-length string form ([¶0086]), includes weight matrices, which represent neural interconnections and weights related to connection strengths between neurons (“The connection data for each connection in a neural network may include at least one of a node pair or a connection weight. For example, if a neural network includes a connection from node N1 to node N2, then the connection data for that connection may include the node pair <N1, N2>. The connection weight may be a numerical quantity that influences if and/or how the output of N1 is modified before being input at N2. In the example of a recurrent network, a node may have a connection to itself (e.g., the connection data may include the node pair <N1, N1>).” [¶0016]), and a matrix representing a neural network structure (“Each data structure 210 includes information describing the topology of a neural network as well as other characteristics of the neural network, such as link weight, bias values, activation functions, and so forth.” [¶0062]).
While Husain teaches interconnections and weights related to connection strengths between neurons and neural network structure, the reference does not provide details of the matrix or matrices.
Tomkins teaches weight matrices representing neural interconnections and weights related to connection strengths between neurons and a matrix representing a neural network structure (“It contains [0022] (a) a first chromosome layer with a plurality of chromosome tables to record the connections of the weighted synaptic link among nodes; each the chromosome table comprising a plurality of rows and a plurality of columns, with a non-zero table element in the chromosome table denoting that there is a connection between the row and the column while a zero entry denoting an absence of the connection, and 
[0023] (b) a second chromosome layer arranged in a chromosome matrix with a plurality of rows and columns of matrix elements; each column representing one neural layer of the neural network, the first row recording the number of nodes in each the neural layer; and the other rows representing one of the function categories; and each matrix element in the other rows denoting the choice of the plurality of functions in the function category.”).
Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s/Kim’s/Teera’s teachings by using weight matrices and matrices representing neural network structures as taught by Tomkins. One would have been motivated to make this modification in order to generate weight matrices to find the best performing neural network model. [¶0007, Tomkins]
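For illustration only, the chromosome table quoted from Tomkins above (a non-zero table element denotes a connection between the row and the column, and a zero entry denotes its absence) can be sketched as a nested list. The helper name `connections`, the example values, and the convention that rows are source nodes and columns are target nodes are assumptions made for this sketch.

```python
def connections(table):
    """Return (row, column, weight) for every non-zero table element."""
    return [(r, c, w)
            for r, row in enumerate(table)
            for c, w in enumerate(row)
            if w != 0]


# Hypothetical 3-node table; rows = source nodes, columns = target nodes
# (this orientation is an assumption, not stated in the quoted passage).
table = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, -1.2],
    [0.0, 0.0, 0.0],
]
links = connections(table)
```

Here `links` lists two weighted connections (node 0 to node 1, and node 1 to node 2), while every zero entry is treated as an absent connection.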

Regarding claim 3, Husain/Tomkins/Kim/Teera teaches The method of claim 1, where Husain teaches wherein the performing, by the processor, the unsupervised learning on each of the neural network models represented by respective first candidate solutions and calculating the effective performances of the neural network models comprises performing, by the processor, unsupervised learning in parallel on the basis of degree of parallelism (DOP) (“In a particular aspect, fitness evaluation of models may be performed in parallel. To illustrate, the system 100 may include additional devices, processors, cores, and/or threads 190 to those that execute the genetic algorithm 110 and the trained classifier 101. These additional devices, processors, cores, and/or threads 190 may test model fitness in parallel based on the input data set 102 and may provide the resulting fitness values to the genetic algorithm 110.” [¶0024; Examiner is interpreting using these additional devices in parallel to be equivalent to performing unsupervised learning in parallel on the basis of degree of parallelism.]).
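For illustration only, the parallel fitness evaluation quoted from Husain ¶0024 can be sketched with a thread pool. The fitness function, the (weight, bias) model encoding, and the `workers` parameter (standing in for a degree of parallelism) are hypothetical choices for this sketch and are not taken from Husain.

```python
from concurrent.futures import ThreadPoolExecutor


def fitness(model, data):
    """Hypothetical fitness: negative mean squared error of y = w*x + b."""
    w, b = model
    errors = [(w * x + b - y) ** 2 for x, y in data]
    return -sum(errors) / len(errors)


def evaluate_population(models, data, workers=4):
    """Score every model concurrently; results keep the order of `models`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda m: fitness(m, data), models))


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]    # samples of y = 2x + 1
models = [(2.0, 1.0), (1.0, 0.0), (0.0, 0.0)]  # candidate (w, b) pairs
scores = evaluate_population(models, data)
best = models[scores.index(max(scores))]
```

Threads are used here only to show the structure of handing each candidate to a separate worker and collecting the fitness values. A process pool or separate devices would follow the same pattern.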

Regarding claim 4, Husain/Tomkins/Kim/Teera teaches The method of claim 1, where Husain teaches wherein the acquiring, by the processor, the first candidate solutions by performing the architecture variation-based on the plurality of candidate solutions selected from the candidate solution set comprises acquiring the first candidate solutions by merging two candidate solutions in the candidate solution set (“During a crossover operation 160, a portion of one model may be combined with a portion of another model, where the size of the respective portions may or may not be equal. When normalized vectors are used to represent neural networks, the crossover operation may include concatenating bits/bytes/fields 0 to p of one normalized vector with bits/bytes/fields p+1 to q of another normalized vectors, where p and q are integers and p+q is equal to the size of the normalized vectors… Thus, the crossover operation may be a random or pseudo-random operator that generates a model of the output set 130 by combining aspects of a first model of the input set 120 with aspects of one or more other models of the input set 120. For example, the crossover operation may retain a topology of hidden nodes of a first model of the input set 120 but connect input nodes of a second model of the input set to the hidden nodes.” [¶0048- ¶0049]).
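For illustration only, the crossover operation quoted from Husain ¶0048 (concatenating fields 0 to p of one normalized vector with the fields from p+1 onward of another) can be sketched on binary strings. The function name `single_point_crossover` and the example vectors are hypothetical.

```python
def single_point_crossover(parent_a, parent_b, p):
    """Fields 0..p come from parent_a; fields p+1 onward come from parent_b."""
    if len(parent_a) != len(parent_b):
        raise ValueError("normalized vectors must have equal length")
    if not 0 <= p < len(parent_a):
        raise ValueError("p must index a field of the vectors")
    return parent_a[:p + 1] + parent_b[p + 1:]


# Hypothetical binary-string normalized vectors of equal length.
a = "11110000"
b = "00001111"
child = single_point_crossover(a, b, 2)
```

The child keeps the length of its parents, taking its first three fields from `a` and the remaining five from `b`.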

Regarding claim 5, Husain/Tomkins/Kim/Teera teaches The method of claim 1, where Husain teaches wherein the acquiring, by the processor, the first candidate solutions by the performing architecture variation-based on the plurality of candidate solutions selected from the candidate solution set comprises acquiring the first candidate solutions by performing at least one architecture variation method among weight modification, interneuron connection removal, interneuron connection addition, neuron removal, and neuron addition (“As another example, the mutation operation may cause one or more activation functions, aggregation functions, bias values/functions, and/or or connection weights to be modified.” [¶0051; note: The claim under BRI only requires at least one of the methods, thus the Examiner has cited a portion corresponding to a weight modification. However, see ¶0006, ¶0020 for interneuron connection removal/neuron removal and ¶0051 for interneuron connection addition/neuron addition.]).

Regarding claim 6, Husain/Tomkins/Kim/Teera teaches The method of claim 1, wherein the acquiring, by the processor, the second candidate solution by performing selective error propagation-based supervised learning on the first neural network model 
However, Husain fails to explicitly teach that the acquiring comprises setting, by the processor, a pseudo reverse weight matrix to finely tune weight matrices.
Tomkins teaches setting a pseudo reverse weight matrix to finely tune weight matrices (“[0043] (a). choosing a specific training function from a plurality of training functions; 
[0044] (b). inputting the set of input signals to the input nodes of the neural network;
[0045] (c). computing the set of output responses by propagating the set of input signals from the input nodes to the output nodes via the plurality of weighted synaptic links; 
[0046] (d). accumulating the total error between the set of output responses and the set of target signals; 
[0047] (e). invoking the specific training algorithm to adjust the weight values of the weighted synaptic links to minimize the total error; 
[0048] (f). calculating the fitness score; the fitness score being related to the total error; 
[0049] (g). repeating steps (b), (c), (d), (e) and (f) for a predetermined number of iterations unless the fitness score is smaller than a pre-defined criterion.” [note: ¶0022-¶0023 and ¶0162 disclose a “weight matrix”; the cited process would correspond to finely tuning weight matrices.]).
Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s/Kim’s/Teera’s teachings by using weight matrices and matrices representing neural network structures as taught by Tomkins. One would have been motivated to make this modification in order to generate weight matrices to find the best performing neural network model. [¶0007, Tomkins]
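For illustration only, steps (b) through (g) quoted from Tomkins ¶0044-¶0049 can be sketched as an iterative training loop. The sketch assumes a single-weight model y = weight * x, plain gradient descent as the training function, and a fitness score equal to the total squared error. None of these specifics is stated in the quoted passage.

```python
def train(samples, weight=0.0, rate=0.05, max_iters=1000, criterion=1e-6):
    """Fit y = weight * x by gradient descent on the total squared error."""
    fitness = float("inf")
    for _ in range(max_iters):                       # (g) bounded repetition
        # (b)-(c): propagate each input signal to an output response
        outputs = [weight * x for x, _ in samples]
        # (d): accumulate the total error against the target signals
        total_error = sum((o - t) ** 2 for o, (_, t) in zip(outputs, samples))
        # (e): adjust the weight to reduce the total error
        gradient = sum(2 * (o - t) * x for o, (x, t) in zip(outputs, samples))
        weight -= rate * gradient
        # (f): fitness score related to the total error (assumed identical)
        fitness = total_error
        if fitness < criterion:                      # (g) early termination
            break
    return weight, fitness


# Hypothetical (input, target) pairs drawn from y = 2x.
trained_weight, final_fitness = train([(1.0, 2.0), (2.0, 4.0)])
```

The loop stops either after the predetermined number of iterations or as soon as the fitness score falls below the criterion, mirroring the termination condition in step (g).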

Regarding claim 10, Husain/Tomkins/Kim/Teera teaches The method of claim 1, where Husain teaches wherein the analyzing, by the processor, the weight densities of the first neural network model and setting of the path of selective error propagation-based supervised learning comprise updating weights on the basis of error difference values of the first neural network model extracted along the path of selective error propagation-based supervised learning (“The backpropagation trainer may utilize a portion, but not all of the input data set 102 to train the connection weights of the trainable model, thereby generating the trained model. For example, the portion of the input data set 102 may be input into the trainable model, which may in turn generate output data. The input data set 102 and the output data may be used to determine an error value, and the error value may be used to modify connection weights of the model, such as by using gradient descent or another function.” [¶0044; using the error value to modify connection weights would be equivalent to updating weights.]).
While Husain teaches updating, by the processor, weights on the basis of error difference values of the first neural network model extracted along the path of selective error propagation-based supervised learning, the reference does not provide details of the matrix or matrices.
Tomkins teaches weight matrix densities (“Note: for the convolution weight function, there are no "real" synaptic link connections. Instead a matrix operation is performed between the activation matrix and the weight matrix. Here, the weight matrix is dependent upon the numbers of layers connecting to layer(i) and the number of nodes in layer(i)” [¶0162]).
Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s/Kim’s/Teera’s teachings by using weight matrices and matrices representing neural network structures as taught by Tomkins. One would have been motivated to make this modification in order to generate weight matrices to find the best performing neural network model. [¶0007, Tomkins]

Regarding claim 11, Husain teaches A device for training a neural network, the device comprising: 
a processor ([¶0012]); and 
a memory configured to store at least one command executed through the processor, wherein the at least one command, when executed by the processor, causes the processor to ([¶0061]):
generate a candidate solution set by modifying a candidate solution (“To illustrate, each epoch of the genetic algorithm may produce a particular number of candidate neural networks based on crossover and mutation operations that are performed on the candidate neural networks of a preceding epoch.” [¶0005]) which represents a basic neural network model in a variable-length string form (“In some examples, the normalized vectors are binary (or Boolean) strings. For example, a first neural network may be represented by a first binary string normalized vector 401 and a second neural network may be represented by a second binary string normalized vector 402. When binary strings are used, there are only two possible values for each field of a normalized vector—zero or one.” [¶0086]); 
acquire first candidate solutions by performing architecture variation on a plurality of candidate solutions selected from the candidate solution set (“In some examples, the models may be clustered into species based on genetic distance. One illustrative non-limiting method of determining similarity/genetic distance between models is using a binned hamming distance, as further described with reference to FIG. 4. In a particular aspect, a species ID of each of the models may be set to a value corresponding to the species that the model has been clustered into. Next, a species fitness may be determined for each of the species. The species fitness of a species may be a function of the fitness of one or more of the individual models in the species. As a simple illustrative example, the species fitness of a species may be the average of the fitness of the individual models in the species. As another example, the species fitness of a species may be equal to the fitness of the fittest or least fit individual model in the species. In alternative examples, other mathematical functions may be used to determine species fitness. The genetic algorithm 110 may maintain a data structure that tracks the fitness of each species across multiple epochs. Based on the species fitness, the genetic algorithm 110 may identify the “fittest” species, which may also be referred to as “elite species.” Different numbers of elite species may be identified in different embodiments.” [¶0039; Examiner is interpreting clustering candidate neural networks into species to be equivalent to acquiring “first candidate solutions”. The genetic algorithm would be performing operations equivalent to “architecture variation”. See further [¶0022, ¶0026]]);
select a neural network model satisfying a targeted effective performance, as a first neural network model (“The fittest models of each “elite species” may be identified. The fittest models overall may also be identified. An “overall elite” need not be an “elite member,” e.g., may come from a non-elite species. Different numbers of “elite members” per species and “overall elites” may be identified in different embodiments. In some embodiments, the expected reliability or performance 105 of a neural network is also considered in determining whether the corresponding model is an “elite species,” a “elite member,” or an “overall elite.”” [¶0041; selecting the fittest model would be equivalent to satisfying a targeted effective performance.]); 
acquire a second candidate solution by performing selective error propagation-based supervised learning on the first neural network model (“In some examples, the system 100 includes an optimization trainer, such as a backpropagation trainer to train selected models generated by the genetic algorithm 110 and feed the trained models back into the genetic algorithm… Rather, the trainable model may represent an advancement with respect to the fittest models of the input set 120. The trainable model may be sent to the backpropagation trainer, which may train connection weights of the trainable model based on a portion of the input data set 102. When training is complete, the resulting trained model may be received from the backpropagation trainer and may be input into a subsequent epoch of the genetic algorithm 110.” [¶0042; note: The trained model corresponds to the selected first neural network model. The backpropagation trainer would correspond to performing selective error propagation based supervised learning. See further: ¶0044]);
select a neural network model represented by the second candidate solution, which satisfies the targeted effective performance, as a final neural network model (“Operation at the system 100 may continue iteratively until specified a termination criterion, such as a time limit, a number of epochs, or a threshold fitness value (of an overall fittest model) is satisfied. When the termination criterion is satisfied, an overall fittest model of the last executed epoch may be selected and output as representing a neural network that best models the input data set 102. In some examples, the overall fittest model may undergo a final training operation (e.g., by the backpropagation trainer) before being output.” [¶0054])
wherein the command causing the processor to acquire the second candidate solution by performing the selective error propagation-based supervised learning on the first neural network model comprises: (“The backpropagation trainer may utilize a portion, but not all of the input data set 102 to train the connection weights of the trainable model, thereby generating the trained model. For example, the portion of the input data set 102 may be input into the trainable model, which may in turn generate output data. The input data set 102 and the output data may be used to determine an error value, and the error value may be used to modify connection weights of the model, such as by using gradient descent or another function.” [¶0044; the trained model would correspond to a first neural network model])
However Husain fails to explicitly teach perform unsupervised learning on each of neural network models represented by respective first candidate solutions and calculating effective performances of the neural network models; 
wherein the acquire the second candidate solution by performing the selective error propagation-based supervised learning on the first neural network model comprises: 
analyze weight matrix densities of the first neural network model; 
identify a layer having relatively high importance based on the weight matrix densities; and
set an identified layer as a branch point and setting a path of the selective error propagation-based supervised learning from the branch point
Tomkins teaches analyze, by the processor, weight matrix densities of the first neural network model (“Note: for the convolution weight function, there are no "real" synaptic link connections. Instead a matrix operation is performed between the activation matrix and the weight matrix. Here, the weight matrix is dependent upon the numbers of layers connecting to layer(i) and the number of nodes in layer(i)” [¶0162]);
Husain and Tomkins are both in the same field of endeavor of using genetic algorithms to find the best model. Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s teachings by using weight matrices and matrices representing neural network structures as taught by Tomkins. One would have been motivated to make this modification in order to generate weight matrices to find the best performing neural network model. [¶0007, Tomkins]
However Husain/Tomkins fails to explicitly teach perform unsupervised learning on each of neural network models represented by respective first candidate solutions and calculating effective performances of the neural network models;
Kim teaches perform unsupervised learning on each of neural network models represented by respective first candidate solutions (“We outline the ELSA algorithm in Fig. 4. Each agent (candidate solution) in the population is first initialized with some random solution and an initial reservoir of energy” [pg. 540, ¶2; See pg. 532, Kim discloses unsupervised learning to evaluate solutions: “When we do not have prior information to evaluate candidate solutions, we instead wish to find natural grouping of the examples in the feature space via clustering or unsupervised learning and utilize the clustering results to evaluate solutions.”]) and calculating effective performances of the neural network models (“The net energy intake of an agent is determined by its offspring’s fitness. This is a function of how well the candidate solution performs with respect to the criteria being optimized. But the energy also depends on the state of the environment. The environment corresponds to the set of possible values for each of the criteria being optimized” [pg. 541, para under Fig. 4]);
Husain, Tomkins, and Kim are all in the same field of endeavor of using genetic algorithms to find the best model. Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. Kim discloses a genetic algorithm using unsupervised learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s/Tomkins’ teachings by performing unsupervised learning in a genetic algorithm as taught by Kim. One would have been motivated to make this modification in order to evaluate candidate solutions based on performance. [pg. 532, ¶2, Kim]
Husain/Tomkins/Kim fails to explicitly teach identify a layer having relatively high importance based on the weight matrix densities; and
set an identified layer as a branch point and setting a path of the selective error propagation-based supervised learning from the branch point
Teera teaches identify a layer having relatively high importance based on the weight matrix densities (“On a simplified version of BranchyAlexNet with only the first and last branch, weighting the first branch with 1.0 and the last branch with 0.3 provides a 1% increase in classification accuracy over weighting each branch equally. Giving more weight to earlier exit branches encourages more discriminative feature learning in early layers of the network and allows more samples to exit early with high confidence.” [pg. 2468, left col, top para; Examiner is interpreting layer given more weight to be equivalent with a layer having “high importance.”]); and
set an identified layer as a branch point and setting a path of the selective error propagation-based supervised learning from the branch point (“BranchyNet modifies the standard deep network structure by adding exit branches (also called side branches or simply branches for brevity), at certain locations throughout the network. These early exit branches allow samples which can be accurately classified in early stages of the network to exit at that stage. In training the classifiers at these exit branches, we also consider network regularization and mitigation of vanishing gradients in backprogation. For the former, branches will provide regularization on the main branch (baseline network), and vice versa. For the latter, a relatively shallower branch at a lower layer will provide more immediate gradient signal in backpropagation, resulting in discriminative features in lower layers of the main branch, thus improving its accuracy.” [pg. 2465-2466, § III. BranchyNet, ¶1])
	Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s/Tomkins’/Kim’s teachings to identify a layer as a branch point and perform error based supervised learning from that branch point as taught by Teera. One would have been motivated to make this modification in order to reduce runtime and improve accuracy. [pg. 2465, left col, 3 bullets, Teera]
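For illustration, the branch weighting quoted above from Teera (first exit branch weighted 1.0, last branch weighted 0.3) can be sketched as a weighted sum of per-branch classification losses. This is a minimal sketch under stated assumptions: the function names `softmax_cross_entropy` and `joint_branch_loss` are hypothetical and do not appear in Teera, and the choice of softmax cross-entropy as the per-branch loss is an assumption made for this example.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, given raw logits and the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def joint_branch_loss(branch_logits, label, branch_weights):
    """Weighted sum of the losses at each exit branch (hypothetical helper).

    branch_logits: one list of logits per exit branch.
    branch_weights: one scalar weight per exit branch, e.g. [1.0, 0.3].
    """
    if len(branch_logits) != len(branch_weights):
        raise ValueError("need exactly one weight per branch")
    return sum(
        w * softmax_cross_entropy(z, label)
        for z, w in zip(branch_logits, branch_weights)
    )
```

Under this sketch, a larger weight on an early branch makes that branch's error contribute more to the gradient that reaches the early layers, which is consistent with the passage quoted from pg. 2468.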

Regarding claim 12, Husain/Tomkins/Kim/Teera teaches The device of claim 11, where Husain teaches wherein the candidate solution, which represents the basic neural network model in a variable-length string form ([¶0086]), includes weight matrices, which represent neural interconnections and weights related to connection strengths between neurons (“The connection data for each connection in a neural network may include at least one of a node pair or a connection weight. For example, if a neural network includes a connection from node N1 to node N2, then the connection data for that connection may include the node pair <N1, N2>. The connection weight may be a numerical quantity that influences if and/or how the output of N1 is modified before being input at N2. In the example of a recurrent network, a node may have a connection to itself (e.g., the connection data may include the node pair <N1, N1>).” [¶0016]), and a matrix representing a neural network structure (“Each data structure 210 includes information describing the topology of a neural network as well as other characteristics of the neural network, such as link weight, bias values, activation functions, and so forth.” [¶0062]).
While Husain teaches interconnections and weights related to connection strengths between neurons and neural network structure, the reference does not provide details of matrix/matrices. 
Tomkins teaches weight matrices representing neural interconnections and weights related to connection strengths between neurons and a matrix representing a neural network structure (“It contains [0022] (a) a first chromosome layer with a plurality of chromosome tables to record the connections of the weighted synaptic link among nodes; each the chromosome table comprising a plurality of rows and a plurality of columns, with a non-zero table element in the chromosome table denoting that there is a connection between the row and the column while a zero entry denoting an absence of the connection, and 
[0023] (b) a second chromosome layer arranged in a chromosome matrix with a plurality of rows and columns of matrix elements; each column representing one neural layer of the neural network, the first row recording the number of nodes in each the neural layer; and the other rows representing one of the function categories; and each matrix element in the other rows denoting the choice of the plurality of functions in the function category.”)
Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s/Kim’s/Teera’s teachings by using weight matrices and matrices representing neural network structures as taught by Tomkins. One would have been motivated to make this modification in order to generate weight matrices to find the best performing neural network model. [¶0007, Tomkins]
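For illustration, the two chromosome layers quoted above from Tomkins ¶0022-¶0023 can be read as follows: a non-zero table element marks a weighted connection between a row and a column, and the first row of the chromosome matrix records the number of nodes in each layer. The helper names below are hypothetical, and plain nested lists are an assumed representation that Tomkins does not specify.

```python
def connections_from_table(table):
    """List (row, column, weight) for every non-zero entry of a chromosome table.

    A zero entry denotes the absence of a connection, so it is skipped.
    """
    return [
        (r, c, w)
        for r, row in enumerate(table)
        for c, w in enumerate(row)
        if w != 0
    ]


def layer_sizes(chromosome_matrix):
    """Return the node count of each neural layer (first row of the matrix)."""
    return list(chromosome_matrix[0])
```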
Regarding claim 13, Husain/Tomkins/Kim/Teera teaches The device of claim 11, where Husain teaches wherein the command causing the processor to perform the unsupervised learning on each of the neural network models represented by respective first candidate solutions and calculate the effective performances of the neural network models comprises a command causing the processor to perform the unsupervised learning in parallel on the basis of degree of parallelism (DOP) (“In a particular aspect, fitness evaluation of models may be performed in parallel. To illustrate, the system 100 may include additional devices, processors, cores, and/or threads 190 to those that execute the genetic algorithm 110 and the trained classifier 101. These additional devices, processors, cores, and/or threads 190 may test model fitness in parallel based on the input data set 102 and may provide the resulting fitness values to the genetic algorithm 110.” [¶0024; Examiner is interpreting using these additional devices in parallel to be equivalent to performing unsupervised learning in parallel on the basis of degree of parallelism.]).
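For illustration, the parallel fitness evaluation quoted above from Husain ¶0024 can be sketched with a worker pool whose size plays the role of the degree of parallelism. This is a sketch under assumptions: `evaluate_in_parallel`, `fitness_fn`, and `degree_of_parallelism` are hypothetical names, and a thread pool is only one of the options (devices, processors, cores, threads) that the reference lists.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_in_parallel(models, fitness_fn, degree_of_parallelism):
    """Evaluate every model with fitness_fn, using at most
    degree_of_parallelism worker threads at a time.

    Results are returned in the same order as the input models.
    """
    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as pool:
        return list(pool.map(fitness_fn, models))
```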

Regarding claim 14, Husain/Tomkins/Kim/Teera teaches The device of claim 11, where Husain teaches wherein the command causing the processor to acquire the first candidate solutions by performing architecture variation on the plurality of candidate solutions selected from the candidate solution set comprises a command for causing the processor to acquire the first candidate solution by merging two candidate solutions in the candidate solution set (“During a crossover operation 160, a portion of one model may be combined with a portion of another model, where the size of the respective portions may or may not be equal. When normalized vectors are used to represent neural networks, the crossover operation may include concatenating bits/bytes/fields 0 to p of one normalized vector with bits/bytes/fields p+1 to q of another normalized vectors, where p and q are integers and p+q is equal to the size of the normalized vectors… Thus, the crossover operation may be a random or pseudo-random operator that generates a model of the output set 130 by combining aspects of a first model of the input set 120 with aspects of one or more other models of the input set 120. For example, the crossover operation may retain a topology of hidden nodes of a first model of the input set 120 but connect input nodes of a second model of the input set to the hidden nodes.” [¶0048- ¶0049]).
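For illustration, the crossover quoted above from Husain ¶0048 (fields 0 to p of one normalized vector concatenated with the remaining fields of another) can be sketched as a single-point crossover. The function name and the equal-length requirement are assumptions of this sketch; the reference also permits portions of unequal size.

```python
def single_point_crossover(parent_a, parent_b, p):
    """Combine fields 0..p of parent_a with fields p+1.. of parent_b.

    Both parents are assumed to be equal-length lists (normalized vectors).
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 <= p < len(parent_a) - 1:
        raise ValueError("p must leave at least one field from each parent")
    return parent_a[: p + 1] + parent_b[p + 1 :]
```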

Regarding claim 15, Husain/Tomkins/Kim/Teera teaches The device of claim 11, where Husain teaches wherein the command causing the processor to acquire the first candidate solutions by performing the architecture on the plurality of candidate solutions selected from the candidate solution set comprises a command causing the processor to acquire the first candidate solutions by performing at least one architecture variation method among weight modification, interneuron connection removal, interneuron connection addition, neuron removal, and neuron addition (“As another example, the mutation operation may cause one or more activation functions, aggregation functions, bias values/functions, and/or or connection weights to be modified.” [¶0051; note: The claim under BRI only requires at least one of the methods, thus the Examiner has cited a portion corresponding to a weight modification. However, see ¶0006, ¶0020 for interneuron connection removal/neuron removal and ¶0051 for interneuron connection addition/neuron addition.]).

Regarding claim 16, Husain/Tomkins/Kim/Teera teaches The device of claim 11, wherein the command causing the processor to acquire the second candidate solution by performing selective error propagation-based supervised learning on the first neural network model 
However Husain fails to explicitly teach comprises a command causing the processor to set a pseudo reverse weight matrix to finely tune weight matrices.
Tomkins teaches comprises a command causing the processor to set a pseudo reverse weight matrix to finely tune weight matrices (“[0043] (a). choosing a specific training function from a plurality of training functions; 
[0044] (b). inputting the set of input signals to the input nodes of the neural network;
[0045] (c). computing the set of output responses by propagating the set of input signals from the input nodes to the output nodes via the plurality of weighted synaptic links; 
[0046] (d). accumulating the total error between the set of output responses and the set of target signals; 
[0047] (e). invoking the specific training algorithm to adjust the weight values of the weighted synaptic links to minimize the total error; 
[0048] (f). calculating the fitness score; the fitness score being related to the total error; 
[0049] (g). repeating steps (b), (c), (d), (e) and (f) for a predetermined number of iterations unless the fitness score is smaller than a pre-defined criterion.” [note: ¶0022-¶0023 and ¶0162 disclose a “weight matrix”; the cited process would correspond to finely tuning weight matrices.]).
Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s/Kim’s/Teera’s teachings by using weight matrices and matrices representing neural network structures as taught by Tomkins. One would have been motivated to make this modification in order to generate weight matrices to find the best performing neural network model. [¶0007, Tomkins]
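For illustration, steps (b) through (g) quoted above from Tomkins ¶0044-¶0049 can be sketched as an iterative loop that propagates inputs, accumulates the total error, and adjusts the weights until the error falls below a criterion. This is a sketch under assumptions: a single linear output node, a squared-error total, plain gradient descent as the "specific training algorithm", and the total error itself used as the fitness score are all choices made for this example and are not stated in the reference. The stopping check is also placed before the weight update, which is a simplification of the quoted step order.

```python
def train(weights, samples, learning_rate=0.1, max_iterations=100, tolerance=1e-6):
    """Adjust weights to reduce the total squared error over samples.

    samples: list of (inputs, target) pairs.
    Returns (weights, total_error); total_error is the value measured
    in the last iteration that was executed.
    """
    weights = list(weights)
    total_error = float("inf")
    for _ in range(max_iterations):
        total_error = 0.0
        gradient = [0.0] * len(weights)
        for inputs, target in samples:
            # (b)/(c) propagate the input signals to the output node
            output = sum(w * x for w, x in zip(weights, inputs))
            error = output - target
            # (d) accumulate the total error
            total_error += error * error
            for i, x in enumerate(inputs):
                gradient[i] += 2.0 * error * x
        # (f)/(g) stop once the score is below the pre-defined criterion
        if total_error < tolerance:
            break
        # (e) adjust the weight values to reduce the total error
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights, total_error
```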

Regarding claim 20, Husain/Tomkins/Kim/Teera teaches The device of claim 11, where Husain teaches wherein the command causing the processor to analyze the weight densities of the first neural network model and set the path of selective error propagation-based supervised learning (“The backpropagation trainer may utilize a portion, but not all of the input data set 102 to train the connection weights of the trainable model, thereby generating the trained model. For example, the portion of the input data set 102 may be input into the trainable model, which may in turn generate output data. The input data set 102 and the output data may be used to determine an error value, and the error value may be used to modify connection weights of the model, such as by using gradient descent or another function.” [¶0044; using the error value to modify connection weights would be equivalent to updating weights.]).
While Husain teaches updating weights on the basis of error difference values of the first neural network model extracted along the path of selective error propagation-based supervised learning, the reference does not provide details of matrix/matrices.
Tomkins teaches weight matrix densities (“Note: for the convolution weight function, there are no "real" synaptic link connections. Instead a matrix operation is performed between the activation matrix and the weight matrix. Here, the weight matrix is dependent upon the numbers of layers connecting to layer(i) and the number of nodes in layer(i)” [¶0162])
Husain teaches automated neural network generation using a genetic algorithm approach. Tomkins teaches a genetic algorithm approach. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Husain’s/Kim’s/Teera’s teachings by using weight matrices and matrices representing neural network structures as taught by Tomkins. One would have been motivated to make this modification in order to generate weight matrices to find the best performing neural network model. [¶0007, Tomkins]

Claims 8, 9, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Husain in view of Tomkins, Kim, and Teera and further in view of Paliwal et al. ("Assessing the contribution of variables in feed forward neural network", hereinafter "Paliwal").

Regarding claim 8, Husain/Tomkins/Kim/Teera teaches The method of claim 1, wherein the analyzing, by the processor, the weight matrix densities of the first neural network model and setting of the path of selective error propagation-based supervised learning 
However Husain/Tomkins/Kim/Teera fails to explicitly teach comprise analyzing, by the processor, the weight matrix densities by using an interquartile range.
Paliwal teaches comprise analyzing, by the processor, the weight matrix densities by using an interquartile range (“In the proposed method, the empirical distribution of the network connection weights of the neural network model is obtained by training the network a number of times (say t). Every time the training is carried out, the initial weight for training the network is chosen randomly. The average of interquartile range of each of the network weights from input node to hidden nodes is calculated for all hidden units for a given input node. For this reason, the method is referred to as interquartile range (IQR) method.” [pg. 3691, § 2.2 The proposed method, ¶1]).
Husain/Tomkins both disclose genetic algorithms used to find the best model. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. Paliwal teaches a method of training a neural network using multiple methods to determine the importance of input variables. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Husain/Tomkins/Kim/Teera to use an interquartile range to analyze the weight densities of a neural network as taught by Paliwal. One would have been motivated to make this combination in order to determine the importance of the corresponding input variables. [pg. 3691, § 2.1 Connection weight method - 2.2 The proposed method, ¶1, Paliwal]
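For illustration, the IQR method quoted above from Paliwal § 2.2 can be sketched as follows: for each input node, take the interquartile range of each input-to-hidden weight across the t training runs, then average those ranges over the hidden units. The function names and the nested-list layout `runs[t][i][h]` are assumptions of this sketch, and the quartile convention (`method="inclusive"`) is a choice made here because the quoted passage does not specify one. The sketch takes already-trained weights as input and does not perform the training itself.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of at least two numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def iqr_per_input(runs):
    """Average IQR of the input-to-hidden weights for each input node.

    runs[t][i][h] is the weight from input node i to hidden node h
    obtained in training run t (hypothetical layout).
    """
    n_inputs = len(runs[0])
    n_hidden = len(runs[0][0])
    scores = []
    for i in range(n_inputs):
        per_hidden = [
            iqr([run[i][h] for run in runs]) for h in range(n_hidden)
        ]
        scores.append(sum(per_hidden) / n_hidden)
    return scores
```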

Regarding claim 9, Husain/Tomkins/Kim/Teera teaches The method of claim 1, wherein the analyzing, by the processor, the weight matrix densities of the first neural network model and setting of the path of selective error propagation-based supervised learning 
However Husain/Tomkins/Kim/Teera fails to explicitly teach comprise analyzing, by the processor, the weight matrix densities by using an average or total sum of weights constituting weight matrices.
Paliwal teaches comprise analyzing, by the processor, the weight matrix densities by using an average or total sum of weights constituting weight matrices (“The connection weight method calculates the sum of product of raw weights of the connection from input node to hidden nodes with the connection from hidden node to output nodes for all input nodes. The larger the sum for a given input node, the more the importance of the corresponding input variable.” [pg. 3691, 2.1 Connection weight method, ¶1; note the claim under BRI recites “or” thus the examiner has provided a citation corresponding to using a total sum of weights. However, average of weights is cited on pg. 3691, § 2.2]).
Husain/Tomkins both disclose genetic algorithms used to find the best model. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. Paliwal teaches a method of training a neural network using multiple methods to determine the importance of input variables. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Husain/Tomkins/Kim/Teera to use an average or total sum of weights to analyze the weight densities of a neural network as taught by Paliwal. One would have been motivated to make this combination in order to determine the importance of the corresponding input variables. [pg. 3691, § 2.1 Connection weight method - 2.2 The proposed method, ¶1, Paliwal]
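For illustration, the connection weight method quoted above from Paliwal § 2.1 can be sketched as a sum, for each input node, of the products of its input-to-hidden weights with the corresponding hidden-to-output weights. The function name and the nested-list layouts `input_hidden[i][h]` and `hidden_output[h][o]` are assumptions of this sketch.

```python
def connection_weight_importance(input_hidden, hidden_output):
    """Sum of products of raw connection weights for each input node.

    input_hidden[i][h]: weight from input node i to hidden node h.
    hidden_output[h][o]: weight from hidden node h to output node o.
    Returns one score per input node.
    """
    scores = []
    for row in input_hidden:
        total = 0.0
        for h, w_ih in enumerate(row):
            total += sum(w_ih * w_ho for w_ho in hidden_output[h])
        scores.append(total)
    return scores
```

Per the quoted passage, a larger sum for a given input node indicates greater importance of the corresponding input variable.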

Regarding claim 18, Husain/Tomkins/Kim/Teera teaches The device of claim 11, wherein the command causing the processor to analyze the weight matrix densities of the first neural network model and set the path of selective error propagation-based supervised learning 
However Husain/Tomkins/Kim/Teera fails to explicitly teach comprises a command causing the processor to analyze the weight matrix densities by using an interquartile range.
Paliwal teaches comprises a command causing the processor to analyze the weight matrix densities by using an interquartile range (“In the proposed method, the empirical distribution of the network connection weights of the neural network model is obtained by training the network a number of times (say t). Every time the training is carried out, the initial weight for training the network is chosen randomly. The average of interquartile range of each of the network weights from input node to hidden nodes is calculated for all hidden units for a given input node. For this reason, the method is referred to as interquartile range (IQR) method.” [pg. 3691, § 2.2 The proposed method, ¶1]).
Husain/Tomkins both disclose genetic algorithms used to find the best model. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. Paliwal teaches a method of training a neural network using multiple methods to determine the importance of input variables. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Husain/Tomkins/Kim/Teera to use an interquartile range to analyze the weight densities of a neural network as taught by Paliwal. One would have been motivated to make this combination in order to determine the importance of the corresponding input variables. [pg. 3691, § 2.1 Connection weight method - 2.2 The proposed method, ¶1, Paliwal]

Regarding claim 19, Husain/Tomkins/Kim/Teera teaches The device of claim 11, wherein the command causing the processor to analyze the weight matrix densities of the first neural network model and setting of the path of selective error propagation-based supervised learning 
However Husain/Tomkins/Kim/Teera fails to explicitly teach comprises a command causing the processor to analyze the weight matrix densities by using an average or total sum of weights constituting weight matrices.
Paliwal teaches comprises a command causing the processor to analyze the weight matrix densities by using an average or total sum of weights constituting weight matrices (“The connection weight method calculates the sum of product of raw weights of the connection from input node to hidden nodes with the connection from hidden node to output nodes for all input nodes. The larger the sum for a given input node, the more the importance of the corresponding input variable.” [pg. 3691, 2.1 Connection weight method, ¶1; note the claim under BRI recites “or” thus the examiner has provided a citation corresponding to using a total sum of weights. However, average of weights is cited on pg. 3691, § 2.2]).
Husain/Tomkins both disclose genetic algorithms used to find the best model. Kim discloses a genetic algorithm using unsupervised learning. Teera teaches a method of training neural networks by finding branch points to exit inference early. Paliwal teaches a method of training a neural network using multiple methods to determine the importance of input variables. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Husain/Tomkins/Kim/Teera to use an average or total sum of weights to analyze the weight densities of a neural network as taught by Paliwal. One would have been motivated to make this combination in order to determine the importance of the corresponding input variables. [pg. 3691, § 2.1 Connection weight method - 2.2 The proposed method, ¶1, Paliwal]



Response to Arguments
Regarding the 35 U.S.C. § 101 Rejection:
Applicant’s arguments regarding the 101 rejection on pgs. 8-9 have been considered and are persuasive. The amendments to the claims appear to have overcome the 101 rejection, and thus the rejection has been withdrawn.

Regarding the 35 U.S.C. § 103 Rejections:
Applicant’s arguments regarding the newly amended limitations, in particular the limitations of “performing, by the processor, unsupervised learning on each of neural network models represented by respective first candidate solutions and calculating effective performances of the neural network models, identifying, by the processor, a layer having relatively high importance based on the weight matrix densities; and setting, by the processor, an identified layer as a branch point and setting a path of the selective error propagation-based supervised learning from the branch point,” have been considered but are moot because the amended limitations are now taught by the newly cited references Kim and Teera. Please see the updated 103 rejection above.

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122