DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Amendments
Claims 1-3, 11-14, 16, and 18-19 are amended. Claims 1-20 are pending and have been considered.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claim 2 is rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends for the following reasons. The use of the conjunction “or” in claim 1 is interpreted as shorthand for three claims in one: (1) determining computational costs of training, (2) determining computational costs of testing, and (3) determining computational costs of both training and testing. Claim 2 does not further limit the “determining” (i.e., Claim 2 does not require determining training costs and determining testing costs). Claim 2 only requires that the costs that could be determined include training costs and testing costs.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 11-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
CLAIM 11
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The claim recites the following limitations:
(1) perform an iterative model-growing process that involves modifying parent models to obtain child models, the iterative model-growing process comprising 
(2) selecting specific candidate layers from a plurality of candidate layers to include in the child models for subsequent training, 
(3) [selecting] the specific candidate layers from the plurality of candidate layers based at least on weights learned in an initialization process where the plurality of candidate layers are initialized when connected to the parent models; and 
(4) [selecting] the final model from the child models
These limitations are abstract ideas of the “mental process” grouping which can reasonably be performed in one’s mind with the aid of pencil and paper. Each limitation is a judgement. Regarding the second limitation, Applicant should note that selecting layers to include in models for training is not a positive recitation of training the models. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites the following additional elements:
a hardware processing unit
a storage resource 
computer-readable instructions
output a final model
A hardware processing unit, a storage resource, and computer-readable instructions are mere instructions to apply the judicial exceptions as discussed in MPEP 2106.05(f). Outputting a final model is mere data gathering, an insignificant extra-solution activity as discussed in MPEP 2106.05(g).
Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. Outputting a final model is a well-understood, routine, conventional activity of receiving or transmitting data over a network. See MPEP 2106.05(d)(II)(i):
The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network, e.g., using the Internet to gather data
The claim is not patent eligible.


CLAIM 12 incorporates the rejection of claim 11. 
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 11 are incorporated. The claim recites the following limitations:
(1) generate different candidate layers that share connectivity to a particular parent model and perform different operations; 
(2) initialize the different candidate layers together with the particular parent model to obtain different weights for the different candidate layers.
These limitations are abstract ideas of the “mental process” grouping which can be performed in one’s mind with the aid of pencil and paper. Specifically, the first limitation involves creating and connecting candidate layers, and the second limitation involves setting weights for the layers. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim recites no additional elements. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

CLAIM 13 incorporates the rejection of claim 11.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 11 are incorporated. The claim further limits the judicial exceptions of claim 11. Applicant should note that the phrase “on which the subsequent training is performed” is not a positive recitation of training a layer or model. 
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim recites no additional elements. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

CLAIM 14 incorporates the rejection of claim 11.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 11 are incorporated. The claim recites the following limitations:
(1) perform a feature selection technique on the weights of the plurality of candidate layers 
(2) select the specific candidate layers for inclusion in the child models for subsequent training.
The first limitation is an abstract idea of the “mathematical calculation” grouping. The second limitation is an abstract idea of the “mental process” grouping which can be performed in one’s mind with the aid of pencil and paper, specifically a judgement. Applicant should note that selecting layers for inclusion in models for subsequent training is not a positive recitation of training the models. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim recites no additional elements. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

CLAIM 15 incorporates the rejection of claim 14.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 14 are incorporated. The claim further limits the “mathematical calculation” judicial exception of claim 14. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim recites no additional elements. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

CLAIM 16 incorporates the rejection of claim 13.
Step 1: The claim recites a system, one of the four categories of eligible subject matter.
Step 2A Prong 1: The judicial exceptions of claim 13 are incorporated. The claim recites the following limitations:
(1) performing different operations including at least convolution operations and pooling operations.
This limitation is an abstract idea of the “mathematical calculation” grouping. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim recites no additional elements. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons given in Step 2A Prong 2. The claim is not patent eligible.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 11-13 and 17-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Prellberg et al. (“Lamarckian Evolution of Convolutional Neural Networks”).
	Regarding Claim 11, Prellberg teaches: A system comprising: 
a hardware processing unit; and (§4.3: Nvidia K40 GPU)
a storage resource storing computer-readable instructions which, when executed by the hardware processing unit, cause the hardware processing unit to: (Experiments in §4 are evidence of a storage resource storing instructions)
perform an iterative model-growing process (P. 4, §3.1 teaches “the network is trained for e epochs” [iterations] and P. 7, ln. 6 teaches e = 16) that involves modifying parent models to obtain child models, the iterative model-growing process comprising selecting specific candidate layers from a plurality of candidate layers to include in the child models (In P. 4, §3.1 ¶ 1, by selecting a child network, all the layers of the child network are necessarily selected as candidate layers. Weights are inherited from parent model, see P. 6, §3.3) 
for subsequent training, (P. 4, §3.1 ¶ 1: “the [child] network is trained for e epochs”)
the specific candidate layers being selected from the plurality of candidate layers based at least on weights (The broadest reasonable interpretation of this limitation is that the entire child network as taught by Prellberg is selected to replace the parent based at least in part on its weights. Its weights determine at least in part whether the network has the best fitness – P. 6, §3.3 states, “However, once a network has been evaluated its weights contain useful, learned values. When the mutation operator is applied, most of these weights are kept intact.”)
[weights] learned in an initialization process where the plurality of candidate layers are initialized when connected to the parent models; and (P. 6, §3.3, states, “However, once a network has been evaluated its weights contain useful, learned values. When the mutation operator is applied, most of these weights are kept intact [inherited]. The mutations add block, remove block and change stride do not influence existing weights so that all of them can be reused. The additional weights that belong to the convolutional and batch-normalization layers created by add block are randomly initialized. However, the mutations add filters, remove filters and change kernel size influence the shapes of some existing weights. For example, since the shape of a convolutional layer’s kernel depends on its input and output shape, adding filters to a layer also affects the successive layer. In such cases, the affected weights are randomly reinitialized, while all other weights are reused.”
Under the broadest reasonable interpretation of this limitation in light of the specification, Examiner interprets a weight initialization process where candidate layers are initialized when connected to the parent models as randomly initializing or randomly reinitializing weights according to Prellberg §3.3 and inheriting weights according to Prellberg §3.3. )
output a final model, the final model being selected from the child models. (P. 7, ln. 6 teaches 16 epochs, after which a final model is obtained. The child model is output at the end of Algorithm 1 on P. 5). 
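
For illustration only, the weight handling quoted from Prellberg §3.3 for the add block mutation (existing weights reused, weights of the added layers randomly initialized) can be sketched as follows. This is a minimal sketch under stated assumptions, not Prellberg's code: the function name add_block, the list-of-pairs representation of a network, the layer names, and the Gaussian initializer are all hypothetical.

```python
import random

def add_block(parent_weights, position, new_layer_name, num_weights, rng):
    """Return child weights after inserting one new layer at `position`.

    parent_weights is a list of (layer_name, weights) pairs in network order
    (a hypothetical representation). Existing layers keep their learned
    weights; only the new layer is randomly initialized.
    """
    # Randomly initialize the weights of the newly added layer (assumed Gaussian).
    new_layer = (new_layer_name, [rng.gauss(0.0, 0.1) for _ in range(num_weights)])
    # Copy the parent's weights so the parent is left unchanged.
    child = [(name, list(weights)) for name, weights in parent_weights]
    child.insert(position, new_layer)
    return child

rng = random.Random(0)
parent = [("conv1", [0.5, -0.2, 0.1]), ("conv2", [0.3, 0.7])]
child = add_block(parent, 1, "conv_new", 4, rng)
```

In this sketch the child holds the parent's weights for conv1 and conv2 unchanged, and only conv_new carries freshly initialized values. The reinitialization described in §3.3 for mutations that change weight shapes is not modeled here.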

	Regarding Claim 12, Prellberg teaches: The system of claim 11, wherein the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: 
generate different candidate layers that share connectivity to a particular parent model and perform different operations; and initialize the different candidate layers together with the particular parent model to obtain different weights for the different candidate layers. (Under BRI, this limitation encompasses sequentially adding a first candidate layer to a parent model, initializing the first candidate layer with the parent model, reverting the model to the original parent model, adding a different second candidate layer to the parent model, and initializing the second candidate layer with the parent model. §3.1 teaches that Algorithm 1 applies a random mutation to the parent to create a child network, and if the child’s fitness is worse and no better network is found during niching, then optimization proceeds with the original parent. §3.2 teaches that one of the possible random mutations is adding a building block containing a convolution layer. 
Regarding the limitation “perform different operations”, new convolution layers are initialized with random weights, so two different convolution layers will perform different computations, i.e., operations.
Examiner interprets a weight initialization process where candidate layers are initialized when connected to the parent models as randomly initializing or randomly reinitializing weights according to Prellberg §3.3 and inheriting weights according to Prellberg §3.3.)

	Regarding Claim 13, Prellberg teaches: The system of claim 11 wherein respective child models inherit structures of respective parent models and (P. 6, §3.3, states, “However, once a network has been evaluated its weights contain useful, learned values. When the mutation operator is applied, most of these weights are kept intact [inherited].”)
include respective specific candidate layers on which the subsequent training is performed. (§3.1 teaches, “Next, the child network’s fitness is evaluated. This means the network is trained for e epochs”)
Regarding claim 17, Prellberg teaches: The system of claim 11, wherein the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: 
train the final model using training data for at least one classification, machine translation, or pattern recognition task; and (P. 7, §4.2: the models are trained for image classification on CIFAR-10 and CIFAR-100 datasets.)
provide the final model for execution, the final model being adapted to perform the at least one classification, machine translation, or pattern recognition task. (P. 8, Fig. 2 is evidence of final models after e epochs.)

Regarding Claim 18, Prellberg teaches: A computer-readable storage medium storing instructions which, when executed by a processing device, cause the processing device to perform acts comprising: (§4.3 teaches Nvidia K40 GPU. Experiments in §4 are evidence of a storage resource storing instructions)
performing two or more iterations of an iterative model-growing process, the iterative model-growing process comprising: (P. 4, §3.1 teaches “the network is trained for e epochs” [iterations] and P. 7, ln. 6 teaches e = 16)
selecting a particular parent model from a parent model pool of one or more parent models; (P. 4, § 3.1, ¶ 1: “An initial network consisting of a single convolutional layer” is the parent model.)
initializing a plurality of candidate layers that are connected to the particular parent model; (§3.1 teaches that Algorithm 1 applies a random mutation to the parent to create a child network, and if the child’s fitness is worse and no better network is found during niching, then optimization proceeds with the original parent. Examiner interprets initializing a plurality of candidate layers as sequentially initializing a first candidate layer connected to the parent model, and as a result of the child’s fitness being worse and no better network found during niching, initializing a second candidate layer connected to the same parent model.)
selecting a subset of the plurality of candidate layers for subsequent training based at least on weights learned when initializing the plurality of candidate layers; (The Oxford English Dictionary defines subsequent as “Following or succeeding in time; existing or occurring after something expressed or implied; coming or happening later.” The claim does not preclude the “subsequent training” from being different from the training in the next limitation “training a plurality of child models…”. The “subsequent training” may happen when the next parent model, having been chosen from the current plurality of candidate layers, is training the next plurality of candidate layers. It then follows that at least one of the current plurality of candidate layers is selected as a subset.
§ 3.3 states, “The additional weights that belong to the convolutional and batch-normalization layers created by add block are randomly initialized.” Examiner broadly interprets the randomly initialized weights as learned weights. The randomly initialized weights determine if the candidate layer joins the next parent model.)
training a plurality of child models to obtain trained child models, respective child models inheriting a structure of the particular parent model and including at least one candidate layer from the selected subset of candidate layers; and (§3.1 teaches: “Next, the child network’s fitness is evaluated. This means the network is trained for e epochs”. The inherited structure is interpreted as the inherited weights from the parent model. P. 6, §3.3, states, “However, once a network has been evaluated its weights contain useful, learned values. When the mutation operator is applied, most of these weights are kept intact [inherited].” Finally, at least one candidate layer from the selected subset of candidate layers must necessarily be included because the subset contains the candidate layer to join the next parent model, as interpreted above.)
designating an individual trained child model as a new parent model based at least in part on one or more criteria and adding the new parent model to the parent model pool; and (§3.1 teaches: “Next, the child network’s fitness is evaluated. This means the network is trained for e epochs and the validation set accuracy is returned as its fitness. If the child’s fitness is greater than the parent’s fitness, the child replaces the parent”. The one or more criteria is interpreted as the validation set accuracy. The parent model pool is interpreted as consisting only of the new parent model.)
after the two or more iterations, selecting at least one trained child model as a final model and outputting the final model. (P. 7, ln. 6 teaches 16 epochs, after which a final model is obtained. The child model is output at the end of Algorithm 1 on P. 5)
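
For illustration only, the parent/child replacement rule quoted from Prellberg §3.1 (mutate the parent, evaluate the child's fitness, replace the parent if the child's fitness is greater) can be sketched as below. This is a minimal sketch, not Prellberg's Algorithm 1: niching is omitted, and hill_climb, toy_mutate, toy_fitness, and the steps parameter are hypothetical names. In particular, toy_fitness is a stand-in for training a network for e epochs and returning validation set accuracy.

```python
import random

def hill_climb(initial_model, mutate, evaluate_fitness, steps, rng):
    """Keep a single parent; a child replaces it only if its fitness is greater."""
    parent = initial_model
    parent_fitness = evaluate_fitness(parent)
    for _ in range(steps):
        child = mutate(parent, rng)               # random mutation of the parent
        child_fitness = evaluate_fitness(child)   # stand-in for train + validate
        if child_fitness > parent_fitness:
            parent, parent_fitness = child, child_fitness
    return parent, parent_fitness

def toy_mutate(num_blocks, rng):
    # Hypothetical mutation: add or remove one block, never dropping below one.
    return max(1, num_blocks + rng.choice([-1, 1]))

def toy_fitness(num_blocks):
    # Hypothetical fitness that peaks at five blocks; not a real accuracy measure.
    return 1.0 - 0.01 * (num_blocks - 5) ** 2

final_model, final_fitness = hill_climb(1, toy_mutate, toy_fitness, 50, random.Random(0))
```

Because a child is accepted only when its fitness is strictly greater, the fitness of the retained parent never decreases across steps in this sketch.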

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-5 and 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over Prellberg, in view of “ADONN: Adaptive Design of Optimized Deep Neural Networks for Embedded Systems” (Aug. 2018) to Loni et al., hereinafter Loni.

	Regarding Claim 1, Prellberg teaches: A method performed on a computing device (§4.3: Nvidia K40 GPU), the method comprising:
performing two or more iterations of an iterative model-growing process (P. 4, §3.1 teaches “the network is trained for e epochs” [an epoch implies multiple iterations] and P. 7, ln. 6 teaches e = 16), the iterative model-growing process comprising: 
selecting a particular parent model from a parent model pool of one or more parent models; (P. 4, § 3.1, ¶ 1: “An initial network consisting of a single convolutional layer” is the parent model. The parent model pool consists of the current parent model.)
generating a plurality of candidate layers and initializing the plurality of candidate layers while reusing learned parameters of the particular parent model; (P. 4, §3.1 ¶ 1: “First, a random mutation from the set of possible mutations is applied to the parent to create a child network”, where the child network is interpreted as a plurality of candidate layers.)
selecting particular candidate layers to include in child models for training, respective child models including the particular parent model and one or more of the particular candidate layers; (In P. 4, §3.1 ¶ 1, by selecting a child network, the layers of the child network are necessarily selected as candidate layers.) 
training the child models to obtain trained child models; (P. 4, §3.1 ¶ 1: “This means the network is trained for e epochs”) 
…
selecting an individual trained child model as a new parent model and adding the new parent model to the parent model pool; and (P. 4, §3.1 ¶ 1: “If the child’s fitness is greater than the parent’s fitness, the child replaces the parent.” The parent model pool consists of the current parent model.)
after the two or more iterations, selecting at least one trained child model as a final model and outputting the final model. (P. 7, ln. 6 teaches 16 epochs, after which a final model is obtained. The child model is output at the end of Algorithm 1 on P. 5)

	However, Prellberg does not explicitly teach: determining computational costs of training or testing the trained child models; and selecting a child model based at least on the computational costs of training or testing the trained child models,
	But Loni teaches: determining computational costs of training or testing the trained child models; and selecting a child model based at least on the computational costs of training or testing the trained child models, (Loni teaches a computational cost being a network size in the Abstract: “ADONN also considers the network size factor as the second objective to build a highly optimized network fitting with limited computational resource budgets while delivers comparable accuracy level.” Fig. 2 on p. 399 shows a Pareto frontier with the network size on the y-axis as an objective for selecting a child model. Algorithm 1 on p. 401 teaches determining computational cost of training and testing the child network as determining the size of the child network as shown by the function Objective_Function.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Loni’s system into Prellberg’s system by determining a network size of Prellberg’s trained child models during training and testing, with a motivation to lower the need for network bandwidth and increase privacy and power efficiency, as well as guaranteeing worst case response-times. (Abstract: “Instead, we intend to find a near-sensor processing solution which will lower the need for network bandwidth and increase privacy and power efficiency, as well as guaranteeing worst case response-times.”)

	Regarding Claim 2, the Prellberg/Loni combination teaches: The method of claim 1,
Further, Loni teaches: wherein the computational costs include training costs of training the trained child models and testing costs of testing the trained child models. (Loni teaches a computational cost being a network size in the Abstract: “ADONN also considers the network size factor as the second objective to build a highly optimized network fitting with limited computational resource budgets while delivers comparable accuracy level.” Fig. 2 on p. 399 shows a Pareto frontier with the network size on the y-axis as an objective for selecting a child model. Algorithm 1 on p. 401 teaches determining computational cost of training and testing the child network as determining the size of the child network as shown by the function Objective_Function.)

	Regarding Claim 3, the Prellberg/Loni combination teaches: The method of claim 2, 
	Further, Prellberg teaches: further comprising: determining losses associated with the trained child models; and (Prellberg p. 6, § 4.1, ¶ 2: “Choosing e is a trade-off between evaluation speed and the accuracy of the fitness assessment”. Examiner interprets accuracy as a measure of losses.)
selecting the individual trained child model as the new parent model and adding the new parent model to the parent model pool based at least on the losses. (Prellberg p. 4, §3.1 ¶ 1: “If the child’s fitness is greater than the parent’s fitness, the child replaces the parent.” The parent model pool consists of the current parent model.)

Regarding Claim 4, the Prellberg/Loni combination of Claim 3 teaches: The method of claim 3, 
However, the Prellberg/Loni combination so far does not explicitly teach: further comprising: plotting the child models on a graph having a first axis reflecting the computational costs and a second axis reflecting the losses; and selecting the new parent model based at least on a corresponding location of the new parent model on the graph.
	But Loni teaches: further comprising: plotting the child models on a graph having a first axis reflecting the computational costs and a second axis reflecting the losses; and (Loni on p. 399 shows in Fig. 2 (reproduced/annotated below) the network size on the y-axis and losses on the x-axis, where error = 1-accuracy is interpreted as losses.)

Loni, Fig. 2

selecting the new parent model based at least on a corresponding location of the new parent model on the graph. (Loni P. 399, col. 1, top: “Moreover, by doing crowding distance sorting, we can orchestrate the density of solution for each Pareto front. NSGA-II selects the best N candidates for generating the next population called Pt+1. This procedure is repeated for the next generations until exceeds a predefined maximum number of generations or satisfies developer’s criterion including a desired level of accuracy/network size”)
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Loni’s system into the combination of Prellberg and Loni’s system of Claim 3 by plotting computational cost against losses with a motivation of selecting the best candidates for generating the next population (Loni P. 399, col. 1, top).

Regarding Claim 5, the Prellberg/Loni combination of Claim 4 teaches: The method of claim 4
Further, Loni teaches: further comprising: determining at least one of a lower convex hull or a Pareto frontier on the graph; and (Loni Fig. 2 shows a Pareto frontier.)
selecting the new parent model based at least on proximity of the new parent model to the lower convex hull or the Pareto frontier. (Loni P. 399, col. 2, bottom: “(3) The NSGA-II sorts the combination… to find the next generation parent population of N acceptable individuals which cannot dominant each other in terms of accuracy and network size.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Loni’s system into the combination of Prellberg and Loni’s system of Claim 4 by selecting the new parent model based on proximity to the Pareto frontier with a motivation of improving accuracy (losses) and network size (cost) (Loni P. 399, col. 2, bottom).
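
For illustration only, selecting non-dominated models over the two objectives discussed above (network size as the cost, and error = 1 - accuracy as the loss) can be sketched as follows. This is a minimal sketch and not Loni's NSGA-II implementation: it computes only the first Pareto front and omits crowding distance sorting, and the function name pareto_front and the candidate values are hypothetical.

```python
def pareto_front(models):
    """Return names of models not dominated in (network size, error).

    models is a list of (name, network_size, accuracy) tuples; both
    objectives are minimized, with error computed as 1 - accuracy.
    """
    points = [(name, size, 1.0 - accuracy) for name, size, accuracy in models]
    front = []
    for name, size, error in points:
        # A model is dominated if another is no worse on both objectives
        # and strictly better on at least one.
        dominated = any(
            other_size <= size and other_error <= error
            and (other_size < size or other_error < error)
            for _, other_size, other_error in points
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical candidates: (name, network size, validation accuracy).
candidates = [("a", 100, 0.90), ("b", 200, 0.95), ("c", 250, 0.93), ("d", 50, 0.80)]
front = pareto_front(candidates)
```

With these hypothetical values, model c is dominated by model b (b is both smaller and more accurate), so the front consists of a, b, and d.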

Regarding Claim 7, the Prellberg/Loni combination teaches: The method of claim 1,
Further, Prellberg teaches: wherein generating an individual candidate layer comprises: selecting a target layer from the particular parent model to receive outputs of the individual candidate layer; (P. 5 ¶ 1 states that mutations are picked randomly from a list containing the “add block” choice which adds a convolution layer at a random position. The target layer comes after the added convolution layer.)
selecting one or more input layers from the particular parent model to provide inputs to the individual candidate layer; and (On P. 4, §3.1 ¶ 1, by default the first parent’s convolutional layer is selected as an input layer. Further, a layer preceding the randomly added block becomes an input layer.)
selecting a particular operation to be performed by the individual candidate layer on the inputs. (P. 5 teaches a list of mutation operators to apply to the child network (candidate layer).)

Regarding Claim 8, the Prellberg/Loni combination of Claim 7 teaches: The method of claim 7, 
Further, Prellberg teaches: the selecting the particular operation comprising: defining a group of operations; and (P. 5 teaches a list of mutation operators to apply to the child network (candidate layer).)
randomly selecting the particular operation from the group of operations. (P. 5 ¶ 1: “Mutations are picked randomly from the list below”.)

Regarding Claim 9, the Prellberg/Loni combination of Claim 7 teaches: The method of claim 7, 
Further, Prellberg teaches: further comprising: selecting the target layer and at least one input layer randomly from the particular parent model. (P. 5: Target and input layers are effectively selected at random.)
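As a rough illustration of the limitations of Claims 7 through 9, the random choice of a target layer, input layers, and an operation can be sketched as below. The operation names, the `CandidateLayer` type, and the rule that inputs precede the target are assumptions made for the example; they are not taken from Prellberg or from the claims.

```python
import random
from typing import List, NamedTuple


class CandidateLayer(NamedTuple):
    inputs: List[int]   # indices of parent layers feeding the candidate
    operation: str      # operation applied to the inputs
    target: int         # index of the parent layer receiving the output


# Hypothetical group of operations; the names are illustrative only.
OPERATIONS = ["conv3x3", "conv5x5", "max_pool", "identity"]


def sample_candidate(num_parent_layers: int, rng: random.Random) -> CandidateLayer:
    """Randomly pick a target layer, one or two earlier input layers, and an operation."""
    if num_parent_layers < 2:
        raise ValueError("need at least two parent layers")
    target = rng.randrange(1, num_parent_layers)      # target must have a predecessor
    num_inputs = rng.randint(1, min(2, target))       # one or two inputs (assumed limit)
    inputs = sorted(rng.sample(range(target), num_inputs))
    operation = rng.choice(OPERATIONS)                # random pick from the defined group
    return CandidateLayer(inputs, operation, target)


rng = random.Random(0)
candidate = sample_candidate(num_parent_layers=6, rng=rng)
```

Restricting the inputs to layers that precede the target keeps the resulting graph acyclic, which is one plausible design choice among several.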

Regarding claim 10, the Prellberg/Loni combination teaches: The method of claim 1, 
Further, Prellberg teaches: the final model being a neural network. (Algorithm 1 on p. 5 shows the algorithm outputs a final neural network)

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Prellberg, in view of Loni, and further in view of “MOGA: Multi-Objective Genetic Algorithms” (1995) to Murata et al., hereinafter Murata.

Regarding claim 6, the combination of Prellberg and Loni of Claim 5 teaches: The method of claim 5, 
Further, Loni teaches: wherein the selecting comprises: identifying a subset of the trained child models that are within a predetermined vicinity of the lower convex hull or the Pareto frontier; (Loni P. 399, col. 2, bottom: “(3) The NSGA-II sorts the combination… to find the next generation parent population of N acceptable individuals which cannot dominant each other in terms of accuracy and network size.” The subset of trained child models are on the Pareto frontier.)
However, the combination of Prellberg and Loni does not explicitly teach: determining respective probabilities for the subset of the trained child models; and selecting the new parent model based at least on the respective probabilities.
But Murata teaches: determining respective probabilities for the subset of the trained child models; and (Murata on p. 2, col. 2 step 2 teaches calculating selection probability P(x):

[Equation image (media_image2.png): Murata’s selection probability P(x).])


selecting the new parent model based at least on the respective probabilities. (Murata on P. 3, col. 1, step 7 teaches selecting the best solution:

[Image (media_image3.png): step 7 of Murata’s algorithm.])

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Murata’s system into the combination of Prellberg and Loni’s system by assigning a selection probability to a string with a motivation of finding the Pareto optimal solutions. (Murata P. 2, col. 1, top: “one general approach is to show the set of Pareto optimal solutions to the decision maker”)
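For illustration, probability-based parent selection of the kind discussed for Claim 6 can be sketched as follows. The equation cited from Murata appears only as an image in this action and is not reproduced here, so the sketch assumes a common shifted fitness-proportional form, in which each candidate's probability is its fitness minus the minimum fitness, normalized over the subset. The function names are hypothetical.

```python
import random
from typing import List


def selection_probabilities(fitness: List[float]) -> List[float]:
    """Shifted fitness-proportional probabilities: (f - f_min) / sum(f - f_min)."""
    f_min = min(fitness)
    shifted = [f - f_min for f in fitness]
    total = sum(shifted)
    if total == 0:                       # all candidates equally fit
        return [1.0 / len(fitness)] * len(fitness)
    return [s / total for s in shifted]


def select_parent(fitness: List[float], rng: random.Random) -> int:
    """Return the index of one candidate drawn according to its selection probability."""
    probs = selection_probabilities(fitness)
    return rng.choices(range(len(fitness)), weights=probs, k=1)[0]


# Hypothetical fitness values for three child models near the frontier.
probs = selection_probabilities([1.0, 2.0, 4.0])
parent_index = select_parent([1.0, 2.0, 4.0], random.Random(0))
```

Under this assumed form the least-fit candidate receives probability zero, so it is never drawn unless all candidates tie.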

Claims 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Prellberg in view of “Feature Selection with a Genetic Algorithm for Classification of Brain Imaging Data” (2017) to Szenkovits et al., hereinafter Szenkovits.

Regarding Claim 14, Prellberg teaches: The system of claim 11, wherein the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: perform a… technique on the weights of the plurality of candidate layers to select the specific candidate layers for inclusion in the child models for subsequent training. (In Prellberg § 3.3, the weights of different layers are interpreted as features. §3.1 teaches, “Next, the child network’s fitness is evaluated. This means the network is trained for e epochs”.)

However, Prellberg does not explicitly teach: feature selection
But Szenkovits teaches: feature selection (Szenkovits teaches feature selection by LASSO technique on P. 6, ¶ 3: “this method [LASSO] can be seen as a feature selection technique”)
LASSO for feature selection is in the same field of endeavor as the claimed invention (i.e., iterative and evolutionary algorithms). Szenkovits compares LASSO with an evolutionary algorithm for feature selection. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Szenkovits’ system into Prellberg’s system by selecting features in Prellberg’s system using the LASSO technique taught by Szenkovits, with a motivation of classifying images (Szenkovits pp. 6-7: “Indeed, it [LASSO] has been shown to be successful for classification tasks related to brain networks in cases where the number of features is 10 to 50 times larger than the number of instances.”)

Regarding Claim 15, the Prellberg/Szenkovits combination teaches: The system of claim 14,
Further, Szenkovits teaches: the feature selection technique comprising least absolute shrinkage and selection operator (LASSO). (Szenkovits teaches feature selection by LASSO technique on P. 6, ¶ 3: “this method [LASSO] can be seen as a feature selection technique”)
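To show why LASSO can be read as a feature selection technique, the following is a minimal sketch of LASSO solved by cyclic coordinate descent in plain Python. It is not code from Szenkovits or Prellberg. It assumes the objective (1/(2n))·||y − Xw||² + alpha·||w||₁, and the data, the `alpha` value, and the function names are hypothetical. Features whose coefficient is driven exactly to zero are treated as not selected.

```python
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; return 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X: List[List[float]], y: List[float], alpha: float, sweeps: int = 100) -> List[float]:
    """Minimize (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the residual that excludes feature j
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


def selected_features(w: List[float], tol: float = 1e-9) -> List[int]:
    """Indices of features with a non-zero coefficient."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]


# Hypothetical data: the target depends only on feature 0 (y = 2 * x0).
X = [[1.0, 0.5, -1.0], [2.0, -0.3, 0.4], [3.0, 0.8, 0.2], [4.0, -0.9, -0.6]]
y = [2.0, 4.0, 6.0, 8.0]
weights = lasso(X, y, alpha=0.1)
kept = selected_features(weights)
```

In the context of Claim 14, the columns of `X` would stand in for whatever per-layer quantities are treated as features; that mapping is an assumption of this sketch.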

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Prellberg, in view of Sun et al. (“Automatically Evolving CNN Architectures Based on Blocks”).

Regarding Claim 16, Prellberg teaches: The system of claim 13, individual candidate layers performing different operations including at least convolution operations (P. 4, Fig. 1 teaches candidate layers performing Conv2D)
Prellberg Fig. 1 shows a Global Average Pooling layer in the network. However, it is not one of the possible candidate layers in the add block mutation on p. 5. Thus Prellberg does not explicitly teach: individual candidate layers performing… pooling operations.
	But Sun teaches: individual candidate layers performing… pooling operations (Sun P. 4 §3.2: “The proposed encoding strategy aims at effectively modelling CNNs with different architectures by individuals in the used GA [genetic algorithm]. Typically, the architecture of a CNN is decided by multiple convolutional layers, pooling layers and fully-connected layers with a particular order, as well as their parameter settings. In the proposed algorithm, CNNs are constructed based on RBs, DBs and pooling layers”
Sun P. 4, col. 2, ¶ 2: “Accordingly, the proposed encoding strategy is based on three different types of units and their positions in the CNNs. The units are the RB Unit (RBU), the DB Unit (DBU) and the Pooling layer Unit (PU)… a PU is composed of only a single pooling layer.” 
Sun P. 5, § 3.4: “In the proposed algorithm, the available mutation types are: • Adding (adding an RBU, adding a DBU, or adding a PU [Pooling layer Unit] to the selected position);”)

Sun is in the same field of endeavor as the claimed invention, namely evolutionary neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Sun’s system into Prellberg’s system by including a mutation type add block containing a pooling layer and removing the Global Average Pooling layer. A motivation is to reduce the need for a base network which was manually designed based on expertise (“For example, EAS takes effect on a base network which already has fairly good performance on the investigated problem. However, the base network is manually designed based on expertise. Block-QNN-S only designs several small networks, and these networks are then integrated into a larger CNN framework. However, the other types of layers, such as the pooling layers, need to be assigned into the CNN framework with expertise.”)

Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Prellberg in view of “Evolving Deep Neural Networks” (Mar. 2017) to Miikkulainen et al., hereinafter Miikkulainen.

	Regarding Claim 19, Prellberg teaches: The computer-readable storage medium of claim 18, the acts further comprising:
	Prellberg teaches: while stabilizing other weights of the particular parent model. (P. 6, §3.3, states that the parent model’s weights are “kept intact” (“However, once a network has been evaluated its weights contain useful, learned values. When the mutation operator is applied, most of these weights are kept intact.”))

However, Prellberg does not explicitly teach: concurrently initializing the plurality of candidate layers to obtain the weights
But Miikkulainen teaches: concurrently initializing the plurality of candidate layers to obtain the weights (Miikkulainen on P. 3, col. 1, bottom teaches: “two populations of modules and blueprints are evolved separately, using the same methods as described above for DeepNEAT”. Miikkulainen on P. 4, §4.2, ¶2 teaches initializing “uniformly random initial connection weights within [-0.05, 0.05].”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Miikkulainen’s system into Prellberg’s system by creating a plurality of layers and initializing their weights, with a motivation to automatically design deep learning neural networks. (Miikkulainen P. 1, col. 2, ¶ 3: “This paper develops an approach for automatic design of DNNs.”)
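As an illustration of the combination discussed for Claim 19, the sketch below adds several candidate layers in one pass, drawing their weights uniformly from [-0.05, 0.05] (the range quoted from Miikkulainen) while copying the parent's weights unchanged (the "kept intact" behavior quoted from Prellberg). The dictionary representation, layer names, and function name are hypothetical, and random initialization is used here only as a stand-in for whatever initialization the claims require.

```python
import random
from typing import Dict, List


def add_candidate_layers(
    parent_weights: Dict[str, List[float]],
    candidate_sizes: Dict[str, int],
    rng: random.Random,
    limit: float = 0.05,
) -> Dict[str, List[float]]:
    """Copy parent weights unchanged and draw new candidate weights from U[-limit, limit]."""
    child = {name: list(values) for name, values in parent_weights.items()}
    for name, size in candidate_sizes.items():
        if name in child:
            raise ValueError(f"layer name already used by parent: {name}")
        child[name] = [rng.uniform(-limit, limit) for _ in range(size)]
    return child


# Hypothetical parent with two layers and two candidate layers to add.
parent = {"conv1": [0.3, -0.7], "dense": [1.2]}
child = add_candidate_layers(parent, {"cand_a": 3, "cand_b": 2}, random.Random(0))
```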

	Regarding Claim 20, the Prellberg/Miikkulainen combination teaches: The computer-readable storage medium of claim 19, 
Prellberg teaches: the acts further comprising: randomly selecting operations from a group of enumerated operations to include in the plurality of candidate layers. (Prellberg P. 5 ¶ 1: “Mutations are picked randomly from the list below”.)


Response to Arguments
Claim Rejections under 35 U.S.C. § 101
Applicant's arguments filed 04/29/2021 have been fully considered but they are not persuasive. Applicant argues in Remarks pp. 9-11 that (1) in the context of revised Step 2A, an additional element may have integrated the exception into a practical application if the additional element reflects an improvement in the functioning of a computer, or an improvement to other technology or technical field; (2) claim 11 has been amended to recite “subsequent training”; and (3) Applicant’s invention saves space needed for storing child models.
Examiner will respond to Applicant’s arguments in order. (1) The additional element of a computer in Claim 11 is mere instructions to apply the exception as discussed in MPEP 2106.05(f). (2) Selecting layers to include in models for training is not a positive recitation of training the models. (3) Claim 11 is not an improvement to the functioning of a computer. The claim is directed to improving an iterative model-growing process which is an abstract idea, and improving an abstract idea does not make a claim eligible under 35 U.S.C. 101.

Claim Rejections under 35 U.S.C. §§ 102 and 103
Applicant’s arguments with respect to claim 1 have been considered but they are unpersuasive. The prior art of record Loni et al. teaches determining the size of a network during the training and the testing of a child network and selecting a child model based on the size of the network. Therefore, Loni teaches the Claim 1 amendment not taught by Prellberg et al.
Regarding Claim 3, the amendment has been rejected over Prellberg using the same citation as the second-to-last limitation in Claim 1.
Applicant's arguments with respect to claim 18 have been fully considered but they are not persuasive. Applicant argues that Prellberg’s mutation operations are selected randomly and that it appears that all of Prellberg’s mutations are fully trained, rather than selecting a subset of mutations for subsequent training based on weights learned when initializing the mutations. 
Examiner response to claim 18 arguments: The English Oxford Dictionary defines subsequent as “Following or succeeding in time; existing or occurring after something expressed or implied; coming or happening later.” The claim does not preclude the “subsequent training” from being different from the training in the next limitation “training a plurality of child models…”. The “subsequent training” may happen when the next parent model, having been chosen from the current plurality of candidate layers, is training the next plurality of candidate layers. It then follows that at least one of the current plurality of candidate layers is selected as a subset.
Prellberg § 3.3 states, “The additional weights that belong to the convolutional and batch-normalization layers created by add block are randomly initialized.” Examiner broadly interprets the randomly initialized weights as learned weights. The randomly initialized weights determine if the candidate layer joins the next parent model. The rejections of Claims 11-13 and 17-18 are maintained.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Bolukbasi et al. (“Adaptive Neural Networks for Fast Test-Time Prediction”) teaches, on p. 3 col. 1-2, testing cost by the decision function $\gamma_k$ that determines whether the example should exit the network early with a label of $\hat{y}_k(x)$ or proceed to the next layer for further evaluation. The decision function $\gamma_k$ is minimized in Equation 2.
Justus et al. (“Predicting the Computational Cost of Deep Learning Models”) predicts computational costs.
Wong et al. (US 20180018555 A1) teaches a process for iteratively generating a neural network model which accounts for a desired size of the model. 

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher Jablon whose telephone number is (571)270-7648.  The examiner can normally be reached on Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ASHER JABLON/Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122