DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/30/2021 has been entered.

Amendments
Per Applicant’s request, claims 1-2, 11-13, and 18-19 are amended. Claim 21 is new. Claims 1-21 are pending and have been considered.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


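Claims 1-10 and 18-21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.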
	Claim 1, lines 11-12 recite “learned parameters of the particular parent model”. The claim does not disclose any training of the parent model. For purposes of examination, Examiner interprets this limitation to mean the parent model has been trained.
	Claim 1, line 3 and the second-to-last line recite “two or more iterations”. Since the claim recites both training iterations and an iterative model-growing process, it is unclear to Examiner what constitutes an iteration. For purposes of examination, Examiner interprets any iteration that modifies the model as an iteration of model growing.
	Claims 2-10 are rejected for failing to cure the deficiencies of claim 1 upon which they depend. 
	Claim 18, line 4 and the second-to-last line recite “two or more iterations”. Since the claim recites both training iterations and an iterative model-growing process, it is unclear to Examiner what constitutes an iteration. For purposes of examination, Examiner interprets any iteration that modifies the model as an iteration of model growing.
	Claims 19-21 are rejected for failing to cure the deficiencies of claim 18 upon which they depend. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
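The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.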


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-5, 7-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi (US 20180365557 A1) in view of Merity et al. (US 20180336453 A1) and Prellberg et al. (“Lamarckian Evolution of Convolutional Neural Networks” arXiv version 1, see IDS filed 06/02/2020).

Regarding CLAIM 1, Kobayashi teaches: A method performed on a computing device, the method comprising: 
performing two or more iterations of an iterative model-growing process, the iterative model-growing process comprising: (Para. [0078], lines 1-6 teaches that networks may be generated from a seed network or a Pareto optimal network. Since the Pareto optimal solution had been generated in a previous iteration (para. [0075], lines 6-10), at least two iterations must necessarily be performed.)
selecting a particular parent model from a parent model pool of one or more parent models; (¶ [0085], lines 4-end; ¶ [0113], lines 4-end)
generating a plurality of candidate layers and… (¶ [0095] teaches generating new layers such as Relu activation function. ¶ [0058]-[0059] and Fig. 1 teach generating activation functions Tanh and Abs. ¶ [0128], ¶ [0131] and Figs. 10A and 10B teach generating Conv, Pool, Abs, Relu, and Dropout.)
selecting less than all of the plurality of candidate layers as particular candidate layers to include in child models for training, respective child models including the particular parent model and one or more of the particular candidate layers; (Kobayashi teaches several candidate layer types including:
Tanh and Abs in ¶ [0058]-[0059] and Fig. 1
Conv, Pool in ¶ [0128] and Fig. 10A
Conv, Abs, Relu, and Dropout in ¶ [0131] and Fig. 10B
Fig. 6 and ¶ [0102], lines 1-4 teaches the generating unit may perform multiple layer insertions at the same time. Since it is possible for a network to include the plurality of candidate layers (i.e., Tanh, Abs, Conv, Pool, Relu, and Dropout), each mutated network in Figs. 1, 10A, and 10B is a network with less than all of the plurality of candidate layers.)
training the child models using one or more subsequent training iterations to obtain trained child models; (¶ [0079]-[0080] discloses training the generated neural networks.)
determining computational costs of training or testing the trained child models; and (¶ [0109] discloses determining a calculation amount as the number of multiply-add operations; Fig. 9 displays error vs. computational cost for both training and validation for various networks. Fig. 9 is discussed starting at ¶ [0117].)
based at least on the computational costs of training or testing the trained child models, selecting an individual trained child model as a new parent model and adding the new parent model to the parent model pool; and (¶ [0078], lines 1-6 teaches that child networks may be generated from a Pareto optimal parent network; ¶ [0109]; ¶ [0113], lines 4-8;  ¶ [0118], last 3 lines teach Pareto optimal solutions are based on computational costs of testing.)
after the two or more iterations, selecting at least one trained child model as a final model and outputting the final model. (¶ [0124] - [0125])
	However, Kobayashi does not explicitly teach: 
		initializing the plurality of candidate layers using one or more training iterations while the plurality of candidate layers are connected to the particular parent model, the plurality of candidate layers being initialized while reusing learned parameters of the particular parent model; 
	But Merity teaches initializing the plurality of candidate layers using one or more training iterations while the plurality of candidate layers are connected to the particular parent model, (Fig. 5 and ¶ [0039]-[0043] discloses an iterative process for growing an RNN. ¶ [0034], lines 5-9 disclose initializing weights required by a node; weights are further taught in claim 7 and ¶ [0024], line 4 “Wx+b” where W is a weight vector. ¶ [0046], lines 1-5 teach that child models may replace the source node in the growing process. Training occurs in Fig. 4, step 430 and ¶ [0038], lines 1-4.) 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Merity’s architecture generation system into Kobayashi’s network mutation system. A motivation for the combination is that the candidate architecture (¶ [0041], last 4 lines)
	However, neither Kobayashi nor Merity explicitly teaches: the plurality of candidate layers being initialized while reusing learned parameters of the particular parent model;
	But Prellberg teaches: the plurality of candidate layers being initialized while reusing learned parameters of the particular parent model; (Abstract and p.6, § 3.3)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have initialized Merity’s weights randomly while keeping weights from the parent model intact because the learned weights contain useful values (See Prellberg, p. 6, § 3.3, lines 4-6). A motivation for the combination is that initializing weights of new layers randomly is useful for training weights via stochastic gradient descent.
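For illustration only, the combined teaching relied upon above (new layer weights initialized randomly while the parent model's learned weights are kept intact) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any cited reference: the function `make_child`, the dictionary representation of weights, the layer names, and the Gaussian initialization scale are all hypothetical.

```python
import random


def make_child(parent_weights, new_layer_name, new_layer_size, seed=0):
    """Build child weights from a trained parent (hypothetical helper).

    Parent entries are copied unchanged, so learned values are reused.
    Only the newly added layer receives random initial values.
    """
    rng = random.Random(seed)
    # Copy every parent layer so the parent itself is never modified.
    child = {name: list(values) for name, values in parent_weights.items()}
    # Assumed initialization: small zero-mean Gaussian values.
    child[new_layer_name] = [rng.gauss(0.0, 0.01) for _ in range(new_layer_size)]
    return child


# Hypothetical trained parent with two layers.
parent = {"conv1": [0.5, -0.2], "fc": [1.0, 0.3, -0.7]}
child = make_child(parent, "new_relu_branch", 4)
```

In this sketch, subsequent training iterations would adjust the new layer starting from its random values, while the inherited entries start from the parent's learned values rather than from scratch.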

	Regarding CLAIM 2, the combination of Kobayashi, Merity, and Prellberg teaches: The method of claim 1, 
	Kobayashi teaches: wherein the determining the computational costs includes determining training costs of training the trained child models and determining testing costs of testing the trained child models. (¶ [0109] discloses determining a calculation amount as the number of multiply-add operations; Fig. 9 displays error vs. computational cost for both training and validation for various networks. Fig. 9 is discussed starting at ¶ [0117].)
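For illustration only, a calculation amount expressed as a number of multiply-add operations can be sketched as below. This is a minimal sketch, not Kobayashi's implementation: the helper names and the example layer sizes are hypothetical, bias terms are ignored, and the convolution is assumed to produce a 28 x 28 output map.

```python
def dense_macs(in_features, out_features):
    """Multiply-adds for a fully connected layer (bias ignored)."""
    return in_features * out_features


def conv2d_macs(out_h, out_w, in_ch, out_ch, k_h, k_w):
    """Multiply-adds for a 2D convolution: one k_h x k_w x in_ch dot
    product per output position and per output channel (bias ignored)."""
    return out_h * out_w * out_ch * in_ch * k_h * k_w


# Hypothetical small network: one 3x3 convolution followed by one dense layer.
conv_cost = conv2d_macs(28, 28, in_ch=1, out_ch=8, k_h=3, k_w=3)
dense_cost = dense_macs(28 * 28 * 8, 10)
total_cost = conv_cost + dense_cost
```

A count of this kind depends only on the network structure, so it can be compared across candidate networks on a cost axis.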

	Regarding CLAIM 3, the combination of Kobayashi, Merity, and Prellberg teaches: The method of claim 2, 
further comprising: determining losses associated with the trained child models; and (Fig. 9 displays error vs. computational cost for both training and validation for various networks. Fig. 9 is discussed starting at ¶ [0117].)
selecting the individual trained child model as the new parent model and adding the new parent model to the parent model pool based at least on the losses. (¶ [0078], lines 1-6 teaches that child networks may be generated from a Pareto optimal parent network; ¶ [0119] discloses three Pareto optimal networks.)

Regarding CLAIM 4, the combination of Kobayashi, Merity, and Prellberg teaches: The method of claim 3, 
Kobayashi teaches: further comprising: plotting the child models on a graph having a first axis reflecting the computational costs and a second axis reflecting the losses; and (¶ [0118] and Fig. 9 shows computational cost on the horizontal axis and error on the vertical axis.)
selecting the new parent model based at least on a corresponding location of the new parent model on the graph. (¶ [0078], lines 1-6 teaches that child networks may be generated from a Pareto optimal parent network; ¶ [0119] discloses three Pareto optimal networks.)

Regarding CLAIM 5, the combination of Kobayashi, Merity, and Prellberg teaches: The method of claim 4, 
Kobayashi teaches: further comprising: determining at least one of a lower convex hull or a Pareto frontier on the graph; and (¶ [0110], lines 6-7 discloses PL is a boundary of a Pareto optimal solution in Fig. 8A.  ¶ [0112] teaches the Pareto frontier as seen in Fig. 8B.)
selecting the new parent model based at least on proximity of the new parent model to the lower convex hull or the Pareto frontier. (¶ [0113], lines 4-8)
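For illustration only, determining a Pareto frontier over (computational cost, loss) pairs can be sketched as below. This is a minimal sketch, not the method of any cited reference: the function name `pareto_frontier` and the sample points are hypothetical, and lower values are assumed to be better on both axes.

```python
def pareto_frontier(points):
    """Return the (cost, loss) points that no other point dominates.

    Point p dominates q when p is no worse on both axes and strictly
    better on at least one (lower is better on both axes).
    """
    def dominates(p, q):
        return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])

    return sorted(q for q in points if not any(dominates(p, q) for p in points))


# Hypothetical trained child models as (cost, loss) pairs.
models = [(1.0, 0.9), (2.0, 0.5), (2.5, 0.6), (4.0, 0.2), (4.5, 0.3)]
frontier = pareto_frontier(models)
```

Here the points (2.5, 0.6) and (4.5, 0.3) are dominated and drop out, leaving three points on the frontier from which a new parent could be drawn.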

	Regarding CLAIM 7, the combination of Kobayashi, Merity, and Prellberg teaches: The method of claim 1, 
Kobayashi teaches: wherein generating an individual candidate layer comprises: selecting a target layer from the particular parent model to receive outputs of the individual candidate layer; selecting one or more input layers from the particular parent model to provide inputs to the individual candidate layer; and selecting a particular operation to be performed by the individual candidate layer on the inputs. (All limitations are taught by ¶ [0095], ¶ [0131], and Fig. 1, network MN2 and Fig. 10B)

	Regarding CLAIM 8, the combination of Kobayashi, Merity, and Prellberg teaches: The method of claim 7, 
However, Kobayashi does not teach: the selecting the particular operation comprising: defining a group of operations; and randomly selecting the particular operation from the group of operations.
	But Merity teaches: the selecting the particular operation comprising: defining a group of operations; and (¶ [0024]-[0026])
randomly selecting the particular operation from the group of operations. (¶ [0041], lines 3-7)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have defined operations and randomly selected from those operations as taught by Merity. A motivation for combining random architecture generation with Kobayashi’s mutations is that mutations are usually random.

Regarding CLAIM 9, the combination of Kobayashi, Merity, and Prellberg teaches: The method of claim 7, 
further comprising: selecting the target layer and at least one input layer randomly from the particular parent model. (¶ [0094] – [0095]. The BRI of this limitation is that a candidate layer is inserted into a random position.)

	Regarding CLAIM 10, the combination of Kobayashi, Merity, and Prellberg teaches: The method of claim 1, 
Kobayashi teaches: the final model being a neural network. (¶ [0119], line 1 and ¶ [0120], line 1.)

Regarding CLAIM 11, Kobayashi teaches: A system comprising:
a hardware processing unit; and (¶ [0204])
a storage resource storing computer-readable instructions which, when executed by the hardware processing unit, cause the hardware processing unit to: (¶ [0205])
perform an iterative model-growing process comprising: (Para. [0078], lines 1-6 teaches that networks may be generated from a seed network or a Pareto optimal network. Since the Pareto optimal solution had been generated in a previous iteration (para. [0075], lines 6-10), at least two iterations must necessarily be performed.)
selecting a particular parent model from a parent model pool of one or more parent models; (¶ [0085], lines 4-8; ¶ [0113], lines 4-end)
selecting, from a plurality of candidate layers, specific candidate layers to include in child models for subsequent training, the specific candidate layers including less than all of the plurality of candidate layers (Kobayashi teaches several candidate layer types including:
Tanh and Abs in ¶ [0058]-[0059] and Fig. 1
Conv, Pool in ¶ [0128] and Fig. 10A
Conv, Abs, Relu, and Dropout in ¶ [0131] and Fig. 10B
Fig. 6 and ¶ [0102], lines 1-4 teaches the generating unit may perform multiple layer insertions at the same time. Since it is possible for a network to include the plurality of candidate layers (i.e., Tanh, Abs, Conv, Pool, Relu, and Dropout), each mutated network in Figs. 1, 10A, and 10B is a network with less than all of the plurality of candidate layers.)
training a plurality of child models in one or more subsequent training iterations to obtain trained child models, respective child models inheriting a structure of the particular parent model and including at least one of the specific candidate layers selected from the plurality of candidate layers; and (¶ [0079]-[0080] discloses training the generated neural networks. ¶ [0095] teaches the generating unit inserting a layer into a parent model; ¶ [0102], lines 1-4 teaches the generating unit may perform multiple layer insertions at the same time. The network in Fig. 10B (see ¶ [0131]) has multiple layers inserted: Abs1, Relu1, and Dropout.)
output a final model, the final model being selected from the plurality of child models. (¶ [0124] - [0125])
	However, Kobayashi does not explicitly teach: specific candidate layers being selected from the plurality of candidate layers based at least on weights learned in an initialization process where the plurality of candidate layers are initialized via one or more training iterations when connected to the particular parent model; and
	But Merity teaches: an initialization process where the plurality of candidate layers are initialized via one or more training iterations when connected to the particular parent model; and (Fig. 5 and ¶ [0039]-[0043] discloses an iterative process for growing an RNN. ¶ [0034], lines 5-9 disclose initializing weights required by a node; weights are further taught in claim 7 and ¶ [0024], line 4 “Wx+b” where W is a weight vector. ¶ [0046], lines 1-5 teach that child models may replace the source node in the growing process. Training occurs in Fig. 4, step 430 and ¶ [0038], lines 1-4.)
(¶ [0041], last 4 lines)
	However, neither Kobayashi nor Merity explicitly teaches: specific candidate layers being selected from the plurality of candidate layers based at least on weights learned in an initialization process
	But Prellberg teaches: specific candidate layers being selected from the plurality of candidate layers based at least on weights learned in an initialization process (p. 6, § 3.3)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have initialized Merity’s weights randomly while keeping weights from the parent model intact because the learned weights contain useful values (See Prellberg, p. 6, § 3.3, lines 4-6). A motivation for the combination is that initializing weights of new layers randomly is useful for training weights via stochastic gradient descent.

Regarding CLAIM 12, the combination of Kobayashi, Merity, and Prellberg teaches: The system of claim 11, 
However, Kobayashi does not explicitly teach: wherein the plurality of candidate layers share connectivity to the particular parent model and perform different operations.
	But Merity teaches: wherein the plurality of candidate layers share connectivity to the particular parent model and perform different operations. (¶ [0039]-[0041] and Fig. 5 disclose the candidate layers share connectivity to a parent model, where a parent model is the node $h_t$. Operators are disclosed in ¶ [0024]-[0026].)
(¶ [0041], last 4 lines)

	Regarding CLAIM 13, the combination of Kobayashi, Merity, and Prellberg teaches: The system of claim 11, 
Kobayashi teaches: wherein the iterative model-growing process comprises: adding individual child models to the parent model pool. (¶ [0078], lines 1-6 teaches that child networks may be generated from a Pareto optimal parent network)

Regarding CLAIM 14, the combination of Kobayashi, Merity, and Prellberg teaches: The system of claim 11, 
However, neither Kobayashi nor Merity explicitly teaches: wherein the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: perform a feature selection technique on the weights of the plurality of candidate layers to select the specific candidate layers for inclusion in the child models for subsequent training. 
But Prellberg teaches: wherein the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: perform a feature selection technique on the weights of the plurality of candidate layers to select the specific candidate layers for inclusion in the child models for subsequent training.  (The BRI of this limitation is selecting a candidate layer based at least in part on weights of the layer. The middle of p. 5 teaches different operations in the search space, each with weights associated with them.) 
(Prellberg, Abstract, lines 3-5)

Regarding CLAIM 16, the combination of Kobayashi, Merity, and Prellberg teaches: The system of claim 13, 
Kobayashi teaches: individual candidate layers performing different operations including at least convolution operations and pooling operations. (¶ [0128]-[0129] and Fig. 10A)

Regarding CLAIM 17, the combination of Kobayashi, Merity, and Prellberg teaches: The system of claim 11,
Kobayashi teaches: wherein the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: 
train the final model using training data for at least one classification, machine translation, or pattern recognition task; and (¶ [0133]-[0134] teaches training data for handwritten number recognition task, which is both classification and pattern recognition.)
provide the final model for execution, the final model being adapted to perform the at least one classification, machine translation, or pattern recognition task. (¶ [0124] - [0125])

	Regarding CLAIM 18, Kobayashi teaches: A computer-readable storage medium storing instructions which, when executed by a processing device, cause the processing device to perform acts comprising: (¶ [0205]) 
performing two or more iterations of an iterative model-growing process, the iterative model-growing process comprising: (Para. [0078], lines 1-6 teaches that networks may be generated from a seed network or a Pareto optimal network. Since the Pareto optimal solution had been generated in a previous iteration (para. [0075], lines 6-10), at least two iterations must necessarily be performed.)
selecting a particular parent model from a parent model pool of one or more parent models; (¶ [0085], lines 4-end; ¶ [0113], lines 4-end)
…
selecting a subset of less than all of the plurality of candidate layers for subsequent training based at least on weights learned when initializing the plurality of candidate layers; (Kobayashi teaches several candidate layer types including:
Tanh and Abs in ¶ [0058]-[0059] and Fig. 1
Conv, Pool in ¶ [0128] and Fig. 10A
Conv, Abs, Relu, and Dropout in ¶ [0131] and Fig. 10B
Fig. 6 and ¶ [0102], lines 1-4 teaches the generating unit may perform multiple layer insertions at the same time. Since it is possible for a network to include the plurality of candidate layers (i.e., Tanh, Abs, Conv, Pool, Relu, and Dropout), each mutated network in Figs. 1, 10A, and 10B is a network with less than all of the plurality of candidate layers.)
training a plurality of child models using one or more subsequent training iterations to obtain trained child models, respective child models inheriting a structure of the particular parent model and including at least one candidate layer from the selected subset of less than all of the plurality of candidate layers; and (¶ [0079]-[0080] discloses training the generated neural networks. ¶ [0095] teaches the generating unit inserting a layer into a parent model; ¶ [0102], lines 1-4 teaches the generating unit may perform multiple layer insertions at the same time. The network in Fig. 10B (see ¶ [0131]) has multiple layers inserted: Abs1, Relu1, and Dropout.)
designating an individual trained child model as a new parent model based at least in part on one or more criteria and adding the new parent model to the parent model pool; and (¶ [0078], lines 1-6 teaches that child networks may be generated from a Pareto optimal parent network. Regarding “one or more criteria”, ¶ [0118], last 3 lines teach Pareto optimal solutions are based on computational costs of testing.)
after the two or more iterations, selecting at least one trained child model as a final model and outputting the final model. (¶ [0124] - [0125])
Application No.: 16/213,470; Attorney Docket No.: 405549-US-NP
	However, Kobayashi does not explicitly teach:
connecting a plurality of candidate layers to the particular parent model; 
initializing the plurality of candidate layers via one or more initial training iterations while the plurality of candidate layers are connected to the particular parent model; 
	But Merity teaches:
connecting a plurality of candidate layers to the particular parent model; (¶ [0046], lines 1-5 teach that child models may replace the source node in the iterative model growing process disclosed in ¶ [0039]-[0043].)
initializing the plurality of candidate layers via one or more initial training iterations while the plurality of candidate layers are connected to the particular parent model; (Fig. 5 and ¶ [0039]-[0043] discloses an iterative process for growing an RNN. ¶ [0034], lines 5-9 disclose initializing weights required by a node; weights are further taught in claim 7 and ¶ [0024], line 4 “Wx+b” where W is a weight vector. ¶ [0046], lines 1-5 teach that child models may replace the source node in the growing process. Training occurs in Fig. 4, step 430 and ¶ [0038], lines 1-4.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Merity’s candidate architecture generator 310 into (¶ [0041], last 4 lines)
	However, neither Kobayashi nor Merity explicitly teaches: initializing the plurality of candidate layers
	But Prellberg teaches: initializing the plurality of candidate layers (Abstract and p.6, § 3.3)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have initialized Merity’s weights randomly while keeping weights from the parent model intact because the learned weights contain useful values (See Prellberg, p. 6, § 3.3, lines 4-6). A motivation for the combination is that initializing weights of new layers randomly is useful for training weights via stochastic gradient descent.

Regarding CLAIM 19, the combination of Kobayashi, Merity, and Prellberg teaches: The computer-readable storage medium of claim 18, 
However, Kobayashi does not explicitly teach: the acts further comprising: stabilizing other weights of the particular parent model when initializing the plurality of candidate layers while the plurality of candidate layers are connected to the particular parent model.
But Merity teaches: while the plurality of candidate layers are connected to the particular parent model. (Merity ¶ [0039]-[0043] discloses an iterative process for growing an RNN. Candidate layers are connected to the RNN and initialized via reinforcement learning by an architecture generator neural network (¶ [0042], lines 5-6 and ¶ [0043], lines 1-5). This happens during training iterations for the architecture generator neural network (¶ [0038], lines 4-end).)
However, neither Kobayashi nor Merity explicitly teaches: the acts further comprising: stabilizing other weights of the particular parent model when initializing the plurality of candidate layers
But Prellberg teaches: the acts further comprising: stabilizing other weights of the particular parent model when initializing the plurality of candidate layers (p. 6, § 3.3, lines 4-6)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have kept weights of the parent model intact, as taught by Prellberg. A motivation for the combination is that the parent model’s weights contain useful, learned values. (Prellberg, p. 6, § 3.3, lines 4-6)

Regarding CLAIM 20, the combination of Kobayashi, Merity, and Prellberg teaches: The computer-readable storage medium of claim 19, 
Kobayashi teaches: the acts further comprising: randomly selecting operations from a group of enumerated operations to include in the plurality of candidate layers. (¶ [0094] – [0095].)
	Although Kobayashi discloses operations in the networks in Figs. 1, 10A, and 10B, Kobayashi does not explicitly teach: enumerated operations
	But Merity teaches: enumerated operations (The BRI of this limitation includes unary, binary, and ternary operators in ¶ [0024], lines 1-2. Examples of such operators are given in ¶ [0024] – [0026].)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have enumerated Kobayashi’s operations as taught by Merity. A motivation for the combination is that a list of mutations must be available to the system before a random mutation is selected.
	Prellberg also teaches enumerated operations at the bottom of p. 4 and the middle of p. 5.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi (US 20180365557 A1) in view of Merity et al. (US 20180336453 A1), Prellberg et al. (“Lamarckian Evolution of Convolutional Neural Networks” arXiv version 1, see IDS filed 06/02/2020), and Murata.

Regarding CLAIM 6, the combination of Kobayashi, Merity, and Prellberg teaches: The method of claim 5, 
Kobayashi teaches: wherein the selecting comprises: identifying a subset of the trained child models that are within a predetermined vicinity of the lower convex hull or the Pareto frontier; (¶ [0113], lines 4-8, where a “predetermined vicinity” is lying on top of the Pareto frontier itself)
selecting the new parent model… (¶ [0078], lines 1-6 teaches that child networks may be generated from a Pareto optimal parent network)
However, neither Kobayashi, Merity, nor Prellberg explicitly teaches: determining respective probabilities for the subset of the trained child models; and 
selecting the new parent model based at least on the respective probabilities.
But Murata teaches: determining respective probabilities for the subset of the trained child models; and (P. 290, col. 2, Step 2 teaches calculating a selection probability.)
selecting the new parent model based at least on the respective probabilities. (P. 291, col. 1, Step 7)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have calculated a selection probability for Kobayashi’s optimal networks. A motivation for the combination is to find Pareto optimal solutions. (Murata p. 289, col. 2, last 4 lines to p. 290, col. 1, line 3)
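For illustration only, determining respective probabilities for a subset of models and selecting a parent based on those probabilities can be sketched as below. This is a generic fitness-proportional (roulette-wheel) sketch and is not Murata's formula: the function names, the candidate names, and the scores are hypothetical, and scores are assumed to be non-negative with higher meaning more desirable.

```python
import random


def selection_probabilities(scores):
    """Normalize non-negative scores so they sum to 1.

    Falls back to a uniform distribution when every score is zero.
    """
    total = sum(scores)
    if total <= 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def pick_parent(candidates, scores, rng):
    """Draw one candidate with probability proportional to its score."""
    probs = selection_probabilities(scores)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical subset of trained child models near the frontier.
candidates = ["child_a", "child_b", "child_c"]
scores = [3.0, 1.0, 0.0]
probs = selection_probabilities(scores)
chosen = pick_parent(candidates, scores, random.Random(0))
```

With these assumed scores the probabilities are 0.75, 0.25, and 0.0, so `child_c` can never be drawn while `child_a` is drawn most often.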

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi (US 20180365557 A1) in view of Merity et al. (US 20180336453 A1), Prellberg et al. (“Lamarckian Evolution of Convolutional Neural Networks” arXiv version 1, see IDS filed 06/02/2020), and Szenkovits.

Regarding CLAIM 15, the combination of Kobayashi, Merity, and Prellberg teaches: The system of claim 14, 
	However, neither Kobayashi, Merity, nor Prellberg explicitly teaches: the feature selection technique comprising least absolute shrinkage and selection operator (LASSO).
	But Szenkovits teaches: the feature selection technique comprising least absolute shrinkage and selection operator (LASSO). (P. 6, between the 6th and 3rd lines above equation (3).)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have used LASSO on Kobayashi’s candidate layers for crossover, with a motivation to classify images. (Szenkovits, p. 6, last 3 lines to p. 7, lines 1-2)
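For illustration only, the sparsity-inducing step that underlies LASSO can be sketched as below. The L1 penalty shrinks small coefficients exactly to zero via soft-thresholding, which for an orthonormal design gives the LASSO estimate from the least-squares coefficients. This is a minimal sketch and not the procedure of Szenkovits or of the application: the function names, the per-candidate scalar weights, and the penalty value `lam` are hypothetical.

```python
def soft_threshold(value, lam):
    """Proximal operator of the L1 penalty: shrink toward zero by lam
    and return exactly 0.0 when the magnitude does not exceed lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def select_candidates(weights, lam):
    """Keep the candidates whose shrunken weight is nonzero."""
    shrunk = {name: soft_threshold(w, lam) for name, w in weights.items()}
    return [name for name, w in shrunk.items() if w != 0.0]


# Hypothetical learned weight per candidate layer after initialization.
candidate_weights = {"conv3x3": 0.8, "pool": -0.05, "relu": 0.3, "dropout": 0.1}
selected = select_candidates(candidate_weights, lam=0.1)
```

Under these assumed values only `conv3x3` and `relu` survive the shrinkage, giving a subset of less than all of the candidates.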

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi (US 20180365557 A1) in view of Merity et al. (US 20180336453 A1), Prellberg et al. (“Lamarckian Evolution of Convolutional Neural Networks” arXiv version 1, see IDS filed 06/02/2020), and Sivaraman et al. (US 20200342324 A1).

Regarding CLAIM 21, the combination of Kobayashi, Merity, and Prellberg teaches: The computer-readable storage medium of claim 18, 
However, Kobayashi does not explicitly teach: wherein the plurality of candidate layers are connected to specific layers of the particular parent model during the one or more initial training iterations, and the subset of less than all of the plurality of candidate layers remain connected to the same specific layers of the particular parent model during the one or more subsequent training iterations.
But Merity teaches: wherein the plurality of candidate layers are connected to specific layers of the particular parent model during the one or more initial training iterations, and the subset of layers remain connected during the one or more subsequent training iterations. (¶ [0039]-[0043] disclose an iterative process for growing an RNN. Candidate layers are connected to the RNN and initialized via reinforcement learning by an architecture generator neural network (¶ [0042], lines 5-6 and ¶ [0043], lines 1-5). This happens during training iterations for the architecture generator neural network (¶ [0038], lines 4-end).)
However, neither Kobayashi, Merity, nor Prellberg explicitly teaches: the subset of less than all of the plurality of candidate layers remain connected to the same specific layers of the particular parent model
But Sivaraman teaches: the subset of less than all of the plurality of candidate layers remain connected to the same specific layers of the particular parent model (¶ [0007], lines 7-13)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have pruned Kobayashi’s parent network. A motivation for the combination is that pruning the network increases inference speed and reduces computational requirements. (Sivaraman, all of ¶ [0006])

Response to Arguments
Examiner herein responds to the Examiner Interview held 11/04/2021, the Advisory Action filed 11/18/2021, and Applicant’s remarks and claim amendments filed 11/30/2021.

Claim Rejections Under 35 U.S.C. 112(d) (Remarks p. 9): The previous rejection of claim 2 under 35 U.S.C. 112(d) is withdrawn due to the amendment of claim 2.

Claim Rejections Under 35 U.S.C. 101 (Remarks pp. 9-10): Applicant’s arguments with respect to the rejections of claims 11-16 under 35 U.S.C. 101 have been fully considered and are persuasive.  The rejections of claims 11-16 have been withdrawn. 

Claim Rejections Under 35 U.S.C. 102 and 103 (Remarks pp. 10-12): Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Yosinski et al. (“How transferable are features in deep neural networks?”) teaches transferring layers from one model to another model, randomizing some of the layers, and retraining the full model.
Roy et al. (“Tree-CNN: A Hierarchical Deep Convolutional Neural Network for Incremental Learning”) teaches adding new nodes to a neural network during the training process. See Abstract and p. 3, § 3. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ASHER H. JABLON/Examiner, Art Unit 2127


/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127