Detailed Action
This action is in response to Applicant's communications filed 13 October 2021.
Claims 1, 2, 4, 15, 16, 18, and 20 were amended.  No claims were cancelled or withdrawn.  Claims 1-5, 9-10, 14-18, and 20-24 are pending in this Application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments/Arguments
Applicant's amendments, filed 13 October 2021, regarding the rejections of claims 2, 4, 16, and 18 under 35 USC 112(b) have been fully considered and are sufficient to overcome the rejections.  Accordingly, the rejections of the claims under 35 USC 112(b) have been withdrawn.
Applicant’s arguments, filed 13 October 2021, with respect to the rejections of claims 1-5, 9-10, 14-18, and 20-24 under 35 USC 103 are directed to the newly amended claims and are addressed in the current rejection.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-5, 9-10, 14-18, and 20-24 are rejected under 35 U.S.C. 103 as being unpatentable over Rodvold (U.S. Pub. No. 2002/0059154) in view of Harvey (The Microbial Genetic Algorithm), Yao et al. (Evolving Artificial Neural Network Ensembles, hereinafter "Yao"), Chen et al. (Net2Net: Accelerating Learning Via Knowledge Transfer, hereinafter "Chen"), and Ooi et al. (SINGA: A Distributed Deep Learning Platform, hereinafter "Ooi").

Regarding Claim 1,
Rodvold teaches a method comprising receiving training data for training a neural network to perform a machine learning task, the training data comprising a plurality of training examples and a respective target output for each of the training examples ("Training pairs (consisting of an input vector and an output vector)" [0007]);
providing a population repository comprising a current population of compact representations, wherein each compact representation in the current population encodes a different candidate neural network architecture for performing the machine learning task ("In particular, the chromosomes span the entire connectivity space, and allow the representation of any architecture. The fitness function is based on the performance of the ANN corresponding to a given chromosomal pattern, with modifications to encourage spurious input rejection and architecture minimization." [0023]; alternatively, see [0051]-[0053] for methods of storing unique architectures so that CPU time is used efficiently on unique architectures);
("The fitness function is based on the performance of the ANN corresponding to a given chromosomal pattern, with modifications to encourage spurious input rejection and architecture minimization. Finally, the dynamics of the GA are designed to effectively span the search space and quickly approach the optimal architecture." [0023]), comprising:
selecting, by the worker computing unit, a pair of compact representations from the current population of compact representations in the population repository ("After two individuals are selected for reproduction, the offspring are determined via a process called crossover." [0016]),
determining, by the worker computing unit, a measure of fitness of the trained new neural network ("The concept of fitness is central to GAs, and one of the most challenging and important tasks associated with implementing a genetic algorithm. In order to determine which individuals pass their genetic information on to subsequent generations, each individual is assessed with a fitness function that defines a numeric value for its desirability. The individuals are then ranked according to their fitness, and the fittest individuals are most likely to reproduce." [0015]; "The primary contributor to the fitness assessment is the accuracy of the neural network that corresponds to the chromosome. Thus the first step in determining fitness is to exercise the ANN training module (described above) to find the accuracy of the ANN architecture for the chromosome being assessed. The accuracy of the neural network can be any of the common performance measures used by ANNs, such as RMS (root-mean-square) error, mean absolute error, ROC (receiver-operator characteristic) curve area, number of correct cases, or any other appropriate metric." [0044]), and 
adding, by the worker computing unit, the new compact representation to the current population in the population repository and associating the new compact representation with the measure of fitness ("The true power of genetic algorithms lies in the evolution of the population. Individuals within a population combine to form new members, and the “fittest” members are the most likely to become “parents” of new members." [0015]; "These new individual then replace their “parents” in the population." [0016]; "In order to determine which individuals pass their genetic information on to subsequent generations, each individual is assessed with a fitness function that defines a numeric value for its desirability." [0015]); and
 selecting, as the optimized neural network architecture, the neural network architecture that is encoded by the compact representation that is associated with a best measure of fitness ("When the GA finishes running, the fittest individual in the population represents the (near) optimal solution." [0018]; Figure 5, Report fittest ANN to user); and
 determining trained values of parameters of a neural network having the optimized neural network architecture ("there are a large number of high-quality training algorithms available for MLP ANNs, and, for the most part, any will work well here. This method has been successfully tested with the venerable “Backpropagation of Errors” training algorithm, but other methods, such as “Conjugate Gradient Descent,” “Levenberg-Marquardt,” or genetic/evolutionary algorithms will also be effective." [0035]).

Rodvold does not explicitly teach identifying a compact representation of the pair of compact representations that is associated with a best fitness, and generating, by the worker computing unit, a new compact representation from the identified compact representation.
Rodvold also does not explicitly teach wherein generating the new compact representation comprises: selecting a mutation from a predetermined set of mutations, determining a plurality of valid locations in the identified compact representation, randomly selecting one of the plurality of valid locations, and applying the selected mutation to the identified compact representation to generate the new compact representation.
Rodvold also does not explicitly teach instantiating a new neural network having an architecture encoded by the new compact representation; initializing values of parameters of the new neural network using values of parameters of a trained neural network having an architecture encoded by the identified compact representation; and training the new neural network starting from the initialized values of the parameters to generate a trained new neural network.
Rodvold also does not explicitly teach repeatedly performing the following operations using each of a plurality of worker computing units configured to operate independently of one another to update the population repository, each worker computing unit operating asynchronously from each other worker computing unit.


Harvey teaches identifying a compact representation of the pair of compact representations that is associated with a best fitness ("Tournament Selection... For each birth/death cycle, generate one new offspring with random parentage... it can similarly work with an asexual GA through picking a single parent at random.  A single individual must be culled to be replaced by the new individual; by picking two at random, and culling the Loser, or least fit of the two, we have the requisite selection pressure." sec. 2.4, pp. 129-130),
generating, by the worker computing unit, a new compact representation from the identified compact representation ("Tournament Selection... For each birth/death cycle, generate one new offspring with random parentage... it can similarly work with an asexual GA through picking a single parent at random.  A single individual must be culled to be replaced by the new individual; by picking two at random, and culling the Loser, or least fit of the two, we have the requisite selection pressure." sec. 2.4, pp. 129-130; of the two randomly selected parents, the non-Loser, i.e., the fitter of the pair, is not culled and goes on to reproduce asexually, teaching generating a new compact representation from the identified compact representation in the pair of compact representations).
Rodvold and Harvey are analogous art because both are directed towards programming genetic algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method for optimizing neural network architectures using genetic algorithms of Rodvold with the genetic algorithms of Harvey.  The modification would have been obvious because one of ordinary skill in the art would 

Yao teaches wherein generating the new compact representation comprises: selecting a mutation from a predetermined set of mutations ("EPNet uses five mutation operators to evolve ANN weights as well as architectures" sec. 2.1, p. 33), determining a plurality of valid locations in the identified compact representation ("architectural mutations (i.e., node/connection deletion/addition)" sec. 2.1, p. 33; Figure 1, Connection/Node Addition; nodes are components in the neural network that are valid locations), randomly selecting one of the plurality of valid locations ("Node deletion in EPNet is done totally at random, i.e., a node is selected uniformly at random for deletion." sec. 2.1, p. 33), and applying the selected mutation to the identified compact representation to generate the new compact representation (Figure 1, Mutations, Obtain the New Generation).
Rodvold and Yao are analogous art because both are directed to applying mutations to neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network optimization method of the Rodvold/Harvey combination with the mutation methods of Yao.  The modification would have been obvious because one of ordinary skill in the art would be motivated to increase the effectiveness and efficiency of adaptation in neural networks, as suggested by Yao (Yao: p. 31).

Chen teaches instantiating a new neural network having an architecture encoded by the new compact representation; initializing values of parameters of the new neural network using values of parameters of a trained neural network having an architecture encoded by the identified compact representation (Figure 1, Net2Net Workflow, Reuse the Model, p. 2; "Net2Net reuses information from an already trained model to speed up the training of a new model" sec. 1, p. 2); and
training the new neural network starting from the initialized values of the parameters to generate a trained new neural network (Figure 1, Net2Net Workflow, Training, p. 2; "Instead of training each considered design of model for as much as a month, the experimenter can use Net2Net to train the model for a shorter period of time beginning from the function learned by the previous best model." sec. 1, pp. 1-2).
Rodvold and Chen are analogous art because they are both directed to optimizing neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network optimization method of the Rodvold/Harvey/Yao combination with the accelerated learning via knowledge transfer of Chen.  The modification would have been obvious because one of ordinary skill in the art would be motivated to decrease training time of models, as suggested by Chen ("Instead of training each considered design of model for as much as a month, the experimenter can use Net2Net to train the model for a shorter period of time beginning from the function learned by the previous best model." sec. 1, pp. 1-2).

Ooi teaches repeatedly performing the following operations using each of a plurality of worker computing units ("For each SGD iteration, every worker calls the TrainOneBatch function to compute gradients of parameters associated with local layers (i.e., layers dispatched to it)." sec. 3.2, p. 687) configured to operate independently of one another to update the population repository ("worker groups run asynchronously." sec. 4.1, p. 687), each worker computing unit operating asynchronously from each other worker computing unit ("SINGA supports various synchronous and asynchronous training frameworks. Users can change the cluster topology configuration to run different frameworks. Here we illustrate how users can train with SINGA using popular distributed training frameworks." sec. 4.3, p. 687).
Rodvold and Ooi are analogous art because both are directed to updating neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the neural network optimization method of the Rodvold/Harvey/Yao/Chen combination with the distributed updating of neural networks of Ooi.  The modification would have been obvious because one of ordinary skill in the art would be motivated to improve runtime performance, as suggested by Ooi (sec. 1, p. 685).

Regarding Claim 2,
The Rodvold/Harvey/Yao/Chen/Ooi combination teaches the method of claim 1.  Rodvold further teaches wherein training the new neural network starting from the initialized values of the parameters comprises training the new neural network on a ("Once the training is complete, testing is performed. When creating the database of training pairs, some of the data are withheld from the system. Usually at least ten percent of the available data are set aside to run through the trained network, testing the system's ability to correctly assess cases that it has not trained on." [0008]), and 
wherein determining a measure of fitness of the trained new neural network having the architecture encoded by the new compact representation comprises: determining the measure of fitness by evaluating a performance of the trained new neural network on a validation subset of the training data ("The primary contributor to the fitness assessment is the accuracy of the neural network that corresponds to the chromosome. Thus the first step in determining fitness is to exercise the ANN training module (described above) to find the accuracy of the ANN architecture for the chromosome being assessed. The accuracy of the neural network can be any of the common performance measures used by ANNs, such as RMS (root-mean-square) error, mean absolute error, ROC (receiver-operator characteristic) curve area, number of correct cases, or any other appropriate metric." [0044]).

Regarding Claim 3,
The Rodvold/Harvey/Yao/Chen/Ooi combination teaches the method of claim 2.  Rodvold further teaches the operations further comprising: associating the trained values of the parameters of the new neural network with the new compact representation in the population repository (Figure 5, CalculateFitness of the Population: For each individual (chromosome): 1. Train the corresponding ANN to determine performance; this teaches that the trained values are associated with the compact representation in the population repository).

Regarding Claim 4,
The Rodvold/Harvey/Yao/Chen/Ooi combination teaches the method of claim 3.  Rodvold further teaches wherein determining the trained values of parameters of the neural network having the optimized neural network architecture comprises: selecting, as the trained values of the parameters of the neural network having the optimized neural network architecture, trained values that are associated with the compact representation that is associated with the best measure of fitness (Figure 5: Report fittest ANN to user, CalculateFitness of the Population: For each individual (chromosome): 1. Train the corresponding ANN to determine performance; this teaches selecting the optimal neural network architecture, which includes the trained values).

Regarding Claim 5,
The Rodvold/Harvey/Yao/Chen/Ooi combination teaches the method of claim 1.  Rodvold further teaches initializing the population repository with one or more default compact representations that encode default neural network architectures for performing the machine learning task ("The initial network configuration (number of hidden neuron layers, number of hidden neurons in each layer, activation function, training rate, error tolerance, etc.) is chosen by the system designer. There are no set rules to determine these network parameters, and trial and error based on experience seems to be the best way to do this currently. Some commercial programs use optimization techniques such as simulated annealing to find good network architectures. The synaptic weights are initially randomized, so that the system initially consists of “white noise.”" [0006]).

Regarding Claim 9,
The Rodvold/Harvey/Yao/Chen/Ooi combination teaches the method of claim 1.  Rodvold further teaches wherein generating the new compact representation comprises: randomly selecting a mutation from the predetermined set of mutations; and applying the randomly selected mutation to the remaining compact representation to generate the new compact representation ("A population can experience change from a source other than reproduction. In particular, spontaneous mutations can occur at a pre-selected probability. If a mutation is determined to have occurred, a new individual is created from an existing individual with one binary position reversed." [0018]; "A zero value indicates that the connection should not exist in the corresponding ANN, while a value of one indicates that the connection should exist." [0038]; this teaches mutations of adding or subtracting connections, wherein the random selection depends on which bit is selected and the value of the bit).

Regarding Claim 10,
The Rodvold/Harvey/Yao/Chen/Ooi combination teaches the method of claim 1.  Rodvold further teaches wherein generating the new compact representation comprises: processing the remaining compact representation using a mutation neural (the broadest reasonable interpretation of these limitations in light of the specification is an algorithm that performs mutations to a neural network; Figure 5: For each non-elite member of new generation, perform selection crossover, and mutation; "The most critical component of this invention is genetic algorithm, which controls the optimization of the ANN." [0031]; "If a mutation is determined to have occurred, a new individual is created from an existing individual with one binary position reversed." [0018]).

Regarding Claim 14,
The Rodvold/Harvey/Yao/Chen/Ooi combination teaches the method of claim 1.  Rodvold further teaches using the neural network having the optimized neural network architecture to process new input examples in accordance with the trained values of the parameters of the neural network ("If the testing pairs are assessed with success similar to the training pairs, and if this performance is sufficient, the network is ready for actual use." [0008]; actual use teaches using the trained neural network to process new inputs).

Regarding Claims 15-18 and 21-24,
Claims 15-18 and 21-24 recite a system comprising computers (Rodvold: [0005]) performing operations corresponding to the method steps recited in claims 1-5, 9-10, and 14, respectively.  The Rodvold/Harvey/Yao/Chen/Ooi combination teaches the 

Regarding Claim 20,
Claim 20 recites a non-transitory computer storage media with instructions executed by a computer (Rodvold: [0005]) to perform operations corresponding to the method steps recited in claim 1.  The Rodvold/Harvey/Yao/Chen/Ooi combination teaches the limitations of claim 20 as set forth above in connection with claim 1.  Therefore, claim 20 is rejected under the same rationale as claim 1.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
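A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.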
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571)270-7477.  The examiner can normally be reached on M-F: 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHARLES C KUO/Examiner, Art Unit 2126
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126