DETAILED ACTION
Claims 1-5, 7-9, 11-15, and 17-19 have been examined.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-5, 7-9, 11-15, and 17-19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Claims 1 and 11 recite “identifying one or more candidate neural networks forming part of the current Pareto-optimal front as the at least one suitable neural network.” This limitation is not clearly understood because claims 1 and 11 recite a current Pareto-optimal front composed of at least two performance characteristics of one or more previous candidate neural networks, and updating the current Pareto-optimal front to include the at least two performance characteristics. The Pareto-optimal front is thus recited as containing performance characteristics, not candidate neural networks. It is therefore not clear which candidate neural networks are identified from the current Pareto-optimal front.
Appropriate corrections are required. 
Any claim not specifically addressed, above, is being rejected as incorporating the deficiencies of a claim upon which it depends.
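
For illustration only, the ambiguity noted above can be sketched as follows. This is a minimal sketch, assuming a hypothetical representation in which each tested network is reduced to a pair of performance characteristics (error, complexity). The names `front_of_characteristics`, `tested`, `networks_matching`, and the network labels are assumptions made for this example and do not appear in the claims or the specification.

```python
# Hypothetical illustration: a front that stores only performance
# characteristics carries no record of which network produced them.
front_of_characteristics = {(0.10, 50), (0.08, 120)}  # (error, complexity)

# Two distinct (hypothetical) networks may share identical characteristics.
tested = {
    "net_A": (0.10, 50),
    "net_B": (0.10, 50),
    "net_C": (0.08, 120),
}


def networks_matching(front, tested_networks):
    """Return, in sorted order, every network whose characteristics are in the front."""
    return sorted(name for name, chars in tested_networks.items() if chars in front)


# A separate association between networks and characteristics is needed
# before any network can be identified from the front.
identified = networks_matching(front_of_characteristics, tested)
```

In this sketch the front alone cannot distinguish `net_A` from `net_B`; the identification step depends on the separate `tested` mapping, which is the kind of association the claim language does not recite.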

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 4, 9, 11, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Drevo (US 20160132787), in view of Goh, et al., "Hybrid Multiobjective Evolutionary Design for Artificial Neural Networks" (hereinafter Goh), further in view of Cruz-Ramirez et al., “Selecting the Best Artificial Neural Network Model from a Multi-Objective Differential Evolution Pareto Front” (hereinafter Cruz).

As per claim 1, Drevo teaches a method for identifying at least one neural network suitable for a given application, comprising: (Abstract; The system uses a hybrid optimization technique to select between multiple machine learning approaches for a given dataset);
iteratively executing a sequence (Paragraph 0115; The correlate-and-train processing of blocks 404-410 is repeated until) that includes: 
selecting a candidate set of neural network parameters associated with a candidate neural network (Paragraph 0034; modeling methodology include DNN. Paragraph 0035; the term “model parameters” refer to the possible settings or choices for a given modeling methodology. Paragraph 0086; A CPT is abstraction that compactly expresses every parameter, hyperparameter and design choice, in general, for a modeling methodology);
predicting at least two performance characteristics of the candidate neural network (Paragraph 0141; At block 510, for each parameterization p_j, the performance y_j is estimated using the GP model to get μ_yj and σ_yj, where μ_yj is the maximum a posteriori value for y_j and σ_yj expresses the confidence in the prediction. Paragraph 0143; With EI, the parameterization is selected using both the average performance predicted by the GP model and also the confidence in its prediction);
	when an end condition associated with the sequence is reached, identifying one or more candidate neural networks as the at least one suitable neural network (Paragraph 0115; if the termination criteria is reached, the highest performing model k* is returned).
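
For illustration only, the iterative sequence mapped above (select a parameterization, predict its performance with a mean and a confidence value, test it, and return the highest-performing model once the end condition is reached) can be sketched as follows. This is a minimal sketch under stated assumptions: `predict` is a simple nearest-neighbour stand-in and is NOT the Gaussian process model of Drevo, the selection rule (mean plus uncertainty) is not Drevo's expected-improvement function, and `train_and_test`, `search`, and all numeric values are hypothetical.

```python
import random


def predict(params, history):
    """Stand-in surrogate (NOT a Gaussian process): returns (mean, uncertainty)
    taken from the nearest previously tested parameter value."""
    if not history:
        return 0.0, 1.0
    nearest = min(history, key=lambda h: abs(h[0] - params))
    return nearest[1], abs(nearest[0] - params)


def train_and_test(params):
    """Hypothetical objective: the score peaks at params == 0.3."""
    return 1.0 - (params - 0.3) ** 2


def search(max_iterations=20, seed=0):
    rng = random.Random(seed)
    history = []  # (params, actual score) for every tested candidate
    for _ in range(max_iterations):  # end condition: iteration limit
        candidates = [rng.random() for _ in range(10)]
        # Select the candidate with the best optimistic prediction
        # (predicted mean plus predicted uncertainty).
        chosen = max(candidates, key=lambda p: sum(predict(p, history)))
        history.append((chosen, train_and_test(chosen)))
    # Return the highest-performing tested candidate.
    return max(history, key=lambda h: h[1])


best_params, best_score = search()
```

The loop returns a single best candidate, which corresponds to the single-objective behaviour that the secondary references are relied upon to modify.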
Drevo does not explicitly teach comparing the at least two performance characteristics of the candidate neural network against a current performance baseline, the at least two actual performance characteristics obtained upon testing the candidate neural network; and when the at least two actual performance characteristics exceed the current performance baseline, updating the current performance baseline to include the at least two performance characteristics; and identifying one or more candidate neural networks forming part of the current Pareto-optimal front as the at least one suitable neural network.
Goh teaches comparing the at least two performance characteristics of the candidate neural network against a current performance baseline, the at least two actual performance characteristics obtained upon testing the candidate neural network (II. BACKGROUND INFORMATION: A. Multiobjective Optimization; the solution to MO optimization problem exists in the form of alternate tradeoffs known as Pareto optimal set. The different dominance relationship is illustrated in Fig. 1 where the solutions denoted by closed circles formed the optimal PF and dominated the solutions represented by open circles. IV. HYBRID MULTIOBJECTIVE EVOLUTIONARY NEURAL NETWORKS, B. MO Fitness Evaluation, 1) Pareto Ranking; The Pareto ranking scheme [14] is adopted in this paper based on the objectives of minimizing training error and network complexity. This scheme assigns the same smallest cost for all nondominated individuals, while the dominated individuals are ranked according to how many individuals in the population dominate them; so the rank of an individual in the population will be given by Equation (8). V. HYBRID MULTIOBJECTIVE EVOLUTIONARY NEURAL NETWORKS, B. MO Fitness Evaluation, 3) Diversity Preservation: The approximation of the Pareto optimal front requires the MOEA to perform a multidirectional search simultaneously to discover multiple, widely different solutions. Fig. 8; EVALUATE ANNs, STORE best ANN. Examiner note: According to the specification: “the current performance baseline defined by a current Pareto-optimal front composed of at least two performance characteristics of one or more previous candidate neural networks”. Therefore, the Pareto optimal set in Goh is the performance baseline from one or more previous candidate neural networks (see Fig. 1).
The objectives in Goh are minimizing training error and network complexity. By ranking each solution according to the number of solutions that dominate it in the objective domain, Goh compares the two performance characteristics (training error and network complexity) against the current performance baseline (Pareto optimal set)).
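
For illustration only, the Pareto ranking scheme described in the cited passage can be sketched as follows. This is a minimal sketch, not Goh's implementation: Equation (8) of Goh is not reproduced in this action, so the form rank = 1 + (number of dominating individuals) is an assumption consistent with the quoted description, and the function names and the sample (training error, network complexity) pairs are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one. All objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(population):
    """Assumed form: rank = 1 + number of individuals that dominate the
    individual, so every nondominated individual shares the smallest rank."""
    return [1 + sum(dominates(q, p) for q in population) for p in population]


# Hypothetical (training error, network complexity) pairs.
population = [(0.10, 5), (0.20, 3), (0.15, 6), (0.30, 7)]
ranks = pareto_ranks(population)
```

Here the first two individuals are nondominated and share rank 1, the third is dominated by one individual, and the fourth is dominated by all three others.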
when the at least two actual performance characteristics exceed the current performance baseline, updating the current performance baseline to include the at least two performance characteristics (V. HYBRID MULTIOBJECTIVE EVOLUTIONARY NEURAL NETWORKS, B. MO Fitness Evaluation, 1) Pareto Ranking; The Pareto ranking scheme [14] is adopted in this paper based on the objectives of minimizing training error and network complexity. This scheme assigns the same smallest cost for all nondominated individuals, while the dominated individuals are ranked according to how many individuals in the population dominate them; so the rank of an individual in the population will be given by Equation (8). V. HYBRID MULTIOBJECTIVE EVOLUTIONARY NEURAL NETWORKS, B. MO Fitness Evaluation, 3) Diversity Preservation: The approximation of the Pareto optimal front requires the MOEA to perform a multidirectional search simultaneously to discover multiple, widely different solutions. Fig. 8; EVALUATE ANNs, STORE best ANN. Examiner note: training error and network complexity are the performance characteristics).
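
For illustration only, updating a front of nondominated solutions when a newly tested candidate arrives can be sketched as follows. This is a minimal sketch under stated assumptions: "exceeds the current performance baseline" is read here as "is not dominated by any member of the current front", which is one possible reading and is not taken from the claims or from Goh, and `update_front` and the sample values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere
    (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_front(front, candidate):
    """Return (new_front, updated). The candidate joins the front only if no
    current member dominates or duplicates it; members that the candidate
    dominates are removed."""
    if any(dominates(m, candidate) or m == candidate for m in front):
        return list(front), False
    kept = [m for m in front if not dominates(candidate, m)]
    return kept + [candidate], True


# Hypothetical (training error, network complexity) pairs.
front = [(0.10, 5), (0.20, 3)]
front, added_tradeoff = update_front(front, (0.12, 4))   # new tradeoff point
front, added_dominated = update_front(front, (0.15, 6))  # dominated, rejected
front, added_best = update_front(front, (0.05, 2))       # dominates everything
```

The third call replaces the whole front, since the new candidate is better in both objectives than every stored member.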
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the method of Drevo with the method of Goh in order to account for the inherent tradeoffs in capacity and complexity (I. INTRODUCTION), and because, in Goh, good solutions are updated into an external population or archive and the selection process typically involves the archive of nondominated solutions to improve convergence (II. BACKGROUND INFORMATION, B. Multiobjective Evolutionary Algorithms).
Cruz teaches identifying one or more candidate neural networks forming part of the current Pareto-optimal front as the at least one suitable neural network (i.e., obtain the best model from the Pareto front, see at least page 1, abstract, right column, paragraph 2).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Drevo to identify one or more candidate neural networks forming part of the current Pareto-optimal front as the at least one suitable neural network, as similarly taught by Cruz, in order to use known methods to choose the best models based on different objectives (see at least abstract, pages 3-4, section III of Cruz).

As per claim 4, Drevo also teaches wherein predicting the at least two performance characteristics comprises predicting an average error and at least one of a computation time, a latency, an energy efficiency, an implementation cost, and a computational complexity of the candidate neural network (Paragraph 0045; the system 100 can store many aspects of the model exploration search process: model training times, measures of predictive power, average performance for evaluation, training time, number of features, baselines, and comparative performance among methodologies. Paragraph 0136; The GP modeling is used to model the relationship between the continuous tunable parameters for the hyperpartition and the performance metric. Paragraph 0145; The time cost for training t_yj may be determined from, or estimated by, the elapsed time attribute 208o within the performance table 106d.).

As per claim 9, Drevo also teaches wherein the end condition is an iteration limit, and further wherein the sequence is iteratively executed until the iteration limit is attained (Paragraph 0115; The correlate-and-train processing of blocks 404-410 is repeated until certain termination criteria are reached (block 412). The termination criteria can include whether desired performance is reached, whether a computational or time-based budget (or “deadline”) is met, or any other suitable criteria.).

As per claim 11, this is a system claim corresponding to the method of claim 1. It is substantially similar to claim 1 and is rejected in the same manner, the same art and reasoning applying. Further, Drevo also teaches a system for identifying at least one neural network suitable for a given application, comprising: a processing unit; and a non-transitory computer-readable memory communicatively coupled to the processing unit and comprising computer-readable program instructions executable by the processing unit for (Paragraph 0162; FIG. 8 shows an illustrative computer or other processing device 800 that can perform at least part of the processing described herein. In some embodiments, the system 100 of FIG. 1 includes one or more processing devices 800, or portions thereof. The illustrative processing device 800 includes a processor 802, a volatile memory 804, a non-volatile memory 806 (e.g., hard disk), an output device 808 and a graphical user interface (GUI) 810 (e.g., a mouse, a keyboard, a display, for example), each of which is coupled together by a bus 818. The non-volatile memory 806 stores computer instructions 812, an operating system 814, and data 816. In one example, the computer instructions 812 are executed by the processor 802 out of volatile memory 804. In one embodiment, an article 580 comprises non-transitory computer-readable instructions.).

As per claims 14 and 19, these are the system claims corresponding to method claims 4 and 9. Therefore, claims 14 and 19 are rejected for the same reasons as claims 4 and 9.

Claims 2, 5, 7, 8, 12, 15, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Drevo, in view of Goh, further in view of Cruz, in further view of Teng, et al. "An automated design system for finding the minimal configuration of a feed-forward neural network” (hereinafter Teng).

As per claim 2, modified Drevo teaches the method of claim 1, but does not explicitly teach wherein the at least two performance characteristics of the candidate neural network are predicted using a modelling neural network.
Teng teaches wherein the at least two performance characteristics of the candidate neural network are predicted using a modelling neural network (1 Introduction; Our CANN takes as inputs partial training behavior of two ANNs, and predicts which ANN will lead to a smaller configuration when training is completed. Section 3. Learning to Predict Relative Convergence Times; we develop a CANN whose inputs consist of TSSE traces from two partially trained ANNs that are filtered by low-pass filters with different cut-off frequencies and extrapolated by different methods, and whose output predicts which of the two ANNs will converge faster; we develop a comparator artificial neural network (CANN) that takes into consideration these factors. Abstract; Our system is a population-based generate-and-test method that maintains a population of candidate ANNs, and that selectively train these that are predicted to require smaller configurations. 2. Population-Based Learning System for Designing Neural Networks; The performance of training is then saved in the Learning Performance Database that maintains the history of performance for each candidate. Note that the learning performance of a candidate includes the number of hidden units and its temporal trace of TSSE. Examiner note: The two performance characteristics of the candidate neural network here are the smaller configuration, meaning a smaller number of hidden units, and the shorter time to converge. The CANN is the modelling neural network).
It would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Drevo with the method of Teng because, as Teng explains, it is difficult to predict the exact number of hidden units required when the CAS algorithm terminates, so Teng's system compares two partially trained ANNs and predicts which one will converge with a smaller number of hidden units relative to the other (Abstract), and because there are infinitely many different configurations (Section 1. Introduction).

As per claim 5, modified Drevo teaches the method of claim 4, and Teng also teaches:
wherein predicting the at least two performance characteristics comprises using a multi-layer perceptron (MLP) model (Section 3. Learning to Predict Relative Convergence Times; we develop a CANN whose inputs consist of TSSE traces from two partially trained ANNs that are filtered by low-pass filters with different cut-off frequencies and extrapolated by different methods, and whose output predicts which of the two ANNs will converge faster; we develop a comparator artificial neural network (CANN) that takes into consideration these factors. Abstract; Our system is a population-based generate-and-test method that maintains a population of candidate ANNs, and that selectively train these that are predicted to require smaller configurations.)
to model a response surface relating the candidate set of neural network parameters to the average error. (Section 3. Learning to Predict Relative Convergence Times Using a Comparator Neural Network; To overcome this difficulty, we develop a CANN whose inputs consist of TSSE traces from two partially trained ANNs that are filtered by low-pass filters with different cut-off frequencies and extrapolated by different methods, and whose output predicts which of the two ANNs will converge faster.).
Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Drevo with the method of Teng because errors in prediction can be used to adjust the weights of the CANN (Section 1. Introduction) and because the CANN predicts which of the two ANNs will converge faster (Section 3. Learning to Predict Relative Convergence Times Using a Comparator Neural Network).

As per claim 7, modified Drevo teaches the method of claim 2, and Teng also teaches:
further comprising, when the at least two performance characteristics exceed the current performance baseline, updating the modelling neural network based on the candidate neural network, (2. Population-Based Learning System for Designing Neural Networks, Page 1297; The Internal Critic predicts the relative convergence time of one candidate ANN with respect to another using a CANN to be discussed in the next section. The prediction leads to the following alternative actions. (1) If the ANN selected has been trained to convergence and its number of hidden units is less than Nincum then Nincumb is updated. The Resource Scheduler then instructs the Heuristics Manager to generate a new candidate ANN, and schedules time to train the new ANN for Nincum /4 training episodes. Abstract; Our system is a population-based generate-and-test method that maintains a population of candidate ANNs, and that selectively train these that are predicted to require smaller configurations. Section 3. Learning to Predict Relative Convergence Times; These processed traces, together with their actual convergence times, are then used to train a CANN, which predicts for any two partially trained ANNs, which one will converge faster. Since we know the exact convergence times of these traces, errors in prediction can be used to update the weights of the CANN.);
comprising retraining the modelling neural network with at least one actual performance characteristic obtained upon testing the candidate neural network and with one or more performance characteristics obtained upon testing one or more previous candidate neural networks. (Section 4. Experimental Results; Using the normalized training patterns, we then trained a CANN to differentiate between any two training patterns which one will have a smaller convergence time (see Figure 2). The configuration of each subnet in the CANN is 9-15-1. We stopped training when we reached 80% accuracy.).
	It would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Drevo with the method of Teng because the accuracy of prediction is improved by using the CANN (Section 4. Experimental Results).

As per claim 8, modified Drevo teaches the method of claim 1, and Teng also teaches:
further comprising, when the at least two performance characteristics do not exceed the current performance baseline, discarding the candidate neural network. (Section 2. Population-Based Learning System for Designing Neural Networks; Page 1296; selects and trains one promising ANN for a quantum using a point-based method, updates performance obtained at the end of the quantum, generates new ANNs when none of the existing ANNs is promising, and discards an existing ANN when it is found to be inferior. Page 1297; The Internal Critic predicts the relative convergence time of one candidate ANN with respect to another using a CANN to be discussed in the next section. The prediction leads to the following alternative actions. Note that if a non-converged candidate ANN has Nincum -1 hidden units, then this candidate will require at least Nincum, hidden units when training converges, and, hence, can be pruned. (3) Otherwise, the candidate ANN is pruned, and the Heuristics Manager generates a new ANN. Examiner note: The method of Teng prunes an ANN with a larger predicted convergence time and prunes an ANN if the predicted number of hidden units is not less than the baseline number of hidden units (Nincum )).
	It would have been obvious to one of ordinary skill in the art, prior to the effective filing date to combine the method of Drevo with the method of Teng because it can be used to prune unpromising ANNs before they are trained to convergence (Section 1. Introduction).
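
For illustration only, the two pruning-related rules quoted above from Teng can be sketched as follows. This is a minimal sketch, not Teng's implementation: only the rules quoted in this action are modeled, the remaining alternative actions are not fully quoted and are therefore returned as "undetermined", the comparison `>= n_incumbent - 1` is an assumption generalizing the quoted "Nincum -1" case, and `critic_action` and `n_incumbent` are hypothetical names.

```python
def critic_action(hidden_units, converged, n_incumbent):
    """Return (action, new_incumbent) for one candidate ANN.

    n_incumbent is the smallest number of hidden units among the networks
    that have converged so far (the baseline in the quoted passage).
    """
    # Quoted rule (1): a converged network with fewer hidden units than the
    # incumbent becomes the new incumbent.
    if converged and hidden_units < n_incumbent:
        return "update_incumbent", hidden_units
    # Quoted note: a non-converged network with n_incumbent - 1 hidden units
    # would need at least n_incumbent units to converge, so it can be pruned.
    # Larger non-converged networks are pruned here too (an assumption).
    if not converged and hidden_units >= n_incumbent - 1:
        return "prune", n_incumbent
    # Cases not fully quoted in this action are left undecided.
    return "undetermined", n_incumbent
```

For example, `critic_action(4, True, 6)` updates the incumbent to 4 hidden units, while `critic_action(5, False, 6)` prunes the candidate.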

As per claims 12, 15, 17, and 18, these are the system claims corresponding to method claims 2, 5, 7, and 8. Therefore, claims 12, 15, 17, and 18 are rejected for the same reasons as claims 2, 5, 7, and 8.

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Drevo, in view of Goh, further in view of Cruz, in further view of Ravindran (US 20160259994).

As per claim 3, modified Drevo teaches the method of claim 1, but does not explicitly teach wherein the candidate set of neural network parameters comprises at least one of a number of layers, a number of nodes per layer, a convolution kernel size, a maximum pooling size, a type of activation function, and a network training rate.
Ravindran teaches wherein the candidate set of neural network parameters comprises at least one of a number of layers, a number of nodes per layer, a convolution kernel size, a maximum pooling size, a type of activation function, and a network training rate (Paragraph 0010; The candidate architecture may include a number of convolution layers and subsampling layers and a classifier type. The candidate parameters may include a learning rate, a batch size, a maximum number of training epochs, an input image size, a number of feature maps at every layer of the CNN, a convolutional filter size, a sub-sampling pool size, a number of hidden layers, a number of units in each hidden layer, a selected classifier algorithm, and a number of output classes.).
It would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Drevo with the method of Ravindran because, as Ravindran explains, CNNs include many functional components, which makes it difficult to determine the necessary network architecture that performs accurately to detect and classify particular features of images relevant for a problem in hand. Furthermore, each component of the CNN typically has a multitude of parameters associated with it. The specific values of those parameters necessary for a successful and accurate image classification are not known a priori without any application of a robust image processing system (Paragraph 0010).
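
For illustration only, a candidate set of neural network parameters of the kind listed in the cited paragraph of Ravindran can be sketched as a search space from which one candidate is drawn. This is a minimal sketch: the parameter names follow the quoted list, but every value, the dictionary `SEARCH_SPACE`, and the function `sample_candidate` are hypothetical and are not taken from Ravindran.

```python
import random

# Hypothetical value ranges for a subset of the parameters quoted above.
SEARCH_SPACE = {
    "learning_rate": [0.1, 0.01, 0.001],
    "num_hidden_layers": [1, 2, 3],
    "units_per_hidden_layer": [32, 64, 128],
    "convolutional_filter_size": [3, 5, 7],
    "subsampling_pool_size": [2, 3],
}


def sample_candidate(rng):
    """Draw one value per parameter to form a candidate parameter set."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


candidate = sample_candidate(random.Random(0))
```

Even this small hypothetical space contains 3 x 3 x 3 x 3 x 2 = 162 combinations, which is consistent with the cited observation that suitable parameter values are not known a priori.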

As per claim 13, this is the system claim corresponding to method claim 3. Therefore, claim 13 is rejected for the same reasons as claim 3.

Response to Arguments
Rejection of claims under §103: 
As per claim 1, Applicant argued that Drevo is concerned with predicting a single performance characteristic and is silent with regard to “predicting at least two performance characteristics.” Applicant further argued that a person skilled in the art would not be motivated to combine Drevo and Goh because Drevo is concerned with single-objective optimization, and a person skilled in the art would therefore not have been led to combine the single-objective maximization approach of Drevo with the multi-objective maximization approach of Goh to arrive at the claimed subject-matter.
Examiner respectfully disagrees. Drevo at [0141] teaches that the performance y_j is estimated using the GP model to get μ_yj and σ_yj, where μ_yj is the maximum a posteriori value for y_j and σ_yj expresses the confidence in the prediction. μ_yj and σ_yj are two performance characteristics. Further, Examiner disagrees that it would not have been obvious to modify a single-objective optimization approach with a multi-objective approach. One of ordinary skill in the art, when performing methods related to optimization, would look to known optimization methods, and would look to a multi-objective approach when the optimization problem at hand would benefit from optimization based on multiple objectives. Additionally, Drevo at [0142] teaches that the acquisition function is applied using different characteristics including μ_yj and σ_yj. Therefore, Drevo is concerned with different characteristics when maximizing an acquisition function. Drevo at [0143] and [0145] further teaches that non-limiting examples of acquisition functions include expected improvement per time, which is multi-objective on the performance of a parameterization by taking into account the time cost for training.
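
For illustration only, an expected-improvement acquisition function that uses both μ_yj and σ_yj, and its per-time variant, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the standard closed-form expected improvement for a Gaussian prediction when maximizing performance, and it divides by the time cost to obtain a per-time value. Drevo's exact formulas are not reproduced in this action, so the function names, the symbol `best_so_far`, and the division by `time_cost` are assumptions.

```python
import math


def expected_improvement(mu, sigma, best_so_far):
    """Closed-form expected improvement over best_so_far for a prediction
    distributed as Normal(mu, sigma**2), assuming higher is better."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best_so_far) * cdf + sigma * pdf


def ei_per_time(mu, sigma, best_so_far, time_cost):
    """Assumed per-time variant: expected improvement per unit of training time."""
    return expected_improvement(mu, sigma, best_so_far) / time_cost
```

With the same predicted mean, a parameterization with larger σ_yj or smaller training time receives a higher score in this sketch, which shows how more than one predicted quantity can enter the selection.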

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jue Louie whose telephone number is 571-270-1655.  The examiner can normally be reached on M-F 9:30 am - 5:00pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Jue Louie/
Primary Examiner
Art Unit 2121