DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/07/2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 4, 6, 9, 11, 14, 16, and 19 are rejected under 35 U.S.C. 102(a)(1) and (a)(2) as being anticipated by Drevo (US 20160132787).

Regarding claim 1, Drevo teaches:
	A method for identifying at least one neural network suitable for a given application, comprising: (Abstract; The system uses a hybrid optimization technique to select between multiple machine learning approaches for a given dataset);
	selecting a candidate set of neural network parameters associated with a candidate neural network (Paragraph 0034; modeling methodology include DNN. Paragraph 0035; the term “model parameters” refer to the possible settings or choices for a given modeling methodology. Paragraph 0086; A CPT is abstraction that compactly expresses every parameter, hyperparameter and design choice, in general, for a modeling methodology);
	predicting at least one performance characteristic of the candidate neural network (Paragraph 0141; At block 510, for each parameterization p.sub.j, the performance y.sub.j is estimated using the GP model to get μ.sub.y.sub.j, and σ.sub.y.sub.j, where μ.sub.y.sub.j is the maximum a posteriori value for y.sub.j and σ.sub.y.sub.j expresses the confidence in the prediction. Paragraph 0143; With EI, the parameterization is selected using both the average performance predicted by the GP model and also the confidence in its prediction);
	comparing the at least one performance characteristic of the candidate neural network against a current performance baseline (Paragraph 0018; the current performance baseline is a collection of candidate ANNs having the most ideal performance characteristic(s). The steps (f)-(l) may be repeated until a model having the highest performance on the dataset has a performance greater than or equal to a predetermined performance threshold); and
	when the at least one performance characteristic exceeds the current performance baseline, using a predetermined training dataset for training and testing the candidate neural network to identify the at least one suitable neural network. (Paragraph 0018; a model having the highest performance on the dataset has a performance greater than or equal to a predetermined performance threshold. Paragraph 0114; At block 410, the highest performing model k* is trained on the received dataset using, for example, the training process described below in conjunction with FIG. 7. Paragraph 0159; FIG. 7 is a flowchart of a model training process 700 for use within the system of FIG. 1 and, more specifically, within the ICRT routine 400 of FIG. 4 and/or the hybrid optimization process 500 of FIG. 5. The process 700 can be used to train a single model on a given dataset).
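For illustration only, the predict-and-compare steps mapped above can be sketched as follows. This sketch is not taken from Drevo or from the application. The function names, candidate names, and numeric values are hypothetical, and the closed-form expected improvement (EI) for a Gaussian prediction is assumed here as one common way of combining a predicted mean with the confidence in that prediction.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a Gaussian prediction N(mu, sigma^2).

    Assumes a higher performance value is better (maximization).
    """
    if sigma <= 0.0:
        # No uncertainty: the improvement is simply the positive gap, if any.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def screen_candidates(predictions, baseline):
    """Keep candidates whose predicted mean exceeds the baseline, ranked by EI.

    `predictions` maps a candidate name to a (mean, standard deviation) pair.
    Only the returned candidates would go on to actual training and testing.
    """
    passing = [
        (expected_improvement(mu, sigma, baseline), name)
        for name, (mu, sigma) in predictions.items()
        if mu > baseline
    ]
    return [name for _, name in sorted(passing, reverse=True)]


# Hypothetical predicted accuracies (mean, standard deviation) for three candidates.
predictions = {"net_a": (0.90, 0.02), "net_b": (0.85, 0.10), "net_c": (0.92, 0.01)}
to_train = screen_candidates(predictions, baseline=0.88)
```

In this sketch, a candidate whose predicted mean does not exceed the baseline is never trained, which mirrors the conditional "when ... exceeds the current performance baseline" language of the claim.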
	
Regarding claim 4, Drevo teaches the method of claim 1, and Drevo also teaches:
	wherein predicting the at least one performance characteristic comprises predicting an average error and at least one of a computation time, a latency, an energy efficiency, an implementation cost, and a computational complexity of the candidate neural network. (Paragraph 0045; the system 100 can store many aspects of the model exploration search process: model training times, measures of predictive power, average performance for evaluation, training time, number of features, baselines, and comparative performance among methodologies. Paragraph 0136; The GP modeling is used to model the relationship between the continuous tunable parameters for the hyperpartition and the performance metric. Paragraph 0145; The time cost for training t.sub.y.sub.j may be determined from, or estimated by, the elapsed time attribute 208o within the performance table 106d. ).

Regarding claim 6, Drevo teaches the method of claim 1, and Drevo also teaches:
	wherein the at least one performance characteristic is compared against the current performance baseline comprising a current Pareto-optimal front composed of one or more performance characteristics of one or more previous candidate neural networks. (Paragraph 0094; Within this hyperpartition, the system 100 can optimize for the parameters “Epochs” (node 332), “Learn Rate” (node 326), “Pretrain Learn Rate” (node 328), “Learn Rate Decay” (node 324), and “Layer 1 Size” (node 334). Paragraph 0104; At block 424, for continuous and discrete (i.e., optimizable) parameters and hyperparameters, a feasible step size is chosen to derive the possible modeling possibilities.).
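For illustration only, a comparison against a Pareto-optimal front can be sketched as follows. The sketch is not drawn from Drevo or from the application. It assumes that every performance characteristic is to be minimized (for example, average error and latency), and the names `dominates` and `update_front` and all numeric values are hypothetical.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_front(front, candidate):
    """Compare a candidate against the current front.

    Returns (new_front, accepted). The candidate is accepted when no point on
    the front dominates or duplicates it; any points it dominates are dropped.
    """
    if any(dominates(p, candidate) or p == candidate for p in front):
        return list(front), False
    kept = [p for p in front if not dominates(candidate, p)]
    return kept + [candidate], True


# Hypothetical (average error, latency) pairs from previous candidates.
front = [(0.10, 5.0), (0.20, 2.0)]
front, accepted = update_front(front, (0.15, 3.0))
```

Under these assumptions, a candidate "exceeds" the baseline when it is not dominated by any point already on the front.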

Regarding claim 9, Drevo teaches the method of claim 1, and Drevo also teaches:
further comprising iteratively performing the steps of claim 1 until an iteration limit is attained. (Paragraph 0115; The correlate-and-train processing of blocks 404-410 is repeated until certain termination criteria are reached (block 412). The termination criteria can include whether desired performance is reached, whether a computational or time-based budget (or “deadline”) is met, or any other suitable criteria.).

Regarding Claim 11, it is substantially similar to Claim 1 and is rejected in the same manner, the same art and reasoning applying. Further, Drevo also teaches:
	A system for identifying at least one neural network suitable for a given application, comprising: a processing unit; and a non-transitory computer-readable memory communicatively coupled to the processing unit and comprising computer-readable program instructions executable by the processing unit for (Paragraph 0162; FIG. 8 shows an illustrative computer or other processing device 800 that can perform at least part of the processing described herein. In some embodiments, the system 100 of FIG. 1 includes one or more processing devices 800, or portions thereof. The illustrative processing device 800 includes a processor 802, a volatile memory 804, a non-volatile memory 806 (e.g., hard disk), an output device 808 and a graphical user interface (GUI) 810 (e.g., a mouse, a keyboard, a display, for example), each of which is coupled together by a bus 818. The non-volatile memory 806 stores computer instructions 812, an operating system 814, and data 816. In one example, the computer instructions 812 are executed by the processor 802 out of volatile memory 804. In one embodiment, an article 580 comprises non-transitory computer-readable instructions.).

Regarding Claim 14, it is substantially similar to Claim 4 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 16, it is substantially similar to Claim 6 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 19, it is substantially similar to Claim 9 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 5, 7, 10, 12, 15, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Drevo (US 20160132787), in view of Teng, et al. ("An automated design system for finding the minimal configuration of a feed-forward neural network," Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94), 1994).


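Regarding Claim 2, Drevo does not explicitly teach wherein the at least one performance characteristic of the candidate neural network is predicted using a modelling neural network.
Teng, however, does teach: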
	wherein the at least one performance characteristic of the candidate neural network is predicted using a modelling neural network (Section 3. Learning to Predict Relative Convergence Times; we develop a comparator artificial neural network (CANN) that takes into consideration these factors. Our CANN takes as inputs partial training behavior of two ANNs, and predicts which ANN will lead to a smaller configuration when training is completed.).
Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Drevo with the method of Teng because it is difficult to predict the exact number of hidden units required when the CAS algorithm terminates, and Teng's system compares two partially trained ANNs and predicts which one will converge with a smaller number of hidden units relative to the other (Abstract), there being infinitely many different configurations (Section 1. Introduction).

Regarding Claim 5, Drevo teaches the method of claim 4, but does not explicitly teach wherein predicting the at least one performance characteristic comprises using a multi-layer perceptron (MLP) model to model a response surface relating the candidate set of neural network parameters to the average error.
Teng, however, does teach:
wherein predicting the at least one performance characteristic comprises using a multi-layer perceptron (MLP) model (Section 1. Introduction; Our CANN takes as inputs partial training behavior of two ANNs, and predicts which ANN will lead to a smaller configuration when training is completed. In training the CANN, we assume that we have determined ahead of time the complete training error behavior of a number of ANNs for a given application. Consequently, errors in prediction can be used to adjust the weights of the CANN.)
to model a response surface relating the candidate set of neural network parameters to the average error. (Section 3. Learning to Predict Relative Convergence Times Using a Comparator Neural Network; To overcome this difficulty, we develop a CANN whose inputs consist of TSSE traces from two partially trained ANNs that are filtered by low-pass filters with different cut-off frequencies and extrapolated by different methods, and whose output predicts which of the two ANNs will converge faster.).
Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Drevo with the method of Teng because errors in prediction can be used to adjust the weights of the CANN (Section 1. Introduction) and the CANN predicts which of the two ANNs will converge faster (Section 3. Learning to Predict Relative Convergence Times Using a Comparator Neural Network).
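For illustration only, the claimed use of an MLP to model a response surface relating network parameters to average error can be sketched as follows. This is not Teng's CANN and not the applicant's model. The class `TinyMLP`, the two normalized input parameters, and the synthetic error values are all hypothetical and serve only to show a small network being fitted to (parameters, average error) pairs.

```python
import math
import random


class TinyMLP:
    """One-hidden-layer perceptron with tanh units and a linear output."""

    def __init__(self, n_in, hidden=6, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def predict(self, x):
        return self._forward(x)[1]

    def mse(self, samples):
        return sum((self.predict(x) - y) ** 2 for x, y in samples) / len(samples)

    def fit(self, samples, lr=0.05, epochs=200):
        """Per-sample gradient descent on the squared prediction error."""
        for _ in range(epochs):
            for x, y in samples:
                h, out = self._forward(x)
                g_out = 2.0 * (out - y)
                for j, hj in enumerate(h):
                    # Gradient at hidden unit j, using w2[j] before it is updated.
                    g_z = g_out * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * g_out * hj
                    self.b1[j] -= lr * g_z
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * g_z * xi
                self.b2 -= lr * g_out


# Synthetic response surface: two normalized parameters mapped to an average error.
samples = [((i / 4, j / 4), 0.1 + (i / 4 - 0.5) ** 2 + 0.5 * (j / 4 - 0.3) ** 2)
           for i in range(5) for j in range(5)]

model = TinyMLP(n_in=2)
error_before = model.mse(samples)
model.fit(samples)
error_after = model.mse(samples)
```

Once fitted, `model.predict((0.5, 0.25))` would return an estimated average error for an untrained parameter setting, which is the role a response-surface model plays in the claim.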

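Regarding Claim 7, Drevo, as modified by Teng, teaches the method of claim 2, but does not explicitly teach further comprising, when the at least one performance characteristic exceeds the current performance baseline, updating the modelling neural network based on the candidate neural network, comprising retraining the modelling neural network with at least one actual performance characteristic obtained upon testing the candidate neural network and with one or more performance characteristics obtained upon testing one or more previous candidate neural networks.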
Teng, however, does teach:
further comprising, when the at least one performance characteristic exceeds the current performance baseline, updating the modelling neural network based on the candidate neural network, (Section 3. Learning to Predict Relative Convergence Times; These processed traces, together with their actual convergence times, are then used to train a CANN, which predicts for any two partially trained ANNs, which one will converge faster. Since we know the exact convergence times of these traces, errors in prediction can be used to update the weights of the CANN.);
comprising retraining the modelling neural network with at least one actual performance characteristic obtained upon testing the candidate neural network and with one or more performance characteristics obtained upon testing one or more previous candidate neural networks. (Section 4. Experimental Results; Using the normalized training patterns, we then trained a CANN to differentiate between any two training patterns which one will have a smaller convergence time (see Figure 2). The configuration of each subnet in the CANN is 9-15-1. We stopped training when we reached 80% accuracy.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Drevo with the method of Teng because (Section 4. Experimental Results).

Regarding claim 8, Drevo teaches the method of claim 1, but does not explicitly teach further comprising, when the at least one performance characteristic does not exceed the current performance baseline, discarding the candidate neural network.
Teng, however, does teach:
further comprising, when the at least one performance characteristic does not exceed the current performance baseline, discarding the candidate neural network. (Section 2. Population-Based Learning System for Designing Neural Networks; the candidate ANN is pruned, and the Heuristics Manager generates a new ANN.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Drevo with the method of Teng because it can be used to prune unpromising ANNs before they are trained to convergence (Section 1. Introduction).

Regarding Claim 10, Drevo teaches the method of claim 1, but does not explicitly teach further comprising: comparing at least one actual performance characteristic of the candidate neural network against the current performance baseline, the at least one actual performance characteristic obtained upon testing the candidate neural network; and when the at least one actual performance characteristic exceeds the current performance baseline, updating the current performance baseline to include the at least one performance characteristic.
Teng, however, does teach:
further comprising: comparing at least one actual performance characteristic of the candidate neural network against the current performance baseline, the at least one actual performance characteristic obtained upon testing the candidate neural network; and when the at least one actual performance characteristic exceeds the current performance baseline, updating the current performance baseline to include the at least one performance characteristic. (Section 2. Population-Based Learning System for Designing Neural Networks; selects and trains one promising ANN for a quantum using a point-based method, updates performance obtained at the end of the quantum, generates new ANNs when none of the existing ANNs is promising, and discards an existing ANN when it is found to be inferior.).
Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Drevo with the method of Teng because it can be used to prune unpromising ANNs before they are trained to convergence and to maximize the number of different ANNs considered without having to train each to completion (Section 1. Introduction).

Regarding Claim 12, it is substantially similar to Claim 2 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

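Regarding Claim 15, it is substantially similar to Claim 5 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 17, it is substantially similar to Claim 7 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.
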
Regarding Claim 18, it is substantially similar to Claim 8 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Regarding Claim 20, it is substantially similar to Claim 10 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Drevo (US 20160132787), in view of Ravindran (US 20160259994).

Regarding claim 3, Drevo teaches the method of claim 1, but does not explicitly teach wherein the candidate set of neural network parameters comprises at least one of a number of layers, a number of nodes per layer, a convolution kernel size, a maximum pooling size, a type of activation function, and a network training rate.
Ravindran, however, does teach:
	wherein the candidate set of neural network parameters comprises at least one of a number of layers, a number of nodes per layer, a convolution kernel size, a maximum pooling size, a type of activation function, and a network training rate (Paragraph 0010; The candidate architecture may include a number of convolution layers and subsampling layers and a classifier type. The candidate parameters may include a learning rate, a batch size, a maximum number of training epochs, an input image size, a number of feature maps at every layer of the CNN, a convolutional filter size, a sub-sampling pool size, a number of hidden layers, a number of units in each hidden layer, a selected classifier algorithm, and a number of output classes.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Drevo with the method of Ravindran because CNNs include many functional components, which makes it difficult to determine the network architecture needed to accurately detect and classify the image features relevant to the problem at hand. Furthermore, each component of the CNN typically has a multitude of parameters associated with it, and the specific values of those parameters necessary for successful and accurate image classification are not known a priori without application of a robust image processing system (Paragraph 0010).
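For illustration only, a candidate set of neural network parameters of the kind recited in claim 3 can be sketched as follows. The field names, the value ranges in `SEARCH_SPACE`, and the uniform random sampling are hypothetical and are not taken from Drevo, Ravindran, or the application.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateParams:
    """One candidate set of network parameters (all fields hypothetical)."""
    num_layers: int
    nodes_per_layer: int
    kernel_size: int
    pool_size: int
    activation: str
    learning_rate: float


# Hypothetical discrete choices for each parameter.
SEARCH_SPACE = {
    "num_layers": [2, 3, 4],
    "nodes_per_layer": [32, 64, 128],
    "kernel_size": [3, 5],
    "pool_size": [2, 3],
    "activation": ["relu", "tanh"],
    "learning_rate": [0.1, 0.01, 0.001],
}


def sample_candidate(rng):
    """Draw one candidate by picking a value uniformly at random for each parameter."""
    return CandidateParams(
        **{name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    )


candidate = sample_candidate(random.Random(0))
```

A seeded generator is used so that the same candidate is drawn on every run, which keeps the sketch reproducible.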

Regarding Claim 13, it is substantially similar to Claim 3 without any further teachings, and is rejected in the same manner, the same art and reasoning applying.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHAT MINH DANG whose telephone number is (571)272-8665. The examiner can normally be reached Monday - Friday 7:30am - 5:30pm EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/P.M.D./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121